Help!!, Changing from uppercase to lowercase


Thread: Help!!, Changing from uppercase to lowercase

  1. #1
    Join Date
    Mar 2005
    Posts
    1

    Help!!, Changing from uppercase to lowercase

    I am trying to read in a file and change the first letter of the first word to uppercase and then change the second word to upperCase and so on. Also, I need to delete all the spaces in between. Example:

    Contol Measure Mapping (needs to look like) controlMeasureMapping

    If anyone can assist me, I would really appreciate it. I will provide the code where I am stuck below.




    import java.io.*;
    import javax.swing.*;

    class BufferReaderDemo {

        public static void main(String args[]) {
            try {
                FileReader fr = new FileReader("C:\\Documents and Settings\\rwhitehurst\\Desktop\\Test.txt");
                BufferedReader br = new BufferedReader(fr);

                String s;

                int ToCharArray[] = new int[4];
                String output = "Index\tValue\n";

                while ((s = br.readLine()) != null) {
                    if (s.length() > 0) {

                    }
                    System.out.println(s);
                }

                fr.close();
            }
            catch (Exception e) {
                System.out.println("Exception: " + e);
            }
        }
    }

  2. #2
    Join Date
    Mar 2005
    Location
    Sendling, MUC, .de
    Posts
    100
    **** ****! I thought a solution to this one would relate nicely to another thread here about "parsing binary files". Well, assuming good will on the reader's side, it does. On the other hand, I must admit that I underestimated the perils of capitalization by far...!

    Ok, preliminaries done. I found that the most elegant solution would be a really reusable and transparent one. So: wouldn't it be cool to simply wrap another Reader around your FileReader and out of it comes just what you want? Here it is:
    Code:
    import java.io.InputStreamReader;
    import java.io.BufferedReader;
    import java.io.FilterReader;
    import java.io.Reader;
    import java.io.IOException;
    
    /**
      @author  meisl
      */
    public class CapitalizingReader extends FilterReader {
    
      // whitespace as matched by \s in java.util.regex
      static final String WHITESPACE  = " \t\n\f\r" + (char)0x0B;
    
      // a collection of what my humble self considers a word delimiter (besides WHITESPACE)
      static final String PUNCTUATION = "!\"$%&/()=?{[]}\\#'~+-*/,;.:<>|^";
    
      /* state variable indicating whether the next (non-whitespace) character forms
       * the beginning of a word (true) and thus is to be capitalized or not (false)
       */
      private boolean nextToCapitalize = true;
    
    
      public CapitalizingReader(Reader in) {  // public, so it can be used from other packages
        super(in);
      }
    
      private boolean isWhitespace(int x) {  // what is considered whitespace
        return WHITESPACE.indexOf( (char)x ) >= 0;   // in java.util.regex
        // there is also
        //return Character.isWhitespace( (char)x );  // maybe that's better...
        // and as of java 1.5, we have
        //return Character.isWhitespace( x );  // note: no cast, can handle "supplementary characters"
      }
    
      private boolean isPunctuation(int x) {  // after that, letters will be capitalized too
        return PUNCTUATION.indexOf( (char)x ) >= 0;
      }
    
      public int read() throws IOException {
        int x = super.read();
        if (x<0) return -1;  // if there's nothing more to read, exit
    
        if (nextToCapitalize) {  // if we expect a new word to begin
    
          while (isWhitespace(x)) {  // consume whitespace
            x = super.read();    // we read again a character so it's possible that
            if (x<0) return -1;  // there's nothing more to read. If so, exit.
          }
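          // capitalize it
          // NOTE: the umlaut literals were lost in this copy of the post; the
          // characters used below (ä/Ä, ö, ü) are an assumption.
          switch (x) {
            case '\u00E4':  // ä: with that German umlaut, Character.toUpperCase(char)
            case '\u00C4':  // Ä: returns '?' or some other wrong character. ?!!
              x = '\u00C4';
              break;
            case '\u00F6':  // ö: ...and for this one it simply does not capitalize it
              x = '\u00D6';
              break;        // without the break, ö would fall through and end up as Ü
            case '\u00FC':  // ü: ...same here...
              x = '\u00DC';
              break;
            default:
              x = Character.toUpperCase( (char)x ); // maybe better use Character.toTitleCase()??
          }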
          // now that we've read the first char of a word
          // the next ones will be within that word (thus NOT to be capitalized)
          nextToCapitalize = isPunctuation(x);  // ... unless it was a (printable) word-separator
    
        } else {  // if we are within a word
    
          if (isPunctuation(x)) {        // words end with punctuation, so the next
            nextToCapitalize = true;         // non-whitespace is to be capitalized
          } else if (isWhitespace(x)) {  // words also end with whitespace, so the next
            nextToCapitalize = true;         // non-whitespace is to be capitalized
            x = read();  // this next non-ws we can most easily get this way
          }
    
        }
    
        return x;
      }
    
      /** Simply refers all the work to {@link #read()}; probably not the
        * most efficient way...
        */
      public int read(char[] cbuf, int off, int len) throws IOException {
        for (int i=0; i<len; i++) {  // fill exactly len slots, starting at off
          int x = read();
          if (x<0) return (i==0) ? -1 : i;  // -1 only if nothing at all was read
          cbuf[off+i] = (char)x;
        }
        return len;
      }
    
      public static void main(String[] args) {
        System.out.println( "This is the unbelievably funny CapitalizingReader.");
        System.out.println( "  It consumes whitespace (including line breaks) & capitalizes word beginnings." );
        System.out.println( "  After one of the punctuation characters " + PUNCTUATION );
        System.out.println( "  a new word is assumed to begin as well." );
        System.out.println( "Type in a line, press <RETURN> and see what CapitalizingReader made of it." );
        System.out.println( "To close the STDIN stream, type Ctrl-Z." );
        System.out.println( "NOTE: Since line breaks are printed BY THE CONSOLE as you type but");
        System.out.println( "      get consumed by CapitalizingReader, your next input starts" );
        System.out.println( "      IMMEDIATELY after the last output. The fact that one cannot" );
        System.out.println( "      'repair' this behaviour is not a bug but a feature since it " );
        System.out.println( "      proves CapitalizingReader correct wrt the specification." );
        System.out.println( "--------------");
        try {
          Reader in = 
            new CapitalizingReader(    // also try without it!
              new BufferedReader(
                new InputStreamReader(System.in)
              )
            )
          ;
          int x = in.read();  // System.in ain't ready unless you read a first character, don't know why...
                              // System.in.read() rather blocks / System.in.ready() says false, resp.
                              // until the first <RETURN> is typed.
          System.out.println(">>>>>>got a first character!!!");  // try typing <RETURN> as the first input!
          System.out.print( (char)x );
          while (in.ready()) {
            x = in.read();
            System.out.print( (char)x );
          }
          in.close();
        } catch ( IOException ioE) {
          ioE.printStackTrace();
        }
      }
    
    }
    Although I commented it heavily throughout, there are still some subtler things to say. Should there be interest, I'll come up with 'em (i.e. talk about DFAs and how they led me to this).

    Only this:
    - see main() for how to employ it (the class can be run by itself, try it)
    - for fun, I have added one more requirement, look for PUNCTUATION
    - apropos: I'm quite sure I got your spec, but aren't there at least 2 things in your example that are confusing? And what about the title of the thread?
    - if it's to be applied to large input, read(char[], int, int) should be redone
    - some German umlauts are still not capitalized, although I thought I had taken precautions...
    - character encodings and localization aren't trivial, THAT I have learned...
    - I'm not sure about the correct order of the wrappings, i.e. where to put the BufferedReader
    - the name CapitalizingReader is not quite correct since it consumes the whitespace, too
    - I have not used any features beyond Java 1.3, in particular not java.nio. Maybe there's a performance penalty due to this.
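
    On the umlaut and encoding points above: one possible explanation (an assumption, not verified against the original setup) is that the input bytes were decoded with the wrong charset before any capitalization took place. The sketch below uses a hypothetical helper, decodeAndUpper, to show that Character.toUpperCase handles the umlauts once the bytes are decoded correctly, and that a wrong charset produces replacement characters instead. StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautCaseDemo {

    // Hypothetical helper: decodes raw bytes with the given charset,
    // then upper-cases the result one char at a time.
    static String decodeAndUpper(byte[] raw, Charset cs) {
        String decoded = new String(raw, cs);
        StringBuilder sb = new StringBuilder(decoded.length());
        for (int i = 0; i < decoded.length(); i++) {
            sb.append(Character.toUpperCase(decoded.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three lower-case umlauts as stored in ISO-8859-1: one byte each.
        byte[] latin1 = { (byte) 0xE4, (byte) 0xF6, (byte) 0xFC };

        String right = decodeAndUpper(latin1, StandardCharsets.ISO_8859_1);
        String wrong = decodeAndUpper(latin1, StandardCharsets.UTF_8);

        // Booleans are printed so the result does not depend on the console encoding.
        System.out.println(right.equals("\u00C4\u00D6\u00DC")); // upper-case umlauts
        System.out.println(wrong.contains("\uFFFD"));            // U+FFFD replacement char
    }
}
```

    If this is what happened, the fix belongs where the Reader is created (for example an InputStreamReader with an explicit charset) rather than in the capitalizing code.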

    Hmm, now that it's done and due, I cannot keep quiet about the quick and dirty way:
    You could simply String.split() the whole input with the regex "\\s+", then toUpperCase() the first character of every piece and finally concatenate the pieces. Well, of course there's a reason why I didn't do so:
    - it'd work only with Strings, not StringBuffers or StringBuilders, and that COSTS
    - I don't see a way to fulfill my additional PUNCTUATION requirement with reasonable effort
    - it's not as cool as extending FilterReader
    - the DFA and the idea behind it would be hidden inside the evaluation of the regular expression. In the end, it was the benefit of the former that I wanted to illustrate. Even if I doubt my success in that now...
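
    For comparison, the quick and dirty way could be sketched roughly as follows. This is only a minimal sketch: the class and method names are made up for illustration, it splits on "\\s+" (one or more whitespace characters), and it lower-cases the first word's initial because the example in the original question (controlMeasureMapping) suggests that. It ignores the PUNCTUATION requirement entirely.

```java
public class SplitCamelCase {

    // Hypothetical helper: joins the whitespace-separated words of a line,
    // lower-casing the first word's initial and upper-casing all other initials.
    static String toCamelCase(String line) {
        String[] words = line.trim().split("\\s+");
        StringBuilder sb = new StringBuilder(line.length());
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            if (w.isEmpty()) continue;  // only happens for blank input
            char first = (i == 0)
                    ? Character.toLowerCase(w.charAt(0))
                    : Character.toUpperCase(w.charAt(0));
            sb.append(first).append(w, 1, w.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints controlMeasureMapping
        System.out.println(toCamelCase("Control Measure Mapping"));
    }
}
```

    It would be called once per line returned by BufferedReader.readLine(), so it never sees line breaks.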

    One last thing: please enclose your code in the tags [CODE]...[/CODE]

    --
    p.s.: Not really subito, but rather quickly, wasn't it?
    Last edited by meisl; 03-24-2005 at 07:00 PM.

  3. #3
    Join Date
    Nov 2004
    Location
    Norway
    Posts
    1,560

    This one leaves at least one blank, and it's generic I suppose...

    Code:
    /**
     * Generic (?) capitalizer.
     * @author sjalle
     * @version 1.0
     */
    
    import java.io.*;
    
    public class Capitalizer {
      static final String PUNCTUATION = "!\"$%&/()=?{[]}\\#'~+-*/,;.:<>|^";
      static final String WHITESPACE  = " \t\n\f\r" + (char)0x0B;
      private StringBuffer sb=new StringBuffer();
      private byte [] buf=null;
      private boolean setCap=false;
      /**
       * Three different ways to use, nice...
       * @param s
       */
      public Capitalizer (String s) {
        buf=s.getBytes();
      }
      public Capitalizer (byte [] b) {
        buf=b;
      }
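      public Capitalizer (InputStream in) throws IOException {
        // read until EOF; available() is only an estimate and may well be 0
        ByteArrayOutputStream bos=new ByteArrayOutputStream();
        byte [] chunk=new byte[4096];
        int n;
        while ((n=in.read(chunk)) != -1) bos.write(chunk,0,n);
        buf=bos.toByteArray();  // never null, so capitalize() is always safe
        in.close();// perhaps ....
      }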
      /**
       * Do the stuff
       * @return
       */
      public String capitalize() {
        boolean inBlank=false;
        sb.setLength(0);
        for (int i=0; i<buf.length; i++) {
          char c=(char)buf[i];
          if (c==' ' || WHITESPACE.indexOf(c) >= 0) {
            if (inBlank) continue;
            inBlank=true;
            setCap=true;
          } else if (PUNCTUATION.indexOf(c) >= 0) {
            sb.append(c);
            setCap=true;
            continue;
          } else {
            inBlank=false;
          }
          if (setCap) {
            c=(char)new String(new char[]{c}).toUpperCase().getBytes()[0];
            if (c!=' ') setCap=false;
          }
          sb.append(c);
        }
        return sb.toString();
      }
      /**
       * ****************************** MAIN ******************************
       * @param args
       */
      public static void main (String [] args) {
        Capitalizer ct=new Capitalizer("try this $one      \t \t \tfirst%then a file");
        String s=ct.capitalize();
        System.out.println(s);  // show the result for the string before trying the file
        try {
          FileInputStream in = new FileInputStream("c:\\tmp\\classes_tut.txt");
          ct = new Capitalizer(in);
          s = ct.capitalize();
          System.out.println(s);
        }
        catch (IOException ex) {
          ex.printStackTrace();
        }
      }
    }
    Last edited by sjalle; 03-24-2005 at 08:36 PM.
    eschew obfuscation

  4. #4
    Join Date
    Mar 2005
    Location
    Sendling, MUC, .de
    Posts
    100
    Yeah, that's how I like it!

    Well, somewhat shorter than mine (though less commented). I agree that a "generic" solution should operate on arrays (of chars rather than of bytes, I think). However, I don't see a connection to generics in the Tiger (Java 5) sense.

    Now what's it doing? And how?
    Judging by the heading of your post, I suppose you meant it to collapse any sequence of whitespace characters to the first whitespace character of that sequence instead of consuming the whole sequence, right? Besides that, of course, it is to do the capitalization according to the requirements I have assumed above, which include capitalizing the first capitalizable character in the input.
    As mentioned, I thought this would serve well as an example problem for illustrating how thinking in terms of finite automata can help you develop an algorithm. The machine model I had in mind is extremely limited and cannot backtrack on the input, i.e. cannot re-read a character. To emphasize this, I chose to implement read(), which gets the current char via super.read(). Of course there is no backtracking in your algorithm either, sjalle.
    Anyway, I'd like to take the chance and throw in my DFA stuff, using it to show that Capitalizer does not meet its (supposed) spec; I hope you're ok with that, sjalle.
    In the attachment you see the DFA that is implemented by sjalle's code. For ease of notation, the input and output alphabets are each assumed to consist of only four symbols, namely w (representing whitespace), p (punctuation), x (lower case letter) and X (upper case letter). Please excuse this slight abstraction which, in addition, does not take into account non-capitalizable characters that aren't punctuation either (e.g. upperCase('1')=='1').

    <<attached image should have shown up here>>

    As the states of the automaton I take the four possible valuations of the boolean variables inBlank and setCap, and each pass of the for-loop is considered one transition, depicted by an arrow. (There are, however, situations where the variables change more than once within a single pass; these were left implicit for simplicity.) The initial state is the upper left one. Which transition to take depends on the input character, which precedes the -> in the label of the transition arrow (a comma meaning OR); the output character of a transition is shown after the -> (where "" means epsilon, i.e. no output at all). Please double-check that it is equivalent to sjalle's code by going through all of the 4x3 combinations (four valuations times three classes of input: 'w', 'p' and 'x'/'X').
    The problematic transition is shown in red.
    Maybe there's someone out there to whom this way of looking at things is new and who can benefit from it when chasing bugs...

    One last complaint:
    I don't get the point of
    Code:
    if (c==' ' || WHITESPACE.indexOf(c) >= 0) {
    since WHITESPACE already contains a space.


    Ok, to finish this up, I'd like to see here a solution that combines the best of the two approaches AND scales acceptably with larger input. Everybody's invited to put together comprehensive requirements for that and, thereafter, post a solution.
    And how about a similar analysis of the algorithm I have posted? (Should contain only 2 states).

    --
    p.s.: Looking at the DFA, it's nearly trivial to pin down the slight modification to Capitalizer that is needed to make it meet its spec as given above. As a variation on the theme: how would it have to be modified to work as specified, except that any whitespace before the first non-whitespace and any whitespace after the last non-whitespace gets consumed completely (i.e. as if you put String.trim() around the result)?
    Last edited by meisl; 03-28-2005 at 08:28 PM.

  5. #5
    Join Date
    Mar 2005
    Location
    Sendling, MUC, .de
    Posts
    100
    Hmm, not too many fans of automata theory out there...

    Whatever, I tried to hint that sjalle's automaton is not minimal either. There are (diagrams of) five automata in the attachment, answering the questions I had posed above, plus the minimal (without proof) automaton, both without and with trim(). I slightly changed the notation but that shouldn't be a problem.

    Also I'd like to add that the state names for the minimal automata should rather read something like STATE_0, STATE_1, STATE_2, since the point of all this is: when "thinking automaton", you do not have to worry about the exact semantics of a variable such as "afterP" to figure out whether it does what you want; rather, you can state directly and explicitly what the whole thing should do. In other words: it's a more concise yet formal way to "get the whole picture", and the formalism makes it much more straightforward to prove things than it is on code in some specific programming language.
    Translating this into program code is then nearly trivial and can even be automated while still producing efficient code.

    Write down the code for the minimal automata and you'll see.
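
    As a rough illustration of what such code might look like, here is a sketch of a three-state transducer for the version without trim(). It is one possible reading of the spec discussed above and is not taken from the attached diagrams: the class name, the meaning given to each state, and the treatment of characters that are neither whitespace, punctuation nor letters (they are handled like letters) are all assumptions.

```java
public class ThreeStateCapitalizer {

    static final String WHITESPACE  = " \t\n\f\r" + (char) 0x0B;
    static final String PUNCTUATION = "!\"$%&/()=?{[]}\\#'~+-*/,;.:<>|^";

    // Assumed meaning of the states:
    static final int STATE_0 = 0; // start, or just after punctuation: capitalize next letter
    static final int STATE_1 = 1; // inside a whitespace run: drop any further whitespace
    static final int STATE_2 = 2; // inside a word: pass characters through unchanged

    static String capitalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int state = STATE_0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (WHITESPACE.indexOf(c) >= 0) {          // input class w
                if (state != STATE_1) out.append(c);   // keep only the first of a run
                state = STATE_1;
            } else if (PUNCTUATION.indexOf(c) >= 0) {  // input class p
                out.append(c);
                state = STATE_0;
            } else {                                   // input class x / X (and anything else)
                out.append(state == STATE_2 ? c : Character.toUpperCase(c));
                state = STATE_2;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Try This $One First%Then A File
        System.out.println(capitalize("try this $one      \t \t \tfirst%then a file"));
    }
}
```

    Each branch of the if/else chain corresponds to one input class, and each assignment to state is one transition, so the code can be checked against a diagram arrow by arrow.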
    Last edited by meisl; 04-05-2005 at 01:23 PM.
