Help with using hash table in file access



  1. #1
    Join Date
    Feb 2004
    Posts
    4

    Help with using hash table in file access

    Hello all, I am having a problem. I am currently writing a Java program that will read a text file and count the number of times each word appears and what line it was on. For example, a text file containing:

    Hello.
    Testing, testing.

    would result in the following output from the code:

    Hello: 1, 1
    Testing: 2, 1

    My code so far is the following:

    //
    //
    //

    import java.io.*; // Import for file I/O methods.
    import java.util.*; // Import for StringTokenizer.
    import java.lang.*;


    public class WordCounter {

    static int count = 0; // Declare application-wide variables.
    static String output = "";
    static String line;
    static file entry[] = new file[50];
    private static final Integer ONE = new Integer(1);

    public static void main(String[] args){

    try // Try-catch block needed for file
    { // I/O exception handling.
    BufferedReader inputStream = new BufferedReader(new FileReader("test.txt"));

    // Place string from file into line identifier.
    line = inputStream.readLine();
    while (line != null){ // Keeps reading lines until
    processLine(); // Call method to handle line.
    line = inputStream.readLine(); // Read next line.
    }
    inputStream.close(); // Close inputStream.

    } // End try.
    catch(FileNotFoundException e){ // Don't worry about try-catch.
    // It is explained in CIS 121.
    System.out.println("File testfile.txt not found.");
    }
    catch(IOException e){
    System.out.println("Error reading from file testfile.txt.");
    }
    }

    // Process line of file method.
    public static void processLine(){
    String storage[] = new String[500];

    Map record = new HashMap();
    //Hashtable record = new Hashtable();
    StringTokenizer lineOfTokens = new StringTokenizer(line," \n\t.,?!,;)0123456789");
    String token; // Used to hold the value of the current token value.
    int u = 0;

    while (lineOfTokens.hasMoreTokens()){ // Loop until no more
    token = lineOfTokens.nextToken();
    entry[count] = new file(token);
    System.out.println(entry[count].getWord());
    storage[u] = entry[count].getWord();
    count++;
    u++;
    } // End while.

    Map m = new TreeMap();

    // Initialize frequency table from command line
    for (int i=0; i < storage.length; i++) {
    Integer freq = (Integer) m.get(storage[i]);
    m.put(storage[i], (freq==null ? ONE : new Integer(freq.intValue() + 1)));
    }

    System.out.println(m.size()+" distinct words detected:");
    System.out.println(m);

    } // End processLine

    }

    But it fails at run time with a NullPointerException. I am fairly new to Java; am I even on the right track with this? I would greatly appreciate any help. Thanks. 8)

  2. #2
    Join Date
    Dec 2002
    Posts
    83
    I'm not getting as far as a NullPointerException; there are a few problems before that, I think.

    Code:
    static file entry[] = new file[50];
    Why do you have file in all lower case here? I assume you want a File (with a capital F). That shows up later in the code too. Next question:
    Code:
      System.out.println(entry[count].getWord());
    storage[u] = entry[count].getWord();
    If entry[count] is a File, there is no getWord() method. If you have your own "file" class, post that too.

    I'm not sure why you're making a File out of each token, but I'll wait to analyse the rest until I find out about the above.
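    That said, if a custom "file" class with getWord() does exist, one likely source of a NullPointerException is the frequency loop: it runs to storage.length (500), but only the first u slots are filled, and a TreeMap with natural ordering throws on a null key. A minimal sketch of that effect, where NullKeyDemo and countWords are hypothetical names made up for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

class NullKeyDemo {

    // Counts the first `used` entries of words; anything past that is ignored.
    static Map<String, Integer> countWords(String[] words, int used) {
        Map<String, Integer> freq = new TreeMap<String, Integer>();
        for (int i = 0; i < used; i++) {
            Integer old = freq.get(words[i]);
            freq.put(words[i], old == null ? 1 : old + 1);
        }
        return freq;
    }

    public static void main(String[] args) {
        String[] storage = new String[500]; // mostly null, as in the original
        storage[0] = "Testing";
        storage[1] = "testing";
        int u = 2; // number of slots actually filled

        // Looping to storage.length reaches storage[2] == null, and
        // TreeMap.get(null) throws NullPointerException with natural ordering.
        try {
            countWords(storage, storage.length);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on an unfilled slot");
        }

        // Looping only to u skips the unfilled slots.
        System.out.println(countWords(storage, u));
    }
}
```

    So in the original loop, using u as the upper bound instead of storage.length would probably be enough to get past the exception.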
    -- Steven

  3. #3
    Join Date
    Dec 2002
    Posts
    83
    Since you are on the right track, I went ahead and changed a few things around to provide an example of how I might approach this. It doesn't address the distinct lines each word appears on, though, which I think is part of what you need.

    The general ideas I added/changed:
    - Changed String line to be a local variable that is passed from main() into processLine(String line). Class-level variables should be used only when necessary.
    - Created an instance of WordCounter in main() to work with. This way the methods don't have to be static. I assume you made processLine() static because the compiler tells you that you can't call non-static methods from static ones (main()). Instead, main() now calls them on the instance.
    - I'm using one HashMap as an instance variable to hold the words as keys and their counts as the corresponding values.
    - Added a printResults() method to navigate the HashMap and print the results.
    Code:
    import java.io.*; // Import for file I/O methods.
    import java.util.*; // Import for StringTokenizer.
    import java.lang.*;
    
    public class WordCounter {
    
    	private HashMap wordHash = new HashMap();
    	private static final Integer ONE = new Integer(1);
    
    	public static void main(String[] args) {
    		
    		WordCounter wordCounter = new WordCounter();
    		
    		try // Try-catch block needed for file
    			{ // I/O exception handling.
    			BufferedReader inputStream =
    				new BufferedReader(new FileReader("test.txt"));
    
    			// Place string from file into line identifier.
    			String line = inputStream.readLine();
    			while (line != null) { // Keeps reading lines until
    				wordCounter.processLine(line); // Call method to handle line.
    				line = inputStream.readLine(); // Read next line.
    			}
    			inputStream.close(); // Close inputStream.
    
    			wordCounter.printResults();
    
    		} // End try.
    		catch (FileNotFoundException e) { // Don't worry about try-catch.
    			// It is explained in CIS 121.
    			System.out.println("File test.txt not found.");
    		} catch (IOException e) {
    			System.out.println("Error reading from file test.txt.");
    		}
    	}
    
    	// Process line of file method.
    	public void processLine(String line) {
    
    		StringTokenizer lineOfTokens =
    			new StringTokenizer(line, " \n\t.,?!,;)0123456789");
    		String token; // Used to hold the value of the current token value.
    		Integer cnt = new Integer(0);
    
    		while (lineOfTokens.hasMoreTokens()) { // Loop until no more
    			token = lineOfTokens.nextToken();
    			if (wordHash.containsKey(token)) {
    				cnt = (Integer)wordHash.get(token);
    				wordHash.put(token,new Integer((cnt.intValue())+1));
    			}
    			else {
    				wordHash.put(token,ONE);
    			}
    						
    
    		} // End while.
    
    
    	} // End processLine
    
    	// Display words found with their counts
    	public void printResults() {
    	
    		System.out.println(wordHash.size() + " distinct words detected: \n");
    
    		String key = null;
    		Integer value = null;
    		ArrayList keys = new ArrayList(wordHash.keySet());
    		Collections.sort(keys); // sorts alphabetically
    		Iterator iter = keys.iterator();
    		
    		while (iter.hasNext()) {
    			key = (String)iter.next();
    			value = (Integer)wordHash.get(key);
    			System.out.println(key + " appears "+ value + " times.");
    		} 
    	}
    		
    }
    Now, I'm using a lot of the Java Collection classes... you had arrays (entry[] etc) so in your class you might not have gotten to the more advanced Collections and therefore might not want to use them unless you truly understand them. Teachers tend to get suspicious at things like that.

    Like I said, I just did this as an example/boost. Let me know if you have questions on your original code, which is forming up pretty well.
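    For the part the example above leaves out (which lines each word appears on), one possible approach is to map each word to a list of line numbers, so the count is just the size of the list. This is only a rough sketch: WordLineIndex and linesFor are names made up here, it assumes Java 5 or later for generics, and a StringReader stands in for the file so it runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordLineIndex {

    // word -> line number of every occurrence; the list size is the count
    private final Map<String, List<Integer>> index =
            new TreeMap<String, List<Integer>>();

    void processLine(String line, int lineNumber) {
        StringTokenizer tokens =
                new StringTokenizer(line, " \n\t.,?!;)0123456789");
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            List<Integer> lines = index.get(word);
            if (lines == null) { // first time this word is seen
                lines = new ArrayList<Integer>();
                index.put(word, lines);
            }
            lines.add(lineNumber);
        }
    }

    // Line numbers recorded for a word, or null if it never appeared.
    List<Integer> linesFor(String word) {
        return index.get(word);
    }

    void printResults() {
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            System.out.println(e.getKey() + " appears " + e.getValue().size()
                    + " times, on lines " + e.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new FileReader("test.txt")
        BufferedReader in = new BufferedReader(
                new StringReader("Hello.\nTesting, testing."));
        WordLineIndex counter = new WordLineIndex();
        int lineNumber = 0;
        String line;
        while ((line = in.readLine()) != null) {
            counter.processLine(line, ++lineNumber);
        }
        in.close();
        counter.printResults();
    }
}
```

    Note that a word occurring twice on the same line is recorded twice; if only distinct line numbers are wanted, a TreeSet per word plus a separate count would be one way to go.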
    -- Steven

  4. #4
    Join Date
    Feb 2004
    Posts
    808
    I agree with Steven's proposal that you shouldn't need an array to hold every token in the string. Just pull a token and check whether it's in the hash: if it is (the most likely case), increment its count; if not, put a 1 in there.

    It's a downer that Integer is not mutable (mutable = changeable), so a new one must be made and inserted over the top of the old one. It would be much nicer if these lines:

    cnt = (Integer)wordHash.get(token);
    wordHash.put(token,new Integer((cnt.intValue())+1));

    could be written as (Integer has no such method, though):
    ((Integer)wordHash.get(token)).increment(1);
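
    Something close to that wished-for increment can be had by storing a small mutable holder in the map instead of an Integer. A minimal sketch, assuming Java 5 or later; Counter is a hypothetical helper class written here, not something from the JDK:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class MutableCountDemo {

    // Hypothetical helper: a tiny mutable int holder (not part of the JDK).
    static final class Counter {
        private int value;

        void increment(int by) {
            value += by;
        }

        int get() {
            return value;
        }
    }

    static Map<String, Counter> count(String text) {
        Map<String, Counter> wordHash = new HashMap<String, Counter>();
        StringTokenizer tokens =
                new StringTokenizer(text, " \n\t.,?!;)0123456789");
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            Counter c = wordHash.get(token);
            if (c == null) { // first occurrence: insert a fresh counter
                c = new Counter();
                wordHash.put(token, c);
            }
            c.increment(1); // mutates in place, no second put() needed
        }
        return wordHash;
    }

    public static void main(String[] args) {
        Map<String, Counter> counts = count("to be or not to be");
        System.out.println("to: " + counts.get("to").get());
        System.out.println("or: " + counts.get("or").get());
    }
}
```

    The trade-off is one extra small class, in exchange for not allocating a new Integer on every repeat of a word.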



    anyways..

    This application came up on codeguru, and there was a discussion about efficiency in terms of using containsKey() to check. Some people argued that we could simply try to get() the token, no matter what. If the result was null, the token should be added; otherwise its count should be incremented.

    I personally felt that containsKey() was more readable and self-documenting, and therefore better for academic presentation, but the other method was more efficient in terms of hash lookups.
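
    To make the two variants concrete, here is a rough side-by-side sketch. The class and method names are invented for illustration, and it assumes Java 5 or later (generics and autoboxing) and that null is never stored as a value in the map:

```java
import java.util.HashMap;
import java.util.Map;

class SingleLookupDemo {

    // Variant 1: containsKey() then get(), so two lookups when the key exists.
    static void addWithContainsKey(Map<String, Integer> counts, String token) {
        if (counts.containsKey(token)) {
            counts.put(token, counts.get(token) + 1);
        } else {
            counts.put(token, 1);
        }
    }

    // Variant 2: a single get(); null means the key was not present.
    static void addWithGet(Map<String, Integer> counts, String token) {
        Integer old = counts.get(token);
        counts.put(token, old == null ? 1 : old + 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<String, Integer>();
        Map<String, Integer> b = new HashMap<String, Integer>();
        String[] tokens = {"Testing", "testing", "Testing"};
        for (String t : tokens) {
            addWithContainsKey(a, t);
            addWithGet(b, t);
        }
        System.out.println(a.equals(b)); // both variants build the same map
    }
}
```

    Both variants still call put() afterwards, so the saving is one lookup per repeated word. On Java 8 or later, counts.merge(token, 1, Integer::sum) expresses the same update in a single call.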
