I want to create a text file from a group of documents. The text file should be organized into three columns, where each row contains the document index, the word index, and the word count. For example:
1 2 10
1 3 4
2 2 6
should be read as "word 2 occurs 10 times in doc 1, word 3 occurs 4 times
in doc 1, and word 2 occurs 6 times in doc 2".
01-18-2007, 02:24 PM
As far as I can see you want to:
1. Create a list of all the words that any of the files contain. This is what gives "word 2" a meaning.
2. Count the number of occurrences of every word in each file.
Obviously you have to open the files and read the data from them.
In a list you have to store the words of your so-called vocabulary (the distinct words in the files).
Once you have the vocabulary, you walk through the files one by one and create a map per file that holds the word as the key and the number of occurrences of that word in the file as the value.
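The two steps above could be sketched roughly like this. The class name `WordCounts`, the method names, and the tokenizing rule (lowercase, split on anything that is not a letter or digit) are my own assumptions, not something the question specifies, and the documents are passed in as plain strings to keep file handling out of the way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class WordCounts {
    // Assumed tokenizer: lowercase the text and split on anything
    // that is not an ASCII letter or digit.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Step 1: the vocabulary is a list of the distinct words,
    // in order of first appearance across all documents.
    static List<String> buildVocabulary(List<String> documents) {
        List<String> vocabulary = new ArrayList<>();
        for (String doc : documents) {
            for (String w : tokenize(doc)) {
                if (!vocabulary.contains(w)) {
                    vocabulary.add(w);
                }
            }
        }
        return vocabulary;
    }

    // Step 2: one map per document, word -> number of occurrences.
    static Map<String, Integer> countWords(String document) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : tokenize(document)) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }
}
```

Note that `vocabulary.contains(w)` scans the whole list every time, so for a large vocabulary a map from word to index would probably be the better choice.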
Finally you have to print the contents of the maps to the result file:
every map holds the results for one file; every word (map key) has a number, which is the position of the word in the vocabulary; and finally you print the number of occurrences.
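As a small illustration of the printing step, here is one way to turn a vocabulary and one file's map into rows of the requested format. `ResultRows` and `rowsFor` are hypothetical names, and I am assuming the indexes start at 1, as the example in the question suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ResultRows {
    // Builds "docIndex wordIndex count" rows for one document.
    // Walking the vocabulary (not the map) keeps rows ordered by word index.
    static List<String> rowsFor(int docIndex, List<String> vocabulary,
                                Map<String, Integer> counts) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < vocabulary.size(); i++) {
            Integer count = counts.get(vocabulary.get(i));
            if (count != null) {
                int wordIndex = i + 1; // assumption: indexes are 1-based
                rows.add(docIndex + " " + wordIndex + " " + count);
            }
        }
        return rows;
    }
}
```

With a vocabulary of `apple`, `pear`, `plum` and a map holding `pear=10` and `plum=4` for document 1, this gives the rows `1 2 10` and `1 3 4`.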
As you can probably see, all of this can be done in a single pass, that is, a single walk through the files:
While reading the files one by one, you can make your vocabulary grow and at the same time have a single map hold the number of occurrences for the current file. At the end of each file you write the results for that file. Then you open the next file, clear only the map (not the vocabulary), and do the same.
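A minimal sketch of the single-pass idea follows. Everything named here (`SinglePassCounter`, `run`, the tokenizing rule, UTF-8 input, 1-based indexes) is an assumption on my part. I also keep the vocabulary in a map from word to index instead of a list, simply because looking up a word's position is faster that way.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SinglePassCounter {
    static void run(List<Path> documents, Path output) throws IOException {
        // word -> 1-based index; grows across all files and is never cleared
        Map<String, Integer> vocabulary = new HashMap<>();
        // word index -> count; holds the current file only, sorted by index
        Map<Integer, Integer> counts = new TreeMap<>();

        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            int docIndex = 0;
            for (Path doc : documents) {
                docIndex++;
                counts.clear(); // clear the map only, keep the vocabulary
                for (String line : Files.readAllLines(doc, StandardCharsets.UTF_8)) {
                    // Assumed tokenizer: lowercase, split on non-alphanumerics.
                    for (String w : line.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                        if (w.isEmpty()) {
                            continue;
                        }
                        Integer index = vocabulary.get(w);
                        if (index == null) {
                            index = vocabulary.size() + 1; // new word gets the next index
                            vocabulary.put(w, index);
                        }
                        counts.merge(index, 1, Integer::sum);
                    }
                }
                // End of file: write this file's results.
                for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
                    out.write(docIndex + " " + e.getKey() + " " + e.getValue());
                    out.newLine();
                }
            }
        }
    }
}
```

You would call it with the list of input paths and the path of the result file, for example `SinglePassCounter.run(Arrays.asList(Paths.get("a.txt"), Paths.get("b.txt")), Paths.get("result.txt"))`, where the file names are only placeholders.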
I hope I made it clear enough.
If you do not know how to open files and read data, read a book about Java, please :)