I want to create a text file from a group of documents. The text file should be organized into three columns, where each row contains the document index, the word index, and the word count. For example:
1 2 10
1 3 4
2 2 6
should be read as "word 2 occurs 10 times in doc 1, word 3 occurs 4 times
in doc 1, and word 2 occurs 6 times in doc 2".
As far as I can see you want to:
1. Create a list of all the words that any of the files contain. A word's position in this list is what gives "word 2" its meaning.
2. Count the number of occurrences of every word in each file.
Obviously you have to open the files and read the data from them.
In a list you have to store the words of your so-called vocabulary (the distinct words found in the files).
Once you have the vocabulary, you walk through the files one by one and create a map per file that holds each word as the key and the number of occurrences of that word in that file as the value.
Finally you print the contents of the maps to the result file:
every map holds the results for one file, every word (map key) gets a number that is its position in the vocabulary, and finally you print the number of occurrences.
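The steps above could be sketched roughly like this. It is only an illustration: the class and method names are made up, the documents are plain strings instead of files, and it assumes that words are lowercased, split on non-word characters, and numbered from 1 in the order they are first seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build the vocabulary first, then count words per document.
class WordIndexSketch {

    // Assumed tokenization: lowercase, split on runs of non-word characters.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Vocabulary: every distinct word mapped to its 1-based position (first-seen order).
    static Map<String, Integer> buildVocabulary(List<String> docs) {
        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String word : tokenize(doc)) {
                vocabulary.putIfAbsent(word, vocabulary.size() + 1);
            }
        }
        return vocabulary;
    }

    // One map per document: word -> number of occurrences in that document.
    static Map<String, Integer> countWords(String doc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : tokenize(doc)) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Rows of the form "docIndex wordIndex count", with 1-based document indices.
    static List<String> toRows(List<String> docs) {
        Map<String, Integer> vocabulary = buildVocabulary(docs);
        List<String> rows = new ArrayList<>();
        for (int d = 0; d < docs.size(); d++) {
            for (Map.Entry<String, Integer> e : countWords(docs.get(d)).entrySet()) {
                rows.add((d + 1) + " " + vocabulary.get(e.getKey()) + " " + e.getValue());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat on the mat", "the dog sat");
        for (String row : toRows(docs)) {
            System.out.println(row);
        }
    }
}
```

With the two sample strings in `main`, the first row is `1 1 2`: word 1 ("the") occurs 2 times in doc 1.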
As you can probably see, all of this can be done in a single pass, a single walk through the files:
While reading the files one by one you can let your vocabulary grow and at the same time have a single map hold the number of occurrences for the current file. At the end of each file you write out the results for that file. Then you open the next file, clear only the map (not the vocabulary) and do the same.
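A minimal sketch of that single pass might look like the following. The class name `SinglePassIndexer` and its method are hypothetical, and the same assumptions apply as before: words are lowercased, split on non-word characters, and numbered from 1 in first-seen order. The per-file map here is keyed by word index, so each file's rows come out sorted by word index.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one walk through the files, writing rows as each file ends.
class SinglePassIndexer {

    // Grows across all files: word -> 1-based word index.
    private final Map<String, Integer> vocabulary = new HashMap<>();

    // Reused for every file: word index -> count in the current file.
    private final Map<Integer, Integer> counts = new TreeMap<>();

    void index(List<Path> files, Writer out) throws IOException {
        PrintWriter writer = new PrintWriter(out);
        int docIndex = 0;
        for (Path file : files) {
            docIndex++;
            counts.clear(); // only the per-file map is reset; the vocabulary is kept

            try (BufferedReader reader = Files.newBufferedReader(file)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    for (String token : line.toLowerCase().split("\\W+")) {
                        if (token.isEmpty()) {
                            continue;
                        }
                        Integer wordIndex = vocabulary.get(token);
                        if (wordIndex == null) {
                            // New word: it gets the next free position in the vocabulary.
                            wordIndex = vocabulary.size() + 1;
                            vocabulary.put(token, wordIndex);
                        }
                        counts.merge(wordIndex, 1, Integer::sum);
                    }
                }
            }

            // End of this file: write "docIndex wordIndex count" for every word seen in it.
            for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
                writer.println(docIndex + " " + e.getKey() + " " + e.getValue());
            }
        }
        writer.flush(); // the caller is responsible for closing the writer
    }
}
```

You would call `index` with the list of input paths and a writer opened on the result file, for example one obtained from `Files.newBufferedWriter`, and close that writer afterwards.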
I hope I made it clear enough.
If you do not know how to open files and read data, please read a book about Java.