I had a couple of questions. I'll explain what I have in mind first and then
ask the questions.
I'm working on a search engine for my group that searches a particular
set of directories containing Word documents and HTML files. I made a Visual Basic
app that converts all the Word documents into text and HTML files and checks
for new files periodically. I made another VB app that indexes all
the text files. By that I mean it goes through each text file and counts the
occurrences of every word, ignoring certain words and characters. My original
plan was to put every word in a text file named after the first letter
of the word. For example, words like hello, hi, and harry
would go in the file named h.txt. For the actual backend of the search
engine, I thought about doing the following: whenever someone enters a query,
I would simply go to the matching file, find out where and how many times the word
occurs, and list the documents in which the word(s) occur most often first.
I also wanted the search to always use the LIKE operator. Meaning, if
they entered hello, I want to give results on ****, helloword, sayhello, and so on.
One problem I didn't think about before was being able to get the word "sayhello",
since it would be in a different file (s.txt rather than h.txt). So the solution
I came up with is to just have one text file and put all the words in there.
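
To make the idea concrete, here is a rough sketch of the indexing and lookup described above, written in Java only because that is where the backend would probably live. Everything in it is an assumption for illustration: the class name `WordIndexSketch`, the tiny ignore list, the in-memory map standing in for the single text file, and `String.contains` standing in for a LIKE '%word%' match.

```java
import java.util.*;

class WordIndexSketch {
    // word -> (document name -> occurrence count); stands in for the single word file
    private final Map<String, Map<String, Integer>> index = new TreeMap<>();
    // Hypothetical ignore list; the real one would be whatever the indexer skips
    private final Set<String> ignored = new HashSet<>(Arrays.asList("the", "a", "and", "to"));

    // Count every word of one document, skipping ignored words and punctuation.
    public void addDocument(String docName, String text) {
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty() || ignored.contains(token)) continue;
            index.computeIfAbsent(token, k -> new HashMap<>())
                 .merge(docName, 1, Integer::sum);
        }
    }

    // LIKE '%query%' style search: every indexed word containing the query is a hit.
    // Returns document names ordered by total hits, highest first (ties by name).
    public List<String> search(String query) {
        String q = query.toLowerCase();
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : index.entrySet()) {
            if (!e.getKey().contains(q)) continue;
            for (Map.Entry<String, Integer> d : e.getValue().entrySet()) {
                totals.merge(d.getKey(), d.getValue(), Integer::sum);
            }
        }
        List<String> docs = new ArrayList<>(totals.keySet());
        docs.sort((a, b) -> {
            int byCount = Integer.compare(totals.get(b), totals.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return docs;
    }

    public static void main(String[] args) {
        WordIndexSketch idx = new WordIndexSketch();
        idx.addDocument("a.txt", "hello hello world");
        idx.addDocument("b.txt", "sayhello to the world");
        idx.addDocument("c.txt", "nothing relevant here");
        System.out.println(idx.search("hello")); // prints [a.txt, b.txt]
    }
}
```

Note that the search has to look at every indexed word, because a match can start anywhere inside a word.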
Here are my questions:
- The backend of the search, which actually looks through the files and works
out the search results: should it be done in JSP, Servlets, EJBs, or some other
technology? Which one is easiest to implement my logic in?
- Instead of having 26 or more files (one for each letter of the alphabet), if
I make one file, would that affect how long it takes to return
the results?
- Is it better to use just one file to implement the LIKE operator, instead
of going through all 26 files to find words like the one the user entered?
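
The concern behind the last two questions can be shown with a small sketch, again in Java and again with made-up names (`BucketVsSingleSketch`, `bucketByFirstLetter`, `containing`). It groups words by first letter the way the per-letter files would, then compares a substring lookup in one bucket against a lookup over all the words. It says nothing about actual run time, only about which words each approach can find.

```java
import java.util.*;

class BucketVsSingleSketch {
    // Group words by first letter, mimicking a.txt ... z.txt (hypothetical layout).
    // Assumes every word is non-empty.
    static Map<Character, List<String>> bucketByFirstLetter(List<String> words) {
        Map<Character, List<String>> buckets = new TreeMap<>();
        for (String w : words) {
            buckets.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
        }
        return buckets;
    }

    // Substring match over one collection of words, in the collection's order.
    static List<String> containing(Collection<String> words, String query) {
        List<String> hits = new ArrayList<>();
        for (String w : words) {
            if (w.contains(query)) hits.add(w);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "hi", "harry", "helloword", "sayhello");
        Map<Character, List<String>> buckets = bucketByFirstLetter(words);

        // Looking only in the "h" bucket misses sayhello.
        System.out.println(containing(buckets.get('h'), "hello")); // prints [hello, helloword]
        // Scanning every word finds it.
        System.out.println(containing(words, "hello")); // prints [hello, helloword, sayhello]
    }
}
```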
Please let me know if anything needs clarifying in order to answer my questions.