Question



  1. #1
    sunny Guest

    Question


    I have a couple of questions. I'll explain what I have in mind first and then
    ask them.
    I'm working on a search engine for my group that searches a particular
    set of directories containing Word documents and HTML files. I wrote a Visual
    Basic app that converts all the Word documents into text and HTML files and
    checks for new files at a set interval. I wrote another VB app that indexes all
    the text files. By that I mean it goes through each text file and counts the
    occurrences of every word, ignoring certain words and characters. My original
    plan was to put every word in a text file named after the first letter of
    the word. For example, words like hello, hi, and harry
    would go in the file named h.txt. For the actual backend of the search
    engine, I thought about doing the following: whenever someone enters a query,
    I would simply go to the file, find out where and how many times the word
    occurs, and rank the documents by how often the word(s) occur.
    I also want the search to always behave like the LIKE operator. That is, if
    they enter hello, I want to return results for ****, helloword, sayhello,
    etc.
    One problem I didn't think about before is finding the word "sayhello",
    since it would be in a different file (s.txt). So the solution I came up with is
    to just have one text file and put all the words in there.
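
A minimal Java sketch of the indexing and ranking idea described above, under stated assumptions: the class name `WordIndex`, its methods, and the stop-word list are all hypothetical, and the index is held in memory rather than in text files. It counts words per file, then ranks files by how often any word containing the search term occurs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and the stop-word list are illustrative assumptions.
class WordIndex {
    // Words ignored while indexing (an assumed, minimal list).
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "and", "of");

    // word -> (file name -> number of occurrences)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Lower-cases the text, splits it on anything that is not an ASCII letter
    // or digit, and counts each remaining word for the given file.
    void addFile(String fileName, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) continue;
            counts.computeIfAbsent(word, w -> new HashMap<>())
                  .merge(fileName, 1, Integer::sum);
        }
    }

    // Returns file names ordered by the total occurrences of every indexed word
    // that contains the term (ties are broken alphabetically by file name).
    List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            if (!e.getKey().contains(needle)) continue;
            e.getValue().forEach((file, n) -> totals.merge(file, n, Integer::sum));
        }
        List<String> files = new ArrayList<>(totals.keySet());
        files.sort((a, b) -> {
            int byCount = Integer.compare(totals.get(b), totals.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return files;
    }
}
```

Because `search` tests every indexed word with `contains`, a query for hello also picks up sayhello, which is the behavior the first-letter files could not provide on their own.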

    Here are my questions:
    - Should the search backend, which actually looks through the files and works
    out the results, be written with JSP, servlets, EJBs, or some other
    technology? Which one would be easiest to implement my logic in?
    - If I use one file instead of 26 or more (one for each letter of the
    alphabet), would that affect how long it takes to return
    results?
    - Is it better to use just one file to implement the LIKE behavior instead
    of going through all 26 files to find words like the one the user entered?
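
The trade-off between one file and 26 can be sketched as follows. This is only an illustration, with hypothetical names (`BucketedWords`, `startingWith`, `containing`) and in-memory lists standing in for h.txt, s.txt, and so on; it assumes every word and term is non-empty. A prefix match only has to read one bucket, but a substring match has to read them all, so splitting by first letter saves nothing for LIKE-style searches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each bucket stands in for one per-letter file.
class BucketedWords {
    // first letter -> words stored in that letter's file
    private final Map<Character, List<String>> buckets = new TreeMap<>();

    // Assumes the word is non-empty.
    void add(String word) {
        buckets.computeIfAbsent(word.charAt(0), c -> new ArrayList<>()).add(word);
    }

    // Prefix match: only the bucket for the term's first letter can contain hits.
    List<String> startingWith(String term) {
        List<String> hits = new ArrayList<>();
        for (String w : buckets.getOrDefault(term.charAt(0), List.of())) {
            if (w.startsWith(term)) hits.add(w);
        }
        return hits;
    }

    // Substring match: a hit can start with any letter, so every bucket is scanned.
    List<String> containing(String term) {
        List<String> hits = new ArrayList<>();
        for (List<String> bucket : buckets.values()) {
            for (String w : bucket) {
                if (w.contains(term)) hits.add(w);
            }
        }
        return hits;
    }
}
```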

    Please ask if anything needs clarifying before you can answer my questions.

    Thanks.


  2. #2
    Paul Clapham Guest

    Re: Question

    I would have used a database, rather than a collection of text files, to
    store that data. Then not only do you not have to do much programming to
    implement the LIKE operator, the database will probably work faster than
    your program would.
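
As a rough sketch of what the database approach could look like from Java: the table `word_counts` and its columns `word`, `file_name`, and `occurrences` are assumptions for illustration and do not come from the thread. The helper builds the parameter for a `LIKE` query and escapes any wildcard characters the user typed.

```java
// Hypothetical sketch: the table and column names are assumptions, not from the thread.
class LikeSearch {
    // Assumes one row per (word, file) pair with its occurrence count.
    // The ESCAPE clause declares '!' as the character that escapes LIKE wildcards.
    static final String QUERY =
        "SELECT file_name, SUM(occurrences) AS total "
      + "FROM word_counts WHERE word LIKE ? ESCAPE '!' "
      + "GROUP BY file_name ORDER BY total DESC";

    // Wraps the term in % wildcards so it matches anywhere inside a word,
    // first escaping '!', '%' and '_' so they are treated as literal characters.
    static String likePattern(String term) {
        String escaped = term.replace("!", "!!")
                             .replace("%", "!%")
                             .replace("_", "!_");
        return "%" + escaped + "%";
    }
}
```

The result of `likePattern("hello")` would be bound to the `?` placeholder with `PreparedStatement.setString`, so the user's input never becomes part of the SQL text. One caveat: a pattern with a leading `%` usually cannot use an ordinary index on `word`, so on a large table it may still scan every row.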



