Keyword Scanning please help !!!



  1. #1
    Join Date
    May 2005
    Location
    Malaysia
    Posts
    9

    Keyword Scanning please help !!!

    Hi all,

    I am working on a project: a web browser with a filtering function, written in Java.

    I am now implementing the keyword scanning function: if certain keywords (such as bad words) appear too many times on a web page (say, 5 times), the page should be blocked.

    I tried to use the .length method to catch the words that appear on a web page, but it does not seem to scan through the words on the page.

    In my code, the class that implements the website recognition and blocking function is 'dcToolBar'.

    Oh, I forgot to ask one thing: why does my browser display web pages as if they had not loaded properly, with lots of strange code on the page itself?

    Does anyone have an idea of how to implement this function? Please teach me if you do. THANKS !!!

  2. #2
    Join Date
    Aug 2003
    Posts
    313
    Let's say that you have a file "badwords.txt" and another file "input.txt", and you want to count the number of occurrences in input.txt of any word listed in badwords.txt. Then you can do the following:
    Code:
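    // needs: import java.io.*; import java.util.*;
    protected String[] readArray(File file) throws IOException {
      // a TreeSet keeps the words sorted, which Arrays.binarySearch requires
      SortedSet<String> words = new TreeSet<String>();
      // try-with-resources closes the reader even if readLine() throws
      try (BufferedReader in = new BufferedReader(new FileReader(file))) {
        String word;
        while( (word = in.readLine()) != null ) {
          words.add(word);
        }
      }
      // toArray() with no argument returns Object[], so pass a typed array
      return words.toArray(new String[0]);
    }
    
    public int countWords(String[] badWords, BufferedReader input) throws IOException {
      String line;
      int count = 0;
      while( (line = input.readLine()) != null ) {
        // you should put punctuation and space delimiters in the string
        StringTokenizer st = new StringTokenizer(line," .,/\\!@#$%^&*()_-+=", false); 
        while( st.hasMoreTokens() ) {
          String token = st.nextToken();
          // binarySearch returns a negative value (not always -1) when the token is absent;
          // note that the comparison is case-sensitive
          if( Arrays.binarySearch(badWords, token) >= 0 ) {
            count++;
          }
        }
      }
      return count;
    }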
    You would then call something like this:
    Code:
    String[] badWords = readArray(new File("badwords.txt"));
    BufferedReader input = new BufferedReader(new FileReader(new File("input.txt")));
    int count = countWords(badWords, input);
    // count is the number of occurrences in the file of any word in badWords.
    Hope this helps.
    ~evlich
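
A note on testing the result of Arrays.binarySearch: it returns the index (0 or greater) when the key is found, and -(insertion point) - 1 when it is not, so a missing word can give -2, -3 and so on, not only -1. The small sketch below shows a "found" check written as >= 0. The class and method names (BinarySearchCheck, contains) are made up for illustration, and the array must already be sorted.

```java
import java.util.Arrays;

class BinarySearchCheck {
    // Returns true if token is present in sortedWords.
    // sortedWords must be sorted in natural (String.compareTo) order.
    static boolean contains(String[] sortedWords, String token) {
        // Found: index >= 0. Not found: -(insertionPoint) - 1, always negative.
        return Arrays.binarySearch(sortedWords, token) >= 0;
    }
}
```

With the sorted array {"apple", "car", "zebra"}, looking up "dog" gives a negative result other than -1, so a test of the form `!= -1` would wrongly count it as a match.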

  3. #3
    Join Date
    May 2005
    Location
    Malaysia
    Posts
    9
    Hi evlich,

    I tried it out. I can now scan the text of a web page and save it to a temporary file for comparison, but here is the problem: IT ONLY SAVES 1 LINE OF TEXT to the text file !!!

    I have no idea why it reads and saves only 1 line of text. I need it to scan and save the ENTIRE body text of the web page so that I can compare it against the keywords (the keywords are stored in another text file, Keyword.txt; I will use Java I/O to read it and do the comparison).

    Please teach me if there is any solution for this. THANKS !!!

  4. #4
    Join Date
    Aug 2003
    Posts
    313
    This looks a little fishy:
    Code:
    while ((str = inR.readLine()) != null) {
      // str is one line of text; readLine() strips the newline character(s)
      buffWrite=new BufferedWriter(new FileWriter("Website.txt")); 
      buffWrite.write(str);
      buffWrite.flush(); 
    }
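    You don't want to open a new BufferedWriter for every line: each new FileWriter("Website.txt") truncates the file, so only the last line written survives. Open the writer once, before the loop. Try: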
    Code:
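    buffWrite=new BufferedWriter(new FileWriter("Website.txt")); 
    while ((str = inR.readLine()) != null) {
      // str is one line of text; readLine() strips the newline character(s)
      buffWrite.write(str);
      // put the line break back so words on adjacent lines do not run together
      buffWrite.newLine();
    }
    // close() flushes the buffer and releases the file
    buffWrite.close();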
    ~evlich
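
For reference, here is a self-contained sketch of the same idea: open the writer once, copy every line, and let try-with-resources close both streams. The class and method names (LineCopier, copyLines) are invented for this example, and it takes a generic Reader and Writer so that it does not assume how the page is actually fetched in the original program.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

class LineCopier {
    // Copies every line from in to out and returns the number of lines copied.
    // Both streams are closed when the method returns.
    static int copyLines(Reader in, Writer out) throws IOException {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(in);
             BufferedWriter writer = new BufferedWriter(out)) {
            String str;
            while ((str = reader.readLine()) != null) {
                writer.write(str);
                // readLine() strips the line terminator, so write one back
                writer.newLine();
                lines++;
            }
        }
        return lines;
    }
}
```

To write to a file, something like copyLines(pageReader, new FileWriter("Website.txt")) should work, where pageReader stands for whatever Reader the page content comes from.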

  5. #5
    Join Date
    May 2005
    Location
    Malaysia
    Posts
    9
    Hi evlich,

    I have improved my program using your example. It runs properly now, but the keyword filtering still does not work: I tried a target web page, and the page is still displayed...

    I put the word "car" in my Keyword.txt and then tried to access this web page: http://db.gamefaqs.com/console/ps2/...t_auto_sa_h.txt, which contains the word "car" many times (I have checked). But in the end the page is still displayed...

    The program is supposed to save the web page into 'Website.txt' (I use BufferedReader and BufferedWriter) and compare it with the contents of 'Keyword.txt'. IF the keywords match more than 5 times, the website should be blocked.

    I implemented this function in my 'dcToolBar' class. I have no idea why it is not working, because the code seems logical to me....

    Can someone please teach me a solution? I am really desperate because of this problem, PLEASE HELP ME !!!
    *crying*
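
Without the attached code it is hard to say what goes wrong, but the blocking rule described above (block when the keywords match more than 5 times) can be sketched as follows. This is only an illustration under stated assumptions: KeywordFilter, countMatches and shouldBlock are made-up names, matching is done on whole words and ignores case (so "Car" and "CAR" count, "cars" does not), and the page text is passed in as a String rather than read from 'Website.txt'.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class KeywordFilter {
    // Counts the words in text that equal one of the keywords, ignoring case.
    static int countMatches(Set<String> keywords, String text) {
        Set<String> lower = new HashSet<>();
        for (String k : keywords) {
            String cleaned = k.trim().toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) {   // skip blank lines from the keyword file
                lower.add(cleaned);
            }
        }
        int count = 0;
        // Split on every run of characters that are not letters or digits
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (lower.contains(token)) {
                count++;
            }
        }
        return count;
    }

    // True when the keywords match more than threshold times.
    static boolean shouldBlock(Set<String> keywords, String text, int threshold) {
        return countMatches(keywords, text) > threshold;
    }
}
```

Printing the value returned by countMatches for the target page is a quick way to see whether the problem lies in the counting or in the code that is supposed to block the page afterwards.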
