How to get rid of SGML tags?





  1. #1 (Join Date: Nov 2005, Posts: 37)

    How to get rid of SGML tags?

    I have a file in SGML format; its source is as follows:

    <!DOCTYPE lewis SYSTEM "lewis.dtd">
    <REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5544" NEWID="1">
    <DATE>26-FEB-1987 15:01:01.79</DATE>
    <TOPICS><D>cocoa</D></TOPICS>
    <PLACES><D>el-salvador</D><D>usa</D><D>uruguay</D></PLACES>
    <TEXT>
    <TITLE>BAHIA COCOA REVIEW</TITLE>
    <DATELINE> SALVADOR, Feb 26 - </DATELINE><BODY>Showers continued throughout the week in
    the Bahia cocoa zone, alleviating the drought since early
    January and improving prospects for the coming temporao,
    although normal humidity levels have not been restored,
    Comissaria Smith said in its weekly review.
    Reuter
    </BODY></TEXT>
    </REUTERS>

    What I am trying to do is remove everything enclosed in < > and then write the result to a text file, i.e. an SGML-to-text ('sgm-txt') conversion. Could anybody show me how to do this in Java?
    Last edited by WXY595; 01-19-2006 at 08:39 PM.
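The conversion asked for above amounts to deleting every `<...>` run and saving what remains. A compact way to sketch that in Java is a single regular expression over the whole file. This is a minimal sketch, not the approach used in the reply below: the class name `RegexTagStripper` is hypothetical, it assumes no tag contains a literal `>` (true for the sample), and it reads the file as ISO-8859-1 purely so that every byte round-trips unchanged whatever the real encoding is.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class: regex-based tag stripping (requires Java 7+).
public class RegexTagStripper {

  // Replaces each <...> run with one space so words on either side stay apart.
  // [^>] also matches newlines, so tags spanning several lines are handled.
  static String stripTags(String sgml) {
    return sgml.replaceAll("<[^>]*>", " ");
  }

  // Reads the whole file, strips the tags, and writes <base name>.txt next to it.
  static Path convert(Path sgmlFile) throws IOException {
    // ISO-8859-1 maps bytes 1:1 to chars, so non-ASCII bytes are preserved as-is.
    String text = new String(Files.readAllBytes(sgmlFile), StandardCharsets.ISO_8859_1);
    String name = sgmlFile.getFileName().toString();
    int dot = name.lastIndexOf('.');
    String base = dot > 0 ? name.substring(0, dot) : name;
    Path out = sgmlFile.resolveSibling(base + ".txt");
    Files.write(out, stripTags(text).getBytes(StandardCharsets.ISO_8859_1));
    return out;
  }

  public static void main(String[] args) throws IOException {
    // "data.sgml" is only a placeholder default; pass the real path as args[0].
    Path out = convert(Paths.get(args.length > 0 ? args[0] : "data.sgml"));
    System.out.println("File: " + out + " created");
  }
}
```

Reading the whole file into memory is fine for Reuters-sized documents; for very large inputs a streaming approach avoids holding everything at once.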

  2. #2 (Join Date: Nov 2004, Location: Norway, Posts: 1,560)
    Code:
    import java.io.*;
    
    public class SGML2Txt {
      private final File inFile;
    
      public SGML2Txt(String filePath) {
        this.inFile = new File(filePath);
      }
    
      // Output goes next to the input: same directory, same base name, ".txt" extension.
      // File handles both '/' and '\' separators and names without a dot.
      private File outputFile() {
        String name = inFile.getName();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return new File(inFile.getAbsoluteFile().getParentFile(), base + ".txt");
      }
    
      public void convert() throws IOException {
        File outFile = outputFile();
        // Buffered streams avoid a system call per byte; try-with-resources
        // closes both streams even if an exception is thrown.
        try (InputStream in = new BufferedInputStream(new FileInputStream(inFile));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile))) {
          int n;
          boolean inBrackets = false;
          while ((n = in.read()) != -1) {
            if (n == '<' || n == '>') {
              // '<' enters a tag, '>' leaves it; write a space so words don't run together.
              inBrackets = (n == '<');
              out.write(' ');
              continue;
            }
            if (!inBrackets) {
              out.write(n); // copy bytes outside tags unchanged
            }
          }
        }
        System.out.println("File: " + outFile + " created");
      }
    
      public static void main(String[] args) {
        String path = (args.length > 0) ? args[0] : "c:\\tmp\\data.sgml";
        try {
          new SGML2Txt(path).convert();
        } catch (IOException ex) {
          ex.printStackTrace();
        }
      }
    }
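The loop in `convert()` is a two-state machine: outside a tag, bytes are copied; `<` switches to "inside a tag", where bytes are dropped, and `>` switches back. Pulling that logic into a pure `String` function makes it easy to check against the sample without touching the disk. This is a sketch; the class name `BracketStateMachine` is hypothetical.

```java
// Hypothetical class: the bracket-tracking loop from convert(), applied to a String.
public class BracketStateMachine {

  // '<' switches into "inside a tag", '>' switches back out; each bracket
  // becomes one space and every character between them is dropped.
  static String strip(String sgml) {
    StringBuilder sb = new StringBuilder(sgml.length());
    boolean inBrackets = false;
    for (int i = 0; i < sgml.length(); i++) {
      char c = sgml.charAt(i);
      if (c == '<' || c == '>') {
        inBrackets = (c == '<');
        sb.append(' ');
      } else if (!inBrackets) {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(strip("<DATELINE> SALVADOR, Feb 26 - </DATELINE><BODY>Showers"));
  }
}
```

Two behavioural details are worth knowing. Each tag turns into two spaces (one per bracket), so `<D>usa</D>` becomes `"  usa  "`. A stray `>` in ordinary text is also replaced by a space, whereas a regex such as `<[^>]*>` would leave it alone.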

