

    How to read SGML files using Java

    I've got a text categorisation test collection called Reuters-21578 for my Information Retrieval project. It is distributed in 22 files. Each of the first 21 files (reut2-000.sgm through reut2-020.sgm) contains 1,000 documents, while the last (reut2-021.sgm) contains 578 documents. The files are in SGML format. Each of the 22 files begins with a document type declaration line:

    <!DOCTYPE lewis SYSTEM "lewis.dtd">

    The DTD file lewis.dtd is included in the distribution. Following the document type declaration line are individual Reuters articles marked up with SGML tags.

    My question is how to write a Java program that reads those 21,578 documents, or transforms them into 21,578 separate text files.
    Last edited by WXY595; 01-16-2006 at 10:48 AM.
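
One possible approach, offered as a minimal sketch rather than a definitive answer: since the files are SGML and not guaranteed to be well-formed XML, treat each file as plain text and cut it into articles with a regular expression instead of using an XML parser. The sketch below assumes that every article is wrapped in `<REUTERS ...>` and `</REUTERS>` tags and carries a numeric `NEWID` attribute; neither detail is stated in the post above, so check them against the README shipped with the collection. The class name `ReutersSplitter` and its method names are made up for illustration. Reading the bytes as ISO-8859-1 is also an assumption, chosen because that charset never rejects a byte.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReutersSplitter {

    // Assumption: each article is wrapped in <REUTERS ...> ... </REUTERS>.
    private static final Pattern ARTICLE =
            Pattern.compile("<REUTERS\\b[^>]*>.*?</REUTERS>", Pattern.DOTALL);
    // Assumption: the article identifier is stored in a NEWID attribute.
    private static final Pattern NEW_ID = Pattern.compile("NEWID=\"(\\d+)\"");
    // Matches any SGML tag so that it can be stripped from the text.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns the raw markup of every article found in one file's contents. */
    static List<String> splitArticles(String sgml) {
        List<String> articles = new ArrayList<>();
        Matcher m = ARTICLE.matcher(sgml);
        while (m.find()) {
            articles.add(m.group());
        }
        return articles;
    }

    /** Returns the NEWID value of an article, or null if the attribute is absent. */
    static String articleId(String article) {
        Matcher m = NEW_ID.matcher(article);
        return m.find() ? m.group(1) : null;
    }

    /** Strips all tags and decodes the three basic entities; &amp; is decoded last. */
    static String toPlainText(String article) {
        String text = TAG.matcher(article).replaceAll("");
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&amp;", "&")
                   .trim();
    }

    /** Usage: java ReutersSplitter <inputDir> <outputDir> */
    public static void main(String[] args) throws IOException {
        Path inputDir = Paths.get(args[0]);
        Path outputDir = Files.createDirectories(Paths.get(args[1]));
        int counter = 0;
        // File names reut2-000.sgm through reut2-021.sgm, as described in the post.
        for (int i = 0; i <= 21; i++) {
            Path file = inputDir.resolve(String.format("reut2-%03d.sgm", i));
            String sgml = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            for (String article : splitArticles(sgml)) {
                counter++;
                String id = articleId(article);
                // Fall back to a running counter if no NEWID attribute was found.
                String name = (id != null ? id : "unknown-" + counter) + ".txt";
                Files.write(outputDir.resolve(name),
                        toPlainText(article).getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```

After compiling with `javac ReutersSplitter.java`, running `java ReutersSplitter <inputDir> <outputDir>` should write one text file per article. Note that `toPlainText` keeps the text of every element (dates, topics, title, body) and leaves numeric character references untouched; if only the article body is wanted, the same regular-expression idea could be applied to whichever body tag the DTD defines.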
