Getting the content of HTML tags with Java
Hello... Two questions, really.
First off, I have a Java Document object representing an HTML page, say:
<title>Dumb HTML File</title>
<a href="http://tacorner.com/tsunami/dumb.html">Dumb link to self</a>
I want my program to get the text between the <title> and </title> tags so that it knows what the page's title is.
What I CAN do so far is get an Element object (javax.swing.text.Element) representing the <title> tag. I just cannot get the text between the <title> and </title> tags. I've spent a modest three hours digging through the API and trying stuff to no avail.
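One likely reason the Element route comes up empty: as far as I can tell from the Swing sources, the HTMLDocument reader does not store the <title> text as document content at all. It puts it into the document property Document.TitleProperty, so no Element offsets cover it. Here is a minimal sketch, assuming the page is parsed with javax.swing.text.html.HTMLEditorKit. The class TitleExample and its helpers parse and title are illustrative names, not part of any API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class TitleExample {

    /** Parses an HTML string into an HTMLDocument without any visible component. */
    static HTMLDocument parse(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // kit.read parses on the calling thread and returns when it is done.
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    /** The <title> text lives in a document property, not in the text content. */
    static String title(Document doc) {
        Object title = doc.getProperty(Document.TitleProperty);
        return title == null ? null : title.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Dumb HTML File</title></head><body>"
                + "<a href=\"http://tacorner.com/tsunami/dumb.html\">Dumb link to self</a>"
                + "</body></html>";
        System.out.println(title(parse(html)));
    }
}
```

In a JEditorPane or JTextPane that has already finished loading a page, the same lookup is just getDocument().getProperty(Document.TitleProperty).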
How do I access the text between HTML tags with Java? Is it possible with the Document interface and javax.swing.text.Element? If not, what do I need to do?
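For tags whose text does end up in the document, an Element only records a range of offsets. The text between the tags is therefore Document.getText over [getStartOffset(), getEndOffset()). There is one wrinkle. HTMLDocument does not create an Element for inline tags such as <a>. Instead, it attaches their attributes to the leaf content elements under the HTML.Tag.A key. The sketch below walks the element tree with ElementIterator and returns the text of the first link. ElementTextExample, parse, textOf, and firstLinkText are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class ElementTextExample {

    /** Parses an HTML string into an HTMLDocument (synchronously). */
    static HTMLDocument parse(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    /** Returns the document text covered by an element's [start, end) offsets. */
    static String textOf(Element e) throws BadLocationException {
        int start = e.getStartOffset();
        int end = e.getEndOffset();
        return e.getDocument().getText(start, end - start);
    }

    /** Returns the text of the first link in the document, or null if there is none. */
    static String firstLinkText(Document doc) throws BadLocationException {
        ElementIterator it = new ElementIterator(doc);
        for (Element e = it.first(); e != null; e = it.next()) {
            AttributeSet attrs = e.getAttributes();
            // <a> is not its own Element: its attributes are stored on the
            // leaf (content) elements under the HTML.Tag.A key.
            if (e.isLeaf() && attrs.getAttribute(HTML.Tag.A) != null) {
                return textOf(e);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        HTMLDocument doc = parse("<html><head><title>Dumb HTML File</title></head><body>"
                + "<a href=\"http://tacorner.com/tsunami/dumb.html\">Dumb link to self</a>"
                + "</body></html>");
        System.out.println(firstLinkText(doc));
    }
}
```

Block-level tags such as <p> or <h1> do get their own Element, so textOf(e) can be applied to them directly. Note that a block's range usually ends with a trailing newline.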
Second question: I have a class that extends JTextPane and displays an HTML file. I'd like to call getDocument() or getStyledDocument() and get a Document object to work with. However, every call to those methods returns a default, empty document with none of the page's information.
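The empty document is most likely a timing issue. HTMLEditorKit.createDefaultDocument() gives its HTMLDocument a non-negative asynchronous load priority. According to the JEditorPane.setPage Javadoc (JTextPane inherits setPage), such a page is loaded on a separate thread, so setPage() can return before the HTML has been read. The same Javadoc says a "page" property change is fired once loading has finished, so one option is to wait for that event. Here is a minimal sketch. loadPage, loadTitle, and writeTempHtml are hypothetical helpers, and the temporary file only stands in for a real page.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import javax.swing.JTextPane;
import javax.swing.SwingUtilities;
import javax.swing.text.Document;

public class PageLoadedExample {

    /** Starts loading url and hands the finished Document to onLoaded. Call on the EDT. */
    static void loadPage(JTextPane pane, URL url, Consumer<Document> onLoaded) throws IOException {
        PropertyChangeListener once = new PropertyChangeListener() {
            @Override
            public void propertyChange(PropertyChangeEvent evt) {
                // "page" fires after the background load has completed; listen only once.
                pane.removePropertyChangeListener("page", this);
                onLoaded.accept(pane.getDocument());
            }
        };
        pane.addPropertyChangeListener("page", once);
        try {
            pane.setPage(url); // may return before the document is filled in
        } catch (IOException e) {
            pane.removePropertyChangeListener("page", once);
            throw e;
        }
    }

    /** Demo helper: loads url on the EDT and waits up to 10 s for the page title. */
    static String loadTitle(URL url) throws Exception {
        CountDownLatch loaded = new CountDownLatch(1);
        AtomicReference<Object> title = new AtomicReference<>();
        SwingUtilities.invokeAndWait(() -> {
            try {
                loadPage(new JTextPane(), url, doc -> {
                    title.set(doc.getProperty(Document.TitleProperty));
                    loaded.countDown();
                });
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        if (!loaded.await(10, TimeUnit.SECONDS)) {
            return null; // page did not finish loading in time
        }
        Object t = title.get();
        return t == null ? null : t.toString();
    }

    /** Demo helper: writes html to a temporary .html file and returns its URL. */
    static URL writeTempHtml(String html) throws IOException {
        Path file = Files.createTempFile("dumb", ".html");
        file.toFile().deleteOnExit();
        Files.write(file, html.getBytes(StandardCharsets.UTF_8));
        return file.toUri().toURL();
    }

    public static void main(String[] args) throws Exception {
        URL url = writeTempHtml("<html><head><title>Dumb HTML File</title></head><body>Hello</body></html>");
        System.out.println(loadTitle(url));
    }
}
```

In the JTextPane subclass itself, the listener body is where code that currently calls getDocument() right after setPage() would move.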
There is one way to get a filled-in Document object, though: I have to have the HTML page link to itself. Then, once that link is clicked and the page navigates back to itself, the getDocument() and getStyledDocument() calls will give me a complete Document object.
Any ideas why that is, and what I can do to have getDocument() and getStyledDocument() work on the FIRST call?
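By default, HTML pages shown with HTMLEditorKit are loaded on a background thread, so getDocument() right after setPage() can see an unfilled document. If the document has to be complete the moment setPage() returns, another option is to make the load synchronous. JEditorPane only loads in the background when the document's AbstractDocument.getAsynchronousLoadPriority() is zero or greater, and HTMLEditorKit's default documents use a positive priority. The sketch below registers a kit whose documents report -1. SynchronousHTMLEditorKit, loadTitle, and writeTempHtml are hypothetical names, not library classes.

```java
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.swing.JTextPane;
import javax.swing.text.AbstractDocument;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

public class SyncLoadExample {

    /** Hypothetical kit: like HTMLEditorKit, but its documents load synchronously. */
    static class SynchronousHTMLEditorKit extends HTMLEditorKit {
        @Override
        public Document createDefaultDocument() {
            Document doc = super.createDefaultDocument();
            // A negative priority makes JEditorPane.setPage read the whole page
            // before returning instead of filling the document on a worker thread.
            ((AbstractDocument) doc).setAsynchronousLoadPriority(-1);
            return doc;
        }
    }

    /** Loads url into a fresh JTextPane and returns the page title immediately. */
    static String loadTitle(URL url) throws IOException {
        JTextPane pane = new JTextPane();
        // setPage picks the kit by content type, so register ours for text/html.
        pane.setEditorKitForContentType("text/html", new SynchronousHTMLEditorKit());
        pane.setPage(url);
        // No waiting needed: the document is already filled in on the first call.
        Object title = pane.getDocument().getProperty(Document.TitleProperty);
        return title == null ? null : title.toString();
    }

    /** Demo helper: writes html to a temporary .html file and returns its URL. */
    static URL writeTempHtml(String html) throws IOException {
        Path file = Files.createTempFile("dumb", ".html");
        file.toFile().deleteOnExit();
        Files.write(file, html.getBytes(StandardCharsets.UTF_8));
        return file.toUri().toURL();
    }

    public static void main(String[] args) throws IOException {
        URL url = writeTempHtml("<html><head><title>Dumb HTML File</title></head><body>Hello</body></html>");
        // For brevity this runs off the Event Dispatch Thread; real GUI code
        // should create and use Swing components on the EDT.
        System.out.println(loadTitle(url));
    }
}
```

The trade-off is that a slow page now blocks the calling thread, which in a GUI is normally the Event Dispatch Thread. This also fits the self-link observation. The setPage Javadoc says a URL that is already displayed is not reloaded, so clicking the link only fires "page" again. By that point, the original background load has long finished.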
Source Code: http://tsunami.tacorner.com/src/TsunamiWindow.java
Thanks in advance.