I have not used regular expressions before, but by searching the web and experimenting I managed to cobble together an expression that identifies all HTML tags, including multi-line comments <!-- ... -->.

I don't completely understand the expression, but I understand enough to get it to work, which is what counts!

The next step is to identify each tag's attributes and their values.
Everything I have tried fails when an attribute's value is a quoted string containing text that looks like another attribute/value pair.

I am using VB.Net 2005.

This is my attribute-finding expression so far: "(\s\b[^=]*=)"
It matches each attribute name together with its "=" sign. The text between any two matches is the previous attribute's value, and the last attribute's value simply runs to the end of the tag. This method would work if I could get it to ignore the text inside quoted string values, just as it already skips over numeric values.
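
To show the problem, here is a minimal sketch of how I am applying the expression. The sample tag and the module name are made up for illustration and are not from my real code. The alt value deliberately contains text shaped like an attribute/value pair.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AttributeDemo
    Sub Main()
        ' Hypothetical sample tag: the quoted alt value contains "width=5",
        ' which looks like an attribute/value pair but is only text.
        Dim tag As String = "<img alt=""set width=5"" height=20 src=""a.gif"">"

        ' The attribute-finding expression from above, unchanged.
        Dim pattern As String = "(\s\b[^=]*=)"

        ' Print every match in brackets so the leading space is visible.
        For Each m As Match In Regex.Matches(tag, pattern)
            Console.WriteLine("[" & m.Value & "]")
        Next
    End Sub
End Module
```

As far as I can tell this prints [ alt=], [ width=], [ height=] and [ src=]. The height and src matches are what I want, and the numeric value 20 causes no trouble, but [ width=] is found inside the quoted alt string, so the alt value gets cut short at that point.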

Thanks in advance!