-
Help with XPath (normalize-space)
Hey guys, I need a little help with my VB app. I've used the HtmlAgilityPack and an XPath query to retrieve some text from a webpage, but it's also retrieving all the surrounding whitespace. I would like to remove that spacing somehow.
Here is my VB code so far:
Code:
Dim content2 As String = ""
Dim web2 As New HtmlAgilityPack.HtmlWeb
Dim doc2 As HtmlAgilityPack.HtmlDocument = web2.Load("http://www.yellowpages.ca/search/si/1/Estheticians/Calgary+AB")
' Select every <div class="address"> element on the page
Dim hnc2 As HtmlAgilityPack.HtmlNodeCollection = doc2.DocumentNode.SelectNodes("//div[@class='address']")
' SelectNodes returns Nothing when no element matches
If hnc2 IsNot Nothing Then
    For Each link As HtmlAgilityPack.HtmlNode In hnc2
        ' Decode HTML entities in the node's text (for example &amp; becomes &)
        Dim replaceUnwanted As String = System.Net.WebUtility.HtmlDecode(link.InnerText)
        content2 &= replaceUnwanted & vbNewLine
    Next
End If
RichTextBox2.Text = content2
Any ideas?
-
I'm not 100% sure without seeing a sample of the text and whitespace, but this should be solvable with the XPath function normalize-space(string).
http://www.w3schools.com/Xpath/xpath...ons.asp#string
The XPath could look like:
//normalize-space(div[@class='address'])
-
I'm getting the error '//normalize-space(div[@class='address'])' has an invalid token.
Here is the output without normalize-space:
Code:
101-424 10 St NW, Calgary, AB, T2N1V9
2359 Banff Trail NW, Calgary, AB, T2M4L2
That's 2 of the addresses; there are more than 10 in total.
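That "has an invalid token" error is consistent with .NET's XPath support being XPath 1.0, where a function call such as normalize-space() cannot appear as a location step after //. A minimal sketch of one way around it, assuming the markup is well-formed enough to load as XML (a scraped page often is not, which is why HtmlAgilityPack is used): select the nodes first, then evaluate normalize-space(.) with each node as the context. The module name, NormalizedAddresses and the sample markup are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.XPath
Imports Microsoft.VisualBasic

Module NormalizeSpaceSketch
    ' Returns the whitespace-normalized text of every <div class="address"> element.
    Function NormalizedAddresses(xml As String) As List(Of String)
        Dim doc As New XPathDocument(New StringReader(xml))
        Dim nav As XPathNavigator = doc.CreateNavigator()
        Dim results As New List(Of String)
        ' XPath 1.0 does not allow normalize-space() as a location step,
        ' so select the element nodes first...
        For Each node As XPathNavigator In nav.Select("//div[@class='address']")
            ' ...then evaluate normalize-space(.) with each node as the context node.
            ' It trims both ends and collapses runs of spaces, tabs and line breaks.
            results.Add(CStr(node.Evaluate("normalize-space(.)")))
        Next
        Return results
    End Function

    Sub Main()
        ' Hypothetical sample markup padded with tabs and line breaks.
        Dim sample As String = "<html><div class='address'>" & vbCrLf & vbTab & "101-424 10 St NW," & vbTab & " Calgary, AB,  T2N1V9" & vbCrLf & "</div></html>"
        For Each address As String In NormalizedAddresses(sample)
            Console.WriteLine(address) ' prints 101-424 10 St NW, Calgary, AB, T2N1V9
        Next
    End Sub
End Module
```

The same select-then-evaluate pattern should carry over to other XPath 1.0 string functions that operate on a single node.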
-
Can you send this XML document?
Without the XML source it's hard to figure this out, but maybe:
//div[@class='address']/text()[normalize-space(.)]
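The predicate in that expression keeps only text nodes whose normalize-space() value is non-empty, so whitespace-only text nodes are dropped; the nodes that survive still carry their own leading and trailing whitespace. A minimal sketch of using it from VB with System.Xml.XPath, assuming well-formed markup and trimming each result with String.Trim; the module, NonBlankTextNodes and the sample markup are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.XPath
Imports Microsoft.VisualBasic

Module TextNodeSketch
    ' Returns the trimmed, non-blank text nodes sitting directly under each address div.
    Function NonBlankTextNodes(xml As String) As List(Of String)
        Dim doc As New XPathDocument(New StringReader(xml))
        Dim nav As XPathNavigator = doc.CreateNavigator()
        Dim results As New List(Of String)
        ' [normalize-space(.)] is true only when the normalized text is non-empty,
        ' so whitespace-only text nodes are filtered out.
        For Each textNode As XPathNavigator In nav.Select("//div[@class='address']/text()[normalize-space(.)]")
            ' The selected nodes still include their own leading/trailing tabs and newlines.
            results.Add(textNode.Value.Trim())
        Next
        Return results
    End Function

    Sub Main()
        ' Hypothetical markup: a child element plus text padded with tabs and line breaks.
        Dim sample As String = "<div class='address'>" & vbLf & vbTab & "<span>Suite 5</span>" & vbLf & vbTab & "101 Main St" & vbLf & "</div>"
        For Each item As String In NonBlankTextNodes(sample)
            Console.WriteLine(item) ' prints 101 Main St
        Next
    End Sub
End Module
```

Note that /text() only returns text sitting directly inside the div, so text nested in a child element (the hypothetical span here) is not returned; whether that matters depends on the page's actual markup.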
best regards,
tonci korsano
-
Pasting your result into Word (a wonderful tool for seeing what lies behind white space), I can see that your white space is made of tab characters, so the following should do the trick:
Code:
Content2=Content2.Replace(ControlChars.Tab, "")
You will be left with real spaces after the commas, which you may not want, depending on what you do with that data later on.
If you split the string on the commas, simply trim each part after the split to remove the extraneous spaces.
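The split-and-trim approach could look like the minimal sketch below, assuming each address occupies a single line whose parts are separated by commas; AddressParts and the sample line are hypothetical. String.Trim with no arguments removes tabs as well as spaces, so it also covers the tab characters.

```vbnet
Imports System
Imports System.Linq
Imports Microsoft.VisualBasic

Module SplitTrimSketch
    ' Splits one address line on commas and trims tabs/spaces from each part.
    Function AddressParts(line As String) As String()
        Return line.Split(","c).Select(Function(part) part.Trim()).ToArray()
    End Function

    Sub Main()
        ' Hypothetical line with a tab and doubled spaces after the commas.
        Dim parts As String() = AddressParts("101-424 10 St NW," & vbTab & " Calgary, AB,  T2N1V9")
        Console.WriteLine(String.Join("|", parts)) ' prints 101-424 10 St NW|Calgary|AB|T2N1V9
    End Sub
End Module
```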
Or you can remove them right away with:
Code:
Content2=Content2.Replace(", ", ",")
Content2=Content2.Replace(", ", ",")
You need to do it twice because some of the commas are followed by 2 spaces.
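If the number of spaces after a comma can vary, a regular expression can remove any run of them in one pass instead of repeating Replace. This is a swapped-in technique (Regex.Replace from System.Text.RegularExpressions), not the approach from the post above; CleanAddress is a hypothetical name, and the pattern only targets spaces so the line breaks between addresses are left alone.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CommaCleanupSketch
    ' Removes tabs, then removes any run of spaces following a comma,
    ' however many spaces it contains.
    Function CleanAddress(value As String) As String
        Dim noTabs As String = value.Replace(ControlChars.Tab, "")
        Return Regex.Replace(noTabs, ", +", ",")
    End Function

    Sub Main()
        ' Hypothetical address with a tab and a doubled space after the commas.
        Console.WriteLine(CleanAddress("101-424 10 St NW," & vbTab & " Calgary, AB,  T2N1V9")) ' prints 101-424 10 St NW,Calgary,AB,T2N1V9
    End Sub
End Module
```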
Jacques Bourgeois
JBFI
http://www3.sympatico.ca/jbfi/homeus.htm
-
Thanks guys, by combining tkorsano's and JBourgeois's suggestions I got the result I wanted.