  1. #1
    Join Date
    Sep 2010
    Posts
    9

    Help with Xpath (normalize space)

    Hey guys, I need a little help with my VB app. I've used HtmlAgilityPack with an XPath query to retrieve some text from a webpage, but it's also retrieving all the surrounding whitespace. I'd like to strip that whitespace out somehow.

    Here is my vb code so far:
    Code:
            Dim content2 As String = ""
            Dim web2 As New HtmlAgilityPack.HtmlWeb
            Dim doc2 As HtmlAgilityPack.HtmlDocument = web2.Load("http://www.yellowpages.ca/search/si/1/Estheticians/Calgary+AB ")
            Dim hnc2 As HtmlAgilityPack.HtmlNodeCollection = doc2.DocumentNode.SelectNodes("//div[@class='address']")
            For Each link As HtmlAgilityPack.HtmlNode In hnc2
                ' Decode the HTML entities that InnerText leaves in place
                Dim replaceUnwanted As String = link.InnerText.Replace("&amp;", "&")
                replaceUnwanted = replaceUnwanted.Replace("&#39;", "'")
    
                content2 &= replaceUnwanted & vbNewLine
            Next
            RichTextBox2.Text = content2
    Any ideas?

  2. #2
    Join Date
    May 2009
    Location
    United Kingdom
    Posts
    49
    I can't be 100% sure without seeing a sample of the text and whitespace, but you should be able to resolve this with the XPath function normalize-space(string).

    http://www.w3schools.com/Xpath/xpath...ons.asp#string

    The XPath could look like:
    //normalize-space(div[@class='address'])

  3. #3
    Join Date
    Sep 2010
    Posts
    9
    I'm getting: '//normalize-space(div[@class='address'])' has an invalid token.

    Here is the formatting without the normalize-space:
    Code:
    																																									101-424 10 St NW, Calgary,  AB,  T2N1V9 				
    																											
    																														
    
    																																									2359 Banff Trail NW, Calgary,  AB,  T2M4L2
    That's 2 addresses; there are more than 10.
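
    The "invalid token" error is expected under XPath 1.0: normalize-space() returns a string, so it cannot be written as a location step after "//", and SelectNodes needs an expression that yields nodes. One way around it is to select the nodes as before and normalize the text on the VB side. The following is only a minimal sketch of that idea; NormalizeSpace is a hypothetical helper name, and the sample string is taken from the output above. Note that .NET's \s also matches some characters (such as non-breaking spaces) that XPath's normalize-space() leaves alone.

```vb
Imports System.Text.RegularExpressions

Module NormalizeSpaceSketch
    ' Hypothetical helper: approximates XPath normalize-space() by
    ' collapsing every run of whitespace (tabs, newlines, spaces)
    ' into a single space and trimming both ends.
    Function NormalizeSpace(ByVal text As String) As String
        Return Regex.Replace(text, "\s+", " ").Trim()
    End Function

    Sub Main()
        ' Sample address padded with tabs, as in the output above
        Dim raw As String = vbTab & vbTab & "101-424 10 St NW, Calgary,  AB,  T2N1V9 " & vbTab & vbNewLine
        Console.WriteLine(NormalizeSpace(raw))
        ' Prints: 101-424 10 St NW, Calgary, AB, T2N1V9
    End Sub
End Module
```

    In the original loop this would amount to calling NormalizeSpace(link.InnerText) before appending to content2.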

  4. #4
    Join Date
    Oct 2008
    Posts
    141

    Can you post the XML document?

    Without the XML source it's hard to figure this out, but maybe:

    //div[@class='address']/text()[normalize-space(.)]

    best regards,

    tonci korsano
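
    A rough sketch of how that expression might be used with HtmlAgilityPack, assuming an inline HTML fragment in place of the real page. The predicate [normalize-space(.)] only discards text nodes that are empty or pure whitespace; it does not strip the padding inside the nodes it keeps, so a Trim() is still applied here. The module name and sample markup are made up for illustration.

```vb
Imports HtmlAgilityPack

Module TextNodeSketch
    Sub Main()
        ' Inline sample standing in for the downloaded page (assumption)
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div class='address'>" & vbTab & vbTab &
                     "101-424 10 St NW, Calgary,  AB,  T2N1V9 " & vbTab & "</div>")

        ' Keep only text nodes that contain something besides whitespace
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@class='address']/text()[normalize-space(.)]")

        ' SelectNodes returns Nothing when no node matches
        If nodes IsNot Nothing Then
            For Each n As HtmlNode In nodes
                ' DeEntitize decodes entities such as &amp;; Trim drops the padding
                Console.WriteLine(HtmlEntity.DeEntitize(n.InnerText).Trim())
            Next
        End If
    End Sub
End Module
```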

  5. #5
    Join Date
    Feb 2004
    Location
    Longueuil, Québec
    Posts
    577
    Pasting your result into Word (a wonderful tool for seeing what lies under white space), I see that your white space is made of tab characters, so the following should do the trick:

    Code:
    Content2=Content2.Replace(ControlChars.Tab, "")
    You will be left with real spaces after the commas, which you may not want, depending on what you do with the data later on.

    If you split the string on the commas, simply trim each piece after the split to remove the extraneous spaces.
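
    As an illustration of the split-then-trim approach, here is a small sketch that assumes the tabs have already been removed. The variable names are made up for the example.

```vb
Module SplitTrimSketch
    Sub Main()
        ' One address with the tabs already stripped (sample from this thread)
        Dim address As String = "101-424 10 St NW, Calgary,  AB,  T2N1V9 "

        ' Split on commas, then trim the spaces around each piece
        Dim parts() As String = address.Split(","c)
        For i As Integer = 0 To parts.Length - 1
            parts(i) = parts(i).Trim()
        Next

        Console.WriteLine(String.Join(",", parts))
        ' Prints: 101-424 10 St NW,Calgary,AB,T2N1V9
    End Sub
End Module
```

    Trimming each piece handles any number of spaces after a comma, so there is no need to know in advance how many there are.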

    Or you can remove them right there with:

    Code:
    Content2=Content2.Replace(", ", ",")
    Content2=Content2.Replace(", ", ",")
    You need to do it twice because some of the commas are followed by 2 spaces.
    Jacques Bourgeois
    JBFI
    http://www3.sympatico.ca/jbfi/homeus.htm

  6. #6
    Join Date
    Sep 2010
    Posts
    9
    Thanks guys. By combining tkorsano's and JBourgeois's suggestions, I got the result I wanted.
