

Thread: Removing Spaces From Char String Runs Slow.

  1. #1
    John Guest

    Removing Spaces From Char String Runs Slow.


    Hopefully somebody can help me with this. I'm reading in a text file line
    by line (well over 200,000 lines). Each line is exactly 170 characters and
    contains letters and numbers. There is a 31-char key at the end of each
    line that needs to stay in that position. The rest of the characters on
    the line have varying numbers of spaces between them: some have one space,
    others have two or more.

    I've written a function that I send each line to that removes duplicate
    spaces. If there is more than one consecutive space, the extra ones are
    removed and just one space is left. To do this, though, I am having to
    rewrite the entire line character by character, and it takes forever. Can
    somebody look at this and find a better, faster way to do it?

    Public Function StripSpaces(sLine As String)

        Dim bIsSpace As Boolean
        Dim bPrevSpace As Boolean
        Dim i As Integer
        Dim a As Integer
        Dim sLineOutput As String
        Dim nSpaceCount As Integer
        Dim sNewLine As String
        Dim nDiff As Integer
        Dim sSpaces As String

        For i = 1 To Len(sLine)
            If Mid(sLine, i, 1) = Chr(32) Then
                bIsSpace = True
            Else
                bIsSpace = False
            End If

            If bIsSpace = True Then 'is cur char a space
                If bPrevSpace = True Then 'is prev char a space
                    For a = 1 To Len(sLine)
                        'Do not write the cur char to the new string
                        If a <> i Then
                            sNewLine = sNewLine + Mid(sLine, a, 1)
                        End If
                    Next
                    i = i - 1
                Else
                    sNewLine = sLine
                End If
                bPrevSpace = True
            Else
                sNewLine = sLine
                bPrevSpace = False
            End If

            sLine = sNewLine
            sNewLine = ""

        Next

        'If the first char of the line is a space, remove it
        If Mid(sLine, 1, 1) = Chr(32) Then
            For a = 2 To Len(sLine)
                sNewLine = sNewLine + Mid(sLine, a, 1)
            Next
            sLine = sNewLine
        End If

        'Next section adds necessary spaces to keep line length
        'at 170 characters and the 31 char key at the end of the line

        sNewLine = Mid(sLine, 1, Len(sLine) - 32)
        nDiff = 139 - Len(sNewLine)
        For a = 1 To nDiff
            sSpaces = sSpaces & Chr(32)
        Next
        sNewLine = sNewLine + sSpaces + Right(sLine, 31)
        sLine = sNewLine

        StripSpaces = sLine
    End Function

    Thanks for any input on this.

  2. #2
    Anthony Jones Guest

    Re: Removing Spaces From Char String Runs Slow.

    Do Until InStr(1, s, "  ") = 0
        s = Replace(s, "  ", " ")
    Loop
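    Why the loop is needed: a single Replace$ call replaces non-overlapping pairs of spaces, so a run of four spaces only shrinks to two and a further pass is required. A small demonstration (ShowReplacePasses is a hypothetical name; the search string is assumed to be two spaces):

```vb
' Hypothetical demo: one Replace$ pass halves a run of spaces, it does not remove it.
Sub ShowReplacePasses()
    Dim s As String
    s = "a    b"                       ' "a", four spaces, "b"
    s = Replace$(s, "  ", " ")         ' pass 1: two non-overlapping pairs -> "a  b"
    Debug.Assert s = "a  b"
    s = Replace$(s, "  ", " ")         ' pass 2: the remaining pair -> "a b"
    Debug.Assert s = "a b"
    Debug.Assert InStr(s, "  ") = 0    ' the Do loop's exit condition is now met
End Sub
```

    Each pass roughly halves the longest run, so only a few passes are needed for the runs found in a 139-character field.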



  3. #3
    Rick Rothstein Guest

    Re: Removing Spaces From Char String Runs Slow.

    "Anthony Jones" <Ant@yadayadayada.com> wrote in message
    news:3def6715$1@tnews.web.devx.com...
    > Do Until InStr(1, s, "  ") = 0
    > s = Replace(s, "  ", " ")
    > Loop


    That is part of the solution, although I prefer this construction

    Do While InStr(s, "  ")

    over your posted

    Do Until InStr(1, s, "  ") = 0

    (of course, this is strictly a personal preference thing); however, the
    OP wanted the final string to be 170 characters long with the last 31
    characters unchanged. A call to this Sub should change a given line of
    text the way the OP wants:

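    Sub ParseLine(LineOfText As String)
        Dim TempString As String
        ' Work only on the first 139 characters; the 31-char key is left alone
        TempString = Trim$(Left$(LineOfText, 139))
        ' Collapse runs of spaces (the search string is two spaces)
        Do While InStr(TempString, "  ")
            TempString = Replace$(TempString, "  ", " ")
        Loop
        ' Pad back to 139 characters and overwrite the start of the line in place
        Mid$(LineOfText, 1) = Left$(TempString & Space$(139), 139)
    End Sub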
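    For comparison, a single-pass alternative sketch (a different technique from the Replace$ loop, not Rick's code; CollapseSpaces is a hypothetical name). It writes each kept character into a preallocated buffer with the Mid$ statement, so no string is rebuilt one character at a time as in StripSpaces. Like Trim$ plus the loop, it drops leading and trailing spaces, so it should be applied to the first 139 characters only, not to the 31-char key.

```vb
' Hypothetical helper: collapses runs of spaces in one pass and trims both ends.
Public Function CollapseSpaces(ByVal s As String) As String
    Dim sOut As String
    Dim sCh As String
    Dim i As Long
    Dim n As Long              ' number of characters written to sOut so far
    Dim bPrevSpace As Boolean

    sOut = Space$(Len(s))      ' the result can never be longer than the input
    bPrevSpace = True          ' treat the start as a space so leading spaces are dropped
    For i = 1 To Len(s)
        sCh = Mid$(s, i, 1)
        If sCh = " " Then
            If Not bPrevSpace Then         ' keep only the first space of a run
                n = n + 1
                Mid$(sOut, n, 1) = " "
                bPrevSpace = True
            End If
        Else
            n = n + 1
            Mid$(sOut, n, 1) = sCh         ' in-place write, no reallocation
            bPrevSpace = False
        End If
    Next i
    If n > 0 And bPrevSpace Then n = n - 1 ' drop a single trailing space
    CollapseSpaces = Left$(sOut, n)
End Function
```

    With a 170-char line in sLine, it could be used like this:

    sLine = Left$(CollapseSpaces(Left$(sLine, 139)) & Space$(139), 139) & Mid$(sLine, 140)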

    Rick - MVP



  4. #4
    Bernie Guest

    Re: Removing Spaces From Char String Runs Slow.

    Hi,

    Since some others have given you hints for this, I won't rewrite your function
    once more.

    However, when dealing heavily with strings, it can speed up processing
    considerably if the string is first converted to a Byte array (holding the
    character codes), then processed, and finally converted back to a string.
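    A minimal sketch of that round trip, for illustration only (ByteArrayRoundTrip is a hypothetical name). A VB string is Unicode, so the array holds two bytes per character; for plain ASCII text the first byte of each pair is the ASCII code and the second is zero:

```vb
' Hypothetical demo of String -> Byte() -> String without per-character calls.
Sub ByteArrayRoundTrip()
    Dim b() As Byte
    Dim s As String
    b = "AB"                            ' 2 characters -> 4 bytes (UTF-16)
    Debug.Assert UBound(b) = 3          ' bytes 0..3
    Debug.Assert b(0) = 65 And b(1) = 0 ' "A" is &H41 with a zero high byte
    b(2) = 67                           ' overwrite the low byte of "B" with "C"
    s = b                               ' convert back in one assignment
    Debug.Assert s = "AC"
End Sub
```

    Because of the two-bytes-per-character layout, a loop over such an array steps by 2 when it only needs the low byte.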

    Bernie

  5. #5
    John Guest

    Re: Removing Spaces From Char String Runs Slow.


    Thanks to both of you for the quick responses.

    Rick, your solution works perfectly and shortened my 32-minute process down
    to just under 4 minutes.

    When I wrote my code, I just knew there had to be an easy way but couldn't
    come up with it. Thanks again!

  6. #6
    Anthony Jones Guest

    Re: Removing Spaces From Char String Runs Slow.

    >(of course, this is strictly a personal preference thing);

    ;-) yeah don't start that again.

    >however, the OP wanted the final string to be 170 characters long with the
    >last 31 characters unchanged.

    It's a good job I didn't post a complete function. Well done for finishing
    the job off.

    Anthony.



  7. #7
    Ulrich Korndoerfer Guest

    Re: Removing Spaces From Char String Runs Slow.

    John,

    Following is a solution which collapses several consecutive spaces into
    one and trims leading and trailing spaces. You have to add code after
    the string has been collapsed to get your 170-char length.

    It uses a sub with the source and destination strings passed ByRef, as this
    is quicker than returning the result as a function return value. Source and
    destination may refer to the same string, and both may be empty. It is much
    faster than an InStr/Replace combination.

    The Source string is converted to a byte array containing its Unicode
    (UTF-16) characters, and this byte array is then collapsed in place.
    Afterwards the byte array's length is adjusted and the array is converted
    back to a string in the Dest parameter.

    There is even a faster way which does not have to convert the string
    into a byte array and back (which costs time). It uses a fake array of
    integers (a slimy, but safe hack :-)) mapped to the source string and
    thus one can operate directly on the string without having to convert
    it. I am working on it. If interested, I could post it.

    Usage example:

    Dim Source As String
    Source = "  asad   asdasdd  dfsf    fdfddf  "
    NormalizeWS Source, Source
    Print Source '-> "asad asdasdd dfsf fdfddf"

    <Code>

    Public Sub NormalizeWS(ByRef Dest As String, ByRef Source As String)

        Dim charl As Long, charh As Long, b() As Byte, i As Long, c As Long, _
            noseq As Long

        b = Source

        For i = 0 To UBound(b) Step 2
            charl = b(i): charh = b(i + 1)
            If charh = 0 Then
                If charl = &H20& Then
                    If noseq Then
                        If c < i Then b(c) = &H20&: b(c + 1) = 0
                        c = c + 2: noseq = 0
                    End If
                Else
                    If c < i Then b(c) = charl: b(c + 1) = 0
                    c = c + 2: noseq = -1
                End If
            Else
                If c < i Then b(c) = charl: b(c + 1) = charh
                c = c + 2: noseq = -1
            End If
        Next i

        'if string was not empty and contained chars other than white space
        If c > 0 Then
            If noseq = 0 Then c = c - 2 'Remove trailing space if there is one
            If c < i Then ReDim Preserve b(0 To c - 1) 'white space was removed
            Dest = b
        Else
            Dest = vbNullString
        End If

    End Sub

    <\Code>

    Ulrich

    --
    VB tips and tricks -> http://www.proSource.de/Downloads/


  8. #8
    Rick Rothstein Guest

    Re: Removing Spaces From Char String Runs Slow.

    Okay, here is my attempt at implementing Ulrich's suggested Byte
    arrays... see if this speeds up your calculations any:

    Rick - MVP

    Sub ParseLine(LineOfText As String)
        Dim X As Long
        Dim TempX As Long
        Dim IsBlank As Boolean
        Dim Temp() As Byte
        Dim Bytes() As Byte
        Bytes = LineOfText
        ' 139 spaces for the text part (so unused positions stay blank),
        ' followed by the untouched 31-char key
        Temp = Space$(139) & Mid$(LineOfText, 140)
        ' Walk the low byte of each of the first 139 characters (2 bytes per char)
        For X = 0 To 277 Step 2
            If Bytes(X) <> 32 Then
                Temp(TempX) = Bytes(X)
                TempX = TempX + 2
                IsBlank = False
            Else
                Temp(TempX) = 32
                If Not IsBlank Then
                    TempX = TempX + 2
                    IsBlank = True
                End If
            End If
        Next
        LineOfText = Temp
        ' Drop a single leading space by shifting the text part left one character
        If InStr(LineOfText, " ") = 1 Then
            Mid$(LineOfText, 1) = Mid$(LineOfText, 2, 138)
            Mid$(LineOfText, 139) = " "
        End If
    End Sub

  9. #9
    Jim Edgar Guest

    Re: Removing Spaces From Char String Runs Slow.

    He might be able to speed up his code a little more if he opens
    the file in binary mode and reads the whole file at once. I
    mainly use SQL Server and Oracle databases, so I'm no
    expert on file parsing, but I have used the following technique
    on some rather large text files with good success.

    Sub parsefile(ByVal strFileName As String)
        ' Long, not Integer: 200,000+ lines would overflow an Integer (max 32,767)
        Dim iCnt As Long
        Dim iFileNum As Integer
        Dim aTempRecs As Variant
        Dim strBuffer As String
        ' Open the file to be parsed.
        iFileNum = FreeFile
        ' Should check here to see if strFileName
        ' points to a valid file.
        Open strFileName For Binary As #iFileNum
        ' Read the whole file into the buffer with one Get.
        strBuffer = Space(LOF(iFileNum))
        Get #iFileNum, , strBuffer
        ' Done with the file so close it.
        Close #iFileNum

        aTempRecs = Split(strBuffer, vbNewLine)

        ' Parse the new array.
        For iCnt = LBound(aTempRecs) To UBound(aTempRecs)
            ' Parse the lines of the file here.
        Next
    End Sub
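    A sketch of how the read-everything approach could be combined with Rick's Trim$/Replace$ routine, with the results written back out in one go. ProcessFile and FixLine are hypothetical names; it assumes CRLF line endings, that the whole file fits in memory, and that the first 139 characters of each line are the text part and the rest is the key:

```vb
' Hypothetical names: FixLine, ProcessFile. Assumes CRLF line endings and
' that the whole file fits in memory. Does not check that sInFile exists.

' Rick's approach as a function: collapse spaces in the first 139 chars,
' pad back to 139, and re-append the untouched key (chars 140 onward).
Private Function FixLine(ByVal sLine As String) As String
    Dim sBody As String
    sBody = Trim$(Left$(sLine, 139))
    Do While InStr(sBody, "  ")              ' search string is two spaces
        sBody = Replace$(sBody, "  ", " ")
    Loop
    FixLine = Left$(sBody & Space$(139), 139) & Mid$(sLine, 140)
End Function

Public Sub ProcessFile(ByVal sInFile As String, ByVal sOutFile As String)
    Dim iFileNum As Integer
    Dim sBuffer As String
    Dim aLines() As String
    Dim i As Long                            ' Long: more than 32,767 lines

    ' Read the whole input file with one Get
    iFileNum = FreeFile
    Open sInFile For Binary Access Read As #iFileNum
    sBuffer = Space$(LOF(iFileNum))
    Get #iFileNum, , sBuffer
    Close #iFileNum

    ' Split into lines and fix each non-empty one in place
    aLines = Split(sBuffer, vbCrLf)
    For i = LBound(aLines) To UBound(aLines)
        If Len(aLines(i)) > 0 Then aLines(i) = FixLine(aLines(i))
    Next i

    ' Write everything back with one Print; the trailing ";" stops Print
    ' from adding an extra line break (Join restores the original ones)
    iFileNum = FreeFile
    Open sOutFile For Output As #iFileNum
    Print #iFileNum, Join(aLines, vbCrLf);
    Close #iFileNum
End Sub
```

    Using a typed String array and a Long counter avoids Variant elements and the Integer overflow that a 200,000-line file would otherwise cause.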

    Jim Edgar



  10. #10
    Ulrich Korndoerfer Guest

    Re: Removing Spaces From Char String Runs Slow.

    John,

    I have finished the version which uses a fake integer array. See below.
    This and some other helpers are also available (documented) at my
    website:

    www.prosource.de/Downlods/index.html#Strings

    <code>

    'Put code in a class module

    Private Declare Sub MoveMemory _
    Lib "kernel32" Alias "RtlMoveMemory" _
    (ByRef Dest As Any, _
    ByRef Source As Any, _
    ByVal NumOfBytes As Long)

    Private Declare Function ArrPtr Lib "msvbvm60.dll" Alias "VarPtr" _
    (ByRef ArrVar() As Any) _
    As Long

    Private mDestArr() As Integer
    Private mDestSA(5) As Long

    '-------------------

    Public Sub NormalizeWS(ByRef Dest As String, ByRef Source As String)

        Dim i As Long, c As Long, noseq As Long

        ' Work on a copy unless Dest and Source are the same variable
        If VarPtr(Dest) <> VarPtr(Source) Then Dest = Source
        ' Point the fake Integer array at Dest's character buffer (pvData)
        mDestSA(3) = StrPtr(Dest)

        For i = 0 To Len(Source) - 1
            If mDestArr(i) = &H20& Then
                If noseq Then
                    If c < i Then mDestArr(c) = &H20&
                    c = c + 1: noseq = 0
                End If
            Else
                If c < i Then mDestArr(c) = mDestArr(i)
                c = c + 1: noseq = -1
            End If
        Next i

        If c > 0 Then
            If noseq = 0 Then c = c - 1    ' drop a trailing space
            If c < i Then Dest = Left$(Dest, c)
        Else
            Dest = vbNullString
        End If

    End Sub

    '-------------------

    Private Sub Class_Initialize()

        ' 1 dimension, 2-byte elements, 2^31 - 1 elements, lower bound 0
        mDestSA(0) = 1: mDestSA(1) = 2: mDestSA(4) = 2 ^ 31 - 1
        ' Make mDestArr use the hand-built descriptor stored in mDestSA
        MoveMemory ByVal ArrPtr(mDestArr), VarPtr(mDestSA(0)), 4

    End Sub

    '-------------------

    Private Sub Class_Terminate()

        ' Detach the descriptor so VB does not try to free it
        MoveMemory ByVal ArrPtr(mDestArr), 0&, 4

    End Sub

    </code>
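    The six Longs in mDestSA are a hand-built SAFEARRAY descriptor that mDestArr is pointed at. For reference, here is the same 32-bit layout written out as a UDT (SAFEARRAY1D and CheckSafeArrayLayout are hypothetical names; the field names follow the Windows SAFEARRAY and SAFEARRAYBOUND structures):

```vb
' Hypothetical UDT mirroring a one-dimensional SAFEARRAY descriptor (32-bit VB6).
Private Type SAFEARRAY1D
    cDims As Integer      ' mDestSA(0), low word: number of dimensions
    fFeatures As Integer  ' mDestSA(0), high word: feature flags
    cbElements As Long    ' mDestSA(1): bytes per element (2 for Integer)
    cLocks As Long        ' mDestSA(2): lock count
    pvData As Long        ' mDestSA(3): address of element 0 (here StrPtr(Dest))
    cElements As Long     ' mDestSA(4): element count of dimension 1
    lLbound As Long       ' mDestSA(5): lower bound of dimension 1
End Type

Sub CheckSafeArrayLayout()
    Dim sa As SAFEARRAY1D
    ' Same size as mDestSA(5) As Long: 6 x 4 bytes
    Debug.Assert LenB(sa) = 24
End Sub
```

    Read this way, Class_Initialize sets cDims = 1, cbElements = 2 and cElements = 2^31 - 1, and each call to NormalizeWS only has to update pvData.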

    Ulrich


