
Thread: More text file fun... topic du jour?

  1. #1
    Dana Guest

    More text file fun... topic du jour?


    There seem to be a lot of text file questions lately. =)

    I have a large (2+ meg) text file that I want to open, and read the furthest
    line down that begins with "1024a:".

    I've been researching this all afternoon before posting, because I have a
    track record of posting a question and then finding the answer around the
    same time that someone replies with it. So this time I did my research
    before asking, but here I am.

    What is the most efficient method of doing this?

    I've pondered arrays, recordsets, OpenTextFile, etc., but I keep butting my
    head against a wall.

    Thanks as always for the hand.

  2. #2
    Phil Weber Guest

    Re: More text file fun... topic du jour?

    > I have a large (2+ meg) text file that I want to open,
    > and read the furthest line down that begins with "1024a:"
    > ...What is the most efficient method of doing this?


    Dana: Seems to me the most efficient method would be to read the file into a
    buffer, starting at the end. Search the buffer (using, for example, the
    InStrRev function) for "1024a:"; if you find it, you're done. If not, read
    another chunk and try again.

    Which aspect(s) of the above do you need help with?
    ---
    Phil Weber
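A minimal Python sketch of the backward, chunked search Phil describes. It is not his VB code: `find_last`, the 16 KB default chunk size and the use of a binary file object are illustrative assumptions, and `bytes.rfind` stands in for `InStrRev`. Consecutive chunks overlap by `len(target) - 1` bytes so that a match straddling a chunk boundary is still found:

```python
import io
import os


def find_last(f, target, chunk_size=16384):
    """Return the 0-based offset of the last occurrence of target in the
    seekable binary file f, or -1. Reads backward in chunk_size pieces.
    (find_last is a hypothetical helper name.)"""
    if not target:
        raise ValueError("target must be non-empty")
    if chunk_size < len(target):
        raise ValueError("chunk_size must be at least len(target)")
    overlap = len(target) - 1          # a straddling match hides in these bytes
    end = f.seek(0, os.SEEK_END)       # end of the region not yet searched
    while end > 0:
        start = max(0, end - chunk_size)
        f.seek(start)
        buf = f.read(end - start)
        hit = buf.rfind(target)        # plays the role of VB's InStrRev
        if hit != -1:
            return start + hit
        if start == 0:
            break                      # searched back to the start of the file
        end = start + overlap          # step back, keeping the overlap
    return -1


if __name__ == "__main__":
    sample = io.BytesIO(b"1024a:first\nother\n1024a:last\n")
    print(find_last(sample, b"1024a:"))  # prints 18
```

To search a file on disk, open it in binary mode, e.g. `with open(r"d:\path\filename.txt", "rb") as f: find_last(f, b"1024a:")` (hypothetical path). Like the `InStrRev` idea, this finds the string anywhere, not only at the start of a line.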



  3. #3
    Dana Guest

    Re: More text file fun... topic du jour?


    From what I've read, InStrRev works on a string, and, again from what I
    understand, a string can't hold that much text.

    So I think the main question is: what is the most efficient buffer to use
    for 2 MB of ASCII text? I thought about reading the text file into an
    array line by line, but then got stuck on the search part; the only way I
    could see was a Do...Loop stepping backward through the array lines until
    I found a match. That seemed scary to me.

    Thanks Phil





  4. #4
    Phil Weber Guest

    Re: More text file fun... topic du jour?

    > From what I understood when I read, InStrRev applies to
    > a string.. and again, from what I understand, a string cannot
    > hold that much text.


    Dana: I was thinking of something like this:

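    Dim hFile As Integer
    Dim lFound As Long      ' position of the match within sBuffer
    Dim lFilePos As Long    ' position of the match within the file (1-based)
    Dim lPos As Long
    Dim lRemaining As Long
    Dim sBuffer As String
    Dim lBufLen As Long
    Const sFind As String = "1024a:"

    hFile = FreeFile
    ' Binary mode lets us Seek to any byte and read raw chunks
    Open "d:\path\filename.txt" For Binary Access Read As hFile

    ' Read file in 16K chunks (adjust as desired)
    lBufLen = 16384
    ' Number of bytes, counted from the start of the file, not yet searched
    lRemaining = LOF(hFile)

    Do While lRemaining > 0
        ' If remainder is less than buffer size, shrink buffer accordingly
        If lRemaining < lBufLen Then lBufLen = lRemaining
        ' Position file pointer at the start of this chunk
        lPos = lRemaining - lBufLen + 1
        Seek hFile, lPos
        ' Read a buffer-full
        sBuffer = Input$(lBufLen, hFile)
        ' Search for target, starting from the end of the chunk
        lFound = InStrRev(sBuffer, sFind)
        ' If found, convert to a file position and stop
        If lFound Then
            lFilePos = lPos + lFound - 1
            Exit Do
        End If
        ' Reached the start of the file without a match
        If lPos = 1 Then Exit Do
        ' Otherwise, step back one chunk, overlapping by Len(sFind) - 1
        ' bytes so a match straddling the chunk boundary is not missed
        lRemaining = lPos - 1 + Len(sFind) - 1
    Loop
    Close hFile

    ' At this point, if lFilePos <> 0, it is the 1-based byte
    ' offset of the last occurrence of sFind in the file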

    Let me know if you have any questions.
    ---
    Phil Weber



  5. #5
    Alex Guest

    Re: More text file fun... topic du jour?


    Dana,

    You do not need to read the entire file into a string buffer.

    Phil meant something like this:


    Private Sub Command1_Click()
        MsgBox FindLastPosition("C:\BigFile.txt", "1024a:")
    End Sub


    ' Returns the 1-based byte position of the last occurrence of
    ' sSearchStr in the file, or 0 if it does not occur.
    Function FindLastPosition(sFileName As String, sSearchStr As String) As Long
        Const CHUNK As Long = 2048
        Dim hFile As Integer
        Dim sBuffer As String
        Dim CurPos As Long, nFileLength As Long
        Dim nSearchStrLen As Integer
        Dim n As Long

        nSearchStrLen = Len(sSearchStr)

        hFile = FreeFile
        Open sFileName For Binary Access Read As hFile
        ' or
        ' Open sFileName For Input As hFile

        nFileLength = LOF(hFile)
        ' First byte of the last CHUNK bytes (file positions are 1-based)
        CurPos = nFileLength - CHUNK + 1

        If CurPos > 0 Then
            Do While CurPos > 0
                'set current position in the file
                Seek #hFile, CurPos

                'read CHUNK characters into buffer
                sBuffer = Input$(CHUNK, #hFile)
                n = InStrRev(sBuffer, sSearchStr)
                If n > 0 Then
                    FindLastPosition = CurPos + n - 1
                    Exit Do
                End If

                'calculate new current position
                'take into consideration that only part of the search string
                'might have been read into the buffer
                'i.e "....102" "4a:....."
                Select Case CurPos
                Case 1
                    Exit Do
                Case 2 To CHUNK - nSearchStrLen
                    CurPos = 1
                Case Else
                    CurPos = CurPos - CHUNK + nSearchStrLen
                End Select
            Loop

        ElseIf nFileLength > 0 Then
            'file is smaller than one chunk: read it all
            sBuffer = Input$(nFileLength, #hFile)

            n = InStrRev(sBuffer, sSearchStr)
            If n > 0 Then
                FindLastPosition = n
            End If
        End If

        Close hFile

    End Function


    Alex






  6. #6
    Willy Van den Driessche Guest

    Re: More text file fun... topic du jour?

    Phil, how does reading a file backward perform? (I might be a little too
    old-fashioned here, with very sequential ideas.)
    --
    Van den Driessche Willy
    For a work in progress :
    http://users.skynet.be/wvdd2/index.html



  7. #7
    Dean Earley Guest

    Re: More text file fun... topic du jour?

    "Dana" <dana.sims@ivans.com> wrote in message news:3bd88413$1@news.devx.com...
    >
    > From what I understood when I read, InStrRev applies to a string.. and again,
    > from what I understand, a string cannot hold that much text.

    I have an FTP library that reads the entire file into a string, then sends it out to the FTP port.
    It has happily uploaded and downloaded a 5 MB file; the only problem was that, when downloading, it held two copies of the data in
    memory.

    PS: This isn't the best way to do it, but it is the most flexible.

    --
    Dean Earley (dean.earley@icode.co.uk)
    Assistant Developer

    iCode Systems
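For comparison, a Python sketch of the whole-file approach Dean describes: read everything into one buffer and search it from the end. `find_last_in_memory` is a hypothetical name; the trade-off is that the entire file (about 2 MB here) sits in memory at once, in exchange for the simplest possible code:

```python
import io


def find_last_in_memory(f, target):
    """Read the entire binary file f into one bytes object and return the
    0-based offset of the last occurrence of target, or -1.
    (find_last_in_memory is a hypothetical helper name.)"""
    data = f.read()            # the whole file is held in memory at once
    return data.rfind(target)  # search from the end, like InStrRev


if __name__ == "__main__":
    print(find_last_in_memory(io.BytesIO(b"1024a:x\n1024a:y\n"), b"1024a:"))  # prints 8
```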



  8. #8
    Dean Earley Guest

    Re: More text file fun... topic du jour?

    "Dean Earley" <dean.earley@icode.co.uk> wrote in message news:3bd9184a@news.devx.com...
    > "Dana" <dana.sims@ivans.com> wrote in message news:3bd88413$1@news.devx.com...
    > >
    > > From what I understood when I read, InStrRev applies to a string.. and again,
    > > from what I understand, a string cannot hold that much text.

    > I have an FTP library that reads in the entire file as a string, then sends it out to the FTP port
    > It has happily uploaded and downloaded a 5Mb file, the only problem was when downloading, it was holding two copies of the data in
    > memory

    I've just looked in MSDN, and a string can hold approximately 2 billion characters (2GB).
    That's a lot of memory!





  9. #9
    Phil Weber Guest

    Re: More text file fun... topic du jour?

    > I've just looked in MSDN, and the string can hold approximately
    > 2 billion characters (2GB). A lot of memory )


    Dean: Except, I suspect, on Win9x systems, which, due to their 16-bit
    underpinnings, often limit individual items to 64K.
    ---
    Phil Weber



  10. #10
    Jonathan Wood Guest

    Re: More text file fun... topic du jour?

    Dana,

    > So I think the main question is what kind is the most efficient buffer container
    > to use for 2megs of ascii?


    You would use much less memory (although the code might not run quite as
    fast) if you simply read the file line by line.

    When you find a line that begins with "1024a:", store it in a string. If
    you find another one, replace the string with that line. When you reach
    the end of the file, the string will contain the last line that begins
    with "1024a:".



    --
    Jonathan Wood
    SoftCircuits Programming
    http://www.softcircuits.com
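A Python sketch of the line-by-line approach Jonathan suggests (the function name and the ASCII encoding in the usage note are assumptions). It reads the whole file once, front to back, but keeps only the current line plus the latest match in memory:

```python
import io


def last_matching_line(lines, prefix="1024a:"):
    """Read lines front to back and keep the last one that starts with prefix.
    Only the current line and the best match so far are held in memory.
    (last_matching_line is a hypothetical helper name.)"""
    last = None
    for line in lines:
        if line.startswith(prefix):
            last = line                      # a later match replaces an earlier one
    return None if last is None else last.rstrip("\r\n")


if __name__ == "__main__":
    sample = io.StringIO("1024a:one\nfoo\n1024a:two\nbar\n")
    print(last_matching_line(sample))        # prints 1024a:two
```

For a file on disk: `with open(r"d:\path\filename.txt", encoding="ascii") as f: last_matching_line(f)` (hypothetical path). Because `startswith` is used, a "1024a:" in the middle of a line is ignored.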



  11. #11
    Jonathan Wood Guest

    Re: More text file fun... topic du jour?

    You can only read file data forward, so you would have to constantly
    reposition the file pointer toward the start of the file. However, how
    would you know how far to move it? Do you do it a byte at a time? And given
    that many file routines buffer more than you read each time for
    performance reasons, the performance of this approach would be horrible.

    --
    Jonathan Wood
    SoftCircuits Programming
    http://www.softcircuits.com

    "Willy Van den Driessche" <Willy.Van.denDriessche@skynet.be> wrote in
    message news:3bd89b52$1@news.devx.com...
    > Phil, what's the performance of reading a file backwards (I might be a
    > little too old fashioned here with very sequential ideas) ?
    > --
    > Van den Driessche Willy
    > For a work in progress :
    > http://users.skynet.be/wvdd2/index.html





  12. #12
    Phil Weber Guest

    Re: More text file fun... topic du jour?

    > You can only read file data forward so you would have to
    > constantly reposition the file pointer towards the start of
    > the file. However, how would you know how far to move it?
    > Do you do it a byte at a time?


    Jonathan: Did you look at the code I posted? It reads the file backward from
    the end in 16K chunks. In my tests, performance seemed fine. Also, he's
    looking for the last occurrence of the target string; if it appeared near
    the end of the file, he would avoid having to read the entire 2MB file,
    which would save some time.

    I do agree, however, that your suggested approach is easier to code, and as
    long as performance is acceptable, programmer time is more valuable than
    processor time. ;-)
    ---
    Phil Weber



  13. #13
    Patrick Ireland Guest

    Re: More text file fun... topic du jour?


    Dana,

    There are several solutions to your problem, but each has its strengths
    and weaknesses, so I need to know more about the problem.

    1) Will this file be searched more than once in the program?

    2) Can the pattern to match occur within the confines of a line,
    i.e., not necessarily at the beginning of a line?

    3) What version of VB and OS are you using?

    Pat
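Patrick's second question matters because a plain search such as `InStrRev` also matches "1024a:" in the middle of a line. One way to require the match to begin a line, sketched in Python for an in-memory buffer (`last_line_start` is a hypothetical helper), is to search for a newline followed by the prefix and special-case the very start of the buffer. Using it inside a chunked backward search would additionally need the chunk overlap widened by one byte, and an offset-0 hit would only count at the start of the file; neither is shown here:

```python
def last_line_start(buf, prefix):
    """Return the offset in buf of the last occurrence of prefix that begins
    a line (offset 0, or just after a newline byte), or -1 if there is none.
    (last_line_start is a hypothetical helper name.)"""
    hit = buf.rfind(b"\n" + prefix)   # also works for CRLF files: CRLF ends in LF
    if hit != -1:
        return hit + 1                # skip past the newline itself
    return 0 if buf.startswith(prefix) else -1


if __name__ == "__main__":
    print(last_line_start(b"x1024a:\n1024a:y\nz 1024a:", b"1024a:"))  # prints 8
```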





  14. #14
    Jonathan Wood Guest

    Re: More text file fun... topic du jour?

    Phil,

    > Jonathan: Did you look at the code I posted? It reads the file backward from
    > the end in 16K chunks. In my tests, performance seemed fine. Also, he's
    > looking for the last occurrence of the target string; if it appeared near
    > the end of the file, he would avoid having to read the entire 2MB file,
    > which would save some time.
    >
    > I do agree, however, that your suggested approach is easier to code, and as
    > long as performance is acceptable, programmer time is more valuable than
    > processor time. ;-)


    My first concerns were ease of coding and resource use. My approach was
    trivial and used hardly any additional resources. You have a point against
    my approach in cases where speed is critical and the target data is expected
    near the end of the file. I don't know whether either of those is a
    consideration here.

    However, looking at your code, I'd have the following concerns: 1) once
    found, you have some potentially non-trivial code to parse out the line, and
    2) you don't appear to handle cases where the data being searched for
    crosses a 16K boundary (perhaps I just missed it).

    --
    Jonathan Wood
    SoftCircuits Programming
    http://www.softcircuits.com
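On Jonathan's first concern, parsing out the line once a byte offset is known: a Python sketch (the helper `line_containing` and the 4096-byte window are assumptions) that reads a window around the offset and cuts at the nearest newline on either side. It assumes no line is longer than the window:

```python
import io


def line_containing(f, offset, window=4096):
    """Return the whole line (without its line ending) that contains byte
    offset in the seekable binary file f. Assumes (hypothetically) that no
    line is longer than window bytes."""
    start = max(0, offset - window)
    f.seek(start)
    buf = f.read(offset - start + window)
    rel = offset - start                            # offset within buf
    line_start = buf.rfind(b"\n", 0, rel) + 1       # 0 if no newline before rel
    line_end = buf.find(b"\n", rel)
    if line_end == -1:
        line_end = len(buf)                         # last line of the file
    return buf[line_start:line_end].rstrip(b"\r")   # drop the CR of a CRLF ending


if __name__ == "__main__":
    data = io.BytesIO(b"abc\n1024a:hello\r\nzzz")
    print(line_containing(data, 4))                 # prints b'1024a:hello'
```

Paired with a backward search that returns a file offset, this recovers the full "1024a:" line without reading the rest of the file.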



  15. #15
    Bob O`Bob Guest

    Re: More text file fun... topic du jour?

    Phil Weber wrote:
    >
    > > I've just looked in MSDN, and the string can hold approximately
    > > 2 billion characters (2GB). A lot of memory )

    >
    > Dean: Except, I suspect, on Win9x systems, which, due to its 16-bit
    > underpinnings, often limits individual items to 64K.



    But not strings.

    I just wrote a quickie program that appends a string to itself over & over,
    and the virtual memory seemed to start to thrash after a string length of
    over 20 million characters, but it kept going.

    On Win95b


    Bob O`Bob
    --
    Life makes SO much less sense when you're sane.
