Thread: Read file character by character

  1. #1
    Join Date
    Mar 2008
    Location
    Blaricum, The Netherlands
    Posts
    78

    I've been trying to read a file character by character.
    It seems this works:

    Code:
    Dim bChar as Byte
    Open "Filename.txt" for binary access read as #1
    While not EOF(3)
        Get #1,  , bChar
    Wend
    However, when I read a file, for instance an XML file, the first character is '<', but it shows up as 60 in the variable bChar.

    Is there a standard way to convert this into a string? (of one character)
    Or is there a better way to read characters one-by-one from a file?

    Friendly greetings
    Rens

  2. #2
    Join Date
    Nov 2003
    Location
    Portland, OR
    Posts
    8,387
    Question: Why are you opening the file as handle #1, but doing EOF(3)?

    I would read the file in chunks into a string variable, then pull characters out of the string:
    Code:
    Dim I As Integer
    Dim Handle As Integer
    Dim Buffer As String
    Dim BytesRemaining As Long
    Dim Char As String
    
    ' Create 1K buffer
    Buffer = String(1024, 0)
    
    Handle = FreeFile
    Open "D:\Path\FileName.ext" For Binary As Handle
    Do
        BytesRemaining = LOF(Handle) - Seek(Handle) + 1
        If BytesRemaining < Len(Buffer) Then
            Buffer = String(BytesRemaining, 0)
        End If
        Get Handle, , Buffer
        
        For I = 1 To Len(Buffer)
            ' Read chars out of buffer
            Char = Mid(Buffer, I, 1)
        Next
    Loop Until BytesRemaining = 0
    Close Handle
    Phil Weber
    http://www.philweber.com

    Please post questions to the forums, where others may benefit.
    I do not offer free assistance by e-mail. Thank you!

  3. #3
    Join Date
    Mar 2008
    Location
    Blaricum, The Netherlands
    Posts
    78
    Hello Phil,

    Thanks for the quick response.
    I'll try that.

    It would, however, be easier for me to read character by character.
    That saves me from writing a lot of handling code. :-)
    Is there no way to do that, as you can in C?

    Friendly greetings
    Rens.

  4. #4
    Join Date
    Nov 2003
    Location
    Portland, OR
    Posts
    8,387
    You probably don't want to read the file one byte at a time unless you're sure you'll never get a Unicode file: UTF-16, for example, uses at least two bytes for each character. You can, however, read the file one character at a time:
    Code:
    Dim Handle As Integer
    Dim Char As String * 1
    
    Handle = FreeFile
    Open "D:\Path\FileName.ext" For Binary As Handle
    Do Until EOF(Handle)
        Get Handle, , Char
    Loop
    Close Handle
    Phil Weber
    http://www.philweber.com


  5. #5
    Join Date
    Mar 2008
    Location
    Blaricum, The Netherlands
    Posts
    78
    Hello Phil,

    Yes, that's more or less what I showed in my initial code,
    and that is the part where I had a problem.
    In my code, I got 60 in my variable instead of '<'.
    But you gave me the answer. Thanks.

    You declare the variable as: Dim Char As String * 1
    (instead of Dim Char As Byte).
    That solves my problem.

    What does the * 1 mean in this case?
    I would like to understand what I'm doing. :-)

    Friendly greetings
    Rens

  6. #6
    Join Date
    Nov 2003
    Location
    Portland, OR
    Posts
    8,387
    That line declares Char as a fixed-length, one-character string.
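
    As a small illustration (a sketch only; the Sub name and the variable Fixed are made up for this example), a fixed-length string always keeps its declared length. Shorter values are padded with spaces and longer ones are truncated:

    ```vb
    Sub FixedLengthDemo()
        Dim Fixed As String * 5      ' always exactly 5 characters

        Fixed = "ab"                 ' stored as "ab" plus 3 trailing spaces
        Debug.Print Len(Fixed)       ' the length stays 5

        Fixed = "abcdefgh"           ' truncated to "abcde"
        Debug.Print Fixed
    End Sub
    ```

    The same rule applies to String * 1, which is why each Get in the loop above fills Char with a single character.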
    Phil Weber
    http://www.philweber.com


  7. #7
    Join Date
    Mar 2008
    Location
    Blaricum, The Netherlands
    Posts
    78
    Thanks!

  8. #8
    Join Date
    Nov 2003
    Location
    Alameda, CA
    Posts
    1,737
    A Byte variable is just a number (8 bits). To convert it to the corresponding ASCII character, you can use the Chr() function.

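    But I agree with Phil that reading into a String * 1 is more convenient, because VB converts each byte to a character for you. Keep in mind that it still reads one byte per Get and converts it through the system ANSI code page, so it does not by itself decode multibyte encodings such as UTF-8 or UTF-16.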

    And if the file is big and performance is an issue, consider reading the file in chunks to limit the number of I/O operations.
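
    The Chr() conversion mentioned above could look roughly like this. It is a minimal sketch: the Sub name is made up, the file name is taken from the first post, and the loop uses Loc and LOF rather than EOF to decide when the last byte has been read:

    ```vb
    Sub ReadBytesAsChars()
        Dim Handle As Integer
        Dim bChar As Byte
        Dim Char As String

        Handle = FreeFile
        Open "Filename.txt" For Binary Access Read As #Handle

        ' Loc returns the position of the last byte read,
        ' so the loop stops once the final byte has been consumed
        Do While Loc(Handle) < LOF(Handle)
            Get #Handle, , bChar
            Char = Chr$(bChar)       ' e.g. the value 60 becomes "<"
        Loop
        Close #Handle
    End Sub
    ```

    For byte values above 127, Chr$ maps the value through the system ANSI code page, which matters for files that are not plain ASCII.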
    "There are two ways to write error-free programs. Only the third one works."
    Unknown

  9. #9
    Join Date
    Mar 2008
    Location
    Blaricum, The Netherlands
    Posts
    78

    Thanks for the explanation.

    Yes, performance is an issue, but memory use is the main one.

    What I'm building is a small, fast standalone application to split up very, very large XML files. Up till now, somebody did this by reading in the complete XML and then writing it out in parts, but that led to memory-related crashes. So I volunteered to create a memory-efficient method.

    Friendly greetings
    Rens Duijsens

  10. #10
    Join Date
    Nov 2003
    Location
    Alameda, CA
    Posts
    1,737
    Gotcha.
    Well, at least you can read one line at a time, using the Line Input statement; see the snippet below.

    Code:
    Dim f As Integer
    Dim s As String

    f = FreeFile
    Open "filename.txt" For Input As #f

    Do Until EOF(f)
        ' After this call, s holds one line of text without its line terminator
        Line Input #f, s
    Loop
    Close #f
    Best regards, Happy New Year (and safe coding)

  11. #11
    Join Date
    Mar 2008
    Location
    Blaricum, The Netherlands
    Posts
    78

    Yes, that would be nice, wouldn't it?
    It won't fly, however, because the XML is not nicely formatted.

    The XML should be formatted like this:
    Code:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <DocumentBatch>
    	<Document>
    		<EventId>TEST1</EventId>
    		<DocumentData>Test1</DocumentData>
    	</Document>
    	<Document>
    		<EventId>TEST2</EventId>
    		<DocumentData>Test2</DocumentData>
    	</Document>
    </DocumentBatch>
    Sadly, it is formatted like this:
    Code:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <DocumentBatch><Document><EventId>TEST1</EventId><DocumentData>Test1</DocumentData></Document><Document><EventId>TEST2</EventId><DocumentData>Test2</DocumentData></Document></DocumentBatch>
    So we see the header, and then the complete XML on one line.
    Using the Line Input statement would read anywhere from several megabytes to several gigabytes in one go.

    Hence the character-by-character reading.

    Friendly greetings
    Rens

  12. #12
    Join Date
    Nov 2003
    Location
    Alameda, CA
    Posts
    1,737
    Ah. Well, that's a bummer.
    The problem is that VB is not very efficient at handling String variables. With String * 1 you'll need millions of I/O operations, plus string allocations and management. Remember that a VB String is really a BSTR, the length-prefixed Unicode string type used by COM: very handy, but also expensive, with a lot of overhead involved. Just something you'll have to keep in mind.
    I had a similar problem (even worse, because the file could be either ANSI or encoded), which I solved by reading the file in chunks into a fixed-length String: I process the data, and when I am done I read another chunk, and so on until the end of the file.
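
    A variation on that chunked approach, sketched here with a Byte array instead of a String so that no string conversion happens per character. The function name, the Path parameter and the 64 KB chunk size are assumptions made for this example, and because LOF returns a Long the sketch only covers files smaller than 2 GB:

    ```vb
    Function CountLessThan(ByVal Path As String) As Long
        Const CHUNK_SIZE As Long = 65536   ' assumed chunk size (64 KB)
        Dim Handle As Integer
        Dim Buffer() As Byte
        Dim Remaining As Long
        Dim ThisChunk As Long
        Dim I As Long

        Handle = FreeFile
        Open Path For Binary Access Read As #Handle
        Remaining = LOF(Handle)

        Do While Remaining > 0
            ' The last chunk may be smaller than CHUNK_SIZE
            ThisChunk = CHUNK_SIZE
            If Remaining < ThisChunk Then ThisChunk = Remaining

            ReDim Buffer(0 To ThisChunk - 1)
            Get #Handle, , Buffer          ' one I/O call fills the whole array

            For I = 0 To ThisChunk - 1
                If Buffer(I) = 60 Then     ' 60 is the byte value of "<"
                    CountLessThan = CountLessThan + 1
                End If
            Next

            Remaining = Remaining - ThisChunk
        Loop
        Close #Handle
    End Function
    ```

    Counting "<" bytes is only a stand-in for real processing. A splitter built this way would have to carry its state from one chunk to the next, since a tag can straddle a chunk boundary.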

    Good luck, keep us posted.
