Q: Using StreamReader to monitor a text file

  1. #1 (Join Date: Feb 2004; Posts: 43)

    I have found something very interesting, at least to me!

    I use StreamReader to monitor a text file. Here is some of the code from a function that keeps reading the tail of the file whenever new text is appended to it.

    Code:
    // my stream reader
    StreamReader myReader = new StreamReader("myLog.txt");
    ...
    // ========
    // Part of the function that reads chars from the file until
    // it reaches the end.
    // Seek to the saved position. m_pos is a class-level field, initially 0.
    myReader.BaseStream.Seek(m_pos, SeekOrigin.Begin);
    ...
    // In a loop, read one char at a time until the end of the file
    char[] c = new char[1];
    myReader.Read(c, 0, 1);
    m_pos++; // keep track of the current position in the file
    ...
    // ========
    This function is used to monitor a text file, myLog.txt in this example. At first, the file contains the following lines:

    Test lines:
    Line 1.


    The function returns the correct strings, and they are displayed correctly in a RichTextBox (RTB). After the call, the position (m_pos) is 20. Then I appended "Line 2." as a new line to the file.

    It works fine if the file is a plain text file without a BOM (byte order mark). However, if I save the file as a Unicode text file (I need to support text files with a BOM), the first call (starting from 0) returns the correct string, but the position is still 20. Then the problem appears: the second call (starting after 20) does not work correctly. It reads from ":", not from the start of "Line 2".

    What I found is that in a text file with a BOM, such as Unicode, each ASCII char takes 2 bytes: the ASCII code plus a \0 byte. The first time the function is called (position 0), Read() returns only the ASCII chars (no \0 at all!), so the position ends up as 20 (the 2 BOM bytes are skipped). However, in the second call (seek to 20, then read), Read() returns \0 and continues reading byte by byte!
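To make the character/byte mismatch concrete, here is a small sketch (ByteCountDemo and BytesOnDisk are hypothetical names) that counts how many bytes the 20 characters of the example occupy on disk in each encoding, BOM included. In a file saved as "Unicode" (UTF-16 little endian) the same 20 characters take 42 bytes, so seeking to byte 20 lands inside the first line rather than at the start of "Line 2.".

```csharp
using System;
using System.Text;

// Hypothetical helper: compares the character count of the sample text
// with the number of bytes it occupies on disk in several encodings.
public static class ByteCountDemo
{
    // The two lines from the example, joined with a Windows line break.
    public const string Sample = "Test lines:\r\nLine 1.";

    // Bytes on disk = BOM (preamble) bytes + encoded character bytes.
    public static int BytesOnDisk(Encoding encoding, string text)
    {
        return encoding.GetPreamble().Length + encoding.GetByteCount(text);
    }

    public static void Print()
    {
        Console.WriteLine("Characters:            " + Sample.Length);
        Console.WriteLine("ASCII (no BOM):        " + BytesOnDisk(Encoding.ASCII, Sample));
        Console.WriteLine("UTF-16 LE ('Unicode'): " + BytesOnDisk(Encoding.Unicode, Sample));
        Console.WriteLine("UTF-16 BE:             " + BytesOnDisk(Encoding.BigEndianUnicode, Sample));
        Console.WriteLine("UTF-8 with BOM:        " + BytesOnDisk(new UTF8Encoding(true), Sample));
    }
}
```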

    I am not sure how to handle text files with a BOM (Unicode, UTF-8 and Unicode big endian). Should I detect the BOM first and adjust the position increment based on it for the first call (starting from 0)? And in the remaining calls (seek to > 0), do I have to skip all the \0s and return only the non-\0 chars?

    Is there a better solution?

    By the way, if I convert a plain text string with an embedded \0 to an RTF string and then put it in an RTB, the characters after the \0 do not show up at all. It looks like the string has been truncated.

  2. #2 (Join Date: Nov 2003; Location: Portland, OR; Posts: 8,387)
    I don't have any information about the behavior you're seeing, but I notice that there are numerous C# "tail" implementations available on the Web: http://www.google.com/search?q=c%23+tail . Maybe one of them will provide a workaround?
    Phil Weber
    http://www.philweber.com


  3. #3 (Join Date: Feb 2004; Posts: 43)
    I tried a couple of those examples. They work for files without a BOM, but not for files saved as Unicode or Unicode big endian (Notepad can save text files in both of these encodings).

    In other words, if there is a BOM at the start of the text file, tail reading with the .NET StreamReader (seek to a position first, then Read()) may return embedded \0 chars. Strings with embedded \0 chars are truncated when they are displayed in a TextBox or RichTextBox.

    My solution is to detect the BOM first. If there is a BOM, I skip it with Seek. This guarantees that Read() reads byte by byte. Then I check whether each char is \0; if it is, I skip it and only add non-\0 chars to the result string. It works OK for all types of text files saved in Notepad.
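Dropping \0 chars only works while the text is pure ASCII. A sketch of the BOM-detection step on its own, assuming only the three BOM types Notepad writes (UTF-8, UTF-16 LE, UTF-16 BE) need to be recognised; BomSniffer and Detect are hypothetical names, not .NET APIs. Once the encoding and BOM length are known, the bytes after the BOM can be decoded with that encoding instead of being filtered.

```csharp
using System.Text;

// Hypothetical helper (not a .NET API): inspects the first bytes of a file
// and reports which BOM, if any, it starts with.
public static class BomSniffer
{
    // Returns the encoding implied by the BOM and sets bomLength to the number
    // of BOM bytes. Without a recognised BOM it falls back to UTF-8 with
    // bomLength = 0, the same default StreamReader uses.
    public static Encoding Detect(byte[] head, out int bomLength)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
        {
            bomLength = 3;
            return new UTF8Encoding(true);      // UTF-8 with BOM
        }
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
        {
            bomLength = 2;
            return Encoding.Unicode;            // UTF-16 LE ("Unicode" in Notepad)
        }
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
        {
            bomLength = 2;
            return Encoding.BigEndianUnicode;   // UTF-16 BE ("Unicode big endian")
        }
        bomLength = 0;
        return new UTF8Encoding(false);         // no BOM found
    }
}
```

Note that this sketch does not check for UTF-32 BOMs; a file starting with FF FE 00 00 would be reported as UTF-16 LE.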

    So if you work with text files in .NET, be aware of the BOM when you read from the middle of a file.

  4. #4 (Join Date: Mar 2005; Posts: 71)
    It is a simple encoding problem.

    Each character takes 1 byte in ANSI and 2 bytes in Unicode.
    StreamReader.Read() is a smart method which can determine the encoding type, read the corresponding number of bytes, and return them as a character.
    So, no matter which encoding is used, m_pos will be 20 after the 1st call.

    The problem is that "m_pos" is the character count, not the byte count.
    Passing "m_pos" as the byte offset to Seek() therefore gives the wrong position when reading Unicode.

    To solve it, you can get the encoding type from myReader.CurrentEncoding.
    If Unicode is detected, use "m_pos += 2" rather than "m_pos++".
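"m_pos += 2" covers the 2-byte width of UTF-16 characters, but the offset passed to Seek() must also include the BOM itself: in a "Unicode" file the 20 characters of the example occupy bytes 2 to 41, so the next read should start at byte 42, not 40. A sketch of that calculation, with hypothetical names (OffsetCalc, ByteOffset) and the BOM length assumed to be known already; using Encoding.GetByteCount also handles UTF-8, where a fixed increment per character would be wrong for non-ASCII text. If the same StreamReader is reused after BaseStream.Seek(), StreamReader.DiscardBufferedData() should be called as well, so that characters left in its internal buffer are not returned.

```csharp
using System.Text;

// Hypothetical helper: converts "characters read so far" into the byte
// offset to pass to BaseStream.Seek().
public static class OffsetCalc
{
    // bomLength: number of BOM bytes at the start of the file (0, 2 or 3).
    // textRead:  the characters already consumed from the file.
    public static long ByteOffset(Encoding encoding, int bomLength, string textRead)
    {
        // GetByteCount returns the encoded size of the text, which works for
        // fixed-width (UTF-16) and variable-width (UTF-8) encodings alike.
        return bomLength + encoding.GetByteCount(textRead);
    }
}
```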
    Last edited by oupoi; 08-15-2006 at 08:32 AM.

  5. #5 (Join Date: Feb 2004; Posts: 43)
    I think CurrentEncoding always returns Unicode, no matter what is in the text file and whether or not it has a BOM (byte order mark). I used Notepad to save a text file in all the different encoding types, and CurrentEncoding is Unicode for all of them.

  6. #6 (Join Date: Mar 2005; Posts: 71)
    Sorry for missing some important points.

    StreamReader doesn't detect the encoding type until the first read.
    If you simply display CurrentEncoding.EncodingName before reading, you will get "Unicode (UTF-8)", which is the default.

    However, if you display CurrentEncoding.EncodingName after the Read() call, an ANSI text file will give you "Unicode (UTF-8)" and a Unicode text file will give you "Unicode".
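The timing described above can be checked with a short sketch that reads CurrentEncoding before and after the first Read(). It compares code pages rather than EncodingName strings: 65001 is UTF-8 and 1200 is UTF-16 little endian ("Unicode"). EncodingDetectionDemo and CodePages are hypothetical names, and a MemoryStream stands in for the log file.

```csharp
using System.IO;

// Hypothetical demo: shows when StreamReader settles on an encoding.
public static class EncodingDetectionDemo
{
    // Returns { code page before the first Read(), code page after it }.
    public static int[] CodePages(byte[] fileBytes)
    {
        // The StreamReader(Stream) constructor defaults to UTF-8 and
        // looks for a byte order mark.
        using (var reader = new StreamReader(new MemoryStream(fileBytes)))
        {
            int before = reader.CurrentEncoding.CodePage;
            reader.Read();   // filling the buffer is when the BOM is examined
            int after = reader.CurrentEncoding.CodePage;
            return new[] { before, after };
        }
    }
}
```

For bytes that start with the UTF-16 LE BOM (FF FE), the result changes from UTF-8 to UTF-16 after the first Read(); for plain ASCII bytes it stays UTF-8.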

    Moreover, CurrentEncoding.GetByteCount() can tell you directly how many bytes are used for one (or more) characters.

    There is another function, CurrentEncoding.GetMaxByteCount(), that seems more suitable. But it returns 4 bytes/character for UTF-8, which is not the case for ANSI text.

    Unfortunately, .NET doesn't seem to be able to distinguish ANSI from real UTF-8. So please use Unicode when you need a BOM.

    p.s.
    If it is a closed environment with only Unicode files, there is no need to worry about any of this; just change "m_pos++" to "m_pos+=2".
    Last edited by oupoi; 08-16-2006 at 10:24 PM.

  7. #7 (Join Date: Mar 2005; Posts: 71)
    Indeed, if you don't mind changing your logic flow, tracking StreamReader.BaseStream.Length on every function call would be better than using an incremental variable inside the reading loop.
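A minimal sketch of that idea, assuming the file is only ever appended to and that the caller already knows its encoding (for example from the BOM). Instead of counting characters it remembers a byte position, compares it with the stream length on each call, and decodes only the newly appended bytes with a System.Text.Decoder, which holds an incomplete multi-byte sequence over to the next call. It reads the FileStream directly rather than through StreamReader, which is a swap from the code in the question; TailReader and ReadNew are hypothetical names.

```csharp
using System.IO;
using System.Text;

// Minimal tail-reader sketch (hypothetical class, not part of .NET).
// Assumes the file is only ever appended to and that its encoding is known.
public sealed class TailReader
{
    private readonly string _path;
    private readonly byte[] _preamble;   // BOM bytes for the given encoding (may be empty)
    private readonly Decoder _decoder;   // keeps partial characters between calls
    private long _bytePos = -1;          // next byte to read; -1 until the first call

    public TailReader(string path, Encoding encoding)
    {
        _path = path;
        _preamble = encoding.GetPreamble();
        _decoder = encoding.GetDecoder();
    }

    // Returns the text appended since the previous call, or "" if nothing is new.
    public string ReadNew()
    {
        // FileShare.ReadWrite lets another process keep writing to the log.
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (_bytePos < 0)
                _bytePos = StartsWith(fs, _preamble) ? _preamble.Length : 0; // skip the BOM once

            long length = fs.Length;
            if (length <= _bytePos)
                return string.Empty;

            fs.Seek(_bytePos, SeekOrigin.Begin);
            var bytes = new byte[length - _bytePos];
            int total = 0;
            while (total < bytes.Length)
            {
                int n = fs.Read(bytes, total, bytes.Length - total);
                if (n == 0) break;
                total += n;
            }
            _bytePos += total;   // a byte position, not a character count

            var chars = new char[_decoder.GetCharCount(bytes, 0, total)];
            int count = _decoder.GetChars(bytes, 0, total, chars, 0);
            return new string(chars, 0, count);
        }
    }

    // True if the stream begins with the given (non-empty) byte sequence.
    private static bool StartsWith(Stream s, byte[] prefix)
    {
        if (prefix.Length == 0 || s.Length < prefix.Length)
            return false;
        s.Seek(0, SeekOrigin.Begin);
        foreach (byte b in prefix)
            if (s.ReadByte() != b)
                return false;
        return true;
    }
}
```

Usage would be `var tail = new TailReader("myLog.txt", Encoding.Unicode);` followed by `tail.ReadNew()` on every poll. For a UTF-8 file with a BOM, pass `new UTF8Encoding(true)` so the 3-byte BOM is recognised and skipped.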
