I have a very interesting finding, at least for me!

I use a StreamReader to monitor a text file. Here is some code from a function that keeps reading the tail of the file whenever new strings are added to it.

// my stream reader
StreamReader myReader = new StreamReader("myLog.txt");
// ========
// Part of the function that reads chars from the file up to the end.
// Seek to the saved position. m_pos is a class-level variable, initially 0.
myReader.BaseStream.Seek(m_pos, SeekOrigin.Begin);
// Inside a loop: read one char at a time until the end of the file.
char[] c = new char[1];
myReader.Read(c, 0, 1);
m_pos++; // keep track of the current position in the file
// ========
This function is used to monitor a text file, myLog.txt in this example. At first, the file contains the following lines:

Test lines:
Line 1.

The function returns the correct strings and they are displayed in a RichTextBox (RTB) without problems. After the call, the position (m_pos) is 20. Then I appended "Line 2." as a new line to the file.

It works fine if the file is a text file without any BOM (byte order mark). However, if I save the file as a Unicode text file (I need to support text files with a BOM), the first call (starting from 0) returns the correct string, but the position is still 20. Then the problem comes: the second call (starting after 20) does not work correctly. It reads from ":", not from the start of "Line 2".

What I found is that in a text file with a BOM, such as Unicode, each ASCII char takes 2 bytes: the ASCII code plus a \0 byte. The first time the function is called (position at 0), Read() returns only the ASCII chars (no \0 at all!), so the position ends up at 20 (the 2 BOM bytes are skipped). In other words, m_pos counts chars, not bytes. In the second call (seek to 20, then start reading), however, Seek() treats 20 as a byte offset, so Read() picks up the \0s and continues reading byte by byte!
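To make the byte layout concrete, here is a minimal sketch that builds the bytes such a file would contain. BomLayoutDemo and EncodeWithBom are hypothetical names used only for this illustration, and I am assuming that "Unicode" means UTF-16 little-endian, which is what Encoding.Unicode represents in .NET.

```csharp
using System;
using System.Text;

public static class BomLayoutDemo
{
    // Returns the bytes a text file saved in the given encoding would contain:
    // the encoding's preamble (BOM) followed by the encoded characters.
    public static byte[] EncodeWithBom(string text, Encoding encoding)
    {
        byte[] bom = encoding.GetPreamble();   // FF FE for Encoding.Unicode
        byte[] body = encoding.GetBytes(text); // 2 bytes per ASCII char in UTF-16
        byte[] all = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, all, 0, bom.Length);
        Buffer.BlockCopy(body, 0, all, bom.Length, body.Length);
        return all;
    }
}
```

For the 20 chars "Test lines:\r\nLine 1." this gives 42 bytes (2 for the BOM plus 40 for the text), so a char count of 20 used as a byte offset lands in the middle of the first line.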

I am not sure how to handle text files with a BOM (Unicode, UTF-8 and Unicode big endian (UBE)). Should I detect the BOM first and adjust the position increment based on it for the first call (starting from 0)? And in the remaining calls (seeking to a position > 0), do I have to skip all the \0s and return only the non-\0 chars?
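One direction I can think of is to track the position in bytes instead of chars and to detect the BOM only once. The following is only a rough sketch under several assumptions, not a finished solution: TailReader, ReadNew and DetectEncoding are hypothetical names, only the three BOMs mentioned above are recognized, a file without a BOM is treated as UTF-8 (the StreamReader default), and the writer is assumed to append whole characters.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class TailReader
{
    private readonly string _path;
    private long _bytePos;       // byte offset of the next unread byte
    private Encoding _encoding;  // detected once from the BOM

    public TailReader(string path) { _path = path; }

    public long BytePosition { get { return _bytePos; } }

    // Returns the text appended since the previous call (empty if none).
    public string ReadNew()
    {
        using (FileStream fs = new FileStream(_path, FileMode.Open,
                                              FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length == 0) return string.Empty;
            if (_encoding == null)
            {
                int bomLength;
                _encoding = DetectEncoding(fs, out bomLength);
                _bytePos = bomLength; // start right after the BOM
            }
            if (fs.Length <= _bytePos) return string.Empty;

            fs.Seek(_bytePos, SeekOrigin.Begin);
            // A fresh reader per call, so no stale buffered data survives the seek.
            using (StreamReader reader = new StreamReader(fs, _encoding, false))
            {
                string text = reader.ReadToEnd();
                _bytePos = fs.Position; // bytes consumed, not chars
                return text;
            }
        }
    }

    // Inspects the first bytes for a UTF-8 or UTF-16 BOM. The returned
    // encodings have no preamble, so the reader never tries to skip one.
    private static Encoding DetectEncoding(Stream s, out int bomLength)
    {
        byte[] b = new byte[3];
        s.Seek(0, SeekOrigin.Begin);
        int n = s.Read(b, 0, 3);
        if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        { bomLength = 3; return new UTF8Encoding(false); }
        if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        { bomLength = 2; return new UnicodeEncoding(false, false); } // little-endian
        if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        { bomLength = 2; return new UnicodeEncoding(true, false); }  // big-endian
        bomLength = 0;
        return new UTF8Encoding(false); // assumption: no BOM means UTF-8/ASCII
    }
}
```

With this approach the decoder never hands back any \0 chars, because every call starts on a character boundary and uses the right encoding, so nothing has to be skipped by hand.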

Are there any other, better solutions?

By the way, if I convert a plain text string with embedded \0s to an RTF string and then place it in an RTB, the chars after the first \0 do not show up at all. It looks like the string has been truncated.