distance between stream iterators


  1. #1
    Join Date
    Nov 2005
    Posts
    4

    Question distance between stream iterators

    I need to search a binary file for a specific sequence of byte values and return the byte position of the found sequence. I'm using ifstream on the file, so iterators came to mind. Learning as I go, I read 1000 bytes into a char vector and used std::find_first_of() on the vector to find the sequence, which returned an iterator. I then determined the byte location of the sequence using

    int where = distance(v.begin(),foundAt);

    The vector approach worked great and proved to me the power of iterators. But this method requires that I know that the sequence exists within the first 1000 bytes, and it doesn't take full advantage of iterators. An istream_iterator approach seemed in order. So, with the sequence to search for in fs_str, I wrote:

    Code:
    ifstream _is;
    _is.open (filename.c_str(), ios::binary );	
    istream_iterator<char> first_byte(_is);
    istream_iterator<char> foundAt(_is);
    
    foundAt= std::find_first_of( istream_iterator<char>(_is),
                                              istream_iterator<char> (),
                                              fs_str.begin(),fs_str.end());
    
    long at = distance(first_byte, foundAt);
    But distance() doesn't seem to work correctly for this approach. It always returns zero. It works perfectly in the vector case, where it measures the distance between v.begin() and the iterator returned from find_first_of().

    Even more confusing for me is the fact that, using the debugger, I can see that first_byte and foundAt are pointing to the correct bytes of interest. Since these are definitely different byte locations, why does distance always return zero? What is so special about a stream iterator versus a vector iterator?

    Can someone please help me understand and fix this problem?
    Last edited by gary_dr; 11-06-2005 at 01:02 PM.

  2. #2
    Join Date
    Nov 2003
    Posts
    4,118
    For distance to work, the first iterator and the last must point into the same sequence, which means that you can reach the last iterator by incrementing the first iterator. I don't know what the values of fs_str are (or what it is exactly), but it looks like this requirement isn't met in your code.
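    The reachability requirement can be illustrated with ordinary containers. This is only a minimal sketch; offset_of is a hypothetical helper name, not something from the code in this thread.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: number of increments needed to get from c.begin() to pos.
// Precondition: pos refers to an element of c (or c.end()), so it is
// reachable from c.begin() by repeated ++.
template <typename Container>
std::ptrdiff_t offset_of(const Container& c,
                         typename Container::const_iterator pos) {
    return std::distance(c.begin(), pos);
}
```

    For a vector v, offset_of(v, v.begin() + 2) is 2. Passing an iterator from a different container would violate the precondition and the result would be meaningless.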
    Danny Kalev

  3. #3
    Join Date
    Nov 2005
    Posts
    4
    I guess I should have also shown what fs_str is.

    Code:
    char fs[] ={(char)0xF0,(char)0xFA,(char)0x00,(char)0x32};
    string fs_str(fs);
    That may seem odd, since I'm sure a better method exists for creating an iterable sequence of raw byte values. But this seems to work; at least I don't think it has anything to do with my present problem.
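    One detail worth checking here: the one-argument std::string constructor treats its argument as a C string and stops at the first null byte, so a pattern with 0x00 in the middle gets cut short. A minimal sketch of the difference follows; both function names are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: pass the length explicitly so the embedded 0x00 byte
// is kept as data instead of being treated as a terminator.
inline std::string make_pattern() {
    const char fs[] = { '\xF0', '\xFA', '\x00', '\x32' };
    return std::string(fs, sizeof fs);   // 4 bytes
}

// For comparison: the C-string constructor stops at the first null byte.
inline std::string make_truncated_pattern() {
    const char fs[] = { '\xF0', '\xFA', '\x00', '\x32' };
    return std::string(fs);              // only the bytes before 0x00
}
```

    With these bytes, make_pattern().size() is 4 while make_truncated_pattern().size() is 2.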

    Danny, I look forward to another reply, now that I have shown what fs_str is. It's the thing I'm looking for in the binary input file.

    Maybe a little more application specific info would also be helpful. I know there are many ways to do this, but it should be possible with iterators directly, not indirectly as my vector approach demonstrated.

    One thing that worries me is a comment by Stroustrup on page 555 of "The C++ Programming Language: Special Edition". He shows distance(is1,is2), where is1 and is2 are istream_iterators. But then he states that in a real app this wouldn't make much sense, since the effect would be to read input, throw it away, and return the number of values thrown away.
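    The read-and-discard effect can be seen by measuring the distance from a stream iterator to the end-of-stream iterator. This is a sketch only; count_remaining is a hypothetical name, and noskipws is set on the assumption that whitespace bytes should be counted too.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>

// Hypothetical helper: counts the characters left in `in` by walking an
// istream_iterator to the end. Every character is read and thrown away,
// so the stream is exhausted afterwards.
inline std::ptrdiff_t count_remaining(std::istream& in) {
    in >> std::noskipws;  // without this, operator>> skips whitespace
    return std::distance(std::istream_iterator<char>(in),
                         std::istream_iterator<char>());
}
```

    Calling it on a std::istringstream holding "a b\nc" gives 5, and a second call on the same stream gives 0 because nothing is left to read.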

    Well, that's what I want to do, and it seems to make sense to me. Rewind a stream to the beginning of a file, search for a sequence that marks the beginning of some particular type of data, and return an iterator to this found data area.

    The only reason I need to use distance, as opposed to accessing the data via this returned iterator, is that this found data area is a huge block of binary data that I want to use ifstream.read() on.
    Last edited by gary_dr; 11-06-2005 at 05:02 PM.

  4. #4
    Join Date
    Nov 2003
    Posts
    4,118
    There are many red herrings in your code that still make it impossible for me to test it. For starters, your characters contain a null, which might cause a problem with certain I/O operations although, in theory, this shouldn't happen. You simply don't know which high-level classes and services are implemented using C APIs or syscalls that choke on nulls. Secondly, you're using char across the board, but it looks like you assume that char is unsigned. Again, this could cause a problem. Third, I can't run your code because I don't have the file in question (nor its contents). Finally, my compiler insists on __int64 as the return type of distance, not long, so this could be another problem. In short, what I want is a code fragment in which find_first_of succeeds, and then I can proceed to the distance() call.
    Danny Kalev

  5. #5
    Join Date
    Nov 2005
    Posts
    4
    Here it is, red herrings and all. I did fix the signed/unsigned issues, and I show an indirect vector-based method that does work. Try it on any file you have, looking of course for some sequence that you already know is in your file.

    What's so special about istream_iterator that prevents me from determining the distance between two of them?

    In the code below, search() does return a foundAt iterator that points to the location in question, but distance() still always returns zero.

    However reading a chunk of file into a vector and searching the vector produces an iterator that distance() will measure properly.

    Evidently, it is not possible to measure the distance between two stream iterators, because two non-end-of-stream iterators are equal when they are constructed from the same stream. I still don't quite understand why that fact is true.
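    The equality rule can be checked in isolation with a string stream. This is a small sketch with made-up function names; it assumes the text holds at least two non-whitespace characters, so that neither iterator becomes an end-of-stream iterator.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Two istream_iterators built on the same stream compare equal, even though
// constructing each one may already have consumed a different character.
inline bool same_stream_iterators_equal(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<char> a(in);
    std::istream_iterator<char> b(in);
    return a == b;
}

// std::distance advances `a` until it compares equal to `b`. They already
// compare equal, so no increments happen.
inline std::ptrdiff_t same_stream_distance(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<char> a(in);
    std::istream_iterator<char> b(in);
    return std::distance(a, b);
}
```

    For the input "abcdef", the first function returns true and the second returns 0, which matches what the program reports.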

    Are istreams somehow processed as strings? Could the nulls in my pattern cause some problem here? Should I define the stream differently? There must be some way to operate directly on the stream!

    Code:
    //distance between stream iterators test
    #include <iostream>
    #include <fstream>
    #include <algorithm>
    #include <iterator>
    #include <string>
    #include <vector>
    
    using namespace std;
    
    int main()
    {
       // the pattern can be found at byte location 28 in test.bin
       // let's see if we can get those same results by operating
       // directly on the stream, instead of copying first
       // into some container.
        
       vector<unsigned char> pattern; // a pattern to find in a file
       
       // store the byte pattern F0 FA 00 32 in a sequence vector
       pattern.push_back(0xF0);
       pattern.push_back(0xFA);
       pattern.push_back(0x00);
       pattern.push_back(0x32);
    		
        ifstream f;
        string filename("test.bin");
        f.open (filename.c_str(), ios::binary );	
        
        if ( !(f.good()) ){
            cout << "error opening file \n";
    		 return -1;
        }
    	
        //----------- find first occurrence of pattern in file ----------
        f.seekg (0, ios::beg);  
        istream_iterator<unsigned char> first_byte(f);
        istream_iterator<unsigned char> foundAt(f);
    
        foundAt= std::find_first_of( istream_iterator<unsigned char>(f),
                                 istream_iterator<unsigned char> (),
                                 pattern.begin(),pattern.end());
    
        //ERROR: distance() always returns zero
        long at = distance(first_byte, foundAt); 
        
        cout << "Stream methods says it's at " << at << ".\n";
        
        //-------------- the indirect method works --------------------
        char c; 
        int numBytes=0;	
        vector<unsigned char> v; 
        
        // read a chunk of the file into an unsigned char vector
        f.clear();              // reset any eof/fail state left by the stream search
        f.seekg (0, ios::beg);
        while( f.get(c) && numBytes++ < 8192 ){
            v.push_back(static_cast<unsigned char>(c));
        }
    	
    	// find iterator to first pattern in char vector	
    	vector<unsigned char>::iterator pos;
    	pos = std::search (	v.begin(),v.end(),
    						pattern.begin(),pattern.end());
    	
    	// determine index from the return iterator
    	int where = distance(v.begin(),pos);   // GREAT: works perfect
    	
    	cout << "Indirect vector methods says it's at " << where << ".\n";
    	
    	f.close();
        return 0;
    }
    Last edited by gary_dr; 11-14-2005 at 01:20 PM.

  6. #6
    Join Date
    Nov 2003
    Posts
    4,118
    I tested this code and I have both good and bad news:
    The good news is that distance() actually works! It returns the distance between the two stream iterators, which happens to be zero. Now the bad news: the two iterators are considered identical because that's what their overloaded == dictates, although they aren't really identical (each of them is pointing to a different character in the file). So this leaves you no choice but to map the file's contents to an auxiliary STL container and then manipulate that container.
    Danny Kalev

  7. #7
    Join Date
    Nov 2005
    Posts
    4
    Thanks Danny. I recently submitted this to C++.moderated, which led to quite a bit of discussion. Your assessment is of course correct, which I discovered just prior to submitting to C++.moderated. Some options other than an auxiliary container may still exist, but with my current pressures considered, I'll postpone that effort until later, when I will also have a better understanding of iterators in general.

    Thanks for your time.

    If you're interested, the extended C++.moderated discussion can be found under "distance between stream iterators" at
    http://groups.google.com/group/comp....850aaa241463ee

  8. #8
    Join Date
    Nov 2003
    Posts
    4,118
    Looks like you hit a loophole in the Standard... What really surprises me is that find_first_of requires forward iterators, and yet the compiler didn't complain about the use of input iterators. This is really bad. I don't know whether it's a problem of underspecification in the Standard or an implementation bug. Since we're using several different compilers (I have tested this code with VC++ 8.0 and C++BuilderX, with no diagnostics issued by either of them), I suspect that it's the former. I will have to look it up in the Standard though.
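    The iterator categories themselves can be inspected at compile time. The sketch below assumes a C++11 compiler for <type_traits>, and is_multipass is a made-up trait name. Before language-level concepts, requirement names such as ForwardIterator were documentation only, so a compiler had nothing to check them against.

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical trait: true when the iterator's category is forward or stronger,
// meaning a position can be revisited after the iterator has moved on.
template <typename It>
struct is_multipass
    : std::is_base_of<std::forward_iterator_tag,
                      typename std::iterator_traits<It>::iterator_category> {};

// istream_iterator is a single-pass input iterator; vector iterators are not.
static_assert(!is_multipass<std::istream_iterator<char> >::value,
              "istream_iterator is only an input iterator");
static_assert(is_multipass<std::vector<char>::iterator>::value,
              "vector iterators are at least forward iterators");
```

    A check like this could be placed in front of an algorithm call to get a diagnostic when a single-pass iterator is passed by mistake.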
    Thanks for the link to that thread. I will keep an eye on it!
    Last edited by Danny; 11-17-2005 at 07:31 PM.
    Danny Kalev
