  1. #1
    Join Date
    Oct 2006
    Posts
    5

    Reading a file using structures?

    Hello guys. I am a newcomer here.. nice site ;)
    Well, the thing is, I want to read a binary file. Here is a description of how the file is laid out:
    http://magos.thejefffiles.com/War3Mo...sMdxFormat.txt
    But I was just wondering: since this format description looks very much like C++ code, is there a way to read the whole file into some "digital" structures which I can easily extract the values from?

  2. #2
    Join Date
    Nov 2003
    Posts
    4,118
    Sure, you just need to define structs with the same layout, then open the file in binary mode, read sizeof(struct something) bytes and cast the raw bytes just read to a struct of type something.
    Danny Kalev
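
    A minimal sketch of the approach described above, assuming a hypothetical fixed-size Record struct (it is not part of the MDX format) and hypothetical helper names write_record and read_record. Instead of casting the buffer pointer, it copies the raw bytes into the struct with memcpy, which sidesteps alignment and aliasing concerns. It only works when the writer and the reader share the same struct layout and byte order.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical fixed-size record, used only for illustration.
struct Record {
    std::uint32_t id;
    float         value;
};

// Writes one Record to `path` as raw bytes. Returns true on success.
bool write_record(const char* path, const Record& r) {
    std::FILE* f = std::fopen(path, "wb");   // binary mode
    if (!f) return false;
    const bool ok = std::fwrite(&r, sizeof r, 1, f) == 1;
    return std::fclose(f) == 0 && ok;
}

// Reads sizeof(Record) raw bytes from `path`, then copies them into `out`.
bool read_record(const char* path, Record& out) {
    std::FILE* f = std::fopen(path, "rb");   // binary mode
    if (!f) return false;
    unsigned char raw[sizeof(Record)];
    const bool ok = std::fread(raw, sizeof raw, 1, f) == 1;
    std::fclose(f);
    if (ok) std::memcpy(&out, raw, sizeof out);
    return ok;
}
```

    Calling write_record("record.bin", r) and then read_record("record.bin", r2) should leave r2 equal to r when both calls run on the same compiler and platform.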

  3. #3
    Join Date
    May 2006
    Posts
    176
    In my opinion, since your definitions do not have a fixed size, you cannot simply read the file with plain file-read calls. For instance, in the case of your GeosetTranslation structure, you first have to read several fields, including NrOfTracks and InterpolationType, and then, based on the data you got, decide which kind of TranslationTrack sub-structures to read next, and how many.

    I think that to represent your structures in C++ you have to use unions, variable-size arrays, and dynamic memory allocation. For instance, the GeosetTranslation structure will probably look like this:

    Code:
    struct GeosetTranslation
    {
      struct GeosetTranslationHeader
      {
        DWORD Signature;                     // 'KGTR';
    
        DWORD NrOfTracks;
        DWORD InterpolationType;             //0 - None
                                             //1 - Linear
                                             //2 - Hermite
                                             //3 - Bezier
                                             // TODO: use enumeration
        DWORD GlobalSequenceId;
      } Header;
    
      union 
      {
         struct TranslationTrack_Type1       // if(InterpolationType <= 1)
         {
           DWORD Time;
           FLOAT3 Translation;
         } Type1[];                          // NrOfTracks items
         struct TranslationTrack_Type2       // if(InterpolationType > 1)
         {
           DWORD Time;
           FLOAT3 Translation;
           FLOAT3 InTan;
           FLOAT3 OutTan;
         } Type2[];                           // NrOfTracks items
      } TranslationTracks;
    };
    In your read operation, first you have to read the Header structure, e.g.:

    Code:
    GeosetTranslation::GeosetTranslationHeader header;
    
    fread(&header, sizeof(header), 1, file);
    Then you calculate the remaining sizes and allocate space for the entire GeosetTranslation structure:


    Code:
    size_t variable_size = 
    (header.InterpolationType > 1 ? sizeof(GeosetTranslation::TranslationTrack_Type2) : sizeof(GeosetTranslation::TranslationTrack_Type1));
    
    GeosetTranslation * geosetTranslation = 
    	(GeosetTranslation *)new char[sizeof(GeosetTranslation::GeosetTranslationHeader) + variable_size * header.NrOfTracks];
    
    geosetTranslation->Header = header;
    
    fread(&geosetTranslation->TranslationTracks, variable_size, header.NrOfTracks, file);
    Note that in order to avoid additional unused bytes added to your structures in certain cases by the C++ compiler, you probably have to use the #pragma pack(1) directive widely.

    I hope this helps.
    Last edited by Viorel; 10-20-2006 at 08:57 AM.
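
    The same header-first idea can also be sketched without unions or manual allocation, by reading the header and then each track into a std::vector. This is only an illustration under assumptions: Float3 stands in for FLOAT3, std::uint32_t for DWORD, read_geoset_translation is a hypothetical helper name, and the host is assumed to use the same byte order and float format as the file. Real code should also sanity-check nrOfTracks before allocating.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct Float3 { float x, y, z; };            // stand-in for FLOAT3

struct TrackHeader {                          // the four DWORD fields of the header
    std::uint32_t signature;
    std::uint32_t nrOfTracks;
    std::uint32_t interpolationType;          // 0 none, 1 linear, 2 Hermite, 3 Bezier
    std::uint32_t globalSequenceId;
};

struct Track {
    std::uint32_t time;
    Float3 translation;
    Float3 inTan;                             // present in the file only if interpolationType > 1
    Float3 outTan;                            // present in the file only if interpolationType > 1
};

struct GeosetTranslation {
    TrackHeader header;
    std::vector<Track> tracks;
};

// Reads the header first, then as many tracks as the header announces.
bool read_geoset_translation(std::FILE* f, GeosetTranslation& out) {
    if (std::fread(&out.header, sizeof out.header, 1, f) != 1) return false;
    const bool hasTangents = out.header.interpolationType > 1;
    out.tracks.assign(out.header.nrOfTracks, Track{});   // zero-initialised tracks
    for (Track& t : out.tracks) {
        if (std::fread(&t.time, sizeof t.time, 1, f) != 1) return false;
        if (std::fread(&t.translation, sizeof t.translation, 1, f) != 1) return false;
        if (hasTangents) {
            if (std::fread(&t.inTan, sizeof t.inTan, 1, f) != 1) return false;
            if (std::fread(&t.outTan, sizeof t.outTan, 1, f) != 1) return false;
        }
    }
    return true;
}
```

    Because every field is read separately, the in-memory Track does not need to match the file layout byte for byte, so no packing directive is involved in this variant.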

  4. #4
    Join Date
    Oct 2006
    Posts
    5
    wow.. well.. I might need to wait before doing all this. I am still quite new to C++ and this is too hard for me to write myself.. but thanks

  5. #5
    Join Date
    Dec 2003
    Posts
    3,366
    If you simply make your data stores all the same size, it's a lot simpler and easily doable by a novice.

  6. #6
    Join Date
    Oct 2006
    Posts
    5
    oh, but I mean.. this is a document I found about the .mdx file format. I didn't make it myself. I just wanted to make a script that extracts the information about the model

  7. #7
    Join Date
    Nov 2003
    Posts
    4,118
    Quote Originally Posted by Viorel
    Note that in order to avoid additional unused bytes added to your structures in certain cases by the C++ compiler, you probably have to use the #pragma pack(1) directive widely.
    This is dangerous and unnecessary. Some platforms will not be able to process an improperly aligned struct, causing a crash. It's therefore best to stick to the default alignment, ensuring that the code that writes to the file and the code that reads from the file are compiled with the same compiler, compilation flags and OS.
    Danny Kalev

  8. #8
    Join Date
    May 2006
    Posts
    176
    Quote Originally Posted by Danny
    [#pragma pack(1) -- ] This is dangerous and unnecessary [...]
    I supposed that #pragma pack(1) is required in case we read and write a complex structure (containing a series of sub-structures) with a single file-read/file-write operation, and we do not want to have unnecessary gaps in the file. Without compact packing, I think we will need to serialize some (or all) of members separately with more effort.

  9. #9
    Join Date
    Oct 2006
    Posts
    5
    Is it hard to put all of this in a single big structure called file {}?
    If not, it'd be awesome if you guys could do that, cause I can't do this until I've been playing around with C++ for like a year :D

  10. #10
    Join Date
    Nov 2003
    Posts
    4,118
    Quote Originally Posted by Viorel
    I supposed that #pragma pack(1) is required in case we read and write a complex structure (containing a series of sub-structures) with a single file-read/file-write operation, and we do not want to have unnecessary gaps in the file. Without compact packing, I think we will need to serialize some (or all) of members separately with more effort.
    As long as you're using the same ABI, it shouldn't be a problem: first read the entire struct, including its substructs, into a flat byte array. Then manipulate the byte array, dividing it into the substructs, assigning fields, allocating pointers etc. It's quite tricky, but you always have to do that when you have transient data such as pointers that have to be allocated at runtime. The padding bytes don't matter much: they have to be present in the memory buffer as well, so whether you read them from the file or let the runtime system add them later makes little difference. In any event, don't use the #pragma pack directive unless you're designing a universal protocol that has to be used by several different platforms. The overhead associated with this packing and the potential for runtime crashes due to misalignment are rather high and uncalled for.
    Danny Kalev
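
    One way to picture the "flat byte array first, fields afterwards" idea is sketched below. The Outer and Inner structs, the 8-byte file layout, and the names load and parse_outer are all hypothetical. The sketch assumes the file stores the fields back to back with no padding and in the host's byte order, and it copies each field out with memcpy so that no misaligned access happens.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// In-memory form with default alignment (hypothetical example types).
struct Inner {
    std::uint16_t a;
    std::uint32_t b;
};
struct Outer {
    std::uint8_t tag;
    Inner        inner;
    std::uint8_t flags;
};

// Assumed on-disk size: 1 + 2 + 4 + 1 bytes, no padding between fields.
constexpr std::size_t kOuterFileSize = 8;

// Copies a T out of `buf` at byte offset `off`; memcpy works at any alignment.
template <typename T>
T load(const unsigned char* buf, std::size_t off) {
    T v;
    std::memcpy(&v, buf + off, sizeof v);
    return v;
}

// Divides the flat buffer into the struct and its sub-struct, field by field.
Outer parse_outer(const unsigned char (&buf)[kOuterFileSize]) {
    Outer o;
    o.tag     = load<std::uint8_t>(buf, 0);
    o.inner.a = load<std::uint16_t>(buf, 1);
    o.inner.b = load<std::uint32_t>(buf, 3);
    o.flags   = load<std::uint8_t>(buf, 7);
    return o;
}
```

    The buffer would typically be filled with a single fread of kOuterFileSize bytes before parse_outer is called.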

  11. #11
    Join Date
    Oct 2006
    Posts
    5
    I don't agree with Danny. While working with third-party (binary) file formats, I've found 2 approaches for accurately reading the file header into a structure:

    1) Reading each element one by one which is tedious and error prone.
    2) Reading into the original file header struct with byte alignment #pragma pack(1).

    No matter what I tried to do, without the pragma pack(1), file headers always get corrupted because the aligned struct cannot represent accurately the original file header and the results are unpredictable. Note that, this is while working with third-party files, where I cannot change elements alignment and the file header is strict.
    Of course, to leave the rest of the code and data aligned I always surround the struct with:

    #pragma pack(push)
    #pragma pack(1)

    // ...struct...

    #pragma pack(pop)

    So everything else is intact.
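
    To make the effect of the push/pack/pop sequence visible, here is a small sketch that declares the same hypothetical four-field header twice, once with default alignment and once packed, and checks the packed layout at compile time. The struct names and the padding_removed helper are made up for illustration. #pragma pack is a compiler extension (accepted by MSVC, GCC and Clang), and the size of the default-aligned version depends on the compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Default alignment: the compiler is free to insert padding between members.
struct AlignedHeader {
    std::uint8_t  one_byte;
    std::uint16_t two_bytes;
    std::uint32_t four_bytes;
    std::uint8_t  last_byte;
};

#pragma pack(push, 1)   // remember the current packing, then switch to 1-byte packing
struct PackedHeader {
    std::uint8_t  one_byte;
    std::uint16_t two_bytes;
    std::uint32_t four_bytes;
    std::uint8_t  last_byte;
};
#pragma pack(pop)       // restore the previous packing for everything that follows

// Compile-time checks that the packed struct mirrors an unpadded 8-byte file header.
static_assert(sizeof(PackedHeader) == 8, "expected 1 + 2 + 4 + 1 bytes");
static_assert(offsetof(PackedHeader, two_bytes) == 1, "two_bytes should follow one_byte directly");
static_assert(offsetof(PackedHeader, four_bytes) == 3, "four_bytes should follow two_bytes directly");
static_assert(offsetof(PackedHeader, last_byte) == 7, "last_byte should be the final byte");

// Number of padding bytes that packing removed on this compiler.
constexpr std::size_t padding_removed() {
    return sizeof(AlignedHeader) - sizeof(PackedHeader);
}
```

    If a compiler ignores the pragma, the static_assert lines fail at build time instead of letting a mismatched header slip through at run time.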

  12. #12
    Join Date
    Oct 2006
    Posts
    5
    well. would someone do this for me or direct me to a tutorial to do this thing? Cause I am totally lost by now :)

    But nice answers anyway.

  13. #13
    Join Date
    Nov 2003
    Posts
    4,118
    Quote Originally Posted by Enlight
    I don't agree with Danny. While working with third-party (binary) file formats, I've found 2 approaches for accurately reading the file header into a structure:

    1) Reading each element one by one which is tedious and error prone.
    2) Reading into the original file header struct with byte alignment #pragma pack(1).

    No matter what I tried to do, without the pragma pack(1), file headers always get corrupted because the aligned struct cannot represent accurately the original file header and the results are unpredictable. Note that, this is while working with third-party files, where I cannot change elements alignment and the file header is strict.
    Of course, to leave the rest of the code and data aligned I always surround the struct with:

    #pragma pack(push)
    #pragma pack(1)

    // ...struct...

    #pragma pack(pop)

    So everything else is intact.
    OK, but how do you deal with platform-dependent sizes? For example, a struct that has a virtual function has an internal member _vptr, whose size may be 4 or 8 bytes. You can't tell it in advance, nor can you tell where the vptr is located. What about endianness? It's another problem that #pragma can't solve. So the bottom line is this: you need to know the ABI of the writer in order to reconstruct the struct successfully. Once you know that ABI, there's no need to use the #pragma directive (which is non-portable anyway!).
    The main problem with this tight packing is that some platforms simply can't deal with improperly aligned structs. If you try to access a member in such a struct at an offset that isn't a multiple of its required alignment (2, 4, 8 or whatever), the app will crash. On Intel processors a crash is unlikely but there's a significant performance overhead. So the best way to handle this case portably is to ensure that all data members have the size of a native word, or multiples thereof.
    I still don't understand what caused the problem in your code. The header itself? What does it contain?
    Last edited by Danny; 10-24-2006 at 11:15 AM.
    Danny Kalev
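
    On the endianness point, one common way to stay independent of the host's byte order is to assemble multi-byte values from individual bytes with shifts. The helper names below (read_u32_le, read_u32_be, read_u16_le) are hypothetical; the sketch assumes you already know from the format description which byte order the file uses.

```cpp
#include <cstdint>

// Builds a 32-bit value from four bytes stored least-significant byte first.
std::uint32_t read_u32_le(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Builds a 32-bit value from four bytes stored most-significant byte first.
std::uint32_t read_u32_be(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

// Builds a 16-bit value from two bytes stored least-significant byte first.
std::uint16_t read_u16_le(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

    These functions read one byte at a time, so they give the same result on little-endian and big-endian hosts and never perform a misaligned multi-byte load.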

  14. #14
    Join Date
    Oct 2006
    Posts
    5
    Now I get your point, Danny, but binary files are not intended to be portable or endian-aware. That's why these days binary formats are being replaced with XML files. For example, the Xing MP3 header stores data in big-endian format. No one expects you to read it in little-endian format.

    Now the quick example:

    Code:
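    #include <stdio.h>
    
    #pragma pack(push)
    #pragma pack(1)
    struct sampleStruct
    {
    	char	one_byte;
    	short	two_bytes;
    	int	 four_bytes;
    	char	last_byte;
    };
    #pragma pack(pop)
    
    int main()
    {
    	sampleStruct dest;
    	// Plain fopen is used here because fopen_s is not available on every compiler.
    	FILE * f = fopen("data.bin", "rb");
    	if (f == NULL)
    		return 1;
    	// Read the 8 packed bytes of the file straight into the struct.
    	size_t items = fread(&dest, sizeof(dest), 1, f);
    	fclose(f);
    	return items == 1 ? 0 : 1;
    }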
    Data in data.bin is: 0x01 , 0x0202 , 0x03030303 , 0x04
    After reading, you have each member with the correct data.

    one_byte = 0x01
    two_bytes = 0x0202
    four_bytes = 0x03030303
    last_byte = 0x04

    Without the pack(1), each compiler aligns the members to some boundary (WORD, DWORD, 8 bytes...) and the data gets corrupted. For example, my compiler did this without the pack(1):

    one_byte = 0x01
    two_bytes = 0x0302
    four_bytes = 0x04030303
    last_byte = [garbage]

    So the conclusion is: binary files are non-portable. If you want to read third-party binary data, use byte alignment and be aware of the file's endianness.
    If you write your own binary data using a struct, forget about the pragma pack(1) and you'll get portable code, but data that is not portable between platforms, which is not recommended if you really want the application to be portable. In that case, a better solution can be to go into the XML world for clean and safe data manipulation.

  15. #15
    Join Date
    Nov 2003
    Posts
    4,118
    I have to disagree. Without the #pragma directive, the struct will be stored with padding bytes in the file, which is what we agree about. Your mistake is that you're trying to read individual fields instead of reading sizeof(sampleStruct) bytes from the file and then superimposing those very bytes on an empty sampleStruct:
    sampleStruct s1, dest={0};
    //populate s1, store it in a file
    fread(&dest,sizeof(dest),1,f);

    This should place every member at the correct offset. If there are padding bytes between members, their garbage values will be written into the corresponding padding bytes within dest. Again, since we both agree that binary compatibility is ruled out, I don't see the benefit of using #pragma. However, if you test the code with and without this directive, you will notice a significant performance degradation with #pragma enabled. Notice also that rearranging the struct as:
    struct sampleStruct
    {
    char one_byte;
    char last_byte;//not last anymore
    short two_bytes;
    int four_bytes;
    };

    Would make the #pragma pack unnecessary. Admittedly, it's not always possible to rearrange a struct's members but if you can, you certainly should.
    Danny Kalev
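
    The claim about the rearranged struct can be checked with a small sketch. It uses fixed-width types in place of char, short and int (an assumption, so that the sizes are the same on every compiler), and the names ReorderedSample, kMemberBytes and has_no_padding are hypothetical. The offsets in the comments are what mainstream ABIs are expected to produce; the check itself makes no assumption about them.

```cpp
#include <cstddef>
#include <cstdint>

// Same four fields as sampleStruct, ordered so each one falls on its natural boundary.
struct ReorderedSample {
    std::uint8_t  one_byte;     // expected offset 0
    std::uint8_t  last_byte;    // expected offset 1
    std::uint16_t two_bytes;    // expected offset 2
    std::uint32_t four_bytes;   // expected offset 4
};

// Sum of the member sizes: the size the struct would have with no padding at all.
constexpr std::size_t kMemberBytes =
    sizeof(std::uint8_t) + sizeof(std::uint8_t) +
    sizeof(std::uint16_t) + sizeof(std::uint32_t);

// True when the compiler inserted no padding into ReorderedSample.
constexpr bool has_no_padding() {
    return sizeof(ReorderedSample) == kMemberBytes;
}
```

    This only helps when you control the file format: a third-party file that stores the fields in the original order still has to be read in that order.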
