Good Java Book for Parsing Binary files


DevX Home    Today's Headlines   Articles Archive   Tip Bank   Forums   

Results 1 to 7 of 7

Thread: Good Java Book for Parsing Binary files

Hybrid View

  1. #1
    Join Date
    Mar 2005
    Posts
    3

    Good Java Book for Parsing Binary files

    Hi

    This is my first time here, and looking for a good book with examples that
    basically begin with the fundamentals up to advanced on the following
    topic.

    I am looking for parsing examples that will look at either textual or binary
    files that can find certain sync patterns in the data and extract those
    bytes whether by length indicator or by a start or end pattern.

    Is there such a book in existance thatwill show me the way to do this
    type of Java programming.

    thanks Bejay

    Ps If I happen to get a reply to this - is it also possible to send me an email
    to - mcdon18@blueyonder.co.uk -

  2. #2
    Join Date
    Mar 2005
    Location
    Sendling, MUC, .de
    Posts
    100
    Hmm, there's the classic Compilers - Principles, Techniques and Tools by Aho, Sethi, Ullman.

    But I'm quite sure that this is NOT what you're looking for. Normally, by "parsing" one refers to the process of accepting a "word" of a formal language in the class of context-free languages but above regular languages. These are all terms of computer science theory; an example of such a context-free language is java (well, strictly speaking it is not context-free but context-sensitive, although...) and a "word" of that language is a valid java program. The task of recognizing some sequence of characters as a valid java program or even analyze it so that it can be compiled to byte code is far from trivial as you might guess. This you definitely should not start with.

    The so-called regular languages are of much simpler structure, eg. expressions of arithmetic are regular. So if you want to implement a program that draws function graphs for example, you can use regular expressions, see the package java.util.regex. The theory behind is that of deterministic finite automata (funnilly, they may also be non-deterministic - makes no difference).

    But if you only want to find out where or if at all a given sequence of bytes occurs within some other sequence, using regular expressions would be like overdressing. There are much more efficient algorithms for that like Boyer-Moore or Donald Knuth's. There are also specialized algorithms for indexing portions of text. They are tricky, but no knowledge about formal languages should be needed.

    So I'm sorry that I could not give you a helpful recommendation for a book, it's just that I'm quite confused about what you're talking in your second paragraph. Maybe with the above, you could describe more precisely where you expect "fundamentals" to begin. Or maybe I'm just too dumb to get what you mean by sync patterns and length indicator...
    Last edited by meisl; 03-24-2005 at 07:42 PM.

  3. #3
    Join Date
    Mar 2005
    Posts
    3
    Meisl

    Thanks for your kind reply, I will follow iup on some of your thoughts. But if I can just expand on some of the things I want to do, with an example below that may help to know if such books exist.

    If I was looking at TCP/IP data from my local LAN network, I would have bytes that would for example begin with 0x45 (IP). Of course there will me MAC address bytes higher in the protocol stack and there will be other bytes lower before I get to the transport layer for example TCP, 0x06. So for this example and other variations to this theme would be to write code that will find my starting byte of 0x45 skip the next number of bytes (which may vary) and then find 0x06, giving me the number of bytes between 0x45 and 0x06, sending this to an output file.

    This would be the sort of thing, but being able to do all sorts of manipulations on files in this manner is what I would like to do.

    Thanks Again - Bejay

  4. #4
    Join Date
    Mar 2005
    Location
    Sendling, MUC, .de
    Posts
    100
    Ok, that gives me somewhat an idea of what you're trying to do.

    Although I see that the network traffic thing is only an example, it immediately reminded me of Ethereal, "the world's most popular network protocol analyzer". They also have a wiki, maybe there's something of interest for you.

    Now for the example problem itself:
    Imagine a "machine" that reads the bytes of a stream, one after the other, but cannot track back. This machine is in one of several states at any point in time and can do a state transition depending on the byte it has just read (and the state it is currently in).
    Besides that, assume the machine to be able to perform some action every time, say, it "leaves" a state - again, depending on the state (and the byte just read).
    Now the machine for your problem would be:
    Code:
    - initial state 1, "search for 0x45":
      If next byte is 0x45, switch to state 2. Otherwise stay in state 1 and continue.
    
    - state 2, "search for 0x06":
      If next byte is 0x06, output "/" and enter state 3.
      Otherwise output "1" and continue (ie. "leave" state 2 for, again, state 2).
    
    - state 3, "found 0x06":
      If next byte is 0x45, enter state 2. Otherwise enter state 1.
    In a way, this simple machine counts the bytes between the 0x45 and 0x06 in the stream. In fact it outputs numbers in the number system to base 1, separated by slashes.

    You may complain: if THAT machine were ever to be implemented in real, its commercial success would at least be questionable...
    Well, the real purpose of little machine is to give you an idea of how to think when approaching such problems. And: although the machine model is (on purpose) extremely limited, could you have described state 3 instantly, if it had not been given?

    Now you know how to do "the sort of thing", I guess. But "all sorts of manipulations"? First thing is: limit "all sorts" to some reasonable set of requirements. Second: how would you specify which thing to do, ie. tell your program what sort of analysis or manipulation it should perform? Keep in mind that to handle the user input may easily amount to a far bigger problem than the analysis/manipulation itself (that is, if you tell your program what to do via some textual input, this input might need to be parsed in the sense mentioned earlier).


    Hmm, again no book recommendation so far So I'd like to give one more (probably useless?): Computability and Logic by Boolos, Burgess and Jeffrey. On the cover it says that it "has become a classic because of its accessibility to students without a mathematical background, ...", at least.

    --
    p.s.: The machine described is called a DFA (deterministic finite automaton) with output. You may try the following exercise: is it possible that such a machine outputs numbers of a system to a base greater than 1, eg. 2? If yes, what is the base-2-machine for the above problem? If not, prove that and describe the necessary but most defensive extension to the machine model.
    Last edited by meisl; 03-24-2005 at 07:40 PM.

  5. #5
    Join Date
    Mar 2005
    Location
    Sendling, MUC, .de
    Posts
    100

    supplement

    Stream manipulation at work you can see here. Behind the scenes, there are ... you know what, don't you?
    Last edited by meisl; 03-24-2005 at 07:09 PM.

  6. #6
    Join Date
    Mar 2005
    Posts
    3
    Meisl,

    Thanks again for your input. Yes I have been using ETHEREAL for many years and this is my reason for learning the deeper arts of Java programming. Which I have to admit is still in its humble beginnings. I was hoping to cut down on the learning curve to hone in on the area of Java programming that I want to do if I could find a book specifically dedicated to that, and to learn by examples.

    However, you have inspired me to delve more, and to read more. Thanks again.

    Brian

  7. #7
    Join Date
    May 2005
    Posts
    1

    chk out www.examsguide.com for links to java books and sites

    http://www.examsguide.com provides related links to important sites and book publishers for SCJP,OCP,CCNA,MCSE,C,C++,SQL,PL/SQL,PHP,PERL,HTML,XML

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center
 
 
FAQ
Latest Articles
Java
.NET
XML
Database
Enterprise
Questions? Contact us.
C++
Web Development
Wireless
Latest Tips
Open Source


   Development Centers

   -- Android Development Center
   -- Cloud Development Project Center
   -- HTML5 Development Center
   -- Windows Mobile Development Center