Good Java Book for Parsing Binary files
This is my first time here, and looking for a good book with examples that
basically begin with the fundamentals up to advanced on the following
I am looking for parsing examples that will look at either textual or binary
files that can find certain sync patterns in the data and extract those
bytes whether by length indicator or by a start or end pattern.
Is there such a book in existance thatwill show me the way to do this
type of Java programming.
Ps If I happen to get a reply to this - is it also possible to send me an email
to - firstname.lastname@example.org -
Hmm, there's the classic Compilers - Principles, Techniques and Tools by Aho, Sethi, Ullman.
But I'm quite sure that this is NOT what you're looking for. Normally, by "parsing" one refers to the process of accepting a "word" of a formal language in the class of context-free languages but above regular languages. These are all terms of computer science theory; an example of such a context-free language is java (well, strictly speaking it is not context-free but context-sensitive, although...) and a "word" of that language is a valid java program. The task of recognizing some sequence of characters as a valid java program or even analyze it so that it can be compiled to byte code is far from trivial as you might guess. This you definitely should not start with.
The so-called regular languages are of much simpler structure, eg. expressions of arithmetic are regular. So if you want to implement a program that draws function graphs for example, you can use regular expressions, see the package java.util.regex. The theory behind is that of deterministic finite automata (funnilly, they may also be non-deterministic - makes no difference).
But if you only want to find out where or if at all a given sequence of bytes occurs within some other sequence, using regular expressions would be like overdressing. There are much more efficient algorithms for that like Boyer-Moore or Donald Knuth's. There are also specialized algorithms for indexing portions of text. They are tricky, but no knowledge about formal languages should be needed.
So I'm sorry that I could not give you a helpful recommendation for a book, it's just that I'm quite confused about what you're talking in your second paragraph. Maybe with the above, you could describe more precisely where you expect "fundamentals" to begin. Or maybe I'm just too dumb to get what you mean by sync patterns and length indicator...
Last edited by meisl; 03-24-2005 at 07:42 PM.
Thanks for your kind reply, I will follow iup on some of your thoughts. But if I can just expand on some of the things I want to do, with an example below that may help to know if such books exist.
If I was looking at TCP/IP data from my local LAN network, I would have bytes that would for example begin with 0x45 (IP). Of course there will me MAC address bytes higher in the protocol stack and there will be other bytes lower before I get to the transport layer for example TCP, 0x06. So for this example and other variations to this theme would be to write code that will find my starting byte of 0x45 skip the next number of bytes (which may vary) and then find 0x06, giving me the number of bytes between 0x45 and 0x06, sending this to an output file.
This would be the sort of thing, but being able to do all sorts of manipulations on files in this manner is what I would like to do.
Thanks Again - Bejay
Ok, that gives me somewhat an idea of what you're trying to do.
Although I see that the network traffic thing is only an example, it immediately reminded me of Ethereal, "the world's most popular network protocol analyzer". They also have a wiki, maybe there's something of interest for you.
Now for the example problem itself:
Imagine a "machine" that reads the bytes of a stream, one after the other, but cannot track back. This machine is in one of several states at any point in time and can do a state transition depending on the byte it has just read (and the state it is currently in).
Besides that, assume the machine to be able to perform some action every time, say, it "leaves" a state - again, depending on the state (and the byte just read).
Now the machine for your problem would be:
In a way, this simple machine counts the bytes between the 0x45 and 0x06 in the stream. In fact it outputs numbers in the number system to base 1, separated by slashes.
- initial state 1, "search for 0x45":
If next byte is 0x45, switch to state 2. Otherwise stay in state 1 and continue.
- state 2, "search for 0x06":
If next byte is 0x06, output "/" and enter state 3.
Otherwise output "1" and continue (ie. "leave" state 2 for, again, state 2).
- state 3, "found 0x06":
If next byte is 0x45, enter state 2. Otherwise enter state 1.
You may complain: if THAT machine were ever to be implemented in real, its commercial success would at least be questionable...
Well, the real purpose of little machine is to give you an idea of how to think when approaching such problems. And: although the machine model is (on purpose) extremely limited, could you have described state 3 instantly, if it had not been given?
Now you know how to do "the sort of thing", I guess. But "all sorts of manipulations"? First thing is: limit "all sorts" to some reasonable set of requirements. Second: how would you specify which thing to do, ie. tell your program what sort of analysis or manipulation it should perform? Keep in mind that to handle the user input may easily amount to a far bigger problem than the analysis/manipulation itself (that is, if you tell your program what to do via some textual input, this input might need to be parsed in the sense mentioned earlier).
Hmm, again no book recommendation so far So I'd like to give one more (probably useless?): Computability and Logic by Boolos, Burgess and Jeffrey. On the cover it says that it "has become a classic because of its accessibility to students without a mathematical background, ...", at least.
p.s.: The machine described is called a DFA (deterministic finite automaton) with output. You may try the following exercise: is it possible that such a machine outputs numbers of a system to a base greater than 1, eg. 2? If yes, what is the base-2-machine for the above problem? If not, prove that and describe the necessary but most defensive extension to the machine model.
Last edited by meisl; 03-24-2005 at 07:40 PM.
Stream manipulation at work you can see here. Behind the scenes, there are ... you know what, don't you?
Last edited by meisl; 03-24-2005 at 07:09 PM.
Thanks again for your input. Yes I have been using ETHEREAL for many years and this is my reason for learning the deeper arts of Java programming. Which I have to admit is still in its humble beginnings. I was hoping to cut down on the learning curve to hone in on the area of Java programming that I want to do if I could find a book specifically dedicated to that, and to learn by examples.
However, you have inspired me to delve more, and to read more. Thanks again.
chk out www.examsguide.com for links to java books and sites
http://www.examsguide.com provides related links to important sites and book publishers for SCJP,OCP,CCNA,MCSE,C,C++,SQL,PL/SQL,PHP,PERL,HTML,XML
Top DevX Stories
Easy Web Services with SQL Server 2005 HTTP Endpoints
JavaOne 2005: Java Platform Roadmap Focuses on Ease of Development, Sun Focuses on the "Free" in F.O.S.S.
Wed Yourself to UML with the Power of Associations
Microsoft to Add AJAX Capabilities to ASP.NET
IBM's Cloudscape Versus MySQL