Binary Compression


DevX Home    Today's Headlines   Articles Archive   Tip Bank   Forums   

Results 1 to 13 of 13

Thread: Binary Compression

  1. #1
    Join Date
    Oct 2005
    Location
    Needham, MA
    Posts
    6

    Question Binary Compression

    Im making a compression program for my senior project at my school, and I was wondering if it would work if I got the binary of every chacter in the file, and then applied the RLE - Run Length Encoding compression technique to it. It makes sense, but I dont think its been done and I was wondering if it was possible.

  2. #2
    Join Date
    Jul 2005
    Location
    SW MO, USA
    Posts
    299
    What is RLE?
    Could you explain the algorithm or technique?

  3. #3
    Join Date
    Oct 2005
    Location
    Needham, MA
    Posts
    6
    say i have a file thats this:
    aabbbnddiiiiidddlllllllnn
    the compression would be:
    2a3bn2d5i3d8l2n
    now that really wouldn't work with everything because "2a" might take up more size than "aa" because of the binary. Thats why i was wondering if it would make more sense to do that in binary.

    here is more info:
    http://en.wikipedia.org/wiki/Run-length_encoding

  4. #4
    Join Date
    Jul 2005
    Location
    SW MO, USA
    Posts
    299
    How many bits do you look at to find a repeating sequence/run length? In your example it was 8.
    In your example you only have 1 byte (or digit) for the count. What if there are more than 9?

    You would get no compression with ordinary english text. Very few repeated chars.
    Must be more to it than that.

  5. #5
    Join Date
    Jul 2005
    Location
    the Netherlands
    Posts
    128
    Quote Originally Posted by Dellion
    because "2a" might take up more size than "aa" because of the binary.
    What do you mean by that?

  6. #6
    Join Date
    Sep 2005
    Location
    TX
    Posts
    23

    Talking

    I did something similar to this a year past. We used a tree data structure and the huffman compression scheme. It's not very complicated at all. Of course, if you don't how to create a tree data structure you'll have to do that.

    But there's an abundance of walk-throughs, etc on how to do this online.
    Google it.

  7. #7
    Join Date
    Oct 2005
    Location
    Needham, MA
    Posts
    6
    Quote Originally Posted by prometheuzz
    What do you mean by that?
    ok...i dont know this off the top of my head, and this will probably be wrong, but youll get what im tying to say.

    What I mean by that prometheuzz is 2a (in binary) might be 1011010 (im not stupid see above), and aa might be 110110. 2a has one more binary character taking up more space.

  8. #8
    Join Date
    Jul 2005
    Location
    the Netherlands
    Posts
    128
    Quote Originally Posted by Dellion
    ok...i dont know this off the top of my head, and this will probably be wrong, but youll get what im tying to say.

    What I mean by that prometheuzz is 2a (in binary) might be 1011010 (im not stupid see above), and aa might be 110110. 2a has one more binary character taking up more space.
    Aha, like that.
    But if you have a long list of 1's and 0's, this string for example:
    11001110101101000010101
    What would be the first character then? (11001 or 110011).

    If you convert everything in binary, I think it's best to use the same length for every character.

  9. #9
    Join Date
    Oct 2005
    Location
    Needham, MA
    Posts
    6
    like a dictionary?

  10. #10
    Join Date
    Jul 2005
    Location
    the Netherlands
    Posts
    128
    Quote Originally Posted by Dellion
    like a dictionary?
    I mean that if you would only use [a-z] there are 26 different characters.
    These can all be described using 5 bits:

    a = 00001
    b = 00010
    c = 00011
    d = 00100
    e = 00101
    ...
    z = 11010


    that way you know 000110010000011 = 00011-00100-00011 = cdc (always 5 bits)

  11. #11
    Join Date
    Dec 2004
    Location
    San Bernardino County, California
    Posts
    1,468
    Did you follow up on nescafe's message that you look at huffman encoding? You perform an evaluation of your text, get the "frequency of occurence" of each character (including white space characters). You then build a tree which "encodes" each character of your alphabet.

    The more frequently used characters are written in fewer bits - perhaps one as quickly written as "11", and (if properly performed) there is no ambiguity.
    You would then create your encoding of the entire text, probably stored as a bitset, which would be a substantial compression of the size of the text stored as unicode characters.

    Other encoding/compression mechanisms to look at are tries (&quotPatricia" tries and suffix trees)

    There are many references about these applications of data structures/algorithms on the Internet

  12. #12
    Join Date
    Oct 2005
    Location
    Needham, MA
    Posts
    6
    thanks! that really helped alot prometheuzz!

  13. #13
    Join Date
    Jul 2005
    Location
    the Netherlands
    Posts
    128
    Quote Originally Posted by Dellion
    thanks! that really helped alot prometheuzz!
    You're welcome. But to get some real compression you should take a look at what nescafe and nspils suggested.

    Good luck.

Similar Threads

  1. Binary file
    By AM003295 in forum VB Classic
    Replies: 6
    Last Post: 03-11-2005, 12:36 AM
  2. Why is this not binary compatible
    By Kip Fryman in forum VB Classic
    Replies: 2
    Last Post: 09-07-2001, 07:19 AM
  3. Registry question: Save and read binary data
    By Paulo Costa in forum VB Classic
    Replies: 4
    Last Post: 07-31-2001, 02:15 PM
  4. Registry question: Save and read binary data
    By Paulo Costa in forum VB Classic
    Replies: 0
    Last Post: 07-30-2001, 06:15 PM
  5. binary data in XML
    By Jamie Harsevoort in forum XML
    Replies: 3
    Last Post: 12-18-2000, 04:44 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center
 
 
FAQ
Latest Articles
Java
.NET
XML
Database
Enterprise
Questions? Contact us.
C++
Web Development
Wireless
Latest Tips
Open Source


   Development Centers

   -- Android Development Center
   -- Cloud Development Project Center
   -- HTML5 Development Center
   -- Windows Mobile Development Center