I forget what I was looking for

when I stumbled onto Haruhiko Okumura’s lzss.c, but I was really intrigued.  Every time I’ve seen anything to do with compression, it’s insanely massive.

Except for this.

Including the ‘main’ portion of the source, it’s 180 lines long.  4.3Kb.  That’s microscopic by today’s standards.  On OS X I get a 13kb executable, compared to 76kb for gzip, or 1.6MB for 7za.

And googling around I found a few other variations.  So I figured it would be slightly fun to have a ‘bakeoff’ with the ‘tradtional’ Calgary Corpus, which includes some variable types of data.

Unsurprisingly, 7zip is the best of the bunch.

$ ./testcomp.sh
compiling……..
cleaning up
unzipping
running…….The winner (smallest) is :
261057 6 Jun 20:18 book1.7z
53167 6 Jun 20:18 geo.7z
9472 6 Jun 20:18 obj1.7z
17322 6 Jun 20:18 paper1.7z
43596 6 Jun 20:18 pic.7z
15060 6 Jun 20:18 progl.7z
16748 6 Jun 20:18 trans.7z
30716 6 Jun 20:18 bib.7z
169894 6 Jun 20:18 book2.7z
119399 6 Jun 20:18 news.7z
61758 6 Jun 20:18 obj2.7z
27310 6 Jun 20:18 paper2.7z
12605 6 Jun 20:18 progc.7z
10428 6 Jun 20:18 progp.7z

But the source to 7zip is unwieldy at best.  So how did the small lzss and variants stuff do?

Compression percentage

Compression percentage

Honestly I’m surprised gzip put up a good fight.  Bzip2 & 7zip really fought for the top, The surprise to me was lzhuf leading the old stuff, which has it’s roots back in 1988/1989.  So let’s look at the data without anything modern in the way.

Old Compression only

Old Compression only

So from the numbers, we can see that lzs2 and lz3 run almost identical, with lzs & lzs4 at the bottom.  Now when we look at time, we get something different.

Compression duration

Compression duration

Both lzs & lzs4 take eight or more seconds!  So they are both out, as I’m shopping for something good/fast, and taking this long is out of the question!  So it comes down to how complicated lzhuf2, lzs2 and lzs3 are.

source lines
lzs.c 4360
lzs4.c 4623
lzs2.c 8308
lzs3.c 12844
lzhuf.c 18323
lzhuf2.c 22556

While lzs.c is still pretty impressive for the size, for what I’m going to try thought, I’m going to use lzs2.c as it’s 8kb, and seems to fit the bill.

For anyone who’s interested in running this on their own, here is the package.  I only tested on OS X, it may run on other UNIX stuff, it may not.  Extract it and run ‘testcomp.sh’.  And it may even work!  The only thing on OS X I had to add was ‘-Wno-return-type’ for compiling, as clang doesn’t like ancient source like this…

This entry was posted in compression by neozeed. Bookmark the permalink.
avatar

About neozeed

What is there to tell? I've loved UNIX like things since I was first exposed to QNX in highschool (we had the Unisys ICONS!), and spent the better time of my teenage years trying to get my own UNIX... I should have bought Coherent in retrospect.. Anyways latched onto Linux in 1992, and then got some old BSD admin books and have been hooked on the VAX BSD & other big/ancient things since...!

3 thoughts on “I forget what I was looking for

Leave a Reply

Your email address will not be published.

Notify me of followup comments via e-mail. You can also subscribe without commenting.