DMSDos

While trying to see if STAC ever went into the disk compression busness again, I ran across this interesting bit of software, DMSDOS, which is a kernel module for versions 2.0.36 and 2.2.3 of the Linux kernel support read/write access to STACKER version 3 and 4 containers.  It also support’s MS-DOS’s drive space, and doublespace!

The latest version dmsdos-0.9.2.3-pre2-alt2.tar.gz even runs on modern x86_64 and i386 boxes!  Although in library mode.

# ./mcdmsdos list /tmp/STACVOL.000
mcdmsdos version 0.2.0 (for libdmsdos 0.9.x and newer)
libdmsdos: DMSDOS library version 0.9.2pl3-pre2(alpha test) compiled Jun 12 2014 17:14:06 with options: read-write, doublespace/drivespace(<3), drivespace 3, stacker 3, stacker 4
libdmsdos: mounting CVF on device 0x3 read-write…
libdmsdos: CVF end padding 1 sectors.
libdmsdos: CVF is in stacker 4 format.
-rwxr-xr-x 1 0 0 10396254 Dec 16 1993 13:35 doom.wad
-rwxr-xr-x 1 0 0 68923 Dec 15 1993 01:01 setup.exe
-rwxr-xr-x 1 0 0 20850 Dec 15 1993 01:01 readme.exe
-rwxr-xr-x 1 0 0 627155 Dec 16 1993 13:53 u1_94.exe
-rwxr-xr-x 1 0 0 579187 Dec 16 1993 13:47 doom.exe
-rwxr-xr-x 1 0 0 439 Dec 15 1993 01:01 file_id.diz
-rwxr-xr-x 1 0 0 190 Dec 15 1993 01:01 use1_95.bat
-rwxr-xr-x 1 0 0 218 Dec 15 1993 01:01 use1_94.bat
-rwxr-xr-x 1 0 0 7527 Dec 15 1993 01:01 license.doc

How cool?

Even better it can extract files from the stacker volume!

# ./mcdmsdos copyout /tmp/STACVOL.000 /doom.wad doom.wad
mcdmsdos version 0.2.0 (for libdmsdos 0.9.x and newer)
libdmsdos: DMSDOS library version 0.9.2pl3-pre2(alpha test) compiled Jun 12 2014 17:14:06 with options: read-write, doublespace/drivespace(<3), drivespace 3, stacker 3, stacker 4
libdmsdos: mounting CVF on device 0x3 read-write…
libdmsdos: CVF end padding 1 sectors.
libdmsdos: CVF is in stacker 4 format.
# md5sum doom.wad 981b03e6d1dc033301aa3095acc437ce doom.wad
#

And if you google ‘981b03e6d1dc033301aa3095acc437ce’ you’ll know it’s the registered DOOM version 1.1 data file!

The author tired to get it to work with Microsoft Visual C, and it does not.  It also doesn’t work with MinGW or Cygwin, and the reason is once more again the way GCC handles it’s structures on Linux vs Windows.  Sadly there is no silverbullet fix for this, the structures oddly enough are too small on Windows, and too big for what they should be on Linux.

But at any rate, I though it was cool.

I forget what I was looking for

when I stumbled onto Haruhiko Okumura’s lzss.c, but I was really intrigued.  Every time I’ve seen anything to do with compression, it’s insanely massive.

Except for this.

Including the ‘main’ portion of the source, it’s 180 lines long.  4.3Kb.  That’s microscopic by today’s standards.  On OS X I get a 13kb executable, compared to 76kb for gzip, or 1.6MB for 7za.

And googling around I found a few other variations.  So I figured it would be slightly fun to have a ‘bakeoff’ with the ‘tradtional’ Calgary Corpus, which includes some variable types of data.

Unsurprisingly, 7zip is the best of the bunch.

$ ./testcomp.sh
compiling……..
cleaning up
unzipping
running…….The winner (smallest) is :
261057 6 Jun 20:18 book1.7z
53167 6 Jun 20:18 geo.7z
9472 6 Jun 20:18 obj1.7z
17322 6 Jun 20:18 paper1.7z
43596 6 Jun 20:18 pic.7z
15060 6 Jun 20:18 progl.7z
16748 6 Jun 20:18 trans.7z
30716 6 Jun 20:18 bib.7z
169894 6 Jun 20:18 book2.7z
119399 6 Jun 20:18 news.7z
61758 6 Jun 20:18 obj2.7z
27310 6 Jun 20:18 paper2.7z
12605 6 Jun 20:18 progc.7z
10428 6 Jun 20:18 progp.7z

But the source to 7zip is unwieldy at best.  So how did the small lzss and variants stuff do?

Compression percentage

Compression percentage

Honestly I’m surprised gzip put up a good fight.  Bzip2 & 7zip really fought for the top, The surprise to me was lzhuf leading the old stuff, which has it’s roots back in 1988/1989.  So let’s look at the data without anything modern in the way.

Old Compression only

Old Compression only

So from the numbers, we can see that lzs2 and lz3 run almost identical, with lzs & lzs4 at the bottom.  Now when we look at time, we get something different.

Compression duration

Compression duration

Both lzs & lzs4 take eight or more seconds!  So they are both out, as I’m shopping for something good/fast, and taking this long is out of the question!  So it comes down to how complicated lzhuf2, lzs2 and lzs3 are.

source lines
lzs.c 4360
lzs4.c 4623
lzs2.c 8308
lzs3.c 12844
lzhuf.c 18323
lzhuf2.c 22556

While lzs.c is still pretty impressive for the size, for what I’m going to try thought, I’m going to use lzs2.c as it’s 8kb, and seems to fit the bill.

For anyone who’s interested in running this on their own, here is the package.  I only tested on OS X, it may run on other UNIX stuff, it may not.  Extract it and run ‘testcomp.sh’.  And it may even work!  The only thing on OS X I had to add was ‘-Wno-return-type’ for compiling, as clang doesn’t like ancient source like this…