Adding deflate support to NihAV « Kostya's Boring Codec World

Since I wanted to do something different I decided to finally implement deflate support for NihAV—by which I mean compression support in addition to decompression. Here is how well it went.

As usual, my goal was to implement it in a mostly straightforward way but with reasonable speed, instead of having something completely on par with zlib or better.

At first I implemented the simplest form of compression – copying data without compression (but with the proper headers and an ADLER-32 checksum at the end). Then I added a simple encoding with fixed codes that simply outputs symbols as is—no compression yet, but at least it tests how well the bitstream is written. Then I moved to dynamic codes. Then I added a brute force search and started encoding matches. So by the end of the weekend I had something working already, and then I could make it faster and/or better.
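That first step—stored blocks plus the trailer—boils down to something like the following sketch (illustrative only, not NihAV's actual code): a zlib header, one or more BTYPE=00 blocks of at most 65535 bytes, and the ADLER-32 of the uncompressed data at the end.

```rust
// A minimal sketch of the "stored" (uncompressed) path, assuming nothing
// about NihAV's real API: zlib header, BTYPE=00 blocks, ADLER-32 trailer.
fn adler32(data: &[u8]) -> u32 {
    const MOD: u32 = 65521;
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + u32::from(byte)) % MOD;
        b = (b + a) % MOD;
    }
    (b << 16) | a
}

fn zlib_store(src: &[u8]) -> Vec<u8> {
    let mut dst = vec![0x78, 0x01]; // zlib header: deflate, 32K window, no dict
    let mut chunks = src.chunks(65535).peekable();
    loop {
        let chunk = chunks.next().unwrap_or(&[]);
        let last = chunks.peek().is_none();
        // BFINAL + BTYPE=00 occupy one byte-aligned byte here,
        // since a stored block realigns to a byte boundary anyway
        dst.push(if last { 1 } else { 0 });
        let len = chunk.len() as u16;
        dst.extend_from_slice(&len.to_le_bytes());     // LEN, little-endian
        dst.extend_from_slice(&(!len).to_le_bytes());  // NLEN = one's complement
        dst.extend_from_slice(chunk);
        if last { break; }
    }
    dst.extend_from_slice(&adler32(src).to_be_bytes()); // trailer is big-endian
    dst
}
```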

Of course the first thing to remember is that you can reduce search time by using some structure for a faster text search. I think a suffix trie is now popular, but I settled for an old-fashioned hash over three bytes. Initially it was twice as slow: while the number of string comparisons decreased a hundredfold, updating the hash table on each step consumed too much time. So I switched to a linked-list hash that somewhat resembles FAT (i.e. for each position in the input you have a pointer to the next location with the same three-letter hash, plus an additional table pointing to the start of the chain for each hash key). And I calculated it once per large block, simply discarding matches outside of the desired range. Of course this can be done better, but it worked fast enough for me.
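In code, that FAT-like structure is roughly the sketch below (names and constants are illustrative, not NihAV's): `head` points to the most recent position for each three-byte hash, `prev` links each position to the previous one with the same hash, and the whole thing is built once per block.

```rust
const HASH_BITS: usize = 15;
const HASH_SIZE: usize = 1 << HASH_BITS;
const NONE: usize = usize::MAX;

// hash of three consecutive bytes starting at `pos`
fn hash3(data: &[u8], pos: usize) -> usize {
    let h = (u32::from(data[pos]) << 16)
          | (u32::from(data[pos + 1]) << 8)
          |  u32::from(data[pos + 2]);
    ((h.wrapping_mul(2654435761)) >> (32 - HASH_BITS as u32)) as usize
}

struct MatchFinder {
    head: Vec<usize>, // hash -> latest position with that hash (or NONE)
    prev: Vec<usize>, // position -> previous position with the same hash
}

impl MatchFinder {
    // built once per large block, as described in the post
    fn new(data: &[u8]) -> Self {
        let mut head = vec![NONE; HASH_SIZE];
        let mut prev = vec![NONE; data.len()];
        for pos in 0..data.len().saturating_sub(2) {
            let h = hash3(data, pos);
            prev[pos] = head[h];
            head[h] = pos;
        }
        Self { head, prev }
    }

    // walk the chain for `pos` (which must have at least three bytes left),
    // discarding candidates outside the LZ77 window
    fn candidates(&self, data: &[u8], pos: usize, window: usize) -> Vec<usize> {
        let mut out = Vec::new();
        let mut cand = self.head[hash3(data, pos)];
        while cand != NONE {
            if cand < pos && pos - cand <= window {
                out.push(cand);
            }
            cand = self.prev[cand];
        }
        out
    }
}
```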

Now the compression. There are three main strategies I tried: the naïve aka greedy one (you simply output the longest match you can find at the current step), lazy (you also check whether the next position produces an even better match and use it if possible—surprisingly enough it gives a significant benefit) and theoretically optimal (you construct a trellis and see which combination of matches and literals gives you the best coding; it has issues but theoretically it’s the best one).
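The lazy strategy is a small tweak on top of the greedy one; a minimal sketch (with a hypothetical `find_best_match` helper, not NihAV's actual interface) looks like this:

```rust
enum Token {
    Literal(u8),
    Match { len: usize, dist: usize },
}

// Lazy matching: if the match starting one byte later is longer,
// emit a literal now and take the better match on the next step.
fn encode_lazy(data: &[u8], find_best_match: impl Fn(usize) -> (usize, usize)) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        let (len, dist) = find_best_match(pos);
        if len >= 3 {
            let (next_len, _) = if pos + 1 < data.len() {
                find_best_match(pos + 1)
            } else {
                (0, 0)
            };
            if next_len > len {
                // the next position promises a better match -- defer
                tokens.push(Token::Literal(data[pos]));
                pos += 1;
            } else {
                tokens.push(Token::Match { len, dist });
                pos += len;
            }
        } else {
            tokens.push(Token::Literal(data[pos]));
            pos += 1;
        }
    }
    tokens
}
```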

So why is it “theoretically optimal” and not just optimal? Because it needs accurate bit costs, and you can’t know those until you have produced all the symbols to be encoded and calculated the actual code lengths for them. Of course you can do it in an iterative process or employ a heuristic to predict the bit length somehow, but I simply used “9 bits for the symbol, and additionally 5 bits plus escape bits for the distance if it’s present”. I think in some cases it even produced larger files than lazy matching.
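That crude estimate, written out as a sketch (reusing the `Token` type from the sketch above; the extra-bits calculation assumes the standard deflate distance code ranges), would be something like:

```rust
// number of extra (escape) bits for a deflate distance code:
// distances 1..4 need none, then one more bit per doubling of the range
fn distance_extra_bits(dist: usize) -> u32 {
    if dist <= 4 { 0 } else { (usize::BITS - (dist - 1).leading_zeros()) - 2 }
}

// "9 bits for the symbol, plus 5 bits and escape bits for the distance"
fn estimated_cost(token: &Token) -> u32 {
    match token {
        Token::Literal(_) => 9,
        Token::Match { dist, .. } => 9 + 5 + distance_extra_bits(*dist),
    }
}
```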

Here is a list, from the top of my head, of things that can be improved (but I guess anybody who has written an LZ77-based compressor knows them better than me):

Of course some of it could be sped up simply by using unsafe Rust, so that no bounds checks are performed on array accesses, but I don’t think it’s worth it now.
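For illustration, that kind of change is just replacing a checked access with an unchecked one on a hot path (a sketch, not something NihAV does):

```rust
// safe indexing (`data[pos]`) checks the index on every access;
// the unsafe form trades that check for a promise that `pos` is in range
fn read_byte(data: &[u8], pos: usize) -> u8 {
    debug_assert!(pos < data.len());
    unsafe { *data.get_unchecked(pos) }
}
```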

And finally, here are some benchmarks for the curious, performed on a source file of the program:

As you can see it’s not fast, but it works. I also checked that the resulting compressed data decodes fine (plus some tests on large files that get split into several blocks). Now all that’s left is to implement the ZMBV decoder and encoder.
