1. Introduction

In 2017, I had the chance to work on a project that aimed to improve the performance of data compression. During the process, I studied the zlib library and the deflate compression algorithm it implements. Here I would like to share my findings with those who also want a better understanding of zlib.

Euccas, January 2019

1.1. What is zlib

zlib is a free, open-source software library for lossless data compression and decompression. It was written in C by Jean-loup Gailly (compression) and Mark Adler (decompression). The first version of zlib was released in May 1995. Jean-loup Gailly and Mark Adler also wrote the code for gzip (GNU zip); under the hood, gzip uses the zlib library.

Today, zlib is maintained mainly by Mark Adler, and its recent updates and releases can all be found on GitHub. Mark Adler is also active on Stack Overflow, answering technical questions about zlib and gzip.

A great many software applications and libraries use zlib. If you pay attention, you can find zlib almost everywhere: in operating systems, internet services, streaming services, document editors, and more. Here is an incomplete list of those applications.

The specification of zlib achieved official Internet RFC (Request for Comments) status in May 1996.

2. Compression Algorithm

The compression algorithm used in zlib is the deflate method, which encodes the input data into compressed data. The decompression algorithm used in zlib is the inflate method: the decoding process that takes a deflate bit stream and correctly reproduces the original full-size data or file.

In this document, I will focus on the compression part of zlib, as well as zlib’s implementation of the deflate algorithm.

I will use the terms “data bytes”, “bytes of data”, “data symbols”, “data stream”, and “bit stream” to refer to the data to be compressed. In what follows, these terms have the same meaning and are used interchangeably.

2.1. Deflate

The deflate method was originally defined by Phil Katz for PKWARE’s archiving tool PKZIP 2.x. It is a combination of the LZ77 algorithm and Huffman encoding.

The following figure illustrates the deflate and inflate process from a high level.

Deflate Process

2.2. LZ77

LZ77 is a dictionary based lossless compression algorithm. It is also known as LZ1.

The basic idea of dictionary based algorithms is to replace an occurrence of a particular sequence of bytes in data with a reference to a previous occurrence of that sequence.

There are two main types of dictionary based compression algorithms: LZ77 and LZ78. These two algorithms are named after their creators, Abraham Lempel and Jacob Ziv. LZ77 (Lempel-Ziv 77) and LZ78 (Lempel-Ziv 78) were published in papers in 1977 and 1978, respectively.

The LZ77 compression algorithm works by using a sliding window to find repeated sequences of data, and encoding each repeated sequence as a pair of numbers called a length-distance pair.

2.2.1. Sliding Window

The sliding window is used to examine the input data sequence and to maintain the historical data that serves as the dictionary. In other words, the dictionary is a portion of the previously seen and encoded data.

The sliding window consists of two parts: a search buffer and a look-ahead buffer. The search buffer contains the dictionary (the recently encoded data), and the look-ahead buffer contains the next portion of the input data sequence to be encoded. The following figure gives an example of a sliding window.

Sliding Window

The size of the sliding window is one of the key factors affecting compression performance. If the sliding window is too small, the compressor finds fewer repeated data sequences, so the compressed file is larger. If the sliding window is too large, the compressor may need to spend more time finding repeated data sequences, so compression is slower.

In practice, the size of the sliding window typically ranges from a few KB to a few MB, for example 4 KB, 32 KB, 1 MB, or 4 MB.

2.2.2. Length-Distance Pair

The length-distance pair indicates that each of the next length characters is the same as the character exactly distance characters behind it in the original data stream.

In the LZ77 algorithm, the compressor searches back through the search buffer until it finds a match for the first character in the look-ahead buffer. More than one match may exist in the search buffer; the compressor picks the match with the longest length. When the longest match is found, the compressor encodes it with a triple (D, L, C) where:

  • D = distance of the search cursor from the start of look-ahead buffer
  • L = length of longest match
  • C = next character in look-ahead buffer beyond longest match

The reason for including the third element C in the triple is to handle the case where no match is found in the search buffer. In that case, both D and L are 0, and C is the first character in the current look-ahead buffer.

The following figure shows an example of how LZ77 finds the longest match and encodes the repeated characters for the string “axrrmaxrbaxssr”.

Length Distance Pair

In practice, a compressor can optimize the encoding output according to its own implementation, and choose output formats other than the (D, L, C) triple.
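To make the (D, L, C) output concrete, here is a minimal C sketch of an LZ77 encoder. Everything in it is illustrative (the window size, the brute-force search, the function name); real compressors, including zlib, are far more elaborate:

#include <stdio.h>
#include <string.h>

#define WINDOW 4096   /* illustrative search buffer size */

/* Emit (D, L, C) triples for the whole input. D = L = 0 means "no
 * match"; C is always the next literal character after the match. */
static void lz77_encode(const char *data, size_t n) {
    size_t pos = 0;
    while (pos < n) {
        size_t best_len = 0, best_dist = 0;
        size_t start = pos > WINDOW ? pos - WINDOW : 0;
        for (size_t cand = start; cand < pos; cand++) {
            size_t len = 0;
            /* Extend the match; keep one byte in reserve for C. */
            while (pos + len + 1 < n && data[cand + len] == data[pos + len])
                len++;
            if (len > best_len) { best_len = len; best_dist = pos - cand; }
        }
        printf("(%zu, %zu, '%c')\n", best_dist, best_len, data[pos + best_len]);
        pos += best_len + 1;
    }
}

int main(void) {
    const char *s = "axrrmaxrbaxssr";   /* the example string above */
    lz77_encode(s, strlen(s));
    return 0;
}

For “axrrmaxrbaxssr” this prints, among others, (5, 3, 'b') for the second occurrence of “axr”, matching the figure above.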

2.3. Huffman Encoding

Huffman encoding is a statistical compression method. It encodes data symbols (such as characters) with variable-length codes, where the lengths of the codes are based on the frequencies of the corresponding symbols.

Huffman encoding, as well as other variable-length coding methods, has two properties:

  1. Codes for more frequently occurring data symbols are shorter than codes for less frequently occurring data symbols.
  2. Each code can be uniquely decoded. This requires the codes to be prefix codes, meaning no code for one symbol is a prefix of the code for another symbol. For example, if code “0” is used for symbol “A”, then code “01” cannot be used for symbol “B”, since “0” is a prefix of “01”. The prefix property guarantees that, when decoding, there is no ambiguity in determining where the symbol boundaries are.

Huffman encoding has two steps:

  1. Build a Huffman tree from original data symbols. A Huffman tree is a binary tree structure.
  2. Traverse the Huffman Tree and assign codes to data symbols.

Huffman codes can be fixed (static) or dynamic. Both are used in deflate method.

Fixed Huffman codes were created by examining a large number of data sets and choosing typical code lengths. When fixed Huffman coding is used, the same predefined codes are used for all input data symbols.

Dynamic Huffman codes are generated by breaking input data into blocks, and generating codes for each data block.
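To make the tree-building and code-assignment steps concrete, here is a minimal, self-contained C sketch that builds a Huffman tree for four symbols with made-up frequencies and prints the resulting codes. The names and the simple O(n^2) minimum selection are illustrative; zlib's actual tree builder (in trees.c) works quite differently:

#include <stdio.h>

#define NSYM 4

struct node { long freq; int left, right; };   /* left/right = -1 for a leaf */

/* Walk the tree, assigning '0' to left edges and '1' to right edges. */
static void print_codes(struct node *t, int i, char *buf, int depth,
                        const char *syms) {
    if (t[i].left < 0) {                       /* leaf: print its code */
        buf[depth] = '\0';
        printf("%c (freq %ld): %s\n", syms[i], t[i].freq, buf);
        return;
    }
    buf[depth] = '0';
    print_codes(t, t[i].left, buf, depth + 1, syms);
    buf[depth] = '1';
    print_codes(t, t[i].right, buf, depth + 1, syms);
}

int main(void) {
    const char syms[NSYM] = { 'A', 'B', 'C', 'D' };
    long freqs[NSYM]      = { 45,  13,  12,  30  };   /* example histogram */

    struct node tree[2 * NSYM - 1];
    int active[2 * NSYM - 1], n = NSYM, next = NSYM;
    for (int i = 0; i < NSYM; i++) {
        tree[i].freq = freqs[i]; tree[i].left = tree[i].right = -1;
        active[i] = i;
    }
    /* Repeatedly merge the two lowest-frequency nodes into a new node. */
    while (n > 1) {
        int a = 0, b = 1;
        if (tree[active[b]].freq < tree[active[a]].freq) { a = 1; b = 0; }
        for (int i = 2; i < n; i++) {
            if (tree[active[i]].freq < tree[active[a]].freq) { b = a; a = i; }
            else if (tree[active[i]].freq < tree[active[b]].freq) b = i;
        }
        tree[next].freq  = tree[active[a]].freq + tree[active[b]].freq;
        tree[next].left  = active[a];
        tree[next].right = active[b];
        /* Replace the two merged entries with the new internal node. */
        int hi = a > b ? a : b, lo = a < b ? a : b;
        active[lo] = next;
        active[hi] = active[n - 1];
        n--; next++;
    }
    char buf[NSYM + 1];
    print_codes(tree, active[0], buf, 0, syms);
    return 0;
}

With these frequencies, the most frequent symbol A gets a 1-bit code and the least frequent symbols B and C get 3-bit codes, illustrating property 1 above.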

3. Implementation of zlib

The implementation of zlib is pragmatic and efficient. Over the past 20 years, people have made many attempts to improve the performance of compression applications, but it seems better performance only comes from using algorithms other than deflate (and inflate), adopting parallel processing, or exploiting CPU-level instructions. So zlib, as its GitHub repository states, remains a massively spiffy yet delicately unobtrusive compression library.

In the following sections, we will look at some of the detailed techniques that zlib uses to implement the deflate compression algorithm.

3.1. Compression Levels

zlib has 10 compression levels (0-9). Different levels trade off compression ratio against speed. Level 0 means no compression: zlib outputs the original data. Level 1 is the fastest, but has a low compression ratio. Level 9 gives the highest compression ratio, but the slowest speed. The default compression level is 6.

Under the hood, the compression level changes the deflate strategy and parameters used in the deflate process. More details are discussed in the following sections.
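From the caller's point of view, the level is simply a parameter of the public API. For instance, using zlib's real compress2 convenience function (link with -lz):

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void) {
    const char *input = "hello hello hello hello";
    uLong src_len = (uLong)strlen(input) + 1;   /* include the NUL */
    Bytef dst[256];
    uLongf dst_len = sizeof(dst);

    /* Level 6 is the default; 1 is fastest, 9 compresses best. */
    int ret = compress2(dst, &dst_len, (const Bytef *)input, src_len, 6);
    if (ret != Z_OK) return 1;

    printf("compressed %lu bytes into %lu bytes\n",
           (unsigned long)src_len, (unsigned long)dst_len);
    return 0;
}

Changing the last argument from 6 to 1 or 9 trades compression ratio for speed as described above.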

3.2. Sliding Window

In zlib, the default size of the sliding window is 64KB. The sliding window is divided into two parts, corresponding to the search buffer and the look-ahead buffer, each 32KB. Input bytes are read into the second half of the window and later moved to the first half, to keep a dictionary of at least 32KB. This organization ensures that I/O is always performed with a length that is a multiple of the block size (8KB). The 64KB limit was also quite useful on older platforms such as MS-DOS.

The following code snippet shows how the sliding window is initialized. The macro MAX_WBITS determines the size of the sliding window. It is configurable, and the default value is 15, which leads to a 32KB search buffer and a 64KB sliding window.

#define MAX_WBITS   15 /* 32K LZ77 window */

s->w_bits = windowBits;
s->w_size = 1 << s->w_bits;
s->w_mask = s->w_size - 1;

s->window = (Bytef *) ZALLOC(strm, s->w_size, 2*sizeof(Byte));

Data is copied into the sliding window when the look-ahead buffer becomes insufficient. This process is implemented in the function fill_window.

local void fill_window(s)
    deflate_state *s;
{
    ...
}

3.3. Finding Longest Match

The technique zlib uses to find the longest match in the search buffer is straightforward, and it turns out to be the fastest for most input files: use a string matching algorithm to find possible matches, then try all possible matches and select the longest.

The matching algorithm for small strings is inspired by the Rabin-Karp algorithm. The key feature of this algorithm is that insertions into the string dictionary are very simple and thus fast, and deletions are avoided completely. Insertions are performed at each input character, whereas string matches are performed only when the previous match ends. So it is preferable to spend more time in matches, allowing very fast string insertions and avoiding deletions. A brute-force approach is used to find longer strings once a small match has been found.

So in summary, the process of finding the longest match has two major parts:

  1. In the sliding window, for each data symbol in the look-ahead buffer, use a Rabin-Karp-based method to find possible matches in the search buffer, and record each match start position. Multiple possible matches may exist.
  2. For each possible match, starting from the match start position, compare the following data symbols to measure the length of the match. Search all the possible matches found in step 1 until the longest match is found, or until a match reaches the predefined match length limit. (A sketch of this extension step follows the list.)
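Here is a hedged C sketch of the extension step in step 2 (the helper name and the tiny main are mine; zlib's real longest_match function is far more heavily optimized):

#include <stdio.h>

#define MAX_MATCH 258

/* Count how many bytes match between a candidate position from the
 * hash chain and the current position (start of the look-ahead buffer). */
static unsigned match_length(const unsigned char *window,
                             unsigned candidate, unsigned current,
                             unsigned avail) {     /* bytes of look-ahead */
    unsigned limit = avail < MAX_MATCH ? avail : MAX_MATCH;
    unsigned len = 0;
    while (len < limit && window[candidate + len] == window[current + len])
        len++;
    return len;
}

int main(void) {
    const unsigned char *w = (const unsigned char *)"axrrmaxrbaxssr";
    /* candidate 0 ("axrr...") vs current 5 ("axrb..."), 9 bytes left */
    printf("match length: %u\n", match_length(w, 0, 5, 9));  /* prints 3 */
    return 0;
}

Each candidate from the hash chain is scored this way, and the longest result wins, subject to the search limits described in section 3.3.4.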

3.3.1. Match Length Limit

zlib defines a MIN_MATCH and a MAX_MATCH as the minimum and maximum match lengths it searches for.

#define MIN_MATCH  3
#define MAX_MATCH  258

MIN_MATCH is set to 3. The reason for a minimum match length of 3 is straightforward: matches shorter than 3 bytes do not help reduce the encoded data size, because the encoded symbols would be the same length as, or longer than, the literals they replace.

The value of MIN_MATCH cannot be changed without changing related code, such as the number of times the UPDATE_HASH macro is called.

MAX_MATCH is set to 258. This number comes from the fact that one length-distance pair, the output of LZ77 encoding, can represent at most 258 bytes. In the extreme case, a length requires at least one bit and a distance requires at least one bit, so two bits in can give 258 bytes out.

The value of MAX_MATCH in zlib can be changed, but doing so may affect compression performance. zlib also contains some logic guarded by the condition MAX_MATCH == 258. That code, when enabled, can improve compression performance with a modern compiler.

#if (defined(UNALIGNED_OK) && MAX_MATCH == 258)
        /* This code assumes sizeof(unsigned short) == 2. Do not use
         * UNALIGNED_OK if your compiler uses a different size.
         */
        if (*(ushf*)(match+best_len-1) != scan_end ||
            *(ushf*)match != scan_start) continue;
        ...

3.3.2. Rabin-Karp Algorithm

The Rabin-Karp algorithm is a string-searching algorithm created by Richard M. Karp and Michael O. Rabin. It uses hashing to find any one of a set of pattern strings in a text. For example, given the text “AABAACAADAABAABA” and the pattern “AABA”, the Rabin-Karp algorithm finds the pattern in the text at indices 0, 9, and 12.

The following pseudocode describes how the Rabin-Karp algorithm works.

# p is a pattern, its length is m
# t is the text, its length is n
# the algorithm searches for pattern p in text t

Compute hash_p (for pattern p)
Compute hash_t (for the first substring of t with length m)
for i = 0 to n - m:
    if hash_p == hash_t:
        Compare t[i .. i+m-1] with p, and report a match at i if they are equal
    Update hash_t for t[i+1 .. i+m] using the rolling hash
End

The average and best-case running time of the Rabin-Karp algorithm is O(n+m), where n is the length of the text and m is the length of the pattern, but its worst-case time is O(nm). The worst case occurs when the hash values of all substrings of the text match the hash value of the pattern, for example when all characters of the pattern and the text are the same, such as text = “AAAAAAA” and pattern = “AAA”.

The key to the Rabin-Karp algorithm’s performance is computing the hash values of the successive substrings of the text efficiently, using the rolling hash technique. The benefit of a rolling hash is that it computes the hash value of the next substring from the previous one with only a constant number of operations, rather than rehashing the complete substring.
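A minimal C sketch of Rabin-Karp with a polynomial rolling hash (the base B and the prime modulus M are illustrative choices):

#include <stdio.h>
#include <string.h>

#define B 256UL        /* base of the polynomial hash */
#define M 1000003UL    /* modulus: a prime, illustrative choice */

int main(void) {
    const char *t = "AABAACAADAABAABA", *p = "AABA";
    size_t n = strlen(t), m = strlen(p);

    unsigned long hp = 0, ht = 0, bpow = 1;
    for (size_t i = 0; i + 1 < m; i++) bpow = (bpow * B) % M;  /* B^(m-1) */
    for (size_t i = 0; i < m; i++) {
        hp = (hp * B + (unsigned char)p[i]) % M;
        ht = (ht * B + (unsigned char)t[i]) % M;
    }
    for (size_t i = 0; i + m <= n; i++) {
        /* On a hash hit, verify with memcmp to rule out collisions. */
        if (hp == ht && memcmp(t + i, p, m) == 0)
            printf("match at index %zu\n", i);
        if (i + m < n) {   /* roll the hash: drop t[i], add t[i+m] */
            unsigned long drop = (unsigned char)t[i] * bpow % M;
            ht = ((ht + M - drop) * B + (unsigned char)t[i + m]) % M;
        }
    }
    return 0;
}

Running it prints matches at indices 0, 9, and 12, the same result as the example above.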

3.3.3. Hash Chain

As explained earlier, the Rabin-Karp algorithm checks the hash values of substrings in order to find matches in the text. To find matches in the search buffer, which stores recent data symbols, zlib uses a hash chain organization to keep records of the hash values of every MIN_MATCH (3) bytes.

The hash chain in zlib is implemented using two arrays: prev[] and head[]. Both arrays store positions in the sliding window. The head[] array stores the heads of the hash chains, and the prev[] array links the positions of strings that share the same hash index. The following figure shows an example of how the hash chain works.

Hash Chain

In this figure, the HashValue function and its results are only illustrative; they are not the actual values.

The size of prev[] is limited to half of the sliding window, because the links prev[] maintains only cover the data in the search buffer, which by default holds the last 32K strings. An index into the prev[] array is a window index modulo 32K.

The following code snippets show how zlib implements the hash chain organization. The hash size changes with the memLevel parameter; zlib's default is DEF_MEM_LEVEL = 8 regardless of compression level.

s->hash_bits = (uInt)memLevel + 7;
s->hash_size = 1 << s->hash_bits;

s->prev   = (Posf *)  ZALLOC(strm, s->w_size, sizeof(Pos));
s->head   = (Posf *)  ZALLOC(strm, s->hash_size, sizeof(Pos));

#define UPDATE_HASH(s,h,c) (h = (((h)<<s->hash_shift) ^ (c)) & s->hash_mask)

#define INSERT_STRING(s, str, match_head) \
   (UPDATE_HASH(s, s->ins_h, s->window[(str) + (MIN_MATCH-1)]), \
    match_head = s->prev[(str) & s->w_mask] = s->head[s->ins_h], \
    s->head[s->ins_h] = (Pos)(str))

#define CLEAR_HASH(s) \
    s->head[s->hash_size-1] = NIL; \
    zmemzero((Bytef *)s->head, (unsigned)(s->hash_size-1)*sizeof(*s->head));

In CLEAR_HASH, the head[] array is cleared. The prev[] array is cleared on the fly, not here.
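To make the head[]/prev[] organization concrete, here is a small, self-contained C demo (the hash3 function, the table sizes, and the input string are illustrative, not zlib's): it inserts every position of a short window into the chains, then walks the chain for “abc” to enumerate candidate match positions, newest first.

#include <stdio.h>

#define W_SIZE 32768u             /* search buffer size (zlib default) */
#define W_MASK (W_SIZE - 1)
#define H_SIZE 1024u              /* illustrative hash table size */
#define NIL    0                  /* as in zlib: 0 ends a chain */

static unsigned short head[H_SIZE];   /* newest position per hash bucket */
static unsigned short prev_[W_SIZE];  /* links to older positions */

static unsigned hash3(const unsigned char *p) {   /* hash of 3 bytes */
    return (p[0] * 33u * 33u + p[1] * 33u + p[2]) % H_SIZE;
}

/* Insert position pos into its chain and return the previous chain head
 * (the most recent earlier candidate), as zlib's INSERT_STRING does. */
static unsigned insert_pos(const unsigned char *win, unsigned pos) {
    unsigned h = hash3(win + pos);
    unsigned match_head = head[h];
    prev_[pos & W_MASK] = (unsigned short)match_head;
    head[h] = (unsigned short)pos;
    return match_head;
}

int main(void) {
    const unsigned char *win = (const unsigned char *)"XabcYabcZabc";
    for (unsigned pos = 0; pos + 3 <= 12; pos++)
        insert_pos(win, pos);

    unsigned cur = head[hash3((const unsigned char *)"abc")];
    while (cur != NIL) {                /* prints 9, 5, 1 */
        printf("candidate position %u\n", cur);
        cur = prev_[cur & W_MASK];
    }
    return 0;
}

Note that, as in zlib, NIL is 0, so a chain ends when it reaches position 0; zlib handles this corner case with additional guards in longest_match.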

3.3.4. Adaptive Search Limit

When searching for the longest match in a hash chain, zlib limits the chain length it searches, to improve search efficiency. The search limit is set by:

  1. The predefined max_chain_length value. This value differs between compression levels.
  2. During the search, once a match has been found whose length is at least the predefined good_match length, the remaining search is shortened: new_search_chain_length = search_chain_length / 4. (A small illustration follows the configuration table below.)

The search limit values are defined in a configuration table:

local const config configuration_table[10] = {
/*      good lazy nice max_chain_length */
/* 0 */ {0,    0,  0,    0, deflate_stored},  /* store only */
/* 1 */ {4,    4,  8,    4, deflate_fast},    /* max speed, no lazy matches */
/* 2 */ {4,    5, 16,    8, deflate_fast},
/* 3 */ {4,    6, 32,   32, deflate_fast},
/* 4 */ {4,    4, 16,   16, deflate_slow},    /* lazy matches */
/* 5 */ {8,   16, 32,   32, deflate_slow},
/* 6 */ {8,   16, 128, 128, deflate_slow},
/* 7 */ {8,   32, 128, 256, deflate_slow},
/* 8 */ {32, 128, 258, 1024, deflate_slow},
/* 9 */ {32, 258, 258, 4096, deflate_slow}};  /* max compression */

In the above code snippet:

  • 0 - 9 are compression levels.
  • good, lazy, and nice are the lengths of a good match, a lazy match, and a nice match.
  • max_chain_length is the maximum hash chain length zlib searches.
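As a small runnable illustration of the adaptive limit (the constants are taken from the level-6 row above; the variable names are mine, not zlib's):

#include <stdio.h>

int main(void) {
    /* Level-6 parameters from the configuration table. */
    unsigned good_match = 8, nice_match = 128, max_chain = 128;

    unsigned prev_length = 10;          /* a match we already found */
    unsigned chain_length = max_chain;

    /* Already have a good match: search only a quarter of the chain. */
    if (prev_length >= good_match)
        chain_length >>= 2;             /* 128 / 4 = 32 entries */

    printf("examine at most %u chain entries\n", chain_length);
    printf("stop early once a match reaches %u bytes\n", nice_match);
    return 0;
}

In zlib itself, the corresponding logic lives in longest_match: the chain walk stops when chain_length entries have been examined, and ends early as soon as a match of at least nice_match bytes is found.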

3.4. Huffman Encoding

zlib implements both static (fixed) Huffman encoding and dynamic Huffman encoding. For each block of LZ77-encoded data, zlib computes the number of bits in the block using both static and dynamic Huffman encoding, then chooses the method that produces the smaller amount of data. If the two methods produce the same number of bits, zlib chooses static Huffman encoding, since its decoding is faster.

The whole data stream can contain a mix of static and dynamic Huffman encoded blocks. The Huffman codes are transmitted in the header of each block.

In summary, the Huffman encoding process in zlib consists of the following steps:

  1. During LZ77 encoding process, collect statistics of data bytes and generate a histogram.
  2. For each data block, build a dynamic Huffman tree using the collected statistics.
  3. Compute and compare the encoded block lengths using a dynamic Huffman tree and a static Huffman tree, and decide whether it is worthwhile to use a dynamic or a static tree for the encoding phase.
  4. Perform dynamic or static Huffman encoding on the block of data.
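A hedged sketch of the decision in step 3 (the byte counts are made-up inputs rather than computed values; in zlib the corresponding logic lives in _tr_flush_block, which compares the dynamic, static, and stored costs):

#include <stdio.h>

int main(void) {
    /* Made-up costs, in bytes, for encoding one block three ways. */
    unsigned long opt_lenb    = 1200;  /* dynamic Huffman tree */
    unsigned long static_lenb = 1250;  /* predefined static tree */
    unsigned long stored_lenb = 4100;  /* stored, uncompressed */

    /* Prefer static on a tie: its decoding is faster and the header
     * does not need to describe the code lengths. */
    if (static_lenb <= opt_lenb)
        opt_lenb = static_lenb;

    if (stored_lenb <= opt_lenb)
        printf("emit a stored block\n");
    else if (opt_lenb == static_lenb)
        printf("emit a static Huffman block\n");
    else
        printf("emit a dynamic Huffman block\n");
    return 0;
}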

3.5. I/O Buffer

One important notion in deflate compression is the data block. The deflate compressed data format is composed of blocks, each with a header that depends on the block data. Therefore the output of deflate comes one block at a time, with nothing written (except a zlib or gzip header) until the first block is completed. Given how data blocks move through the deflate process, zlib implements several buffers to stage data blocks and control I/O performance.

3.5.1. Input Buffer

Before starting compression, zlib accumulates data in an input buffer, and starts compression when the input buffer is full. The default input buffer size is 8KB.

The reason for the input buffer is that zlib does not generate any output until 16K data symbols have accumulated in the literal buffer (the default size of the literal buffer is 16K; see the next section for details). Starting compression with a very short piece of data therefore has no benefit. The input buffer improves the efficiency of I/O.

The default input buffer size can be changed with the gzbuffer function.

gzbuffer(file, size)
{
    ...
    state->want = size;
    ...
}
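For example, using the real gzopen/gzbuffer/gzwrite API (gzbuffer must be called before the first read or write on the file):

#include <zlib.h>

int main(void) {
    gzFile f = gzopen("out.gz", "wb6");     /* "wb6" = write, level 6 */
    if (f == NULL) return 1;
    gzbuffer(f, 64 * 1024);                 /* enlarge the 8K default */
    gzwrite(f, "hello hello hello\n", 18);
    gzclose(f);
    return 0;
}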

3.5.2. Literal Buffer

The literal buffer stores data symbols produced by LZ77 encoding. A symbol is either a single byte, coded as a literal, or a length-distance pair, which codes a copy of up to 258 bytes from somewhere in the preceding 32K of uncompressed data. The default literal buffer size is 16K symbols, so it can represent from 16K up to as much as 4MB of uncompressed data (for highly compressible data).

Once the literal buffer is full, zlib decides what kind of block to construct for Huffman encoding and then does so: it creates the header, which for a dynamic block describes the Huffman codes used in the block, and then writes the coded symbols for that block. Alternatively it creates a stored or static block, whichever results in the fewest bits. Only then does compressed data become available for output.

The default literal buffer size is configured by the macro DEF_MEM_LEVEL. In zlib’s code, DEF_MEM_LEVEL = 8, the same for all compression levels, so all compression levels use the same 16K literal buffer.

3.5.3. Output Buffer

To output the compressed data, zlib uses two buffers: a pending buffer and an output buffer. The data flow is shown in the following figure:

Output Buffer

Upon initialization, zlib creates a pending buffer (default size 36K) and an output buffer (default size 8K). Output data is first accumulated in the pending buffer, then copied to the output buffer, and finally written to the compressed zip or gz file.

Data is copied from the pending buffer to the output buffer in the function flush_pending. This function is called when the literal buffer is full, meaning a block of data has been processed; it is also called in some other cases when a block must be flushed. The amount of data copied is limited by the available space in the output buffer.

When the output buffer is full, or when a flush is requested, zlib writes the output buffer to the zip or gz file.

zlib uses the counters pending and avail_out to track the two buffers: pending counts the bytes waiting in the pending buffer, and avail_out records the remaining space in the output buffer. An avail_out value of 0 means the output buffer is full.

Related code snippets:

flush_pending() {
    len = s->pending;
    if (len > strm->avail_out) len = strm->avail_out;
    if (len == 0) return;

    zmemcpy(strm->next_out, s->pending_out, len);
}
gzcomp() {
    if (strm->avail_out == 0 || (flush != Z_NO_FLUSH &&
            (flush != Z_FINISH || ret == Z_STREAM_END))) {
        have = (unsigned)(strm->next_out - state->x.next);
        if (have && ((got = write(state->fd, state->x.next, have)) < 0 ||
                     (unsigned)got != have)) {
            gz_error(state, Z_ERRNO, zstrerror());
            return -1;
        }
        if (strm->avail_out == 0) {
            strm->avail_out = state->size;
            strm->next_out = state->out;
        }
    }
}
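The interaction between avail_out and the output buffer is easiest to see in the canonical deflate loop; the following is a compact variant of the pattern in zlib's usage example zpipe.c (error handling abbreviated):

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define CHUNK 8192

int main(void) {
    unsigned char in[] = "some input data to compress";
    unsigned char out[CHUNK];
    z_stream strm;
    memset(&strm, 0, sizeof(strm));
    if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK) return 1;

    strm.next_in  = in;
    strm.avail_in = sizeof(in);
    do {
        /* Give deflate a fresh output buffer on each pass ... */
        strm.next_out  = out;
        strm.avail_out = CHUNK;
        deflate(&strm, Z_FINISH);   /* return-value checks omitted */
        /* ... and write out whatever it produced (CHUNK - avail_out). */
        fwrite(out, 1, CHUNK - strm.avail_out, stdout);
    } while (strm.avail_out == 0);  /* avail_out == 0: buffer was filled */

    deflateEnd(&strm);
    return 0;
}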

3.6. Misc.

This section is empty and waiting for my future updates.

4. Optimizing zlib

More than a few companies have been interested in improving compression performance by optimizing the implementation of zlib. Below is a summary of some recent related work.

4.1. Intel: zlib-new

Reference: Intel: High Performance ZLIB compression on Intel Architecture Processors (pdf)

The optimizations focus on improved hashing, the search for the longest prefix match of substrings in the LZ77 process, and the Huffman code flow.

Improved hashing: For compression levels 1 through 5, hash elements as quadruplets (match at least 4 bytes). For compression levels 6 to 9, use zlib’s original hashing of triplets (match at least 3 bytes).

Add two additional strategies: DEFLATE_quick for level 1, and DEFLATE_medium for levels 4 to 6.

  • DEFLATE_quick: Limit hash chain search to the first entry.
  • DEFLATE_medium: Designed to strike a balance between the compression ratio of zlib’s DEFLATE_slow strategy and the performance of zlib’s DEFLATE_fast strategy. It skips forward by the match length after a match is found, and supports a variant of lazy match evaluation: when a match of length l is found at position p, it checks for a match at position p + l + 1. If a new match is found, it scans backwards from position p + l + 1 to determine whether the second match can be improved.

Faster hash table shifting: Leverage Intel SSE instructions to shift eight hash entries (16 bytes) at a time.

Faster CRC calculation: Leverage the Intel PCLMULQDQ instruction to process 64 bytes of input at a time, with an altered algorithm.

Reduce loop unrolling: Remove the excessive loop unrolling in the Adler-32 and CRC-32 computations on modern processors. For Adler-32, reduce the unrolling factor from 16 to 8; for CRC-32, from 8 to 4.

4.2. IBM: fast deflate

Reference: IBM: A fast implementation of Deflate

The optimizations focus on improving the speed of zlib’s deflate process by using LZ4’s repetition elimination and an improved Huffman coding process.

Replace LZ77 with LZ4 for faster repetition elimination: The observed speed-up is moderate; it does not reach LZ4’s own performance, probably due to differences in cache usage. The compression ratio is hardly affected.

Force the use of a static Huffman tree: Fast, but can have a negative effect on compression ratio.

Semi-static Huffman encoding: Use a static tree to compress the first block of intermediate data (LZ77’s output), and collect statistics from this first block to build a Huffman tree for encoding the later blocks. The first block’s size is configurable, with an 8KB default. The sizes of the other blocks are also configurable, with a 128KB default. An additional step evaluates the statistics of the initial block before building the Huffman tree, to determine whether a dynamic tree is worthwhile. This approach achieves both faster speed and a better compression ratio.

4.3. Facebook: Zstandard

Reference: Facebook Zstandard (zstd) design introduction

Zstandard on GitHub

Zstandard is designed to scale with modern hardware and to compress smaller and faster, providing general-purpose compression for a variety of data types. It improves upon zlib by combining several recent innovations and targeting modern hardware:

  • Increase the window size to 1MB - 8MB (recommended).
  • In compression, use Finite State Entropy, based on ANS (Asymmetric Numeral Systems), to improve performance and reduce latency.
  • Use repcode modeling to efficiently compress structured data.
  • Use a branchless design style to reduce the overhead of the CPU branch predictor.
  • In decompression, separate data into multiple parallel streams and use a new-generation Huffman decoder to decode multiple symbols in parallel on a single core. This takes advantage of modern CPUs’ ability to issue several instructions per cycle.

4.4. Google: Zopfli, Brotli

Reference: Data compression using Zopfli (pdf), Comparison of Brotli, Deflate, Zopfli, LZMA, LZHAM and Bzip2 Compression Algorithms (pdf)

Zopfli on GitHub

Brotli on GitHub

In 2013 Google launched Zopfli, which is compatible with the deflate format. Zopfli produces smaller compressed output than gzip (3.7-8.3% smaller), but compresses more slowly than gzip level 9. The Zopfli library can only compress, not decompress.

Brotli implements a new compression format and a more efficient algorithm than deflate. It is similar in speed to deflate but offers denser compression.

The Brotli compression format is defined in RFC 7932. The algorithm uses a combination of a modern variant of the LZ77 algorithm, Huffman coding and 2nd order context modeling. It also uses a static dictionary containing more than 13K words. The static dictionary is helpful for compressing short files.

4.5. Apple: LZFSE

Reference: Apple Data Compression: LZFSE

LZFSE is designed for iOS and macOS to achieve much higher energy efficiency and speed (between 2x and 3x faster).

The LZFSE algorithm uses LZ77 and Finite State Entropy encoding.

4.6. CloudFlare: zlib

Reference: CloudFlare fights cancer: The Unexpected Benefit Of Open Sourcing Our Code

CloudFlare zlib fork on GitHub

CloudFlare’s improvements on zlib include:

  • Use uint64_t to replace the 16-bit types.
  • Use an improved hash function, the iSCSI CRC32. This function is implemented as a hardware instruction on Intel processors, so it is very fast and has better collision properties.
  • Search for matches of at least 4 bytes.
  • Use SIMD instructions for window rolling.
  • Use Intel’s hardware carry-less multiplication instruction PCLMULQDQ for the CRC32 checksum.
  • Optimize the longest-match function, the most performance-demanding function in the library.

Other experiments, not yet included in the released zlib fork:

  • Use an improved version of the linked list (hash chain) used in zlib.

Benchmarking results

  • The testing datasets are four standard corpora consisting of ASCII text, bitmap images, numbers, and source code files.
  • Speed: In general, CloudFlare’s zlib is faster than zlib and Intel’s zlib on levels 2 to 9, especially 6 to 9 (2x - 7.5x).
  • Compression ratio: Very similar to zlib on all levels, and better than Intel’s zlib on level 1.

4.7. CloudFlare: preset dictionary

Reference: CloudFlare: improve compression with preset deflate dictionary

The optimizations aim to improve compression of HTML files (mostly short texts), reducing compressed output by roughly 25%.

Change the minimal match length to 4 bytes.

Use a preset dictionary, so that the beginning of the input has back references to use when searching for matches.

  • Create the preset dictionary by performing pseudo-LZ77 over a set of files to be compressed, finding strings that DEFLATE would not compress in the first 16KB of each input file. Then perform a frequency count of the individual strings, and score them according to their length and frequency. The highest-scoring strings are saved into the dictionary file.
  • Use a Go program to make such deflate dictionary: dictator on GitHub


About

This document is written by Euccas Chen. Initial version was published on January 5th, 2019.

The web pages are generated by Hugo, an open-source static site generator, and themed with Kraiklyn.

The web page style and logo design are inspired by GitBook.