You can explicitly set windowLog with --zstd=windowLog=...
It's sometimes useful to combine a low-ish compression level with a large window size, e.g. when the input data contains multiple similar large chunks that do not all fit into the window a low compression level would use by default.
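
To illustrate why window size matters for repeated large chunks, here is a small sketch. It is only an analogy: it uses Python's standard `zlib` module (DEFLATE) rather than zstd, with `wbits` playing the role that windowLog plays in zstd. The chunk size and the two window sizes are arbitrary choices for the demonstration, and the sketch assumes the same principle carries over to zstd's much larger windows.

```python
import random
import zlib


def deflate_size(data, wbits):
    """Return the compressed size of `data` with a 2**wbits byte window."""
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return len(comp.compress(data) + comp.flush())


# One 16 KiB chunk of incompressible pseudo-random bytes, repeated twice.
rng = random.Random(0)
chunk = bytes(rng.getrandbits(8) for _ in range(16 * 1024))
data = chunk + chunk

# 1 KiB window: the first copy is out of reach by the time the second arrives.
small_window = deflate_size(data, wbits=10)
# 32 KiB window: the second copy can be encoded as references to the first.
large_window = deflate_size(data, wbits=15)

print("1 KiB window: ", small_window, "bytes")
print("32 KiB window:", large_window, "bytes")
```

With the small window the output should stay close to the full input size, while the large window should need only a little more than one chunk's worth of bytes, at the same compression level.
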
At work we've recently been using zstd as a better-compressing alternative to gzip, and overall I've been pretty happy with it. A minor documentation gripe, though, is that the behavior around multithreaded compression is a bit unclear. I understand that it splits the input into chunks and hands them to different threads to parallelize compression, so I should expect better use of threads on larger files because there are more chunks to spread around. But what exactly is the relationship between file size and the number of chunks?
When I look in man zstd I see that you can set -B<num> to specify the size of the chunks, and it's documented as "generally 4 * windowSize". Except the documentation doesn't say how windowSize is set.
From a bit of poking at the source, it looks to me like the way this works is that windowSize is 2**windowLog, and windowLog depends on your compression level. If I know I'm doing zstd -15, though, how does compressionLevel=15 translate into a value for windowLog? There's a table in lib/compress/clevels.h which covers inputs >256KB:
| Level | windowLog | chainLog | hashLog | searchLog | minMatch | targetLength | strategy |
|---|---|---|---|---|---|---|---|
| <1 | 19 | 12 | 13 | 1 | 6 | 1 | fast |
| 1 | 19 | 13 | 14 | 1 | 7 | 0 | fast |
| 2 | 20 | 15 | 16 | 1 | 6 | 0 | fast |
| 3 | 21 | 16 | 17 | 1 | 5 | 0 | dfast |
| 4 | 21 | 18 | 18 | 1 | 5 | 0 | dfast |
| 5 | 21 | 18 | 19 | 3 | 5 | 2 | greedy |
| 6 | 21 | 18 | 19 | 3 | 5 | 4 | lazy |
| 7 | 21 | 19 | 20 | 4 | 5 | 8 | lazy |
| 8 | 21 | 19 | 20 | 4 | 5 | 16 | lazy2 |
| 9 | 22 | 20 | 21 | 4 | 5 | 16 | lazy2 |
| 10 | 22 | 21 | 22 | 5 | 5 | 16 | lazy2 |
| 11 | 22 | 21 | 22 | 6 | 5 | 16 | lazy2 |
| 12 | 22 | 22 | 23 | 6 | 5 | 32 | lazy2 |
| 13 | 22 | 22 | 22 | 4 | 5 | 32 | btlazy2 |
| 14 | 22 | 22 | 23 | 5 | 5 | 32 | btlazy2 |
| 15 | 22 | 23 | 23 | 6 | 5 | 32 | btlazy2 |
| 16 | 22 | 22 | 22 | 5 | 5 | 48 | btopt |
| 17 | 23 | 23 | 22 | 5 | 4 | 64 | btopt |
| 18 | 23 | 23 | 22 | 6 | 3 | 64 | btultra |
| 19 | 23 | 24 | 22 | 7 | 3 | 256 | btultra2 |
| 20 | 25 | 25 | 23 | 7 | 3 | 256 | btultra2 |
| 21 | 26 | 26 | 24 | 7 | 3 | 512 | btultra2 |
| 22 | 27 | 27 | 25 | 9 | 3 | 999 | btultra2 |
See the source if you're interested in other sizes.
So it looks like windowSize is:
≤1: 512 KiB (524,288 bytes)
2: 1 MiB
3-8 (3 is the default level): 2 MiB
9-16: 4 MiB
17-19: 8 MiB
20: 32 MiB
21: 64 MiB
22: 128 MiB
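
The arithmetic above can be sketched in a few lines of Python. The windowLog values are copied from the table above. The function names are made up for illustration, the job size uses the man page's "generally 4 * windowSize" rule of thumb, and the job count assumes the input is simply cut into job-sized pieces, which I haven't verified against the source. Treat the results as rough estimates, not as what zstd is guaranteed to do.

```python
import math

# windowLog per level, copied from the table above (inputs > 256 KB).
# Index 0 stands in for the "<1" row; indexes 1..22 are levels 1..22.
WINDOW_LOG = [
    19,                              # <1
    19, 20,                          # 1-2
    21, 21, 21, 21, 21, 21,          # 3-8
    22, 22, 22, 22, 22, 22, 22, 22,  # 9-16
    23, 23, 23,                      # 17-19
    25, 26, 27,                      # 20-22
]


def window_size(level):
    """windowSize in bytes: 2**windowLog for the given level."""
    if level > 22:
        raise ValueError("the table only covers levels up to 22")
    row = max(level, 0)  # assumption: every level below 1 uses the "<1" row
    return 1 << WINDOW_LOG[row]


def approx_job_size(level):
    """Default chunk size in bytes, using the 4 * windowSize rule of thumb."""
    return 4 * window_size(level)


def estimated_jobs(file_size, level):
    """Rough number of chunks available to spread across threads."""
    return max(1, math.ceil(file_size / approx_job_size(level)))


MIB = 1024 * 1024
for level in (1, 3, 15, 19, 22):
    print(level, window_size(level) // 1024, "KiB window,",
          approx_job_size(level) // 1024, "KiB per job")
```

Under these assumptions, zstd -15 on a 100 MiB file would have about `estimated_jobs(100 * MIB, 15)` chunks to hand out, so threads beyond that number would have nothing to do.
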
Probably best not to rely on any of this, but it's good to know what zstd -<level> is doing by default!