Back to RasterLite2 doc index
RasterLite2 reference Benchmarks (2019 update)
Intended scope
In recent years new and innovative lossless compression algorithms have been developed. This benchmark is intended to check and verify, by practical testing, how these new compression methods actually perform under the most common conditions.
More specifically, it compares the relative performance of the newer and the older lossless compression methods.
The contenders
The following general purpose lossless compression methods will be systematically compared:
- DEFLATE (aka Zip)
  This is the most classic and almost universally adopted lossless compression method.
  It was introduced about 30 years ago (in 1991), so it can be considered the venerable doyen of them all.
- LZMA (aka 7-Zip)
  This is a well known and widely adopted lossless compression method.
  It is younger than DEFLATE, having been introduced about 20 years ago (in 1998).
  LZMA is an extreme interpretation of lossless compression: it is usually able to achieve really impressive compression ratios (far better than DEFLATE), but at the cost of severely sacrificing compression speed; LZMA can easily be painfully slow.
- LZ4
  This is a more modern algorithm, introduced less than 10 years ago (in 2011), so its diffusion and adoption are still rather limited.
  LZ4 too is an extreme interpretation of lossless compression, but it goes in exactly the opposite direction of LZMA: it is strongly optimized for speed, at the cost of sacrificing the compression ratio.
- ZSTD (aka Zstandard)
  This is a very recently introduced algorithm (2015), and its adoption is still rather limited.
  Curiously enough, both LZ4 and ZSTD are developed and maintained by the same author (Yann Collet).
  ZSTD is a well balanced algorithm aiming to be a modern replacement for DEFLATE, able to be faster and/or to achieve better compression ratios.

A few technical details about the most relevant innovations introduced by ZSTD:
- The old DEFLATE was designed to require a very limited amount of memory, and this somewhat impaired its efficiency.
  Modern hardware can easily provide plenty of memory, so ZSTD borrows a few ideas from LZMA about a less constrained and more efficient memory usage.
  More specifically, DEFLATE is based on a moving data window of only 32KB; both LZMA and ZSTD adopt a more generous moving window of 1MB.
- Both DEFLATE and ZSTD adopt the classic Huffman coding for reducing the information entropy, but ZSTD can also use a more advanced mechanism based on Finite State Entropy, a very recent and much faster technique.
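As a quick illustration of how the four contenders are typically driven from code, here is a minimal round-trip sketch. It is only an assumption about tooling (Python's standard zlib and lzma modules plus the third-party zstandard and lz4 bindings), not the procedure actually used for the benchmarks below.

```python
# Minimal round-trip sketch for the four contenders.
# Assumption: Python 3 with the third-party "zstandard" and "lz4" packages
# installed; zlib (DEFLATE) and lzma (LZMA) come from the standard library.
import zlib
import lzma
import zstandard
import lz4.frame

data = open("sample.bin", "rb").read()   # hypothetical input file

codecs = {
    "DEFLATE": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "LZMA":    (lambda d: lzma.compress(d, preset=9), lzma.decompress),
    "ZSTD":    (zstandard.ZstdCompressor(level=19).compress,
                zstandard.ZstdDecompressor().decompress),
    "LZ4":     (lz4.frame.compress, lz4.frame.decompress),
}

for name, (compress, decompress) in codecs.items():
    packed = compress(data)
    assert decompress(packed) == data    # lossless: the round trip is exact
    print(f"{name:8s} {len(data):>12d} -> {len(packed):>12d} bytes")
```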
Whenever possible and appropriate, the following lossless compression methods specifically intended for images / rasters will be tested as well:
- PNG
  This is a very popular format supporting RGB and Grayscale images (with or without Alpha transparency).
  PNG fully depends on DEFLATE for data compression.
- CharLS
  This is an image format (RGB and Grayscale) with a limited diffusion, but rather popular for storing medical imagery.
  CharLS is based on Lossless JPEG, a genuinely lossless image compression scheme not to be confused with plain JPEG (which is the most classic example of lossy compression).
- Jpeg2000
  This is intended to be a more advanced replacement for JPEG, but it is not yet as widely supported as its ancestor.
  Jpeg2000 is an inherently lossy compression, but with appropriate settings it can effectively support a genuinely lossless compression mode.
- WebP
  This too is an innovative image format aiming to be a better replacement for JPEG: WebP images are expected to deliver the same visual quality as JPEG while requiring significantly less storage space.
  Exactly like Jpeg2000, WebP is an inherently lossy compression, but with appropriate settings it can effectively support a genuinely lossless compression mode.
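For the two image formats that are easiest to reproduce at home, PNG and lossless WebP, a hedged sketch follows; it assumes the third-party Pillow package (with WebP support compiled in) and a hypothetical input raster. CharLS and lossless Jpeg2000 require dedicated codecs and are not shown here.

```python
# A hedged sketch: re-encode an RGB raster with two lossless image settings.
# Assumptions: the third-party Pillow package with WebP support, and a
# hypothetical input file "sample.tif".
import os
from PIL import Image

img = Image.open("sample.tif")            # hypothetical uncompressed raster

img.save("sample.png", optimize=True)     # PNG always relies on DEFLATE
img.save("sample.webp", lossless=True)    # WebP switched to its lossless mode

for path in ("sample.tif", "sample.png", "sample.webp"):
    print(f"{path:12s} {os.path.getsize(path):>12,d} bytes")

# Round-trip check: a lossless re-encoding must preserve every pixel value.
assert list(Image.open("sample.png").getdata()) == list(img.getdata())
```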
Testing generic datasets
We'll start by testing several generic datasets, so as to stress all compression methods under the most common conditions. The same dataset will be compressed and then decompressed using each method, so as to gather information about:
- the size of the resulting compressed file.
  The ratio between the uncompressed and the compressed size is the compression ratio.
- the time required to compress the original dataset.
- the time required to decompress the compressed file so as to recover the initial uncompressed dataset.

Note: compressing is a much harder operation than decompressing, and will always require more time.
The speed differences between the various compression algorithms are strong and well marked when compressing, but the (less impressive) differences in decompression speed are also worth evaluating carefully.
- For any compression algorithm, being slow (or even very slow) when compressing can easily be considered a trivial and forgivable issue.
  Compression usually happens only once in the lifetime of a compressed dataset, and there are many ways to minimize the adverse effects of intrinsic slowness: you could e.g. compress your files in batch mode, perhaps during off-peak hours, and in such a scenario reaching stronger compression ratios could easily justify a longer processing time. Alternatively, you could enable (if possible) a multithreaded compression approach (parallel processing), so as to significantly reduce the required time.
- Being slow when decompressing is a much more serious issue, because decompression happens more frequently, very frequently in some specific scenarios.
  A certain degree of slowness in decompression could therefore easily become a serious bottleneck severely limiting the overall performance of your system.
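Before looking at the results, here is a minimal measuring sketch along the lines just described. Only the standard-library codecs (zlib for DEFLATE, lzma for LZMA) are shown; the LZ4 and ZSTD bindings from the earlier sketch could be plugged in exactly the same way. The benchmark figures below were not produced by this script; it merely illustrates the methodology.

```python
# Minimal methodology sketch: measure compression time, decompression time
# and compression ratio for one dataset and one codec.
import time
import zlib
import lzma

def benchmark(name, compress, decompress, data):
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    assert restored == data              # genuinely lossless round trip
    ratio = len(data) / len(packed)      # uncompressed size / compressed size
    print(f"{name:8s} ratio={ratio:6.2f} "
          f"compress={t1 - t0:8.3f}s decompress={t2 - t1:8.3f}s")

with open("sample.tar", "rb") as f:      # hypothetical input dataset
    data = f.read()

benchmark("DEFLATE", lambda d: zlib.compress(d, 9), zlib.decompress, data)
benchmark("LZMA", lambda d: lzma.compress(d, preset=9), lzma.decompress, data)
```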
test #1 - compressing many CSV files
Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
---|---|---|---|---|---|
0.97 GB | LZ4 | 289 MB | 3.46 | 6.550 sec | 2.256 sec |
0.97 GB | DEFLATE | 155 MB | 6.44 | 33.079 sec | 2.159 sec |
0.97 GB | ZSTD | 110 MB | 9.09 | 2.924 sec | 1.313 sec |
0.97 GB | LZMA | 47 MB | 21.42 | 1220.329 sec | 10.179 sec |
- The sample was a tarball containing a whole GTFS dataset.
- Text files are usually expected to be highly compressible (many repetitions of the same words and values), and this test confirms the expectation.
- LZ4 is very fast both when compressing and decompressing, but its compression ratio is rather disappointing.
- DEFLATE is a well balanced compromise between speed and effectiveness: it scores a decent compression ratio and it's fast enough both when compressing and decompressing.
- ZSTD clearly wins this first match hands down; it's impressively fast (in both directions) and it scores a very good compression ratio.
- LZMA scores a really impressive compression ratio, but it's painfully slow when compressing (in this test, roughly 37 times slower than DEFLATE). What's really bad is that it's slow even when decompressing (about 5 times slower than DEFLATE).
test #2 - compressing a SQLite database file
Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
---|---|---|---|---|---|
1.13 GB | LZ4 | 508 MB | 2.29 | 10.333 sec | 2.123 sec |
1.13 GB | DEFLATE | 323 MB | 3.60 | 54.343 sec | 3.173 sec |
1.13 GB | ZSTD | 219 MB | 5.31 | 4.331 sec | 1.522 sec |
1.13 GB | LZMA | 82 MB | 14.26 | 646.670 sec | 17.930 sec |
- The sample was a SQLite/SpatiaLite database containing the same GTFS dataset used in the previous test.
- Databases are usually expected to be strongly compressible (many repetitions of ZERO, SPACE and NULL values), and this test confirms the expectation.
- LZ4 again proves to be very fast but not very effective.
- DEFLATE again proves to be still valid despite its venerable age.
- ZSTD is once more the winner of this test, being both fast and effective.
- LZMA confirms it's unbeatable at reaching very high compression ratios, but unfortunately it also confirms its barely tolerable slowness.
test #3 - compressing many Shapefiles
Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
---|---|---|---|---|---|
1.19 GB | LZ4 | 0.99 GB | 1.20 | 6.413 sec | 0.893 sec |
1.19 GB | DEFLATE | 870 MB | 1.40 | 48.004 sec | 4.553 sec |
1.19 GB | ZSTD | 880 MB | 1.39 | 5.416 sec | 1.292 sec |
1.19 GB | LZMA | 682 MB | 1.79 | 740.077 sec | 45.624 sec |
- The sample was a tarball containing several Shapefiles (Road Network and Administrative Boundaries of Tuscany).
- Shapefiles contain plenty of raw binary data, and are consequently rather hard to compress strongly. This fully explains why in this specific test the compression ratios are always very modest.
- LZ4 again proves to be very fast but not very effective.
- DEFLATE again proves to be still valid despite its venerable age.
- ZSTD is once more the winner of this test, being noticeably faster than DEFLATE. It's worth noting, though, that in this specific test it's unable to reach a better compression ratio than DEFLATE.
- LZMA confirms it's unbeatable at reaching very high compression ratios, but unfortunately it also confirms its barely tolerable slowness.
test #4 - compressing a Landsat 8 scene (satellite imagery)
Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
---|---|---|---|---|---|
1.78 GB | LZ4 | 1.07 GB | 1.65 | 5.104 sec | 1.285 sec |
1.78 GB | DEFLATE | 928 MB | 1.97 | 56.643 sec | 7.176 sec |
1.78 GB | ZSTD | 929 MB | 1.96 | 7.261 sec | 2.329 sec |
1.78 GB | LZMA | 798 MB | 2.29 | 957.182 sec | 95.288 sec |
- The sample was a tarball containing a Landsat 8 scene.
- Satellite imagery contains plenty of raw binary data, and is consequently rather hard to compress strongly. This fully explains why in this specific test the compression ratios are always very modest.
- LZ4 again proves to be very fast but not very effective.
- DEFLATE again proves to be still valid despite its venerable age.
- ZSTD is once more the winner of this test, being noticeably faster than DEFLATE. It's worth noting, though, that in this specific test it's unable to reach a better compression ratio than DEFLATE.
- LZMA confirms it's unbeatable at reaching very high compression ratios, but unfortunately it also confirms its barely tolerable slowness.
Final assessment (and lessons learned)
- The intrinsic efficiency of any lossless compression algorithm strongly depends on the internal data distribution of the sample (the sketch at the end of this section demonstrates the effect):
  - samples presenting a very regular and easily predictable internal distribution have a low information entropy, and can be strongly compressed.
    A typical example: text files written in some language based on the Latin alphabet.
  - samples presenting an irregular and random internal distribution have a high information entropy, and can only be moderately compressed.
    A typical example: any kind of binary file.
    Note: a binary file presenting a perfectly random internal distribution of values is conceptually impossible to compress at all.
- Any lossless compression strategy implies a trade-off between speed and compression ratio (the same sketch below also shows how a stronger compression level costs time):
  - you can optimize for speed, but in this case you necessarily sacrifice the compression ratio (this is the choice adopted by LZ4).
  - at the opposite end of the spectrum you can optimize for high compression ratios, but in this case you necessarily sacrifice speed (this is the choice adopted by LZMA).
  - the wisest approach falls somewhere in the middle: a well balanced mix (a reasonable compromise) between speed and compression ratio (this is the choice of both DEFLATE and ZSTD).
- The very recently introduced ZSTD is clearly a superior alternative to the old DEFLATE:
  - ZSTD is always noticeably faster than DEFLATE, both when compressing and decompressing.
  - ZSTD is not always able to reach better compression ratios than DEFLATE (it depends on the sample's information entropy).
    In many common cases ZSTD easily outperforms DEFLATE's compression ratios; when it doesn't, it still achieves (more or less) the same compression ratios as DEFLATE, but in less time.
- LZ4 is not really interesting (at least for general purpose scopes).
  It's surely very fast, but not impressively faster than ZSTD, and its compression ratios are always too mild to be really appealing.
- LZMA has no alternative when very strong compression ratios are an absolute must.
  But its terrible slowness (both when compressing and decompressing) must always be taken into very serious account, because it could easily become a severe bottleneck.
- DEFLATE isn't dead at all; despite its rather venerable age it still proves to be an honest performer.
  And considering its almost universal and pervasive adoption, it will surely survive for many long years to come.
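To make the first two lessons tangible, here is a minimal illustrative sketch (standard-library zlib only; it is not part of the benchmark suite): it compresses a highly repetitive text buffer and a perfectly random buffer, at both the fastest and the strongest DEFLATE level.

```python
# Illustrative sketch: information entropy and the speed/ratio trade-off.
# Assumption: plain Python 3 standard library (zlib = DEFLATE).
import os
import time
import zlib

def ratio_and_time(data, level):
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    return len(data) / len(packed), time.perf_counter() - t0

low_entropy = b"the quick brown fox jumps over the lazy dog " * 100_000
high_entropy = os.urandom(len(low_entropy))   # perfectly random bytes

for name, sample in (("repetitive text", low_entropy), ("random bytes", high_entropy)):
    for level in (1, 9):                      # fastest vs strongest DEFLATE setting
        ratio, elapsed = ratio_and_time(sample, level)
        print(f"{name:16s} level={level} ratio={ratio:8.2f} time={elapsed:.3f}s")

# Expected outcome: the repetitive text reaches a very high ratio (higher but
# slower at level 9), while the random buffer stays around a ratio of 1.0
# (it may even grow slightly) no matter which level is used.
```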
Back to RasterLite2 doc index