Overview
| Artifact ID: | ba9b634573152446c2b70bb4c504786f3e029ad1 |
|---|---|
| Page Name: | benchmarks (2019 update) |
| Date: | 2019-02-08 09:38:25 |
| Original User: | sandro |
| Parent: | cd5476a157a6bf456415838bea5448242043f6e1 (diff) |
| Next: | b267fdc76b4f758b8206f0f8a552a5fdf2d05aba |
Content
Back to RasterLite2 doc index
RasterLite2 reference Benchmarks (2019 update)
Intended scopes
In recent years, new and innovative lossless compression algorithms have been developed. This benchmark is intended to check and verify, by practical testing, how these new compression methods actually perform under the most common conditions.
More specifically, a comparison will be made between the relative performances of newer and older lossless compression methods.
The contenders
The following general-purpose lossless compression methods will be systematically compared:

- DEFLATE (aka Zip)
This is the most classic and almost universally adopted lossless compression method.
It was introduced about 30 years ago (in 1991), so it can be considered the venerable doyen of them all.
- LZMA (aka 7-Zip)
This is a well-known and widely adopted lossless compression method.
It's younger than DEFLATE, having been introduced about 20 years ago (in 1998). LZMA is an extreme interpretation of lossless compression: it usually achieves really impressive compression ratios (far better than DEFLATE can), but at the cost of severely sacrificing compression speed; LZMA can easily be deadly slow.
- LZ4
This is a more modern algorithm, introduced less than 10 years ago (in 2011), so its diffusion and adoption are still rather limited.
LZ4 too is an extreme interpretation of lossless compression, but it goes in exactly the opposite direction of LZMA: it's strongly optimized to be extremely fast, but at the cost of sacrificing compression ratios.
- ZSTD (aka Zstandard)
This is a very recently introduced algorithm (2015), and its adoption is still rather limited.
Curiously enough, both LZ4 and ZSTD are developed and maintained by the same author (Yann Collet).
ZSTD is a well-balanced algorithm intended as a more modern replacement for DEFLATE, able to be faster and/or to achieve better compression ratios.

A few technical details about the most relevant innovations introduced by ZSTD:
- The old DEFLATE was designed to require a very limited amount of memory, and this somewhat impaired its efficiency.
Modern hardware can easily supply plenty of memory, so ZSTD borrows a few ideas from LZMA about a less constrained and more efficient memory usage.
More specifically, DEFLATE is based on a moving data window of only 32KB; both LZMA and ZSTD adopt a more generous moving window of 1MB.
- Both DEFLATE and ZSTD adopt the classic Huffman coding for reducing the information entropy, but ZSTD can also support a further advanced mechanism based on Finite State Entropy, a very recent and much faster technique.
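The effect of the moving-window size can be demonstrated with Python's stdlib `zlib`, whose `wbits` parameter sets the DEFLATE window as a power of two. This is only an illustrative sketch: the ~20 KB repetition distance and the synthetic data are made up for the purpose.

```python
import random
import zlib

# ~20 KB of pseudo-random (hence incompressible) bytes, repeated 10 times:
# the repeats sit ~20 KB apart, inside a 32 KB window but outside a 1 KB one.
random.seed(0)
chunk = bytes(random.randrange(256) for _ in range(20_000))
data = chunk * 10

def deflate_size(payload, wbits):
    # wbits selects a 2**wbits-byte moving window (9..15 for zlib streams)
    comp = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return len(comp.compress(payload) + comp.flush())

small_win = deflate_size(data, 10)   # 1 KB window: misses the distant repeats
large_win = deflate_size(data, 15)   # 32 KB window: matches the repeats
print(small_win, large_win)
```

With the full 32 KB window the repeated chunks are matched and the output shrinks to roughly the size of one chunk; with a 1 KB window the repeats are invisible and the data stays essentially uncompressed.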
Whenever possible and appropriate, the following lossless compression methods, specifically intended for images/rasters, will be tested as well:

- PNG
This is a very popular format supporting RGB and Grayscale images (with or without Alpha transparency).
PNG fully depends on DEFLATE for data compression.
- CharLS
This is an image format (RGB and Grayscale) with limited diffusion, but rather popular for storing medical imagery.
CharLS is based on Lossless JPEG, a genuinely lossless image compression scheme not to be confused with plain JPEG (which is the most classic example of lossy compression).
- Jpeg2000
This is intended to be a more advanced replacement for JPEG, but it's not yet as widely supported as its ancestor.
Jpeg2000 is an inherently lossy compression, but under special settings it can effectively support a genuine lossless compression mode.
- WebP
This too is an innovative image format intended as a better replacement for JPEG: WebP images are expected to deliver the same visual quality as JPEG while requiring significantly less storage space.
Exactly like Jpeg2000, WebP too is an inherently lossy compression, but under special settings it can effectively support a genuine lossless compression mode.
Testing generic datasets
We'll start by testing several generic datasets, so as to stress all compression methods under the most common conditions. The same dataset will be compressed and then decompressed using each method, so as to gather information about:

- the size of the resulting compressed file.
The ratio between the uncompressed and compressed sizes corresponds to the compression ratio.
- the time required to compress the original dataset.
- the time required to decompress the compressed file, so recovering the initial uncompressed dataset.

Note: compressing is a much harder operation than decompressing, and will always require more time.
The speed differences between the various compression algorithms are strong and well marked when compressing, but the differences in decompression speed (although less impressive) are also worth evaluating carefully.

- for any compression algorithm, being slow (or even very slow) when compressing can easily be considered a trivial and forgivable issue.
Compression usually happens only once in the lifetime of a compressed dataset, and there are many ways to minimize the adverse effects of intrinsic slowness.
You could, e.g., compress your files in batch mode, perhaps during off-peak hours; in such a scenario, reaching stronger compression ratios could easily justify a longer processing time.
Or, alternatively, you could enable (if possible) a multithreaded compression approach (parallel processing), so as to significantly reduce the required time.
- being slow when decompressing is a much more serious issue, because decompression will happen more frequently; very frequently in some specific scenarios.
So a certain degree of slowness in decompression could easily become a serious bottleneck, severely limiting the overall performance of your system.
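The measuring loop can be sketched directly from Python using the two stdlib codecs (LZ4 and ZSTD would require the third-party `lz4` and `zstandard` packages, so they are left out; the CSV-like sample below is made up):

```python
import lzma
import time
import zlib

# A highly repetitive, CSV-like sample (a stand-in for the GTFS tarball).
sample = b"stop_id,stop_name,stop_lat,stop_lon\n" * 100_000

def bench(name, compress, decompress, data):
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    assert restored == data                  # lossless round-trip check
    ratio = len(data) / len(packed)          # compression ratio
    return name, ratio, t1 - t0, t2 - t1

for row in (
    bench("DEFLATE", lambda d: zlib.compress(d, 9), zlib.decompress, sample),
    bench("LZMA", lzma.compress, lzma.decompress, sample),
):
    print("%-8s ratio %8.2f  comp %.3f sec  decomp %.3f sec" % row)
```

The real benchmark works on multi-gigabyte files from disk, so I/O plays a role there that this in-memory sketch deliberately ignores.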
Test #1 - compressing many CSV files
| Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|---|
| 0.97 GB | LZ4 | 289 MB | 3.46 | 6.550 sec | 2.256 sec |
| | DEFLATE | 155 MB | 6.44 | 33.079 sec | 2.159 sec |
| | ZSTD | 110 MB | 9.09 | 2.924 sec | 1.313 sec |
| | LZMA | 47 MB | 21.42 | 1220.329 sec | 10.179 sec |
- The sample was a tarball containing a whole GTFS dataset.
- Text files are usually expected to be highly compressible (many repetitions of the same words and values), and this test confirms the expectation.
- LZ4 is very fast both when compressing and decompressing, but its compression ratio is rather disappointing.
- DEFLATE is a very effective and well-balanced compromise between speed and effectiveness: it scores a decent compression ratio and it's fast enough both when compressing and decompressing.
- ZSTD clearly wins this first match hands down; it's impressively fast (in both directions) and it scores a very good compression ratio.
- LZMA scores a really impressive compression ratio, but it's deadly slow when compressing (almost 40 times slower than DEFLATE). What's really bad is that it's slow even when decompressing (about 5 times slower than DEFLATE).
Test #2 - compressing a SQLite database file
| Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|---|
| 1.13 GB | LZ4 | 508 MB | 2.29 | 10.333 sec | 2.123 sec |
| | DEFLATE | 323 MB | 3.60 | 54.343 sec | 3.173 sec |
| | ZSTD | 219 MB | 5.31 | 4.331 sec | 1.522 sec |
| | LZMA | 82 MB | 14.26 | 646.670 sec | 17.930 sec |
- The sample was a SQLite/SpatiaLite database containing the same GTFS dataset used in the previous test.
- Databases are usually expected to be strongly compressible (many repetitions of ZERO, SPACE and NULL values), and this test confirms the expectation.
- LZ4 again proves to be very fast but not very effective.
- DEFLATE proves to be still valid despite its venerable age.
- ZSTD is once more the winner of this test, being both fast and effective.
- LZMA proves to be unbeatable at reaching very high compression ratios, but unhappily it also confirms its barely tolerable slowness.
Test #3 - compressing many Shapefiles
| Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|---|
| 1.19 GB | LZ4 | 0.99 GB | 1.20 | 6.413 sec | 0.893 sec |
| | DEFLATE | 870 MB | 1.40 | 48.004 sec | 4.553 sec |
| | ZSTD | 880 MB | 1.39 | 5.416 sec | 1.292 sec |
| | LZMA | 682 MB | 1.79 | 740.077 sec | 45.624 sec |
- The sample was a tarball containing several Shapefiles (Road Network and Administrative Boundaries of Tuscany).
- Shapefiles contain plenty of raw binary data, and consequently are rather hard to compress strongly. This fully explains why in this specific test the compression ratios are always very modest.
- LZ4 again proves to be very fast but not very effective.
- DEFLATE proves to be still valid despite its venerable age.
- ZSTD is once more the winner of this test, being noticeably faster than DEFLATE. It's worth noting, however, that in this specific test it's unable to reach a better compression ratio than DEFLATE.
- LZMA proves to be unbeatable at reaching very high compression ratios, but unhappily it also confirms its barely tolerable slowness.
Test #4 - compressing a Landsat 8 scene (satellite imagery)
| Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|---|
| 1.78 GB | LZ4 | 1.07 GB | 1.65 | 5.104 sec | 1.285 sec |
| | DEFLATE | 928 MB | 1.97 | 56.643 sec | 7.176 sec |
| | ZSTD | 929 MB | 1.96 | 7.261 sec | 2.329 sec |
| | LZMA | 798 MB | 2.29 | 957.182 sec | 95.288 sec |
- The sample was a tarball containing a Landsat 8 scene.
- Satellite imagery contains plenty of raw binary data, and consequently is rather hard to compress strongly. This fully explains why in this specific test the compression ratios are always very modest.
- LZ4 again proves to be very fast but not very effective.
- DEFLATE proves to be still valid despite its venerable age.
- ZSTD is once more the winner of this test, being noticeably faster than DEFLATE. It's worth noting, however, that in this specific test it's unable to reach a better compression ratio than DEFLATE.
- LZMA proves to be unbeatable at reaching very high compression ratios, but unhappily it also confirms its barely tolerable slowness.
Final assessment (and lessons learned)
- The intrinsic efficiency of any lossless compression algorithm strongly depends on the internal data distribution of the sample:
- samples presenting a very regular and easily predictable internal distribution have a low information entropy, and can be strongly compressed. A typical example: text files written in some language based on the Latin alphabet.
- samples presenting an irregular and random internal distribution have a high information entropy, and can be only moderately compressed. A typical example: any kind of binary file. Note: a binary file presenting a perfectly random internal distribution of values is conceptually impossible to compress at all.
- Any lossless compression strategy implies a trade-off between speed and compression ratio:
- you can optimize for speed, but in that case you necessarily sacrifice the compression ratio (this is the choice adopted by LZ4).
- at the opposite side of the spectrum, you can optimize for high compression ratios, but in that case you necessarily sacrifice speed (this is the choice adopted by LZMA).
- the wisest approach falls somewhere in the middle: a well-balanced mix (a reasonable compromise) between speed and compression ratio (this is the choice of both DEFLATE and ZSTD).
- The very recently introduced ZSTD is clearly a superior alternative to the old DEFLATE:
- ZSTD is always noticeably faster than DEFLATE, both when compressing and decompressing.
- ZSTD is not always able to reach better compression ratios than DEFLATE (it depends on the sample's information entropy). In many common cases ZSTD can easily outperform DEFLATE's compression ratios; when it can't, it still achieves (more or less) the same compression ratios as DEFLATE, but in less time.
- LZ4 is not really interesting (at least for general-purpose scopes). It's surely very fast, but not impressively faster than ZSTD, and its compression ratios are always too mild to be really appealing.
- LZMA has no alternatives when very strong compression ratios are an absolute must. But its terrible slowness (both when compressing and decompressing) must always be taken very seriously into account, because it could easily become a severe bottleneck.
- DEFLATE isn't dead at all; despite its rather venerable age, it still proves to be an honest performer. And considering its almost universal and pervasive adoption, it will surely survive for many long years to come.
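Two of these lessons can be checked in a few lines with Python's stdlib `zlib` (a minimal sketch with synthetic samples): perfectly random data cannot be compressed at all, and higher compression levels trade speed for ratio.

```python
import os
import zlib

# Lesson 1: a perfectly random sample has maximal entropy, so DEFLATE
# cannot shrink it (the output is even slightly larger, due to framing).
random_blob = os.urandom(100_000)
assert len(zlib.compress(random_blob, 9)) >= len(random_blob)

# Lesson 2: on compressible data, level 1 optimizes for speed and level 9
# for ratio, so level 9 yields the smaller (or equal) result here.
text = b"the quick brown fox jumps over the lazy dog\n" * 50_000
fast = zlib.compress(text, 1)
strong = zlib.compress(text, 9)
assert len(strong) <= len(fast)
print(len(fast), len(strong))
```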
Testing Raster Coverages
This second group of tests will be more specifically focused on directly comparing the various lossless compression methods as implemented by RasterLite2 for encoding and decoding Raster Coverage Tiles.

- Several distinct RasterLite2 databases will be created and fully populated by importing the same sample, but applying a different compression method for each database.
- The compression ratios will then be computed from the sizes of the uncompressed database (method NONE) and of every other database based on the same sample.
- The compression time will be the time (as reported by rl2tool) required to create and fully populate each database.
- The decompression time will be the time (as reported by the spatialite CLI) required to execute an SQL script containing 256 SELECT RL2_GetMapImageFromRaster() statements. All requested images will be 1000x1000 pixels at full resolution, centered on different locations and adopting various SLD/SE styles. This is assumed to be a realistic and significant evaluation, because it basically corresponds to the typical workload of a hypothetical WMS server.
- Note: the measured timings will not directly correspond to the intrinsic speed of each compression method. There are obviously several confounding factors (mainly due to I/O operations) to be taken into account. However, the operational sequence is strictly the same for all tests based on the same sample, so the only factor accounting for different timings is the compression method itself.
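The decompression-time measurement can be sketched with Python's stdlib `sqlite3`. The real run issues 256 RL2_GetMapImageFromRaster() calls through the spatialite CLI, which needs the RasterLite2 extension loaded; here a plain SELECT stands in, since only the timing harness itself is being illustrated.

```python
import sqlite3
import time

def time_script(db_path, statements):
    """Execute a batch of SQL statements and return the elapsed wall time."""
    conn = sqlite3.connect(db_path)
    t0 = time.perf_counter()
    for sql in statements:
        conn.execute(sql).fetchall()   # fetch to force full materialization
    elapsed = time.perf_counter() - t0
    conn.close()
    return elapsed

# 256 requests, mirroring the size of the benchmark's SQL script.
elapsed = time_script(":memory:", ["SELECT 1;"] * 256)
print(f"{elapsed:.3f} sec")
```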
Test #5 - Grayscale Raster Coverage
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 481 MB | 1.00 | 54 sec | 1 min 44 sec |
| LZ4 (very fast compression) | 416 MB | 1.16 | 59 sec | 1 min 48 sec |
| DEFLATE (zip compression) | 349 MB | 1.38 | 1 min 5 sec | 1 min 44 sec |
| ZSTD (fast compression) | 346 MB | 1.39 | 1 min 0 sec | 1 min 54 sec |
| LZMA (7-zip compression) | 345 MB | 1.40 | 3 min 2 sec | 2 min 3 sec |
| PNG (lossless image format) | 346 MB | 1.39 | 1 min 8 sec | 1 min 41 sec |
| LL_WEBP (lossless WebP) | 320 MB | 1.50 | 4 min 27 sec | 2 min 2 sec |
| LL_JP2 (lossless Jpeg2000) | 323 MB | 1.49 | 4 min 26 sec | 2 min 21 sec |
| CHARLS (lossless JPEG) | 339 MB | 1.42 | 2 min 38 sec | 2 min 6 sec |
- this test was based on a sample of 25 B&W TIFF+TFW Sections (forming a 5x5 square) centered around the city of Florence. The original dataset is the Orthophoto imagery (year 1978; scale 1:10000) published by Tuscany.
- as we were expecting from our previous tests, lossless compression can hardly reach strong compression ratios when applied to photographic images.
- in this specific test, DEFLATE, ZSTD and PNG score more or less equivalent compression ratios, and they mark very similar compression and decompression timings. It's worth noting that DEFLATE, ZSTD and PNG require more or less the same decompression time as NONE (uncompressed), so they don't cause any rendering bottleneck.
- as we were expecting, LZ4 is fast but unable to reach a decent compression ratio.
- LZMA proves to be very slow both when compressing and decompressing.
- the real disappointment comes from LL_WEBP, LL_JP2 and CHARLS. These algorithms are specifically designed for compressing photographic imagery, yet they are unable to outperform the other generic multipurpose compression algorithms: they score marginally better compression ratios, but they are deadly slow. The game is not worth the candle.
Test #6 - RGB Raster Coverage
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 1.51 GB | 1.00 | 1 min 17 sec | 1 min 51 sec |
| LZ4 (very fast compression) | 1.21 GB | 1.25 | 1 min 31 sec | 1 min 47 sec |
| DEFLATE (zip compression) | 800 MB | 1.94 | 1 min 56 sec | 1 min 40 sec |
| ZSTD (fast compression) | 816 MB | 1.90 | 1 min 29 sec | 1 min 37 sec |
| LZMA (7-zip compression) | 710 MB | 2.18 | 7 min 23 sec | 2 min 11 sec |
| PNG (lossless image format) | 830 MB | 1.86 | 2 min 29 sec | 1 min 49 sec |
| LL_WEBP (lossless WebP) | 525 MB | 2.95 | 7 min 18 sec | 1 min 48 sec |
| LL_JP2 (lossless Jpeg2000) | 802 MB | 1.92 | 11 min 31 sec | 3 min 16 sec |
| CHARLS (lossless JPEG) | 912 MB | 1.70 | 7 min 54 sec | 2 min 47 sec |
- this test was based on a sample of 9 RGB TIFF+TFW Sections (forming a 3x3 square) centered around the town of San Giovanni Valdarno. The original dataset is exactly the same we'll see in the following test, but in this case the Near Infrared spectral band was completely removed.
- this test simply confirms the general pattern we've already seen for Grayscale.
- the one exception is LL_WEBP, which in this case scores the best compression ratio of them all, and marks a fairly good decompression time.
Test #7 - Multispectral (4-bands) Raster Coverage
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 2.01 GB | 1.00 | 3 min 18 sec | 1 min 55 sec |
| LZ4 (very fast compression) | 1.61 GB | 1.24 | 3 min 41 sec | 1 min 48 sec |
| DEFLATE (zip compression) | 1.02 GB | 1.97 | 5 min 5 sec | 1 min 42 sec |
| ZSTD (fast compression) | 1.07 GB | 1.87 | 3 min 35 sec | 1 min 46 sec |
| LZMA (7-zip compression) | 882 MB | 2.34 | 11 min 7 sec | 2 min 15 sec |
| PNG (lossless image format) | 1.08 GB | 1.85 | 4 min 43 sec | 1 min 47 sec |
| LL_WEBP (lossless WebP) | 758 MB | 2.72 | 9 min 36 sec | 1 min 51 sec |
| LL_JP2 (lossless Jpeg2000) | 1.05 GB | 1.92 | 16 min 23 sec | 3 min 53 sec |
- this test was based on a sample of 9 4-band (RGB+NearInfrared) TIFF+TFW Sections (forming a 3x3 square) centered around the town of San Giovanni Valdarno. The original dataset is the Orthophoto imagery (year 2013; scale 1:2000) published by Tuscany.
- this test simply confirms the general pattern we've already seen for Grayscale and RGB.
- in this case too, LL_WEBP scores the best compression ratio of them all, and marks a fairly good decompression time.
Test #8 - Datagrid Raster Coverage (ASCII Grid - floating point single precision)
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 2.01 GB | 1.00 | 6 min 30 sec | 2 min 6 sec |
| LZ4 (very fast compression) | 845 MB | 2.45 | 6 min 36 sec | 2 min 9 sec |
| DEFLATE (zip compression) | 623 MB | 3.32 | 7 min 2 sec | 2 min 6 sec |
| ZSTD (fast compression) | 614 MB | 3.36 | 6 min 26 sec | 1 min 55 sec |
| LZMA (7-zip compression) | 513 MB | 4.03 | 11 min 20 sec | 3 min 5 sec |
- this test was based on a huge ASCII Grid (DTM, 10m x 10m cell size). The original dataset is the Orographic DTM 10x10 published by Tuscany.
- this specific test shows a slight superiority of ZSTD over DEFLATE: it scores a better compression ratio, and it's faster both when compressing and decompressing.
- LZ4 again proves to be fast but unable to score a good compression ratio.
- LZMA again scores impressive compression ratios, but at the cost of a barely tolerable slowness.
Test #9 - Datagrid Raster Coverage (TIFF - INT16)
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 480 MB | 1.00 | 17 sec | 1 min 39 sec |
| LZ4 (very fast compression) | 317 MB | 1.51 | 21 sec | 1 min 48 sec |
| DEFLATE (zip compression) | 205 MB | 2.34 | 28 sec | 1 min 39 sec |
| ZSTD (fast compression) | 207 MB | 2.32 | 20 sec | 1 min 42 sec |
| LZMA (7-zip compression) | 168 MB | 2.86 | 2 min 0 sec | 2 min 3 sec |
- this test was based on the very popular ETOPO1 global relief model of the Earth's surface published by NOAA.
- this specific test fails to show any superiority of ZSTD over DEFLATE; they are substantially on par.
- LZ4 again proves to be fast but unable to score a good compression ratio.
- LZMA again scores impressive compression ratios, but at the cost of a barely tolerable slowness.
Back to RasterLite2 doc index