Wiki page
[benchmark-4.0] by
sandro
2012-11-10 12:59:07.
D 2012-11-10T12:59:07.552
L benchmark-4.0
P 2219f66660d84536ad1d5c0febc86ff21637efdb
U sandro
W 13328
<h2>Benchmarck (aka "comparing apples and oranges")</h2>
Back to <a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=switching-to-4.0#virtualogr">main page</a>
<h2>Introduction</h2>
Today I've casually read this <a href="http://gfoss.blogspot.it/2012/11/arcgis-vs-qgis-etc-clipping-contest.html">blog page</a>; many thanks to my dear friend Markus Neteler (the maintainer of GRASS GIS) for pointing my attention on this topic.<br>
The original <b>Clipping Contest</b> simply compared several popular Deskyop GIS (ArcGis, QGIS, gvSIG, GRASS and others), but there was an obvious omission.<br>
No Spatial DBMS was considered at all. (<i>too bad; a really embarrassing omission</i>)<br><br>
After all a <b>Clipping Contest</b> sounds more like a pure data-processing problem than like a typical GIS/GUI problem, so it's probably useful introducing some Spatial DBMS into this Contest.
This was my first thought while reading the benchmark report, so I duly started preparing my own testbed based on SpatiaLite (<i>BTW I'm going to release the new version 4.0.0 in the very next days, so after all performing some further test is surely welcome</i>).
<h2>Test configuration</h2>
<ul>
<li>Intel Core i5 3.46 GHz</li>
<li>4 GB RAM</li>
<li>Windows 7 64 bit (physical)</li>
<li>Debian Squeeze 32 bit (virtual machine)</li>
</ul><br>
I tested both Windows and Debian on the same hardware.
The measured timings are rather similar; anyway Linux performed slightly better than the native Windows system (more or less, by a 10% - 20% factor).
This is not at all surprising for me, because this trend seems to be constantly verified each time I manage some comparative benchmark.
Very often Linux is a best performer than Windows, even when running as a Virtual Machine on the top of a physical Windows system.
Quite paradoxical, but indisputably true and objectively verified many times.
<h2>The sample dataset</h2>
I've simply downloaded the same identical <a href="http://www.donmeltz.com/_files/ContourClipTest.zip">sample dataset</a> already used for all other tests reported in the Clipping Contest.<br>
Very shortly said: it's a really huge set of <b>Contour Lines</b> (the uncompressed Shapefile is about <b>1,3 GB</b>)<br><br>
<table cellspacing="6" cellpadding="6">
<tr><td><h3>Fig. 1</h3>
This one is the original set of Contour Lines to be clipped.</td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-0.png" alt="original dataset"></td></tr>
<tr><td><h3>Fig. 2</h3>
This is the final result of the clipping operation.<ul>
<li>the Study Area boundary (polygon) is represented in red.</li>
<li>the Contour Lines are represented in dark blue.</li>
<li>the azure rectangle marks the detail area shown in the next figure.</li>
</ul></td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-1.png" alt="clipped"></td></tr>
<tr><td><h3>Fig. 3</h3>This is a magnified detail of the Study Area and Clipped Contour Lines.<br><br>
As you can easily notice there are lots of Contour Lines, and each Line has many vertexes.<br><br>
This one is a rather complex problem to be resolved: and it surely represents a thorough and very demanding test.</td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-2.png" alt="load_extension"></td></tr>
</table>
<h2>The Spatial SQL approach</h2>
This is a pure data-processing approach; we'll use the most classical <b><i>Spatial Is Not Special</i></b> methodology, i.e. we'll go performing the <b>Clipping Contest</b> exactly as if it was a trivial ordinary SQL query performed on behalf of plain generic data.
We'll not use at all any GIS-specific frill, and we'll simply use a couple of Spatial SQL functions where strictly required.
(<i>BTW it's a good opportunity to introduce few basic concepts about Spatial SQL</i>).<br>
Let's go ...
<h4>Preparing the SQL script</h4>
<verbatim>
--
-- SQL script performing the Clipping Contest
-- Target DBMS: SQLite with SpatiaLite extension
--
SELECT DateTime('now'), 'Loading the StudyArea SHP';
--
-- loading the StudyArea Shapefile into the DB
--
.loadshp StudyArea1MileBuffer StudyArea1MileBuffer CP1252 2260 Geometry Id POLYGON 2d no no_spatial_index
SELECT DateTime('now'), 'Loading the Contours SHP';
--
-- loading the Contour Shapefile into the DB
--
.loadshp Contours20Ft Contours20Ft CP1252 2260 Geometry Id LINESTRING 2d no with_spatial_index
SELECT DateTime('now'), 'Clipping ... please wait';
--
-- creating a new table containing the Clipped Contours
--
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID,
c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
CastToMultiLinestring(ST_Intersection(
c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'Contours20Ft'
AND search_frame = a.Geometry
);
SELECT DateTime('now'), 'deleting NULL Geometries';
--
-- clean up
-- deleting any NULL Geometry from Clipped Contours
--
DELETE FROM Contours20FtClipped
WHERE Geometry IS NULL;
SELECT DateTime('now'), 'recovering a genuine Geometry column';
--
-- recovering a genuine Geometry Column (Clipped Contours)
---
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry', 2260, 'MULTILINESTRING', 2);
SELECT DateTime('now'), 'creating the Spatial Index';
--
-- creating a Spatial (Index supporting Clipped Contours)
--
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');
SELECT DateTime('now'), 'all done ... quitting';
--
-- end job
--
</verbatim>
Don't be fouled; complexity is more apparent than real. The following is a quick explanation of the step by step process defined by the above SQL script.
<table cellspacing="6" cellpadding="6">
<tr><td bgcolor="#f4f4f4"><b>.loadshp</b> <i>parameters</i></td>
<td><h3>Step #1</h3>
simply a <b>dot macro command</b> understood by the SpatiaLite front-end tool, allowing to directly import an external Shapefile into a corresponding Spatial Table within the DB.</td></tr>
<tr><td><verbatim>
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID,
c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
CastToMultiLinestring(ST_Intersection(
c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'Contours20Ft'
AND search_frame = a.Geometry
);
</verbatim></td>
<td><h3>Step #2</h3>
this single SQL query performs all the actual Clipping task.<br>
<b>ST_Intersection()</b> is a Spatial SQL function computing the intersection between the <b>Study Area</b> (<i>polygon</i>) and each <b>Contour Line</b> (<i>linestring</i>). The returned result could be:<ul>
<li>a Linestring or a MultiLinestring (depending on the specific shape of both geometries)</li>
<li>a NULL (i.e. </i>absolutely nothing</i>) if the Contour Line and the Study Area do not intersect at all.</li>
<li><b>CastToMultiLinestring()</b> simply is a <u><i>cast operator</i></u> ensuring that all returned geometries will be of the same type.</li>
<li>the inner sub-query <b>SELECT ROWID FROM SpatialIndex ...</b> is simply intended to access the Spatial Index supporting the Contour Lines, so to speed up the whole query.</li>
<li><b>CREATE TABLE Contours20FtClipped AS</b> simply intends that the resultset returned by this SQL query has to be permanently saved into yet another table (to be created on the fly).</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
DELETE FROM Contours20FtClipped
WHERE Geometry IS NULL;
</verbatim></td>
<td><h3>Step #3</h3>
just a stupid post-processing step; the previous query has surely generated many NULL geometries, and we'll now get rid of all them.</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry',
2260, 'MULTILINESTRING', 2);
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');
</verbatim></td>
<td><h3>Step #4</h3>
the very final post-processing step: transforming the table containing all Clipped Contour Lines into a genuine Spatial Table supported by its own Spatial Index.</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
SELECT DateTime('now'), 'some message';
</verbatim></td>
<td><h3>measuring the time</h3>this simply is a weird SQL trick; we'll query the system clock immediately before and after performing each step, so to get full trace of the corresponding timings.</li>
</ul><br>
</td></tr>
</table>
<h4>Bringing things to a head: running the test</h4>
<verbatim>
# spatialite my_test.sqlite < test.sql
SpatiaLite version ..: 4.0.0-RC2 Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualXL' [direct XLS access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualSpatialIndex' [R*Tree metahandler]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0
2012-11-10 00:03:52|Loading the StudyArea SHP
2012-11-10 00:03:52|Loading the Contours SHP
2012-11-10 00:04:54|Clipping ... please wait
2012-11-10 00:07:31|deleting NULL Geometries
2012-11-10 00:07:33|recovering a genuine Geometry column
2012-11-10 00:07:34|creating the Spatial Index
2012-11-10 00:07:36|all done ... quitting
#
</verbatim>
<h2>Final considerations</h2>
<ul>
<li>The whole process took less than <b>4 minutes</b></li>
<li>About <b>1 minute</b> was spent loading the Shapefiles and building the supporting Spatial Index</li>
<li>Less than <b>10 seconds</b> were spent performing post-processing steps.</li>
<li>The hard-core problem itself (generating a Clipped set of Contour Lines) took about <b>2 minutes and half</b>.</li>
</ul><br>
<u>Please note</u>: this <b>Clipping Contest</b> really is like <b>comparing apples and oranges</b>.
<ul>
<li>the CPUs used to actually run the tests are of different generations, and obviously have different intrinsic speeds.</li>
<li>it's not at all clearly stated if the Desktop GIS apps permanently saved the Clipped Contour Lines somewhere or not. And it's not at all clear which kind of data storage they used (directly accessing the disk-based Shapefiles ? loading all the stuff in memory ? some other kind of properly structured disk-based storage, may be temporary and transient ?)</li>
<li>the amount of available RAM is absolutely not relevant; this Contest could eventually benefit from a faster CPU, but the required memory footprint is minimal (just few hundredth MB in the SpatiaLite's case).
Consuming huge amounts of RAM doesn't looks at all a wise solution, and isn't at all strictly required in this specific case. When such a condition emerges this will probably mean that some <u><i>barbaric brute-force</i></u> data access strategy has been badly adopted.</li>
<li>adopting some kind of parallel processing will obviously introduce a noticeable performance boost (SpatiaLite is single threaded; so using a quad-core processor had absolutely no effect at all).</li>
</ul>
<h4>What are we measuring, really ?</h4>
<ul>
<li>many GFOSS implementations (QGIS, PostGIS, SpatiaLite) are based on <a href="http://trac.osgeo.org/geos/">GEOS</a>; so any measured time effectively represents the intrinsic efficiency of GEOS much more than the efficiency of the top-level application itself.
If a difference exists between different GEOS-based apps, this is simply when the top-level apps badly introduces some further overhead thus reducing the overall efficiency.</li>
<li>other GFOSS Java implementations (gvSIG, OpenJump, uDIG) are based on <a href="http://www.vividsolutions.com/jts/jtshome.htm">JTS</a>; but GEOS simply is a transposition in C++ of the original Java-based JTS, the underlaying algorithms are exactly the same for both libraries.
So, roughly speaking, all GFOSS applications (both C, C++ and Java) after all are based on the top of the same base-level algorithms.
And not at all surprisingly all measured timings are quite exactly the same.</li>
<li>with the remarkable exception of GRASS GIS; this one adopts an original approach of its own, being based on a conceptually different Geometry model (topology vs simple features).
Anyway, in this case too timings are very similar to the others.</li>
<li>IMHO it's a good demonstration that we have complessively reached the best possible efficiency for this kind of problems. And that openness combined with collaboration/competition between many different independent projects always leads to good, robust and efficient software.</li>
<li>many proprietary implementations looks to be embarrassangly less efficient.
A sad demonstration that a development policy based on closedness and secrecy very difficultly pays on long term periods.</li>
</li>
</ul>
<br><hr>
Back to <a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=switching-to-4.0#virtualogr">main page</a>
Z 588a7d20aedfb9f557343233977d1297