Wiki page
[benchmark-4.0] by
sandro
2012-11-10 11:59:21.
D 2012-11-10T11:59:21.901
L benchmark-4.0
P ac4d14b19870c304878d030920ae94afca7917fa
U sandro
W 12656
<h2>Benchmarck (aka "comparing apples and oranges")</h2>
Back to <a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=switching-to-4.0#virtualogr">main page</a>
<h2>Introduction</h2>
Today I've casually read this <a href="http://gfoss.blogspot.it/2012/11/arcgis-vs-qgis-etc-clipping-contest.html">blog page</a>; many thanks to my dear friend Markus Neteler (the maintainer of GRASS GIS) for pointing my attention on this topic.<br>
The original <b>Clipping Contest</b> simply compared several popular Deskyop GIS (ArcGis, QGIS, gvSIG, GRASS and others), but there was an obvious omission.<br>
No Spatial DBMS was considered at all. (<i>too bad; a really embarrassing omission</i>)<br><br>
After all a <b>Clipping Contest</b> sounds more like a pure data-processing problem than like a typical GIS/GUI problem, so it's probably useful introducing some Spatial DBMS into this Contest.
This was my first thought while reading the benchmark report, so I duly started preparing my own testbed based on SpatiaLite (<i>BTW I'm going to release the new version 4.0.0 in the very next days, do after all performing some further test is surely welcome</i>).
<h2>Test configuration</h2>
<ul>
<li>Intel Core i5 3.46 GHz</li>
<li>4 GB RAM</li>
<li>Windows 7 64 bit (physical)</li>
<li>Debian Squeeze 32 bit (virtual machine)</li>
</ul><br>
I tested both Windows and Debian on the same hardware.<br>
The measured timings are rather similar; anyway Linux performed slightly better than the native Windows system (more or less, by a 10% - 20% factor).<br>
This is not at all surprising for me, because this trend seems to be constantly verified each time I manage some comparative benchmark.<br>
Very often Linux is a best performer than Windows, even when running as a Virtual Machine on the top of a physical Windows system.<br>
Quite paradoxical, but indisputably true and objectively verified many times.
<h2>The sample dataset</h2>
I've simply downloaded the same identical <a href="http://www.donmeltz.com/_files/ContourClipTest.zip">sample dataset</a> already used for all other tests reported in the Clipping Contest.<br>
Very shortly said: it's a really huge set of <b>Contour Lines</b> (the uncompressed Shapefile is about <b>1,3 GB</b>)<br><br>
<table cellspacing="6" cellpadding="6">
<tr><td><h3>Fig. 1</h3>
This one is the original set of Contour Lines to be clipped.</td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-0.png" alt="original dataset"></td></tr>
<tr><td><h3>Fig. 2</h3>
This is the final result of the clipping operation.<ul>
<li>the Study Area boundary (polygon) is represented in red.</li>
<li>the Contour Lines are represented in dark blue.</li>
<li>the azure rectangle simply marks the detail area shown in the next figure.</li>
</ul></td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-1.png" alt="clipped"></td></tr>
<tr><td><h3>Fig. 3</h3>This is a magnified detail of the Study Area and Clipped Contour Lines.<br>
As you can easily notice there are lots of Contour Lines; and each Line has many vertexes.<br><br>
This one is a rather complex problem to be resolved: and it surely represents a thorough and demanding test.</td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-2.png" alt="load_extension"></td></tr>
</table>
<h2>The Spatial SQL approach</h2>
This is a pure data-processing approach; we'll use the most classical <b><i>Spatial Is Not Special</i></b> methodology, i.e. we'll go performing the <b>Clipping Contest</b> exactly as if it was a trivial ordinary SQL query performed on behalf of plain generic data.<br>
We'll not use at all any GIS-specific frill, and we'll simply use a couple of Spatial SQL functions where strictly required.<br>
(<i>BTW it's a good opportunity to introduce few basic concepts about Spatial SQL</i>).<br>
Let's go ...
<h4>Preparing the SQL script</h4>
<verbatim>
--
-- SQL script performing the Clipping Contest
-- Target DBMS: SQLite with SpatiaLite extension
--
SELECT DateTime('now'), 'Loading the StudyArea SHP';
--
-- loading the StudyArea Shapefile into the DB
--
.loadshp StudyArea1MileBuffer StudyArea1MileBuffer CP1252 2260 Geometry Id POLYGON 2d no no_spatial_index
SELECT DateTime('now'), 'Loading the Contours SHP';
--
-- loading the Contour Shapefile into the DB
--
.loadshp Contours20Ft Contours20Ft CP1252 2260 Geometry Id LINESTRING 2d no with_spatial_index
SELECT DateTime('now'), 'Clipping ... please wait';
--
-- creating a new table containing the Clipped Contours
--
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID,
c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
CastToMultiLinestring(ST_Intersection(
c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'Contours20Ft'
AND search_frame = a.Geometry
);
SELECT DateTime('now'), 'deleting NULL Geometries';
--
-- clean up
-- deleting any NULL Geometry from Clipped Contours
--
DELETE FROM Contours20FtClipped
WHERE Geometry IS NULL;
SELECT DateTime('now'), 'recovering a genuine Geometry column';
--
-- recovering a genuine Geometry Column (Clipped Contours)
---
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry', 2260, 'MULTILINESTRING', 2);
SELECT DateTime('now'), 'creating the Spatial Index';
--
-- creating a Spatial (Index supporting Clipped Contours)
--
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');
SELECT DateTime('now'), 'all done ... quitting';
--
-- end job
--
</verbatim>
Don't be fouled; complexity is more apparent than real. The following is a quick explanation of the step by step process defined by the above SQL script.
<table cellspacing="6" cellpadding="6">
<tr><td bgcolor="#f4f4f4"><b>.loadshp</b> <i>parameters</i></td>
<td><h3>Step #1</h3>
simply a <b>dot macro command</b> understood by the SpatiaLite front-end tool, allowing to directly import an external Shapefile into a corresponding Spatial Table within the DB.</td></tr>
<tr><td><verbatim>
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID,
c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
CastToMultiLinestring(ST_Intersection(
c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'Contours20Ft'
AND search_frame = a.Geometry
);
</verbatim></td>
<td><h3>Step #2</h3>
this single SQL query performs all the actual Clipping task.<br>
<b>ST_Intersection()</b> is a Spatial SQL function computing the intersection between the <b>Study Area</b> (<i>polygon</i>) and each <b>Contour Line</b> (<i>linestring</i>). This could be:<ul>
<li>a Linestring or a MultiLinestring (depending on the specific shape of both geometries)</li>
<li>a NULL (i.e. </i>absolutely nothing</i>) if the Contour Line and the Study Area do not intersect at all.</li>
<li><b>CastToMultiLinestring()</b> simply is a <u><i>cast operator</i></u> ensuring that all returned geometries will be of the same type.</li>
<li>the inner sub-query <b>SELECT ROWID FROM SpatialIndex ...</b> is simply intended to access the Spatial Index supporting the Contour Lines, so to speed up the whole query.</li>
<li><b>CREATE TABLE Contours20FtClipped AS</b> simply intends that the resultset returned by this SQL query has to be permanently saved into yet another table (to be created on the fly).</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
DELETE FROM Contours20FtClipped
WHERE Geometry IS NULL;
</verbatim></td>
<td><h3>Step #3</h3>
just a stupid post-processing step; the previous query has surely generated many NULL geometries, and we'll now get rid of all them.</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry',
2260, 'MULTILINESTRING', 2);
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');
</verbatim></td>
<td><h3>Step #4</h3>
the very final post-processing step: transforming the table containing all Clipped Contour Lines into a genuine Spatial Table supported by its own Spatial Index.</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
SELECT DateTime('now'), 'some message';
</verbatim></td>
<td><h3>measuring the time</h3>this simply is a weird SQL trick; we'll query the system clock immediately before and after performing each step, so to get full trace of the corresponding timings.</li>
</ul><br>
</td></tr>
</table>
<h4>Bringing things to a head: running the test</h4>
<verbatim>
# spatialite my_test.sqlite < test.sql
SpatiaLite version ..: 4.0.0-RC2 Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualXL' [direct XLS access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualSpatialIndex' [R*Tree metahandler]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0
2012-11-10 00:03:52|Loading the StudyArea SHP
2012-11-10 00:03:52|Loading the Contours SHP
2012-11-10 00:04:54|Clipping ... please wait
2012-11-10 00:07:31|deleting NULL Geometries
2012-11-10 00:07:33|recovering a genuine Geometry column
2012-11-10 00:07:34|creating the Spatial Index
2012-11-10 00:07:36|all done ... quitting
#
</verbatim>
<h2>Final considerations</h2>
<ul>
<li>The whole process took less than <b>4 minutes</b></li>
<li>About <b>1 minute</b> was spent loading the Shapefiles and building the supporting Spatial Index</li>
<li>Less than <b>10 seconds</b> were spent performing post-processing steps.</li>
<li>The hard-core problem itself (generating a Clipped set of Contour Lines) took about <b>2 minutes and half</b>.</li>
</ul><br>
<u>Please note</u>: this <b>Clipping Contest</b> really is like <b>comparing apples and oranges</b>.
<ul>
<li>the CPUs used to actually run the tests are of different generations, and obviously have different intrinsic speeds.</li>
<li>the amount of available RAM is absolutely not relevant; this Contest could eventually benefit from a faster CPU, but the required memory footprint is minimal (just few hundredth MB in the SpatiaLite's case).</li>
<li>adopting some kind of parallel processing will obviously introduce a noticeable performance boost (SpatiaLite is single threaded; so using a quad-core processor had absolutely no effect at all).</li>
</ul>
<h4>What are we measuring, really ?</h4>
<ul>
<li>Many GFOSS implementations (QGIS, PostGIS, SpatiaLite) are based on <a href="http://trac.osgeo.org/geos/">GEOS</a>; so any measured time effectively represents the intrinsic efficiency of GEOS much more than the efficiency of the top-level application itself.<br>
If a difference exists between different GEOS-based apps, this is simply when the top-level apps badly introduces some further overhead thus reducing the overall efficiency.</li>
<li>Other GFOSS Java implementations (gvSIG, OpenJump, uDIG) are based on <a href="http://www.vividsolutions.com/jts/jtshome.htm">JTS</a>; but GEOS simply is a transposition in C++ of JTS, the underlaying algorithms are exactly the same.<br>
So, roughly speaking, all GFOSS applications (both C, C++ and Java) after all are based on the top of the same base-level algorithms.<br>
And not at all surprisingly all measured timings are quite exactly the same.</li>
<li>With the remarkable exception of GRASS GIS; this one adopts an original approach of its own, being based on a conceptually different Geometry model (topology vs simple features).
Anyway, in this case too timings are very similar to the others.<br>
IMHO it's a good demonstration that we have complessively reached the best possible efficiency for this kind of problems. And that openness combined with collaboration/competition between many different independent projects leads to good, robust and efficient software.</li>
<li>Many proprietary implementations looks to be embarrassangly less efficient.
A sad demonstration that a development policy based on closedness and secrecy very difficultly pays on long term periods.</li>
</li>
</ul>
<br><hr>
Back to <a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=switching-to-4.0#virtualogr">main page</a>
Z 83c7f789225f2baeb5efed630905d23f