Wiki page
[benchmark-4.0] by
sandro
2012-11-10 02:50:22.
D 2012-11-10T02:50:22.490
L benchmark-4.0
U sandro
W 9470
<h2>Benchmack (aka "comparing apples and oranges")</h2>
Back to <a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=switching-to-4.0#virtualogr">main page</a>
<h2>Introduction</h2>
Today I've casually read this <a href="http://gfoss.blogspot.it/2012/11/arcgis-vs-qgis-etc-clipping-contest.html">blog page</a>; many thanks to my dear friend Markus Neteler (the maintainer of GRASS GIS) for pointing my attention on this topic.<br>
The original <b>Clipping Context</b> simply compared several popular Destkop GIS (ArcGis, QGIS, gvSIG, GRASS and others), but there was an obvious omission.<br>
No Spatial DBMS was considered at all. (<i>too bad; a really embarassing omission</i>)<br><br>
After all a <b>Clipping Context</b> sounds more like a pure data-processing problem than like a typical GIS/GUI problem, so it's probably useful introducing some Spatial DBMS in the Benchmark.
This was my first thought while reading the benchmark report, so I duly started preparing my own testbed based on SpatiaLite.
<h2>Test configuration</h2>
<ul>
<li>Intel Core i5 3.46 GHz</li>
<li>4 GB RAM</li>
<li>Windows 7 64 bit (physical)</li>
<li>Debian Squeeze 32 bit (virtual machine)</li>
</ul><br>
I tested both Windows and Debian on the same hardware.<br>
The measured timings are rather similar; anyway Linux performed stlightly better than the native Windows system (more or less, by a 10% - 15% factor).<br>
This is not at all surprising for me, because this trend seems to be constantly verified each time I manage some comparative benchmark.<br>
Very often Linux is a best performer than Windows, even when running as a Virtual Machine on the top of a physical Windows system.<br>
Quite paradoxical, but indisputably true and objectively verified.
<h2>The sample dataset</h2>
I've simply downloaded the same identical <a href="http://www.donmeltz.com/_files/ContourClipTest.zip">sample dataset</a> already used for all other tests reported in the Clipping Context.<br>
Very shortly said: it's a really huge set of <b>Contour Lines</b> (the uncompressed Shapefile is about <b>1,3 GB</b>)<br><br>
<table cellspacing="6" cellpadding="6">
<tr><td><h3>Fig. 1</h3>
This one is the orginal set of Contour Lines to be clipped.</td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-0.png" alt="original dataset"></td></tr>
<tr><td><h3>Fig. 2</h3>
This is the final result of the clipping operation.<ul>
<li>the Study Area boundary (polygon) is represented in red.</li>
<li>the Contour Lines are represented in dark blue.</li>
<li>the azure rectangle simply marks the detail area shown in the next figure.</li>
</ul></td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-1.png" alt="clipped"></td></tr>
<tr><td><h3>Fig. 3</h3>This is a magnified detail of the Study Area.<br>
As you can easily notice there are lots of Controur Lines; and each Line has many vertices.<br><br>
This one surely is a rather complex problem to be resolved, and it represents a thorough and demanding test.</td>
<td><img border="1" src="http://www.gaia-gis.it/gaia-sins/virtualogr/clip-2.png" alt="load_extension"></td></tr>
</table>
<h2>The Spatial SQL approach</h2>
This is a pure data-processing approach; we'll use the most classical <b><i>Spatial Is Not Special</i></b> methodology, i.e. we'll go performing the <b>Clipping Context</b> exactly as if it was a trivial ordinary SQL query performed on behalf of plain generic data.<br>
We'll not use at all any GIS-specific frill, and we'll simply use a couple of Spatial SQL functions where strictly required.<br>
Let's go ...
<h4>Preparing the SQL script</h4>
<verbatim>
--
-- SQL script performing the Clipping Context
-- Target DBMS: SQLite with SpatiaLite extension
--
SELECT DateTime('now'), 'Loading the StudyArea SHP';
--
-- loading the StudyArea Shapefile into the DB
--
.loadshp StudyArea1MileBuffer StudyArea1MileBuffer CP1252 2260 Geometry Id POLYGON 2d no no_spatial_index
SELECT DateTime('now'), 'Loading the Contours SHP';
--
-- loading the Contour Shapefile into the DB
--
.loadshp Contours20Ft Contours20Ft CP1252 2260 Geometry Id LINESTRING 2d no with_spatial_index
SELECT DateTime('now'), 'Clipping ... please wait';
--
-- creating a new table containing the Clipped Contours
--
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID,
c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
CastToMultiLinestring(ST_Intersection(
c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'Contours20Ft'
AND search_frame = a.Geometry
);
SELECT DateTime('now'), 'deleting NULL Geometries';
--
-- clean up
-- deleting any NULL Geometry from Clipped Contours
--
DELETE FROM Contours20FtClipped
WHERE Geometry IS NULL;
SELECT DateTime('now'), 'recovering a genuine Geometry column';
--
-- recovering a genuine Geometry Column (Clipped Contours)
---
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry', 2260, 'MULTILINESTRING', 2);
SELECT DateTime('now'), 'creating the Spatial Index';
--
-- creating a Spatial (Index supporting Clipped Contours)
--
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');
SELECT DateTime('now'), 'all done ... quitting';
--
-- end job
--
</verbatim>
Don't be fouled; complexity is more apparent than real.
<table cellspacing="6" cellpadding="6">
<tr><td bgcolor="#f4f4f4"><b>.loadshp</b> <i>parameters</i></td>
<td><h3>Step #1</h3>
simply a <b>dot macro command</b> understood by the SpatiaLite frontend tool, allowing to directly import an external Shapefile into a corresponding Spatial Table within the DB.</td></tr>
<tr><td><verbatim>
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID,
c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
CastToMultiLinestring(ST_Intersection(
c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'Contours20Ft'
AND search_frame = a.Geometry
);
</verbatim></td>
<td><h3>Step #2</h3>
this single SQL query performs all the actual Clipping task.<br>
<b>ST_Intersection()</b> is a Spatial SQL function computing the intersection between the <b>Study Area</b> (<i>polygon</i>) and each <b>Contour Line</b> (<i>linestring</i>). This could be:<ul>
<li>a Linestring or a MultiLinestring (depending on the specific shape of both geometries)</li>
<li>a NULL (i.e. </i>absolutely nothing</i>) if the Contour Line and the Study Area do not intersect at all.</li>
<li><b>CastToMultiLinestring()</b> simply is a <u><i>cast operator</i></u> ensuring that all returned geometries will be of the same type.</li>
<li>the inner sub-query <b>SELECT ROWID FROM SpatialIndex ...</b> is simply intended to access the Spatial Index supporting the Contour Lines, so to speed up the whole query.</li>
<li><b>CREATE TABLE Contours20FtClipped AS</b> simply intends that the resultset returned by this SQL query has to be pemanently saved into another table to be created on the fly.</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
DELETE FROM Contours20FtClipped
WHERE Geometry IS NULL;
</verbatim></td>
<td><h3>Step #3</h3>
just a stupid post-processing step; the previous query has surely generated many NULL geometries, and we'll now delete all them.</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry',
2260, 'MULTILINESTRING', 2);
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');
</verbatim></td>
<td><h3>Step #4</h3>
the very final post-processing step: transforming the table containing all Clipped Contour Lines into a genuine Spatial Table supported by its own Spatial Index.</li>
</ul><br>
</td></tr>
<tr><td><verbatim>
SELECT DateTime('now'), 'some message';
</verbatim></td>
<td>this simply is a weird but in this specifc case really useful SQL trick; we'll query the system clock immediately before and after performing each step, so to get full trace of the corresponding timings.</li>
</ul><br>
</td></tr>
</table>
<h4>Bringing things to a head</h4>
<verbatim>
# spatialite my_test.sqlite < test.sql
SpatiaLite version ..: 4.0.0-RC2 Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualXL' [direct XLS access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualSpatialIndex' [R*Tree metahandler]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0
2012-11-10 00:03:52|Loading the StudyArea SHP
2012-11-10 00:03:52|Loading the Contours SHP
2012-11-10 00:04:54|Clipping ... please wait
2012-11-10 00:07:31|deleting NULL Geometries
2012-11-10 00:07:33|recovering a genuine Geometry column
2012-11-10 00:07:34|creating the Spatial Index
2012-11-10 00:07:36|all done ... quitting
#
</verbatim>
<br><hr>
Back to <a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=switching-to-4.0#virtualogr">main page</a>
Z cf60cdb075bb958836d06edfc0884343