Update of "benchmark-4.0"
Not logged in

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview

Artifact ID: ac4d14b19870c304878d030920ae94afca7917fa
Page Name:benchmark-4.0
Date: 2012-11-10 02:50:22
Original User: sandro
Next df738fae756e09f6cc4b4ba9da6fa773390bc11a
Content

Benchmack (aka "comparing apples and oranges")

Back to main page

Introduction

Today I've casually read this blog page; many thanks to my dear friend Markus Neteler (the maintainer of GRASS GIS) for pointing my attention on this topic.
The original Clipping Context simply compared several popular Destkop GIS (ArcGis, QGIS, gvSIG, GRASS and others), but there was an obvious omission.
No Spatial DBMS was considered at all. (too bad; a really embarassing omission)

After all a Clipping Context sounds more like a pure data-processing problem than like a typical GIS/GUI problem, so it's probably useful introducing some Spatial DBMS in the Benchmark. This was my first thought while reading the benchmark report, so I duly started preparing my own testbed based on SpatiaLite.

Test configuration


I tested both Windows and Debian on the same hardware.
The measured timings are rather similar; anyway Linux performed stlightly better than the native Windows system (more or less, by a 10% - 15% factor).
This is not at all surprising for me, because this trend seems to be constantly verified each time I manage some comparative benchmark.
Very often Linux is a best performer than Windows, even when running as a Virtual Machine on the top of a physical Windows system.
Quite paradoxical, but indisputably true and objectively verified.

The sample dataset

I've simply downloaded the same identical sample dataset already used for all other tests reported in the Clipping Context.
Very shortly said: it's a really huge set of Contour Lines (the uncompressed Shapefile is about 1,3 GB)

Fig. 1

This one is the orginal set of Contour Lines to be clipped.
original dataset

Fig. 2

This is the final result of the clipping operation.
  • the Study Area boundary (polygon) is represented in red.
  • the Contour Lines are represented in dark blue.
  • the azure rectangle simply marks the detail area shown in the next figure.
clipped

Fig. 3

This is a magnified detail of the Study Area.
As you can easily notice there are lots of Controur Lines; and each Line has many vertices.

This one surely is a rather complex problem to be resolved, and it represents a thorough and demanding test.
load_extension

The Spatial SQL approach

This is a pure data-processing approach; we'll use the most classical Spatial Is Not Special methodology, i.e. we'll go performing the Clipping Context exactly as if it was a trivial ordinary SQL query performed on behalf of plain generic data.
We'll not use at all any GIS-specific frill, and we'll simply use a couple of Spatial SQL functions where strictly required.
Let's go ...

Preparing the SQL script

--
-- SQL script performing the Clipping Context
-- Target DBMS: SQLite with SpatiaLite extension
--

SELECT DateTime('now'), 'Loading the StudyArea SHP';
--
-- loading the StudyArea Shapefile into the DB
--
.loadshp StudyArea1MileBuffer StudyArea1MileBuffer CP1252 2260 Geometry Id POLYGON 2d no no_spatial_index

SELECT DateTime('now'), 'Loading the Contours SHP';
--
-- loading the Contour Shapefile into the DB
--
.loadshp Contours20Ft Contours20Ft CP1252 2260 Geometry Id LINESTRING 2d no with_spatial_index

SELECT DateTime('now'), 'Clipping ... please wait';
--
-- creating a new table containing the Clipped Contours
--
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID, 
  c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
  CastToMultiLinestring(ST_Intersection(
      c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
    SELECT ROWID 
    FROM SpatialIndex
    WHERE f_table_name = 'Contours20Ft' 
        AND search_frame = a.Geometry
);

SELECT DateTime('now'), 'deleting NULL Geometries';
--
-- clean up
-- deleting any NULL Geometry from Clipped Contours
--
DELETE FROM Contours20FtClipped 
WHERE Geometry IS NULL;

SELECT DateTime('now'), 'recovering a genuine Geometry column';
--
-- recovering a genuine Geometry Column (Clipped Contours)
---
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry', 2260, 'MULTILINESTRING', 2);

SELECT DateTime('now'), 'creating the Spatial Index';
--
-- creating a Spatial (Index supporting Clipped Contours)
--
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');


SELECT DateTime('now'), 'all done ... quitting';
--
-- end job
--
Don't be fouled; complexity is more apparent than real.
.loadshp parameters

Step #1

simply a dot macro command understood by the SpatiaLite frontend tool, allowing to directly import an external Shapefile into a corresponding Spatial Table within the DB.
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID, 
  c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
  CastToMultiLinestring(ST_Intersection(
      c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
    SELECT ROWID 
    FROM SpatialIndex
    WHERE f_table_name = 'Contours20Ft' 
        AND search_frame = a.Geometry
);

Step #2

this single SQL query performs all the actual Clipping task.
ST_Intersection() is a Spatial SQL function computing the intersection between the Study Area (polygon) and each Contour Line (linestring). This could be:
  • a Linestring or a MultiLinestring (depending on the specific shape of both geometries)
  • a NULL (i.e. absolutely nothing) if the Contour Line and the Study Area do not intersect at all.
  • CastToMultiLinestring() simply is a cast operator ensuring that all returned geometries will be of the same type.
  • the inner sub-query SELECT ROWID FROM SpatialIndex ... is simply intended to access the Spatial Index supporting the Contour Lines, so to speed up the whole query.
  • CREATE TABLE Contours20FtClipped AS simply intends that the resultset returned by this SQL query has to be pemanently saved into another table to be created on the fly.

DELETE FROM Contours20FtClipped 
WHERE Geometry IS NULL;

Step #3

just a stupid post-processing step; the previous query has surely generated many NULL geometries, and we'll now delete all them.
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry', 
    2260, 'MULTILINESTRING', 2);
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');

Step #4

the very final post-processing step: transforming the table containing all Clipped Contour Lines into a genuine Spatial Table supported by its own Spatial Index.
SELECT DateTime('now'), 'some message';
this simply is a weird but in this specifc case really useful SQL trick; we'll query the system clock immediately before and after performing each step, so to get full trace of the corresponding timings.

Bringing things to a head

# spatialite my_test.sqlite < test.sql

SpatiaLite version ..: 4.0.0-RC2	Supported Extensions:
	- 'VirtualShape'	[direct Shapefile access]
	- 'VirtualDbf'		[direct DBF access]
	- 'VirtualXL'		[direct XLS access]
	- 'VirtualText'		[direct CSV/TXT access]
	- 'VirtualNetwork'	[Dijkstra shortest path]
	- 'RTree'		[Spatial Index - R*Tree]
	- 'MbrCache'		[Spatial Index - MBR cache]
	- 'VirtualSpatialIndex'	[R*Tree metahandler]
	- 'VirtualFDO'		[FDO-OGR interoperability]
	- 'SpatiaLite'		[Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0

2012-11-10 00:03:52|Loading the StudyArea SHP
2012-11-10 00:03:52|Loading the Contours SHP
2012-11-10 00:04:54|Clipping ... please wait
2012-11-10 00:07:31|deleting NULL Geometries
2012-11-10 00:07:33|recovering a genuine Geometry column
2012-11-10 00:07:34|creating the Spatial Index
2012-11-10 00:07:36|all done ... quitting
#


Back to main page