last updated: 2018-09-22



System-level performance hints

So far we have examined several optimization-related topics: but all of them were essentially about "smartly writing well-designed queries".

Although a properly planned SQL query certainly is the main factor in achieving optimal performance, this isn't enough.
A second level of performance optimization (fine tuning) exists: the one concerning interactions between the DBMS and the underlying operating system / file system.


DB pages / page cache

Any SQLite DB is simply a single monolithic file: all data and related info are stored within this file.
As in many other DBMSs, disk space isn't allocated at random, but is properly structured:
the atomic allocation unit is the page, so a DB file is simply a well-organized collection of pages.
All pages within the same DB must have the identical size (typically 1 KB, i.e. 1024 bytes):
  • adopting a bigger page size may actually reduce the I/O traffic, but may waste a significant amount of unused space.
  • adopting a smaller page size is strongly discouraged, because it will surely imply much heavier I/O traffic.
  • so the default page size of 1 KB represents a middle ground well suited to the vast majority of real-world situations.
Reading and writing a single page from disk at a time surely isn't an efficient process;
so SQLite maintains an internal page cache (stored in RAM), supporting fast access to the most frequently accessed pages.
Quite intuitively, adopting a bigger page cache can strongly reduce the overall I/O traffic,
and consequently a higher throughput can be achieved.

By default SQLite adopts a very conservative approach, so as to require a lightweight memory footprint:
the initial page cache will simply store 2000 pages (corresponding, with 1 KB pages, to a total allocation of only about 2 MB).

But such a small default page cache surely isn't enough to properly support a huge DB (maybe one in the many-GB range);
it will easily become a real bottleneck, causing very poor overall performance.

You can use several PRAGMAs to check the page status of the currently connected DB:
PRAGMA page_size;
1024
PRAGMA page_count;
31850
PRAGMA freelist_count;
12326
You can call PRAGMA page_size so as to set a different page size
(you must specify a power-of-two size argument, ranging from 512 to 65536):
PRAGMA page_size = 4096;
PRAGMA page_size;
1024
Please note: the new setting is simply registered, but not yet applied.
Performing a VACUUM completely rebuilds the DB file, repacking every page to the currently requested page size and releasing any unused page:
VACUUM;
PRAGMA page_size;
4096
PRAGMA page_count;
5197
PRAGMA freelist_count;
0

Just a quick check: immediately after performing VACUUM the new page size has been effectively applied, and there are no unused pages at all.
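For developers, the same page-size workflow can be sketched with Python's standard sqlite3 module (plain SQLite is enough here, no SpatiaLite needed; the throw-away file path is just an example):

```python
import os
import sqlite3
import tempfile

# Throw-away DB file; changing the page size requires rebuilding a real file.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(path)
conn.isolation_level = None   # autocommit, so VACUUM can run outside a transaction

conn.execute("PRAGMA page_size = 1024")   # applied now: the DB is still empty
conn.execute("CREATE TABLE t (x INTEGER)")
old_size = conn.execute("PRAGMA page_size").fetchone()[0]

conn.execute("PRAGMA page_size = 4096")   # merely registered for now...
conn.execute("VACUUM")                    # ...and applied by rebuilding the file
new_size = conn.execute("PRAGMA page_size").fetchone()[0]

print(old_size, new_size)   # 1024 4096
conn.close()
```

The autocommit connection matters here: VACUUM refuses to run inside an open transaction.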

You can use PRAGMA cache_size in order to query or set the page cache:
PRAGMA cache_size;
1000
PRAGMA cache_size = 1000000;
PRAGMA cache_size;
1000000
Requesting a generously (but wisely) dimensioned page cache will usually grant a great performance boost, most notably when you are processing a very large DB.
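The cache_size setting can be sketched the same way; note that (unlike page_size) it is a per-connection setting, not persisted in the DB file, so it must be re-issued on every new connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Query the default, then request a generously dimensioned cache
# (on recent SQLite versions the default is reported as a negative
# number, meaning KB rather than pages).
default_cache = conn.execute("PRAGMA cache_size").fetchone()[0]
conn.execute("PRAGMA cache_size = 1000000")   # 1,000,000 pages
big_cache = conn.execute("PRAGMA cache_size").fetchone()[0]

print(big_cache)   # 1000000
conn.close()
```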


You can modify other important settings using the appropriate PRAGMAs supported by SQLite:
  • PRAGMA ignore_check_constraints can be used to query, enable or disable CHECK constraints
    (disabling check constraints is unsafe, but may be required during preliminary data loading).
  • PRAGMA foreign_keys can be used to query, enable or disable FOREIGN KEY constraints
    (and this too may be useful or required during preliminary data loading).
  • PRAGMA journal_mode can be used to query or set fine details about TRANSACTION journaling.
PRAGMA implementations change from time to time, so you should usefully consult the current SQLite documentation.
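As a minimal sketch of why toggling such PRAGMAs matters, here is the foreign_keys PRAGMA at work from Python (the table layout is purely hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforcement is OFF by default

conn.execute("CREATE TABLE province (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE community (
    id INTEGER PRIMARY KEY,
    province_id INTEGER REFERENCES province (id))""")

conn.execute("INSERT INTO province (id) VALUES (1)")
conn.execute("INSERT INTO community VALUES (1, 1)")       # OK: parent row exists
try:
    conn.execute("INSERT INTO community VALUES (2, 99)")  # there is no parent 99
    violated = False
except sqlite3.IntegrityError:
    violated = True

print(violated)   # True
conn.close()
```

During a preliminary bulk load you would instead leave the PRAGMA off, and switch it on once the data is complete.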

There are several data formats that are absolutely widespread in the GIS professional world.
All of them are open formats (i.e. they are not bound to any specific proprietary software, they are not covered by patents, and they are publicly documented).

Not at all surprisingly, such formats are universally supported by practically any GIS-related software, and can be safely used for data interchange between different platforms and systems.
  • the ESRI Shapefile format represents the de facto universal format for GIS data exchange.
  • the DBF format was introduced in the very early days of personal computing by dBase (the first DBMS-like software to gain universal popularity):
    a DBF file is included in every Shapefile, but it is also quite common to find naked DBF files simply used to ship flat tables.
  • the TXT/CSV format identifies any structured text file (tab-separated and comma-separated values are the most common variants).
SpatiaLite supports all the above formats for import and/or export.

Please note well: other data formats are also very popular and widespread, e.g. the following:
  • Microsoft Excel spreadsheets (.xls)
  • Microsoft Access databases (.mdb)
  • and many, many others.
All of them are closed (proprietary) formats:
i.e. some specific proprietary software or operating system is strictly required, there is no publicly available documentation, and/or they are covered by patents.
And all this easily explains why SpatiaLite (like many other open source packages) cannot support such closed formats.


Shapefiles

SpatiaLite supports both import and export for Shapefiles:
spatialite_gui: Shapefile import

spatialite_gui: Shapefile export

spatialite CLI front-end: shapefile import:
> spatialite provinces.sqlite
SpatiaLite version ..: 5.0.0 Supported Extensions:
        - 'VirtualShape'        [direct Shapefile access]
        - 'VirtualDbf'          [direct DBF access]
        - 'VirtualText'         [direct CSV/TXT access]
        - 'VirtualNetwork'      [Dijkstra shortest path]
        - 'RTree'               [Spatial Index - R*Tree]
        - 'MbrCache'            [Spatial Index - MBR cache]
        - 'VirtualSpatialIndex'          [R*Tree metahandler]
        - 'VirtualElementary'          [ElemGeoms metahandler]
        - 'VirtualKNN'          [K-Nearest Neighbors metahandler]
        - 'VirtualXPath'          [XML Path Language - XPath]
        - 'VirtualFDO'          [FDO-OGR interoperability]
        - 'VirtualGPKG'          [OGC GeoPackage interoperability]
        - 'VirtualBBox'          [BoundingBox tables]
        - 'SpatiaLite'          [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.9.3, 15 August 2016
GEOS version ........: 3.5.0-CAPI-1.9.0 r4084
RTTOPO version ......: 1.1.0-dev
SQLite version ......: 3.22.0
Enter ".help" for instructions
spatialite>
.loadshp prov2010_s provinces CP1252 32632
the SPATIAL_REF_SYS table already contains some row(s)
========
Loading shapefile at 'prov2010_s' into SQLite table 'provinces'

BEGIN;
CREATE TABLE provinces (
PK_UID INTEGER PRIMARY KEY AUTOINCREMENT,
"OBJECTID" INTEGER,
"cod_pro" INTEGER,
"NOME_PRO" TEXT,
"SIGLA" TEXT);
SELECT
 AddGeometryColumn('provinces', 'Geometry', 32632, 'MULTIPOLYGON', 'XY');
COMMIT;

Inserted 110 rows into 'provinces' from SHAPEFILE
========
spatialite>
.headers on
spatialite> SELECT * FROM provinces LIMIT 5;
PK_UID|OBJECTID|cod_pro|NOME_PRO|SIGLA|Geometry
1|1|1|Torino|TO|
2|2|2|Vercelli|VC|
3|3|3|Novara|NO|
4|4|4|Cuneo|CN|
5|5|5|Asti|AT|
spatialite>

spatialite CLI front-end: shapefile export:
spatialite> .dumpshp provinces Geometry exported_provinces CP1252
========
Dumping SQLite table 'provinces' into shapefile at 'exported_provinces'

SELECT
 * FROM "provinces" WHERE GeometryAliasType("Geometry") = 'POLYGON'
OR GeometryAliasType("Geometry") = 'MULTIPOLYGON' OR "Geometry" IS NULL;

Exported 110 rows into SHAPEFILE
========
spatialite>
.quit
>

spatialite_tool shell command: shapefile import:
> spatialite_tool -i -shp prov2010_s -d db.sqlite -t provinces -c CP1252 -s 32632
SQLite version: 3.7.4
SpatiaLite version: 2.4.0-RC5
Inserted 110 rows into 'provinces' from 'prov2010_s.shp'

>

spatialite_tool shell command: shapefile export:
> spatialite_tool -e -shp exported_provinces -d db.sqlite -t provinces -g Geometry -c CP1252
SQLite version: 3.7.4
SpatiaLite version: 2.4.0-RC5
Exported 110 rows into 'exported_provinces.shp' from 'provinces'

>

DBF files

SpatiaLite supports import only (not export) for DBF files:
spatialite_gui: DBF import

spatialite CLI front-end: DBF import:

> spatialite communities.sqlite
SpatiaLite version ..: 5.0.0 Supported Extensions:
        - 'VirtualShape'        [direct Shapefile access]
        - 'VirtualDbf'          [direct DBF access]
        - 'VirtualText'         [direct CSV/TXT access]
        - 'VirtualNetwork'      [Dijkstra shortest path]
        - 'RTree'               [Spatial Index - R*Tree]
        - 'MbrCache'            [Spatial Index - MBR cache]
        - 'VirtualFDO'          [FDO-OGR interoperability]
        - 'SpatiaLite'          [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.3.0-CAPI-1.7.0
SQLite version ......: 3.7.4
Enter ".help" for instructions
spatialite>
.loaddbf com2010_s.dbf communities CP1252
========
Loading DBF at 'com2010_s.dbf' into SQLite table 'communities'

BEGIN;
CREATE TABLE communities (
PK_UID INTEGER PRIMARY KEY AUTOINCREMENT,
"OBJECTID" INTEGER,
"cod_reg" INTEGER,
"cod_pro" INTEGER,
"cod_com" INTEGER,
"pro_com" INTEGER,
"comune" TEXT,
"NOME_ITA" TEXT,
"NOME_TED" TEXT);
COMMIT;

Inserted 8094 rows into 'communities' from DBF
========
spatialite>
.headers on
spatialite>
SELECT * FROM communities LIMIT 5 OFFSET 5000;
PK_UID|OBJECTID|cod_reg|cod_pro|cod_com|pro_com|comune|NOME_ITA|NOME_TED
5001|4958|12|58|54|58054|Manziana|Manziana|
5002|4959|12|58|55|58055|Marano Equo|Marano Equo|
5003|4960|12|58|56|58056|Marcellina|Marcellina|
5004|4961|12|58|57|58057|Marino|Marino|
5005|4962|12|58|58|58058|Mazzano Romano|Mazzano Romano|

spatialite>
.quit
>

spatialite_tool shell command: DBF import:

> spatialite_tool -i -dbf com2010_s -d db.sqlite -t communities -c CP1252
SQLite version: 3.7.4
SpatiaLite version: 2.4.0-RC5
Inserted 8094 rows into 'communities' from 'com2010_s.dbf'
>


TXT/CSV files

SpatiaLite supports both import and export for TXT/CSV files:
spatialite_gui: TXT/CSV import

spatialite_gui: TXT/CSV export
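What such a TXT/CSV import boils down to can be sketched with Python's csv and sqlite3 modules (the column names and sample rows below are purely hypothetical):

```python
import csv
import io
import sqlite3

# A tiny tab-separated sample, standing in for a real TXT/CSV file.
csv_text = "cod_pro\tNOME_PRO\tSIGLA\n1\tTorino\tTO\n2\tVercelli\tVC\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provinces (cod_pro INTEGER, NOME_PRO TEXT, SIGLA TEXT)")

reader = csv.reader(io.StringIO(csv_text), delimiter="\t")
next(reader)   # skip the header row
rows = [(int(r[0]), r[1], r[2]) for r in reader]
conn.executemany("INSERT INTO provinces VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM provinces").fetchone()[0]
print(count)   # 2
```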


Other supported export formats

Using spatialite_gui you can also export your data as HTML, PNG, PDF or SVG:
spatialite_gui: HTML export

HTML export sample



spatialite_gui: PNG / PDF / SVG export (1)

spatialite_gui: PNG / PDF / SVG export (2)

PNG export sample

PDF export sample

SVG export sample

Both SQLite and SpatiaLite are elementarily simple and really lightweight.
So both are obvious candidates when you are a software developer whose application absolutely requires robust and affordable Spatial DBMS support, but you want to keep everything as simple as possible, avoiding any unnecessary complexity.

The best-fit development language supporting SQLite and SpatiaLite is obviously C [or its ugly duckling son, C++]
(after all, both SQLite and SpatiaLite are completely written in C).
Using C/C++ you can directly access the APIs of both SQLite and SpatiaLite, gaining full and unconstrained access to every supported feature at the lowest possible level.
And that's not all: using C/C++ you can even adopt static linkage, so as to embed a fully self-contained DBMS engine directly within the executable itself (with an incredibly small footprint).
Such an approach surely gets rid of any installation-related headache.

Anyway, you are not at all compelled to use C/C++:
SQLite and SpatiaLite are supported as well by many other languages, such as Java, Python and PHP (and probably many others).

The basic approach

Any language supporting a generic SQLite driver (aka connector) can fully support SpatiaLite:
SQLite supports dynamic extension loading, and SpatiaLite simply is such an extension.

SELECT load_extension('path_to_extension_library');

Executing the above SQL statement will load any SQLite extension, obviously including SpatiaLite.
Anyway, all too often this is true only in theory, while reality turns out to be quite different.
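From Python, for instance, the loading step looks like the following sketch; "mod_spatialite" is the usual library name for recent SpatiaLite versions, but the exact name/path depends on your platform and installation, and the attempt may simply fail:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    conn.enable_load_extension(True)        # may be disabled at build time
    conn.load_extension("mod_spatialite")   # adjust the name/path as needed
    available = True
except (AttributeError, sqlite3.OperationalError):
    available = False                       # loading disabled, or library missing

print("SpatiaLite loaded:", available)
conn.close()
```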

Let us quickly examine the main issues actually bringing this simple approach to a complete failure:
  • SQLite is highly configurable; it supports lots and lots of build-time options.
    The dynamic extension loading mechanism can be completely disabled, if this looks like a safer option.
    Sadly, for many long years this has been the favorite choice of the vast majority of system packagers.
  • SQLite is growing very quickly: a major update is usually released every few months.
    But many system packagers still continue distributing incredibly obsolete SQLite versions,
    and quite obviously such obsolete versions cannot adequately support SpatiaLite.
  • Once you are caught in this painful situation (disabled extensions / obsolete SQLite) there is absolutely nothing you can do.
    You simply have to give up on SpatiaLite, at least for now.
  • Anyway, if this is your actual case, don't lose heart: things evolve, and usually tend to evolve in the right direction.
    Just two years ago very few languages supported SpatiaLite.
    Today this is no longer true for the most widely used languages (at least in their recently released versions).



You can usefully read the section corresponding to your language of choice, so as to get started in the quickest time.
Each section contains a complete sample program, as well as any related system configuration hints, compiler/linker settings and so on.

Fine dining experience: Chez Dijkstra

SpatiaLite includes an internal routing module called VirtualNetwork.
Starting from an arbitrary network, this module allows you to identify shortest path connections using simple SQL queries.
The VirtualNetwork module implements sophisticated, highly optimized algorithms, so it's really fast and very efficient even on huge networks.

Network foundations

You cannot assume that just any generic road layer is a network.
A real network must satisfy several specific prerequisites: in other words, it has to be a graph.

Graph theory is a wide and complex branch of mathematics; if you are interested, you can find further details under:
  • Graph Theory
  • Shortest Path Problem
  • Dijkstra's Algorithm
  • A* Algorithm
Graph example
Very shortly explained:
  • a network is a collection of arcs
  • each single arc connects two nodes
  • each arc has a unique direction:
    i.e. the arc going from node A to node B is not necessarily the same one going from B to A
  • each arc has a well-known cost (e.g. length, travel time, capacity, ...)
  • both arcs and nodes must expose some explicitly defined unique identifier
  • the geometries of arcs and nodes must satisfy strong topological consistency.
Starting from a network (aka graph), the Dijkstra and A* algorithms can identify the shortest path (i.e. the minimal cost connection) between any arbitrary pair of nodes.
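Purely as an illustration (this is not the actual VirtualNetwork implementation), the core idea of Dijkstra's algorithm fits in a few lines of Python, using a toy arc list that mirrors the model just described:

```python
import heapq

# Toy network: (node_from, node_to, cost); two-way streets are simply
# listed once per direction, exactly as in the arc model above.
arcs = [
    ("A", "B", 4.0), ("B", "A", 4.0),
    ("A", "C", 2.0), ("C", "A", 2.0),
    ("C", "B", 1.0), ("B", "C", 1.0),
    ("B", "D", 5.0), ("D", "B", 5.0),
    ("C", "D", 8.0), ("D", "C", 8.0),
]

def dijkstra(arcs, origin, destination):
    """Return (total_cost, node sequence) for the cheapest path."""
    graph = {}
    for node_from, node_to, cost in arcs:
        graph.setdefault(node_from, []).append((node_to, cost))
    best = {origin: 0.0}          # cheapest known cost to reach each node
    prev = {}                     # back-pointers for path reconstruction
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:   # cheapest route found: walk back to origin
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue              # stale queue entry
        for nxt, arc_cost in graph.get(node, []):
            new_cost = cost + arc_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))
    return float("inf"), []       # destination unreachable

print(dijkstra(arcs, "A", "D"))   # (8.0, ['A', 'C', 'B', 'D'])
```

Note how the direct arc A-B (cost 4) is discarded in favor of the cheaper detour through C, exactly the kind of decision a routing query performs on a real road network.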
There are several sources distributing network-like data.
One of the most renowned and widely used is OSM [OpenStreetMap], a completely free worldwide dataset.
There are several download sites distributing OSM; in the following example we'll download the required OSM dataset from www.gfoss.it,
more precisely the toscana.osm.bz2 dataset.
Version from 2014-06-26:
  • toscana.osm.bz2 : 105 MB
  • toscana.osm : 1.5 GB
  • tuscany.sqlite : 84.4 MB (after import by spatialite_osm_net [2018-09-28] )

[2018-09-28] or from Geofabrik
Step 1: you must uncompress the OSM dataset.
This file is compressed using the bzip2 algorithm, widely supported by many open source tools;
e.g. you can use 7-Zip (www.7-zip.org) to uncompress this file.
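If you prefer scripting the decompression, Python's standard bz2 module works too (a tiny in-memory stand-in is used here; the commented lines show the real-file variant with the file names from this example):

```python
import bz2

# Stand-in for toscana.osm.bz2: compress, then decompress, a tiny payload.
data = bz2.compress(b"<osm>...</osm>")
xml = bz2.decompress(data)
print(xml)   # b'<osm>...</osm>'

# For the real file:
# with bz2.open("toscana.osm.bz2", "rb") as src, open("toscana.osm", "wb") as dst:
#     dst.write(src.read())
```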

Step 2: any OSM dataset is simply an XML file
(you can open it using any ordinary text editor of your choice).
SpatiaLite provides a specific CLI tool, spatialite_osm_net, allowing to load an OSM dataset into a DB:
>spatialite_osm_net -o toscana.osm -d tuscany.sqlite -T tuscany -m
SQLite version: 3.22.0
SpatiaLite version: 5.0.0
using IN-MEMORY database
Parsing input: Pass 1 [Nodes and Ways] ...
Parsing input: Pass 2 [Arcs of the Graph] ...
UNRESOLVED-NODE 1520302186
..
UNRESOLVED-NODE 1964678842
Creating helper table 'tuscany_nodes' ... wait please ...
  Helper table 'tuscany_nodes' successfully created
Dropping temporary table 'osm_tmp_nodes' ... wait please ...
  Dropped table 'osm_tmp_nodes'
Dropping temporary table 'graph_nodes' ... wait please ...
  Dropped table 'graph_nodes'
exporting IN_MEMORY database ... wait please ...
  IN_MEMORY database successfully exported
VACUUMing the DB ... wait please ...
  All done: OSM graph was successfully loaded
>

Very briefly explained:
  • -o toscana.osm selects the input OSM dataset to be loaded.
  • -d tuscany.sqlite selects the output DB to be created and populated.
  • -T tuscany will create the Geometry Table storing the OSM dataset.
  • -m requests an in-memory database, so as to perform the data import in the shortest time.
SELECT
 *
FROM tuscany;
id osm_id class node_from node_to name oneway_from_to oneway_to_from length cost geometry
... ... ... ... ... ... ... ... ... ... ...
2393 8079944 tertiary 659024545 659024546 Via Cavour 1 1 7.468047 0.537699 BLOB sz=80 GEOMETRY
2394 8079944 tertiary 659024546 156643876 Via Cavour 1 1 12.009911 0.864714 BLOB sz=96 GEOMETRY
2395 8083989 motorway 31527668 319386487 Autostrada del Sole 1 0 424.174893 13.882087 BLOB sz=80 GEOMETRY
2396 8083990 motorway 31527665 31527668 Autostrada del Sole 1 0 130.545183 4.272388 BLOB sz=112 GEOMETRY
... ... ... ... ... ... ... ... ... ... ...
Just a quick check:
  • a single tuscany table exists in the DB created by spatialite_osm_net.
  • each row in this table corresponds to a single network arc.
  • the nodes connected by each arc are identified by node_from and node_to.
  • oneway_from_to and oneway_to_from determine whether the arc can be traversed in both directions or not.
  • length is the geometric length of the arc (measured in meters).
  • cost is the estimated travel time (expressed in seconds).
  • geometry is the LINESTRING representation of the arc.
Please note #1: there is no separate representation for nodes, simply because they can be indirectly retrieved from the corresponding arcs.

Please note #2: this surely is a real network, but in this form it cannot yet support routing queries.
A further step is still required: creating a VirtualNetwork table.
building network
We'll use spatialite_gui to create the VirtualNetwork table.
Anyway, the same operation is also supported by the spatialite_network CLI tool
(and this CLI tool offers extended diagnostics, useful to identify any possible problem).
And finally you can now test your first routing query:
  • you simply have to set the WHERE NodeFrom = ... AND NodeTo = ... clause.
  • a result-set representing the shortest path solution will be returned.
  • the first row of this result-set summarizes the whole path, and carries the corresponding geometry.
  • each subsequent row represents a single arc to be traversed, in the appropriate sequence, so as to go from origin to destination.
SELECT
 *
FROM tuscany_net
WHERE NodeFrom = 267209305
 AND NodeTo = 267209702;
Algorithm ArcRowid NodeFrom NodeTo Cost Geometry Name
Dijkstra NULL 267209305 267209702 79.253170 BLOB sz=272 GEOMETRY NULL
Dijkstra 11815 267209305 250254381 11.170037 NULL Via Guelfa
Dijkstra 11816 250254381 250254382 8.583739 NULL Via Guelfa
Dijkstra 11817 250254382 250254383 12.465016 NULL Via Guelfa
Dijkstra 16344 250254383 256636073 15.638407 NULL Via Cavour
Dijkstra 67535 256636073 270862435 3.147105 NULL Piazza San Marco
Dijkstra 25104 270862435 271344268 5.175379 NULL Piazza San Marco
Dijkstra 25105 271344268 82591712 3.188657 NULL Piazza San Marco
Dijkstra 11802 82591712 267209666 4.978328 NULL Piazza San Marco
Dijkstra 20773 267209666 267209702 14.906501 NULL Via Giorgio La Pira
SpatiaLite's VirtualNetwork tables support two alternative algorithms:
  • Dijkstra's shortest path is a classic routing algorithm, based on thorough mathematical assumptions, and will surely identify the optimal solution.
  • A* is an alternative algorithm based on heuristic assumptions:
    it is usually faster than Dijkstra's, but under some odd conditions it may fail, or may return a sub-optimal solution.
  • Dijkstra's algorithm is the default selection.
Anyway, switching from one to the other is really simple:
UPDATE tuscany_net SET Algorithm = 'A*';
UPDATE tuscany_net SET Algorithm = 'Dijkstra';
A VirtualNetwork table simply represents a static snapshot of the underlying network.
This allows a highly efficient binary representation (in other words, solutions are produced very quickly), but obviously doesn't support dynamic changes.

Each time the underlying network changes, the corresponding VirtualNetwork table must be DROPped and then created again, so as to correctly reflect the latest network state.

In many cases this isn't an issue at all, but in some highly dynamic scenarios it may be a big annoyance.
Be well aware of this limitation.

