SpatiaLite - spatial extensions for SQLite

The present manual is not intended to be a complete and exhaustive reference guide; it is simply intended to explain in a conceptual way the basis of Spatial Data Processing and the main features supported by SpatiaLite.

In order to get most specific informations, you can read the complete list of Spatial SQL functions supported by SpatiaLite.

And you can follow the tutorial in order to get a quick how to guide.

Table of Contents


1. An introduction to SpatiaLite

SpatiaLite is just a small sized SQLite extension. Once you have installed SpatiaLite (a very simple and elementary task), the SQLite DBMS is enable to load, store and manipulate Spatial Data (aka Geographic Data, GIS Data, Cartographic Data, GeoSpatial Data, Geometry Data and alike).
SpatiaLite implements spatial extensions following the specification of the Open Geospatial Consortium (OGC), an international consortium of companies, agencies, and universities participating in the development of publicly available conceptual solutions that can be useful with all kinds of applications that manage spatial data.
At a very basic level, a DBMS that supports Spatial Data offers an SQL environment that has been extended with a set of geometry types, and thus may be usefully employed i.e. by some GIS application.
A geometry-valued SQL column is implemented as a column that has a geometry type. The OGC specification describe a set of SQL geometry types, as well as functions on those types to create and analyze geometry values. The followings are examples of geographic entities with a geometry: SpatiaLite itself simply implements a basic subset of the SQL with Geometry Types environment proposed by OGC; but SpatiaLite directly integrates others very popular open source libraries, such as: All this put together make SpatiaLite to be a quite complete and powerful Spatial DBMS. The main strength points of the SQLite + SpatiaLite couple are: However SpatiaLite has clear limits, mainly due to the elementary simplicity of the SQLite DB engine.
If you are looking for a full fledged Geo-DBMS, capable to support you in every kind of GIS operation, and perhaps for you is paramount to cope with millions and millions of geographic entities, and you are planning to deploy all this in a complex client-server architecture, don't even try to use SpatiaLite + SQLite.
Obviously it's not the solution you are looking for; simply it falls too short for your needs.
Otherwise, if you are looking for a very basic but full effective desktop Geo-DBMS, with no frills, no undue complexity, and the average volume of your data didn't exceeds a reasonable size, SpatiaLite + SQLite can really be the right solution for your needs.

Just to do a very quick comparison with others widely used open source DBMSs supporting Spatial Data:
DBMS Binary size Brief description
PostgreSQL
PostGis
25 MB
1.5 MB
It claims to be "The world's most advanced open source database", and that's simply the truth.
Not very user friendly (may cause you some headache when you'll try to install and configure it at first time), but surely a very complete and impressive full ACID DBMS.
The PostGIS add-on on its own implements a full set of Spatial features (OGC certified); whatever kind of GIS operation you need, PostGIS can support you in an useful way and without dimensional limits.
If you are seriously looking for a complete and unrestricted GIS support, and you aren't frightened at all about managing and caring a very sophisticated DBMS, as powerful as complex, PostgreSQL + PostGis is just what you need.
MySQL Spatial 42 MB It claims to be "The world's most popular open source database", and this too is simply the truth.
Reasonably easy to use, very fast, decently simple to manage; not surprisingly it's the preferred choice for so many WEB developers.
Often criticized for not being completely conformant to SQL standards, supports a wide set (sometimes puzzling too wide) of data engines.
MySql's Spatial enable you to manage in a very effective way GIS data; it's a pity that you can create a spatial index only if you are using MyISAM tables, that aren't neither transactional nor ACIDs. You can use Spatial data in InnoDB tables as well (full ACIDs), but in this case you cannot have spatial indexes.
Anyway, if you are looking for a DMBS able to effectively support WEB publishing of very large amounts of GIS data, and aren't worried to use a non-ACID DBMS (e.g. because you are publishing cold data, i.e. replicated data coming out from some other source), you can be very satisfied with MySQL Spatial.
SQLite
SpatiaLite
200 KB
2 MB
A very light weighted DBMS; it doesn't pretends to be, but perhaps it is "The world's smallest and simplest database".
As the aforementioned two tends to be vast and complex, as SQLite tends to be simple.
But simplicity does not always means incompleteness, or low quality; not in this case, at least.
SpatiaLite offers you the option to enable the SQLite DBMS to support a decent and standard-conformant environment for Spatial data as well.
It doesn't pretends at all to be better, or fastest, or most powerful than others; most humbly, it just pretends to be simple, light weighted and reasonably useful.
SpatiaLite now supports spatial indices as well, via RTrees SQLite introduced since v.3.6.0.
If you are looking for an elementary simple-to-manage GIS environment, without frills and undue complexity, and if your have to manage only small- or medium-sized GIS data, SQLite and SpatiaLite can make you feel very happy [I hope so, at least ...]

2. The OpenGIS geometry model in a nutshell

The set of geometry types proposed by OGC's SQL with Geometry Types environment is based on the OpenGIS Geometry Model. In this model, each geometric object has the following general properties:

2.1. The geometry class hierarchy

The geometry classes define a hierarchy as follows: It is not possible to create objects in non-instantiable classes. It is possible to create objects in instantiable classes.
Geometry is the root class of the hierarchy. It is a non-instantiable class but has a number of properties that are common to all geometry values created from any of the Geometry subclasses. These properties are described in the following list. Particular subclasses have their own specific properties, described later.
Geometry Properties
A geometry value has the following properties:

2.3. class Point

A Point is a geometry that represents a single location in coordinate space.
Point Properties
point example

2.4. class Curve

A Curve is a one-dimensional geometry, usually represented by a sequence of points. Particular subclasses of Curve define the type of interpolation between points. Curve is a non-instantiable class.
Curve Properties

2.5. class LineString

A LineString is a Curve with linear interpolation between points.
LineString Properties
linestring example

2.6. class Surface

A Surface is a two-dimensional geometry. It is a non-instantiable class. Its only instantiable subclass is Polygon.
Surface Properties

2.7. class Polygon

A Polygon is a planar Surface representing a multisided geometry. It is defined by a single exterior boundary and zero or more interior boundaries, where each interior boundary defines a hole in the Polygon.
Polygon Assertions
The boundary of a Polygon consists of a set of LinearRing objects (that is, LineString objects that are both simple and closed) that make up its exterior and interior boundaries. The preceding assertions make a Polygon a simple geometry.

polygon example

2.8. class GeometryCollection

A GeometryCollection is a geometry that is a collection of one or more geometries of any class. All the elements in a GeometryCollection must be in the same Spatial Reference System (that is, in the same coordinate system). There are no other constraints on the elements of a GeometryCollection, although the subclasses of GeometryCollection described in the following sections may restrict membership. Restrictions may be based on:

2.9. class MultiPoint

A MultiPoint is a geometry collection composed of Point elements. The points are not connected or ordered in any way.
MultiPoint Properties
multipoint example

2.10. class MultiCurve

A MultiCurve is a geometry collection composed of Curve elements. MultiCurve is a non-instantiable class.
MultiCurve Properties

2.11. class MultiLineString

A MultiLineString is a MultiCurve geometry collection composed of LineString elements. multilinestring example

2.12. class MultiSurface

A MultiSurface is a geometry collection composed of surface elements. MultiSurface is a non-instantiable class. Its only instantiable subclass is MultiPolygon.
MultiSurface Assertions

2.13. class MultiPolygon

A MultiPolygon is a MultiSurface object composed of Polygon elements.
MultiPolygon Assertions
MultiPolygon Properties
multipolygon example

3. Supported spatial data formats

This section describes the standard spatial data formats that are used to represent geometry objects in queries. They are:

3.1. Well-Known Text (WKT) format

The Well-Known Text (WKT) representation of Geometry is designed to exchange geometry data in ASCII form.
Examples of WKT representations of geometry objects: A Backus-Naur grammar that specifies the formal production rules for writing WKT values can be found in the OpenGIS specification document referenced near the beginning of this chapter.

3.2. Well-Known Binary (WKB) format

The Well-Known Binary (WKB) representation for geometric values is defined by the OpenGIS specification. It is also defined in the ISO SQL/MM Part 3: Spatial standard. WKB is used to exchange geometry data as binary streams represented by BLOB values containing geometric WKB information.
WKB uses one-byte unsigned integers, four-byte unsigned integers, and eight-byte double-precision numbers (IEEE 754 format). A byte is eight bits.

For example, a WKB value that corresponds to POINT(1 1) consists of this sequence of 21 bytes (each represented here by two hex digits):
0101000000000000000000F03F000000000000F03F
The sequence may be broken down into these components: Component representation is as follows: The WKB value that corresponds to LINESTRING(1 1, 2 2) consists of the following sequence of 41 bytes:
0102000000000000000000F03F000000000000F03F00000000000000400000000000000040
The sequence may be broken down into these components: A last example referred to collection geometries: the WKB value that corresponds to MULTIPOINT(1 1, 2 2) consists of the following sequence of 49 bytes:
010200000001000000000000000000F03F000000000000F03F0100000000000000000000400000000000000040
The sequence may be broken down into these components:

3.3. Internal BLOB format

Internally, SpatiaLite stores geometry values using ordinary SQLite's BLOB columns in a format that is very closely related to WKB format, but not exactly identical.
The main rationale to adopt a modified WKB format is needing to directly expose explicit MBRs; this helps a lot in order to grant a quick access to entities selected on a spatial basis, and partially balances the lacking of spatial indexes.
The second good reason to use a peculiarly encoded format is needing to be sure that a generic BLOB value really corresponds to a valid SpatiaLite GEOMETRY; remember that one of SQLite specific features is to offer a very weak column-type enforcement.
So SpatiaLite simply relies on ordinary SQLite general support to check that one column value contains a generic BLOB, and then implements on its own any further check if this may be considered as a valid GEOMETRY value.
To do this, SpatiaLite inserts some special markers at predictable and strategic positions. The following is the general internal BLOB format for GEOMETRY used by SpatiaLite:
Byte Offset Content Notes
0 START [hex 00] a GEOMETRY encoded BLOB value must always start with a 0x00 byte
1 ENDIAN
[hex 00 or hex 01]
if this GEOMETRY is BIG_ENDIAN ordered must contain a 0x00 byte value
otherwise, if this GEOMETRY is LITTLE_ENDIAN ordered must contain a 0x01 byte value
Any other value [neither 0x00 nor 0x01], means that this BLOB cannot be a valid GEOMETRY
2 - 5 SRID a 32-bits integer value [little- big-endian ordered, accordingly with the precedent one]
corresponding to the SRID for this GEOMETRY
6 - 13 MBR_MIN_X a double value [little- big-endian ordered, accordingly with the precedent one]
corresponding to the MBR minimum X coordinate for this GEOMETRY
14 - 21 MBR_MIN_Y a double value corresponding to the MBR minimum Y coordinate
22 - 29 MBR_MAX_X a double value corresponding to the MBR maximum X coordinate
30 - 37 MBR_MAX_Y a double value corresponding to the MBR maximum Y coordinate
38 MBR_END [hex 7C] a GEOMETRY encoded BLOB value must always have an 0x7C byte in this position
39 - 42 CLASS TYPE a 32-bits integer; must identify a valid WKB class, i.e.
  • 1 = POINT
  • 2 = LINESTRING
  • 3 = POLYGON
  • 4 = MULTIPOINT
  • 5 = MULTILINESTRING
  • 6 = MULTIPOLYGON
  • 7 = GEOMETRYCOLLECTION
All following bytes are interpreted accordingly to this one class declaration; any other value in this position causes the current one to be considered as an invalid GEOMETRY.
43 - ... geometry class specific length and content depends on geometry class [POINT, LINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION etc]
LAST END [hex FE] a GEOMETRY encoded BLOB value must always end with a 0xFE byte

The preceding one is the general format that SpatiaLite uses for internal representations of any BLOB encoded GEOMETRY. Specific formats for the various classes of GEOMETRY are strictly based upon WKB, as follows:
GEOMETRY class
Format
Byte Offset Content Notes
POINT
0 - 7 X coordinate a double value corresponding to the X coordinate for this POINT
8 - 15 Y coordinate a double value corresponding to the Y coordinate for this POINT
LINESTRING
0 - 3 number of POINTs a 32-bits integer specifying the number of POINTs in this LINESTRING
4 - ... POINTs a corresponding sequence of POINTs
POLYGON
0 - 3 number of RINGs a 32-bits integer specifying the number of RINGs in this POLYGON
4 - ... RINGs a corresponding sequence of RINGs [LINESTRINGs]
MULTIPOINT
0 - 3 number of POINTs a 32-bits integer specifying the number of POINTs in this MULTIPOINT
4 - ... collection entities a corresponding sequence of collection entities, all of the POINT type
MULTILINESTRING
0 - 3 number of LINESTRINGs a 32-bits integer specifying the number of LINESTRINGs in this MULTILINESTRING
4 - ... collection entities a corresponding sequence of collection entities, all of the LINESTRING type
MULTIPOLYGON
0 - 3 number of POLYGONSs a 32-bits integer specifying the number of POLYGONs in this MULTIPOLYGON
4 - ... collection entities a corresponding sequence of collection entities, all of the POLYGON type
GEOMETRYCOLLECTION
0 - 3 number of collection entities a 32-bits integer specifying the number of collection entities in this GEOMETRYCOLLECTION
4 - ... collection entities a corresponding sequence of collection entities, of any supported kind

The following is the format expected for each one collection entity:
Byte Offset Content Notes
0 ENTITY [hex 69] a GEOMETRY encoded BLOB value must always have an 0x69 byte in this position
1 - 4 CLASS TYPE a 32-bits integer; must identify a valid WKB class that can be included within a collection, i.e.
  • 1 = POINT
  • 2 = LINESTRING
  • 3 = POLYGON
All following bytes are interpreted accordingly to this one class declaration; any other value in this position causes the current one to be considered as an invalid GEOMETRY.
5 - ... geometry class specific Length and content depends on geometry class [POINT, LINESTRING or POLYGON]

4. Coordinate Reference Systems; the EPSG dataset

Each coordinate value, in order to be fully qualified, needs to be explicitly assigned to some SRID, or Spatial Reference Identifier.
This value identifies the geometry's associated Spatial Reference System that describes the coordinate space in which the geometry object is defined.
As a general rule, different geometries can interoperate in a meaningful way only if their coordinates are expressed in the same Coordinate Reference System.

To avoid any confusion, and to grant effective interoperability, SRID values cannot be assigned randomly, but have to adhere to some standard specification as defined by an international authority.
The EPSG [European Petroleum Survey Group] is the most authoritative on this field, actually maintaining and distributing a large dataset describing quite any worldwide known Coordinate Reference System. Not only; the EPSG dataset includes Coordinate Transformation geodetic parameters, thus allowing to transform coordinates from some CSR to any other.

The followings are examples from the EPSG dataset:
SRIDCSR namegeodetic parameters
3000Segara / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +units=m +no_defs
3001Batavia / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +units=m +no_defs
3002Makassar / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +towgs84=-587.8,519.75,145.76,0,0,0,0 +units=m +no_defs
3003Monte Mario / Italy zone 1 +proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs
3004Monte Mario / Italy zone 2 +proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m +no_defs

You don't have to worry too much about geodetic parameters and their meaning; you simply have to know that the PROJ.4 library can use them the right way in order to transform coordinates between different CSRs.

5. Spatial Metadata support

Basically, a GEOMETRY simply is a glorified kind of BLOB value, implementing an internal structure such as the one we've just seen in the previous chapter.
Following such a simplistic approach you can perform some useful tasks. But for sure, this one cannot be the right way to implement a fairly designed and completely reliable Spatial DBMS [i.e. one that is well checked, consistent and properly constrained]. Let we briefly see some horror cases, to better understand which one are the main issues we have to avoid: Obviously it will be highly desirable that your Spatial DBMS can help you to avoid this messy mismatch, by enforcing some appropriate kind of constraint on your Spatial Data.
And this one is exactly the specific purpose of Spatial Metadata. When you add a GEOMETRY column to some table, you don't simply define it in the most generic way, but you have to specify a GEOMETRY type and a SRID as well.
This infos will then be permanently added to Spatial Metadata, and any required constraint will be then enforced each time you'll try to INSERT or UPDATE a row on related table.

As defined by OpenGIS specifications, Spatial Metadata are actually implemented as two distinct tables: The following is an example from the spatial_ref_sys table:
sridauth_nameauth_sridref_sys_nameproj4text
3000epgs3000Segara / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +units=m +no_defs
3001epgs3001Batavia / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +units=m +no_defs
3002epgs3002Makassar / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +towgs84=-587.8,519.75,145.76,0,0,0,0 +units=m +no_defs
3003epgs3003Monte Mario / Italy zone 1 +proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs
3004epgs3004Monte Mario / Italy zone 2 +proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m +no_defs

And this one is an example from the geometry_columns table:
f_table_namef_geometry_columntypecoord_dimensionsridspatial_index_enabled
RegionsGeometryMULTIPOLYGON2326321
HighWaysGeometryLINESTRING2326321
TownsGeometryPOINT2326321

In the SpatiaLite's own implementation, you are never required to be directly concerned about Spatial Metadata handling.
Any related task is automatically performed in a completely transparent way; anything silently happens behind the scenes.
The main steps to follow in order to activate all this, are as follows: So, once you've created a Geometry column using the AddGeometryColumn() function, you are granted for sure that:

6. The VirtualShape module

The SQLite DBMS implements a very interesting feature; you can develop some specialized driver in order to access any generic [and possibly exotic] data source, and then those external data will appear to the SQL engine exactly as they where SQLite's native ones. This means you can apply any SQL operation on them, if appropriate / possible and if the driver offers an adequate support.
Such mechanism is referred in SQLite's documentation as the VIRTUAL TABLE support; this name is very appropriate, because it allows to access data physically stored anywhere as they where virtually stored in a standard SQLite's table.
In a more general way, a VIRTUAL TABLE can hide anything you can think about; it simply represents some software extension module exposing an appropriate interface to the SQLite's database engine, thus allowing standard SQL processing.

The VirtualShape module is a VIRTUAL TABLE driver specifically designed to allow direct read only access to shapefiles.

The shapefile format is defined by ESRI; you can find a copy of the complete specification at http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf.
If you are interested about this argument, you can usefuly read this article: http://en.wikipedia.org/wiki/Shapefile.
The ESRI shapefile format is quite obsolescent nowadays, but is still very widely used in GIS, mainly because it is simple, is universal and is supported by any existing GIS software.
So it really can be considered as the de facto standard data exchange format for GIS data.

VirtualShape is designed to interact very strictly with SpatiaLite, in order to support a complete direct access to shapefiles via standard SQL, for both attributes and geometries.
SpatiaLite tools support anyway importing GIS data from shapefiles into spatial tables; but the option to support standard SQL queries directly on external shapefiles, with no need at all to import them in some SQLite table, makes VirtualShape a very valuable module.

7. The VirtualText module

The VirtualText module is another VIRTUAL TABLE driver supporting direct read only access to CSV and TXT-tab files.

Formats based on delimited text are widespread; usually you can export quite any kind of tabular data using the CSV or TXT-tab formats. Any mainstream spreadsheet [such as MS Excel or Open Office Calc] supports import / export to and from delimited text.
So, adding to SpatiaLite the ability to process on-the-fly this kind of text based formats as they where native SQL data can really be a very useful feature.

8. Spatial Indexing support

The SQLite DBMS (as many other DBMS) implements Indices as a mean to quickly retrieve selected data.
Imagine a table containing million and million rows, such as the national telephone directory; retrieving the people's name owning some specific telephone number may be a slow operation, if you perform a so called full table scan; i.e. if the DBMS engine has to read and evaluate every row in order to filter the requested one.
Implementing an Index on the telephone number column allows for very fast querying; now the DBMS engine can simply read the Index, and immediately will find the required row.
There is no need at all to perform the lengthy full table scan process.

Please note: if your table contains a limited number of rows (let say 5,000 or 20,000) implementing an Index hasn't any practical effect. SQLite can perform a full table scan in a very short time anyway.
But if your table contains an huge number of rows (let say more than 100,000), then implementing an Index can have a dramatic effect on query's execution time. As biggest is the number of rows, as more evident becomes the benefit to implement an Index.

The same identical problem arises for Spatial data [i.e. Geometry] as well; but unhappily an ordinary Index cannot be used against a Geometry-type column.
This is because ordinary Indices represents an implementation of a so called BTree (i.e. Binary Tree); binary trees can usefully be applied on numbers and / or strings, but not at all on geometries.
For geometries we need a completely different kind of Index, i.e. the so called RTree (Rectangle Tree).

SQLite's release v.3.6.0 [and the following ones] support a stable and effective implementation for RTRee: This implementation for RTree is quite complex [and completely different from the ones supported by MySQL Spatial or PostGis], but is actually a very effective one.
SQLite's RTrees can seem a little bit exotic, but after all they perform very well.

From the SpatiaLite perspective, supporting Spatial Indexing via SQLite's RTrees concretely means that: All these tasks are accomplished in a completely transparent way: You can discover how to write SQL queries taking full profit from Spatial Indices consulting the appropriate tutorial section

9. MBR Caching support

SpatiaLite supports an alternative way in order to implement a Spatial Index. This second Spatial Index implementation is based on MBRs in-memory caching.
Technically speaking, this is not really an index; but for any practical purpose it works exactly as if it was an index; you can use an MBR cache in order to speed up a sluggish spatial query in the same way you'll use an R-Tree index, and obtaining a comparable performance boost.
From the user perspective an MbrCache is handled in a completely transparent way: As a rule of the thumb, using an MbrCache is an advisable solution each time you simply need to use SpatiaLite in order to perform some very specific and non repetitive task; in this context the cost to pay in order to load the cache each time you'll establish a new connection can be really negligible.
Setting up an MbrCache requires by far less overhead than setting up an R*Tree, and can give you exactly the same speed enhancements.
From SQL perspective, using an MbrCache requires less exotic and baroque statements than using an R*Tree.

On the other way, you never have to use an MbrCache if your dataset stores several millions geometries, because a severe memory starvation will result by this.
And using MbrCache make no sense at all when you need to establish very often lots of short lived connections, as is the case of CGI modules.
In this case you have absolutely to use an R*Tree

You can discover how to write SQL queries taking full profit from MbrCache consulting the appropriate tutorial section

10. SpatiaLite conformance and compatibility

SpatiaLite implements a set of spatial extensions that enable an SQLite database to generate, store, and analyze geographic features, in an OpenGIS conformant fashion.
The Open Geospatial Consortium publishes the OpenGIS® Simple Features Specifications For SQL, that defines how to extend an SQL RDBMS to support spatial data.
This specification is available from the OGC Web site at http://www.opengeospatial.org/standards/sfs

You can directly check by yourself if SpatiaLite really is an OGC conformant implementation:
  • download the SpataLite's test scripts
  • then execute the OpenGisTest.sql script
    this corresponds to the original Blue Lake OGC conformity test, with only a few adaption in order to accommodate for some specific SQLite/SpatiaLite idiosyncrasies