last updated: 2018-10-07



An introduction to SpatiaLite

SpatiaLite is just a small sized SQLite extension. Once you have installed SpatiaLite (a very simple and elementary task), the SQLite DBMS is enable to load, store and manipulate Spatial Data (aka Geographic Data, GIS Data, Cartographic Data, GeoSpatial Data, Geometry Data and alike).
SpatiaLite implements spatial extensions following the specification of the Open Geospatial Consortium (OGC), an international consortium of companies, agencies, and universities participating in the development of publicly available conceptual solutions that can be useful with all kinds of applications that manage spatial data.
At a very basic level, a DBMS that supports Spatial Data offers an SQL environment that has been extended with a set of geometry types, and thus may be usefully employed i.e. by some GIS application.
A geometry-valued SQL column is implemented as a column that has a geometry type. The OGC specification describe a set of SQL geometry types, as well as functions on those types to create and analyze geometry values. The followings are examples of geographic entities with a geometry:
  • a river
  • a building
  • a tree
  • a street, a railway
  • a state boundary
  • a lake
SpatiaLite itself simply implements a basic subset of the SQL with Geometry Types environment proposed by OGC; but SpatiaLite directly integrates others very popular open source libraries, such as:
  • the GEOS library implements sophisticated spatial analysis functions
    GEOS was originally developed to support the PostGIS spatial extension for PostgreSQL; so, SpatiaLite can perform any kind of spatial analysis, and in exactly the same way PostGIS will do
  • the PROJ library implements coordinate transformation between different Coordinate System References
    it supports the very popular and widespread EPSG geodetic database, thus allowing to use any known CSR [at least, the main ones]
  • the LIBICONV library implements support for locale_charset, thus effectively supporting any exotic language, such as Chinese, Japanese, Russian, etc
  • the SQLite library implements a full SQL data engine; SQLite's latest version 3.6.2 includes a native implementation for RTree, thus effectively supporting Spatial Indices
All this put together makes SpatiaLite quite complete and a powerful Spatial DBMS. The main strengths of the SQLite + SpatiaLite couple are:
  • no need for complex installation procedures
  • no DBMS administration at all
  • a very low system footprint
  • an elementary complexity, combined with a good performance level
  • simplicity of use
However SpatiaLite has clear limits, mainly due to the elementary simplicity of the SQLite DB engine.
If you are looking for a full fledged Geo-DBMS, capable to support you in every kind of GIS operation, and perhaps for you is paramount to cope with millions and millions of geographic entities, and you are planning to deploy all this in a complex client-server architecture, don't even try to use SpatiaLite + SQLite.
Obviously it's not the solution you are looking for; simply it falls too short for your needs.
Otherwise, if you are looking for a very basic but full effective desktop Geo-DBMS, with no frills, no undue complexity, and the average volume of your data didn't exceeds a reasonable size, SpatiaLite + SQLite can really be the right solution for your needs.

Just to do a very quick comparison with others widely used open source DBMSs supporting Spatial Data:
DBMS Binary size Brief description
PostgreSQL
PostGis
25 MB
1.5 MB
It claims to be "The world's most advanced open source database", and that's simply the truth.
Not very user friendly (may cause you some headache when you'll try to install and configure it at first time), but surely a very complete and impressive full ACID DBMS.
The PostGIS add-on on its own implements a full set of Spatial features (OGC certified); whatever kind of GIS operation you need, PostGIS can support you in an useful way and without dimensional limits.
If you are seriously looking for a complete and unrestricted GIS support, and you aren't frightened at all about managing and caring a very sophisticated DBMS, as powerful as complex, PostgreSQL + PostGis is just what you need.
MySQL Spatial 42 MB It claims to be "The world's most popular open source database", and this too is simply the truth.
Reasonably easy to use, very fast, decently simple to manage; not surprisingly it's the preferred choice for so many WEB developers.
Often criticized for not being completely conformant to SQL standards, supports a wide set (sometimes puzzling too wide) of data engines.
MySql's Spatial enables you to manage GIS data in a very effective way; it's a pity that you can create a spatial index only if you are using MyISAM tables, that aren't neither transactional nor ACIDs. You can use Spatial data in InnoDB tables as well (full ACIDs), but in this case you cannot have spatial indexes.
Anyway, if you are looking for a DMBS able to effectively support WEB publishing of very large amounts of GIS data, and aren't worried to use a non-ACID DBMS (e.g. because you are publishing cold data, i.e. replicated data coming out from some other source), you can be very satisfied with MySQL Spatial.
SQLite
SpatiaLite
200 KB
2 MB
A very light weighted DBMS; it doesn't pretends to be, but perhaps it is "The world's smallest and simplest database".
As the aforementioned two tends to be vast and complex, as SQLite tends to be simple.
But simplicity does not always means incompleteness, or low quality; not in this case, at least.
SpatiaLite offers you the option to enable the SQLite DBMS to support a decent and standard-conformant environment for Spatial data as well.
It doesn't pretends at all to be better, or fastest, or most powerful than others; most humbly, it just pretends to be simple, light weighted and reasonably useful.
SpatiaLite now supports spatial indices as well, via RTrees SQLite introduced since v.3.6.0.
If you are looking for an elementary, simple-to-manage GIS environment, without frills and undue complexity and if your have to manage only small- or medium-sized GIS data, SQLite and SpatiaLite can make you feel very happy [I hope so, at least ...]

The OpenGIS geometry model

The OpenGIS geometry model in a nutshell The set of geometry types proposed by OGC's SQL with Geometry Types environment is based on the OpenGIS Geometry Model. In this model, each geometric object has the following general properties:
  • It is associated with a Spatial Reference System, which describes the coordinate space in which the object is defined.
  • It belongs to some geometry class

2.1. The geometry class hierarchy

The geometry classes define a hierarchy as follows:

2.2. class Geometry

It is not possible to create objects in non-instantiate-able classes. It is possible to create objects in instantiate-able classes.
Geometry is the root class of the hierarchy. It is a non-instantiable class but has a number of properties that are common to all geometry values created from any of the Geometry subclasses. These properties are described in the following list. Particular subclasses have their own specific properties, described later.
Geometry Properties
A geometry value has the following properties:
  • Its type. Each geometry belongs to one of the instantiate-able classes in the hierarchy.
  • Its SRID, or Spatial Reference Identifier. This value identifies the geometry's associated Spatial Reference System that describes the coordinate space in which the geometry object is defined.
    All calculations are done assuming Euclidean (planar) geometry.
  • Its coordinates in its Spatial Reference System, represented as double-precision (eight-byte) numbers. All non-empty geometries include at least one pair of (X,Y) coordinates. Empty geometries contain no coordinates.
    Coordinates are related to the SRID. For example, in different coordinate systems, the distance between two objects may differ even when objects have the same coordinates, because the distance on the planar coordinate system and the distance on the geocentric system (coordinates on the Earth's surface) are different things.
  • Its interior, boundary, and exterior. Every geometry occupies some position in space. The exterior of a geometry is all space not occupied by the geometry. The interior is the space occupied by the geometry. The boundary is the interface between the geometry's interior and exterior.
  • Its MBR (Minimum Bounding Rectangle), or Envelope. This is the bounding geometry, formed by the minimum and maximum (X,Y) coordinates: ((MINX MINY, MAXX MINY, MAXX MAXY, MINX MAXY, MINX MINY))
  • Whether the value is simple or non-simple. Geometry values of types (LineString, MultiPoint, MultiLineString) are either simple or non-simple. Each type determines its own assertions for being simple or non-simple.
  • Whether the value is closed or not closed. Geometry values of types (LineString, MultiString) are either closed or not closed. Each type determines its own assertions for being closed or not closed.
  • Whether the value is empty or non-empty A geometry is empty if it does not have any points. Exterior, interior, and boundary of an empty geometry are not defined (that is, they are represented by a NULL value). An empty geometry is defined to be always simple and has an area of 0.
  • Its dimension. A geometry can have a dimension of –1, 0, 1, or 2:
    • –1 for an empty geometry.
    • 0 for a geometry with no length and no area.
    • 1 for a geometry with non-zero length and zero area.
    • 2 for a geometry with non-zero area.
    Point objects have a dimension of zero. LineString objects have a dimension of 1. Polygon objects have a dimension of 2. The dimensions of MultiPoint, MultiLineString, and MultiPolygon objects are the same as the dimensions of the elements they consist of.

2.3. class Point

A Point is a geometry that represents a single location in coordinate space.
Point Properties
  • X-coordinate value.
  • Y-coordinate value.
  • Point is defined as a zero-dimensional geometry.
  • The boundary of a Point is the empty set.
point example

2.4. class Curve

A Curve is a one-dimensional geometry, usually represented by a sequence of points. Particular subclasses of Curve define the type of interpolation between points. Curve is a non-instantiable class.
Curve Properties
  • A Curve has the coordinates of its points.
  • A Curve is defined as a one-dimensional geometry.
  • A Curve is simple if it does not pass through the same point twice.
  • A Curve is closed if its start point is equal to its endpoint.
  • The boundary of a closed Curve is empty.
  • The boundary of a non-closed Curve consists of its two endpoints.
  • A Curve that is simple and closed is a LinearRing.

2.5. class LineString

A LineString is a Curve with linear interpolation between points.
LineString Properties
  • A LineString has coordinates of segments, defined by each consecutive pair of points.
  • A LineString is a Line if it consists of exactly two points.
  • A LineString is a LinearRing if it is both closed and simple.
linestring example

2.6. class Surface

A Surface is a two-dimensional geometry. It is a non-instantiable class. Its only instantiable subclass is Polygon.
Surface Properties
  • A Surface is defined as a two-dimensional geometry.
  • The OpenGIS specification defines a simple Surface as a geometry that consists of a single “patch” that is associated with a single exterior boundary and zero or more interior boundaries.
  • The boundary of a simple Surface is the set of closed curves corresponding to its exterior and interior boundaries.

2.7. class Polygon

A Polygon is a planar Surface representing a multisided geometry. It is defined by a single exterior boundary and zero or more interior boundaries, where each interior boundary defines a hole in the Polygon.
Polygon Assertions
The boundary of a Polygon consists of a set of LinearRing objects (that is, LineString objects that are both simple and closed) that make up its exterior and interior boundaries.
  • A Polygon has no rings that cross. The rings in the boundary of a Polygon may intersect at a Point, but only as a tangent.
  • A Polygon has no lines, spikes, or punctures.
  • A Polygon has an interior that is a connected point set.
  • A Polygon may have holes. The exterior of a Polygon with holes is not connected. Each hole defines a connected component of the exterior.
The preceding assertions make a Polygon a simple geometry.

polygon example

2.8. class GeometryCollection

A GeometryCollection is a geometry that is a collection of one or more geometries of any class. All the elements in a GeometryCollection must be in the same Spatial Reference System (that is, in the same coordinate system). There are no other constraints on the elements of a GeometryCollection, although the subclasses of GeometryCollection described in the following sections may restrict membership. Restrictions may be based on:
  • Element type (for example, a MultiPoint may contain only Point elements)
  • Dimension
  • Constraints on the degree of spatial overlap between elements

2.9. class MultiPoint

A MultiPoint is a geometry collection composed of Point elements. The points are not connected or ordered in any way.
MultiPoint Properties
  • A MultiPoint is a zero-dimensional geometry.
  • A MultiPoint is simple if no two of its Point values are equal (have identical coordinate values).
  • The boundary of a MultiPoint is the empty set.
multipoint example

2.10. class MultiCurve

A MultiCurve is a geometry collection composed of Curve elements. MultiCurve is a non-instantiable class.
MultiCurve Properties
  • A MultiCurve is a one-dimensional geometry.
  • A MultiCurve is simple if and only if all of its elements are simple; the only intersections between any two elements occur at points that are on the boundaries of both elements.
  • A MultiCurve boundary is obtained by applying the “mod 2 union rule” (also known as the “odd-even rule”): A point is in the boundary of a MultiCurve if it is in the boundaries of an odd number of MultiCurve elements.
  • A MultiCurve is closed if all of its elements are closed.
  • The boundary of a closed MultiCurve is always empty.

2.11. class MultiLineString

A MultiLineString is a MultiCurve geometry collection composed of LineString elements.

multilinestring example

2.12. class MultiSurface

A MultiSurface is a geometry collection composed of surface elements. MultiSurface is a non-instantiable class. Its only instantiable subclass is MultiPolygon.
MultiSurface Assertions
  • Two MultiSurface surfaces have no interiors that intersect.
  • Two MultiSurface elements have boundaries that intersect at most at a finite number of points.

2.13. class MultiPolygon

A MultiPolygon is a MultiSurface object composed of Polygon elements.
MultiPolygon Assertions
  • A MultiPolygon has no two Polygon elements with interiors that intersect.
  • A MultiPolygon has no two Polygon elements that cross (crossing is also forbidden by the previous assertion), or that touch at an infinite number of points.
  • A MultiPolygon may not have cut lines, spikes, or punctures. A MultiPolygon is a regular, closed point set.
  • A MultiPolygon that has more than one Polygon has an interior that is not connected. The number of connected components of the interior of a MultiPolygon is equal to the number of Polygon values in the MultiPolygon.
MultiPolygon Properties
  • A MultiPolygon is a two-dimensional geometry.
  • A MultiPolygon boundary is a set of closed curves (LineString values) corresponding to the boundaries of its Polygon elements.
  • Each Curve in the boundary of the MultiPolygon is in the boundary of exactly one Polygon element.
  • Every Curve in the boundary of an Polygon element is in the boundary of the MultiPolygon.
multipolygon example

Supported spatial data formats

Supported spatial data formats This section describes the standard spatial data formats that are used to represent geometry objects in queries. They are:
  • Well-Known Text (WKT) format
  • Well-Known Binary (WKB) format
  • Internal BLOB format [the one used by SpatiaLite to store geometries in SQLite columns]

Well-Known Text (WKT) format

The Well-Known Text (WKT) representation of Geometry is designed to exchange geometry data in ASCII form.
Examples of WKT representations of geometry objects:
  • A Point:
    POINT(123.45 543.21)
    Note that point coordinates are specified with no separating comma.

  • A LineString with three points:
    LINESTRING(100.0 200.0, 201.5 102.5, 1234.56 123.89)
    Note that point coordinate pairs are separated by commas.

  • A Polygon with one exterior ring and one interior ring:
    POLYGON((10 10, 20 10, 20 20, 10 20, 10 10),(13 13, 17 13, 17 17, 13 17, 13 13))
    Note that each ring must be closed, i.e. the last point should be exactly the same as the first one.

  • A Polygon with exterior ring but no one interior ring:
    POLYGON((101.23 171.82, 201.32 101.5, 215.7 201.953, 101.23 171.82))
    Note that you should always use a double parenthesis to denote the presence of the exterior ring.

  • A MultiPoint with four Point values:
    MULTIPOINT(1234.56 6543.21, 1 2, 3 4, 65.21 124.78)

  • A MultiLineString with three LineString values:
    MULTILINESTRING((1 2, 3 4), (5 6, 7 8, 9 10), (11 12, 13 14))

  • A MultiPolygon with two Polygon values [the first one contains an interior ring, the second one has only the exterior ring]:
    MULTIPOLYGON(((0 0,10 20,30 40,0 0),(1 1,2 2,3 3,1 1)),((100 100,110 110,120 120,100 100)))

  • A GeometryCollection consisting of two Points and one LineString:
    GEOMETRYCOLLECTION(POINT(1 1), LINESTRING(4 5, 6 7, 8 9), POINT(30 30))
A Backus-Naur grammar that specifies the formal production rules for writing WKT values can be found in the OpenGIS specification document referenced near the beginning of this chapter.

Well-Known Binary (WKB) format

The Well-Known Binary (WKB) representation for geometric values is defined by the OpenGIS specification. It is also defined in the ISO SQL/MM Part 3: Spatial standard. WKB is used to exchange geometry data as binary streams represented by BLOB values containing geometric WKB information.
WKB uses one-byte unsigned integers, four-byte unsigned integers, and eight-byte double-precision numbers (IEEE 754 format). A byte is eight bits.

For example, a WKB value that corresponds to POINT(1 1) consists of this sequence of 21 bytes (each represented here by two hex digits):
0101000000000000000000F03F000000000000F03F
The sequence may be broken down into these components:
  • Byte order : 01
  • WKB type : 01000000
  • X : 000000000000F03F
  • Y : 000000000000F03F
Component representation is as follows:
  • The byte order may be either 1 or 0 to indicate little-endian or big-endian storage. The little-endian and big-endian byte orders are also known as Network Data Representation (NDR) and External Data Representation (XDR), respectively.
  • The WKB type is represented as a four-bytes unsigned integer that indicates the geometry type.
    Values from 1 through 7 indicate respectively Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection.
  • A Point value has X and Y coordinates, each represented as a double-precision value.
  • A LineString value has:
    • the number of Points, represented as a four-bytes unsigned integer.
    • a corresponding array of Points
  • A Polygon value has:
    • the number of rings, represented as a four-bytes unsigned integer.
    • a corresponding array of LineStrings
    • note that the first ring is always the exterior ring; other rings, if any, are always interior rings.
  • Collection type geometries [MultiPoint, MultiLineString, MultiPolygon and GeometryCollections] value has:
    • the number of entities, represented as a four-bytes unsigned integer.
    • a corresponding array of Points, LineStrings or Polygons as needed
    • note that each elementary entity has to be prefixed by its own WKB type.
The WKB value that corresponds to LINESTRING(1 1, 2 2) consists of the following sequence of 41 bytes:
0102000000000000000000F03F000000000000F03F00000000000000400000000000000040
The sequence may be broken down into these components:
  • Byte order : 01
  • WKB type : 02000000 LINESTRING
  • # points : 02000000 2 points
  • X : 000000000000F03F point #1
  • Y : 000000000000F03F
  • X : 0000000000000040 point #2
  • Y : 0000000000000040
A last example referred to collection geometries: the WKB value that corresponds to MULTIPOINT(1 1, 2 2) consists of the following sequence of 49 bytes:
010200000001000000000000000000F03F000000000000F03F0100000000000000000000400000000000000040
The sequence may be broken down into these components:
  • Byte order : 01
  • WKB type collection : 04000000 MULTIPOINT
  • # entities : 02000000 2 elementary geometries
  • WKB type entity # 1: 01000000 POINT
  • X : 000000000000F03F point #1
  • Y : 000000000000F03F
  • WKB type entity # 2: 01000000 POINT
  • X : 0000000000000040 point #2
  • Y : 0000000000000040

Internal BLOB format

Coordinate Reference Systems; the EPSG dataset

Each coordinate value, in order to be fully qualified, needs to be explicitly assigned to some SRID, or Spatial Reference Identifier.
This value identifies the geometry's associated Spatial Reference System that describes the coordinate space in which the geometry object is defined.
As a general rule, different geometries can interoperate in a meaningful way only if their coordinates are expressed in the same Coordinate Reference System.

To avoid any confusion, and to grant effective interoperability, SRID values cannot be assigned randomly, but have to adhere to some standard specification as defined by an international authority.
The EPSG [European Petroleum Survey Group] is the most authoritative on this field, actually maintaining and distributing a large dataset describing quite any worldwide known Coordinate Reference System. Not only; the EPSG dataset includes Coordinate Transformation geodetic parameters, thus allowing to transform coordinates from some CSR to any other.

The followings are examples from the EPSG dataset:
SRID CSR name geodetic parameters
3000 Segara / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +units=m +no_defs
3001 Batavia / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +units=m +no_defs
3002 Makassar / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +towgs84=-587.8,519.75,145.76,0,0,0,0 +units=m +no_defs
3003 Monte Mario / Italy zone 1 +proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs
3004 Monte Mario / Italy zone 2 +proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m +no_defs
You don't have to worry too much about geodetic parameters and their meaning; you simply have to know that the PROJ library can use them the right way in order to transform coordinates between different CSRs.

Spatial Metadata support

Basically, a GEOMETRY simply is a glorified kind of BLOB value, implementing an internal structure such as the one we've just seen in the previous chapter.
Following such a simplistic approach you can perform some useful tasks. But for sure, this one cannot be the right way to implement a fairly designed and completely reliable Spatial DBMS [i.e. one that is well checked, consistent and properly constrained]. Let we briefly see some horror cases, to better understand which one are the main issues we have to avoid:
  • Suppose you have just created a "Streets" table containing a GEOMETRY column; and now suppose to populate this table by inserting POINTs for some entities, LINESTRINGs for other entities and POLYGONs for the remaining ones.
    Do you think such a table can be used for any practical purpose ?
  • And suppose some of this GEOMETRIES has no explicit SRID at all [i.e. their Reference System is unknown / undefined], while other GEOMETRIES are using random SRID values.
    Yet again: how do you think will be possible to represent all them together in the same homogeneous Reference System ?
Obviously it will be highly desirable that your Spatial DBMS can help you to avoid this messy mismatch, by enforcing some appropriate kind of constraint on your Spatial Data.
And this one is exactly the specific purpose of Spatial Metadata. When you add a GEOMETRY column to some table, you don't simply define it in the most generic way, but you have to specify a GEOMETRY type and a SRID as well.
This infos will then be permanently added to Spatial Metadata, and any required constraint will be then enforced each time you'll try to INSERT or UPDATE a row on related table.

As defined by OpenGIS specifications, Spatial Metadata are actually implemented as two distinct tables:
  • the spatial_ref_sys table contains the Coordinate Reference System definitions.
  • the geometry_columns table contains a list of fully qualified GEOMETRY columns as defined in the database structure.
The following is an example from the spatial_ref_sys table:
srid auth_name auth_srid ref_sys_name proj4text
3000 EPSG 3000 Segara / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +units=m +no_defs
3001 EPSG 3001 Batavia / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +units=m +no_defs
3002 EPSG 3002 Makassar / NEIEZ +proj=merc +lon_0=110 +k=0.997 +x_0=3900000 +y_0=900000 +ellps=bessel +towgs84=-587.8,519.75,145.76,0,0,0,0 +units=m +no_defs
3003 EPSG 3003 Monte Mario / Italy zone 1 +proj=tmerc +lat_0=0 +lon_0=9 +k=0.9996 +x_0=1500000 +y_0=0 +ellps=intl +units=m +no_defs
3004 EPSG 3004 Monte Mario / Italy zone 2 +proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m +no_defs

And this one is an example from the geometry_columns table:
f_table_name f_geometry_column type coord_dimension srid spatial_index_enabled
Regions Geometry MULTIPOLYGON 2 32632 1
HighWays Geometry LINESTRING 2 32632 1
Towns Geometry POINT 2 32632 1

In the SpatiaLite's own implementation, you are never required to be directly concerned about Spatial Metadata handling.
Any related task is automatically performed in a completely transparent way; anything silently happens behind the scenes.
The main steps to follow in order to activate all this, are as follows:
  • after creating a new SQLite database file, you are required to execute the init_spatialite.sql SQL script.
    This will automatically create the Spatial Metadata tables, and will insert EPSG definitions into the spatial_ref_sys table as well.
  • and then you simply have to create any GEOMETRY column using the AddGeometryColumn() function.
    This will implicitly update the Spatial Metadata, and will ensure too that appropriate triggers will be created in order to enforce constraints for acceptable GeometryType and SRID.
So, once you've created a Geometry column using the AddGeometryColumn() function, you are granted for sure that:
  • any Geometry stored into this column belongs to the same Geometry class
    i.e. once you've create a Geometry column of POINT type, only POINT-type geometries can pass validation.
    Please note: SpatiaLite applies a relaxed type-checking for group MULTIxxxx classes:
    • a MULTIPOINT column can accept both POINT and MULTIPOINT geometries
    • a MULTILINESTRING column can accept both LINESTRING and MULTILINESTRING geometries
    • a MULTIPOLYGON column can accept both POLYGON and MULTIPOLYGON geometries
  • any Geometry stored into this column references the same SRID
    i.e. once you've created a Geometry column within the 3003 SRID, only Geometries belonging to this one SRID can pass validation.
  • any attempt to violate these constraints will then fail; offending INSERTs or UPDATEs will be inexorably rejected.

The VirtualShape module

The SQLite DBMS implements a very interesting feature; you can develop some specialized driver in order to access any generic [and possibly exotic] data source, and then those external data will appear to the SQL engine exactly as they where SQLite's native ones. This means you can apply any SQL operation on them, if appropriate / possible and if the driver offers an adequate support.
Such mechanism is referred in SQLite's documentation as the VIRTUAL TABLE support; this name is very appropriate, because it allows to access data physically stored anywhere as they where virtually stored in a standard SQLite's table.
In a more general way, a VIRTUAL TABLE can hide anything you can think about; it simply represents some software extension module exposing an appropriate interface to the SQLite's database engine, thus allowing standard SQL processing.

The VirtualShape module is a VIRTUAL TABLE driver specifically designed to allow direct read only access to shapefiles.

The shapefile format is defined by ESRI; you can find a copy of the complete specification at http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf.
If you are interested about this argument, you can usefuly read this article: http://en.wikipedia.org/wiki/Shapefile.
The ESRI shapefile format is quite obsolescent nowadays, but is still very widely used in GIS, mainly because it is simple, is universal and is supported by any existing GIS software.
So it really can be considered as the de facto standard data exchange format for GIS data.

VirtualShape is designed to interact very strictly with SpatiaLite, in order to support a complete direct access to shapefiles via standard SQL, for both attributes and geometries.
SpatiaLite tools support anyway importing GIS data from shapefiles into spatial tables; but the option to support standard SQL queries directly on external shapefiles, with no need at all to import them in some SQLite table, makes VirtualShape a very valuable module.

The VirtualText module

The VirtualText module is another VIRTUAL TABLE driver supporting direct read only access to CSV and TXT-tab files.

Formats based on delimited text are widespread; usually you can export quite any kind of tabular data using the CSV or TXT-tab formats. Any mainstream spreadsheet [such as MS Excel or Open Office Calc] supports import / export to and from delimited text.
So, adding to SpatiaLite the ability to process on-the-fly this kind of text based formats as they where native SQL data can really be a very useful feature.
Importing a CSV/TXT file seldom can be done automatically

You need to analyse and prepare the data properly. This is why a 2 step process is advised.
  • Use VirtualText to temporary to view the data as it will be seen in a database
  • create a SELECT statement to fill the final TABLE in the form you need.
This is the main reason why there is no ImportCSV function.

mille passus from/to: 170-220, 270-320, 370-420, 470-520, 570-620
Importing the GeoNames dataset shapes_all_low.txt with VirtualText
which is a CSV file with 2 fields, containing simplified country boundaries:
  • geonameId: The geonameId of the feature
  • geoJson: as a GeoJSON string that can be used with GeomFromGeoJSON
Notes GeoJSON:
  • The GeoJSON Format is intended more as an transfer format than than a Database storage format
      and does not conform to the OGC-SFS specification.
  • Wsg 84 (4326) is always assumed, but alternative coordinate reference systems can be used (rfc7946 4 Note)
  • Different geometry types, within a collection, should be assumed
  • Z-Values, may exist but are not listed as an extra geometry type (i.e.: no PointZ, just a Point with 3 values)
      theoretically this could be M-Values, but cannot be automaticly determined
  • ZM-Values, may exist but are not listed as an extra geometry type (i.e.: no PointZM, just a Point with 4 values)
With spatialite_gui, using The VirtualText module, we have analysed the data and come to the following conclusions:
  • Since the geometries are of Polygon and MultiPolygon types, MULTIPOLYGON must be used with Casting
  • Since the first found point ([29.96,-2.327]), only contains 2 values (not 3 or 4), dimension XY can be assumed
      theoretically records may exist that contains more than 2 values
  • SetSRID and GeomFromGeoJSON must be used when creating the geometries
  • Since there are no (Multi-)Point, (Multi-)LineString or GeometryCollection only one 1 TABLE is needed

A complete Sql-Script could then look like this:
-- -- ---------------------------------- --
-- export SPATIALITE_SECURITY=relaxed
-- -- ---------------------------------- --
-- rm shapes_all_low.db ; spatialite shapes_all_low.db < create.shapes_all_low.sql
-- -- ---------------------------------- --
-- Simplified country boundaries are available in two slightly different formats:
-- shapes_simplified_low: [shapes_all_low.txt]
-- geonameId:   The geonameId of the feature
-- geoJson:     The boundary in geoJson format
-- -- ---------------------------------- --
-- https://tools.ietf.org/html/rfc7946
-- 'The GeoJSON Format'
-- -- ---------------------------------- --
-- Note: there is no 'ImportCSV' function, so this must be done in 2 steps using VirtualText
-- -- ---------------------------------- --
SELECT DateTime('now','localtime'), "shapes_all_low -virtual_shapes_all_low -";
-- -- ---------------------------------- --
-- Step 1: create a virtual table using VirtualText
-- -- ---------------------------------- --
CREATE VIRTUAL TABLE virtual_shapes_all_low USING
 VirtualText
 (
  --  absolute or relative path leading to the textfile
  'shapes_all_low.txt',
  --  charset encoding used by the textfile
  'UTF-8',
  -- does the first line contains column names [0=no, 1=yes]
  1,
  -- the decimal separator [POINT or COMMA]
 POINT,
 -- the text separator [NONE, SINGLEQUOTE or DOUBLEQUOTE]
 NONE,
 -- the field separator [TAB, ',', ':' or other character]
 TAB
 );
-- -- ---------------------------------- --
SELECT DateTime('now','localtime'), "shapes_all_low -CREATE TABLE-";
-- -- ---------------------------------- --
-- BEGIN a transaction
-- -- ---------------------------------- --
BEGIN;
-- -- ---------------------------------- --
DROP TABLE IF EXISTS shapes_all_low;
CREATE TABLE IF NOT EXISTS shapes_all_low
(
 -- integer id as primary key
 id_rowid INTEGER PRIMARY KEY AUTOINCREMENT,
 -- integer id of record in geonames database
 id_geoname INTEGER  DEFAULT 0
);
-- -- ---------------------------------- --
-- Add a Geometry-Column to TABLE shapes_all_low
-- -- ---------------------------------- --
SELECT AddGeometryColumn
(
 -- table-name
 'shapes_all_low',
 -- geometry column-name
 'geom_wsg84',
 -- srid of geometry
 4326,
 -- geometry-type
 'MULTIPOLYGON',
 -- permit NULL values for geometry [0=yes ; 1=no]
 0
);
-- -- ---------------------------------- --
-- Create a Spatial-Index for geom_wsg84
-- -- ---------------------------------- --
SELECT CreateSpatialIndex
(
 -- table-name
 'shapes_all_low',
 -- geometry column-name
 'geom_wsg84'
);
-- -- ---------------------------------- --
SELECT DateTime('now','localtime'), "shapes_all_low -INSERT INTO-";
-- -- ---------------------------------- --
INSERT INTO shapes_all_low
(id_geoname, geom_wsg84)
SELECT
 geoNameId AS  id_geoname,
 -- Cast to MultiPolygon
 CastToMultiPolygon
 (
   -- set to srid=4326
   SetSRID
   (
    -- will be SRID 0 [Undefined - Geographic Long/Lat]
    GeomFromGeoJSON
    (
     -- rfc7946 assumes Wsg84, thus no information is contained in the JSON-string
     geoJSON
    ),
    4326
   )
  ) AS geom_wsg84
FROM
 virtual_shapes_all_low;
-- -- ---------------------------------- --
SELECT DateTime('now','localtime'), "shapes_all_low -UpdateLayerStatistics-";
-- -- ---------------------------------- --
-- Update the metadata for the Database
-- -- ---------------------------------- --
SELECT UpdateLayerStatistics
(
 -- table-name
 'shapes_all_low',
 -- geometry column-name
 'geom_wsg84'
);
-- -- ---------------------------------- --
-- COMMIT (ending the transaction)
-- -- ---------------------------------- --
COMMIT;
-- -- ---------------------------------- --
-- DROP the VirtualText table, since it is no longer needed
-- -- ---------------------------------- --
DROP TABLE IF EXISTS virtual_shapes_all_low ;
-- -- ---------------------------------- --
SELECT DateTime('now','localtime'),"Records imported into 'shapes_all_low': "||(SELECT count(id_geoname) FROM shapes_all_low)||" --done--";
-- -- ---------------------------------- --
SELECT DateTime('now','localtime'), "shapes_all_low -end -";
-- -- ---------------------------------- --
which returns 249 rows.

Spatial Indexing support

The SQLite DBMS (as many other DBMS) implements Indices as a mean to quickly retrieve selected data.
Imagine a table containing million and million rows, such as the national telephone directory; retrieving the people's name owning some specific telephone number may be a slow operation, if you perform a so called full table scan; i.e. if the DBMS engine has to read and evaluate every row in order to filter the requested one.
Implementing an Index on the telephone number column allows for very fast querying; now the DBMS engine can simply read the Index, and immediately will find the required row.
There is no need at all to perform the lengthy full table scan process.

Please note: if your table contains a limited number of rows (let say 5,000 or 20,000) implementing an Index hasn't any practical effect. SQLite can perform a full table scan in a very short time anyway.
But if your table contains an huge number of rows (let say more than 100,000), then implementing an Index can have a dramatic effect on query's execution time. As biggest is the number of rows, as more evident becomes the benefit to implement an Index.

The same identical problem arises for Spatial data [i.e. Geometry] as well; but unhappily an ordinary Index cannot be used against a Geometry-type column.
This is because ordinary Indices represents an implementation of a so called BTree (i.e. Binary Tree); binary trees can usefully be applied on numbers and / or strings, but not at all on geometries.
For geometries we need a completely different kind of Index, i.e. the so called RTree (Rectangle Tree).

SQLite's release v.3.6.0 [and the following ones] support a stable and effective implementation for RTRee:
  • SQLite's RTrees are implemented as VIRTUAL TABLEs
    We've just explained what a SQLite's VIRTUAL TABLE is in the preceding paragraph.
  • an RTree actually requires three distinct ordinary tables, applying the following naming schema:
    • the indexname_node table is used to store binary data corresponding to each tree node
    • the indexname_parent table is used to represent the tree nodes hierarchy
    • the indexname_rowid table is used to associate each tree node with corresponding ROWIDs
  • all these tables are not intended to be directly accessed anyway, as they contain quite unintelligible data.
    But the RTree module magically puts all this stuff together, and let you access the indexname VIRTUAL TABLE, containing the following columns:
    • the pkid column represents the ROWID of the corresponding row within the table associated with this Spatial Index
    • the xmin, xmax, ymin and ymax columns represents the MBR aka BBOX for corresponding geometry
This implementation for RTree is quite complex [and completely different from the ones supported by MySQL Spatial or PostGis], but is actually a very effective one.
SQLite's RTrees can seem a little bit exotic, but after all they perform very well.

From the SpatiaLite perspective, supporting Spatial Indexing via SQLite's RTrees concretely means that:
  • an RTree has to be defined for each single column you want to benefit from Spatial Indexing
  • while creating such a Spatial Index, if some entities are already stored into the corresponding table, the Spatial Index has to be initially loaded accordingly
  • for each INSERT, UPDATE or DELETE operation performed on the corresponding table, the RTree has to be updated in a consistent way, in order to avoid any data misalignment and thus keeping the Spatial Index in a fully consistent state
All these tasks are accomplished in a completely transparent way:
  • any Geometry column you want to benefit from Spatial Indexing has to be defined within Spatial Metadata; in other words, you can add a Spatial Index only on geometry columns created via AddGeometryColumn()
  • then you can create the Spatial Index simply calling the CreateSpatialIndex() SQL function
  • this not only will create the Spatial Index itself, but will add any trigger required in order to grant full synchronization between the main table and corresponding RTree

MBR Caching support


SpatiaLite supports an alternative way in order to implement a Spatial Index. This second Spatial Index implementation is based on MBRs in-memory caching.
Technically speaking, this is not really an index; but for any practical purpose it works exactly as if it was an index; you can use an MBR cache in order to speed up a sluggish spatial query in the same way you'll use an R-Tree index, and obtaining a comparable performance boost.
  • SpatiaLite's MBR caching is implemented using a single VIRTUAL TABLE supported by the MbrCache module. Each MbrCache table contains the following columns:
    • the rowid column represents the ROWID of the corresponding row within the table associated with this Spatial Index
    • the mbr column represents the MBR aka BBOX for corresponding geometry
  • an MbrCache is an in-memory structure, so it is volatile and needs to be initialized each time a new connection is established
  • after initial loading, appropriate triggers grant full synchronization between the main table and the corresponding MbrCache
  • for each INSERT, UPDATE or DELETE operation performed on the main table, the MbrCache is then updated accordingly
  • An MbrCache allocates memory in order to work. More or less, caching a million MBRs requires 40MB
    this isn't so much for an ordinary PC having some 512MB or 1GB RAM; but trying to cache ten millions MBRs (or more) will require 400MB, and allocating such a wide amount of memory can hang even the more robust hardware
    Never forget this point when you are planning to use MbrCache.
From the user perspective an MbrCache is handled in a completely transparent way:
  • any Geometry column you want to benefit from MbrCache has to be defined within Spatial Metadata; in other words, you can add an MbrCache only on geometry columns created via AddGeometryColumn()
  • then you can create the MbrCache simply calling the CreateMbrCache() SQL function
  • this not only will create the MbrCache itself, but will add any trigger required
  • creation of both R*Tree and MbrCache on the same Geometry column is not supported
As a rule of the thumb, using an MbrCache is an advisable solution each time you simply need to use SpatiaLite in order to perform some very specific and non repetitive task; in this context the cost to pay in order to load the cache each time you'll establish a new connection can be really negligible.
Setting up an MbrCache requires by far less overhead than setting up an R*Tree, and can give you exactly the same speed enhancements.
From SQL perspective, using an MbrCache requires less exotic and baroque statements than using an R*Tree.

On the other way, you never have to use an MbrCache if your dataset stores several millions geometries, because a severe memory starvation will result by this.
And using MbrCache make no sense at all when you need to establish very often lots of short lived connections, as is the case of CGI modules.
In this case you have absolutely to use an R*Tree

SpatiaLite conformance and compatibility

SpatiaLite implements a set of spatial extensions that enable an SQLite database to generate, store, and analyze geographic features, in an OpenGIS conformant fashion.

The Open Geospatial Consortium publishes the OpenGIS® Simple Features Specifications For SQL, that defines how to extend an SQL RDBMS to support spatial data.
This specification is available from the OGC Web site at http://www.opengeospatial.org/standards/sfs



last updated: 2018-10-07