SpatiaLite logo

Recipe #10:
Wonderful R*Tree Spatial Index

2011 January 28

Previous Slide Table of Contents Next Slide

A Spatial Index more or less is like any other Index: i.e. the intended role of any Index is to support really fast search of selected items within an huge dataset.

Simply think of some huge textbook: searching some specific item by reading the whole book surely is painful, and may require a very long time.
But you can actually look at the textbook's index, then simply jumping to the appropriate page(s).

Any DB index plays exactly the same identical role.
Anyway, searching Geometries falling within a given search frame isn't the same of searching a text string or a number: so a different Index type is required. i.e. a Spatial Index.


Several algorithms supporting a Spatial Index has been defined during the past years.
SQLite's Spatial Index is based on the R*Tree algorithm.

R*Tree illustration

Very shortly said, an R*Tree defines a tree-like structure based on rectangles (the R in R*Tree stands exactly for Rectangle).
MBR illustration
Every arbitrary Geometry can be represented as a rectangle, irrelevantly of its actual shape: we can simply use the MBR (Minimum Bounding Rectangle) corresponding to such Geometry.
May well be the term BBOX (Bounding Box) is more familiar to you: both terms are exact synonyms.

It's now quite intuitive understanding how the R*Tree does actually works: Think of the well known needle in the hay problem: using an R*Tree is an excellent solution allowing to find the needle in a very short time, even when the hay actually is an impressively huge one.

Common misconceptions and misunderstandings

I have a table storing several zillion points disseminated all around the world:
drawing a map was really painful and required a very long time.
Then I found somewhere some useful hint, so I've created a Spatial Index on this table.
And now my maps are drawn very quickly, as a general case.
Anyway I'm strongly puzzled, because drawing a worldwide map still takes a very long time.
Why the Spatial Index doesn't work on worldwide map ?


The answer is elementary simple: the Spatial Index can speed up processing only when a small selected portion of the dataset has to be retrieved.
But when the whole (or a very large part of) dataset has to be retrieved, obviously the Spatial Index cannot give any speed benefit.
To be pedantic, under such conditions using the Spatial Index introduces further slowness, because inquiring the R*Tree imposes a strong overhead.

Conclusion: the Spatial Index isn't a magic wand. The Spatial Index basically is like a filter.
  • when the selected frame covers a very small region of the whole dataset, using the Spatial Index implies a ludicrous gain.
  • when the selected region covers a wide region, using the Spatial Index implies a moderate gain.
  • but when the selected region covers the whole dataset (or nearly covers the whole dataset), using the Spatial Index implies a further cost.

SQLite's R*Tree implementation details

SQLite supports a first class R*Tree: anyway, some implementation details surely may seem strongly exotic for users accustomed to other different Spatial DBMS (such as PostGIS and so on).

Any R*Tree on SQLite actually requires four strictly correlated tables:
  • rtreebasename_node stores (binary format) the R*Tree elementary nodes.
  • rtreebasename_parent stores relations connecting parent and child nodes.
  • rtreebasename_rowid stores ROWID values connecting an R*Tree node and a corresponding row into the indexed table.
    • none of these three tables is intended to be directly accessed: they are reserved for internal management.
  • rtreebasename actually is a Virtual Table, and exposes the R*Tree for any external access.
    • important notice: never attempt to directly bungle or botch any R*Tree related table;
      quite surely such attempt will simply irreversibly corrupt the R*Tree. You are warned.
SELECT *
FROM rtreebasename;

pkuid miny maxx miny maxy
1022 313361.000000 331410.531250 4987924.000000 5003326.000000
1175 319169.218750 336074.093750 4983982.000000 4998057.500000
1232 329932.468750 337638.812500 4989399.000000 4997615.500000
... ... ... ... ...
Any R*Tree table looks like this one:
  • The pkid column contains ROWID values.
  • minx, maxx, miny and maxy defines MBR extreme points.
The R*Tree internal logic is magically implemented by the Virtual Table.

SpatiaLite's support for R*Tree

Any SpatiaLite Spatial Index fully relies on a corresponding SQLite R*Tree.
Anyway SpatiaLite smoothly integrates the R*Tree, so to make table handling absolutely painless:
  • each time you perform an INSERT, UPDATE or DELETE affecting the main table, then SpatiaLite automatically take care to correctly reflect any change into the corresponding R*Tree.
  • some triggers will grant such synchronization.
  • so, once you've defined a Spatial Index, you can completely forget it.
Any SpatiaLite's Spatial Index always adopts the following naming convention:
  • assuming a table named local_councils containing the geometry column.
  • the corresponding Spatial Index will be named idx_local_councils_geometry
  • and idx.local_councils.pkid will relationally reference local_councils.ROWID.
Anyway using the Spatial Index so to speed up Spatial queries execution is a little bit more difficult than in other Spatial DBMS, because there is no tight integration between the main table and the corresponding R*Tree: in the SQLite's own perspective they simply are two distinct tables.

Accordingly to all this, using a Spatial Index requires performing a JOIN, and (may be) defining a sub-query.
You can find lots of examples about Spatial Index usage on SpatiaLite into the Haute Cuisine section.


SELECT CreateSpatialIndex('local_councils', 'geometry');

SELECT CreateSpatialIndex('populated_places', 'geometry');
This simple declaration is all you are required to specify in order to set a Spatial Index corresponding to some Geometry column. And that's all.

SELECT DisableSpatialIndex('local_councils', 'geometry');
And this will remove a Spatial Index:
SpatiaLite supports a second alternitive Spatial Index based on MBR-caching.
This one simply is a historical legacy, so using MBR-caching is strongly discouraged.

Previous Slide Table of Contents Next Slide

CC-BY-SA logo Author: Alessandro Furieri a.furieri@lqt.it
This work is licensed under the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.

GNU logo Permission is granted to copy, distribute and/or modify this document under the terms of the
GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.