Update of "Shadowed ROWID issues"
Not logged in

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview

Artifact ID: 0f957dde11cedfd6affb02784902e9920b4ec314
Page Name:Shadowed ROWID issues
Date: 2015-07-01 08:22:36
Original User: sandro
Parent: 430c81a7b5a3ff89350877d2d3492cf0fb83f2f4 (diff)
Content

back



Broken R*Tree caused by shadowed ROWID issues

A new issue affecting the R*Tree Spatial Index has been recently reported.
The specific problem causing this issue seems to have a very low probability to happen under normal conditions; anyway, once this problem will actually arise the consequences could easily have a really strong negative impact on the overall performance of SpatiaLite.
So you are kindly invited to carefully read this technical note.

Understanding what really is a ROWID

From the SQLite's own documentation:
Each entry in an SQLite table has a unique 64-bit signed integer key called the "rowid". 
The rowid is always available as an undeclared column named ROWID, OID, or _ROWID_ as 
long as those names are not also used by explicitly declared columns. 
If the table has a column of type INTEGER PRIMARY KEY then that column is another 
alias for the rowid.
In other words: a ROWID represents a Unique Identifier allowing to retrieve every individual row from a table in a very quick time.
Using ROWID values as relational keys whenever is possible is really useful in many SQL queries, because relational JOIN ops based on ROWIDs always are very efficient and fast to be resolved.

The SpatialLite's own implementation of the Spatial Index widely depends on ROWIDs in order to JOIN both the main table containing Geometries and the corresponding R*Tree table.
Anyway there is a dangerous hidden trap in all this: it works well only since ROWID values returned by the main table do exactly correspond to ROWID values stored into the R*Tree. If (for any reason) the above assumption is no longer satisfied then the Spatial Index will simply return unreliable and useless results.

You'll probably be well aware that not declaring an INTEGER PRIMARY KEY supporting any Geometry table could directly cause many severe Spatial Index releted headaches, e.g. immediately after performing a VACUUM. If you missed this critical point, reading this old technical note is warmly suggested.
The recently discovered issue is even worst (if possible) than the previous one: it will not simply cause a severe Spatial Index corruption, it could eventually transform any SQL query attempting to access the Spatial Index into a deadly slow query.

Now we'll read yet again the above paragraph taken from SQLite's own documentation, this time paying the due attention to any single world:
The rowid is always available as an undeclared column named ROWID, OID, or _ROWID_
as long as those names are not also used by explicitly declared columns
.

Here is the dangerous pitfall: if the same table containing your Geometries actually declares (for any possible reason) an ordinary column named "rowid", then all values contained into this column will completely shadow the expected Unique Identifiers allowing to immediately access each single row (aka ROWID values, as usually intended).
And as a rather obvious consequence a strong relationship JOINing each Geometry to its Spatial Index entry will immediately vanish.

That's not all: the worst consequence will now be that instead of using a really fast key allowing to quickly retrieve each row, your query under such odd conditions will require a lengthy full table scan for each single row to be retrieved; and consequently a Spatial Index query expected to add a noticeable performance boost will cause instead a catastrophic performance loss.

Happily enough this issue doesn't seems to materialize too often in real world cases; anyway once it occurs it could really have a dramatic negative impact.
Be aware and well conscious of the problem.

Remedies and cures

The current code base (already available from the Fossil repository) adopts the following precautions in order to mitigate this recently discovered issue:

How to use CheckShadowedRowid

SELECT f_table_name, f_geometry_column
FROM geometry_columns
WHERE CheckShadowedRowid(f_table_name) = 1;
------
broken_table	geometry
insane_table	geometry
crazy_table	geometry
Executing an elementary simple query as the above one requires just few seconds and could probably avoid to spend many long hours in fruitless debugging sessions.
Don't hesitate to immediately perform a check like this if you simply suspect that something is going bad in your Spatial Indices, or when you notice some puzzling or unexpected behaviour even vaguely related to Spatial Indices.


back