last updated: 2018-10-12



recipe #11: Guinness Book of Records

Note:
  The TABLEs used here were created in
   recipe #1: creating a well designed DB.
Community Province Region
Atrani Salerno Campania
Bardonecchia Torino Piemonte
Briga Alta Cuneo Piemonte
Casavatore Napoli Campania
Lampedusa e Linosa Agrigento Sicilia
Lu Alessandria Piemonte
Morterone Lecco Lombardia
Ne Genova Liguria
Otranto Lecce Puglia
Pino sulla Sponda del Lago Maggiore Varese Lombardia
Predoi Bolzano/Bozen Trentino-Alto Adige/Südtirol
Re Verbano-Cusio-Ossola Piemonte
Ro Ferrara Emilia-Romagna
Roma Roma Lazio
San Valentino in Abruzzo Citeriore Pescara Abruzzo
Vo Padova Veneto

The above list of Communities really is a kind of Guinness Book of Records.
For one reason or the other each one of them really has something absolutely exceptional and worth of note.
May well be you are really puzzled and surprised while reading this, because (with the notable exception of Rome) none of them is within the most renowned places of Italy.
Anyway, the explanation is really simple: Discovering such highly (un)useful Guinness Records collection is quite easy.
You simply have to execute the following SQL query.

SELECT
 c.community_name AS Community,
 p.province_name AS Province,
 r.region_name AS Region,
 c.population AS Population,
 ST_Area(c.geometry) / 1000000.0 AS "Area sqKm",
 c.population / (ST_Area(c.geometry) / 1000000.0) AS "PopDensity [people/sqKm]",
 Length(c.community_name) AS NameLength,
 MbrMaxY(c.geometry) AS North,
 MbrMinY(c.geometry) AS South,
 MbrMinX(c.geometry) AS West,
 MbrMaxX(c.geometry) AS East
FROM communities AS c
JOIN provinces AS p ON (p.province_id = c.province_id)
JOIN regions AS r ON (r.region_id = p.region_id)
WHERE c.community_id IN
(
 SELECT community_id FROM communities
 WHERE population IN
 (
  SELECT Max(population) FROM communities
  UNION
  SELECT Min(population) FROM communities
 )
 UNION
 SELECT community_id FROM communities
 WHERE ST_Area(geometry) IN
 (
  SELECT Max(ST_area(geometry)) FROM communities
  UNION
  SELECT Min(ST_Area(geometry)) FROM communities
 )
 UNION
 SELECT community_id FROM communities
 WHERE (population / (ST_Area(geometry) / 1000000.0)) IN
 (
  SELECT Max(population / (ST_Area(geometry) / 1000000.0)) FROM communities
  UNION
  SELECT MIN(population / (ST_Area(geometry) / 1000000.0)) FROM communities
 )
 UNION
 SELECT community_id FROM communities
 WHERE Length(community_name) IN
 (
  SELECT Max(Length(community_name)) FROM communities
  UNION
  SELECT Min(Length(community_name)) FROM communities
 )
 UNION
 SELECT community_id FROM communities
 WHERE MbrMaxY(geometry) IN
 (
  SELECT Max(MbrMaxY(geometry)) FROM communities
 )
 UNION
 SELECT community_id FROM communities
 WHERE MbrMinY(geometry) IN
 (
  SELECT Min(MbrMinY(geometry)) FROM communities
 )
 UNION
 SELECT community_id FROM communities
 WHERE MbrMaxX(geometry) IN
 (
  SELECT Max(MbrMaxX(geometry)) FROM communities
 )
 UNION
 SELECT community_id FROM communities
 WHERE MbrMinX(geometry) IN
 (
  SELECT Min(MbrMinX(geometry)) FROM communities
 )
)
ORDER BY Community;
-- 14 records

Oh yes, this one surely isn't a simple and plain query.
But you are now consulting the High Cuisine Recipes.
So I suppose your intention was exactly the one to look for some really tasty and spicy SQL query:
and you've just got it. Feel happy.

After all the above SQL query is only apparently complex, but it real structure is surprisingly simple.
Let try rewriting the SQL query in a simplified form:

SELECT
 c.community_name AS Community,
 p.province_name AS Province,
 r.region_name AS Region,
 c.population AS Population,
 ST_Area(c.geometry) / 1000000.0 AS "Area sqKm",
 c.population / (ST_Area(c.geometry) / 1000000.0) AS "PopDensity [people/sqKm]",
 Length(c.community_name) AS NameLength,
 MbrMaxY(c.geometry) AS North,
 MbrMinY(c.geometry) AS South,
 MbrMinX(c.geometry) AS West,
 MbrMaxX(c.geometry) AS East
FROM communities AS c
JOIN provinces AS p ON (p.province_id = c.province_id)
JOIN regions AS r ON (r.region_id = p.region_id)
WHERE c.community_id IN (... some list of values ...);

I suppose you can now easily understand what is the meaning of this SQL query.
There is absolutely nothing too much complex or difficult in this: But you already know all this, so I suppose you are very little interested to get any further detail.
Quite obviously the most interesting things happen within the WHERE ... IN (...) clause;
and exactly here we'll now focus our attention.

...
SELECT
 Max(population) FROM communities
...
SELECT
 Min(population) FROM communities

This SQL snippet is really simple: each query simply computes a Min / Max value.

...
 SELECT Max(population) FROM communities
UNION
 SELECT Min(population) FROM communities

This is the first time we introduce an UNION statement:
...
SELECT
 community_id FROM communities
WHERE population IN
(
 SELECT Max(population) FROM communities
 UNION
 SELECT Min(population) FROM communities
)
...

SQL supports a wonderful mechanism: the one known as sub-query.
You can define an inner query (which is executed first), then using any returned value into the outer (main) query.
Now the previous SQL snippet doesn't looks too much mysterious:
...
 SELECT community_id FROM communities
 WHERE population IN
 (
  SELECT Max(population) FROM communities
  UNION
  SELECT Min(population)FROM communities
 )
 UNION
 SELECT community_id FROM communities
 WHERE ST_Area(geometry) IN
 (
  SELECT Max(ST_area(geometry)) FROM communities
  UNION
  SELECT Min(ST_Area(geometry))FROM communities
 )
...

Nothing forbid us to nest more than one single UNION clauses: this one is a fully legitimate option.
Accordingly to this, the above SQL snippet has to be interpreted as follows:

SELECT
 c.community_name AS Community,
 p.province_name AS Province,
 r.region_name AS Region,
 c.population AS Population,
 ST_Area(c.geometry) / 1000000.0 AS "Area sqKm",
 c.population / (ST_Area(c.geometry) / 1000000.0) AS "PopDensity [people/sqKm]",
 Length(c.community_name) AS NameLength,
 MbrMaxY(c.geometry) AS North,
 MbrMinY(c.geometry) AS South,
 MbrMinX(c.geometry) AS West,
 MbrMaxX(c.geometry) AS East
FROM communities AS c
JOIN provinces AS p ON (p.province_id = c.province_id)
JOIN regions AS r ON (r.region_id = p.region_id)
WHERE c.community_id IN
(
 --
 -- a list of c.community_id values will be returned
 -- by this complex sub-query
 --
 SELECT community_id FROM communities
 WHERE population IN
 (
  --
  -- this further sub-query will return
  -- Min/Max POPULATION
  --
  SELECT Max(population) FROM communities
  UNION
  SELECT Min(population) FROM communities
 )
 UNION  -- merging into first-level sub-query
 SELECT community_id FROM communities
 WHERE ST_Area(geometry) IN
 (
  --
  -- this further sub-query will return
  -- Min/Max ST_AREA()
  --
  SELECT Max(ST_area(geometry)) FROM communities
  UNION
  SELECT Min(ST_Area(geometry)) FROM communities
 )
 UNION  -- merging into first-level sub-query
 SELECT community_id FROM communities
 WHERE population / (ST_Area(geometry) / 1000000.0) IN
 (
  --
  -- this further sub-query will return
  -- Min/Max POP-DENSITY
  --
  SELECT Max(population / (ST_Area(geometry) / 1000000.0)) FROM communities
  UNION
  SELECT MIN(population / (ST_Area(geometry) / 1000000.0)) FROM communities
 )
 UNION  -- merging into first-level sub-query
 SELECT community_id FROM communities
 WHERE Length(community_name) IN
 (
  --
  -- this further sub-query will return
  -- Min/Max NAME-LENGTH
  --
  SELECT Max(Length(community_name)) FROM communities
  UNION
  SELECT Min(Length(community_name)) FROM communities
 )
 UNION  -- merging into first-level sub-query
 SELECT community_id FROM communities
 WHERE MbrMaxY(geometry) IN
 (
  --
  -- this further sub-query will return
  -- Max NORTH
  --
  SELECT Max(MbrMaxY(geometry)) FROM communities
 )
 UNION  -- merging into first-level sub-query
 SELECT community_id FROM communities
 WHERE MbrMinY(geometry) IN
 (
  --
  -- this further sub-query will return
  -- Max SOUTH
  --
  SELECT Min(MbrMinY(geometry)) FROM communities
 )
 UNION  -- merging into first-level sub-query
 SELECT community_id FROM communities
 WHERE MbrMaxX(geometry) IN
 (
  --
  -- this further sub-query will return
  -- Max WEST
  --
  SELECT Max(MbrMaxX(geometry)) FROM communities
 )
 UNION  -- merging into first-level sub-query
 SELECT community_id FROM communities
 WHERE MbrMinX(geometry) IN
 (
  --
  -- this further sub-query will return
  -- Max EAST
  --
  SELECT Min(MbrMinX(geometry)) FROM communities
 )
)
ORDER BY Community;
-- 14 records

According to SQL syntax, using two consecutive hyphens (--) you can mark a comment.
i.e. any further text until the next line terminator is absolutely ignored by the SQL parser.
Placing appropriate comments within really complex SQL queries surely enhances readibility.

Conclusion: SQL is a wonderful language, fully supporting a regular and easily predictable syntax.
Each time you'll encounter some intimidating complex SQL query don't panic and don't be afraid:
simply attempt to break the complex statement into several smallest and simplest blocks, and you'll soon discover that complexity was more apparent than real.
Useful hint: attempting to debug some very complex SQL statement is obviously a difficult and defatigating task.
Breaking down a complex query into smallest chunks, then testing each one of them individually usually is the best approach you can follow.

recipe #12: Neighbours

Note:
  The TABLEs used here were created in
   recipe #1: creating a well designed DB.

Warning: For the Cookbook 3.0 version of this chapter the used function RTreeIntersects was deprecated in Spatialite 4.2.0 and is no longer available in the present Spatialite version.

The problem

  • Obviously each Local Council shares a common boundary with a neighbour:
    [not an absolute rule, anyway: e.g. small islands are self-contained].
  • We'll use Spatial SQL to identify every adjacent couple of Communities.
  • Just to add some further complexity, we'll specifically focus our attention on the Tuscany region boundaries.

This first query is really simple:
SELECT
 c1.community_name AS "Community",
 c2.community_name AS "Neighbour"
FROM communities AS c1, communities AS c2
WHERE (ST_Touches(c1.geometry, c2.geometry));
  • the ST_Touches(geom1, geom2) function is used to evaluates the spatial relationship existing between couples of Communities.
  • the community table is scanned twice, so to implement a JOIN;
    in other words this simply allows to evaluate each Local Council against any other, in a combinatory/permutative fashion.
  • obviously this may cause ambiguity.
    We have to set an appropriate alias name (AS c1 / AS c2) so to uniquely identify each one of the two table instances.
Anyway, a so simplistic approach implies several (strong, severe) issues:
  • this query surely will return the correct answer: but the process time will be very long.
    [actually, so long to be absolutely not usable for any practical purpose].
  • explaining all this is really easy: evaluating ST_Touches() implies lots of complex calculations, so this one is a really heavy (and lengthy) step.
  • following a pure combinatorial logic creates many million couples requiring to be evaluated.
  • conclusion: iterating many million times a lengthy operation is a sure recipe for disaster.

Fortuantly, we can perform such a Spatial query in a really fast and efficient way using the created SpatialIndex :
SELECT
 c1.community_name AS "Community",
 c2.community_name AS "Neighour Community"
FROM communities AS c1, communities AS c2
WHERE c2.ROWID IN
(
 SELECT ROWID FROM SpatialIndex WHERE
 (
  (f_table_name = 'communities') AND
  (search_frame = c1.geometry)
 )
);
  • we have to use a Spatial Index [aka R*Tree]
  • this will add some further complexity to the SQL statement, but will achieve a ludicrous speed enhancement.
  • how it works: the R*Tree is checked first, so to evaluate Minimum Bounding Rectangles [MBRs] of both Geometries.
    • This one is a very quick step to be evaluated, and allows discarding lots of couples that surely cannot share a common boundary.
    • So the number of times ST_Touches() will then be actually called will be dramatically reduced.
    • And all this strongly reduces the overall processing time.
  • At SQL syntax level using the Spatial Index simply requires to implement a sub-query:
    • the R*Tree Spatial Index in SQLite actually is a separate table.
    • table names are strictly related: the Spatial Index corresponding to table myTbl and geometry column myGeom always is expected to be named as idx_myTbl_myGeom.
    • the MATCH RTreeIntersects() clause is used to quickly retrieve any interesting Geometry simply evaluating its own MBR.
    • MbrMinX() and alike are used to identify the extreme points of the filtering MBR.
Just to explain better what's going on, you can imagine that this SQL query is processed using the following steps:
  • a Geometry is picked up from the first communities instance: lc1.geometry
  • then the R*Tree Spatial Index is scanned, so to identify any other Geometry from the second communities instance: lc2.geometry
  • Only Geometries satisfying an intersecting MBR constraint will be fetched via Spatial Index.
  • and finally ST_Touches() will be evaluated: but this will affect only a very limited few carefully pre-filtered Geometries.

SELECT
 c1.community_name AS "Tuscan Community",
 p1.province_name AS "Tuscan Province",
 c2.community_name AS "Neighbouring Community",
 p2.province_name AS "Neighbouring Province",
 r2.region_name AS Region
FROM communities AS c1,
 communities AS c2,
 provinces AS p1,
 provinces AS p2,
 regions AS r1,
 regions AS r2
WHERE
(
 -- 1st filter: Does the Community belong to the Province ? [homeland side]
 (c1.province_id = p1.province_id) AND
 -- 2nd filter: Does the Community belong to the Province ? [foreigner side]
 (c2.province_id = p2.province_id) AND
 -- 3rd filter: Does the Province belong to the Region ? [homeland side]
 (p1.region_id = r1.region_id) AND
 -- 4th filter: Does the Province belong to the Region ? [foreigner side]
 (p2.region_id = r2.region_id) AND
 -- 5th filter: only Tuscan Region [homeland side]
 (r1.region_id = 9) AND
 -- 6th filter: only not-Tuscan Regions [foreigner side]
 (r1.region_id <> r2.region_id) AND
 (
  -- 7th filter: BoundingBox of possible [foreigner side] candidates
  c2.ROWID IN
  (
   -- Is the foreigner side Community within the area (BoundingBox) of the homeland Community ?
   SELECT ROWID FROM SpatialIndex WHERE
   (
    (f_table_name = 'communities') AND
    (search_frame = c1.geometry)
   )
  )
 ) AND
 -- 8th filter: check if both candidates share a common border
 (ST_Touches(c1.geometry, c2.geometry))
)
ORDER BY p1.province_name, c1.community_name;
-- 132 records

The result will look something like this:
Tuscan Community Tuscan Province Neighbour Community Neighbour Province Region
Anghiari Arezzo Citerna Perugia Umbria
Arezzo Arezzo Monte Santa Maria Tiberina Perugia Umbria
Bibbiena Arezzo Bagno di Romagna Forli'-Cesena Emilia-Romagna
Chiusi della Verna Arezzo Bagno di Romagna Forli'-Cesena Emilia-Romagna
Chiusi della Verna Arezzo Verghereto Forli'-Cesena Emilia-Romagna
... ... ... ... ...

All right, once we have resolved the Spatial Index stuff writing the whole SQL query isn't so difficult.
Anyway this one is a rather complex query, so some further explanation is surely welcome:
  • we have to resolve JOIN relations connecting communities to Provinces, and Provinces to regions
  • anyway we used two different instances for communities, so we need to resolve JOIN relations separately for each one instance.
  • setting the r1.region_name LIKE 'toscana' clause only Tuscan Communities will be evaluated for the homeland side.
  • and setting the r1.region_id <> r2.region_id clause ensures that only not-Tuscan Communities will be evaluated for the foreigner side.
  • only Tuscan Communities sharing a common boundary with not-Tuscan Communities will be extracted by this query.

SELECT
 c1.community_name AS "Tuscan Community",
 p1.province_name AS "Tuscan Province",
 c2.community_name AS "Neighbour Communit",
 p2.province_name AS "Neighbour Province",
 r2.region_name AS Region
FROM communities AS c1, communities AS c2
JOIN provinces AS p1 ON (p1.province_id = c1.province_id)
JOIN provinces AS p2 ON (p2.province_id = c2.province_id)
JOIN regions AS r1 ON (r1.region_id = p1.region_id)
JOIN regions AS r2 ON (r2.region_id = p2.region_id)
WHERE
(
 -- 1st filter: only Tuscan Region [homeland side]
 (r1.region_id = 9) AND
 -- 2nd filter: only not-Tuscan Regions [foreigner side]
 (r1.region_id <> r2.region_id) AND
 (
  -- 3rd filter: BoundingBox of possible [foreigner side] candidates
  c2.ROWID IN
  (
   -- Is the foreigner side Community within the area (BoundingBox) of the homeland Community ?
   SELECT ROWID FROM SpatialIndex WHERE
   (
    (f_table_name = 'communities') AND
    (search_frame = c1.geometry)
   )
  )
 ) AND
 -- 4th filter: check if both candidates share a common border
 (ST_Touches(c1.geometry, c2.geometry))
)
ORDER BY p1.province_name, c1.community_name;
-- 132 records

With the same results as the first query.

Obviously you can this query adopting the alternative syntax for JOINs: the difference simply is syntactic.
And doesn't implies any difference at functional or performance levels.
Performing sophisticated Spatial Analysis not necessarily is an easy and plain task.
Mastering complex SQL queries is a little bit difficult (but not at all impossible).
Optimizing such complex SQL, so to get fast answers surely requires some extra-care and attention.

But Spatial SQL supports you in the most effective (and flexible) way: the results you can get simply are fantastic.
After all the game surely is worth the candle.

recipe #13: Isolated Islands

Note:
  The TABLEs used here were created in
   recipe #1: creating a well designed DB.

The problem

Very closely related to the latest one. Now the problem is:
  • identify any isolated Community, i.e. which does not share a common boundary with any other Italian Community.
SELECT
 c1.community_name AS Community,
 p.province_name AS Province,
 r.region_name AS Region
FROM communities AS c1
JOIN provinces AS p ON (p.province_id = c1.province_id)
JOIN regions AS r ON (r.region_id = p.region_id)
LEFT JOIN communities AS c2 ON
(
 -- 1st not the same Community
 (c1.community_id <> c2.community_id) AND
 ( -- 2nd filter: the neighbour Community inside the BoundingBox of the present Community
  WHERE c2.ROWID IN
  (
   -- Is the neighbour Community within the area (BoundingBox) of the present Community ?
   SELECT ROWID FROM SpatialIndex WHERE
   (
    (f_table_name = 'communities') AND
    (search_frame = c1.geometry)
   )
  )
 ) AND
 -- 3rd filter: the neighbour Community does not share a common boundry with the present Community
 (NOT ST_Disjoint(c1.geometry, c2.geometry))
)
GROUP BY c1.community_id
HAVING Count(c2.community_id) = 0
ORDER BY c1.community_name;
-- 14 records
Community Province Region
Campione d'Italia Como Lombardia
Capraia Isola Livorno Toscana
Carloforte Carbonia-Iglesias Sardegna
Favignana Trapani Sicilia
Isola del Giglio Grosseto Toscana
Isole Tremiti Foggia Puglia
La Maddalena Olbia-Tempio Sardegna
Lampedusa e Linosa Agrigento Sicilia
Lipari Messina Sicilia
Pantelleria Trapani Sicilia
Ponza Latina Lazio
Procida Napoli Campania
Ustica Palermo Sicilia
Ventotene Latina Lazio

Note: All the Communities, listed above, are small sea islands.
With the exception of Campione d'Italia, which is a land island:
  i.e. a small Italian enclave inside Switzerland.

Nothing really new in this: more or less, this is quite exactly the same we've already examined in the latest example.
Just few differences are worth to be explained:
  • this time we've used NOT ST_Disjoint() to identify any allowable Spatial relationship between couples of adjacent Communities.
  • and we've used LEFT JOIN for the second instance of the communities table (AS lc2): this way we'll be absolutely sure to insert into the result-set every Local Council coming from the left-sided term lc1, because a LEFT JOIN is valid even when the right-sided term lc2 doesn't matches any corresponding entry.
  • the GROUP BY lc1.community_id clause is required so to build a distinct aggregation group for each Local Council.
  • after all this the function Count(lc2.community_id) will return the number of neighbours for each Local Council:
    quite obviously, a value ZERO denotes that this one actually is an isolated Local Council.
  • and finally we've used the HAVING clause to exclude any not-isolated Local Council.
  • Please note well: the HAVING clause must not be confused with the WHERE clause.
    They only are apparently similar, but a strong difference exists between them:
    • WHERE immediately evaluates if a candidate row has to inserted into the result-set or not.
      So, a row discarded by WHERE is completely ignored, and cannot be used in any further step.
    • on the other side HAVING is evaluated only when the result-set is completely defined, just immediately before be passed back to the calling process.
      So HAVING is really useful to perform any kind of post-processing, as in this case.
      We simply needed to reduce the result-set (by deleting any not-isolated Local Council), and the HAVING clause was exactly the tool for the job.

recipe #14: Populated Places vs Communities

Note:
  The TABLEs used here were created in
   recipe #1: creating a well designed DB.

The problem

Do you remember ?
  • We've left the populated_places table in a self-standing position since now.
  • While designing the DB layout we concluded that some spatial relationship must exists between populated_places and communities.
  • We can easily expect to get some inconsistencies between these two datasets, because they come from absolutely unrelated sources.
  • The populated_places table had a POINT Geometries using the 4236 SRID (Geographic, WGS84, long-lat):
    while the communities table has a MULTIPOLYGON Geometries using the 32632 SRID (planar, WGS 84 / UTM zone 32N)
  • Using two different SRIDs surely introduces some further complication to be resolved.
It's now time to confront yourself with this not-so-simple problem.
not yet, since we have alreaddy done a one-time Transform to 32632 when importing the TABLE

You can start with this first simple query:
SELECT
 pp.id AS PopulatedPlaceId,
 pp.name AS PopulatedPlaceName,
 c.community_id AS CommunityId,
 c.community_name AS CommunityName
FROM populated_places AS pp, communities AS c
WHERE
 (ST_Contains(c.geometry,pp.geometry));
PopulatedPlaceId PopulatedPlaceName CommunityId CommunityName Province Region
... ... ... ... ... ...
12383 Acitrezza NULL NULL NULL NULL
12384 Lavinio NULL NULL NULL NULL
11327 Altino 69001 Altino Chieti Abruzzo
11265 Archi 69002 Archi Chieti Abruzzo
11247 Ari 69003 Ari Chieti Abruzzo
... ... ... ... ... ...
  • the ST_Contains(geom1, geom2) function is used to evaluate the spatial relationship existing between Communities and Populated Places.
  • there is absolutely nothing strange in this query: you'll simply use a JOIN condition based on a spatial relationship.
  • using ST_Transform() is only required when the SRIDs don't match, to re-project any coordinate into the same SRID.
  • anyway you are already warned: you are now well conscious that using a so simplistic approach (i.e. not using the R*Tree Spatial Index) will surely produce a very slowly running query.

SELECT
 pp.id AS PopulatedPlaceId,
 pp.name AS PopulatedPlaceName,
 c.community_id AS CommunityId,
 c.community_name AS CommunityName
FROM populated_places AS pp, communities AS c
WHERE
(
 -- 1st filter: Populated-Place (POINT) inside the BoundingBox of the Community MULTIPOLYGON
 c.ROWID IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- the BoundingBox is a rectangle around the (likley wiggly) boundry of the Community MULTIPOLYGON
   (f_table_name = 'communities') AND
   (search_frame = pp.geometry)
  )
 ) AND
 -- 2nd filter: Populated-Place (POINT) inside the (likley wiggly boundry) of the Community MULTIPOLYGON
 (ST_Contains(c.geometry,pp.geometry))
);-- 9897 records

This further query is exactly the same as the first one: except in that this second version fully exploits the R*Tree Spatial Index.
Note: had be not have done the one-time Transformation to 32632 when importing the TABLE,
ST_Transform(pp.geometry,32632) would have to have been done at least 9897 times, everytime we used this query

There is still an unresolved issue in the above query: following this way any mismatching Populated Place will never be identified.
In order to detect if some Populated Place does actually falls outside any corresponding Local Council you absolutely have to implement a LEFT JOIN.
All right: lets create the final version:
SELECT
 pp.id AS PopulatedPlaceId,
 pp.name AS PopulatedPlaceName,
 c.community_id AS CommunityId,
 c.community_name AS CommunityName,
 p.province_name AS Province,
 r.region_name AS Region
FROM populated_places AS pp
LEFT JOIN communities AS c ON
(
 -- 1st filter: Populated-Place (POINT) inside the BoundingBox of the Community MULTIPOLYGON
 c.ROWID IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- the BoundingBox is a rectangle around the (likley wiggly) boundry of the Community MULTIPOLYGON
   (f_table_name = 'communities') AND
   (search_frame = pp.geometry)
  )
 ) AND
 -- 2nd filter: Populated-Place (POINT) inside the (likley wiggly boundry) of the Community MULTIPOLYGON
 (ST_Contains(c.geometry,pp.geometry))
)
LEFT JOIN provinces AS p ON (p.province_id = c.province_id)
LEFT JOIN regions AS r ON (r.region_id = p.region_id)
ORDER BY Region, Province, CommunityName;
-- 9900 records

Note: This result will return 3 records in populated_places that were not found in communities.
  • you've simply added further LEFT JOIN clauses, so to fully qualify each Local Council reporting the corresponding County and Region.
  • not at all surprisingly about twenty Populated Places doesn't actually correspond to any Local Council
    (you expected this, because these two datasets come from unrelated sources).
  • you can duly use QGIS to visually inspect such mismatching entries: and you'll soon discover that in each case all them are towns placed very closely to sea shore.
    And some (slight) misplacement actually exist.

recipe #15: Tightly bounded Populated Places

The problem

Yet another problem based on the populated_places dataset. This time the question is:
  • Identify any possible couple of Populated Places laying at very close distance: < 1 Km
  • Compair the original latitude/longitude values of the Populated Places when compairing distances

Please note: this problem hides an unpleasant complication.
  • the Populated Places latitude/longitude values contained in thedataset are of the 4236 SRID (Geographic, WGS84, long-lat)
  • accordingly any distances are naturally measured in decimal degrees
    A Geographic distance of 0.02 means 2/100 of degree (depending how far away the position is from the equator, about 2Km, on the Great Circle).
  • The GeodesicLength() function calculates the total length (expressed in meters) for any long-lat LINESTRING.
    This function gives very accurate results, because is taken directly on the ellipsoid.
    Unhappily, this requires lots of complex calculations, so computing a Geodesic length is intrinsically an heavy (and slow) process.
  • A function, such as ST_Buffer would, however, need a Geographic distance on a Geographic based geometry
but the imposed range limit is expressed in meters/Km

PopulatedPlace #1 Distance (meters) PopulatedPlace #2
Vallarsa 49.444299 Raossi
Raossi 49.444299 Vallarsa
Seveso 220.780551 Meda
Meda 220.780551 Seveso
... ... ...

DROP TABLE IF EXISTS temp_Difference_Meters;
CREATE TEMP TABLE temp_Difference_Meters AS
SELECT
 pp1.name AS "PopulatedPlace #1",
 GeodesicLength(MakeLine(MakePoint(pp1.longitude,pp1.latitude,4326), MakePoint(pp2.longitude,pp2.latitude,4326))) AS "Geodesic-Distance (meters)",
 (ST_Length(MakeLine(pp1.geometry, pp2.geometry)))-
  GeodesicLength(MakeLine(MakePoint(pp1.longitude,pp1.latitude,4326), MakePoint(pp2.longitude,pp2.latitude,4326))) AS "Difference in Meters",
 ST_Length(MakeLine(pp1.geometry, pp2.geometry)) AS "Meters-Distance (meters)",
 pp2.name AS "PopulatedPlace #2",
 0 AS set_unique
FROM populated_places AS pp1, populated_places AS pp2
WHERE
(
 -- 1st filter: not the same Populated-Place
 (pp1.id <> pp2.id) AND
 -- 2nd filter: Populated-Place (POINT) inside an area 1000 meters around the Populated-Place
 pp2.ROWID IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- the BoundingBox is a square 1000 meters around the Populated-Place (width/height 2000 meters)
   (f_table_name = 'populated_places') AND
    -- create a circle geometry [which is in meters] that is 1000 meters around the Populated-Place
   (search_frame = ST_Buffer(pp1.geometry,1000))
  )
 ) AND
 -- 3rd filter: check if the 2nd Populated-Place (POINT) is within the 1000 meter circle around the 1st Populated-Place
 (ST_Contains(ST_Buffer(pp1.geometry,1000),pp2.geometry))
)
ORDER BY "PopulatedPlace #1";

UPDATE temp_Difference_Meters SET set_unique=1 WHERE
(
 Exists
 (
  SELECT "PopulatedPlace #1" FROM temp_Difference_Meters AS diff_meters
  WHERE
  (
   (diff_meters."PopulatedPlace #1" = temp_Difference_Meters."PopulatedPlace #2") AND
   (diff_meters."PopulatedPlace #2" = temp_Difference_Meters."PopulatedPlace #1")
  ) AND (set_unique < 1)
 )
);

CREATE TABLE Difference_Meters AS
 SELECT
  "PopulatedPlace #1",
  "Geodesic-Distance (meters)",
  "Difference in Meters",
  "Meters-Distance (meters)"
 FROM temp_Difference_Meters
 WHERE (set_unique = 1);
DROP TABLE IF EXISTS temp_Difference_Meters;
--523 records
-- Avoid double entries (Point A to B and B to A) ; will sort by result of ST_Length ASC
-- GROUP BY ST_Length(MakeLine(pp1.geometry, pp2.geometry));
523 records

This time we'll go straight forward to final solution.
Building site Placeholder: TODO: Expand 'Difference_Meters' Sql-Query
Goals:
Remove double text Entries
    [Balsorano --> Balsorano Nuovo and Balsorano Nuovo --> Balsorano ; both 134.548931]
Sort by first entry
avoid removal of otherwise valid results, since GROUP BY ST_Length could remove too much
Explain why:
  • GROUP BY ST_Length(MakeLine(pp1.geometry, pp2.geometry)); could lead to false results
    May be more that two [not the case here] results of a distance of 134.548931 meters
    this causeing removal of another, valid, result.
  • Use of CREATE TEMP TABLE to help a solution that can only be done in 2-3 steps as above

I should be clear to everyone that using a Spatial Index is absolutely required to get a decently well-performing query.
And that a JOIN between two different instances of the same table is required to perform this kind of Spatial Analysis, and so on ...

So we'll simply focus our attention on the most notable highlights:
Performing a Spatial query like this one in the most naive way requires an extremely long time, even if you'll use the most recent and powerful CPU.
But carefully applying a little bit of optimization is not too much difficult.
And a properly defined an well optimized SQL query surely runs in the smoothest and fastest way.

recipe #16: Railways vs Communities

Note:
  The railways dataset used here were created in
   recipe #1: creating a well designed DB.

The problem

This time we'll use for the first time the railways dataset.
Please remember: this one is a really small dataset simply representing two railway lines which uses the 23032 SRID [ED50 UTM zone 32].

Please note: this problem hides an unpleasant complication.
  • Since the railways dataset uses a different SRID a ST_Transform must be done before compairing any values from another SRID
    not doing so would result in wierd results, unless the SRID are very similar (as in this case)
    So, as a general rule: Use ST_Transform on geometries containig different SRIDs.

The problem is:
  • Identify any Local Council crossed by a railway line.

Railway Community Province Region
Ferrovia Roma-Napoli Aprilia Latina Lazio
Ferrovia Roma-Napoli Ardea Roma Lazio
Ferrovia Roma-Napoli Lanuvio Roma Lazio
Ferrovia Roma-Napoli Pomezia Roma Lazio
Ferrovia Roma-Napoli Roma Roma Lazio

SELECT
 rw.name AS Railway,
 c.community_name AS Community,
 p.province_name AS Province,
 r.region_name AS Region
FROM railways AS rw
JOIN communities AS c ON
(
 -- 1st filter: Railway-line (MULTILINESTRING) going through BoundingBox of Community [MULTIPOLYGON]
 c.ROWID IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- the BoundingBox is a square 1000 meters around the Populated-Place (width/height 2000 meters)
   (f_table_name = 'communities') AND
   -- geometry-column to use (only mandatory when the TABLE contains more than 1 geometry)
   (f_geometry_column = 'geometry') AND
    -- to calculate the values precisely, transform Railway-Line from 23032 to 32632
   (search_frame = ST_Transform(rw.geometry,32632))
  )
 ) AND
 ST_Intersects
 (
  -- Does the Railway-Line pass through ...
  ST_Transform
  (
   -- transform from 'ED50 UTM zone 32' to 'WGS 84 / UTM zone 32N'
   rw.geometry,32632
  ),
  -- ... the wiggly boundry of the Community ?
  c.geometry
 )
)
JOIN provinces AS p ON (p.province_id = c.province_id)
JOIN regions AS r ON (r.region_id = p.region_id)
ORDER BY r.region_name, p.province_name, c.community_name;
-- 83 records

We'll simply examine few interesting key points:
  • a JOIN clause is used so to retrieve the corresponding County and Region for each Local Council.
    You already know how this works, because you had already used this in some previous example.
  • the most interesting point in this query is in the first JOIN clause:
    the ST_Intersects() function is used to evaluate a Spatial relationships between the communities and the railways tables.
    Anyway all this isn't at all surprising, because you've already seen something like this in previous examples.
  • and once again the appropriate R*Tree Spatial Index is used in order to speed up the query.
More or less, this is quite the same thing of the previous example, when we examined Spatial relationships existing between Communities and Populated Places.
Anyway, this confirms that using any possible kind of Spatial relationship is a reasonably easy task, and that you can successfully use Spatial relationships to resolve lots of different real-world problems.

recipe #16a: SpatialIndex as BoundingBox

The problem

We'll use once again the railways dataset, using the Sql-Query found in recipe #16: Railways vs Communities.
SpatialIndex as BoundingBox (part 1)
SpatialIndex-BoundingBox.02
Explanation of Image: Area shown is about 20 KM South/East of Rome (1912)
  • The Railway-Line between Napels and Rome is shown as an orange line
  • BoundingBox (rectangles) of 3 Communities (shown in gold) are shown as green lines
    The green rectangles are the BoundingBox of the gold Polygons
  • The Boundry-Polygon of 8 Communities are shown as red and gold lines
    The BoundingBox of the red Polygons are not shown.
    The Railway-Line passes through the red Polygons.
    The Railway-Line does not pass through the gold Polygons.
    The Railway-Line does pass through the 2 of the 3 green BoundingBox's.

Goal of the SpatialIndex based Query
retrieve a list of Communities that Railway-Line passes through in this area,
  restricting the amount of Communities to search in to 8.

So after adding the following before the -- 1st filter ...
-- 0a filter: search only Railway-Line 'Ferrovia Roma-Napoli'
(rw.id=2) AND
-- 0b filter: list of 8 specific Communities to search through
(c.community_id IN (58117,58079,58091,58003,58009,58057,59001,58050)) AND

... and removing the ST_Intersects portion, we get the following result:
Railway Community Province Region
Ferrovia Roma-Napoli Aprilia Latina Lazio
Ferrovia Roma-Napoli Albano Laziale Roma Lazio
Ferrovia Roma-Napoli Ardea Roma Lazio
Ferrovia Roma-Napoli Ariccia Roma Lazio
Ferrovia Roma-Napoli Lanuvio Roma Lazio
Ferrovia Roma-Napoli Marino Roma Lazio
Ferrovia Roma-Napoli Pomezia Roma Lazio
Ferrovia Roma-Napoli Roma Roma Lazio

which was not what I was expecting., since the Rail-Line does not pass through the BoundingBox of Marino.
Why?
Explanation why:
The area given for the search with the Spatial-Index was:
(search_frame = ST_Transform(rw.geometry,32632))
and as the term search_frame implies, it is the Bounding-Box of the geometry (not the geometry itsself).

Since the Railway-Line between Napels and Rome is a steep South/East to North-West line and the Max/Min Y/Y values covers a great area outside the shown image. (containg 353 Communities in a 157.471 x 107.412 KM area).
Conclusion:
The area that is to be searched with the SpatialIndex should be reduced
  • 7739 Communities were filtered out by the SpatialIndex
  • 353 Communities were returned by the SpatialIndex
  • 8 ('0b filter') of the 353 were displayed
For this situation the geometry should be replaced with an explicit area to search in.

And with that we complete 'Brilliant idea (version 1)' to 'Brilliant idea (version 2)'.
SpatialIndex as BoundingBox (part 2)
SpatialIndex-BoundingBox.02
Having wasted a great deal of time on part 1, we (the writer) have come to the conclusion:
to use the SpatialIndex correctly and have defined a more realistic area that is shown in the second image
(search_frame = BuildMbr(795201.2499,4625100.7198,811621.7722,4615838.1497,32632))

We now get our desired result (Marino is now out of range):
Railway Community Province Region
Ferrovia Roma-Napoli Aprilia Latina Lazio
Ferrovia Roma-Napoli Albano Laziale Roma Lazio
Ferrovia Roma-Napoli Ardea Roma Lazio
Ferrovia Roma-Napoli Ariccia Roma Lazio
Ferrovia Roma-Napoli Lanuvio Roma Lazio
Ferrovia Roma-Napoli Pomezia Roma Lazio
Ferrovia Roma-Napoli Roma Roma Lazio

The SpatialIndex has filtered out many more records.
  • 8082 Communities were filtered out by the SpatialIndex
  • 10 Communities were returned by the SpatialIndex
  • 8 ('0b filter') of the 10 were displayed
The 2 Communities that are red are those where the Railway-Line the BoundingBox,    but not through Community boundry.
When the ST_Intersects portion the original Query is again added,
  only the 8 Communities that are black will be returned.
Why?
Explanation why:
  • The SpatialIndex will only search the BoundingBox of the geometry and can therefore only determin candidats that possibly fufull the conditions
  • ST_Intersects will search the geometry to determin if the Railway-Line 'wanders' into the 'wiggly' boundry of the Community
Being fast and efficiant the SpatialIndex should be used first, in the WHERE conditions
- to filter out all records that are out of the question - and then ,
the more expensive, ST_Intersects on possible candidats that are left over..
The final query with the results
SELECT
 rw.name AS Railway,
 c.community_name AS Community,
 p.province_name AS Province,
 r.region_name AS Region
FROM railways AS rw
JOIN communities AS c ON
(
 -- 1st filter: Railway-line (MULTILINESTRING) going through BoundingBox of Community [MULTIPOLYGON]
 c.ROWID IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- the BoundingBox is a square 1000 meters around the Populated-Place (width/height 2000 meters)
   (f_table_name = 'communities') AND
   -- geometry-column to use (only mandatory when the TABLE contains more than 1 geometry)
   (f_geometry_column = 'geometry') AND
    -- to calculate the values precisely, transform Railway-Line from 23032 to 32632
   (search_frame = BuildMbr(795201.2499,4625100.7198,811621.7722,4615838.1497,32632))
  )
 ) AND
 ST_Intersects
 (
  -- Does the Railway-Line pass through ...
  ST_Transform
  (
   -- transform from 'ED50 UTM zone 32' to 'WGS 84 / UTM zone 32N'
   rw.geometry,32632
  ),
  -- ... the wiggly boundry of the Community ?
  c.geometry
 )
)
JOIN provinces AS p ON (p.province_id = c.province_id)
JOIN regions AS r ON (r.region_id = p.region_id)
ORDER BY r.region_name, p.province_name, c.community_name;
-- 6 records

We now get our expected result (Marino is now out of range):
Railway Community Province Region
Ferrovia Roma-Napoli Aprilia Latina Lazio
Ferrovia Roma-Napoli Ardea Roma Lazio
Ferrovia Roma-Napoli Lanuvio Roma Lazio
Ferrovia Roma-Napoli Pomezia Roma Lazio
Ferrovia Roma-Napoli Roma Roma Lazio
Ferrovia Roma-Napoli Velletri Roma Lazio

Well, ... almost the expected result
Further south (outside of the image) the Railway-Line runs into Velletri,
which can be seen at the right edge, and was overlooked when the images were made.
And with that we complete 'Brilliant idea (version 2)' with no 'Brilliant idea (version 3)' being planned.

recipe #17: Railways vs Populated Places

The problem

We'll use once again the railways dataset.
But this really is an hot spiced recipe: be prepared to taste very strong flavors.
As you can now easily image by yourself, computing distances between a railway line and Populated Places isn't so difficult.
So this problem introduces a further degree of complexity (just to escape from boredom and to keep your mind active and interested).
SpatialIndex-BoundingBox.03
Image that for any good reason the following classification exists:
Class Min. distance Max. distance
A-class 0 Km 1 Km
B-class 1 Km 2.5 Km
C-class 2.5 Km 5 Km
D-class 5 Km 10 Km
E-class 10 Km 20 Km

The problem you are faced to resolve is:
  • identify any Populated Place laying within a distance radius of 20 Km from a Railway.
  • identify the corresponding distance Class for each one of such Populated Places.

Explanation of Image: Area around Italy (1861)
  • The Railway-Lines Roma-Napoli and Adriatica are shown as an orange line
  • The BoundingBox (rectangles) of the 2 Railway-Lines are shown as green Polygons
  • The 9900 Populated-Places are shown as red Points
    495 Populated-Places are within the BoundingBox of the Adriatica Railway-Line
    414 Populated-Places are within the BoundingBox of th Roma-Napoli Railway-Line
    91 (of the 909) Populated-Places are within the BoundingBox of both Railway-Lines
  •   818 (909-91) Populated-Places are within  the BoundingBox of one or both Railway-Lines
  • 9082 Populated-Places are outside the BoundingBox of both Railway-Lines
Based in this situation, we will start to plan our Sql-Query (part 1):
SpatialIndex-BoundingBox.Foro Romano, Roma
Image showing same area, displaying the A, B, C class zones
  • with 'Roma, Foro Romano' outside the BoundingBox of the SpatialIndex
  • but inside the'B class' area
and therefore must be found in the results of the SpatialIndex

What is faster and more efficient?
  • to query, with ST_Distance, all 9082 Populated-Places
  • and then filter out, with the SpatialIndex, the 8173 Populated-Places that are outside of the Railway-Line's BoundingBoxes
or
  • collect, with the SpatialIndex, the 909 Populated-Places that are within both of the Railway-Line's BoundingBoxes
  • and then query, with ST_Distance, the likely candidates that the SpatialIndex has collected

Answer:
Since the SQLite's R*Tree based SpatialIndex is fast and efficient
it should be used first to remove all, out of the question, results.
Then, with the likley candidates, the further, final, condition checks
with the more CPU entensive commands should be done.

Explanation of Image: Area Rome (1884)
  • The Railway-Line Roma-Napoli, ending at the (present day) Roma-Termini, is shown as an orange line
  • The BoundingBox (rectangle) of the Railway-Line is shown as a green Polygon
  • The 2 Populated-Places are shown as red Points, with Text
    the original GeoNames position is inside the BoundingBox of the Railway-Line
       (around 1.195 KM from Roma-Termini and 124 meters away from the nearst part of the Railway-Line)
    the added Roma,Foro Romano position is outside (by 1312 meters) the BoundingBox of the Railway-Line
       (around 1.555 KM from Roma-Termini, which is the nearst part of the Railway-Line)
INSERT INTO populated_places
(name,longitude,latitude,geometry)
VALUES("Roma, Foro Romano (Colonna di Foca, Basilica Giulia)", 12.484847,41.892464, ST_Transform(MakePoint(12.484847,41.892464,4326),32632));

The added position is more realistic,
since it lies within the city center, being within the former administration center of the Roman Empire.
Further considerations when planing the Sql-Query (part 2):

Since the goal is to identify any Populated-Place laying within a given distance radius of: 1, 2.5, 5, 10 and 20 KM
a Populated-Place outside the BoundingBox, but inside the given radius must also be found.

For this we will extend the original search_frame of the SpatialIndex from:
  • ST_Transform(rw.geometry,32632)
to:
  • ST_Buffer(ST_Transform(ST_Envelope(rw.geometry),32632), radius)

Achieved optimations:
  • ST_Envelope will return 5 POINTs of the BoundingBox of the Railway-Line,
    since the SpatialIndex only uses the BoundingBox of a geometry anyway
  • ST_Transform will transform these 5 POINTs (instead of the > 1000 POINTs contained the Roma-Napoli Railway-Line)
  • ST_Buffer will increase the area of the BoundingBox by the given radius
The SpatialIndex will return more results this way, but the results will be more realistic than the previous version.
Further considerations when planing the Sql-Query (part 3):
ShortestLine ST_Distance Railways
Where, in the query. should ST_Distance be placed?.

Let's take a closer look at what is being done in the backgound
   that Sqlite's Query Optimizer has no idea (nor cares) about:
Explanation of Image: Area between Naples and Rome (1861)
  • The Railway-Lines Roma-Napoli and Adriatica are shown as an orange line
  • The BoundingBox's (rectangles) of these 2 Railway-Lines is shown as a green Polygon
  • A third orange line shows the result of ST_ShortestLine (a cousin of ST_Distance)
       connecting both Railway-Lines at the closest point between them,
       with ST_Distance returning: 121304.141990 meters

The Sql-Command used to create this LINESTRING, with ST_ShortestLine, is:
INSERT INTO railways
(id,name,geometry)
SELECT
 3 AS id,
 "ShortestLine between 'Ferrovia Adriatica / Roma-Napoli' Railway-Lines ; Distance[121304.141990 meters]" AS name,
 -- geometry is defined as a 'MULTILINESTRING'
 CastToMultiLinestring
 (
  -- Result will be (in most cases) a LINESTRING
  -- - as the shortest distance between Line_1 and Line_2
  ST_ShortestLine
  (
   -- GEOS will search for the nearest position along Line_1 to Line_2
   (SELECT geometry FROM railways WHERE (id = 1)),
   -- GEOS will search for the nearest position along Line_2 to Line_1
    (SELECT geometry FROM railways WHERE (id = 2))
  )
  -- if NOT casted to 'MULTILINESTRING', the 'INSERT' would fail
 )
) AS geometry;

The following Query tells us:
SELECT
 ST_NumGeometries
 (
  -- the Geometry-Collection to count
  ST_DissolvePoints
  (
  -- the 'MULTILINESTRING' to desolve as 'MULTIPOINT'
   (SELECT geometry FROM railways WHERE (id = 1))
  )
 ) AS count_line_1; --returns 1487
  • The Railway-Line Adriaticai (id = 1) contains 1487 POINTs (vertices)
  • The Railway-Line Roma-Napoli (id = 2) contains 1048 POINTs (vertices)
Editoral comment:
If you are getting the impression that    the combination of different Spatial-Functions gives you results,
that you otherwise expected as an Extra-Function, that impression is corrrect.

GEOS will search both geometries, search for each, the nearest position to the other (perpendicular distance)
  • for a POINT : this is very simple, the POINT is the nearest position
  • for a LINESTRING : this is more difficult: a position along the LINESTRING
       not necessarily a POINT/vertex inside the LINESTRING
    will be searched for in the segment between each POINT/vertex inside the LINESTRING until the closest is found.
    Only if this fails, will the nearest POINT/vertex inside the LINESTRING be returned.
  • for a POLYGON : its exterior-RING (as a closed LINESTRING) will be searched
Both ST_Distance and ST_ShortestLine use the same logic.
Summa summarum:
Being a very complex CPU operation
  • this should only be done with the likley candidates that the SpatialIndex has returned

ST_Distance will then check if the distance between
the Populated-Place and the nearest POINT on the Railway-Line is less or equal the given range.

The final Sql-Query below, will return something in the form of:
Railway PopulatedPlace distance [in meters] A class [< 1Km] B class [< 2.5Km] C class [< 5Km] D class [< 10Km] E class [< 20Km]
... ... ... ... ... ... ... ...
Ferrovia Roma-Napoli Roma, Foro Romano (Colonna di Foca, Basilica Giulia) 1555.353612 NULL 1 0 0 0
Ferrovia Roma-Napoli Rome ('GeoNames' position) 124.298444 1 0 0 0 0
... ... ... ... ... ... ... ...
Ferrovia Adriatica Vasto 620.351787 1 0 0 0 0
Ferrovia Adriatica Villamagna 10452.352746 NULL NULL NULL NULL 1
Ferrovia Adriatica Villalfonsina 4958.656719 NULL NULL NULL 1 0
Ferrovia Adriatica Zapponeta 12717.971546 NULL NULL NULL NULL 1
... ... ... ... ... ... ... ...

Explanation of returned values:
  • NULL: no results returned by SpatialIndex (ST_Distance was not called) or ST_Distance was called, but it's result was outside of the radius (no JOIN result for this class-type)
  • 0: result returned by SpatialIndex and ST_Distance, checking if within range, returns false (JOIN result for this class-type)
  • 1: result returned by SpatialIndex and ST_Distance, checking if within range, returns true (JOIN result for this class-type)
Since E class will always return a result (never NULL), PopulatedPlace and distance is from the result of 'pp_e.name'.
  • Stated otherwise: when E class is NULL, no record is shown
The final when Sql-Query:
and now, (... da,da,da,daaa ...) presenting the horror Query in all it's glory:
   (but don't worry - each portion will be explained further down)
SELECT
 rw.name AS Railway,
 pp_e.name AS PopulatedPlace,
 ST_Distance(ST_Transform(rw.geometry, 32632),pp_e.geometry) AS "distance [in meters]",
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_a.geometry) <= 1000.0) AS "A class [< 1Km]",
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_b.geometry) > 1000.0) AS "B class [< 2.5Km]",
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_c.geometry) > 2500.0) AS "C class [< 5Km]",
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_d.geometry) > 5000.0) AS "D class [< 10Km]",
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_e.geometry) > 10000.0) AS "E class [< 20Km]"
FROM railways AS rw
JOIN populated_places AS pp_e ON
(
 -- retrieve 'E class' Populated-Places using the SpatialIndex
 pp_e.id IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- only Populated-Places within the BoundingBox of the Railway-Line [909=818+91]
   (f_table_name = 'populated_places') AND
   -- geometry-column to use (only mandatory when the TABLE contains more than 1 geometry)
   (f_geometry_column = 'geometry') AND
   -- extract BoundingBox from Railway-Line ; Tranform result and expand by 20000.0 meters
   (search_frame = ST_Buffer(ST_Transform(ST_Envelope(rw.geometry),32632), 20000.0))
  )
 ) AND
 -- check if candidates are within the distance condition [376, NOT=533]
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_e.geometry) <= 20000.0)
)
LEFT JOIN populated_places AS pp_d ON
(
 (pp_e.id = pp_d.id) AND
 -- retrieve 'D class' Populated-Places using the SpatialIndex
 pp_d.id IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- only Populated-Places within the BoundingBox of the Railway-Line [909=818+91]
   (f_table_name = 'populated_places') AND
   -- geometry-column to use (only mandatory when the TABLE contains more than 1 geometry)
   (f_geometry_column = 'geometry') AND
   -- extract BoundingBox from Railway-Line ; Tranform result and expand by 10000 meters
   (search_frame = ST_Buffer(ST_Transform(ST_Envelope(rw.geometry),32632), 10000.0))
  )
 ) AND
 -- check if candidates are within the distance condition
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_d.geometry) <= 10000.0)
)
LEFT JOIN populated_places AS pp_c ON
(
 (pp_d.id = pp_c.id) AND
 -- retrieve 'C class' Populated-Places using the SpatialIndex
 pp_c.id IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- only Populated-Places within the BoundingBox of the Railway-Line [909=818+91]
   (f_table_name = 'populated_places') AND
   -- geometry-column to use (only mandatory when the TABLE contains more than 1 geometry)
   (f_geometry_column = 'geometry') AND
   -- extract BoundingBox from Railway-Line ; Tranform result and expand by 5000 meters
   (search_frame = ST_Buffer(ST_Transform(ST_Envelope(rw.geometry),32632), 5000.0))
  )
 ) AND
 -- check if candidates are within the distance condition
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_c.geometry) <= 5000.0)
)
LEFT JOIN populated_places AS pp_b ON
(
 (pp_c.id = pp_b.id) AND
 -- retrieve 'B class' Populated-Places using the SpatialIndex
 pp_b.id IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- only Populated-Places within the BoundingBox of the Railway-Line [909=818+91]
   (f_table_name = 'populated_places') AND
   -- geometry-column to use (only mandatory when the TABLE contains more than 1 geometry)
   (f_geometry_column = 'geometry') AND
   -- extend the search_frame by 2500 meters ('Roma, Foro Romano' will be found)
   (search_frame = ST_Buffer(ST_Transform(ST_Envelope(rw.geometry),32632), 2500.0))
  )
 ) AND
 -- check if candidates are within the distance condition
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_b.geometry) <= 2500.0)
)
LEFT JOIN populated_places AS pp_a ON
(
 (pp_b.id = pp_a.id) AND
 -- retrieve 'A class' Populated-Places using the SpatialIndex
 pp_a.id IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- only Populated-Places within the BoundingBox of the Railway-Line [909=818+91]
   (f_table_name = 'populated_places') AND
   -- geometry-column to use (only mandatory when the TABLE contains more than 1 geometry)
   (f_geometry_column = 'geometry') AND
   -- extract BoundingBox from Railway-Line ; Tranform result and expand by 1000 meters
   (search_frame = ST_Buffer(ST_Transform(ST_Envelope(rw.geometry),32632), 1000.0))
  )
 ) AND
 -- check if candidates are within the distance condition
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_a.geometry) <= 1000.0)
)
ORDER BY PopulatedPlace;
-- 512 records [00:00:05.022]

Yes, this one really looks like a complex and intimidating query, but it is not as bad as it looks.

You already know the trick:
   you simply have to break down the Sql-Query into several smaller chunks.

And then you'll soon discover that each chunk is quite a simple affair.
Let us examine this then, piece by piece:

Chunk 1: the outer chunk
   (like the outer crust of a pie)
SELECT
 rw.name AS Railway, ...
FROM railways AS rw
JOIN populated_places AS pp_e ON (...)
LEFT JOIN populated_places AS pp_d ON (...)
LEFT JOIN populated_places AS pp_c ON (...)
LEFT JOIN populated_places AS pp_b ON (...)
LEFT JOIN populated_places AS pp_a ON (...);
  • we'll simply JOIN the 'railways AS rw' and the 'populated_places AS pp_e' tables togeather
       and there is nothing strange in that, is there ?
  • then we'll perform a LEFT JOIN of the populated_places AS pp_d table a second time:
       and you'll, no doubt, remember that a LEFT JOIN inserts a valid row into the result-set even when the right-sided term evaluates to NULL, right?
  • and finally we'll simply repeat LEFT JOIN for pp_c, pp_b and pp_a.

Chunk 2 a: part of the inner chunk
   (an ingredient inside the crust of a pie)

Defining each JOIN (or LEFT JOIN)
We'll obviously start by using the SpatialIndex, which will, fast and efficiantly extract all of the likley candidates that will fulfill the distance criteria.
   (what will be used as the main ingredient of the pie  (mincemeat for a meat pie, peakan's for sweet pie, or something else, whatever might be needed for a vegatarian)

...
JOIN populated_places AS pp_e ON
(
 -- retrieve 'E class' Populated-Places using the SpatialIndex
 pp_e.id IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- only Populated-Places within the BoundingBox of the Railway-Line [909=818+91]
   (f_table_name = 'populated_places') AND
   -- geometry-column to use (only mandatory when the TABLE contains more than 1 geometry)
   (f_geometry_column = 'geometry') AND
   -- extract BoundingBox from Railway-Line ; Tranform result and expand by 20000 meters
   (search_frame = ST_Buffer
                   (
                    -- to calculate the values precisely, transform Railway-Line from 23032 to 32632
                    ST_Transform
                    (
                     -- Extract the BoundingBox of the geometry (which is all the SpatialIndex uses)
                     ST_Envelope
                     (
                      -- the geometry to use for Envelope/Transform and Buffer
                      rw.geometry
                     ),
                      --the SRID to transform the geometry to
                     32632
                    ),
                    --Expand by value in SRID-Units (for 32632: meters)
                    20000.0
                   )
   )
  )
 ) AND ...
)

Chunk 2 b: also a part of the inner chunk
   (to further the flavor the main ingredient inside the pie like onions, syrup or (again) what ever else is needed)

Evaluate, with ST_Distance the candidates, checking if this one falls within the expected threshold for the corresponding Class.
 ... AND
 (ST_Distance(ST_Transform(rw.geometry, 32632),pp_e.geometry) <= 20000.0)

Chunks 2 c-z: any other conditions, as needed.
   (anything else to make the pie nice and tasty)

With that, the main building block of the query should be clear:
  • the first JOIN will include into the result-set any Populated Place falling within 20 Km from the railway line.
  • any other LEFT JOIN will then test decreasing distances, accordingly to the imposed Class boundaries.
  • and each LEFT JOIN carefully checks if the Populated Place ID is the same of the previous successfully identified Class, as in: pp_d.id = pp_c.id
  • each time one such LEFT JOIN will fail, then corresponding NULL-values will be inserted into the result-set.
Everything else ... is a repetition of each Chunks N a-z for the other classes.
   ( ... is making enough pies for a nice party. Don't forget the vegatarian though,  there is always one that turns up ! Remember: in 50 years, it may be the other way around.)

And now you are ready to start experimenting ...
by making further changes to this query like
  • adding some other ORDER BY
  • adding/changing some new WHERE conditions
Soon you will see how easy and powerful is really is.

  • Don't be discouraged if doesn't work the first ... fourth time off !
  • Remember: Rome wasn't built in a day !
  • Things of quality take their time to evolve.

by 'playing around' with your borrowed recipies by
  • adding some other ingredients
  • adding/changing some new spices for more flavors
Soon you will see how a bit of creativity can make a big difference (enough not to notice that ...).

  • Don't be discouraged if tasts horrable the first ... fourth time off !
  • Remember: Italian cooking started before and continued after, Marco Polo's visit to foreign parts !
  • Things of quality take their time to evolve.

recipe #18: Railway Zones as Buffers

The problem

This is like a kind of visual conclusion of the latest exercise. The problem now is:
  • create an appropriate Map Layer representing A, B, C, D and E-class zones as previously defined.
We'll start creating a new table:
QGIS - buffers
  • as usual, we'll first create the table, omitting any Geometry column
  • and we'll then create the Geometry column in a second time, using AddGeometryColumn(); but you already known all this.
  • placing this new table into the 32632 SRID [WGS 84 / UTM zone 32N] is an absolutely obvious choice: after all, the original railways table is into the same identical SRID
  • declaring a MULTYPOLYGON Geometry type is less obvious: but we'll see later why this is required.

CREATE TABLE railway_zones
(
 id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 railway_name TEXT NOT NULL,
 zone_name TEXT NOT NULL
);
SELECT
 AddGeometryColumn('railway_zones', 'geometry', 32632, 'MULTIPOLYGON', 'XY');

Nothing new in this INSERT INTO ... SELECT ... statement (having allready been done before).
INSERT INTO railway_zones
(id, railway_name, zone_name, geometry)
 SELECT
  NULL, name,
  'A class [< 1Km]' AS zone_name,
  CastToMultiPolygon
  (
   ST_Buffer(ST_Transform(geometry,32632), 1000.0)
  ) AS geometry
 FROM railways;

Except for the following points:
  • we'll use ST_Buffer() to create a POLYGON corresponding to the 1Km A-class zone.
  • Please note: since the exact type of any Geometry created by ST_Buffer(); may differ:
       either this function will create a POLYGON, or a MULTIPOLYGON depending on the exact shape of the input line,
       but also influenced by the given buffer radiusas well
    .
  • so, in order to avoid any possible type inconsistency we defined a MULTIPOLYGON Geometry for this table.
  • and we are now forcing the type by calling the explicit type-casting function CastToMultiPolygon()
  • since the source geometry uses SRID 23032, 'ED50 UTM zone 32', ST_Transform must be used.

Note:
   Adding a 'AS column_name' ist strictly cosmetic (not mandatory),
    but sometimes useful for clarity (which column is intended in the SELECT statement)
Now we will INSERT the data to the created table:
INSERT INTO railway_zones
(id, railway_name, zone_name, geometry)
 SELECT
  NULL, name,
  'B class [< 2.5Km]' AS zone_name,
  CastToMultiPolygon
  (
   -- Create a POLYGON that surrounds the A class
   ST_Difference
   (
    ST_Buffer(ST_Transform(geometry,32632), 2500.0),
    ST_Buffer(ST_Transform(geometry,32632), 1000.0)
   )
  ) AS geometry
 FROM railways;
INSERT INTO railway_zones
(id, railway_name, zone_name, geometry)
 SELECT
  NULL, name,
  'C class [< 5Km]' AS zone_name,
  CastToMultiPolygon
  (
   -- Create a POLYGON that surrounds the B class
   ST_Difference
   (
    ST_Buffer(ST_Transform(geometry,32632), 5000.0),
    ST_Buffer(ST_Transform(geometry,32632), 2500.0)
   )
  ) AS geometry
 FROM railways;
INSERT INTO railway_zones
(id, railway_name, zone_name, geometry)
 SELECT
  NULL, name,
  'D class [< 10Km]' AS zone_name,
  CastToMultiPolygon
  (
   -- Create a POLYGON that surrounds the C class
   ST_Difference
   (
    ST_Buffer(ST_Transform(geometry,32632), 10000.0),
    ST_Buffer(ST_Transform(geometry,32632), 5000.0)
   )
  ) AS geometry
 FROM railways;
INSERT INTO railway_zones
(id, railway_name, zone_name, geometry)
 SELECT
  NULL, name,
  'E class [< 20Km]' AS zone_name,
  CastToMultiPolygon
  (
   -- Create a POLYGON that surrounds the D class
   ST_Difference
   (
    ST_Buffer(ST_Transform(geometry,32632), 20000.0),
    ST_Buffer(ST_Transform(geometry,32632), 10000.0)
   )
  ) AS geometry
 FROM railways;
Creating any further zone isn't much more difficult.
  • we'll simply use ST_Difference() in order to get the appropriate representation:
    we create an interior hole for each zone, so as to exclude any other other zone we've already created, avoiding any overlapping.
You can now perform a simple visual check using QGIS.
And that's it, nothing more to it.
Note: (Addendum to recipe #17: Railways vs Populated Places)
  'Roma, Foro Romano' can be seen within the 'B class' area
  • the BoundingBox (green rectangle) of the Railway-Line was extended (using ST_Buffer) by 2500 meters
  • with this extended search_frame, the SpatialIndex finds 'Roma, Foro Romano' for 'B class'
  • without this extension of the search_frame, 'Roma, Foro Romano' would never be found, since it is not within the BoundingBox being used
        the query results should, therefore, be considered as invalid, since something that should have been found, was not found.
Rome - Railways with Zones

recipe #19: merging Communities into Provinces and so on ...

Note:
  The TABLEs used here were created in
   recipe #1: creating a well designed DB.

The problem

Communities, Provinces and Regions

follow a well defined order of hierarchy.
For administrative purposes Italy is subdivided into Regions
   Regions are subdivided into Provinces
   Provinces are subdivided into Communities
Following the appropriate Spatial SQL procedures
A set of administration geometries will be create from:
  • admin_cities from the Community geometries of com2011_s [LAU 2]
  • admin_provinces from the created admin_cities geometries [NUTS 3, LAU 1]
  • admin_regions from the created admin_provinces geometries [NUTS 2]
  • admin_countries from the created admin_regions geometries
Creation of the VIEW admin_cities
First create the admin_cities VIEW; this VIEW
represents a nicely de-normalized flat table, so to make any subsequent activity absolutely painless.
-- not being the first version of our masterpiece, remove previous version
DROP VIEW IF EXISTS admin_cities;
CREATE VIEW admin_cities AS
SELECT
 -- the id of the City
 c.pro_com AS id_city,
 -- the name of the City
 c.comune AS name_city,
 -- Insure whole numbers ('Hanged, drawn and quartered' has been abolished, no longer need for quarter sums)
 CAST (c.pop_2011 AS INTEGER) AS population_city,
 -- id of Province the City belongs to
 c.cod_pro AS id_province,
 -- name of Province the City belongs to
 p.provincia AS name_province,
 -- The Car Plate of the Province the City belongs to
 p.sigla AS car_plate_code,
 -- id of region the Province belongs to
 c.cod_reg AS id_region,
 -- id of region the Province belongs to
 r.regione AS name_region,
 -- The Geometry ofthe City
 c.geometry AS geom_city
 -- contains the columns 'cod_pro' and 'cod_reg'
FROM com2011_s AS c
 -- contains the columns 'cod_pro' and 'cod_reg'
JOIN prov2011_s AS p USING (cod_pro)
 -- contains the column and 'cod_reg'
JOIN reg2011_s AS r USING (cod_reg);
Note: Since the cod_pro and cod_reg columns exist in all 3 tables:
  • JOIN prov2011_s AS p USING (cod_pro)
  • JOIN reg2011_s AS r USING (cod_reg)
can be used, instead of :
  • JOIN prov2011_s AS p ON
    (
     (c.cod_pro = p.cod_pro)
    )
  • JOIN reg2011_s AS r ON
    (
     (p.cod_reg = r.cod_reg)
    )
With USING, checking is done that the named-column exists in the TABLEs that are to be JOINed.
(What some peaple do to make life easier for others, hoping that it will be appreciated)

SELECT
 *
FROM admin_cities
WHERE
(
 -- returns 87 results
 (id_province IN (41,99))
);

SELECT
 *
FROM admin_cities
WHERE
(
 -- returns 3 results
 (id_city IN (41009,51003,99024))
);
id_city name_city population_city id_province name_province car_plate_code id_region name_region geom_city
41009 Carpegna 1670 41 Pesaro e Urbino PU 11 Marche BLOB sz=14871 GEOMETRY
99024 Pennabilli 3017 99 Rimini RN 8 Emilia-Romagna BLOB sz=21741 GEOMETRY
51003 Badia Tedalda 1091 51 Arezzo AR 9 Toscana BLOB sz=23815 GEOMETRY
Creation of the TABLE admin_provinces
Now you'll create and populate the admin_provinces table:
-- this can take some time [ST_Area, ST_Union etc.], start TRANSACTION
BEGIN;
-- not being the first version of our masterpiece, remove previous version
DROP TABLE IF EXISTS admin_provinces;
CREATE TABLE admin_provinces AS
SELECT
 id_province,
 name_province,
 car_plate_code,
 -- Population of Province, as whole persons
 sum(population_city) AS population_province,
 -- Area of Province, in meters
 ST_Area(ST_Union(geom_city)) AS area_province,
 -- Amount of Cities belonging to the Province,
 count(id_city) AS city_count,
 -- id of Region the Province belongs to  id_region,
 --name of Region the Province belongs to  name_region,
 -- Build outer Boundry of Province, based on City-Geometries and stored as MULTIPOLYGON
 CastToMultiPolygon(ST_Union(geom_city)) AS geom_province
FROM admin_cities
 -- needed for aggregate functions: count, sum, ST_Area and ST_Union
GROUP BY id_province;

SELECT
 RecoverGeometryColumn
 (
   -- table-name of geometry
  'admin_provinces',
   -- column-name of geometry
  'geom_province',
   -- srid of geometry
  32632,
   -- '[MULTI] POINT, LINESTRING, POLYGON, GEOMETRYCOLLECTION [Z, M, ZM]'
   'MULTIPOLYGON',
   -- Dimension ['XY' (or 2) ; 'XYZ' (or 3) ; 'XYZM' (or 4) ; 'XYM']
  'XY'
 );
-- end TRANSACTION
COMMIT;
Notes:
  • When using the CREATE TABLE table_name AS syntax: you will not have a PRIMARY KEY.
  • RecoverGeometryColumn will fail if there is a mismatch for any geometry that is not defined with the given srid or geometry-type
  • A ORDER BY after GROUP BY is possible, but not advised, since the ROWID would not match id_province (which will be needed later)

  • the ST_Union() SQL Spatial function is used to aggregate / merge Geometries.
  • by defining a GROUP BY id_province clause then ST_Union() will work as an aggregate function, thus effectively building the Geometry representation corresponding to each single County.
  • please note well: you must absolutely call RecoverGeometryColumn() so to properly register the geom_province geometry column into the geometry_columns metadata table.

Some background information, to (hopefully), understand these terms better:
Aggregate function:
    SQL As Understood By SQLite: Aggregate Functions

is a function that will recieve, as parameter, an array of values
  collecting whatever is returned by the WHERE or GROUP BY condition.
ST_Union and ST_Collect, as Aggregate functions, can recieve, as one parameter, the 27 city-geometries
    similar in nature to a SELECT geometry FROM table Query, reading an array of values as input, based on a possible WHERE condition.
SELECT
 -- Combine Geometries (where possible) to 1 Polygon (outer boundry only)
 ST_Union(geom_city) AS Province_Rimini,
 -- Combine Geometries to 1 MULTIPOLYGON Geometry, with each City-Geometry
 ST_Collect(geom_city) AS Province_Rimini_with_Cities
FROM admin_cities
WHERE
(
 -- returns 27 results (the cities of the Province of 'Rimini')
 (id_province IN (99))
);

Both Aggregate functions will combine the 27 city-geometries, but return different results:
302 Outer Boundries of the Province Rimini Boundries of the Province Rimini and it's cities
ST_Union will return (where possible) one POLYGON,
  as the outer boundry of the Province.
ST_Collect will return one MULTIPOLYGON,
  as the outer and inner boundries of the Province and it's 27 city's.

Note: Calculation of the Area of a POLYGON and MULTIPOLYGON through ST_Area
POLYGON
  • The Area of the exterior-RING (there can only be one)
      minus the area of all (there can more than be one) interior-RING(s)
  • Rimini, 1 POLYGON with an area of 865014381.183741 meters
      the enclave belonging to Arezzo is not part of this sum
MULTIPOLYGON
  • The sum of the area of each POLYGON
  • Arezzo, 3 POLYGONs with an area of 3232992543.821383 meters
        the 1st POLYGON is the exclave inside Rimini is part of this sum (14970177.499305 meters).
        the 2nd POLYGON is the main portion of the Province Arezzo (3218022235.673755 meters).
    And what then, you may ask (as you no doubt do), is the third POLYGON ??
        Lets have a look with:
    SELECT
     ST_Area(geometry) AS area_polygon,
     AsEWKT(ST_Centroid(geometry)) AS center_polygon,
     item_no,geometry
    FROM ElementaryGeometries
    WHERE
    (
     (f_table_name = 'admin_provinces') AND
     -- only mandatory when the TABLE contains more than 1 geometry
     (f_geometry_column = 'geom_province') AND
     -- 'Arezzo': id_province=51
     (origin_rowid = 51)
    );

    area_polygon item_no center_polygon geometry
    14970177.499305 0 SRID=32632;POINT(757742.3498469464 4853880.121523556) BLOB sz=8996 GEOMETRY
    3218022235.673755 1 SRID=32632;POINT(730702.4218141739 4823875.472936004) BLOB sz=98614 GEOMETRY
    130.648324 2 SRID=32632;POINT(759715.6704926285 4832915.396960044) BLOB sz=116 GEOMETRY

    in QGis, with Scale 1:50 zoom to: 759715.6704926285,4832915.396960044
       and you will see something like this:
    3rd POLYGON of the Province Arezzo An area of 130.648324 meters with the widest point around 2.698 meters?
      This, I would say, can safely be clasified under the catagory 'goofups'
        and can often be avoided by using Topology systems.

Non-Aggregate function:
    SQL As Understood By SQLite: Core Functions     SQL As Understood By SQLite: CAST expressions

is a function that will recieve, as a parameter, a single value
  will not expect any other input returned by a (possible) WHERE condition.
ST_Intersection takes 2 Non-Aggregate geometries, to extract the common areas of both
    so a Sub-Query can be used, but then must return only one result for each parameter.
SELECT
 ST_Intersection
 (
  -- geometry 1 (Province of 'Rimini')
  ( -- A Sub-Query must be inclose in brackets
   SELECT
    -- creates 1 geometry of the outer boundry of the Province Rimini
    ST_Union(geom_city)
   FROM admin_cities
   WHERE
   (
    -- returns 27 results (the cities of the Province of 'Rimini', Emilia-Romagna)
    (id_province IN (99))
   )
  ), -- -end- of 1st Sub-Query
  -- geometry 2 (Province of 'Arezzo')
  ( -- A Sub-Query must be inclose in brackets
   SELECT
    -- creates 1 geometry of the outer boundry of the Province Arezzo
    ST_Union(geom_city)
   FROM admin_cities
   WHERE
   (
    -- returns 39 results (the cities of the Province of 'Arezzo', Toscana)
    (id_province IN (51))
   )
  ) -- -end- of 2nd Sub-Query
 ) AS Common_Boundry_Rimini_Arezzo;

The result is the common border between the Provincies of 'Rimini' and 'Arezzo':
ST_Intersection will return the common features of the 2 geometries
Common Boundries of the Province Rimini and Arezzo The cause of the hole in the 'Rimini'-Polygon above is
   an interior-RING of the POLYGON

A city that belongs to the Province of 'Arezzo',
   lies within the Province of 'Rimini'.
   (the city is a enclave of 'Arezzo' inside 'Rimini').

GROUP BY:

  Where a GROUP BY can be found, an Aggregate function is not far away
    It is often used to create a summery report of the data.
List all cities that have the same name:

SELECT
 -- amount of city names found
 count(name_city) AS city_name_count,
 -- the name of the city
 name_city,
 -- the id of the last city found
 id_city AS id_city_last
FROM admin_cities
-- 1st sort: 'city_name_count' will be set for each unique row
GROUP BY name_city
-- filter out the result we don't want
HAVING (city_name_count > 1)
-- 2nd sort: just to show it can be done
ORDER BY id_city_last;

Should return 8 results.
Just a quick check ...
SELECT
 *
FROM admin_provinces;
id_province name_province car_plate_code population_province area_province city_count id_region name_region geom_province
1 Torino TO 2247780 6826908024.258377 315 1 Piemonte BLOB sz=208817 GEOMETRY
2 Vercelli VC 176941 2081601547.524640 86 1 Piemonte BLOB sz=161821 GEOMETRY
3 Novara NO 365559 1340249720.791641 88 1 Piemonte BLOB sz=59421 GEOMETRY
... ... ... ... ... ... ... ... ...
51 Arezzo AR 343676 3232992543.821383 39 9 Toscana BLOB sz=107207 GEOMETRY
... ... ... ... ... ... ... ... ...
99 Rimini RN 321769 865014381.183741 27 8 Emilia-Romagna BLOB sz=123135 GEOMETRY
... ... ... ... ... ... ... ... ...

And then you are ready to display the admin_provinces map layer using QGIS.
Italy - Provinces
Creation of the TABLE admin_regions
Now you'll create and populate the admin_regions table:
-- this can take some time [ST_Area, ST_Union etc.], start TRANSACTION
BEGIN;
-- not being the first version of our masterpiece, remove previous version
DROP TABLE IF EXISTS admin_regions;
CREATE TABLE admin_regions
(
 id_region INTEGER NOT NULL PRIMARY KEY,
 name_region TEXT NOT NULL,
 -- Population of Region, as whole persons
 population_region INTEGER DEFAULT 0,
 -- Area of Region, in meters
 area_region DOUBLE DEFAULT 0,
 -- Amount of Cities belonging to the Region,
 city_count INTEGER DEFAULT 0,
 -- Amount of Provinces belonging to the Region,
 province_count INTEGER DEFAULT 0,
 -- id of Country the Region belongs to [default: 39, Italy]
 id_country INTEGER DEFAULT 39,
 -- name of Country the Region belongs to [default: Italy]
 name_country TEXT DEFAULT 'Italy'
);

SELECT
 AddGeometryColumn('admin_regions', 'geom_region', 32632, 'MULTIPOLYGON', 'XY');

INSERT INTO admin_regions
(id_region, name_region, population_region, area_region, city_count, province_count, geom_region)
SELECT
 id_region,
 name_region,
 -- Population of Region, as whole persons
 sum(population_province) AS population_region,
 -- Area of Region, in meters
 ST_Area(ST_Union(geom_province)) AS area_region,
 -- Amount of Cities belonging to the Region,
 sum(city_count) AS city_count,
 -- Amount of Provinces belonging to the Region,
 count(id_province) AS province_count,
 -- default values for id_country[39] and name_country['Italy']
 -- Build outer Boundry of Region, based on Province-Geometries and stored as MULTIPOLYGON
 CastToMultiPolygon(ST_Union(geom_province)) AS geom_region
FROM admin_provinces
 -- needed for aggregate functions: sum, count, ST_Area and ST_Union
GROUP BY id_region;

-- end TRANSACTION
COMMIT;
  • as in the previous step you'll use ST_Union() and GROUP BY to aggregate regions Geometries.
  • please note well: in this example you have explicitly created the regions table, then using AddGeometryColumn() so to create the regions.geometry column.
    And finally you have used INSERT INTO ... (...) SELECT ... in order to populate the table.
    The procedure is different, but the final result is exactly the same one as in the previous example.
SELECT
 id_region,
 name_region,
 population_region,
 area_region,
 city_count,
 province_count,
 geom_region
 -- We assume that [39, Italy] is known
FROM admin_regions;

Just a quick check ...
id_region name_region population_region area_region city_count province_count geom_region
1 Piemonte 4363916 25386696893.947697 1206 8 BLOB sz=258069 GEOMETRY
2 Valle D'Aosta 4363916 3260854219.911026 74 1 BLOB sz=57837 GEOMETRY
3 Lombardia 9704151 23863097447.125397 1544 12 BLOB sz=296140 GEOMETRY
... ... ... ... ... ... ...
And then you can display the admin_regions map layer using QGIS.
Italy - Regions

      Advanced Topic  :
Beware that: this topic is of a more  advanced  nature.
You may want to look into this later.
The information here in not strictly needed at this point.

Goals:
  Background:
    The Italian Peninsula contains 3 countries
    The Italian National Census 2011 data does not reference San Marino or the Vatican City as a Region, Province or City
    They are, however, indirectly contained in the data since both are enclaves of the Italian Republic.

The goals will therefore be:
  • extract both enclaves from the existing data
  • add each enclave to the admin_regions table, with an country definition
  • dynamically create as country in the admin_countries table, togeather with Italy
Notes:
  Most projects will contains aspects, which cannot be resolved within a general solution.
  Often one time solutions must be delt with.
  Also each aspect, may need a different solution.
  Some aspects, there may be different methods for a solution.

All of these are true for the case of the 2 enclaves:
  • the method for the Vatican City cannot be used for San Marino
  • there is more that one method to resolve the San Marino 'problem'
1748 Vatican City with Boundries City, Province, Region
The Vatican City

Explanation of Image: Area Rome (1748, Nuova Topografia di Roma di Giovanni Battista Nolli)
  • The Community boundry (yellow) of Roma (id_city=58091)
  • The Province boundry (blue) of Roma (id_province=58)
  • The Region boundry (red) of Roma (id_region=12)

The Community of Rome as seen in Blob-Explorer:
Community boundry Roma

An Sql-Query to analyse the geometry and work out the information needed later for the INSERT command.
Notes:
  the numbers seen in the Blob-Explorer can be used for the ST_GeometryN and ST_InteriorRingN commands.
  each POLYGON is shown as 'exterior ring' n the Blob-Explorer

SELECT
 -- Select the Community of Rome and view in Blob-Explorer
 geom_city,
 -- How many Polygons does the Community of Rome have ?
 ST_NumGeometries(geom_city) AS num_geom_city,
 -- Select the first Polygon [1 based]
 ST_GeometryN(geom_city,1) AS first_geom_city,
 -- How many Interior-Rings does the first Polygon have ?
 ST_NumInteriorRing(ST_GeometryN(geom_city,1)) AS num_InteriorRing,
 -- Select the first Interior-Ring of the first Polygon [1 based]
 ST_InteriorRingN(ST_GeometryN(geom_city,1),1) AS linestring_Vatican_City,
 -- Convert the Interior-Ring (closed LINESTRING) to a Polygon
 ST_MakePolygon(ST_InteriorRingN(ST_GeometryN(geom_city,1),1)) AS polygon_Vatican_City,
 -- Convert the Polygon to am MULTIPOLYGON for the admin_regions TABLE
 CastToMulti(ST_MakePolygon(ST_InteriorRingN(ST_GeometryN(geom_city,1),1))) AS multi_polygon_Vatican_City,
 -- Select the third Polygon [1 based]
 ST_GeometryN(geom_city,3) AS third_geom_city,
 -- Area of the third Polygon [142.236898 meters]
 St_Area(ST_GeometryN(geom_city,3)) AS third_geom_city_area,
 -- Select the fourth Polygon [1 based]
 ST_GeometryN(geom_city,4) AS fourth_geom_city,
 -- Area of the fourth Polygon [81.987090 meters]
 St_Area(ST_GeometryN(geom_city,4)) AS fourth_geom_city_area,
 -- Select the fifth Polygon [1 based]
 ST_GeometryN(geom_city,5) AS fifth_geom_city,
 -- Area of the fifth Polygon [34.695557 meters]
 St_Area(ST_GeometryN(geom_city,5)) AS fifth_geom_city_area,
 -- Create a corrected version of the Community of Rome geometry
 ST_Collect
 (
  -- Select the first Polygon [1 based]
  ST_GeometryN(geom_city,1),
  -- Select the second Polygon [1 based]
  ST_GeometryN(geom_city,2)
 ) AS corrected_geom_city
FROM admin_cities
WHERE
(
 (id_city=58091)
);

Conclusions:
  • multi_polygon_Vatican_City shows the Boundry of the Vatican City as we need it
  • POLYGONs: 3,4,5 are candidates for the catagory 'goofups'
  • corrected_geom_city could be used to correct the faulty Community of Rome geometry

INSERT Sql: the Vatican City
INSERT INTO admin_regions
(id_region, name_region, population_region, area_region, city_count, province_count, id_country, name_country, geom_region)
 SELECT
  -- County calling (telephon) code
  387 AS id_region,
  'Stato della Città del Vaticano (Status Civitatis Vaticanae)' AS name_region,
  -- Population of Region, as whole persons [wiki data]
  829 AS population_region,
  -- Area of Region, in meters [528986.602061 ; wiki: 301340]
  ST_Area(ST_MakePolygon(ST_InteriorRingN(ST_GeometryN(geom_city,1),1))) AS area_region,
  -- Amount of Cities belonging to the Region,
  1 AS city_count,
  -- Amount of Provinces belonging to the Region,
  0 AS province_count,
  -- override default values for id_country and name_country
   387 AS id_country,
  'Vatican City State' AS name_country,
  -- Extract from Community of Rome geometry and stored as MULTIPOLYGON
  CastToMulti(ST_MakePolygon(ST_InteriorRingN(ST_GeometryN(geom_city,1),1))) AS geom_region
 FROM admin_cities
 WHERE
 (
  (id_city=58091)
 );
Then you can display the admin_regions, with filter id_region=387, in QGIS.
Vatican City State - Regon

WMS San Marino with Boundries City, Province, Region
San Marino
Explanation of Image: Area San Marino
  • The Community boundries (yellow) id_city IN (41033,41035,41060,99003,99010,99014,99020,99025)
  • The Province boundries (blue) id_province IN (41,99)
  • The Region boundries (red) id_region IN (8,11)
  • The Country boundries (green stars) id_country IN (39)

A Sql-Query to analyse the geometrys and work out the information needed later for the INSERT command.
Notes:
  San Marino is not an enclave of any single Community, Province or Region.
    (it is an enclave of Italy, but not the only one)
  So a 'enclave' will be created by combining the neighboring Community geometries.
    and then extract the 'enclave' as done with the Vatican City.

Communities:
  1 POLYGON with 1 Interior-Ring
Provinces:
  3 POLYGONs, 1 with 3 Interior-Rings
San Marino from Cities San Marino from Provinces
The Communities Geometry offers itself as the simplest solution
SELECT
 -- Communities: 1 Polygons, 1st with 1 Interior-Ring
 (SELECT ST_Union(geom_city) FROM admin_cities WHERE (id_city IN (41033,41035,41060,99003,99010,99014,99020,99025))) AS community_San_Marino,
 -- Provinces: 3 Polygons, 1st with 3 Interior-Ring
 (SELECT ST_Union(geom_province) FROM admin_provinces WHERE (id_province IN (41,99))) AS province_San_Marino,
 -- Regions: 4 Polygons, 2nd with 3 Interior-Ring (not shown as image)
 (SELECT ST_Union(geom_region) FROM admin_regions WHERE (id_region IN (8,11))) AS region_San_Marino,
 -- Select the first Interior-Ring of the first Polygon [1 based]
 (SELECT ST_InteriorRingN(ST_GeometryN(ST_Union(geom_city),1),1) FROM admin_cities WHERE (id_city IN (41033,41035,41060,99003,99010,99014,99020,99025))) AS linestring_San_Marino,
 -- Convert the Interior-Ring (closed LINESTRING) to a Polygon
 (SELECT ST_MakePolygon(ST_InteriorRingN(ST_GeometryN(ST_Union(geom_city),1),1)) FROM admin_cities WHERE (id_city IN (41033,41035,41060,99003,99010,99014,99020,99025))) AS polygon_San_Marino,
 -- Convert the Polygon to a MULTIPOLYGON for the admin_regions TABLE
 (
  -- -start- of sub-query
  SELECT
   -- create a MULTIPOLYGON from a POLYGON
   CastToMulti
   (
    -- create a POLYGON from a closed LINESTRING, non-aggregate-geometry input
    ST_MakePolygon
    (
     -- Interior-Ring (closed LINESTRING), non-aggregate-geometry input
     ST_InteriorRingN
     (
      -- which Interior-Ring number, 1-based
      ST_GeometryN
      (
       -- non-aggregate-geometry input
       ST_Union
       (
        -- aggregate-geometry input
        geom_city
       ),
       -- which geometry from collection, 1-based
       1
      ),
     -- which Interior-Ring number, 1-based
     1
    )
   )
  )
  FROM admin_cities
  WHERE
  (
   -- condition for aggregate-geometry input
   (id_city IN (41033,41035,41060,99003,99010,99014,99020,99025))
  )
  -- -end- of sub-query
 ) AS multi_polygon_San_Marino;

Conclusions:
  • using community_San_Marino is simpler to use than the Province and Regions versions
  • the POLYGON extraction is almost the same as the used for the Vatican City.
     The aggregate function ST_Union is used to create the geometry that the non-aggregate function ST_GeometryN needs.
  • more details about this Sql-Query can be found in the Sql-HowTo
Here, again, you see that the combination of different Spatial-Functions gives you different results.
INSERT Sql: San Marino
INSERT INTO admin_regions
(id_region, name_region, population_region, area_region, city_count, province_count, id_country, name_country, geom_region)
 SELECT
  -- County calling (telephon) code
  378 AS id_region,
  'Repubblica di San Marino (Serenissima Repubblica di San Marino)' AS name_region,
  -- Population of Region, as whole persons [wiki data]
  33537 AS population_region,
  -- Area of Region, in meters [61072552.870600 ; wiki: 612000]
  ST_Area(ST_MakePolygon(ST_InteriorRingN(ST_GeometryN(ST_Union(geom_city),1),1))) AS area_region,
  -- Amount of Cities belonging to the Region,
  9 AS city_count,
  -- Amount of Provinces belonging to the Region,
  0 AS province_count,
  -- override default values for id_country and name_country
  378 AS id_country,
  'Republic of San Marino' AS name_country,
  -- Extract from Community of Rome geometry and stored as MULTIPOLYGON
  CastToMulti(ST_MakePolygon(ST_InteriorRingN(ST_GeometryN(ST_Union(geom_city),1),1))) AS geom_region
 FROM admin_cities
 WHERE
 (
  (id_city IN (41033,41035,41060,99003,99010,99014,99020,99025))
 );

Then you can display the admin_regions, with filter id_region=378, in QGIS.
And now we present, (... da,da,da,daaa ...)
  the oldest, serviving, constitutional republic and the the fifth smallest country in the world:
Republic of San Marino - Regon
Creation of the TABLE admin_countries
As a final step you can now create the admin_countries table representing the whole Italian Republic international boundaries.
-- this can take some time [ST_Area, ST_Union etc.], start TRANSACTION
BEGIN;
-- not being the first version of our masterpiece, remove previous version
DROP TABLE IF EXISTS admin_countries;
CREATE TABLE admin_countries
(
 -- based on County calling (telephon) code
 id_country INTEGER NOT NULL PRIMARY KEY,
 name_country TEXT NOT NULL,
 -- Population of Region, as whole persons
 population_country INTEGER DEFAULT 0,
 -- Area of Country, in meters
 area_country DOUBLE DEFAULT 0,
 -- Amount of Cities belonging to the Country,
 city_count INTEGER DEFAULT 0,
 -- Amount of Provinces belonging to the Country,
 province_count INTEGER DEFAULT 0,
 -- Amount of Regions belonging to the Country,
 region_count INTEGER DEFAULT 0
);

SELECT
 AddGeometryColumn('admin_countries', 'geom_country', 32632, 'MULTIPOLYGON', 'XY');
SELECT
 CreateSpatialIndex('admin_countries','geom_country');

-- This geometry is only needed for tutorial purposes, not needed for a real life scenario
SELECT
 AddGeometryColumn('admin_countries', 'bbox_geom_country', 32632, 'MULTIPOLYGON', 'XY');
SELECT
 CreateSpatialIndex('admin_countries','bbox_geom_country');

INSERT INTO admin_countries
(id_country, name_country, population_country, area_country, city_count, province_count, region_count, geom_country)
SELECT
 id_country,
 name_country,
 -- Population of Country, as whole persons
 sum(population_region) AS population_country,
 -- Area of Country, in meters
 ST_Area(ST_Union(geom_region)) AS area_country,
 -- Amount of Cities belonging to the Country,
 sum(city_count) AS city_count,
 -- Amount of Provinces belonging to the Country
 sum(province_count) AS province_count,
 -- Amount of Provinces belonging to the Country
 count(id_region) AS region_count,
 -- Build outer Boundry of the Country, based on Region-Geometries and stored as MULTIPOLYGON
 CastToMultiPolygon(ST_Union(geom_region)) AS geom_country
FROM admin_regions
-- for each UNIQUE id_country in admin_regions, create and INSERT into admin_countries.
GROUP BY id_country;
After the admin_countries has been filled, the not so stricly needed, bbox_geom_country can be created.
Preconditions for this to work correctly is:
  • that the CreateSpatialIndex for geom_country command was executed
-- Such a geometry is only needed for tutorial purposes.
UPDATE admin_countries SET bbox_geom_country =
(
 SELECT
  CastToMultiPolygon
  (
   -- shows Bounding Box as Exterior-Ring, geometry as Interior-Ring
   ST_Difference
   (
    -- SpatialIndex of geometry of the boundries of the country
    BuildMbr
    (
     idx_SpatialIndex.xmin,
     idx_SpatialIndex.ymax,
     idx_SpatialIndex.xmax,
     idx_SpatialIndex.ymin,
     32632
    ),
    -- geometry of the boundries of the country
    geom_country
   )
  )
 FROM idx_admin_countries_geom_country AS idx_SpatialIndex
 WHERE
 (
  (idx_SpatialIndex.pkid=admin_countries.id_country)
 )
);
more details about this Sql-Query can be found in the Sql-HowTo
-- end TRANSACTION
COMMIT;

Then you can display the admin_countries     Shown here with the BoundingBox of the SpatialIndex around Italy, against Map of Ancient Italy around 49 BC.
Italy - Countries

recipe #20: Spatial Views

SpatiaLite supports Spatial Views: any properly defined Spatial View can then be used as any other map layer, i.e. can be displayed using QGIS .

The purpose of the created Geometry is to show:
  • how to create extra regions/countries from other existing components, otherwise not supported by the original data-source
  • this is a one-time solution, based on collected information (i.e not a general solution to a specific problem)
Please note: any SQLite VIEW can only be accessed in read-mode (SELECT);
and obviously such limitation applies to any Spatial View as well (no INSERT, DELETE or UPDATE are supported).
spatialite_gui supports a query composer tool; in this first example we'll this is what we will use.
Using the query composer tool
Step 1: selecting the required tables and columns, and defining the corresponding JOIN condition.
In this first example we'll JOIN the communities and the Provinces tables.
query composer #1

Step 2: now we'll set an appropriate filter clause;
in this case only communities and Provinces belonging to Tuscany Region (region_id = 9) will be extracted.
query composer #2

Step 3: and finally we'll set an appropriate VIEW name: during this latest phase we'll select the Geometry column corresponding to this VIEW
query composer #3

We are now able to display this Spatial View using QGIS (an appropriate thematic rendering was applied so to evidentiate Provinces).
query composer #4

Hand-writing your own Spatial VIEW
CREATE VIEW italy AS
 SELECT
 c.ROWID AS ROWID,
 c.community_id AS community_id,
 c.community_name AS community_name,
 c.population AS population,
 c.geometry AS geometry,
 p.province_id AS province_id,
 p.province_name AS province_name,
 p.car_plate_code AS car_plate_code,
 r.region_id AS region_id,
 r.region_name AS region_name
FROM communities AS c
JOIN provinces AS p ON (c.province_id = p.province_id)
JOIN regions AS r ON (p.region_id = r.region_id);

You are not obligatorily compelled to use the query composer tool.
You are absolutely free to define any arbitrary VIEW to be used as a Spatial View.
INSERT INTO views_geometry_columns
  (view_name, view_geometry, view_rowid, f_table_name, f_geometry_column)
 VALUES ('italy', 'geometry', 'ROWID', 'communities', 'geometry');

Anyway you must register this VIEW into the views_geometry_columns, so to make it become a real Spatial View.
SELECT
 * FROM views_geometry_columns;
view_name view_geometry view_rowid f_table_name f_geometry_column
tuscany geometry ROWID communities geometry
italy geometry ROWID communities geometry

Just a simple check ...
query composer #5
And finally we can display this Spatial View using QGIS (an appropriate thematic rendering was applied so to evidentiate Regions).

recipe #21: Spatial Views - Writable

SpatiaLite supports Spatial Views: any properly defined Spatial View can then be used as any other map layer, i.e. can be displayed using QGIS .

The purpose of the created Geometry is to show:
  • how to create extra regions/countries from other existing components, otherwise not supported by the original data-source
  • this is a one-time solution, based on collected information (i.e not a general solution to a specific problem)
Please note: any SQLite VIEW can only be accessed in read-mode (SELECT);
and obviously such limitation applies to any Spatial View as well (no INSERT, DELETE or UPDATE are supported).


last updated: 2018-10-12