Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Artifact ID: | bce1f7e1d78f60c7dbbd90bf12162af7009d3ab6 |
---|---|
Page Name: | KNN |
Date: | 2015-12-10 16:39:08 |
Original User: | sandro |
Parent: | 3892c7d2428ebcc57b5e315d7343065389b82e55 (diff) |
Next | 806fd4da03659c33c3e154fa0d17cf6cf6a9056f |
Content
A first naive (and very inefficient) approachSELECT geonameid, name, country, Min(ST_Distance(MakePoint(10, 43), geom, 1)) / 1000.0 AS dist_km FROM airports; -------------------------------- 6299623 | Marina Di Campo | IT | 33.043320 SELECT geonameid, name, country, Min(ST_Distance(MakePoint(10, 43), geom, 1)) / 1000.0 AS dist_km FROM airports WHERE geonameid NOT IN (6299623); --------------------------------- 6299392 | Bastia-Poretta | FR | 65.226573 SELECT geonameid, name, country, Min(ST_Distance(MakePoint(10, 43), geom, 1)) / 1000.0 AS dist_km FROM airports WHERE geonameid NOT IN (6299623, 6299392); --------------------------------- 6299628 | Pisa / S. Giusto | IT | 82.387014 SELECT geonameid, name, country, Min(ST_Distance(MakePoint(10, 43), geom, 1)) / 1000.0 AS dist_km FROM airports WHERE geonameid NOT IN (6299623, 6299392, 6299628); --------------------------------- 6299630 | Grosseto Airport | IT | 91.549773 SELECT geonameid, name, country, Min(ST_Distance(MakePoint(10, 43), geom, 1)) / 1000.0 AS dist_km FROM airports WHERE geonameid NOT IN (6299623, 6299392, 6299628, 6299630); --------------------------------- 6694495 | Corte | FR | 102.819778We could effectively execute a first SQL query using the Min() aggregate function in order to identify the closest airport to an arbitrary position (latitude=43.0, longitude=10.0) located in the middle of the Tyrrhenian Sea. Then we could eventually repeatably execute the identical query, each time excluding all airports we've already identified in the previous processed steps. The final result will be the full list of the first five airports closest to the reference location ordered by increasing distance.
Such a simple approach would be highly impractical, because it strictly requires to hand-write several ad hoc SQL queries one by one. Writing a fully automated SQL script isn't at all a simple task, and some kind of higher level scripting (e.g. Python) would be easily required in any realistic scenario. The automation of all required SQL queries isn't the most critical issue we have to face; the real, critical, bottle neck with this simple approach, is that each of these queries will cause an awful full table scan This will be the cause of an intolerably sluggish performance, and the overall performance will quickly become worse as the dataset progressively increases in size. Conclusion: Some simpler and smarter approach is surely required: possibly one taking full profit from an R*TRee Spatial Index supporting the Geometries to be searched. VirtualKNNStarting since version 4.4.0 SpatiaLite supports a VirtualKNN Virtual Table specifically intended as a complete and highly efficient solution to the KNN problem.CREATE VIRTUAL TABLE knn USING VirtualKNN();Every new db-file being created by 4.4.0 itself will always automatically define a KNN virtual table. You could eventually add full KNN support to any db-file created by any earlier version of SpatiaLite by manually executing the above SQL statement. Note: VirtualKNN necessarily requires 4.4.0 binary support, so any attempt to open a db-file including a VirtualKNN table by using some previous version (<= 4.3.0a) will surely raise an error condition. A first basically simple KNN querySELECT * FROM knn WHERE f_table_name = 'airports' AND ref_geometry = MakePoint(10, 43);
A second example of a more sophisticated KNN querySELECT a.pos, b.name, b.country, a.distance / 1000.0 AS dist_km FROM knn AS a JOIN airports AS b ON (b.geonameid = a.fid) WHERE f_table_name = 'airports' AND ref_geometry = MakePoint(10, 43) AND max_items = 5;
This second KNN query will return the same identical results we already got before by using the silly naive approach. The striking difference is that this KNN query successfully completed in just a fraction of a second, whilst executing the first inefficient queries required many seconds (and please consider that this one is a relatively small dataset). Lesson to learn: KNN queries are really efficient and fast, because they directly interact with the lowermost levels of the R*Tree Spatial Index implementation. A third example of KNN querySELECT a.pos AS rank, b.geonameid, b.name, b.country, a.distance / 1000.0 AS dist_km FROM knn AS a JOIN airports AS b ON (b.geonameid = a.fid) WHERE f_table_name = 'airports' AND ref_geometry = MakePoint(-17.3, -44) AND max_items = 5;
This last KNN query is very similar to the previous one. This time we've placed the origin location somewhere into the South Atlantic blue deep waters (more or less midway from both Brazil and South Africa) and consequently the nearest airports are not really so near, because they are located many thousands Km away. As you can easily check by yourself a KNN query will brilliantly perform even under such unusual conditions. ![]() Lesson to learn: KNN queries never assume any predefined distribution of the searched items, and can automatically adapt in a very efficient way to the most irregular sample distributions. Advanced tutorialYou can eventually download a sample db-file exactly corresponding to the practical examples we'll use in this advanced tutorial.This is a much more demanding dataset corresponding to a collection of about 2.4 million house numbers. The original input shapefile was downloaded from Tuscany Region (Grafo Stradale civici.shp) and was subsequently rearranged into a more convenient form. The dataset is released under the CC-BY-SA 4.0 license terms. The reference system is 3003 Monte Mario / Italy zone 1; this is a planar (projected) reference system measured in metres. SELECT a.pos AS rank, a.fid AS fid, a.distance AS dist_m, d.prov AS province, d.nome AS municipality, c.toponimo AS street_name, b.civico AS house_num, b.geom AS geom FROM knn AS a JOIN civici AS b ON (b.rowid = a.fid) JOIN toponimi AS c ON (c.fid = b.id_toponimo) JOIN comuni AS d ON (d.fid = c.id_comune) WHERE a.f_table_name = 'civici' AND a.ref_geometry = MakePoint(1733003, 4816332, 3003) AND a.max_items = 10; SELECT a.pos AS rank, a.fid AS fid, a.distance AS dist_m, b.prov AS province, b.comune AS municipality, b.toponimo AS street_name, b.civico AS house_num, b.geom AS geom FROM knn AS a JOIN vw_civici AS b ON (b.rowid = a.fid) WHERE a.f_table_name = 'vw_civici' AND a.ref_geometry = MakePoint(1733003, 4816332, 3003) AND a.max_items = 10;The following map corresponds to the above queries: the origin point is positioned on a road junction in the central area of a densely populated town (Arezzo). Not surprisingly the ten nearest house numbers have been located within a radius of about 10m. ![]() Both queries will return the same identical results (you can easily check by yourself). Anyway if you pay close attention you'll easily discover that the second query is completely based on a Spatial View Lesson to learn: a VirtualKNN query can indifferently target either a GeoTable or a properly registered Spatial View. SELECT a.pos AS rank, a.fid AS fid, a.distance AS dist_m, b.prov AS province, b.comune AS municipality, b.toponimo AS street_name, b.civico AS house_num, b.geom AS geom FROM knn AS a JOIN vw_civici AS b ON (b.rowid = a.fid)WHERE a.f_table_name = 'civici' AND a.ref_geometry = MakePoint(1595625, 4767420, 3003) AND max_items = 20;The following map corresponds to the above query. Now the origin point is positioned in the Tyrrhenian Sea; the Tuscany mainland, the Island of Elba and the Island of Capraia are as far as about 26 Km. Anyway in this case too VirtualKNN confirms to be able to identify the nearest house numbers in just a fraction of a second. ![]() Lesson to learn: VirtualKNN is always highly efficient and very fast. Even when querying an huge dataset, and when exploring the most contrasting data distributions (very high density / very low density).
|