View Ticket
Not logged in
Ticket Hash: 000b0fb3fda6d0df6e7319414f069914ade0df7f
Title: Discrepancy in KNN function
Status: Fixed Type: Incident
Severity: Important Priority: Immediate
Subsystem: Resolution: Fixed
Last Modified: 2016-12-10 13:36:08
Version Found In: 4.4.0-dev
User Comments:
mj10777 added on 2016-09-16 09:20:44:

This problem was first reported in June, with a database supplied later.
https://groups.google.com/forum/#!topic/spatialite-users/gMwlHjl4rnI


After receiving a copy of the database which used the table
    Ashland_att with 325359 records
One query returns correct results, another no results
SELECT 
 10000 AS slink,
 aa.slink AS adj_slink,
 k.distance,
 AsEWKT(aa.shape)
FROM 
 ashland_att aa JOIN knn k
 ON k.fid=aa.rowid
WHERE 
 k.f_table_name='ashland_att' AND
 k.ref_geometry=(SELECT shape FROM ashland_att WHERE slink=10000) AND
 k.max_items=20 AND
 aa.slink<>10000 
GROUP BY k.pos
HAVING k.distance=0
;

-- brings no results ( expected results (8))
SELECT
 aa.ROWID AS id_row,
 277838 AS slink,
 aa.slink AS adj_slink,
 k.distance,
 AsEWKT(aa.shape)
FROM ashland_att aa JOIN knn k
 ON k.fid=aa.rowid
WHERE 
 k.f_table_name='ashland_att' AND
 k.ref_geometry=(SELECT shape FROM ashland_att WHERE slink=277838) AND
 k.max_items=20 AND
 aa.slink<>277838 
GROUP BY k.pos
HAVING k.distance=0
;
the following query will show the expected results of slink=277838
Note: in this case the value of rowid=the PK objectid = slink
SELECT
 a.ROWID AS id_row,
 a.slink AS adj_slink,
 ST_DISTANCE(a.shape,b.shape),
 AsEWKT(a.shape) 
FROM ashland_att a,ashland_att b
WHERE 
 ST_TOUCHES(a.shape,b.shape) AND
b.slink=277838
;
The following query shows that the index contains the needed records
rowid is taken from the previous result.
SELECT * FROM "idx_ashland_att_shape" 
WHERE 
 rowid IN (277838,277457,277458, 277459,277837, 277839,278217, 278218, 278219)

The following debugging coded was added to:
    vknn_query_callback
    vknn_init_context
-- vknn_init_context
printf("-I-> vknn_init_context: SELECT ST_Distance(0, BuildMbr(%2.5f, %2.5f, %2.5f, %2.5f));\n",ctx->minx,ctx->miny,ctx->maxx,ctx->maxy);

--vknn_query_callback
    switch (info->iRowid)
    {
     // 277838 records are not being found
     case 277838:
      printf("-WW-> vknn_query_callback query_no_results[%ld]: SELECT ST_Distance(0, BuildMbr(%2.5f, %2.5f, %2.5f, %2.5f));\n",(long)info->iRowid,ctx->minx,ctx->miny,ctx->maxx,ctx->maxy);
     break;
     // not being found
     case 277457:
     case 277458:
     case 277459:
     case 277837:
     case 277839:
     case 278217:
     case 278218:
     case 278219:
      printf("-W-> vknn_query_callback [%ld]: SELECT ST_Distance(0, BuildMbr(%2.5f, %2.5f, %2.5f, %2.5f));\n",(long)info->iRowid,ctx->minx,ctx->miny,ctx->maxx,ctx->maxy);
     break;
     // 10000 records are being found
     case 10000:
      printf("-II-> vknn_query_callback query_with_results[%ld]: SELECT ST_Distance(0, BuildMbr(%2.5f, %2.5f, %2.5f, %2.5f));\n",(long)info->iRowid,ctx->minx,ctx->miny,ctx->maxx,ctx->maxy);
     break;
     case 9564:
     case 9565:
     case 9566:
     case 9999:
     case 10001:
     case 10443:
     case 10444:
     case 10445:
      printf("-I-> vknn_query_callback[%ld]: SELECT ST_Distance(0, BuildMbr(%2.5f, %2.5f, %2.5f, %2.5f));\n",(long)info->iRowid,ctx->minx,ctx->miny,ctx->maxx,ctx->maxy);
     break;
    }
In the query that works correctly, the following can be seen:
-I-> vknn_init_context: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[9564]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[9565]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[9566]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[9999]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-II-> vknn_check_mbr query_with_results[10000]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[10001]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[10443]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[10443]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[10444]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
-I-> vknn_query_callback[10445]: SELECT ST_Distance(0, BuildMbr(-783090.00000, 2507400.00000, -783000.00000, 2507490.00000));
In the query that brings no results, nothing can be seen from inside vknn_query_callback
-I-> vknn_init_context: SELECT ST_Distance(0, BuildMbr(-792990.00000, 2559420.00000, -792900.00000, 2559510.00000));
    it is running, but not with the expected records
Conclusion:
    vknn_query_callback is not receiving the expected records

When creating a new copy of the table with call of the records:
    same result
When creating a sub-set
    with only the expected results of both queries
    CREATE TABLE Ashland_att_test 
    (
     objectid integer primary key autoincrement, 
     original_objectid long,
     grid_objectid long,
     slink long,
     row_num long,
     col_num long,
     stand_id text,
     area double,
     ht_grp text,
     delim_open text,num_lifeform text,
     species text,size_class text,density text,
     process text,process_t text,treatment text,treat_t text,tracking_species text,delim_close text,
     ownership text,
     road text,fire text,fmz text,sp_area text,landtype text,
     point_x double,point_y double,elev long
    );
    SELECT AddGeometryColumn('Ashland_att_test','shape',5070,'POLYGON','XY');
    SELECT CreateSpatialIndex('Ashland_att_test','shape');
    INSERT INTO Ashland_att_test 
     (original_objectid, grid_objectid, slink, row_num, col_num, stand_id, area, ht_grp, delim_open, num_lifeform, species, size_class, density, process, process_t, treatment, treat_t, tracking_species, delim_close, ownership, road, fire, fmz, sp_area, landtype, point_x, point_y, elev, shape)
    SELECT
     original_objectid, grid_objectid, slink, row_num, col_num, stand_id, area, ht_grp, delim_open, num_lifeform, species, size_class, density, process, process_t, treatment, treat_t, tracking_species, delim_close, ownership, road, fire, fmz, sp_area, landtype, point_x, point_y, elev, shape
    FROM ashland_att
    WHERE rowid IN (10000, 277838,277457,277458, 277459,277837, 277839,278217, 278218, 278219, 9564, 9565, 9566, 9999,10001, 10443, 10444, 10445);
    SELECT UpdateLayerStatistics('Ashland_att_copy','shape');
    
both query return the expected results
Note: in this case the value of rowid=the PK objectid
      slink retains it's original value

I could find no fault with the database.
Under certain conditions, the queries work correctly

Conclusion:
Something incorrectly happens before
    vknn_query_callback
is called with a large table as input.

To quote the King of Siam:
    It is a puzzlement ...


sandro added on 2016-12-10 13:36:08:
the previous implementation of KNN was plagued by several unverified assumptions about the general layout of an SQLite's R*Tree.
the net effect was that in some cases many potentially interesting branches of the Tree were badly ignored and not properly expanded as it ought be, thus causing many valid features to become completely invisible.

now the KNN module has been strongly refactored and is supposed to be more robust and accurate than before.