gml-topo
Not logged in

XML Tools Tutorial: how to process a Complex GML-Topology dataset

Back to main page

Downloading the sample dataset

In this example we'll use a rather small but anyway complete GML-Topology sample kindly contributed by Tuscany Region (many thanks to Dr. Maurizio Trevisani).

Loading into the DBMS

$ spatialite_xml_load -x GML_RT98030316-3-328-4.xml -d rt-sample.sqlite -cg -xl -pl 2 -nl 1
SQLite version: 3.8.5
SpatiaLite version: 4.2.0-rc2
Target DB: rt-sample.sqlite
Input XML: GML_RT98030316-3-328-4.xml
Done - inserted/updated rows: 589652 [0:00:26.070]
$
This first invocation of spatialite_xml_load will create and initialize a new DB-file, and will import into it the main Topology.
$ spatialite_xml_load -x GML_RT98030316-3-328-4_Curve_Livello.xml -d rt-sample.sqlite -cg -xl -pl 2 -nl 1
SQLite version: 3.8.5
SpatiaLite version: 4.2.0-rc2
Target DB: rt-sample.sqlite
Input XML: GML_RT98030316-3-328-4_Curve_Livello.xml
Done - inserted/updated rows: 14922 [0:00:04.586]

$ spatialite_xml_load -x GML_RT98030316-3-328-4_Punti_Quotati.xml -d rt-sample.sqlite -cg -xl -pl 2 -nl 1
SQLite version: 3.8.5
SpatiaLite version: 4.2.0-rc2
Target DB: rt-sample.sqlite
Input XML: GML_RT98030316-3-328-4_Punti_Quotati.xml
Done - inserted/updated rows: 68827 [0:00:04.914]
$
And finally you simply have to load the last two GML files into the same DB-file.

direct check

Start a spatialite CLI or spatialite_gui session and connect to the DB-file you've just created:
  1. As you can see, there are lots of tables; the XML tree supporting this GML schema is rather complex, but this simply is because this one is a real-world sample corresponding to an high-quality vector map.
    Recall: you can query both xml_metacatalog_tables and xml_metacatalog_columns in order to correctly reconstruct the whole tree hierarchy.
  2. we'll now focus our attention just on the more relevant topology items:
    • Topology Nodes:
      • the dbtopofeaturemembers_edge_directednode_node table does contain all Nodes; each individual Node is expected to declare its own unique identifier (gml_id column).
      • Topology Edges:
        • the dbtopofeaturemembers_topocurve_directededge_edge table does contain all gml_id values uniquely identifying each single Edge.
        • the dbtopofeaturemembers_directededge_edge_curveproperty (child of the above one) does contains the collapsed geometry corresponding to each Edge.
        • the dbtopofeaturemembers_directededge_edge_directednode table (sibling of the above one) does contain the start-Node and end-Node for each Edge.
        • Please note: sometimes an Edges could reference a Node via a direct parent-child relationship; but in many other cases this relation is expressed in an indirect way using gml_id and xlinl_href cross references.
        • Please note well: the coordinates for each Node are always expressed in a virtual way, i.e. they are assumed to coincide with the first (or respectively last) point of the Edge.
      • Topology Faces
        • the dbtopofeaturemembers_toposurface_directedface_face table does contain all gml_id values uniquely identifying each single Face.
        • the dbtopofeaturemembers_directedface_face_directededge table (child of the above one) does enumerates all Edges delimiting each Face; every Edge is always indirectly identified by its xlink_href value.
        • Please note: all Faces will never have an explicit Geometry. Accordingly to Topology principles such Geometry will always be indirectly determined by their delimiting Edges.


    few useful Topology validations via pure SQL processing

    extracting all Topology Nodes

    Recall: all Topology Nodes are simply represented as pure virtual entities uniquely identified by a gml_id value.
    So we'll be necessarily required to scan all Edges (and their corresponding Geometries) in order to extract the physical Nodes with an appropriate Geometry.
    CREATE TABLE aux_nodes AS
    SELECT e.gml_id AS EdgeId, n.gml_id AS NodeId, 'start' AS role, 
      ST_StartPoint(eg.from_gml_geometry) AS NodeGeom
    FROM dbtopofeaturemembers_topocurve_directededge_edge AS e
    JOIN dbtopofeaturemembers_directededge_edge_curveproperty 
      AS eg ON (e.node_id = eg.parent_id)
    JOIN dbtopofeaturemembers_directededge_edge_directednode 
      AS nr ON (e.node_id = nr.parent_id AND 
        nr.orientation = '-' AND nr.xlink_href IS NULL)
    JOIN dbtopofeaturemembers_edge_directednode_node
      AS n ON (nr.node_id = n.parent_id);
    
    This first SQL query will create and feed an auxiliary table containing all start Nodes associated to their corresponding Edges via direct parent-child relations.
    INSERT INTO aux_nodes
    SELECT e.gml_id AS EdgeId, n.gml_id AS NodeId, 'start' AS role, 
      ST_StartPoint(eg.from_gml_geometry) AS NodeGeom
    FROM dbtopofeaturemembers_topocurve_directededge_edge AS e
    JOIN dbtopofeaturemembers_directededge_edge_curveproperty 
      AS eg ON (e.node_id = eg.parent_id)
    JOIN dbtopofeaturemembers_directededge_edge_directednode 
      AS nr ON (e.node_id = nr.parent_id AND 
        nr.orientation = '-' AND nr.xlink_href IS NOT NULL)
    JOIN dbtopofeaturemembers_edge_directednode_node
      AS n ON (nr.xlink_href = n.gml_id);
    
    This second SQL query will append into the same auxiliary table all start Nodes associated to their corresponding Edges via indirect gml_id / xlink_href relations.
    INSERT INTO aux_nodes
    SELECT e.gml_id AS EdgeId, n.gml_id AS NodeId, 'end' AS role, 
      ST_EndPoint(eg.from_gml_geometry) AS NodeGeom
    FROM dbtopofeaturemembers_topocurve_directededge_edge AS e
    JOIN dbtopofeaturemembers_directededge_edge_curveproperty 
      AS eg ON (e.node_id = eg.parent_id)
    JOIN dbtopofeaturemembers_directededge_edge_directednode 
      AS nr ON (e.node_id = nr.parent_id AND 
        (nr.orientation <> '-' OR nr.orientation IS NULL) AND
         nr.xlink_href IS NULL)
    JOIN dbtopofeaturemembers_edge_directednode_node
      AS n ON (nr.node_id = n.parent_id);
    
    This third SQL query will continue in appending into the same auxiliary table all end Nodes associated to their corresponding Edges via direct parent-child relations.
    INSERT INTO aux_nodes
    SELECT e.gml_id AS EdgeId, n.gml_id AS NodeId, 'end' AS role, 
      ST_EndPoint(eg.from_gml_geometry) AS NodeGeom
    FROM dbtopofeaturemembers_topocurve_directededge_edge AS e
    JOIN dbtopofeaturemembers_directededge_edge_curveproperty 
      AS eg ON (e.node_id = eg.parent_id)
    JOIN dbtopofeaturemembers_directededge_edge_directednode 
      AS nr ON (e.node_id = nr.parent_id AND 
        (nr.orientation <> '-' OR nr.orientation IS NULL) AND 
         nr.xlink_href IS NOT NULL)
    JOIN dbtopofeaturemembers_edge_directednode_node
      AS n ON (nr.xlink_href = n.gml_id);
    
    And this last SQL query will finally append into the same auxiliary table all end Nodes associated to their corresponding Edges via indirect gml_id / xlink_href relations.
    All right, we've now extracted all start/end Nodes for all Edges; so we can now continue into our validation process.

    validating the Edges/Nodes

    SELECT e1.EdgeId, e1.NodeId AS StartNode, e1.NodeGeom AS StartNodeGeom, 
      e2.NodeId AS EndNode, e2.NodeGeom AS EndNodeGeom
    FROM aux_nodes AS e1
    LEFT JOIN aux_nodes AS e2 ON (e2.role = 'end' AND e1.EdgeId = e2.EdgeId)
    WHERE e1.role = 'start'
    ORDER BY e1.EdgeId;
    
    This first query is mainly intended for didactic purposes, and will return the full list of all Edges with the corresponding Start/EndNodes.
    SELECT n1.NodeId, n1.EdgeId, n2.EdgeId
    FROM aux_nodes AS n1, aux_nodes AS n2
    WHERE n1.NodeId = n2.NodeId AND n1.EdgeId <> n2.EdgeId AND ST_Equals(n1.NodeGeom, n2.NodeGeom) <> 1;
    
    This second query is certainly more interesting for validation purposes; it will return an empty resultset, so we can safely conclude that all Topology Nodes are in a fairly self-consistent state.
    SELECT a.Cardinality AS Cardinality, Count(*) AS Frequency
    FROM (
      SELECT NodeId, Count(*) AS Cardinality
      FROM aux_nodes
      GROUP BY NodeId) AS a
    GROUP BY a.Cardinality;
    -----------------
    2	3388
    3	5451
    4	600
    5	15
    
    This last query is interesting as well, because it reports a quick cardinality analysis:

    materializing virtual Face geometries

    Recall: all Topology Faces are simply described by the corresponding delimiting Edges; there is no explicitly set Geometry, it's always a purely virtual Geometry.
    SELECT f.gml_id AS FaceId, ST_Polygonize(e.from_gml_geometry) AS FaceGeom
    FROM dbtopofeaturemembers_toposurface_directedface_face AS f
    JOIN dbtopofeaturemembers_directedface_face_directededge AS er
      ON (f.node_id = er.parent_id)
    JOIN dbtopofeaturemembers_topocurve_directededge_edge AS de
      ON (de.gml_id = er.xlink_href)
    JOIN dbtopofeaturemembers_directededge_edge_curveproperty AS e
      ON (de.node_id = e.parent_id)
    GROUP BY f.gml_id;
    
    Anyway we could eventually materialize all Face Geometries by just executing a rather trivial SQL query like the above one.

    Quick conclusions

    SQL and Spatial SQL are powerful and very flexible processing tools; and SQL scripting is a smart technique allowing to nicely automatize many boring repetitive tasks.
    After loading an XML document into a DBMS you'll be absolutely free to deploy all the incredible firepower of SQL (and SQL scripts) at its best.
    This should not always be a plain and easy task: but it surely is a powerful instrumentation you could successfully deploy so to resolve many complex problems as e.g. thoroughly validating some not trivial and may be huge dataset.



    back to main page