XML Tools Tutorial: how to process a Simple GML datasetBack to main page
Downloading the sample datasetIn this first example we'll use a simple GML dataset being published via WFS by Regione Sardegna (Italy) under the IODL v.2 licence terms (equivalent to CC-BY-SA).
- point your web browser at this request URL:
- then (when the download completes) save the GML document on your local files-system as poi-sardegna.gml
transforming the XML document into a SQLite/Spatialie DB-file (#1)
$ spatialite_xml_load -x poi-sardegna.gml -d poi0.sqlite SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Target DB: poi0.sqlite Input XML: poi-sardegna.gml Done - inserted/updated rows: 62751 [0:00:01.856] $In this first attempt we'll use no special argument at all; we'll simply rely on standard basic defaults:
- the -x argument specifies the path of the XML/GML file to be imported (poi-sargegna.gml).
- the -d argument specifies the path of the destination SQLite/SpatiaLite DB-file (poi1.sqlite).
- if the DB-file does not exist it will be created and properly initialized.
- if the DB-file already exists an attempt will be performed in order to append all new XML/GML nodes to the ones already contained within the DB-file.
Note well: this will work only if all the XML files being subsequently imported into the same DB-file strictly adopt the same identical schema specification.
transforming the XML document into a SQLite/Spatialie DB-file (#2)
$ spatialite_xml_load -x poi-sardegna.gml -d poi1.sqlite -nl 2 -pl 1 SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Target DB: poi1.sqlite Input XML: poi-sardegna.gml Done - inserted/updated rows: 62751 [0:00:01.840] $In this second attempt we'll apply two further arguments in order to limit the length of table names:
- the -nl argument specifies the XML tree base-level; in this case we've specified 2 simply because this is the level corresponding to the <dbu:POIGENERICI> XML tag (the root node always corresponds to level 0). We can safely ignore any ancestor tag before this tree level because they'll surely be not relevant in causing table names collisions.
- the -pl argument specifies how many ancestors are required so to avoid any possible collision in table names; in this first we'll attempt to request just a single ancestor.
Now the import tool will automatically compose all table names using no more than three items, i.e. as: BaseLevelTag_ParentTag_ChildTag
transforming the XML document into a SQLite/Spatialie DB-file (#3)
$ spatialite_xml_load -x poi-sardegna.gml -d poi2.sqlite -nl 2 -pl 2 SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Target DB: poi2.sqlite Input XML: poi-sardegna.gml Done - inserted/updated rows: 62751 [0:00:01.778] $This third attempt is exactly the previous one: except in that this time we've specified -pl 2:
- Now the import tool will automatically compose all table names using no more than four items, i.e. as: BaseLevelTag_GrandParentTag_ParentTag_ChildTag
transforming the XML document into a SQLite/Spatialie DB-file (#4)
$ spatialite_xml_load -x poi-sardegna.gml -d poicg.sqlite -nl 2 - pl 1 -cg SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Target DB: poicg.sqlite Input XML: poi-sardegna.gml Done - inserted/updated rows: 58260 [0:00:01.880] $This final attempt is exactly the same as #3; but this time we've added a further -cg argument, thus requesting to immediately collapse any GML geometry item.
- when no special precaution is applied (poi0.sqlite) you'll discover that some tables could have very long names, as e.g.
Please note: this GML sample is basically simple; things may easily go much more bad that this in many other real-world cases.
- in this specific case by applying both -pl 2 and -nl 1 we'll get significantly shorter table names, as e.g.
- by applying both -pl 2 and -nl 2 we'll get slightly longer table names, as e.g.
- Please note well: the required minimal number of ancestors (-pl) strictly depends on the intrinsic structure of the GML document being parsed:
- Requesting to use too few ancestors will produce table names collisions, and will cause the import operation to fail.
- Requesting to use too much ancestors will produce unnecessarily long table names.
- As a rule of the thumb, you should always start by just requesting -nl 1; and in case of any failure you'll then increase this value until final success.
- Activating the -cg option is always warmly recommended while importing any GML file, because it will greatly reduce the overall complexity of the target DB-file, most notably when there are many complex Linestrings or Polygons.
direct checkNow start a spatialite CLI or spatialite_gui session, and familiarize yourself with all DB-files created by spatialite_xml_load:
- check both xml_metacatalog_tables and xml_metacatalog_columns, and discover how the XML tree was mapped into corresponding relationally joined DBMS tables.
- test poi1.sqlite and check how raw GML geometries were loaded into the DBMS; explore the following tables:
- now test poicg.sqlite and check how the the same GML geometries were loaded into the DBMS in the collapsed form:
- explore the poigenerici_punto table.
- now the node_value will contain a collapsed representation of the whole GML geometry.
- a further from_gml_geometry will now contain a Geometry BLOB.
Please note: such geometries are intentionally not properly registered in geometry_columns simply because the GML standard potentially allows to use different SRID values for each single geometry feature.
Final post-processing: collapsing the whole DBMS
$ spatialite_xml_collapse -d poicg.sqlite -dd -nl 2 SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Collapsing <gml:Envelope><gml:lowerCorner> Collapsing <gml:Envelope><gml:upperCorner> Collapsing <gml:Envelope><gml:lowerCorner> Collapsing <gml:Envelope><gml:upperCorner> Collapsing <dbu:POIGENERICI><dbu:IDFEATURE> Collapsing <dbu:POIGENERICI><dbu:NOME> Collapsing <dbu:POIGENERICI><dbu:CODICEISTACOMUNALE> Collapsing <dbu:POIGENERICI><dbu:COMUNE> Collapsing <dbu:POIGENERICI><dbu:MACROTIPOLOGIAPOI> Collapsing <dbu:POIGENERICI><dbu:TIPOLOGIAPOI> Collapsing <dbu:POIGENERICI><dbu:LINKSCHEDA> Collapsing <dbu:POIGENERICI><dbu:PUNTO> No duplicated rows found in: boundedby_envelope No duplicated rows found in: member_poigenerici Recovering Geometry: member_poigenerici.from_gml_geometry Creating Spatial Index: member_poigenerici.from_gml_geometry
- the spatialite_xml_collapse tool performs a final, irreversible post-processing on behalf of the whole DB-file.
- the -d argument specifies the path of the target SQLite/SpatiaLite DB-file (poigc.sqlite).
- the -dd arguments stands for delete duplicates: this is an useful option when you've subsequently imported more XML files into the same DB-file, thus possibly introducing duplicate entries.
- the -nl argument exactly has the same identical meaning as for spatialite_xml_load.
Please note: applying yet again the same identical -nl argument passed to spatialite_xml_load is warmly recommended.
how collapsing works
- collapsing is a recursive process.
- each leaf node will be evaluated; if the corresponding parent node has just this single child, then the child will be merged into the parent, and the now useless child will be definitely removed from the tree.
- when any parent node has multiple children nodes the collapsing process stops and no further propagates.
- Please note: collapsing is an irreversible activity because nodes and tables will be definitely removed from the tree.
- Final post-processing: once all possible nodes have been collapsed, than an attempt will be made in order to identify any possible layer: i.e. any table containing Geometries of the same type/SRID.> In this case the Geometry will be properly registered in geometry_columns, and a corresponding Spatial Index will be created.
direct checkStart a spatialite CLI or spatialite_gui session and connect to the collapsed DB-file:
- check xml_metacatalog_tables; you can easily verify by yourself how the whole collapsing process has been faithfully mapped.
- check the member_poigenerici table: this is a genuine layer; e.g. you could directly export it as a Shapefile.
Exporting yet again an XML document form the DBMS
$ spatialite_xml_print -d poi1.sqlite -x export.xml SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Input DB: poi1.sqlite $
- the spatialite_xml_print toll is intended to export the whole DB-file content as an XML document.
- the -d argument specifies the path of the input SQLite/SpatiaLite DB-file (poi1.sqlite).
- the -x argument specifies the path of the output XML document (export.xml).
- Please note: exporting yet again an XML document could eventually be an useful option when you've eventually applied by pure SQL some validation / editing / correction to the initial dataset.
how it works
- both xml_metacatalog_tables and xml_metacatalog_columns faithfully preserve all detailed informations describing the initial XML layout.
- the layout adopted for each table faithfully corresponds to the initial XML syntax.
- so it's not at all surprising to discover that exporting an XML document from the DBMS is a perfectly reversible operation.
- Please note well: a collapsed DB will be no longer able to adequately support spatialite_xml_print, simply because it has been massively rearranged thus irreversibly destroying any direct correspondence between XML and DBMS items.
back to main page