metacatalog
Not logged in

Tutorial: how to build and update a MetaCatalog

back

Downloading the sample dataset

In this tutorial we'll use the standard geonames dataset; more precisely for practical reasons we'll use the shortened version you can download from here: cities1000.zip

Loading the sample dataset

The GeoNames dataset simply is a text file; so the best way to start the import process is the one to create a VirtualText table directly based on the external file.
Please use the following settings:

Creating the final destination table

Now you'll be ready to create the final destination table.
CREATE TABLE geonames (
    geonameid INTEGER NOT NULL PRIMARY KEY,
    name TEXT NOT NULL,
    asciiname TEXT,
    alternatenames TEXT,
    latitude DOUBLE NOT NULL,
    longitude DOUBLE NOT NULL,
    feature_class TEXT NOT NULL,
    feature_code TEXT NOT NULL,
    country_code TEXT NOT NULL,
    cc2 TEXT,
    admin1_code TEXT,
    admin2_code TEXT, 
    admin3_code TEXT,
    admin4_code TEXT,
    population INTEGER, 
    elevation INTEGER,
    dem TEXT,
    timezone TEXT,
    modification_date TEXT);
Then you'll simply populate the destination table by executing this SQL statement:
INSERT INTO geonames
SELECT col001, col002, col003, col004, col005, col006, col007, 
       col008, col009, col010, col011, col012, col013, col014,
       col015, col016, col017, col018, col019
FROM cities1000;
You can now drop the VirtualText table; it's no longer required.
DROP TABLE cities1000;
Just a last final operation; you'll now add and populate a Geometry column corresponding to the Point geographic position.
SELECT AddGeometryColumn('geonames', 'geom', 4326, 'POINT', 'XY');

UPDATE geonames SET geom = MakePoint(longitude, latitude, 4326);
All done; now you are ready to begin this tutorial.

Creating the MetaCatalog

SELECT CreateMetaCatalogTables(1);
-------------
1
You already know this; you simply have to invoke the CreateMetaCatalogTables() function.
This will create both splite_metacatalog and splite_metacatalog_statistics tables; but only splite_metacatalog will be immediately populated.

Populating (or updating) splite_metacatalog_statistics will require some more careful planning:
All right, now you've identified all the columns to be monitored; you could eventually invoke many times the UpdateMetaDataCatalogStatistics() function, one for each single column. It will certainly work, but it will be surely unpractical; even worst, if you'll eventually had to update yet again the MetaCatalog in the next days after applying some editing, you'll then be necessarily forced to repeat the whole sequence.
Happily there is a better alternative.

Creating and populating an helper master_table

CREATE TABLE geonames_helper (
    tblname TEXT,
    colname TEXT);

INSERT INTO geonames_helper (tblname, colname) VALUES ('geonames', 'feature_class');
INSERT INTO geonames_helper (tblname, colname) VALUES ('geonames', 'feature_code');
INSERT INTO geonames_helper (tblname, colname) VALUES ('geonames', 'country_code');
INSERT INTO geonames_helper (tblname, colname) VALUES ('geonames', 'admin1_code');
INSERT INTO geonames_helper (tblname, colname) VALUES ('geonames', 'admin2_code');
INSERT INTO geonames_helper (tblname, colname) VALUES ('geonames', 'admin3_code');
INSERT INTO geonames_helper (tblname, colname) VALUES ('geonames', 'admin4_code');

SELECT * FROM geonames_helper;
------------------------------
tblname         colname
------------------------------
geonames	feature_class
geonames	feature_code
geonames	country_code
geonames	admin1_code
geonames	admin2_code
geonames	admin3_code
geonames	admin4_code
All right; now you've permanently registered all columns intended to be monitored in the MetaCatalog.

Populating / Updating the MetaCatalog

SELECT UpdateMetaCatalogStatistics(1, 'geonames_helper', 'tblname', 'colname');
You can now invoke the UpdateMetaCatalogStatistics() function in a more convenient form:

Conclusions

If you are specifically interested in creating or validating datasets intended for public dissemination using the MetaCatalog could be an interesting option.
It supports in a very quick and simple way many analytics processes, it's very flexible and highly configurable, and it supports a persistent configuration.
Statistics data collected in the MetaCatalog can substantially help many complex validation processes, and could eventually make simpler, faster and safer many activities widely based on manual input or in integrating/merging data coming from disparate sources.



back