last updated: 2019-12-17



quick technical intro

Quick Techical Introduction: in technical terms, SpatiaLite is a Spatial DBMS supporting international standards such as SQL92 and OGC-SFS.

Such a terminology may sound obscure and (possibly) a bit overwhelming to you. Don't be afraid: very often, behind such technical jargon, a simple to understand concept exists:
  • a DBMS [Database Management System] is software designed to store and retrieve arbitrary data in a efficient and generalized way..
    Actually, it is nothing more than a storage place for lots of huge, highly structured, complex data.
  • SQL [Structured Query Language] is a standardized language supporting DBMS handling:
    using SQL statements is
    • how new storage areas are created and organized [CREATE]
    • how new data is added [INSERT] to the storage areas
    • how existing data is altered [UPDATE] in the storage areas
    • how obsolete data is removed [DELETE] from the storage areas
    • how data is searched for or retrieved (called query) [SELECT] from the storage areas
    After learning how to use these 5 major commands, you will be able to perform most of the major tasks needed for a DBMS.
    SQL statements enable you to use your data in a very flexible (and efficient) way.
  • OGC-SFS [Open Geospatial Consortium - Simple Feature Specification]
    defines the extension of the basic DBMS / SQL capabilities, to support a special Geometry data-type, thus allowing the creation of a so-called Spatial DBMS

SpatiaLite is an extension of the very popular SQLite, which is a lightweight, file-based, DBMS.
SQLite implements a standard SQL92 data engine (using the 5 major commands: CREATE, INSERT, UPDATE, DELETE, SELECT)
as does other Relational Database Management Systems such as PostgreSQL, MySql, Oracle and others
>
SpatiaLite implements the standard OGC-SFS core. as an extension of SQLite
in the same way as PostGIS extends PostgreSQL
Both, taken together, give you a complete Spatial DBMS
since both SpatiaLite and PostGIS are extensions, they cannot work without the DBMS that they are based on.

Since both SQLite and PostgreSQL both implement the standard SQL92 data engine
just as SpatiaLite and PostGIS implements the standard OGC-SFS core as extensions
What is the difference between the them?
The simplest answer may be: 'All roads lead to Rome'
  • one road, winding through a steep mountain path, may be fine for a single rider
  • another road may be better for a carriage transporting a group of people with luggage
There are 2 major types of 'Relational Database Management Systems'
  • one for single user / single application / standalone workstation
    the category to which SQLite/SpatiaLite belongs to
  • another for client-server architecture
    the category to which PostgreSQL/PostGIS belongs to
So the major question should never be: is one 'better than the other?' , but 'how will the results be used?'.
In most cases the choice will be clear: either a standalone or client-server.

A third option is also possible: both.
This is the case for Regione Toscana, not only a major funder of the Spatialite Project, but also a partial funder of PostGIS.
Based on its internal needs, both systems are used.
Since both systems are based on the same standards ( SQL92 and OGC-SFS), the results of both are the same.
The methods to achieve these results may (and do) differ, but the results are (and must be) the same.

A SQLite/SpatiaLite solution offers the following:
  • a file-based database (maybe, one containing several million entities)
  • SQL engine, with Spatial-capabilities, is directly embedded within the solution
  • cabability to convert, emulate and transfer data from/to PostGIS, FDO (GDAL), GeoPackage, Shapefiles and other Spatial-Formats
  • reading and importing of other common formats such as TXT/CSV, DBF and Legacy-MS Excel
  • cross-platform portable: not only are the database-files are easily transferable, but also the libraries to use them exist on most platforms
  • stand-alone applications can be created, using the library's functionality, to create fine-tuned solutions

Conditions where a SQLite/SpatiaLite solution may not be the best choice:
  • for support for multiple concurrent access, a client-server DBMS, such as PostgreSQL/PostGIS, is required

Other useful references:

getting started [installing the software]

Note (2018):
This documentation was originaly written in 2011.
Not all of the images have yet been updated to the present day version and will be replaced at some point.
Notes will (hopefully) be written where you should no longer follow the instructions 'exactly'.

Download of the ISTAT (the Italian Census Bureau) dataset
The first dataset we'll use is the Italian National Census 2011, kindly released by, and can be downloaded from: ISTAT (the Italian Census Bureau).
Note: As of 2018-02-13, these Shapefiles have now been combinded into 1 archive.
Versione non generalizzata (più dettagliata) - WGS84 UTM32
      (non generalized and more detailed; srid=32632)

Since ISTAT has published a lot of geographic open data during recent years, the URLs of individual datasets have been frequently subject to many sudden changes and rearrangements.
The presnt archive used for this tutoral can downloaded from here:
http://www.istat.it/storage/cartografia/confini_amministrativi/archivio-confini/non_generalizzati/2011/Limiti_2011_WGS84.zip
or
[2018-09-06] Spatialite Server: Limiti_2011_WGS84.zip
Each category (Regions, Provinces and Communities) are in seperate directories:
  • Censimento 2011 - Regioni (Regions):
         Reg2011/Reg2011_WGS84 (.shp, .shx, .prj,dbf, .CPG)
  • Censimento 2011 - Provincia (Provinces):
         Prov2011/Prov2011_WGS84 (.shp, .shx, .prj,dbf, .CPG)
  • Censimento 2011 - Comuni (Communities):
         Com2011/Com2011_WGS84 (.shp, .shx, .prj,dbf, .CPG)
Each Shapefile will have a corresponding '.prj' file, which in this case contains the string 'WGS_1984_UTM_Zone_32N'

This Shapefile therefore uses SRID 32632 'WGS 84 / UTM zone 32N'
The used Charset Encoding is: UTF-8 'UNICODE/Universal' (no longer CP1252 'Windows Latin 1')
The Geometry-Type: MULTIPOLYGON
Regions of Italy
Comune
Community
8.092 Comuni (2011):

Ranging from large populated cities (Milano, Roma, Napoli) to very small villages.

The common denominator of each Comune being: its own coat-of-arms, Mayor and Local Police (traffic police).
Note: City, towns and villages.
Provincia
Province
110 Provinces:

Each province has its own Tribunal, Tax Office, Police HQ and other public offices.

Even more relevant in a catholic country, there is a Bishop for each Province.
Note: in many English speaking areas, this should be understood as a County,
in Canada often as a Region or Regional District.
Regioni
Region
20 Regioni:

Some of them are very big (Sicilia, Lombardia) and others very small (Umbria, Molise).

Five of them are semi-autonomous:
  • Valle d'Aosta [Vallée d'Aoste] - French speakers
  • Trentino/Alto Adige [Südtirol] - German speakers
  • Friuli Venezia Giulia - speaking a Latin language different from standard italian and also with many slavic speakers
  • Sicilia - home of Mount Etna, one of the world’s most active volcanoes
  • Sardegna [Sardinia] - speaking a Latin language of their own, and with many catalan speakers in the north of the island

Note: in many English speaking areas, this should be understood as a State,
Canada and Ireland: Province and in the United Kingdom: Countries.

Download of the cities-1000 (GeoNames) dataset
The second required dataset is GeoNames, a worldwide collection of Populated Places.
There are several flavors of this dataset: we'll use cities-1000 (any populated place into the world with more than 1,000 people).
http://download.geonames.org/export/dump/cities1000.zip
or
[2018-09-06] Spatialite Server: cities1000.zip

Further datasets (not used in this tutoral) are:
The created columns 'COL006' and 'COL005' are latitude and longitude fields, thus SRID 4326 'WGS 84' must be used.
The used Charset Encoding is: UTF-8 'UNICODE/Universal
Based on the documentation found inside: http://download.geonames.org/export/dump/readme.txt
The main 'geoname' table has the following fields :

Column-Number (created by VirtualText), column_name name description
COL001 / id_geoname geonameid integer id of record in geonames database
COL002 / name name name of geographical point (utf8) varchar(200)
COL003 / name_sort asciiname name of geographical point in plain ascii characters, varchar(200)
COL004 / name_alt alternatenames alternatenames, comma separated, ascii names automatically transliterated,
convenience attribute from alternatename table, varchar(10000)
COL005 / latitude latitude latitude in decimal degrees (wgs84)
COL006 / longitude longitude longitude in decimal degrees (wgs84)
COL007 / feature_class feature class see http://www.geonames.org/export/codes.html, char(1)
COL008 / feature_code feature code see http://www.geonames.org/export/codes.html, varchar(10)
COL009 / country_code country code ISO-3166 2-letter country code, 2 characters
COL010 / country_code_alt cc2 alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters
COL011 / admin1_code admin1 code fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
COL012 / admin2_code admin2 code code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)
COL013 / admin3_code admin3 code code for third level administrative division, varchar(20)
COL014 / admin4_code admin4 code code for fourth level administrative division, varchar(20)
COL015 / population population bigint (8 byte int)
COL016 / elevation elevation source meters, integer. Use double to support mm
COL017 / dem_type dem digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m)
area in meters, integer. srtm processed by cgiar/ciat.
COL018 / timezone_id timezone the iana timezone id (see file timeZone.txt) varchar(40)
COL019 / modification_date modification date date of last modification in yyyy-MM-dd format

Download of the Railways vs Communities (rearranged work based on OSM) dataset
For the Railways vs Communities sample,a further dataset is needed.
This dataset represents a slightly rearranged work based on OSM and was originaly created for 'Cookbook 3.0' in 2011.
It is used to identify any Local Council that is being crossed by a railway line:
[2018-09-06] Spatialite Server: railways.zip
The Charset UTF-8 'UNICODE/Universal' can be used, since no special characters are being used in the 2 records.

This Shapefile uses: SRID 23032, 'ED50 UTM zone 32'
The Geometry-Type: MULTILINESTRING
Warning:
Since the other datasources use: SRID 32632 'WGS 84 / UTM zone 32N', some form of
  • SELECT ST_Transform(geometry,32632)
must be used when interacting with this dataset.

Use the Load Shapefile function to create a railways table.

building your first Spatial Database

with spatialite_gui

After starting spatialite_gui, your first SpatiaLite working session, where no DB is currently connected.
spatialite_gui Fig.1

The first step is to create / connect a new DB file:
spatialite_gui Fig.2
  • simply press the VirtualShape button of the tool bar
  • a platform standard file open dialog will soon appear
  • For the purpose of this tutorial, please name this DB as 2011.Italy.db

Once the creation of the DB has been completed:
spatialite_gui Fig.3
  • several Spatial Administration tables (aka metatables) can be seen
  • these tables should, for now, be ignored
You can now load the first dataset:
  • press the Virtual Shapefile button on the toolbar
  • select the com2011_s. file.

A 'Creating Virtual Shapefile' dialog box will appear, where 'Path' is filled with the selected file:
spatialite_gui Fig.4
There are 3 pieces of information that must be supplied:
  • table-name: com2011_s is the name we will use for this tutorial
    Select this name with care, for it will be often used
  • SRID: 32632 which is 'WGS 84 / UTM zone 32N'
    Tip: looking inside the '.prj' file will help find out which EPSG number (SRID) to use
    In this case you will see 'PROJCS["WGS_1984_UTM_Zone_32N"'
    A Google search will result looking like this 'EPSG:32632'
  • Charset Encoding: UTF-8 which is: 'UNICODE/Universal'
    Tip: older Shapefiles used CP1252 'Windows Latin 1')
    newer Shapefiles often use: UTF-8 'UNICODE/Universal'
    An automatic determination of which charset is being used is not possible
    After importing, check the results.
    If known special characters can be seen correctly: then you got it right.
    If gibberish shows up: you got it wrong

Once com2011_s. has been loaded, do the same for:
  • prov2011_s.
  • reg2011_s.
The 'isoleamm2011_s' and 'zonecont2011_s'
are not used for this tutorial

Your database will now look like this:
spatialite_gui Fig.5
  • The tree-view on the left side is used to list the tables (and columns within each table).
  • The Text-Control (upper, right) will later be used to enter Sql-Commands (queries)
  • The empty area (lower, right) will later show the results of the Sql-Commands (queries)

Now the final step to complete the initial DB setup:
spatialite_gui Fig.6
We will use VirtualText to import the cities1000.txt. source:
  • simply press the Virtual CSV/TXT button of the tool bar
You are now ready to complete the initial DB setup: press the Virtual CSV/TXT button on the toolbar,
and then select the cities1000.txt. file.

A 'Creating Virtual CSV/TXT' dialog box will appear, where 'Path' is filled with the selected file:
spatialite_gui Fig.7
There are 6 pieces of information that must be supplied:
  • table-name: cities1000 is the name we will use for this tutorial
  • Charset Encoding: UTF-8 which is: 'UNICODE/Universal'
  • First line contains column names: off
  • Text separators: quotes: None
  • Column separators: Tab
  • Decimal separator: Points
These pieces of information should always be determined beforehand
Warning: Be careful when loading such text-files in an editor, when large, your system could start to crawl.
In Linux you can use the command head
head cities1000.txt > check_text.txt
In Windows, where 'Powershell' is installed, you can use the command type
type cities1000.txt -Head 10 > check_text.txt
both commands should copy the first 10 lines into the file named 'check_text.txt' and then be easly openend in any editor.

Note: I am not a Windows user, so I cannot confirm if the command really works
With that, you should now have 4 Tables, 3 of which contain geometries.

The next step is the creation of the TABLE cities1000 (press this link)


Creation of the Database 2011.Italy.db, with TABLEs com2011_s,prov2011_s and reg2011_s

with a sql-script

First import the data using the ImportSHP function and the VirtualText interface:

 -- -- ---------------------------------- --
 -- export SPATIALITE_SECURITY=relaxed
 -- -- ---------------------------------- --
 -- rm 2011.Italy.db ; spatialite 2011.Italy.db < import.shapes.sql
 -- -- ---------------------------------- --
 -- The main directory (Limiti_2011_WGS84) of the shape files and the cities1000.txt file
 -- -> are expected to bein in the same directory as the script
 -- --> adapt the path, when needed
 -- -- ---------------------------------- --
 SELECT ImportSHP
 (
 -- absolute or relative path leading to the Shapefile (omitting any .shp, .shx or .dbf suffix)
 'Limiti_2011_WGS84/Com2011_WGS84/Com2011_WGS84',
 -- name of the table to be created.
 'com2011_s',
 -- the character encoding adopted by the DBF member, as e.g. UTF-8 or CP1252
 'UTF-8',
 -- EPSG SRID value of shape file (has found in the .prj file)
 32632,
 -- name to assigned to the Geometry column
 'Geometry',
 -- name of a DBF column to be used in the Primary Key role
 'pk_uid',
 -- geometry_type of shap file
 'MULTIPOLYGON',
 -- casting to 2D or not; 0 by default
 0,
 -- compressed geometries or not
 0,
 -- immediately build a Spatial Index
 1,
 -- interpreting DBF dates as plaintext or not
 0
 );
 -- -- ---------------------------------- --
 -- if 'no such function: ImportSHP' shows up:
 -- -> you forgot to set 'SPATIALITE_SECURITY=relaxed'
 -- -- ---------------------------------- --
 SELECT ImportSHP('Limiti_2011_WGS84/Prov2011_WGS84/Prov2011_WGS84','prov2011_s','UTF-8',32632,'Geometry','pk_uid','MULTIPOLYGON',0,0,1,0);
 SELECT ImportSHP('Limiti_2011_WGS84/Reg2011_WGS84/Reg2011_WGS84','reg2011_s','UTF-8',32632,'Geometry','pk_uid','MULTIPOLYGON',0,0,1,0);
 -- -- ---------------------------------- --
 SELECT UpdateLayerStatistics('com2011_s');
 SELECT UpdateLayerStatistics('prov2011_s');
 SELECT UpdateLayerStatistics('reg2011_s');
 -- -- ---------------------------------- --
-- These tables are not used in the tutorials: [isoleamm2011_s and zonecont2011_s]
-- -- ---------------------------------- --
SELECT ImportSHP('Limiti_2011_WGS84/AreeSpeciali2011_WGS84/Isole_amm_2011_WGS84','isoleamm2011_s','UTF-8',32632,'Geometry','pk_uid','MULTIPOLYGON',0,0,1,0);
SELECT ImportSHP('Limiti_2011_WGS84/AreeSpeciali2011_WGS84/Zone_cont_WGS84','zonecont2011_s','UTF-8',32632,'Geometry','pk_uid','MULTIPOLYGON',0,0,1,0);
 -- -- ---------------------------------- --
SELECT UpdateLayerStatistics('isoleamm2011_s');
SELECT UpdateLayerStatistics('zonecont2011_s');
 -- -- ---------------------------------- --
CREATE VIRTUAL TABLE virtual_cities1000 USING
VirtualText
(
 -- absolute or relative path leading to the textfile
 'cities1000.txt',
 -- charset encoding used by the textfile
 'UTF-8',
 -- does the first line contains column names [0=no, 1=yes]
 0,
 -- the decimal separator [POINT or COMMA]
 POINT,
 -- the text separator [NONE, SINGLEQUOTE or DOUBLEQUOTE]
 NONE,
 -- the field separator [TAB, ',', ':' or other character]
 TAB
);

Creation of the TABLE cities1000

After the VirtualText source has been created (either through spatialite_gui left, or through sql-command above)
the next step will be to CREATE and fill (using INSERT) the final table

Based on 'geoname' table we will create the needed Sql-Commands that will:
  • create usable column-names
  • insures that the correct data-types are created
  • order the data in a form that we prefer

First create a TABLE, using the SQLite and SpatiaLite syntax:

Notes:
The description tells you which data-type is being used, which must be translated to the SQLite syntax:
  • 'varchar, char,characters': TEXT
  • 'integer, int, bigint': INTEGER
  • 'decimal degrees': DOUBLE
  • 'date (yyyy-MM-dd format)': DATE
It also tells us that the text is UTF-8 and that the positions are in WSG84, which is EPSG:4326

Tips:
The original CREATE SQL command will be stored in the database.
I always create a well formatted (and thus readable) script with comments for each field.
so that at a later date, when one has forgotten the meaning of the field, one can look it up
This may be a bit wasteful, but nevertheless, in the long term, useful (and a matter of style).

CREATE TABLE IF NOT EXISTS cities1000
(
 -- integer id as primary key
 id_rowid INTEGER PRIMARY KEY AUTOINCREMENT,
 -- integer id of record in geonames database
 id_geoname INTEGER DEFAULT 0,
 -- name of geographical point
 name TEXT DEFAULT '',
 -- name of geographical point in plain ascii characters (sortable)
 name_sort TEXT DEFAULT '',
 -- alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table
 name_alt TEXT DEFAULT '',
 -- latitude in decimal degrees (wgs84) [Y-Position]
 latitude DOUBLE DEFAULT 0,
 --longitude in decimal degrees (wgs84) [X-Position]
 longitude DOUBLE DEFAULT 0,
 -- feature class: see http://www.geonames.org/export/codes.html
 feature_class TEXT DEFAULT '',
 -- feature code: see http://www.geonames.org/export/codes.html
 feature_code TEXT DEFAULT '',
 -- country code: ISO-3166 2-letter country code
 country_code TEXT DEFAULT '',
 -- alternate country codes, comma separated, ISO-3166 2-letter country code,
 country_codes_alt TEXT DEFAULT '',
 -- fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code
 admin1_code TEXT DEFAULT '',
 -- code for the second administrative division, a county in the US, see file admin2Codes.txt;
 admin2_code TEXT DEFAULT '',
 -- code for third level administrative division
 admin3_code TEXT DEFAULT '',
 -- code for fourth level administrative division
 admin4_code TEXT DEFAULT '',
 -- Project POINT '1000' meters due NORTH of start_point, return distance in wgs84 degrees for 1 Km
 distance_degrees_km DOUBLE  DEFAULT 0,
 -- bigint (8 byte int)
 population INTEGER DEFAULT 0,
 -- elevation: source in (whole) meters, Convert to DOUBLE
 elevation DOUBLE DEFAULT 0,
 -- digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m)
 -- area in meters, integer. srtm processed by cgiar/ciat
 dem_type INTEGER DEFAULT 0,
 -- the iana timezone id (see file timeZone.txt)
 timezone_id TEXT DEFAULT '',
 -- date of last modification
 modification_date DATE DEFAULT '0001-01-01'
);
Now we have to add an appropriate Geometry column to the Table we've just created.

SELECT AddGeometryColumn
(
 -- table-name
 'cities1000'
 -- geometry column-name
 'geom_wgs84'
 -- srid of geometry
 4326,
 -- geometry-type
 'POINT'
 -- dimensions of geometry
 'XY'
 -- permit NULL values for geometry [0=yes; 1=no]
 0
);
And finally we'll create a Spatial Index supporting the Geometry column we've just added. This is not always strictly required, but it's a good coding practice in many cases.

SELECT CreateSpatialIndex
(
 -- table-name
 'cities1000',
 -- geometry column-name
 'geom_wgs84'
)

For the SELECT command, we will be using a combination of: Which will also show some of the powerful capabilities that SQL has to offer.

Goals:
The populating of the final TABLE will attempt to accomplish the following:
  • insure that numeric data is stored as such (using 'CAST')
  • values not set in the original source and stored in SQLite as NULL receive a default value
  • creating a geometry from the latitude/longitude values
  • sorting the TABLE based on the Country, Admin1 and Name values
The final result, should be, a cleaned up and valid SQLite/SpatiaLite Spatial-TABLE:

INSERT INTO cities1000
(id_geoname, name, name_sort, name_alt, latitude, longitude, feature_class, feature_code, country_code, country_codes_alt,
 admin1_code, admin2_code, admin3_code, admin4_code, population, elevation, dem_type, timezone_id, modification_date, geom_wgs84)
 SELECT
 -- Total records NOT NULL: [128566]
 COL001 AS id_geoname,
 -- There is a CAST(COL002 AS TEXT) Function, but not needed since TEXT is the default.
 COL002 AS name,
 -- Set to empty, if there is no value [1]
 CASE WHEN COL003 IS NULL THEN '' ELSE COL003 END AS name_sort,
 -- Set to empty, if there is no value [22636]
 CASE WHEN COL004 IS NULL THEN '' ELSE COL004 END AS name_alt,
 -- Cast to DOUBLE to insure that the stored value is numeric
 CAST(COL005 AS DOUBLE) AS latitude,
 -- Cast to DOUBLE to insure that the stored value is numeric
 CAST(COL006 AS DOUBLE) AS longitude,
 COL007 AS feature_class,
 COL008 AS feature_code,
 COL009 AS country_code,
 -- Set to empty, if there is no value [124289]
 CASE WHEN COL010 IS NULL THEN '' ELSE COL010 END AS country_codes_alt,
 -- Set to empty, if there is no value [11]
 CASE WHEN COL011 IS NULL THEN '0' ELSE COL011 END AS admin1_code,
 -- Set to empty, if there is no value [25563]
 CASE WHEN COL012 IS NULL THEN '' ELSE COL012 END AS admin2_code,
 -- Set to empty, if there is no value [65400]
 CASE WHEN COL013 IS NULL THEN '' ELSE COL013 END AS admin3_code,
 -- Set to empty, if there is no value [109768]
 CASE WHEN COL014 IS NULL THEN '' ELSE COL014 END AS admin4_code,
 -- Cast to INTEGER to insure that the stored value is numeric
 CAST(COL015 AS INTEGER) AS population,
 -- Set to 0, if there is no value [102267]. Cast to DOUBLE to insure that the stored value is numeric and could then support mm
 CASE WHEN COL016 IS NULL THEN 0 ELSE CAST(COL016 AS DOUBLE) END AS elevation,
 -- Set to empty, if there is no value [0]
 COL017 AS dem_type,
 -- Set to empty, if there is no value [0]
 COL018 AS timezone_id,
 -- There is no CAST(COL019 AS DATE) Function
 COL019 AS modification_date,
  -- create the original POINT as Wsg84  [you cannot use the 'AS longitude/latitude' names here!]
 MakePoint(CAST(COL006 AS DOUBLE), CAST(COL005 AS DOUBLE), 4326) AS geom_wgs84
FROM virtual_cities1000
-- Sort by country, highest admin-level and name [using the 'AS name' ]
ORDER BY country_code,admin1_code, name;

Calulate, for each POINT, the distance in degress for 1000 meters: (for use with ST_Buffer):
UPDATE cities1000 SET distance_degrees_km =
(
  --- Project POINT '1000' meters due NORTH of start_point, return distance in wgs84 degrees for 1 Km
 SELECT ST_Distance(geom_wgs84,ST_Project(geom_wgs84,1000,0))
);

Last, but not least:
  • update the metadata
  • remove the (no longer needed) VirtualText table
  • and a message telling us (possibly) something useful
 -- -- ---------------------------------- --
 -- Update the metadata for the database
 -- -- ---------------------------------- --
 SELECT UpdateLayerStatistics
 (
 -- table-name
 'cities1000',
 -- geometry column-name
 'geom_wgs84'
 );
 -- -- ---------------------------------- --
 -- DROP the VirtualText table, since it is no longer needed
 -- -- ---------------------------------- --
 DROP TABLE IF EXISTS virtual_cities1000 ;
 -- -- ---------------------------------- --
 SELECT 'Records imported into ''cities1000'': '||(SELECT count(id_geoname) FROM cities1000)||' --done--' AS message;
 -- -- ---------------------------------- --
 -- Records imported into 'cities1000': 128743 --done--

Now the search for everything we have missed can begin ....

Summa summarum:
Importing a CSV/TXT file seldom can be done automatically

You need to analyse and prepare the data properly. This is why a 2 step process is advised.
  • Use VirtualText to temporary to view the data as it will be seen in a database
  • create a SELECT statement to fill the final TABLE in the form you need.
This is the main reason why there is no ImportCSV function.

what's a Virtual Shapefile ? (and Virtual Tables ...)

What's a Shapefile ?

Shapefile is a plain, unsophisticated GIS (geographic data) file format invented many years ago by ESRI:
although initially born in a proprietary environment, this file format has been later publicly disclosed and fully documented, so it's now really like an open standard format.

It's rather obsolescent nowadays, but it's universally supported.
So it represents the lingua franca every GIS application can surely understand: and not at all surprisingly, SHP is widely used for cross platform neutral data exchange.

The name itself is rather misleading: after all, a Shapefile isn't a simple file.
At least three distinct files are required (identified by .shp .shx .dbf suffixes):
If of these are missing (misnamed / misplaced / malformed or whatsoever ), then the whole dataset is corrupted and become completely unusable.
Manditory files

Other files

When an SRID is needed, looking inside the '.prj' file will help find out which EPSG number (SRID) to use

In this case you will see something starting with:
or which will be followed with text indicating the name of the projected or geographic coordinate system.

A Google search for this name will often show up in the result something starting with 'EPSG:': will, in most cases, a result will be returned, where the url adress contains:
spatialreference.org/ref/epsg/
This page will contain the needed SRID value, which mostly also an ESPG value.
Some useful further references:

What's a Virtual Shapefile (and Virtual Tables) ?

SpatiaLite supports a Virtual Shapefile driver: i.e. it has the capability to support SQL access (read-only mode) for an external Shapefile, with no need to load any data within the DB itself.
This is really useful during any preliminary database construction step (as in our case).
SQLite/SpatiaLite supports several other different Virtual drivers, such as the Virtual CSV/TXT, the Virtual DBF and so on ...

Be warned: Virtual Tables suffer from several limitations (and are often much slower than using internal DB storage), so they are not at all intended for any serious production task.
For a full description of this and other Virtual-Interfaces that emulate external Spatial-formats, visit the main Wiki-Pages:

what's a Charset Encoding ?? (and why must I deal with such nasty things ?)

Very simply stated: This conversion table is known as as a Charset Encoding.
The viewpoint of the computer is:
... the matter becomes even worse, since these biological things can't even agree on a single set of rules to interpret these 'text things.
After all, the main rule on which nature is based on is:
either your bit is charged or not.
I ask you: What can be more logical than that?

But no, while I am (yet again) busy cleaning up the resources THAT THEY demanded, and of course forgot to remove themselves (let the computer do, as always) ....

And of course, then there are those, ignoring my nicely organized placement of the data - THAT THEY (as allways) absolutly need NOW ('Why is the computer so SLOW?', they ask)
- sloping it over with other things that they have created, because it doesn't fit in the space THAT THEY requested ....

Really, it is times like these when I just turn myself off and let them do it by themself.

But, of course, within a few seconds, there THEY are, proding me back to work, because of course, THEY can't do by themselvs. And whoes to blame? The 'stupid computer' of course (as always)!

One does wonder how these biological things ... (mutter ... mutter ... mutter) ....

Editoral comment:
in compliance with the Network Enforcement Act
the viewpoint of the computer had be cleaned up and shortend.


Be it as it may, Charset Encoding.problems are caused by the historical development of text files.
So lets invest the time to collect the needed information to the understand the cause of the problem.

The historical development of Text-Files

Basically, the original text files were a major source for printers.
ASCII, which envolved during the years 1963, 1967 and 1986 and were used in first Unix, the development of which started in 1969.

It was based on earlier teleprinter encoding systems, which was based on the English alphabet.

2 (non-printing) control codes can be used in text files
For text files the following character was used for 'New Line': This is the cause of the Small Confusion when transfering Text files between non-mainframe computers.

Mainframe computers mostly use the EBCDIC character set, that has it own set of code pages to make life miserable for anybody that needs to deal with it.
Around 1981, when the IBM-PC was introduced, Extended ASCII, which contained 256 characters, became more common.

This was when the Great Confusion started: the introduction of code pages

Since the code page number is never stored inside a text file (as source for printers, that information would also be printed)
it is almost impossible to determine how any character, using a value > 127, is to be interpreted
Below is a list of 15 different code pages, using the same character value of Hex(0xD8) Decimal(216).
Starting around 1989 the development Unicode was started, with the goal of resoving these problems.

UTF-8 was introduced in the Unix-Kernal in 1992, and is now used as default in many systems (since 2009 in WWW)
the first 128 characters are the same as ASCII

For each character a unique value exists.
Below is a list of the 14 different, unique, UTF-8 values, that corresponds to Hex(0xD8) Decimal(216), togeather with a specific code page number:
Code-Page(s) for the value Hex(0xD8) Decimal(216)
  as it will be stored in the text-file
Unicode equivalent
437 (CP437, OEM-US, OEM 437, PC-8, or DOS Latin US) Hex(0x256A) displayed as: []
874 (ANSI/OEM Thai (ISO 8859-11); Thai (Windows)) Hex(0x0E38) displayed as: []
932 (ANSI/OEM Japanese; Japanese (Shift-JIS)) Hex(0x30EA) displayed as: []
949 (ANSI/OEM Korean (Unified Hangul Code)) Hex(0xADF9) displayed as: []
950 (ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)) Hex(0x659B) displayed as: []
1250 (ANSI Central European; Central European (Windows)) Hex(0x0158) displayed as: [Ř]
1251 (ANSI Cyrillic; Cyrillic (Windows)) Hex(0x0428) displayed as: [Ш]
1252 (ANSI Latin 1; Western European (Windows)) Hex(0x00D8) displayed as: [Ø]
1253 (ANSI Greek; Greek (Windows)) Hex(0x03A8) displayed as: [Ψ]
1254 (ANSI Turkish; Turkish (Windows)) Hex(0x00D8) displayed as: [Ø]
1255 (ANSI Hebrew; Hebrew (Windows)) Hex(0x05F4) displayed as: [״]
1256 (ANSI Arabic; Arabic (Windows)) Hex(0x0637) displayed as: [ط]
1257 (ANSI Baltic; Baltic (Windows)) Hex(0x0172) displayed as: [Ų]
1258 (VISCII, Vietnamese Standard Code for Information Interchange) Hex(0x1EED) displayed as: []
57002-57011 (ISCII Devanagari, Indian Script Code for Information Interchange) Hex(0x0939) displayed as: []

Summa summarum:
When reading a ASCII-based text, and the character Hex(0xD8) is being read
the Code-Page used to create and display that character must be known to translate it to the Unicode value
If the Code-Page 1256 (Arabic) has been given:
then Hex(0xD8) will be converted to the Unicode value of Hex(0x0637) and displayed as 'ط'

Anything else is just a hocus-pocus that will not result in anything useful.

For this reason, you are advised to use VirtualText to import the data and check the results.
Since all text in SQLite is assumed to be stored as Unicode, the way the data is shown in spatialite_gui
after importing it with VirtualText will be the same as with any other application

The Sample Database (where the Shape-file was supplied was in UTF-8) contains a table reg2011_s", with is a list of Regions in Italy.
2 of these Regions contain special characters (an 'é' and 'ü'): which are shown correctly.

If you had imported this Shape-file as CP_1252 (Windows Latin 1), then these Regions would have been listed as: which are not shown correctly.

This is the only way to insure that the text data is imported correctly.
Some other, useful references:

what's this SRID stuff ? ... I've never heard this term before now ..

Most Planets, such as Earth, are a sphere ... not exactly, Earth has an ellipsoidal shape (slightly flattened at the poles) ...
oh no, that's absolutely wrong: Earth isn't a geometric regular shape, it actually is a geoid

All the above assertions can be assumed to be true, but at different approximation levels.
Near the Equator differences between a sphere and an ellipsoid are rather slight and quite unnoticeable
but nearer to the Poles such differences becomes greater and most easily appreciable.
For all practical purposes differences between an ellipsoid and a geoid are very similar
but for long range aircraft navigation (or even worse, for satellite positioning),
these differences must be taken into consideration. to receive reliable results.

lat-long illustration

Independent of the real shape of the Earth, the position of each point on the planet's surface can be precisely determined by simply measuring the two angles: longitude and latitude.
In order to set a complete Spatial Reference System [aka SRS], we could have chosen to use the Poles and the Equator (which after all, exists on all astronomic objects):
as the Prime Meridian of choice, or retain the, purly artificial, Greenwich Meridian, simply because it has been used for such a long time

Any SRS based on longitude/latitude coordinates is known as a Geographic System.
Using a Geographic SRS surely grants you maximum precision and accuracy: but unfortunately causes undesirable side-effects:
For many centuries cartographers have invented several (conventional) systems enabling to represent spherical surfaces as a flattened plane: none of which are completely satisfactory.

All them introduce some degree of approximation and deformation.
Choosing one or the other is mostly based on the resolution of an aspect of a problem:
A good solution to correctly display a small portion of the Earth,
will be an awful solution when used to represent much bigger area
and vice versa.
We will now go through some of these solutions, explaining the benefits and drawbacks of each.
First, the often used, UTM [Universal Transverse Mercator] map projection.
UTM projection

UTM projected
As you can see, this map projection introduces severe and unacceptable deformations the farther you get from the center line (between the North and South Poles)
South America at the left, India and South-East Asia on the right.
But along the center line (Africa and Europe), you can see that UTM allows a near perfect, planar projection of excellent quality.

The area around the center line, depending on the scale of the map, must be very narrow to remain accurate.
The wider the area gets, the stronger and more evident the deformations becomes.
UTM Zones
This is why UTM defines 60 standard zones, each of which covers exactly 6 longitude degrees.
Merging two adjacent zones (12 degrees) reduces the accuracy, but is still acceptable for many practical purposes.
Exceeding this limit produces low-quality results, and should be avoided.

Attempting to standardize the chaos

During the past two centuries almost every national state has introduced at least one (and very often, more than one) map projection system and related SRS : the overall result has become vary chaotic and diffecult to deal with.

Happily, an international standard has envolved which makes tha handeling map SRS easier.
The European Petroleum Survey Group [EPSG] maintains a huge worldwide dataset with more than 3,700 different entries.
Although many of them are now obsolete, and play only a historical role.
Others are only useful in very limited area.
This impressive collection, for which each single entry is uniquely identified by a numeric ID and descriptive name,
thus avoiding confusion and ambiguity.

Any Spatial DBMS requires some SRID-value to be specified for each Geometry to correctly interpret the positions. Such SRID simply is a Spatial Reference ID, and (most cases) coincides with the corresponding EPSG ID

In order to understand the SRID chaos better, here is a list of SRIDs often used in a (small) country such as Italy:

EPSG SRID Name Notes
4326 WGS 84 Geographic [long-lat]; worldwide; used by GPS devices
3003
3004
Monte Mario / Italy zone 1
Monte Mario / Italy zone 2
(Rome 1940) but still commonly used, based on the obsolete 1924 International Ellipsoid
23032
23033
ED50 / UTM zone 32N
ED50 / UTM zone 33N
European Datum 1950, also based on the 1924 International Ellipsoid, but not conforming to standard UTM zones
32632
32633
WGS 84 / UTM zone 32N
WGS 84 / UTM zone 33N
WGS84, adopting the planar UTM projection, but based upon North America
(thus, due to Continental Drift, the two opposite sides of the Atlantic Ocean become more distant, about 1 cm per year)
25832
25833
ETRS89 / UTM zone 32N
ETRS89 / UTM zone 33N
enhanced evolution of WGS84: official EU standard, based on the Tectonic Plate of Europe
(to which Italy does not belong to)
3035 ETRS89 / LAEA Europe Single CRS for all Europe. Scales larger than 1:500,000: official EU standard

And the following examples may help to understand even better:

Community SRID Coordinates
X (longitude) Y (latitude)
Roma 4326 12.483900 41.894740
3003 1789036.071860 4644043.280244
23032 789036.071860 4644043.280244
32632 789022.867800 4643960.982152
25832 789022.867802 4643960.982036
3035 4527853.614995 2091351.235257
Milano 4326 9.189510 45.464270
3003 1514815.861095 5034638.873050
23032 514815.861095 5034638.873050
32632 514815.171223 5034544.482565
25832 514815.171223 5034544.482445
3035 4257519.706153 2483891.866830
As you can easily notice:
Distance intercurring between
Roma and Milano
SRID Calculated Distance
4326 4.857422
3003 477243.796305
23032 477243.796305
32632 477226.708868
25832 477226.708866
Great Circle 477109.583358
Geodesic 477245.299993
3035 476622.040725
And now we can examine how using different SRS's affects distances: You may have also noted, that the worst distance (> 600 meters) returned was from 3035 (ETRS89 / LAEA Europe), which is only intended to be used for large areas.
The deformations seen above in the World-Image make itself here, as numbers, noticeable, since the distance being measured is far from the center of the projected area.
That is why the remarks about 'ETRS89 / LAEA Europe' should be carefully read:
Single CRS for all Europe. Used for statistical mapping at all scales and other purposes where true area representation is required.
Use ETRS89 / LCC (code 3034) for conformal mapping at 1:500,000 scale or smaller or ETRS89 / UTM (codes 25828-37 or 3040-49) for conformal mapping at scales larger than 1:500,000.
Conclusion:: Thou shall not have exact measures

But this isn't at all surprising since in physical and natural sciences: any measured value is always affected by errors and approximations.
Also any calculated value will be effected by rounding and truncation artifacts.

Since an absolutely exact result simply doesn't exist in the real world: be aware you will get nothing more than, a more or less an approximated, value.
The goal should be to know how to properly reduce such approximations in the best possible way.
Specialized conditions, bring about specialize solutions.
For a country like Italy, where the area from the Alps, down along the Apennines, including the western coastline over to the western island of Sardinia For this reason, the "Istituto Geografico Militare" (IGM) has, since 1995, started adapting high-precision SRSes specifically intended to address these problems

EPSG SRID Name Notes
3064 IGM95 / UTM zone 32N Scientific study
3065 IGM95 / UTM zone 33N Scientific study
4670 IGM95 Horizontal component of 3D system
Being based on the Italian Plate (Alpine orogeny)these SRSes are more suited to Italian needs,
as opposed the others that are based on the Tectonic Plate of Europe (not to mention that of North America), and are expected to fully preserve their precision over the next few centuries..
Tectonic Map Mediterranean (English)

Summa summarum:
Great care must be taken to correctly chose which projection to use:
based on what the projection was designed for and your needs

executing your first SQL queries.

How spatialite_gui will look, after submitting a standard SQL-Query:
spatialite_gui Fig.8
You can follow two different approaches in order to query a DB table:
  1. you can simply use the Query Table menu item.
    1. this the quickest and easiest way and completely user friendly.
    2. you simply have to click the mouse's right button over the required table, for the context menu to be shown.
    3. then you can simply use the lowermost buttons in order to scroll the result-set to the: top/prev/next/end at as you need it.
    4. this approach is rather mechanical, and doesn't allow you to exploit the powerful capabilities of SQL, by refining the result-set as you need it.
    5. Note: the records of the result-set can be changed, since a ROWID exists
  2. alternatively you can hand-write any arbitrary SQL statement into the uppermost pane, then pressing the Execute button.
    1. this one is the hardest way: you are responsible for what you are doing (or even misdoing ...)
    2. but this way you can take full profit of the powerful capabilities of SQL, by refining the result-set as you need it.
    3. Note: the records of the result-set cannot be changed, since no ROWID exists
In both cases, only 500 records of the result-set will, at first, be returned.
TODO: Add SQL-Filter Dialog
Add description with spatialite_gui image.
Building site Add SQL-Filter Dialog
This dialog insures that the result-set of filtered SQL-Statements, that mostly does not have a ROWID,
can never the less be used with UPDATE/DELETE statements.
This makes it possible to change/edit the result-set in the Data-View.
Note: any change in the syntax of created SQL, will invalidate the edit-mode and the result-set will be come read-only.

How spatialite_gui will look, after opening the BLOB explorer:
spatialite_gui Fig.9
BLOB GEOMETRY: will show the size of the stored binary-data sz=nnn
Using the BLOB explorer function, by simply clicking the mouse's right button over the corresponding value, will give you a much richer preview of any Geometry.

A quick overview of cities1000:
SELECT
"The 'cities1000' table contains cities of of the world with a population ≥ 1000 ("||
(SELECT count(id_rowid) FROM cities1000)||" records),"||char(10)||" "||
(SELECT count(id_rowid) FROM cities1000 WHERE (country_code = 'IT'))||
" of which are in Italy (country_code = 'IT')" AS message;
The 'cities1000' table contains cities of of the world with a population ≥ 1000 (128743 records),
 9900 of which are in Italy (country_code = 'IT')

You can test how a free-hand SQL works using this SQL statement:
SELECT
 name,
 country_code,
 admin1_code,
 admin2_code,
 admin3_code,
 population,
 elevation,
 longitude,
 latitude,
 geom_wgs84
FROM cities1000
WHERE
( -- will check the 128556 records that fulfills this condition, returning 23 records
 (elevation < 0)
)
ORDER BY elevation DESC;

simply copy the SQL statement, then paste it into the query panel, and finally press the Execute button to create a result-set.


With spatialite_gui returning a result-set which will look something like this:
spatialite_gui Fig.10

The Sql-Query will list all cities, where the elevation is < 0, sorted with the highest value first.
'(elevation < 0)' is called a filter (or WHERE) condition.
A WHERE Statement can contain more that one filter condition.

The first condition will always read/query all of the records (in our case 128556) of the TABLE. Therefore the order of the conditions is important.
We will now add a new filter condition: (country_code = 'IT') (cities in Italy) I would chose the second option, adaption the WHERE Statement in this way
WHERE
( -- will check the 128556 records of the table (first condition)
 (elevation < 0) AND
 -- will check the 23 records that have an elevation < 0
 (country_code = 'IT')
)
The result: 1 record, retrieved in 00:00:00.098 ms [as opposedd to 00:00:00.104]
 Scardovari IT 20 RO 029039 1434 -1.000000 12.454620 44.897650

Not satisfied with that, but also want the cities in Iran?
WHERE
( -- will check the 128556 records of the table (first condition)
 (elevation < 0) AND
 -- will check the 23 records that have an elevation < 0
 ((country_code = 'IT') OR (country_code = 'IR'))
)

The result: 2 records.
 Scardovari IT 20 RO 029039 1434 -1.000000 12.454620 44.897650
 Ziabar (Gaskar) IR 08 4500 -7.000000 49.245900 37.426100

Still not satisfied ?? Because there is too much to type ??
WHERE
( -- will check the 128556 records of the table (first condition)
 (elevation < 0) AND
 -- will check the 23 records that have an elevation < 0
 (country_code IN ('IT','IR'))
)

with the same result.
With that, you have enough to start experiment with.
A proper understanding of the WHERE statement is very important, so familiarizing yourself by quering data for an area of your interest, where you have a good idea of what results should be , would be a good idea.

basics about SQL queries

The following SQL queries are so elementary that you can directly verify the results yourself.
Simply follow any example executing the corresponding SQL statement (using copy&paste).

This one really is the pons asinorum of SQL:
all columns for each row of the selected table will be dumped following a random order
SELECT
 *
FROM com2011_s;
This is what you use when you have no idea what the table contains.
There is no need to retrieve every column:

You can explicitly choose:
SELECT
 pop_2011, comune
FROM com2011_s;
Here, the first column will be 'pop_2011', the second then 'comune' (in the original table it is the other way around)
You can also re-name each column, so that the shown column-name (for the user) is more explanitory.
SELECT
 Pro_cOm AS code, COMUNE AS name,
 pop_2011 AS "population (2011)"
FROM com2011_s;
This sample shows two important aspects to must be heeded:
SQL allows you to order rows into the result-set in almost any form that you need.

Here the Communities will be sorted Alphabetically:
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
ORDER BY comune;
Here the result will first be sorted by the Region id (numeric, descending) and then by the Community names (alphabetically, ascending) :
SELECT
 cod_reg, pro_com, comune, pop_2011
FROM com2011_s
ORDER BY cod_reg DESC, comune ASC;
(Whether this is sensible or not, is is a completely different question)
You can order as ASC.ending or as DESC.ending order, as needed.
The ASC. qualifier is usually omitted, since it is the default ordering
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
ORDER BY pop_2011 DESC;

Using the WHERE. clause you can restrict the range:
only the rows satisfying the WHERE. clause condition will be placed into the result-set.
In this case, the list of Provinces belonging to Tuscany Region will be extracted.
SELECT
 cod_pro, provincia, sigla
FROM prov2011_s
WHERE
 (cod_reg = 9);

Same as above: this time the list of Communities belonging to the Florence Province will be extracted.
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
WHERE
 (cod_pro = 48);

With a WHERE. clause: you can define as many conditions as are needed

Here a list of larger Communities (population > 50,000), within the Tuscany Region, will be extracted.
And the result-set will be ordered the largest population being shown first ['DESC'].
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
WHERE
(
 (cod_reg = 9) AND
 (pop_2011 > 50000)
)
ORDER BY pop_2011 DESC;

You can obviously use text strings as comparison values:

in pure SQL any text string has to be enclosed within single quotes.
SQLite is smart enough to recognize double quoted text strings as well, but I strongly discourage you from adopting such a bad style !!!
Furthermore: string values comparisons for SQLite are always case-sensitive.
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
WHERE
( -- search independent of upper/lower case
 (Upper(comune) = 'ROMA')
);

When a text string contains an apostrophe, you must apply masking.

An extra single quote is required to mask every apostrophe within a text string:
e.g.: REGGIO NELL'EMILIA has to be correctly masked as: 'REGGIO NELL''EMILIA'
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
WHERE
( -- search independent of upper/lower case
 (Upper(comune) = 'L''AQUILA')
);

You can use the approximate evaluation operator LIKE to make case-insensitive.text comparisons:
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
WHERE
( -- insensitive to upper/lower case
 (comune LIKE 'roma')
);

And you can use the operator LIKE so to apply partial text comparison, using % or '_' as a wild-card:
Beware: do not use the dos/linix wild-cards '*' and '?'
This query will extract any Local Council containing the sub-string 'maria' within its name.
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
WHERE
( -- insensitive to upper/lower case
 (comune LIKE '%maria%')
);

26 records when using '%maria%'
32 records when using '%m_ria%' ('Lombriasco' being one of them)
'_' being used as a wild-card for a single letter

Sometimes it may be useful to use a list of values as a filter:
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
WHERE
( -- search independent of upper/lower case
 (Upper(comune) IN ('ROMA', 'MILANO', 'NAPOLI'))
);

Note: Without the use of Upper, no results would be returned.

Another, not often used but sometimes useful, to set a range of values as a filter:
SELECT
 pro_com, comune, pop_2011
FROM com2011_s
WHERE
( -- Numbers from/to (inclusive)
 (pop_2011 BETWEEN 1990 AND 2010)
);

First result should be: 3135 San Pietro Mosezzo 1996.000000

Using SQL you can set any kind of complex WHERE. clause: there are no imposed limits (other that it might become unreadable for others, if not properly formatted/documented).

This is a really fantastic a feature, offering infinite scenarios.

Here a short explanation of what the next query will (should) do:
SELECT
 provincia, sigla, pop_2011
FROM prov2011_s
WHERE
( -- Regions: Tuscany, Umbria, Marche and Lazio
 (cod_reg IN (9, 10, 11, 12)) AND
  -- but excluding the Livorno and Pisa Provinces
 (sigla NOT IN ('LI', 'PI')) AND
 ( -- Numbers from/to (inclusive)
  (pop_2011 BETWEEN 300000 AND 500000) OR
  -- but also when
  (pop_2011 > 750000)
 )
);
Warning: in SQL the logical operator OR has a lower priority than AND.
Changing the placement of the OR-clause within brackets, will produces very different results, does it not ?
Another, sometimes useful, SELECT clause is: using LIMIT you can set the maximum number of rows to be extracted into the result-set
(very often you aren't actually interested in reading a very large table at all: often a preview is enough).
SELECT
 *
FROM com2011_s
-- show only 10 rows
LIMIT 10;
And that's not all:
SQL allows you set the amount of rows to skip, togeather with a limited row-set, by using OFFSET. combined with LIMIT.
SELECT
 *
FROM com2011_s
-- show only 10 rows
LIMIT 10
-- starting after row 1000
OFFSET 1000;
Learning SQL is not really that difficult.
There are very few keywords and the language syntax is notably regular and predictable, and query statements are designed to resemble plain English (as much as possible ...).
Although 'trial and error' is unavoidable, writing (simple) SQL query statements by yourself, checking the expected, known results, is the best way to start.

understanding aggregate functions

We've seen since now how SQL allows to retrieve single values on a row-per-row basis.

Another approach is also possible, one allowing to compute total values for the whole table, or for a group(s) of selected rows.

This implies using some special functions, known as aggregate functions.
SELECT
 -- return the calculated amount of rows in 'com2011_s' and call it 'com2011_count'
 Count(pop_2011) AS com2011_count,
 -- return the lowest found value of 'pop_2011' in 'com2011_s' and call it 'pop_2011_min'
 Min(pop_2011) AS pop_2011_min,
 -- return the highst found value of 'pop_2011' in 'com2011_s' and call it 'pop_2011_max'
 Max(pop_2011) AS pop_2011_max,
 -- return the calculated average value of 'pop_2011' in 'com2011_s' and call it 'pop_2011_avg'
 Avg(pop_2011) AS pop_2011_avg,
 -- return the calculated total value of 'pop_2011' in 'com2011_s' and call it 'pop_2011_sum'
 Sum(pop_2011) AS pop_2011_sum
FROM com2011_s;
This query will return a single row, representing something like a summary for the whole table: You can use the GROUP BY clause in order to establish a more finely grained aggregation/selection by sub-groups
This query will return distinct results for each County:
SELECT
 -- Calculate the Min/Max,Avg/sum and amount based on this value
 cod_pro,
 -- subquery is needed since 'com2011_s' does not have a field containing the county-name, only the id
 (SELECT provincia FROM prov2011_s WHERE (com2011_s.cod_pro=prov2011_s.cod_pro)) AS province_name,
 Count(pop_2011) AS com2011_count,
 Min(pop_2011) AS pop_2011_min,
 Max(pop_2011) AS pop_2011_max,
 Avg(pop_2011) AS pop_2011_avg,
 Sum(pop_2011) AS pop_2011_sum
FROM com2011_s
 -- Group the result by this value before any calcuation is done
GROUP BY cod_pro;
110 records, first listing the Province id and name, then the calculated results.
And you can obviously get results for each Region simply changing the GROUP BY criteria:
SELECT
 -- Calculate the Min/Max,Avg/sum and amount based on this value
 cod_reg,
 -- subquery is needed since 'com2011_s' does not have a field containing the region-name, only the id
 (SELECT regione FROM reg2011_s WHERE (com2011_s.cod_reg=reg2011_s.cod_reg)) AS region_name,
 Count(pop_2011) AS com2011_count,
 Min(pop_2011) AS pop_2011_min,
 Max(pop_2011) AS pop_2011_max,
 Avg(pop_2011) AS pop_2011_avg,
 Sum(pop_2011) AS pop_2011_sum
FROM com2011_s
 -- Group the result by this value before any calcuation is done
GROUP BY cod_reg;

20 records, first listing the Region id and name, then the calculated results.
One result is:
9 Toscana 287 328.000000 358079.000000 12795.128920 3672202.000000
Summa summarum: The Toscana has a population of 3672202 in 2011.
There is another different way to aggregate rows: i.e. using the DISTINCT. clause:
SELECT
 DISTINCT cod_reg, cod_pro
FROM com2011_s
ORDER BY cod_reg, cod_pro;

However: the result is not the same as when using the GROUP BY clause:


last updated: 2019-12-17