VirtualXPath-intro
Not logged in

back to XmlBlob main page

About XPath

XPath is a query language for selecting nodes from an XML document. It's a well known standard defined by the W3C
The VirtualXPath module enables SpatiaLite to directly perform XPath queries on behalf of XmlBlobs.

You can easily find on the WEB many quick references about XPath; the most comprehensive and authoritative (but not necessarily the easiest one) is the official standard reference.

A quick practical intro

We'll start as usual by examining few practical examples and introducing some useful comment where appropriate.

Just a very short introduction: an XmlDocument always has a tree structure formed by parent- and child-nodes, and surely has a unique root-node. This strongly resembles the very familiar file-system structure, where a ramified hierarchy of directories aka folders (i.e. parent-nodes) contains many files (i.e. child-nodes).
And not at all surprisingly, the XPath design is closely modelled on pathnames; it's not exactly the same (many relevant differences exist), but you'll be pleasantly surprised by the many familiar analogies you'll find.

Preparing a test-DB

CREATE VIRTUAL TABLE xpath_books USING VirtualXPath( 'books', 'xml_document' );
Now you simply have to create a further VirtualXPath table based on the table you've just now created by importing all the XmlDocuments.
The first argument 'books' identifies the main table name, and the second one 'xml_document' identifies the column containing the XmlBlobs.
All VirtualXPath tables always contain the following columns:

Your first XPath query

SELECT * FROM xpath_books WHERE xpath_expr = '/Books/author/title' LIMIT 5;
pkidsubparentnodeattributevaluexpath_expr
10author titleNULLNULL/Books/author/title
11authortitleNULLNULL/Books/author/title
12authortitleNULLNULL/Books/author/title
13authortitleNULLNULL/Books/author/title
14authortitleNULLNULL/Books/author/title

As you can notice the above query returned many rows, but it's practically useless because no meaningful value has been retrieved.
SELECT * FROM xpath_books WHERE xpath_expr = '/Books/author/title/text()' LIMIT 5;
pkidsubparentnodeattributevaluexpath_expr
10authortitleNULLBuddenbrooks - Verfall einer Familie/Books/author/title/text()
11authortitleNULLKönigliche Hoheit/Books/author/title/text()
12authortitleNULLBekenntnisse des Hochstaplers Felix Krull/Books/author/title/text()
13authortitleNULLDer Tod in Venedig/Books/author/title/text()
14authortitleNULLDer Zauberberg/Books/author/title/text()

So it's now time to introduce the XPath text() operator; this is intended to fetch the inner text enclosed between a begin- and the corresponding close-tag: e.g. supposing an XML fragment like <someTag>somevalue</someTag> the XPath expression someTag/text() will then return someValue. And effectively this second query will now allow you to examine all book titles contained within the imported XmlDocuments.
SELECT * 
FROM xpath_books 
WHERE xpath_expr = '/Books/author/title/text()'
ORDER BY value DESC
LIMIT 5 OFFSET 8;
pkidsubparentnodeattributevaluexpath_expr
72authortitleNULLКрейцерова соната/Books/author/title/text()
70authortitleNULLВойна и мир/Books/author/title/text()
71authortitleNULLАнна Каренина/Books/author/title/text()
510authortitleNULLΤρώαδες/Books/author/title/text()
513authortitleNULLΣφῆκες/Books/author/title/text()

That's not all: you can eventually add an ORDER BY clause, so to get all titles lexicographically ordered.
SELECT t.file_name, x.parent, x.node, x.value 
FROM xpath_books AS x
JOIN books AS t ON (t.pk_uid = x.pkid)
WHERE xpath_expr = '/Books/author/title/text()' AND t.file_name = 'books_it.xml';
file_nameparentnodevalue
books_it.xmlauthortitleI promessi sposi
books_it.xmlauthortitleI Malavoglia
books_it.xmlauthortitleMastro Don Gesualdo
books_it.xmlauthortitleIl fu Mattia Pascal
books_it.xmlauthortitleUno, nessuno e centomila

And you can even JOIN the main table and the corresponding VirtualXPath table: in this example we are now showing all book titles from some Italian author.

More elaborate XPath queries

SELECT t.file_name, x.parent, x.node, x.value
FROM xpath_books AS x
JOIN books AS t ON (t.pk_uid = x.pkid)
WHERE xpath_expr = '//*[text()="Oliver Twist"]/text()';
file_nameparentnodevalue
books_en.xmlauthortitleOliver Twist

SELECT t.file_name, x.parent, x.node, x.value
FROM xpath_books AS x
JOIN books AS t ON (t.pk_uid = x.pkid)
WHERE xpath_expr = '//title[starts-with(., "Les")]/text()';
file_nameparentnodevalue
books_fr.xmlauthortitleLes Bijoux indiscrets
books_fr.xmlauthortitleLes Misérables

But XPath supports many other useful string-comparison operators: start-with(str1, str2) evaluates to TRUE only when srt1 starts with str2.
The single-dot in this case simply intends the node-value itself (in pure pathname fashion).
SELECT t.file_name, x.parent, x.node, x.value
FROM xpath_books AS x
JOIN books AS t ON (t.pk_uid = x.pkid)
WHERE xpath_expr = '//title[contains(., " of ")]/text()';
file_nameparentnodevalue
books_en.xmlauthortitleThe Luck of Barry Lyndon
books_en.xmlauthortitleHeart of Darkness

And contains(str1, str2) evaluates to TRUE only when srt1 contains str2. So XPath offers the rough equivalent of the LIKE SQL operator.

Nodes and Attributes

SELECT pkid, sub, parent, node, attribute, value
FROM xpath_books 
WHERE xpath_expr = '//*/@lastName'
ORDER BY value DESC
LIMIT 5;
pkidsubparentnodeattributevalue
70BooksauthorlastNameТолстой
72BooksauthorlastNameДостоевский
71BooksauthorlastNameГо́голь
32BooksauthorlastNamede Molina
30BooksauthorlastNamede Cervantes

SELECT pkid, sub, parent, node, value
FROM xpath_books 
WHERE xpath_expr = '//*[@firstName="Σοφοκλῆς"]/title/text()'
ORDER BY value;
pkidsubparentnodevalue
52authortitleOἰδίπoυς ἐπὶ Κολωνῷ
51authortitleΟιδίπoυς τύραννoς
50authortitleἈντιγόνη
53authortitleἨλέκτρα

You can obviously apply some conditional expression to an attribute value as well, exactly as we've already seen before for the node-value case.
In this example we've just extracted the list of Sophocles' tragedies.
SELECT pkid, sub, parent, node, value
FROM xpath_books 
WHERE xpath_expr = '//title[contains(., "Katharina Blum")]/../@lastName'
ORDER BY value;
pkidsubparentnodevalue
10BooksauthorBöll

And in this final example we are going to discover who is the author of a novel about Katharina Blum:

XPath and XML namespaces

SELECT * FROM xpath_opera WHERE xpath_expr = '//*' LIMIT 3;
pkidsubparentnodeattributevaluexpath_expr
10NULLop:OperaNULLNULL//*
11op:Operaop:authorNULLNULL//*
12op:authorop:titleNULLNULL//*

Please note well: now all nodes have an op: prefix; this is because this XmlDocuments declares a namespace, and XPath is very pedantic about namespaces.
xmlns:op="http://www.gaia-gis.it/opera"
SELECT sub, parent, node, value 
FROM xpath_opera 
WHERE xpath_expr = '//title/text()';
Please note well: this query will return absolutely no row: this is because from the XPath own perspective there is absolutely no node being named title.
SELECT sub, parent, node, value 
FROM xpath_opera 
WHERE xpath_expr = '//op:title/text()' LIMIT 3;  
subparentnodevalue
0op:authorop:titleRigoletto
1op:authorop:titleIl trovatore
2op:authorop:titlea traviata

This second query instead works exactly as expected, because this time we are now correctly passing an XPath expression defining the fully qualified node-name, that is op:title (including the namespace prefix).

The cursed default namespace

SELECT * FROM xpath_movies WHERE xpath_expr = '//*' LIMIT 3;
pkidsubparentnodeattributevaluexpath_expr
10NULLdflt:MoviesNULLNULL//*
11dflt:Moviesdflt:directorNULLNULL//*
12dflt:directordflt:movieNULLNULL//*

Please note well: now all nodes have a dflt: prefix; this is because this XmlDocuments declares yet again a namespace, but this time we have to face the doomed default namespace.
xmlns="http://www.gaia-gis.it/movies"
SELECT pkid, sub, parent, node, value
FROM xpath_movies
WHERE xpath_expr = '//*[@lastName="Weir"]/dflt:movie/@title'
ORDER BY value;
pkidsubparentnodevalue
11dflt:directordflt:movieGallipoli
12dflt:directordflt:movieMaster and Commander: The Far Side of the World
10dflt:directordflt:moviePicnic at Hanging Rock

The above XPath query will correctly return all movies directed by Peter Weir.
SELECT pkid, sub, parent, node, value
FROM xpath_movies
WHERE xpath_expr = '//dflt:movie[@title="Pulp Fiction"]/../@lastName'
ORDER BY value;
pkidsubparentnodevalue
10dflt:Moviesdflt:directorTarantino

And this second query will help us in discovering that Quentin Tarantino directed Pulp Fiction.

XPath and numeric comparisons

SELECT pkid, sub, parent, node, value
FROM xpath_movies
WHERE xpath_expr = '//dflt:movie[@minutes < 110]/@title';
pkidsubparentnodevalue
10dflt:directordflt:movieDr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
11dflt:directordflt:movieDressed to Kill

XPath has the capability to perform numeric comparisons: this query identifies any moving having a less than 110 minutues running time.
SELECT pkid, sub, parent, node, value
FROM xpath_movies
WHERE xpath_expr = '//dflt:movie[@minutes > 133 and @minutes < 138]/@title';
pkidsubparentnodevalue
10dflt:directordflt:movieA Clockwork Orange
11dflt:directordflt:movieGoodfellas

You can eventually compare a range of numeric values: in this latest example we've extracted the list of movies having a running time between 133 and 138 minutes.

back to XmlBlob main page