Update of "XmlBlob-intro"
Not logged in

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview

Artifact ID: 37c5c25c54185f939834ed4671a79a730e3f8ca7
Page Name:XmlBlob-intro
Date: 2013-03-04 19:51:35
Original User: sandro
Parent: e06bec88359ec9c6ab3f344831c0360fb3306e7f (diff)
Content

back to XmlBlob main page

XmlBlob - Quick start

Starting since version 4.1.0 SpatiaLite supports a specialized format (XmlBlob) intended to store a generic XmlDocument into the DB as a BLOB value.
The binary format specification supporting XmlBlob is publicly available.

Supporting XmlDocuments is enough interesting by itself; but its real utility become absolutely clear considering the growing role played by XML in the GeoSpatial context: metadata, SLD/SE styles and many other representations are fully based on XML.
In the following steps we'll get a first quick glance at XmlBlob using very simple practical examples.

TOC - Table Of Contents

  1. Creating an XmlBlob from an XmlDocument
  2. Extracting an XmlDocument from within an XmlBlob
  3. XmlBlob compression
  4. XML: well-formedness and Schema validation
  5. The internal XML Schema cache
  6. Other useful misc SQL functions supporting XmlBlob


1 - Creating an XmlBlob from an XmlDocument

SELECT Hex( '<a><b>alpha</b></a>' );
---
3C613E3C623E616C7068613C2F623E3C2F613E

SELECT XB_Create( x'3C613E3C623E616C7068613C2F623E3C2F613E' );
---
XmlBLOB-result
The XB_Create() SQL function in its simplest forms accepts a simple argument corresponding to a BLOB (XmlDocument payload).
The returned value (if successful) corresponds to a BLOB value (namely: an XmlBlob value).
Please note: the input XmlDocument is expected to be passed as a BLOB, not as Text string; and there is a very good reason accounting for this.
An XML Document can be internally encoded using several different charset encodings (UTF-8, UTF-16, ASCII ...): many of such encodings cannot be safely handled as Text strings, but surely all them can always be handled as a binary BLOB value.
The Hex() function will simply return the hexadecimal representation corresponding to a Text string, and the x'08..AF' notation can then be used to define a BLOB-constant.
A simpler way to pass an XML sample as an argument is by directly invoking the CastToBlob() SQL function; a third way to perform the same identical task is by using the SQL CAST operator.

SELECT XB_Create( CastToBlob( '<a><b>alpha</a></b>' ) );
---
NULL

SELECT XB_Create( CAST( '<a><b>alpha</a></b>' AS BLOB ) );
---
NULL
If the input BLOB value passed to XB_Create() doesn't contains any valid XmlDocument, a NULL value will be returned.

SELECT XB_GetLastParseError();
---
noname.xml:1: parser error : Opening and ending tag mismatch: b line 1 and a
<a><b>alpha</a></b>
             ^
noname.xml:1: parser error : Opening and ending tag mismatch: a line 1 and b
<a><b>alpha</a></b>
                 ^
You can then call XB_GetLastParseError() in order to get a meaningful message explaining why the most recent call to XB_Create() failed.


2 - Extracting an XmlDocument from within an XmlBlob

SELECT XB_IsValid( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' );
---
1
The XB_IsValid() SQL function checks if a generic BLOB actually corresponds to some valid XmlBlob value.

SELECT XB_GetDocumentSize(
    x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' );
---
19
And XB_GetDocumentSize() will return the corresponding XmlDocument length (measured in bytes).

SELECT XB_GetPayload(
    x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' );
---
BLOB-value
You can then call XB_GetPayload() in order to extract the XmlDocument from the XmlBlob, exactly as it was initially loaded from a previous call to XB_Create() (i.e., as a BLOB value fully preserving the original character encoding).

SELECT XB_GetDocument(
    x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' );
---
<a><b>alpha</b></a>
A similar SQL function is supported, i.e. XB_GetDocument(); the main difference is in that this second one will always return a Text string (UTF-8 encoded).

SELECT XB_GetDocument(
    x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' , 2 );
---
<?xml version="1.0"?>
<a>
  <b>c</b>
</a>

SELECT XB_GetDocument(
    x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' , 8 );
---
<?xml version="1.0"?>
<a>
      <b>c</b>
</a>

SELECT XB_GetDocument(
    x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' , 0 );
---
<?xml version="1.0"?><a><b>c</b></a>
You can eventually pass a second argument to XB_GetDocument(), this corresponding to indentation.
In this case a nicely formatted and properly indented XmlDocument will be returned. Any indent factor ranging from 1 to 8 is actually supported; any other bigger number will silently be understood as 8.
Any negative value will simply return the original XML Document exactly as it was initially stored; 0 will return the whole XML Document as a single line.
The SQL function XB_GetPayload() as well supports a similar indentation optional argument.


3 - XmlBlob compression

XmlDocuments notoriously tend to be verbose: storing an XmlBlob in a compressed form usually saves half of the required storage (and possibly even more).
SELECT XB_IsCompressed(
     x'0001AB13000000130000000000BA0000CA0000DA0000DB0000DC0000DDCB3C613E3C623E616C7068613C2F623E3C2F613EBC9AB71406DD' );
---
0
The XmlBlob we've used in the previous example is uncompressed, as reported by XB_IsCompressed()

SELECT Hex( XB_Compress( 
     x'0001AB13000000130000000000BA0000CA0000DA0000DB0000DC0000DDCB3C613E3C623E616C7068613C2F623E3C2F613EBC9AB71406DD' ) );
---
0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD
The XB_Compress() SQL function will return a new compressed XmlBlob exactly corresponding to the input one.
If the input XmlBlob is already compressed this one is a harmless no-op.

SELECT XB_IsCompressed(
     x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' );
---
1
As reported by XB_IsCompressed() this second XmlBlob actually is of the compressed kind; anyway it's indisputably bigger than the uncompressed one !!!
Don't be worried: this paradoxical effect can be easily explained.
The corresponding XmlDocument is very short-sized (unrealistically short-sized): under these very peculiar circumstances there is not enough entropy supporting an effective compression, and the net result will have a negative impact (inflation instead of the expected deflation).
But under more realistic conditions (i.e. using an XmlDocument of some KB or more) you can reasonably expect an efficient compression ratio of 50% or even more.

SELECT Hex( XB_Uncompress( 
    x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' ) );
---
0001AB13000000130000000000BA0000CA0000DA0000DB0000DC0000DDCB3C613E3C623E616C7068613C2F623E3C2F613EBC9AB71406DD
The XB_Uncompress() SQL function will return a new uncompressed XmlBlob exactly corresponding to the input one.
If the input XmlBlob already is uncompressed this one is a harmless no-op.


4 - XML: well-formedness and Schema validation

it's now time passing to explore some more serious XML example.
SELECT Length( XB_Create( CastToBlob(
'<?xml version="1.0" encoding="utf-8"?>
<Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="books.xsd">
  <author firstName="Nathaniel" lastName="Hawthorne">
    <title>The Scarlet Letter</title>
  </author>
  <author firstName="William" middleName="Makepeace" lastName="Thackeray">
    <title>Vanity Fair</title>
    <title>The Luck of Barry Lyndon</title>
  </author>
</Books>' ), 1 ) );
---
287
XB_Create() accepts an optional second argument corresponding to compression=TRUE | FALSE

SELECT Length( XB_Create( CastToBlob(
'<?xml version="1.0" encoding="utf-8"?>
<Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="books.xsd">
  <author firstName="Nathaniel" lastName="Hawthorne">
    <title>The Scarlet Letter</title>
  </author>
  <author firstName="William" middleName="Makepeace" lastName="Thackeray">
    <title>Vanity Fair</title>
    <title>The Luck of Barry Lyndon</title>
  </author>
</Books>' ), 0 ) );
---
455
As you can easily notice, this time the compression effectively works as expected, because the document size is a reasonable one.
In both cases XB_Create() was able to create a valid XmlBlob because this XML sample formally respects the well-formedness rules, thus being successfully parsed. So it indisputably is a valid XMLDocument.

Anyway this first check simply is formal, and nothing states about the actual semantic correctness of this XmlDocument.
In order to get a more comprehensive and thorough validity check (i.e. XML Validation) we are required to define an appropriate XML Schema, as the following one:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema elementFormDefault="qualified" version="0.1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author" minOccurs="1" maxOccurs="unbounded">
	  <xs:complexType>
	    <xs:sequence>
              <xs:element name="title" type="xs:string" minOccurs="1" maxOccurs="unbounded" />
            </xs:sequence>
            <xs:attribute name="firstName" type="xs:string" use="required" />
            <xs:attribute name="middleName" type="xs:string" use="optional" />
            <xs:attribute name="lastName" type="xs:string" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
An XML Schema simply is yet another XmlDocument, representing the expected data layout of the target XMLDocument to be validated.
In this specific example:
Now you simply have to cut&paste the above XML Schema, saving it into some file on your local filesystem (usually XmlSchema documents are identified by the .xsd file-name extension).
SELECT  XB_Create ( CastToBlob (
'<?xml version="1.0" encoding="utf-8"?>
<Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="books.xsd">
  <author firstName="Nathaniel" lastName="Hawthorne">
    <title>The Scarlet Letter</title>
  </author>
  <author firstName="William" middleName="Makepeace" lastName="Thackeray">
    <title>Vanity Fair</title>
    <title>The Luck of Barry Lyndon</title>
  </author>
</Books>' ), 1, 'C:/sample.xsd' ) );
---
1
XB_Create() accepts a third facultative argument corresponding to the Schema URI; in this specific example the XmlSchema is expected to be found on a file into the local Windows filesystem (C:/sample.xsd).
When a SchemaURI is explicitly specified XB_Create() will perform a full XML Validation against the XmlSchema.
The XB_IsSchemaValidated() SQL function can then be used in order to detect if some XmlBlob has successfully passed a full XML Validation.

SELECT XB_Create ( CastToBlob (
'<?xml version="1.0" encoding="utf-8"?>
<Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="books.xsd">
  <author firstName="Nathaniel" lastName="Hawthorne">
    <novel>The Scarlet Letter</novel>
  </author>
  <author firstName="William" middleName="Makepeace" lastName="Thackeray">
    <title lang="en">Vanity Fair</title>
    <book>The Luck of Barry Lyndon</book>
  </author>
</Books>' ), 1, 'C:/sample.xsd' );
---
NULL
When a SchemaURI is explicitly defined and the XML Validation fails (for any reason) the XmlDocument will be assumed to be invalid, and a NULL value will be returned by XB_Create()

SELECT XB_GetLastValidateError();
---
Element 'novel': This element is not expected. Expected is ( title ).
Element 'title', attribute 'lang': The attribute 'lang' is not allowed.
Element 'book': This element is not expected. Expected is ( title ).
You can then call XB_GetLastValidateError() in order to get a meaningful message explaining why the most recent call to XB_Create() failed.


5 - The internal XML Schema cache

Performing XML validation is a reasonably fast operation; anyway downloading from the web some very complex XML Schema definition sometimes could require a noticeable time length. (and this usually is the case of e.g. Metadata Schemata)
SpatiaLite silently handles an internal cache of XML Schemata; once a Schema is successfully downloaded from the web it's immediately stored into this cache.
So any following reference to the same identical Schema URI will be serviced using the cached Schema, thus avoiding any further network access and allowing a noticeably faster processing.
SELECT XB_CacheFlush();();
The XB_CacheFlush() SQL function immediately resets such internal cache to its initial empty state.

Please note: the internal cache is strictly connection-related; each single connection has its own internal cache.

6 - Other useful misc SQL functions supporting XmlBlob

SELECT XB_GetSchemaURI( XmlBlob-value );
---
C:/sample.xsd
The XB_GetSchemaURI return the SchemaURI initially used to perform XML Validation (if any).

SELECT XB_GetInternalSchemaURI( CastToBlob (
'<?xml version="1.0" encoding="utf-8"?>
<Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="http://www.gaia-gis.it/books.xsd">
  <author firstName="Nathaniel" lastName="Hawthorne">
    <title>The Scarlet Letter</title>
  </author>
</Books>' ) );
---
http://www.gaia-gis.it/books.xsd
Please don't be confused: the XB_GetInternalSchemaURI() SQL function performs a completely different task:
SELECT XB_GetEncoding( XmlBlob-value );
---
utf-8
The XB_GetEncoding() SQL function will return the character encoding used by the XmlDocument stored within the XmlBlob.
Please note: NULL could be eventually returned, because some XmlDocuments has no explicit encoding declaration. In this case UTF-8 will be implicitly assumed as a silent default.

We've already encountered the XB_GetPayload() and XB_GetDocument() SQL functions. They are very closely related, but behaves quite differently under an important aspect:
This has many interesting further consequences:
SELECT XB_SchemaValidate( XmlBlob-value, SchemaURI );
---
a possibly SchemaValidated XmlBlob
We've already seen as XB_Create() directly supports XML Validation.
Anyway a two-steps approach is possible as well, by calling XB_Create() first (no Schema Validation), and then calling XB_SchemaValidate() in a second time so to perform a deferred XML Validation.

back to XmlBlob main page