Update of "Supporting GeoJSON"

Overview

Artifact ID: 680b5f8ea26cfdade55a6f7b3a693aebaf735957
Page Name: Supporting GeoJSON
Date: 2019-01-21 12:48:13
Original User: sandro
Parent: c60685c4e86d0994b6066e85e481afd1d22a3e8d
Next: 2c5eed193c4b1282fe1c3120b1b8bcc340667de3
Content


Introduction

GeoJSON is an open standard data format based on JSON (JavaScript Object Notation), a very popular data format widely adopted by web apps as a replacement for XML.
The specific scope of GeoJSON is to extend the basic capabilities of JSON so as to adequately support geographic features, including both Geometries and non-spatial attributes.
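
As an illustration of how a geographic feature combines a Geometry with non-spatial attributes, here is a minimal sketch using Python's standard json module. The place name, coordinates and attribute values are purely hypothetical.

```python
import json

# A hypothetical feature: every name and value here is illustrative only.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # RFC 7946 order: longitude first, then latitude.
        "coordinates": [12.5, 41.9],
    },
    # Non-spatial attributes live in the "properties" member.
    "properties": {
        "name": "Sample Town",
        "population": 1200,
        "capital": False,
        "notes": None,
    },
}

collection = {"type": "FeatureCollection", "features": [feature]}

# Serialize to GeoJSON text, then parse it back into Python objects.
text = json.dumps(collection, indent=2)
parsed = json.loads(text)
print(parsed["features"][0]["geometry"]["type"])  # Point
```

The whole feature is plain JSON, so any generic JSON library can read or write it.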

GeoJSON has existed for many years, based on a loose and informal data specification.
Only recently (2016) did it finally become a respectable standard format based on a formal specification, namely RFC 7946 released by the IETF (Internet Engineering Task Force).
Note: RFC 7946 introduced several relevant requirements and restrictions, so pre-RFC and post-RFC GeoJSON files are not mutually interoperable.

A very remarkable feature of RFC 7946 is that it is explicitly declared to be a fixed and immutable specification.
There will never be updated versions of GeoJSON; even the slightest change will inexorably require changing the name from GeoJSON to something else.
Such a restriction is obviously intended to ensure very strong stability over time.

The most obvious competitors of GeoJSON are the ESRI Shapefile and GML.
The following chart quickly summarizes the main differences between them.

Category | Shapefile | GML | GeoJSON | Remarks

File organization

Shapefile:
At least three independent files sharing the same name, identified respectively by the suffixes .shp, .shx and .dbf.
  • Both the .shp and .shx members are binary files intended to store Geometries, and should be encoded according to the ESRI open specification.
  • The .dbf member is intended to store non-spatial attributes.
    This too is a binary file, expected to be encoded according to the Ashton-Tate dBase specification; unfortunately this very old specification (born in the '80s) underwent a wild proliferation of different dialects (Clipper, FoxPro), becoming quite messy and chaotic.

GML:
GML is based on XML, and consequently just requires a single, monolithic text file.
Like any other XML file, GML can be strictly constrained to conform to a formally defined XML Schema.

GeoJSON:
Single monolithic text file.
Similar in this to XML, but explicitly intended to be far simpler and less verbose.

Remarks:
The three-file layout of Shapefile is clearly obsolete, and it frequently causes headaches and unexpected trouble.

The single-file layout adopted by both GML and GeoJSON is clearly better and safer; being text files, they can be easily inspected and, if needed, debugged using any generic text editor, without requiring any specific tool.

Supported Geometry classes

Shapefile:
  • Null Shape
  • Point
  • MultiPoint
  • PolyLine (without distinguishing between single- and multi-part)
  • Polygon (without distinguishing between single- and multi-part)

Notes:
  • All Geometries in the same Shapefile must share the same class (or be Null).
  • All Geometries in the same Shapefile must share the same SRID.
  • The rules for identifying Exterior and Interior Polygon rings are awkward and can frequently cause interoperability issues.

GML:
GML allows many different ways of defining the same Geometry, and the specifications changed radically from version to version.

GML has really impressive flexibility (e.g. each single Geometry can freely declare its own SRID), but at the cost of overwhelming complexity.

GeoJSON:
  • Null
  • Point
  • LineString
  • Polygon
  • MultiPoint
  • MultiLineString
  • MultiPolygon
  • GeometryCollection

Notes:
  • This corresponds exactly to the standard 7-class model adopted by Spatial SQL.
  • The same GeoJSON file can freely contain any kind of Geometry class without restrictions.
  • All Geometries in the same GeoJSON file must share the same SRID.

Remarks:
  • Shapefile is obviously obsolete, and somewhat messy and limited.
  • GML is elegant and very sophisticated: sometimes too sophisticated and complex to be really usable.
  • GeoJSON matches Spatial SQL requirements very well, and is simple enough to avoid any unnecessary complexity.
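
To illustrate that a single GeoJSON file can mix Geometry classes, the following minimal sketch counts the classes found in an already parsed FeatureCollection. The helper name count_geometry_types and the sample data are hypothetical, not part of any library.

```python
from collections import Counter

# The seven geometry type names defined by RFC 7946.
GEOJSON_GEOMETRY_TYPES = {
    "Point", "LineString", "Polygon",
    "MultiPoint", "MultiLineString", "MultiPolygon",
    "GeometryCollection",
}

def count_geometry_types(collection):
    """Count geometry classes in a parsed FeatureCollection (hypothetical helper)."""
    counts = Counter()
    for feature in collection["features"]:
        geometry = feature.get("geometry")
        if geometry is None:
            # A feature without location carries a JSON null geometry.
            counts["Null"] += 1
        elif geometry["type"] in GEOJSON_GEOMETRY_TYPES:
            counts[geometry["type"]] += 1
        else:
            raise ValueError(f"unknown geometry type: {geometry['type']!r}")
    return counts

# Illustrative data only: three features with three different classes.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [12.5, 41.9]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString",
                      "coordinates": [[12.5, 41.9], [12.6, 42.0]]}},
        {"type": "Feature", "properties": {}, "geometry": None},
    ],
}

counts = count_geometry_types(sample)
print(dict(counts))
```

A Shapefile reader would have to reject such a mix, since all Geometries in a Shapefile must share the same class.
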

Supported dimensions

Shapefile:
  • XY
  • XYM
  • XYZ
  • XYZM

GML:
  • XY
  • XYM
  • XYZ
  • XYZM

GeoJSON:
  • XY
  • XYZ

Note: RFC 7946 supports only 2 or 3 coordinates, and the third value (when declared) is always expected to be an Elevation (Z axis).

Supporting XYM or XYZM is technically feasible. Both writers and readers could support such options, but this is outside the standard and will impair the universal portability of any non-canonical file.

Remarks:
GeoJSON cannot support XYM and XYZM except by adopting nasty tricks.
It may well not be a prohibitive limitation in many common cases, but it is indisputably a limitation.
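
The dimension rule described for GeoJSON can be sketched as a small check on a single position. The function name position_dimension is hypothetical, and rejecting longer positions is one possible policy for a strict reader, not something every reader does.

```python
def position_dimension(position):
    """Return 'XY' or 'XYZ' for a canonical position (hypothetical helper)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in position):
        raise ValueError("a position must contain only numbers")
    if len(position) == 2:
        return "XY"
    if len(position) == 3:
        # The third value is interpreted as an elevation (Z), never as M.
        return "XYZ"
    raise ValueError(f"non-canonical position with {len(position)} values")

print(position_dimension([12.5, 41.9]))         # XY
print(position_dimension([12.5, 41.9, 100.0]))  # XYZ
```

A position carrying four values, such as a would-be XYZM point, raises ValueError in this sketch.
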

Intended SRID

Shapefile:
Not internally declared by the Shapefile itself.
Deploying a further .prj file describing the intended SRID is the usual solution adopted by ESRI itself, but correctly parsing these extra files is usually a fragile process, beyond the real capabilities of many third-party readers.

GML:
Each single Geometry is allowed to freely define its own SRID; it is also possible to define the SRID for a whole layer.

GeoJSON:
According to RFC 7946 all coordinates are always expected to be expressed as longitudes and latitudes (in this exact order).
So any canonical GeoJSON file is always expected to reference SRID=4326 WGS 84.

Using any other SRID is technically possible, but requires a prior agreement between writers and readers; this is outside the standard and will impair the universal portability of any non-canonical file.

Remarks:
The only effective solution is the one adopted by GML.
Both Shapefile and GeoJSON are clearly inferior in this specific respect.
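
Since a GeoJSON file carries no SRID declaration, a reader can only apply plausibility checks. The following minimal sketch tests whether a position fits the WGS 84 longitude/latitude ranges; the function name looks_like_lon_lat and the sample values are hypothetical.

```python
def looks_like_lon_lat(position):
    """True when the first two values fit WGS 84 ranges (hypothetical helper)."""
    lon, lat = position[0], position[1]
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0

print(looks_like_lon_lat([139.7, 35.7]))           # True
print(looks_like_lon_lat([35.7, 139.7]))           # False
print(looks_like_lon_lat([500000.0, 4649776.0]))   # False
```

This is only a heuristic: swapped coordinates that both fall within ±90 degrees still pass, so the check can flag obviously projected or swapped values but cannot prove that a file is canonical.
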

Supported non-spatial attributes

Shapefile:
  • CHAR (limited to max. 254 bytes)
  • NUMBER (represented by an ASCII string of max. 32 bytes)
  • DATE (YYYYMMDD)
  • LOGICAL (T/F)

Note: all attribute names are limited to a length of max. 10 bytes. There is no safe way of declaring NULL values.

GML:
Any possible datatype you can imagine.
Defining further derived datatypes is an option supported by XML Schema.

Note: attribute names and text values can have any arbitrary, unconstrained length.

GeoJSON:
  • text (unconstrained length)
  • number
  • null
  • true
  • false

Note: attribute names can have any arbitrary, unconstrained length.

Remarks:
  • Shapefile (or more precisely, in this case, DBF) clearly suffers from too many unpleasant limitations.
  • GML (more precisely, XML) can effectively support impressive flexibility, but can easily become too complex and heavy to parse.
  • GeoJSON offers a well-balanced mix; it is still reasonably simple and powerful at the same time.
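
As a rough illustration of the attribute types listed for GeoJSON, this sketch parses a hypothetical properties object with Python's json module and reports the resulting Python types. It also flags names that would not fit the 10-byte DBF limit mentioned for Shapefile; measuring names in UTF-8 bytes is an assumption made here for simplicity.

```python
import json

# Hypothetical attribute values, for illustration only.
text = ('{"name": "Sample Town", "population_density": 85.5, '
        '"capital": false, "notes": null}')
properties = json.loads(text)

# JSON text/number/true/false/null map to str, int or float, bool, None.
types = {key: type(value).__name__ for key, value in properties.items()}
print(types)

def names_too_long_for_dbf(props, limit=10):
    """List attribute names exceeding the DBF name limit (hypothetical helper)."""
    # Assumption: the limit is checked against the UTF-8 encoded name.
    return [name for name in props if len(name.encode("utf-8")) > limit]

too_long = names_too_long_for_dbf(properties)
print(too_long)  # ['population_density']
```

A name such as population_density is perfectly legal in GeoJSON but would have to be truncated when exporting to a Shapefile.
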

Charset encoding

Shapefile:
Not internally defined by the Shapefile itself.
Attempting to guess the appropriate charset encoding required by a given Shapefile is more a magic art than a rational science.

GML:
Always internally defined by the GML/XML file itself.

GeoJSON:
RFC 7946 strictly requires that all GeoJSON files be encoded as UTF-8.

In pure theory both UTF-16 and UTF-32 could be used for encoding a legitimate GeoJSON file, but such options seem to be very rarely (if ever) adopted in the real world.

Remarks:
  • Shapefile leaves a lot to be desired, and misunderstanding the appropriate charset encoding quite often causes serious portability issues.
  • GML/XML nicely supports any possible charset in the most flexible (and safe) way.
  • Once again, GeoJSON is straightforwardly simple but really effective.
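
The UTF-8 rule can be sketched in Python by always passing an explicit encoding when reading or writing a GeoJSON file. The helper names save_geojson and load_geojson, the file name and the sample data are all hypothetical.

```python
import json
import os
import tempfile

def save_geojson(path, data):
    """Write GeoJSON as UTF-8 text (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII characters as real UTF-8 bytes.
        json.dump(data, handle, ensure_ascii=False)

def load_geojson(path):
    """Read a GeoJSON file, assuming UTF-8 (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)

# Illustrative feature with a non-ASCII attribute value.
data = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [12.5, 41.9]},
    "properties": {"name": "Città"},
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.geojson")
    save_geojson(path, data)
    with open(path, "rb") as handle:
        raw = handle.read()
    loaded = load_geojson(path)

print(loaded == data)  # True
```

Stating the encoding explicitly avoids depending on the platform default, which is exactly the kind of guessing that makes Shapefile attributes fragile.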

