SQLite

From OpenStreetMap Wiki
Jump to navigation Jump to search

SQLite (sqlite.org) is a container format for different kinds of data. It is not geographical by itself. SpatiaLite [2] extends SQLite with spatial indexing and spatial functions.

Overview

SQLite is about storing and querying huge lists of data in a single file. Telling that a file is SQLite does not tell what kind of data is therein (geo data, non-geo data, metadata) and how to query it programmatically.

GeoPackage and MBTiles specialize SQLite and tell what kind of data to expect and how to query it programmatically. They tell how geo data is stored in the form of a list of features and their geometries, or an archive of vector tiles, or an archive of raster tiles.

ogr2ogr has its own specialization of SQLite, which is documented here and is conceptually on the same level as GeoPackage, MBTiles, etc.

When people say SQLite, they can mean

  • a SQLite file that is not a GeoPackage, MBTiles, etc.
  • any SQLite file, including GeoPackage, MBTiles, etc.
  • ogr2ogr's specialization of SQLite

Downloading

Some processed data providers provide SQLite files (including specializations, like GeoPackage). Note that that page does not list all the formats for all the providers.

Converting

You can download OSM in PBF or another format, then convert via ogr2ogr. ogr2ogr supports OSM XML since version 2.4. See below for details on how to use ogr2ogr.

Other options may be found at Converting map data between formats.

The now-abandoned OSMLib was able to convert to and from SQLite and OSM XML (osmlib-sqlite docs).

ogr2ogr

ogr2ogr can create a SQLite + SpatiaLite file (documented here), but you usually want to follow the instructions of a specialized format, such as GeoPackage, plus the general indications that follow here and apply to all of the SQLite-based formats implemented in GDAL.

You can optionally customize the conversion: copy this file on your machine (call it osmconf.ini), edit the file to your liking, then do export OSM_CONFIG_FILE=osmconf.ini.

GDAL version 3.8.0 substantially improves the SQLite creation performance.[1]

To improve creation performance, increase the cache memory size, for example 512 MB or 1024 MB: export OGR_SQLITE_CACHE=512

If, in case of a power loss, you plan to trash the half-made DB and start over, you will not need fsync()[2], which otherwise slows down creation a lot. You can disable fsync with any of these

  • export OGR_SQLITE_SYNCHRONOUS=OFF
  • creating the SQLite file on a tmpfs (because fsync() does nothing on tmpfs)
  • using eatmydata / libeatmydata


Optimize searching by tag

ogr2ogr stores the OSM tags in SQLite or GeoPackage files in suboptimal ways (JSON, HSTORE, ogrconf.ini's attributes). There is a feature request to store in two columns key and value[3] to allow for a query that is performant, readable and flexible at the same time. For now, you can implement that schema by hand, see SQLite/fid_to_tag schema.

SQLite files and QGis

Often, it's more useful to store geodata in small SQLite. It doesn't have geo-functions or a spatial index, like SpatiaLite does, but this is often not needed at all. SQLite gives a very lightweight way to store such information.

QGIS can produce both kinds of DBs, but it doesn't add the correct CRS/SRS (information about the coordinate system being used). That might be a bug, but doesn't work up to version 2.18.3 and maybe later. The problem results into QGIS complaining about a missing CRS/SRS on re-opening at file and forces it to be EPSG:4326. That's good for OpenStreetMap data, but not really usefull for anything else.

To add the missing CRS/SRS, keep in mind to store metadata in QGIS and select "SQLite" as export format (not "SpatiaLite"!). In the following example, I'll set SRID to 3035 used in Europe for maps about statistics:

  1. Open the SQLite file, e.g., with sqlite3: sqlite3 database.sqlite
  2. Add the information about spatial reference to the geometry column (table is "demo_data"):
UPDATE geometry_columns SET srid = 3035 WHERE table = 'demo_data';
  1. Add the definition of the spatial reference:
INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, srtext) VALUES (3035, 'epsg', 3035, 'PROJCS["ETRS89 / ETRS-LAEA",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_1989",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6258"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4258"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Lambert_Azimuthal_Equal_Area"],PARAMETER["latitude_of_center",52],PARAMETER["longitude_of_center",10],PARAMETER["false_easting",4321000],PARAMETER["false_northing",3210000],AUTHORITY["EPSG","3035"],AXIS["X",EAST],AXIS["Y",NORTH]]');

Now you're done. You can look up the string for spatial reference, f.e. on this website: https://spatialreference.org/ref/epsg/3035/postgis/ But this is important:

  1. Remove "proj4text," from the bracketed part and remove "+proj=..." from the second one.
  2. Make the first number of the second bracketed part be equal to your EPSG-code, f.e. "3035".

Tools using SQLite

Microcosm is a map API server that uses Sqlite and the R*tree module.

General usage

Visualisation tools

General tools to navigate in a SQLite database:

  • sqlitebrowser: open the tab Navigate data, select the table in the dropdown box, filter columns by typing something in the text filters below the table column headers
  • sqlite3 command-line program:
    • Type sqlite3 to run SQL interactively in the terminal window
    • Or give it a SQL as an argument. For example, using the above optimization tutorial: sqlite3 file.sqlite "select fid from points_fid_to_tag where key = 'amenity' and value = 'atm' limit 10;"

Indices

You can make indices on any column of any DB table. You can pass a query to .expert to tell SQLite automatically suggest indices that speed up that specific query. [4] [5] Beware that WHERE uses indices which have WHERE's columns as their first columns[6] and that at most one index is used for each table in a query[7].

For example, ogr2ogr makes a column osm_id. To make an index to speed up the WHERE osm_id = ... part of a query, run:

CREATE INDEX index_points_by_osm_id ON points(osm_id);

To check that a query uses an index:

username@machine$ sqlite3 the_file.sqlite
sqlite> EXPLAIN QUERY PLAN
   ...>  SELECT points.geometry
   ...>    FROM points
   ...>   WHERE points.osm_id = ...
   ...>     AND ...;
QUERY PLAN
...
sqlite>
sqlite> -- Make indices per above
sqlite> CREATE INDEX ...
sqlite>
sqlite> -- Test the same query
sqlite> EXPLAIN QUERY PLAN
   ...>  SELECT points.geometry
   ...>    FROM points
   ...>   WHERE points.osm_id = ...
   ...>     AND ...;
QUERY PLAN
...    -- This output should be different

See also

References

  1. release notes, details
  2. The details are at "PRAGMA schema.synchronous" at [1]
  3. https://github.com/OSGeo/gdal/issues/7552
  4. .expert is documented here in section 17.
  5. If you run .expert and it gives error sqlite3_expert_new: no such module: VirtualSpatialIndex on Debian, then install the package libsqlite3-mod-spatialite and run SELECT load_extension("mod_spatialite");.
  6. query planner documentation's section "3. WHERE Clause Analysis"
  7. query planner documentation's section "8. Choosing Between Multiple Indexes"