PostgreSQL


PostgreSQL (postgresql.org), historically named Postgres, is an open source RDBMS (relational database management system). Since API v0.6 it is used for the main central OSM database running on OSMF hardware. For technical details of how this can be set up to work with the OSM web app, see The Rails Port and Database schema.

Geospatial capabilities

The PostGIS extension for PostgreSQL is often used for geographic data; it adds geospatial functions and datatypes. PostGIS is not used for the core OSM database, but it is used on the tile server database, where the Mapnik rendering engine requires it.
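
For a tile server database you manage yourself, PostGIS is enabled per database. A minimal sketch, assuming the database is called gis (the name is only an example):

sudo -u postgres psql -d gis -c "CREATE EXTENSION IF NOT EXISTS postgis;"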

There are a number of tools that can import OSM data into a PostgreSQL/PostGIS database. The most frequently used ones are:

  • osm2pgsql handles transformation of the planet data into the expected format (see the example invocation after this list).
  • Imposm is a PostGIS importer that supports custom database schemas and can import planet files with low memory usage.
  • Osmosis can be used to populate a more general PostgreSQL/PostGIS database from a Planet.osm data dump file.
  • osm2pgrouting imports data into a PostgreSQL/PostGIS database for routing purposes.
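
As an illustration of the most common case, a hedged sketch of an osm2pgsql import into a PostGIS-enabled database named gis (database name, cache size and input file are assumptions, not recommendations):

osm2pgsql -d gis --create --slim -C 2500 planet-latest.osm.pbf

--slim keeps intermediate data in the database so the import can later be updated with diffs; -C sets the node cache size in megabytes.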

Tune the database

Since we are predominantly doing bulk loads and handling lots of data, it is advisable to tune Postgres a bit for this workload; the default settings are generally fairly conservative. The following is only a brief overview of the settings you need to think about; read the PostgreSQL documentation for more information. Remember at all times that the rest of your system (including a webserver, for example) also needs memory to run, so don't starve it just for PostgreSQL. You need to edit the file postgresql.conf, which may be in /etc/postgresql/<version>/main or /var/lib/pgsql/data or similar depending on your distribution. The server needs to be restarted after some of the changes. On most systems you can use systemd for the job:

sudo systemctl restart postgresql
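
If you are unsure where your distribution put postgresql.conf, the running server can tell you, and after a restart you can confirm that a setting took effect. A small sketch using psql:

sudo -u postgres psql -c "SHOW config_file;"     # prints the path to postgresql.conf
sudo -u postgres psql -c "SHOW shared_buffers;"  # verify a setting after the restart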

Memory usage

shared_buffers = 4GB

This controls the amount of memory PostgreSQL uses to cache data such as index pages and data pages locally. More is not necessarily better: remember that osm2pgsql needs lots of memory too, and this memory is permanently reserved for PostgreSQL. 1GB is a good setting for machines with 4-8GB of RAM, 2GB for machines with up to 32GB, and 4GB for machines with more RAM.
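
Instead of editing postgresql.conf by hand, the setting can also be persisted from SQL; a sketch, with the 2GB value assumed for a mid-sized machine rather than prescribed:

ALTER SYSTEM SET shared_buffers = '2GB';   -- written to postgresql.auto.conf; needs a restart to take effect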

maintenance_work_mem = 10GB
autovacuum_work_mem = 1GB

maintenance_work_mem is the amount of memory used while building indexes. The larger the value, the faster indexes will be built after the import. osm2pgsql has freed most of its memory at this point, so you can be more generous here. autovacuum_work_mem is the memory used when PostgreSQL cleans tables in the background. By default it is the same as maintenance_work_mem. PostgreSQL uses multiple workers for autovacuuming (see autovacuum_max_workers) and each of them gets this amount of memory, so if you increase maintenance_work_mem, make sure to set autovacuum_work_mem explicitly to keep it low.
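
To make the interaction concrete, a sketch of the arithmetic with the default number of autovacuum workers (the values are illustrative, not recommendations):

maintenance_work_mem = 10GB     # used for each index build after the import
autovacuum_work_mem = 1GB       # otherwise each autovacuum worker would inherit the 10GB above
autovacuum_max_workers = 3      # default; 3 workers x 1GB = 3GB reserved for background vacuuming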

work_mem = 256MB

This setting is the amount of memory used when performing an ORDER BY for which no index can be used. osm2pgsql pre-orders some tables before creating indexes on them, and for this the default setting is once again far too small. All notes on maintenance_work_mem above apply to work_mem, too.
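
If you want to check whether a particular sort still spills to disk, EXPLAIN ANALYZE reports the sort method. A sketch against a table name typical for osm2pgsql output (the table and column names are assumptions):

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM planet_osm_line ORDER BY osm_id;
-- look for "Sort Method: external merge  Disk: ..." (spilled) versus "Sort Method: quicksort  Memory: ..." (fit in work_mem)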

Checkpointing

Checkpoints are points in time at which PostgreSQL guarantees that all changed data and indexes have been written to disk. Checkpoints can be configured to occur more or less often. Checkpointing is expensive, so the less often checkpoints occur, the faster the database runs. However, when the system crashes, PostgreSQL needs to recover any data written after the last checkpoint from the WAL (write-ahead log) files. The further apart the checkpoints are, the longer it takes to recover from a crash and the more space you need to set aside for WAL files.

checkpoint_timeout = 10min
max_wal_size = 20GB

There are two parameters that influence checkpointing: checkpoint_timeout states the maximum time between two checkpoints, and max_wal_size gives the maximum size the WAL may grow to before another checkpoint happens. You want to increase both parameters, but to determine the optimal settings you need to experiment a bit. Check your PostgreSQL logs for lines like the following:

LOG:  checkpoints are occurring too frequently (28 seconds apart)
HINT:  Consider increasing the configuration parameter "max_wal_size".

As a rule, max_wal_size should be big enough that it is never reached before checkpoint_timeout expires.
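
To see how often and why checkpoints actually happen on your system, you can have the server log each one; a minimal sketch for postgresql.conf:

log_checkpoints = on    # write a log line with timing details for every checkpoint

The statistics view pg_stat_bgwriter (its checkpoint counters moved, under new names, to pg_stat_checkpointer in newer releases) also distinguishes checkpoints triggered by checkpoint_timeout from those requested for other reasons, such as hitting max_wal_size; ideally almost all of them are the timed kind.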

See also