Public Transport Network Analysis/Syntax of CSV data

From OpenStreetMap Wiki

This page describes the CSV-based format used as input for PTNA.

Introduction

PTNA produces an analysis page based on a source CSV file. The CSV file defines which routes are analyzed, but also defines the page layout: sections, text, hyperlinks.

CSV stands for comma-separated values. Strictly speaking, only the route definition lines are CSV (with semicolons instead of commas as separators); the other line types are not. Because route definitions are the most important part of the file, the whole format is nevertheless referred to as CSV.

The CSV is usually hosted in the OSM wiki, where anyone is free to edit it. For example, the analysis for US-WA-KCM is generated based on the CSV in this wiki page: Washington/Public Transport/King County Metro/PTNA/Analysis/Routes.

The format is interpreted line-by-line, with the first non-space character in the line determining the meaning of the line according to this table:

Character             Line meaning                    Examples
#                     Comment                         # This is a comment
=                     Header                          == Section title
                                                      === Subsection title
-                     Plain text                      - This is a paragraph of text that will appear in the analysis page
@                     Data import definition          @train
                                                      @@
+                     Reserved for later extensions
~                     Reserved for later extensions
$                     Reserved for later extensions
Any other character   Route definition                20;bus;"Academy-Watt";;;Winnipeg Transit;CA-MB-WT;20;
"                                                     "-5";bus

Ignored lines

These lines are only used to make the CSV file easier to read and edit, and are completely ignored by PTNA. They are only visible when editing the CSV, never in the analysis page.

Blank lines

Lines with only whitespace do nothing.

Comments: #

Comment lines start with #. Comments can only start at the beginning of a line. If # appears in the middle of a line, it has no special meaning.

# This is a comment, it will not appear in the analysis page
- The number sign # can appear within a plain text line (or any other kind of line) and it will show up in the analysis page like any other text
- Bus line #555 is the blue line
123;bus;;;; # This is NOT a comment - this is the operator of this route
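The first-character rule, including the fact that # only counts at the start of a line, can be sketched in Python as follows. This is only an illustration, not PTNA's actual implementation: the function name classify_line and the returned labels ("blank", "import-end", and so on) are hypothetical.

```python
def classify_line(line: str) -> str:
    """Return the kind of a PTNA CSV line, judged by its first non-space character."""
    stripped = line.lstrip()
    if not stripped:
        return "blank"  # lines with only whitespace do nothing
    if stripped.startswith("@@"):
        return "import-end"
    kinds = {
        "#": "comment",
        "=": "header",
        "-": "text",
        "@": "import-filter",
        "+": "reserved",
        "~": "reserved",
        "$": "reserved",
    }
    # Any other first character (including a double quote) means a route definition.
    return kinds.get(stripped[0], "route")


for example in ["# a comment", "- Bus line #555 is the blue line", "123;bus;;;; # operator"]:
    print(classify_line(example), "|", example)
```

Note that the last example is classified as a route: the # in the middle of the line is ordinary text that ends up in the operator field.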

Layout, text, and formatting

The analysis page can include text paragraphs, as well as headers for page structure and navigation. Both headers and paragraphs can include links and formatting.

Headers: =

Header lines start with =. The number of = determines the header level.

= Top title
== First main section
=== Subsection

== Second main section
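As a rough illustration of how the header level follows from the number of leading = characters, here is a minimal Python sketch. The function name parse_header is hypothetical; how PTNA maps levels to the resulting page structure is not covered here.

```python
def parse_header(line: str) -> tuple[int, str]:
    """Split a header line into (level, title); the level is the count of leading '='."""
    stripped = line.strip()
    level = len(stripped) - len(stripped.lstrip("="))
    if level == 0:
        raise ValueError("not a header line")
    return level, stripped[level:].strip()


print(parse_header("= Top title"))
print(parse_header("=== Subsection"))
```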

Plain text: -

Text lines start with -. The text after the - appears verbatim in the resulting analysis page, subject to formatting.

= Top title
== First main section
- Text description or explanation.
- Any number of text lines can appear consecutively. 

- Consecutive text lines get rendered as single paragraphs! HTML: <p> ... </p>
- Line breaks can be added by adding an empty text line:
-
- Like that. In HTML, a new paragraph is used. <p> Like that. </p>

=== Subsection
- Lorem ipsum ...

== Second main section
- etc.
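The paragraph rule shown in the example (consecutive text lines form one paragraph, an empty text line starts a new one) can be sketched in Python like this. It is a simplified illustration under stated assumptions: the function name group_paragraphs is hypothetical, the input is assumed to contain only text lines, and joining lines with a single space is a guess at the rendering.

```python
def group_paragraphs(text_lines: list[str]) -> list[str]:
    """Group consecutive '-' lines into paragraphs; a bare '-' ends the current paragraph."""
    paragraphs: list[str] = []
    current: list[str] = []
    for line in text_lines:
        content = line.lstrip()[1:].strip()  # drop the leading '-' marker
        if content:
            current.append(content)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


lines = [
    "- Text description or explanation.",
    "- Any number of text lines can appear consecutively.",
    "-",
    "- Like that.",
]
for paragraph in group_paragraphs(lines):
    print(f"<p>{paragraph}</p>")
```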

Formatting and links

MediaWiki-based syntax can be used anywhere within headers and plain text for formatting and linking:

Example                                    Example result      Description
''italic''                                 italic              Italic text
'''bold'''                                 bold                Bold text
'''''bold and italic'''''                  bold and italic     Bold and italic text
!!!warning!!!                              warning             Yellow background, larger and more prominent text
[[PTNA]]                                   PTNA                Link to OSM wiki page
[[Key:route|route tag]]                    route tag           Link to OSM wiki page, with custom text
[https://example.com/dolphins Dolphins]    Dolphins            Link to arbitrary URL
[/results/index.html results index]        results index       Link to internal PTNA URL

== Buses in [[London]]
- The [[Tag:route=bus|bus routes]] in '''London'''. !!!More info!!! on the [https://example.com/london/bus official London bus website].
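To make the markup rules more concrete, the following Python sketch converts the syntax listed above into HTML. It is an approximation only and not PTNA's renderer: the function name render_inline, the WIKI_BASE URL prefix, and the CSS class name "attention" are all assumptions made for this example.

```python
import html
import re

# Assumption: OSM wiki pages are addressed below this prefix.
WIKI_BASE = "https://wiki.openstreetmap.org/wiki/"


def _wiki_link(match: re.Match) -> str:
    page = match.group(1)
    label = match.group(2) or page  # fall back to the page name when no custom text is given
    return f'<a href="{WIKI_BASE}{page.replace(" ", "_")}">{label}</a>'


def render_inline(text: str) -> str:
    """Convert the wiki-style markup of a header or text line into HTML (simplified)."""
    out = html.escape(text, quote=False)
    # [[Page]] and [[Page|custom text]]
    out = re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", _wiki_link, out)
    # [target label] for arbitrary URLs and internal PTNA URLs
    out = re.sub(r"\[([^\s\]]+) ([^\]]+)\]", r'<a href="\1">\2</a>', out)
    # Longest quote runs first, so ''''' is not mistaken for ''' or ''
    out = re.sub(r"'''''(.+?)'''''", r"<b><i>\1</i></b>", out)
    out = re.sub(r"'''(.+?)'''", r"<b>\1</b>", out)
    out = re.sub(r"''(.+?)''", r"<i>\1</i>", out)
    out = re.sub(r"!!!(.+?)!!!", r'<span class="attention">\1</span>', out)
    return out


print(render_inline("The [[Tag:route=bus|bus routes]] in '''London'''."))
```

The substitutions are applied in a fixed order (links first, then the longest quote runs) so that shorter patterns do not consume parts of longer ones.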

Route definition

Route definitions are arguably the most important lines in the file: they define the routes to be analyzed, as they are expected to appear in the OpenStreetMap data.

Route

Each route is defined in its own line in the following format:

ref;type;comment;from;to;operator;gtfs-feed;gtfs-route-id;gtfs-release-date

Only ref and type are strictly required; the rest may be left empty. If fewer than 9 fields are provided, the missing ones are assumed to be empty.

For example:

5;bus;;Emerald City;Hill Valley;Dreambus
66;bus

If any field includes a semicolon, it must be quoted:

6;light_rail;"May stop service soon; Check operator website to see if the route still exists"

If any field includes a double-quote character, the field must be quoted and each double-quote inside it must be "escaped" by doubling it:

SC-NY;ferry;"Ferry is called ""Statue Cruises-New York Ferry"" operates from ""Battery Park"" via ""Ellis Island"" and ""Liberty Island"" back to ""Battery Park"""
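The quoting and padding rules above match what Python's standard csv module does with a semicolon delimiter, so a route line can be read as in the following sketch. This is an illustration, not PTNA's own parser; the names FIELD_NAMES and parse_route_line are hypothetical.

```python
import csv

FIELD_NAMES = [
    "ref", "type", "comment", "from", "to", "operator",
    "gtfs-feed", "gtfs-route-id", "gtfs-release-date",
]


def parse_route_line(line: str) -> dict[str, str]:
    """Parse one route definition line into a dict with all 9 fields present."""
    row = next(csv.reader([line], delimiter=";", quotechar='"', doublequote=True))
    row = row[:len(FIELD_NAMES)]                     # ignore anything beyond 9 fields
    row += [""] * (len(FIELD_NAMES) - len(row))      # missing fields are treated as empty
    return dict(zip(FIELD_NAMES, row))


route = parse_route_line('6;light_rail;"May stop service soon; Check operator website"')
print(route["ref"], route["type"], route["comment"], sep=" | ")
```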

The fields, in order (number, name, description, example values):

1  ref
   Required. Tag ref=* of route or route_master.
   Example values:
       250                 defines that routes with 'ref'='250' are expected here
       250|250a|250b       defines that routes with 'ref'='250' and 'ref'='250a' and 'ref'='250b' are allowed here. Whether this is allowed after PTv1/PTv2 is another matter
       605/50              defines that here 'ref' of two 'network' are valid: it is checked, whether 'ref:network1'='605' and 'ref:network2'='50' exist
       "139;142"           defines that the route with two numbers exists in the same 'network'.
       "280;281|280|281"   could be used when 'route_master' has 'ref' "280;281" and the two 'route' members have 'ref' "280" and "281" respectively
       "+210"              enables the reserved character '+' at the beginning of 'ref'.

2  type
   Required. Content of the tag route=* or route_master=*.
   Example values:
       bus
       train
       tram
       subway

3  comment
   Optional. Will not be evaluated, just output. Will be displayed together with the route in the analysis page.
   The formatting syntax used in plain text is available to use. Yellow highlighting can be shortened to just one exclamation mark instead of three: !text!
   Remember that if your comment includes a semicolon, you must wrap it in quotes (see example).
   Example values:
       call taxi
       express bus
       "semicolon ; in comment must be quoted"
       !seasonal route!

4  from
   Usually optional. Used to differentiate between several routes with identical ref, type and operator. Should match the value of from=* for the route or route_master.

5  to
   Usually optional. Used to differentiate between several routes with identical ref, type and operator. Should match the value of to=* for the route or route_master.

6  operator
   Usually optional. Used to differentiate between several routes with identical ref and type. Should match the value of operator=* for the route or route_master.
   The value of 'operator' may contain ';', but must then be in "...".
   Example values:
       Metro Transit

7  gtfs-feed
   Optional. Reference to a source in the GTFS analysis of PTNA where this route can be found. Normally should be a value from List of GTFS feeds.
   Example values:
       DE-BY-MVV

8  gtfs-route-id
   Optional. Reference to a "route_id" in the GTFS data that belongs to this route.
   Example values:
       12345

9  gtfs-release-date
   Optional. Reference to special release of the GTFS data.
   Example values:
       2020-08-18
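When route lines are generated by a script, the quoting rules from this section (semicolons, double quotes, and a reserved character at the start of ref) have to be applied in reverse. The following Python sketch shows one way to do that; the names RESERVED_FIRST_CHARS, quote_field and format_route_line are hypothetical, and dropping trailing empty fields is a stylistic choice rather than a requirement.

```python
# Characters that give a line a special meaning when they come first.
RESERVED_FIRST_CHARS = ("#", "=", "-", "@", "+", "~", "$")


def quote_field(value: str, force: bool = False) -> str:
    """Wrap a field in double quotes when required, doubling any embedded quotes."""
    if force or ";" in value or '"' in value:
        return '"' + value.replace('"', '""') + '"'
    return value


def format_route_line(fields: list[str]) -> str:
    """Build a route definition line from up to 9 field values."""
    fields = list(fields)
    while len(fields) > 2 and fields[-1] == "":
        fields.pop()  # trailing empty fields may be omitted
    parts = []
    for index, value in enumerate(fields):
        # A ref starting with a reserved character must be quoted.
        force = index == 0 and value.startswith(RESERVED_FIRST_CHARS)
        parts.append(quote_field(value, force))
    return ";".join(parts)


print(format_route_line(["+210", "bus"]))
print(format_route_line(["6", "light_rail", "May stop service soon; Check operator website"]))
```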

CSV data import definition: @

In some regions (currently only in Israel, experimentally), route CSV information can be imported from GTFS data.

An import definition starts with at least one filter (a line starting with @) and is terminated by an import-end line (a line starting with @@). The most basic import statement looks like this:

@train
@@

Introduction

The import process happens in two steps:

  1. Process GTFS data to generate a list of all the routes of all types in the relevant area
  2. Inject the routes into the appropriate sections of the CSV file

How does step 2 know which section is appropriate for each route? The answer: import definitions. When writing or editing the CSV, you tell the importer which routes to place where, using filters. All the routes that match the filters will be inserted.

Basic examples

All tram routes:

@tram
@@

All bus routes operated by TTT (fictional operator):

@bus
@operator=TTT
@@

Filters

There are 4 kinds of filters, and one useful shorthand:

@{key}={value}
    Match only routes with the selected property exactly equal to the given value.
    Example: @operator=Tation Transpor

@{key}~{regex}
    Match routes whose value for the selected property matches the given regular expression.
    Regular expressions are evaluated using the re.search function in Python. Please refer to Python's documentation of Regular Expression Syntax for a full definition of available features.
    Example: @type~^(train|tram)$

@{key}!={value}
    Matches any route that would not match @{key}={value}. This includes routes for which {key} has no value.
    Example: @from!=Central station

@{key}!~{regex}
    Matches any route that would not match @{key}~{regex}. This includes routes for which {key} has no value.
    Example: @ref!~[a-zA-Z]

@{value}
    Shorthand for @type={value}.
    Example: @bus

An import template starts with at least one filter and ends with a template-terminator line (@@). A route must match all filters in a given template in order to be added to it. Use multiple filters to select specific routes.

After a route has been added to a template, it can never match again in later templates.
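The filter semantics described above can be sketched in Python as follows, using re.search as the text states. The helper names parse_filter, filter_matches and template_matches are hypothetical, routes are modelled as plain dicts for the example, and a missing key is treated as "no value"; an actual import script may differ in these details.

```python
import re

# key, operator, value; the two-character operators must be tried first
FILTER_RE = re.compile(r"^@(\w+)(!=|!~|=|~)(.*)$")


def parse_filter(line: str) -> tuple[str, str, str]:
    """Split a filter line into (key, operator, value); '@bus' means type=bus."""
    line = line.strip()
    match = FILTER_RE.match(line)
    if match:
        return match.group(1), match.group(2), match.group(3)
    return "type", "=", line[1:]  # shorthand form @{value}


def filter_matches(flt: tuple[str, str, str], route: dict) -> bool:
    key, op, expected = flt
    value = route.get(key)  # None when the route has no value for this key
    if op in ("=", "!="):
        positive = value is not None and value == expected
    else:
        positive = value is not None and re.search(expected, value) is not None
    # Negated filters match whatever the positive form does not, including missing values.
    return not positive if op.startswith("!") else positive


def template_matches(filter_lines: list[str], route: dict) -> bool:
    """A route must match all filters of a template."""
    return all(filter_matches(parse_filter(line), route) for line in filter_lines)


route = {"ref": "777A", "type": "bus"}
print(template_matches(["@bus", r"@ref~^7\d\d\D*$"], route))
```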

Available keys

The default/canonical keys which you can usually use in filters are:

  • ref
  • type
  • comment
  • from
  • to
  • operator
  • gtfs_feed
  • route_id
  • gtfs_release_date

They correspond to the fields in a route definition line. Note that not all of these keys are available in all import scripts. For example, Israel's import script does not assign a value for gtfs_release_date (which would not be useful for filtering anyway).

An import script can also define additional properties for filtering. For example, Israel's import script assigns catalog_number with the number used by the Ministry of Transportation to uniquely identify a route. These extra fields should be documented individually by each import script for the convenience of the CSV editors.

Example

== Bus routes in Sevenville
- The bus routes in the Sevenville neighborhood all have number 7xx
@bus
@ref~^7\d\d\D*$
@@

Once the data is imported, the matching routes are filled in between the filters and the final @@:

== Bus routes in Sevenville
- The bus routes in the Sevenville neighborhood all have number 7xx
@bus
@ref~^7\d\d\D*$
701;bus
742;bus
777A;bus
@@bus ref~^7\d\d\D*$

Each time data is imported, the entire imported contents get replaced.
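The replacement behaviour can be pictured with the following Python sketch: everything between the filter lines and the terminating @@ line is discarded and the freshly imported routes are written in its place. This is a simplified model, not the actual import script; the function name replace_imported is hypothetical, the new routes are passed in per block in file order, and the terminator line is left unchanged here, whereas the example above shows it being rewritten with a summary of the filters.

```python
def replace_imported(lines: list[str], routes_per_block: list[list[str]]) -> list[str]:
    """Replace the contents of each import block with newly imported route lines."""
    out: list[str] = []
    block_index = 0
    in_block = False
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("@@"):
            if in_block:
                out.extend(routes_per_block[block_index])  # inject the fresh routes
                block_index += 1
                in_block = False
            out.append(line)
        elif stripped.startswith("@"):
            in_block = True  # filter line: start (or continue) an import block
            out.append(line)
        elif not in_block:
            out.append(line)
        # Lines inside a block are old imported content and are dropped.
    return out


before = ["== Bus routes in Sevenville", "@bus", r"@ref~^7\d\d\D*$", "700;bus", "@@"]
after = replace_imported(before, [["701;bus", "742;bus", "777A;bus"]])
print("\n".join(after))
```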

Routes will only ever appear once, in the first import statement that they match. Therefore the following example works:

== Bus routes
=== In Sevenville
- The bus routes in the Sevenville neighborhood all have number 7xx
@bus
@ref~^7\d\d\D*$
@@

=== Everywhere else
# the Sevenville bus lines will not appear here because they already appeared before
@bus
@@
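The "first match wins" rule behind this example can be sketched in Python as below. Templates are modelled as (name, predicate) pairs purely for illustration; the function name assign_routes and the dict-based route representation are assumptions, not part of any import script.

```python
import re


def assign_routes(templates, routes):
    """Assign each route to the first template whose predicate it satisfies."""
    assigned = {name: [] for name, _ in templates}
    for route in routes:
        for name, predicate in templates:
            if predicate(route):
                assigned[name].append(route["ref"])
                break  # a route never matches again in later templates
    return assigned


templates = [
    # @bus + @ref~^7\d\d\D*$
    ("In Sevenville", lambda r: r["type"] == "bus" and re.search(r"^7\d\d\D*$", r["ref"]) is not None),
    # @bus
    ("Everywhere else", lambda r: r["type"] == "bus"),
]
routes = [
    {"ref": "701", "type": "bus"},
    {"ref": "12", "type": "bus"},
    {"ref": "777A", "type": "bus"},
    {"ref": "3", "type": "tram"},
]
assigned = assign_routes(templates, routes)
print(assigned)
```

The tram route matches neither template and is therefore not placed anywhere in this sketch.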

Oldschool doc

#
# This data is input for the tool: PTNA - Public Transport Network Analysis (https://ptna.openstreetmap.de)
#
# Format of the data:
#       UTF-8
#
# Formatting:
#       The formatting is based on the OSM Wiki.
#
# Links:
#       [[...|..]] are (as in the OSM-Wiki) internal links to the OSM-Wiki
#       [... ...]  are (as in the OSM-Wiki) external links to the Internet
#
# Headlines:
#       Headlines start with '=', '==', '===', '====', ... at the beginning of a line
#
# Plain text:
#       Plain text starts with '-' at the beginning of a line.
#       Plain text may appear anywhere.
#
# New line:
#       A new line (line feed) is introduced by a single '-' in a line
#
# Layout:
#       !!!Text with yellow background!!!               in plain text or headlines
#       '''''Text with thick, italic letters'''''       in plain text or headlines
#       '''Text with thick letters'''                   in plain text or headlines
#       ''Text with italic letters''                    in plain text or headlines
#
# Comments:
#       Comments start with '#' at the beginning of a line.
#       Comments in the middle are not recognized, i.e. '#' may appear within text.
#
# Reserved characters at the beginning of a line:
#       '#' Comment line
#       '=' Headlines of different categories
#       '-' Plain text
#       '@' at the beginning of a line is used to read data from GTFS and update the CSV data by injecting/replacing current CSV data
#           more details will follow. See the start of the discussions on this in the OSM community https://community.openstreetmap.org/t/ptna-news-for-public-transport-network-analysis/8383/241
#       '+' at the beginning of a line is reserved for later extensions
#       '~' at the beginning of a line is reserved for later extensions
#       '$' at the beginning of a line is reserved for later extensions
#       If one of the reserved characters is at the beginning of 'ref' (see below), put 'ref' in double quotes
#
#
################################
#
# Definition of line information:
#       Content in CSV format
#       All fields containing ';' must be enclosed in double quotation marks (e.g. "139;142";bus;;;;"Operator1;Operator";;;)
#
# ref;type;comment;from;to;operator;gtfs-feed;gtfs-route-id;gtfs-release-date
#
# ref           required
#                   == tag 'ref' of route or route_master
#                       250             defines that routes with 'ref'='250' are expected here
#                       250|250a|250b   defines that routes with 'ref'='250' and 'ref'='250a' and 'ref'='250b' are allowed here
#                                       whether this is allowed after PTv1/PTv2 is another matter
#                       605/50          defines that here 'ref' of two 'network' are valid: it is checked,
#                                       whether 'ref:network1'='605' and 'ref:network2'='50' exist
#                       "139;142"       defines that the route with two numbers exists in the same 'network'.
#                       "280;281|280|281" could be used when 'route_master' has 'ref' "280;281" and the 
#                                         two 'route' members have 'ref' "280" and "281" respectively
#                       "+210"          enables the reserved character '+' at the beginning of 'ref'.
#
# type          required
#                   == Content of the tag 'route' or 'route_master' (bus, train, tram, subway, ...)
#
# comment       can be empty, will not be evaluated, just output
#                   == can contain comments like: call taxi, bus, express bus, ...
#                       !text with yellow background! in comment (surrounded by simple !)
#                       "Comment with ; in text"
#
# from          can be empty
#                   == is used to differentiate between several routes with identical ref, type and operator
#
# to            can be empty
#                   == is used to differentiate between several routes with identical ref, type and operator
#
# operator      can be empty
#                   == is used to differentiate between several routes with identical ref and type
#                       The value of 'operator' may contain ';', but must then be in "...".
#
# gtfs-feed     can be empty
#                   == Reference to a source in the GTFS analysis of PTNA where this route can be found (e.g.: "DE-BY-MVV")
#
# gtfs-route-id can be empty
#                   == Reference to a "route_id" in the GTFS data that belongs to this route
#
# gtfs-release-date can be empty
#                   == Reference to special release of the GTFS data (e.g.: "2020-08-18")
#
################################
#

= Overview of the public transport lines of the ...

- Reference to this list with the [[Sandbox|Template Lines]] in the OSM-Wiki.
-
- Reference to more [/en/config.php?network=... information] on this evaluation.
-