Mechanical Edits/Mateusz Konieczny - bot account/import Empik websites in Poland from All the Places


This page describes the import of website=* for Empik stores in Poland.

Goals

To add website=* where missing or imprecise.

To be the first of multiple planned ATP-based imports.

To provide unique shop identification, preparing the ground for importing more data.

Schedule

Intended to be done in 2025, depending on the availability of my hobby time.

Import Data

Background

Note: if some links are broken, check https://status.codeberg.eu/status/codeberg and https://www.githubstatus.com/


Data source site: https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities_website_tag_empik_pl.geojson, produced from the https://www.alltheplaces.xyz/ dataset via https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/. The ATP dataset is produced from first-party Empik POI data (see https://community.openstreetmap.org/t/what-you-think-about-importing-opening-hours-data-from-alltheplaces/120608/59 for analysis)
Data license: see below
Type of license (if applicable): see below
Link to permission (if required): https://osmfoundation.org/wiki/Licensing_Working_Group/Minutes/2023-08-14#Ticket#2023081110000064_%E2%80%94_First_party_websites_as_sources
OSM attribution (if required): not required
ODbL Compliance verified: yes

Data

No .osm file is generated at any point. As of 2024-02-27, the following edits would be made:

The exact list depends on edits made since then. For example, if an object's tags change or the object is deleted, it will be skipped until the data is regenerated.
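The "skip until the data is regenerated" behaviour can be illustrated with a minimal sketch. It assumes that each planned edit carries a snapshot of the OSM tags seen when the data was generated, and that the current state of the object is available as a dictionary (or None if it was deleted). The function name and data layout are hypothetical and are not taken from the actual import script.

```python
def still_matches_snapshot(snapshot_tags, current_object):
    """Return True only if the object still exists and its tags are unchanged.

    snapshot_tags: tags recorded when the import data was generated (assumed layout).
    current_object: dict with a "tags" key, or None if the object was deleted.
    """
    if current_object is None:
        return False  # object deleted since the data was generated
    return current_object.get("tags", {}) == snapshot_tags


# Hypothetical example: the shop got a new tag after the data was generated.
snapshot = {"shop": "books", "brand": "Empik"}
current = {"tags": {"shop": "books", "brand": "Empik", "phone": "+48 000 000 000"}}
if not still_matches_snapshot(snapshot, current):
    print("skipped until data is regenerated")
```

Comparing the full tag set is the most conservative choice. A real script might compare only the tags relevant to matching.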

Import Type

Recurring import done with automated scripts

Data Preparation

The data is published by Empik.

The data is crawled and published by https://github.com/alltheplaces/alltheplaces from public website(s).

ATP data and OSM data are then processed, validated and compared by https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data

Processing includes, but is not limited to:

  • matching ATP and OSM POIs
  • skipping ATP POIs not matched well to any OSM POIs
  • skipping ATP POIs matched to multiple OSM POIs
  • skipping cases where matched ATP and OSM entries are conflicting on important aspects
  • skipping cases where OSM has specific website tags, but including cases where OSM website=* links main page instead of a specific POI

See

for tests that also document considered cases and behaviour.
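The skip rules listed above can be sketched roughly as follows. This is a simplified illustration, not the actual code from the repository. The function names, the `GENERIC_HOSTS` set and the `conflicting` flag are assumptions introduced here, and the matching itself (how `match_count` and `conflicting` are computed) is out of scope.

```python
from urllib.parse import urlparse

# Assumed set of hosts treated as the brand "main page"; the real tool may differ.
GENERIC_HOSTS = {"empik.com", "www.empik.com"}


def is_generic_website(url):
    """Return True if url is the brand main page rather than a page for one shop."""
    parsed = urlparse(url.strip())
    return parsed.netloc.lower() in GENERIC_HOSTS and parsed.path.strip("/") == ""


def should_import_website(osm_tags, atp_tags, match_count, conflicting):
    """Decide whether the ATP website value may be written to the matched OSM POI."""
    atp_website = atp_tags.get("website")
    if not atp_website:
        return False  # nothing to import
    if match_count != 1:
        return False  # not matched well, or matched to multiple OSM POIs
    if conflicting:
        return False  # ATP and OSM disagree on important aspects
    osm_website = osm_tags.get("website")
    if osm_website is None:
        return True  # website=* is missing
    if osm_website == atp_website:
        return False  # already up to date
    # Overwrite only when OSM links the main page, never a specific page.
    return is_generic_website(osm_website)
```

With these assumptions, an OSM object tagged website=https://www.empik.com/ would receive the more specific ATP value, while an object that already links some other specific page would be left untouched.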

Tagging Plans

website=* from ATP goes into website=*

Changeset Tags

Data Merge Workflow

Team Approach

I am doing this import myself but

Workflow

The import will be done by executing a previously prepared script. Edits will be monitored and a sample of edited objects will be checked in an attempt to detect any previously missed problems and bugs.

A separate changeset will be used for each POI.

In case of bad, broken or otherwise problematic data, the affected edits will be reverted. I have experience with reverting my own automated edits, though it was not needed often. I will be using a separate account to make such cleanup easier, should it ever be needed.

Edits will be made using the Mateusz Konieczny - bot account - ATP import account.

Conflation

Done with custom software residing at https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data

This software is intended to enable processing of ATP shop-type POI data in general.

QA

Samples of the data were inspected manually.

The data was also reviewed by a variety of automated QA checks; see the scripts in https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data. Problems found this way were reported back to the All the Places project in the form of issues and/or patches.

Discussion

The post to the community forum can be found at https://community.openstreetmap.org/t/empik-import-tagow-website-z-all-the-places/125182

It was additionally posted to https://community.openstreetmap.org/t/what-you-think-about-importing-opening-hours-data-from-alltheplaces/120608/68

It was also posted to the imports mailing list: https://lists.openstreetmap.org/pipermail/imports/2025-February/007291.html

Conflict of interest info

I received grant funding for making software that processes ATP data.

Time for making the import itself was deliberately not included in the grant, to reduce the conflict of interest.

Not doing the import at all will not block the grant itself (again, it was set up this way to reduce the conflict of interest).

I am not doing this import because the funder requires me to do so. Rather, I obtained funding to make this kind of import possible.