Foundation/AGM2024/Election to Board/Answers and manifestos/Q12 Data Quality and Protection

From OpenStreetMap Wiki

Data Quality and Protection

In light of recent vandalism incidents, what if anything should the Board do differently in the area of data protection and quality assurance?

Candidates: Craig Allan | Brazil Singh | Courtney Cook Williamson | Maurizio Napolitano | Can Ünen | Michael Montani | Andrés Gómez Casanova | Laura Mugeha | Héctor Ochoa Ortiz | Arun Ganesh

Craig Allan - Q12 Data Quality and Protection

The Data WG and the Operations WG are leading in this issue and doing excellent work. They are planning counter-measures. The Board fully supports but does not direct them. My opinion is: Vandalism should be predicted, detected and repaired rapidly. Prediction and detection methods would involve some variety of pattern matching. We could explore using AI in this space. We might also introduce varying trust levels and allowed actions for new and more experienced mappers. Vandalism repair requires a quickly implemented local freeze on edits in the vandalised area to stop innocent contributors editing on top of vandalised data. Then application of a smart roll-back procedure to restore the site to its pre-vandalised condition.

Quality assurance is getting some attention at the moment. Community oversight is our most powerful method for detection and correction of errors in mapping and we see no change in that. There are a variety of data checker applications which also have an important role in helping the community to identify and fix problems.

Out-dated data is a different quality problem. We need to look at implementing a time-out on objects after which they can be flagged for checking. The time-out can vary intelligently per object type. For example, retail POIs should be short, maybe 2 years, while stable POIs like historical monuments can be 25 years or more.
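
The per-type time-out described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the category names, the `needs_review` function and the 5-year default are hypothetical, and only the 2-year and 25-year figures come from the answer itself.

```python
from datetime import date, timedelta

# Review intervals in years per object category (category names are hypothetical).
# Only the 2-year and 25-year values come from the answer above.
REVIEW_YEARS = {"retail": 2, "historic_monument": 25}
DEFAULT_REVIEW_YEARS = 5  # assumed fallback, not taken from the answer


def needs_review(category: str, last_edited: date, today: date) -> bool:
    """Return True when an object has gone unedited longer than its time-out."""
    years = REVIEW_YEARS.get(category, DEFAULT_REVIEW_YEARS)
    # Approximate a year as 365 days; precise enough for flagging purposes.
    return today - last_edited > timedelta(days=365 * years)
```

A checker tool might call `needs_review("retail", last_edited, date.today())` for each object and add flagged objects to a review queue rather than changing them automatically.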

Brazil Singh - Q12 Data Quality and Protection

Addressing recent vandalism incidents in OpenStreetMap (OSM) requires a multi-faceted approach to enhance data protection and quality assurance. I believe the Board should do several things differently to improve in these areas. The Board should invest in advanced monitoring tools that can detect and alert on suspicious or anomalous changes in real time. These tools should be capable of identifying patterns that suggest vandalism or other harmful activities. The Board should also explore the use of machine learning and AI to analyze changes and predict potential vandalism. We have to enhance the role of the community in data validation by encouraging more frequent and rigorous review of changesets. One more thing is to simplify and streamline the process for community members to report suspicious activity or vandalism. Ensuring that reports are reviewed and acted upon promptly is crucial for maintaining data integrity. Local communities should also establish local moderation teams that can quickly address issues specific to their regions. These teams can provide valuable insights and rapid response to localized vandalism.

Courtney Cook Williamson - Q12 Data Quality and Protection

I am not qualified to comment on the technology, but I do know we need smart people to be supported with time and financial resources wherever possible to work on it.

Maurizio Napolitano - Q12 Data Quality and Protection

The natural response is that the Board should strengthen its collaboration with the Data Working Group, developing better tools to detect and prevent vandalism. This is already happening and should be further enhanced. One way to achieve this is through education. It is equally important to create training initiatives that improve the community’s resilience to vandalism. Promoting greater awareness and responsibility among contributors can reduce vandalism. Additionally, fostering dialogue with OSM ecosystem players for tools, suggestions, and data improvement initiatives is essential.

Can Ünen - Q12 Data Quality and Protection

Again I’m going to focus on OSM-TR and on cases I have experienced firsthand within the community. Over the years we have encountered multiple data vandals. We have tried to reach out to all of them, and sometimes got responses back and solved the issues through communication. Those were the good vandals, without ill intent; they simply were not aware of the community standards or where to look to learn them. But there are others who are obnoxious, rude, and insistent on their mistakes, whom we reported to the Data Working Group, which issued a ban and deleted the user. Those are the ones who just create a new account and keep vandalizing. Even now we are trying to stop the vandalism of one or two individuals who keep coming back with new accounts. If it were possible to ban not just users but IPs or devices, so that life is a bit harder for those vandalizing the data, I would advocate for this solution.

Michael Montani - Q12 Data Quality and Protection

I think the OSMF dealt with the recent vandalism attacks in an extraordinary manner thanks to the prompt work of the DWG, the Board and all volunteers involved. I think it may be useful for the OSMF to focus on supporting tools to make validation tasks from the DWG even easier:

  • Real-time monitoring tools for OSM data, maybe with some automatic classification capability based on historical user contributions and profiles, to automatically flag bots or vandalism activity?

  • Tools that help make sense of the data and revert changes when changesets flood in very quickly (both from attackers and from volunteers who would like to give a hand but may make things even more complicated)?

I am not in favor of any change that would make it more difficult to freely contribute to the database (although user restrictions within the defined thresholds sound reasonable to me) or to access its raw data (availability of .*.pbf and .*.gz formats). I believe OSM is powerful BECAUSE it is easy to contribute to and access (read more at Q14), and I see any quality control mechanism, like the provision of "LTS" or "QA" versions of the data, as going against these principles in some way, even assuming it would be feasible given the size of the data and how fast it evolves. I agree that some data consumers are interested in having good quality data (which is anyway a very subjective topic, depending on the scale and purposes of application!!), but I think this would be of interest for OSM data brokerage services and eventually something for the data user to deal with.

I think the optimal situation is the one in which the data user also contributes to the database itself. "You do not like, you change it", as we do in open-source. As OSM is a common public good, we can expect different entities and communities to increase and review the quality of the data over time. The United Nations Global Service Centre, for example, contributes OSM data where needed for the production of basemap information for their military topographic maps, and we may assume the data there to be of good quality with respect to that purpose. If an entity is much smaller and cannot afford to contribute enough to the data, it should instead think about supporting local communities to get the job done. But I would find it extremely demanding to have a quality check on the data before distribution, and I believe data quality can be improved only if new OSM contributors are exposed to the broader world of OSM (about education, more on Q10), and are not just used to produce building footprints.

I use OSM-based navigators without any issues in Europe, and add POIs when I see them missing. In 2018 I saw citizens in a region of Senegal happily navigate their surroundings using (back then) Maps.me. Quality is very subjective and could be semantic (i.e. I cannot express a concept with the current tags) or feature-dependent (i.e. I am a citizen interested in restaurants close to me vs. I am interested in the road network being mapped at an accuracy good enough for 1:50000 scale topographic maps).

More on this in Q11.

Andrés Gómez Casanova - Q12 Data Quality and Protection

Vandalism is part of project development, sometimes due to ignorance, sometimes intentionally, and the Board should not get involved. The board should support the existing quality tools, promote their use, and contribute to the community's development of these tools. They should not be considered separate projects unrelated to OSM.

Similarly, reversion processes should be more common, contributors should have more confidence in doing them, and the decision to do a reversion should be more open and communicated with the community. Currently, a reversion can become an utterly secret process, where someone's contributions are being erased, and this type of operation should not be encouraged but always communicated, discussed, and arrived at as a group.

Laura Mugeha - Q12 Data Quality and Protection

Over the last few years, the detection of data vandalism on OSM has improved, but this can be made better by implementing a more robust user verification process for new accounts, especially those making large-scale edits. We could explore the possibility of a tiered editing system where new users have limited editing capabilities until they've established a positive track record.

The OSMF can support the maintenance and development of data quality and assurance tools like OSMCha. We could also learn from and support data quality initiatives from OSM organizations like HOTOSM and YouthMappers, who run communities within their networks, i.e., HOTOSM Global Validators and YouthMappers Validation Hub.

Lastly, we can support OSM communities in learning how to monitor and maintain OSM data quality at a national and smaller scale. I learnt a lot about how OSM Ghana is doing this from a presentation by Enock Seth at SotM 2024.

Héctor Ochoa Ortiz - Q12 Data Quality and Protection

There needs to be an effort with the Operations WG to detect vandalism early, and to be able to react to new forms of vandalism. Monitoring anomalous values for newly created users should be ensured to avoid those accounts being used for coordinated attacks. The OWG, DWG, and Board should be kept informed and aware of the latest findings in this regard so cooperative, fast action against attacks can happen when needed.

Arun Ganesh - Q12 Data Quality and Protection

Data validation is an expensive and critical problem for OSM data consumers, and one that OSMF has invested little in. My first role at Mapbox had me work directly on building a data validation pipeline, so I know first-hand the large cost of keeping such a system working reliably, and I'm not convinced replicating this is a good use of OSMF resources at this point.

OSMF should, however, strongly encourage multiple initiatives in this field, which will always require constant innovation and R&D, and provide more visibility to existing tools and findings in this area. There is an ecosystem of externally built tools and services that are effective at keeping a check on OSM data quality, and putting these in the hands of more editors is the simplest action needed to make it faster to detect and fix data issues on the map.



