2020 Differential Privacy Temporal Map Challenge
The NIST PSCR Differential Privacy Temporal Map Challenge ran from October 2020 through June 2021, awarding $129,000 in cash prizes. The challenge sought innovative algorithms to de-identify public safety-related data with a privacy guarantee. It also sought novel methods of evaluating the quality of synthetic data.
You can try out your own solution using SDNist, an open source Python implementation of our data and scoring metrics.
The challenge was highly successful, with more than 70 unique algorithm submissions across all three sprints. Four of those algorithms have been open sourced (links in the winners table below). Three solutions participated in the Development Contest, where teams were coached by NIST experts to improve the robustness and documentation of their code, creating easy-to-use implementations of sophisticated differential privacy algorithms.
The challenge was implemented by DrivenData with assistance from HeroX. Christine Task from Knexus Research Corporation served as the program’s technical lead. Gary Howarth served as the prize manager.
Winners
Algorithm contest
Visit the Challenge data and scoring code repository
Team | Total Awards | Open Sourced | Development Contest | APA Citation |
---|---|---|---|---|
N-CRiPT | $44,000 | -- | -- | |
Minutemen | $48,000 | Yes | Repository link | McKenna R. (2021). Adaptive Granularity Mechanism (version 1.0). URL: https://github.com/ryan112358/nist-synthetic-data-2021 |
DPSyn | $38,000 | Yes | Repository link | Chen A., Li N., Li Z., Wang T. (2021). DPSyn: An algorithm for synthesizing microdata for data analysis while satisfying differential privacy (version 1.0). URL: https://github.com/agl-c/deid2_dpsyn |
jimking100 | $24,000 | Yes | Repository link | King, J. (2021). Privitized Histograms (Version 1.0.0) [Computer software]. https://github.com/JimKing100/PrivacyHistos |
Duke Privacy Team | $12,000 | -- | -- | |
GooseDP | $9,000 | Yes | Repository link | Covington C., Mohapatra S., Zhang S. (2021). TaxiTrip-Synthesizer (Version 1.0.0). URL: https://github.com/ctcovington/goosedp_sprint3_open_source |
SyrDP | $5,000 | -- | -- | |
3401 Walnut | $1,000 | -- | -- | |
Metrics contest
Submission | Place/Prize | Amount |
---|---|---|
MGD: A Utility Metric for Private Data Publication | 1st | $5,000 |
Practical DP Metric | 2nd | $3,000 |
Confusion Matrix Metric | 2nd | $3,000 |
Bounding Utility Loss via Classifiers | 3rd | $2,000 |
Confusion Matrix Metric | People's Choice Award | $1,000 |
Challenge details
Large data sets containing personally identifiable information (PII) are exceptionally valuable resources for research and policy analysis in fields that support America's public safety agencies, such as emergency planning and epidemiology.
Temporal map data—information that is geographically situated and may change over time—is of particular interest to the public safety community in applications such as optimizing response time and personnel placement, natural disaster response, epidemic tracking, demographic data and civic planning. Yet, the ability to track a person's location over a period of time presents particularly serious privacy concerns.
The Differential Privacy Temporal Map Challenge invited solvers to develop algorithms and metrics that preserve data utility while guaranteeing individual privacy is protected.
Participants competed in a series of coding sprints using differential privacy methods on temporal map data. The goal was to create a privacy-preserving dashboard map that shows changes across different map segments over time.
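To make the task concrete, here is a minimal sketch of one textbook approach: adding Laplace noise to the count in every (map segment, time period) cell. It is not any team's winning method and not the challenge's reference code. The function names, the `max_records_per_person` bound, and the example segments are assumptions made for illustration. The sketch also assumes the list of segments and periods is public, and that the input has already been limited so no individual contributes more than the stated number of records.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_temporal_counts(records, segments, periods, epsilon,
                            max_records_per_person=1, seed=None):
    """Return a noisy count for every (segment, period) cell.

    records: iterable of (segment, period) pairs, one per event.
    segments, periods: the public domain of the map; every cell in this
        domain gets a value, so the set of keys reveals nothing.
    max_records_per_person: assumed cap on events per individual. This
        function does not enforce it; the caller must.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    true_counts = Counter(records)
    # Removing one person changes the counts by at most this much in total,
    # so the Laplace scale is (sensitivity / epsilon).
    scale = max_records_per_person / epsilon
    noisy = {}
    for seg in segments:
        for per in periods:
            value = true_counts.get((seg, per), 0) + laplace_noise(scale, rng)
            # Rounding and clamping are post-processing and do not
            # weaken the privacy guarantee.
            noisy[(seg, per)] = max(0, round(value))
    return noisy


# Hypothetical example: three events on a 2-segment, 2-month map.
events = [("north", "2019-01"), ("north", "2019-01"), ("south", "2019-02")]
dashboard = private_temporal_counts(
    events, ["north", "south"], ["2019-01", "2019-02"], epsilon=1.0, seed=7
)
```

Python's `random` module is not a cryptographically secure source of randomness, and floating-point Laplace sampling has known weaknesses, so a production system would need a hardened noise sampler. The sketch only shows where the noise enters and why every cell in the domain, including empty ones, must receive it.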
Challenge elements
The NIST PSCR Differential Privacy Temporal Map Challenge follows on the success of the 2018 Differential Privacy Synthetic Data Challenge, extending the reach and utility of differential privacy algorithms.
- Temporal map data: The challenge featured public safety data sets with geographic and temporal elements. Solutions sought to satisfy differential privacy while preserving the characteristics of the original data sets as much as possible, including sequential data and geographic characteristics.
- Utility metrics: Participants submitted white papers on new and innovative ways of measuring the accuracy of data sets produced by differential privacy algorithms.
- De-identification algorithms: Participants built algorithms to preserve data accuracy while guaranteeing privacy on temporal map data sets.
- Open source: Beyond expanding the types of data that can be made differentially private, this challenge incentivized teams to open source their resulting software.
- Software Development: Following the algorithm contest, NIST and outside privacy and software experts worked with teams to refactor and better document solutions to create easy-to-implement differential privacy libraries.
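As a rough illustration of what a utility metric for temporal map data can look like, the sketch below compares true and synthetic counts one time period at a time, using the total variation distance between their distributions over map segments. This is an assumed, simplified stand-in, not the scoring metric used in the challenge and not one of the winning metric submissions. The function names and the dictionary layout are hypothetical.

```python
def total_variation(p_counts, q_counts):
    """Total variation distance between two count dictionaries.

    Each dictionary maps a map segment to a non-negative count. Counts are
    normalized to proportions first. Returns a value in [0, 1].
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 and q_total == 0:
        return 0.0  # both empty: treat as identical
    if p_total == 0 or q_total == 0:
        return 1.0  # one empty, one not: treat as maximally different
    keys = set(p_counts) | set(q_counts)
    return 0.5 * sum(
        abs(p_counts.get(k, 0) / p_total - q_counts.get(k, 0) / q_total)
        for k in keys
    )


def mean_tv_by_period(true_counts, synth_counts, periods):
    """Average the per-period distance; 0.0 means a perfect match.

    true_counts, synth_counts: dicts mapping (segment, period) -> count.
    """
    if not periods:
        raise ValueError("periods must not be empty")
    scores = []
    for per in periods:
        p = {seg: c for (seg, t), c in true_counts.items() if t == per}
        q = {seg: c for (seg, t), c in synth_counts.items() if t == per}
        scores.append(total_variation(p, q))
    return sum(scores) / len(scores)


# Hypothetical example with one period and two segments.
truth = {("north", "2019-01"): 3, ("south", "2019-01"): 1}
synthetic = {("north", "2019-01"): 1, ("south", "2019-01"): 1}
score = mean_tv_by_period(truth, synthetic, ["2019-01"])
```

Scoring each period separately keeps a period with few events from being drowned out by a busy one. The trade-off is that this sketch ignores differences in overall volume between the true and synthetic data.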
Algorithm Sprint Descriptions
Sprint 1
Baltimore 911 Incidents
Highly variable public safety data
Training data: 2019
Evaluation data: 2016 & 2020
Problem Description | Results
Sprint 2
American Community Survey (US Census)
Complex demographic information
Training data: IL + OH
Evaluation data: NY + PA & NC + SC + GA
Problem Description | Results
Sprint 3
Chicago Taxi Rides
Linked trip information
Training data: 2019
Evaluation data: 2016 & 2020
Problem Description | Results
Resources and links
- Algorithm Contest results on DrivenData
- Metric Contest results on HeroX
- SDNist: Benchmark data and evaluation tools for data synthesizers
To find out more about the challenge and winners, visit Challenge.gov.