Online tracking: A 1-million-site measurement and analysis
Online tracking: A 1-million-site measurement and analysis is the largest and most detailed measurement of online tracking to date. We measure stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and "cookie syncing".
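In cookie syncing, one third party passes the identifier stored in its cookie to another third party, typically by embedding the value in a request URL. The following is a minimal detection sketch. It assumes you already have a mapping from each cookie-owning domain to its identifier values, plus a list of request URLs from a crawl. The function names `find_cookie_syncs` and `same_site`, the 8-character minimum, and the exact-token matching rule are illustrative assumptions, not the study's method.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def same_site(host, owner):
    """Crude same-site test (assumption): host equals owner or is a subdomain of it."""
    return host == owner or host.endswith("." + owner)


def find_cookie_syncs(id_cookies, request_urls, min_len=8):
    """Return (owner, recipient_host) pairs where an identifier cookie value
    owned by `owner` appears in a request URL sent to a different site.

    id_cookies: dict mapping owner domain -> set of identifier cookie values.
    request_urls: iterable of full request URLs observed during a crawl.
    """
    syncs = set()
    for url in request_urls:
        parts = urlsplit(url)
        recipient = parts.hostname or ""
        # Candidate tokens: decoded query-string values plus decoded path segments.
        tokens = {value for _, value in parse_qsl(parts.query)}
        tokens.update(unquote(seg) for seg in parts.path.split("/") if seg)
        for owner, values in id_cookies.items():
            if same_site(recipient, owner):
                continue  # an ID sent back to its own site is not a sync
            # Only values long enough to plausibly identify a user are considered.
            if any(len(v) >= min_len and v in tokens for v in values):
                syncs.add((owner, recipient))
    return syncs


print(find_cookie_syncs(
    {"tracker-a.com": {"a1b2c3d4e5f6"}},
    ["https://sync.tracker-b.com/match?partner=a&uid=a1b2c3d4e5f6"],
))  # {('tracker-a.com', 'sync.tracker-b.com')}
```

Matching whole tokens rather than arbitrary substrings keeps false positives down for short values. The trade-off is that it misses identifiers that are embedded inside longer strings or transformed before sending, for example when they are hashed or base64-encoded.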
This measurement is made possible by our web measurement tool OpenWPM, a mature platform that enables fully automated web crawls using a full-fledged, instrumented browser.
Authors: Steven Englehardt and Arvind Narayanan of Princeton University ({ste,arvindn}@cs.princeton.edu)
The study is part of Princeton University's WebTAP project.
Figures: The Long Tail of Online Tracking; Third parties and HTTPS adoption; News sites have the most trackers.
For a list of studies that use OpenWPM, please visit this page.
The data are available as bzip2-compressed PostgreSQL dumps. All of the datasets use the same schema file, which is available here.
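Because the dumps are bzip2-compressed, they can be inspected without first decompressing them to disk. The following minimal sketch uses Python's standard `bz2` module. It assumes the dumps are plain-text SQL, as produced by `pg_dump` in its default plain format; a custom-format archive would need `pg_restore` instead. The helper names `summarize_dump_lines` and `summarize_dump` are hypothetical. The sketch lists the tables a dump creates and counts the data rows in each `COPY ... FROM stdin;` block.

```python
import bz2
import re

# Matches e.g. "CREATE TABLE public.site_visits (" in a plain-text pg_dump.
CREATE_RE = re.compile(r'^CREATE TABLE\s+(?:IF NOT EXISTS\s+)?([\w."]+)', re.IGNORECASE)
# Matches e.g. "COPY public.site_visits (visit_id, site_url) FROM stdin;".
COPY_RE = re.compile(r'^COPY\s+([\w."]+)', re.IGNORECASE)


def summarize_dump_lines(lines):
    """Return (tables created, {table: data-row count}) for plain pg_dump SQL lines."""
    tables, rows, current = [], {}, None
    for line in lines:
        if current is not None:
            # Inside a COPY block: one tab-separated row per line until "\."
            if line.rstrip("\r\n") == "\\.":
                current = None
            else:
                rows[current] += 1
            continue
        if m := CREATE_RE.match(line):
            tables.append(m.group(1))
        elif m := COPY_RE.match(line):
            current = m.group(1)
            rows.setdefault(current, 0)
    return tables, rows


def summarize_dump(path):
    """Stream a .sql.bz2 dump line by line without decompressing it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return summarize_dump_lines(fh)
```

Usage, with a hypothetical file name: `tables, rows = summarize_dump("crawl.sql.bz2")`. Keeping the line parser separate from the file handling makes it easy to test on a few in-memory lines.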
| Dataset | Crawl configuration |
|---|---|
| 1 Million Site Stateless | Parallel stateless crawl |
| 100k Site Stateful | Parallel stateful crawl; 10,000-site seed profile |
| 10k Site ID Detection (1) | Sequential stateful crawl; Flash enabled; synced with ID Detection (2) |
| 10k Site ID Detection (2) | Sequential stateful crawl; Flash enabled; synced with ID Detection (1) |
| 55k Site Stateless with cookie blocking | Parallel stateless crawl; Firefox set to block all third-party cookies |
| 55k Site Stateless with Ghostery | Parallel stateless crawl; Ghostery extension installed and set to block all possible trackers |
| 55k Site Stateless with HTTPS Everywhere | Parallel stateless crawl; HTTPS Everywhere extension installed |
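Running two crawls in sync with separate browser profiles makes it possible to compare cookie values across independent "users" and flag likely identifier cookies. Such cookies are long-lived, stable within one profile, and different between profiles. The following sketch applies heuristics of that kind. The input shape, the function name `likely_id_cookies`, and the thresholds (90 days, 8 characters, 0.66 similarity) are assumptions for illustration, not the study's published parameters. Similarity is measured with `difflib.SequenceMatcher`, a gestalt (Ratcliff/Obershelp-style) string matcher from the standard library.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds chosen for illustration only.
MIN_LIFETIME_DAYS = 90
MIN_LENGTH = 8
MAX_SIMILARITY = 0.66


def likely_id_cookies(crawl_a, crawl_b):
    """Flag (domain, name) cookie keys that behave like user identifiers.

    crawl_a / crawl_b: dict mapping (domain, name) -> list of
    (value, lifetime_days) observations, one per page visit, from two
    synced crawls that each use their own fresh browser profile.
    """
    ids = set()
    for key in crawl_a.keys() & crawl_b.keys():
        obs_a, obs_b = crawl_a[key], crawl_b[key]
        values_a = {v for v, _ in obs_a}
        values_b = {v for v, _ in obs_b}
        # 1. Stable: a single value across all page visits within each profile.
        if len(values_a) != 1 or len(values_b) != 1:
            continue
        (va,), (vb,) = values_a, values_b
        # 2. Long-lived: every observation outlives the lifetime threshold.
        if min(days for _, days in obs_a + obs_b) < MIN_LIFETIME_DAYS:
            continue
        # 3. Constant length, and long enough to carry an identifier.
        if len(va) != len(vb) or len(va) < MIN_LENGTH:
            continue
        # 4. User-specific: values differ substantially between the two profiles.
        if SequenceMatcher(None, va, vb).ratio() > MAX_SIMILARITY:
            continue
        ids.add(key)
    return ids
```

A session cookie fails the lifetime check. A long-lived but shared value, such as a library version string, fails the similarity check because it is identical in both profiles.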
The OpenWPM crawling infrastructure is publicly available on GitHub. The Princeton Web Census code is not yet public but will be released in a future iteration of the project.