A 1-million-site measurement and analysis

Our paper, "Online tracking: A 1-million-site measurement and analysis", presents the largest and most detailed measurement of online tracking to date. We measure stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and "cookie syncing".
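In cookie syncing, one tracker passes its identifier for a user to another tracker, typically by embedding the ID in a request URL. The following is a minimal sketch of how such transfers might be flagged. It assumes you already have a set of cookie values believed to be identifiers for each cookie-owning domain. The function name `find_cookie_syncs` and its input shapes are hypothetical. This is not the paper's exact procedure: it only catches IDs copied verbatim into query parameters or path segments, and it misses encoded or hashed IDs.

```python
from urllib.parse import urlsplit, parse_qsl


def find_cookie_syncs(id_cookies, requests):
    """Flag requests whose URL carries another party's ID cookie value.

    id_cookies: dict mapping cookie-owner domain -> set of ID cookie values
                (hypothetical input shape, assumed to be precomputed).
    requests:   iterable of (destination_domain, url) pairs.
    Returns a list of (owner_domain, destination_domain, value) tuples.
    """
    syncs = []
    for dest_domain, url in requests:
        parts = urlsplit(url)
        # Candidate tokens: every query-parameter value plus every path segment.
        tokens = {value for _, value in parse_qsl(parts.query)}
        tokens.update(seg for seg in parts.path.split("/") if seg)
        for owner, values in id_cookies.items():
            if owner == dest_domain:
                continue  # a party sending its own ID to itself is not a sync
            for value in values & tokens:
                syncs.append((owner, dest_domain, value))
    return syncs
```

For example, if `tracker-a.example` owns the ID `a1b2c3d4e5f6` and a request to `https://tracker-b.example/sync?partner=a&uid=a1b2c3d4e5f6` is observed, the function reports `("tracker-a.example", "tracker-b.example", "a1b2c3d4e5f6")`. Exact token matching keeps the sketch simple but produces no false positives from partial substring overlaps.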

This measurement was made possible by our web measurement tool OpenWPM, a mature platform that runs fully automated web crawls in a full-fledged, instrumented browser.

Read the paper »

Authors: Steven Englehardt and Arvind Narayanan of Princeton University ({ste,arvindn}@cs.princeton.edu)

The study is part of Princeton University's WebTAP project.

[Figures: The Long Tail of Online Tracking; Third parties and HTTPS adoption; News sites have the most trackers]

For a list of studies that use OpenWPM, please visit this page.

The data is available as bzip2-compressed PostgreSQL dumps. The schema file shared by all of the datasets is available here.
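Before loading a dump, it can help to see which tables it contains. The sketch below streams a compressed dump and lists the tables it creates without decompressing the whole file to disk. It assumes the dumps are plain-text SQL (pg_dump's default output format); the dump format is not stated here, and the function name `list_tables` is hypothetical. Quoted identifiers such as `"MyTable"` are not matched by this simple pattern.

```python
import bz2
import re

# Matches the table name in a plain-SQL "CREATE TABLE" line,
# optionally schema-qualified (e.g. "CREATE TABLE public.site_visits (").
CREATE_TABLE = re.compile(r"^CREATE TABLE\s+(?:\w+\.)?(\w+)\s*\(", re.IGNORECASE)


def list_tables(dump_path):
    """Return the names of tables created in a bzip2-compressed plain-SQL dump.

    The file is decompressed line by line, so the full dump never has to
    fit in memory.
    """
    tables = []
    with bz2.open(dump_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            match = CREATE_TABLE.match(line)
            if match:
                tables.append(match.group(1))
    return tables
```

If a dump turns out to be plain SQL, it can be loaded with standard tools, for example `bzcat dump.sql.bz2 | psql -d census` (file and database names hypothetical). A custom-format dump would instead need `pg_restore` after decompression.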

Dataset | Comments
1 Million Site Stateless | Parallel stateless crawl
100k Site Stateful | Parallel stateful crawl; 10,000-site seed profile
10k Site ID Detection (1) | Sequential stateful crawl; Flash enabled; synced with ID Detection (2)
10k Site ID Detection (2) | Sequential stateful crawl; Flash enabled; synced with ID Detection (1)
55k Site Stateless with cookie blocking | Parallel stateless crawl; Firefox set to block all third-party cookies
55k Site Stateless with Ghostery | Parallel stateless crawl; Ghostery extension installed and set to block all possible trackers
55k Site Stateless with HTTPS Everywhere | Parallel stateless crawl; HTTPS Everywhere installed
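The two ID Detection datasets were collected as synchronized crawls. One plausible use of such a pair is to flag identifier cookies. These are values that are long-lived and stable within a crawl yet differ between two independent browser profiles. The sketch below illustrates that idea. The `CookieObs` record, the function name `id_cookie_keys`, and all three thresholds are illustrative assumptions, not the study's published parameters. `difflib.SequenceMatcher` (a Ratcliff/Obershelp-style similarity from the standard library) stands in for whatever string comparison the study used.

```python
from dataclasses import dataclass
from datetime import timedelta
from difflib import SequenceMatcher

# Illustrative thresholds (assumptions, not the study's exact parameters).
MIN_LIFETIME = timedelta(days=90)
MIN_LENGTH = 8
MAX_SIMILARITY = 0.66


@dataclass(frozen=True)
class CookieObs:
    """One observed cookie (hypothetical record shape)."""
    domain: str
    name: str
    value: str
    lifetime: timedelta  # expiry time minus the time the cookie was set


def _stable_values(crawl):
    """Map (domain, name) -> value for cookies that are long-lived, long
    enough, and never change value within this crawl."""
    seen = {}
    for c in crawl:
        key = (c.domain, c.name)
        if c.lifetime < MIN_LIFETIME or len(c.value) < MIN_LENGTH:
            seen[key] = None  # disqualified for the rest of the crawl
        elif key not in seen:
            seen[key] = c.value
        elif seen[key] != c.value:
            seen[key] = None  # value changed within the crawl
    return {k: v for k, v in seen.items() if v is not None}


def id_cookie_keys(crawl_a, crawl_b):
    """Return (domain, name) keys that look like user identifiers.

    crawl_a, crawl_b: lists of CookieObs from two synchronized crawls run
    with separate, fresh browser profiles.
    """
    a, b = _stable_values(crawl_a), _stable_values(crawl_b)
    ids = set()
    for key in a.keys() & b.keys():
        # A per-user identifier should look different in the two profiles.
        if SequenceMatcher(None, a[key], b[key]).ratio() < MAX_SIMILARITY:
            ids.add(key)
    return ids
```

Comparing the two crawls is what separates identifiers from other long-lived values: a preference cookie such as a language setting is stable within each crawl but identical across both profiles, so it is excluded.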

The public repository for the OpenWPM crawling infrastructure is available on GitHub. The Princeton Web Census code is not yet public but will be released in a future iteration of the project.