GitHub - ofek/pypinfo: Easily view PyPI download statistics via Google's BigQuery.

pypinfo: View PyPI download statistics with ease.

pypinfo is a simple CLI to access PyPI download statistics via Google's BigQuery.

Usage

$ pypinfo
Usage: pypinfo [OPTIONS] [PROJECT] [FIELDS]... COMMAND [ARGS]...

  Valid fields are:

  project | version | file | pyversion | percent3 | percent2 | impl | impl-version |
  openssl | date | month | year | country | installer | installer-version |
  setuptools-version | system | system-release | distro | distro-version | cpu |
  libc | libc-version

Options:
  -a, --auth TEXT         Path to Google credentials JSON file.
  --run / --test          --test simply prints the query.
  -j, --json              Print data as JSON, with keys rows and query.
  -i, --indent INTEGER    JSON indentation level.
  -t, --timeout INTEGER   Milliseconds. Default: 120000 (2 minutes)
  -l, --limit INTEGER     Maximum number of query results. Default: 10
  -d, --days INTEGER      Number of days in the past to include. Default: 30
  -sd, --start-date TEXT  Must be negative or YYYY-MM[-DD]. Default: -31
  -ed, --end-date TEXT    Must be negative or YYYY-MM[-DD]. Default: -1
  -m, --month TEXT        Shortcut for -sd & -ed for a single YYYY-MM month.
  -w, --where TEXT        WHERE conditional. Default: file.project = "project"
  -o, --order TEXT        Field to order by. Default: download_count
  --all                   Show downloads by all installers, not only pip.
  -pc, --percent          Print percentages.
  -md, --markdown         Output as Markdown.
  -v, --verbose           Print debug messages to stderr.
  --version               Show the version and exit.
  -h, --help              Show this message and exit.

pypinfo accepts 0 or more options, followed by exactly 1 project, followed by 0 or more fields. By default only the last 30 days are queried. Let's take a look at some examples!
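The argument order described above (options, then one project, then fields) can be sketched in Python. The helper name `build_pypinfo_command` is hypothetical and not part of pypinfo; it only assembles an argument list and does not run anything.

```python
import shlex


def build_pypinfo_command(project, fields=(), options=()):
    """Assemble argv in the order the README describes:
    0 or more options, exactly 1 project, 0 or more fields.

    Hypothetical helper for illustration; not part of pypinfo.
    """
    return ["pypinfo", *options, project, *fields]


# Options first, then the project, then the fields.
cmd = build_pypinfo_command("django", fields=["pyversion"], options=["--days", "30"])
print(shlex.join(cmd))

# An empty project string ("") is passed as its own argument,
# as in the "All downloads" examples.
print(shlex.join(build_pypinfo_command("", fields=["country"])))
```

Passing the list to `subprocess.run` would be one way to execute it, assuming pypinfo is installed and credentials are configured.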

Tip: If queries fail with NoneType errors, increase the timeout with -t/--timeout.

Downloads for a project

$ pypinfo requests
Served from cache: False
Data processed: 2.83 GiB
Data billed: 2.83 GiB
Estimated cost: $0.02

download_count
116,353,535

All downloads

$ pypinfo ""
Served from cache: False
Data processed: 116.15 GiB
Data billed: 116.15 GiB
Estimated cost: $0.57

download_count
8,642,447,168
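The "Estimated cost" figures in the sample output are consistent with a rate of about $5 per TiB billed, rounded up to the next cent. That rate and rounding rule are inferred from the numbers shown here, not stated by pypinfo or taken from current BigQuery pricing, so treat the following as a rough sketch. The names `ASSUMED_USD_PER_TIB` and `estimate_cost` are hypothetical.

```python
from decimal import Decimal, ROUND_UP

# Assumption: inferred from the sample output in this README,
# not an official or current BigQuery price.
ASSUMED_USD_PER_TIB = Decimal("5")

# How many of each unit make up one TiB.
UNITS_PER_TIB = {
    "MiB": Decimal(1024) ** 2,
    "GiB": Decimal(1024),
    "TiB": Decimal(1),
}


def estimate_cost(amount, unit):
    """Estimate the query cost in USD from the 'Data billed' amount."""
    tib = Decimal(amount) / UNITS_PER_TIB[unit]
    # Round up to the next whole cent.
    return (tib * ASSUMED_USD_PER_TIB).quantize(Decimal("0.01"), rounding=ROUND_UP)


print(estimate_cost("2.83", "GiB"))    # 0.02
print(estimate_cost("116.15", "GiB"))  # 0.57
```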

Downloads for a project by Python version

$ pypinfo django pyversion
Served from cache: False
Data processed: 967.33 MiB
Data billed: 968.00 MiB
Estimated cost: $0.01

python_version download_count
3.8 1,735,967
3.6 1,654,871
3.7 1,326,423
2.7 876,621
3.9 524,570
3.5 258,609
3.4 12,769
3.10 3,050
3.3 225
2.6 158
Total 6,393,263

All downloads by country code

$ pypinfo "" country
Served from cache: False
Data processed: 150.40 GiB
Data billed: 150.40 GiB
Estimated cost: $0.74

country download_count
US 6,614,473,568
IE 336,037,059
IN 192,914,402
DE 186,968,946
NL 182,691,755
None 141,753,357
BE 111,234,463
GB 109,539,219
SG 106,375,274
FR 86,036,896
Total 8,068,024,939

Downloads for a project by system and distribution

$ pypinfo cryptography system distro
Served from cache: False
Data processed: 2.52 GiB
Data billed: 2.52 GiB
Estimated cost: $0.02

system_name  distro_name                      download_count
Linux        Ubuntu                               19,524,538
Linux        Debian GNU/Linux                     11,662,104
Linux        Alpine Linux                          3,105,553
Linux        Amazon Linux AMI                      2,427,975
Linux        Amazon Linux                          2,374,869
Linux        CentOS Linux                          1,955,181
Windows      None                                  1,522,069
Linux        CentOS                                  568,370
Darwin       macOS                                   489,859
Linux        Red Hat Enterprise Linux Server         296,858
Total                                             43,927,376

Most popular projects in the past year

$ pypinfo --days 365 "" project
Served from cache: False
Data processed: 1.69 TiB
Data billed: 1.69 TiB
Estimated cost: $8.45

project download_count
urllib3 1,382,528,406
six 1,172,798,441
botocore 1,053,169,690
requests 995,387,353
setuptools 992,794,567
certifi 948,518,394
python-dateutil 934,709,454
idna 929,781,443
s3transfer 877,565,186
chardet 854,744,674
Total 10,141,997,608

Downloads between two YYYY-MM-DD dates

$ pypinfo --start-date 2018-04-01 --end-date 2018-04-30 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

download_count
8,972,826

Downloads between two YYYY-MM dates

$ pypinfo --start-date 2018-04 --end-date 2018-04 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

download_count
8,972,826

Downloads for a single YYYY-MM month

$ pypinfo --month 2018-04 setuptools
Served from cache: False
Data processed: 571.37 MiB
Data billed: 572.00 MiB
Estimated cost: $0.01

download_count
8,972,826
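The three date examples above return the same count, which suggests a YYYY-MM value covers the month from its first to its last day. The sketch below shows that expansion in Python. The function `month_to_range` is hypothetical and is not claimed to be how pypinfo implements it.

```python
import calendar
from datetime import date


def month_to_range(month):
    """Expand 'YYYY-MM' into (first_day, last_day) of that month.

    Hypothetical helper; illustrates the equivalence of the three
    commands above, not pypinfo's internal logic.
    """
    year, mon = (int(part) for part in month.split("-"))
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, mon)[1]
    return date(year, mon, 1), date(year, mon, last_day)


start, end = month_to_range("2018-04")
print(start.isoformat(), end.isoformat())  # 2018-04-01 2018-04-30
```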

Percentage of Python 3 downloads of the top 100 projects in the past year

Let's use --test to only see the query instead of sending it.

$ pypinfo --test --days 365 --limit 100 "" project percent3
SELECT
  file.project as project,
  ROUND(100 * SUM(CASE WHEN REGEXP_EXTRACT(details.python, r"^([^.]+)") = "3" THEN 1 ELSE 0 END) / COUNT(*), 1) as percent_3,
  COUNT(*) as download_count,
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -366 DAY) AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
  AND details.installer.name = "pip"
GROUP BY
  project
ORDER BY
  download_count DESC
LIMIT 100
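To make the percent_3 expression in the query easier to follow, here is a small Python sketch of the same idea: take everything before the first dot of the Python version as the major version, count the rows where it equals "3", and divide by the row count. The function names and the sample version strings are made up for illustration; the real computation runs in BigQuery.

```python
import re


def major_version(python_version):
    """Mirror REGEXP_EXTRACT(details.python, r"^([^.]+)"):
    return the text before the first dot, or None if there is no match."""
    if python_version is None:
        return None
    match = re.match(r"^([^.]+)", python_version)
    return match.group(1) if match else None


def percent_python3(python_versions):
    """Share of rows whose major version is "3", rounded to 1 decimal.

    Assumes a non-empty list.
    """
    total = len(python_versions)
    py3 = sum(1 for version in python_versions if major_version(version) == "3")
    return round(100 * py3 / total, 1)


# Hypothetical sample rows, one Python version per download.
sample = ["3.8.10", "2.7.18", "3.10.4", "3.6.9"]
print(percent_python3(sample))  # 75.0
```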

Downloads for a given version

pypinfo supports PEP 440 version matching.

We can use it to query stats on a given major version.

$ pypinfo -pc 'pip==21.*' pyversion version
Served from cache: False
Data processed: 34.45 MiB
Data billed: 35.00 MiB
Estimated cost: $0.01

python_version version percent download_count
3.6 21.3.1 78.74% 10,430
3.8 21.3.1 7.81% 1,034
3.7 21.2.1 3.59% 476
3.7 21.3.1 2.60% 345
3.7 21.0.1 2.25% 298
3.8 21.0.1 1.58% 209
3.8 21.2.1 1.42% 188
3.7 21.1.2 0.81% 107
3.9 21.3.1 0.69% 92
3.8 21.1.1 0.51% 67
Total 13,246

We can also use it to query stats on an exact version:

$ pypinfo -pc 'numpy==1.23rc3' pyversion version
Served from cache: False
Data processed: 34.01 MiB
Data billed: 35.00 MiB
Estimated cost: $0.01

python_version version percent download_count
3.9 1.23.0rc3 63.33% 38
3.8 1.23.0rc3 28.33% 17
3.10 1.23.0rc3 8.33% 5
Total 60

Check how many downloads came from continuous integration servers:

$ pypinfo --percent --days 5 pillow ci
Served from cache: False
Data processed: 384.22 MiB
Data billed: 385.00 MiB
Estimated cost: $0.01

ci percent download_count
None 79.37% 11,963,127
True 20.63% 3,109,931
Total 15,073,058
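The percent column in the table above matches each row's share of the displayed total. A minimal Python sketch of that arithmetic follows, using the figures from the table. The helper `with_percent` is hypothetical, and the assumption that percentages are taken over the displayed rows comes from the sample numbers rather than from pypinfo's documentation.

```python
def with_percent(rows):
    """Attach each row's share of the total as a formatted percentage.

    rows: list of (label, download_count) pairs.
    Hypothetical helper for illustration only.
    """
    total = sum(count for _, count in rows)
    return [(label, f"{count / total:.2%}", count) for label, count in rows]


# Figures copied from the ci table above.
rows = [("None", 11_963_127), ("True", 3_109_931)]
for label, percent, count in with_percent(rows):
    print(label, percent, f"{count:,}")
```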