Fetch data for 28 days to stay within quota by hugovk · Pull Request #45 · hugovk/top-pypi-packages

Fetching 29 days of data uses too much quota:

    from google.cloud import bigquery

    # Construct a BigQuery client object.
    # client = bigquery.Client()
    client = bigquery.Client.from_service_account_json("pypinfo-key.json")

    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

    # Start the query, passing in the extra configuration.
    query_job = client.query(
        (
            """
        SELECT
          file.project as project,
          COUNT(*) as download_count,
        FROM `bigquery-public-data.pypi.file_downloads`
        WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY)
          AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
        GROUP BY project
        ORDER BY download_count DESC
        LIMIT 15000
        """
        ),
        job_config=job_config,
    )  # Make an API request.

    # A dry run query completes immediately.
    print(
        f"This query will process {query_job.total_bytes_processed:,} bytes or"
        f" {query_job.total_bytes_processed / 2**40:.2f} TiB."
    )

    ❯ p dry-run.py
    This query will process 1,125,717,300,161 bytes or 1.02 TiB.
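For reference, the TiB figure is just the byte count divided by 2**40, as in the script's `print` call. Below is a minimal sketch of that arithmetic as a quota check. The 1 TiB limit is an assumption (the exact quota is not stated here), and `QUOTA_TIB`, `to_tib` and `over_quota` are hypothetical names used only for illustration.

```python
TIB = 2**40  # bytes per tebibyte, the same divisor the dry-run script uses

# Assumed limit for illustration only; the exact quota is not stated in this PR.
QUOTA_TIB = 1.0


def to_tib(total_bytes: int) -> float:
    """Convert a byte count to TiB."""
    return total_bytes / TIB


def over_quota(total_bytes: int, quota_tib: float = QUOTA_TIB) -> bool:
    """Return True if the query would process more than the assumed quota."""
    return to_tib(total_bytes) > quota_tib


# Byte count reported by the dry run for the 29-day window.
bytes_29_days = 1_125_717_300_161
print(f"{to_tib(bytes_29_days):.2f} TiB, over quota: {over_quota(bytes_29_days)}")
```

With a different limit, pass `quota_tib` explicitly rather than relying on the assumed default.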

Reducing the window by a day (`INTERVAL -30 DAY` to `INTERVAL -29 DAY`), for a 28-day period, should be okay:

    ❯ p dry-run.py
    This query will process 1,084,266,719,315 bytes or 0.99 TiB.
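To compare window lengths without editing the SQL by hand, the interval could be parameterised. This is only a sketch: `build_query` is a hypothetical helper, not part of this repository or of pypinfo, and it assumes the window always ends one day ago, as in the query above.

```python
def build_query(days: int, limit: int = 15000) -> str:
    """Build the download-count query for a window of `days` days ending one day ago."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = -(days + 1)  # e.g. a 28-day window starts at INTERVAL -29 DAY
    return f"""
SELECT
  file.project as project,
  COUNT(*) as download_count,
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {start} DAY)
  AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
GROUP BY project
ORDER BY download_count DESC
LIMIT {limit}
"""


# Print the query for the 28-day window proposed here.
print(build_query(28))
```

The returned string can be passed to `client.query(..., job_config=job_config)` with the same dry-run configuration to check the bytes processed for each candidate window.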