Fetch data for 28 days to stay within quota by hugovk · Pull Request #45 · hugovk/top-pypi-packages
29 days uses too much quota:
```python
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()
client = bigquery.Client.from_service_account_json("pypinfo-key.json")

job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

# Start the query, passing in the extra configuration.
query_job = client.query(
    """
    SELECT
      file.project as project,
      COUNT(*) as download_count,
    FROM `bigquery-public-data.pypi.file_downloads`
    WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY) AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
    GROUP BY
      project
    ORDER BY
      download_count DESC
    LIMIT 15000
    """,
    job_config=job_config,
)  # Make an API request.

# A dry run query completes immediately.
print(
    f"This query will process {query_job.total_bytes_processed:,} bytes or"
    f" {query_job.total_bytes_processed / 2**40:.2f} TiB."
)
```
```
❯ p dry-run.py
This query will process 1,125,717,300,161 bytes or 1.02 TiB.
```
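For illustration only (not part of the PR), the dry-run estimate could also gate the real query programmatically. This is a minimal sketch assuming a 1 TiB quota, which is inferred from the figures above rather than stated in the PR; it reuses `client` and `query_job` from the script above, and `QUERY` is a hypothetical name standing for the same SQL string:

```python
# Hypothetical guard (not in the PR): only run the real query if the dry-run
# estimate stays under an assumed 1 TiB quota.
QUOTA_BYTES = 1 * 2**40  # 1 TiB, assumed quota

if query_job.total_bytes_processed <= QUOTA_BYTES:
    rows = client.query(QUERY).result()  # same SQL, without dry_run
else:
    print("Dry run exceeds the quota; shrink the date range and retry.")
```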
Reducing by a day, `-30` -> `-29`, for a 28-day period should be okay (the adjusted clause is sketched after the output):
```
❯ p dry-run.py
This query will process 1,084,266,719,315 bytes or 0.99 TiB.
```
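Concretely, the only change to the query is the start of the interval; a sketch of the adjusted `WHERE` clause, with the rest of the query unchanged (`WHERE_28_DAYS` is just an illustrative name):

```python
# The 28-day window starts one day later: -29 instead of -30.
WHERE_28_DAYS = """
WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -29 DAY)
                    AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
"""
```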