Gbq service account by tworec · Pull Request #11881 · pandas-dev/pandas
This adds service account authentication while still supporting the standard web auth method.
It also adds some useful stdout messages: progress with elapsed time and percentage, plus a price calculation.
At RTBHouse we've been using this service account auth since May. It works perfectly with a remote Jupyter server (IPython notebooks).
fixes #8489
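A hedged usage sketch of what the new parameter enables; the query text, project id and key path below are placeholders, not values from this PR:

import pandas as pd

# Service account auth: pass the path to (or the contents of) a JSON key
# instead of going through the interactive web flow.
df = pd.read_gbq("SELECT word, word_count FROM [publicdata:samples.shakespeare] LIMIT 10",
                 project_id="my-gcp-project",
                 private_key="/path/to/service_account_key.json")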
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.file import Storage
from oauth2client.tools import run_flow, argparser

_check_google_client_version()

flow = OAuth2WebServerFlow(client_id='495642085510-k0tmvj2m941jhre2nbqka17vqpjfddtd.apps.googleusercontent.com',
                           client_secret='kOc9wMptUtxkcIFbtZCcrEAc',
                           scope='https://www.googleapis.com/auth/bigquery',
TODO: use self.scope here
The private_key should be allowed to be a sequence of bytes as well (and rename).
OK, I've added private_key contents support. I understand it's a better design decision.
@@ -37,6 +39,10 @@ def _check_google_client_version():

logger = logging.getLogger('pandas.io.gbq')
logger.setLevel(logging.ERROR)

def _print(msg, end='\n'):
    sys.stdout.write(msg + end)
You should pass verbose to this (and do the if inside here). That way you can simply write
_print(msg, verbose=verbose)
in the code itself.
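A minimal sketch of the suggested helper, assuming verbose gets threaded through from the connector; not necessarily the exact code that landed:

import sys

def _print(msg, end='\n', verbose=True):
    # keep the verbosity check inside the helper so call sites stay one-liners
    if verbose:
        sys.stdout.write(msg + end)
        sys.stdout.flush()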
thanks, good point
pls add a whatsnew entry (enhancements).
Can you post a sample session with verbose=True?
I'm receiving the following exception related to the time.monotonic()
function when running the gbq integration tests.
AttributeError: 'module' object has no attribute 'monotonic'
I believe Travis will also have this issue. Travis is currently skipping integration tests that require a BigQuery project id, so the issue is not reported by Travis.
Should we add a dependency on monotonic?
https://pypi.python.org/pypi/monotonic/0.5
======================================================================
ERROR: test_should_properly_handle_valid_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tony/pandas-gbq_service_account/pandas/io/tests/test_gbq.py", line 322, in test_should_properly_handle_valid_integers
df = gbq.read_gbq(query, project_id=PROJECT_ID)
File "/home/tony/pandas-gbq_service_account/pandas/io/gbq.py", line 503, in read_gbq
schema, pages = connector.run_query(query)
File "/home/tony/pandas-gbq_service_account/pandas/io/gbq.py", line 269, in run_query
self._start_timer()
File "/home/tony/pandas-gbq_service_account/pandas/io/gbq.py", line 185, in _start_timer
self.start = time.monotonic()
AttributeError: 'module' object has no attribute 'monotonic'
we don't need any more deps
use time.time()
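A minimal sketch of the dependency-free approach, assuming only elapsed wall-clock time is needed; the class and method names are illustrative, not the PR's:

import time

class QueryTimer(object):
    def _start_timer(self):
        # time.time() is available on both Python 2 and 3, unlike time.monotonic()
        self.start = time.time()

    def get_elapsed_seconds(self):
        return round(time.time() - self.start, 2)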
Re FileNotFoundError: I'm running tests on Python 3 only, where this exception is built-in, hence I missed this. After adding support for JSON contents handling, this exception is not needed any more. See line 168 in gbq.py, where I check that the file exists and is a regular file, so a file-not-found exception cannot happen. I'm removing it from the except clause.
While doing this I've also rethought, rewritten and tested the invalid auth scenarios.
Re json.load: it works for me, I can't reproduce it under Python 3 :(
I don't understand the json magic in the pandas.json module, but I think it is imported here.
What I could do is invoke json.loads, which was used here before my changes... but that seems pointless, because Python's built-in json module can read files directly...
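A hedged sketch of handling either a key file path or raw JSON contents with the standard-library json module; the helper name is hypothetical:

import json
import os

def _parse_private_key(private_key):
    if os.path.isfile(private_key):
        # a path was given: json.load reads from an open file object
        with open(private_key) as key_file:
            return json.load(key_file)
    # otherwise treat the value as the key contents: json.loads parses a string
    return json.loads(private_key)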
nosetests test_gbq.py -v
test_should_be_able_to_get_a_bigquery_service (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_results_from_query (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_schema_from_query (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_valid_credentials (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_make_a_connector (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_a_bigquery_service (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyContentsIntegration) ... ok
test_should_be_able_to_get_results_from_query (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyContentsIntegration) ... ok
test_should_be_able_to_get_schema_from_query (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyContentsIntegration) ... ok
test_should_be_able_to_get_valid_credentials (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyContentsIntegration) ... ok
test_should_be_able_to_make_a_connector (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyContentsIntegration) ... ok
test_should_be_able_to_get_a_bigquery_service (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyPathIntegration) ... ok
test_should_be_able_to_get_results_from_query (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyPathIntegration) ... ok
test_should_be_able_to_get_schema_from_query (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyPathIntegration) ... ok
test_should_be_able_to_get_valid_credentials (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyPathIntegration) ... ok
test_should_be_able_to_make_a_connector (pandas.io.tests.test_gbq.TestGBQConnectorServiceAccountKeyPathIntegration) ... ok
test_bad_project_id (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_bad_table_name (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_column_order (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_column_order_plus_index (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_download_dataset_larger_than_200k_rows (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_index_column (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_malformed_query (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_arbitrary_timestamp (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_empty_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_false_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_floats (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_timestamp (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_timestamp_unix_epoch (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_true_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_floats (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_read_as_service_account_with_key_contents (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_read_as_service_account_with_key_path (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_unicode_string_conversion_and_normalization (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_zero_rows (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_read_gbq_when_private_key_json_values_has_wrong_types_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_read_gbq_with_corrupted_private_key_json_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_read_gbq_with_empty_private_key_file_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_read_gbq_with_empty_private_key_json_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_read_gbq_with_invalid_private_key_json_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_read_gbq_with_no_project_id_given_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_booleans_as_python_booleans (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_floats_as_python_floats (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_integers_as_python_floats (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_strings_as_python_strings (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_timestamps_as_numpy_datetime (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_that_parse_data_works_properly (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_to_gbq_should_fail_if_invalid_table_name_passed (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_to_gbq_with_no_project_id_given_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_create_dataset (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_create_table (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_dataset_does_not_exist (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_dataset_exists (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_delete_dataset (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_delete_table (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_generate_schema (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_google_upload_errors_should_raise_exception (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_list_dataset (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_list_table (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_list_table_zero_results (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_table_does_not_exist (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data_if_table_exists_append (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data_if_table_exists_fail (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data_if_table_exists_replace (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data_as_service_account_with_key_contents (pandas.io.tests.test_gbq.TestToGBQIntegrationServiceAccountKeyContents) ... ok
test_upload_data_as_service_account_with_key_path (pandas.io.tests.test_gbq.TestToGBQIntegrationServiceAccountKeyPath) ... ok
pandas.io.tests.test_gbq.test_requirements ... ok
pandas.io.tests.test_gbq.test_generate_bq_schema_deprecated ... ok
----------------------------------------------------------------------
Ran 73 tests in 330.985s
OK
Sample session: gbq.read_gbq verbose output:
Requesting query... ok.
Query running...
Elapsed 11.55 s. Waiting...
Query done.
Processed: 37.5 Mb
Retrieving results...
Got page: 1; 10% done. Elapsed 17.25 s.
Got page: 2; 19% done. Elapsed 20.32 s.
Got page: 3; 29% done. Elapsed 24.03 s.
Got page: 4; 39% done. Elapsed 28.5 s.
Got page: 5; 48% done. Elapsed 33.15 s.
Got page: 6; 58% done. Elapsed 37.19 s.
Got page: 7; 67% done. Elapsed 40.6 s.
Got page: 8; 77% done. Elapsed 43.95 s.
Got page: 9; 87% done. Elapsed 48.27 s.
Got page: 10; 96% done. Elapsed 185.45 s.
Got page: 11; 100% done. Elapsed 187.93 s.
Got 1038579 rows.
Total time taken 191.67 s.
Finished at 2016-01-15 16:16:07.
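A minimal sketch (not the PR's exact code) of how the progress lines above could be produced, assuming the rows fetched so far, the total row count and a start timestamp are tracked:

import time

def report_page(page_number, rows_so_far, total_rows, start_time):
    percent = int(round(100.0 * rows_so_far / total_rows))
    elapsed = round(time.time() - start_time, 2)
    print("Got page: {0}; {1}% done. Elapsed {2} s.".format(page_number, percent, elapsed))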
def __init__(self, project_id, reauth=False):
def __init__(self, project_id, reauth=False, verbose=True, private_key=None):
Let's default to verbose=False. You could have an option for this I suppose, e.g. pd.options.gbq.verbose.
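A hedged sketch of the suggested option, assuming the config machinery pandas used at the time (pandas.core.config); the gbq.verbose key is hypothetical and was not part of the merged PR:

from pandas.core import config as cf

cf.register_option('gbq.verbose', False,
                   'Whether gbq functions print progress messages to stdout.')

# inside the connector, a None default could then fall back to the option:
# verbose = cf.get_option('gbq.verbose') if verbose is None else verbose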
Pls run git diff master | flake8 --diff and fix the issues.
needs a rebase; most of the codebase underwent a PEP cleanup very recently.
You will be authenticated to the specified BigQuery account via Google's Oauth2 mechanism.
Primary auth method is as simple as following the
The primary authentication
- We need to update the to_gbq() definition in frame.py to also support the 'private_key' parameter.
data_frame.to_gbq(DESTINATION_TABLE, project_id = PROJECT_ID, if_exists='append', private_key = PRIVATE_KEY)
TypeError: to_gbq() got an unexpected keyword argument 'private_key'
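A hedged sketch of the change needed in frame.py: forward the new keyword from DataFrame.to_gbq to pandas.io.gbq.to_gbq. The defaults shown mirror the signature of that era and may not match the merged code exactly:

def to_gbq(self, destination_table, project_id, chunksize=10000,
           verbose=True, reauth=False, if_exists='fail', private_key=None):
    # thin wrapper: delegate to the gbq module, passing private_key through
    from pandas.io import gbq
    return gbq.to_gbq(self, destination_table, project_id, chunksize=chunksize,
                      verbose=verbose, reauth=reauth, if_exists=if_exists,
                      private_key=private_key)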
)
except (KeyError, ValueError, TypeError, AttributeError):
    raise InvalidPrivateKeyFormat("Service account private key should be valid JSON (file path or string contents) "
                                  "with at least two keys: 'client_email' and 'private_key'. Can be obtained from google developers console. ")
Can we make a small change in the error message to indicate that the file may be missing?
raise InvalidPrivateKeyFormat("Private key is missing or invalid. Service account private key should be valid JSON (file path or string contents) with at least two keys: 'client_email' and 'private_key'. Can be obtained from google developers console. ")
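For context, a hedged sketch of what the except clause above guards: building service account credentials from the parsed key with the oauth2client API of that era (SignedJwtAssertionCredentials was later superseded by ServiceAccountCredentials), assuming private_key already holds the JSON contents:

import json
from oauth2client.client import SignedJwtAssertionCredentials

def get_service_account_credentials(private_key):
    json_key = json.loads(private_key)
    # KeyError/ValueError/TypeError raised here are what the except clause catches
    return SignedJwtAssertionCredentials(
        json_key['client_email'],
        json_key['private_key'].encode('utf-8'),
        scope='https://www.googleapis.com/auth/bigquery')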
.. note::
The `'private_key'` parameter can be set to either the file path of the service account key in JSON format, or
Needs indenting under the note (same below).
@tworec just a couple more stylistic changes to conform to the rest of the code base.
Almost there!
gr8! All style fixes done. I'm running the local tests and pushing. Please review.
All 73 tests passed on both py3.5 and py2.7.
This was my first PR. Thanks for the feedback.
tworec deleted the gbq_service_account branch