to_gbq: Allow creation of new tables from DataFrame (and generate schema) · Issue #8325 · pandas-dev/pandas
A small extension on top of to_gbq
so that you can create new tables given only an existing DataFrame. Given an arbitrary DataFrame
with a non-hierarchical index, generate a schema from it. For now, we'd likely assume that object
dtype columns are strings, and maybe allow specifying the types of some or all columns in the schema so that int columns with nulls come out correctly (otherwise they'd be coerced to float columns, because NaN forces the column to float).
E.g.:
In [6]: import pandas as pd
In [7]: import pandas.util.testing as testing
In [8]: df = testing.makeMixedDataFrame()
In [9]: df
Out[9]:
   A  B     C          D
0  0  0  foo1 2009-01-01
1  1  1  foo2 2009-01-02
2  2  0  foo3 2009-01-05
3  3  1  foo4 2009-01-06
4  4  0  foo5 2009-01-07

In [10]: df.dtypes
Out[10]:
A           float64
B           float64
C            object
D    datetime64[ns]
dtype: object
Then you could do something like:
In [11]: generate_bq_schema(df)
Out[11]:
{'fields': [{'name': 'A', 'type': 'FLOAT'},
            {'name': 'B', 'type': 'FLOAT'},
            {'name': 'C', 'type': 'STRING'},
            {'name': 'D', 'type': 'TIMESTAMP'}]}
A named index could be added to the schema as well. For now, we could stick to requiring a non-hierarchical index (no MultiIndex), but maybe we could use record types for a MultiIndex in the future?
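To make the proposal concrete, here is a rough sketch of what generate_bq_schema might look like. This is not an existing pandas function: the overrides and include_index parameters are hypothetical names invented for illustration, and the dtype-to-type mapping is an assumption based on the example output above (object columns are treated as strings).

```python
import pandas as pd


def generate_bq_schema(df, overrides=None, include_index=False):
    """Sketch: build a BigQuery-style schema dict from a DataFrame.

    `overrides` (hypothetical) maps column name -> BigQuery type and wins
    over the inferred type. `include_index` (hypothetical) adds a named
    index as the first field.
    """
    if isinstance(df.index, pd.MultiIndex) or isinstance(df.columns, pd.MultiIndex):
        raise ValueError("hierarchical indexes are not supported")

    overrides = overrides or {}
    # Map dtype "kind" codes to BigQuery type names (assumed mapping).
    kind_to_type = {
        "i": "INTEGER",
        "u": "INTEGER",
        "f": "FLOAT",
        "b": "BOOLEAN",
        "M": "TIMESTAMP",
        "O": "STRING",  # assumption: object columns hold strings
    }

    # Only a named index is promoted to a column; unnamed ones are skipped.
    use_index = include_index and df.index.name is not None
    frame = df.reset_index() if use_index else df

    fields = []
    for name, dtype in frame.dtypes.items():
        bq_type = overrides.get(name) or kind_to_type.get(dtype.kind, "STRING")
        fields.append({"name": str(name), "type": bq_type})
    return {"fields": fields}


# An int column containing a null is stored as float64, so the inferred
# type would be FLOAT unless the caller overrides it.
counts = pd.DataFrame({"n": [1, None, 3]})
print(generate_bq_schema(counts))
print(generate_bq_schema(counts, overrides={"n": "INTEGER"}))
```

Letting the caller override only the columns they care about keeps the common case a one-liner while still covering int columns with nulls. Whether overrides should be validated against the data is left open here.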