to_gbq: Allow creation of new tables from DataFrame (and generate schema) · Issue #8325 · pandas-dev/pandas

Small extension on top of to_gbq so that you can actually create new tables given only an existing DataFrame. Given an arbitrary DataFrame with a non-hierarchical index, create a schema from it. For now, we'd likely assume that object-dtype columns are strings, and maybe allow specifying the schema for some or all columns so that int columns with nulls come out correctly (otherwise they'd be coerced to float columns, because NaN is a floating-point value).
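To make the int-with-nulls problem concrete, here is a small illustration using only standard pandas behavior. A schema inferred purely from dtypes would label the second column FLOAT, which is why a per-column override seems necessary.

```python
import pandas as pd

# A column of plain integers keeps an integer dtype...
ints = pd.Series([1, 2, 3])

# ...but a single missing value upcasts the whole column to float64,
# because NaN has no integer representation in NumPy.
ints_with_null = pd.Series([1, None, 3])

print(ints.dtype)            # an integer dtype, e.g. int64
print(ints_with_null.dtype)  # float64

# Dtype-based inference only sees the float kind ("f"), so it would
# emit FLOAT for this column even though the data is conceptually integer.
print(ints_with_null.dtype.kind)  # f
```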

E.g., with a mixed-dtype frame:

In [6]: import pandas as pd

In [7]: import pandas.util.testing as testing

In [8]: df = testing.makeMixedDataFrame()

In [9]: df
Out[9]:
   A  B     C          D
0  0  0  foo1 2009-01-01
1  1  1  foo2 2009-01-02
2  2  0  foo3 2009-01-05
3  3  1  foo4 2009-01-06
4  4  0  foo5 2009-01-07

In [10]: df.dtypes
Out[10]:
A           float64
B           float64
C            object
D    datetime64[ns]
dtype: object

Then you could do something like:

In [11]: generate_bq_schema(df)
Out[11]:
{'fields': [{'name': 'A', 'type': 'FLOAT'},
  {'name': 'B', 'type': 'FLOAT'},
  {'name': 'C', 'type': 'STRING'},
  {'name': 'D', 'type': 'TIMESTAMP'}]}
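A minimal sketch of what generate_bq_schema could look like, assuming it maps each column's NumPy dtype "kind" code to a BigQuery type and falls back to STRING for object columns, as proposed above. The kind-to-type table and the `overrides` parameter are assumptions for illustration, not an existing pandas API. The frame is rebuilt inline with the same values as the session above, rather than relying on pandas.util.testing, which later pandas releases no longer provide.

```python
import pandas as pd

# Assumed mapping from NumPy dtype kind codes to BigQuery column types.
# Object columns (and any unmapped kind) fall back to STRING.
_KIND_TO_BQ_TYPE = {
    "i": "INTEGER",   # signed integers
    "u": "INTEGER",   # unsigned integers
    "f": "FLOAT",     # floats (including ints upcast by NaN)
    "b": "BOOLEAN",   # booleans
    "M": "TIMESTAMP", # datetime64
}


def generate_bq_schema(df, overrides=None):
    """Hypothetical sketch: build a BigQuery-style schema dict from df.dtypes.

    `overrides` (also hypothetical) maps column names to BigQuery types and
    takes precedence over inference, e.g. to keep an int column with nulls
    as INTEGER instead of FLOAT.
    """
    overrides = overrides or {}
    fields = []
    for name, dtype in df.dtypes.items():
        inferred = _KIND_TO_BQ_TYPE.get(dtype.kind, "STRING")
        fields.append({"name": name, "type": overrides.get(name, inferred)})
    return {"fields": fields}


# Same values as the mixed frame shown in the session above.
df = pd.DataFrame({
    "A": [0.0, 1.0, 2.0, 3.0, 4.0],
    "B": [0.0, 1.0, 0.0, 1.0, 0.0],
    "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],
    "D": pd.to_datetime(["2009-01-01", "2009-01-02", "2009-01-05",
                         "2009-01-06", "2009-01-07"]),
})

schema = generate_bq_schema(df)
print(schema)
```

Passing something like `overrides={"A": "INTEGER"}` would force column A to INTEGER while leaving the others inferred, which is one way to express the "specify some or all columns" idea.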

And with a named index, that index could be added to the schema as well. For now, we could stick to requiring a non-hierarchical index (no MultiIndex), but maybe we could use record types for a MultiIndex in the future?
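One way the named-index case might be handled, sketched under the assumption that the index is simply turned into a regular column before schema inference. The helper name `frame_for_schema` is hypothetical; it relies only on the real `DataFrame.reset_index` and rejects MultiIndex frames, matching the "non-hierarchical for now" restriction.

```python
import pandas as pd


def frame_for_schema(df):
    """Hypothetical helper: prepare a DataFrame for schema generation.

    A named, single-level index becomes a regular column so it can appear
    in the schema; a MultiIndex is rejected for now.
    """
    if isinstance(df.index, pd.MultiIndex):
        raise NotImplementedError("MultiIndex is not supported yet")
    if df.index.name is not None:
        # reset_index() inserts the named index as the first column.
        return df.reset_index()
    # An unnamed index is left out of the schema; return the frame unchanged.
    return df


df = pd.DataFrame({"value": [1.5, 2.5]},
                  index=pd.Index(["x", "y"], name="key"))
out = frame_for_schema(df)
print(list(out.columns))  # ['key', 'value']
```

Keeping this as a separate preprocessing step means the dtype-to-type mapping itself never has to know about indexes.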