BUG: to_json with objects causing segfault

Code Sample, a copy-pastable example if possible

Creating a bson ObjectId without passing an ID explicitly works fine:

```python
import bson
import pandas as pd

pd.DataFrame({'A': [bson.objectid.ObjectId()]}).to_json()
# Out[4]: '{"A":{"0":{"binary":"W\u0e32\u224cug\u00fcR","generation_time":1474361586000}}}'

pd.DataFrame({'A': [bson.objectid.ObjectId()], 'B': [1]}).to_json()
# Out[5]: '{"A":{"0":{"binary":"W\u0e4e\u224cug\u00fcS","generation_time":1474361614000}},"B":{"0":1}}'
```
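The output above suggests that, lacking a dedicated rule for ObjectId, `to_json` serialized the object's public attributes: `binary` (the raw 12 bytes of the ID) and `generation_time` (written as epoch milliseconds, the `to_json` default). A minimal stdlib sketch of how those two values derive from an ObjectId's hex string, assuming the standard BSON layout in which the first 4 bytes are a big-endian Unix timestamp in seconds. `objectid_fields` is a hypothetical helper, not part of pymongo:

```python
import struct
from datetime import datetime, timezone


def objectid_fields(hex_id: str) -> dict:
    """Hypothetical helper: rebuild the two attributes that to_json emitted."""
    raw = bytes.fromhex(hex_id)  # ObjectId.binary holds these 12 raw bytes
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex digits)")
    # Assumption: first 4 bytes = big-endian creation time in seconds.
    (seconds,) = struct.unpack(">I", raw[:4])
    return {
        "binary": raw,
        "generation_time": seconds * 1000,  # epoch ms, as seen in the JSON output
        "generated_at": datetime.fromtimestamp(seconds, tz=timezone.utc),
    }


fields = objectid_fields("574b4454ba8c5eb4f98a8f45")
print(fields["generation_time"])  # 1464550484000
print(fields["generated_at"])     # 2016-05-29 19:34:44+00:00
```

Note that `binary` is arbitrary bytes rather than text, so whether it can be emitted as a JSON string depends entirely on which byte values the ID happens to contain.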

However, if you pass an ID explicitly, an exception is raised:

```python
pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')]}).to_json()
```

```
Traceback (most recent call last):
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')]}).to_json()
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/core/generic.py", line 1056, in to_json
    default_handler=default_handler)
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/io/json.py", line 36, in to_json
    date_unit=date_unit, default_handler=default_handler).write()
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/io/json.py", line 79, in write
    default_handler=self.default_handler)
OverflowError: Unsupported UTF-8 sequence length when encoding string
```
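One plausible reading of the `OverflowError`: `to_json` appears to treat the raw 12-byte `binary` attribute as a UTF-8 string, and the bytes of this particular ID are not valid UTF-8. That is an assumption inferred from the error message, not confirmed against the pandas source. A quick stdlib check of the bytes; `first_invalid_utf8` and `over_long_lead_bytes` are hypothetical helpers:

```python
def first_invalid_utf8(raw: bytes):
    """Return (position, reason) of the first strict UTF-8 decoding error, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


def over_long_lead_bytes(raw: bytes):
    """Positions of bytes >= 0xF8. Under the obsolete RFC 2279 scheme these
    announced 5- or 6-byte sequences; RFC 3629 UTF-8 never allows them."""
    return [i for i, b in enumerate(raw) if b >= 0xF8]


explicit = bytes.fromhex("574b4454ba8c5eb4f98a8f45")
print(first_invalid_utf8(explicit))    # (4, 'invalid start byte')
print(over_long_lead_bytes(explicit))  # [8]
```

Byte 0xBA at position 4 is a continuation byte that cannot start a character, and 0xF9 at position 8 would announce a 5-byte sequence; an encoder rejecting the latter would plausibly report an "unsupported sequence length".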

Worse, if the ObjectId column is not the only column in the DataFrame, the whole process crashes:

```python
pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')], 'B': [1]}).to_json()
```

Process finished with exit code 139
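Exit code 139 conventionally means 128 + 11, i.e. the process was killed by signal 11 (SIGSEGV), which matches the segfault in the title. When reproducing a crash like this, running the snippet in a separate interpreter keeps the calling session alive. A minimal stdlib sketch; `describe_exit` and `run_isolated` are hypothetical helpers, and the 128 + N decoding is only a shell convention, so a genuine `sys.exit(139)` would be misread as a signal:

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate an exit status into a readable description.

    subprocess reports death-by-signal as a negative number (e.g. -11),
    while shells and IDEs conventionally report 128 + signal (e.g. 139).
    """
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128  # heuristic: assumes the 128 + N convention
    else:
        return f"exited with status {returncode}"
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


def run_isolated(snippet: str) -> str:
    """Run a snippet in a fresh interpreter so a segfault cannot kill the caller."""
    proc = subprocess.run([sys.executable, "-c", snippet])
    return describe_exit(proc.returncode)
```

Passing the two-column reproducer above as the `snippet` string would be expected to report `SIGSEGV` on an affected install, while the parent interpreter keeps running.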

Expected Output

output of pd.show_versions()

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 26.1.1
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.5.0
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

pymongo version is 3.3.0