DataFrame.to_csv(quoting=csv.QUOTE_NONNUMERIC) quotes numeric values · Issue #12922 · pandas-dev/pandas
Failing test
def test_pandas():
    import tempfile
    import csv
    import pandas as pd
    import numpy as np
    df = pd.DataFrame.from_dict({'column': [1.0, 2.0]})
    assert df['column'].dtype == np.dtype('float')
    with tempfile.TemporaryFile() as f:
        df.to_csv(f, quoting=csv.QUOTE_NONNUMERIC, index=False)
        f.seek(0)
        lines = f.read().splitlines()
        assert lines[0] == '"column"'
        assert not lines[1].startswith('"')  # <--- THIS FAILS
        assert [1, 2] == map(float, lines[1:])
The issue is that the floats are being output wrapped with quotes, even though I requested QUOTE_NONNUMERIC.
The problem is that pandas.core.internals.FloatBlock.to_native_types (and by extension pandas.formats.format.FloatArrayFormatter.get_result_as_array) unconditionally formats the float array to a str array, which is then passed unchanged to the csv module and hence will be wrapped in quotes by that code.
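To illustrate why a pre-formatted str array ends up quoted, here is a minimal sketch using only the standard library csv module (no pandas involved). The helper name write_row is made up for this example. With QUOTE_NONNUMERIC the writer quotes every field that is not a number, so a float that has already been turned into a string is indistinguishable from any other text.

```python
import csv
import io


def write_row(row):
    """Write one row with QUOTE_NONNUMERIC and return the raw CSV text.

    Hypothetical helper, used only for this illustration.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(row)
    return buf.getvalue()


# A real float is recognised as numeric and written without quotes.
as_float = write_row([1.0])

# The same value pre-formatted as a str is treated as text and quoted.
as_string = write_row(["1.0"])

print(repr(as_float))
print(repr(as_string))
```

The second case is what to_csv appears to be doing here: by the time the values reach the csv writer they are already strings.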
I'm not 100% sure, but the fix may be to have FloatBlock.to_native_types check whether quoting is set and, if so, skip the FloatArrayFormatter. I say this because pandas.indexes.base.Index._format_native_types already has a special case along these lines. This does seem a bit dirty though!
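The idea can be sketched outside pandas as follows. This is only an illustration of the proposal under my own assumptions, not pandas code: to_native_types and write_column below are hypothetical stand-ins that operate on plain Python lists, and they ignore float_format, NaN handling and everything else the real method has to deal with.

```python
import csv
import io


def to_native_types(values, quoting=None):
    """Hypothetical stand-in for the block-level formatting step."""
    if quoting:
        # Quoting requested: keep the floats as floats so the csv
        # writer can still tell that they are numeric.
        return list(values)
    # No quoting: formatting to str up front is harmless.
    return [str(v) for v in values]


def write_column(header, values, quoting=csv.QUOTE_MINIMAL):
    """Write a single-column CSV and return its lines (illustrative only)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting)
    writer.writerow([header])
    for value in to_native_types(values, quoting=quoting):
        writer.writerow([value])
    return buf.getvalue().splitlines()


lines = write_column('column', [1.0, 2.0], quoting=csv.QUOTE_NONNUMERIC)
print(lines)
```

Note that csv.QUOTE_MINIMAL is 0, so a plain truthiness check on quoting leaves the default behaviour untouched.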
Here is an awful monkeypatch that works around the problem:
orig_to_native_types = pd.core.internals.FloatBlock.to_native_types

def to_native_types(self, *args, **kwargs):
    if kwargs.get('quoting'):
        # Quoting requested: return the raw float values (sliced if needed)
        # instead of formatting them to strings.
        values = self.values
        slicer = kwargs.get('slicer')
        if slicer is not None:
            values = values[:, slicer]
        return values
    # Otherwise fall back to the original implementation.
    res = orig_to_native_types(self, *args, **kwargs)
    print 'FloatBlock.to_native_types', args, kwargs, '=', res
    return res

pd.core.internals.FloatBlock.to_native_types = to_native_types
output of pd.show_versions()
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.0
nose: None
pip: 8.1.1
setuptools: 7.0
Cython: 0.20.1
numpy: 1.11.0
scipy: 0.13.3
statsmodels: None
xarray: None
IPython: 3.2.1
sphinx: None
patsy: 0.3.0
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: None
matplotlib: 1.3.1
openpyxl: 2.0.4
xlrd: 0.9.2
xlwt: None
xlsxwriter: None
lxml: 3.3.2
bs4: 4.2.0
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.7.2
boto: None