DataFrame.to_csv(quoting=csv.QUOTE_NONNUMERIC) quotes numeric values · Issue #12922 · pandas-dev/pandas

Failing test

def test_pandas():
    import tempfile
    import csv
    import pandas as pd
    import numpy as np

    df = pd.DataFrame.from_dict({'column': [1.0, 2.0]})
    assert df['column'].dtype == np.dtype('float')

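    # Python 2 repro: TemporaryFile() opens in binary mode ('w+b'), so read() returns str there
    with tempfile.TemporaryFile() as f:
        df.to_csv(f, quoting=csv.QUOTE_NONNUMERIC, index=False)

        f.seek(0)
        lines = f.read().splitlines()
        assert lines[0] == '"column"'
        assert not lines[1].startswith('"') # <--- THIS FAILS
        # build a list explicitly so the comparison does not depend on map() returning a list
        assert [1.0, 2.0] == [float(x) for x in lines[1:]]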

The issue is that the floats are written wrapped in quotes, even though I requested QUOTE_NONNUMERIC, which should leave numeric values unquoted.

The problem is that pandas.core.internals.FloatBlock.to_native_types (and, by extension, pandas.formats.format.FloatArrayFormatter.get_result_as_array) unconditionally formats the float array into an array of str. That array is passed unchanged to the csv module, and because QUOTE_NONNUMERIC quotes every non-numeric field, including strings that merely look like numbers, each value ends up wrapped in quotes.
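The csv-module half of this can be reproduced without pandas. A minimal sketch (the helper `write_row` is just for illustration): under QUOTE_NONNUMERIC the writer leaves real floats bare but quotes strings, even when those strings contain formatted numbers.

```python
import csv
import io


def write_row(row):
    """Write one row with QUOTE_NONNUMERIC and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n").writerow(row)
    return buf.getvalue()


print(write_row([1.0, 2.0]), end="")      # real floats are left unquoted: 1.0,2.0
print(write_row(["1.0", "2.0"]), end="")  # pre-formatted strings are quoted: "1.0","2.0"
```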

I'm not 100% sure, but the fix may be to have FloatBlock.to_native_types check whether quoting is set and, if so, skip the FloatArrayFormatter. I suggest this because pandas.indexes.base.Index._format_native_types already has a special case along these lines. This does seem a bit dirty, though!
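A minimal sketch of the suggested check, in plain Python rather than pandas internals. `to_native_types_sketch` and `write_column` are hypothetical helpers, not pandas functions; they work on a 1-D list instead of a 2-D block and ignore float_format and the other formatting options. When quoting is set, floats stay floats so the csv writer can see they are numeric; otherwise they are pre-formatted as strings, as the current code does.

```python
import csv
import io
import math


def to_native_types_sketch(values, quoting=None, na_rep=""):
    """Hypothetical stand-in for FloatBlock.to_native_types (not the pandas API).

    Only the quoting check is modelled; float_format, decimal, etc. are ignored.
    Like kwargs.get('quoting'), csv.QUOTE_MINIMAL (0) counts as "not set".
    """
    out = []
    for v in values:
        if math.isnan(v):
            out.append(na_rep)  # missing values become the na_rep string
        elif quoting:
            out.append(v)       # keep a real float so csv treats it as numeric
        else:
            out.append(str(v))  # existing behavior: pre-format as text
    return out


def write_column(name, values, quoting):
    """Hypothetical helper: write a one-column CSV the way to_csv(index=False) would."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    writer.writerow([name])
    for v in to_native_types_sketch(values, quoting=quoting):
        writer.writerow([v])
    return buf.getvalue()


# prints "column", 1.0 and 2.0 on separate lines; only the header is quoted
print(write_column("column", [1.0, 2.0], csv.QUOTE_NONNUMERIC), end="")
```

With this check in place, the failing test's expectation (quoted header, unquoted numbers) holds, while the default QUOTE_MINIMAL path keeps the string-formatting behavior.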

Here is an awful monkeypatch that works around the problem:

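from __future__ import print_function  # keeps the debug print valid on Python 2 and 3
import pandas as pd

# Relies on pandas 0.18 internals (pd.core.internals.FloatBlock); not a public API.
orig_to_native_types = pd.core.internals.FloatBlock.to_native_types

def to_native_types(self, *args, **kwargs):
    if kwargs.get('quoting'):
        # Quoting requested: return the raw float values instead of
        # pre-formatted strings so the csv module sees them as numeric.
        # Note that na_rep, float_format, etc. are ignored on this path.
        values = self.values
        slicer = kwargs.get('slicer')
        if slicer is not None:
            values = values[:, slicer]

        return values

    res = orig_to_native_types(self, *args, **kwargs)
    print('FloatBlock.to_native_types', args, kwargs, '=', res)
    return res

pd.core.internals.FloatBlock.to_native_types = to_native_types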

output of `pd.show_versions()`

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: None
pip: 8.1.1
setuptools: 7.0
Cython: 0.20.1
numpy: 1.11.0
scipy: 0.13.3
statsmodels: None
xarray: None
IPython: 3.2.1
sphinx: None
patsy: 0.3.0
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: None
matplotlib: 1.3.1
openpyxl: 2.0.4
xlrd: 0.9.2
xlwt: None
xlsxwriter: None
lxml: 3.3.2
bs4: 4.2.0
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.7.2
boto: None
