Issue with CSV parser when using usecols · Issue #3192 · pandas-dev/pandas

The following code:

    import pandas
    from io import StringIO

    data = u'1,2,3\n4,5,6\n7,8,9'

    # 1. No usecols
    df = pandas.read_csv(StringIO(data),
                         dtype={'B': int, 'C': float},
                         header=None,
                         names=['A', 'B', 'C'],
                         converters={'A': str},
                         )
    print(df.dtypes)
    print()

    # 2. usecols selects a subset of the columns
    df = pandas.read_csv(StringIO(data),
                         usecols=[0, 2],
                         dtype={'B': int, 'C': float},
                         header=None,
                         names=['A', 'B', 'C'],
                         converters={'A': str},
                         )
    print(df.dtypes)
    print()

    # 3. usecols selects every column
    df = pandas.read_csv(StringIO(data),
                         usecols=[0, 1, 2],
                         dtype={'B': int, 'C': float},
                         header=None,
                         names=['A', 'B', 'C'],
                         converters={'A': str},
                         )
    print(df.dtypes)

outputs

    A     object
    B      int32
    C    float64
    dtype: object

    A     object
    C    float64
    dtype: object

    A    object
    B    object
    C    object
    dtype: object

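The last set of dtypes should be the same as the first one: B should be an integer column and C a float column. The issue arises when a converters dictionary is passed together with a usecols list that selects all of the available columns; the dtype argument is then ignored and every column comes back as object.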
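The expected behaviour can also be written as a small check. This is only a sketch: the helper `read_dtypes` and the variable names are made up for illustration, and the `read_csv` arguments are the ones from the report. It compares dtype names between the two reads rather than against fixed values, since the integer width (int32 or int64) depends on the platform.

```python
from io import StringIO

import pandas as pd

DATA = "1,2,3\n4,5,6\n7,8,9"


def read_dtypes(usecols=None):
    """Read DATA with the options from the report; return {column: dtype name}."""
    df = pd.read_csv(
        StringIO(DATA),
        usecols=usecols,
        dtype={"B": int, "C": float},
        header=None,
        names=["A", "B", "C"],
        converters={"A": str},
    )
    return df.dtypes.astype(str).to_dict()


baseline = read_dtypes()                      # no usecols
all_columns = read_dtypes(usecols=[0, 1, 2])  # usecols lists every column

# On an affected version, B and C are reported as object in the second read,
# so the two dictionaries differ.
dtypes_match = baseline == all_columns
print("dtypes match:", dtypes_match)
```

If `dtypes_match` is False, printing `baseline` and `all_columns` side by side shows which columns lost their requested dtype.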

The commit https://github.com/tr11/pandas/commit/1ac3700e72a5861a7d8544a72d77a4d64c71f118 in my fork seems to fix this issue.