BUG: read_table() ignores dtype argument when multi-character separator is specified · Issue #6607 · pandas-dev/pandas (original) (raw)

related #4363
closes #3374

Here is a minimal example:

In [21]: df = pd.DataFrame({'A': range(5), 'B': rand(5)})

In [22]: df
Out[22]: 
   A         B
0  0  0.402616
1  1  0.880696
2  2  0.184491
3  3  0.832732
4  4  0.393917

[5 rows x 2 columns]

In [23]: df.to_csv('test.csv', sep=' ')

In [24]: pd.read_csv('test.csv', sep=' ', dtype={'A': np.float64}).dtypes
Out[24]: 
Unnamed: 0      int64
A             float64
B             float64
dtype: object

Here the dtype argument behaves as expected, and column A has type float. However with sep='\s' the dtype argument appears to be ignored:

In [25]: pd.read_csv('test.csv', sep='\s', dtype={'A': np.float64}).dtypes
Out[25]: 
A      int64
B    float64
dtype: object

Version information

In [27]: show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 10.8.0
machine: i386
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.1-413-ga71ede3
Cython: 0.20.1
numpy: 1.8.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.0.0-dev
sphinx: 1.2.1
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2013b
bottleneck: None
tables: 3.1.0
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.2
lxml: 3.3.1
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.2
pymysql: None
psycopg2: None