BUG: read_table() ignores dtype argument when multi-character separator is specified · Issue #6607 · pandas-dev/pandas (original) (raw)
Here is a minimal example:
In [21]: df = pd.DataFrame({'A': range(5), 'B': rand(5)})
In [22]: df
Out[22]:
A B
0 0 0.402616
1 1 0.880696
2 2 0.184491
3 3 0.832732
4 4 0.393917
[5 rows x 2 columns]
In [23]: df.to_csv('test.csv', sep=' ')
In [24]: pd.read_csv('test.csv', sep=' ', dtype={'A': np.float64}).dtypes
Out[24]:
Unnamed: 0 int64
A float64
B float64
dtype: object
Here the dtype argument behaves as expected, and column A has type float. However with sep='\s' the dtype argument appears to be ignored:
In [25]: pd.read_csv('test.csv', sep='\s', dtype={'A': np.float64}).dtypes
Out[25]:
A int64
B float64
dtype: object
Version information
In [27]: show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 10.8.0
machine: i386
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.13.1-413-ga71ede3
Cython: 0.20.1
numpy: 1.8.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.0.0-dev
sphinx: 1.2.1
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2013b
bottleneck: None
tables: 3.1.0
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.2
lxml: 3.3.1
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.2
pymysql: None
psycopg2: None