BUG: converters of dtype is ignored in read_csv if related to index column · Issue #40589 · pandas-dev/pandas


Code Sample, a copy-pastable example

test.txt file (tab-separated):
5 6
7 8
9 10

import pandas as pd

print(pd.__version__)

# Case 1: default type inference for both the index and the data column.
print("Case 1: no converters or dtype. ")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"])
print(a["Length"])
print(a.index)

# Case 2: request str for both columns through converters.
print("Case 2: converters option")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"], converters={"Index": str, "Length": str})
print(a["Length"])
print(a.index)

# Case 3: request str for both columns through dtype.
print("Case 3: dtype option")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"], dtype={"Index": str, "Length": str})
print(a["Length"])
print(a.index)

Problem description

Output of code above:

Case 1: no converters or dtype.
Index
5 6
7 8
9 10
Name: Length, dtype: int64
Int64Index([5, 7, 9], dtype='int64', name='Index')
Case 2: converters option
Index
5 6
7 8
9 10
Name: Length, dtype: object
Int64Index([5, 7, 9], dtype='int64', name='Index')
Case 3: dtype option
Index
5 6
7 8
9 10
Name: Length, dtype: object
Int64Index([5, 7, 9], dtype='int64', name='Index')

Neither converters nor dtype is applied to the index column when reading a file via pd.read_csv.
In all three cases the index elements remain of type int.

Other columns are converted as expected.
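
To make the comparison easier to check without a file on disk, here is a minimal sketch that runs the same three cases against an in-memory copy of the data and reports the Python type of the elements in the index and in the Length column. The helper name element_types and the DATA constant are hypothetical, introduced only for this illustration. What it prints for the index may depend on the installed pandas version.

```python
import io

import pandas as pd

# Same content as test.txt, tab-separated (assumed layout of the file).
DATA = "5\t6\n7\t8\n9\t10\n"


def element_types(**kwargs):
    """Read DATA with the given read_csv options and return the set of
    element type names found in the index and in the Length column."""
    df = pd.read_csv(
        io.StringIO(DATA),
        sep="\t",
        index_col=["Index"],
        names=["Index", "Length"],
        **kwargs,
    )
    index_types = {type(v).__name__ for v in df.index}
    column_types = {type(v).__name__ for v in df["Length"]}
    return index_types, column_types


cases = [
    ("Case 1: no converters or dtype", {}),
    ("Case 2: converters option", {"converters": {"Index": str, "Length": str}}),
    ("Case 3: dtype option", {"dtype": {"Index": str, "Length": str}}),
]

for label, kwargs in cases:
    index_types, column_types = element_types(**kwargs)
    print(label, "| index:", index_types, "| Length:", column_types)
```

On the affected version the index set would be expected to stay at int in all three cases, while the Length set switches to str in Cases 2 and 3.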

Expected Output

In "Case 2" and "Case 3", the index elements are expected to be of type str.
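
One possible way to get the expected result is to skip index_col in read_csv, so that dtype is applied to an ordinary column, and set the index afterwards. This sketch is an assumption on my part rather than something proposed in the report. It uses an in-memory copy of the data (the DATA constant is hypothetical) in place of test.txt.

```python
import io

import pandas as pd

# Same content as test.txt, tab-separated (assumed layout of the file).
DATA = "5\t6\n7\t8\n9\t10\n"

# Read "Index" as a regular column so that dtype=str is honoured for it.
df = pd.read_csv(
    io.StringIO(DATA),
    sep="\t",
    names=["Index", "Length"],
    dtype={"Index": str, "Length": str},
)

# Promote the already-converted column to the index.
df = df.set_index("Index")

print(list(df.index))  # ['5', '7', '9']
print(all(isinstance(v, str) for v in df.index))  # True
```

The cost is one extra step after parsing. The index values keep their string form because the conversion has already happened before set_index is called.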

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2c8480
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-54-generic
Version : #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : 0.29.21
pytest : 6.1.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2