read_excel() modifies provided types dict when accessing file with duplicate column · Issue #42462 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
test.xlsx :
a | a | b | c |
---|---|---|---|
1 | 1 | b1 | c1 |
2 | 2 | b2 | c2 |
3 | 3 | b3 | c3 |
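import pandas as pd

# dtype mapping passed to read_excel(); it only names 'a', 'b' and 'c'
types_dict = {'a': str,
              'b': str,
              'c': str,
              }

if __name__ == "__main__":
    df = pd.read_excel('./test.xlsx', dtype=types_dict)
    # Print the keys after the call to show that the dict was changed
    print(list(types_dict.keys()))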
>> ['a', 'b', 'c', 'a.1']
Bug/Issue description:
When a .xlsx file with a duplicate column is loaded into a DataFrame using the dtype argument, read_excel() modifies the provided types_dict: it adds an entry for each renamed duplicate column (here 'a.1').
It seems to me like the modification of the types_dict is an unwanted side effect.
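Until this is fixed, a possible workaround is to pass a shallow copy of the dict so that any mutation lands on the copy. The sketch below is only an illustration: `read_with_dtype_copy` is a hypothetical helper, and `fake_reader` is a stand-in that mimics the side effect reported above so the example runs without pandas or an Excel file. In real code one would pass `pd.read_excel` as the reader.

```python
from typing import Any, Callable, Dict


def read_with_dtype_copy(reader: Callable[..., Any], path: str,
                         dtype: Dict[str, Any], **kwargs: Any) -> Any:
    """Call `reader` with a shallow copy of `dtype` (hypothetical helper).

    The caller's dict stays untouched even if the reader adds keys to it.
    """
    return reader(path, dtype=dict(dtype), **kwargs)


def fake_reader(path: str, dtype: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for pd.read_excel (assumption): mimics the reported side
    # effect by adding an entry for the renamed duplicate column 'a.1'.
    dtype["a.1"] = dtype["a"]
    return dtype


types_dict = {"a": str, "b": str, "c": str}
seen = read_with_dtype_copy(fake_reader, "./test.xlsx", types_dict)

print(list(types_dict.keys()))  # original dict: ['a', 'b', 'c']
print(list(seen.keys()))        # copy seen by the reader: ['a', 'b', 'c', 'a.1']
```

With pandas, the same idea reduces to `pd.read_excel('./test.xlsx', dtype=dict(types_dict))`.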