QST: Is this expected behavior when pd.read_csv() with na_values arguments? · Issue #59303 · pandas-dev/pandas
Research
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/46397526/how-to-use-na-values-option-in-the-pd-read-csv-function
Question about pandas
I have a simple csv file that looks like this:
x,y,z
a,-99,100
b,-99,200
c,-99.0,300
d,-99.0,400
When I tried a few different na_values arguments, I got different results for column y:
import pandas as pd
df1 = pd.read_csv('test.csv', na_values={"y": -99})
print("df1 = \n", df1)
df2 = pd.read_csv('test.csv', na_values={"y": -99.0})
print("\ndf2 = \n", df2)
df3 = pd.read_csv('test.csv', na_values={"y": [-99.0, -99]})
print("\ndf3 = \n", df3)
df4 = pd.read_csv('test.csv', na_values={"y": [-99, -99.0]})
print("\ndf4 = \n", df4)
Results:
df1 =
   x   y    z
0  a NaN  100
1  b NaN  200
2  c NaN  300
3  d NaN  400

df2 =
   x     y    z
0  a -99.0  100
1  b -99.0  200
2  c   NaN  300
3  d   NaN  400

df3 =
   x     y    z
0  a -99.0  100
1  b -99.0  200
2  c   NaN  300
3  d   NaN  400

df4 =
   x   y    z
0  a NaN  100
1  b NaN  200
2  c NaN  300
3  d NaN  400
I'm not sure whether this is a bug or by design, so I'm raising it as a general question. Thank you!
The pandas version is 2.2.1, in case that is relevant.
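For comparison, here is a minimal sketch of one way to get the same result for both spellings without relying on na_values at all: parse the column as numbers first, then blank out the sentinel with a numeric comparison. The helper name read_with_sentinel and the inlined CSV_TEXT are made up for illustration and are not part of pandas; the sketch says nothing about whether the na_values behavior above is intended.

```python
import io

import pandas as pd

# Same contents as test.csv above, inlined so the sketch is self-contained.
CSV_TEXT = "x,y,z\na,-99,100\nb,-99,200\nc,-99.0,300\nd,-99.0,400\n"


def read_with_sentinel(buffer, column="y", sentinel=-99):
    """Hypothetical helper: parse first, then mask the sentinel numerically."""
    df = pd.read_csv(buffer)
    # The comparison is made on parsed numbers, so the text forms
    # "-99" and "-99.0" are treated as the same value.
    df[column] = df[column].mask(df[column] == sentinel)
    return df


df = read_with_sentinel(io.StringIO(CSV_TEXT))
print(df)
```

Because the sentinel is matched after parsing, the order or numeric type of the values passed in no longer matters. The trade-off is that this only works for columns that parse cleanly as numbers.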