read_csv's na_values dict format cannot parse float type
Minor issue with read_csv's na_values argument in dict format. The list format works fine when the NA values are given as floats (which is often the intuitive choice), e.g.:
co2 = pd.read_csv("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt", comment = "#", delim_whitespace = True, names = ["year", "month", "decimal_date", "average", "interpolated", "trend", "days"], na_values =[-99.99, -1])
However, the dict format is more appropriate for this classic data set, since different columns use different NA values. Unfortunately, passing floats in the dict fails with an error about float type:
co2 = pd.read_csv("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt", comment = "#", delim_whitespace = True, names = ["year", "month", "decimal_date", "average", "interpolated", "trend", "days"], na_values = {"decimal_date" : -99.99, "days" : -1})
Instead, each NA value must be given as a string, which feels all kinds of wrong here:
co2 = pd.read_csv("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt", comment = "#", delim_whitespace = True, names = ["year", "month", "decimal_date", "average", "interpolated", "trend", "days"], na_values = {"decimal_date" : "-99.99", "days" : "-1"})
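Another possible workaround that keeps the sentinels numeric is to read the file without na_values and mask the values afterwards. This is only a minimal sketch: the inline SAMPLE text is made up to imitate the file layout (I am assuming here that -99.99 shows up in the "average" column), and read_with_sentinels is a hypothetical helper, not part of pandas.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample imitating the whitespace-delimited layout of the CO2 file.
SAMPLE = """# hypothetical header comment
1958 3 1958.208 315.71 315.71 314.62 -1
1958 4 1958.292 -99.99 317.45 315.29 -1
1974 5 1974.375 333.16 333.16 330.22 24
"""

NAMES = ["year", "month", "decimal_date", "average",
         "interpolated", "trend", "days"]


def read_with_sentinels(text, sentinels):
    """Hypothetical helper: read the table, then blank out per-column sentinels."""
    df = pd.read_csv(io.StringIO(text), comment="#", sep=r"\s+", names=NAMES)
    for column, sentinel in sentinels.items():
        # np.isclose avoids relying on exact float equality after parsing.
        df[column] = df[column].mask(np.isclose(df[column], sentinel))
    return df


# Sentinels stay numeric, one per column, as in the dict form of na_values.
co2 = read_with_sentinels(SAMPLE, {"average": -99.99, "days": -1})
```

The masked columns end up as float, since NaN cannot live in an integer column; that is also what na_values would produce.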
Thanks for all the pandas awesomeness.