ENH: read_excel dtypes and converts

Dear all,

Coming here from the mailing list: https://groups.google.com/forum/#!topic/pydata/jKiPOvYUQ1c

I have an Excel table of family sizes like this:

Family	People	Mean size [cm]
Foo	5	173.0
Bar	3	189.0

and I would like to use read_excel to parse it into Python. I would like "People" to be read as an integer, "Mean size [cm]" as a float. (And "Family" as a string, but that might be a different issue.) Now:

if I set convert_float=True, the last column reads as int
if I set convert_float=False, the second column reads as float

Neither is correct, for a silly reason: every value in the size column happens to end in .0! So I would like to specify something like:

convert_float = ['People']

so only that column gets converted. An even better solution would be to be explicit about types of some columns, letting pandas perform the automagic for the others, such as:

read_excel('foo.xlsx', types={'People': np.uint8, 'Family': 'S3'})

but this changes the signature of the function more significantly.
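
In the meantime, a rough workaround is to cast the columns after reading. The following is only a minimal sketch: the DataFrame is built by hand as a stand-in for what read_excel returns when every number is kept as a float, and apply_column_types is a hypothetical helper name, not part of pandas. It simply forwards a column-to-type mapping to DataFrame.astype, so unlisted columns keep whatever type pandas inferred.

```python
import numpy as np
import pandas as pd

# Stand-in for the result of read_excel with every number kept as float
# (the convert_float=False case); built by hand so no .xlsx file is needed.
raw = pd.DataFrame({
    "Family": ["Foo", "Bar"],
    "People": [5.0, 3.0],
    "Mean size [cm]": [173.0, 189.0],
})


def apply_column_types(df, types):
    """Hypothetical helper: cast only the columns named in `types`.

    Columns not listed keep the dtype pandas inferred for them.
    """
    return df.astype(types)


# Only "People" is cast; "Mean size [cm]" stays float, "Family" stays as-is.
typed = apply_column_types(raw, {"People": np.uint8})
print(typed.dtypes)
```

This covers the explicit-types idea only after the fact; it does not change how read_excel itself parses the cells.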

Are you folks in favour of any of this? If yes, I can take a look and try to code it up.