Read SAS (sas7bdat) · Issue #12654 · pandas-dev/pandas
Very excited to have this new feature in Pandas. I have a few comments to share:
- `pd.read_sas()` doesn't read SAS date variables correctly (this is noted in the docs): dates are read as `numpy.float64`. Note that SAS stores dates as the number of days relative to 1960-01-01. It would be helpful to have some sort of argument to parse date variables correctly.
- Moreover, SAS has some special missing values, such as `.B` or `.R`. I wonder how these cases are treated?
- It is not nearly as fast as `read_csv()`. Reading a 700 MB SAS dataset takes:
  CPU times: user 1min 47s, sys: 955 ms, total: 1min 48s
  Wall time: 1min 48s
  Reading the same data as CSV (I converted the same file to CSV using SAS) takes:
  CPU times: user 3.93 s, sys: 343 ms, total: 4.28 s
  Wall time: 4.29 s
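Until `read_sas` can parse dates itself, the float columns can be converted after loading. The sketch below relies on `pd.to_datetime` with `unit` and `origin`. It assumes that a column holds either SAS date values (days since 1960-01-01) or SAS datetime values (seconds since 1960-01-01), and that missing values arrive as `NaN`. The helper names and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# SAS date values count days from this epoch; SAS datetime values count seconds from it.
SAS_EPOCH = pd.Timestamp("1960-01-01")


def sas_date_to_datetime(values):
    """Hypothetical helper: float days since 1960-01-01 -> datetime64 (NaN -> NaT)."""
    return pd.to_datetime(values, unit="D", origin=SAS_EPOCH)


def sas_datetime_to_datetime(values):
    """Hypothetical helper: float seconds since 1960-01-01 -> datetime64 (NaN -> NaT)."""
    return pd.to_datetime(values, unit="s", origin=SAS_EPOCH)


# Stand-in for float columns as returned for date-formatted SAS variables
df = pd.DataFrame({
    "visit_date": np.array([0.0, 14610.0, np.nan]),  # 14610 days -> 2000-01-01
    "visit_dt": np.array([0.0, 86400.0, np.nan]),    # 86400 s -> 1960-01-02 00:00
})
df["visit_date"] = sas_date_to_datetime(df["visit_date"])
df["visit_dt"] = sas_datetime_to_datetime(df["visit_dt"])
print(df)
```

Which columns are dates still has to be known from the SAS formats (e.g. `DATE9.` vs `DATETIME20.`). That gap is what a parsing argument on `read_sas` would close.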
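The timings above come from IPython's `%time`. To repeat the comparison in a plain script, a small wall-clock wrapper is enough. This is a minimal sketch: `time_reader` is a hypothetical helper, and the file names `data.sas7bdat` / `data.csv` in the commented lines are placeholders.

```python
import io
import time

import pandas as pd


def time_reader(reader, *args, **kwargs):
    """Call reader(*args, **kwargs); return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = reader(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Self-contained demo on a tiny in-memory CSV
csv_text = "id,value\n1,2.5\n2,3.5\n"
frame, seconds = time_reader(pd.read_csv, io.StringIO(csv_text))
print(f"read_csv: {seconds:.4f} s, shape={frame.shape}")

# The real comparison, with placeholder paths (not run here):
# sas_df, sas_s = time_reader(pd.read_sas, "data.sas7bdat")
# csv_df, csv_s = time_reader(pd.read_csv, "data.csv")
# print(f"read_sas is {sas_s / csv_s:.1f}x slower")
```

With the wall times reported above (1 min 48 s, i.e. 108 s, vs 4.29 s), `read_sas` is roughly 108 / 4.29 ≈ 25 times slower on this file.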