Read SAS (sas7bdat) · Issue #12654 · pandas-dev/pandas

Very excited to have this new feature in Pandas. I have a few comments to share:

  1. pd.read_sas() doesn't read SAS date variables correctly (this is noted in the docs). Dates are read as numpy.float64. Note that SAS stores dates as the number of days since 1960-01-01. It would be helpful to have an argument that parses date variables correctly.
  2. SAS also has special missing values such as .B or .R. How are these cases treated?
  3. It is not nearly as fast as read_csv(). Reading a 700 MB SAS dataset takes:
CPU times: user 1min 47s, sys: 955 ms, total: 1min 48s
Wall time: 1min 48s

Reading the same data as CSV (I converted the file to CSV using SAS) takes:

CPU times: user 3.93 s, sys: 343 ms, total: 4.28 s
Wall time: 4.29 s
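Until read_sas() parses dates itself, a date column can be converted after reading. A minimal sketch, assuming the column holds SAS date values (days since 1960-01-01) that came back as float64. `sas_days_to_datetime` and the file and column names in the usage comment are hypothetical. SAS datetime values count seconds rather than days, so they would need `unit="s"` instead.

```python
import pandas as pd


def sas_days_to_datetime(values: pd.Series) -> pd.Series:
    """Convert SAS date values (days since 1960-01-01) to datetime64.

    Hypothetical helper: assumes the column was read as float64 days.
    Missing values (NaN) come out as NaT.
    """
    return pd.to_datetime(values, unit="D", origin="1960-01-01")


# Usage with a hypothetical file and column name:
# df = pd.read_sas("data.sas7bdat")
# df["visit_date"] = sas_days_to_datetime(df["visit_date"])
```

Passing `origin` lets pandas apply the 1960 epoch directly instead of adding a manual offset. It also keeps NaN handling consistent with the rest of `to_datetime`.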