ENH: Add support for reading 102-format Stata dta files by cmjcharlton · Pull Request #58978 · pandas-dev/pandas

This would complete support for reading all historic Stata dta format versions.
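The pre-117 .dta formats, including 102, identify themselves by their first byte. That byte is the format number Stata's `dtaversion` command reports. Below is a minimal plain-Python sketch of that check. It assumes the 102 header has the same leading layout as the later pre-117 formats: a version byte, a byte-order byte (1 = big-endian, 2 = little-endian), a file-type byte, one unused byte, and then a 2-byte variable count. `read_old_dta_header` is a hypothetical helper for illustration, not the pandas implementation.

```python
import struct

# Pre-117 .dta format numbers (assumed list; 117+ files begin with an XML-like tag)
OLD_DTA_VERSIONS = {102, 103, 104, 105, 108, 110, 111, 113, 114, 115}


def read_old_dta_header(raw: bytes) -> dict:
    """Parse the leading bytes of a pre-117 .dta file (hypothetical helper).

    Assumed layout: version byte, byte-order byte, file-type byte,
    one unused byte, then the variable count as an unsigned 16-bit int.
    """
    if len(raw) < 6:
        raise ValueError("header too short")
    version = raw[0]
    if version not in OLD_DTA_VERSIONS:
        raise ValueError(f"not a supported pre-117 dta version: {version}")
    order_flag = raw[1]
    if order_flag == 1:
        byteorder = ">"  # big-endian (HILO)
    elif order_flag == 2:
        byteorder = "<"  # little-endian (LOHI)
    else:
        raise ValueError(f"unknown byte-order flag: {order_flag}")
    filetype = raw[2]
    # raw[3] is unused padding
    (nvar,) = struct.unpack(byteorder + "H", raw[4:6])
    return {"version": version, "byteorder": byteorder, "filetype": filetype, "nvar": nvar}


# A synthetic little-endian header for a 7-variable, format-102 file
header = bytes([102, 2, 1, 0]) + struct.pack("<H", 7)
print(read_old_dta_header(header))
# {'version': 102, 'byteorder': '<', 'filetype': 1, 'nvar': 7}
```

The byte-order flag is read before any multi-byte field because every later integer in the file depends on it.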

I would understand if you chose not to merge this as:

Having said that, I am reasonably confident that the changes are correct, and Stata is happy to open and view the test data that I created:

. dtaversion "stata-compat-102.dta"
  (file "stata-compat-102.dta" is
   .dta-format 102 from Stata 1)
. use "stata-compat-102.dta"
. describe

Contains data from stata-compat-102.dta
 Observations:             3                  
    Variables:             7                  
-------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------------------------------------------
index           long    %12.0g                
i8              int     %8.0g                 
i16             int     %8.0g                 
i32             long    %12.0g                
f               float   %9.0g                 
d               double  %10.0g                
dt              double  %10.0g                
-------------------------------------------------------------------------------------------------------------------
Sorted by:

. list

     +--------------------------------------------------+
     | index   i8     i16        i32     f    d      dt |
     |--------------------------------------------------|
  1. |     1   -1   -1025   -8388609   -.1   .1   14610 |
  2. |     2    0       0          0   -.2   .2   14611 |
  3. |     3    1    1025    8388609   -.3   .3   14612 |
     +--------------------------------------------------+
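The `dt` column is stored as a double with a plain `%10.0g` display format, so Stata lists its raw numbers. The file carries no `%td` format, so this next step is an assumption: if these values are Stata elapsed dates (days counted from 1960-01-01, as the column name suggests), they decode to the first three days of 2000. A small standard-library sketch follows. `stata_days_to_date` is a hypothetical helper.

```python
from datetime import date, timedelta

# Stata elapsed dates count days from 1960-01-01 (day 0)
STATA_EPOCH = date(1960, 1, 1)


def stata_days_to_date(days: int) -> date:
    """Convert a Stata elapsed-date value (days since 1960-01-01) to a date."""
    return STATA_EPOCH + timedelta(days=days)


for value in (14610, 14611, 14612):
    print(value, stata_days_to_date(value).isoformat())
# 14610 2000-01-01
# 14611 2000-01-02
# 14612 2000-01-03
```

As a check, 1960 through 1999 span 40 years with 10 leap years, which gives 40 × 365 + 10 = 14610 days to 2000-01-01.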
. dtaversion "stata4_102.dta"
  (file "stata4_102.dta" is
   .dta-format 102 from Stata 1)
. use "stata4_102.dta"
. describe

Contains data from stata4_102.dta
 Observations:            10                  
    Variables:             5                  
-------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------------------------------------------
fulllab         int     %8.0g      full_lbl   A fully labeled variable.
fulllab2        float   %9.0g      full_lbl   Another fully labeled variable.
incmplab        long    %12.0g     incp_lbl   Some values without labels.
misslab         int     %8.0g      miss_lbl   Some missing value labels.
floatlab        float   %9.0g      full_lbl   Floating point with labels.
-------------------------------------------------------------------------------------------------------------------
Sorted by:

. list

     +----------------------------------------------------+
     | fulllab   fulllab2   incmplab   misslab   floatlab |
     |----------------------------------------------------|
  1. |     one        ten        one       one        one |
  2. |     two       nine        two       two        two |
  3. |   three      eight      three     three      three |
  4. |    four      seven          4      four       four |
  5. |    five        six          5         .       five |
     |----------------------------------------------------|
  6. |     six       five          6         .        six |
  7. |   seven       four          7         .      seven |
  8. |   eight      three          8         .      eight |
  9. |    nine        two          9         .       nine |
 10. |     ten        one        ten         .        ten |
     +----------------------------------------------------+
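The second file exercises value labels. From the listing, `incp_lbl` appears to label only the codes 1, 2, 3 and 10, so Stata prints the remaining codes as plain numbers. `misslab` holds missing values (`.`) from row 5 onward. The plain-Python sketch below shows that lookup behaviour. It assumes the underlying codes are the integers 1 to 10. The label dictionaries and `apply_value_labels` are hypothetical reconstructions from the listing, not pandas code.

```python
# Hypothetical label sets reconstructed from the listing; codes 1-10 are assumed
WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
full_lbl = {i + 1: word for i, word in enumerate(WORDS)}
incp_lbl = {code: full_lbl[code] for code in (1, 2, 3, 10)}


def apply_value_labels(values, labels):
    """Replace each code with its label; keep unlabeled codes and missing (None) as-is."""
    return [None if v is None else labels.get(v, v) for v in values]


print(apply_value_labels(list(range(1, 11)), incp_lbl))
# ['one', 'two', 'three', 4, 5, 6, 7, 8, 9, 'ten']
print(apply_value_labels([1, 2, 3, 4, None], full_lbl))
# ['one', 'two', 'three', 'four', None]
```

The same lookup also covers `floatlab`, because in Python `1.0 == 1` and both hash to the same dictionary key.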