ENH: Add support for reading 102-format Stata dta files by cmjcharlton · Pull Request #58978 · pandas-dev/pandas
- closes #xxxx (Replace xxxx with the GitHub issue number)
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
This would complete support for reading all historic Stata dta format versions.
I would understand if you chose not to merge this as:
- No formal documentation exists for this version, so I have had to infer the details from later formats and the Stata 1 user manual.
- Unlike every other format version, I have not been able to locate any sample data written in this version (hence I have not created a linked issue).
Having said that, I am reasonably confident that the changes are correct, and Stata is happy to open and view the test data that I created:
. dtaversion "stata-compat-102.dta"
(file "stata-compat-102.dta" is .dta-format 102 from Stata 1)
. use "stata-compat-102.dta"
. describe
Contains data from stata-compat-102.dta
Observations: 3
Variables: 7
-------------------------------------------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------------------------------------------
index long %12.0g
i8 int %8.0g
i16 int %8.0g
i32 long %12.0g
f float %9.0g
d double %10.0g
dt double %10.0g
-------------------------------------------------------------------------------------------------------------------
Sorted by:
. list
+--------------------------------------------------+
| index i8 i16 i32 f d dt |
|--------------------------------------------------|
1. | 1 -1 -1025 -8388609 -.1 .1 14610 |
2. | 2 0 0 0 -.2 .2 14611 |
3. | 3 1 1025 8388609 -.3 .3 14612 |
+--------------------------------------------------+
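The `dt` column is stored as a plain double with a `%10.0g` display format, so Stata lists the raw numbers. Assuming these values are meant as elapsed days since Stata's date epoch of 1 January 1960 (the description above does not say so, so treat this as an assumption), a minimal sketch of the conversion looks like this. The helper name `stata_days_to_date` is hypothetical and is not part of pandas.

```python
from datetime import date, timedelta

# Stata counts daily dates as days elapsed since 1960-01-01.
STATA_EPOCH = date(1960, 1, 1)


def stata_days_to_date(days):
    """Convert an elapsed-day count to a calendar date (hypothetical helper)."""
    return STATA_EPOCH + timedelta(days=int(days))


# Values taken from the `dt` column in the listing above; interpreting
# them as elapsed days is an assumption, not something the file declares.
for value in (14610, 14611, 14612):
    print(value, stata_days_to_date(value).isoformat())
```

Under that assumption the three rows map to consecutive days starting at 2000-01-01.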
. dtaversion "stata4_102.dta"
(file "stata4_102.dta" is .dta-format 102 from Stata 1)
. use "stata4_102.dta"
. describe
Contains data from stata4_102.dta
Observations: 10
Variables: 5
-------------------------------------------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------------------------------------------
fulllab int %8.0g full_lbl A fully labeled variable.
fulllab2 float %9.0g full_lbl Another fully labeled variable.
incmplab long %12.0g incp_lbl Some values without labels.
misslab int %8.0g miss_lbl Some missing value labels.
floatlab float %9.0g full_lbl Floating point with labels.
-------------------------------------------------------------------------------------------------------------------
Sorted by:
. list
+----------------------------------------------------+
| fulllab fulllab2 incmplab misslab floatlab |
|----------------------------------------------------|
1. | one ten one one one |
2. | two nine two two two |
3. | three eight three three three |
4. | four seven 4 four four |
5. | five six 5 . five |
|----------------------------------------------------|
6. | six five 6 . six |
7. | seven four 7 . seven |
8. | eight three 8 . eight |
9. | nine two 9 . nine |
10. | ten one ten . ten |
+----------------------------------------------------+
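For anyone wanting to check a file independently of Stata's `dtaversion`, here is a rough sketch that reads the format version from the header. It relies on the legacy (pre-117) layout in which the first byte of the file is the format number; files in the newer layout start with `<` and are rejected here. The function names are hypothetical and this is not how the PR itself is implemented.

```python
def legacy_dta_version(header):
    """Return the format version from the first bytes of a legacy dta file.

    Hypothetical helper: assumes the legacy layout, where byte 0 holds
    the format number (for example 102).
    """
    if not header:
        raise ValueError("empty header")
    if header[:1] == b"<":
        # Newer files begin with an XML-like tag instead of a version byte.
        raise ValueError("not a legacy-format dta file")
    return header[0]


def legacy_dta_version_from_path(path):
    """Read just the first byte of the file at `path` and report its version."""
    with open(path, "rb") as fh:
        return legacy_dta_version(fh.read(1))
```

With this change applied, `pd.read_stata("stata-compat-102.dta")` would be expected to return the three rows listed above, while `legacy_dta_version_from_path` on the same file should report 102.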