ENH: Support reading value labels for Stata formats 108 (Stata 6) and earlier · Issue #58154 · pandas-dev/pandas
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Currently pandas supports reading value labels only for data files saved in format 111 (Stata 7 SE) and later. It would be nice if this could be extended to all supported format versions.
Feature Description
This could be implemented by extending the function _read_value_labels in pandas/io/stata.py.
Value labels in the 108 format use the same structure as later versions, except that label names are restricted to 8 characters, plus a null terminator [1].
Value labels prior to the 108 format used a simpler structure: each label contains a list of codes, followed by a list of 8-character strings, one for each code [2].
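As a rough illustration of the pre-108 layout, here is a minimal sketch that decodes one label's payload (the codes, then the 8-byte strings) from an in-memory buffer. It is not the pandas implementation. The function name `parse_old_value_label` is hypothetical, and the 16-bit signed code width, the latin-1 encoding, and the caller supplying the entry count `n` are all assumptions that should be checked against dta_105.txt.

```python
import struct


def parse_old_value_label(
    payload: bytes, n: int, byteorder: str = "<", code_fmt: str = "h"
) -> dict[int, str]:
    """Decode n codes followed by n fixed-width 8-byte label strings.

    Assumptions (not confirmed by the issue text): codes are 16-bit signed
    integers ("h"), strings are null-padded, and text is latin-1 encoded.
    """
    code_size = struct.calcsize(byteorder + code_fmt)
    # The list of codes comes first ...
    codes = struct.unpack_from(f"{byteorder}{n}{code_fmt}", payload, 0)
    # ... followed by one 8-byte string per code, in the same order.
    offset = n * code_size
    mapping: dict[int, str] = {}
    for i, code in enumerate(codes):
        raw = payload[offset + 8 * i : offset + 8 * (i + 1)]
        # Drop the null padding (if any) before decoding.
        mapping[code] = raw.split(b"\x00", 1)[0].decode("latin-1")
    return mapping


# Hand-built example payload: two codes, then two 8-byte strings.
example = struct.pack("<2h", 1, 2) + b"male".ljust(8, b"\x00") + b"female".ljust(8, b"\x00")
labels = parse_old_value_label(example, n=2)
```

In a real reader, the entry count and label name would come from the per-label header preceding this payload, and the byte order from the file header.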
References:
[1] Description of the 108 .dta format, section 5.6 Value Labels (dta_108.txt)
[2] Description of the 105 .dta format, section 5.6 Value Labels (dta_105.txt)
Alternative Solutions
Currently the only way to import these labels is to open the file in other software that does support reading them, and then save it in a more recent format for which pandas has value label support.
Additional Context
No response