ENH: Support reading value labels for Stata formats 108 (Stata 6) and earlier · Issue #58154 · pandas-dev/pandas

Feature Type

Problem Description

Currently, pandas supports reading value labels only from data files saved in format 111 (Stata 7 SE) and later. It would be useful to extend this to all supported format versions.

Feature Description

This could be implemented by extending the function _read_value_labels in pandas/io/stata.py.

Value labels in the 108 format use the same structure as later versions, except that label names are restricted to 8 characters, plus a null terminator [1].
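As a rough illustration of the label-name difference, the sketch below decodes a fixed-width, null-terminated name field. The helper name `read_label_name`, the constant `LABEL_NAME_WIDTH_108`, and the latin-1 decoding are assumptions made for this example; they are not part of pandas or taken from the format description.

```python
# Hypothetical helper, not pandas code: decode a fixed-width label-name field.
LABEL_NAME_WIDTH_108 = 9  # 8 characters plus a null terminator, per the description above


def read_label_name(buf: bytes, offset: int = 0, width: int = LABEL_NAME_WIDTH_108) -> str:
    """Return the label name stored in buf[offset:offset + width]."""
    field = buf[offset:offset + width]
    if len(field) < width:
        raise ValueError("buffer too short for label-name field")
    # Keep only the bytes before the first null terminator.
    # Assumption: names are decoded as latin-1 for this sketch.
    return field.split(b"\0", 1)[0].decode("latin-1")


# A short name padded with nulls, and a full 8-character name.
short_name = read_label_name(b"yesno\0\0\0\0")
full_name = read_label_name(b"sexlabel\0")
```

A reader for later formats could pass a larger `width` and reuse the same decoding, which is why the width is a parameter rather than hard-coded.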

Value labels prior to the 108 format used a simple structure for each label, containing a list of codes followed by a list of 8-character strings, one for each code [2].
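A minimal sketch of parsing that older layout might look like the following. It assumes the number of entries `n` is already known, that each code is a 2-byte signed integer, and that each string occupies exactly 8 bytes padded with nulls; these widths, the latin-1 decoding, and the function name `parse_old_label_block` are assumptions for illustration and should be checked against dta_105.txt.

```python
import struct


def parse_old_label_block(buf: bytes, n: int, byteorder: str = "<") -> dict[int, str]:
    """Parse n codes followed by n fixed-width label strings (hypothetical helper)."""
    code_size = 2  # assumption: each code is a 2-byte signed integer
    text_size = 8  # from the description: 8-character strings
    if len(buf) < n * (code_size + text_size):
        raise ValueError("buffer too short for the declared number of entries")

    # The list of codes comes first.
    codes = struct.unpack_from(f"{byteorder}{n}h", buf, 0)

    # The list of strings follows, in the same order as the codes.
    labels: dict[int, str] = {}
    text_start = n * code_size
    for i, code in enumerate(codes):
        raw = buf[text_start + i * text_size : text_start + (i + 1) * text_size]
        # Assumption: strings shorter than 8 characters are null-padded.
        labels[code] = raw.split(b"\0", 1)[0].decode("latin-1")
    return labels


# Two entries: code 1 -> "male", code 2 -> "female" (little-endian).
block = struct.pack("<2h", 1, 2) + b"male\0\0\0\0" + b"female\0\0"
parsed = parse_old_label_block(block, 2)
```

Returning a plain `dict` mapping codes to strings keeps the result easy to compare with the value labels read from newer formats.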

References:
[1] Description of the 108 .dta format, section 5.6 Value Labels (dta_108.txt)
[2] Description of the 105 .dta format, section 5.6 Value Labels (dta_105.txt)

Alternative Solutions

Currently, the only way to import these labels is to open the file in other software that supports reading them, and then re-save it in a more recent format for which pandas has value label support.

Additional Context

No response