ENH: Add support for reading Stata 7 (non-SE) format dta files · Issue #47176 · pandas-dev/pandas

If I attempt to read in a dta file saved in Stata 7 format using the read_stata() function I get the following error message:

ValueError: Version of given Stata file is 110. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).

This is unfortunate, as format 110 is the version that the write.dta() function in R's foreign package writes by default.
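To confirm which format version a given file uses before handing it to read_stata(), the header can be inspected directly. The following is a minimal sketch, not part of pandas: the function name sniff_dta_version is hypothetical, and it assumes that older dta files store the format version in their first byte, while files from format 117 onward begin with an XML-like header containing a <release> tag.

```python
import re


def sniff_dta_version(path):
    """Return the dta format version (e.g. 110 or 117) found in the file header.

    Hypothetical helper for illustration; it does not validate the rest of the file.
    """
    with open(path, "rb") as fh:
        head = fh.read(64)  # enough to cover either style of header

    if not head:
        raise ValueError("File is empty; no dta header found")

    if head[:1] == b"<":
        # Assumed layout for 117+: <stata_dta><header><release>117</release>...
        match = re.search(rb"<release>(\d+)</release>", head)
        if match is None:
            raise ValueError("No <release> tag found in dta header")
        return int(match.group(1))

    # Assumed layout for older formats: the first byte is the format version.
    return head[0]
```

Calling sniff_dta_version() on a file written by write.dta() with its default settings should, under these assumptions, report 110, matching the version named in the error message.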

Describe the solution you'd like

It would be nice if this version of the data format were supported, at least for reading, in the same manner as the other variants.

API breaking implications

None that I am aware of.

Describe alternatives you've considered

write.dta() can also save to versions 6, 8 and 10 of the dta format, all of which pandas already reads, so I could explicitly request one of those versions instead. Alternatively, I could save my data using the readstata13 or haven packages, both of which support more recent versions of the Stata dta format.

Another option would be to use a different data format entirely that is also supported by both R and Pandas.

Additional context

It appears that to implement support for this additional variant the following two changes are required:

  1. Change the line:

    if self.format_version > 108:

    to:

    if self.format_version > 110:

  2. Change the line:

    if self.format_version not in [104, 105, 108, 111, 113, 114, 115]:

    to include 110 in the list of supported variants, i.e.

    if self.format_version not in [104, 105, 108, 110, 111, 113, 114, 115]:

After making these changes locally my version 7 files appear to load correctly.
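The combined effect of the two edits can be modelled in isolation. This is only an illustrative sketch: the names SUPPORTED_VERSIONS, is_supported and takes_newer_branch are hypothetical stand-ins, not the pandas reader's actual attributes, and the sketch reproduces nothing beyond the two conditions quoted above.

```python
# Hypothetical stand-in for the list in the second condition, with 110 added.
SUPPORTED_VERSIONS = [104, 105, 108, 110, 111, 113, 114, 115]


def is_supported(format_version):
    """Mirror of the second condition: is the version in the accepted list?"""
    return format_version in SUPPORTED_VERSIONS


def takes_newer_branch(format_version, threshold=110):
    """Mirror of the first condition; threshold=108 models the original line."""
    return format_version > threshold


# With the proposed threshold, 110 is handled like 108 and earlier,
# while 111 (Stata 7SE) still takes the newer branch.
print(is_supported(110))                      # True
print(takes_newer_branch(110))                # False
print(takes_newer_branch(111))                # True
print(takes_newer_branch(110, threshold=108)) # True (original behaviour)
```

In other words, the first edit moves format 110 onto the code path already used for formats 108 and earlier, and the second stops it being rejected as unsupported.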