ENH: Add support for reading Stata 7 (non-SE) format dta files · Issue #47176 · pandas-dev/pandas
Is your feature request related to a problem?
If I attempt to read a dta file saved in Stata 7 format using the read_stata() function, I get the following error message:
ValueError: Version of given Stata file is 110. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).
This is unfortunate, as this is the version of the format used by default by R's write.dta() function in the foreign package.
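To confirm which format version a given file uses before calling read_stata(), the version can be read from the file itself. The following is a minimal sketch, not part of pandas: dta_format_version is a hypothetical helper, and it assumes that formats before 117 store the version number in the first byte of the file, while formats 117 and later open with a textual <stata_dta> header containing a <release> tag.

```python
import re


def dta_format_version(path):
    """Return the dta format version of the file at `path`.

    Hypothetical helper for illustration; it only inspects the header
    and does not validate the rest of the file.
    """
    with open(path, "rb") as fh:
        head = fh.read(64)
    if not head:
        raise ValueError("empty file")
    if head.startswith(b"<stata_dta>"):
        # Assumed layout for 117 and later: version stored as text
        # inside a <release> tag near the start of the file.
        match = re.search(rb"<release>(\d+)</release>", head)
        if match is None:
            raise ValueError("no <release> tag found in header")
        return int(match.group(1))
    # Assumed layout for older formats: first byte is the version number.
    return head[0]
```

For a file written by write.dta() with its default settings, this should report 110, matching the version named in the error message above.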
Describe the solution you'd like
It would be nice if this version of the data format was supported, at least for reading, in the same manner as the other variants.
API breaking implications
None that I am aware of.
Describe alternatives you've considered
write.dta also supports saving to versions 6, 8 and 10 of the dta format, which are supported, so I could manually specify a different version instead. Alternatively, I could save my data using the readstata13 or haven packages, which both support more recent versions of the Stata dta format.
Another option would be to use a different data format entirely that is also supported by both R and Pandas.
Additional context
It appears that to implement support for this additional variant the following two changes are required:
- Edit the line:
if self.format_version > 108:
To:
if self.format_version > 110:
- Change the line:
if self.format_version not in [104, 105, 108, 111, 113, 114, 115]:
To include 110 in the list of supported variants, i.e.
if self.format_version not in [104, 105, 108, 110, 111, 113, 114, 115]:
After making these changes locally my version 7 files appear to load correctly.
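The combined effect of the two edits can be modelled in isolation. This is only a sketch of the two version gates as described above, not the pandas source: the names SUPPORTED_OLD_VERSIONS and check_old_version are hypothetical, and what each branch does with the file afterwards is deliberately left out.

```python
# The list from the second change, with 110 added.
SUPPORTED_OLD_VERSIONS = [104, 105, 108, 110, 111, 113, 114, 115]


def check_old_version(format_version):
    """Model the two gates after the proposed change (hypothetical helper).

    Raises ValueError for versions outside the supported list. Otherwise
    returns True when the `format_version > 110` branch would be taken,
    and False when the file is handled like version 108 and earlier.
    """
    if format_version not in SUPPORTED_OLD_VERSIONS:
        raise ValueError(
            f"Version of given Stata file is {format_version}."
        )
    return format_version > 110


# Version 110 is now accepted and follows the same branch as 108,
# while 111 (Stata 7SE) keeps its existing behaviour.
print(check_old_version(110))  # False
print(check_old_version(111))  # True
```

Raising the threshold from 108 to 110 matters because adding 110 to the list alone would send it down the branch used for 111 and later.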