issue with StataReader for stata files versions 108 and older · Issue #12232 · pandas-dev/pandas
I am having an issue with the StataReader class, which is defined in pandas/io/stata.py.
I am using pandas 0.17.1.
The following is the Python code I am trying to run:
import sys
reload(sys).setdefaultencoding('utf-8')
import pandas as pd
from pandas.io import stata
sr=stata.StataReader(fileName)
where fileName is the path to a Stata file.
The following code is part of the _read_old_header method (which starts on line 1184) of the StataReader class in stata.py. This method is called while a StataReader object is being initialized:
if self.format_version > 108:
    typlist = [ord(self.path_or_buf.read(1))
               for i in range(self.nvar)]
else:
    typlist = [
        self.OLD_TYPE_MAPPING[
            self._decode_bytes(self.path_or_buf.read(1))
        ] for i in range(self.nvar)
    ]
I get no errors when my Stata files are newer than version 108, but with version 105 files there seems to be a bug in _decode_bytes. The code above calls _decode_bytes with only one argument besides self: the string returned by path_or_buf.read(1).
Here is the method _decode_bytes (line 896):
def _decode_bytes(self, str, errors=None):
    if compat.PY3 or self._encoding is not None:
        return str.decode(self._encoding, errors)
    else:
        return str
Because _read_old_header passes no third argument, errors keeps its default value of None. That None is then forwarded to str.decode, which raises:
TypeError: decode() argument 2 must be string, not None
That is the issue: the decode method of the string class requires its second argument to be a string, but _decode_bytes passes errors=None by default.
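To make this concrete, here is a minimal standalone sketch of the failure and one possible fix. It is not the actual pandas code: the two function names, the "iso-8859-1" encoding, and the sample byte are assumptions for illustration, and the compat.PY3 / self._encoding check is left out. The idea is to pass errors to decode only when the caller actually supplied one.

```python
def decode_bytes_original(raw, encoding, errors=None):
    # Mirrors the quoted logic: errors=None is forwarded straight to decode()
    return raw.decode(encoding, errors)


def decode_bytes_patched(raw, encoding, errors=None):
    # Hypothetical fix: only pass errors when the caller supplied a value
    if errors is None:
        return raw.decode(encoding)
    return raw.decode(encoding, errors)


# A sample one-byte value, standing in for path_or_buf.read(1)
raw = b"f"

try:
    decode_bytes_original(raw, "iso-8859-1")
    original_failed = False
except TypeError as exc:
    # decode() rejects None as its errors argument
    original_failed = True
    print("original:", exc)

print("patched:", decode_bytes_patched(raw, "iso-8859-1"))
```

An alternative with the same effect would be to change the default to errors='strict', which is the default that decode itself uses.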