Unable to read Stata value_labels from .dta-file created by pandas (to_stata()) · Issue #16923 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
d = {'A': ['B', 'E', 'C', 'A', 'E']}
df = pd.DataFrame(data=d)
df['A'] = df['A'].astype('category') # Setting as categorical, similar to value_label in Stata
df.to_stata('test.dta') # Writing dataframe to Stata-file
dfs_fromstata = pd.read_stata('test.dta', iterator=True) # Creating StataReader-object
print(dfs_fromstata.value_labels()) # Printing value_labels
Traceback (most recent call last):
  File "", line 1, in
    print(dfs_fromstata.value_labels()) # Printing value_labels
  File "C:\Anaconda3\lib\site-packages\pandas\io\stata.py", line 1725, in value_labels
    self._read_value_labels()
  File "C:\Anaconda3\lib\site-packages\pandas\io\stata.py", line 1329, in _read_value_labels
    offset = self.nobs * self._dtype.itemsize
AttributeError: 'NoneType' object has no attribute 'itemsize'
Problem description
read_stata() seems unable to read value_labels from a .dta file created by to_stata(): calling value_labels() on the resulting StataReader raises the AttributeError shown above. If the file is created by Stata itself, the value_labels are read correctly. The same holds if the .dta file created by to_stata() is opened and saved again in Stata.
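A possible workaround, sketched under an assumption rather than confirmed: the traceback shows that `self._dtype` is still `None` when `value_labels()` runs, which suggests the reader only sets it up once data has been read. If so, calling `read()` on the StataReader before `value_labels()` may avoid the error. The temporary directory and the use of the reader as a context manager are illustrative choices, not part of the original report.

```python
import os
import tempfile

import pandas as pd

# Same data as in the report: one categorical column.
df = pd.DataFrame({'A': ['B', 'E', 'C', 'A', 'E']})
df['A'] = df['A'].astype('category')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.dta')
    df.to_stata(path)

    with pd.read_stata(path, iterator=True) as reader:
        # Assumption: reading the data first lets the reader set up its
        # internal dtype, which value_labels() appears to depend on.
        data = reader.read()
        labels = reader.value_labels()

print(labels)
```

Whether this helps on pandas 0.20.2 specifically has not been verified here; it only follows from what the traceback suggests.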
Expected Output
{'A': {0: 'A', 1: 'B', 2: 'C', 3: 'E'}}
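For reference, the expected mapping can be derived from the categorical column itself, without touching a .dta file. This is a small sketch: the integer code of each category is its position in the sorted `categories` index, and the variable name `expected` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['B', 'E', 'C', 'A', 'E']})
df['A'] = df['A'].astype('category')

# Categories of a string column are sorted lexically; each category's
# integer code is its position in that sorted index.
expected = {'A': dict(enumerate(df['A'].cat.categories))}

print(expected)                    # {'A': {0: 'A', 1: 'B', 2: 'C', 3: 'E'}}
print(df['A'].cat.codes.tolist())  # [1, 3, 2, 0, 3]
```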
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.2
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None