Unable to import Stata 13 database files with read_stata() · Issue #7360 · pandas-dev/pandas

pandas v0.14.0 (May 31, 2014) seems incapable of importing Stata 13 datasets, although the release notes (http://pandas.pydata.org/pandas-docs/stable/whatsnew.html) say it should be able to. Stata 12 files can be imported without problems.
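One quick way to confirm which Stata format a file actually uses is to read its header. The sketch below is a hypothetical helper (`dta_format_version` is not a pandas function). It assumes the layout described in Stata's dta documentation: format 117 (Stata 13) files begin with the text `<stata_dta>` and hold the release number in a `<release>` tag, while earlier formats such as 114 (Stata 10/11) and 115 (Stata 12) store the format number in the first byte.

```python
import struct


def dta_format_version(path):
    """Return the .dta format number stored in a Stata file header.

    Hypothetical helper, assuming the header layouts from Stata's dta docs:
    format 117 (Stata 13) starts with '<stata_dta>' and stores the release
    in a <release>...</release> tag; older formats store it in byte 0.
    """
    with open(path, "rb") as f:
        head = f.read(64)  # enough to cover '<stata_dta><header><release>117</release>'
    if head.startswith(b"<stata_dta>"):
        start = head.find(b"<release>")
        end = head.find(b"</release>")
        if start == -1 or end == -1:
            raise ValueError("malformed Stata 13+ header")
        return int(head[start + len(b"<release>"):end])
    # Pre-117 formats: the first byte is the format number (e.g. 115 for Stata 12).
    return struct.unpack("B", head[:1])[0]
```

A result of 117 means the file is in the Stata 13 format that this issue is about; 115 means it was saved in the Stata 12 format.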

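Running this script:

```python
import pandas

pandas.show_versions()
# Raw string, so that "\r" in "\rferrer" is not read as a carriage return.
dta = pandas.io.stata.read_stata(r'D:\Datos\rferrer\Desktop\myauto.dta')
```

produces the following output: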

```
%run D:/Datos/RFERRER/Desktop/import_stata13.py

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.0
Cython: 0.19.2
numpy: 1.8.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 1.2.1
sphinx: 1.2.2
patsy: 0.2.0
scikits.timeseries: 0.91.3
dateutil: 2.2
pytz: 2013.8
bottleneck: None
tables: 2.4.0
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.2.3
bs4: None
html5lib: 0.95-dev
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: None
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\openpyxl\__init__.py:31: UserWarning: The installed version of lxml is too old to be used with openpyxl
  warnings.warn("The installed version of lxml is too old to be used with openpyxl")
```

```
TypeError                                 Traceback (most recent call last)
C:\Users\rferrer\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\site-packages\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
    195 else:
    196 filename = fname
--> 197 exec compile(scripttext, filename, 'exec') in glob, loc
    198 else:
    199 def execfile(fname, *where):

D:\Datos\RFERRER\Desktop\import_stata13.py in <module>()
      3 pandas.show_versions()
      4
----> 5 dta = pandas.io.stata.read_stata('D:\Datos\rferrer\Desktop\myauto.dta')

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in read_stata(filepath_or_buffer, convert_dates, convert_categoricals, encoding, index)
     45 identifier of column that should be used as index of the DataFrame
     46 """
---> 47 reader = StataReader(filepath_or_buffer, encoding)
     48
     49 return reader.data(convert_dates, convert_categoricals, index)

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in __init__(self, path_or_buf, encoding)
    455 self.path_or_buf = path_or_buf
    456
--> 457 self._read_header()
    458
    459 def _read_header(self):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in _read_header(self)
    657
    658 """Calculate size of a data record."""
--> 659 self.col_sizes = lmap(lambda x: self._calcsize(x), self.typlist)
    660
    661 def _calcsize(self, fmt):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in <lambda>(x)
    657
    658 """Calculate size of a data record."""
--> 659 self.col_sizes = lmap(lambda x: self._calcsize(x), self.typlist)
    660
    661 def _calcsize(self, fmt):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in _calcsize(self, fmt)
    661 def _calcsize(self, fmt):
    662 return (type(fmt) is int and fmt
--> 663 or struct.calcsize(self.byteorder + fmt))
    664
    665 def _col_size(self, k=None):

TypeError: cannot concatenate 'str' and 'NoneType' objects
```
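The last frame narrows the failure down: `_calcsize` builds a struct format string as `self.byteorder + fmt`, and Python 2's message "cannot concatenate 'str' and 'NoneType' objects" means the right-hand operand, `fmt`, was `None`. In other words, at least one entry of `self.typlist` never received a format. Why that entry was `None` is not established here. The sketch below only reproduces that failure mode. It uses a hypothetical type-code table built from the format-117 codes in Stata's dta documentation; `lookup_format`, `calcsize_naive` and `calcsize_checked` are illustrative names, not pandas internals. It also shows a defensive variant that reports the offending code instead.

```python
import struct

# Hypothetical lookup from Stata 13 (format 117) numeric type codes to
# struct format characters, following Stata's dta docs.
# This is NOT pandas' internal table.
FORMAT_117_NUMERIC = {
    65526: "d",  # double
    65527: "f",  # float
    65528: "l",  # long (4 bytes when a byte-order prefix is given)
    65529: "h",  # int
    65530: "b",  # byte
}


def lookup_format(code):
    """Return a struct format for a type code, or None if it is unhandled."""
    if 1 <= code <= 2045:
        return "%ds" % code  # fixed-width string str1..str2045
    return FORMAT_117_NUMERIC.get(code)  # None for e.g. strL (32768)


def calcsize_naive(byteorder, fmt):
    # Same concatenation as in the traceback: a None fmt fails here,
    # before struct.calcsize ever runs.
    return struct.calcsize(byteorder + fmt)


def calcsize_checked(byteorder, code):
    """Defensive variant: fail with a message that names the type code."""
    fmt = lookup_format(code)
    if fmt is None:
        raise ValueError("unsupported Stata type code: %d" % code)
    return struct.calcsize(byteorder + fmt)
```

With the bare `+`, the error surfaces far from its cause (Python 3 raises a differently worded TypeError). Checking the lookup result first turns it into a message that points at the column type that could not be mapped.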

The dataset myauto.dta is just the auto dataset made available by running `sysuse auto` within Stata.

The problem was originally reported on Stack Overflow: http://stackoverflow.com/questions/24053652/pandas-and-stata-13-files.

My Python is set up with Enthought Canopy 1.4.0 (64-bit).