read_csv python engine errors · Issue #10476 · pandas-dev/pandas
The only thing I changed in my usually working reduction pipeline was to try engine="python" (I wanted to use nrows for a smaller test read, but that fails as well, so I thought the python engine might currently be buggy):
    $ python reduction.py ~/data/planet4/2015-06-21_planet_four_classifications.csv
    INFO:Starting reduction.
    Traceback (most recent call last):
      File "reduction.py", line 258, in <module>
        args.test_n_rows, args.remove_duplicates)
      File "reduction.py", line 182, in main
        data = [chunk for chunk in reader]
      File "reduction.py", line 182, in <listcomp>
        data = [chunk for chunk in reader]
      File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 697, in __iter__
        yield self.read(self.chunksize)
      File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 721, in read
        ret = self._engine.read(nrows)
      File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 1556, in read
        content = self._get_lines(rows)
      File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 2007, in _get_lines
        for _ in range(rows):
    TypeError: 'float' object cannot be interpreted as an integer

My function call is this:

    # as chunksize and nrows cannot be used together yet, i switch chunksize
    # to None if I want test_n_rows for a small test database:
    if test_n_rows:
        chunks = None
    else:
        chunks = 1e6

    # creating reader object with pandas interface for csv parsing
    # doing this in chunks as its faster. Also, later will do a split
    # into multiple processes to do this.
    reader = pd.read_csv(fname, chunksize=chunks, na_values=['null'],
                         usecols=analysis_cols, nrows=test_n_rows,
                         engine='c')

Using pandas-0.16.2_58_g01995b2-py3.4
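
A possible explanation, going only by the last traceback frame: the python engine appears to hand the chunksize straight to range(), and 1e6 is a float, which range() rejects. The sketch below reproduces that TypeError without pandas and shows an integer chunk size as a workaround. The helper name pick_chunksize and its default argument are made up for illustration and are not part of pandas or of the original script.

```python
def pick_chunksize(test_n_rows, default=10**6):
    """Hypothetical helper: no chunking for a small test read, else an int chunk size."""
    if test_n_rows:
        return None
    # int() guards against a float such as 1e6 being passed in as the default
    return int(default)


# range() accepts only integers, so a float chunk size fails the same way
# as the final frame of the traceback (for _ in range(rows)).
try:
    range(1e6)
    float_error = None
except TypeError as exc:
    float_error = str(exc)

print(float_error)

# An integer chunk size is accepted by range() without complaint.
chunks = pick_chunksize(test_n_rows=None)
print(len(range(chunks)))
```

With this change the value passed as chunksize to pd.read_csv would be the int 1000000 instead of the float 1e6. Whether the C engine silently tolerates the float while the python engine does not is an assumption based on the behaviour described above, not something verified here.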