.loc[iterator] treats missing keys differently than .loc[list] · Issue #20748 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
In [2]: pd.Series([1,2,3]).loc[[i for i in (4,5)]]
KeyError Traceback (most recent call last)
in ()
----> 1 pd.Series([1,2,3]).loc[[i for i in (4,5)]]

/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1370
   1371 maybe_callable = com._apply_if_callable(key, self.obj)
-> 1372 return self._getitem_axis(maybe_callable, axis=axis)
   1373
   1374 def _is_scalar_access(self, key):

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1829 raise ValueError('Cannot index with multidimensional key')
   1830
-> 1831 return self._getitem_iterable(key, axis=axis)
   1832
   1833 # nested tuple slicing

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1109
   1110 if self._should_validate_iterable(axis):
-> 1111 self._has_valid_type(key, axis)
   1112
   1113 labels = self.obj._get_axis(axis)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1683 raise KeyError(
   1684 u"None of [{key}] are in the [{axis}]".format(
-> 1685 key=key, axis=self.obj._get_axis_name(axis)))
   1686 else:
   1687

KeyError: 'None of [[4, 5]] are in the [index]'
In [3]: pd.Series([1,2,3]).loc[(i for i in (4,5))]
Out[3]:
4 NaN
5 NaN
dtype: float64
Problem description
Since we convert iterators to lists anyway...
indexer, keyarr = labels._convert_listlike_indexer(
... we might as well do the conversion as soon as possible (i.e., in __getitem__), and simplify the code by only handling list-likes which have a length. I would also consider changing is_list_like to return False for iterators, or providing it with a has_len=False argument.
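To make the proposal concrete, here is a rough, pandas-free sketch of what "convert as early as possible" could look like. Both function names are hypothetical and are not part of pandas; the second one only illustrates a stricter list-like check that rejects iterators, along the lines of the has_len idea above.

```python
from collections.abc import Iterator


def materialize_key(key):
    """Hypothetical helper: turn one-shot iterators into lists, leave other keys alone."""
    if isinstance(key, Iterator):
        # Generators and other iterators have no length and can be consumed
        # only once, so convert them before any validation happens.
        return list(key)
    return key


def is_list_like_with_len(obj):
    """Hypothetical stricter check: iterable, sized, and not a string."""
    return (
        hasattr(obj, "__iter__")
        and hasattr(obj, "__len__")
        and not isinstance(obj, (str, bytes))
    )


gen_key = materialize_key(i for i in (4, 5))
list_key = materialize_key([4, 5])
```

After this step gen_key and list_key are both plain lists, so any later code path (validation, warnings, reindexing) would see the same kind of object regardless of how the caller built the key.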
It would also solve this other, less important, difference:
In [2]: pd.Series([1,2,3]).loc[[i for i in (2,5)]]
/usr/bin/ipython3:1: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
#! /bin/sh
Out[2]:
2 3.0
5 NaN
dtype: float64
In [3]: pd.Series([1,2,3]).loc[(i for i in (2,5))]
Out[3]:
2 3.0
5 NaN
dtype: float64
... and probably others.
Expected Output
Exactly the same for lists and iterators.
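Until the two paths agree, a caller can get consistent results by materializing the key and using .reindex(), which the FutureWarning above suggests as the alternative. A minimal sketch, assuming a hypothetical wrapper named lookup (not a pandas function):

```python
import pandas as pd


def lookup(series, labels):
    """Hypothetical wrapper: same result for lists and iterators."""
    # list() consumes an iterator once and is harmless for an existing list;
    # reindex() fills labels missing from the index with NaN.
    return series.reindex(list(labels))


s = pd.Series([1, 2, 3])
from_list = lookup(s, [i for i in (2, 5)])
from_gen = lookup(s, (i for i in (2, 5)))
```

Both calls go through the same code path, so from_list and from_gen hold the value for label 2 and NaN for the missing label 5.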
Output of pd.show_versions()
INSTALLED VERSIONS
commit: d04b746
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.23.0.dev0+754.gd04b7464d.dirty
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1