.loc[iterator] treats missing keys differently than .loc[list] · Issue #20748 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

In [2]: pd.Series([1,2,3]).loc[[i for i in (4,5)]]

KeyError                                  Traceback (most recent call last)
<ipython-input-2-...> in <module>()
----> 1 pd.Series([1,2,3]).loc[[i for i in (4,5)]]

/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1370 
   1371             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1372             return self._getitem_axis(maybe_callable, axis=axis)
   1373 
   1374     def _is_scalar_access(self, key):

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1829                     raise ValueError('Cannot index with multidimensional key')
   1830 
-> 1831                 return self._getitem_iterable(key, axis=axis)
   1832 
   1833             # nested tuple slicing

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1109 
   1110         if self._should_validate_iterable(axis):
-> 1111             self._has_valid_type(key, axis)
   1112 
   1113         labels = self.obj._get_axis(axis)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1683                 raise KeyError(
   1684                     u"None of [{key}] are in the [{axis}]".format(
-> 1685                         key=key, axis=self.obj._get_axis_name(axis)))
   1686             else:
   1687 

KeyError: 'None of [[4, 5]] are in the [index]'

In [3]: pd.Series([1,2,3]).loc[(i for i in (4,5))]
Out[3]: 
4   NaN
5   NaN
dtype: float64

Problem description

Since we convert iterators to lists anyway...

indexer, keyarr = labels._convert_listlike_indexer(

... we might as well do the conversion as soon as possible (i.e., in __getitem__), and simplify the code by only handling list-likes which have a length. I would also consider changing is_list_like to return False for iterators, or provide it with a has_len=False argument.
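To make the proposal concrete, here is a minimal sketch in plain Python. Both names are hypothetical and not part of pandas: `materialize_key` stands for the early conversion that could happen in `__getitem__`, and `is_list_like_sketch` is a simplified stand-in for `is_list_like` with the suggested `has_len` argument. The real pandas check handles more cases than this one does.

```python
from collections.abc import Iterator, Sized


def materialize_key(key):
    """Hypothetical helper: turn a one-shot iterator into a list.

    Anything that is not an iterator (lists, scalars, slices, ...) is
    returned unchanged, so later code only ever sees keys with a length.
    """
    if isinstance(key, Iterator):
        return list(key)
    return key


def is_list_like_sketch(obj, has_len=False):
    """Simplified stand-in for a list-like check (not the pandas one).

    With has_len=True, iterables without __len__ (generators, iterators)
    are rejected.
    """
    if isinstance(obj, (str, bytes)):
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return isinstance(obj, Sized) if has_len else True


gen_key = (i for i in (4, 5))
accepted_by_default = is_list_like_sketch(gen_key)               # True
accepted_with_len = is_list_like_sketch(gen_key, has_len=True)   # False
list_key = materialize_key(gen_key)                              # [4, 5]
```

With the key materialized up front, a generator and the equivalent list would follow the same validation path, which is the point of the suggestion.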

It would also resolve another, less important difference: with a partially missing key, the list emits a FutureWarning while the iterator does not:

In [2]: pd.Series([1,2,3]).loc[[i for i in (2,5)]]
/usr/bin/ipython3:1: FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  #! /bin/sh
Out[2]: 
2    3.0
5    NaN
dtype: float64

In [3]: pd.Series([1,2,3]).loc[(i for i in (2,5))]
Out[3]: 
2    3.0
5    NaN
dtype: float64

... and probably others.

Expected Output

Exactly the same for lists and iterators.
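As a user-side illustration of what "the same" could look like, here is a small sketch built on the `.reindex()` alternative named in the FutureWarning above. `select_labels` is a hypothetical helper, not a pandas function; it materializes the labels first so that a list and a generator take one code path.

```python
import pandas as pd


def select_labels(series, labels):
    """Hypothetical helper: look up labels, filling missing ones with NaN."""
    # list() consumes a generator once; a list is simply copied.
    return series.reindex(list(labels))


s = pd.Series([1, 2, 3])
from_list = select_labels(s, [2, 5])
from_gen = select_labels(s, (i for i in (2, 5)))
```

Both calls return a float64 Series indexed by 2 and 5, with 3.0 for the present label and NaN for the missing one.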

Output of pd.show_versions()

INSTALLED VERSIONS

commit: d04b746
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.23.0.dev0+754.gd04b7464d.dirty
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1