Pandas assign malfunctions with if/else conditional check · Issue #30357 · pandas-dev/pandas

Problem description

I need to create a new column in a pandas DataFrame from the values of another column. For instance, given the city column in the following DataFrame, the new column should copy the value from the city column only if it is in a given list; otherwise the corresponding entry should be populated as "other". The following code snippet works like a charm.

import pandas as pd

df = pd.DataFrame({'city': ['Kolkata','Delhi','Mumbai','Bankura','Dhaka','Jaipur','Goa', 'Delhi','Mumbai','Kolkata'],'temp':[i for i in range(10)]})
df['MajorCity'] = df.apply(lambda x: x.city if x.city in ['Kolkata','Delhi','Mumbai'] else 'other',axis=1)
print(df)

Output

city temp MajorCity
Kolkata 0 Kolkata
Delhi 1 Delhi
Mumbai 2 Mumbai
Bankura 3 other
Dhaka 4 other
Jaipur 5 other
Goa 6 other
Delhi 7 Delhi
Mumbai 8 Mumbai
Kolkata 9 Kolkata

But when I try to implement it with the pandas assign function, as below:


df = pd.DataFrame({'city': ['Kolkata','Delhi','Mumbai','Bankura','Dhaka','Jaipur','Goa', 'Delhi','Mumbai','Kolkata'],'temp':[i for i in range(10)]})
df.assign(MajorCity = lambda x:x.city if x.city in ['Kolkata','Delhi','Mumbai'] else 'other')

I received the following error:

Error message:

ValueError Traceback (most recent call last)
in <module>
1 df = pd.DataFrame({'city': ['Kolkata','Delhi','Mumbai','Bankura','Dhaka','Jaipur','Goa',
2 'Delhi','Mumbai','Kolkata'],'temp':[i for i in range(10)]})
----> 3 df.assign(MajorCity = lambda x:x.city if x.city in ['Kolkata','Delhi','Mumbai'] else 'other')

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in assign(self, **kwargs)
3667 if PY36:
3668 for k, v in kwargs.items():
-> 3669 data[k] = com.apply_if_callable(v, data)
3670 else:
3671 # <= 3.5: do all calculations first...

~/anaconda3/lib/python3.7/site-packages/pandas/core/common.py in apply_if_callable(maybe_callable, obj, **kwargs)
363
364 if callable(maybe_callable):
--> 365 return maybe_callable(obj, **kwargs)
366
367 return maybe_callable

in <lambda>(x)
1 df = pd.DataFrame({'city': ['Kolkata','Delhi','Mumbai','Bankura','Dhaka','Jaipur','Goa',
2 'Delhi','Mumbai','Kolkata'],'temp':[i for i in range(10)]})
----> 3 df.assign(MajorCity = lambda x:x.city if x.city in ['Kolkata','Delhi','Mumbai'] else 'other')

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
1553 "The truth value of a {0} is ambiguous. "
1554 "Use a.empty, a.bool(), a.item(), a.any() or a.all().".format(
-> 1555 self.__class__.__name__
1556 )
1557 )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
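
Note on the traceback: the callable passed to assign is called once with the whole DataFrame (see apply_if_callable above), not once per row as with apply(..., axis=1). So x.city is an entire Series, and `x.city in [...]` ends up asking for the truth value of a Series, which raises the ValueError. A possible workaround, sketched here with element-wise operations (the name major_cities is only illustrative and not from the report), would be:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Kolkata', 'Delhi', 'Mumbai', 'Bankura', 'Dhaka',
             'Jaipur', 'Goa', 'Delhi', 'Mumbai', 'Kolkata'],
    'temp': list(range(10)),
})

# Illustrative name for the list of cities to keep (not from the report).
major_cities = ['Kolkata', 'Delhi', 'Mumbai']

# The lambda receives the whole DataFrame, so work on the column as a Series:
# isin() builds a boolean mask, and where() keeps the values where the mask
# is True and substitutes 'other' everywhere else.
result = df.assign(
    MajorCity=lambda d: d['city'].where(d['city'].isin(major_cities), 'other')
)
print(result)
```

This should give the same MajorCity column as the apply version, without a Python-level loop over rows.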

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-72-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.10
pytest : 4.6.2
hypothesis : None
sphinx : 2.1.0
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.3
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.3
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.4
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8