BUG: seemingly unexpected behavior with errors='ignore' in json_normalize() · Issue #41876 · pandas-dev/pandas



Code Sample, a copy-pastable example

import pandas as pd

# Raises a KeyError because 'id' is missing from the second element,
# even though errors="ignore" was passed.
pd.json_normalize(
    data=[{"id": 1, "outer": {"inner": [{"field": 2}]}}, {"outer": {"inner": [{"field": 3}]}}],
    record_path=["outer", "inner"],
    meta=["id"],
    errors="ignore",
)

# No issues, since both elements have an 'id'. This also shows that 'id'
# is being looked up at the right level.
pd.json_normalize(
    data=[{"id": 1, "outer": {"inner": [{"field": 2}]}}, {"id": 4, "outer": {"inner": [{"field": 3}]}}],
    record_path=["outer", "inner"],
    meta=["id"],
    errors="ignore",
)

# No issues when record_path has length 1, even though the second element
# has no 'id'.
pd.json_normalize(
    data=[{"id": 1, "outer": [{"field": 2}]}, {"outer": [{"field": 3}]}],
    record_path=["outer"],
    meta=["id"],
    errors="ignore",
)

Problem description

I would expect the first case above to return the same output as the third case (i.e., with missing ids filled with np.nan). Is the absence of a try...except around the first _pull_field() call intentional?
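
For anyone hitting the same KeyError, one possible workaround is to make sure every record carries the meta keys before calling json_normalize(). The sketch below is only an illustration: fill_missing_meta is a hypothetical helper (not part of pandas), it assumes the meta keys are top-level keys, and it does not handle nested meta paths.

```python
import pandas as pd


def fill_missing_meta(records, meta_keys, fill_value=float("nan")):
    """Return copies of `records` with every top-level key in `meta_keys` present.

    Hypothetical helper, not part of pandas. Nested meta paths are not handled.
    """
    defaults = {key: fill_value for key in meta_keys}
    # Later keys win in a dict merge, so values already in `rec` are preserved.
    return [{**defaults, **rec} for rec in records]


data = [
    {"id": 1, "outer": {"inner": [{"field": 2}]}},
    {"outer": {"inner": [{"field": 3}]}},  # no 'id' here
]

df = pd.json_normalize(
    data=fill_missing_meta(data, ["id"]),
    record_path=["outer", "inner"],
    meta=["id"],
    errors="ignore",
)
print(df)
```

The helper builds new dicts instead of mutating the input, so the original records are left untouched. The resulting frame has one row per inner record, with a missing value in the id column for the second row, which is the shape I would expect the first case to produce on its own.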

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.67-ts1
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.utf8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.16.6
pytz : 2020.4
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2
Cython : None
pytest : 5.3.1
hypothesis : 3.57.0
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.14.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.5.2
matplotlib : 3.0.3
numexpr : 2.6.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pytest : 5.3.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : 1.3.18
tables : None
tabulate : 0.8.3
xarray : None
xlrd : 1.1.0
xlwt : None
xlsxwriter : None
numba : 0.50.1