ENH: Add merge type validation on pandas.merge
by KevsterAmp · Pull Request #59435 · pandas-dev/pandas
- closes #59422 (BUG: UnboundLocalError when full outer merging two dataframes)
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature
Tests on pandas.merge
(pandas-dev) kev@mac pandas % pytest pandas/tests/reshape/merge/test_merge.py
+ /opt/homebrew/Caskroom/miniforge/base/envs/pandas-dev/bin/ninja
[1/1] Generating write_version_file with a custom command
============================================================================================================ test session starts =============================================================================================================
platform darwin -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
PyQt5 5.15.9 -- Qt runtime 5.15.8 -- Qt compiled 5.15.8
rootdir: /Users/kev/self/pandas
configfile: pyproject.toml
plugins: localserver-0.0.0, qt-4.4.0, cov-5.0.0, anyio-4.4.0, hypothesis-6.108.7, cython-0.3.1, xdist-3.6.1
collected 939 items
pandas/tests/reshape/merge/test_merge.py ...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
------------------------------------------------------------------------------------------ generated xml file: /Users/kev/self/pandas/test-data.xml ------------------------------------------------------------------------------------------
============================================================================================================ slowest 30 durations ============================================================================================================
0.02s call pandas/tests/reshape/merge/test_merge.py::TestMergeDtypes::test_merge_ea_with_string[inner-str0]
0.02s call pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_non_unique_indexes
0.01s call pandas/tests/reshape/merge/test_merge.py::TestMerge::test_validation
0.01s call pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_indicator_result_integrity
0.01s call pandas/tests/reshape/merge/test_merge.py::TestMerge::test_indicator
0.01s call pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_non_unique_index_many_to_many
0.01s call pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_left_empty_right_notempty
(23 durations < 0.005s hidden. Use -vv to show these durations.)
============================================================================================================ 939 passed in 1.37s =============================================================================================================
Error Reproducible Example
import pandas as pd
from datetime import datetime

data_A = []
data_B = []
for i in range(10):
    data_A.append({
        "id": i,
        "created_date": datetime.today(),
        "created_at": datetime.now(),
    })
    # Even ids are kept as-is; odd ids are tripled, so not every row has a match in df_A.
    data_B.append({
        "id": i if i % 2 == 0 else 3 * i,
        "created_date": datetime.today(),
        "created_at": datetime.now(),
    })
df_A = pd.DataFrame.from_dict(data_A)
df_B = pd.DataFrame.from_dict(data_B)
# "full" is not a valid merge type, so this call now raises the error below.
df_A.merge(df_B, how="full", on="id")
Traceback (most recent call last):
File "/Users/kev/self/pandas/test.py", line 29, in <module>
df_A.merge(df_B, how="full", on="id")
File "/Users/kev/self/pandas/pandas/core/frame.py", line 10807, in merge
return merge(
File "/Users/kev/self/pandas/pandas/core/reshape/merge.py", line 355, in merge
raise ValueError(f"'{how}' is not a valid Merge type ({merge_type})")
ValueError: 'full' is not a valid Merge type (['left', 'right', 'inner', 'outer', 'cross'])
Not sure whether this should raise ValueError or MergeError.
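The check behind the error message in the traceback above can be sketched in plain Python. This is a minimal stand-alone illustration, not the code from this pull request: the names `VALID_MERGE_TYPES` and `validate_how` are hypothetical, and the list of accepted values is copied from the error message rather than from the pandas source.

```python
# Hypothetical stand-alone sketch; not the implementation in pandas/core/reshape/merge.py.
# The accepted values are taken from the error message in the traceback.
VALID_MERGE_TYPES = ["left", "right", "inner", "outer", "cross"]


def validate_how(how: str) -> str:
    """Return `how` unchanged if it names a supported merge type, else raise."""
    if how not in VALID_MERGE_TYPES:
        # Same wording as the message shown in the traceback.
        raise ValueError(
            f"'{how}' is not a valid Merge type ({VALID_MERGE_TYPES})"
        )
    return how


if __name__ == "__main__":
    print(validate_how("outer"))
    try:
        validate_how("full")
    except ValueError as exc:
        print(exc)
```

On the open question: `pandas.errors.MergeError` is a subclass of `ValueError`, so callers that catch `ValueError` would handle either choice.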