ENH: explicit filters parameter in pd.read_parquet by mrastgoo · Pull Request #53212 · pandas-dev/pandas
@mrastgoo would you have time to add a test?
We have an existing test for the columns keyword (test_read_columns), so you could add an equivalent test_read_filters test case next to it. Something like:
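def test_read_filters(self, engine, tmp_path):
    # Four rows split across two partitions, "a" and "b".
    df = pd.DataFrame({"int": list(range(4)), "part": list("aabb")})
    # Only the rows from partition "a" should come back.
    expected = pd.DataFrame({"int": [0, 1]})
    check_round_trip(
        df,
        engine,
        path=tmp_path,
        expected=expected,
        write_kwargs={"partition_cols": ["part"]},
        # Filter on the partition column "part" (the frame has no "string" column).
        read_kwargs={"filters": [("part", "==", "a")], "columns": ["int"]},
        repeat=1,
    )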
I used partition_cols so that the effect of the filter is also visible with fastparquet (to ensure the keyword is passed through correctly). Because of that, repeat=1 is needed so that a second write does not add more files to the directory (with partition_cols, writing again does not simply overwrite the single file).