BUG(?): ArrowExtensionArray.median is an "approximate median" · Issue #52679 · pandas-dev/pandas

@lukemanley

ArrowExtensionArray.median is implemented using pyarrow.compute.approximate_median, which can produce surprising results for users expecting an exact ("true") median:

In [1]: import pandas as pd

In [2]: pd.Series([1, 2], dtype="float64").median()
Out[2]: 1.5

In [3]: pd.Series([1, 2], dtype="float64[pyarrow]").median()
Out[3]: 1.0

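pyarrow does not appear to provide a dedicated compute function for a "true" median, although pyarrow.compute.quantile with q=0.5 computes an exact quantile. Is the use of approximate_median intentional? Should it be documented somehow?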
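
For comparison, here is a minimal sketch of what an exact median would return for the example above. It is an illustration only, not how pandas implements median. The helper names exact_median and arrow_exact_median are hypothetical. The second helper assumes that pyarrow.compute.quantile with q=0.5 and linear interpolation is an acceptable stand-in for an exact median, and it falls back to the pure-Python version when pyarrow is not installed.

```python
import statistics


def exact_median(values):
    """Exact median of the non-null values, or None if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    # For an even count, statistics.median averages the two middle values.
    return statistics.median(present)


def arrow_exact_median(values):
    """Exact median via pyarrow.compute.quantile when pyarrow is available.

    Hypothetical helper: falls back to exact_median() without pyarrow.
    """
    try:
        import pyarrow as pa
        import pyarrow.compute as pc
    except ImportError:
        return exact_median(values)
    arr = pa.array(values, type=pa.float64())
    # quantile returns a one-element array for a scalar q; nulls are skipped.
    result = pc.quantile(arr, q=0.5, interpolation="linear")
    return result[0].as_py()


print(exact_median([1.0, 2.0]))
print(arrow_exact_median([1.0, 2.0]))
```

Both helpers average the two middle values for an even-length input, which matches the 1.5 returned for the float64 dtype rather than the 1.0 returned for float64[pyarrow].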

cc @mroeschke