ENH/API: accept list-like percentiles in describe (WIP) by TomAugspurger · Pull Request #7088 · pandas-dev/pandas

@TomAugspurger why are you using Counter rather than pd.core.algorithms.value_counts() again?

Counter does not sort, while value_counts does.

On Windows, test_describe_objects in test_generic is failing on the datetimes because of the arbitrary order from Counter (the counts are all 1, and for some reason it picks the 2nd one).

======================================================================
FAIL: test_describe_objects (pandas.tests.test_generic.TestDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_generic.py", line 997, in test_describe_objects
    assert_frame_equal(result, expected)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\util\testing.py", line 573, in assert_frame_equal
    check_exact=check_exact)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\util\testing.py", line 520, in assert_series_equal
    assert_almost_equal(left.values, right.values, check_less_precise)
  File "testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2465)
  File "testing.pyx", line 93, in pandas._testing.assert_almost_equal (pandas\src\testing.c:1793)
  File "testing.pyx", line 139, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2338)
AssertionError: Timestamp('2010-01-02 00:00:00') != Timestamp('2010-01-01 00:00:00')

----------------------------------------------------------------------
Ran 6971 tests in 194.439s

FAILED (SKIP=130, failures=1)
2.7\pandas\core\generic.py(3576)describe_categorical_1d()
-> if data.dtype == object:
(Pdb) n
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\generic.py(3585)describe_categorical_1d()
-> elif issubclass(data.dtype.type, np.datetime64):
(Pdb)
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\generic.py(3586)describe_categorical_1d()
-> names = ['count', 'unique']
(Pdb) p data
0   2010-01-01
1   2010-01-02
2   2010-01-03
3   2010-01-04
Name: C1, dtype: datetime64[ns]
(Pdb) n
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\generic.py(3587)describe_categorical_1d()
-> asint = data.dropna().values.view('i8')
(Pdb) n
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\generic.py(3588)describe_categorical_1d()
-> objcounts = compat.Counter(asint)
(Pdb) p asint
array([1262304000000000000, 1262390400000000000, 1262476800000000000,
       1262563200000000000], dtype=int64)
(Pdb) l
3583                        result += [top, freq]
3584
3585                elif issubclass(data.dtype.type, np.datetime64):
3586                    names = ['count', 'unique']
3587                    asint = data.dropna().values.view('i8')
3588 ->                 objcounts = compat.Counter(asint)
3589                    result = [data.count(), len(objcounts)]
3590                    if result[1] > 0:
3591                        top, freq = objcounts.most_common(1)[0]
3592                        names += ['first', 'last', 'top', 'freq']
3593                        result += [lib.Timestamp(asint.min()),
(Pdb) p compat.Counter(asint)
Counter({1262390400000000000: 1, 1262563200000000000: 1, 1262304000000000000: 1, 1262476800000000000: 1})
(Pdb) p asint
array([1262304000000000000, 1262390400000000000, 1262476800000000000,
       1262563200000000000], dtype=int64)
(Pdb) p pd.algorithms.value_counts
*** AttributeError: AttributeError("'module' object has no attribute 'algorithms'",)
(Pdb) p pd.core.algorithms.value_counts
<function value_counts at 0x00000000076FDBA8>
(Pdb) p pd.core.algorithms.value_counts(asint)
1262304000000000000    1
1262563200000000000    1
1262476800000000000    1
1262390400000000000    1
dtype: int64
(Pdb) p pd.core.algorithms.value_counts(asint,sort=True)
1262304000000000000    1
1262563200000000000    1
1262476800000000000    1
1262390400000000000    1
dtype: int64
(Pdb)
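
The session above suggests the real problem is ties: every count is 1, so neither Counter nor value_counts gives a stable "top" on its own. A minimal sketch of one way to make the pick deterministic, using only the standard library. The helper name `top_and_freq` is hypothetical and not part of pandas, and breaking ties by the smallest value is an assumption that happens to match what the test expects here (2010-01-01), not necessarily what describe should do.

```python
from collections import Counter


def top_and_freq(values):
    """Return (top, freq) for the most common value.

    Hypothetical helper, not pandas API. Ties on the count are broken
    by taking the smallest value, so the result does not depend on
    dict/hash ordering.
    """
    counts = Counter(values)
    # Sort key: highest count first (negated), then smallest value.
    top, freq = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return top, freq


# The i8 view of the four timestamps from the failing test.
asint = [1262304000000000000, 1262390400000000000,
         1262476800000000000, 1262563200000000000]

top, freq = top_and_freq(asint)
print(top, freq)
```

With the data from the test, this picks 1262304000000000000 (2010-01-01) regardless of the order the values come in.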