ENH/API: accept list-like percentiles in describe (WIP) by TomAugspurger · Pull Request #7088 · pandas-dev/pandas
@TomAugspurger why are you using Counter rather than pd.core.algorithms.value_counts() again? Counter does NOT sort, while value_counts does.
On Windows, test_generic/test_describe_objects is failing on the datetimes because of the arbitrary order from Counter (as the counts are all 1, for some reason it picks the 2nd one).
======================================================================
FAIL: test_describe_objects (pandas.tests.test_generic.TestDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\tests\test_generic.py", line 997, in test_describe_objects
assert_frame_equal(result, expected)
File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\util\testing.py", line 573, in assert_frame_equal
check_exact=check_exact)
File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-2.7\pandas\util\testing.py", line 520, in assert_series_equal
assert_almost_equal(left.values, right.values, check_less_precise)
File "testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2465)
File "testing.pyx", line 93, in pandas._testing.assert_almost_equal (pandas\src\testing.c:1793)
File "testing.pyx", line 139, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2338)
AssertionError: Timestamp('2010-01-02 00:00:00') != Timestamp('2010-01-01 00:00:00')
----------------------------------------------------------------------
Ran 6971 tests in 194.439s
FAILED (SKIP=130, failures=1)
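For reference, a minimal sketch of why the pick is unstable and one way to make it deterministic. On Python 2.7 (as in the traceback above), the order in which Counter.most_common returns equal counts follows the dict's iteration order, which is arbitrary; newer Python versions return ties in first-seen order instead. The helper name deterministic_top is hypothetical and not part of pandas; it breaks ties by taking the smallest value.

```python
from collections import Counter

# Nanosecond epoch values for 2010-01-01 .. 2010-01-04, as in the failing test data.
asint = [1262304000000000000, 1262390400000000000,
         1262476800000000000, 1262563200000000000]

counts = Counter(asint)


def deterministic_top(counter):
    """Return (value, freq); hypothetical helper that breaks ties by smallest value."""
    if not counter:
        raise ValueError("counter is empty")
    # Sort key: highest count first, then the smallest value among equal counts.
    return min(counter.items(), key=lambda kv: (-kv[1], kv[0]))


top, freq = deterministic_top(counts)
```

With all counts equal to 1, this always selects the earliest timestamp regardless of platform or dict ordering, which is what the expected frame in the test assumes.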
2.7\pandas\core\generic.py(3576)describe_categorical_1d()
-> if data.dtype == object:
(Pdb) n
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\generic.py(3585)describe_categorical_1d()
-> elif issubclass(data.dtype.type, np.datetime64):
(Pdb)
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\generic.py(3586)describe_categorical_1d()
-> names = ['count', 'unique']
(Pdb) p data
0 2010-01-01
1 2010-01-02
2 2010-01-03
3 2010-01-04
Name: C1, dtype: datetime64[ns]
(Pdb) n
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\generic.py(3587)describe_categorical_1d()
-> asint = data.dropna().values.view('i8')
(Pdb) n
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\generic.py(3588)describe_categorical_1d()
-> objcounts = compat.Counter(asint)
(Pdb) p asint
array([1262304000000000000, 1262390400000000000, 1262476800000000000,
1262563200000000000], dtype=int64)
(Pdb) l
3583 result += [top, freq]
3584
3585 elif issubclass(data.dtype.type, np.datetime64):
3586 names = ['count', 'unique']
3587 asint = data.dropna().values.view('i8')
3588 -> objcounts = compat.Counter(asint)
3589 result = [data.count(), len(objcounts)]
3590 if result[1] > 0:
3591 top, freq = objcounts.most_common(1)[0]
3592 names += ['first', 'last', 'top', 'freq']
3593 result += [lib.Timestamp(asint.min()),
(Pdb) p compat.Counter(asint)
Counter({1262390400000000000: 1, 1262563200000000000: 1, 1262304000000000000: 1, 1262476800000000000: 1})
(Pdb) p asint
array([1262304000000000000, 1262390400000000000, 1262476800000000000,
1262563200000000000], dtype=int64)
(Pdb) p pd.algorithms.value_counts
*** AttributeError: AttributeError("'module' object has no attribute 'algorithms'",)
(Pdb) p pd.core.algorithms.value_counts
<function value_counts at 0x00000000076FDBA8>
(Pdb) p pd.core.algorithms.value_counts(asint)
1262304000000000000 1
1262563200000000000 1
1262476800000000000 1
1262390400000000000 1
dtype: int64
(Pdb) p pd.core.algorithms.value_counts(asint,sort=True)
1262304000000000000 1
1262563200000000000 1
1262476800000000000 1
1262390400000000000 1
dtype: int64
(Pdb)
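Note that in the session above value_counts, even with sort=True, does not return the tied values in value order either, so switching functions alone may not settle which one ends up as top. A rough sketch of an explicit tie-break using only the public pandas API, assuming the earliest timestamp should win among equal counts (that tie-break rule is my assumption, not something the PR has decided):

```python
import pandas as pd

# Nanosecond epoch values copied from the pdb session above.
asint = [1262304000000000000, 1262390400000000000,
         1262476800000000000, 1262563200000000000]

vc = pd.Series(asint).value_counts()

# Put values and counts in explicit columns so both can act as sort keys.
table = pd.DataFrame({"value": vc.index, "freq": vc.values})

# Highest count first; among equal counts, the earliest timestamp wins.
table = table.sort_values(["freq", "value"], ascending=[False, True])

top = pd.Timestamp(int(table["value"].iloc[0]))
freq = int(table["freq"].iloc[0])
```

Sorting on both columns makes the result independent of whatever order value_counts happens to return for ties.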