API: Add dtype parameter to Categorical.from_codes by topper-123 · Pull Request #24398 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation67 Commits10 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

topper-123

Added parameter dtype to Categorical.from_codes.

topper-123

@codecov

@codecov

jschendel

jschendel

jschendel

jschendel

gfyoung

if dtype is not None:
if categories is not None or ordered is not None:
raise ValueError("Cannot specify both `dtype` and `categories`"
" or `ordered`.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message confuses me. Are you saying: both "dtype" and ("categories" / "ordered") ?

I think this will need to be reworded for clarity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied that message from Categorical.__init__, but I agree, and have changed it in both locations.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic. Thanks for doing that!

jreback

@pep8speaks

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 08, 2019 at 14:54 Hours UTC

jreback

jreback

jreback

@@ -59,6 +59,7 @@
Categorical, CategoricalIndex, DataFrame, DatetimeIndex, Float64Index,
Index, Int64Index, Interval, IntervalIndex, MultiIndex, NaT, Panel, Period,
PeriodIndex, RangeIndex, Series, TimedeltaIndex, Timestamp)
from pandas.api.types import CategoricalDtype as CDT

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same style

@@ -30,6 +30,7 @@
DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, Panel,
PeriodIndex, Series, SparseDataFrame, SparseSeries, TimedeltaIndex, compat,
concat, isna, to_datetime)
from pandas.api.types import CategoricalDtype

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback

@jreback

@topper-123

Ok, i’ve reverted the deprecation.

TomAugspurger

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing it, but did you add a test for Categorical.from_codes(codes, categories, dtype=dtype) raising?

Categorical.from_codes([0, 1], Categorical(['a', 'b', 'a']))
codes = np.random.choice([0, 1], 5, p=[0.9, 0.1])
dtype = CategoricalDtype(categories=["train", "test"])
Categorical.from_codes(codes, dtype=dtype)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this test is duplicative with earlier ones (even on master). I'd be OK with removing it.

@jorisvandenbossche

Yes, +1 on not deprecating categories and ordered

@jreback

so ok with adding dtype in .from_codes as that promotes consistency, but why are folks not in favor of deprcating categories and ordered? this is just moving code away from the single point of using CDT for all operations.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

@TomAugspurger

Mainly balancing the volume of code that is out there using categories & orderer, against the (small) cost of supporting both.

@jorisvandenbossche

so ok with adding dtype in .from_codes as that promotes consistency, but why are folks not in favor of deprcating categories and ordered?

We have exactly the same pattern in the main Categorical constructor as well: users have the option to just pass categories or ordered, or a full fledged dtype.
Deprecating it there of course has a much larger impact than for from_codes, but this means we have the machinery to handle the combination, so I don't really see the need to deprecate it here (certainly given that it is much more convenient to use).

@jreback

We have exactly the same pattern in the main Categorical constructor as well: users have the option to just pass categories or ordered, or a full fledged dtype.

ok I can see the argument for this then. But this is a tag confusing, maybe let's enhance the doc-strings slightly on the constructor & from_codes to make this even more cclear that you should pass (categories, ordered) or dtype (yes it errors, but a doc-string not will help).

@topper-123 can you raise an issue / PR for this.

jreback

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger

@TomAugspurger

6008c08 has some changes

@TomAugspurger

Buglet when neither categories nor dtype is provided.

In [1]: import pandas as pd

In [2]: pd.Categorical.from_codes([0, 1])

TypeError Traceback (most recent call last) in ----> 1 pd.Categorical.from_codes([0, 1])

~/sandbox/pandas/pandas/core/arrays/categorical.py in from_codes(cls, codes, categories, ordered, dtype) 661 662 if len(codes) and ( --> 663 codes.max() >= len(dtype.categories) or codes.min() < -1): 664 raise ValueError("codes need to be between -1 and " 665 "len(categories)-1")

TypeError: object of type 'NoneType' has no len()

fixing now.

@TomAugspurger

@TomAugspurger

6008c08 also had a bug with the tests for raising when both categories and dtype were used. The test I added used Categorical() instead of Categorical.from_codes. That's fixed now.

@TomAugspurger

TomAugspurger

@topper-123

@TomAugspurger

Thanks. Merging in a few hours if now objections.

jreback

@jreback

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request

Feb 28, 2019

@topper-123 @Pingviinituutti

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request

Feb 28, 2019

@topper-123 @Pingviinituutti

@jbrockmendel

#6581 lists the categorical and ordered kwargs as deprecated, but i dont see any warnings to that effect. Am i missing something?

@jorisvandenbossche

That seems a mistake indeed, as those arguments are not deprecated. Looking at the last part of the comments above, there was some discussion about this but finally decided to not deprecate.

Removed it from the list in #6581

@topper-123

Yeah, the original proposal was to deprecate, but we decided to keep them after all, as it we felt it was more convenient to allow a similar instantiation method as in __init__.

@jbrockmendel

@topper-123 thanks for clarifying. Does this mean that either a) an entry can be removed from #6581 or b) some FutureWarnings can be removed from CategoricalDtype/Categorical? If so, are you up for taking point on that?

@jorisvandenbossche

@jbrockmendel as I said above, I already removed it from #6581

There were no warnings introduced in this PR, so also nothing to remove (there are other categorical related ones though, but not related to this PR).