API: dtlike.astype(inty) by jbrockmendel · Pull Request #49715 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation17 Commits6 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

jbrockmendel

@jbrockmendel

jreback

@@ -329,6 +329,7 @@ Other API changes
- Default value of ``dtype`` in :func:`get_dummies` is changed to ``bool`` from ``uint8`` (:issue:`45848`)
- :meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (:issue:`48928`)
- :meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (:issue:`48963`)
- :meth:`DatetimeIndex.astype`, :meth:`TimedeltaIndex.astype`, :meth:`PeriodIndex.astype` :meth:`Series.astype`, :meth:`DataFrame.astype` with ``datetime64``, ``timedelta64`` or :class:`PeriodDtype` dtypes no longer allow converting to integer dtypes other than "int64", do ``obj.astype('int64', copy=False).astype(dtype)`` instead (:issue:`??`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this right? eg you can use copy=False to actually do this? how so

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing copy=False here just avoids doing a double-copy, since the second .astype will always copy

mroeschke

mroeschke

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why an exception is only made for int64? (Assuming TimedeltaIndex.astype(int) is the nanos representation), TimedeltaIndex(["1ns"]) should be able to fit in a smaller int type

@jbrockmendel @mroeschke

Co-authored-by: Matthew Roeschke 10647082+mroeschke@users.noreply.github.com

@mroeschke

Also a suggestion in a follow up PR: Could you collect all the "dtype conversion" whatsnew notes in the Other API section and give them their own dedicated section like under Other API -> Dtype Consistency/Strictness? I think it would be good to explain to users that pandas is trying to move toward dtype conversion operations that are less lossy and more explicit.

@jbrockmendel

Curious why an exception is only made for int64? (Assuming TimedeltaIndex.astype(int) is the nanos representation), TimedeltaIndex(["1ns"]) should be able to fit in a smaller int type

My main concern is achieving consistency between Index/Series behavior. ATM Series raises for anything other than i8/u8 (see removed code in core.dtypes.astype). So the question is which way to go to achieve consistency. I prefer to go towards more-strict rather than less.

.astype in general means "the same information but in a different format" which does not hold for dtlike->inty. These are semantic gibberish 100% dependent on implementation details. Continuing to allow int64 is a compromise, though in a perfect world I'd tell users to just .view("i8") instead.

xref #45034, #38544, and parts of #22384.

@jbrockmendel

@jbrockmendel

…andas into api-astype-dtlike-to-ints

@jbrockmendel

@jbrockmendel

@mroeschke

If I understand #45034 correctly, it's generally concluded that it's okay to disallow casting ints except for int64? cc @jorisvandenbossche

@jorisvandenbossche

If I understand #45034 correctly, it's generally concluded that it's okay to disallow casting ints except for int64?

That is not my reading of that issue .. I mentioned several times there that I personally don't see why such restriction is needed (while I also said that I don't think this is very important, though)

It's also doing something different than what we actually said would change in the deprecation warnings ("In a future version, this astype will return exactly the specified dtype instead of int64, and will raise if that conversion overflows.")

Continuing to allow int64 is a compromise, though in a perfect world I'd tell users to just .view("i8") instead.

In a perfect world, there would be no view() method on pandas objects (if you ask me ;))

@jbrockmendel

It's also doing something different than what we actually said would change in the deprecation warnings

yah, i reconsidered and changed my mind about that deprecation bc it is value-dependent behavior and in most cases just a very weird thing to .astype to and likely to be unintentional.

@mroeschke

Personally I don't have a strong opinion on which direction to go as this astyping is strange. I could maybe interpret a user trying to do this to try to get an epoch timestamp/total seconds(?)/not sure what for Period.

Agreed that changing this to achieve consistency between Series/Index is important though. I suppose I also like the stricter Series behavior to just signal that this astyping is ill defined

@jbrockmendel

im now thinking the most likely case where a user would pass something other than int64 is if they pass int which on some builds gets interpreted as int32. In this case they almost certainly wanted int64, and giving them int32 is going to throw them off (and make an unwanted copy).

we should disallow anything other than int64 (which is still more than pyarrow allows)

@jreback

I am with @jbrockmendel here, I think we should only allow int64, this is super problematic with overflow and especially on windows.

@phofl

I think int64 makes the most sense, int32 overflows real fast. I had similar issues to what @jbrockmendel mentioned while building a windows service in the past, this was super annoying. Only allowing int64 increases usability imo

This was referenced

Jan 5, 2023

@jbrockmendel

@MarcoGorelli

@MarcoGorelli thoughts here?

No objections (no strong opinion either though, this isn't a topic I've delved into yet)

mroeschke

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine as is. Although it's slightly different from the original deprecation message the exception at least gives an alternative to convert to another int type

@mroeschke