BUG: pivot_table losing tz by jbrockmendel · Pull Request #32558 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation17 Commits9 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

closes #xxxx
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This gets rid of the last values_from_object usage (pending other PRs)

_tile_compat likely makes more sense as an Index method. I kept it here as a proof of concept because I think we actually need it in the EA interface too. Being able to broadcast a length=1 EA to a length=N EA will be necessary for some of the arithmetic perf going on.

What are your proposed semantics of an EA.tile? If it's restricted to a single dimension, can to be emulated using _concat_same_type?

What are your proposed semantics of an EA.tile? If it's restricted to a single dimension, can to be emulated using _concat_same_type?

The base class implementation would probably just use _concat_same_type. For ndarray-backed EAs I'd use broadcast_to which doesn't make a copy, but results in a non-writeable array (in that sense it would behave more like broadcast_to than np.tile)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should just have EA.tile; this is very messy otherwise.

can you see if we have any issues about this (pivot and tz), I think we might.


def _broadcast_tile(arr: np.ndarray, num: int) -> np.ndarray:
"""
Emulate np.tile but using views instead of copies.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an important optimization? Currently we also use np.tile ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For large inputs yes

The base class implementation would probably just use _concat_same_type.

We could also simplify this a lot by using _concat_same_type here (which basically repeats my inline question: is this optimization important?)

Regarding tile in the EA interface: numpy also has no tile method, so it's not that it would make it more compatible with numpy arrays (for that, we would need to implement a protocol to make np.tile work on EAs).
Personally, having the centralized "tile for EA function" here is fine for me.

…-values_from_object-reshape

We could also simplify this a lot by using _concat_same_type here (which basically repeats my inline question: is this optimization important?)

This is actually close to orthogonal to the broadcast_to optimization. When I disable the broadcat_to optimization and just use np.tile, this implementation is about 2x faster than one that uses _concat_same_type (based on arr = pd.date_range('2016-01-01', periods=10**5, freq='S') and num=5)

can you see if we have any issues about this (pivot and tz), I think we might.

searching on "pivot" didnt show anything tz-related

This is actually close to orthogonal to the broadcast_to optimization.

Why? It seems the reason you are not using the slower/simpler np.tile / _concat_same_type is to be able to use this optimization?

this implementation is about 2x faster than one that uses _concat_same_type

This is only for the tile? What does it give in the full pivot operation?

Why? It seems the reason you are not using the slower/simpler np.tile / _concat_same_type is to be able to use this optimization?

No, its also because the non-concat_same_type implementation is 2x faster, even without the broadcast_to optimization.

For arithmetic I'm soon going to need to broadcast/tile length=1 EAs to length=N, which is very wasteful without the broadcast_to; that is the forward-looking motivation for using the broadcast_to optimization.

This is only for the tile? What does it give in the full pivot operation?

No idea.

…-values_from_object-reshape

Updated to use a simpler implementation that does not do any optimizations.

looks fine, can you add a release note, ping on green.

…-values_from_object-reshape

booyah! there goes our last usage of values_from_object

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request

Mar 22, 2020