dispatch scalar DataFrame ops to Series by jbrockmendel · Pull Request #22163 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation27 Commits19 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

Many issues closed; will track them down and update. Will also need whatsnew.

closes #18874
closes #20088
closes #15697
closes #13128
closes #8554
closes #8932
closes #21610
closes #22005
closes #22047
closes #22242

This will be less verbose after #22068 implements ops.dispatch_to_series.

This still only dispatches a subset of ops. #22019 dispatches another (disjoint) subset. After that is another easy-ish case where alignment is known. Saved for last are cases with ambiguous alignment that is currently done in an ad-hoc best-guess way.

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 10, 2018 at 14:11 Hours UTC

Uses too many kludges, should wait for fixes in series and index comparisons. Closing.

Re-opening after scaling back an unreasonably ambitious py2/py3 compat goal. In particular consider:

df = pd.Series(['bar', 'bar'], name='foo').to_frame()
df < 0
df['foo'] < 0

In PY3 ATM this gives:

>>> df < 0
    foo
0  True
1  True
>>> df['foo'] < 0
[...]
TypeError: '<' not supported between instances of 'str' and 'int'

And in PY2:

>>> df < 0
     foo
0  False
1  False
>>> df['foo'] < 0
0    False
1    False
Name: foo, dtype: bool

Making the PY2/PY3 behavior identical is not feasible, but we can (and this PR does) ensure that the DataFrame/Series behavior matches. In PY2 this is unchanged, in PY3 the DataFrame comparison now correctly raises.

gfyoung added Refactor

Internal refactoring of code

Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff

labels

Updated OP with issues this closes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would like to have gh comments on the tests where appropriate. pls also do the whatsnew. its pretty important that we nail down and match the closed issues with the code.

@@ -4949,6 +4949,14 @@ def _combine_match_columns(self, other, func, level=None, try_cast=True):
return self._constructor(new_data)

def _combine_const(self, other, func, errors='raise', try_cast=True):
if lib.is_scalar(other) or np.ndim(other) == 0:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is is pretty annoything that we have to do this, I would make an explict function maybe is_any_scalar I think as we have these types of checks all over. pls make an issue for this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1327,6 +1327,10 @@ def wrapper(self, other, axis=None):

res_name = get_op_result_name(self, other)

if isinstance(other, list):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is is_list_like (maybe after some other comparisons) enough here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM the isinstance(other, list) check is done below the isinstance(other, (np.ndarray, pd.Index)) check. Wrapping lists earlier let us send lists through that same ndarray/Index block. Ideally the catchall else: block can be reduced to only-scalars, but we're not there yet.

@@ -1706,7 +1708,8 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):
if fill_value is not None:
self = self.fillna(fill_value)

return self._combine_const(other, na_op, try_cast=True)
pass_op = op if lib.is_scalar(other) else na_op

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are checking for a scalar here and above?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of annoying. If lib.is_scalar(other) then we will be dispatching to the Series op, in which case we want to pass the "raw" op (e.g. operator.add) and not the wrapped op na_op.

This PR handles only scalars since that is a relatively easy case. A few PRs down the road we'll have all these ops dispatch to series, at which point this won't be necessary.

@@ -273,6 +273,8 @@ def test_getitem_boolean(self):
# test df[df > 0]
for df in [self.tsframe, self.mixed_frame,
self.mixed_float, self.mixed_int]:
if compat.PY3 and df is self.mixed_frame:
continue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's strip out the mixed_frame to another function (even though that duplicates some code), bonus can parametrize this test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bonus can parametrize this test.

I don't think tsframe, mixed_frame, mixed_float, mixed_int are available in the namespace.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these need to be made fixtures. this becomes so much easier.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, am starting to implement this in the test_arithmetic sequence of PRs. Will update this test when that lands.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k thanks

tm.assert_frame_equal(actual, dfn)
actual = df1 - NA
tm.assert_frame_equal(actual, dfn)
with pytest.raises(TypeError):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this raising? this is a big change if you don't allow nan to act as NaT in ops

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the current behavior for Series and Index.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a subsection in the whatsnew then, marked as an api change.

expected = op(s, value).dtypes
assert_series_equal(result, expected)

invalid = {(operator.pow, '<M8[ns]'),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull this out and parametrize

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test already has two layers of parametrization; it isn't clear how to pull this out without making it more verbose+repetitive. Let me give this some thought and circle back.

This was referenced

Aug 8, 2018

Phew. Just added GH references to tests and a ton of Whats New.

needs a rebase and some comments.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@@ -273,6 +273,8 @@ def test_getitem_boolean(self):
# test df[df > 0]
for df in [self.tsframe, self.mixed_frame,
self.mixed_float, self.mixed_int]:
if compat.PY3 and df is self.mixed_frame:
continue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k thanks

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request

This was referenced

Aug 16, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request

This was referenced

Oct 23, 2018

Labels

Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff

Internal refactoring of code