DEPR: Deprecate generic timestamp dtypes by gfyoung · Pull Request #15987 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation53 Commits2 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

gfyoung

We only use the nanosecond frequency, and numpy doesn't even handle generic timestamp dtype well internally with its ndarray object.

xref #15524 (comment).

jreback

dtype = np.timedelta64
s = Series([], dtype=dtype)
self.assertTrue(s.empty)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use new pytest when you can, e.g.

assert s.empty
assert s.dtype == 'm8[ns]'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Done.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, not really a fan of having to rewrite every single test comparison just to be more "idiomatic" with the existing test standard.

Why not write our own in pandas.util.testing that we can implement according to idiom and call from everywhere else?

e.g.

def assert_true(val): assert val

def assert_equal(a, b): assert a == b

The idiom is thus abstracted away from our tests, and we don't have to worry about it when writing test cases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as I write this, I see that we do have one of these functions already:

assert_equal(a, b, msg="")

I think from now on, we should advocate that people use our static util.testing functions (and avoid using self.assert... or assert keyword).

def test_astype_empty_constructor_equality(self):
# see gh-15524
for dtype in np.typecodes['All']:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paramerize when you can on new tests (you may need to create a new class that inherits from object and/or write as bare functions)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unittest classes and pytest decorators do not mesh well together based on experience. I can't find anything so far in documentation on the pytest to do so. Unless you know of a straightforward way to do this, I'm leaving as is.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Barring some native pytest way to implement, a test decorator is the only way to do this (similar to what was done with capturing stdout and stderr). This is probably preferable since it would be independent of testing platform.

However, I would prefer to leave that as a follow-up so I can experiment separately.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those docs don't answer my question. Does not explain how to parameterize a test method within tm.TestCase. As I already mentioned, unittest and pytest don't mesh well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying that I don't disagree about parameterizing. However, I think it is preferable and easier to implement our own parameterizaton decorator (just as we did with capture), that is all.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am trying to shrink the codebase
implementing parameteize is not trivial and would need a whole test suite - it diesnt add any value

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but neither is incorporating pytest's builtin paramterize without adding new code (i.e. an entirely new class). The only way to avoid that is to write this is as a bare function (not a test method of tm.TestCase), but I prefer the test class organization if possible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just make another class inherit from object AS i show in the example

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There actually is no example where you create a class that inherits from object. However, that being said, having been able to sit down and re-examine the class suggestion, I do see what you mean now.

Several things can come from this discussion:

  1. Document how you can instantiate test classes that inherit from object (and not tm.TestCase). I still prefer writing test classes instead of raw functions for organizational reasons, though it will still follow pytest idiom.
  2. Make a new PR that modifies the changes in TST: Add test decorators for redirecting stdout and stderr #15952 to use pytest fixtures for redirecting stdout and stdin (that will take substantial work though).
  3. Modify the existing test class in this file to inherit only from object (new commit).
# numpy arrays don't handle generic timestamp dtypes well. Since
# we only use the nanosecond frequency, interpret generic timestamp
# dtypes as nanosecond frequency.
if dtype.name == "datetime64":

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is way too hacky to put this here, this should be trapped in maybe_cast_to_datetime, or if not, then let's create a generic way (via function there to do this).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree, except that when we call .astype, maybe_cast_to_datetime does not get called. But certainly, there are other places that casting can go.

@codecov

@codecov

@gfyoung

@jreback : All comments have been addressed, and everything is green.

@jorisvandenbossche

In the issue, the discussion was about construction of empty series objects, while here you also added constructing/astyping actual data with non-parameterized np.datetime64. As I said in the issue (#15524 (comment), so my comment was relevant after all :-)), I don't see a strong reason to allow this.

@jorisvandenbossche

@gfyoung I see now you replied in the issue:

Those examples you bring up above, should we follow in numpy 's footsteps? We only use one frequency, so I don't see why those two examples wouldn't work.

There is maybe not a strong reason that they shouldn't work, but I also don't see a good reason that they should work. It just expands what users can do in an unnecessary way. Why is it needed that people can do .astype(np.datetime) while they already can do .astype('datetime64[ns]') which is just more explicit (and does not 'misuse' the numpy dtype by interpreting it differently as numpy intended).

@gfyoung

@jorisvandenbossche : Well, it has to go one way or the other. Either you can astype and initialize with the dtype (empty or non-empty), or you can't do either. The consensus with @jreback was that you should be able to do both. The point of this PR was to make behavior consistent.

I don't have any major issues going the other way and preventing anyone from casting to the generic dtype (i.e. np.datetime64 or np.timedelta64) via astype or the constructor (though personally, having this convenience is nice because I sometimes forget how to format / exactly specify the frequency). However, some agreement on how to proceed with this would be good.

@jreback

I am ok with this (still have to review the change again) as we already support this.

In [4]: Series([]).astype(np.datetime64)
Out[4]: Series([], dtype: datetime64[ns])

this should be consistent with direct construction (via dtype).

@jreback

On second though, we could deprecate accepting direct np.datetime64 as a passed dtype (this doesn't affect accept an array already in this format, just the dtype= kw and .astype())

So the principal of construction / conversion should work with the same type will be preserved.

I think I do like being more explicit here (IOW doing the deprecation). @gfyoung can you see what this breaks?

IIRC this was originally an issue to fix something in csv parsing. But this was an orthogonal change (I don't even really understand how this change was related, maybe you can show that?)

@jorisvandenbossche

(this doesn't affect accept an array already in this format, just the dtype= kw and .astype())

And for dtype= it already fails:

In [5]: Series([], dtype=np.datetime64)
...
TypeError: cannot convert datetimelike to dtype [datetime64]

so it is only astype that would change.

If possible, I would be in favor of not allowing bare np.datetime64 (and thus deprecating the astype case)

@jreback

@jorisvandenbossche yep its the .astype that works now, not the direct construction. So let's investigate deprecating directly specifying np.datetime64, np.timedelta64 in .astype and see what happens.

@gfyoung

@jreback , @jorisvandenbossche : That sounds like a good plan. Shouldn't be too bad I imagine.

@jorisvandenbossche BTW, the reason for that weird error message that you saw in the issue here is because at some point, we call arr.view(dtype), which when dtype=np.datetime64 causes numpy to get angry (internal representation breaks), and it can't display a proper error message.

@gfyoung gfyoung changed the titleENH: Handle generic timestamp dtypes with Series DEPR: Deprecate generic timestamp dtypes

Apr 13, 2017

@gfyoung

Deprecation worked out reasonably well, but I can't really figure out why AppVeyor and Circle are failing my tests. The Travis logs are clean (and passing), so I don't think that there's any hidden warning being issued that's causing AppVeyor and Circle to miss them.

Suggestions on how to fix this?

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't directly see what could be going on with circle ci and appveyor.

"deprecated and will raise in a future version. "
"Please pass in '{dtype}[ns]' instead.")
warnings.warn(msg.format(dtype=dtype.name),
FutureWarning, stacklevel=2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this stacklevel should be 8

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant 5? (same as the one beneath). Done.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think this should be 8 (at least, that's what I counted). But you didn't remove the stacklevel=False, so this does not fail the tests currently

if dtype.name in ('datetime64', 'datetime64[ns]'):
if dtype.name == 'datetime64':
warnings.warn(msg.format(dtype=dtype.name),
FutureWarning, stacklevel=2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this one 5

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if dtype.name in ('timedelta64', 'timedelta64[ns]'):
if dtype.name == 'timedelta64':
warnings.warn(msg.format(dtype=dtype.name),
FutureWarning, stacklevel=2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this one also 5

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

data = [1]
with tm.assert_produces_warning(FutureWarning,
check_stacklevel=False):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try to remove the check_stacklevel here? (btw, easiest to try out what it should be is the set warnings to raise an error, and then you can count the number of stacks in the traceback)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's fair. Done.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like you removed this?

warning_type = FutureWarning
with tm.assert_produces_warning(warning_type,
check_stacklevel=False):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it may help, I don't think it is necessarily needed to assert the warning here (as you already have an explicit test for that above), but you can also just catch warnings so they do not appear in the log (with warnings.catch_warnings(record=True):)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.

@jreback

lgtm. I see you 'fixed' the passing np.datetime64 to the constructor (and then deprecated it), but that's fine.

ping on green.

@gfyoung

Sounds good...once I can figure out these warnings 😁

@gfyoung

We only use the nanosecond frequency, and numpy doesn't even handle generic timestamp dtypes well.

xref pandas-devgh-15524 (comment).

@gfyoung

@gfyoung

jreback

@jreback

@gfyoung gfyoung deleted the timestamp64-no-freq branch

April 14, 2017 13:50

gfyoung added a commit to forking-repos/pandas that referenced this pull request

Oct 28, 2018

@gfyoung

Previously deprecated for Series constructor and the .astype method. Now being enforced.

xref pandas-devgh-15987.

gfyoung added a commit to forking-repos/pandas that referenced this pull request

Oct 28, 2018

@gfyoung

Previously deprecated for Series constructor and the .astype method. Now being enforced.

xref pandas-devgh-15987.

gfyoung added a commit to forking-repos/pandas that referenced this pull request

Oct 28, 2018

@gfyoung

Previously deprecated for Series constructor and the .astype method. Now being enforced.

xref pandas-devgh-15987.

jreback pushed a commit that referenced this pull request

Oct 28, 2018

@gfyoung @jreback

Previously deprecated for Series constructor and the .astype method. Now being enforced.

xref gh-15987.

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request

Nov 19, 2018

@gfyoung @tm9k1

Previously deprecated for Series constructor and the .astype method. Now being enforced.

xref pandas-devgh-15987.

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request

Feb 28, 2019

@gfyoung @Pingviinituutti

Previously deprecated for Series constructor and the .astype method. Now being enforced.

xref pandas-devgh-15987.

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request

Feb 28, 2019

@gfyoung @Pingviinituutti

Previously deprecated for Series constructor and the .astype method. Now being enforced.

xref pandas-devgh-15987.