DEPR: Deprecate generic timestamp dtypes by gfyoung · Pull Request #15987 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation53 Commits2 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
We only use the nanosecond frequency, and numpy
doesn't even handle generic timestamp dtype
well internally with its ndarray
object.
xref #15524 (comment).
dtype = np.timedelta64 |
---|
s = Series([], dtype=dtype) |
self.assertTrue(s.empty) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use new pytest when you can, e.g.
assert s.empty
assert s.dtype == 'm8[ns]'
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Done.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side note, not really a fan of having to rewrite every single test comparison just to be more "idiomatic" with the existing test standard.
Why not write our own in pandas.util.testing
that we can implement according to idiom and call from everywhere else?
e.g.
def assert_true(val): assert val
def assert_equal(a, b): assert a == b
The idiom is thus abstracted away from our tests, and we don't have to worry about it when writing test cases.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, as I write this, I see that we do have one of these functions already:
assert_equal(a, b, msg="")
I think from now on, we should advocate that people use our static util.testing
functions (and avoid using self.assert...
or assert
keyword).
def test_astype_empty_constructor_equality(self): |
---|
# see gh-15524 |
for dtype in np.typecodes['All']: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paramerize when you can on new tests (you may need to create a new class that inherits from object and/or write as bare functions)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unittest
classes and pytest
decorators do not mesh well together based on experience. I can't find anything so far in documentation on the pytest
to do so. Unless you know of a straightforward way to do this, I'm leaving as is.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Barring some native pytest
way to implement, a test decorator is the only way to do this (similar to what was done with capturing stdout
and stderr
). This is probably preferable since it would be independent of testing platform.
However, I would prefer to leave that as a follow-up so I can experiment separately.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those docs don't answer my question. Does not explain how to parameterize a test method within tm.TestCase
. As I already mentioned, unittest
and pytest
don't mesh well.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying that I don't disagree about parameterizing. However, I think it is preferable and easier to implement our own parameterizaton decorator (just as we did with capture), that is all.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am trying to shrink the codebase
implementing parameteize is not trivial and would need a whole test suite - it diesnt add any value
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but neither is incorporating pytest's builtin paramterize without adding new code (i.e. an entirely new class). The only way to avoid that is to write this is as a bare function (not a test method of tm.TestCase), but I prefer the test class organization if possible.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just make another class inherit from object AS i show in the example
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There actually is no example where you create a class that inherits from object
. However, that being said, having been able to sit down and re-examine the class
suggestion, I do see what you mean now.
Several things can come from this discussion:
- Document how you can instantiate test classes that inherit from
object
(and nottm.TestCase
). I still prefer writing test classes instead of raw functions for organizational reasons, though it will still followpytest
idiom. - Make a new PR that modifies the changes in TST: Add test decorators for redirecting stdout and stderr #15952 to use
pytest
fixtures for redirectingstdout
andstdin
(that will take substantial work though). - Modify the existing test class in this file to inherit only from
object
(new commit).
# numpy arrays don't handle generic timestamp dtypes well. Since |
---|
# we only use the nanosecond frequency, interpret generic timestamp |
# dtypes as nanosecond frequency. |
if dtype.name == "datetime64": |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is way too hacky to put this here, this should be trapped in maybe_cast_to_datetime
, or if not, then let's create a generic way (via function there to do this).
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree, except that when we call .astype
, maybe_cast_to_datetime
does not get called. But certainly, there are other places that casting can go.
@jreback : All comments have been addressed, and everything is green.
In the issue, the discussion was about construction of empty series objects, while here you also added constructing/astyping actual data with non-parameterized np.datetime64. As I said in the issue (#15524 (comment), so my comment was relevant after all :-)), I don't see a strong reason to allow this.
@gfyoung I see now you replied in the issue:
Those examples you bring up above, should we follow in numpy 's footsteps? We only use one frequency, so I don't see why those two examples wouldn't work.
There is maybe not a strong reason that they shouldn't work, but I also don't see a good reason that they should work. It just expands what users can do in an unnecessary way. Why is it needed that people can do .astype(np.datetime)
while they already can do .astype('datetime64[ns]')
which is just more explicit (and does not 'misuse' the numpy dtype by interpreting it differently as numpy intended).
@jorisvandenbossche : Well, it has to go one way or the other. Either you can astype
and initialize with the dtype
(empty or non-empty), or you can't do either. The consensus with @jreback was that you should be able to do both. The point of this PR was to make behavior consistent.
I don't have any major issues going the other way and preventing anyone from casting to the generic dtype
(i.e. np.datetime64
or np.timedelta64
) via astype
or the constructor (though personally, having this convenience is nice because I sometimes forget how to format / exactly specify the frequency). However, some agreement on how to proceed with this would be good.
I am ok with this (still have to review the change again) as we already support this.
In [4]: Series([]).astype(np.datetime64)
Out[4]: Series([], dtype: datetime64[ns])
this should be consistent with direct construction (via dtype
).
On second though, we could deprecate accepting direct np.datetime64
as a passed dtype (this doesn't affect accept an array already in this format, just the dtype=
kw and .astype()
)
So the principal of construction / conversion should work with the same type will be preserved.
I think I do like being more explicit here (IOW doing the deprecation). @gfyoung can you see what this breaks?
IIRC this was originally an issue to fix something in csv parsing. But this was an orthogonal change (I don't even really understand how this change was related, maybe you can show that?)
(this doesn't affect accept an array already in this format, just the dtype= kw and .astype())
And for dtype=
it already fails:
In [5]: Series([], dtype=np.datetime64)
...
TypeError: cannot convert datetimelike to dtype [datetime64]
so it is only astype
that would change.
If possible, I would be in favor of not allowing bare np.datetime64
(and thus deprecating the astype case)
@jorisvandenbossche yep its the .astype
that works now, not the direct construction. So let's investigate deprecating directly specifying np.datetime64
, np.timedelta64
in .astype
and see what happens.
@jreback , @jorisvandenbossche : That sounds like a good plan. Shouldn't be too bad I imagine.
@jorisvandenbossche BTW, the reason for that weird error message that you saw in the issue here is because at some point, we call arr.view(dtype)
, which when dtype=np.datetime64
causes numpy
to get angry (internal representation breaks), and it can't display a proper error message.
gfyoung changed the title
ENH: Handle generic timestamp dtypes with Series DEPR: Deprecate generic timestamp dtypes
Deprecation worked out reasonably well, but I can't really figure out why AppVeyor and Circle are failing my tests. The Travis logs are clean (and passing), so I don't think that there's any hidden warning being issued that's causing AppVeyor and Circle to miss them.
Suggestions on how to fix this?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't directly see what could be going on with circle ci and appveyor.
"deprecated and will raise in a future version. " |
---|
"Please pass in '{dtype}[ns]' instead.") |
warnings.warn(msg.format(dtype=dtype.name), |
FutureWarning, stacklevel=2) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this stacklevel should be 8
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant 5? (same as the one beneath). Done.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think this should be 8 (at least, that's what I counted). But you didn't remove the stacklevel=False
, so this does not fail the tests currently
if dtype.name in ('datetime64', 'datetime64[ns]'): |
---|
if dtype.name == 'datetime64': |
warnings.warn(msg.format(dtype=dtype.name), |
FutureWarning, stacklevel=2) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this one 5
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if dtype.name in ('timedelta64', 'timedelta64[ns]'): |
---|
if dtype.name == 'timedelta64': |
warnings.warn(msg.format(dtype=dtype.name), |
FutureWarning, stacklevel=2) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this one also 5
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
data = [1] |
---|
with tm.assert_produces_warning(FutureWarning, |
check_stacklevel=False): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try to remove the check_stacklevel here? (btw, easiest to try out what it should be is the set warnings to raise an error, and then you can count the number of stacks in the traceback)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's fair. Done.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like you removed this?
warning_type = FutureWarning |
---|
with tm.assert_produces_warning(warning_type, |
check_stacklevel=False): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it may help, I don't think it is necessarily needed to assert the warning here (as you already have an explicit test for that above), but you can also just catch warnings so they do not appear in the log (with warnings.catch_warnings(record=True):
)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done.
lgtm. I see you 'fixed' the passing np.datetime64
to the constructor (and then deprecated it), but that's fine.
ping on green.
Sounds good...once I can figure out these warnings 😁
We only use the nanosecond frequency, and numpy doesn't even handle generic timestamp dtypes well.
xref pandas-devgh-15524 (comment).
gfyoung deleted the timestamp64-no-freq branch
gfyoung added a commit to forking-repos/pandas that referenced this pull request
Previously deprecated for Series constructor
and the .astype
method. Now being enforced.
xref pandas-devgh-15987.
gfyoung added a commit to forking-repos/pandas that referenced this pull request
Previously deprecated for Series constructor
and the .astype
method. Now being enforced.
xref pandas-devgh-15987.
gfyoung added a commit to forking-repos/pandas that referenced this pull request
Previously deprecated for Series constructor
and the .astype
method. Now being enforced.
xref pandas-devgh-15987.
jreback pushed a commit that referenced this pull request
Previously deprecated for Series constructor
and the .astype
method. Now being enforced.
xref gh-15987.
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request
Previously deprecated for Series constructor
and the .astype
method. Now being enforced.
xref pandas-devgh-15987.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request
Previously deprecated for Series constructor
and the .astype
method. Now being enforced.
xref pandas-devgh-15987.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request
Previously deprecated for Series constructor
and the .astype
method. Now being enforced.
xref pandas-devgh-15987.