ENH: Timedelta isoformat by TomAugspurger · Pull Request #15136 · pandas-dev/pandas (original) (raw)
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Split from #14904
A few questions before I finish all the docs / tests:
The spec seems very permissive about what's allowed, so we need to choose
- Do we include components pandas can't represent like Years and Months with 0
(P0Y0M...)
(no) - Do we include components whose value is 0, e.g.
P0DT...
vs.PT...
(yes) - Do we 0 pad units, e.g.
...T5H
vsT05H
(no) - Do we strip trailing 0s in the seconds decimal:
...6.000000000S
vs.6S
(yes)
We probably want a reader for this format as well, maybe in the Timedelta constructor. Not sure if I'll have time to get to that before 0.20 . I'd rather prioritize the JSON table schema PR.
fyi the parsing issue is here: #11375
""" |
---|
Format Timedelta as ISO 8601 Duration like |
`P[n]Y[n]M[n]DT[n]H[n]M[n]S`, where the `[n]`s are replaced by the |
values for that duration. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add the link to the 8601 format ?
`ISO8601 duration`_ |
---|
.. _ISO601 duration: https://en.wikipedia.org/wiki/ISO\_8601 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this link in the doc string as well
result = td.isoformat() |
---|
expected = 'P0DT0H0M0.000000123S' |
self.assertEqual(result, expected) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test for NaT (which i think should just be nan), similar to what we do for Timestamp
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. How about NaT
to match Timestamp
?
In [24]: pd.Timestamp('NaT').isoformat() Out[24]: 'NaT'
In [25]: pd.Timedelta('NaT').isoformat() Out[25]: 'NaT'
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep that's fine
Thanks for the review. Added the NaT
test and fixed up some doc issues. I also added a small example in timedeltas.rst
, under the Attributes
section. There wasn't really an appropriate place for it in api.rst
that I saw.
Current coverage is 86.23% (diff: 100%)
@@ master #15136 diff @@
Files 145 145
Lines 51352 54034 +2682
Methods 0 0
Messages 0 0
Branches 0 0
- Hits 43932 46598 +2666
- Misses 7420 7436 +16
Partials 0 0
Powered by Codecov. Last update 362e78d...9fda713
This was referenced
Jan 15, 2017
Do we need to note that a Timedelta
is not actually a 'duration' as defined by that standard? (as a timedelta specified absolute time difference and not a relative)
@jorisvandenbossche does a Timedelta
actual violate the definition? The way I'm reading it, a duration can mean either absolute or relative difference.
Something like P1M
, for 1 month, is going to be relative and we can't represent that as a Timedelta
. But the duration from '2016-01-01' to '2016-02-01' can be represented as a timedelta.
@pwalsh does that sound right to you?
Yeah, it is not really clear from the wiki page. It depends on how the 'days' are interpreted, but from "But keep in mind that "PT36H" is not the same as "P1DT12H" when switching from or to Daylight saving time." I would conclude that the days are also relative and not absolute (not 1D == 24H), and I think for a timedelta this is absolute?
@jorisvandenbossche rereading the wiki page again today. I missed this section earlier
They should only be used as part of a time interval as prescribed by the standard. Time intervals are discussed in the next section
[ ...]
A time interval is the intervening time between two time points.
How do you read that? I'm going back and forth on it. I think that means we're OK, as python uses time deltas to represent the time between two points, and the spec allows
Duration only, such as "P1Y2M10DT2H30M", with additional context information
As one valid representation. I am concerned that other libraries don't implement an isoformat
on time deltas.
Timedelta
is de-facto a duration, so I think the above definition is fine.
Timedelta is de-facto a duration,
Not in the sense that it is defined on the wiki (or at least the possible notations for a duration). For example, "1 month" is not representable by a Timedelta in general.
But as Tom points out, they "should only be used as part of a time interval" according to the wiki, so which means that the duration is always interpreted based on a certain context. So in such a case, then also "1 month" is exactly defined in absolute time. Although the json table schema does not seem to follow that restriction of only using it as part of a time interval.
But anyway, this is probably too detailed discussion for this use case as long as we only specify days and below.
@TomAugspurger my only remaining question is this in the JSON standard anywhere / common practice? (or is there anything for describing how to store durations / intervals)?
For datetimes we have ISO and as epochs, is there such a thing for durations? (if so maybe we should have a kw arg timedelta_unit
?, though could be added later).
- This JS library converts absolute durations, like timedeltas, to iso8601 duration format.
- moment.js is quite clear that its durations are relative, not absolute
Durations do not have a defined beginning and end date. They are contextless.
A duration is conceptually more similar to '2 hours' than to 'between 2 and 4 pm today'. As such, they are not a good solution to converting between units that depend on context.
- this python package seems to read and write between iso8601 durations and time deltas.
ok @TomAugspurger then I think your .isoformat()
is just fine, its a duration, and represents exactly what Timedelta
is. (whether releative or absolute is context dependent, which is fine).
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request
Author: Tom Augspurger tom.augspurger88@gmail.com
Closes pandas-dev#15136 from TomAugspurger/timedelta-isoformat and squashes the following commits:
9fda713 [Tom Augspurger] ENH: Timedelta isoformat