ENH: Timedelta isoformat by TomAugspurger · Pull Request #15136 · pandas-dev/pandas (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

TomAugspurger

Split from #14904

A few questions before I finish all the docs / tests:

The spec seems very permissive about what's allowed, so we need to choose

We probably want a reader for this format as well, maybe in the Timedelta constructor. Not sure if I'll have time to get to that before 0.20 . I'd rather prioritize the JSON table schema PR.

@jreback

fyi the parsing issue is here: #11375

jreback

"""
Format Timedelta as ISO 8601 Duration like
`P[n]Y[n]M[n]DT[n]H[n]M[n]S`, where the `[n]`s are replaced by the
values for that duration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add the link to the 8601 format ?

jreback

`ISO8601 duration`_
.. _ISO601 duration: https://en.wikipedia.org/wiki/ISO\_8601

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this link in the doc string as well

@jreback

jreback

result = td.isoformat()
expected = 'P0DT0H0M0.000000123S'
self.assertEqual(result, expected)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for NaT (which i think should just be nan), similar to what we do for Timestamp

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. How about NaT to match Timestamp?

In [24]: pd.Timestamp('NaT').isoformat() Out[24]: 'NaT'

In [25]: pd.Timedelta('NaT').isoformat() Out[25]: 'NaT'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that's fine

@TomAugspurger

Thanks for the review. Added the NaT test and fixed up some doc issues. I also added a small example in timedeltas.rst, under the Attributes section. There wasn't really an appropriate place for it in api.rst that I saw.

@codecov-io

Current coverage is 86.23% (diff: 100%)

Merging #15136 into master will increase coverage by 0.68%

@@ master #15136 diff @@

Files 145 145
Lines 51352 54034 +2682
Methods 0 0
Messages 0 0
Branches 0 0

Powered by Codecov. Last update 362e78d...9fda713

@jreback

This was referenced

Jan 15, 2017

@jorisvandenbossche

Do we need to note that a Timedelta is not actually a 'duration' as defined by that standard? (as a timedelta specified absolute time difference and not a relative)

@TomAugspurger

@jorisvandenbossche does a Timedelta actual violate the definition? The way I'm reading it, a duration can mean either absolute or relative difference.
Something like P1M, for 1 month, is going to be relative and we can't represent that as a Timedelta. But the duration from '2016-01-01' to '2016-02-01' can be represented as a timedelta.

@pwalsh does that sound right to you?

@jorisvandenbossche

Yeah, it is not really clear from the wiki page. It depends on how the 'days' are interpreted, but from "But keep in mind that "PT36H" is not the same as "P1DT12H" when switching from or to Daylight saving time." I would conclude that the days are also relative and not absolute (not 1D == 24H), and I think for a timedelta this is absolute?

@TomAugspurger

@jorisvandenbossche rereading the wiki page again today. I missed this section earlier

They should only be used as part of a time interval as prescribed by the standard. Time intervals are discussed in the next section
[ ...]
A time interval is the intervening time between two time points.

How do you read that? I'm going back and forth on it. I think that means we're OK, as python uses time deltas to represent the time between two points, and the spec allows

Duration only, such as "P1Y2M10DT2H30M", with additional context information

As one valid representation. I am concerned that other libraries don't implement an isoformat on time deltas.

@jreback

Timedelta is de-facto a duration, so I think the above definition is fine.

@jorisvandenbossche

Timedelta is de-facto a duration,

Not in the sense that it is defined on the wiki (or at least the possible notations for a duration). For example, "1 month" is not representable by a Timedelta in general.
But as Tom points out, they "should only be used as part of a time interval" according to the wiki, so which means that the duration is always interpreted based on a certain context. So in such a case, then also "1 month" is exactly defined in absolute time. Although the json table schema does not seem to follow that restriction of only using it as part of a time interval.

But anyway, this is probably too detailed discussion for this use case as long as we only specify days and below.

@TomAugspurger

@jreback

@TomAugspurger my only remaining question is this in the JSON standard anywhere / common practice? (or is there anything for describing how to store durations / intervals)?

For datetimes we have ISO and as epochs, is there such a thing for durations? (if so maybe we should have a kw arg timedelta_unit?, though could be added later).

@TomAugspurger

Durations do not have a defined beginning and end date. They are contextless.

A duration is conceptually more similar to '2 hours' than to 'between 2 and 4 pm today'. As such, they are not a good solution to converting between units that depend on context.

They do convert to ISO8601

@jreback

ok @TomAugspurger then I think your .isoformat() is just fine, its a duration, and represents exactly what Timedelta is. (whether releative or absolute is context dependent, which is fine).

@jreback

@jreback

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request

Mar 21, 2017

@TomAugspurger @AnkurDedania

Author: Tom Augspurger tom.augspurger88@gmail.com

Closes pandas-dev#15136 from TomAugspurger/timedelta-isoformat and squashes the following commits:

9fda713 [Tom Augspurger] ENH: Timedelta isoformat