BUG: DatetimeIndex.insert doesnt preserve name and tz by sinhrks · Pull Request #7299 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation21 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

sinhrks

DatetimeIndex.insert doesn't preserve name and tz attributes. Also modified DatetimeIndex.asobject to return an object Index which has the same name as original to cover the case when the result is being object Index.

# normal Index preserves its name after insertion
idx = pd.Index([1, 2, 3], name='normal')
inserted = idx.insert(0, 1)
inserted.name
# normal

# But DatetimeIndex doesn't
dtidx = pd.DatetimeIndex([datetime.datetime(2011, 1, 3), datetime.datetime(2011, 1, 4)], freq='M', name='dtidx')
inserted = dtidx.insert(0, datetime.datetime(2011, 1, 5))
inserted.name
# None

@jreback

did you test if adding a new date to a time-series with a given frequency then changes to None if the inferred different? (e.g. adding a 2nd day of the month to a MS or something). note that I think this check should be done before passing to the constructor as its more efficient (no need to really reinfer the freq). - note that this is NOT true for deleting (and already merged that)

@sinhrks

@jreback Added a logic to check the item is on freq before inserting. Is it correct that inserting '2011-01-07' to Daily frequency index [2011-01-01, 2011-02, 2011-03'] is still Daily frequency? (Frequency is valid if all the values are on the offset, and unrelated to sorted / continuous).

And should set inferred_freq in case of the result's freq is None?

@jreback

I am pretty sure that insert will almost always reset the freq to None (I think it reinfers inside the constructor).

just want to tests (and you example is not True, freq is None)

In [4]: i = pd.date_range('20130101',periods=3).insert(3,Timestamp('20130107'))

In [5]: i
Out[5]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-01-07]
Length: 4, Freq: None, Timezone: None

In [6]: pd.infer_freq(i)

@immerrr

Sorry, for barging in, I could've sworn I've seen code like this elsewhere. I've finally remembered: it was the subclassing guide. in numpy docs.

As long as Index is a subclass of ndarray (btw, are there plans to decouple Index from ndarray?), it seems a better way to overload __array_wrap__ to transfer metadata to new objects, otherwise you're going to need similar workarounds for a lot of functions in numpy.

@jreback

hmm, @immerrr that is right....a better way (I thought that was already done....guess not)

yes, in theory plans to decouple Index from ndarray, but @jtratner not really actively working on this. Its not that hard actually. interested?

@sinhrks

OK. Based on inferred_freq, I understand that values are all on the offset, continuous and sorted to be the specific frequency.

Because timezone is corrupted if values are directly passed to DatetimeIndex.__init__, I use _simple_new and set inferred_freq separately (like current union does). I'll check __array_wrap__ if it makes better implementation.

didx = pd.DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30'])
didx.inferred_freq
# M

didx = pd.DatetimeIndex(['2000-01-31', '2000-03-31', '2000-02-29', '2000-04-30'])
didx.inferred_freq
# None

didx = pd.DatetimeIndex(['2000-01-31', '2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30'])
didx.inferred_freq
# None

didx = pd.DatetimeIndex(['2000-01-31', '2000-03-31', '2000-04-30'])
didx.inferred_freq
# None

#7302 should also be fixed because delete may result in intermittent values?
Fortunately I think my other previous PR looks not affected...

@immerrr

@jreback I'm now in a rather deadline-ish mode just skimming through mail in spare time. Once I'm done with that, I could take a stab at it.

@jreback

@sinhrks yep I think that is correct; you basically DON't want to pass in a freq when you insert and let it reinfer (what I was saying is a) test this b) you can prob do a quick check if the inserted matches the same freq and would be faster so you CAN pass in a new freq).

also, I think delete should work the same (so need to fix that too).

maybe a function _maybe_infer or soething which you pass the kwargs and the new array

@rosnfeld

Edit: moving my comment to #7302

@sinhrks

@jreback. Thanks. But I think checking whether the constructed DTI meets the freq in advance is little difficult in some cases. For example:

# If target is on the edge, easy to check
pd.DatetimeIndex(['2011-01-01, '2011-01-02', '2011-01-03', '2011-01-04']).insert(4, pd.Timestamp('2011-01-05'))

# But following is difficult, calling infer_freq after construction is easier.
pd.DatetimeIndex(['2011-01-01, '2011-01-03', '2011-01-04']).insert(1, pd.Timestamp('2011-01-02'))

I've modified to call infer_freq after index construction, but is it different from your idea?

@jreback

I think the constructor should infer it (I just don't remember exactly how/when this happens), and in fact you may not want to infer it (e.g. let it happen automatically if necessary). Which I think is the prior behavior. So maybe don't don't explicity infer at all (i was just thinking it would be faster/easier to do that).

@jreback

@sinhrks I was thinking that checking the edge cases only for both insert/delete makes sense (for delete its easy as it would be the same freq), insert if the new stamp obeys the current freq. otherwise just pass thru freq=None which doesn't automatically infer (i think), also see @rosnfeld comment #7302 (which I think is related to the automatic inferring)

@sinhrks

@jreback OK. I've modified, and hope to meet what you saying. I feel the check is very specific logic and not split to a separate function for now.

@jreback

@jreback

@rosnfeld can you pull in this PR and see if you are talking about in #7302 is changed / same

@sinhrks

Thanks. I understand @rosnfeld is pointing the difference caused by #7302 (Index.delete), not included in this PR (Index.insert). I've prepared the same fix for #7302 and issue separate PR.

@jreback

ok, that's fine (but its actually the same issue),

@rosnfeld

I don't see the code for this pull request changing anything about the delete behavior from what was already done in #7302, but #7320 breaks things, as noted on that PR. I can continue the conversation here or there, whichever you prefer.

@jreback

@sinhrks pls fix this up similarly to .delete and I think good 2 go

@sinhrks

@sinhrks

jreback added a commit that referenced this pull request

Jun 6, 2014

@jreback

BUG: DatetimeIndex.insert doesnt preserve name and tz

@jreback

this is fine for now...in the future could extend it; maybe add an issue for 0.15?

Labels

Bug Indexing

Related to indexing on series/frames, not to indexes themselves