BUG: Timestamp cannot parse nanosecond from string by sinhrks · Pull Request #7907 · pandas-dev/pandas
Conversation
Fixes 2 problems related to Timestamp string parsing:

1. Timestamp parsing gives an incorrect result if the input contains an offset string (closes #7833: BUG: Constructing Timestamp from local time + offset + time zone yields wrong offset).
NOTE: Also modified Timestamp.__repr__ to display fixed timezone info, because this can be either pytz or dateutil.
# Result after the fix
# If the string contains an offset, it is parsed using a fixed timezone offset. The following corresponds to 2013-11-01 05:00:00 in UTC (there are existing tests checking this behaviour).
repr(pd.Timestamp('2013-11-01 00:00:00-0500'))
# Timestamp('2013-11-01 00:00:00-0500', tz='pytz.FixedOffset(-300)')
# If tz is also specified, the result is converted to that timezone.
repr(pd.Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago'))
# Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')
repr(pd.Timestamp('2013-11-01 00:00:00-0500', tz='Asia/Tokyo'))
# Timestamp('2013-11-01 14:00:00+0900', tz='Asia/Tokyo')
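As a sanity check (a hedged illustration of the behaviour above, not code from the PR itself), all three constructions should denote the same instant, which can be verified by converting to UTC:
import pandas as pd

# All three tz-aware results above refer to the same instant (2013-11-01 05:00:00 UTC).
ts_fixed = pd.Timestamp('2013-11-01 00:00:00-0500')
ts_chicago = pd.Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')
ts_tokyo = pd.Timestamp('2013-11-01 00:00:00-0500', tz='Asia/Tokyo')
assert ts_fixed.tz_convert('UTC') == pd.Timestamp('2013-11-01 05:00:00', tz='UTC')
assert ts_fixed == ts_chicago == ts_tokyo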
2. Timestamp loses nanosecond info when parsing from str (closes #7878: Timestamp losing nanosecond accuracy when reading from string).
# Result after the fix
repr(pd.Timestamp('2001-01-01 00:00:00.000000005'))
# Timestamp('2001-01-01 00:00:00.000000005')
repr(pd.Timestamp('2001-01-01 00:00:00.000000005', tz='US/Eastern'))
# Timestamp('2001-01-01 00:00:00.000000005-0500', tz='US/Eastern')
repr(pd.Timestamp('2001-01-01 00:00:00.000001234+09:00'))
# Timestamp('2001-01-01 00:00:00.000001234+0900', tz='pytz.FixedOffset(540)')
repr(pd.Timestamp('2001-01-01 00:00:00.000001234+09:00', tz='Asia/Tokyo'))
# Timestamp('2001-01-01 00:00:00.000001234+0900', tz='Asia/Tokyo')
# Because non-ISO 8601 formats are parsed by dateutil, nanoseconds are lost (no change from before)
repr(pd.Timestamp('01-01-01 00:00:00.000000001'))
# Timestamp('2001-01-01 00:00:00')
repr(pd.Timestamp('01-01-01 00:00:00.000000001+9:00'))
# Timestamp('2001-01-01 00:00:00+0900', tz='tzoffset(None, 32400)')
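A small spot-check of the fixed behaviour (a sketch assuming the results shown above, not code from the PR itself): the nanosecond attribute keeps the sub-microsecond digits for ISO 8601 input, while dateutil-parsed input drops them.
import pandas as pd

# ISO 8601 input keeps sub-microsecond precision after the fix.
assert pd.Timestamp('2001-01-01 00:00:00.000000005').nanosecond == 5
assert pd.Timestamp('2001-01-01 00:00:00.000001234+09:00').nanosecond == 234
# Non-ISO input still goes through dateutil, so the nanoseconds are dropped.
assert pd.Timestamp('01-01-01 00:00:00.000000001').nanosecond == 0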
CC @cyber42 @ischwabacher @rockg @adamgreenhall
obj.value = pandas_datetimestruct_to_datetime(PANDAS_FR_ns, &obj.dts)
_check_dts_bounds(&obj.dts)
if out_tzoffset != -1:
    obj.tzinfo = pytz.FixedOffset(out_tzoffset)
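For context on the line above (a hedged note, not part of the diff): pytz.FixedOffset takes the offset in whole minutes east of UTC, so a parsed '-0500' string corresponds to FixedOffset(-300).
import datetime
import pytz

# -300 minutes east of UTC is the same as UTC-05:00.
tz = pytz.FixedOffset(-300)
assert tz.utcoffset(None) == datetime.timedelta(hours=-5)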
Which is preferable for fixed offset, pytz.FixedOffset or dateutil.tz.tzoffset?
Create a pytz one; dateutil must be explicitly specified. (In theory we could have an option for this, but will defer that to the future - you can create an issue if you would like; please copy @dbew on it.)
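For anyone comparing the two options, here is a minimal illustration (hypothetical, not code from this PR) showing that the two types describe the same fixed offset; the practical difference is the constructor's unit and the library providing it.
import datetime
import pytz
from dateutil.tz import tzoffset

pytz_tz = pytz.FixedOffset(540)         # pytz takes minutes east of UTC
dateutil_tz = tzoffset(None, 540 * 60)  # dateutil takes seconds east of UTC
assert pytz_tz.utcoffset(None) == dateutil_tz.utcoffset(None) == datetime.timedelta(hours=9)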
@sinhrks can you run a perf test and post if anything is amiss? (I don't think so, but just to be sure.)
As the original logic got a little slower, I modified it slightly. Below is a perf comparison before/after the modification. The results look varied, but no test is consistently worse.
Perf with original logic in this PR (for ref)
-------------------------------------------------------------------------------
Test name | head[ms] | base[ms] | ratio |
-------------------------------------------------------------------------------
read_parse_dates_iso8601 | 3.5694 | 2.3197 | 1.5387 |
1st run with current logic
-------------------------------------------------------------------------------
Test name | head[ms] | base[ms] | ratio |
-------------------------------------------------------------------------------
read_parse_dates_iso8601 | 2.3770 | 2.3616 | 1.0065 |
...
packers_read_pickle | 306.5277 | 254.1726 | 1.2060 |
frame_apply_lambda_mean | 9.4993 | 7.8301 | 1.2132 |
write_store_table_dc | 246.1410 | 197.7520 | 1.2447 |
frame_dtypes | 0.2213 | 0.1760 | 1.2579 |
reindex_daterange_pad | 2.0043 | 1.5914 | 1.2595 |
read_store_table_mixed | 21.8923 | 17.3550 | 1.2614 |
frame_mask_bools | 33.6537 | 26.1207 | 1.2884 |
frame_get_numeric_data | 0.2956 | 0.2290 | 1.2908 |
indexing_dataframe_boolean_rows_object | 1.3426 | 0.9793 | 1.3710 |
index_float64_div | 8.8797 | 6.3930 | 1.3890 |
frame_dropna_axis1_all | 426.2613 | 299.9337 | 1.4212 |
2nd run
-------------------------------------------------------------------------------
Test name | head[ms] | base[ms] | ratio |
-------------------------------------------------------------------------------
read_parse_dates_iso8601 | 2.3813 | 2.3716 | 1.0041 |
...
strings_match | 12.7503 | 10.5516 | 1.2084 |
timeseries_1min_5min_mean | 1.5577 | 1.2853 | 1.2119 |
reindex_daterange_backfill | 1.8867 | 1.5500 | 1.2172 |
frame_ctor_dtindex_CBMonthEndx1 | 7.7756 | 6.3493 | 1.2246 |
eval_frame_add_all_threads | 35.2084 | 28.6183 | 1.2303 |
frame_ctor_dtindex_CustomBusinessDayx1 | 3.8170 | 3.0283 | 1.2604 |
eval_frame_mult_one_thread | 37.0866 | 29.2060 | 1.2698 |
eval_frame_chained_cmp_python_one_thread | 135.0936 | 104.4240 | 1.2937 |
query_with_boolean_selection | 74.9604 | 57.4800 | 1.3041 |
dtindex_from_series_ctor | 0.0354 | 0.0270 | 1.3088 |
strings_center | 9.6043 | 7.3230 | 1.3115 |
stat_ops_frame_sum_int_axis_1 | 2.3720 | 1.7384 | 1.3645 |
strings_pad | 10.5731 | 7.6413 | 1.3837 |
groupby_multi_different_functions | 38.6817 | 25.0644 | 1.5433 |
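For a quick local spot-check of the parsing path (a rough micro-benchmark sketch, not the vbench suite that produced the tables above):
import timeit

# Time the ISO 8601 path vs. the dateutil fallback for Timestamp string parsing.
iso = timeit.timeit("pd.Timestamp('2013-11-01 00:00:00.000000005-0500')",
                    setup='import pandas as pd', number=100000)
non_iso = timeit.timeit("pd.Timestamp('01-01-01 00:00:00.000000001')",
                        setup='import pandas as pd', number=100000)
print('ISO 8601 parse: %.3f s' % iso)
print('dateutil parse: %.3f s' % non_iso)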
This is kind of difficult to go through. The results look okay to me... I can't comment on the specific changes, though.
@@ -399,7 +401,8 @@ Bug Fixes
- Bug in ``Series.str.cat`` with an index which was filtered as to not include the first item (:issue:`7857`)
- Bug in ``Timestamp`` cannot parse ``nanosecond`` from string (:issue:`7878`)
- Bug in ``Timestamp`` with string offset and ``tz`` results in incorrect (:issue:`7833`)
you mean to say just incorrect, right?
Yeah, should it be more specific, or is it grammatically wrong?
It was just a minor grammar point:
just say "incorrect" (not "in incorrect")
Thanks, fixed.
jreback added a commit that referenced this pull request
BUG: Timestamp cannot parse nanosecond from string