PERF: Release GIL on some datetime ops by chris-b1 · Pull Request #11263 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation11 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
This is a WIP, but far enough along I thought I'd share and see if the approach was reasonable.
This releases the GIL on most vectorized field accessors (e.g. dt.year
) and conversion to and from Period
. May be places it could be done - obviously would be nice for parsing, but I'm not sure that's possible.
ohh nice!
can u share some timings?
continue |
---|
out[i] = period_ordinal_to_dt64(periodarr[i], freq) |
with nogil: |
if periodarr[i] == NPY_NAT: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move nogil outside the loop
Here are some timings - getting a pretty nice speedup. In single-threaded case things are looking about flat.
In [1]: from pandas.util.testing import test_parallel In [2]: dti = pd.date_range('1900-1-1', periods=100000)
In [3]: def f(): ...: for i in range(4): ...: dti.year In [4]: @test_parallel(4) ...: def g(): ...: dti.year
In [8]: %timeit f() 10 loops, best of 3: 25.8 ms per loop
In [9]: %timeit g() 100 loops, best of 3: 7.71 ms per loop
pandas_datetime_to_datetimestruct(dtindex[i], PANDAS_FR_ns, &dts) |
---|
out[i] = monthrange(dts.year, dts.month)[1] |
pandas_datetime_to_datetimestruct(dtindex[i], PANDAS_FR_ns, &dts) |
out[i] = days_per_month_table[is_leapyear(dts.year)][dts.month-1] |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob makes sense to define this as a c-function and make it nogil (the days_per_month......)
@@ -3849,6 +3849,7 @@ def get_time_micros(ndarray[int64_t] dtindex): |
---|
@cython.wraparound(False) |
@cython.boundscheck(False) |
def get_date_field(ndarray[int64_t] dtindex, object field): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you declared field
as char[:]
instead would you be able to nogil
the whole thing until raise
?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chris-b1 loooks good. can you add a whatsnew note (perf) and squash.
chris-b1 changed the title
(WIP) PERF: Release GIL on some datetime ops PERF: Release GIL on some datetime ops
jreback added a commit that referenced this pull request
PERF: Release GIL on some datetime ops
@chris-b1 can you add these (clean then make again to see them)
warning: pandas/src/period.pyx:144:24: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:145:23: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:147:55: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:148:19: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:169:24: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:170:19: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:172:15: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:172:53: Use boundscheck(False) for faster access
building 'pandas._period' extension