Issue 37101: Filterer.filter can be rewritten using built-ins just as efficient much more readable (original) (raw)

Created on 2019-05-30 15:49 by dwvisser, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_filter_perf.py dwvisser,2019-05-31 13:26 The script used for performance comparisons
pr13683.diff dwvisser,2019-05-31 18:03 GitHub Pull Request 13683, commit hash de38631
Pull Requests
URL Status Linked Edit
PR 13683 closed dwvisser,2019-05-30 16:43
Messages (9)
msg343983 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-30 15:49
Alternative version of Filterer.filter(...) would look like this, which takes advantage of efficient Python 3.x built-ins, and is immediately understandable: def _filter_callable(filter): return filter.filter if hasattr(filter, 'filter') else filter def filter(self, record): filters = map(_filter_callable, self.filters) return all(f(record) for f in filters) I will add a tested pull request on GitHub.
msg343985 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-05-30 16:01
This could simplify even further with the following: def filter(self, record): return all(getattr(f, 'filter', f)(record) for f in self.filters)
msg343986 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-30 17:22
Great! My PR now uses getattr as @zach.ware has suggested.
msg344045 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2019-05-31 09:20
I grant that your version is more compact, but I don't find it more readable in general (perhaps it is for very experienced Python developers/developers who like a functional style, but I persoanlly don't think it improves readability). Since the rationale proposed for making this change is "efficiency", I would like to see some real-world performance numbers to back this up. Can you please provide some quantitative data on the amount of speedup achievable in real-world scenarios?
msg344075 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-31 13:26
It is with great surprise and sadness that I report that I found that performance measurements don't support merging this change. I'm attaching my performance test script. I ran it with/without the logging filters for 1000000 iterations, e.g., ./python all 1000000 The results on my system were as follows, where the numbers may be interpreted as milliseconds per 1000 log messages: No Filters 3 Filters master 17.9 19.8 CBO-37101 19.6 23.1 I imagine I could put a guard 'if' statement that would restore the 'no filters' performance to match 'master', but my use of generators apparently causes the use of significantly more cycles than the existing code.
msg344076 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-31 13:45
Correction on that example of running the test script: ./python test_filter_perf.py all 1000000 I simply prepended this with the Linux `time` command for measuring.
msg344098 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-31 18:03
Adding the patch file associated with GitHub pull request #13683, as well as the performance test I just reported.
msg344130 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-31 21:42
FWIW, when I tried this instead, I got 21 to 22 msec per 1000 log messages (still a 5-10% performance hit): def filter(self, record): rv = True if self.filters: rv = all(getattr(f, 'filter', f)(record) for f in self.filters) return rv I don't see that getattr(...) should be less efficient than hasattr(...), so the generator expression is the likely culprit.
msg344171 - (view) Author: Dale Visser (dwvisser) * Date: 2019-06-01 13:08
I've learned a lot about the performance trade-offs of generator expressions. The only way of shortening this code (readability is subjective) that I've found not to negatively impact performance is this: def filter(self, record): rv = True for f in self.filters: if not getattr(f, 'filter', f)(record): rv = False break return rv
History
Date User Action Args
2022-04-11 14:59:16 admin set github: 81282
2019-06-01 13:08:51 dwvisser set messages: +
2019-05-31 21:42:22 dwvisser set messages: +
2019-05-31 18:03:01 dwvisser set files: + pr13683.diffmessages: +
2019-05-31 13:45:40 dwvisser set messages: +
2019-05-31 13:26:16 dwvisser set status: open -> closedfiles: + test_filter_perf.pymessages: + resolution: rejectedstage: patch review -> resolved
2019-05-31 09:20:07 vinay.sajip set messages: +
2019-05-30 17:22:11 dwvisser set messages: +
2019-05-30 16:43:49 dwvisser set keywords: + patchstage: patch reviewpull_requests: + <pull%5Frequest13570>
2019-05-30 16:01:21 zach.ware set nosy: + zach.waremessages: + versions: + Python 3.8, - Python 3.9
2019-05-30 15:51:03 SilentGhost set nosy: + vinay.sajip
2019-05-30 15:49:09 dwvisser create