msg343983
Author: Dale Visser (dwvisser)
Date: 2019-05-30 15:49
An alternative version of Filterer.filter(...) would look like this, which takes advantage of efficient Python 3.x built-ins and is immediately understandable:

    def _filter_callable(filter):
        return filter.filter if hasattr(filter, 'filter') else filter

    def filter(self, record):
        filters = map(_filter_callable, self.filters)
        return all(f(record) for f in filters)

I will add a tested pull request on GitHub.
|
|
msg343985
Author: Zachary Ware (zach.ware)
Date: 2019-05-30 16:01
This could simplify even further with the following:

    def filter(self, record):
        return all(getattr(f, 'filter', f)(record) for f in self.filters)
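[For context, the getattr fallback works because logging filters may be either objects exposing a filter() method or plain callables. A standalone sketch outside the logging module, with illustrative names (AcceptAll, reject_short, passes are made up for this example):

```python
class AcceptAll:
    def filter(self, record):
        # Filter-style object: exposes a .filter(record) method
        return True

def reject_short(record):
    # Bare callable registered directly as a filter
    return len(record) > 3

filters = [AcceptAll(), reject_short]

def passes(record):
    # getattr(f, 'filter', f) returns f.filter when present,
    # otherwise falls back to the callable f itself
    return all(getattr(f, 'filter', f)(record) for f in filters)

print(passes("hello"))  # True
print(passes("hi"))     # False
```
]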
|
|
msg343986
Author: Dale Visser (dwvisser)
Date: 2019-05-30 17:22
Great! My PR now uses getattr as @zach.ware has suggested. |
|
|
msg344045
Author: Vinay Sajip (vinay.sajip)
Date: 2019-05-31 09:20
I grant that your version is more compact, but I don't find it more readable in general (perhaps it is for very experienced Python developers, or developers who favour a functional style, but I personally don't think it improves readability). Since the rationale proposed for making this change is efficiency, I would like to see some real-world performance numbers to back this up. Can you please provide quantitative data on the amount of speedup achievable in real-world scenarios?
|
|
msg344075
Author: Dale Visser (dwvisser)
Date: 2019-05-31 13:26
It is with great surprise and sadness that I report that the performance measurements don't support merging this change. I'm attaching my performance test script. I ran it with and without the logging filters for 1000000 iterations, e.g.:

    ./python all 1000000

The results on my system were as follows, where the numbers may be interpreted as milliseconds per 1000 log messages:

                 No Filters    3 Filters
    master          17.9          19.8
    CBO-37101       19.6          23.1

I imagine I could add a guard 'if' statement that would restore the 'no filters' performance to match 'master', but my use of generators apparently costs significantly more cycles than the existing code.
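[The attached test script is not reproduced in this thread; as a rough illustration of this kind of measurement, a minimal harness along the following lines could be used. The logger names, filter counts, and iteration counts are made up, and absolute numbers will vary by machine:

```python
import io
import logging
import timeit

def build_logger(n_filters):
    # Fresh logger with a throwaway stream handler and n no-op filters
    logger = logging.getLogger(f'filter-perf-{n_filters}')
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.filters.clear()
    logger.addHandler(logging.StreamHandler(io.StringIO()))
    for _ in range(n_filters):
        logger.addFilter(lambda record: True)  # always-pass filter
    return logger

for n in (0, 3):
    logger = build_logger(n)
    seconds = timeit.timeit(lambda: logger.info('message'), number=10000)
    # Convert total seconds for 10000 messages to ms per 1000 messages
    print(f'{n} filters: {seconds * 100:.1f} ms / 1000 msgs')
```
]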
|
|
msg344076
Author: Dale Visser (dwvisser)
Date: 2019-05-31 13:45
Correction on that example of running the test script:

    ./python test_filter_perf.py all 1000000

I simply prepended this with the Linux `time` command for measuring.
|
|
msg344098
Author: Dale Visser (dwvisser)
Date: 2019-05-31 18:03
Adding the patch file associated with GitHub pull request #13683, as well as the performance test I just reported. |
|
|
msg344130
Author: Dale Visser (dwvisser)
Date: 2019-05-31 21:42
FWIW, when I tried this instead, I got 21 to 22 msec per 1000 log messages (still a 5-10% performance hit):

    def filter(self, record):
        rv = True
        if self.filters:
            rv = all(getattr(f, 'filter', f)(record) for f in self.filters)
        return rv

I don't see why getattr(...) should be less efficient than hasattr(...), so the generator expression is the likely culprit.
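[The suspected generator-expression overhead can be isolated from logging entirely. A rough micro-benchmark sketch; the helper names are made up, timings are machine-dependent, and no particular numbers are claimed:

```python
import timeit

data = [1, 2, 3]  # stands in for a short list of filters

def with_genexp():
    # Allocates and drives a generator object on every call
    return all(bool(x) for x in data)

def with_loop():
    # Plain loop: no generator allocation, same result
    for x in data:
        if not x:
            return False
    return True

t_gen = timeit.timeit(with_genexp, number=100000)
t_loop = timeit.timeit(with_loop, number=100000)
print(f'genexp: {t_gen:.3f}s  loop: {t_loop:.3f}s')
```
]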
|
|
msg344171
Author: Dale Visser (dwvisser)
Date: 2019-06-01 13:08
I've learned a lot about the performance trade-offs of generator expressions. The only way of shortening this code (readability is subjective) that I've found not to negatively impact performance is this:

    def filter(self, record):
        rv = True
        for f in self.filters:
            if not getattr(f, 'filter', f)(record):
                rv = False
                break
        return rv
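[As a standalone check that the loop version short-circuits the same way all() does, here is an illustrative sketch; filter_record and the two tracking filters below are made-up names, not logging-module code:

```python
def filter_record(filters, record):
    # Loop-based variant from above: stops at the first rejecting filter
    rv = True
    for f in filters:
        if not getattr(f, 'filter', f)(record):
            rv = False
            break
    return rv

calls = []

def reject(record):
    calls.append('reject')
    return False

def accept(record):
    calls.append('accept')
    return True

print(filter_record([reject, accept], 'msg'))  # False
print(calls)  # ['reject'] -- 'accept' was never invoked
```
]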
|
|