Pattern matching optimization. Comparison of values specified through "|" (original) (raw)
November 6, 2022, 3:20pm 1
Without further ado, let’s take a look at an example:
import dis, timeit
a = 'lctrl'
def match_case_ors():
match a:
case 'ctrl' | 'lctrl' | 'rctrl': ...
case _: ...
def if_contains():
if a in {'ctrl', 'lctrl', 'rctrl'}: ...
else: ...
dis.dis(match_case_ors)
print('-' * 84)
dis.dis(if_contains)
match_case = timeit.timeit(match_case_ors)
if_stmt = timeit.timeit(if_contains)
print('\n'
f'{match_case = :g}\n'
f'{if_stmt = :g}\n'
f'speedup {1. - if_stmt / match_case:.1%}'
)
We can see that if is ~15-30% (and higher) faster. And the more elements, the faster if works; while the pattern matching gets slower and slower.
The solution is simple. Instead of
9 14 COPY 1
16 LOAD_CONST 1 ('ctrl')
18 COMPARE_OP 2 (==)
24 POP_JUMP_IF_FALSE 3 (to 32)
26 POP_TOP
10 28 LOAD_CONST 0 (None)
30 RETURN_VALUE
9 >> 32 COPY 1
34 LOAD_CONST 2 ('lctrl')
36 COMPARE_OP 2 (==)
42 POP_JUMP_IF_FALSE 3 (to 50)
44 POP_TOP
10 46 LOAD_CONST 0 (None)
48 RETURN_VALUE
9 >> 50 COPY 1
52 LOAD_CONST 3 ('rctrl')
54 COMPARE_OP 2 (==)
60 POP_JUMP_IF_FALSE 3 (to 68)
62 POP_TOP
10 64 LOAD_CONST 0 (None)
66 RETURN_VALUE
Generate bytecode like:
16 2 LOAD_GLOBAL 0 (a)
14 LOAD_CONST 1 (frozenset({'ctrl', 'lctrl', 'rctrl'}))
16 CONTAINS_OP 0
18 POP_JUMP_IF_FALSE 2 (to 24)
17 20 LOAD_CONST 0 (None)
22 RETURN_VALUE
In such cases, it is the use of a set or a frozen set that will give such an increase, otherwise it is not significant.
And since the behavior of the set in this case is predictable and does exactly what we need, it seems to me a good idea to implicitly create a set in such pattern matching.
If someone does not like this option, there is a less optimized, but quite obvious solution.
Instead of POP_JUMP_IF_FALSE, use the opposite POP_JUMP_IF_TRUE, just like lazy or does.
16 2 LOAD_GLOBAL 0 (a)
14 LOAD_CONST 1 ('ctrl')
16 COMPARE_OP 2 (==)
22 POP_JUMP_IF_TRUE 22 (to 68)
24 LOAD_GLOBAL 0 (a)
36 LOAD_CONST 2 ('lctrl')
38 COMPARE_OP 2 (==)
44 POP_JUMP_IF_TRUE 11 (to 68)
46 LOAD_GLOBAL 0 (a)
58 LOAD_CONST 3 ('rctrl')
60 COMPARE_OP 2 (==)
66 POP_JUMP_IF_FALSE 2 (to 72)
18 >> 68 LOAD_CONST 0 (None)
70 RETURN_VALUE
Otherwise, using if in such cases is more performant, but not as elegant as pattern matching.
This is my first thread, so I may have misunderstood something. Also, my native language is Russian, not English.
Thanks.
Melendowski (Matt D) November 6, 2022, 4:05pm 2
I haven’t got a ton of experience with pattern matching but from what I’ve seen people do with it, and then note that it’s slower, that it’s being used incorrectly… Or at the very least compared incorrectly.
I’m on mobile and cannot check but what’s the generated byte code for this
def if_any_equals():
if any(a == x for x in ('ctrl', 'lctrl', 'rctrl')):
...
else:
...
I think that’s a more valid comparison as you alluded to with your notes about using containment checks in a set
Is there a way to check, without a lot of overhead, that all elements of |
are hashable to simply place them in a set
and do a containment check for the match-case
?
15 0 RESUME 0
16 2 LOAD_GLOBAL 1 (NULL + any)
14 LOAD_CONST 1 (<code object <genexpr> at 0x0000016F5FD45890, file "D:\Documents\Projects\PycharmProjects\newvertest\main.py", line 16>)
16 MAKE_FUNCTION 0
18 LOAD_CONST 2 (('ctrl', 'lctrl', 'rctrl'))
20 GET_ITER
22 CALL 0
32 CALL 1
42 POP_JUMP_IF_FALSE 2 (to 48)
17 44 LOAD_CONST 0 (None)
46 RETURN_VALUE
19 >> 48 LOAD_CONST 0 (None)
50 RETURN_VALUE
Disassembly of <code object <genexpr> at 0x0000016F5FD45890, file "D:\Documents\Projects\PycharmProjects\newvertest\main.py", line 16>:
16 0 RETURN_GENERATOR
2 POP_TOP
4 RESUME 0
6 LOAD_FAST 0 (.0)
>> 8 FOR_ITER 15 (to 42)
12 STORE_FAST 1 (x)
14 LOAD_GLOBAL 0 (a)
26 LOAD_FAST 1 (x)
28 COMPARE_OP 2 (==)
34 YIELD_VALUE 2
36 RESUME 1
38 POP_TOP
40 JUMP_BACKWARD 17 (to 8)
>> 42 LOAD_CONST 0 (None)
44 RETURN_VALUE
match_case = 0.0502488
if_stmt = 0.378934
speedup -654.1%
As you can see, this implementation slowed down the code by more than 6.5 times, it creates an additional generator object and a call to the built-in function, so this is definitely not what suits us, any is used in other cases (when you have for example a large array of values to be checked, this is definitely not the case).
By the way, you can. Using PyDroid or Termux (mini linux)
dis module for bytecode.
Melendowski (Matt D) November 6, 2022, 4:48pm 5
Ok so the generator adds significant overhead. Then the closest if statement equivalent might be
def if_else_eq_or():
if a == 'ctrl' or a == 'lctrl' or a == 'rctrl':
...
else:
...
Which has a lot less flexible than any()
but it closer to the match case
brandtbucher (Brandt Bucher) November 6, 2022, 5:57pm 6
Hi @nikdissv-Forever! Thanks taking the time to look into improving match
performance.
This is definitely an interesting idea, and we have considered it. Unfortunately, it’s not equivalent to the current behavior: we currently don’t require hashability for matching against value patterns. Checking for containment in a set on the other hand, will raise a TypeError
if the subject is unhashable, and also causes problems for subjects with custom __eq__
implementations. This is the same reason why we don’t turn if a in ('ctrl', 'lctrl', 'rctrl'): ...
into a frozenset
either.
It’s certainly possible to consider adding a “fast path” for subjects that are instances of “native” Python types like int
or str
. However, that adds more conditional branches, complexity, and code bloat, and might not result in a significant improvement for common cases (especially on newer Python versions, which already contain bytecode-level optimizations for things like conditional branches based on string equality, which your example features).
Just a note: the code you provide here isn’t equivalent either (and it’s probably slower). You’re reloading the global name a
for each sub-pattern.
However the bytecode compiler’s decision to inline the return block in the original code is sort of interesting. I’d like to take a look into why the control flow is a bit unusual here.
Rosuav (Chris Angelico) November 6, 2022, 7:24pm 7
If the set is replaced with a tuple, the performance improvement weakens (and becomes a bit chaotic), but is still there. This is true even when matching the last element provided, and when failing to match.
It would, however, be a small change in behaviour: instead of just doing an equality check, it would do an identity-or-equality check. Consider:
>>> class Const:
... nan = float("nan")
...
>>> match Const.nan:
... case 'foo' | 'bar' | Const.nan: print("Gotcha")
... case _: print("Nope")
...
Nope
>>> Const.nan in ('foo', 'bar', Const.nan)
True
So if a fast path were added, it would have to exclude floats for this reason (unless it’s decided that that change is acceptable, but that’s not just a performance question any more). So I guess the question is: how common is it to match against multiple strings/ints, and would it be worth optimizing this case at the cost of complexity?
I’ll tell you specifically about myself. I needed this approach in one project, but due to performance, I preferred to write it using ifs.
I perfectly understood the problems of the method I proposed
Indeed, the check for the hashability of the object in this case will be superfluous and we most likely will not predict cases when this is the case, this is not optimal.
Later I will experiment with different cases, see which jumps the compiler generates.
So far, it seems to me that the match case does not use lazy optimization at all and always checks all values. Perhaps of course this is not always the case.
Nevertheless, this cannot be left without optimization, because the use of ifs where the matching pattern could be is not preferable.
Does anyone have any ideas?
I managed to get it to do it lazily. All it took was to add a couple of branches. To be fair, what the compiler generated works well this time.
def match_case_ors(key='alt'): # I decided to transfer it to local variables, it did not give a visible effect.
match key:
case ('ctrl' | 'lctrl' | 'rctrl'):
result = 'control'
case ('alt' | 'lalt' | 'ralt'):
result = 'alt'
case _:
result = 'default'
return result
If we rewrite this in ifs:
def if_contains(key='ctrll'):
# if key in {'ctrl', 'lctrl', 'rctrl'}:
if key == 'ctrl' or key == 'lctrl' or key == 'rctrl':
result = 'control'
# elif key in {'alt', 'lalt', 'ralt'}:
elif key == 'alt' or key == 'lalt' or key == 'ralt':
result = 'alt'
else:
result = 'default'
return result
It will be the fastest by 5-10%, this is not such a big difference at all to optimize something in a match case with this.
# if_contains
18 2 LOAD_FAST 0 (key)
4 LOAD_CONST 1 ('ctrl')
6 COMPARE_OP 2 (==)
12 POP_JUMP_IF_TRUE 12 (to 38)
14 LOAD_FAST 0 (key)
...
...
19 >> 38 LOAD_CONST 4 ('control')
40 STORE_FAST 1 (result)
25 42 LOAD_FAST 1 (result)
44 RETURN_VALUE
...
At the same time, the order of jumps in the match case is slightly different.
...
6 COPY 1
8 LOAD_CONST 1 ('ctrl')
10 COMPARE_OP 2 (==)
16 POP_JUMP_IF_FALSE 1 (to 20)
18 JUMP_FORWARD 16 (to 52)
>> 20 COPY 1
...
As you can see, the match does POP_JUMP_IF_FALSE, and if true then JUMP_FORWARD.
That is, it ends up jumping anyway, whereas if does one jump if true (instead of JUMP_FORWARD).
this gives it a slight difference in speed, as I said, this behavior is quite acceptable, but as you can see it is still not ideal. If someone finds time to fix it, then it’s good, if not, then it’s okay.