Issue 34315: Regex not evaluated correctly

Sample code:

import re

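# Named groups: table_alias, table_name and the optional where_statement.
# [a-zA-Z_0-9]* matches zero or more letters, digits or underscores.
regex = r'DELETE\s*(?P<table_alias>[a-zA-Z_0-9]*)\s*FROM\s*(?P<table_name>[a-zA-Z_0-9]+)\s*([a-zA-Z0-9_]*)\s*(?P<where_statement>WHERE){0,1}(\s.)*?'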

test_str = 'DELETE FROM my_table1 t_ WHERE id in (1,2,3)'

matches = re.finditer(regex, test_str, re.MULTILINE)

print([m.groupdict() for m in matches])

Below is the expected output.

[{'table_alias': '', 'table_name': 'my_table1', 'where_statement': 'WHERE'}]

On Windows Server 2012 R2, however, the output is:

[{'table_alias': '', 'table_name': 'mytable1', 'where_statement': None}]

Python 3.7 on Windows Server 2012 R2 also gives the unexpected output, whereas Windows 10 and other Linux variants give the expected output.

I was able to reproduce the 'bad' output on Linux by using 'bad' input.

The issue occurs when WHERE is misspelled. The regex looks for the exact word "WHERE", so any lowercase (where), mixed-case (WHeRe), or misspelled (WERE) variant makes the optional where_statement group match nothing, and it is returned as None because the regex found no matching substring for it.
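A minimal sketch of that behaviour, assuming the named groups in the sample regex are table_alias, table_name and where_statement (the keys shown in the expected output) and writing the character class as A-Z. The helper name `parse` is hypothetical. Because the where_statement group is quantified with {0,1}, a non-matching word does not make the whole match fail; the group simply does not participate, so groupdict() reports None for it.

```python
import re

# The sample regex with its named groups restored (an assumption based on
# the keys in the expected output).
PATTERN = re.compile(
    r'DELETE\s*(?P<table_alias>[a-zA-Z_0-9]*)\s*FROM\s*'
    r'(?P<table_name>[a-zA-Z_0-9]+)\s*([a-zA-Z0-9_]*)\s*'
    r'(?P<where_statement>WHERE){0,1}(\s.)*?'
)


def parse(sql):
    """Hypothetical helper: named groups of the first match, or None if no match."""
    m = PATTERN.search(sql)
    return m.groupdict() if m else None


# Exact, upper-case WHERE: the optional group captures it.
print(parse('DELETE FROM my_table1 t_ WHERE id in (1,2,3)'))
# {'table_alias': '', 'table_name': 'my_table1', 'where_statement': 'WHERE'}

# Lowercase, mixed-case or misspelled WHERE: the {0,1} group matches zero
# times, the overall match still succeeds, and where_statement is None.
for word in ('where', 'WHeRe', 'WERE'):
    print(parse(f'DELETE FROM my_table1 t_ {word} id in (1,2,3)'))
```

Passing re.IGNORECASE to re.compile would make the WHERE group accept any letter case, but a genuinely misspelled keyword would still yield None.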

On a whim, I also tested a number of encodings before realizing that the pattern can't be applied to bytes objects anyway, so really the only way to get this output is to misspell the input. I think this issue should be closed, as it is not a bug in the Python core.
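The point about bytes can be checked directly: the re module refuses to mix a str pattern with bytes input, so trying different input encodings can never change which groups match. This is a sketch; the pattern is a shortened stand-in for the one in the report, using \w+ purely for brevity.

```python
import re

# Shortened str pattern, for illustration only.
str_pattern = r'DELETE\s*FROM\s*(?P<table_name>\w+)'
data = 'DELETE FROM my_table1'.encode('utf-8')  # input as a bytes object

raised = False
try:
    re.search(str_pattern, data)
except TypeError as exc:
    # A str pattern applied to a bytes object is rejected with TypeError.
    raised = True
    print('TypeError:', exc)

# Decoding the bytes back to str is needed before the str pattern can match.
m = re.search(str_pattern, data.decode('utf-8'))
print(m.group('table_name'))  # my_table1
```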