Issue 1518406: re '\' char interpretation problem

Created on 2006-07-06 21:26 by ooldham, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name: reProblem.py
Uploaded: ooldham, 2006-07-06 21:26
Description: short code snippet showing both problems with re and '\' char
Messages (4)
msg29083 - Author: ollie oldham (ooldham) Date: 2006-07-06 21:26
I've run across 2 problems having to do with the '\' character in the re module. Problem 1 does not match the re when it should have. Problem 2 matches when it should not have. There is a short snippet of code attached that shows the problems I'm having, and the output as it occurs on my machine. I'm running on Windows 2000. Python versions 2.4b1 and 2.4.3c1 both act the same way.

Problem 1): why does * work and not + ?

    import re
    rex = re.compile(r'[a-z]:\.*', re.IGNORECASE)
    rey = re.compile(r'[a-z]:\.+', re.IGNORECASE)
    path1 = r'D:\Logs'
    print rex.match(path1)  # Matches - as it should have.
    print rey.match(path1)  # FAILS to match - should have.

Problem 2): match occurs on nonUncPath when it should not

    import re
    uncPath = r'\\someUNC\path'
    nonUncPath = r'\nonUnc\path'
    rew = re.compile('\\\\.+', re.IGNORECASE)
    print rew.match(uncPath)     # works as it should.
    print rew.match(nonUncPath)  # matches and it should NOT.
msg29084 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2006-07-06 21:36
1) r'[a-z]:\.+' should not match r'D:\Logs'. r'\.+' matches one or more dots. There's no dot in this string.
2) '\\\\.+' is the equivalent of r'\\.+', and should match anything that starts with a '\' and has at least one char following it, which includes r'\nonUnc\path'.
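The two points in this reply can be checked directly. The following is a minimal sketch, not part of the original thread; it assumes Python 3 (the thread itself used Python 2.4 print statements), and the variable names `star_match`, `plus_match`, `unc_match`, and `non_unc_match` are introduced here only for illustration.

```python
import re

path1 = r'D:\Logs'  # the characters D : \ L o g s

# r'\.*' means "zero or more literal dots", so the match simply stops after 'D:'.
rex = re.compile(r'[a-z]:\.*', re.IGNORECASE)
# r'\.+' means "one or more literal dots"; the character after 'D:' is a backslash.
rey = re.compile(r'[a-z]:\.+', re.IGNORECASE)

star_match = rex.match(path1)
plus_match = rey.match(path1)
print(star_match.group(0))  # only the 'D:' prefix is matched
print(plus_match)           # None

# '\\\\.+' and r'\\.+' are the same four-character pattern: \\.+
# The regex engine reads it as: one literal backslash, then one or more of any character.
rew = re.compile('\\\\.+', re.IGNORECASE)
unc_match = rew.match(r'\\someUNC\path')
non_unc_match = rew.match(r'\nonUnc\path')
print(unc_match.group(0))
print(non_unc_match.group(0))
```

In other words, the `*` version "works" only because it is allowed to match zero dots, and the second pattern requires just one leading backslash, which both paths have.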
msg29085 - Author: ollie oldham (ooldham) Date: 2006-07-06 22:46
I beg to differ on problem 1). Since 'r' was used in the definition of both the re and the path, the '.' char is not being escaped (not supposed to be, anyway). And even if it is, then rex = re.compile('[a-z]:\\.+', re.IGNORECASE) should get me what I want (in textual form: char a-z, colon, backslash, with 1 or more trailing chars). But that does not work either.

I beg to differ on item 2) as well. Yes, '\\\\.+' is the equivalent of r'\\.+', BUT I then read this as: 2 backslashes with 1 or more chars, NOT backslash with escaped '.'.
msg29086 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2006-07-06 22:55
Please, use a single way to report issues. Do not message *and* add a comment to the bug.

I think you're missing the behavior of r'' in Python. It changes the way the Python interpreter parses the string, not the way the regular expression compiler/interpreter works. r'\.' is precisely the same as '\\.', and both of them really describe the two-character string \. (a backslash followed by a dot).

    >>> r'\.' == '\\.'
    True
    >>> print r'\.'
    \.

Escaping a dot means a real dot. Please have a look at the re module documentation and perhaps some general regular expression info for more details.
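For readers who land here with the same confusion, a short sketch of patterns that express what the reporter described in words may help. It is an illustration written for Python 3, not something posted in the thread; the names `drive_path`, `unc_path`, `drive_ok`, `unc_ok`, and `unc_rejected` are hypothetical.

```python
import re

# The r prefix only changes how Python reads the literal, not how re interprets it.
assert r'\.' == '\\.'
assert len(r'\.') == 2                   # a backslash and a dot
assert '[a-z]:\\.+' == r'[a-z]:\.+'      # so the non-raw spelling is the same pattern

# To match a literal backslash, the regex itself needs two backslashes.
# "letter, colon, backslash, then one or more characters":
drive_path = re.compile(r'[a-z]:\\.+', re.IGNORECASE)
# "two backslashes, then one or more characters":
unc_path = re.compile(r'\\\\.+')

drive_ok = drive_path.match(r'D:\Logs')
unc_ok = unc_path.match(r'\\someUNC\path')
unc_rejected = unc_path.match(r'\nonUnc\path')

print(drive_ok.group(0))
print(unc_ok.group(0))
print(unc_rejected)  # None: the path starts with only one backslash
```

Written without raw strings, the same two patterns would be '[a-z]:\\\\.+' and '\\\\\\\\.+', which is why raw strings are usually preferred for regular expressions that contain backslashes.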
History
Date User Action Args
2022-04-11 14:56:18 admin set github: 43628
2006-07-06 21:26:22 ooldham create