[issue34304] clarification on escaping \d in regular expressions

Saba Kauser report at bugs.python.org
Wed Aug 1 01:13:39 EDT 2018


New submission from Saba Kauser <skauseribmdb at gmail.com>:

Hello,

I have a program that works well up to Python 3.6 but fails with Python 3.7.

import re

pattern="DBMS_NAME: string(%d) %s"
sym = ['\[','\]','\(','\)']
for chr in sym:
  pattern = re.sub(chr, '\\' + chr, pattern)
  print(pattern)
  
pattern=re.sub('%s','.*?',pattern)
print(pattern)
pattern = re.sub('%d', '\\d+', pattern) 
print(pattern)
result=re.match(pattern, "DBMS_NAME: string(8) \"DB2/NT64\" ")
print(result)
result=re.match("DBMS_NAME python4: string\(\d+\) .*?", "DBMS_NAME python4: string(8) \"DB2/NT64\" ")
print(result)

expected output:
DBMS_NAME: string(%d) %s
DBMS_NAME: string(%d) %s
DBMS_NAME: string\(%d) %s
DBMS_NAME: string\(%d\) %s
DBMS_NAME: string\(%d\) .*?
DBMS_NAME: string\(\d+\) .*?
<re.Match object; span=(0, 21), match='DBMS_NAME: string(8) '>
<re.Match object; span=(0, 29), match='DBMS_NAME python4: string(8) '>

However, the following statement raises re.error with Python 3.7:
pattern = re.sub('%d', '\\d+', pattern) 

DBMS_NAME: string(%d) %s
DBMS_NAME: string(%d) %s
DBMS_NAME: string\(%d) %s
DBMS_NAME: string\(%d\) %s
DBMS_NAME: string\(%d\) .*?
Traceback (most recent call last):
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\sre_parse.py", line 1021, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\\d'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pattern.txt", line 11, in <module>
    pattern = re.sub('%d', '\\d+', pattern)
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \d at position 0

If I change the statement to use three backslashes, as in
pattern = re.sub('%d', '\\\d+', pattern) 

the correct regular expression is generated.
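
For reference, here is a minimal sketch of what the three-backslash form amounts to. It assumes the failure comes from how re.sub processes backslash escapes in the replacement string, as the traceback through parse_template suggests. The literal '\\\d+' produces the same characters as the raw string r'\\d+', which the replacement template reads as an escaped backslash followed by "d+". A callable replacement is another option, since its return value is not processed as a template. The variable names below are illustrative only.

```python
import re

# The pattern as it looks just before the failing substitution.
pattern = r"DBMS_NAME: string\(%d\) .*?"

# Raw-string spelling of the three-backslash workaround: the template holds
# an escaped backslash followed by "d+", so a literal \d+ is inserted.
via_template = re.sub('%d', r'\\d+', pattern)

# A callable replacement: its return value is inserted as-is, with no
# escape processing applied to it.
via_callable = re.sub('%d', lambda m: r'\d+', pattern)

print(via_template)
print(via_callable)
```

Both calls should produce the same regular expression as the last pattern line of the expected output above.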

Can you please confirm whether this behavior changed in Python 3.7, and whether the backslash in '\d' now needs to be escaped as well?

Thank you!

----------
components: Regular Expressions
messages: 322842
nosy: ezio.melotti, mrabarnett, sabakauser
priority: normal
severity: normal
status: open
title: clarification on escaping \d in regular expressions
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34304>
_______________________________________

