[issue39949] truncating match in regular expression match objects repr
Seth Troisi
report at bugs.python.org
Thu Jun 18 20:05:58 EDT 2020
Seth Troisi <braintwo at gmail.com> added the comment:
I was thinking about how to add the end quote and found these weird cases:
>>> "asdf'asdf'asdf"
"asdf'asdf'asdf"
>>> "asdf\"asdf\"asdf"
'asdf"asdf"asdf'
>>> "asdf\"asdf'asdf"
'asdf"asdf\'asdf'
This means that len(s) + 2 (or 3 for bytes) is not always equal to len(repr(s))
e.g.
>>> s = "\"''''''"
>>> s
'"\'\'\'\'\'\''
>>> len(s)
7
>>> len(repr(s))
15
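A minimal sketch of the same observation, in case it helps to see how the overhead varies. The helper name repr_overhead is hypothetical and only used for illustration; it measures how many characters repr() adds beyond len(s).

```python
def repr_overhead(s):
    """Return how many characters repr() adds on top of len(s)."""
    return len(repr(s)) - len(s)

samples = [
    "asdf",          # no quotes: just the two enclosing quotes
    "asdf'asdf",     # only single quotes: repr switches to double quotes
    'asdf"asdf',     # only double quotes: repr keeps single quotes
    "\"''''''",      # both kinds: every single quote gets a backslash
]

overheads = {s: repr_overhead(s) for s in samples}
for s, extra in overheads.items():
    print(f"len={len(s):2d} overhead={extra}")
```

Only the last sample, which mixes both quote characters, picks up more than the two enclosing quotes.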
This can lead to a weird partial trailing escape character:
>>> re.match(".*", "a"*48 + "'\"")
<_sre.SRE_Match object; span=(0, 50), match='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\>
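The output above can be approximated in pure Python. This is only a sketch, assuming that %.50R behaves like taking repr() and keeping the first 50 characters; the name LIMIT is made up for the example.

```python
LIMIT = 50  # assumed to mirror the precision in %.50R

group0 = "a" * 48 + "'\""

# Contains both quote kinds, so repr() uses single quotes and escapes the "'".
full = repr(group0)

# Approximation of what %.50R does: keep the first LIMIT characters of the repr.
truncated = full[:LIMIT]

print(len(full))
print(truncated)
print(truncated.endswith("\\"))
```

The cut lands between the backslash and the quote it escapes, which is where the dangling '\' comes from.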
This means I'll need to rethink len(group0) >= 48 as the condition for truncation, since even a 30-character string can be truncated by %.50R
Maybe it makes sense to write group0 to a temp string, check whether that was truncated, and extract the quote character from it
OR
PyUnicode_FromFormat('%R', group0[:50]) # avoids trailing escape character ('\') but might be longer than 50 characters
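A rough Python sketch of the two options, not the actual C change. The helper names truncated_repr and repr_of_prefix and the constant LIMIT are hypothetical, and slicing repr() is assumed to stand in for %.50R.

```python
LIMIT = 50  # assumed to mirror the precision in %.50R

def truncated_repr(group0, limit=LIMIT):
    """Option 1: build the full repr in a temp string, then truncate it.

    Returns the truncated text and whether anything was cut off.
    """
    full = repr(group0)
    return full[:limit], len(full) > limit

def repr_of_prefix(group0, limit=LIMIT):
    """Option 2: slice the string first, then take the repr of the slice.

    Never ends in a dangling backslash, but may exceed `limit` characters.
    """
    return repr(group0[:limit])

# A 30-character string whose repr is still longer than 50 characters.
short = '"' + "'" * 29
text, was_truncated = truncated_repr(short)
quote = text[0]  # for str this is the quote character; bytes would start with b

prefix = repr_of_prefix("a" * 48 + "'\"")
print(was_truncated, quote, len(prefix))
```

Option 1 only detects the truncation and recovers the quote character, so a trailing '\' would still need handling; option 2 trades that problem for a longer result.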
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue39949>
_______________________________________