[Tutor] regex: don't match embedded quotes
Albert-Jan Roskam
fomcl at yahoo.com
Tue Jun 11 12:56:34 CEST 2013
Hi,
I have written a regex that is supposed to match correctly quoted text, i.e. text with single quotes on both sides or double quotes on both sides. It works, but it also matches quoted text embedded inside another quoted string, which I don't want to happen.
I should somehow modify the 'comment' group such that it backreferences 'quote' and excludes only the opening quote sign, so that the other kind of quote is allowed inside. Background: I am playing around with this to see how hard it would be to write my own Pygments lexer, which I could then also use in the IPython notebook.
>>> import re
>>> s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."
>>> matches = re.finditer("(?P<quote>['\"])(?P<comment>[^'\"]+)(?P=quote)", s, re.DEBUG)
subpattern 1
  in
    literal 39
    literal 34
subpattern 2
  max_repeat 1 65535
    in
      negate None
      literal 39
      literal 34
groupref 1
# follow-up to a previous thread about splitting on punctuation: I have no idea how the output of re.DEBUG could help me improve my regex.
>>> [match.group("comment") for match in matches]
['test', 'blah', 'One'] # I do not want to match "One"
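One possible way to make the 'comment' group refer back to 'quote', offered as a sketch rather than a definitive fix: replace the negated character class with a negative lookahead on the backreference, so the group consumes any character except the quote sign that opened the string. The name `pattern` is only illustrative, and the sketch assumes that backslash-escaped quotes inside the text do not need to be handled.

```python
import re

s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."

# (?!(?P=quote)). consumes one character at a time, but only while that
# character is not the same quote sign that opened the string. The other
# kind of quote is therefore allowed inside the 'comment' group.
pattern = re.compile(
    r"""(?P<quote>['"])(?P<comment>(?:(?!(?P=quote)).)*)(?P=quote)"""
)

comments = [match.group("comment") for match in pattern.finditer(s)]
print(comments)
```

Because finditer returns non-overlapping matches, the embedded "One" is consumed as part of the surrounding single-quoted string and is no longer reported on its own.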
Regards,
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~