[Tutor] regex: don't match embedded quotes
Albert-Jan Roskam
fomcl at yahoo.com
Tue Jun 11 13:05:38 CEST 2013
----- Original Message -----
> From: Albert-Jan Roskam <fomcl at yahoo.com>
> To: Python Mailing List <tutor at python.org>
> Cc:
> Sent: Tuesday, June 11, 2013 12:56 PM
> Subject: [Tutor] regex: don't match embedded quotes
>
> Hi,
>
> I have written a regex that is supposed to match correctly quoted (single quotes
> on each side, or double quotes on each side) text. It works, but it also matches
> embedded quoted text, which I don't want to happen.
> I should somehow modify the 'comment' group such that it backreferences
> to 'quote' and includes only the inner quote sign. Background: I am
> playing around with this to see how hard it would be to write my own Pygments
> lexer, which I could then also use in IPython notebook.
>
> >>> import re
> >>> s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."
> >>> matches = re.finditer("(?P<quote>['\"])(?P<comment>[^'\"]*)(?P=quote)", s, re.DEBUG)
<snip>
Okay, I have bloodshot eyes by now, but I think I've got it:
>>> matches = re.finditer("(?P<quote>['\"])(?P<comment>(?<!(?P=quote)).*?)(?P=quote)", s)
>>> [match.group("comment") for match in matches]
['test', 'blah', 'difficult "One"']
In other words: the 'comment' group should start with a negative lookbehind (?<!) on the 'quote' group, followed by a non-greedy match of anything (.*?). I am not sure whether ".*?" is a good idea, i.e. zero-or-more-of-anything.
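
For comparison, here is a minimal sketch of the same idea without the lookbehind. It is an illustration, not a tested lexer rule. It relies only on the non-greedy ".*?" followed by the (?P=quote) backreference, since lookbehind assertions that contain a backreference are not handled the same way in every Python version. The names original, simple, tempered and comments() are made up for this example. The "tempered" variant is one possible answer to the ".*?" worry: it consumes a character only when the closing quote does not start at that position.

```python
import re

s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."

# The pattern from the original question: the comment may contain neither
# kind of quote, so the embedded "One" is picked up as a match of its own.
original = re.compile(r"""(?P<quote>['"])(?P<comment>[^'"]*)(?P=quote)""")

# Non-greedy variant: take as little as possible up to the next occurrence
# of the *same* quote character that opened the string.
simple = re.compile(r"""(?P<quote>['"])(?P<comment>.*?)(?P=quote)""")

# "Tempered" variant: before consuming each character, a negative lookahead
# checks that the closing quote does not start here.
tempered = re.compile(
    r"""(?P<quote>['"])(?P<comment>(?:(?!(?P=quote)).)*)(?P=quote)"""
)


def comments(pattern, text):
    """Return the 'comment' group of every match of pattern in text."""
    return [m.group("comment") for m in pattern.finditer(text)]


print(comments(original, s))  # ['test', 'blah', 'One']
print(comments(simple, s))    # ['test', 'blah', 'difficult "One"']
print(comments(tempered, s))  # ['test', 'blah', 'difficult "One"']
```

Note that "." does not match a newline unless re.DOTALL is passed, so both variants stop at the end of a line. Neither handles backslash-escaped quotes inside the quoted text.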
regards,
Albert-Jan