[Tutor] regex: don't match embedded quotes

Albert-Jan Roskam fomcl at yahoo.com
Tue Jun 11 13:05:38 CEST 2013


----- Original Message -----
> From: Albert-Jan Roskam <fomcl at yahoo.com>
> To: Python Mailing List <tutor at python.org>
> Cc: 
> Sent: Tuesday, June 11, 2013 12:56 PM
> Subject: [Tutor] regex: don't match embedded quotes
> 
> Hi,
>  
> I have written a regex that is supposed to match correctly quoted (single quotes 
> on each side, or double quotes on each side) text. It works, but it also matches 
> embedded quoted text, which I don't want to happen.
> I should somehow modify the 'comment' group such that it backreferences 
> to 'quote' and includes only the inner quote sign. Background: I am 
> playing around with this to see how hard it would be to write my own Pygments 
> lexer, which I could then also use in IPython notebook.
>  
> >>> import re
> >>> s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."
> >>> matches = re.finditer("(?P<quote>['\"])(?P<comment>[^'\"]*)(?P=quote)", s, re.DEBUG)

<snip>
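
For reference, here is a minimal sketch that reproduces the problem with the pattern from the original message. The names `original` and `found` are only illustrative, and the re.DEBUG flag is left out to keep the output short:

```python
import re

s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."

# Pattern from the original message: the 'comment' group excludes BOTH
# kinds of quote, so the outer '...' around 'difficult "One"' cannot match.
original = re.compile("(?P<quote>['\"])(?P<comment>[^'\"]*)(?P=quote)")

found = [m.group("comment") for m in original.finditer(s)]
print(found)  # ['test', 'blah', 'One']
```

The third item is the embedded "One" rather than the whole single-quoted text, which is the unwanted behaviour described above.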
 
Okay, my eyes are bloodshot by now, but I think I've got it:
 
>>> matches = re.finditer("(?P<quote>['\"])(?P<comment>(?<!(?P=quote)).*?)(?P=quote)", s)
>>> [match.group("comment") for match in matches]
['test', 'blah', 'difficult "One"']

In other words: the 'comment' group should be preceded by a negative lookbehind (?<!...) on the 'quote' group, followed by a non-greedy match of anything (.*?). I'm not sure whether ".*?" is a good idea, i.e. zero-or-more-of-anything.
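
Regarding the doubt about ".*?": below is a hedged sketch of two variants that avoid the lookbehind altogether, on the assumption that the non-greedy match followed by (?P=quote) is what does the real work. The names `lazy` and `tempered` are made up for illustration. The second variant consumes one character at a time, but only while the next character is not the opening quote (a negative lookahead instead of a lookbehind):

```python
import re

s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."

# Variant 1: plain non-greedy match; it stops at the first character
# that equals the opening quote.
lazy = re.compile(r"""(?P<quote>['"])(?P<comment>.*?)(?P=quote)""")

# Variant 2: each consumed character is first checked with a negative
# lookahead, so the 'comment' group can never contain the opening quote.
tempered = re.compile(
    r"""(?P<quote>['"])(?P<comment>(?:(?!(?P=quote)).)*)(?P=quote)"""
)

lazy_found = [m.group("comment") for m in lazy.finditer(s)]
tempered_found = [m.group("comment") for m in tempered.finditer(s)]

print(lazy_found)      # ['test', 'blah', 'difficult "One"']
print(tempered_found)  # ['test', 'blah', 'difficult "One"']
```

Both variants accept an empty quoted string ('' or ""), because zero characters are allowed. Also, "." does not match a newline unless re.DOTALL is passed, so quoted text spanning several lines would not be found by either one as written.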
 
regards,
Albert-Jan

