trouble with regex?

MRAB python at mrabarnett.plus.com
Thu Oct 8 12:42:21 EDT 2009


inhahe wrote:
> Can someone tell me why this doesn't work?
> 
> colorre = re.compile ('('
>                         '^'
>                        '|'
>                         '(?:'
>                            '\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
>                            '(?:'
>                               ',(?:10|11|12|13|14|15|0\\d|\\d)'
>                            ')?'
>                         ')'
>                       ')(.*?)')
> 
> I'm trying to extract mirc color codes.
> 
> this works:
> 
> colorre = re.compile ('\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
>                       '(?:'
>                          ',(?:10|11|12|13|14|15|0\\d|\\d)'
>                       ')?'
>                       )
> 
> but I wanted to modify it so that it returns me groups of (color code, 
> text after the code), except for the first text at the beginning of the 
> string before any color code, for which it should return ('', text). 
> that's what the first paste above is trying to do, but it doesn't work. 
> here are some results:
> 
>  >>> colorre.findall('a\x0b1,1')
> [('', ''), ('\x0b1,1', '')]
>  >>> colorre.findall('a\x0b1,1b')
> [('', ''), ('\x0b1,1', '')]
>  >>> colorre.findall('ab')
> [('', '')]
>  >>> colorre.findall('\x0b1,1')
> [('', '')]
>  >>> colorre.findall('\x0b1,1a')
> [('', '')]
>  >>>
> 
> i can easily work with the string that does work and just use group 
> starting and ending positions, but i'm curious as to why i can't get it 
> working the way i want :/
> 
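The problem with the regex is that .*? is a lazy repeat: it'll try to
match as few characters as possible, and because nothing follows it in
the pattern, it can always succeed by matching nothing at all, which is
why the second group is always ''. Try a greedy repeat instead, but one
that matches only characters other than \x0b (a vertical tab, the
character your pattern uses to introduce a color code):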

colorre = re.compile('('
                        '^'
                       '|'
                        '(?:'
                           '\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
                           '(?:'
                              ',(?:10|11|12|13|14|15|0\\d|\\d)'
                           ')?'
                        ')'
                      ')([^\x0b]*)')
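
To see the difference between the two repeats in isolation, here is a
small sketch. The names NUM, CODE, lazy and greedy are only for
illustration and are not from the original post; the patterns
themselves are the ones quoted above, just assembled from smaller
pieces.

```python
import re

# A one- or two-digit color number, exactly as in the patterns above.
NUM = r'(?:10|11|12|13|14|15|0\d|\d)'
# A color code: \x0b, a number, and an optional ",number" part.
CODE = '\x0b' + NUM + '(?:,' + NUM + ')?'

# Original version: the lazy repeat is the last thing in the pattern.
lazy = re.compile('(^|' + CODE + ')(.*?)')
# Suggested version: a greedy repeat that stops at the next \x0b.
greedy = re.compile('(^|' + CODE + ')([^\x0b]*)')

# With nothing after it, the lazy repeat is satisfied with zero characters.
print(repr(lazy.match('ab').group(2)))    # ''
# The greedy repeat takes everything up to the next \x0b (or the end).
print(repr(greedy.match('ab').group(2)))  # 'ab'

# Each tuple is (color code, text after the code).
print(greedy.findall('a\x0b1,1b'))  # [('', 'a'), ('\x0b1,1', 'b')]
```

One case worth testing on your own interpreter is a string that begins
with a color code: the ^ branch is tried first and produces an empty
match there, and what findall does after an empty match changed in
Python 3.7.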


