regex problem

John Machin sjmachin at lexicon.net
Tue Jul 26 10:06:09 EDT 2005


Duncan Booth wrote:
> John Machin wrote:
> 
> 
>>So here's the mean lean no-flab version -- you don't even need the 
>>parentheses (sorry, Thomas).
>>
>>
>> >>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
>> >>> rx1.findall("1234,2222-8888,4567,")
>>
>>['1234,', '2222-8888,', '4567,']
> 
> 
> No flab? What about all that repetition of \d? A less flabby version:
> 
> 
> >>> rx1=re.compile(r"""\b\d{4}(?:-\d{4})?,""")
> >>> rx1.findall("1234,2222-8888,4567,")
> 
> ['1234,', '2222-8888,', '4567,']
> 


OK, good idea to factor out the common prefix and follow it with an
optional -1234. However, optimising re engines do common-prefix factoring
themselves, *and* they rewrite stuff like x{4} as xxxx.
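
Whatever an engine does internally, the two patterns quoted above should
match the same text. Here is a rough check of that, as a sketch only: the
names rx_long, rx_short and samples are made up for illustration, and only
the first sample string comes from this thread.

```python
import re

# The two patterns from this thread, compiled exactly as posted.
rx_long = re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
rx_short = re.compile(r"""\b\d{4}(?:-\d{4})?,""")

# Only the first string is from the thread; the rest are invented
# edge cases (too many digits, short range, no word boundary,
# missing trailing comma).
samples = [
    "1234,2222-8888,4567,",
    "12345,",
    "1234-567,",
    "x1234,",
    "1234,5678-9012",
]

# Collect (input, long-pattern matches, short-pattern matches).
results = [(s, rx_long.findall(s), rx_short.findall(s)) for s in samples]

for s, long_hits, short_hits in results:
    print(repr(s), long_hits, short_hits, long_hits == short_hits)
```

This only shows the patterns agree on these inputs; it says nothing about
how either one is compiled or how fast it runs.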

Cheers,
John


