[2.5] Regex doesn't support MULTILINE?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Sun Jul 22 04:34:17 EDT 2007


On Sun, 22 Jul 2007 01:56:32 -0300, Gilles Ganault <nospam at nospam.com>
wrote:

> Incidentally, as far as using re alone is concerned, it appears that
> re.MULTILINE isn't enough to get re to include newlines: re.DOTALL
> must be added.
>
> Problem is, when I add re.DOTALL, the search takes less than a second
> for a 500KB file... and about 1 min 30 s for a file that's 1MB, with
> both files holding similar contents.
>
> Why such a huge difference in performance?
>
> pattern = "<span class=.?defaut.?>(\d+:\d+).*?</span>"

Try to avoid ".*" and ".+" (even the non-greedy forms); in this case, I
think you want the scan to stop when it reaches the closing </span> or
any other tag, so use [^<]* instead. A negated character class like
[^<] also matches newlines, so re.DOTALL is no longer needed.
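A minimal sketch of the difference, assuming a made-up sample string (the
names lazy, bounded and sample are only for illustration, and this is not
a measurement of the original files). One plausible reason for the
slowdown is that ".*?" with re.DOTALL may scan far ahead looking for
</span>, while [^<]* has to stop at the next "<":

```python
import re

# Pattern from the question, written as a raw string; needs re.DOTALL
# so that ".*?" can cross newlines.
lazy = re.compile(r"<span class=.?defaut.?>(\d+:\d+).*?</span>", re.DOTALL)

# Suggested variant: [^<]* already matches newlines and cannot run past "<".
bounded = re.compile(r"<span class=.?defaut.?>(\d+:\d+)[^<]*</span>")

# Hypothetical input: the second span contains a nested tag.
sample = (
    '<span class="defaut">12:34\n lunch</span>\n'
    '<span class="defaut">56:78 <b>bold</b></span>\n'
)

print(lazy.findall(sample))     # ['12:34', '56:78']
print(bounded.findall(sample))  # ['12:34']
```

Note that the bounded pattern skips the second span because of the nested
<b> tag; whether that is what you want depends on the real data.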

BTW, it's better to use a raw string for the pattern, so backslashes
reach the re module unchanged: pattern = r"...\d+..."
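A short illustration of why the raw string matters (the variable names
and the sample text are made up for the example):

```python
import re

plain = "\\d+:\\d+"  # ordinary literal: every backslash must be doubled
raw = r"\d+:\d+"     # raw literal: backslashes are kept as written

same = (plain == raw)  # both spell the same seven-character pattern

# "\b" in an ordinary literal is a backspace character, not a word boundary.
backspace_len = len("\b")      # 1
word_boundary_len = len(r"\b") # 2

times = re.findall(raw, "open 10:15, close 23:59")
print(same, backspace_len, word_boundary_len, times)
```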

-- 
Gabriel Genellina



