re.findall() problem?

Matthias Kuhn Kuhn_Matthias at gmx.de
Tue Oct 30 05:19:56 EST 2001


Hi,

I have a question about re.findall(). In the sample code below
there is a test string 's' in which I want to find the tokens
that follow a number and a comma, e.g. '200,t'  ->  't'

The first regular expression, 'r1', finds the tokens for the numbers 200 .. 241.
In the second, 'r2', I've added 250, so that it only matches
the tokens in the first #BEGIN..#END block.

When I run the sample code, the result of findall()
is correct in both cases, but there is a big difference
in the elapsed time:
r1:     0.0 sec  (< 10 msec)
r2: ca. 18 sec

This was measured with Python 1.5.2 on a Pentium III (750 MHz) under
NT 4.0 (SP6).

For Python 2.1 on the same machine the results are:
r1:     0.0 sec
r2: ca. 49 sec

If I move the first #BEGIN..#END block to the end of the string,
the timing results are:
r1:  0.0 sec
r2:  0.0 sec
That's what I expected!
Now my question: is there something wrong with my regular expressions,
or is there a problem with the re module?

Thanks for any advice

Matthias


import time
import re

s = """    
#BEGINCHANNELHEADER 
200,t 
201,sec 
202,sec 
210,IMPLIZIT 
220,724
240,0.0
241,0.007001
250,0.0
251,5.061844
#ENDCHANNELHEADER

#BEGINCHANNELHEADER 
200,CVI_P_Cal_FL
201,bar
202,bar
210,EXPLIZIT 
211,vme____0.i16
213,BLOCK 
214,INT16
220,724
221,1
222,72
240,0.000000
241,0.010000
260,Numerisch 
#ENDCHANNELHEADER

#BEGINCHANNELHEADER 
200,CVI_P_Cal_FL
201,bar
202,bar
210,EXPLIZIT 
211,vme____0.i16
213,BLOCK 
214,INT16
220,724
221,1
222,72
240,0.000000
241,0.010000
260,Numerisch 
#ENDCHANNELHEADER

#BEGINCHANNELHEADER 
200,CVI_P_Cal_FL
201,bar
202,bar
210,EXPLIZIT 
211,vme____0.i16
213,BLOCK 
214,INT16
220,724
221,1
222,72
240,0.000000
241,0.010000
260,Numerisch 
#ENDCHANNELHEADER
"""

# Raw strings keep the backslashes in \S from being read as string escapes.
r1 = r'(?s)200,(\S+).+?202,(\S+).+?210,(\S+).+?220,(\S+).+?240,(\S+).+?241,(\S+)'
r2 = r'(?s)200,(\S+).+?202,(\S+).+?210,(\S+).+?220,(\S+).+?240,(\S+).+?241,(\S+).+?250,(\S+)'

for r in r1, r2:
    a = time.time()
    l = re.findall(r, s)
    b = time.time()
    print 'elapsed Time: ', b-a
    for i in l: print i
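
A possible way around the slow case, as a rough sketch only: split the string
into its #BEGIN..#END blocks first and look up the numbered fields inside each
block. This assumes the slowdown comes from the lazy '.+?' parts being free to
run across block boundaries while searching for a '250,' that never follows;
that cause is a guess, not something measured here. The names BLOCK, FIELD,
find_tokens and sample are made up for illustration, and the code is written
for a current Python 3 rather than 1.5.2 or 2.1. Unlike r1 and r2, it does not
check the order of the fields and only accepts numbers at the start of a line.

```python
import re

# Hypothetical helpers, not part of the original sample code.
# One block is everything between the BEGIN and END header markers.
BLOCK = re.compile(r'(?s)#BEGINCHANNELHEADER(.*?)#ENDCHANNELHEADER')
# One field is "<number>,<token>" at the start of a line.
FIELD = re.compile(r'(?m)^(\d+),(\S+)')

def find_tokens(text, keys):
    """Return one tuple of tokens per block that contains all the keys."""
    results = []
    for block in BLOCK.findall(text):
        fields = dict(FIELD.findall(block))  # e.g. {'200': 't', '202': 'sec'}
        if all(k in fields for k in keys):
            results.append(tuple(fields[k] for k in keys))
    return results

# Shortened stand-in for the test string 's' above.
sample = """
#BEGINCHANNELHEADER
200,t
202,sec
250,0.0
#ENDCHANNELHEADER

#BEGINCHANNELHEADER
200,CVI_P_Cal_FL
202,bar
#ENDCHANNELHEADER
"""

for keys in (['200', '202'], ['200', '202', '250']):
    for row in find_tokens(sample, keys):
        print(row)
```

Because every search is confined to a single block, a block without '250,'
is rejected after one pass over its own lines.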


