Awkward code using multiple regexp searches

Jason Lai jmlai at uci.edu
Fri Sep 10 03:03:44 EDT 2004


Topher Cawlfield wrote:
> Hi,
> 
> I'm relatively new to Python, and I already love it even after several 
> years of writing Perl.  But a few times already I've found myself 
> writing the following bit of awkward code when parsing text files.  Can 
> anyone suggest a more elegant solution?
> 
> rexp1 = re.compile(r'blah(dee)blah')
> rexp2 = re.compile(r'hum(dum)')
> for line in inFile:
>     reslt = rexp1.search(line)
>     if reslt:
>         something = reslt.group(1)
>     else:
>         reslt = rexp2.search(line)
>         if reslt:
>             somethingElse = reslt.group(1)
> 
> I'm getting more and more nested if statements, which gets ugly and very 
> hard to follow after the fourth or fifth regexp search.
> 
> Equivalent Perl code is more compact but more importantly seems to 
> communicate the process of searching for multiple regular expressions 
> more clearly:
> 
> while (<IN>) {
>     if (/blah(dee)blah/) {
>         $something = $1;
>     } elsif (/hum(dum)/) {
>         $somethingElse = $1;
>     }
> }
> 
> I'm a little bit worried about doing the following in Python, since I'm 
> not sure if the compiler is smart enough to avoid doing each regexp 
> search twice:
> 
> for line in inFile:
>     if rexp1.search(line):
>         something = rexp1.search(line).group(1)
>     elif rexp2.search(line):
>         somethingElse = rexp2.search(line).group(1)
> 
> In many cases I am worried about efficiency as these scripts parse a 
> couple GB of text!
> 
> Does anyone have a suggestion for cleaning up this commonplace Python 
> code construction?
> 
> Thanks,
>     Topher Cawlfield

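Do the results have to be stored in different variables? If you have a 
list of regexes and only need to know whether any of them matches, you 
could build a compound regex such as "blah(dee)blah|hum(dum)" and search 
for that once per line (although you have to be careful about overlaps: 
the search returns the leftmost match in the line, and each alternative 
keeps its own group number, so "dum" would be group 2, not group 1).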
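
As a rough sketch of the compound-regex idea, assuming named groups are 
acceptable: the group names "something" and "something_else" and the 
helper scan() are made up for illustration, and only the two patterns 
come from the original post. Each line is searched once, and the match 
object's lastgroup attribute tells you which alternative matched.

```python
import re

# One compiled pattern; the group names are hypothetical labels that
# stand in for the two variables in the original post.
combined = re.compile(
    r'blah(?P<something>dee)blah|hum(?P<something_else>dum)'
)

def scan(lines):
    """Search each line once; keep the last capture seen per group."""
    found = {}
    for line in lines:
        m = combined.search(line)
        if m:
            # lastgroup is the name of the group that took part in
            # the match, i.e. which alternative matched.
            found[m.lastgroup] = m.group(m.lastgroup)
    return found

if __name__ == "__main__":
    sample = ["xx blahdeeblah xx", "nothing here", "say humdum"]
    print(scan(sample))
```

One difference from the nested-if version: the original always tries 
rexp1 first, while the compound pattern reports whichever alternative 
starts earliest in the line. For a line such as "humdum blahdeeblah" the 
sketch records "dum", not "dee", so it only fits when at most one of the 
patterns can occur per line or when that ordering does not matter.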

  - Jason Lai


