Regular Expressions - Python vs Perl

Terry Hancock hancock at anansispaceworks.com
Fri Apr 22 10:30:56 EDT 2005


On Thursday 21 April 2005 09:01 am, codecraig wrote:
>   I am interested in regular expressions and how Perl and Python
> compare.  Particularly, I am interested in performance (i.e. speed),
> memory usage, flexibility, completeness (i.e. supports simple and
> complex regex operations...basically is RegEx a strong module/library
> in Python?)

Understand that I have used regexes very very little in Perl (I took a
class, that's about it).  However, I have translated a couple of Perl modules
into Python.

I find that Perl programmers use the rather opaque "regex style" much
too often, so that I usually replace several regexes with simple string
searches, e.g.

original program:     uses regex to match /.*foo.*/
python translation:  just use s.find('foo')

That's not really for performance reasons (though it probably is faster?),
but because it just makes it clearer what you're trying to do.
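To make that concrete, here is a minimal sketch of the two styles side by side. The sample string is made up for illustration, and the `in` test is just an alternative spelling of the find() check:

```python
import re

s = "some text with foo in it"   # made-up sample string

# Regex version, close in spirit to the Perl original /.*foo.*/
regex_hit = re.search(r'foo', s) is not None

# Plain string versions: find() returns -1 when the substring is absent,
# so compare against -1 rather than testing the result for truth
find_hit = s.find('foo') != -1
in_hit = 'foo' in s

print(regex_hit, find_hit, in_hit)
```

One thing to watch: s.find('foo') returns 0 when the string starts with 'foo', so a bare "if s.find('foo'):" would get that case wrong.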

OTOH, some of the regexes will be "real" regexes, in which case
Python's way of expressing regexes as strings makes things a whole
lot clearer, e.g.:

import re

junk = r'.*'
word = r'\b\w+\b'
domain = r'(%s\.)*%s' % (word, word)
re_mail = re.compile(junk + word + '@' + domain + junk)

Although, of course, you can just write:

re_mail = re.compile(r'.*\b\w+\b@(\b\w+\b\.)*\b\w+\b.*')

Which is shorter, but frankly, I had a hard time just keeping it straight to
type it here.  I think the first version was actually faster to write, even if
it takes up more space.  Also, I screwed up the first time I wrote the regex
for a word (forgot the '+'), so having it factored out like this made it
faster to fix the mistake.

Which is still a dumb example, but you can see what I mean about making
the code easier to read / refactor.  AFAIK, Perl does not make this particularly
easy.
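Here is a rough sketch of the factored-out pattern in use. The sample strings and the address in them are invented for the example, and this pattern is nowhere near a real email validator:

```python
import re

junk = r'.*'
word = r'\b\w+\b'
domain = r'(%s\.)*%s' % (word, word)
re_mail = re.compile(junk + word + '@' + domain + junk)

# The pieces are ordinary strings, so the assembled pattern can be inspected
print(re_mail.pattern)

# Made-up sample input
samples = ["write to user@example.com today", "no address here"]
for text in samples:
    print(text, '->', bool(re_mail.match(text)))
```

If the definition of a "word" needs to change, only the one assignment to word has to be edited, and every pattern built from it picks up the fix.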

Python regexes probably allow almost, but not quite all, of what Perl
regexes do (I think the current Python regex language is pretty much
identical with the one in Perl 5, but some newer features are in the most
cutting-edge Perl release, IIRC).

For all but the simplest jobs, Python regexes should be compiled, as I
do above. In fact, I never bother using the uncompiled form directly. I think
the regex gets compiled when it is used even if you don't do it explicitly,
but an explicitly compiled regex can be stored and reused in several places.
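A small sketch of the two ways to call the same pattern, with made-up input lines. The re module does compile and cache the pattern string behind the scenes when you use the module-level functions, so the explicit form is mostly about having a named object to pass around:

```python
import re

lines = ["alpha 12", "beta", "gamma 7"]   # made-up input

# Module-level function: the pattern string is compiled (and cached) for you
direct = [bool(re.search(r'\d+', line)) for line in lines]

# Explicitly compiled: compile once, keep the object, reuse it anywhere
re_number = re.compile(r'\d+')
compiled = [bool(re_number.search(line)) for line in lines]

print(direct == compiled)
```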

Although it wouldn't surprise me to learn that Perl's regex engine is
slightly more optimized (seeing as it is used so much), I wouldn't want
to bet on it. I doubt you'll notice any difference even if one exists, and
the speedup from eliminating regexes where they don't belong would
probably wipe it out anyway.

>   Anyone have any information on this?  Any numbers, benchmarks?

No benchmarks, sorry.  I don't care enough about the speed. But I do
feel that Python regexes are both clearer and more flexible.  They encourage
code re-use and self-documentation.  And they can pretty much do whatever
their Perl equivalents can.

OTOH, Python programmers do not love them the way Perl programmers
do.  So they are used less.  This is not least because Python has a lot of
very powerful higher-level string manipulation tools.
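As a quick illustration of what I mean by string tools, here is a sketch that picks apart a header-style line without any regex at all. The input line is made up, and str.partition assumes a reasonably recent Python:

```python
line = "  Subject: Regular Expressions - Python vs Perl  "   # made-up input

cleaned = line.strip()                        # drop surrounding whitespace
is_subject = cleaned.startswith("Subject:")   # simple prefix test
header, _, value = cleaned.partition(":")     # split on the first colon only
words = value.split()                         # split on runs of whitespace

print(is_subject, header, words)
```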

> Thanks so much.  I know this is a python user group...but try to be as
> un-biased as you can.

Can't claim to be unbiased, sorry. ;-)

Cheers,
Terry

--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks  http://www.anansispaceworks.com



