Regular Expressions

John Machin sjmachin at lexicon.net
Sun Feb 11 15:50:36 EST 2007


On Feb 12, 3:35 am, "deviantbunnyl... at gmail.com"
<deviantbunnyl... at gmail.com> wrote:
> > That's a little harsh -- regexes have their place, together with pointer
> > arithmetic, bit manipulations, reverse polish notation and goto. The
> > problem is when people use them inappropriately e.g. using a regex when a
> > simple string.find will do.
>
> > > A quote attributed variously to
> > > Tim Peters and Jamie Zawinski says "Some people, when confronted with a
> > > problem, think 'I know, I'll use regular expressions.' Now they have two
> > > problems."
>
> > I believe that is correctly attributed to Jamie Zawinski.
>
> > --
> > Steven
>
> So as a newbie, I have to ask. I've played with the re module for a
> while now, and I think regular expressions are super fun and useful.
> As for them being a problem, I've found they can be tricky, and
> sometimes the regexes I've devised do unexpected things (I can think
> of two instances where the unexpected thing was something I had hoped
> to get to further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use.

Regexes are not "bad". However, people tend to misuse them, whether
as overkill (like Gabriel's date-splitting example) or as underkill
-- see your next sentence :-)

> I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?

Text: Paul McGuire's pyparsing module (Google is your friend); read
David Mertz's book on text processing with Python (free download, I
believe); modules for specific data formats, e.g. csv.
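
As a rough illustration of the "module for a specific format" route,
here is a minimal sketch using the csv module. The sample data and the
names `sample` and `rows` are made up for the example; with a real file
you would pass the open file object to csv.reader instead.

```python
import csv
import io

# Hypothetical sample data: the first field contains a comma and the
# second contains doubled quotes, which a naive split(',') or a quick
# regex tends to get wrong.
sample = 'name,comment\n"Smith, J.","said ""hi"""\n'

# csv.reader handles the quoting rules and yields one list per row.
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(row)
```

The point is only that the quoting rules are already handled by the
module, so there is nothing to devise or debug.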

HTML: htmllib and HTMLParser (both in the Python library),
BeautifulSoup (again GIYF)
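
A minimal sketch of the HTMLParser approach, pulling the href out of
each <a> tag. It is written for current Python 3, where the class lives
in html.parser rather than in a module named HTMLParser; the class name
`LinkCollector` and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Made-up input with mixed case and mixed quoting styles.
page = (
    '<p>See <a href="http://example.com/a">A</a> and '
    "<A HREF='b.html' class=x>B</A>.</p>"
)

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
```

Mixed case, single versus double quotes and extra attributes are all
dealt with by the parser, which is where hand-rolled regexes for HTML
usually start to go wrong.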

XML: xml.* in the Python library. ElementTree (recommended) is
included as of Python 2.5; use xml.etree.cElementTree, the faster C
implementation.
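
A minimal ElementTree sketch, again written for current Python 3. There
the plain xml.etree.ElementTree import picks up the C accelerator
automatically, and the separate cElementTree name was removed in 3.9.
The document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; with a file you would use ET.parse(path).getroot().
doc = """<library>
  <book year="2005"><title>First Example</title></book>
  <book year="2007"><title>Second Example</title></book>
</library>"""

root = ET.fromstring(doc)

# findall() walks the direct <book> children; get() reads an attribute
# and findtext() returns the text of the first matching child element.
titles = [(book.get("year"), book.findtext("title"))
          for book in root.findall("book")]

for year, title in titles:
    print(year, title)
```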

HTH,
John



