newbie re question

Gonçalo Rodrigues op73418 at mail.telepac.pt
Fri Nov 8 08:04:03 EST 2002


On Thu, 07 Nov 2002 15:53:19 GMT, Robin Munn <rmunn at pobox.com> wrote:

>On Thu, 07 Nov 2002 at 04:58 GMT, Bengt Richter <bokr at oz.net> wrote:
>> On Wed, 06 Nov 2002 21:52:45 +0000, Gonçalo Rodrigues <op73418 at mail.telepac.pt> wrote:
>> 
>>>On Wed, 06 Nov 2002 21:17:12 GMT, "Fredrik Lundh"
>>><fredrik at pythonware.com> wrote:
>>>
>>>>Gonçalo Rodrigues wrote:
>>>>
>>>>> I've been trying to grok re's and settled myself a little exercise:
>>>>> concoct a re for a Python identifier.
>>>>>

[text snipped]

>
>This whole discussion is starting to remind me of a quote that I used to
>have hanging on my cubicle wall:
>
>    Some people, when confronted with a problem, think "I know, I'll use
>    regular expressions." Now they have two problems.
>                                                       - Jamie Zawinski
>
>Are re's really what you need to use here? Correct me if I'm wrong, but
>it looks like what you're trying to do is split the string into words
>with a regexp, then apply rules about which characters may or may not be
>part of the word. For the first part of that, why not use ''.split()?
>Then search the list of words with a much simpler regexp:
>
>>>> import re
>>>> pattern = re.compile(r'^[a-zA-Z_][a-zA-Z_\.]*$')
>>>> list_to_search = 'aadf cdase b ad:aa aasa a.aa a@ aa _aa _aafr@ aa_aa aa__a?jk'.split()
>>>> list_to_search
>['aadf', 'cdase', 'b', 'ad:aa', 'aasa', 'a.aa', 'a@', 'aa', '_aa', '_aafr@', 'aa_aa', 'aa__a?jk']
>>>> result_list = [item for item in list_to_search if pattern.match(item)]
>>>> result_list
>['aadf', 'cdase', 'b', 'aasa', 'a.aa', 'aa', '_aa', 'aa_aa']
>
>(My apologies for the more-than-80-chars lines here).
>
>My rule of thumb: if a regular expression takes more than ten seconds to
>grok *in its entirety*, it's too complicated, and another solution
>should be sought. Now sometimes there won't be an elegant solution and
>you wind up having to use a complicated regexp, but usually just trying
>to think about other solutions will help clarify the problem in your
>mind.
>
>BTW, the Jamie Zawinski quote above came from a Slashdot discussion:
>
>    http://slashdot.org/articles/99/06/01/2122209.shtml
>
>Scroll up about one or two pages from the bottom of the article to find
>the post in which that quote appears.

The first sentence of my first post, the one that started the thread,
reads (and sorry for quoting myself):

>>>>> I've been trying to grok re's and settled myself a little exercise:
>>>>> concoct a re for a Python identifier.

I'm not trying to solve any particular problem, just trying to get the
hang of re's - which I will *need* further down the road.

Anyway, thanks for the reminder. This re business is very tricky and I
will probably end up, for the specific problems I have to solve, with a
mixed approach, applying re's only to the most specific and
easily-describable patterns.
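
A minimal sketch of what such a mixed approach might look like for the
identifier exercise: split on whitespace first, then apply one short re
to each word. The names IDENTIFIER and find_identifiers are hypothetical,
made up for illustration. The pattern assumes ASCII-only identifiers, and
re.fullmatch assumes a reasonably recent Python 3.

```python
import keyword
import re

# Assumption: ASCII-only identifiers (a letter or underscore, followed by
# letters, digits, or underscores). Non-ASCII identifiers are not handled.
IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def find_identifiers(text):
    """Return the whitespace-separated words that look like identifiers."""
    words = text.split()  # plain string method, no re needed for this step
    return [
        word for word in words
        # fullmatch() requires the whole word to fit the pattern;
        # reserved words such as 'for' are rejected separately.
        if IDENTIFIER.fullmatch(word) and not keyword.iskeyword(word)
    ]


sample = 'aadf b ad:aa a.aa a@ _aa aa_aa x1 1x for'
print(find_identifiers(sample))
# prints ['aadf', 'b', '_aa', 'aa_aa', 'x1']
```

The re stays short enough to read at a glance, and the keyword check is
left to the standard keyword module instead of being folded into the
pattern.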

With my best regards,
G. Rodrigues


