Regular Expression Hell?

clee at thalamus.wustl.edu
Mon May 8 19:35:21 EDT 2000


In article <8f7d9n$ii7 at autodesk.autodesk.com>,
  "Akira Kiyomiya" <akira.kiyomiya at autodesk.com> wrote:
> Okay, these two regular expression codes are from Python Essential
> Reference book and I am pretty confused about these.
>
> Could someone dare to explain step by step?

I will give it a try (I don't have the manuals in front of me, so
beware of mistakes):
> e_mail = re.compile(r'([a-zA-Z][\w-]*@[\w-]+(?:\.[w-]+)*)')

1. The leading [a-zA-Z] requires the address to start with a letter.
[\w-]* confuses me a little: \w by itself stands for [a-zA-Z0-9_], so
with the * it matches 0 or more repetitions of those characters.  The
'-' sign is a little confusing because in square brackets it usually
denotes a range, but placed last in the brackets it has no range to
complete, so it simply matches itself.

2.  After matching (1) we require an @ sign followed by 1 or more
repetitions of [\w-].
3.a  The (?:  ) construct allows you to define a group without it being
captured for later use in the .group(n) function.
3.b. The (?:\.[\w-]+)* then matches 0 or more instances of a literal
'.' character followed by 1 or more of the characters in the class
defined by [\w-] (1).  (I'm assuming that you left out the backslash
for \w above by accident.)

So yes, this looks like it would match an email address to me.
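
Here is a small sketch for checking the steps above at the interpreter.
It assumes the missing backslash is restored (so the last class reads
[\w-]), and the sample address is made up for illustration:

```python
import re

# Assumption: the "[w-]" in the quoted pattern was meant to be "[\w-]".
e_mail = re.compile(r'([a-zA-Z][\w-]*@[\w-]+(?:\.[\w-]+)*)')

# A '-' placed last inside [...] is a literal hyphen, not a range.
hyphen_ok = re.match(r'[\w-]+$', 'foo-bar_1') is not None

# search() finds the address inside a longer string; group(1) is the
# outer capturing group. The address is a hypothetical example.
m = e_mail.search('mail me at jane-doe@mail.example.com today')
address = m.group(1)

# (?:...) groups without capturing, so the pattern has only one group.
group_count = e_mail.groups

print(hyphen_ok, address, group_count)
```

The .groups attribute reports 1 because only the outer parentheses
capture; the (?: ) part is left out of the numbering.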

> URL = re.compile(r'((ftp | http)://[\w-]+(?:\.[\w-]+)*(?:/[\w-]*)*)')
>
> # I know ftp or http part plus "://" you need it for URL.  Then,.....
> I am lost....
>
> Akira
>

If you understand my explanation for the first regexp, the second one
should make sense now.  (Assuming I haven't made a mistake.)
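
A quick way to try the second pattern, as a sketch only.  I am assuming
the spaces inside "(ftp | http)" are a typing slip, since without the
re.VERBOSE flag they would be matched literally; the sample text is
made up:

```python
import re

# Assumption: "(ftp | http)" was meant to be "(ftp|http)". As quoted,
# the alternatives would be "ftp " and " http", spaces included.
URL = re.compile(r'((ftp|http)://[\w-]+(?:\.[\w-]+)*(?:/[\w-]*)*)')

m = URL.search('see http://www.python.org/doc/howto/ for more')
whole = m.group(1)    # the entire URL (outer group)
scheme = m.group(2)   # 'ftp' or 'http' (inner group)

print(whole, scheme)
```

The pieces line up with the e-mail pattern: a host name made of [\w-]+
parts joined by dots, then 0 or more "/segment" parts.  Note that a
path segment cannot contain a '.', so a match would stop before
something like ".html".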

Andrew Kuchling has written a great intro to using re's, the Regular
Expression HOWTO, at http://python.org.  You should check it out.

Good luck,

-chris

