Regular Expression Hell?

Trent Mick trentm at activestate.com
Mon May 8 19:45:19 EDT 2000


On Mon, May 08, 2000 at 02:58:13PM -0700, Akira Kiyomiya wrote:
> e_mail = re.compile(r'([a-zA-Z][\w-]*@[\w-]+(?:\.[w-]+)*)')
> 
Break it into pieces:

(
  [a-zA-Z]        # exactly one alpha character (i.e. upper or
                  # lower case letter)
  [\w-]*          # any number, including zero (that is what the '*'
                  # means), of word characters (that is what '\w'
                  # means: letters, digits, or underscore) or hyphen
                  # characters (that is the '-')
                  # Note: if a hyphen is the last character in
                  # a [...] block then it is literal, i.e. means
                  # a hyphen rather than a range
                  # Note: for an email regexp, you probably want to
                  # allow periods as well (use '\.' in the [...] block)
  @               # the '@' character
  [\w-]+          # one or more (that is what '+' means) word
                  # or hyphen characters
  (?:\.[w-]+)*    # pretend the '?:' isn't there (it only makes the
                  # group non-capturing; you can look it up in the
                  # docs if you want), which leaves (\.[w-]+)*
                  # (
                  #   \.      # a literal period character
                  #   [w-]+   # one or more 'w' or '-' characters
                  #           # this is probably a bug, wanted [\w-]+
                  # )*        # zero or more of these blocks
)
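
To see the effect of the suspected typo, here is a small sketch that runs the
pattern as posted next to a version with [\w-] in the last group. The corrected
pattern is only my guess at what was intended, and the names `original`,
`fixed`, and `text`, plus the example.com address, are made up for
illustration. The corrected version is written with re.VERBOSE so the pieces
can carry the same comments as above.

```python
import re

# The pattern exactly as posted, including the suspected typo ([w-] not [\w-]).
original = re.compile(r'([a-zA-Z][\w-]*@[\w-]+(?:\.[w-]+)*)')

# Assumed fix: same pattern, but [\w-] in the final group.
# re.VERBOSE ignores whitespace and '#' comments outside [...] blocks.
fixed = re.compile(r'''
    (
      [a-zA-Z]         # exactly one letter
      [\w-]*           # zero or more word characters or hyphens
      @                # a literal at-sign
      [\w-]+           # one or more word characters or hyphens
      (?:\.[\w-]+)*    # zero or more ".label" blocks, non-capturing
    )
''', re.VERBOSE)

text = "write to someone@example.com for details"  # hypothetical sample input

# The posted pattern stops at the first period, because "com" contains
# neither 'w' nor '-'.
print(original.search(text).group(1))

# The assumed fix picks up the rest of the domain.
print(fixed.search(text).group(1))

# Because of '?:' only the outer parentheses produce a group.
print(fixed.search(text).groups())
```

Note that even the corrected pattern does not allow periods before the '@',
so an address like first.last@example.com would only match from "last" on.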

I'll leave the URL one to you.


Trent

-- 
Trent Mick
trentm at activestate.com



