OT: regex to find email

Remy Blank remy.blank_asps at pobox.com
Tue Sep 21 12:53:33 EDT 2004


Josh Close wrote:
> Ok, I see how this works.... but now how would I add {3,64} for the id
> and {3,255} for the domain? I forgot to throw that part in earlier. I
> believe a valid id is 3-64 chars and domain is 3-255 chars.
> 
> So basically like this
> 
> [\w[\w\._-]*]{3,64}@[[\w\._-]{3,255}\.[\w\._-]+]
> 
> ......I know that won't work, but I'd like to verify that the id is
> 3-64 chars long, and doesn't start with -._ and the domain is 3-255
> chars long and doesn't start with -._ but must have a dot and tld's
> like .com.au need to be accounted for also.

Let's see. Testing for a 3-64 char id should be easy:

[a-zA-Z0-9][\w\.-]{2,63}@ ...

You can't use \w for the first character, because \w also matches
the underscore, which you want to exclude.
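
If you want to try the id part on its own, here is a minimal sketch in
Python. The helper name is_valid_id, the re.ASCII flag (so that \w stays
limited to [a-zA-Z0-9_]) and the use of fullmatch() are my assumptions
for illustration, not something the pattern itself requires:

```python
import re

# Id part only: one alphanumeric character, then 2-63 more from [\w.-].
# re.ASCII is an assumption here; without it \w also matches non-ASCII
# letters and digits.
ID_RE = re.compile(r"[a-zA-Z0-9][\w.-]{2,63}", re.ASCII)

def is_valid_id(text):
    """Return True if the whole string matches the id pattern."""
    return ID_RE.fullmatch(text) is not None

print(is_valid_id("josh.close"))  # True
print(is_valid_id("_josh"))       # False: starts with an underscore
print(is_valid_id("ab"))          # False: shorter than 3 chars
print(is_valid_id("a" * 65))      # False: longer than 64 chars
```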

About the domain, I can't remember whether the total length is
limited or each individual component is. The latter case is easy
(say, for components with lengths 3-64):

... @([\w-]{3,64}\.)+[\w-]{3,64}
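
A quick way to see what this per-component pattern accepts, again only
as a sketch: the helper name domain_ok, the re.ASCII flag and the sample
domains are made up for the example. Note that with a minimum of 3 per
component, a two-letter final component such as the "au" in .com.au
is rejected, so the lower bound would probably need to be relaxed for
that case:

```python
import re

# Domain part only: one or more "component." pieces, then a final
# component, each 3-64 chars from [\w-].
DOMAIN_RE = re.compile(r"([\w-]{3,64}\.)+[\w-]{3,64}", re.ASCII)

def domain_ok(text):
    """Return True if the whole string matches the domain pattern."""
    return DOMAIN_RE.fullmatch(text) is not None

print(domain_ok("example.com"))       # True
print(domain_ok("mail.example.com"))  # True
print(domain_ok("example"))           # False: no dot
print(domain_ok("example.com.au"))    # False: "au" has only 2 chars
```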

But I suspect this is not yet what you want. If you want to make
sure the total length of the domain is 3-255 chars, you'll have
to extract it after a match and check its length. Extraction could
be done with a named group:

... @(?P<domain>([\w-]{3,64}\.)+[\w-]{3,64})

Nested groups are not a problem here: the named group captures the
whole domain, and the inner group just keeps its last repetition.
Combining both parts and defining a group for the id as well gives:

(?P<id>[a-zA-Z0-9][\w\.-]{2,63})@(?P<domain>([\w-]{3,64}\.)+[\w-]{3,64})

(All on one line, obviously)
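
Putting the pieces together, the match-then-check-length idea might
look roughly like this in Python. Treat it as a sketch under a few
assumptions: check_email and max_domain_len are names I made up, the
255 limit is the figure from the question, and re.ASCII plus
fullmatch() are my choices rather than anything the pattern demands:

```python
import re

# The combined pattern, split over two string literals for readability.
EMAIL_RE = re.compile(
    r"(?P<id>[a-zA-Z0-9][\w.-]{2,63})"
    r"@(?P<domain>([\w-]{3,64}\.)+[\w-]{3,64})",
    re.ASCII,
)

def check_email(address, max_domain_len=255):
    """Return (id, domain) if the address passes, else None."""
    m = EMAIL_RE.fullmatch(address)
    if m is None:
        return None
    # The regex cannot limit the total domain length, so check it here.
    domain = m.group("domain")
    if len(domain) > max_domain_len:
        return None
    return m.group("id"), domain

print(check_email("remy@mail.example.com"))  # ('remy', 'mail.example.com')
print(check_email("_remy@example.com"))      # None

# Groups are numbered by opening parenthesis, so the inner repeated
# group is number 3 and holds only its last repetition.
m = EMAIL_RE.fullmatch("remy@mail.example.com")
print(m.group(3))  # example.
```

The lower bound of 3 on the domain needs no separate check, since the
pattern already requires at least two components of 3 chars each.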

HTH,
-- Remy


Remove underscore and suffix in reply address for a timely response.



