OT: regex to find email
Remy Blank
remy.blank_asps at pobox.com
Tue Sep 21 12:53:33 EDT 2004
Josh Close wrote:
> Ok, I see how this works.... but now how would I add {3,64} for the id
> and {3,255} for the domain? I forgot to throw that part in earlier. I
> believe a valid id is 3-64 chars and domain is 3-255 chars.
>
> So basically like this
>
> [\w[\w\._-]*]{3,64}@[[\w\._-]{3,255}\.[\w\._-]+]
>
> ......I know that won't work, but I'd like to verify that the id is
> 3-64 chars long, and doesn't start with -._ and the domain is 3-255
> chars long and doesn't start with -._ but must have a dot and tld's
> like .com.au need to be accounted for also.
Let's see. Testing for a 3-64 char id should be easy:
[a-zA-Z0-9][\w\.-]{2,63}@ ...
You can't use \w in the first bracket, because you want to exclude
the underscore.
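To try the id part on its own, here is a minimal Python 3 sketch (the names ID_RE and valid_id are hypothetical). It uses re.fullmatch so the whole string has to match, and passes re.ASCII because in Python 3 \w also matches non-ASCII letters by default.

```python
import re

# Id: first char must be a letter or digit (no underscore, dot or hyphen),
# followed by 2-63 letters, digits, underscores, dots or hyphens,
# so 3-64 chars in total.
ID_RE = re.compile(r"[a-zA-Z0-9][\w\.-]{2,63}", re.ASCII)

def valid_id(s):
    """Return True if the whole string s is an acceptable id."""
    return ID_RE.fullmatch(s) is not None

print(valid_id("josh"))      # True
print(valid_id("_josh"))     # False: starts with an underscore
print(valid_id("ab"))        # False: only 2 chars
print(valid_id("a" * 65))    # False: 65 chars
```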
About the domain, I can't remember whether the total length is limited
or each individual component is. The latter case is easy (say,
for components with lengths 3-64):
... @([\w-]{3,64}\.)+[\w-]{3,64}
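A quick way to check the per-component domain pattern in Python 3, as a sketch using the same illustrative 3-64 limits (DOMAIN_RE and valid_domain are hypothetical names). One thing it makes visible: with a 3-char minimum per component, a two-letter country code such as "au" in .com.au is rejected, so the lower bound would need to drop to match those.

```python
import re

# Domain: one or more "component." pieces followed by a final component,
# each component 3-64 letters, digits, underscores or hyphens.
# The 3-64 limits are only the example values from the text.
DOMAIN_RE = re.compile(r"([\w-]{3,64}\.)+[\w-]{3,64}", re.ASCII)

def valid_domain(s):
    """Return True if the whole string s matches the domain pattern."""
    return DOMAIN_RE.fullmatch(s) is not None

print(valid_domain("python.org"))      # True
print(valid_domain("example.com.au"))  # False: "au" is shorter than 3
print(valid_domain("localhost"))       # False: no dot at all
```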
But I suspect this is not yet what you want. If you want to make
sure the total length of the domain is 3-255 chars, you'll have
to extract it after a match and check its length. Extraction could
be done with a named group:
... @(?P<domain>([\w-]{3,64}\.)+[\w-]{3,64})
I'm not sure how nested groups are handled, though. Combining both
parts and defining a group for the id as well gives:
(?P<id>[a-zA-Z0-9][\w\.-]{2,63})@(?P<domain>([\w-]{3,64}\.)+[\w-]{3,64})
(All on one line, obviously)
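A minimal Python 3 sketch of the match-then-check approach, assuming the same illustrative 3-64 component limits (check_address and the sample address are hypothetical). The pattern is split across two adjacent string literals, which Python joins into the single-line pattern above. It also shows how Python's re handles the nested group: the outer named group captures the whole domain, while the inner unnamed group is still numbered (3, after id and domain) and only keeps its last repetition.

```python
import re

# Combined pattern; the two raw strings are concatenated into one pattern.
EMAIL_RE = re.compile(
    r"(?P<id>[a-zA-Z0-9][\w\.-]{2,63})"
    r"@(?P<domain>([\w-]{3,64}\.)+[\w-]{3,64})",
    re.ASCII,
)

def check_address(addr):
    """Return (id, domain) if addr matches and the domain is 3-255 chars, else None."""
    m = EMAIL_RE.fullmatch(addr)
    if m is None:
        return None
    domain = m.group("domain")
    # The regex only limits each component; the total domain length
    # has to be checked separately after the match.
    if not 3 <= len(domain) <= 255:
        return None
    return m.group("id"), domain

m = EMAIL_RE.fullmatch("someone@mail.example.com")
print(m.group("id"))      # someone
print(m.group("domain"))  # mail.example.com
print(m.group(3))         # example.  (last repetition of the inner group)
```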
HTH,
-- Remy
Remove underscore and suffix in reply address for a timely response.
More information about the Python-list mailing list