What is wrong with this regex for matching emails?

alister alister.ware at ntlworld.com
Tue Dec 19 15:21:52 EST 2017


On Mon, 18 Dec 2017 07:57:27 +1100, Ben Finney wrote:

> Peng Yu <pengyu.ut at gmail.com> writes:
> 
>> Hi,
>>
>> I would like to extract "abc at efg.hij.xyz". But it only shows ".hij".
> 
> Others have addressed this question. I'll answer a separate one:
> 
>> Does anybody see what is wrong with it? Thanks.
> 
> One thing that's wrong with it is that it is far too restrictive.
> 
>> email_regex =
>> re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')
> 
> This excludes a great many email addresses that are valid. Please don't
> try to restrict a match for email addresses that will exclude actual
> email addresses.
> 
> For an authoritative guide to matching email addresses, see RFC 3696 §3
> <URL:https://tools.ietf.org/html/rfc3696#section-3>.
> 
> A more correct match would boil down to:
> 
> * Match any printable Unicode characters (not just ASCII).
> 
> * Locate the *last* ‘@’ character. (An email address may contain more
>   than one ‘@’ character; you should allow any printable ASCII character
>   in the local part.)
> 
> * Match the domain part as the text after the last ‘@’ character. Match
>   the local part as anything before that character. Reject an address
>   that has either of these empty.
> 
> * Validate the domain by DNS request. Your program is not an authority
>   for what domains are valid; the only authority for that is the DNS.
> 
> * Don't validate the local part at all. Your program is not an authority
>   for what local parts are accepted to the destination host; the only
>   authority for that is the destination mail host.

At which point you have basically boiled your test down to
<Anything>@<anything>.<anything>, which is rather pointless.

There are only 2 reasons why you would want an email address anyway:

1) Data mining, just to add to your mailing list. In that case, even if
it validates, you still don't know whether it is a fake address given to
avoid spam, so validating is pointless.

2) It is part of a registration process, in which case if it is incorrect
the registration email will not be received and registration cannot be
completed, so it is self-validating without any effort.
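
For reference, the syntactic part of the quoted steps (split at the last
'@', reject an empty local or domain part) might look roughly like the
sketch below. This is an illustration only, not code posted in this
thread: the function name split_address is made up, str.isprintable() is
only an approximation of "printable" (it accepts spaces), and the DNS
lookup of the domain is left out entirely.

```python
def split_address(address):
    """Return (local, domain) split at the last '@', or None if rejected.

    Hypothetical helper: it performs only the syntactic steps and does
    not look up the domain in the DNS.
    """
    # Approximate the "printable characters only" step.
    if not address.isprintable():
        return None
    # rpartition splits at the LAST '@', so the local part may itself
    # contain '@' characters.
    local, sep, domain = address.rpartition("@")
    # Reject when there is no '@' at all, or when either side is empty.
    if not sep or not local or not domain:
        return None
    return local, domain


if __name__ == "__main__":
    samples = ["abc@efg.hij.xyz", '"odd@local"@example.org',
               "no-at-sign", "@example.org", "abc@"]
    for candidate in samples:
        print(repr(candidate), "->", split_address(candidate))
```

Anything that passes this check still has to be confirmed the usual way,
by sending mail to it and seeing whether it arrives.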




-- 
OMNIVERSAL AWARENESS??  Oh, YEH!!  First you need four GALLONS of JELL-O
and a BIG WRENCH!! ... I think you drop th'WRENCH in the JELL-O as if
it was a FLAVOR, or an INGREDIENT ... ... or ... I ... um ... WHERE'S
the WASHING MACHINES?


