What is wrong with this regex for matching emails?

Chris Angelico rosuav at gmail.com
Tue Dec 19 16:21:02 EST 2017


On Wed, Dec 20, 2017 at 7:21 AM, alister via Python-list
<python-list at python.org> wrote:
> On Mon, 18 Dec 2017 07:57:27 +1100, Ben Finney wrote:
>> A more correct match would boil down to:
>>
>> * Match any printable Unicode characters (not just ASCII).
>>
>> * Locate the *last* ‘@’ character. (An email address may contain more
>>   than one ‘@’ character; you should allow any printable ASCII character
>>   in the local part.)
>>
>> * Match the domain part as the text after the last ‘@’ character. Match
>>   the local part as anything before that character. Reject an address
>>   that has either of these empty.
>>
>> * Validate the domain by DNS request. Your program is not an authority
>>   for what domains are valid; the only authority for that is the DNS.
>>
>> * Don't validate the local part at all. Your program is not an authority
>>   for what local parts are accepted to the destination host; the only
>>   authority for that is the destination mail host.
>
> At which point you have basically boiled your test down to
> <Anything>@<anything>.<anything> which is rather pointless

Not quite. Firstly, I would exclude all whitespace from your matchable
characters; even though technically you CAN have spaces in email
addresses, that'll almost never happen, and it's a lot more common to
delimit with whitespace. Secondly, there's actually no requirement to
have a dot in the domain part (and Ben never said so). However, you can
straightforwardly validate the domain by attempting a lookup.

rosuav@sikorsky:~$ dig +short mx ntlworld.com
1 mx.tb.ukmail.iss.as9143.net.
1 mx.mnd.ukmail.iss.as9143.net.
rosuav@sikorsky:~$ dig +short mx benfinney.id.au
10 in1-smtp.messagingengine.com.
20 in2-smtp.messagingengine.com.
rosuav@sikorsky:~$ dig +short mx dud.example.off.rosuav.com
rosuav@sikorsky:~$

If there are no MX records for a domain, either the domain doesn't
exist, or it almost certainly doesn't receive mail. (Remove the "+short"
for a more verbose report, in which case a nonexistent domain shows up
with a status of NXDOMAIN.)
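
The checks discussed above (no whitespace, split at the last '@', reject
an empty part, then ask the DNS) could be sketched in Python roughly like
this. The names split_address and has_mx are made up for illustration,
the "run" parameter exists only so the lookup can be faked in tests, and
the sketch assumes "dig" is on your PATH. Treat it as a starting point,
not a finished validator.

```python
import subprocess


def split_address(text):
    """Split at the last '@'; return (local, domain), or None if unusable."""
    # Exclude anything containing whitespace, as suggested above.
    if any(ch.isspace() for ch in text):
        return None
    local, at, domain = text.rpartition("@")
    # Reject a missing '@' or an empty part on either side of it.
    # Note that a dot in the domain is deliberately NOT required.
    if not at or not local or not domain:
        return None
    return local, domain


def has_mx(domain, run=subprocess.run):
    """Return True if `dig +short mx <domain>` prints at least one record.

    `run` defaults to subprocess.run and is injectable only for testing;
    that parameter is an assumption of this sketch. If dig is not
    installed, subprocess.run raises FileNotFoundError.
    """
    # Guard: dig would treat a leading '-' or '+' as an option, not a name.
    if domain.startswith(("-", "+")):
        return False
    result = run(
        ["dig", "+short", "mx", domain],
        capture_output=True, text=True, timeout=10,
    )
    # With +short, no output means no MX records were returned.
    return bool(result.stdout.strip())
```

You'd call split_address() first and only pass the domain half to
has_mx() when it returns a pair. The local part is left alone, as Ben
recommended.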

> there are only 2 reasons why you would want an email anyway
>
> 1) Data mining, just to add to your mailing list- in which case even if
> it validates you still don't know if it is a fake address to prevent spam
> so validating is pointless
>
> 2) it is part of a registration process, in which case if it is incorrect
> the registration email will not be received & registration cannot be
> completed so self validating without any effort.

3) You're building a text display system (forum posts, text chat, etc,
etc) and want to have web links and email addresses automatically
become clickable
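
For that use case, a minimal sketch of picking candidate addresses out
of running text might look like the following. It leans on the
"delimit with whitespace" idea and nothing more; find_candidates and the
set of trimmed punctuation are my own assumptions, and turning the
results into actual links (with proper escaping) is left out.

```python
import re

# A run of non-whitespace with an '@' that has something on both sides.
CANDIDATE = re.compile(r"\S+@\S+")

# Punctuation that usually belongs to the sentence, not the address.
# This particular set is an assumption, not a rule from any standard.
TRIM = ".,;:!?()<>[]\"'"


def find_candidates(text):
    """Return whitespace-delimited tokens that look like email addresses."""
    found = []
    for match in CANDIDATE.finditer(text):
        word = match.group().strip(TRIM)
        # After trimming, both sides of the last '@' must still be non-empty.
        local, _, domain = word.rpartition("@")
        if local and domain:
            found.append(word)
    return found
```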

4) You ask a user to provide "contact information". If s/he provides
an email address, you automatically send emails; if a phone number,
you automatically send SMS/text messages; otherwise, you leave it up
to a human to contact the user.
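
A rough sketch of that kind of dispatch, purely for illustration:
classify_contact is a hypothetical name, and the phone-number rule
(7 to 15 digits once common separators are stripped) is an assumption
of mine, not something any standard mandates for free-form input.

```python
def classify_contact(text):
    """Return 'email', 'phone', or 'human' for a free-form contact string."""
    text = text.strip()

    # Email: no whitespace, and something on both sides of the last '@'.
    local, at, domain = text.rpartition("@")
    if at and local and domain and not any(ch.isspace() for ch in text):
        return "email"

    # Phone (assumed rule): only digits remain after dropping a leading '+'
    # and common separators, and there are between 7 and 15 of them.
    digits = text.lstrip("+")
    for sep in " -().":
        digits = digits.replace(sep, "")
    if digits.isdigit() and 7 <= len(digits) <= 15:
        return "phone"

    # Anything else is left for a human to deal with.
    return "human"
```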

Plenty of possibilities beyond those two. Don't assume there's nothing
else that can be done just because your imagination can't come up with
anything :)

ChrisA


