What is wrong with this regex for matching emails?

alister alister.ware at ntlworld.com
Tue Dec 19 17:12:01 EST 2017


On Wed, 20 Dec 2017 08:21:02 +1100, Chris Angelico wrote:

> On Wed, Dec 20, 2017 at 7:21 AM, alister via Python-list
> <python-list at python.org> wrote:
>> On Mon, 18 Dec 2017 07:57:27 +1100, Ben Finney wrote:
>>> A more correct match would boil down to:
>>>
>>> * Match any printable Unicode characters (not just ASCII).
>>>
>>> * Locate the *last* ‘@’ character. (An email address may contain more
>>>   than one ‘@’ character; you should allow any printable ASCII
>>>   character in the local part.)
>>>
>>> * Match the domain part as the text after the last ‘@’ character. Match
>>>   the local part as anything before that character. Reject an address
>>>   that has either of these empty.
>>>
>>> * Validate the domain by DNS request. Your program is not an authority
>>>   for what domains are valid; the only authority for that is the DNS.
>>>
>>> * Don't validate the local part at all. Your program is not an authority
>>>   for what local parts are accepted to the destination host; the only
>>>   authority for that is the destination mail host.
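Ben's rules quoted above can be sketched in Python roughly as follows. This is a minimal sketch, not his code. `split_address` and `plausible_address` are hypothetical names, and the DNS lookup is passed in as a `domain_exists` callable (an assumption) so that the splitting logic can be exercised without network access.

```python
from typing import Callable, Optional, Tuple


def split_address(address: str) -> Optional[Tuple[str, str]]:
    """Split at the *last* '@' and return (local, domain).

    Returns None if there is no '@' or if either side is empty.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    return local, domain


def plausible_address(address: str, domain_exists: Callable[[str], bool]) -> bool:
    """Apply the quoted rules; the DNS check is delegated to `domain_exists`."""
    # Any printable Unicode character is allowed (str.isprintable is Unicode-aware).
    if not address.isprintable():
        return False
    parts = split_address(address)
    if parts is None:
        return False
    _local, domain = parts  # the local part is deliberately not validated
    return domain_exists(domain)


# A local part may itself contain '@'; only the last one separates the domain.
print(split_address('"odd@local"@example.com'))  # ('"odd@local"', 'example.com')
```

Using `str.rpartition` gives the "last ‘@’" behaviour directly, and leaving the local part untouched matches the rule that only the destination host can judge it.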
>>
>> At which point you have basically boiled your test down to
>> <anything>@<anything>.<anything>, which is rather pointless.
> 
> Not quite. Firstly, I would exclude all whitespace from your matchable
> characters; even though technically you CAN have spaces in email
> addresses, that'll almost never happen, and it's a lot more common to
> delimit with whitespace. Secondly, there's actually no requirement to have a
> dot in the domain part (and Ben never said so). However, you can
> straightforwardly validate the domain by attempting a lookup.
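The difference between the two summaries can be seen with Python's `re` module. This is a hedged illustration with made-up sample text: `DOTTED` encodes the "<anything>@<anything>.<anything>" reading, while `NO_SPACE` follows Chris's refinement of excluding whitespace without requiring a dot in the domain.

```python
import re

# "<anything>@<anything>.<anything>", with "anything" meaning non-whitespace
DOTTED = re.compile(r"\S+@\S+\.\S+")
# No whitespace on either side of the '@', and no dot required in the domain
NO_SPACE = re.compile(r"\S+@\S+")

text = "Mail root@localhost or ben@example.org today"
print(DOTTED.findall(text))    # ['ben@example.org']
print(NO_SPACE.findall(text))  # ['root@localhost', 'ben@example.org']
```

Because `\S+` is greedy and then backtracks, a token such as `a@b@c` is split at its last `@`, which is consistent with Ben's rule.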
> 
> rosuav at sikorsky:~$ dig +short mx ntlworld.com
> 1 mx.tb.ukmail.iss.as9143.net.
> 1 mx.mnd.ukmail.iss.as9143.net.
> rosuav at sikorsky:~$ dig +short mx benfinney.id.au
> 10 in1-smtp.messagingengine.com.
> 20 in2-smtp.messagingengine.com.
> rosuav at sikorsky:~$ dig +short mx dud.example.off.rosuav.com
> rosuav at sikorsky:~$
> 
> If there are no MX records for a domain, either the domain doesn't
> exist, or it doesn't receive mail. (Remove the "+short" for a more
> verbose report, in which case the failure state is a return code of
> NXDOMAIN.)
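The same check can be scripted from Python by shelling out to `dig` and parsing its `+short` output, which prints one "preference host" pair per MX record. This is a minimal sketch that assumes `dig` is installed and on the PATH; `parse_dig_mx` and `mx_records` are hypothetical names. Note that `+short` prints nothing both for a nonexistent domain and for an existing domain without MX records, so this sketch cannot tell those cases apart. Also, RFC 5321 falls back to the domain's A/AAAA address (an "implicit MX") when no MX record exists, so an empty result is a strong hint rather than proof that the domain cannot receive mail.

```python
import subprocess
from typing import List, Tuple


def parse_dig_mx(output: str) -> List[Tuple[int, str]]:
    """Parse `dig +short mx DOMAIN` output into (preference, host), lowest first."""
    records = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines, CNAMEs and ";;" diagnostics
        pref, host = parts
        try:
            records.append((int(pref), host.rstrip(".")))
        except ValueError:
            continue  # not an MX line
    return sorted(records)


def mx_records(domain: str) -> List[Tuple[int, str]]:
    """Run dig (assumed installed) and return MX records; [] if there are none."""
    result = subprocess.run(
        ["dig", "+short", "mx", domain],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_dig_mx(result.stdout)
```

For example, the ntlworld.com output quoted above parses to two preference-1 hosts, and the output for the dud domain parses to an empty list.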
> 
>> There are only two reasons why you would want an email address anyway:
>>
>> 1) Data mining, just to add to your mailing list, in which case even if
>> it validates you still don't know whether it is a fake address given to
>> avoid spam, so validating is pointless.
>>
>> 2) It is part of a registration process, in which case if it is
>> incorrect the registration email will not be received and registration
>> cannot be completed, so it is self-validating without any effort.
> 
> 3) You're building a text display system (forum posts, text chat, etc,
> etc) and want to have web links and email addresses automatically become
> clickable.

Possible, but again, if the people making the posts want to be contacted
they will list a working email address, and more likely it will be munged
to stop spammers from harvesting it. Otherwise, if an email address has to
be given, they will provide a valid-looking fake.
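For Chris's third case, auto-linking in a text display system, a sketch might look like the following. It is a heuristic under stated assumptions, not a validator: `linkify_emails` is a hypothetical name, candidate tokens follow Chris's "no whitespace" rule (parentheses and angle brackets are also excluded as an assumption), and trailing sentence punctuation is stripped. As alister notes, munged addresses such as "bob at example dot com" will simply not be linked.

```python
import html
import re

# Whitespace-delimited token around an '@'; parentheses and angle brackets are
# excluded so "(bob@example.com)" and "<bob@example.com>" link cleanly.
CANDIDATE = re.compile(r"[^\s()<>]+@[^\s()<>]+")
# Punctuation that usually ends a sentence rather than an address.
TRAILING = ".,;:!?"


def linkify_emails(text: str) -> str:
    """Return HTML-escaped text with email-like tokens wrapped in mailto links."""
    out = []
    pos = 0
    for m in CANDIDATE.finditer(text):
        token = m.group()
        addr = token.rstrip(TRAILING)
        tail = token[len(addr):]
        out.append(html.escape(text[pos:m.start()]))
        if addr.endswith("@"):
            # Nothing left after the '@' once punctuation is stripped: not a link.
            out.append(html.escape(addr))
        else:
            safe = html.escape(addr)  # also escapes quotes for the attribute
            out.append(f'<a href="mailto:{safe}">{safe}</a>')
        out.append(html.escape(tail))
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(linkify_emails("Mail me at bob@example.com."))
```

Escaping every non-link segment keeps user-supplied text from injecting markup, which matters more in a forum than rejecting odd-looking addresses.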
> 
> 4) You ask a user to provide "contact information". If s/he provides an
> email address, you automatically send emails; if a phone number, you
> automatically send SMS/text messages; otherwise, you leave it up to a
> human to contact the user.
> 
I can see that auto-detecting between a telephone number and an email
address may be a plausible desire, but now you have 2 problems: not only
are email addresses so difficult to validate that it is not worth the
effort, telephone numbers are also too variable to validate reliably
(assuming an international audience).
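Chris's fourth case amounts to a three-way dispatch. The sketch below is deliberately loose because, as alister points out, international phone numbers are hard to validate. `classify_contact` and its labels are hypothetical. The 15-digit ceiling comes from the E.164 numbering plan, while the 7-digit floor and the accepted separators are arbitrary assumptions. Anything ambiguous falls through to a human.

```python
import re


def classify_contact(info: str) -> str:
    """Return "email", "sms" or "human" for a free-form contact string."""
    info = info.strip()
    # Email: a single whitespace-free token containing an '@' (no validation).
    if re.fullmatch(r"\S+@\S+", info):
        return "email"
    # Phone: optional leading '+', then only digits and common separators.
    if re.fullmatch(r"\+?[\d\s().-]+", info):
        digits = len(re.sub(r"\D", "", info))
        # E.164 caps numbers at 15 digits; the lower bound of 7 is a guess.
        if 7 <= digits <= 15:
            return "sms"
    # Anything else is left for a person to handle.
    return "human"


for sample in ["bob@example.com", "+44 20 7946 0000", "ring me after 6pm"]:
    print(sample, "->", classify_contact(sample))
```

Erring towards "human" keeps a misclassification from silently sending a message to the wrong kind of address.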

> Plenty of possibilities beyond those two. Don't assume there's nothing
> else that can be done just because your imagination can't come up with
> anything :)

Indeed, the most obvious other reason is scraping web pages, newsgroups and
forums for email addresses to spam, and I am sure no-one wants to help with
that.

> 
> ChrisA





-- 
I know you believe you understand what you think this fortune says, but
I'm not sure you realize that what you are reading is not what it means.


