What is wrong with this regex for matching emails?

Random832 random832 at fastmail.com
Mon Dec 18 01:43:22 EST 2017


On Sun, Dec 17, 2017, at 10:46, Chris Angelico wrote:
> But if you're trying to *validate* an email address - for instance, if
> you receive a form submission and want to know if there was an email
> address included - then my recommendation is simply DON'T. You can't
> get all the edge cases right; it is actually impossible for a regex to
> perfectly match every valid email address and no invalid addresses.

That's not actually true. The thing that notoriously can't be matched by
a regex, RFC822 "address", is basically most of the syntax of the To:
header. The part that is *the address* as we speak of it normally is
"addr-spec", which is in fact a regular language, though a regex to match
it goes on for a few hundred characters. The formal syntax also has some
surprising corners that might not reflect real-world implementations:
for example, a local-part may not begin or end with a dot or contain two
dots in a row (unless quoted - the suggestion someone else made that a
local-part may contain an @ sign also requires quoting). It's also
unfortunate that a domain-part may not end with a dot, since this
would provide a way to specify TLD-only addresses without allowing the
error of mistakenly leaving the TLD off of an address.
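
To make the dot rules concrete, here is a rough sketch in Python. It is
not the full few-hundred-character addr-spec regex: it assumes only the
unquoted "dot-atom" form on both sides of the @, and ignores quoted
local-parts, comments, domain literals and the obsolete syntax. The names
ATEXT, DOT_ATOM and is_simple_addr_spec are made up for illustration.

```python
import re

# Characters allowed in an unquoted atom (the "atext" set of RFC 5322).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# One or more atoms separated by single dots: no leading dot,
# no trailing dot, no two dots in a row.
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# Simplifying assumption: both local-part and domain are plain dot-atoms.
SIMPLE_ADDR_SPEC = re.compile(rf"{DOT_ATOM}@{DOT_ATOM}")


def is_simple_addr_spec(text):
    """Return True if text is an unquoted dot-atom@dot-atom address."""
    return SIMPLE_ADDR_SPEC.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["user@example.com", ".user@example.com",
                      "user.@example.com", "a..b@example.com",
                      "a@b@example.com", "user@example.com.", "user@com"]:
        print(candidate, is_simple_addr_spec(candidate))
```

Note that "user@com" is accepted: the syntax alone cannot tell a
TLD-only address from one where the TLD was left off by mistake.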

> And that's only counting *syntactically* valid - it doesn't take into
> account the fact that "blah at junk.example.com" is not going to get
> anywhere. So if you're trying to do validation, basically just don't.

The recommendation still stands, of course - this script is probably not
the place to explore these obscure corners. If the email address is
important, you can send a confirmation link to it and wait for the
recipient to click it. If it's not, don't bother at all.
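
A minimal sketch of the confirmation-link idea, assuming an in-memory
dict stands in for real storage and that actually sending the mail
happens elsewhere. The names PENDING, start_confirmation and confirm, and
the https://example.com/confirm URL, are all hypothetical.

```python
import secrets

# Hypothetical in-memory store mapping token -> unconfirmed address.
# A real application would persist this and expire old entries.
PENDING = {}


def start_confirmation(address, base_url="https://example.com/confirm"):
    """Record the address under a random token and return the link to mail out."""
    token = secrets.token_urlsafe(32)
    PENDING[token] = address
    return f"{base_url}?token={token}"


def confirm(token):
    """Return the confirmed address, or None if the token is unknown or already used."""
    return PENDING.pop(token, None)
```

No regex is involved: an address counts as good only once somebody
clicks the link that was mailed to it.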


