What is wrong with this regex for matching emails?

Random832 random832 at fastmail.com
Tue Dec 19 17:44:37 EST 2017


On Mon, Dec 18, 2017, at 02:01, Chris Angelico wrote:
> Hmm, is that true? I was under the impression that the quoting rules
> were impossible to match with a regex.  Or maybe it's just that they're
> impossible to match with a *standard* regex, but the extended
> implementations (including Python's, possibly) are able to match them?

What's impossible to match with a regex are the comments permitted by RFC 822, which are delimited by balanced parentheses (AIUI Perl can do it, Python can't). Those comments are, according to my argument, not part of the address.
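
To illustrate why nesting is the sticking point: balanced parentheses need a depth counter, which Python's `re` module has no way to express, but a few lines of ordinary code handle it. The following is only a rough sketch. The function name `strip_comments` is made up for this example, and it assumes parentheses inside double-quoted strings do not need special treatment, which a real RFC 822 parser would have to handle.

```python
def strip_comments(text):
    """Remove parenthesised comments, which may nest, from an address string.

    Simplifying assumption: quoted strings ("...") are not treated specially.
    """
    out = []
    depth = 0  # current nesting level of open parentheses
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # A backslash quotes the next character, so skip the pair as a unit;
            # keep it only when we are outside any comment.
            if depth == 0:
                out.append(text[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return "".join(out)
```

With the comments stripped this way, whatever remains could then be checked by a regex, which is the sense in which the comments are separable from the address itself.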

> Anyhow, it is FAR from simple; and also, for the purpose of "detect
> email addresses in text documents", not desirable. Same as with URL
> detection - it's better to have a handful of weird cases that don't
> autolink correctly than to mis-detect any address that's at the end of
> a sentence, for instance. For that purpose, it's better to ignore the
> RFC and just craft a regex that matches *common* email address
> formats.

Email addresses don't, according to the formal spec, allow a dot at the end of the domain part. I was half-seriously proposing that as an extension (since DNS names *do*).
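
For the "detect addresses in text" use case, the no-trailing-dot rule is actually convenient: a pattern that requires every dot in the domain to be followed by another label will leave a sentence-ending period alone. A minimal sketch, assuming only *common* address shapes matter; the pattern `EMAIL_RE` and the helper `find_addresses` are invented for this example and are nowhere near a full RFC 822 matcher.

```python
import re

# Hypothetical pattern for common addresses only; not an RFC 822 parser.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+"                   # local part (common characters only)
    r"@"
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"   # dot-separated labels, no trailing dot
)

def find_addresses(text):
    """Return every substring of text that looks like a common email address."""
    return EMAIL_RE.findall(text)
```

Calling `find_addresses("Mail random832@fastmail.com.")` picks out the address without the final period. Accepting a trailing dot as an extension would mean giving up exactly this behaviour, which is why the proposal is only half serious.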


