What is wrong with this regex for matching emails?

Ned Batchelder ned at nedbatchelder.com
Sun Dec 17 12:01:14 EST 2017


On 12/17/17 10:29 AM, Peng Yu wrote:
> Hi,
>
> I would like to extract "abc@efg.hij.xyz". But it only shows ".hij".
> Does anybody see what is wrong with it? Thanks.
>
> $ cat main.py
> #!/usr/bin/env python
> # vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:
>
> import re
> email_regex = re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')
> s = 'abc@efg.hij.xyz.'
> for email in re.findall(email_regex, s):
>      print email
>
> $ ./main.py
> .hij
>

There are two problems. First, the group at the end matches a single
.something, but you need one or more of those, so add a + after it.
Second, when the pattern contains a capturing group, re.findall returns
only the text that group matched rather than the whole match, so you
need to change your final group to a non-capturing group, with (?:...)

     email_regex = re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+')
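To see why both changes are needed, here is a small sketch comparing the
patterns on the sample string. It assumes Python 3 and raw-string
literals (the original post used Python 2), and the variable names
original, plus_only, and fixed are just for illustration.

```python
import re

s = 'abc@efg.hij.xyz.'

# Original pattern: one capturing group, matched exactly once.
original = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')

# Only the + added: the group repeats, but it is still capturing,
# so findall returns the last repetition of the group.
plus_only = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+')

# Both fixes: non-capturing group, repeated one or more times.
fixed = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+')

original_result = original.findall(s)
plus_only_result = plus_only.findall(s)
fixed_result = fixed.findall(s)

print(original_result)   # ['.hij']
print(plus_only_result)  # ['.xyz']
print(fixed_result)      # ['abc@efg.hij.xyz']
```

With no capturing groups in the pattern, findall returns the full text
of each match, and the trailing period in the sample string is left out
because nothing follows it.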

--Ned.


