Help with Regex for domain names

Aahz aahz at pythoncraft.com
Sun Aug 2 16:15:00 EDT 2009


In article <mailman.3998.1248989346.8015.python-list at python.org>,
MRAB  <python at mrabarnett.plus.com> wrote:
>Nobody wrote:
>> On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:
>> 
>>>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>>> You might also want to consider that some country
>>> codes such as "co" for Colombia might match more than
>>> you want, for example:
>>>
>>>   re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
>>>
>>> will match.
>> 
>> ... so put \b at the end, i.e.:
>> 
>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')
>> 
>It would still match "www.bbc.co.uk", so you might need:
>
>regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b(?!\.\b)')
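
For reference, here is a quick sketch of how the three quoted variants
behave. It assumes "co" is added to the alternation (the quoted patterns
list only us|au|de, so the problem cases would not trigger otherwise);
the names TLDS and matched() are made up for illustration.

```python
import re

# Hypothetical TLD list: "co" is added to the quoted (us|au|de) so the
# problem cases from the thread actually trigger.
TLDS = r'(?:us|au|de|co)'

plain = re.compile(r'[\w\-\.]+\.' + TLDS)
bounded = re.compile(r'[\w\-\.]+\.' + TLDS + r'\b')
guarded = re.compile(r'[\w\-\.]+\.' + TLDS + r'\b(?!\.\b)')

def matched(regex, text):
    """Return the text matched at the start of `text`, or None."""
    m = regex.match(text)
    return m.group(0) if m else None

print(matched(plain, 'foo.boo.com'))       # foo.boo.co
print(matched(bounded, 'foo.boo.com'))     # None
print(matched(bounded, 'www.bbc.co.uk'))   # www.bbc.co
print(matched(guarded, 'www.bbc.co.uk'))   # None
print(matched(guarded, 'foo.example.de'))  # foo.example.de
```

The \b stops "co" from matching inside "com", and the negative lookahead
rejects a country code that is followed by another dotted label.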

If it's a string containing just the candidate domain, you can do:

regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)$')
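
A minimal sketch of the anchored check, with one caveat: $ also matches
just before a trailing newline, so input that has not been stripped can
slip through. The stricter variant below uses fullmatch(), which assumes
Python 3.4 or later; the names anchored, bare and is_candidate() are made
up for illustration.

```python
import re

# The anchored pattern from above, plus an unanchored copy for fullmatch().
anchored = re.compile(r'[\w\-\.]+\.(?:us|au|de)$')
bare = re.compile(r'[\w\-\.]+\.(?:us|au|de)')

def is_candidate(domain):
    """Stricter check: the whole string must be the domain."""
    return bare.fullmatch(domain) is not None

print(bool(anchored.match('example.de')))     # True
print(bool(anchored.match('www.bbc.co.uk')))  # False
print(bool(anchored.match('example.de\n')))   # True: $ matches before a final newline
print(is_candidate('example.de\n'))           # False
```

If the string may carry surrounding whitespace, calling strip() on it
before matching is probably the simplest fix.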
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer


