Ann: Validating Emails and HTTP URLs in Python

Philip Semanchuk philip at semanchuk.com
Mon May 3 09:24:49 EDT 2010


On May 3, 2010, at 9:06 AM, andrew cooke wrote:

>
> Hi,
>
> The latest Lepl release includes an implementation of RFC 3696 - the
> RFC that describes how best to validate email addresses and HTTP
> URLs.  For more information please see http://www.acooke.org/lepl/rfc3696.html
>
> Lepl's main page is http://www.acooke.org/lepl
>
> Because Lepl compiles to regular expressions wherever possible, the
> library is quite fast - in testing I was seeing about 1ms needed to
> validate a URL.
>
> Please bear in mind that this is the very first release of this
> module, so it may have some bugs...  If you find any problems contact
> me and I'll fix them ASAP.

Thanks, Andrew, for contributing that to the open source community.

FYI, Fourthought's PyXML has a module called uri.py that contains
regexes for URL validation. I've run over a million URLs (harvested
from the Internet) through their code. I can't say I checked each and
every result, but I never saw anything that would lead me to believe
it was misbehaving.

It might be interesting to compare the results of running a large list  
of URLs through your code and theirs.
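
A comparison like that could be sketched roughly as follows. Both
validators here are hypothetical stand-ins written only for
illustration (they are not Lepl's or uri.py's actual APIs); the idea
is to wrap each real validator in a function that takes a URL and
returns True or False, then collect the URLs on which the two disagree.

```python
import re
from urllib.parse import urlsplit


def validator_a(url):
    """Hypothetical stand-in: accept http(s) URLs that have a host part."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs (e.g. bad IPv6 literals)
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Deliberately simplistic pattern, used only as a second stand-in.
_SIMPLE_URL = re.compile(r"^https?://[A-Za-z0-9.-]+(:\d+)?(/\S*)?$")


def validator_b(url):
    """Hypothetical stand-in: regex-based check."""
    return _SIMPLE_URL.match(url) is not None


def compare(urls, first, second):
    """Return (url, first_result, second_result) for every disagreement."""
    disagreements = []
    for url in urls:
        a, b = first(url), second(url)
        if a != b:
            disagreements.append((url, a, b))
    return disagreements


if __name__ == "__main__":
    sample = [
        "http://example.com/",
        "ftp://example.com/",
        "http://exa mple.com/",
    ]
    for url, a, b in compare(sample, validator_a, validator_b):
        print(f"{url!r}: first={a} second={b}")
```

With a large harvested list, the disagreements are the interesting
part: each one is a URL where at least one of the two validators is
probably wrong, which is a much smaller set to check by hand.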

Good luck
Philip




More information about the Python-list mailing list