Searching for email address

Ian Bicking ianb at colorstudy.com
Tue Jul 30 21:48:43 EDT 2002


On Tue, 2002-07-30 at 20:11, Peter Hansen wrote:
> Graeme Longman wrote:
> > "Charl P. Botha" wrote:
> > > In article <mailman.1028038004.25205.python-list at python.org>,
> > > Graeme Longman wrote:
> > > > I'm writing some code which searches html for email addresses. Does
> > > > anyone out there have a good regular expression syntax for searching for
> > > > the email string, I'm sure this must've been done a few times before.
> > >
> > > Excuse my curiosity, but what are you going to do with this code? :)
> > 
> > In the process of translating our client's sites.
> > 
> > Just a script to find all the email addresses so that the email addresses
> > aren't included for translation.
> 
> Are there really going to be any addresses that don't follow the 
> format user@host.domain on those web pages?  
> 
> With real email (i.e. RFC822 and later ones) there are many options 
> including parentheses and quotation marks in addresses, but I've never 
> yet seen such a thing in a simple "mailto:" URL on the web.  
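Here is a minimal sketch of the simple user@host.domain pattern Peter describes, applied to raw HTML text. The pattern and the function name `find_addresses` are illustrative assumptions, not a complete RFC 822 parser. It deliberately ignores quoted local parts and comments. Because the local-part character class excludes ':', an address inside a "mailto:" href is picked up without the scheme prefix.

```python
import re

# Assumed simple pattern: local part, "@", dot-separated domain labels,
# ending in a top-level domain of at least two letters. Not RFC 822 complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

def find_addresses(html):
    """Return the addresses found in the text, in order, without duplicates."""
    seen = []
    for match in EMAIL_RE.finditer(html):
        addr = match.group(0)
        if addr not in seen:
            seen.append(addr)
    return seen

page = '<a href="mailto:info@example.com">info@example.com</a> or sales@example.co.uk.'
print(find_addresses(page))  # ['info@example.com', 'sales@example.co.uk']
```

The trailing sentence period is not swallowed, because the pattern must end on letters after the last dot.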

I have encountered an apostrophe (') in email addresses, and it caused a
problem when my regex flagged the address as invalid (someone by the name
of O'Brien, I think).
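The O'Brien case can be illustrated with two patterns. A local-part character class that omits the apostrophe either rejects the address outright when validating or silently truncates it when searching. Both patterns below are illustrative assumptions. The second simply adds `'` to the allowed local-part characters, which RFC 822 permits in an unquoted local part.

```python
import re

# Naive local part: word characters, dot and hyphen only.
NAIVE = re.compile(r"[\w.-]+@[\w-]+(?:\.[\w-]+)+")
# Same pattern with the apostrophe allowed in the local part
# (the hyphen is last in the class, so it is a literal).
TOLERANT = re.compile(r"[\w.'-]+@[\w-]+(?:\.[\w-]+)+")

addr = "sean.o'brien@example.com"

# fullmatch validates the whole string; search finds the first match inside it.
print(bool(NAIVE.fullmatch(addr)))     # False: the apostrophe is rejected
print(NAIVE.search(addr).group(0))     # brien@example.com (truncated)
print(bool(TOLERANT.fullmatch(addr)))  # True
```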

It depends on how robust you want your regex to be.  For a quick hack to
fix up some web pages, it probably doesn't matter, especially if you
know your organization doesn't use silly email addresses.
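Graeme's quick-hack case is keeping addresses out of the text sent for translation. One possible approach is to swap each address for a placeholder token, translate, and then restore the addresses. The token format `__EMAIL_n__`, the pattern, and the function names are hypothetical. The pattern is only a simple user@host.domain approximation and will miss unusual addresses. The approach also assumes the translation step leaves the placeholders untouched.

```python
import re

# Hypothetical simple pattern; allows an apostrophe in the local part.
EMAIL_RE = re.compile(r"[\w.'+-]+@[\w-]+(?:\.[\w-]+)+")

def mask_addresses(text):
    """Replace each address with a numbered placeholder.

    Returns the masked text and the list of saved addresses."""
    saved = []

    def _swap(match):
        saved.append(match.group(0))
        return "__EMAIL_%d__" % (len(saved) - 1)

    return EMAIL_RE.sub(_swap, text), saved

def unmask_addresses(text, saved):
    """Put the saved addresses back in place of their placeholders."""
    return re.sub(r"__EMAIL_(\d+)__", lambda m: saved[int(m.group(1))], text)

original = "Write to info@example.com for details."
masked, saved = mask_addresses(original)
print(masked)                                        # Write to __EMAIL_0__ for details.
print(unmask_addresses(masked, saved) == original)   # True
```

The placeholders contain no "@", so they are never re-matched as addresses on a second pass.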

Cheers,
  Ian
