Searching for email address
Ian Bicking
ianb at colorstudy.com
Tue Jul 30 21:48:43 EDT 2002
On Tue, 2002-07-30 at 20:11, Peter Hansen wrote:
> Graeme Longman wrote:
> > "Charl P. Botha" wrote:
> > > In article <mailman.1028038004.25205.python-list at python.org>,
> > > Graeme Longman wrote:
> > > > I'm writing some code which searches html for email addresses. Does
> > > > anyone out there have a good regular expression syntax for searching for
> > > > the email string, I'm sure this must've been done a few times before.
> > >
> > > Excuse my curiosity, but what are you going to do with this code? :)
> >
> > In the process of translating our client's sites.
> >
> > Just a script to find all the email addresses so that the email addresses
> > aren't included for translation.
>
> Are there really going to be any addresses that don't follow the
> format user@host.domain on those web pages?
>
> With real email (i.e. RFC822 and later ones) there are many options
> including parentheses and quotation marks in addresses, but I've never
> yet seen such a thing in a simple "mailto:" URL on the web.
I have encountered "'" in email addresses, and it caused a problem when my regex
flagged such an address as invalid (someone by the name of O'Brien, I think).
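To make the O'Brien problem concrete, here is a minimal sketch comparing two hypothetical patterns (neither is the regex from my code). The naive one allows only letters, digits, ".", "_" and "-" before the "@". The looser one also accepts "'" and "+", both of which are legal unquoted characters in an RFC 822 local part.

```python
import re

# Hypothetical naive pattern: letters, digits, dot, underscore, hyphen only.
NAIVE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Hypothetical looser pattern: also allows the apostrophe and the plus
# sign in the local part.
LOOSER = re.compile(r"[A-Za-z0-9._'+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def is_valid(pattern, address):
    """Return True only if the entire string matches the pattern."""
    return pattern.fullmatch(address) is not None

address = "sean.o'brien@example.com"
print(is_valid(NAIVE, address))   # False
print(is_valid(LOOSER, address))  # True
```

Neither pattern is a full RFC 822 parser. Quoted local parts and comments are still rejected.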
It depends on how robust you want your regex to be. For a quick hack to
fix up some web pages, it probably doesn't matter, especially if you
know your organization doesn't use silly email addresses.
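For the original task (keeping addresses out of the text sent for translation), something along these lines might be enough as a quick hack. It is only a sketch: the pattern, the function name find_addresses and the sample page are illustrative assumptions, not code from this thread. It scans the raw HTML, so it catches both mailto: links and bare addresses in the text.

```python
import re

# Hypothetical quick-hack pattern. It allows "'" in the local part (the
# O'Brien case) but does not handle quoted local parts, comments, or
# IP-literal domains.
EMAIL_RE = re.compile(r"[A-Za-z0-9._'+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_addresses(html):
    """Return the distinct addresses found in html, in order of appearance."""
    found = []
    for match in EMAIL_RE.findall(html):
        # A single-quoted attribute such as href='bob@example.com' lets the
        # opening quote leak into the match, so trim leading apostrophes.
        address = match.lstrip("'")
        if address not in found:
            found.append(address)
    return found

page = """<p>Contact <a href="mailto:ian@example.com">Ian</a> or
write to sean.o'brien@example.org for details. Again: ian@example.com</p>"""

print(find_addresses(page))
# ['ian@example.com', "sean.o'brien@example.org"]
```

Allowing "'" is a trade-off when scanning HTML. An address inside a single-quoted attribute would otherwise pick up the opening quote, which is why the sketch strips it afterwards.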
Cheers,
Ian
More information about the Python-list mailing list