OT: regex to find email

Carl Scharenberg carl.scharenberg at gmail.com
Wed Sep 22 15:25:07 EDT 2004


"Fredrik Lundh" <fredrik at pythonware.com> wrote in message news:<mailman.3700.1095836232.5135.python-list at python.org>...
> Jorgen Grahn wrote:
> 
> > I've seen no references to RFC 2822 in this thread ... please note that what
> > all these regexes catch is unlikely to be exactly the set of all valid RFC
> > 2822 addresses.
> 
> the perl faq is also required reading:
> 
> http://www.perldoc.com/perl5.6/pod/perlfaq9.html#How-do-I-check-a-valid-mail-address-
> 
>     Q. How do I check a valid mail address?
> 
>     A. You can't, at least, not in real time. Bummer, eh?
> 
>     Without sending mail to the address and seeing whether there's a human
>     on the other end to answer you, you cannot determine whether a mail
>     address is valid.
> 
> what morally sound reasons are there to scrape mail addresses from text
> documents, btw?
> 
> </F>

Just as an example: I run the mailing list for a dance club that is
only active during the academic year, so our first email each
September produces dozens of bounces from no-longer-valid addresses
that need to be removed from the list. I paste the bounce
notification text into a file, use a regex to grab all the email
addresses into a list, and generate the proper removal commands for
Majordomo. It beats copy-pasting each bad address individually out
of the big lists of bounced addresses.
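
A rough sketch of that workflow, in case it helps anyone. This is not
my exact script: the pattern is deliberately loose (it is nowhere near
the full RFC 2822 grammar mentioned earlier in the thread), the helper
names and the list name are made up for illustration, and the
"unsubscribe <list> <address>" command form is an assumption to check
against your own Majordomo setup.

```python
import re

# Deliberately loose pattern (an assumption, not RFC 2822 compliant):
# a run of common local-part characters, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_addresses(text, skip=()):
    """Return unique addresses found in text, in order of first appearance.

    Duplicates are detected case-insensitively. Addresses listed in
    `skip` (for example the list's own address or the mailer daemon)
    are left out.
    """
    seen = {s.lower() for s in skip}
    found = []
    for addr in EMAIL_RE.findall(text):
        key = addr.lower()
        if key not in seen:
            seen.add(key)
            found.append(addr)
    return found


def removal_commands(addresses, list_name):
    """Build one removal command per address.

    The command form is assumed here; verify it against the help
    output of your list server before mailing anything to it.
    """
    return [f"unsubscribe {list_name} {addr}" for addr in addresses]


if __name__ == "__main__":
    # "bounces.txt" is a hypothetical file holding the pasted bounce text.
    with open("bounces.txt") as fh:
        bounced = extract_addresses(fh.read())
    for line in removal_commands(bounced, "dance-club"):
        print(line)
```

Review the extracted list by eye before sending the commands, since a
loose pattern also picks up addresses that merely appear in the bounce
text, such as the postmaster's.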

Webscrapers suck, though. As soon as I put up a webpage with my email
address, my spam volume shot way up. I need to replace it with a GIF
showing my address.

Carl


