OT: regex to find email
Carl Scharenberg
carl.scharenberg at gmail.com
Wed Sep 22 15:25:07 EDT 2004
"Fredrik Lundh" <fredrik at pythonware.com> wrote in message news:<mailman.3700.1095836232.5135.python-list at python.org>...
> Jorgen Grahn wrote:
>
> > I've seen no references to RFC 2822 in this thread ... please note that what
> > all these regexes catch is unlikely to be exactly the set of all valid RFC
> > 2822 addresses.
>
> the perl faq is also required reading:
>
> http://www.perldoc.com/perl5.6/pod/perlfaq9.html#How-do-I-check-a-valid-mail-address-
>
> Q. How do I check a valid mail address?
>
> A. You can't, at least, not in real time. Bummer, eh?
>
> Without sending mail to the address and seeing whether there's a human
> on the other end to answer you, you cannot determine whether a mail
> address is valid.
>
> what morally sound reasons are there to scrape mail addresses from text
> documents, btw?
>
> </F>
Just as an example: I run the mailing list for a dance club that is
only active during the academic year. So our first email each
September has dozens of bounces from no-longer-valid addresses that
need to be removed from the list. I just paste the email containing
all the bounce notification text into a file and use a regex to grab
all the email addresses into a list and generate the proper removal
commands for majordomo. It beats copy-pasting each bad email address
individually from the email containing big lists of bounced addresses.
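For anyone wanting to do the same, here is a rough sketch of that workflow in Python. It is not my exact script: the pattern is deliberately loose (as noted above, it will not match the full set of RFC 2822 addresses), the list name "dance-club" and the function names are made up for illustration, and the plain "unsubscribe <list> <address>" form is an assumption about what your majordomo setup accepts. Check your own list's help text before mailing the commands.

```python
import re

# Loose, pragmatic pattern; NOT a full RFC 2822 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_addresses(text):
    """Return the addresses found in text, de-duplicated case-insensitively,
    in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.finditer(text):
        addr = match.group(0)
        key = addr.lower()
        if key not in seen:
            seen.add(key)
            result.append(addr)
    return result


def removal_commands(addresses, list_name, skip=()):
    """Build one removal command per address (hypothetical helper).

    skip holds addresses that show up in bounce text but must not be
    removed, such as the mailer daemon or the list's own address.
    The command format is an assumption; adjust it for your list server.
    """
    skip_lower = {s.lower() for s in skip}
    return [
        "unsubscribe %s %s" % (list_name, addr)
        for addr in addresses
        if addr.lower() not in skip_lower
    ]


if __name__ == "__main__":
    # "bounces.txt" is a hypothetical file holding the pasted bounce text.
    with open("bounces.txt") as f:
        found = extract_addresses(f.read())
    for line in removal_commands(found, "dance-club"):
        print(line)
```

Bounce notifications usually mention other addresses too (the mailer daemon, the list itself), so it is worth reading through the generated commands, or passing those addresses in skip, before sending anything.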
Web scrapers suck, though. As soon as I put up a web page with my email
address, my spam volume shot way up. I need to replace it with a GIF
showing my address.
Carl