Parsing for email addresses

galileo228 mattbarkan at gmail.com
Tue Feb 16 19:07:57 EST 2010


Tim -

Thanks for this.  I actually did intend to have to sift through other
junk in the file, but then figured I could just cut and paste emails
directly from the 'to' field, thus making life easier.

Also, in this particular instance, the domain names were the same, and
thus I was able to figure out my solution, but I do need to know how
to handle the same situation when the domain names are different, so
your response was most helpful.
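
For the different-domain case, a minimal sketch along the lines of the
rsplit() suggestion quoted below. It assumes one address per line; the
example.com addresses and the split_address() helper are made up for
illustration and are not from the original file:

```python
def split_address(line):
    """Return (local_part, domain) for a line holding a single address."""
    address = line.strip()  # drop the trailing newline left by readlines()
    # rpartition splits on the LAST '@', so the domain is always whatever
    # follows it, no matter which domain that happens to be.
    local_part, sep, domain = address.rpartition('@')
    if not sep:
        raise ValueError('no @ found in %r' % address)
    # Lowercasing follows the quoted code; it folds 'AAA12' and 'aaa12'.
    return local_part.lower(), domain.lower()


# Hypothetical sample data with three different domains.
lines = ['aaa12@example.com\n', 'BBB34@example.org\n', 'ccc56@mail.example.net\n']
usernames = sorted({split_address(line)[0] for line in lines})
print(usernames)
```

Nothing here depends on the domain text, unlike the replace() approach,
which only works when every address ends in the same string.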

Apologies for leaving out some details.

Matt

On Feb 16, 3:15 pm, Tim Chase <python.l... at tim.thechases.com> wrote:
> galileo228 wrote:
> > [code]
> > fileHandle = open('/Users/Matt/Documents/python/results.txt','r')
> > names = fileHandle.readlines()
> > [/code]
>
> > Now, the 'names' list has values looking like this:
> > ['aa... at domain.com\n', 'bb... at domain.com\n', etc]. So I ran the
> > following code:
>
> > [code]
> > st_list = []
> > for x in names:
> >     st_list.append(x.replace('... at domain.com\n',''))
> > [/code]
>
> > And that did the trick! 'st_list' now has ['aaa12', 'bbb34', etc].
>
> > Obviously this only worked because all of the domain names were the
> > same. If they were not then based on your comments and my own
> > research, I would've had to use regex and the split(), which looked
> > massively complicated to learn.
>
> The complexities stemmed from several factors that, with more
> details, could have made the solutions less daunting:
>
>    (a) you mentioned "finding" the email addresses -- this makes
> it sound like there's other junk in the file that has to be
> sifted through to find "things that look like an email address".
> If the sole content of the file is lines containing only email
> addresses, then "find the email address" is a bit like [1]
>
>    (b) you omitted the detail that the domains are all the same.
> Even if they're not the same, (a) reduces the problem to a much
> easier task:
>
>    s = set()
>    # keep everything before the last '@', lowercased to fold duplicates
>    for line in open('results.txt'):
>      s.add(line.rsplit('@', 1)[0].lower())
>    print(s)
>
> If it was previously a CSV or tab-delimited file, Python offers
> batteries-included processing to make it easy:
>
>    import csv
>    f = open('results.txt', 'rb')  # on Python 3: open('results.txt', newline='')
>    r = csv.DictReader(f)  # CSV
>    # r = csv.DictReader(f, delimiter='\t') # tab delim
>    s = set()
>    for row in r:
>      s.add(row['Email'].lower())
>    f.close()
>
> or even
>
>    f = open(...)
>    r = csv.DictReader(...)
>    s = set(row['Email'].lower() for row in r)
>    f.close()
>
> Hope this gives you more ideas to work with.
>
> -tkc
>
> [1]http://jacksmix.files.wordpress.com/2007/05/findx.jpg
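
For case (a) above, where the addresses really are buried in other junk,
a rough sketch using the re module might look like the following. The
pattern is deliberately loose and is an assumption, not a full RFC 5322
matcher; the sample text and the find_usernames() name are likewise
hypothetical:

```python
import re

# Loose approximation of "something that looks like an email address":
# a run of word characters, dots, plus or minus signs, then '@', then a
# dotted host name. Good enough for sifting, not for validation.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def find_usernames(text):
    """Return the set of lowercased local parts of addresses found in text."""
    usernames = set()
    for address in EMAIL_RE.findall(text):
        local_part = address.rsplit('@', 1)[0]  # part before the last '@'
        usernames.add(local_part.lower())
    return usernames


# Made-up input mixing addresses with unrelated text.
sample = (
    "To: Alice <aaa12@example.com>, bbb34@mail.example.org;\n"
    "junk line with a stray @ sign\n"
    "Cc: AAA12@example.com\n"
)
print(sorted(find_usernames(sample)))
```

Using a set means an address that appears twice, or in a different
case, is only counted once, as in the quoted snippets.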



