Scraping email to make invoice

Grant Edwards grant.b.edwards at gmail.com
Mon Apr 25 10:39:45 EDT 2016


On 2016-04-24, Michael Torrie <torriem at gmail.com> wrote:
> On 04/24/2016 12:58 PM, CM wrote:
>
>> 1. INPUT: What's the best way to scrape an email like this? The
>>    email is to a Gmail account, and the content shows up in the
>>    email as a series of basically 6x7 tables (HTML?), one table per
>>    PO number/task. I know if the freelancer were to copy and paste
>>    the whole set of tables into a text file and save it as plain
>>    text, Python could easily scrape that file, but I'd much prefer
>>    to save the user those steps. Is there a relatively easy way to
>>    go from the Gmail email to generating the invoice directly? (I
>>    know there is, but wasn't sure what is state of the art these
>>    days).
>
> I would configure Gmail to allow IMAP access (you'll have to set up a
> special password for this most likely),

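Your normal Gmail password is used for IMAP (unless the account has
two-step verification turned on, in which case Gmail wants an
app-specific password instead).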

> and then use an imap library from Python to directly find the
> relevant messages and access the email message body.  If the body is
> HTML-formatted (sounds like it is) I would use either BeautifulSoup
> or lxml to parse it and get out the relevant information.

Warning: don't use the basic imaplib.  IMAP is a miserable protocol,
and imaplib is too thin a wrapper.  It'll make you bleed from the ears
and wish you were dead.  Use imapclient or imaplib2.  I've used both
(with Gmail's IMAP server), and IMO both are pretty good.  Either one
is miles ahead of plain imaplib.
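
To make that concrete, here is a rough sketch of the fetch-and-parse
approach, not a finished tool.  It assumes imapclient is installed and
that the relevant messages can be found by a subject search, which is a
guess about the OP's mail.  The names TableRows, rows_from_message and
fetch_rows are made up for illustration.  To keep the example
self-contained, the stdlib html.parser stands in for BeautifulSoup or
lxml, and it makes no attempt to handle nested tables.

```python
import email
from email import policy
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _end_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag == "tr" and self._row is not None:
            self._end_cell()
            self.rows.append(self._row)
            self._row = None


def rows_from_message(raw_bytes):
    """Return the table rows found in the HTML part of a raw message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    if part is None:
        return []
    parser = TableRows()
    parser.feed(part.get_content())
    parser.close()
    return parser.rows


def fetch_rows(user, password, subject):
    """Fetch messages whose subject contains `subject`; parse each one.

    Hypothetical helper: the subject search is an assumption about how
    the relevant messages can be identified.
    """
    from imapclient import IMAPClient  # third-party: pip install imapclient

    with IMAPClient("imap.gmail.com", ssl=True) as server:
        server.login(user, password)
        server.select_folder("INBOX", readonly=True)
        uids = server.search(["SUBJECT", subject])
        fetched = server.fetch(uids, ["RFC822"])
        return {uid: rows_from_message(data[b"RFC822"])
                for uid, data in fetched.items()}
```

fetch_rows() returns a dict mapping each message UID to a list of rows,
and each row is a list of cell strings.  Turning those rows into
invoice lines depends on the actual table layout, which I haven't seen.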

-- 
Grant Edwards               grant.b.edwards        Yow! But they went to MARS
                                  at               around 1953!!
                              gmail.com            



