Scraping email to make invoice

Michael Torrie torriem at gmail.com
Sun Apr 24 19:19:41 EDT 2016


On 04/24/2016 12:58 PM, CM wrote:
> 1. INPUT: What's the best way to scrape an email like this? The
> email is to a Gmail account, and the content shows up in the email as
> a series of basically 6x7 tables (HTML?), one table per PO
> number/task. I know if the freelancer were to copy and paste the
> whole set of tables into a text file and save it as plain text,
> Python could easily scrape that file, but I'd much prefer to save the
> user those steps. Is there a relatively easy way to go from the Gmail
> email to generating the invoice directly? (I know there is, but
> wasn't sure what is state of the art these days).

I would configure Gmail to allow IMAP access (you'll have to set up a
special password for this most likely), and then use an imap library
from Python to directly find the relevant messages and access the email
message body.  If the body is HTML-formatted (sounds like it is) I would
use either BeautifulSoup or lxml to parse it and get out the relevant
information.

> 2. OUPUT: The invoice will have boilerplate content on top and then 
> an Excel table at bottom that is mostly the same information from
> the source content. Ideally, so that the invoice looks good, the
> invoice should be a Word document. For the first pass at this, it
> looked best by laying out the entire invoice in Excel and then copy
> and pasting it into a Word doc as an image (since otherwise the
> columns ran over for some reason). In any case, the goal is to create
> a single page invoice that looks like a clean, professional looking
> invoice.

There are several libraries for creating Excel and Word files,
especially the XML-based formats, though I have little experience with
them.  There are also nice libraries for emitting PDF if that would work
better.

> 3. UI: I am comfortable with making GUI apps, so could use this as 
> the interface for the (somewhat computer-uncomfortable) user. But
> the less user actions necessary, the better. The emails always come
> from the same sender, and always have the same boilerplate language 
> ("Below please find your Purchase Order (PO)"), so I'm envisioning a 
> small GUI window with a single button that says "MAKE NEWEST
> INVOICE" and the user presses it and it automatically searches the
> user's email for PO # emails and creates the newest invoice. I'm
> guessing I could keep a sqlite database or flat file on the computer
> to just track what is meant by "newest", and then the output would
> have the date created in the file, so the user can be sure what has
> been invoiced.

Once you have a working script, your GUI interface would be pretty easy.
 Though it seems to me that it would be unnecessary.  This process
sounds like it should just run automatically from a cron job or something.

> I'm hoping I can write this in a couple of days.

The automated part should be possible, but personally I'd give myself a
week.



More information about the Python-list mailing list