Scraping email to make invoice

Friedrich Rentsch anthra.norell at bluewin.ch
Sun Apr 24 16:38:11 EDT 2016



On 04/24/2016 08:58 PM, CM wrote:
> I would like to write a Pythons script to automate a tedious process and could use some advice.
>
> The source content will be an email that has 5-10 PO (purchase order) numbers and information for freelance work done. The target content will be an invoice. (There will be an email like this every week).
>
> Right now, the "recommended" way to go (from the company) from source to target is manually copying and pasting all the tedious details of the work done into the invoice. But this is laborious, error-prone...and just begging for automation. There is no human judgment necessary whatsoever in this.
>
> I'm comfortable with "scraping" a text file and have written scripts for this, but could use some pointers on other parts of this operation.
>
> 1. INPUT: What's the best way to scrape an email like this? The email is to a Gmail account, and the content shows up in the email as a series of basically 6x7 tables (HTML?), one table per PO number/task. I know if the freelancer were to copy and paste the whole set of tables into a text file and save it as plain text, Python could easily scrape that file, but I'd much prefer to save the user those steps. Is there a relatively easy way to go from the Gmail email to generating the invoice directly? (I know there is, but wasn't sure what is state of the art these days).
>
> 2. OUPUT: The invoice will have boilerplate content on top and then an Excel table at bottom that is mostly the same information from the source content. Ideally, so that the invoice looks good, the invoice should be a Word document. For the first pass at this, it looked best by laying out the entire invoice in Excel and then copy and pasting it into a Word doc as an image (since otherwise the columns ran over for some reason). In any case, the goal is to create a single page invoice that looks like a clean, professional looking invoice.
>
> 3. UI: I am comfortable with making GUI apps, so could use this as the interface for the (somewhat computer-uncomfortable) user. But the less user actions necessary, the better. The emails always come from the same sender, and always have the same boilerplate language ("Below please find your Purchase Order (PO)"), so I'm envisioning a small GUI window with a single button that says "MAKE NEWEST INVOICE" and the user presses it and it automatically searches the user's email for PO # emails and creates the newest invoice. I'm guessing I could keep a sqlite database or flat file on the computer to just track what is meant by "newest", and then the output would have the date created in the file, so the user can be sure what has been invoiced.
>
> I'm hoping I can write this in a couple of days.
>
> Any suggestions welcome! Thanks.

INPUT: What's the best way to scrape an email like this?  --  Like what? You need to explain what exactly your input is or show an example.

Frederic







More information about the Python-list mailing list