[Tutor] parse emails as they come in

Steve Willoughby steve at alchemy.com
Tue Apr 1 23:17:30 CEST 2008


On Tue, Apr 01, 2008 at 09:07:04PM +0000, linuxian iandsd wrote:
> a=open('/home/john/data/file_input.tmp', 'r')
> b=open('/home/john/data/file_output', 'w')

This is collecting mail as it comes in?  If you have a mail
rule in place to dump mail into this file_input.tmp file,
you could run into trouble if multiple messages arrive close
enough together that you get a race condition.

I'd suggest just using something like procmail to invoke
your Python script directly on the incoming message, so
you don't have to dump it to a temporary input file.  
You'll be guaranteed to see one and only one mail per
invocation of your script (although procmail may run
several copies of your script at the same time, so plan
for that, e.g., don't write to the same output filename
every time--or don't write to a file at all, just have
your script put the data into MySQL or whatever directly).

> aa=a.readlines()
> n=0
> for L in aa:

Generally speaking, it's better to let Python iterate
through the lines of a file.  The above code sucks the
entire (possibly huge) file into memory and then
iterates over that list.  Better:

for L in a:

or better yet:

for line in input_file:
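
For example, reusing the path from your script (the per-line
work is just a placeholder):

input_file = open('/home/john/data/file_input.tmp', 'r')
for line in input_file:
    # one line at a time; the whole file never sits in memory
    fields = line.strip().split()
    # ...do whatever per-line work you need with fields...
input_file.close()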

> # a little secret : this little script helps me load data from mail to a
> mysql database by converting it into ; separated values :)

I'd look at just gathering the raw data into Python variables and then
connecting to MySQL directly and executing a SQL statement to import the
data straight in.  You'll avoid a host of problems with properly quoting
data (what if a ';' is in one of the data fields?), as well as making it
unnecessary to carry out another post-processing step of gathering this
script's output and stuffing it into MySQL.
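
With the MySQLdb module, for instance, that might look roughly
like this (connection details, table, and column names are made
up; sender/subject/body stand in for whatever you parsed out of
the message):

import MySQLdb

db = MySQLdb.connect(host='localhost', user='john',
                     passwd='secret', db='maildata')
cursor = db.cursor()
# the driver quotes each value itself, so a stray ';' in the
# data can't break the statement
cursor.execute("INSERT INTO messages (sender, subject, body)"
               " VALUES (%s, %s, %s)",
               (sender, subject, body))
db.commit()
cursor.close()
db.close()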

-- 
Steve Willoughby    |  Using billion-dollar satellites
steve at alchemy.com   |  to hunt for Tupperware.
