[Tutor] parse emails as they come in

Steve Willoughby steve at alchemy.com
Wed Apr 2 19:12:44 CEST 2008


On Wed, Apr 02, 2008 at 10:20:41AM +0000, linuxian iandsd wrote:
> well, here is a piece of final script :
> 
> #!/usr/bin/python
> #
> 
> import sys
> 
> b=[]
> while 1:
>  data = sys.stdin.readline()
>  if data != '\n':
>   b.append(data)
>  else:
>   break

I'd keep working on that loop a bit in accordance with
the advice you've already received.

I'm still not sure why you're not using Python's ability
to iterate over lines of the file directly.  I think
it may be simpler to process the data as it comes in
rather than storing it in a list and then going through
it again.
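
A minimal sketch of that idea, assuming the data still ends at the
first blank line as in the loop above.  The names read_header_lines
and main are made up for illustration, and the code is written for
Python 3 rather than the Python 2 of the quoted script.

```python
import sys


def read_header_lines(stream):
    """Yield each line without its newline, stopping at the first blank line."""
    for line in stream:              # file objects hand back one line at a time
        line = line.rstrip('\n')
        if not line:                 # a blank line marks the end of the data
            break
        yield line


def main():
    # procmail pipes the message to the script, so it arrives on stdin.
    for line in read_header_lines(sys.stdin):
        print(line)
```

Unlike the "while 1" loop, this also stops at end of input.
readline() returns an empty string there, which never equals '\n',
so the original loop would keep appending empty strings forever if
the blank line were missing.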

> for i in (0,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16):
>  b[i]=b[i].split(':')[1].strip()
>  #print b[i]
> 
> b[1]=b[1].split(':')
> b[1]=b[1][1]+b[1][2]+b[1][3].strip()
> #print b[1][1]+b[1][2]+b[1][3].strip()

I'd also suggest checking out regular expressions.
You may find a simpler approach to parsing your data
than all this splitting on colons.
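
As a rough illustration, assuming each line looks like "Name: value"
(the thread never shows the real mail format, so the sample lines
below are invented), one pattern can split on the first colon only.
That handles a value that itself contains colons, which is what the
special case for b[1] is working around.  The names FIELD_RE and
parse_field are hypothetical.

```python
import re

# "Name: value", where the value itself may contain more colons.
FIELD_RE = re.compile(r'^([^:]+):\s*(.*)$')


def parse_field(line):
    """Return (name, value) for a 'Name: value' line, or None if it doesn't match."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        return None
    name, value = match.groups()
    return name.strip(), value.strip()


# Invented sample lines, only to show the behaviour.
print(parse_field('Time: 10:20:41\n'))
print(parse_field('no colon on this line'))
```

Note that this keeps the colons inside the value, whereas the
b[1][1]+b[1][2]+b[1][3] version joins the pieces without them.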

> bb=",".join(b)
> print bb
> 
> mysqlcmd='insert into webdata field1, field2, field3, field4, field5,
> field6, field7 ..... field17 values (%s)' % bb

Here you have a very common mistake, but an extremely dangerous
one, so I'd like to point it out to you.  You're pasting together
strings with commas between them and then pasting that straight
into a SQL statement.  You need to be careful to make sure that
the data in bb is valid SQL syntax.  In particular, what if any
of the strings contain commas?  You'd get extra fields.  What if
they contain SQL commands (maybe as a coincidence or maybe not)?
That second case is what's known as SQL injection.  You could make
the insert command fail, or corrupt or destroy your whole database,
depending on what's in bb.

The good news is that it's really easy to cover that situation.
The MySQL interface libraries support parameterized statements,
where the library quotes and escapes each value for you.  You just
need to supply the list of values (not a list you joined up yourself).

Like this:

cursor_object_to_database.execute('insert into webdata (field1,
field2, field3, (etc.), field17) values (%s, %s, %s, %s, %s,
(etc.), %s)', b)

(this is for the MySQLdb module)
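
Here is a runnable sketch of the same technique.  It uses Python's
built-in sqlite3 module as a stand-in so it works without a MySQL
server.  That is a swap, not the MySQLdb API itself: sqlite3 spells
its placeholder '?' where MySQLdb uses '%s'.  The helper build_insert,
the three-column table, and the sample values are all made up for
illustration.

```python
import sqlite3


def build_insert(table, columns, placeholder='%s'):
    """Build an INSERT with one placeholder per column.

    Only trusted table and column names go into the SQL text;
    the data itself never does.
    """
    column_list = ', '.join(columns)
    marks = ', '.join([placeholder] * len(columns))
    return 'insert into %s (%s) values (%s)' % (table, column_list, marks)


# Stand-in database (assumption: sqlite3 instead of MySQL).
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('create table webdata (field1 text, field2 text, field3 text)')

columns = ['field1', 'field2', 'field3']
# Values that would break a hand-joined statement.
values = ['plain', 'has, a comma', "x'); drop table webdata; --"]

sql = build_insert('webdata', columns, placeholder='?')
cursor.execute(sql, values)      # values passed separately, as one sequence
connection.commit()

stored = cursor.execute('select field1, field2, field3 from webdata').fetchall()
print(stored)
```

The comma and the would-be SQL command end up stored as ordinary
text in their own columns, because the values never become part of
the statement.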

Also, how confident are you that the mail format will always be
what you expect?  Some error checking might be good to add at
some point.
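
One possible shape for such a check, as a sketch only.  It assumes
17 lines (field1 through field17 in the insert) and that every line
must contain a colon, since the script splits each one on ':'.  The
names BadMessage and check_lines are hypothetical.

```python
EXPECTED_FIELDS = 17   # field1 .. field17 in the insert statement


class BadMessage(Exception):
    """Raised when an incoming message doesn't look the way we expect."""


def check_lines(lines, expected=EXPECTED_FIELDS):
    """Return the lines unchanged, or raise BadMessage describing the problem."""
    if len(lines) != expected:
        raise BadMessage('expected %d lines, got %d' % (expected, len(lines)))
    for number, line in enumerate(lines, 1):
        if ':' not in line:
            raise BadMessage('line %d has no colon: %r' % (number, line))
    return lines
```

Without something like this, a short message or a line with no
colon makes the script die with an IndexError, which says nothing
about which message was at fault.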

> 
> On Wed, Apr 2, 2008 at 9:50 AM, Steve Willoughby <steve at alchemy.com> wrote:
> 
> > linuxian iandsd wrote:
> >
> > > well, i don't know how to pipe the file to my script !!
> > >
> >
> > It's how procmail works.  Presuming you looked up how to write
> > a procmail rule to save the body of your mail into a file, you
> > should also see right next to it the instructions for piping
> > the message to a program.  (It also can be found by a quick
> > Google search, for future reference if you want a quick answer.)
> >
> > You just put a "|" symbol in front of the script name.
> >
> > To save the message to a file, you'd say something like
> >
> > :0:
> > *Subject:.*pattern to look for
> > /home/me/temp_input_file$date$time
> >
> > To pipe it to your script, you'd say something like
> >
> > :0
> > *Subject:.*pattern to look for
> > |/home/me/scriptname
> >
> > For more information see procmailrc(5) and procmailex(5).
> >
> > Your Python script will see the message input on stdin.
> >
> >
> >
> > > On Wed, Apr 2, 2008 at 7:18 AM, Steve Willoughby <steve at alchemy.com>
> > > wrote:
> > >
> > > > linuxian iandsd wrote:
> > > >
> > > > > ok - as i mentioned in my first email i use procmail to put
> > > > > THE BODY of all incoming mail into a file (that is one per
> > > > > incoming email as i use the variable $date-$time in the name).
> > > > >
> > > > > now this file can contain only one email but it can also
> > > > > contain 2 or more (this happens if for example there is a dns
> > > > > problem in the internet, so mail can't make it, but once
> > > > > internet recovers from the dns problem mail rushes in & we may
> > > > > have multiple messages per file. this is also true is i do this
> > > >
> > > > Using $date-$time is insufficient since I'll wager a dozen doughnuts
> > > > that the resolution of $time isn't small enough compared to the speed
> > > > messages can arrive.
> > > >
> > > > But as I tried to explain in my previous mail, this is a problem you
> > > > don't have to solve.  By choosing to use procmail to dump a file with
> > > > a non-unique name, you create a race condition you then have to deal
> > > > with in your code.
> > > >
> > > > If, on the other hand, you use procmail to _filter_ the message
> > > > through your script, this cannot possibly happen.  You'll get an
> > > > invocation of your script per message every time.  If you have
> > > > your script directly dump the data into MySQL you never need to
> > > > write any disk files at all.
> > > >
> > > >
> > > >
> > >
> >
> >

-- 
Steve Willoughby    |  Using billion-dollar satellites
steve at alchemy.com   |  to hunt for Tupperware.

