simple text file 'parsing' question

Sean Mc Grath digitome at iol.ie
Sat Jun 19 05:05:53 EDT 1999


Unfortunately, Eudora's .mbx files are not consistent
in how a message starts. The sentinel string is always
the same. From memory, it is something like
	"From ???@@???"

The problem is that it sometimes occurs in the middle of a line.
As long as you allow the sentinel to occur anywhere on a line
and keep the bit to the left of the sentinel, you can skip from
there to the blank line -- it will be all headers.
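
A minimal sketch of that approach, in case it helps. The sentinel value is
just the string recalled above and may not match what your .mbx files
actually contain, so it is passed in as a parameter; split_mbx is a
made-up name for illustration.

```python
# Sentinel as recalled in this post; an assumption, check it against a real file.
SENTINEL = "From ???@@???"

def split_mbx(text, sentinel=SENTINEL):
    """Return a list of (headers, body) pairs from mailbox text.

    The sentinel may appear anywhere on a line. Text to its left stays
    with the previous message's body.
    """
    chunks = text.split(sentinel)
    messages = []
    # chunks[0] is whatever precedes the first sentinel; it is skipped.
    for chunk in chunks[1:]:
        # Drop the rest of the sentinel line.
        _, _, rest = chunk.partition("\n")
        # Everything up to the first blank line is headers.
        headers, _, body = rest.partition("\n\n")
        messages.append((headers, body))
    return messages
```

Each headers string can then be handed to whatever header parser you
prefer.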

BTW, two weeks ago a new programmer with two years
college joined my company. No experience in Python. No
experience in XML. Two weeks later he has:-

	A python parser for Eudora .mbx mail archives that
	uses rfc822.py to tease out the headers
	An XML transformation script in Python
	Used Python reporting scripts to help create
	a DTD for rfc822 e-mail
	The beginnings of a down-translate to Folio Views
	in Python.

Does this language make programmers productive or what!!!!!
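
For anyone wanting to try the header-teasing step: a rough sketch follows.
Note that it swaps in the standard email package rather than rfc822.py,
since rfc822 is no longer in the standard library in Python 3. The
function name header_summary and the choice of fields are illustrative
assumptions, not what the parser described above does.

```python
from email import message_from_string

def header_summary(raw_message):
    """Parse one raw message and pull out a few header fields and the body."""
    msg = message_from_string(raw_message)
    return {
        "from": msg["From"],          # None if the header is absent
        "subject": msg["Subject"],
        "body": msg.get_payload(),    # a string for a simple, non-multipart message
    }
```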



On Fri, 18 Jun 1999 23:44:11 -0700, "Phil Mayes"
<nospam at bitbucket.com> wrote:

>KP wrote in message <376B1AAC.19FE8BCE at mysolution.com>...
>>Here's my dilemma: a directory filled (200+) with small emails. My goal
>>is to strip all the headers and combine them into one file. I can read
>>all the files just fine and write them all to one file, but I cannot
>>discern how to strip the headers. The answer must be very simple, yet
>>I cannot see it. Can anyone give a few pointers on how to do it, or
>>what module might be best? Thank you.
>>Ken
>
>
>A raw email always has a blank line between the header and the body.
>(To be pedantic, it should also have all its lines ending in CRLF.)
>So you can read it in and find the gap by looking for 2 EOLs:
>
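>f = open('c:\\apps\\eudora\\in.mbx', 'r')
>text = f.read()
>f.close()
>x = text.find('\n\n')   # the first blank line ends the headers
>if x >= 0:
>    body = text[x+2:]
>else:
>    body = ''           # no blank line found: nothing but headers
># append body to output file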
>--
>Phil Mayes    pmayes AT olivebr DOT com
>
>
>
>




