[Email-SIG] Demo code for mbox message tests

Broadus Jones sbjiii at comcast.net
Fri Oct 22 23:33:03 CEST 2004


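I tried a slightly modified copy of your code (both versions are below) and
got varying results.

I ran it against a mailbox from Imail, under Python 2.3.4 on both Cygwin and
Win32. The results are below.

Under Cygwin the message count and the file contents settle after two
passes. Under Win32 the message count drops and the file grows on every pass.

Does anyone have any idea why it behaves this way?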

Broadus

----- cygwin results start -----
bjones at bjones-2k01 ~/mail_testing
$ python check_mbox.py
message count  483
message count  481
message count  482
message count  482
message count  482
message count  482
message count  482
message count  482
message count  482

bjones at bjones-2k01 ~/mail_testing
$ md5sum mbox-?
539798a7459815366f826312697d99d0 *mbox-0
d0bdb56dd8b6377959b49765caf04205 *mbox-1
fe339e2ff213736090a063c1f61b67d8 *mbox-2
fe339e2ff213736090a063c1f61b67d8 *mbox-3
fe339e2ff213736090a063c1f61b67d8 *mbox-4
fe339e2ff213736090a063c1f61b67d8 *mbox-5
fe339e2ff213736090a063c1f61b67d8 *mbox-6
fe339e2ff213736090a063c1f61b67d8 *mbox-7
fe339e2ff213736090a063c1f61b67d8 *mbox-8
fe339e2ff213736090a063c1f61b67d8 *mbox-9

bjones at bjones-2k01 ~/mail_testing
$ ls -al mbox-?
-rwxr-xr-x    1 bjones   Enterpri  7248444 Oct 22 16:11 mbox-0
-rw-r--r--    1 bjones   Enterpri  7235445 Oct 22 16:19 mbox-1
-rw-r--r--    1 bjones   Enterpri  7235457 Oct 22 16:19 mbox-2
-rw-r--r--    1 bjones   Enterpri  7235457 Oct 22 16:19 mbox-3
-rw-r--r--    1 bjones   Enterpri  7235457 Oct 22 16:20 mbox-4
-rw-r--r--    1 bjones   Enterpri  7235457 Oct 22 16:20 mbox-5
-rw-r--r--    1 bjones   Enterpri  7235457 Oct 22 16:20 mbox-6
-rw-r--r--    1 bjones   Enterpri  7235457 Oct 22 16:21 mbox-7
-rw-r--r--    1 bjones   Enterpri  7235457 Oct 22 16:21 mbox-8
-rw-r--r--    1 bjones   Enterpri  7235457 Oct 22 16:21 mbox-9

bjones at bjones-2k01 ~/mail_testing
$ 
----- cygwin results end -----

----- win32 results start -----
C:\Python23>python check_mbox.py
message count  483
message count  114
message count  75
message count  44
message count  25
message count  17
message count  12
message count  6
message count  2

C:\Python23>md5sum mbox-?
539798a7459815366f826312697d99d0 *mbox-0
e3c74093172c66328e585f1ff6594325 *mbox-1
569c47edd5781c5684060d558372162a *mbox-2
3fcab1ee8a155749bb998ee907490eaf *mbox-3
e90eccfc8c82a31aa4b553910696f160 *mbox-4
8b1754eb3ad2126bccdbd502b2a5baf1 *mbox-5
fbbb48c336644a8611aadf9cfcab4099 *mbox-6
775a646a0730211db669eeceaef8a69e *mbox-7
d33194f510207ec44541be50ec33dde2 *mbox-8
c4067ab7c6a5f2df7ac2241fc932d0b0 *mbox-9

C:\Python23>dir mbox-?
 Volume in drive C is BJones_C
 Volume Serial Number is 3CB6-8D83

 Directory of C:\Python23

10/22/2004  04:22p           7,248,444 mbox-0
10/22/2004  04:24p           7,337,591 mbox-1
10/22/2004  04:24p           7,444,674 mbox-2
10/22/2004  04:24p           7,553,703 mbox-3
10/22/2004  04:25p           7,673,618 mbox-4
10/22/2004  04:25p           7,782,220 mbox-5
10/22/2004  04:25p           7,893,373 mbox-6
10/22/2004  04:25p           8,007,114 mbox-7
10/22/2004  04:25p           8,123,874 mbox-8
10/22/2004  04:26p           8,243,674 mbox-9
              10 File(s)     77,308,285 bytes
               0 Dir(s)   1,722,507,264 bytes free

C:\Python23>
----- win32 results end -----

Code used in check_mbox.py

----- code start -----
#!/usr/bin/env python
#Given the mbox-format file "mbox-in", it writes "mbox-out" as normalized data.
#It then reads this file and writes "mbox-out2".
#mbox-out and mbox-out2 should be identical, but aren't.

import email
# import email.Iterators
import mailbox
# import datetime

from sys import exc_info

#Error-catching replacement of email.message_from_file. See mailbox docs.
def msgfactory(fp):
	try:
		return email.message_from_file(fp)
	except email.Errors.MessageParseError:
		s="From MailerDaemon %s\n"%email.Utils.formatdate(localtime=True)
		s+="From: MailerDaemon\n"
		s+="Subject: Error: %s\n\n"%exc_info()[1]
		s+='Sorry, couldn\'t parse message due to error:\n"%s"\n\n'%exc_info()[1]
		return email.message_from_string(s)

def readmbox(mboxin,mboxout):
	fp=open(mboxin)
	f=open(mboxout,"w")
	mbox=mailbox.UnixMailbox(fp,msgfactory)
	msg_count=0
	for msg in mbox:
		f.write(str(msg))
		msg_count += 1
	fp.close()
	f.close()
	print "message count ", msg_count

for i in range(1,10):
	readmbox("mbox-" + str(i - 1),"mbox-" + str(i))

----- code end -----


-----Original Message-----
From: Python Email sig [mailto:email-sig at shopip.com] 
Sent: Monday, June 14, 2004 5:55 AM
To: email-sig at python.org
Subject: [Email-SIG] Demo code for mbox message tests

One would expect that reading an mbox file of messages and writing it out
would produce an identical file, at least if it was previously written by
the same Python code. This is important in my case since I generate an MD5
hash of each message. In *almost* every case the file does not change;
however, I have seen a few cases where spurious spaces get appended to the
end of header lines. Use this code to verify that these Python mail
functions are working correctly.

Copy your favorite mbox file to "mbox-in", then run this code.




#!/usr/bin/env python
#Given the mbox-format file "mbox-in", it writes "mbox-out" as normalized data.
#It then reads this file and writes "mbox-out2".
#mbox-out and mbox-out2 should be identical, but aren't.

import email
import mailbox
from sys import exc_info

#Error-catching replacement of email.message_from_file. See mailbox docs.
def msgfactory(fp):
	try:
		return email.message_from_file(fp)
	except email.Errors.MessageParseError:
		s="From MailerDaemon %s\n"%email.Utils.formatdate(localtime=True)
		s+="From: MailerDaemon\n"
		s+="Subject: Error: %s\n\n"%exc_info()[1]
		s+='Sorry, couldn\'t parse message due to error:\n"%s"\n\n'%exc_info()[1]
		return email.message_from_string(s)

def readmbox(mboxin,mboxout):
	fp=open(mboxin)
	f=open(mboxout,"w")
	mbox=mailbox.UnixMailbox(fp,msgfactory)
	for msg in mbox:
		f.write(str(msg))
	fp.close()
	f.close()

readmbox("mbox-in","mbox-out")
readmbox("mbox-out","mbox-out2")
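
The message above mentions generating an MD5 hash of each message. As a rough
sketch of how two generations of a file could be compared message by message
(rather than with a whole-file md5sum), something like the following might
work. It assumes a current Python 3, where mailbox.mbox and hashlib take the
place of the older modules used above; the function names message_digests and
changed_messages are made up for this illustration and are not part of any
library.

```python
import hashlib
import mailbox


def message_digests(path):
    """Return a list of MD5 hex digests, one per message in the mbox at `path`.

    Hypothetical helper for illustration only.
    """
    box = mailbox.mbox(path, create=False)
    try:
        # get_bytes() hands back the stored bytes of each message (without
        # the "From " separator line), so nothing is parsed and regenerated
        # before hashing.
        return [hashlib.md5(box.get_bytes(key)).hexdigest()
                for key in box.keys()]
    finally:
        box.close()


def changed_messages(path_a, path_b):
    """Return the positions of messages whose digests differ between two files."""
    a = message_digests(path_a)
    b = message_digests(path_b)
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # Messages present in only one of the files count as differences too.
    diffs.extend(range(min(len(a), len(b)), max(len(a), len(b))))
    return diffs
```

Calling changed_messages("mbox-out", "mbox-out2") would then list the
positions of the messages that did not survive the round trip unchanged,
which narrows down where to look for the stray trailing spaces.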




