PyBalsa?

Donn Cave donn at u.washington.edu
Mon May 8 19:27:36 EDT 2000


Quoth François Pinard <pinard at iro.umontreal.ca>:
| Henning Schroeder <hschroeder at gmx.net> writes:
|
| > *But* it lacks one important "feature": speed.  Try parsing a mailbox
| > (using mailbox.UnixMailbox) with 600 messages and compare it with
| > e.g. mutt.  So a C module would be essential.
|
| I know I'm cheating, as Babyl files are a bit simpler to parse than `mbox'
| files, yet I was recently surprised by the blazing speed of parsing a
| Babyl file in pure Python, through some careful use of `string.find'.
| My impression is that the same could be done for `mbox' as well, with
| a bit more attention.  By the way, I chose not to rely fully on the
| modules in the Python library, as they do more work than I really needed,
| and I felt (without checking) they would slow things down significantly.

I don't know.  I read and send all my mail through imaplib, mimetools,
smtplib etc., and I think my application spends most of its time
waiting for the server.  But the parsing overhead is a noticeable
burden, and it would be more critical in a time-sharing context like
a web application.

I don't much like the idea of C modules for RFC822 etc., because
I would like to think we're going to get better, more useful software
in Python.  On the other hand, Python is not the fastest way in the
world to churn through a lot of text.  I think mxTextTools might be a
good compromise to consider: it's a generic lexical analyzer that can
quickly chop up stuff in formats like RFC822, leaving the interesting
parts of the logic up there in the Python layer.
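
For illustration only, here is a minimal sketch of the `string.find'
approach mentioned in the quoted text, applied to `mbox' data.  It is
not code from either poster.  The function names split_mbox and
header_value are hypothetical, and the sketch assumes the whole mailbox
fits in memory, that every line beginning with "From " starts a new
message (no ">From " quoting is undone), and that header lines are not
folded across several lines.

```python
def split_mbox(text):
    """Split an mbox-format string into a list of message strings.

    Assumption: a message starts at the beginning of the text or at any
    line that begins with "From " (the envelope line).
    """
    messages = []
    if not text.startswith("From "):
        return messages  # not mbox data, or an empty mailbox
    start = 0
    while True:
        # Look for the next envelope line; find() does the scanning in C.
        nxt = text.find("\nFrom ", start)
        if nxt == -1:
            messages.append(text[start:])
            break
        messages.append(text[start:nxt + 1])  # keep the trailing newline
        start = nxt + 1
    return messages


def header_value(message, name):
    """Return the value of the first header called `name`, or None.

    Assumption: headers are single lines (folded headers are cut short).
    """
    head_end = message.find("\n\n")  # a blank line ends the header block
    head = message if head_end == -1 else message[:head_end + 1]
    key = "\n" + name.lower() + ":"
    pos = head.lower().find(key)
    if pos == -1:
        return None
    line_end = head.find("\n", pos + 1)
    if line_end == -1:
        line_end = len(head)
    return head[pos + len(key):line_end].strip()


if __name__ == "__main__":
    sample = (
        "From a@example.org Mon May  8 19:27:36 2000\n"
        "Subject: one\n\nbody1\n\n"
        "From b@example.org Mon May  8 19:30:00 2000\n"
        "Subject: two\n\nbody2\n"
    )
    for msg in split_mbox(sample):
        print(header_value(msg, "Subject"))
```

The scanning is left to the built-in find method, while the decisions
about what counts as a message or a header stay in Python, which is the
division of labour discussed above.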

	Donn Cave, donn at u.washington.edu


