From esj at harvee.org Thu Aug 5 06:46:33 2004
From: esj at harvee.org (Eric S. Johansson)
Date: Thu Aug 5 06:46:58 2004
Subject: [Email-SIG] size of a message
Message-ID: <4111BBA9.3040302@harvee.org>

I'm looking to find how to calculate the size of a message held by an
e-mail object. Obviously, I could convert the whole thing to a string
and take its length, but that seems rather distasteful. This seems like
a really basic function, and I'm sure I'm missing something in the
documentation due to the fact that I am up past my bedtime.

Additionally, the project I'm working on gave me a reason to add extra
headers dynamically to mail messages. So I have a new set of classes
and examples of how to do this, and I've figured out how to add this to
the current classes in an upwards-compatible fashion. I make no claim
to this implementation being generally useful in any way other than as
a reference implementation. If it is of interest, you're welcome to it.

---eric

--
Speech recognition in use. It makes mistakes, I correct most

From matt at mondoinfo.com Thu Aug 5 21:22:36 2004
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Thu Aug 5 21:29:42 2004
Subject: [Email-SIG] size of a message
In-Reply-To: <4111BBA9.3040302@harvee.org>
References: <4111BBA9.3040302@harvee.org>
Message-ID: <1091733334.69.3469@mint-julep.mondoinfo.com>

Dear Eric,

> I'm looking to find how to calculate the size of a message held by
> an e-mail object. Obviously, I could convert the whole thing to a
> string and take its length but that seems rather distasteful.

As far as I'm aware, that's the way to do it. The as_string() method
of a Message object should make it pretty painless. I don't really
know how you'd figure it out in another way.

> Additionally, the project I'm working on gave me a reason to add
> extra headers dynamically to mail messages.
> So I have a new set of classes and examples of how to do this and
> I've figured out how to add this to the current classes in an
> upwards-compatible fashion.

I haven't had any trouble adding and removing headers with the
Message object's ordinary API. I'm sure that you've got something
interesting; can you tell us a little more about it?

Regards,
Matt

From esj at harvee.org Thu Aug 5 22:26:54 2004
From: esj at harvee.org (Eric S. Johansson)
Date: Thu Aug 5 22:28:00 2004
Subject: [Email-SIG] size of a message
In-Reply-To: <1091733334.69.3469@mint-julep.mondoinfo.com>
References: <4111BBA9.3040302@harvee.org> <1091733334.69.3469@mint-julep.mondoinfo.com>
Message-ID: <4112980E.7070502@harvee.org>

Matthew Dixon Cowles wrote:
> Dear Eric,
>
>>I'm looking to find how to calculate the size of a message held by
>>an e-mail object. Obviously, I could convert the whole thing to a
>>string and take its length but that seems rather distasteful.
>
> As far as I'm aware, that's the way to do it. The as_string() method
> of a Message object should make it pretty painless. I don't really
> know how you'd figure it out in another way.

I just didn't know if there was some way of walking all of the objects
in the mail message and summing up the individual elements.

>>Additionally, the project I'm working on gave me a reason to add
>>extra headers dynamically to mail messages. So I have a new set of
>>classes and examples of how to do this and I've figured out how to
>>add this to the current classes in an upwards-compatible fashion.
>
> I haven't had any trouble adding and removing headers with the
> Message object's ordinary API. I'm sure that you've got something
> interesting; can you tell us a little more about it?

In a nutshell, I have a situation where constantly modifying a message
was not desirable.
The message object could be shared among several other message
container objects, and modifying it in one place would obviously be
shared with the others, which would be a bad thing(tm). So I created a
new generator object which replaces _write_headers. The new generator
takes a header object as an argument and generates additional message
headers on the fly during an as_string() call.

For me, it serves the purpose of allowing me to store state
information in the message container object and push it into the
message on output without modifying the original message. This is
really important in a case where a single e-mail object may be shared
among multiple container objects.

I added an optional argument to my as_string() (which is a close copy
of the original, except that it turns off line wrapping) to take a
header class that creates the additional header elements.

As I said, I don't know if it would be useful to anyone else, but it's
certainly saving my bacon right now.

---eric

--
Speech recognition in use. It makes mistakes, I correct most

From matt at mondoinfo.com Thu Aug 5 22:51:57 2004
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Thu Aug 5 22:53:01 2004
Subject: [Email-SIG] size of a message
In-Reply-To: <4112980E.7070502@harvee.org>
References: <4111BBA9.3040302@harvee.org> <1091733334.69.3469@mint-julep.mondoinfo.com> <4112980E.7070502@harvee.org>
Message-ID: <1091738454.08.3469@mint-julep.mondoinfo.com>

Dear Eric,

>> As far as I'm aware, that's the way to do it. The as_string()
>> method of a Message object should make it pretty painless. I don't
>> really know how you'd figure it out in another way.

> I just didn't know if there was some way of walking all of the
> objects in the mail message and summing up the individual elements.

I think that as_string() is the way to go.

> In a nutshell, I have a situation where constantly modifying a
> message was not desirable.
> The message object could be shared among several other message
> container objects, and modifying it in one place would obviously be
> shared with the others, which would be a bad thing(tm). So I created a
> new generator object which replaces _write_headers.

I don't have a need for that, but it's certainly not beyond the realm
of possibility that someone else might. Google seems to archive this
list pretty well. If the patch is small, you could post it here. If
it's bigger and you can find some web space to post it, you could post
a pointer here. That way anyone who needed that sort of thing would be
likely to find it. The Vaults of Parnassus:

http://www.vex.net/parnassus/

is also a great place to post links to code.

Regards,
Matt

From t-meyer at ihug.co.nz Mon Aug 9 10:02:17 2004
From: t-meyer at ihug.co.nz (Tony Meyer)
Date: Mon Aug 9 10:02:22 2004
Subject: [Email-SIG] Torture test location
Message-ID:

In test_email_torture.py there is this text:

"""
# A torture test of the email package.  This should not be run as part of the
# standard Python test suite since it requires several meg of email messages
# collected in the wild.  These source messages are not checked into the
# Python distro, but are available as part of the standalone email package at
# http://sf.net/projects/mimelib
"""

I can't figure out where exactly I can find these messages. There are
no files in the sf list, and the download links are for 2.5.4 and
2.5.5 and don't appear to include these, either. Could someone point
me in the right direction?
Thanks,
Tony Meyer

From matt at mondoinfo.com Sat Aug 21 22:02:43 2004
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Sat Aug 21 22:02:54 2004
Subject: [Email-SIG] Feed parser recipe
Message-ID: <1093111617.36.2300@mint-julep.mondoinfo.com>

Alex and Anna Martelli are working on a new edition of O'Reilly's
Python Cookbook:

http://www.oreilly.com/catalog/pythoncook/

As you may know, it's largely based on the recipes in ActiveState's
Python Cookbook site:

http://aspn.activestate.com/ASPN/Cookbook/Python/

Alex asked me if I had anything new to offer, especially anything that
uses the new features from Python 2.4. I have a bit of code that uses
the new feed parser that may be a useful example. Since the deadline
for submissions that use 2.4's features is September 24, I guess it's
pretty unlikely that I'll be able to test it against a release
version, and maybe even unlikely that I'll be able to test against a
beta. So I thought I'd ask here if anyone thinks that the feed parser
is likely to change significantly between now and 2.4 final.

In particular, the code solves one problem I've seen in real life when
using the feed parser. I've seen at least a few spam mails that have a
content-type of multipart/ but which contain only a single part. The
feed parser can parse them, but the resulting Message object is
internally inconsistent: get_main_type() returns "multipart" but
is_multipart() returns False. In that case, my code applies some messy
heuristics in an attempt to figure out what the right content-type is.
(Anyone is welcome to the code, but I haven't posted it here because I
doubt that the standard library is the right place for a bunch of
messy heuristics.)

I'd be glad if anyone who has an opinion about whether that would be a
useful example for the book would let me know.

Regards,
Matt

From esj at harvee.org Mon Aug 23 13:38:10 2004
From: esj at harvee.org (Eric S. Johansson)
Date: Mon Aug 23 13:38:42 2004
Subject: [Email-SIG] Feed parser recipe
In-Reply-To: <1093111617.36.2300@mint-julep.mondoinfo.com>
References: <1093111617.36.2300@mint-julep.mondoinfo.com>
Message-ID: <4129D722.7070902@harvee.org>

Matthew Dixon Cowles wrote:

> Since the deadline for submissions that use 2.4's features is
> September 24,

For lack of a better place to ask this question, I'm wondering if I
could submit the work I have done on making smtpd.py able to fork off
children which handle the SMTP transaction and then properly terminate
the child process. I think this would help make smtpd somewhat more
useful as a counterpart to smtplib.

The only thing better would be to make it handle threading as well as
forking, which, as I think about it, wouldn't be all that hard except
for the fact that I have very little experience with threading in
Python.

---eric

--
Speech recognition in use. It makes mistakes, I correct most

-------------- next part --------------
#! /usr/bin/env python2.2
"""An RFC 2821 smtp proxy.

Usage: %(program)s [options] [localhost:localport [remotehost:remoteport]]

Options:

    --nosetuid
    -n
        This program generally tries to setuid `nobody', unless this flag is
        set.  The setuid call will fail if this program is not run as root (in
        which case, use this flag).

    --version
    -V
        Print the version number and exit.

    --class classname
    -c classname
        Use `classname' as the concrete SMTP proxy class.  Uses `PureProxy' by
        default.

    --debug
    -d
        Turn on debugging prints.

    --help
    -h
        Print this message and exit.

Version: %(__version__)s

If localhost is not given then `localhost' is used, and if localport is not
given then 8025 is used.  If remotehost is not given then `localhost' is used,
and if remoteport is not given, then 25 is used.
"""


# Overview:
#
# This file implements the minimal SMTP protocol as defined in RFC 821.  It
# has a hierarchy of classes which implement the backend functionality for the
# smtpd.  A number of classes are provided:
#
#   SMTPServer - the base class for the backend.  Raises NotImplementedError
#   if you try to use it.
#
#   DebuggingServer - simply prints each message it receives on stdout.
#
#   PureProxy - Proxies all messages to a real smtpd which does final
#   delivery.  One known problem with this class is that it doesn't handle
#   SMTP errors from the backend server at all.  This should be fixed
#   (contributions are welcome!).
#
#   MailmanProxy - An experimental hack to work with GNU Mailman
#   <www.list.org>.  Using this server as your real incoming smtpd, your
#   mailhost will automatically recognize and accept mail destined to Mailman
#   lists when those lists are created.  Every message not destined for a list
#   gets forwarded to a real backend smtpd, as with PureProxy.  Again, errors
#   are not handled correctly yet.
#
# Please note that this script requires Python 2.0
#
# Author: Barry Warsaw
#
# TODO:
#
# - support mailbox delivery
# - alias files
# - ESMTP
# - handle error codes from the backend smtpd

import sys
import os
import errno
import getopt
import time
import socket
import select
import asyncore
import asynchat

__all__ = ["SMTPServer","DebuggingServer","PureProxy","MailmanProxy"]

program = sys.argv[0]
__version__ = 'Python SMTP proxy version 0.2'


class Devnull:
    def write(self, msg): pass
    def flush(self): pass


DEBUGSTREAM = Devnull()
NEWLINE = '\n'
EMPTYSTRING = ''
COMMASPACE = ', '


def usage(code, msg=''):
    print >> sys.stderr, __doc__ % globals()
    if msg:
        print >> sys.stderr, msg
    sys.exit(code)


class SMTPChannel(asynchat.async_chat):
    COMMAND = 0
    DATA = 1

    def __init__(self, server, conn, addr):
        asynchat.async_chat.__init__(self, conn)
        self.__server = server
        self.__conn = conn
        self.__addr = addr
        self.__line = []
        self.__state = self.COMMAND
        self.__greeting = 0
        self.__mailfrom = None
        self.__rcpttos = []
        self.__data = ''
        self.__fqdn = socket.getfqdn()
        self.__peer = conn.getpeername()
        print >> DEBUGSTREAM, 'Peer:', repr(self.__peer)
        self.push('220 %s %s' % (self.__fqdn, __version__))
        self.set_terminator('\r\n')

    # Overrides base class for convenience
    def push(self, msg):
        asynchat.async_chat.push(self, msg + '\r\n')

    # Implementation of base class abstract method
    def collect_incoming_data(self, data):
        self.__line.append(data)

    # Implementation of base class abstract method
    def found_terminator(self):
        line = EMPTYSTRING.join(self.__line)
        print >> DEBUGSTREAM, 'Data:', repr(line)
        self.__line = []
        if self.__state == self.COMMAND:
            if not line:
                self.push('500 Error: bad syntax')
                return
            method = None
            i = line.find(' ')
            if i < 0:
                command = line.upper()
                arg = None
            else:
                command = line[:i].upper()
                arg = line[i+1:].strip()
            method = getattr(self, 'smtp_' + command, None)
            if not method:
                self.push('502 Error: command "%s" not implemented' % command)
                return
            method(arg)
            return
        else:
            if self.__state != self.DATA:
                self.push('451 Internal confusion')
                return
            # Remove extraneous carriage returns and de-transparency according
            # to RFC 821, Section 4.5.2.
            data = []
            for text in line.split('\r\n'):
                if text and text[0] == '.':
                    data.append(text[1:])
                else:
                    data.append(text)
            self.__data = NEWLINE.join(data)
            status = self.__server.process_message(self.__peer,
                                                   self.__mailfrom,
                                                   self.__rcpttos,
                                                   self.__data)
            self.__rcpttos = []
            self.__mailfrom = None
            self.__state = self.COMMAND
            self.set_terminator('\r\n')
            if not status:
                self.push('250 Ok')
            else:
                self.push(status)

    # SMTP and ESMTP commands
    def smtp_HELO(self, arg):
        if not arg:
            self.push('501 Syntax: HELO hostname')
            return
        if self.__greeting:
            self.push('503 Duplicate HELO/EHLO')
        else:
            self.__greeting = arg
            self.push('250 %s' % self.__fqdn)

    def smtp_NOOP(self, arg):
        if arg:
            self.push('501 Syntax: NOOP')
        else:
            self.push('250 Ok')

    def smtp_QUIT(self, arg):
        # args is ignored
        self.push('221 Bye')
        self.close_when_done()

    # factored
    def __getaddr(self, keyword, arg):
        address = None
        keylen = len(keyword)
        if arg[:keylen].upper() == keyword:
            address = arg[keylen:].strip()
            if not address:
                pass
            elif address[0] == '<' and address[-1] == '>' and address != '<>':
                # Addresses can be in the form <person@dom.com> but watch out
                # for null address, e.g. <>
                address = address[1:-1]
        return address

    def smtp_MAIL(self, arg):
        print >> DEBUGSTREAM, '===> MAIL', arg
        address = self.__getaddr('FROM:', arg)
        if not address:
            self.push('501 Syntax: MAIL FROM:<address>')
            return
        if self.__mailfrom:
            self.push('503 Error: nested MAIL command')
            return
        self.__mailfrom = address
        print >> DEBUGSTREAM, 'sender:', self.__mailfrom
        self.push('250 Ok')

    def smtp_RCPT(self, arg):
        print >> DEBUGSTREAM, '===> RCPT', arg
        if not self.__mailfrom:
            self.push('503 Error: need MAIL command')
            return
        address = self.__getaddr('TO:', arg)
        if not address:
            self.push('501 Syntax: RCPT TO: <address>')
            return
        self.__rcpttos.append(address)
        print >> DEBUGSTREAM, 'recips:', self.__rcpttos
        self.push('250 Ok')

    def smtp_RSET(self, arg):
        if arg:
            self.push('501 Syntax: RSET')
            return
        # Resets the sender, recipients, and data, but not the greeting
        self.__mailfrom = None
        self.__rcpttos = []
        self.__data = ''
        self.__state = self.COMMAND
        self.push('250 Ok')

    def smtp_DATA(self, arg):
        if not self.__rcpttos:
            self.push('503 Error: need RCPT command')
            return
        if arg:
            self.push('501 Syntax: DATA')
            return
        self.__state = self.DATA
        self.set_terminator('\r\n.\r\n')
        self.push('354 End data with <CR><LF>.<CR><LF>')

    def close(self):
        # Tell the server the channel is gone, so a forked child can exit.
        asynchat.async_chat.close(self)
        self.__server.process_close()


class SMTPServer(asyncore.dispatcher):
    def __init__(self, localaddr, remoteaddr):
        self._localaddr = localaddr
        self._remoteaddr = remoteaddr
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        # try to re-use a server port if possible
        self.set_reuse_addr()
        self.bind(localaddr)
        self.listen(5)
        print >> DEBUGSTREAM, \
              '%s started at %s\n\tLocal addr: %s\n\tRemote addr:%s' % (
            self.__class__.__name__, time.ctime(time.time()),
            localaddr, remoteaddr)

    def handle_accept(self):
        conn, addr = self.accept()
        print >> DEBUGSTREAM, 'Incoming connection from %s' % repr(addr)
        channel = SMTPChannel(self, conn, addr)

    # API for "doing something useful with the message"
    def process_message(self, peer, mailfrom, rcpttos, data):
        """Override this abstract method to handle messages from the client.

        peer is a tuple containing (ipaddr, port) of the client that made the
        socket connection to our smtp port.

        mailfrom is the raw address the client claims the message is coming
        from.

        rcpttos is a list of raw addresses the client wishes to deliver the
        message to.

        data is a string containing the entire full text of the message,
        headers (if supplied) and all.  It has been `de-transparencied'
        according to RFC 821, Section 4.5.2.  In other words, a line
        containing a `.' followed by other text has had the leading dot
        removed.

        This function should return None, for a normal `250 Ok' response;
        otherwise it returns the desired response string in RFC 821 format.

        """
        raise NotImplementedError

    def process_close(self):
        """Override this abstract method to handle the close of an SMTP
        channel.
        """
        pass

    def loop(self, timeout=30.0, use_poll=0, map=None):
        """Run the asyncore event loop, reaping finished child processes.

        Meant to replace asyncore.loop() for forkSMTPServer: it returns when
        a child's channel closes (process_close() raises asyncore.ExitNow).
        """
        if map is None:
            map = asyncore.socket_map
        if use_poll:
            if hasattr(select, 'poll'):
                poll_fun = asyncore.poll3
            else:
                poll_fun = asyncore.poll2
        else:
            poll_fun = asyncore.poll
        while map:
            # Reap an exited child without blocking.  waitpid() returns
            # (0, 0) while children are still running and raises ECHILD
            # when there are none at all.
            try:
                status = os.waitpid(-1, os.WNOHANG)
            except OSError, error:
                if error.errno != errno.ECHILD:
                    print >> DEBUGSTREAM, 'waitpid:', error
                status = (0, 0)
            # Only poll for network events once there is nothing to reap.
            if status[0] == 0:
                try:
                    poll_fun(timeout, map)
                except asyncore.ExitNow:
                    return


class forkSMTPServer(SMTPServer):
    """SMTPServer that forks a child process for each incoming connection.

    The parent keeps accepting connections; each child runs a single SMTP
    session and leaves loop() when its channel closes.  Subclasses must
    still override process_message().
    """

    def __init__(self, address):
        # Result of os.fork(): 0 in the child, the child's PID in the
        # parent, None before the first connection.
        self.parent_ID = None
        SMTPServer.__init__(self, address, (0, 0))
        self.mychannel = None

    def handle_accept(self):
        conn, addr = self.accept()
        self.parent_ID = os.fork()
        if not self.parent_ID:
            # Child: stop listening so only the parent accepts new
            # connections, then handle this one.
            self.close()
            print >> DEBUGSTREAM, 'Incoming connection from %s' % repr(addr)
            self.mychannel = SMTPChannel(self, conn, addr)
        else:
            # Parent: close off the socket and fake lack of connection
            conn.close()
            self.connected = 0

    def process_close(self):
        # If we are a child process, we are done, so leave the loop.
        if not self.parent_ID:
            raise asyncore.ExitNow


class DebuggingServer(SMTPServer):
    # Do something with the gathered message
    def process_message(self, peer, mailfrom, rcpttos, data):
        inheaders = 1
        lines = data.split('\n')
        print '---------- MESSAGE FOLLOWS ----------'
        for line in lines:
            # headers first
            if inheaders and not line:
                print 'X-Peer:', peer[0]
                inheaders = 0
            print line
        print '------------ END MESSAGE ------------'


class PureProxy(SMTPServer):
    def process_message(self, peer, mailfrom, rcpttos, data):
        lines = data.split('\n')
        # Look for the last header
        i = 0
        for line in lines:
            if not line:
                break
            i += 1
        lines.insert(i, 'X-Peer: %s' % peer[0])
        data = NEWLINE.join(lines)
        refused = self._deliver(mailfrom, rcpttos, data)
        # TBD: what to do with refused addresses?
        print >> DEBUGSTREAM, 'we got some refusals:', refused

    def _deliver(self, mailfrom, rcpttos, data):
        import smtplib
        refused = {}
        try:
            s = smtplib.SMTP()
            s.connect(self._remoteaddr[0], self._remoteaddr[1])
            try:
                refused = s.sendmail(mailfrom, rcpttos, data)
            finally:
                s.quit()
        except smtplib.SMTPRecipientsRefused, e:
            print >> DEBUGSTREAM, 'got SMTPRecipientsRefused'
            refused = e.recipients
        except (socket.error, smtplib.SMTPException), e:
            print >> DEBUGSTREAM, 'got', e.__class__
            # All recipients were refused.  If the exception had an associated
            # error code, use it.  Otherwise, fake it with a non-triggering
            # exception code.
            errcode = getattr(e, 'smtp_code', -1)
            errmsg = getattr(e, 'smtp_error', 'ignore')
            for r in rcpttos:
                refused[r] = (errcode, errmsg)
        return refused


class MailmanProxy(PureProxy):
    def process_message(self, peer, mailfrom, rcpttos, data):
        from cStringIO import StringIO
        from Mailman import Utils
        from Mailman import Message
        from Mailman import MailList
        # If the message is to a Mailman mailing list, then we'll invoke the
        # Mailman script directly, without going through the real smtpd.
        # Otherwise we'll forward it to the local proxy for disposition.
        listnames = []
        for rcpt in rcpttos:
            local = rcpt.lower().split('@')[0]
            # We allow the following variations on the theme
            #   listname
            #   listname-admin
            #   listname-owner
            #   listname-request
            #   listname-join
            #   listname-leave
            parts = local.split('-')
            if len(parts) > 2:
                continue
            listname = parts[0]
            if len(parts) == 2:
                command = parts[1]
            else:
                command = ''
            if not Utils.list_exists(listname) or command not in (
                    '', 'admin', 'owner', 'request', 'join', 'leave'):
                continue
            listnames.append((rcpt, listname, command))
        # Remove all list recipients from rcpttos and forward what we're not
        # going to take care of ourselves.  Linear removal should be fine
        # since we don't expect a large number of recipients.
        for rcpt, listname, command in listnames:
            rcpttos.remove(rcpt)
        # If there's any non-list destined recipients left,
        print >> DEBUGSTREAM, 'forwarding recips:', ' '.join(rcpttos)
        if rcpttos:
            refused = self._deliver(mailfrom, rcpttos, data)
            # TBD: what to do with refused addresses?
            print >> DEBUGSTREAM, 'we got refusals:', refused
        # Now deliver directly to the list commands
        mlists = {}
        s = StringIO(data)
        msg = Message.Message(s)
        # These headers are required for the proper execution of Mailman.  All
        # MTAs in existence seem to add these if the original message doesn't
        # have them.
        if not msg.getheader('from'):
            msg['From'] = mailfrom
        if not msg.getheader('date'):
            msg['Date'] = time.ctime(time.time())
        for rcpt, listname, command in listnames:
            print >> DEBUGSTREAM, 'sending message to', rcpt
            mlist = mlists.get(listname)
            if not mlist:
                mlist = MailList.MailList(listname, lock=0)
                mlists[listname] = mlist
            # dispatch on the type of command
            if command == '':
                # post
                msg.Enqueue(mlist, tolist=1)
            elif command == 'admin':
                msg.Enqueue(mlist, toadmin=1)
            elif command == 'owner':
                msg.Enqueue(mlist, toowner=1)
            elif command == 'request':
                msg.Enqueue(mlist, torequest=1)
            elif command in ('join', 'leave'):
                # TBD: this is a hack!
                if command == 'join':
                    msg['Subject'] = 'subscribe'
                else:
                    msg['Subject'] = 'unsubscribe'
                msg.Enqueue(mlist, torequest=1)


class Options:
    setuid = 1
    classname = 'PureProxy'


def parseargs():
    global DEBUGSTREAM
    try:
        opts, args = getopt.getopt(
            sys.argv[1:], 'nVhc:d',
            ['class=', 'nosetuid', 'version', 'help', 'debug'])
    except getopt.error, e:
        usage(1, e)

    options = Options()
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-V', '--version'):
            print >> sys.stderr, __version__
            sys.exit(0)
        elif opt in ('-n', '--nosetuid'):
            options.setuid = 0
        elif opt in ('-c', '--class'):
            options.classname = arg
        elif opt in ('-d', '--debug'):
            DEBUGSTREAM = sys.stderr

    # parse the rest of the arguments
    if len(args) < 1:
        localspec = 'localhost:8025'
        remotespec = 'localhost:25'
    elif len(args) < 2:
        localspec = args[0]
        remotespec = 'localhost:25'
    elif len(args) < 3:
        localspec = args[0]
        remotespec = args[1]
    else:
        usage(1, 'Invalid arguments: %s' % COMMASPACE.join(args))

    # split into host/port pairs
    i = localspec.find(':')
    if i < 0:
        usage(1, 'Bad local spec: %s' % localspec)
    options.localhost = localspec[:i]
    try:
        options.localport = int(localspec[i+1:])
    except ValueError:
        usage(1, 'Bad local port: %s' % localspec)
    i = remotespec.find(':')
    if i < 0:
        usage(1, 'Bad remote spec: %s' % remotespec)
    options.remotehost = remotespec[:i]
    try:
        options.remoteport = int(remotespec[i+1:])
    except ValueError:
        usage(1, 'Bad remote port: %s' % remotespec)
    return options


if __name__ == '__main__':
    options = parseargs()
    # Become nobody
    if options.setuid:
        try:
            import pwd
        except ImportError:
            print >> sys.stderr, \
                  'Cannot import module "pwd"; try running with -n option.'
            sys.exit(1)
        nobody = pwd.getpwnam('nobody')[2]
        try:
            os.setuid(nobody)
        except OSError, e:
            if e.errno != errno.EPERM: raise
            print >> sys.stderr, \
                  'Cannot setuid "nobody"; try running with -n option.'
            sys.exit(1)
    import __main__
    class_ = getattr(__main__, options.classname)
    proxy = class_((options.localhost, options.localport),
                   (options.remotehost, options.remoteport))
    try:
        asyncore.loop()
    except KeyboardInterrupt:
        pass

From matt at mondoinfo.com Mon Aug 23 18:36:09 2004
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Mon Aug 23 18:43:29 2004
Subject: [Email-SIG] Feed parser recipe
In-Reply-To: <4129D722.7070902@harvee.org>
References: <1093111617.36.2300@mint-julep.mondoinfo.com> <4129D722.7070902@harvee.org>
Message-ID: <1093278597.83.863@mint-julep.mondoinfo.com>

> For lack of a better place to ask this question, I'm wondering if I
> could submit the work I have done on making smtpd.py able to fork
> off children which handle the SMTP transaction and then properly
> terminate the child process. I think this would help make smtpd
> somewhat more useful as a counterpart to smtplib.

I think that the thing to do would be to upload a patch to
SourceForge:

http://sourceforge.net/tracker/?group_id=5470&atid=305470

There's more detail in the FAQ for developers:

http://www.python.org/dev/devfaq.html

Regards,
Matt

From anthony at interlink.com.au Thu Aug 26 19:14:04 2004
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Aug 26 19:12:32 2004
Subject: [Email-SIG] Feed parser recipe
In-Reply-To: <1093111617.36.2300@mint-julep.mondoinfo.com>
References: <1093111617.36.2300@mint-julep.mondoinfo.com>
Message-ID: <412E1A5C.8090203@interlink.com.au>

Matthew Dixon Cowles wrote:
> Since the deadline for submissions that use 2.4's features is
> September 24, I guess it's pretty unlikely that I'll be able to test
> it against a release version,

There is zero chance of 2.4 final being out before then.

> and maybe even unlikely that I'll be
> able to test against a beta.

My _current_ plan is that b1 will be either the 23rd or 30th of
August. So unless you're _really_ quick, no.
> So I thought I'd ask here if anyone
> thinks that the feed parser is likely to change significantly between
> now and 2.4 final.

I don't know of any plans to alter it. I'm pretty happy with the
version that's there now - it's remarkably robust.

> In particular, the code solves one problem I've seen in real life
> when using the feed parser. I've seen at least a few spam mails that
> have a content-type of multipart/ but which contain only a
> single part. The feed parser can parse them, but the resulting
> Message object is internally inconsistent: get_main_type() returns
> "multipart" but is_multipart() returns False. In that case, my code
> applies some messy heuristics in an attempt to figure out what the
> right content-type is.

This was something I recall hitting and deciding that the correct
solution was the current one. I'm not sure what else it could do -
maybe change the multipart to something else and install a
.defects, but this really doesn't appeal to me, at all.

--
Anthony Baxter
It's never too late to have a happy childhood.

From matt at mondoinfo.com Thu Aug 26 22:14:05 2004
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Thu Aug 26 22:14:14 2004
Subject: [Email-SIG] Feed parser recipe
In-Reply-To: <412E1A5C.8090203@interlink.com.au>
References: <1093111617.36.2300@mint-julep.mondoinfo.com> <412E1A5C.8090203@interlink.com.au>
Message-ID: <1093550140.82.2408@mint-julep.mondoinfo.com>

Dear Anthony,

>> So I thought I'd ask here if anyone thinks that the feed parser is
>> likely to change significantly between now and 2.4 final.

> I don't know of any plans to alter it. I'm pretty happy with the
> version that's there now - it's remarkably robust.

That's great.

>> The feed parser can parse them, but the resulting Message object
>> is internally inconsistent: get_main_type() returns "multipart"
>> but is_multipart() returns False.
>> In that case, my code applies
>> some messy heuristics in an attempt to figure out what the right
>> content-type is.

> This was something I recall hitting and deciding that the correct
> solution was the current one. I'm not sure what else it could do -
> maybe change the multipart to something else and install a
> .defects, but this really doesn't appeal to me, at all.

I agree entirely. You'd have to guess what to change multipart to.
And the standard library isn't the place for a bunch of messy
heuristics. I think that example code, such as Alex's Cookbook, is a
much better place for things like that.

Regards,
Matt

From anthony at interlink.com.au Fri Aug 27 19:03:44 2004
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Aug 27 19:04:11 2004
Subject: [Email-SIG] Feed parser recipe
In-Reply-To: <1093550140.82.2408@mint-julep.mondoinfo.com>
References: <1093111617.36.2300@mint-julep.mondoinfo.com> <412E1A5C.8090203@interlink.com.au> <1093550140.82.2408@mint-julep.mondoinfo.com>
Message-ID: <412F6970.40109@interlink.com.au>

Matthew Dixon Cowles wrote:
>
> I agree entirely. You'd have to guess what to change multipart to.
> And the standard library isn't the place for a bunch of messy
> heuristics. I think that example code, such as Alex's Cookbook, is a
> much better place for things like that.

FWIW, I also posted a cookbook entry for stripping out attachments
using the email package:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302086

With the new parser, it's really quite practical to use Python as a
front-line defense against email virus/worms. We found the old parser
too fragile to do this without having lots of manual fixups when it
fell over.
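In the "size of a message" thread, Matthew suggested serializing the message and taking its length. Below is a minimal sketch of that approach using today's Python 3 `email` package rather than the 2004-era API. `message_size` is a hypothetical helper name. It counts bytes rather than characters. The optional `wire_format` flag assumes you want CRLF line endings as they would appear over SMTP, and it gets them from `email.policy.SMTP`.

```python
from email import message_from_string
from email.message import Message
from email.policy import SMTP


def message_size(msg: Message, wire_format: bool = False) -> int:
    """Return the size in bytes of msg once serialized (hypothetical helper).

    With wire_format=True the SMTP policy is used for serialization, so
    lines end in CRLF as they would when sent over SMTP.
    """
    if wire_format:
        return len(msg.as_bytes(policy=SMTP))
    return len(msg.as_bytes())


if __name__ == "__main__":
    msg = message_from_string("Subject: hi\n\nhello\n")
    print(message_size(msg), message_size(msg, wire_format=True))
```

Eric asked about walking the parts and summing their payloads. That approach would undercount, because header lines and MIME boundary lines exist only in the serialized form.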
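Eric's approach replaced the generator's private `_write_headers` method. That let him emit extra headers without touching a Message shared by several containers. A simpler alternative is to add the headers to a throwaway deep copy and serialize that copy. This is a swapped-in technique, not Eric's implementation, and it assumes a deep copy per serialization is affordable. `as_string_with_headers` and the `X-Container` header are hypothetical names used for illustration.

```python
import copy
from email import message_from_string
from email.message import Message


def as_string_with_headers(msg: Message, extra_headers: dict) -> str:
    """Serialize msg with extra headers added, leaving msg itself untouched.

    Hypothetical helper: the headers go onto a deep copy, so a Message
    shared between several container objects is never mutated.
    """
    clone = copy.deepcopy(msg)
    for name, value in extra_headers.items():
        clone[name] = value  # Message.__setitem__ appends, never replaces
    return clone.as_string()


if __name__ == "__main__":
    shared = message_from_string("Subject: hi\n\nhello\n")
    print(as_string_with_headers(shared, {"X-Container": "box-1"}))
```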
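Matthew and Anthony discuss messages that claim a multipart type but hold no subparts. As far as I know, the modern Python 3 parser still produces the same shape. It also records what went wrong in the part's `defects` list, for example `NoBoundaryInMultipartDefect` when the Content-Type header has no boundary parameter. Below is a small detector, assuming Python 3, where `get_main_type()` has been replaced by `get_content_maintype()`. `inconsistent_multiparts` is a hypothetical name. The sketch deliberately makes no attempt at Matthew's content-type guessing heuristics.

```python
from email import message_from_string
from email.message import Message


def inconsistent_multiparts(msg: Message) -> list:
    """Find parts whose Content-Type says multipart but which have no subparts.

    Returns a list of (content_type, [defect class names]) tuples.  This is
    a detector only; it does not try to guess the "right" content type.
    """
    found = []
    for part in msg.walk():
        if part.get_content_maintype() == "multipart" and not part.is_multipart():
            names = [type(d).__name__ for d in part.defects]
            found.append((part.get_content_type(), names))
    return found


if __name__ == "__main__":
    bad = message_from_string("Content-Type: multipart/mixed\n\njust some text\n")
    print(inconsistent_multiparts(bad))
```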
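Anthony's recipe for stripping attachments is only linked above. What follows is a much simpler Python 3 sketch, not his recipe. It drops any part whose Content-Disposition is `attachment`, recurses into nested multiparts, and returns the filenames it removed. `strip_attachments` is a hypothetical name. A real front-line filter would probably leave a placeholder part behind so the recipient knows something was removed.

```python
from email.message import EmailMessage, Message


def strip_attachments(msg: Message) -> list:
    """Remove parts marked Content-Disposition: attachment, in place.

    Hypothetical helper.  Returns the filenames (None when a part has no
    filename) of the removed parts, searching nested multiparts as well.
    """
    if not msg.is_multipart():
        return []
    removed = []
    kept = []
    for part in msg.get_payload():
        if part.get_content_disposition() == "attachment":
            removed.append(part.get_filename())
        else:
            removed.extend(strip_attachments(part))  # descend into nested multiparts
            kept.append(part)
    msg.set_payload(kept)
    return removed


if __name__ == "__main__":
    m = EmailMessage()
    m["Subject"] = "report"
    m.set_content("see attached")
    m.add_attachment(b"\x00\x01", maintype="application",
                     subtype="octet-stream", filename="payload.bin")
    print(strip_attachments(m))
```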