From jason at mastaler.com Sun Oct 5 02:51:27 2003
From: jason at mastaler.com (Jason R. Mastaler)
Date: Sun Oct 5 02:51:38 2003
Subject: [Email-SIG] Generator.HeaderParsedGenerator
Message-ID:

Can the attached patch be considered for inclusion in email? This
issue is a former mimelib tracker item, but those trackers are now
disabled. I've included the previous commentary leading to the patch
below.

FWIW, we've been using this in TMDA successfully for months now.

----------------------------------------------------------------------

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Jason R. Mastaler (jasonrm)
>Assigned to: Barry A. Warsaw (bwarsaw)
Summary: TypeError: 0 with Message.as_string()

Initial Comment:
Trying to flatten the attached message (crashes-tmda2.txt) results in
a "TypeError: 0" exception. See the attached typeerror.txt for how to
reproduce the error.

Is this a bug in email? If not, is there a better way to handle this
than simply "TypeError: 0"?

----------------------------------------------------------------------

Comment By: Jason R. Mastaler (jasonrm)
Date: 2003-07-28 14:56

Message:
Logged In: YES
user_id=85984

Barry, I've now uploaded the patch (hdrgen.diff).

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-07-25 06:40

Message:
Logged In: YES
user_id=12800

In general, please upload patches instead of pasting them into a
comment field, otherwise they're basically unusable.

----------------------------------------------------------------------

Comment By: Timothy Legant (tlegant)
Date: 2003-07-24 19:00

Message:
Logged In: YES
user_id=435234

Would the following work for a HeaderParsedGenerator? It seems to work
fine here, but perhaps someone with a more intimate knowledge of the
email package will spot something that could trip this up. Seems
simple enough, though...

The diff is against CVS Generator.py.
Apologies in advance if the comment entry system wraps the diff. :(

Index: Generator.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/email/Generator.py,v
retrieving revision 1.21
diff -u -r1.21 Generator.py
--- Generator.py 24 Jun 2003 20:19:34 -0000 1.21
+++ Generator.py 25 Jul 2003 00:57:00 -0000
@@ -356,6 +356,26 @@
 
 
 
+class HeaderParsedGenerator(Generator):
+    """Generate text from a Message created by HeaderParser.
+
+    Header is generated as usual (by Generator). The payload of a Message
+    created by HeaderParser is a raw string. No encoding is necessary. If it
+    came in valid, it goes out valid. Conversely, if it came in bogus, it goes
+    out bogus.
+    """
+    def _dispatch(self, msg):
+        payload = msg.get_payload()
+        if payload is None:
+            return
+        if not _isstring(payload):
+            raise TypeError, 'string payload expected: %s' % type(payload)
+        if self._mangle_from_:
+            payload = fcre.sub('>From ', payload)
+        self._fp.write(payload)
+
+
+
 # Helper
 _width = len(repr(sys.maxint-1))
 _fmt = '%%0%dd' % _width

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-06-20 15:05

Message:
Logged In: YES
user_id=12800

Moving this to version 3.0 feature requests. Specifically, add a
Generator that can handle HeaderParser parsed messages.

----------------------------------------------------------------------

Comment By: Timo C. Metzemakers (tcmetzemakers)
Date: 2003-06-19 02:18

Message:
Logged In: YES
user_id=804319

I just got bitten by this, too, and I'd like to suggest that this
might be a documentation bug. If you're a casual user like myself, and
you just want to examine and/or modify a message's headers before
passing it on, it only seems natural to use the HeaderParser, do your
thing, and then call the as_string method. A short note in the
documentation about this would be useful, IMHO.
A ready-to-use Generator subclass that does what Barry says would be
nice to have, too.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-06-10 10:32

Message:
Logged In: YES
user_id=12800

Good idea. Done.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2003-06-10 03:20

Message:
Logged In: YES
user_id=29957

Assuming this is the same problem I hit, the following fix to
Message.py means you get something more than 'TypeError: n'.

         elif not isinstance(self._payload, ListType):
-            raise TypeError, i
+            raise TypeError, "Expected list, got %s"%type(self._payload)

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-06-09 14:01

Message:
Logged In: YES
user_id=12800

It's not a bug. This is caused by the use of the default Generator
with a message parsed by the HeaderParser. Generator flattens by
looking at the Content-Type headers of the constituent parts. It
expects to see a message object model that jibes with the Content-Type
headers. But your model doesn't because you've got a message/rfc822
content type with a string payload.

You should probably use a Generator subclass that overrides
_dispatch().

----------------------------------------------------------------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hdrgen.diff
Type: text/x-patch
Size: 1120 bytes
Desc: not available
Url : http://mail.python.org/pipermail/email-sig/attachments/20031005/23bee402/hdrgen.bin

From gerrit at nl.linux.org Thu Oct 23 13:58:29 2003
From: gerrit at nl.linux.org (Gerrit Holl)
Date: Thu Oct 23 13:58:47 2003
Subject: [Email-SIG] email v3.0 feature request
Message-ID: <20031023175829.GA4089@nl.linux.org>

Hi,

I have an idea for email v3.0.
I think it may be useful if the e-mail module can treat the
Received:-headers specially, and parse them for the user. It would
then offer it as a Received object (an instance of a Received class?),
being an iterable sequence with information of from/by/for/at as
attributes of the object (with the date being a DateTime object rather
than a string, of course).

I recently thought I could use this for a script which checks when an
email arrived at my computer and then determines what to do with it
based on this information. Currently, I solve this problem by
replacing the Date:-header, which is not really an elegant solution.

It is just an idea. I have no plans to implement it. I hope the idea
is welcome ;)

yours mailly,
Gerrit.

-- 
t to another house, she shall be judicially condemned and thrown into
the water.
        -- 1780 BC, Hammurabi, Code of Law
-- 
Asperger's Syndrome - a personal approach:
        http://people.nl.linux.org/~gerrit/
Rise up against this cabinet:
        http://www.sp.nl/

From barry at python.org Thu Oct 23 14:29:06 2003
From: barry at python.org (Barry Warsaw)
Date: Thu Oct 23 14:29:15 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <20031023175829.GA4089@nl.linux.org>
References: <20031023175829.GA4089@nl.linux.org>
Message-ID: <1066933746.11634.225.camel@anthem>

On Thu, 2003-10-23 at 13:58, Gerrit Holl wrote:

> I think it may be useful if the e-mail module can treat the Received:-headers
> specially, and parse them for the user.

Very cool idea. I'd still want the mapping interface (i.e.
msg['Received']) to return the headers as strings, but having another
method to return an ordered list of Received objects would be quite
cool.

Questions:

- In what order should they be sorted? Closest hop first, or first hop
  first?

- Should they be immutable? If not, should changing them be reflected
  back into the original headers, through the mapping interface or
  when flattened?

I like the idea of a datetime too, but I think Python's default
datetime stuff doesn't have any notions of timezones, and I think our
Received objects would definitely want to be "aware".

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/email-sig/attachments/20031023/83137391/attachment.bin

From fdrake at acm.org Thu Oct 23 14:39:06 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu Oct 23 14:39:35 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <1066933746.11634.225.camel@anthem>
References: <20031023175829.GA4089@nl.linux.org>
 <1066933746.11634.225.camel@anthem>
Message-ID: <16280.8266.331816.225841@grendel.zope.com>

Barry Warsaw writes:
 > I like the idea of a datetime too, but I think Python's default datetime
 > stuff doesn't have any notions of timezones, and I think our Received
 > objects would definitely want to be "aware".

datetime.datetimetz supports timezone awareness, as defined for the
datetime module. I don't know if there's a good source of tzinfo
objects yet.

  -Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From barry at python.org Thu Oct 23 15:14:16 2003
From: barry at python.org (Barry Warsaw)
Date: Thu Oct 23 15:14:22 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <16280.8266.331816.225841@grendel.zope.com>
References: <20031023175829.GA4089@nl.linux.org>
 <1066933746.11634.225.camel@anthem>
 <16280.8266.331816.225841@grendel.zope.com>
Message-ID: <1066936455.11634.231.camel@anthem>

On Thu, 2003-10-23 at 14:39, Fred L. Drake, Jr. wrote:
> Barry Warsaw writes:
>  > I like the idea of a datetime too, but I think Python's default datetime
>  > stuff doesn't have any notions of timezones, and I think our Received
>  > objects would definitely want to be "aware".
>
> datetime.datetimetz supports timezone awareness, as defined for the
> datetime module. I don't know if there's a good source of tzinfo
> objects yet.

Yeah, sorry, that's what I meant. We'd need some tzinfo objects.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/email-sig/attachments/20031023/52d527b3/attachment.bin

From fdrake at acm.org Thu Oct 23 15:22:58 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu Oct 23 15:23:10 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <1066936455.11634.231.camel@anthem>
References: <20031023175829.GA4089@nl.linux.org>
 <1066933746.11634.225.camel@anthem>
 <16280.8266.331816.225841@grendel.zope.com>
 <1066936455.11634.231.camel@anthem>
Message-ID: <16280.10898.715427.165752@grendel.zope.com>

Barry Warsaw writes:
 > Yeah, sorry, that's what I meant. We'd need some tzinfo objects.

I don't know much about IBM's ICU, but that may be a good source for
timezone data.

  -Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From matt at mondoinfo.com Thu Oct 23 15:37:52 2003
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Thu Oct 23 15:38:03 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <20031023175829.GA4089@nl.linux.org>
References: <20031023175829.GA4089@nl.linux.org>
Message-ID: <1066932381.96.4063@sake.mondoinfo.com>

Dear Gerrit,

> I think it may be useful if the e-mail module can treat the
> Received:-headers specially, and parse them for the user.

I agree that doing that would be useful but I've looked at doing it
outside of the email module and I think it would be pretty hard. The
problem is that RFC 2822 allows a rather elaborate syntax in received
headers which I think would be at least a big nuisance to parse.
Here's a small example:

   The "Received:" field contains a (possibly empty) list of
   name/value pairs followed by a semicolon and a date-time
   specification. The first item of the name/value pair is defined by
   item-name, and the second item is either an addr-spec, an atom, a
   domain, or a msg-id.

And it gets worse from there. All of the really interesting
information in a received header is in parentheses. Those are
technically comments not specified by the syntax above. And comments
are nearly free-form. So you'd need to try to match against what you
think that every popular MTA generates and then probably try a few
other things besides. That sort of moving target would create a lot of
maintenance work.

D. J. Bernstein has more on the subject at:

http://cr.yp.to/immhf/envelope.html

> It is just an idea. I have no plans to implement it.

Rats!

Regards,
Matt

From barry at python.org Thu Oct 23 15:43:48 2003
From: barry at python.org (Barry Warsaw)
Date: Thu Oct 23 15:43:55 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <16280.10898.715427.165752@grendel.zope.com>
References: <20031023175829.GA4089@nl.linux.org>
 <1066933746.11634.225.camel@anthem>
 <16280.8266.331816.225841@grendel.zope.com>
 <1066936455.11634.231.camel@anthem>
 <16280.10898.715427.165752@grendel.zope.com>
Message-ID: <1066938227.11634.259.camel@anthem>

On Thu, 2003-10-23 at 15:22, Fred L. Drake, Jr. wrote:
> Barry Warsaw writes:
>  > Yeah, sorry, that's what I meant. We'd need some tzinfo objects.
>
> I don't know much about IBM's ICU, but that may be a good source for
> timezone data.

Isn't there a ton of ICU stuff in Zope3, thanks to Stephan Richter?
I'm sure it's huge, but maybe it ought to be split out into a separate
distutils package (or ) incorporated into Python's stdlib.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/email-sig/attachments/20031023/3137aa7b/attachment.bin

From barry at python.org Thu Oct 23 15:47:38 2003
From: barry at python.org (Barry Warsaw)
Date: Thu Oct 23 15:47:44 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <1066932381.96.4063@sake.mondoinfo.com>
References: <20031023175829.GA4089@nl.linux.org>
 <1066932381.96.4063@sake.mondoinfo.com>
Message-ID: <1066938457.11634.261.camel@anthem>

On Thu, 2003-10-23 at 15:37, Matthew Dixon Cowles wrote:

> D. J. Bernstein has more on the subject at:
>
> http://cr.yp.to/immhf/envelope.html

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/email-sig/attachments/20031023/c4ba8f1c/attachment.bin

From fdrake at acm.org Thu Oct 23 16:23:41 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu Oct 23 16:23:51 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <1066938227.11634.259.camel@anthem>
References: <20031023175829.GA4089@nl.linux.org>
 <1066933746.11634.225.camel@anthem>
 <16280.8266.331816.225841@grendel.zope.com>
 <1066936455.11634.231.camel@anthem>
 <16280.10898.715427.165752@grendel.zope.com>
 <1066938227.11634.259.camel@anthem>
Message-ID: <16280.14541.33096.254960@grendel.zope.com>

Barry Warsaw writes:
 > Isn't there a ton of ICU stuff in Zope3, thanks to Stephan Richter? I'm
 > sure it's huge, but maybe it ought to be split out into a separate
 > distutils package (or ) incorporated into Python's stdlib.

There's a whole pile of it, but I just took a look, and ... it doesn't
include timezone data. It's really all locale information.
;-(

Surely there's a timezone database included with most Linux distros,
but I don't remember enough about how that's done to know where to
look anymore.

  -Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From matt at mondoinfo.com Thu Oct 23 18:30:38 2003
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Thu Oct 23 18:30:49 2003
Subject: [Email-SIG] email v3.0 feature request
In-Reply-To: <1066938457.11634.261.camel@anthem>
References: <20031023175829.GA4089@nl.linux.org>
 <1066932381.96.4063@sake.mondoinfo.com>
 <1066938457.11634.261.camel@anthem>
Message-ID: <1066947331.61.4301@sake.mondoinfo.com>

Barry,

>> D. J. Bernstein has more on the subject at:
>>
>> http://cr.yp.to/immhf/envelope.html
>
>
> -Barry

That's a little hard to interpret. I can assure you that DJB is not my
favorite person. Indeed, sometimes he tends pretty far in the other
direction. But my experience is that he's smart and that his code is
remarkably reliable and secure.

In the case of parsing received headers, I had pretty well come to the
same conclusion he did before coming across his opinion. I'm pretty
sure that it would be possible to extract the right information
something more than 90% of the time, but it would be a hack that
needed to follow a moving target.

If someone wants to write and maintain such a thing, I'll offer my
sincere thanks since it would be useful to me.

Regards,
Matt
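
A minimal sketch of the narrow, tractable part of the Received: idea
discussed in this thread, assuming a current Python 3 rather than the
2003 email package. The RFC 2822 text quoted earlier puts the
date-time after a semicolon, so this takes whatever follows the last
semicolon and ignores the free-form comments entirely. The helper
names received_timestamp and received_timestamps are hypothetical and
are not part of the email package; email.utils.parsedate_to_datetime
is a real standard-library function in current releases. Splitting on
the last semicolon is an assumption that breaks if a trailing comment
itself contains a semicolon.

```python
from email.utils import parsedate_to_datetime


def received_timestamp(received_value):
    """Return the date-time of one Received: value as a datetime, or None.

    Hypothetical helper: only the part after the last semicolon is
    examined; the from/by/for items and comments are not parsed.
    """
    _, sep, date_part = received_value.rpartition(";")
    if not sep:
        # No semicolon at all, so there is no date-time to extract.
        return None
    try:
        # Aware datetime for a numeric offset, naive for "-0000".
        return parsedate_to_datetime(date_part.strip())
    except (TypeError, ValueError):
        # Unparseable dates raise TypeError or ValueError depending on
        # the Python version; treat both as "no usable timestamp".
        return None


def received_timestamps(msg):
    """Return one entry per Received: header, in header order."""
    return [received_timestamp(value)
            for value in msg.get_all("Received", [])]
```

The list comes back in the order the headers appear in the message,
which normally means the most recent hop first, since each MTA
prepends its own Received: line. Headers that cannot be parsed yield
None rather than raising, which fits the "right more than 90% of the
time" expectation voiced above.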