From hpj at urpla.net Wed Jun 8 05:56:40 2016
From: hpj at urpla.net (Hans-Peter Jansen)
Date: Wed, 08 Jun 2016 11:56:40 +0200
Subject: [Email-SIG] Some parsing/generation issues of email in Python 3
Message-ID: <4466958.y32vL1X9xu@xrated>

Dear audience,

when coming back to this list, I couldn't believe my eyes because of the low volume level, but after rechecking with the archives, I have to accept that it really is that quiet here, a bit too quiet from my POV. Hmm.

Well, I'm in the course of replacing a special-purpose postfix email filter that dates back to 2004 with a redeveloped Python 3 version right now. Basically, all it has been doing (in pseudo code) is:

    msg = email.message_from_file(fp)
    processing(msg)
    write(msg.as_string(True))

for a few 100 million mails during that time. I replaced it with:

    msg = email.message_from_binary_file(fp, policy=email.policy.SMTP)
    processing(msg)
    BytesGenerator(pipe).flatten(msg)

Here, processing mostly saves bodies and attachments, depending on pattern matches, and adds some headers.

I was quite astonished to find out that this procedure isn't working that well anymore: the email module appears way more sensitive in its current state. This is a bit disappointing, as reading the docs conveys that some effort was put into reliability and robustness. Given the much improved unicode handling of Python 3 itself and the ever improving experience in handling emails, this is contrary to my expectations, I have to confess.

Minutes after switching to the new code, I stumbled across a traceback in msg.get_all('to') from a header like this:

    To: unlisted-recipients: ;, ""@pop.kundenserver.de (no To-header on input)

Hmm, not nice. http://bugs.python.org/issue27257

Next, I was surprised to see arbitrary header data appear in the body of some mails in my MUA.
I tracked this down to a mangled header that has lost its proper indentation:

X-Microsoft-Exchange-Diagnostics: =?utf-8?B?MTtCTDJQUjAyTUI1MTQ7MjM6bEtRRlNaUHQvVTk5WCttdktlOUVrUGQvVFBH?= =?utf-8?B?cDFJemVUeXFzOGNzYnZOYWlwMDZpR0YzbXZyY09WaTBKM2pkeUl4S1VDMkxw?= =?utf-8?B?eVRkNWthRW9waUhJTzczTWd5WDZOQ3hMNU1haGFvQTVzVTdRZmxJUnZlblpW?= ...

versus:

X-Microsoft-Exchange-Diagnostics: 1;BL2PR02MB514;23:lKQFSZPt/U99X+mvKe9EkPd/TPG p1IzeTyqs8csbvNaip06iGF3mvrcOVi0J3jdyIxKUC2Lp yTd5kaEopiHIO73MgyX6NCxL5MahaoA5sU7QflIRvenZV

Oh, well. http://bugs.python.org/issue27256

Before I added some code to circumvent those occurrences, I stumbled across a traceback in flatten: http://bugs.python.org/issue27258

All these issues were harvested in less than half an hour. What really troubles me is the quietness around here in the light of this experience. Don't people use Python (3) yet/anymore for these kinds of tasks? Does somebody care? Am I missing something?

I will do my best to dive into these issues in the next days/weeks, but would appreciate a dialog with somebody who is already involved in the email module code.

Thanks,
Pete

From rdmurray at bitdance.com Wed Jun 8 09:57:27 2016
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 08 Jun 2016 09:57:27 -0400
Subject: [Email-SIG] Some parsing/generation issues of email in Python 3
In-Reply-To: <4466958.y32vL1X9xu@xrated>
References: <4466958.y32vL1X9xu@xrated>
Message-ID: <20160608135728.48C35B14023@webabinitio.net>

On Wed, 08 Jun 2016 11:56:40 +0200, Hans-Peter Jansen wrote:
> All these issues were harvested in less than half an hour. What really
> troubles me is the quietness around here in the light of this experience.
> Don't people use Python (3) yet/anymore for these kinds of tasks? Does

Apparently not. I know there are a few, because I have gotten and fixed some bug reports.

> somebody care? Am I missing something?
Unfortunately I haven't had much time for the email module in the past while, so I'm behind in dealing with existing bug reports, and haven't yet finished the doc rewrite I started. I'm hoping to have more time for that in the near term. > I will do my best to dive into these issues in the next days/weeks, but would > appreciate a dialog with somebody, who is involved in the email module code > already. That would be me. Just got back from PyCon on Monday, so I haven't looked at the new issues yet, and it may be a couple days before I can. --David From barry at python.org Wed Jun 8 10:14:16 2016 From: barry at python.org (Barry Warsaw) Date: Wed, 8 Jun 2016 10:14:16 -0400 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <4466958.y32vL1X9xu@xrated> References: <4466958.y32vL1X9xu@xrated> Message-ID: <20160608101416.10f9da72.barry@wooz.org> On Jun 08, 2016, at 11:56 AM, Hans-Peter Jansen wrote: >All these issues were harvested in less than halve an hour. What really >troubles me is the quietness around here in the light of this experience. >Doesn't people use Python (3) yet/anymore for these kind of tasks? Does >somebody care? Am I missing something? Mailman 3 uses Python 3 so we use all the parsing stuff in Python 3.4 and 3.5, although we don't use the SMTP policy. We haven't yet noticed any problems with the compat32 (default) policy, probably because it's more compatible with the way Python 2 parses things. Also, please don't be so exasperated so quickly. Just because there hasn't been a lot of traffic on this list recently doesn't necessarily mean nobody is using it, or doesn't care about bugs. Cheers, -Barry From rdmurray at bitdance.com Wed Jun 8 10:37:41 2016 From: rdmurray at bitdance.com (R. 
David Murray) Date: Wed, 08 Jun 2016 10:37:41 -0400 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <20160608101416.10f9da72.barry@wooz.org> References: <4466958.y32vL1X9xu@xrated> <20160608101416.10f9da72.barry@wooz.org> Message-ID: <20160608143741.CB1BAB14023@webabinitio.net> On Wed, 08 Jun 2016 10:14:16 -0400, Barry Warsaw wrote: > On Jun 08, 2016, at 11:56 AM, Hans-Peter Jansen wrote: > > >All these issues were harvested in less than halve an hour. What really > >troubles me is the quietness around here in the light of this experience. > >Doesn't people use Python (3) yet/anymore for these kind of tasks? Does > >somebody care? Am I missing something? > > Mailman 3 uses Python 3 so we use all the parsing stuff in Python 3.4 and 3.5, > although we don't use the SMTP policy. We haven't yet noticed any problems > with the compat32 (default) policy, probably because it's more compatible with > the way Python 2 parses things. Right, in my previous note I was talking about people not using the new policies. The compat32 code is probably being used by a lot of people and seems to be working well, including having fixed some header parsing and folding bugs relative to the python2 version of the code. --David From turnbull at sk.tsukuba.ac.jp Wed Jun 8 10:42:31 2016 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 08 Jun 2016 23:42:31 +0900 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <4466958.y32vL1X9xu@xrated> References: <4466958.y32vL1X9xu@xrated> Message-ID: Hans-Peter Jansen writes: > Dear audience, > > when coming back to this list, I couldn't believe my eyes because > of the low volume level, but after rechecking with the archives, I > have to accept, it is that quiet here, a bit too quiet from my > POV. Hmm. 
It's just that very few people (one or two) are working on the module, and in my experience it has been rock-solid compared to either Python 2.7 email or the package distributed with Mailman 2.1. I doubt very many people are using Python 3 email on high-volume mailstreams yet, as the high-performance networking (e.g., Twisted) and perhaps some other libraries were late to be ported.

> I was quite astonished to find out that this procedure isn't
> working that well anymore: the email module appears way more
> sensitive in its current state. This is a bit disappointing, as
> reading the docs conveys that some effort was put into reliability
> and robustness. Given the much improved unicode handling of Python
> 3 itself and the ever improving experience in handling emails, this
> is contrary to my expectations, I have to confess.

It's a complete rewrite from first principles. It's more robust in principle and more maintainable in practice, but faced with 100s of millions of emails (aka "tsunami of sewage"), the robustness can't be guaranteed. I'm willing to bet it will converge to "robust in practice" much faster than the previous design did.

> Minutes after switching to the new code, I stumbled across a traceback in
> msg.get_all('to') from a header like this:
>
> To: unlisted-recipients: ;,
> ""@pop.kundenserver.de (no To-header on input)
>
> Hmm, not nice. http://bugs.python.org/issue27257

The header arguably fails to conform to RFC 5321, though it's syntactically permissible in RFC 5322. (See my comment on the issue.)

> All these issues were harvested in less than half an hour. What
> really troubles me is the quietness around here in the light of
> this experience. Don't people use Python (3) yet/anymore for
> these kinds of tasks?

Probably not.

> Does somebody care?

email 5 for Python 3 is a complete rewrite from first principles. Yes, somebody cared.

> Am I missing something?

Patience and understanding of how opensource software development works, perhaps.

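For readers following the thread, the difference between the two policies mentioned so far (compat32, the default, and SMTP) can be tried out on a single message. The sketch below is not code from any of the posters: the sample message, the helper name `refilter`, and the `X-Filtered` header are made up for illustration. It parses the same bytes under each policy, reads the subject, adds a header, and flattens the result with `BytesGenerator`.

```python
import email
import email.policy
from email.generator import BytesGenerator
from email.header import decode_header, make_header
from io import BytesIO

# Hypothetical sample message with an RFC 2047 encoded subject.
RAW = (
    b"From: a@example.com\n"
    b"To: b@example.com\n"
    b"Subject: =?utf-8?q?l=C3=B6schen?=\n"
    b"\n"
    b"body\n"
)


def refilter(raw, policy):
    """Parse raw bytes, add a header, return (subject, flattened bytes)."""
    msg = email.message_from_bytes(raw, policy=policy)
    subject = msg["subject"]
    msg["X-Filtered"] = "yes"  # illustrative header, not from the thread
    out = BytesIO()
    # With no policy argument, the generator uses the message's own policy.
    BytesGenerator(out).flatten(msg)
    return subject, out.getvalue()


old_subject, old_bytes = refilter(RAW, email.policy.compat32)
new_subject, new_bytes = refilter(RAW, email.policy.SMTP)

# compat32 hands back the encoded form; decoding is a separate step.
decoded = str(make_header(decode_header(old_subject)))
```

Under compat32 the subject comes back still encoded and the output keeps the input's line endings; under SMTP the subject is already decoded and the output uses CRLF. Running a filter's own corpus through both is probably the quickest way to see which messages behave differently.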
From rdmurray at bitdance.com Wed Jun 8 16:19:26 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 08 Jun 2016 16:19:26 -0400 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: References: <4466958.y32vL1X9xu@xrated> Message-ID: <20160608201927.2B8F8B14027@webabinitio.net> On Wed, 08 Jun 2016 23:42:31 +0900, "Stephen J. Turnbull" wrote: > It's a complete rewrite from first principles. It's more robust in > principle and more maintainable in practice, but faced with 100s of > millions of emails (aka "tsunami of sewage"), the robustness can't be > guaranteed. I'm willing to bet it will converge to "robust in > practice" much faster than the previous design did. Except for the header folding algorithm in the new email policies. I rewrote the compat32 header folder to be much simpler and more maintainable, but the new policy header folder, which is much more accurate and capable because of the improved header parser, is unfortunately much too complex and opaque. So that got worse from a maintainability standpoint. Rewriting it now that I understand the edge cases better is high on my list of things to do, but we all know about available time issues.... ;) The header parser could also use some consistency cleanup, since it evolved a bit during development, and there are many places where it could be simplified, but despite those warts it is much better organized and *way* more accurate and information-rich than the older parser (which, frankly, was a very low-information parser). --David From hpj at urpla.net Wed Jun 8 16:42:58 2016 From: hpj at urpla.net (Hans-Peter Jansen) Date: Wed, 08 Jun 2016 22:42:58 +0200 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <20160608101416.10f9da72.barry@wooz.org> References: <4466958.y32vL1X9xu@xrated> <20160608101416.10f9da72.barry@wooz.org> Message-ID: <9003215.n43EmMAzKH@xrated> On Mittwoch, 8. 
Juni 2016 10:14:16 Barry Warsaw wrote: > On Jun 08, 2016, at 11:56 AM, Hans-Peter Jansen wrote: > >All these issues were harvested in less than halve an hour. What really > >troubles me is the quietness around here in the light of this experience. > >Doesn't people use Python (3) yet/anymore for these kind of tasks? Does > >somebody care? Am I missing something? > > Mailman 3 uses Python 3 so we use all the parsing stuff in Python 3.4 and > 3.5, although we don't use the SMTP policy. We haven't yet noticed any > problems with the compat32 (default) policy, probably because it's more > compatible with the way Python 2 parses things. Okay, will resort to compat32 policy for the time being. Heading closer towards the official standards appeared as a good idea, when studying the manual. > Also, please don't be so exasperated so quickly. Just because there hasn't > been a lot of traffic on this list recently doesn't necessarily mean nobody > is using it, or doesn't care about bugs. Barry, David, Stephen, please beg my pardon for sounding overly harsh. It's been a hard week, and in the face of these issues, I felt completely lost. Cheers, Pete From hpj at urpla.net Wed Jun 8 17:04:46 2016 From: hpj at urpla.net (Hans-Peter Jansen) Date: Wed, 08 Jun 2016 23:04:46 +0200 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: References: <4466958.y32vL1X9xu@xrated> Message-ID: <3085076.xcgyQDFDgY@xrated> [Sorry Stephen, wrong key...] On Mittwoch, 8. Juni 2016 23:42:31 you wrote: > Hans-Peter Jansen writes: > > Dear audience, > > > > when coming back to this list, I couldn't believe my eyes because > > of the low volume level, but after rechecking with the archives, I > > have to accept, it is that quiet here, a bit too quiet from my > > POV. Hmm. 
> > It's just that very few people (one or two) are working on the module > and in my experience it has been rock-solid compared to either Python > 2.7 email or the package distributed with Mailman 2.1. I doubt very > many people are using Python 3 email on high-volume mailstreams yet, > as the high-performance networking (eg, Twisted) and perhaps some > other libraries were late to be ported. Good to know, at least, I'm not alone. > > I was quite astonished to find out, that this procedure isn't > > working that well anymore: the email module appears way more > > sensible in the current state. This is a bit disappointing, as > > reading the docs conveys, that some effort was put into reliability > > and robustness. Given the much improved unicode handling of Python > > 3 itself and the ever improving experience in handling emails, this > > is contrary to my expectations, I have to confess. > > It's a complete rewrite from first principles. It's more robust in > principle and more maintainable in practice, but faced with 100s of > millions of emails (aka "tsunami of sewage"), the robustness can't be > guaranteed. I'm willing to bet it will converge to "robust in > practice" much faster than the previous design did. I will take your word on that. As Barry and David pointed out, some issues probably vanish by simply using compat32 policy right now. > > Does somebody care? > > email 5 for Python 3 is a complete rewrite from first principles. > Yes, somebody cared. Well, there's some light at the end of the tunnel. Good to know. > > Am I missing something? > > Patience and understanding of how opensource software development > works, perhaps. Okay, as already said, I'm sorry for sounding overly harsh. Usually, when I report such problems nowadays, I add a patch proposal for fixing the issue, but these issues were overwhelming me. Needless to mention the complexity of the email package itself and my reluctance of studying RFCs. 
Cheers, Pete From rdmurray at bitdance.com Wed Jun 8 18:30:10 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 08 Jun 2016 18:30:10 -0400 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <3085076.xcgyQDFDgY@xrated> References: <4466958.y32vL1X9xu@xrated> <3085076.xcgyQDFDgY@xrated> Message-ID: <20160608223011.3AEECB14027@webabinitio.net> On Wed, 08 Jun 2016 23:04:46 +0200, Hans-Peter Jansen wrote: > Usually, when I report such problems nowadays, I add a patch proposal for > fixing the issue, but these issues were overwhelming me. Needless to mention > the complexity of the email package itself and my reluctance of studying RFCs. Yeah, the goal is to encode the knowledge of the RFCs in the code, so you don't have to read them. But of course, to work on the code one must read them, and the more corner cases we support, the better one has to know the RFCs to work on the code. Not a nice thing :(. Still, the code can be made a lot clearer than it currently is. It is complex because the RFCs are complex, but it is still more complex currently than it needs to be. --David From hpj at urpla.net Thu Jun 9 12:37:54 2016 From: hpj at urpla.net (Hans-Peter Jansen) Date: Thu, 09 Jun 2016 18:37:54 +0200 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <20160608143741.CB1BAB14023@webabinitio.net> References: <4466958.y32vL1X9xu@xrated> <20160608101416.10f9da72.barry@wooz.org> <20160608143741.CB1BAB14023@webabinitio.net> Message-ID: <2292062.iZc7AriIEe@xrated> On Mittwoch, 8. Juni 2016 10:37:41 R. David Murray wrote: > > Right, in my previous note I was talking about people not using the > new policies. The compat32 code is probably being used by a lot of > people and seems to be working well, including having fixed some header > parsing and folding bugs relative to the python2 version of the code. FYI, all my reported issues just _vanish_, when using the compat32 policy. 
This apparently makes a _huge_ difference. Probably, using SMTP policy was just a unfortunate choice. I'm back to regenerating all mails with the compat32 policy now (after adding a SMTP logging module for critical conditions ;) ). Let's see, how this goes. While at it, David, you seem to think about improving email. Apart from all considerations related to streaming/memory consumption/assembly, IMHO the weakest spot of the email package is header handling: the magic formula str(email.header.make_header(email.header.decode_header(msg['subject']))) for getting to the "real" subject string is, cough, improvable. Sure, this is complicated by the all the other modules, that are using email.header as well. I can only remotely imagine, how hard this is going to be in order to get this out of the SNAFU state.. Anyway.. Getting-back-some-confidence-in-email-ly yours, Pete From rdmurray at bitdance.com Thu Jun 9 16:00:18 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Thu, 09 Jun 2016 16:00:18 -0400 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <2292062.iZc7AriIEe@xrated> References: <4466958.y32vL1X9xu@xrated> <20160608101416.10f9da72.barry@wooz.org> <20160608143741.CB1BAB14023@webabinitio.net> <2292062.iZc7AriIEe@xrated> Message-ID: <20160609200019.6E257B14027@webabinitio.net> On Thu, 09 Jun 2016 18:37:54 +0200, Hans-Peter Jansen wrote: > Apart from all considerations related to streaming/memory > consumption/assembly, IMHO the weakest spot of the email package is header > handling: the magic formula > > str(email.header.make_header(email.header.decode_header(msg['subject']))) > > for getting to the "real" subject string is, cough, improvable. > > Sure, this is complicated by the all the other modules, that are using > email.header as well. I can only remotely imagine, how hard this is going to > be in order to get this out of the SNAFU state.. 
That's exactly what the new policies like SMTP do, using all the new code I wrote. (That is, I did the hard work a couple of years ago.) With the new policies, getting the "real" value of the header becomes:

    msg['subject']

Now we just have to work out the bugs in the new code :)

Streaming and memory consumption have yet to be addressed. By the time that's done, there won't be much of the original code left outside of the compatibility mode :)

--David

From hpj at urpla.net Fri Jun 10 11:54:34 2016
From: hpj at urpla.net (Hans-Peter Jansen)
Date: Fri, 10 Jun 2016 17:54:34 +0200
Subject: [Email-SIG] Some parsing/generation issues of email in Python 3
In-Reply-To: <2292062.iZc7AriIEe@xrated>
References: <4466958.y32vL1X9xu@xrated> <20160608143741.CB1BAB14023@webabinitio.net> <2292062.iZc7AriIEe@xrated>
Message-ID: <3009775.HNTOB4MqHU@xrated>

On Thursday, 9 June 2016 18:37:54 Hans-Peter Jansen wrote:
> On Wednesday, 8 June 2016 10:37:41 R. David Murray wrote:
> > Right, in my previous note I was talking about people not using the
> > new policies. The compat32 code is probably being used by a lot of
> > people and seems to be working well, including having fixed some header
> > parsing and folding bugs relative to the python2 version of the code.
>
> FYI, all my reported issues just _vanish_ when using the compat32 policy.
> This apparently makes a _huge_ difference. Probably, using SMTP policy was
> just an unfortunate choice.
>
> I'm back to regenerating all mails with the compat32 policy now (after
> adding an SMTP logging module for critical conditions ;) ).
>
> Let's see how this goes.

Hmm, compat32 and Python3 start to get in my way in not-so-funny ways. As David implied already, the encoding dance is necessary for compat32.

Since my app is a simple postfix filter, I have no way of controlling how it is executed. "Of course", it is executed in the POSIX locale (LANG=C), which is, as I have learned now, one of the weakest spots of Python3 itself when it comes to the default filesystem encoding. Needless to say, my filter acts on regex patterns for subjects, attachments and the like that contain utf-8 literals.

If I read the bugs http://bugs.python.org/issue19846, https://bugs.python.org/issue19847, and a couple of others correctly, there's no way to supply a different filesystem encoding without control of the environment variables. Hence, with Python3, I have three choices:

* patch postfix to provide a proper locale
* patch Python3 to force a certain locale, if setup is broken, e.g.: http://bugs.python.org/file24064//tmp/filesystem_encoding_utf8.patch
* change _all_ filesystem operations to catch UnicodeEncodeError and encode manually, and there are a lot of them.

These options are awful. Altogether. Am I missing something (again)?

Cheers,
Pete

From hpj at urpla.net Sun Jun 12 10:22:28 2016
From: hpj at urpla.net (Hans-Peter Jansen)
Date: Sun, 12 Jun 2016 16:22:28 +0200
Subject: [Email-SIG] Some parsing/generation issues of email in Python 3
In-Reply-To: <3009775.HNTOB4MqHU@xrated>
References: <4466958.y32vL1X9xu@xrated> <2292062.iZc7AriIEe@xrated> <3009775.HNTOB4MqHU@xrated>
Message-ID: <7800289.qb3L3Yz5NG@xrated>

[This mail is intentionally hand wrapped..]

On Friday, 10 June 2016 17:54:34 Hans-Peter Jansen wrote:
> On Thursday, 9 June 2016 18:37:54 Hans-Peter Jansen wrote:
> >
> > Let's see how this goes.
>
> Hmm, compat32 and Python3 start to get in my way in not-so-funny ways.

Okay, got it working now. It turned out to be a problem in the logging module. Unfortunately, it's not reproducible outside a postfix mail filter setup. http://www.postfix.org/FILTER_README.html

I will try to explain this in short words, but you will not believe me, as I have a hard time believing this myself..

When a python3 process is used in a simple mail filter setup, as described in the FILTER_README.html document, it is executed in a stripped-down environment. The only relevant parts that are set are LANG=C and PATH=/bin;/usr/bin. The filter reads the mail from stdin and calls sendmail to pass it on, again with the mail on stdin..

Now take a mail with some "higher" encoding (I'm using a utf-8 encoded subject containing German umlauts) and attempt to log the subject line (the log target has to be a log file, for obvious reasons).

Now take a safe seat: this attempt at unicode logging results in a manipulation of the execution frame, and execution continues a few instructions "below". Yes, I'm not kidding: no escaped surrogates in the output, no error message, just no logging of the offending line, and this "esoteric" behavior. Be assured that if I still had enough hair on my head, I would have torn it all out by now.

This is reproducible for Python 3.4.4 on openSUSE 13.2/x86_64 here. For the brave who want to reproduce/investigate this issue, I'm attaching everything necessary. All others should stop reading now. Thank you.

Still with me? Here we go:

[$: root prompt]

A working postfix setup is implied. Stop all usual processing (fetchmail, ...)

    $ useradd --gid mail mfilter
    $ cat >> /etc/postfix/master.cf << 'EOF'
    mfilter unix - n n - 1 pipe
      flags=Rq user=mfilter argv=/path/to/mail_filter_test.py -f ${sender} -- ${recipient}
    EOF
    $ systemctl restart postfix
    $ sendmail -f your at email.address your at email.address < umlaut-subject-2.mail
    $ less +F /tmp/mail_filter_test.log

Defective output:

2016-06-12 15:36:28,540 [mail_filter_test] DEBUG: parse message
2016-06-12 15:36:28,543 [mail_filter_test] DEBUG: call ['/usr/sbin/sendmail', '-G', '-i', '-f', 'your at email.address', '--', 'your at email.address']

Output with SILLY_BEHAVIOR = 0:

2016-06-12 15:37:50,887 [mail_filter_test] DEBUG: parse message
2016-06-12 15:37:50,889 [mail_filter_test] DEBUG: subject: Wie deaktiviere oder l?sche ich meine SprachBox IP der Telekom?
2016-06-12 15:37:50,890 [mail_filter_test] DEBUG: call ['/usr/sbin/sendmail', '-G', '-i', '-f', 'your at email.address', '--', 'your at email.address']

Note that the subject line is missing from the defective output. In my real filter, it left the current execution frame, and execution continued one or two levels up the stack. This reminds me of my assembler days (680x0 power, long ago!), where I used tricks like modifying the program counter in "very special arrangements".

I don't think this is an adequate outcome of the attached code, do you?

I'm directing this here, although I know that the result is quite off-topic. OTOH, Stephen and David discussed the LANG=C issues with Victor, and this is one example where this is very relevant to Python3's fitness to act as such a filter.

Apart from fixing the logging, I'm encoding all paths and file names with a configurable encoding before calling the OS. Butt ugly, but feasible.

I hope that at least one of you is able to reproduce this before we decide how to proceed. I hope I don't offend anybody here with that approach. Please speak up if I should go away and search for another tree to bark at.

Thanks, Pete -------------- next part -------------- A non-text attachment was scrubbed... Name: mail_filter_test.py Type: text/x-python Size: 1925 bytes Desc: not available URL: -------------- next part -------------- https://www.telekom.de/hilfe/festnetz-internet-tv/telefonieren-einstellungen/sprachbox/sprachbox-ip/sprachbox-ip-deaktivieren-oder-loeschen?samChecked=true From matt at mondoinfo.com Sun Jun 12 14:01:13 2016 From: matt at mondoinfo.com (Matthew Dixon Cowles) Date: Sun, 12 Jun 2016 13:01:13 -0500 (CDT) Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <7800289.qb3L3Yz5NG@xrated> References: <4466958.y32vL1X9xu@xrated> <2292062.iZc7AriIEe@xrated> <3009775.HNTOB4MqHU@xrated> <7800289.qb3L3Yz5NG@xrated> Message-ID: <1465751662.24.299@mint-julep.mondoinfo.com> Pete, > Note, that the subject line is missing. In my real filter, it left > the current execution frame, and execution continued one or two > level up the stack. The only thing I've run into that's like that is when an exception is raised in a function that I didn't think would do that and was caught in a place that I didn't expect. That it happens with the logging module and a non-ASCII encoding but only when run in an environment with very few environment variables also suggests something like that to me. I realize that that's not what your example code suggests, but if it were obvious it would already be fixed. Regards, Matt From hpj at urpla.net Sun Jun 12 17:30:42 2016 From: hpj at urpla.net (Hans-Peter Jansen) Date: Sun, 12 Jun 2016 23:30:42 +0200 Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <1465751662.24.299@mint-julep.mondoinfo.com> References: <4466958.y32vL1X9xu@xrated> <7800289.qb3L3Yz5NG@xrated> <1465751662.24.299@mint-julep.mondoinfo.com> Message-ID: <1606484.fv4zu21fkq@xrated> Hi Matt, thanks for your hints. On Sonntag, 12. 
Juni 2016 13:01:13 Matthew Dixon Cowles wrote: > Pete, > > > Note, that the subject line is missing. In my real filter, it left > > the current execution frame, and execution continued one or two > > level up the stack. > > The only thing I've run into that's like that is when an exception is > raised in a function that I didn't think would do that and was caught > in a place that I didn't expect. Yes, sure, that's a more common situation. > That it happens with the logging module and a non-ASCII encoding but > only when run in an environment with very few environment variables > also suggests something like that to me. > > I realize that that's not what your example code suggests, but if it > were obvious it would already be fixed. Yes, of course. BTW, I even tried to recreate the way, that postfix ought to run the filter with no positive result, either. There's something fishy going on here. Meanwhile, I've reduced the code to the bare minimum (attached). I also noticed, that the example mail is displayed in funny ways in my MUA, therefore attached again gzipped.. Thanks, Pete -------------- next part -------------- A non-text attachment was scrubbed... Name: umlaut-subject-2.mail.gz Type: application/gzip Size: 328 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mail_filter_test.py Type: text/x-python Size: 1981 bytes Desc: not available URL: From matt at mondoinfo.com Sun Jun 12 20:03:16 2016 From: matt at mondoinfo.com (Matthew Dixon Cowles) Date: Sun, 12 Jun 2016 19:03:16 -0500 (CDT) Subject: [Email-SIG] Some parsing/generation issues of email in Python 3 In-Reply-To: <1606484.fv4zu21fkq@xrated> References: <4466958.y32vL1X9xu@xrated> <7800289.qb3L3Yz5NG@xrated> <1465751662.24.299@mint-julep.mondoinfo.com> <1606484.fv4zu21fkq@xrated> Message-ID: <1465774688.69.21500@mint-julep.mondoinfo.com> Pete, Here's another thought for whatever it's worth: Do you get the same result if you make your own copy of the logging module? If you do, can you, er, log from it to see just what it's doing? Regards, Matt
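
A lighter-weight experiment than copying the logging module is to take the locale out of the picture altogether. What follows is only a sketch, and it rests on an assumption the thread does not confirm: that the file handler opens the log with the locale's default encoding (ASCII under LANG=C), so that writing a non-ASCII subject fails inside the handler. The logger name, the helper `make_logger`, and the log path are hypothetical and are not taken from the attached mail_filter_test.py.

```python
import logging
import os
import tempfile


def make_logger(path):
    """Return a logger and a file handler that always writes UTF-8."""
    logger = logging.getLogger("mail_filter_sketch")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    # An explicit encoding keeps the handler independent of LANG/LC_ALL.
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger, handler


# Demo with a throwaway file; a real filter would use its own log path.
log_path = os.path.join(tempfile.mkdtemp(), "mail_filter_sketch.log")
logger, handler = make_logger(log_path)
# "\u00f6" is an o-umlaut, written as an escape to keep this source ASCII.
logger.debug("subject: %s", "Wie l\u00f6sche ich meine SprachBox?")
logger.removeHandler(handler)
handler.close()

with open(log_path, "rb") as fh:
    logged = fh.read()
```

If the record then shows up intact under postfix, the encoding of the log file is a likely culprit. Exceptions raised while a handler emits a record are passed to `Handler.handleError`, which by default writes a traceback to stderr rather than raising, so it may also be worth checking where stderr ends up in the pipe setup.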