From ncoghlan at gmail.com Thu Dec 1 00:39:52 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Dec 2011 09:39:52 +1000 Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning In-Reply-To: References: <4E43E9A6.7020608@netwok.org> <20110811183114.701DF3A406B@sparrow.telecommunity.com> <4ED1196E.8090505@netwok.org> Message-ID: On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby wrote: > It doesn't help at all that I'm not really in a position to provide an > implementation, and the persons most likely to implement have been leaning > somewhat towards 382, or wanting to modify 402 such that it uses .pyp > directory extensions so that PEP 395 can be supported... While I was initially a fan of the possibilities of PEP 402, I eventually decided that we would be trading an easy problem ("you need an '__init__.py' marker file or a '.pyp' extension to get Python to recognise your package directory") for a hard one ("What's your sys.path look like? What did you mean for it to look like?"). Symlinks (and the fact we implicitly call realname() during system initialisation and import) just make things even messier. *Deliberately* allowing package structures on the filesystem to become ambiguous is a recipe for future pain (and could potentially undo a lot of the good work done by PEP 328's elimination of implicit relative imports). I acknowledge there is a lot of confusion amongst novices as to how packages and imports actually work, but my diagnosis of the root cause of that problem is completely different from that supposed by PEP 402 (as documented in the more recent versions of PEP 395, I've come to believe it is due to the way we stuff up the default sys.path[0] initialisation when packages are involved). So, in the end, I've come to strongly prefer the PEP 382 approach. 
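Nick traces the confusion to how `sys.path[0]` is initialised when a script lives inside a package. The sketch below is not from the thread; it only makes that behaviour visible. It builds a throwaway regular package named `demo_pkg` (a hypothetical name), puts a script in it, runs the script directly in a child interpreter, and reports what the child saw. It assumes Python 3.7+ (for `capture_output`). It also clears `PYTHONSAFEPATH`, because on Python 3.11+ that variable stops the script's directory from being added to `sys.path` at all.

```python
import os
import subprocess
import sys
import tempfile

# Code run inside the child interpreter: report sys.path[0] and whether the
# script's own enclosing package can be found by its package name.
CHILD_SCRIPT = """\
import importlib.util
import sys
print(sys.path[0])
print(importlib.util.find_spec("demo_pkg") is not None)
"""


def run_script_inside_package():
    """Run root/demo_pkg/script.py directly and describe the child's sys.path[0].

    Returns a tuple of booleans:
    (path0_is_package_dir, path0_is_root_dir, package_importable_by_name)
    """
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_pkg")
        os.mkdir(pkg_dir)
        # An explicit __init__.py marks demo_pkg as a regular package.
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        script = os.path.join(pkg_dir, "script.py")
        with open(script, "w") as f:
            f.write(CHILD_SCRIPT)

        env = dict(os.environ)
        env.pop("PYTHONSAFEPATH", None)  # 3.11+: would suppress the sys.path[0] entry
        env.pop("PYTHONPATH", None)      # keep the result independent of the caller
        proc = subprocess.run([sys.executable, script], capture_output=True,
                              text=True, check=True, env=env)
        path0, importable = proc.stdout.splitlines()

        # Compare resolved paths, since temp dirs may sit behind symlinks.
        path0 = os.path.realpath(path0)
        return (path0 == os.path.realpath(pkg_dir),
                path0 == os.path.realpath(root),
                importable == "True")


print(run_script_inside_package())  # expected: (True, False, False)
```

The child sees the package directory itself as `sys.path[0]`, not the directory that contains the package. As a result, `demo_pkg` cannot be imported by name from its own script. Nick refers to PEP 395 for this mismatch.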
The principle of "Explicit is better than implicit" applies to package detection on the filesystem just as much as it does to any other kind of API design, and it really isn't that different from the way we treat actual Python files (i.e. you can *execute* arbitrary files, but they need to have an appropriate extension if you want to import them). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From anacrolix at gmail.com Thu Dec 1 01:46:47 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 1 Dec 2011 11:46:47 +1100 Subject: [Python-Dev] STM and python In-Reply-To: References: Message-ID: I did see this; I'm not convinced it's only relevant to PyPy. On Thu, Dec 1, 2011 at 2:25 AM, Benjamin Peterson wrote: > 2011/11/30 Matt Joiner : >> Given GCC's announcement that Intel's STM will be an extension for C >> and C++ in GCC 4.7, what does this mean for Python, and the GIL? >> >> I've seen efforts made to make STM available as a context, and for use >> in user code. I've also read about the "old attempts way back" that >> attempted to use finer grain locking. They understandably failed due to >> the heavy costs involved in both the locking mechanisms used, and the >> overhead of a reference counting garbage collection system. >> >> However given advances in locking and garbage collection in the last >> decade, what attempts have been made recently to try these new ideas >> out? In particular, how unlikely is it that all the thread safe >> primitives, global contexts, and reference counting functions be made >> __transaction_atomic, and magical parallelism performance boosts >> ensue? > > Have you seen http://morepypy.blogspot.com/2011/08/we-need-software-transactional-memory.html > ?
> > > -- > Regards, > Benjamin From solipsis at pitrou.net Thu Dec 1 01:50:12 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 1 Dec 2011 01:50:12 +0100 Subject: [Python-Dev] STM and python References: Message-ID: <20111201015012.3a6f1ca2@pitrou.net> On Thu, 1 Dec 2011 01:31:14 +1100 Matt Joiner wrote: > > However given advances in locking and garbage collection in the last > decade, what attempts have been made recently to try these new ideas > out? In particular, how unlikely is it that all the thread safe > primitives, global contexts, and reference counting functions be made > __transaction_atomic, and magical parallelism performance boosts > ensue? IMHO, it sounds a bit too magical to be true. > I'm aware that C89, platforms without STM/GCC, and single threaded > performance are concerns. Please ignore these for the sake of > discussion about possibilities. > > http://gcc.gnu.org/wiki/TransactionalMemory I find it interesting that the only example of hardware transactional memory mentioned in this page is a Sun CPU project which has been cancelled. Does Intel have anything similar in the works? Regards Antoine. From greg at krypto.org Thu Dec 1 01:58:29 2011 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 30 Nov 2011 16:58:29 -0800 Subject: [Python-Dev] STM and python In-Reply-To: References: Message-ID: Azul has been using hardware transactional memory on their custom CPUs (and likely STM in their current x86 virtual machine based products) to great effect for their massively parallel Java VM (700+ cpu cores and gobs of ram) for over 4 years. I'll leave it to the reader to do the relevant searching to read more on that. My point is: This is up to any given Python VM implementation to take advantage of or not as it sees fit. Shoehorning it into an existing VM may not make much sense but anyone is welcome to try. -gps
From ncoghlan at gmail.com Thu Dec 1 06:41:35 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Dec 2011 15:41:35 +1000 Subject: [Python-Dev] STM and python In-Reply-To: References: Message-ID: On Thu, Dec 1, 2011 at 10:58 AM, Gregory P. Smith wrote: > Azul has been using hardware transactional memory on their custom CPUs (and > likely STM in their current x86 virtual machine based products) to great > effect for their massively parallel Java VM (700+ cpu cores and gobs of ram) > for over 4 years. I'll leave it to the reader to do the relevant searching > to read more on that. > > My point is: This is up to any given Python VM implementation to take > advantage of or not as it sees fit. Shoehorning it into an existing VM may > not make much sense but anyone is welcome to try. There's a patch somewhere on the tracker to add an "Armin Rigo hook" to the CPython eval loop so he can play with STM in Python as well (at least, I think it was STM he wanted it for - it might have been something else). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From anacrolix at gmail.com Thu Dec 1 07:06:43 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 1 Dec 2011 17:06:43 +1100 Subject: [Python-Dev] STM and python In-Reply-To: References: Message-ID: I saw this, I believe it just exposes an STM primitive to user code. It doesn't make use of STM for Python internals. Explicit STM doesn't seem particularly useful for a language that doesn't expose raw memory in its normal usage. On Thu, Dec 1, 2011 at 4:41 PM, Nick Coghlan wrote: > On Thu, Dec 1, 2011 at 10:58 AM, Gregory P. Smith wrote: >> Azul has been using hardware transactional memory on their custom CPUs (and >> likely STM in their current x86 virtual machine based products) to great >> effect for their massively parallel Java VM (700+ cpu cores and gobs of ram) >> for over 4 years. I'll leave it to the reader to do the relevant searching >> to read more on that.
>> >> My point is: This is up to any given Python VM implementation to take >> advantage of or not as it sees fit. Shoehorning it into an existing VM may >> not make much sense but anyone is welcome to try. > > There's a patch somewhere on the tracker to add an "Armin Rigo hook" > to the CPython eval loop so he can play with STM in Python as well (at > least, I think it was STM he wanted it for - it might have been > something else). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From raymond.hettinger at gmail.com Thu Dec 1 07:10:12 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Wed, 30 Nov 2011 22:10:12 -0800 Subject: [Python-Dev] Warnings Message-ID: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> When updating the documentation, please don't go overboard with warnings. The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly. See http://docs.python.org/documenting/style.html#affirmative-tone The docs for the subprocess module currently have SEVEN warning boxes on one page: http://docs.python.org/library/subprocess.html#module-subprocess The implicit message is that our tools are hazardous and should be avoided. Please show some restraint and aim for clean looking, high-quality technical writing without the FUD. Look at the SQLite3 docs for an example of good writing. The prevention of SQL injection attacks is discussed briefly and effectively without big red boxes littering the page. Raymond
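Raymond points to the SQLite3 docs' handling of SQL injection. The core of that advice is to pass values through the DB-API placeholder mechanism instead of formatting them into the SQL string. Below is a minimal sketch using the standard-library `sqlite3` module and an in-memory database. The `users` table and the `find_user` helper are hypothetical.

```python
import sqlite3


def find_user(conn, name):
    """Look up users by name using a bound parameter, never string formatting."""
    # The "?" placeholder makes sqlite3 pass `name` to SQLite as a value, so
    # quote characters inside it cannot alter the structure of the query.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))         # [(1, 'alice')]
print(find_user(conn, "x' OR '1'='1"))  # [] because the whole string is one literal value
```

Suppose the name had been spliced in with `%` formatting or an f-string. The second query would then end in `WHERE name = 'x' OR '1'='1'` and would return every row.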
From glyph at twistedmatrix.com Thu Dec 1 08:02:25 2011 From: glyph at twistedmatrix.com (Glyph) Date: Thu, 1 Dec 2011 02:02:25 -0500 Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning In-Reply-To: References: <4E43E9A6.7020608@netwok.org> <20110811183114.701DF3A406B@sparrow.telecommunity.com> <4ED1196E.8090505@netwok.org> Message-ID: On Nov 30, 2011, at 6:39 PM, Nick Coghlan wrote: > On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby wrote: >> It doesn't help at all that I'm not really in a position to provide an >> implementation, and the persons most likely to implement have been leaning >> somewhat towards 382, or wanting to modify 402 such that it uses .pyp >> directory extensions so that PEP 395 can be supported... > > While I was initially a fan of the possibilities of PEP 402, I > eventually decided that we would be trading an easy problem ("you need > an '__init__.py' marker file or a '.pyp' extension to get Python to > recognise your package directory") for a hard one ("What's your > sys.path look like? What did you mean for it to look like?"). Symlinks > (and the fact we implicitly call realname() during system > initialisation and import) just make things even messier. > *Deliberately* allowing package structures on the filesystem to become > ambiguous is a recipe for future pain (and could potentially undo a > lot of the good work done by PEP 328's elimination of implicit > relative imports). > > I acknowledge there is a lot of confusion amongst novices as to how > packages and imports actually work, but my diagnosis of the root cause > of that problem is completely different from that supposed by PEP 402 > (as documented in the more recent versions of PEP 395, I've come to > believe it is due to the way we stuff up the default sys.path[0] > initialisation when packages are involved). > > So, in the end, I've come to strongly prefer the PEP 382 approach.
The > principle of "Explicit is better than implicit" applies to package > detection on the filesystem just as much as it does to any other kind > of API design, and it really isn't that different from the way we > treat actual Python files (i.e. you can *execute* arbitrary files, but > they need to have an appropriate extension if you want to import > them). I've helped an almost distressing number of newbies overcome their confusion about sys.path and packages. Systems using Twisted are, almost by definition, hairy integration problems, and are frequently being created or maintained by people with little to no previous Python experience. Given that experience, I completely agree with everything you've written above (except for the part where you initially liked it). I appreciate the insight that PEP 402 offers about Python's package mechanism (and the difficulties introduced by namespace packages). Its statement of the problem is good, but in my opinion its solution points in exactly the wrong direction: packages need to be _more_ explicit about their package-ness and tools need to be stricter about how they're laid out. It would be great if sys.path[0] were actually correct when running a script inside a package, or at least issued a warning which would explain how to correctly lay out said package. I would love to see a loud alarm every time a module accidentally got imported by the same name twice. I wish I knew, once and for all, whether it was 'import Image' or 'from PIL import Image'. My hope is that if Python starts to tighten these things up a bit, or at least communicate better about best practices, editors and IDEs will develop better automatic discovery features and frameworks will start to normalize their sys.path setups and stop depending on accidents of current directory and script location.
This will in turn vastly decrease confusion among new Python developers taking on large projects with a bunch of libraries, who mostly don't care what the rules for where files are supposed to go are, and just want to put them somewhere that works. -glyph From glyph at twistedmatrix.com Thu Dec 1 08:15:01 2011 From: glyph at twistedmatrix.com (Glyph) Date: Thu, 1 Dec 2011 02:15:01 -0500 Subject: [Python-Dev] Warnings In-Reply-To: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> Message-ID: <22C86443-2C02-4D0A-A62A-A1CD75F87D08@twistedmatrix.com> On Dec 1, 2011, at 1:10 AM, Raymond Hettinger wrote: > When updating the documentation, please don't go overboard with warnings. > The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly. > See http://docs.python.org/documenting/style.html#affirmative-tone > > The docs for the subprocess module currently have SEVEN warning boxes on one page: > http://docs.python.org/library/subprocess.html#module-subprocess > The implicit message is that our tools are hazardous and should be avoided. > > Please show some restraint and aim for clean looking, high-quality technical writing without the FUD. > > Look at the SQLite3 docs for an example of good writing. The prevention of SQL injection attacks is discussed briefly and effectively without big red boxes littering the page. I'm not convinced this is actually a great example of how to outline pitfalls clearly; it doesn't say what an SQL injection attack is, or what the consequences might be. Also, it's not the best example of a positive tone. The narrative is: You probably want to do X. Don't do Y, because it will make you vulnerable to a Q attack. Instead, do Z. Here's an example of Y. Don't do it! Okay, finally, here's an example of Z. It would be better to say "You probably want to do X. Here's how you do X, with Z. Here's an example of Z."
Then, later, discuss why some people want to do Y, and why you should avoid that impulse. However, what 'subprocess' is doing clearly isn't an improvement; it's not an effective introduction to secure process execution, just a reference document punctuated with ambiguous anxiety. sqlite3 is at least somewhat specific :). I think both of these documents point to a need for a recommended idiom for discussing security, or at least common antipatterns, within the Python documentation. I like the IETF's "security considerations" section, because it separates things off into a section that can be referred to later, once the developer has had an opportunity to grasp the basics. Any section with security implications can easily say "please refer to the 'security considerations' section for important information on how to avoid common mistakes" without turning into a big security digression on its own. -glyph From ncoghlan at gmail.com Thu Dec 1 08:32:36 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Dec 2011 17:32:36 +1000 Subject: [Python-Dev] Warnings In-Reply-To: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> Message-ID: On Thu, Dec 1, 2011 at 4:10 PM, Raymond Hettinger wrote: > When updating the documentation, please don't go overboard with warnings. > The docs need to be worded affirmatively -- say what a tool does and show > how to use it correctly. > See http://docs.python.org/documenting/style.html#affirmative-tone > > The docs for the subprocess module currently have SEVEN warning boxes on one > page: > http://docs.python.org/library/subprocess.html#module-subprocess > The implicit message is that our tools are hazardous and should be avoided.
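The subprocess warnings debated here come from one risk: handing untrusted text to a shell. Below is a minimal sketch of the safer default. It is not taken from the subprocess docs. You pass an argument list with the default `shell=False`, so each list element reaches the child as exactly one argv entry and no shell metacharacters are interpreted. The helper name `echo_untrusted` is hypothetical, and using `subprocess.run` with `capture_output`/`text` assumes Python 3.7+.

```python
import subprocess
import sys


def echo_untrusted(arg):
    """Hand an untrusted string to a child process as a single argument.

    With an argument list and the default shell=False, no shell ever parses
    `arg`, so characters such as ';', '|' or '$(...)' arrive as plain text.
    """
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", arg]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")


hostile = "report.txt; echo pwned $(whoami)"
print(echo_untrusted(hostile))  # prints the string back unchanged
```

When a shell genuinely is needed on POSIX, `shlex.quote()` (Python 3.3+) escapes a single argument so it can be interpolated into a command string safely.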
I have no problem with eliminating a lot of those specific warnings - I kept them there in the last rewrite (and added a couple of new ones) because avoiding shell injection vulnerabilities is such a driving theme behind the subprocess module design. Since I was already changing a lot of other things, messing with that aspect really wasn't high on my priority list. Now that we have the "frequently used arguments" section, though, the rest of the warnings could fairly readily be downgraded to notes or inline references to that section. > Please show some restraint and aim for clean looking, high-quality technical > writing without the FUD. I do object to you calling genuine attempts to educate programmers about security issues FUD, though. It's not FUD - novice programmers inflict shell injection, script injection and SQL injection vulnerabilities on the world every day. The multiple warnings are there in the subprocess docs because people often only look at the documentation for the specific function they're interested in, not at the broader context of the page it is part of. "Overkill" is a legitimate complaint, but calling attempts to highlight genuinely insecure practices FUD is the kind of attitude that has given the world so many years of persistent vulnerability to buffer overflow attacks :P Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Dec 1 08:36:37 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Dec 2011 17:36:37 +1000 Subject: [Python-Dev] Warnings In-Reply-To: <22C86443-2C02-4D0A-A62A-A1CD75F87D08@twistedmatrix.com> References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <22C86443-2C02-4D0A-A62A-A1CD75F87D08@twistedmatrix.com> Message-ID: On Thu, Dec 1, 2011 at 5:15 PM, Glyph wrote: > I think both of these documents point to a need for a recommended idiom for > discussing security, or at least common antipatterns, within the Python > documentation.
I like the IETF's "security considerations" section, because > it separates things off into a section that can be referred to later, once > the developer has had an opportunity to grasp the basics. Any section with > security implications can easily say "please refer to the 'security > considerations' section for important information on how to avoid common > mistakes" without turning into a big security digression on its own. I like that approach - one of the problems with online docs is the fact people don't read them in order, hence the proliferation of warnings for the subprocess module. A clear "Security Considerations" section with appropriate cross links would allow us to be clear and explicit about common problems without littering the docs with red warning boxes for security issues that are inherent in a particular task rather than being a Python-specific problem. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Dec 1 08:55:19 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Dec 2011 17:55:19 +1000 Subject: [Python-Dev] Warnings In-Reply-To: References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <22C86443-2C02-4D0A-A62A-A1CD75F87D08@twistedmatrix.com> Message-ID: On Thu, Dec 1, 2011 at 5:36 PM, Nick Coghlan wrote: > On Thu, Dec 1, 2011 at 5:15 PM, Glyph wrote: >> I think both of these documents point to a need for a recommended idiom for >> discussing security, or at least common antipatterns, within the Python >> documentation. I like the IETF's "security considerations" section, because >> it separates things off into a section that can be referred to later, once >> the developer has had an opportunity to grasp the basics. Any section with >> security implications can easily say "please refer to the 'security >> considerations' section for important information on how to avoid common >> mistakes" without turning into a big security digression on its own.
> > I like that approach - one of the problems with online docs is the > fact people don't read them in order, hence the proliferation of > warnings for the subprocess module. A clear "Security Considerations" > section with appropriate cross links would allow us to be clear and > explicit about common problems without littering the docs with red > warning boxes for security issues that are inherent in a particular > task rather than being a Python-specific problem. I created http://bugs.python.org/issue13515 to propose that the documentation style guide adopt a specific convention along these lines (expanded a bit to cover other cross-cutting concerns like the pipe buffer blocking I/O problem in subprocess). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From arigo at tunes.org Thu Dec 1 12:01:44 2011 From: arigo at tunes.org (Armin Rigo) Date: Thu, 1 Dec 2011 12:01:44 +0100 Subject: [Python-Dev] STM and python In-Reply-To: References: Message-ID: Hi, On Thu, Dec 1, 2011 at 07:06, Matt Joiner wrote: > I saw this, I believe it just exposes an STM primitive to user code. > It doesn't make use of STM for Python internals. That's correct. > Explicit STM doesn't seem particularly useful for a language that > doesn't expose raw memory in its normal usage. In my opinion, that sentence could not be more wrong. It is true that, as I discuss on the blog post cited a few times in this thread, the first goal I see is to use STM to replace the GIL as an internal way of keeping the state of the interpreter consistent. This could quite possibly be achieved using the new GCC __transaction_atomic keyword, although I see already annoying issues (e.g. the keyword can only protect a _syntactically nested_ piece of code as a transaction). However there is another aspect: user-exposed STM, which I didn't explore much. While it is potentially even more important, it is a language design question, so I'm happy to delegate it to python-dev.
In my opinion, explicit STM (like Clojure) is not only *a* way to write multithreaded Python programs, but it seems to be *the only* way that really makes sense in general, for more than small examples and more than examples where other hacks are enough (see http://en.wikipedia.org/wiki/Software_transactional_memory#Composable_operations ). In other words, locks are low-level and should not be used in a high-level language, like direct memory accesses, just because it forces the programmer to think about increasingly complicated situations. And of course there is the background idea that TM might be available in hardware someday. My own guess is that it will occur, and I bet that in 5 to 10 years all new Intel and AMD CPUs will have Hybrid TM. On such hardware, the performance penalty mostly disappears (which is also, I guess, the reasoning behind GCC 4.7, offering a future path to use Hybrid TM). If python-dev people are interested in exploring the language design space in that direction, I would be most happy to look in more detail at GCC 4.7. If we manage to make use of it, then we could get a version of CPython using STM internally with a very minimal patch. If it seems useful we can then turn that patch into #ifdefs in the normal CPython. It would of course be off by default because of the performance hit; still, it would give an optional alternate "CPythonSTM" to play with in order to come up with good user-level abstractions. (This is what I'm already trying to do with PyPy without using GCC 4.7, and it's progressing nicely.) (My existing patch to CPython emulating user-level STM with the GIL is not really satisfying, also for the reason that it cannot emulate some other potentially useful user constructs, like abort_and_retry().) See you soon, Armin.
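Armin's case for explicit STM rests on composability. With locks, two individually correct operations cannot simply be combined into one atomic operation, but with transactions they can. What follows is a toy, pure-Python illustration of the optimistic read, validate, commit, retry model, loosely in the spirit of Clojure refs. It is not real STM. It uses a single global lock to serialise commits, so it offers no parallel speed-up. Every name in it (`TVar`, `Transaction`, `atomically`, `transfer`) is hypothetical and does not correspond to any existing CPython or PyPy API.

```python
import threading

_commit_lock = threading.Lock()  # serialises snapshots and commits only


class TVar:
    """A transactional variable: a value plus a version bumped on each commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class _Conflict(Exception):
    """Raised when a transaction observes a variable that changed under it."""


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen on first read
        self.writes = {}  # TVar -> value to install on commit

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        with _commit_lock:               # consistent (value, version) pair
            value, version = tvar.value, tvar.version
        if self.reads.setdefault(tvar, version) != version:
            raise _Conflict()
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value        # buffered until commit

    def commit(self):
        with _commit_lock:
            # Validate: nothing we read may have been committed by someone else.
            if any(tvar.version != seen for tvar, seen in self.reads.items()):
                return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(fn):
    """Run fn(tx) in a fresh transaction, retrying until it commits cleanly."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
        except _Conflict:
            continue
        if tx.commit():
            return result


def transfer(tx, src, dst, amount):
    tx.write(src, tx.read(src) - amount)
    tx.write(dst, tx.read(dst) + amount)


a, b = TVar(100), TVar(0)


def worker():
    for _ in range(1000):
        # Two transfers composed into ONE atomic transaction.
        atomically(lambda tx: (transfer(tx, a, b, 1), transfer(tx, b, a, 1)))
        atomically(lambda tx: transfer(tx, a, b, 5))
        atomically(lambda tx: transfer(tx, b, a, 5))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.value, b.value)  # 100 0
```

The composed lambda in `worker` illustrates Armin's point. The two transfers become a single all-or-nothing unit without any new lock, which is awkward to achieve when each operation takes its own locks. The STM work discussed in this thread aims to provide this inside the interpreter, not through explicit `read`/`write` calls like these.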
From g.brandl at gmx.net Thu Dec 1 22:24:54 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 01 Dec 2011 22:24:54 +0100 Subject: [Python-Dev] Warnings In-Reply-To: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> Message-ID: On 01.12.2011 07:10, Raymond Hettinger wrote: > When updating the documentation, please don't go overboard with warnings. > The docs need to be worded affirmatively -- say what a tool does and show how to > use it correctly. > See http://docs.python.org/documenting/style.html#affirmative-tone > > The docs for the subprocess module currently have SEVEN warning boxes on one page: > http://docs.python.org/library/subprocess.html#module-subprocess > The implicit message is that our tools are hazardous and should be avoided. > > Please show some restraint and aim for clean looking, high-quality technical > writing without the FUD. > > Look at the SQLite3 docs for an example of good writing. The prevention of SQL > injection attacks is discussed briefly and effectively without big red boxes > littering the page. Obviously, +1. Georg From anacrolix at gmail.com Fri Dec 2 06:32:59 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 2 Dec 2011 16:32:59 +1100 Subject: [Python-Dev] STM and python In-Reply-To: References: Message-ID: Armin, thanks for weighing in on this. I'm keen to see a CPython making use of STM, maybe I'll give it a try over Christmas break. I'm willing to take the single threaded performance hit, as I have several applications that degrade due to significant contention with the GIL. The other benefits of STM you describe make it a lot more appealing. I actually tried out Haskell recently to make use of many of the advanced features but came crawling back. If anyone else is keen to try this, I'm happy to receive patches for testing and review.
On Thu, Dec 1, 2011 at 10:01 PM, Armin Rigo wrote: > Hi, > > On Thu, Dec 1, 2011 at 07:06, Matt Joiner wrote: >> I saw this, I believe it just exposes an STM primitive to user code. >> It doesn't make use of STM for Python internals. > > That's correct. > >> Explicit STM doesn't seem particularly useful for a language that >> doesn't expose raw memory in its normal usage. > > In my opinion, that sentence could not be more wrong. > > It is true that, as I discuss on the blog post cited a few times in > this thread, the first goal I see is to use STM to replace the GIL as > an internal way of keeping the state of the interpreter consistent. > This could quite possibly be achieved using the new GCC > __transaction_atomic keyword, although I see already annoying issues > (e.g. the keyword can only protect a _syntactically nested_ piece of > code as a transaction). > > However there is another aspect: user-exposed STM, which I didn't > explore much. While it is potentially even more important, it is a > language design question, so I'm happy to delegate it to python-dev. > In my opinion, explicit STM (like Clojure) is not only *a* way to > write multithreaded Python programs, but it seems to be *the only* way > that really makes sense in general, for more than small examples and > more than examples where other hacks are enough (see > http://en.wikipedia.org/wiki/Software_transactional_memory#Composable_operations > ). In other words, locks are low-level and should not be used in a > high-level language, like direct memory accesses, just because it > forces the programmer to think about increasingly complicated > situations. > > And of course there is the background idea that TM might be available > in hardware someday. My own guess is that it will occur, and I bet > that in 5 to 10 years all new Intel and AMD CPUs will have Hybrid TM.
> On such hardware, the performance penalty mostly disappears (which is > also, I guess, the reasoning behind GCC 4.7, offering a future path to > use Hybrid TM). > > If python-dev people are interested in exploring the language design > space in that direction, I would be most happy to look in more detail > at GCC 4.7. If we manage to make use of it, then we could get a > version of CPython using STM internally with a very minimal patch. If > it seems useful we can then turn that patch into #ifdefs in the > normal CPython. It would of course be off by default because of the > performance hit; still, it would give an optional alternate > "CPythonSTM" to play with in order to come up with good user-level > abstractions. (This is what I'm already trying to do with PyPy > without using GCC 4.7, and it's progressing nicely.) (My existing > patch to CPython emulating user-level STM with the GIL is not really > satisfying, also for the reason that it cannot emulate some other > potentially useful user constructs, like abort_and_retry().) > > > See you soon, > > Armin. From status at bugs.python.org Fri Dec 2 18:07:32 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 2 Dec 2011 18:07:32 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20111202170732.659371CE85@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-11-25 - 2011-12-02) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3148 (+14) closed 22154 (+26) total 25302 (+40) Open issues with patches: 1342 Issues opened (29) ================== #13483: Use VirtualAlloc to allocate memory arenas http://bugs.python.org/issue13483 opened by pitrou #13486: msvc9compiler.py doesn't properly generate manifest files.
http://bugs.python.org/issue13486 opened by Jahangir #13491: Fixes for sqlite3 doc http://bugs.python.org/issue13491 opened by Nebelhom #13492: ./configure --with-system-ffi=LIBFFI-PATH http://bugs.python.org/issue13492 opened by michael.kraus #13493: Import error with embedded python on AIX 6.1 http://bugs.python.org/issue13493 opened by python_hu #13494: 'cast' any value to a Boolean? http://bugs.python.org/issue13494 opened by mark.dickinson #13495: IDLE: Regression - Two ColorDelegator instances loaded http://bugs.python.org/issue13495 opened by serwy #13496: bisect module: Overflow at index computation http://bugs.python.org/issue13496 opened by Voo #13497: Fix for broken nice test on non-broken platforms with pedantic http://bugs.python.org/issue13497 opened by yaneurabeya #13498: os.makedirs exist_ok documentation is incorrect, as is some of http://bugs.python.org/issue13498 opened by r.david.murray #13499: uuid documentation example uses invalid REPL/doctest syntax http://bugs.python.org/issue13499 opened by petri.lehtinen #13500: Hitting EOF gets cmd.py into a infinite EOF on return loop http://bugs.python.org/issue13500 opened by yaneurabeya #13501: Make libedit support more generic; port readline / libedit to http://bugs.python.org/issue13501 opened by yaneurabeya #13502: Documentation for Event.wait return value is either wrong or i http://bugs.python.org/issue13502 opened by r.david.murray #13503: improved efficiency of bytearray pickling by using bytes type http://bugs.python.org/issue13503 opened by irmen #13504: Meta-issue for "Invent with Python" IDLE feedback http://bugs.python.org/issue13504 opened by ncoghlan #13505: Bytes objects pickled in 3.x with protocol <=2 are unpickled i http://bugs.python.org/issue13505 opened by pitrou #13506: IDLE sys.path does not contain Current Working Directory http://bugs.python.org/issue13506 opened by MarcoScataglini #13507: Modify OS X installer builds to package liblzma for the new lz 
http://bugs.python.org/issue13507 opened by ned.deily

#13508: ctypes' find_library breaks with ARM ABIs
http://bugs.python.org/issue13508 opened by lool

#13510: Clarify that readlines() is not needed to iterate over a file
http://bugs.python.org/issue13510 opened by potten

#13511: ./configure --includedir, --libdir accept multiple
http://bugs.python.org/issue13511 opened by rpq

#13512: ~/.pypirc created insecurely
http://bugs.python.org/issue13512 opened by Vincent.Danen

#13513: IOBase docs incorrectly link to the GNU readline module
http://bugs.python.org/issue13513 opened by meador.inge

#13515: Consistent documentation practices for security concerns and c
http://bugs.python.org/issue13515 opened by ncoghlan

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516 opened by ramhux

#13518: configparser
http://bugs.python.org/issue13518 opened by mickeyju

#13519: Tkinter rowconfigure and columnconfigure functions crash if mi
http://bugs.python.org/issue13519 opened by aoi.leslie

#13520: Patch to make pickle aware of __qualname__
http://bugs.python.org/issue13520 opened by sbt


Most recent 15 issues with no replies (15)
==========================================

#13520: Patch to make pickle aware of __qualname__
http://bugs.python.org/issue13520

#13519: Tkinter rowconfigure and columnconfigure functions crash if mi
http://bugs.python.org/issue13519

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516

#13513: IOBase docs incorrectly link to the GNU readline module
http://bugs.python.org/issue13513

#13507: Modify OS X installer builds to package liblzma for the new lz
http://bugs.python.org/issue13507

#13501: Make libedit support more generic; port readline / libedit to
http://bugs.python.org/issue13501

#13499: uuid documentation example uses invalid REPL/doctest syntax
http://bugs.python.org/issue13499

#13498: os.makedirs exist_ok documentation is incorrect, as is some of
http://bugs.python.org/issue13498

#13495: IDLE: Regression - Two ColorDelegator instances loaded
http://bugs.python.org/issue13495

#13478: No documentation for timeit.default_timer
http://bugs.python.org/issue13478

#13476: Simple exclusion filter for unittest autodiscovery
http://bugs.python.org/issue13476

#13464: HTTPResponse is missing an implementation of readinto
http://bugs.python.org/issue13464

#13463: Fix parsing of package_data
http://bugs.python.org/issue13463

#13456: Providing a custom HTTPResponse class to HTTPConnection
http://bugs.python.org/issue13456

#13438: "Delete patch set" review action doesn't work
http://bugs.python.org/issue13438


Most recent 15 issues waiting for review (15)
=============================================

#13520: Patch to make pickle aware of __qualname__
http://bugs.python.org/issue13520

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516

#13513: IOBase docs incorrectly link to the GNU readline module
http://bugs.python.org/issue13513

#13512: ~/.pypirc created insecurely
http://bugs.python.org/issue13512

#13511: ./configure --includedir, --libdir accept multiple
http://bugs.python.org/issue13511

#13508: ctypes' find_library breaks with ARM ABIs
http://bugs.python.org/issue13508

#13503: improved efficiency of bytearray pickling by using bytes type
http://bugs.python.org/issue13503

#13501: Make libedit support more generic; port readline / libedit to
http://bugs.python.org/issue13501

#13500: Hitting EOF gets cmd.py into a infinite EOF on return loop
http://bugs.python.org/issue13500

#13497: Fix for broken nice test on non-broken platforms with pedantic
http://bugs.python.org/issue13497

#13495: IDLE: Regression - Two ColorDelegator instances loaded
http://bugs.python.org/issue13495

#13491: Fixes for sqlite3 doc
http://bugs.python.org/issue13491

#13486: msvc9compiler.py doesn't properly generate manifest files.
http://bugs.python.org/issue13486

#13483: Use VirtualAlloc to allocate memory arenas
http://bugs.python.org/issue13483

#13473: Add tests for files byte-compiled by distutils[2]
http://bugs.python.org/issue13473


Top 10 most discussed issues (10)
=================================

#6715: xz compressor support
http://bugs.python.org/issue6715 18 msgs

#7652: Merge C version of decimal into py3k.
http://bugs.python.org/issue7652 13 msgs

#11379: Remove "lightweight" from minidom description
http://bugs.python.org/issue11379 13 msgs

#1040439: Missing documentation on how to link with libpython
http://bugs.python.org/issue1040439 10 msgs

#13400: packaging: build command should have options to control byte-c
http://bugs.python.org/issue13400 9 msgs

#13493: Import error with embedded python on AIX 6.1
http://bugs.python.org/issue13493 9 msgs

#12567: curses implementation of Unicode is wrong in Python 3
http://bugs.python.org/issue12567 7 msgs

#13475: Add '-p'/'--path0' command line option to override sys.path[0]
http://bugs.python.org/issue13475 7 msgs

#13496: bisect module: Overflow at index computation
http://bugs.python.org/issue13496 7 msgs

#13405: Add DTrace probes
http://bugs.python.org/issue13405 6 msgs


Issues closed (26)
==================

#6753: Python 3.1.1 test_cmd_line fails on Fedora 11
http://bugs.python.org/issue6753 closed by haypo

#7111: abort when stderr is closed
http://bugs.python.org/issue7111 closed by pitrou

#8414: Add test cases for assert
http://bugs.python.org/issue8414 closed by ezio.melotti

#11427: ctypes from_buffer no longer accepts bytes
http://bugs.python.org/issue11427 closed by haypo

#12307: Inconsistent formatting of section titles in PEP 0
http://bugs.python.org/issue12307 closed by eric.araujo

#12618: py_compile cannot create files in current directory
http://bugs.python.org/issue12618 closed by meador.inge

#12850: [PATCH] stm.atomic
http://bugs.python.org/issue12850 closed by arigo

#12856: tempfile PRNG reuse between parent and child process
http://bugs.python.org/issue12856 closed by pitrou

#12945: ctypes works incorrectly with _swappedbytes_ = 1
http://bugs.python.org/issue12945 closed by meador.inge

#13380: ctypes: add an internal function for reseting the ctypes cache
http://bugs.python.org/issue13380 closed by meador.inge

#13434: time.xmlrpc.com dead
http://bugs.python.org/issue13434 closed by pitrou

#13448: PEP 3155 implementation
http://bugs.python.org/issue13448 closed by pitrou

#13452: PyUnicode_EncodeDecimal: reject error handlers different than
http://bugs.python.org/issue13452 closed by haypo

#13467: Typo in doc for library/sysconfig
http://bugs.python.org/issue13467 closed by eric.araujo

#13471: setting access time beyond Jan. 2038 on remote share failes on
http://bugs.python.org/issue13471 closed by Thorsten.Simons

#13481: Use an accurate clock in timeit
http://bugs.python.org/issue13481 closed by pitrou

#13482: _tkinter.TclError: invalid command name "tixDirSelectBox"
http://bugs.python.org/issue13482 closed by Martin.Unzner

#13484: mail rejected: tutor at python.org
http://bugs.python.org/issue13484 closed by eric.araujo

#13485: tcl question
http://bugs.python.org/issue13485 closed by amaury.forgeotdarc

#13487: inspect.getmodule fails when module imports change sys.modules
http://bugs.python.org/issue13487 closed by eric.araujo

#13488: Some old preprocessors have problem with "#define" not in the
http://bugs.python.org/issue13488 closed by jcea

#13489: collections.Counter doc does not list added version
http://bugs.python.org/issue13489 closed by ezio.melotti

#13490: broken downloads counting on pypi.python.org
http://bugs.python.org/issue13490 closed by loewis

#13509: On uninstallation, distutils bdist_wininst fails to run post i
http://bugs.python.org/issue13509 closed by eric.araujo

#13514: PIL does not support iTXt PNG chunks [patch]
http://bugs.python.org/issue13514 closed by ezio.melotti

#13517: readdir() in os.listdir not threadsafe on OSX 10.6.8
http://bugs.python.org/issue13517 closed by thouis


From solipsis at pitrou.net Sat Dec 3 21:39:03 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 3 Dec 2011 21:39:03 +0100
Subject: [Python-Dev] Style guide for FAQs?
Message-ID: <20111203213903.1ebfe7c5@pitrou.net>

Hello,

I notice that some FAQs are not only outdated but seem to favour a writing style that's quite lengthy and full of anecdotal details. It seems to me that there is value in giving terse answers in FAQs (we have - or should have - reference documentation where things are explained in more detail).

One primary example is the performance question:
file:///home/antoine/cpython/32/Doc/build/html/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up

It mixes a couple of generalities with incredibly specific suggestions such as early binding of methods or use of default argument values to fold constants. I think a beginner reading this entry won't get any meaningful information out of it.

Any advice on whether it's ok to hack and slash into the fat? :)

Regards

Antoine.


From solipsis at pitrou.net Sat Dec 3 21:58:01 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 3 Dec 2011 21:58:01 +0100
Subject: [Python-Dev] Style guide for FAQs?
References: <20111203213903.1ebfe7c5@pitrou.net>
Message-ID: <20111203215801.74ea1209@pitrou.net>

On Sat, 3 Dec 2011 21:39:03 +0100
Antoine Pitrou wrote:
>
> One primary example is the performance question:
> file:///home/antoine/cpython/32/Doc/build/html/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up

Woohoo. This should of course be:
http://docs.python.org/dev/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up

cheers

Antoine.


From tjreedy at udel.edu Sun Dec 4 03:55:35 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 03 Dec 2011 21:55:35 -0500
Subject: [Python-Dev] Style guide for FAQs?
In-Reply-To: <20111203215801.74ea1209@pitrou.net>
References: <20111203213903.1ebfe7c5@pitrou.net> <20111203215801.74ea1209@pitrou.net>
Message-ID:

On 12/3/2011 3:58 PM, Antoine Pitrou wrote:
> On Sat, 3 Dec 2011 21:39:03 +0100
> Antoine Pitrou wrote:
>>
>> One primary example is the performance question:
>> file:///home/antoine/cpython/32/Doc/build/html/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up
>
> Woohoo. This should of course be:
> http://docs.python.org/dev/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up

That looks like a mini-howto ;-),
rather than a FAQ entry.

The changes you have made so far have looked good to me.

--
Terry Jan Reedy


From ncoghlan at gmail.com Sun Dec 4 05:11:58 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 4 Dec 2011 14:11:58 +1000
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Issue #13211: Add .reason attribute to HTTPError to implement parent class
In-Reply-To:
References:
Message-ID:

On Sun, Dec 4, 2011 at 12:46 AM, jason.coombs wrote:
> +def test_HTTPError_interface():
> +    """
> +    Issue 13211 reveals that HTTPError didn't implement the URLError
> +    interface even though HTTPError is a subclass of URLError.
> +
> +    >>> err = urllib.error.HTTPError(msg='something bad happened', url=None, code=None, hdrs=None, fp=None)
> +    >>> assert hasattr(err, 'reason')
> +    >>> err.reason
> +    'something bad happened'
> +    """
> +

Did you re-run the test suite after forward-porting to 3.3?
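For comparison, a minimal sketch of the same check written as a plain `unittest` case rather than a doctest. It passes the constructor arguments positionally, in the documented order `HTTPError(url, code, msg, hdrs, fp)`, so it does not rely on keyword-argument support (the source of the failure reported below). The class name `HTTPErrorInterfaceTest` is hypothetical, this is not the test that was actually committed for issue 13211, and it assumes a Python 3 release in which `HTTPError.reason` exists.

```python
import unittest
import urllib.error


class HTTPErrorInterfaceTest(unittest.TestCase):
    """Issue 13211: HTTPError should provide URLError's ``reason`` attribute."""

    def test_reason_matches_msg(self):
        # Positional arguments in the documented order: url, code, msg, hdrs, fp.
        err = urllib.error.HTTPError(None, None, "something bad happened", None, None)
        # HTTPError subclasses URLError, so it should honour URLError's interface.
        self.assertIsInstance(err, urllib.error.URLError)
        self.assertTrue(hasattr(err, "reason"))
        self.assertEqual(err.reason, "something bad happened")
```

Saved in a test module, it runs under `python -m unittest`; unlike the doctest, a failure in the constructor call is reported once instead of cascading into `NameError`s for every later line.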
I'm consistently getting failures:

$ ./python -m test test_urllib2
[1/1] test_urllib2
**********************************************************************
File "/home/ncoghlan/devel/py3k/Lib/test/test_urllib2.py", line 1457, in test.test_urllib2.test_HTTPError_interface
Failed example:
    err = urllib.error.HTTPError(msg='something bad happened', url=None, code=None, hdrs=None, fp=None)
Exception raised:
    Traceback (most recent call last):
      File "/home/ncoghlan/devel/py3k/Lib/doctest.py", line 1253, in __run
        compileflags, 1), test.globs)
      File "", line 1, in
        err = urllib.error.HTTPError(msg='something bad happened', url=None, code=None, hdrs=None, fp=None)
    TypeError: HTTPError does not take keyword arguments
**********************************************************************
File "/home/ncoghlan/devel/py3k/Lib/test/test_urllib2.py", line 1458, in test.test_urllib2.test_HTTPError_interface
Failed example:
    assert hasattr(err, 'reason')
Exception raised:
    Traceback (most recent call last):
      File "/home/ncoghlan/devel/py3k/Lib/doctest.py", line 1253, in __run
        compileflags, 1), test.globs)
      File "", line 1, in
        assert hasattr(err, 'reason')
    NameError: name 'err' is not defined
**********************************************************************
File "/home/ncoghlan/devel/py3k/Lib/test/test_urllib2.py", line 1459, in test.test_urllib2.test_HTTPError_interface
Failed example:
    err.reason
Exception raised:
    Traceback (most recent call last):
      File "/home/ncoghlan/devel/py3k/Lib/doctest.py", line 1253, in __run
        compileflags, 1), test.globs)
      File "", line 1, in
        err.reason
    NameError: name 'err' is not defined
**********************************************************************
1 items had failures:
   3 of   3 in test.test_urllib2.test_HTTPError_interface
***Test Failed*** 3 failures.
test test_urllib2 failed -- 3 of 65 doctests failed
1 test failed:
    test_urllib2
[142313 refs]

Now, this failure is quite possibly due to a flaw in the PEP 3151 implementation (see http://bugs.python.org/issue12555), but picking up this kind of thing is the reason we say to always run the tests before committing, even for a simple merge.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


From g.brandl at gmx.net Sun Dec 4 09:42:23 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 04 Dec 2011 09:42:23 +0100
Subject: [Python-Dev] Style guide for FAQs?
In-Reply-To:
References: <20111203213903.1ebfe7c5@pitrou.net> <20111203215801.74ea1209@pitrou.net>
Message-ID:

On 04.12.2011 03:55, Terry Reedy wrote:
> On 12/3/2011 3:58 PM, Antoine Pitrou wrote:
>> On Sat, 3 Dec 2011 21:39:03 +0100
>> Antoine Pitrou wrote:
>>>
>>> One primary example is the performance question:
>>> file:///home/antoine/cpython/32/Doc/build/html/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up
>>
>> Woohoo. This should of course be:
>> http://docs.python.org/dev/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up
>
> That looks like a mini-howto ;-),
> rather than a FAQ entry.
>
> The changes you have made so far have looked good to me.

Definitely.

Georg


From martin at v.loewis.de Sun Dec 4 10:56:06 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 04 Dec 2011 10:56:06 +0100
Subject: [Python-Dev] STM and python
In-Reply-To:
References:
Message-ID: <4EDB43B6.1080501@v.loewis.de>

> However given advances in locking and garbage collection in the last
> decade, what attempts have been made recently to try these new ideas
> out?

If that's the question you want an answer to, it would have been better had you listed the efforts that you are already aware of.
If you really are unaware of any effort, try googling to find

http://www.kamaelia.org/STM
http://peak.telecommunity.com/DevCenter/TrellisSTM
http://bugs.python.org/issue12850
http://dl.acm.org/citation.cfm?id=1978911
http://www-sal.cs.uiuc.edu/~zilles/papers/python_htm.dls2006.pdf

and more

Regards,
Martin


From mail at timgolden.me.uk Sun Dec 4 11:59:13 2011
From: mail at timgolden.me.uk (Tim Golden)
Date: Sun, 04 Dec 2011 10:59:13 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
Message-ID: <4EDB5281.8040807@timgolden.me.uk>

http://bugs.python.org/issue13524

Someone raised issue13524 yesterday to illustrate that a subprocess will crash immediately if an environment block is passed which does not contain a valid SystemRoot environment variable.

Note that the calling (Python) process is unaffected; this isn't - strictly - a Python crash. The issue is essentially a Windows one where a fairly unusual corner case -- passing an empty environment -- has unforeseen effects.

The smallest reproducible example is this:

    import os, sys
    import subprocess
    subprocess.Popen(
        [sys.executable],
        env={}
    )

and it can be prevented like this:

    import os, sys
    import subprocess
    subprocess.Popen(
        [sys.executable],
        env={"SystemRoot" : os.environ['SystemRoot']}
    )

There's a blog post here which gives a worked example:

http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/

but as the author points out, nowhere on MSDN is there a warning that SystemRoot is mandatory. (And, in effect, it's not, as it would just be possible to write code which had no need of it).

So... what's our take on this? As I see it we could:

1) Do nothing: it's the caller's responsibility to understand the complications of the chosen Operating System.
2) Add a doc warning (ironically, considering the recent to-and-fro on doc warnings in this very module).
3) Add a check into the subprocess.Popen code which would raise some exception if the environment block is empty (or doesn't contain SystemRoot) on the grounds that this probably wasn't what the user thought they were doing.
4) Automatically add an entry for SystemRoot to the env block if it's not present already.

It's tempting to opt for (1) and if we were exposing an API called CreateProcess which mimicked the underlying Windows API I would be inclined to go that way. But we're abstracting a little bit away from that and I think that that layer of abstraction carries its own responsibilities.

Option (3) seems to give the best balance. It *is* a corner case, but at the same time it's easy to misunderstand that the env block you're passing in *replaces* rather than *augments* that of the current process.

Thoughts?

TJG


From ncoghlan at gmail.com Sun Dec 4 12:42:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 4 Dec 2011 21:42:14 +1000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB5281.8040807@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID:

On Sun, Dec 4, 2011 at 8:59 PM, Tim Golden wrote:
> So... what's our take on this? As I see it we could:
>
> 1) Do nothing: it's the caller's responsibility to understand the
>    complications of the chosen Operating System.
>
> 2) Add a doc warning (ironically, considering the recent to-and-fro
>    on doc warnings in this very module).
>
> 3) Add a check into the subprocess.Popen code which would raise some
>    exception if the environment block is empty (or doesn't contain
>    SystemRoot) on the grounds that this probably wasn't what the user
>    thought they were doing.
>
> 4) Automatically add an entry for SystemRoot to the env block if it's
>    not present already.
>
>
> It's tempting to opt for (1) and if we were exposing an API called
> CreateProcess which mimicked the underlying Windows API I would be
> inclined to go that way.
> But we're abstracting a little bit away
> from that and I think that that layer of abstraction carries its
> own responsibilities.
>
> Option (3) seems to give the best balance. It *is* a corner case, but at
> the same time it's easy to misunderstand that the env block you're
> passing in *replaces* rather than *augments* that of the current
> process.

There are actually two questions to be answered:
1. What should we do in 3.2 and 2.7?
2. Should we do anything more in 3.3?

Raising an exception is not really an appropriate response for any of them - running without SystemRoot actually works fine in most cases, so raising an exception could break currently working code. As the blog post noted, it's only some specific modules that don't work if SystemRoot is not set. Should we really be inserting workarounds in subprocess for buggy platform code that doesn't fall back to a sensible default if a particular environment variable isn't set?

So, I don't think this is really a subprocess problem at all. It's a platform bug on Windows that means the 'random' module may fail if SystemRoot is not set in the environment.

So, I think the right approach is to:
1. Unset 'SystemRoot' in a Windows shell
2. Run the test suite and observe the scale of the breakage
3. Then either:
   - figure out a workaround that allows us to set an appropriate default value for SystemRoot if needed (depending on the scope of the problem, either do this at interpreter startup, or only in affected modules)
   - if no feasible workaround is found, detect the failures related to this problem and report a more meaningful error message

Either way, add explicit tests to the test suite to ensure that affected modules behave as expected when SystemRoot is not set.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


From mail at timgolden.me.uk Sun Dec 4 13:20:11 2011
From: mail at timgolden.me.uk (Tim Golden)
Date: Sun, 04 Dec 2011 12:20:11 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To:
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID: <4EDB657B.3030105@timgolden.me.uk>

On 04/12/2011 11:42, Nick Coghlan wrote:
> There are actually two questions to be answered:
> 1. What should we do in 3.2 and 2.7?
> 2. Should we do anything more in 3.3?

Agreed.

> 1. Unset 'SystemRoot' in a Windows shell
> 2. Run the test suite and observe the scale of the breakage

Sorry; something I should have highlighted in the earlier post. Behaviour varies between Windows versions. On WinXP, if you unset SystemRoot in a cmd shell, you won't be able to run the test suite: Python won't even start up. On Win7 Python will start but, e.g., the random module will fail.

This is actually a separate issue: how much of Python will work without a valid SystemRoot. The OP's issue was that if you use subprocess to start an arbitrary process (you get the same problem if you try "notepad.exe") and pass it an env block without a valid SystemRoot then that process will likely fail to start up. And it won't be obvious why.

The case where someone tries to run Python (in general) without a valid SystemRoot is a tiny corner case and you'd be quite right to push that back and say "Don't do that". I don't believe we have to test for it or add code to work around it.

While I put the idea forward, I agree that an exception is more likely than not to break existing code. I just can't see any clear alternative, apart from option 1: we do nothing.
TJG


From p.f.moore at gmail.com Sun Dec 4 13:41:46 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 4 Dec 2011 12:41:46 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB657B.3030105@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk> <4EDB657B.3030105@timgolden.me.uk>
Message-ID:

On 4 December 2011 12:20, Tim Golden wrote:
> On 04/12/2011 11:42, Nick Coghlan wrote:
>>
>> There are actually two questions to be answered:
>> 1. What should we do in 3.2 and 2.7?
>> 2. Should we do anything more in 3.3?

See below...

> This is actually a separate issue: how much of Python will work
> without a valid SystemRoot. The OP's issue was that if you use
> subprocess to start an arbitrary process (you get the same problem
> if you try "notepad.exe") and pass it an env block without a valid
> SystemRoot then that process will likely fail to start up. And it
> won't be obvious why.

I'm not 100% clear on the problem here. From how I'm reading things, the problem is that not supplying SystemRoot will cause (some or all) invocations of subprocess.Popen to fail - it's not specific to starting Python.

In that case, it seems to me that it's an OS issue, but one that we should work around.

My feeling is that option 4 is best - set SystemRoot to its current value if it's not been set by the user. This leaves the user unable to set an environment with SystemRoot missing, but if the OS fails to handle that properly, then I'm OK with that limitation.

As regards the version question above, I'd take the view that as an OS issue, it's OK to leave it unchanged in 2.7 and 3.2, but add the above to 3.3.

Paul.
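Tim's point earlier in the thread, that a dict passed as `env` *replaces* the parent environment rather than augmenting it, is the crux of this issue. A minimal sketch of the augment-rather-than-replace pattern follows. `child_env` is a hypothetical helper written for illustration, not a stdlib or bzrlib API, and it deliberately does not implement option 4: it never re-adds a variable that the caller explicitly removed.

```python
import os
import subprocess
import sys


def child_env(changes=None, base=None):
    """Build an environment for a child process by augmenting, not replacing.

    Start from a copy of ``base`` (``os.environ`` when omitted) and apply
    ``changes`` on top of it: a string value adds or overrides a variable,
    and ``None`` removes it.  Because the parent environment is copied,
    variables the caller did not mention (SystemRoot on Windows, PATH,
    and so on) are inherited unchanged.
    """
    env = dict(os.environ if base is None else base)
    for name, value in (changes or {}).items():
        if value is None:
            env.pop(name, None)  # remove the variable if it is present
        else:
            env[name] = value
    return env


if __name__ == "__main__":
    # PATH_TO_MY_APPS is a made-up variable name used only for illustration.
    env = child_env({"PATH_TO_MY_APPS": "path/to/my/apps"})
    out = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.environ['PATH_TO_MY_APPS'])"],
        env=env,
    )
    print(out.decode().strip())
```

A caller who really does want a child without SystemRoot can still ask for that explicitly by passing `None` for it. Note that on Windows `os.environ` reports variable names in upper case, so a removal key should use that form (for example `SYSTEMROOT`).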
From mail at timgolden.me.uk Sun Dec 4 15:08:36 2011
From: mail at timgolden.me.uk (Tim Golden)
Date: Sun, 04 Dec 2011 14:08:36 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To:
References: <4EDB5281.8040807@timgolden.me.uk> <4EDB657B.3030105@timgolden.me.uk>
Message-ID: <4EDB7EE4.3030403@timgolden.me.uk>

On 04/12/2011 12:41, Paul Moore wrote:
> I'm not 100% clear on the problem here. From how I'm reading things,
> the problem is that not supplying SystemRoot will cause (some or all)
> invocations of subprocess.Popen to fail - it's not specific to
> starting Python.

That's basically the situation.

>
> My feeling is that option 4 is best - set SystemRoot to its current
> value if it's not been set by the user. This leaves the user unable to
> set an environment with SystemRoot missing, but if the OS fails to
> handle that properly, then I'm OK with that limitation.

FWIW if we went this route we could set it if it's missing but that still allows the user to set it to blank. I'm just a little bit wary of altering the environment which the user believes has been set.

TJG


From martin.packman at canonical.com Sun Dec 4 17:48:16 2011
From: martin.packman at canonical.com (Martin Packman)
Date: Sun, 4 Dec 2011 16:48:16 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB5281.8040807@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID:

On 04/12/2011, Tim Golden wrote:
>
> Someone raised issue13524 yesterday to illustrate that a
> subprocess will crash immediately if an environment block is
> passed which does not contain a valid SystemRoot environment
> variable.
...
> 2) Add a doc warning (ironically, considering the recent to-and-fro
>    on doc warnings in this very module).

There appears to already be such a warning, added because of a similar earlier bug:

Really this is a problem with the subprocess api making a common case harder to do than necessary.
If you read the documentation, you'll get it right, but that's not ideal:

From the bug, the problem with the reporter's code is he passes a dict with the one value he cares about as `env` to subprocess.Popen without realising that it will prevent the inheriting of the current environment. Your suggested fix for him also has an issue: it changes the environment of the parent process without resetting it. Instead you need something like:

    import os
    e = dict(os.environ)
    e['PATH_TO_MY_APPS'] = "path/to/my/apps"

The bzrlib TestCase has a method using subprocess that provides an `env_changes` argument. With that, it's much easier to override or remove just one variable without accidentally clearing the current environment.

Martin


From ncoghlan at gmail.com Sun Dec 4 21:52:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 5 Dec 2011 06:52:14 +1000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB657B.3030105@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk> <4EDB657B.3030105@timgolden.me.uk>
Message-ID:

That's why I'm suggesting we look specifically at the cases where *Python* misbehaves in an empty environment on Windows. Those are legitimately our issue.

The problem in *general* is a platform one, so I don't think it makes sense for us to modify the environment that has explicitly been passed in (e.g. how would you test running without SystemRoot if subprocess added it automatically?).

An extra parameter in the already confusing Popen signature wouldn't be clearer than explicitly copying os.environ and modifying it.

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
On Dec 4, 2011 10:22 PM, "Tim Golden" wrote:

> On 04/12/2011 11:42, Nick Coghlan wrote:
>
>> There are actually two questions to be answered:
>> 1. What should we do in 3.2 and 2.7?
>> 2. Should we do anything more in 3.3?
>>
>
> Agreed.
>
> 1. Unset 'SystemRoot' in a Windows shell
>> 2. Run the test suite and observe the scale of the breakage
>>
>
> Sorry; something I should have highlighted in the earlier post.
> Behaviour varies between Windows versions. On WinXP, if you
> unset SystemRoot in a cmd shell, you won't be able to run the
> test suite: Python won't even start up. On Win7 Python will
> start but, e.g., the random module will fail.
>
> This is actually a separate issue: how much of Python will work
> without a valid SystemRoot. The OP's issue was that if you use
> subprocess to start an arbitrary process (you get the same problem
> if you try "notepad.exe") and pass it an env block without a valid
> SystemRoot then that process will likely fail to start up. And it
> won't be obvious why.
>
> The case where someone tries to run Python (in general) without
> a valid SystemRoot is a tiny corner case and you'd be quite right
> to push that back and say "Don't do that". I don't believe we have
> to test for it or add code to work around it.
>
> While I put the idea forward, I agree that an exception is more likely
> than not to break existing code. I just can't see any clear alternative,
> apart from option 1: we do nothing.
>
> TJG
>


From tjreedy at udel.edu Sun Dec 4 22:08:33 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 04 Dec 2011 16:08:33 -0500
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB5281.8040807@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID:

On 12/4/2011 5:59 AM, Tim Golden wrote:
> http://bugs.python.org/issue13524
>
> Someone raised issue13524 yesterday to illustrate that a
> subprocess will crash immediately if an environment block is
> passed which does not contain a valid SystemRoot environment
> variable.
>
> Note that the calling (Python) process is unaffected; this
> isn't - strictly - a Python crash. The issue is essentially
> a Windows one where a fairly unusual corner case -- passing
> an empty environment -- has unforeseen effects.
>
> The smallest reproducible example is this:
>
>     import os, sys
>     import subprocess
>     subprocess.Popen(
>         [sys.executable],
>         env={}
>     )
>
> and it can be prevented like this:
>
>     import os, sys
>     import subprocess
>     subprocess.Popen(
>         [sys.executable],
>         env={"SystemRoot" : os.environ['SystemRoot']}
>     )
>
> There's a blog post here which gives a worked example:
>
>
> http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
>
>
> but as the author points out, nowhere on MSDN is there a warning
> that SystemRoot is mandatory. (And, in effect, it's not, as it
> would just be possible to write code which had no need of it).
>
> So... what's our take on this? As I see it we could:
>
> 1) Do nothing: it's the caller's responsibility to understand the
>    complications of the chosen Operating System.
>
> 2) Add a doc warning (ironically, considering the recent to-and-fro
>    on doc warnings in this very module).
>
> 3) Add a check into the subprocess.Popen code which would raise some
>    exception if the environment block is empty (or doesn't contain
>    SystemRoot) on the grounds that this probably wasn't what the user
>    thought they were doing.
>
> 4) Automatically add an entry for SystemRoot to the env block if it's
>    not present already.
>
>
> It's tempting to opt for (1) and if we were exposing an API called
> CreateProcess which mimicked the underlying Windows API I would be
> inclined to go that way. But we're abstracting a little bit away
> from that and I think that that layer of abstraction carries its
> own responsibilities.
>
> Option (3) seems to give the best balance. It *is* a corner case, but at
> the same time it's easy to misunderstand that the env block you're
> passing in *replaces* rather than *augments* that of the current
> process.
>
> Thoughts?

My inclination would be #4 on Windows, certainly for 3.3, unless there is a clear reason not to. For 2.7/3.2, at least say (not warn, just say) in the doc that a subprocess on Windows may require that SystemRoot be set.

The blog post says the problem is worse on Win 7. So it is not going away.

The blog post has a comment from Martin Loewis a year ago linking to
http://mail.python.org/pipermail/python-dev/2010-November/105866.html
That thread refers to a bug that was not posted on the tracker. This makes at least three (including #3440).

--
Terry Jan Reedy


From ncoghlan at gmail.com Mon Dec 5 01:16:01 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 5 Dec 2011 10:16:01 +1000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To:
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID:

On Mon, Dec 5, 2011 at 7:08 AM, Terry Reedy wrote:
> My inclination would be #4 on Windows, certainly for 3.3, unless there is a
> clear reason not to.

Yes, there is: that environment is the *exact* environment that should be passed to the child processes.
It's not our place to go implicitly adding things to it. If MS aren't willing to add SystemRoot automatically in CreateProcess (despite releasing libraries that require it to be set), there's no way we should be adding it for them.

Fixing our stuff (like importing the random module) to work to at least some degree even if SystemRoot isn't set should definitely be done, but beyond that a comment in the docs pointing out the problem (i.e. MS releasing things that require SystemRoot be set without updating CreateProcess to ensure that it *is* set) is as far as we should go.

Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


From martin at v.loewis.de Mon Dec 5 09:10:51 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 05 Dec 2011 09:10:51 +0100
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB5281.8040807@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID: <4EDC7C8B.6040007@v.loewis.de>

> Thoughts?

Apparently, there are at least two "users" of SystemRoot:
- side-by-side (fusion?) apparently uses it to locate the WinSxS folder, at least on some Windows releases,
- certain registry keys contain SystemRoot, in particular the path names of crypto providers (this apparently is XP only, and fixed on Windows 7)

I agree with Nick that we shouldn't do anything except perhaps for documentation changes. There are many other environment variables whose absence could also cause failures to run the executable, such as PATH, LD_LIBRARY_PATH, etc. Even not passing DISPLAY may cause the subprocess to fail starting.

IOW, users should "normally" pass all environment variables, and only augment it with any specific additions and deletions that they know are needed for the subprocess. If a user deliberately passes a small set of environment variables (e.g. none), we must assume that it was deliberate, and that any resulting failures are desired.
People do such stuff for security reasons, and side-stepping their enforcement is not appropriate for Python to do. Regards, Martin From mail at timgolden.me.uk Mon Dec 5 10:01:17 2011 From: mail at timgolden.me.uk (Tim Golden) Date: Mon, 05 Dec 2011 09:01:17 +0000 Subject: [Python-Dev] Issue 13524: subprocess on Windows In-Reply-To: <4EDC7C8B.6040007@v.loewis.de> References: <4EDB5281.8040807@timgolden.me.uk> <4EDC7C8B.6040007@v.loewis.de> Message-ID: <4EDC885D.5030708@timgolden.me.uk> On 05/12/2011 08:10, "Martin v. Löwis" wrote: > I agree with Nick that we shouldn't do anything except perhaps > for documentation changes. There are many other environment variables > whose absence could also cause failures to run the executable, > such as PATH, LD_LIBRARY_PATH, etc. Even not passing DISPLAY may > cause the subprocess to fail starting. > > IOW, users should "normally" pass all environment variables, and > only augment it with any specific additions and deletions that > they know are needed for the subprocess. If a user deliberately > passes a small set of environment variables (e.g. none), we must > assume that it was deliberate, and that any resulting failures > are desired. People do such stuff for security reasons, and > side-stepping their enforcement is not appropriate for Python > to do. Having slept on this I must confess that this is pretty much the conclusion I'd come to: we can't do anything in code which is guaranteed to be correct in every case. The best we can do is document. And, as Martin Packman pointed out (and I had missed), this particular condition is already documented, at least enough to point a user to. We could probably do with a HOWTO (or blog post or whatever) on using subprocess on Windows, not least because a fair amount of the docs are Unix-centric and actually very slightly confusing for naive Windows-based developers. I think my proposal now is: do nothing.
I'm aware that Nick Coghlan has been making fairly extensive changes to the subprocess docs recently and I don't think I can propose anything on this matter which amounts to more than shuffling the pieces around. TJG From ncoghlan at gmail.com Mon Dec 5 10:41:18 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 5 Dec 2011 19:41:18 +1000 Subject: [Python-Dev] Issue 13524: subprocess on Windows In-Reply-To: <4EDC885D.5030708@timgolden.me.uk> References: <4EDB5281.8040807@timgolden.me.uk> <4EDC7C8B.6040007@v.loewis.de> <4EDC885D.5030708@timgolden.me.uk> Message-ID: On Mon, Dec 5, 2011 at 7:01 PM, Tim Golden wrote: > We could probably do with a HOWTO (or blog post or whatever) on using > subprocess on Windows, not least because a fair amount of the docs > are Unix-centric and actually very slightly confusing for naive > Windows-based developers. > > I think my proposal now is: do nothing. I'm aware that Nick Coghlan > has been making fairly extensive changes to the subprocess docs > recently and I don't think I can propose anything on this matter which > amounts to more than shuffling the pieces around. The subprocess module could probably do with a HOWTO, full stop. Subprocess invocation is something where platform details are always going to matter a lot, and there are subtle details even on Unix that are confusing (e.g. I have a command in my current project that I've only managed to get working by running it via the shell - I still don't know why direct invocation of the binary with the appropriate arguments doesn't work). At the moment, we're still trying to cram an entire essay on subprocess invocation into the subprocess.Popen constructor definition, which is far from optimal. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From mathieu.malaterre at gmail.com Mon Dec 5 16:26:50 2011 From: mathieu.malaterre at gmail.com (Mathieu Malaterre) Date: Mon, 5 Dec 2011 16:26:50 +0100 Subject: [Python-Dev] ImportError: No module named multiarray (is back) In-Reply-To: <4ED127C5.1060004@in.waw.pl> References: <4ECBFF19.8080100@in.waw.pl> <4ECD1D31.7080802@netwok.org> <4ED127C5.1060004@in.waw.pl> Message-ID: Hi Zbyszek, See below my comment. 2011/11/26 Zbigniew Jędrzejewski-Szmek : > Hi, > I apologize in advance for the length of this mail. > > sys.path > ======== > When a script or a module is executed by invoking python with proper > arguments, sys.path is extended. When a path to script is given, the > directory containing the script is prepended. When '-m' or '-c' is used, > $CWD is prepended. This is documented in > http://docs.python.org/dev/using/cmdline.html, so far ok. > > sys.path and $PYTHONPATH is like $PATH -- if you can convince someone to put > a directory under your control in any of them, you can execute code as this > someone. Therefore, sys.path is dangerous and important. Unfortunately, > sys.path manipulations are only described very briefly, and without any > commentary, in the on-line documentation. python(1) manpage doesn't even > mention them. > > The problem: each of the commands below is insecure: > > python /tmp/script.py (when script.py is safe by itself) > ('/tmp' is added to sys.path, so an attacker can override any > module imported in /tmp/script.py by writing to /tmp/module.py) > > cd /tmp && python -mtimeit -s 'import numpy' 'numpy.test()' > (UNIX users are accustomed to being able to safely execute > programs in any directory, e.g. ls, or gcc, or something. > > Here '' is added to sys.path, so it is not secure to run > python in other-user-writable directories.) > > cd /tmp/ && python -c 'import numpy; print(numpy.version.version)' >
(The same as above, '' is added to sys.path.) > > cd /tmp && python > (The same as above). > > IMHO, if this (long-lived) behaviour is necessary, it should at least be > prominently documented. Also in the manpage. > > Prepending realpath(dirname(scriptname)) > ======================================== > Before adding a directory to sys.path as described above, Python actually > runs os.path.realpath over it. This means that if the path to a script given > on the commandline is actually a symlink, the directory containing the real > file will be added instead. This behaviour is not really documented (the > documentation only says "the directory containing that file is added to the > start of sys.path"), but since the integrity of sys.path is so important, it > should be, IMHO. > > Using realpath instead of the (expected) path specified by the user breaks > imports of non-pure-python (mixed .py and .so) modules from modules executed > as scripts on Debian. This is because Debian installs > architecture-independent python files in /usr/share/pyshared, and symlinks > those files into /usr/lib/pymodules/pythonX.Y/. The architecture-dependent > .so and python-version-dependent .pyc files are installed in > /usr/lib/pymodules/pythonX.Y/. When a script, e.g. > /usr/lib/pymodules/pythonX.Y/script.py, is executed, the directory > /usr/share/pyshared is prepended to sys.path. If the script tries to import > a module which has architecture-dependent parts (e.g. numpy) it first sees > the incomplete module in /usr/share/pyshared and fails. > > This happens for example in parallel python > (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620551) and recently when > packaging CellProfiler for Debian. > > Again, if this is on purpose, it should be documented. > > PEP 395 (Qualified Names for Modules) > ===================================== > > PEP 395 proposes another sys.path manipulation.
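Zbigniew describes two behaviours: the script's directory is prepended to `sys.path`, and symlinks are resolved first. Both are easy to observe directly. The sketch below is POSIX-only because it relies on `os.symlink`, and the directory names only mimic the Debian layout described above. It also tries the `-I` (isolated mode) switch, which did not exist in 2011. `-I` was added in Python 3.4 and keeps the script's directory off `sys.path`; Python 3.11 later added `-P` / `PYTHONSAFEPATH` for just that part. The helper names are illustrative.

```python
import json
import os
import subprocess
import sys
import tempfile

# The child script just reports its own sys.path as JSON.
CHILD_SCRIPT = "import json, sys\nprint(json.dumps(sys.path))\n"


def child_sys_path(args):
    """Run the interpreter with `args` and return the child's sys.path."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # 3.11+: would suppress the sys.path[0] entry
    out = subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)


def run_through_symlink():
    """POSIX-only sketch mimicking the Debian layout (real file + symlink)."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)                # /tmp may itself be a symlink
        real_dir = os.path.join(base, "pyshared")   # like /usr/share/pyshared
        link_dir = os.path.join(base, "pymodules")  # like /usr/lib/pymodules/pythonX.Y
        os.mkdir(real_dir)
        os.mkdir(link_dir)
        script = os.path.join(real_dir, "script.py")
        with open(script, "w") as f:
            f.write(CHILD_SCRIPT)
        link = os.path.join(link_dir, "script.py")
        os.symlink(script, link)

        default = child_sys_path([link])         # normal run via the symlink
        isolated = child_sys_path(["-I", link])  # isolated mode (Python 3.4+)
        return default, isolated, real_dir, link_dir


if __name__ == "__main__":
    default, isolated, real_dir, link_dir = run_through_symlink()
    print("sys.path[0] is the real dir:", default[0] == real_dir)
    print("symlink dir on sys.path:    ", link_dir in default)
    print("script dir present with -I: ", real_dir in isolated or link_dir in isolated)
```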
When running a script, the > directory tree will be walked upwards as long as there are __init__.py > files, and then the first directory without one will be added. > > This is of course a fine idea, but it makes a scenario, which was previously > safe, insecure. More precisely, when executing a script in a directory in a > parent directory-writable-by-other-users, the parent directory will be added > to sys.path. > > So the (safe) operation of downloading an archive with a package, unzipping > it in /tmp, changing into the created directory, checking that the script > doesn't do anything bad, and running a script is now insecure if there is > __init__.py in the archive root. > > > I guess that it would be useful to have an option to turn off those sys.path > manipulations. Thanks very much for the detailed explanation. Given this, I believe I can safely give up on CellProfiler packaging until this issue is addressed upstream (either in CellProfiler using an indirection, or in python). Thanks, -- Mathieu From arigo at tunes.org Tue Dec 6 10:55:58 2011 From: arigo at tunes.org (Armin Rigo) Date: Tue, 6 Dec 2011 10:55:58 +0100 Subject: [Python-Dev] STM and python In-Reply-To: References: Message-ID: Hi, Actually, not even one month ago, Intel announced that its processors will offer Hardware Transactional Memory in 2013: http://www.h-online.com/newsticker/news/item/Processor-Whispers-About-Haskell-and-Haswell-1389507.html So yes, obviously, it's going to happen. See you soon, Armin. From anacrolix at gmail.com Tue Dec 6 13:28:42 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Tue, 6 Dec 2011 23:28:42 +1100 Subject: [Python-Dev] STM and python In-Reply-To: References: Message-ID: This is very interesting, cheers for the link.
On Tue, Dec 6, 2011 at 8:55 PM, Armin Rigo wrote: > Hi, > > Actually, not even one month ago, Intel announced that its processors > will offer Hardware Transactional Memory in 2013: > > http://www.h-online.com/newsticker/news/item/Processor-Whispers-About-Haskell-and-Haswell-1389507.html > > So yes, obviously, it's going to happen. > > > See you soon, > > Armin. -- ?_? From jaraco at jaraco.com Tue Dec 6 23:34:07 2011 From: jaraco at jaraco.com (Jason R. Coombs) Date: Tue, 6 Dec 2011 22:34:07 +0000 Subject: [Python-Dev] [Python-checkins] cpython (2.7): PDB now will properly escape backslashes in the names of modules it executes. In-Reply-To: <4EC67559.90409@netwok.org> References: <4EC67559.90409@netwok.org> Message-ID: <7E79234E600438479EC119BD241B48D6A246E8@CH1PRD0602MB098.namprd06.prod.outlook.com> Éric, These are all good suggestions. I'll make them at some point. Thanks. > -----Original Message----- > From: python-dev-bounces+jaraco=jaraco.com at python.org [mailto:python- > dev-bounces+jaraco=jaraco.com at python.org] On Behalf Of Éric Araujo > Sent: Friday, 18 November, 2011 10:10 > To: python-dev at python.org > Subject: Re: [Python-Dev] [Python-checkins] cpython (2.7): PDB now will > properly escape backslashes in the names of modules it executes. > > Hi Jason, > > > http://hg.python.org/cpython/rev/f7dd5178f36a > > branch: 2.7 > > user: Jason R. Coombs > > date: Thu Nov 17 18:03:24 2011 -0500 > > summary: > > PDB now will properly escape backslashes in the names of modules it > > executes. Fixes #7750 > > > diff --git a/Lib/test/test_pdb.py b/Lib/test/test_pdb.py > > +class Tester7750(unittest.TestCase): > I think we have an unwritten rule that test class and method names should > tell something about what they test. (We do have things like TestWeirdBugs > and test_12345, but I don't think it's a useful pattern to follow :) Not a big > deal anyway. > > > + # if the filename has something that resolves to a python
> > > + # if the filename has something that resolves to a python > > + # escape character (such as \t), it will fail > > + test_fn = '.\\test7750.py' > > + > > + msg = "issue7750 only applies when os.sep is a backslash" > > + @unittest.skipUnless(os.path.sep == '\\', msg) > > + def test_issue7750(self): > > + with open(self.test_fn, 'w') as f: > > + f.write('print("hello world")') > > + cmd = [sys.executable, '-m', 'pdb', self.test_fn,] > > + proc = subprocess.Popen(cmd, > > + stdout=subprocess.PIPE, > > + stdin=subprocess.PIPE, > > + stderr=subprocess.STDOUT, > > + ) > > + stdout, stderr = proc.communicate('quit\n') > > + self.assertNotIn('IOError', stdout, "pdb munged the > > + filename") > Why not check for assertIn(filename, stdout)? (In other words, check for > intended behavior rather than implementation of the erstwhile bug.) > > BTW, I?ve just tested that giving a message argument to assertNotIn (the > third argument), unittest still displays the other arguments to allow for easier > debugging. I didn?t know that, it?s cool! > > > + def tearDown(self): > > + if os.path.isfile(self.test_fn): > > + os.remove(self.test_fn) > In my own tests, I?ve become fond of using ?self.addCleanup(os.remove, > filename)?: It?s shorter that a tearDown and is right there on the line that > follows or precedes the file creation. > > > if __name__ == '__main__': > > test_main() > > + unittest.main() > This looks strange. > > Regards > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python- > dev/jaraco%40jaraco.com -------------- next part -------------- A non-text attachment was scrubbed... 
From cs at zip.com.au Wed Dec 7 02:23:12 2011 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 7 Dec 2011 12:23:12 +1100 Subject: [Python-Dev] Warnings In-Reply-To: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> Message-ID: <20111207012312.GA7566@cskk.homeip.net> On 30Nov2011 22:10, Raymond Hettinger wrote: | When updating the documentation, please don't go overboard with warnings. | The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly. | See http://docs.python.org/documenting/style.html#affirmative-tone I come to this late, but if we're going after the docs... At the above link one finds this text: This assures that files are flushed [...] It does not. It _ensures_ that files are flushed. The doco style "affirmative tone" _assures_. The coding practice _ensures_! Pedanticly, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ There is one evil which...should never be passed over in silence but be continually publicly attacked, and that is corruption of the language... - W.H. Auden From raymond.hettinger at gmail.com Wed Dec 7 07:40:31 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Wed, 7 Dec 2011 00:40:31 -0600 Subject: [Python-Dev] Warnings In-Reply-To: <20111207012312.GA7566@cskk.homeip.net> References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <20111207012312.GA7566@cskk.homeip.net> Message-ID: <9B227A4E-9788-4E6D-B415-0DF7CED47455@gmail.com> On Dec 6, 2011, at 7:23 PM, Cameron Simpson wrote: > This assures that files are flushed [...] > > It does not. It _ensures_ that files are flushed. The doco style "affirmative > tone" _assures_. The coding practice _ensures_!
> > Pedanticly, > -- > Cameron Simpson I can assure you that I've ensured that you're fully insured ;-) Raymond From g.brandl at gmx.net Wed Dec 7 19:22:56 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 07 Dec 2011 19:22:56 +0100 Subject: [Python-Dev] Warnings In-Reply-To: <20111207012312.GA7566@cskk.homeip.net> References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <20111207012312.GA7566@cskk.homeip.net> Message-ID: On 07.12.2011 02:23, Cameron Simpson wrote: > On 30Nov2011 22:10, Raymond Hettinger wrote: > | When updating the documentation, please don't go overboard with warnings. > | The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly. > | See http://docs.python.org/documenting/style.html#affirmative-tone > > I come to this late, but if we're going after the docs... > > At the above link one finds this text: > > This assures that files are flushed [...] > > It does not. It _ensures_ that files are flushed. The doco style "affirmative > tone" _assures_. The coding practice _ensures_! > > Pedanticly, Oh, come on, surely this doesn't effect the casual reader? Georg From martin at v.loewis.de Wed Dec 7 19:33:57 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 07 Dec 2011 19:33:57 +0100 Subject: [Python-Dev] [Python-checkins] cpython (2.7): PDB now will properly escape backslashes in the names of modules it executes. In-Reply-To: <4EC67559.90409@netwok.org> References: <4EC67559.90409@netwok.org> Message-ID: <4EDFB195.60607@v.loewis.de> > I think we have an unwritten rule that test class and method names > should tell something about what they test. (We do have things like > TestWeirdBugs and test_12345, but I don't think it's a useful pattern to > follow :) I completely disagree.
test_12345 is a very good name for a test case, in particular if it tests the value of a tau constant in the math module. There can't be any more precise documentation of the test purpose. Regards, Martin From steve at holdenweb.com Wed Dec 7 19:40:56 2011 From: steve at holdenweb.com (Steve Holden) Date: Wed, 7 Dec 2011 10:40:56 -0800 Subject: [Python-Dev] Python Best Again Message-ID: <48E6CE91-AA36-427D-A1C5-FFC4B9A4690E@holdenweb.com> I've just added a news item to the python.org home page noting that Linux Journal readers have voted Python the Best Programming Language for the third year in a row. This is excellent news, though I find it hard to believe that coming up on the outside we see C++. While it demonstrates that Linux Journal readers like object-oriented programming, it shows an uncomfortable tendency towards masochism :) and implies we can't necessarily trust their judgment. ;-) Attempted humor aside, here I am taking the opportunity as PSF chairman to say a big "thank you" to all developers and everyone else who helps to keep putting out releases that gain the kind of popularity that this most recent vote indicates. I know we do it to create a great programming environment, not for popularity, but the Foundation's mission involves encouraging the growth of the international Python community. Please pass this on to other members of your developer community who may not receive this message directly. Seriously, thanks. Having quality releases of a great language really does make it easier to promote Python! 
regards Steve -- Steve Holden steve at holdenweb.com, Holden Web, LLC http://holdenweb.com/ Python classes (and much more) through the web http://oreillyschool.com/ From massimo.dipierro at gmail.com Wed Dec 7 19:45:31 2011 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Wed, 7 Dec 2011 12:45:31 -0600 Subject: [Python-Dev] [PSF-Members] Python Best Again In-Reply-To: <48E6CE91-AA36-427D-A1C5-FFC4B9A4690E@holdenweb.com> References: <48E6CE91-AA36-427D-A1C5-FFC4B9A4690E@holdenweb.com> Message-ID: <37B986D5-DE20-473F-A438-D99AFB7FF7C4@gmail.com> Hello Steve, congratulations to all of you in the foundation who work hard to make Python the success that it is. Massimo On Dec 7, 2011, at 12:40 PM, Steve Holden wrote: > I've just added a news item to the python.org home page noting that Linux Journal readers have voted Python the Best Programming Language for the third year in a row. > > This is excellent news, though I find it hard to believe that coming up on the outside we see C++. While it demonstrates that Linux Journal readers like object-oriented programming, it shows an uncomfortable tendency towards masochism :) and implies we can't necessarily trust their judgment. ;-) > > Attempted humor aside, here I am taking the opportunity as PSF chairman to say a big "thank you" to all developers and everyone else who helps to keep putting out releases that gain the kind of popularity that this most recent vote indicates. I know we do it to create a great programming environment, not for popularity, but the Foundation's mission involves encouraging the growth of the international Python community. Please pass this on to other members of your developer community who may not receive this message directly. > > Seriously, thanks. Having quality releases of a great language really does make it easier to promote Python! 
> > regards > Steve > -- > Steve Holden steve at holdenweb.com, Holden Web, LLC http://holdenweb.com/ > Python classes (and much more) through the web http://oreillyschool.com/ > > > > _______________________________________________ > PSF-Members mailing list > PSF-Members at python.org > http://mail.python.org/mailman/listinfo/psf-members From ethan at stoneleaf.us Wed Dec 7 20:00:41 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 07 Dec 2011 11:00:41 -0800 Subject: [Python-Dev] Warnings In-Reply-To: References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <20111207012312.GA7566@cskk.homeip.net> Message-ID: <4EDFB7D9.6010206@stoneleaf.us> Georg Brandl wrote: > On 07.12.2011 02:23, Cameron Simpson wrote: >> On 30Nov2011 22:10, Raymond Hettinger wrote: >> | When updating the documentation, please don't go overboard with warnings. >> | The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly. >> | See http://docs.python.org/documenting/style.html#affirmative-tone >> >> I come to this late, but if we're going after the docs... >> >> At the above link one finds this text: >> >> This assures that files are flushed [...] >> >> It does not. It _ensures_ that files are flushed. The doco style "affirmative >> tone" _assures_. The coding practice _ensures_! >> >> Pedanticly, > > Oh, come on, surely this doesn't effect the casual reader? No, of course not -- although it might /affect/ said reader by causing him/her to think, "I don't think that word means what you think it means..." ;) Seriously, it's best to use the correct words with the correct meanings. If someone is willing to fix it, let them.
~Ethan~ From wolfson at gmail.com Wed Dec 7 21:01:52 2011 From: wolfson at gmail.com (Ben Wolfson) Date: Wed, 7 Dec 2011 12:01:52 -0800 Subject: [Python-Dev] Warnings In-Reply-To: <4EDFB7D9.6010206@stoneleaf.us> References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <20111207012312.GA7566@cskk.homeip.net> <4EDFB7D9.6010206@stoneleaf.us> Message-ID: On Wed, Dec 7, 2011 at 11:00 AM, Ethan Furman wrote: > > No, of course not -- although it might /affect/ said reader by causing > him/her to think, "I don't think that word means what you think it means..." > ;) > > Seriously, it's best to use the correct words with the correct meanings. If > someone is willing to fix it, let them. I'm sure this hypothetical reader will then look "assure" up in the OED and find this: 5. To make certain the occurrence or arrival of (an event); to ensure. -- Ben Wolfson "Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry] From ben+python at benfinney.id.au Wed Dec 7 21:15:18 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 08 Dec 2011 07:15:18 +1100 Subject: [Python-Dev] Warnings References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <20111207012312.GA7566@cskk.homeip.net> Message-ID: <87k4682l61.fsf@benfinney.id.au> Georg Brandl writes: > On 07.12.2011 02:23, Cameron Simpson wrote: > > This assures that files are flushed [...] > > > > It does not. It _ensures_ that files are flushed. The doco style > > "affirmative tone" _assures_. The coding practice _ensures_! > > Oh, come on, surely this doesn't effect the casual reader? Some readers could of been confused irregardless. -- \ "We must find our way to a time when faith, without evidence, | `\ disgraces anyone who would claim it."
-- Sam Harris, _The End of | _o__) Faith_, 2004 | Ben Finney From tseaver at palladion.com Wed Dec 7 21:16:24 2011 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 07 Dec 2011 15:16:24 -0500 Subject: [Python-Dev] Warnings In-Reply-To: References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <20111207012312.GA7566@cskk.homeip.net> Message-ID: On 12/07/2011 01:22 PM, Georg Brandl wrote: > On 07.12.2011 02:23, Cameron Simpson wrote: >> On 30Nov2011 22:10, Raymond Hettinger >> wrote: | When updating the documentation, please don't go overboard >> with warnings. | The docs need to be worded affirmatively -- say >> what a tool does and show how to use it correctly. | See >> http://docs.python.org/documenting/style.html#affirmative-tone >> >> I come to this late, but if we're going after the docs... >> >> At the above link one finds this text: >> >> This assures that files are flushed [...] >> >> It does not. It _ensures_ that files are flushed. The doco style >> "affirmative tone" _assures_. The coding practice _ensures_! >> >> Pedanticly, > > Oh, come on, surely this doesn't effect the casual reader? /me presumes an ironic mispeling there. ;) Tres.
-- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From victor.stinner at haypocalc.com Thu Dec 8 02:43:40 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 08 Dec 2011 02:43:40 +0100 Subject: [Python-Dev] Reject characters bigger than U+10FFFF and Solaris issues Message-ID: <1504453.f4XqDVp2GQ@ned> Hi, I would like to deny the creation of a Unicode string containing characters outside the range [U+0000; U+10FFFF]. The check is already present in some places (e.g. the builtin chr() function), but not everywhere. The last important function is PyUnicode_FromWideChar, the function used to decode text from the OS. The problem is that test_locale fails on Solaris with such checks. I would like to know how to handle Solaris issues. One possible solution is to not handle issues, and just raise exceptions and skip the failing tests on Solaris ;-) Another solution is to modify locale.strxfrm() on all platforms to return a list of int, instead of a str. The type of the result is not really important, we just have to be able to compare two results (equal, greater, lesser or equal, etc.). Another solution? -- The two Solaris issues: - in the hu_HU locale, localeconv() returns U+30000020 for the thousands separator - locale.strxfrm() calls wcsxfrm() which returns characters in the range [0x1000000; 0x1FFFFFF] For localeconv(), it is the b'\xA0' byte string decoded from an encoding looking like ISO-8859-?? (b'\xA0' is not decodable from UTF-8). It looks like a bug in the decoder.
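The proposed rule can be modelled in a few lines of Python. `chr()` already refuses anything above U+10FFFF. The hypothetical `str_from_wide_values()` below applies the same test to a list of integers that stand in for the `wchar_t` units `PyUnicode_FromWideChar` receives. It also shows why the list-of-int alternative for `locale.strxfrm()` sidesteps the problem. Apart from U+30000020, quoted above, the values are made up for illustration.

```python
MAX_CODE_POINT = 0x10FFFF  # largest code point a Python str may contain


def str_from_wide_values(values):
    """Hypothetical model of the proposed check in PyUnicode_FromWideChar().

    `values` stands in for the wchar_t units handed over by the C library.
    """
    for value in values:
        if not 0 <= value <= MAX_CODE_POINT:
            raise ValueError(
                "character U+%X is not in range [U+0000; U+10FFFF]" % value
            )
    return "".join(chr(value) for value in values)


if __name__ == "__main__":
    print(ascii(str_from_wide_values([0x41, 0x20AC])))  # ordinary text decodes fine
    try:
        str_from_wide_values([0x30000020])  # the hu_HU thousands separator on Solaris
    except ValueError as exc:
        print("rejected:", exc)

    # The strxfrm() alternative: lists of ints have no upper bound on their
    # values and still compare element by element, like strings do.
    key_a = [0x1000001, 0x1000002]  # made-up values in the range seen on Solaris
    key_b = [0x1000001, 0x1000003]
    print(key_a < key_b)
```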
It also looks like OpenIndiana doesn't use ISO-8859 locales anymore, only UTF-8 locales (which is much better!). I'm unable to reproduce the issue on my OpenIndiana VM. For wcsxfrm(), I'm not sure of the range. Example of a result: {0x1010163, 0x1010101, 0x1010103, 0x1010101, 0x1010103, 0x1010101, 0x1010101}. It looks like wcsxfrm() uses the result of strxfrm(), grouping the bytes 3 by 3 and adding 0x1000000 to each group. Example of strxfrm() output for the same input: {0x01, 0x01, 0x63, 0x01, 0x01, 0x01, ...}. See http://bugs.python.org/issue13441 for more information. Victor From stephen at xemacs.org Thu Dec 8 03:13:30 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 08 Dec 2011 11:13:30 +0900 Subject: [Python-Dev] Warnings In-Reply-To: References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com> <20111207012312.GA7566@cskk.homeip.net> Message-ID: <87d3bz24l1.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > Oh, come on, surely this doesn't effect the casual reader? Casual readers aren't effective in any case; you want to hear the opinions of those who care. From chrism at plope.com Thu Dec 8 06:08:39 2011 From: chrism at plope.com (Chris McDonough) Date: Thu, 08 Dec 2011 00:08:39 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? Message-ID: <1323320919.2710.24.camel@thinko> On the heels of Armin's blog post about the troubles of making the same codebase run on both Python 2 and Python 3, I have a concrete suggestion. It would help a lot for code that straddles both Py2 and Py3 to be able to make use of u'' literals. It would seem to be an easy thing to reenable (see http://www.reddit.com/r/Python/comments/n3q7q/thoughts_on_python_3_armin_ronachers_thoughts_and/c36397t ). It would seem to cost very little in terms of maintenance, and not much in docs. It would make it possible to share code like this across py2 and py3: a = u'foo' Instead of (with e.g.
six): a = u('foo') Or: from __future__ import unicode_literals a = 'foo' I recognize that the last option is probably the way "it's meant to be done", but in reality it's just more practical to not fail when literal notation is more specific than strictly necessary. - C From benjamin at python.org Thu Dec 8 07:02:22 2011 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 8 Dec 2011 01:02:22 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323320919.2710.24.camel@thinko> References: <1323320919.2710.24.camel@thinko> Message-ID: 2011/12/8 Chris McDonough : > On the heels of Armin's blog post about the troubles of making the same > codebase run on both Python 2 and Python 3, I have a concrete > suggestion. > > It would help a lot for code that straddles both Py2 and Py3 to be able > to make use of u'' literals. Helpful or not helpful, I think that ship has sailed. The earliest it could see the light of day is 3.3, which would leave people trying to support 3.1 and 3.2 in a bind. -- Regards, Benjamin From chrism at plope.com Thu Dec 8 07:10:44 2011 From: chrism at plope.com (Chris McDonough) Date: Thu, 08 Dec 2011 01:10:44 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> Message-ID: <1323324644.2710.28.camel@thinko> On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote: > 2011/12/8 Chris McDonough : > > On the heels of Armin's blog post about the troubles of making the same > > codebase run on both Python 2 and Python 3, I have a concrete > > suggestion. > > > > It would help a lot for code that straddles both Py2 and Py3 to be able > > to make use of u'' literals. > > Helpful or not helpful, I think that ship has sailed. The earliest it > could see the light of day is 3.3, which would leave people trying to > support 3.1 and 3.2 in a bind. Right.. the title does say "readd ... support in 3.3".
Are you suggesting "the ship has sailed" for eternity because it can't be supported in Python < 3.3? - C From benjamin at python.org Thu Dec 8 07:18:06 2011 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 8 Dec 2011 01:18:06 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323324644.2710.28.camel@thinko> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> Message-ID: 2011/12/8 Chris McDonough : > On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote: >> 2011/12/8 Chris McDonough : >> > On the heels of Armin's blog post about the troubles of making the same >> > codebase run on both Python 2 and Python 3, I have a concrete >> > suggestion. >> > >> > It would help a lot for code that straddles both Py2 and Py3 to be able >> > to make use of u'' literals. >> >> Helpful or not helpful, I think that ship has sailed. The earliest it >> could see the light of day is 3.3, which would leave people trying to >> support 3.1 and 3.2 in a bind. > > Right.. the title does say "readd ... support in 3.3". Are you > suggesting "the ship has sailed" for eternity because it can't be > supported in Python < 3.3? I'm questioning the real utility of it. -- Regards, Benjamin From chrism at plope.com Thu Dec 8 07:31:56 2011 From: chrism at plope.com (Chris McDonough) Date: Thu, 08 Dec 2011 01:31:56 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> Message-ID: <1323325916.2710.39.camel@thinko> On Thu, 2011-12-08 at 01:18 -0500, Benjamin Peterson wrote: > 2011/12/8 Chris McDonough : > > On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote: > >> 2011/12/8 Chris McDonough : > >> > On the heels of Armin's blog post about the troubles of making the same > >> > codebase run on both Python 2 and Python 3, I have a concrete > >> > suggestion.
> >> > > >> > It would help a lot for code that straddles both Py2 and Py3 to be able > >> > to make use of u'' literals. > >> > >> Helpful or not helpful, I think that ship has sailed. The earliest it > >> could see the light of day is 3.3, which would leave people trying to > >> support 3.1 and 3.2 in a bind. > > > > Right.. the title does say "readd ... support in 3.3". Are you > > suggesting "the ship has sailed" for eternity because it can't be > > supported in Python < 3.3? > > I'm questioning the real utility of it. All I can really offer is my own experience here based on writing code that needs to straddle Python 2.5, 2.6, 2.7 and 3.2 without use of 2to3. Having u'' work across all of these would mean porting would not require as much eyeballing as code modified via "from __future__ import unicode_literals", it would let more code work on 2.5 unchanged, and the resulting code would execute faster than code that required us to use a u() function. What's the case against? - C From ncoghlan at gmail.com Thu Dec 8 08:33:29 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 8 Dec 2011 17:33:29 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323325916.2710.39.camel@thinko> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: Such code still won't work on 3.2, hence restoring the redundant notation would be ultimately pointless. -- Nick Coghlan (via Gmail on Android, so likely to be more terse than usual) On Dec 8, 2011 4:34 PM, "Chris McDonough" wrote: > On Thu, 2011-12-08 at 01:18 -0500, Benjamin Peterson wrote: > > 2011/12/8 Chris McDonough : > > > On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote: > > >> 2011/12/8 Chris McDonough : > > >> > On the heels of Armin's blog post about the troubles of making the > same > > >> > codebase run on both Python 2 and Python 3, I have a concrete > > >> > suggestion.
> > >> > > > >> > It would help a lot for code that straddles both Py2 and Py3 to be > able > > >> > to make use of u'' literals. > > >> > > >> Helpful or not helpful, I think that ship has sailed. The earliest it > > >> could see the light of day is 3.3, which would leave people trying to > > >> support 3.1 and 3.2 in a bind. > > > > > > Right.. the title does say "readd ... support in 3.3". Are you > > > suggesting "the ship has sailed" for eternity because it can't be > > > supported in Python < 3.3? > > > > I'm questioning the real utility of it. > > All I can really offer is my own experience here based on writing code > that needs to straddle Python 2.5, 2.6, 2.7 and 3.2 without use of 2to3. > Having u'' work across all of these would mean porting would not require > as much eyeballing as code modified via "from __future__ import > unicode_literals", it would let more code work on 2.5 unchanged, and the > resulting code would execute faster than code that required us to use a > u() function. > > What's the case against? > > - C From chrism at plope.com Thu Dec 8 08:45:08 2011 From: chrism at plope.com (Chris McDonough) Date: Thu, 08 Dec 2011 02:45:08 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: <1323330308.2710.52.camel@thinko> On Thu, 2011-12-08 at 17:33 +1000, Nick Coghlan wrote: > Such code still won't work on 3.2, hence restoring the redundant > notation would be ultimately pointless.
None of the code I've written which straddles Python 2/3 supports anything except Python 3.2+, and likewise I expect that for the next crop of porters/straddlers, their code won't support anything but Python 3.3+. So there is a point, which is to make it easier for people to port code that can straddle the most recent Python 3 release as well as 2.7/2.6. In that context, I don't see much relevance of having no support for u'' in Python 3.2. - C From lukasz at langa.pl Thu Dec 8 08:54:18 2011 From: lukasz at langa.pl (Łukasz Langa) Date: Thu, 8 Dec 2011 08:54:18 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323320919.2710.24.camel@thinko> References: <1323320919.2710.24.camel@thinko> Message-ID: Chris McDonough wrote on 8 Dec 2011, at 06:08: > It would make it possible to share code like this across py2 and py3: > > a = u'foo' > As Armin himself wrote, py3k-compatible code ported from 2.x is often very ugly. This kind of change would only deepen the problem. -1 > Or: > > from __future__ import unicode_literals > a = 'foo' > > I recognize that the last option is probably the way "it's meant to be > done" Yes, that's the reason 2.x has b''. If Python 2.8 ever came to be, making this __future__ work with the standard library would be the right way to do it. -- Kind regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o. Please consider the environment before printing out this e-mail.
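For comparison, the `a = u('foo')` spelling quoted in this thread depends on a helper function. Below is a minimal sketch of such helpers, loosely modelled on six's `u()` and `b()`. The version check and the `unicode_escape`/`latin-1` choices are assumptions made for illustration, not six's exact source.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    def u(s):
        # On Python 3 every plain literal is already text.
        return s

    def b(s):
        # latin-1 maps code points 0-255 one-to-one onto bytes.
        return s.encode("latin-1")
else:
    def u(s):
        # On Python 2 a plain literal is a byte string; decode escapes
        # such as \u00e9 into a unicode object.
        return s.decode("unicode_escape")

    def b(s):
        # On Python 2 a plain literal is already a byte string.
        return s

a = u("foo")
raw = b("foo")
```

Each `u()`/`b()` call is an ordinary function call at runtime. That is the overhead Chris contrasts with a real `u''` literal.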
From stefan at bytereef.org Thu Dec 8 10:17:52 2011 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 8 Dec 2011 10:17:52 +0100 Subject: [Python-Dev] Reject characters bigger than U+10FFFF and Solaris issues In-Reply-To: <1504453.f4XqDVp2GQ@ned> References: <1504453.f4XqDVp2GQ@ned> Message-ID: <20111208091752.GA29901@sleipnir.bytereef.org> Victor Stinner wrote: > For localeconv(), it is the b'\xA0' byte string decoded from an encoding > looking like ISO-8859-?? (b'\xA0' is not decodable from UTF-8). It looks like > a bug in the decoder. It also looks like OpenIndiana doesn't use ISO-8859 > locale anymore, only UTF-8 locales (which is much better!). I'm unable to > reproduce the issue on my OpenIndiana VM. I think that b'\xA0' is a valid thousands separator. The 'fi_FI' locale also uses that. Decimal.__format__() has to handle the 'n' specifier, which takes the thousands separator directly from localeconv(). Currently I have this horrible function to deal with the problem:

    /* Convert decimal_point or thousands_sep, which may be multibyte or in
       the range [128, 255], to a UTF8 string. */
    static PyObject *
    dotsep_as_utf8(const char *s)
    {
        PyObject *utf8;
        PyObject *tmp;
        wchar_t buf[2];
        size_t n;

        n = mbstowcs(buf, s, 2);
        if (n != 1) { /* Issue #7442 */
            PyErr_SetString(PyExc_ValueError,
                "invalid decimal point or unsupported "
                "combination of LC_CTYPE and LC_NUMERIC");
            return NULL;
        }
        tmp = PyUnicode_FromWideChar(buf, n);
        if (tmp == NULL) {
            return NULL;
        }
        utf8 = PyUnicode_AsUTF8String(tmp);
        Py_DECREF(tmp);
        return utf8;
    }

The main issue is that there is no portable function mbst_to_utf8() that uses the current locale. If possible, it would be great to have such a thing in the C-API. I'm not sure why the b'\xA0' problem only occurs in Solaris. Many systems have this thousands separator.
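The C helper above decodes a locale-dependent byte string and re-encodes it as UTF-8. Below is a rough Python analogue. It assumes that `locale.getpreferredencoding(False)` reflects the current LC_CTYPE codeset, which does not hold in Python's UTF-8 mode. The function name simply mirrors the C one for illustration.

```python
import locale


def dotsep_as_utf8(raw):
    """Decode a decimal_point/thousands_sep byte string with the current
    LC_CTYPE encoding and return it re-encoded as UTF-8 bytes."""
    encoding = locale.getpreferredencoding(False)
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # Treat undecodable input the same way as a failing mbstowcs().
        text = ""
    if len(text) != 1:
        # Like the C code, expect exactly one character.
        raise ValueError("invalid decimal point or unsupported "
                         "combination of LC_CTYPE and LC_NUMERIC")
    return text.encode("utf-8")
```

In an ISO-8859-1 locale, `b'\xa0'` would be expected to come back as `b'\xc2\xa0'` (U+00A0, NO-BREAK SPACE). In a UTF-8 locale the same byte is rejected, because it is not valid UTF-8 on its own.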
Stefan Krah From stefan at bytereef.org Thu Dec 8 10:42:31 2011 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 8 Dec 2011 10:42:31 +0100 Subject: [Python-Dev] Reject characters bigger than U+10FFFF and Solaris issues In-Reply-To: <20111208091752.GA29901@sleipnir.bytereef.org> References: <1504453.f4XqDVp2GQ@ned> <20111208091752.GA29901@sleipnir.bytereef.org> Message-ID: <20111208094231.GA30187@sleipnir.bytereef.org> Stefan Krah wrote: > I'm not sure why the b'\xA0' problem only occurs in Solaris. Many systems > have this thousands separator. Are LC_CTYPE and LC_NUMERIC set to the same value on the buildbot? Otherwise you encounter http://bugs.python.org/issue7442. Stefan Krah From tjreedy at udel.edu Thu Dec 8 11:54:28 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 08 Dec 2011 05:54:28 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323325916.2710.39.camel@thinko> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On 12/8/2011 1:31 AM, Chris McDonough wrote: > What's the case against? From a 3.x perspective, an irrelevant 'u' would be pure noise and make the language a bit harder to learn. The intent for 3.x is that one be able to learn 3.x without knowing anything about 2.x. So bridge stuff has been put into 2.6 and even more in 2.7. But it does not really belong in 3.x. -- Terry Jan Reedy From vinay_sajip at yahoo.co.uk Thu Dec 8 12:01:49 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Thu, 8 Dec 2011 11:01:49 +0000 (UTC) Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> <1323330308.2710.52.camel@thinko> Message-ID: Chris McDonough plope.com> writes: > > In that context, I don't see much relevance of having no support for u'' > in Python 3.2.
> Well, if 3.2 remains in use for a longish time, then it is relevant, in the broader context, isn't it? We know how conservative Linux distributions can be with their Python releases - although most are still releasing 2.x as their system Python, this could change at some point in the future. Even if it doesn't, there might be a fair user base of people stuck with 3.2 for any number of reasons, and to support them, the change you propose won't help, because some variant of a package will still have to use u() and b(), just for 3.2 support. I'm not arguing against your proposed change itself - just against your point about the relevance of 3.2. Regards, Vinay Sajip From stephan.richter at gmail.com Thu Dec 8 12:05:51 2011 From: stephan.richter at gmail.com (Stephan Richter) Date: Thu, 08 Dec 2011 06:05:51 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> Message-ID: <5242067.5aBSYdFaIB@einstein> On Thursday, December 08, 2011 01:18:06 AM Benjamin Peterson wrote: > > Right.. the title does say "readd ... support in 3.3". Are you > > suggesting "the ship has sailed" for eternity because it can't be > > supported in Python < 3.3? > > I'm questioning the real utility of it. The real utility is to make it possible to port libraries to Py3 or at least make it a lot easier. It is somewhat naive to think that you can just tell everyone to upgrade to Python 2.7 and then use the future import. Having to change all that code can also be a big bug magnet. Chris has been a great champion of bringing the Web app community closer to Python 3. His experience with porting code is pretty extensive, especially in keeping it compatible with older Python 2 versions (down to 2.5). If the Python Devs want more adoption of Python 3, they should at least throw a bone from time to time and make adoption a bit easier. The arguments against this proposal seem academic and purist to me.
(Mmh, I cannot believe I just wrote that having been accused of that myself in the past.) Regards, Stephan -- Entrepreneur and Software Geek Google me. "Zope Stephan Richter" From anacrolix at gmail.com Thu Dec 8 12:08:17 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 8 Dec 2011 22:08:17 +1100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: Nobody is using 3 yet ;) Sure, I use it for some personal projects, and other people pretend to support it. Not really. The worst of the pain in porting to Python 3000 has yet to even begin! On Thu, Dec 8, 2011 at 6:33 PM, Nick Coghlan wrote: > Such code still won't work on 3.2, hence restoring the redundant notation > would be ultimately pointless. > > -- > Nick Coghlan (via Gmail on Android, so likely to be more terse than usual) > > On Dec 8, 2011 4:34 PM, "Chris McDonough" wrote: >> >> On Thu, 2011-12-08 at 01:18 -0500, Benjamin Peterson wrote: >> > 2011/12/8 Chris McDonough : >> > > On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote: >> > >> 2011/12/8 Chris McDonough : >> > >> > On the heels of Armin's blog post about the troubles of making the >> > >> > same >> > >> > codebase run on both Python 2 and Python 3, I have a concrete >> > >> > suggestion. >> > >> > >> > >> > It would help a lot for code that straddles both Py2 and Py3 to be >> > >> > able >> > >> > to make use of u'' literals. >> > >> >> > >> Helpful or not helpful, I think that ship has sailed. The earliest it >> > >> could see the light of day is 3.3, which would leave people trying to >> > >> support 3.1 and 3.2 in a bind. >> > > >> > > Right.. the title does say "readd ... support in 3.3". Are you >> > > suggesting "the ship has sailed" for eternity because it can't be >> > > supported in Python < 3.3? >> > >> > I'm questioning the real utility of it.
>> >> All I can really offer is my own experience here based on writing code >> that needs to straddle Python 2.5, 2.6, 2.7 and 3.2 without use of 2to3. >> Having u'' work across all of these would mean porting would not require >> as much eyeballing as code modified via "from __future__ import >> unicode_literals", it would let more code work on 2.5 unchanged, and the >> resulting code would execute faster than code that required us to use a >> u() function. >> >> What's the case against? >> >> - C From lukasz at langa.pl Thu Dec 8 13:08:31 2011 From: lukasz at langa.pl (Łukasz Langa) Date: Thu, 8 Dec 2011 13:08:31 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <5242067.5aBSYdFaIB@einstein> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <5242067.5aBSYdFaIB@einstein> Message-ID: <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> Stephan Richter wrote on 8 Dec 2011, at 12:05: > It is somewhat naive to think that you can just tell > everyone to upgrade to Python 2.7 and then use the future import. Having to > change all that code can also be a big bug magnet. A big bug magnet is using a Python version that is not getting any fixes whatsoever. When I'm backporting stuff from Python 3, I'm targeting 2.6+ because it's still somewhat supported by us.
What's more important though is that there were tremendous changes in that release in terms of bridging the gap between Python 2 and 3. I'm wondering why developers inflict so much impediment to support a Python version that's 5+ years old and was replaced by a newer one in virtually every operating system. Recent versions of Mac OS X, RedHat and Debian all sport Python 2.6+. It seems only GAE and Jython are stuck on Python 2.5. Python 2.6 has ABCs, supports b'' (and even has a "bytes" alias for the str type), forward-compatibility __future__ imports (print_function, unicode_literals, division and absolute_import), "except Exception as e", etc. The thing we did miss was making sure the std lib doesn't break when unicode_literals are used. And that's a bummer. -- Kind regards, Łukasz Langa From stephan.richter at gmail.com Thu Dec 8 13:14:09 2011 From: stephan.richter at gmail.com (Stephan Richter) Date: Thu, 08 Dec 2011 07:14:09 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> Message-ID: <3344831.JP9Cfj4Ety@einstein> On Thursday, December 08, 2011 01:08:31 PM Łukasz Langa wrote: > A big bug magnet is using a Python version that is not getting any fixes > whatsoever. When I'm backporting stuff from Python 3, I'm targeting 2.6+ > because it's still somewhat supported by us.
What's more important though > is that there were tremendous changes in that release in terms of bridging > the gap between Python 2 and 3. But you might not have that luxury and updating code to a new Python version is a lot of work. As you can see in my signature, I am very much involved in the Zope community. The entire Zope, Plone and Pyramid ecosystem is extremely large and one can simply not make blanket statements about Python version use. We try very hard to move our libraries up the version ladder but we must also take great care of backwards-compatibility. (We have seen already what happens if we do not with Zope 2 versus 3. And Python is struggling with similar issues, even though the changes were much less drastic.) Regards, Stephan -- Entrepreneur and Software Geek Google me. "Zope Stephan Richter" From barry at python.org Thu Dec 8 13:18:44 2011 From: barry at python.org (Barry Warsaw) Date: Thu, 8 Dec 2011 07:18:44 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323320919.2710.24.camel@thinko> References: <1323320919.2710.24.camel@thinko> Message-ID: <20111208071844.6fe1970c@limelight.wooz.org> On Dec 08, 2011, at 12:08 AM, Chris McDonough wrote: > from __future__ import unicode_literals > a = 'foo' I agree this is an annoying thing to have to change when supporting a dual-Python-version codebase, but it's not the most annoying. print-functions are a little more painful to switch because there's no easy Emacs conversion for them. ;) This one is actually pretty useful because it does make you go through and be very specific about which literals are bytes and which are unicodes. Also, re-adding u'' prefixes doesn't help you much because you might still have byte literals which you have to b'' prefix. Do you really want both 'foo' and u'foo' to be unicode literals?
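Barry's point can be seen in a small straddling module. With the `unicode_literals` import (available from Python 2.6 and a no-op on Python 3), unprefixed literals are text on both major versions, so every byte literal has to be marked with `b''` explicitly. This is a minimal sketch. The names `GREETING`, `MAGIC` and `is_png` are purely illustrative.

```python
from __future__ import unicode_literals

GREETING = 'foo'       # text: unicode on 2.6+, str on 3.x
MAGIC = b'\x89PNG'     # bytes on both, so it must carry an explicit b''


def is_png(header):
    # Compare bytes with bytes; the b'' marker makes the intent visible.
    return header[:4] == MAGIC
```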
-1 Cheers, -Barry From barry at python.org Thu Dec 8 13:27:20 2011 From: barry at python.org (Barry Warsaw) Date: Thu, 8 Dec 2011 07:27:20 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> <1323330308.2710.52.camel@thinko> Message-ID: <20111208072720.0d243557@limelight.wooz.org> On Dec 08, 2011, at 11:01 AM, Vinay Sajip wrote: >Well, if 3.2 remains in use for a longish time, then it is relevant, in the >broader context, isn't it? We know how conservative Linux distributions can >be with their Python releases - although most are still releasing 2.x as >their system Python, this could change at some point in the future. Even if >it doesn't, there might be a fair user base of people stuck with 3.2 for any >number of reasons, and to support them, the change you propose won't help, >because some variant of a package will still have to use u() and b(), just >for 3.2 support. Case in point: Ubuntu 12.04 is a long term support release, meaning 5 years of official support on both the desktop and server. It will ship with Python 2.7 and 3.2 only. -Barry From ncoghlan at gmail.com Thu Dec 8 13:32:43 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 8 Dec 2011 22:32:43 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <3344831.JP9Cfj4Ety@einstein> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> Message-ID: If people decide to delay their Py3k migrations until they can drop 2.5 support, they're quite free to do so. The only reason for porting right now is to support 3.2, thus making a future reintroduction of u'' useless. Those that delay their ports can use the forward compatibility in 2.6. 
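The 2.6 forward-compatibility features listed elsewhere in this thread can be combined in one file that runs unchanged on 2.6+ and 3.x. This is a minimal sketch that relies only on the standard `__future__` imports. The function `halve` is hypothetical.

```python
from __future__ import absolute_import, division, print_function


def halve(value):
    try:
        return value / 2               # true division on 2.6+ and 3.x alike
    except TypeError as exc:           # "except ... as" is accepted from 2.6 on
        return "error: %s" % type(exc).__name__


print(halve(3))                        # print() is a function on both
```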
Having just purged so much cruft from the language, pleas to add some back permanently for a problem that is going to fade from significance within the next couple of years are unlikely to get very far. -- Nick Coghlan (via Gmail on Android, so likely to be more terse than usual) From victor.stinner at haypocalc.com Thu Dec 8 13:24:51 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 08 Dec 2011 13:24:51 +0100 Subject: [Python-Dev] Reject characters bigger than U+10FFFF and Solaris issues In-Reply-To: <20111208091752.GA29901@sleipnir.bytereef.org> References: <1504453.f4XqDVp2GQ@ned> <20111208091752.GA29901@sleipnir.bytereef.org> Message-ID: <4EE0AC93.5030706@haypocalc.com> On 08/12/2011 10:17, Stefan Krah wrote: > I think that b'\xA0' is a valid thousands separator. I agree, but it's not the point: the problem is that b'\xA0' is decoded to a strange U+30000020 character by mbstowcs(). > Currently I have this horrible function to deal with the problem: > > ... > n = mbstowcs(buf, s, 2); > ... > tmp = PyUnicode_FromWideChar(buf, n); > if (tmp == NULL) { > return NULL; > } > utf8 = PyUnicode_AsUTF8String(tmp); > Py_DECREF(tmp); > return utf8; It would not help this specific issue: b'\xA0' is not decodable from UTF-8. > I'm not sure why the b'\xA0' problem only occurs in Solaris. Many systems > have this thousands separator. The problem is not directly in the C localeconv() function, but in mbstowcs() with the hu_HU locale. You can try my test program for this issue: http://bugs.python.org/file23876/localeconv_wchar.c My test may not be correct, because it only sets LC_ALL, which is a little different from what the Python tests do (see below).
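Victor's distinction matters because setting LC_ALL is not the same as setting LC_NUMERIC and LC_CTYPE individually. Below is a sketch of a helper that sets only those two categories and restores the previous settings afterwards. The helper name is hypothetical. The locale name passed in must be installed on the machine, otherwise `locale.Error` is raised.

```python
import locale
from contextlib import contextmanager


@contextmanager
def numeric_and_ctype(name):
    """Temporarily set LC_CTYPE and LC_NUMERIC (but not LC_ALL) to `name`
    and yield the resulting localeconv() dictionary."""
    categories = (locale.LC_CTYPE, locale.LC_NUMERIC)
    # setlocale(category) with no second argument only queries the setting.
    saved = [(cat, locale.setlocale(cat)) for cat in categories]
    try:
        for cat in categories:
            locale.setlocale(cat, name)
        yield locale.localeconv()
    finally:
        for cat, value in saved:
            locale.setlocale(cat, value)
```

On a machine where a Hungarian locale is installed, passing its name would be one way to inspect `thousands_sep` under conditions close to the failing buildbot test. The exact locale name varies by system.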
-- I don't remember on which buildbot the issue occurred :-( - "sparc solaris10 gcc 3.x" has "LANG=C" and "TZ=Europe/Berlin" environment variables - "x86 OpenIndiana 3.x" and "AMD64 OpenIndiana 3.x" have "TZ=Europe/London" and no locale variable!? The issue occurred for example in test_lc_numeric_basic() of test__locale which sets LC_NUMERIC and LC_CTYPE locales (but not LC_ALL). LC_ALL and LC_NUMERIC are different in this test, but LC_NUMERIC and LC_CTYPE are the same. -- Stefan: would you accept that locale.localeconv() and locale.strxfrm() stop working (instead of returning invalid data) on Solaris in certain cases (it looks like the issue depends on the locale and the OS version)? It can be a motivation to fix the root of the issue ;-) Victor From stefan at bytereef.org Thu Dec 8 14:42:11 2011 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 8 Dec 2011 14:42:11 +0100 Subject: [Python-Dev] Reject characters bigger than U+10FFFF and Solaris issues In-Reply-To: <4EE0AC93.5030706@haypocalc.com> References: <1504453.f4XqDVp2GQ@ned> <20111208091752.GA29901@sleipnir.bytereef.org> <4EE0AC93.5030706@haypocalc.com> Message-ID: <20111208134211.GA31211@sleipnir.bytereef.org> Victor Stinner wrote: > The problem is not directly in the C localeconv() function, but in > mbstowcs() with the hu_HU locale. Ah, I see. > You can try my test program for this issue: > http://bugs.python.org/file23876/localeconv_wchar.c Can't test on OpenSolaris, since Oracle removed the package repo and I need the ISO locales. > Stefan: would you accept that locale.localeconv() and locale.strxfrm() > stop working (instead of returning invalid data) on Solaris in certain > cases (it looks like the issue depends on the locale and the OS > version)? It can be a motivation to fix the root of the issue ;-) Yes, if the cause is a broken mbstowcs() that sounds good.
Stefan Krah From vinay_sajip at yahoo.co.uk Thu Dec 8 16:27:57 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Thu, 8 Dec 2011 15:27:57 +0000 (UTC) Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: Matt Joiner gmail.com> writes: > > Nobody is using 3 yet ;) > > Sure, I use it for some personal projects, and other people pretend to > support it. Not really. > > The worst of the pain in porting to Python 3000 has yet to even begin! > The classic chicken-and-egg problem, right? Someone's got to make a start. If you aim for porting with a single codebase and are not too hung up about "practicality beats purity" hacks like e = sys.exc_info()[1], then I think decent progress can be made with little risk, as long as the project has good test coverage (and if it doesn't ... well, that's risky even if you stay on 2.x ...). Django porting took a week of elapsed time (i.e. < 1 person-week of effort) to go from thousands of test failures under 3.x and sqlite to zero test failures. Django is a pretty big project, so I can't imagine "ordinary mortal" projects are going to be too bad (as long as not implemented pathologically). Of course, the Django port has some way to go, but still ... pip and virtualenv are relatively mature single code base ports, too. As additional examples - I've done Babel, Whoosh, Elixir, WTForms and others the same way. Of course, I understand that YMMV. Regards, Vinay Sajip From jannis at leidel.info Thu Dec 8 16:53:22 2011 From: jannis at leidel.info (Jannis Leidel) Date: Thu, 8 Dec 2011 16:53:22 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? 
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On 08.12.2011, at 16:27, Vinay Sajip wrote: > Matt Joiner gmail.com> writes: > >> >> Nobody is using 3 yet ;) >> >> Sure, I use it for some personal projects, and other people pretend to >> support it. Not really. >> >> The worst of the pain in porting to Python 3000 has yet to even begin! >> > > The classic chicken-and-egg problem, right? Someone's got to make a start. If > you aim for porting with a single codebase and are not too hung up about > "practicality beats purity" hacks like e = sys.exc_info()[1], then I think > decent progress can be made with little risk, as long as the project has good > test coverage (and if it doesn't ... well, that's risky even if you stay on 2.x > ...). > > Django porting took a week of elapsed time (i.e. < 1 person-week of effort) to > go from thousands of test failures under 3.x and sqlite to zero test failures. > Django is a pretty big project, so I can't imagine "ordinary mortal" projects > are going to be too bad (as long as not implemented pathologically). Of course, > the Django port has some way to go, but still ... pip and virtualenv are > relatively mature single code base ports, too. As additional examples - I've > done Babel, Whoosh, Elixir, WTForms and others the same way. I don't want to rain on your parade, but even if your port of Django passes all tests, it's not at all near completion. As a framework we not only have to worry about the ability to run on Python 3.X but also how to teach our community to upgrade their projects (if possible at all). That means to reduce the number of hacks needed and thoroughly reviewing to not suddenly lead into a maintenance dead end. E.g. I'm still not sure the one codebase strategy is better than the 2to3 strategy. 
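The `e = sys.exc_info()[1]` idiom that Vinay mentions avoids both the 2.x-only `except E, e:` syntax and the `except E as e:` syntax that Python 2.5 lacks, so one code base can parse on 2.5 through 3.x. This is a minimal sketch. The function `parse_port` is hypothetical.

```python
import sys


def parse_port(text):
    """Return an int port, or the error message if `text` is not a number."""
    try:
        return int(text)
    except ValueError:
        # Valid on 2.5 (no "as" needed) and on 3.x (no comma form needed).
        exc = sys.exc_info()[1]
        return str(exc)
```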
Also, stating that pip and virtualenv were easy to port like other projects seems to me like only half of the story -- Carl and me had to fix a non trivial part of your port before being able to do the Py3k release. I don't mean to diminish your work, it *is* appreciated, but I'm rather careful with generalizations when it comes to changes of a platform on such epic scale. Best, Jannis From vinay_sajip at yahoo.co.uk Thu Dec 8 17:46:31 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Thu, 8 Dec 2011 16:46:31 +0000 (UTC) Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: Jannis Leidel leidel.info> writes: > I don't want to rain on your parade, Not at all - feel free. I don't feel rained on in the least :-) > but even if your port of Django passes all tests, it's not at all near > completion. As a framework we not only have to worry about the ability to run > on Python 3.X but also how to teach our community to upgrade their projects > (if possible at all). That means to reduce the number of hacks needed and > thoroughly reviewing to not suddenly lead into a maintenance dead end. > E.g. I'm still not sure the one codebase strategy is better than the 2to3 > strategy. Of course, and I did say in the post you're replying to that I know that the Django port has some way to go. But even if you decide that the single code base port is not something you want for Django, nevertheless, I think I've shown that the single port strategy can work for a large project like Django from a purely technical perspective such as passing a very large test suite. Of course, there are many non-technical issues such as documentation, ease of ongoing maintenance etc. which no doubt you will be reviewing in due course. (In the above, I'm using "technical" in a very narrow sense, obviously.) 
> Also, stating that pip and virtualenv were easy to port like other projects > seems to me like only half of the story -- Carl and me had to fix a > non-trivial part of your port before being able to do the Py3k release. Sure, and I didn't mean to imply that I did all the work - but I did announce it only after I got almost all, if not all, tests passing on 2.x and 3.x from a single code base - just as I did with Django. If the tests didn't cover everything, then more work would certainly have been required, but it's still a respectable milestone to have achieved, IMO. But it's the single code base strategy that I wanted to highlight - and AFAIK you haven't had to back-pedal on that (or at least, if you did, it might have been nice to drop me a line to that effect). > I don't mean to diminish your work, it *is* appreciated, but I'm rather > careful with generalizations when it comes to changes of a platform on > such epic scale. I hope I'm not being careless where you're being careful, but where does caution start and timidity begin? You might remember that you brought up the desirability of the Python 3 port on django-developers in September, which got me thinking about it. My view of it is, if everyone thinks of it like eating an elephant, no one is even going to take the first bite, for fear of indigestion. Don't get me wrong - I understand about priorities and commitments, and everyone scratching their own itch. So, I scratched mine, and bet on the hunch that the elephant was only a chocolate elephant, and not a real one. Time will of course tell ;-) Regards, Vinay Sajip From martin at v.loewis.de Thu Dec 8 18:26:59 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Thu, 08 Dec 2011 18:26:59 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323320919.2710.24.camel@thinko> References: <1323320919.2710.24.camel@thinko> Message-ID: <4EE0F363.4060208@v.loewis.de> > It would make it possible to share code like this across py2 and py3: > > a = u'foo' > > Instead of (with e.g. six): > > a = u('foo') > > Or: > > from __future__ import unicode_literals > a = 'foo' > > I recognize that the last option is probably the way "it's meant to be > done", but in reality it's just more practical to not fail when literal > notation is more specific than strictly necessary. You are giving these two options already: - The former works for all Python versions. Although it may appear tedious to convert existing code to replace all Unicode literals with function calls, it would actually be possible/easy to write an automatic converter that does so for a complete code base, based on lib2to3. - the second version is truly practical for all applications/libraries that only support 2.6+. In addition, there also is another option: - use 2to3, in some form So you have already three solutions which are all transitional in some sense, and you want yet another option? I fail to see why this option is more practical than the options that are already there. Regards, Martin From shane at hathawaymix.org Thu Dec 8 19:21:40 2011 From: shane at hathawaymix.org (Shane Hathaway) Date: Thu, 08 Dec 2011 11:21:40 -0700 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323325916.2710.39.camel@thinko> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: <4EE10034.2070809@hathawaymix.org> On 12/07/2011 11:31 PM, Chris McDonough wrote: > All I can really offer is my own experience here based on writing code > that needs to straddle Python 2.5, 2.6, 2.7 and 3.2 without use of 2to3.
> Having u'' work across all of these would mean porting would not require > as much eyeballing as code modified via "from future import > unicode_literals", it would let more code work on 2.5 unchanged, and the > resulting code would execute faster than code that required us to use a > u() function. Could you elaborate on why "from __future__ import unicode_literals" is inadequate (other than the Python 2.6 requirement)? Shane From tseaver at palladion.com Thu Dec 8 20:03:15 2011 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 08 Dec 2011 14:03:15 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <4EE0F363.4060208@v.loewis.de> References: <1323320919.2710.24.camel@thinko> <4EE0F363.4060208@v.loewis.de> Message-ID: On 12/08/2011 12:26 PM, "Martin v. Löwis" wrote: >> It would make it possible to share code like this across py2 and >> py3: >> >> a = u'foo' >> >> Instead of (with e.g. six): >> >> a = u('foo') >> >> Or: >> >> from __future__ import unicode_literals a = 'foo' >> >> I recognize that the last option is probably the way "it's meant to >> be done", but in reality it's just more practical to not fail when >> literal notation is more specific than strictly necessary. > > You are giving these two options already: - The former works for all > Python versions. Although it may appear tedious to convert existing > code to replace all Unicode literals with function calls, it would > actually be possible/easy to write an automatic converter that does so > for a complete code base, based on lib2to3. I guess this could be done to generate "straddling" code from 2-only code. Note that the overhead of the function call is likely significant in some cases: generating a module scope constant is the only sane replacement there, which might be harder to do in a fixer (I haven't tried to write one yet).
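To make the overhead point concrete, here is a minimal sketch of a hypothetical `u()` helper in the spirit of six's (the name and the `unicode_escape` implementation are assumptions for illustration, not six's actual source), together with the hoisting pattern Tres describes: evaluate the call once into a module-scope constant instead of on every use.

```python
import sys

if sys.version_info[0] >= 3:
    def u(s):
        # Python 3: a plain literal is already text, so return it unchanged.
        return s
else:
    def u(s):
        # Python 2: a plain literal is a byte string in which "\u00e9" stays
        # as a literal backslash sequence; unicode_escape decodes it into a
        # unicode object. Assumes ASCII-only literals with \u escapes.
        return s.decode("unicode_escape")

# Hoisted: the conversion runs once, at import time.
GREETING = u("caf\u00e9")

def greet_slow(name):
    # Pays for a u() call every time the function runs.
    return u("caf\u00e9") + " " + name

def greet_fast(name):
    # Reuses the precomputed module-scope constant instead.
    return GREETING + " " + name
```

Both functions return the same text; only the per-call cost differs. Producing the second form automatically means moving an expression to module scope rather than rewriting a single token, which is why it is harder to express as a fixer.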
> - the second version is truly practical for all > applications/libraries that only support 2.6+. Right. The question is whether running more P2 code unmodified in P3 would be a "Good Thing" from the perspective of P3 uptake: developers who run up against such issues tend to hit "camelback-meet-straw" points and bounce off the effort. Such a tiny change (a six line patch and an extra '.. note::' in the language reference section on string literal syntax) might be worth making to avoid that risk. > In addition, there also is another option: - use 2to3, in some form 2to3 is not practical in a "straddling" case: - The script is too slow to use in development mode (like being back in "compile the world" Java / C++ land). - The transformed code generates tracebacks that don't match the source. > So you have already three solutions which are all transitional in > some sense, and you want yet another option? I fail to see why this > option is more practical than the options that are already there. The "redundant" u'*' spelling would be present in Python3 for the same reason that the equally-redundant b'*' spelling is present in Python 2.6+: it makes writing portable code simpler. Tres. -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From glyph at twistedmatrix.com Thu Dec 8 21:32:20 2011 From: glyph at twistedmatrix.com (Glyph) Date: Thu, 8 Dec 2011 15:32:20 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> Message-ID: On Dec 8, 2011, at 7:32 AM, Nick Coghlan wrote: > Having just purged so much cruft from the language, pleas to add some back permanently for a problem that is going to fade from significance within the next couple of years are unlikely to get very far. > This problem is never going to go away. This is not a comment on the success of py3, but rather the persistence of old versions of things. Even assuming an awesomely optimistic schedule for py3k migrations, even assuming that *everything* on PyPI supports Py3 by the end of 2013, consider that all around the world, every day, new code is still being written in FORTRAN. Much of it is being written in FORTRAN 77, despite the fact that Fortran 90 is now over 20 years old. Efforts still crop up periodically (some successful, some failed) to migrate these "legacy" projects to other languages, some of them as modern as C. There are plenty of proprietary Python 2 systems which exist today for which there will not be a budget for a Python 3 migration this decade. If history is an accurate guide, people will still be hired to work on python 2.x systems in the year 2100. Some of them will be hired to migrate that python 2.x code to python 3 (or 4, or 5, whatever we have by then). If they're not, it will be because they're being hired to try to migrate it to Javascript instead, not because the Python 3 migration is "done" by then. -glyph From martin at v.loewis.de Thu Dec 8 22:27:06 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 08 Dec 2011 22:27:06 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> Message-ID: <4EE12BAA.1050601@v.loewis.de> > This is not a comment on the success of py3, but rather the persistence > of old versions of things. Even assuming an awesomely optimistic > schedule for py3k migrations, even assuming that *everything* on PyPI > supports Py3 by the end of 2013, consider that all around the world, > every day, new code is still being written in FORTRAN. While this is true for FORTRAN, it is not for Python 1.5: no new Python 1.5 code is written around the world, at least not every day. Also for FORTRAN, new code that is written every day likely isn't FORTRAN 66, but more likely FORTRAN 90 or newer. The reason for that is that FORTRAN just isn't an obsolete language, by any means, else people wouldn't bother producing new versions of it, porting compilers to new processors, and so on. Contrast this to Python 1, and soon Python 2, which actually *is* obsolete (just as FORTRAN 66 *is* obsolete). > Much of it is being in FORTRAN 77 Can you prove this? I trust that existing code is being maintained in FORTRAN 77. For new code, I'm skeptical. > There are plenty of proprietary Python 2 systems which exist today for > which there will not be a budget for a Python 3 migration this decade. And people using it can happily continue to use Python 2. If they don't have a need to port their code to Python 3, they are not concerned by whether you use a u prefix for strings in Python 3 or not. Regards, Martin From robert.kern at gmail.com Thu Dec 8 22:41:09 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 08 Dec 2011 21:41:09 +0000 Subject: [Python-Dev] readd u'' literal support in 3.3? 
In-Reply-To: <4EE12BAA.1050601@v.loewis.de> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> Message-ID: On 12/8/11 9:27 PM, "Martin v. Löwis" wrote: [Glyph writes:] >> Much of it is being in FORTRAN 77 > > Can you prove this? I trust that existing code is being maintained > in FORTRAN 77. For new code, I'm skeptical. Personally, I've written more new code in FORTRAN 77 than in Fortran 90+. Even with all of the quirks in FORTRAN 77 compilers, it's still substantially easier to connect FORTRAN 77 code to C and Python than 90+. When they introduced some of the nicer language features, they left the precise details of memory structures of the new types undefined, so compilers chose different ways to implement them. Some of the very latest developments in modern Fortran have begun to standardize the FFI for these features (or at least let you write a standardized shim for them) and compilers are catching up. For people writing new whole programs in Fortran, yes, they are probably mostly using 90+. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From janssen at parc.com Thu Dec 8 23:09:59 2011 From: janssen at parc.com (Bill Janssen) Date: Thu, 8 Dec 2011 14:09:59 PST Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <4EE12BAA.1050601@v.loewis.de> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> Message-ID: <51106.1323382199@parc.com> "Martin v. Löwis" wrote: > While this is true for FORTRAN, it is not for Python 1.5: no new > Python 1.5 code is written around the world, at least not every day.
I don't know about that. I've seen a lot of Python 2 code which was apparently written by folks who learned Python 1.5.2 and never needed to learn about newer features. I suspect that's still going on fairly widely. Bill From solipsis at pitrou.net Fri Dec 9 01:35:35 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 01:35:35 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() References: Message-ID: <20111209013535.6fb38068@pitrou.net> On Fri, 09 Dec 2011 00:16:02 +0100 victor.stinner wrote: > > +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) > + > + Get a new copy of a Unicode object. > + > + .. versionadded:: 3.3 I'm not sure I understand. Why would you make a copy of an immutable object? From tjreedy at udel.edu Fri Dec 9 01:44:32 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 08 Dec 2011 19:44:32 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On 12/8/2011 10:53 AM, Jannis Leidel wrote: > possible at all). That means to reduce the number of hacks needed and > thoroughly reviewing to not suddenly lead into a maintenance dead > end. E.g. I'm still not sure the one codebase strategy is better than > the 2to3 strategy. One codebase with version compatibility hacks and no use of 2to3 is one pure strategy. Two codebases with no compatibility hacks (at least for 2 versus 3) and use of 2to3 to bridge all differences is another. Perhaps we need something in between, with a mix of compatibility hacks and automatic 2to3 conversions that has not been discovered yet, or that can be customized on a project by project basis. Deleting 'u' prefixes from string literals is something that is easy to do with 2to3 for anyone who cannot use the future import because of supporting 2.5. 
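Terry's point that stripping `u` prefixes is purely mechanical can be shown without 2to3 itself. The sketch below substitutes the standard `tokenize` module for lib2to3's fixer machinery and rewrites only STRING tokens, so comments and all other code are left alone. The name `strip_u_prefixes` is hypothetical, and the sketch assumes it runs on a Python 3.3+ interpreter, since only those tokenizers recognise the `u` prefix.

```python
import io
import tokenize

def strip_u_prefixes(source):
    """Return `source` with the u/U prefix removed from string literals.

    Only STRING tokens are rewritten. Each token keeps its original
    start/end coordinates, so untokenize() rebuilds the whitespace
    between tokens exactly as it appeared in the input.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            # Drop only the prefix character; the quotes and body stay intact.
            tok = tok._replace(string=tok.string[1:])
        out.append(tok)
    return tokenize.untokenize(out)
```

For example, `strip_u_prefixes("a = u'foo'\n")` gives `"a = 'foo'\n"`. A comment such as `# u'kept'` is a COMMENT token rather than a STRING, so it passes through unchanged.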
More than one person has said that *any* use of 2to3 is impractical for rapid-turnaround development because 2to3 is 'too slow'. If so, have the usual methods for speeding up a Python program been applied? Has anyone profiled 2to3? Is most of the time spent in 2to3 itself or some particular module that it uses? Is the time that is spent in 2to3 itself a result of the overall framework or particular fixers? If the latter, can slow fixers be eliminated by using a compatibility hack in the Python 2 code? Has anyone tried to compile 2to3 and prerequisite Python-coded modules? -- Terry Jan Reedy From glyph at twistedmatrix.com Fri Dec 9 01:52:28 2011 From: glyph at twistedmatrix.com (Glyph) Date: Thu, 8 Dec 2011 19:52:28 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <4EE12BAA.1050601@v.loewis.de> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> Message-ID: <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Zooming back in to the actual issue this thread is about, I think the u""-vs-"" issue is a bit of a red herring, because the _real_ problem here is that 2to3 is slow and buggy and so migration efforts are starting to work around it, and therefore want to run the same code on 3.x and all the way back to 2.5. In my opinion, effort should be spent on optimizing the suggested migration tools and getting them to work properly, not twiddling the syntax so that it's marginally easier to avoid them. On Dec 8, 2011, at 4:27 PM, Martin v. Löwis wrote: >> This is not a comment on the success of py3, but rather the persistence >> of old versions of things. Even assuming an awesomely optimistic >> schedule for py3k migrations, even assuming that *everything* on PyPI >> supports Py3 by the end of 2013, consider that all around the world, >> every day, new code is still being written in FORTRAN.
> > While this is true for FORTRAN, it is not for Python 1.5: no new > Python 1.5 code is written around the world, at least not every day. > Also for FORTRAN, new code that is written every day likely isn't > FORTRAN 66, but more likely FORTRAN 90 or newer. That's because Python 1.5 was upward-compatible with 2.x, and pretty much everyone could gently migrate, and start developing on the new versions even while supporting the old ones. That is obviously not true of 3.x, by design; 2to3 requires that you still develop on the old version even if you support a new one, not to mention the substantially increased effort of migration. > The reason for that is that FORTRAN just isn't an obsolete language, > by any means, else people wouldn't bother producing new versions of > it, porting compilers to new processors, and so on. Contrast this to > Python 1, and soon Python 2, which actually *is* obsolete (just as > FORTRAN 66 *is* obsolete). Much as the Python core team might wish Python 2 would "soon" be obsolete, all of these things are happening for python 2.x now and all indications are that they will continue to happen. PyPy, Jython, ShedSkin, Skulpt, IronPython, and possibly a few others are (to varying degrees) all targeting 2.x right now, because that's where the application code they want to run is. PyPy is even porting the JIT compiler to a new processor (ARM). F66 is indeed obsolete, but it became obsolete because people stopped using it, not because the standards committee declared it so. >> Much of it is being in FORTRAN 77 > > Can you prove this? I trust that existing code is being maintained > in FORTRAN 77. For new code, I'm skeptical. I am not deeply immersed in the world where F77 is still popular, so I don't have any citations for you, but casual conversations with people working in the sciences, especially chemistry and materials science, suggest to me that a lot of people still write F77 and start new projects in it.
(I can see someone with more direct experience promptly replied in this thread already, anyway.) >> There are plenty of proprietary Python 2 systems which exist today for >> which there will not be a budget for a Python 3 migration this decade. > > And people using it can happily continue to use Python 2. If they > don't have a need to port their code to Python 3, they are not concerned > by whether you use a u prefix for strings in Python 3 or not. I didn't say they didn't have a need ever, I said they didn't have a budget now. What you are saying to those users here is basically: "if you can't migrate today, then just don't bother, we're never going to make it any easier". Despite the fact that I ultimately agree on u'' (nobody should care about this), it is not a good message to send. -glyph From solipsis at pitrou.net Fri Dec 9 01:56:00 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 01:56:00 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
Instead of modifying 2.x code and running 2to3 time after time on it, you can use 2to3 on unmodified 2.x code and fix the generated 3.x code. With proper use of branches and a DVCS, merging later 2.x changes should be mostly painless. (at least it works on https://bitbucket.org/pitrou/t3k/) Regards Antoine. From vinay_sajip at yahoo.co.uk Fri Dec 9 02:39:39 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Fri, 9 Dec 2011 01:39:39 +0000 (UTC) Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: Terry Reedy udel.edu> writes: > More that one person has said that *any* use of 2to3 is impractical for > rapid-turnaround development because 2to3 is 'too slow'. If so, have the > usual methods for speeding up a Python program been applied? Has anyone > profiled 2to3? Is most of the time spent in 2to3 itself or some > particular module that it uses? Is the time that is spend in 2to3 itself > a result of the overall framework or particular fixers? If the latter, > can slow fixers be eliminated by using a compatibility hack in the > Python 2 code? Has anyone tried to compile 2to3 and prerequisite > Python-coded modules? > It's not the speed of 2to3 per se; this seems very reasonable for a tool of its type. It's the overall process, which currently involves running 2to3 on an entire codebase (for example, using setup.py with flags to run 2to3 during setup). With a large project like Django, and hundreds or thousands of source files, 2to3 used in this way is on a hiding to nothing; no amount of profiling and tweaking is likely to lead to acceptable turnaround. 
However, 2to3 tools could be developed which are based on 2to3/lib2to3 and are *incremental* in nature; then as you edit and save a file, its processed version could be available very shortly afterwards (since we only need to translate the file that was saved) - this would be even quicker in an IDE where the 2to3 code (and perhaps the AST of files being worked on) could remain loaded in memory over an entire development session. That, along with some more/smarter fixers, could go some way to addressing the "too slow" issue. Regards, Vinay Sajip From tjreedy at udel.edu Fri Dec 9 03:01:30 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 08 Dec 2011 21:01:30 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: On 12/8/2011 7:52 PM, Glyph wrote: > Zooming back in to the actual issue this thread is about, I think the > u""-vs-"" issue is a bit of a red herring, because the _real_ problem > here is that 2to3 is slow and buggy and so migration efforts are > starting to work around it, and therefore want to run the same code on > 3.x and all the way back to 2.5. I would expect that running one codebase would push one to only run on 2.6+, which would make one codebase easier, but it does not seem to. > In my opinion, effort should be spent on optimizing the suggested > migration tools and getting them to work properly, not twiddling the > syntax so that it's marginally easier to avoid them. This is what I tried to say in my last post. ... > I didn't say they didn't have a /need ever/, I said they didn't have a > /budget now/. 
What you are saying to those users here is basically: "if > you can't migrate today, then just don't bother, we're never going to > make it any easier". Despite the fact that I ultimately agree on u'' > (nobody should care about this), it is not a good message to send. I agree that would not be a good message, but a) I do not think that was the intent (I think it was more like "the *current* state of porting tools is a moot point for those not now porting") and b) good messages go both ways. People say "Python 2 is where the money is, it has (almost?) all the production apps, etcetera." Probably (mostly?) true. So where is the support from the vast army of 2.7 users for continuing to polish 2.7 past the normal 2 years (which ended last June)? Or for improving the migration tools? -- Terry Jan Reedy From regebro at gmail.com Fri Dec 9 03:50:16 2011 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 9 Dec 2011 03:50:16 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: "from future import unicode_literals" is my fault. I'm sorry. It's pretty useless. It was suggested by somebody and I then supported its addition, instead of allowing u'' which I suggested. But it doesn't work. One reason is that you need to be able to say "This should be str in Python 2, and binary in Python 3, that should be Unicode in Python 2 and str in Python 3, and that over there should be str in both versions", and the future import doesn't support that. Adding u'' support solves the problem, but then again, so does having a b() and a u() function. I'm not sure of the utility of adding functionality to Python 3 that can be solved with six.
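Lennart's first category, "str in Python 2, and binary in Python 3", is what a `b()` helper of the kind six provides covers. A minimal sketch, assuming literal contents stay within Latin-1; this is an illustration, not six's own source.

```python
import sys

if sys.version_info[0] >= 3:
    def b(s):
        # Python 3: encode the text literal to bytes, one byte per code point.
        # Code points above U+00FF raise UnicodeEncodeError.
        return s.encode("latin-1")
else:
    def b(s):
        # Python 2: a plain literal is already a byte string.
        return s

# bytes on Python 3, str on Python 2
HEADER_SEP = b(": ")
```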
//Lennart From guido at python.org Fri Dec 9 03:53:55 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 8 Dec 2011 18:53:55 -0800 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: Are you saying that with that future import, b"..." is still a Unicode literal? On Thu, Dec 8, 2011 at 6:50 PM, Lennart Regebro wrote: > "from future import unicode_literals" is my fault. I'm sorry. It's > pretty useless. It was suggested by somebody and I then supported it's > adding, instead of allowing u'' which I suggested. But it doesn't > work. > > One reason is that you need to be able to say "This should be str in > Python 2, and binary in Python 3, that should be Unicode in Python 2 > and str in Python 3, and that over there should be str in both > versions", and the future import doesn't support that. > > Adding u'' support solves the problem, but then again, so does having > a b() and an u() method. I'm not sure of the utility of adding > functionality to Python 3 that can be solved with six. > > //Lennart -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Fri Dec 9 04:11:10 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 9 Dec 2011 13:11:10 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: On Fri, Dec 9, 2011 at 12:01 PM, Terry Reedy wrote: > On 12/8/2011 7:52 PM, Glyph wrote: >> >> Zooming back in to the actual issue this thread is about, I think the >> u""-vs-"" issue is a bit of a red herring, because the _real_ problem >> here is that 2to3 is slow and buggy and so migration efforts are >> starting to work around it, and therefore want to run the same code on >> 3.x and all the way back to 2.5. > > > I would expect that running one codebase would push one to only run on 2.6+, > which would make one codebase easier, but it does not seem to. Actually, most of the feedback I've heard is that using one codebase is comparatively straightforward if you can drop support for 2.5 and earlier. Mainly because of this:

>>> from __future__ import unicode_literals
>>> from __future__ import print_function
>>> print
<built-in function print>
>>> print(type(''))
<type 'unicode'>
>>> print(type(b''))
<type 'str'>

That's why I'm quite happy to say to people that if they currently have to support 2.5 or earlier, and they're not prepared to fork their codebase or drop support for those earlier Python versions in new releases, then it's *perfectly fine* for them to delay their 3.x support until they *can* use the compatibility tools we provide to make "single source" approaches easier. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Fri Dec 9 04:34:08 2011 From: barry at python.org (Barry Warsaw) Date: Thu, 8 Dec 2011 22:34:08 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: <20111208223408.0e2e8bd1@limelight.wooz.org> On Dec 09, 2011, at 03:50 AM, Lennart Regebro wrote: >One reason is that you need to be able to say "This should be str in >Python 2, and binary in Python 3, that should be Unicode in Python 2 >and str in Python 3, and that over there should be str in both >versions", and the future import doesn't support that. Sorry, I don't understand this. What does it mean to be "str in both versions"? And why would you want that? As for "str in Python 2 and binary in Python 3", b'' prefixes do that in Python >= 2.6 without the future import (if I take "binary" to mean bytes type). As for "Unicode in Python 2 and str in Python 3", unadorned strings with the future import in Python >= 2.6 do that just fine. One of the nice things too is that with #include <bytesobject.h> in Python >= 2.6, changing all your PyStrings to PyBytes, you can get the same behavior in your extension modules. You still need to be clear about what are bytes and what are strings. The problem comes when you aren't or can't be sure, i.e. you have objects that are sometimes one and sometimes the other. Such as email headers. In that case, you're kind of screwed. Python 2's str type let you cheat, but not without consequences. Those consequences are spelled "UnicodeErrors" and I'll be glad to be rid of them. Cheers, -Barry From barry at python.org Fri Dec 9 04:38:16 2011 From: barry at python.org (Barry Warsaw) Date: Thu, 8 Dec 2011 22:38:16 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: <20111208223816.2329a110@limelight.wooz.org> On Dec 08, 2011, at 06:53 PM, Guido van Rossum wrote: >Are you saying that with that future import, b"..." is still a Unicode >literal? No, the future import has no impact on b-strings.

-----snip snip-----
from __future__ import print_function
import sys
print(sys.version_info.major, sys.version_info.minor, type(b''))
-----snip snip-----

$ python /tmp/foo.py
2 7 <type 'str'>
$ python3 /tmp/foo.py
3 2 <class 'bytes'>

-----snip snip-----
from __future__ import print_function, unicode_literals
import sys
print(sys.version_info.major, sys.version_info.minor, type(b''))
-----snip snip-----

$ python /tmp/foo.py
2 7 <type 'str'>
$ python3 /tmp/foo.py
3 2 <class 'bytes'>

Cheers, -Barry From chrism at plope.com Fri Dec 9 05:24:33 2011 From: chrism at plope.com (Chris McDonough) Date: Thu, 08 Dec 2011 23:24:33 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <20111208223408.0e2e8bd1@limelight.wooz.org> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> Message-ID: <1323404673.2710.132.camel@thinko> On Thu, 2011-12-08 at 22:34 -0500, Barry Warsaw wrote: > On Dec 09, 2011, at 03:50 AM, Lennart Regebro wrote: > > >One reason is that you need to be able to say "This should be str in > >Python 2, and binary in Python 3, that should be Unicode in Python 2 > >and str in Python 3, and that over there should be str in both > >versions", and the future import doesn't support that. > > Sorry, I don't understand this. What does it mean to be "str in both > versions"?
And why would you want that? > > As for "str in Python 2 and binary in Python 3", b'' prefixes do that in > Python >= 2.6 without the future import (if I take "binary" to mean bytes > type). > > As for "Unicode in Python 2 and str in Python 3", unadorned strings with the > future import in Python >= 2.6 does that just fine. > > One of the nice things too is that with #include in Python >= > 2.6, changing all your PyStrings to PyBytes, you can get the same behavior in > your extension modules. > > You still need to be clear about what are bytes and what are strings. The > problem comes when you aren't or can't be sure, i.e. you have objects that are > sometimes one and sometimes the other. Such as email headers. In that case, > you're kind of screwed. Python 2's str type let you cheat, but not without > consequences. Those consequences are spelled "UnicodeErrors" and I'll be glad > to be rid of them. The PEP 3333 WSGI protocol *requires* that you present its APIs with "native strings" (str on Python 3, str on Python 2). So while the oversimplification "don't do that" sounds great here, in real life, not so much. - C From chrism at plope.com Fri Dec 9 05:33:24 2011 From: chrism at plope.com (Chris McDonough) Date: Thu, 08 Dec 2011 23:33:24 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: <1323405204.2710.139.camel@thinko> On Fri, 2011-12-09 at 03:50 +0100, Lennart Regebro wrote: > "from future import unicode_literals" is my fault. I'm sorry. It's > pretty useless. It was suggested by somebody and I then supported it's > adding, instead of allowing u'' which I suggested. But it doesn't > work. 
> > One reason is that you need to be able to say "This should be str in > Python 2, and binary in Python 3, that should be Unicode in Python 2 > and str in Python 3, and that over there should be str in both > versions", and the future import doesn't support that. This is also true. But even so, b'' exists as a porting nicety. The argument for supporting u'' is the same as the one which exists for b'', except in the opposite direction. Since popular library code is going to need to run on both Python 2 and Python 3 for the foreseeable future, anything to make this easier helps. Supporting u'' in 3.3 will prevent me from needing to think about the bytes/text distinction again while porting/straddling. Every time I say this to somebody who isn't listening closely they say "AHA! You're *supposed* to think about bytes vs. text, that's the whole point, stupid!" They fail to hear the "again" in that sentence. I've clearly already thought about the distinction between bytes and text at least once: that's *why* I'm using a u'' literal there. I shouldn't have to think about it again to service syntax constraints. Code that is more explicit than strictly necessary should not be needlessly punished. Continuing to not support u'' in Python 3 will be like having an immigration station where folks who have a b'ritish' passport can get through right away, but folks with a u'kranian' passport need to get back on a plane that appears to come from the Ukraine before they receive another tag that says they are indeed from the Ukraine. It's just pointless makework. - C From ncoghlan at gmail.com Fri Dec 9 06:30:36 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 9 Dec 2011 15:30:36 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323405204.2710.139.camel@thinko> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323405204.2710.139.camel@thinko> Message-ID: On Fri, Dec 9, 2011 at 2:33 PM, Chris McDonough wrote: > Continuing to not support u'' in Python 3 will be like having an > immigration station where folks who have a b'ritish' passport can get > through right away, but folks with a u'kranian' passport need to get > back on a plane that appears to come from the Ukraine before they > receive another tag that says they are indeed from the Ukraine. It's > just pointless makework. OK, I think I finally understand your point. You want to be able, in your Python 2.x code, to write modules that use *all three* kinds of string literal: ---------- foo = u"this is a Unicode string in both Python 2.x and 3.x" bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x" baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x" ---------- This is driven by the desire to use APIs (like the PEP 3333 version of WSGI) that are defined in terms of "native strings" in the context of applications that already include a strong binary/text separation.
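A minimal sketch of what "native strings" mean for a PEP 3333 application on Python 3, assuming a server that follows the PEP: the status line and header names/values are str, the response body is an iterable of bytes, and environ values arrive as str holding the raw request bytes decoded as ISO-8859-1 (latin-1). The names app and fake_start_response, and the assumption that the client sent the path as UTF-8, are illustrative and not taken from the thread.

```python
# Sketch of a PEP 3333 ("WSGI 1.0.1") application as seen on Python 3.


def app(environ, start_response):
    # PEP 3333 servers on Python 3 hand environ values over as native str
    # holding the raw request bytes decoded as ISO-8859-1 (latin-1), so
    # encoding back to latin-1 recovers those bytes losslessly.
    raw_path = environ.get("PATH_INFO", "").encode("latin-1")
    # Assumption: the client sent the path as UTF-8.
    path = raw_path.decode("utf-8")

    body = ("You asked for " + path + "\n").encode("utf-8")  # body: bytes
    status = "200 OK"  # status line: native string
    headers = [  # header names and values: native strings
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


if __name__ == "__main__":
    def fake_start_response(status, headers):
        # Hypothetical stand-in for the callable a real server provides.
        print(status, headers)

    # The UTF-8 bytes of "/café" as a PEP 3333 server would present them.
    environ = {"PATH_INFO": b"/caf\xc3\xa9".decode("latin-1")}
    print(app(environ, fake_start_response))
```

Only the re-decoding step depends on knowing what the underlying bytes really are; everything the application hands back is plainly either str or bytes.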
Currently, in modules shared between the two series, you can't use the "u" marker at all, since Python 3.x leaves it out as being redundant - instead, you have a binary switch (in the form of the future import) that lets you toggle the behaviour of basic string literals between the first two forms: ---------- bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x" baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x" ---------- from __future__ import unicode_literals foo = "this is a Unicode string in both Python 2.x and 3.x" baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x" ---------- Currently, to get all 3 kinds of behaviour in a shared codebase without additional function calls at runtime, you need to pick one set of strings (either "always Unicode" or "native string type") and move them out to a separate module. So, for example, depending on which set you decided to move: ---------- from unicode_strings import foo bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x" baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x" ---------- from __future__ import unicode_literals foo = "this is a Unicode string in both Python 2.x and 3.x" from native_strings import bar baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x" ---------- Or, alternatively, you use 'six' (or a similar compatibility module) and ensure unicode at runtime, using native or binary strings otherwise: ---------- from six import u foo = u("this is a Unicode string in both Python 2.x and 3.x") bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x" baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x" ---------- If you want to target 3.2, you *have* to use one of those mechanisms - any potential restoration of u'' syntax support won't help you (and even after 3.3 gets released in the latter half of next year, it's still going to be a fair while 
before it makes its way into the various distros, especially the ones that include long term support from major vendors). So, instead of attempting to paper over the problem by reintroducing u'', perhaps the discussion we should be having is whether or not PEP 3333's superficially appealing concept of defining an API in terms of "native strings" is a loser in practice, and we should instead be looking more closely at PEP 444 (since that goes the route of using 'str' in 2.x and 'bytes' in 3.x, thus rendering "from __future__ import unicode_literals" an adequate solution for 2.6+ compatibility). The amount of pain that PEP 3333 seems to be causing in the web development world suggests to me we may simply have been *wrong* to think that PEP 3333 would be a workable long term approach. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From chrism at plope.com Fri Dec 9 06:33:59 2011 From: chrism at plope.com (Chris McDonough) Date: Fri, 09 Dec 2011 00:33:59 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: <1323408839.2710.143.camel@thinko> On Thu, 2011-12-08 at 19:52 -0500, Glyph wrote: > Zooming back in to the actual issue this thread is about, I think the > u""-vs-"" issue is a bit of a red herring, because the _real_ problem > here is that 2to3 is slow and buggy and so migration efforts are > starting to work around it, and therefore want to run the same code on > 3.x and all the way back to 2.5. Even if it weren't slow, I still wouldn't use it to automatically convert code at install time; a single codebase is easier to reason about, and easier to support.
Users send me tracebacks all the time; having them match the source is a wonderful thing. - C From ncoghlan at gmail.com Fri Dec 9 06:41:40 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 9 Dec 2011 15:41:40 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323408839.2710.143.camel@thinko> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> Message-ID: On Fri, Dec 9, 2011 at 3:33 PM, Chris McDonough wrote: > Even if it weren't slow, I still wouldn't use it to automatically > convert code at install time; a single codebase is easier to reason > about, and easier to support. Users send me tracebacks all the time; > having them match the source is a wonderful thing. Yeah, if single source doesn't work, then I think Antoine's suggested way (i.e. convert once, then maintain two distinct branches and builds, the way python-dev did for years with the standard library) is a more sane option. It lets you investigate tracebacks properly, it reduces your cycle times, etc, etc. With a modern DVCS, it should be significantly less painful than it was for us when we were maintaining four branches with only svnmerge to help out. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Fri Dec 9 06:43:35 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 8 Dec 2011 21:43:35 -0800 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323408839.2710.143.camel@thinko> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> Message-ID: On Thu, Dec 8, 2011 at 9:33 PM, Chris McDonough wrote: > On Thu, 2011-12-08 at 19:52 -0500, Glyph wrote: > > Zooming back in to the actual issue this thread is about, I think the > > u""-vs-"" issue is a bit of a red herring, because the _real_ problem > > here is that 2to3 is slow and buggy and so migration efforts are > > starting to work around it, and therefore want to run the same code on > > 3.x and all the way back to 2.5. > > Even if it weren't slow, I still wouldn't use it to automatically > convert code at install time; a single codebase is easier to reason > about, and easier to support. Users send me tracebacks all the time; > having them match the source is a wonderful thing. Even though 2to3 was my idea, I am gradually beginning to appreciate this approach. I skimmed the docs for "six" and liked it. But I think the specific proposal of adding u"..." literals back to 3.3 is not going to do much good. If we had had the foresight way back when, we could have added them back to 3.1 and we would have been okay. But having them in 3.3 but not in 3.2 is just adding insult to injury. I recommend writing b"...".decode('utf-8'); maybe six's u() does the same? -- --Guido van Rossum (python.org/~guido) From chrism at plope.com Fri Dec 9 07:01:10 2011 From: chrism at plope.com (Chris McDonough) Date: Fri, 09 Dec 2011 01:01:10 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> Message-ID: <1323410470.2710.158.camel@thinko> On Thu, 2011-12-08 at 21:43 -0800, Guido van Rossum wrote: > On Thu, Dec 8, 2011 at 9:33 PM, Chris McDonough > wrote: > On Thu, 2011-12-08 at 19:52 -0500, Glyph wrote: > > Zooming back in to the actual issue this thread is about, I > think the > > u""-vs-"" issue is a bit of a red herring, because the > _real_ problem > > here is that 2to3 is slow and buggy and so migration efforts > are > > starting to work around it, and therefore want to run the > same code on > > 3.x and all the way back to 2.5. > > > Even if it weren't slow, I still wouldn't use it to > automatically > convert code at install time; a single codebase is easier to > reason > about, and easier to support. Users send me tracebacks all > the time; > having them match the source is a wonderful thing. > > Even though 2to3 was my idea, I am gradually beginning to appreciate > this approach. I skimmed the docs for "six" and liked it. > > But I think the specific proposal of adding u"..." literals back to > 3.3 is not going to do much good. If we had had the foresight way back > when, we could have added them back to 3.1 and we would have been > okay. But having them in 3.3 but not in 3.2 is just adding insult to > injury. AFAICT, at the current pace of porting, lots of authors of existing, popular Python 2 libraries won't be releasing a ported/straddled version any time soon; almost certainly many won't even begin work on a port until after 3.3 is final. As a result, on the supplier side, there will be plenty of code that will eventually work only as a straddle across 2.6, 2.7, and 3.3. 
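Guido's suggested workaround, writing b"...".decode('utf-8') wherever a Unicode literal is needed, can be sketched as follows for the three kinds of literal discussed in this thread. The variable names, the helper header_name and the sample values are illustrative; the comments state the resulting type on Python 2.6/2.7 and on 3.x.

```python
# Spelling the three literal kinds in one source file for 2.6+ and 3.x,
# without the u'' prefix.

# Always text: unicode on 2.x, str on 3.x. The decode runs at runtime.
foo = b"caf\xc3\xa9".decode("utf-8")

# Native string: str on both lines (8-bit on 2.x, text on 3.x).
bar = "native"

# Always binary: str (8-bit) on 2.x, bytes on 3.x.
baz = b"binary"


def header_name():
    # Inside a function the decode is repeated on every call; hoisting the
    # value to module scope pays the cost once, at import time.
    return b"Content-Type".decode("utf-8")
```

Unlike a real literal, each evaluation is a method call, which is the same kind of overhead raised in this thread about six's u().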
On the consumer side, folks who want to run 2.6/2.7/3.3-only codebases will have the wherewithal to compile their own Python 3 (or use a PPA or equivalent) until the distros catch up. So I'm not sure why 3.2 not having support for u'' should be a real blocker for the change. > I recommend writing b"...".decode('utf-8'); maybe six's u() does the > same? It does this: def u(s): return unicode(s, "unicode_escape") That's two Python function calls, of course, which is obviously icky if you use a lot of literals at a nonmodule scope. - C From ncoghlan at gmail.com Fri Dec 9 07:36:03 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 9 Dec 2011 16:36:03 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323410470.2710.158.camel@thinko> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> <1323410470.2710.158.camel@thinko> Message-ID: On Fri, Dec 9, 2011 at 4:01 PM, Chris McDonough wrote: > On the consumer side, folks who want to run 2.6/2.7/3.3-only codebases > will have the wherewithal to compile their own Python 3 (or use a PPA or > equivalent) until the distros catch up. > > So I'm not sure why 3.2 not having support for u'' should be a real > blocker for the change. If this argument was valid, people wouldn't be so worried about maintaining 2.5 compatibility in their libraries. 
Consider if I tried to make this argument to justify everyone dropping 2.5 and earlier support today: """On the consumer side, folks who want to run 2.6+ codebases on older Linux distros have the wherewithal to compile their own more recent Python 2 (or use a PPA or equivalent) until they can move to a more recent version of their distro.""" It's simply not true in the general case - people don't maintain 2.4+ compatibility for fun, they do it because RHEL5 (and CentOS 5, etc) are still reasonably common and ship with 2.4 as the system Python. As soon as you switch away from the system provided Python, you're switching away from the vendor's entire pre-packaged Python *stack*, not just the interpreter itself. You then have to install (and generally build) *everything* for yourself. While that is certainly possible these days (and a lot simpler than it used to be), it's still not trivial [1]. Since 3.2 is already quite usable for applications that aren't fighting with the "native strings" problem (which seems to be the common thread running through the complaints I've heard from web framework authors), and with it being included in at least the next Ubuntu LTS, current versions of Fedora, Arch, etc, it's going to be around for a long time. Ignoring 3.1 is a reasonable option. Ignoring 3.2 entirely is unlikely to be viable for anyone that is interested in supporting 3.x within the next couple of years - the 3.3 release is at least 9 months away, and it's also going to take a while for it to make its way into distros after the final release gets published on python.org. Hence my suggestion: perhaps the problem is the fact that PEP 3333/WSGI 1.0.1 introduced the "native string" concept as a minimalist hack to try to get a usable gateway interface in Python 3, and that just doesn't work in practice when attempting to straddle 2.x and 3.x (because the values WSGI is dealing with aren't really text, they're bytes, only *some* of which represent text).
Perhaps a PEP 444 based model would be less painful and more coherent in the long run? Cheers, Nick. [1] http://readthedocs.org/docs/ncoghlan_devs-python-notes/en/latest/venv_bootstrap.html -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From chrism at plope.com Fri Dec 9 08:38:05 2011 From: chrism at plope.com (Chris McDonough) Date: Fri, 09 Dec 2011 02:38:05 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> <1323410470.2710.158.camel@thinko> Message-ID: <1323416285.2710.219.camel@thinko> On Fri, 2011-12-09 at 16:36 +1000, Nick Coghlan wrote: > On Fri, Dec 9, 2011 at 4:01 PM, Chris McDonough wrote: > > On the consumer side, folks who want to run 2.6/2.7/3.3-only codebases > > will have the wherewithal to compile their own Python 3 (or use a PPA or > > equivalent) until the distros catch up. > > > > So I'm not sure why 3.2 not having support for u'' should be a real > > blocker for the change. > > If this argument was valid, people wouldn't be so worried about > maintaining 2.5 compatibility in their libraries. Consider if I tried > to make this argument to justify everyone dropping 2.5 and earlier > support today: > > """On the consumer side, folks who want to run 2.6+ codebases on older > Linux distros have the wherewithal to compile their own more recent > Python 2 (or use a PPA or > equivalent) until they can move to a more recent version of their distro.""" Fair point. That said, personally, I have given up entirely on Python 2.4 and 2.5 support for newer versions of my OSS libraries. I continue to backport fixes and (some) features to older library versions so folks can run those on systems that require older Pythons.
I gave up 2.5 support fairly recently across everything new, and I gave up support for 2.4 a year ago or more in new releases with the same intent. In reality, there is only one major platform that requires 2.4: RHEL 5 and folks who use it will just need to also use old versions of popular libraries; trying to support it for all future feature work until it's EOLed is not sane unless someone pays for it. Python 2.5 has slightly more compelling platforms (GAE and Jython), but GAE is moving to Python 2.7 and Jython is a bit moribund these days and is not really popular enough that a critical mass of folks will clamor for new-and-shiny releases that run on it. The upshot is that most newly created code only needs to run on Python 2.6 and *some* version of Python 3. And being able to eventually write that code in a nonsucky subset of Python 2/3 is important to me, because I'm going to be developing software in that subset for many years (way past the timeframe we're talking about in which Python 3.2 will rule the roost). > It's simply not true in the general case - people don't maintain 2.4+ > compatibility for fun, they do it because RHEL5 (and CentOS 5, etc) > are still reasonably common and ship with 2.4 as the system Python. As > soon as you switch away from the system provided Python, you're > switching away from the vendors entire pre-packaged Python *stack*, > not just the interpreter itself. You then have to install (and > generally build) *everything* for yourself. While that is certainly > possible these days (and a lot simpler than it used to be), it's still > not trivial [1]. > > Since 3.2 is already quite usable for applications that aren't > fighting with the "native strings" problem (which seems to be the > common thread running through the complaints I've heard from web > framework authors), and with it being included in at least the next > Ubuntu LTS, current versions of Fedora, Arch, etc, it's going to be > around for a long time. 
Ignoring 3.1 is a reasonable option. Ignoring > 3.2 entirely is unlikely to be viable for anyone that is interested in > supporting 3.x within the next couple of years - the 3.3 release is at > least 9 months away, and it's also going to take a while for it to > make its way into distros after the final release gets published on > python.org. > > Hence my suggestion: perhaps the problem is the fact that PEP 3333/WSGI > 1.0.1 introduced the "native string" concept as a minimalist hack to > try to get a usable gateway interface in Python 3, and that just > doesn't work in practice when attempting to straddle 2.x and 3.x > (because the values WSGI is dealing with aren't really text, they're > bytes, only *some* of which represent text). Perhaps a PEP 444 based > model would be less painful and more coherent in the long run? Possibly. I was the original author of PEP 444 with help from Armin (although it has since been taken up by Alice, and I do not support the updates it has received since then). A bytes-oriented WSGI-like protocol was always the saner option. The native string idea optimized in exactly the wrong place, which was to make it easy to write WSGI middleware, where you're required to do lots of textlike manipulation of header values. The idea of using bytes in places where PEP 3333 now mandates native strings was rejected because people were (somewhat justifiably) horrified at what they had to do in order to attempt to treat bytes like strings in this context on Python 3 at the time. It has gotten better, but maybe still not better enough to appease the folks who blocked the idea originally. But all of that is just arguing with the umpire at this point. Promoting and getting consensus about a different protocol will hurt a lot. PEP 3333 was born of months of intense periods of arguing and compromise. It is the way it is now because everyone was too exhausted to argue about it any more.
I don't think that has changed much since it was accepted, and asking folks to go back to that particular drawing board is unlikely to have promising results. Folks have already spent many hours and lots of money on implementations of the current PEP. They may hunt us down and murder us one by one. ;-) PEP 3333, to its credit, is also remarkably backwards compatible with PEP 333, requiring very little change in existing Python 2 WSGI implementations, which helps Python 2 folks a lot. Given an effective choice between enabling six lines of code in Python 3.3 to support u'' and months of political wrangling and code rewriting, I'll choose the former any day. If we were talking about a change to Python that actually required nontrivial effort, had some sort of nominal consequence, or had some sort of non-theoretical downside, I'd be a lot less sanguine about it. But this is just a no-brainer in the long term, AFAICT. - C From stefan_ml at behnel.de Fri Dec 9 09:02:35 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 09 Dec 2011 09:02:35 +0100 Subject: [Python-Dev] Fixing the XML batteries Message-ID: Hi everyone, I think Py3.3 would be a good milestone for cleaning up the stdlib support for XML. Note upfront: you may or may not know me as the maintainer of lxml, the de-facto non-stdlib standard Python XML tool. This (lengthy) post was triggered by the following kind of conversation that I keep having with new XML users in Python (mostly on c.l.py), which hints at some serious flaw in the stdlib. User: I'm trying to do XML stuff XYZ in Python and have problem ABC. Me: What library are you using? Could you show us some code? User: My code looks like this snippet: ... Me: You are using minidom, which is known to be hard to use and slow, and to use lots of memory. Use the xml.etree.ElementTree package instead, or rather its C implementation cElementTree, also in the stdlib.
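To make the advice in this exchange concrete, here is a small comparison of the same task in both stdlib APIs: collecting (id, text) pairs from a document. The sample document and the function names are illustrative. The sketch imports xml.etree.ElementTree rather than cElementTree, because the separate cElementTree module no longer exists in current Python 3 releases.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

DOC = "<items><item id='1'>spam</item><item id='2'>eggs</item></items>"


def pairs_with_minidom(doc):
    # DOM style: walk node lists and assemble the text from child text nodes.
    dom = xml.dom.minidom.parseString(doc)
    pairs = []
    for node in dom.getElementsByTagName("item"):
        text = "".join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)
        pairs.append((node.getAttribute("id"), text))
    dom.unlink()  # release the DOM's internal reference cycles
    return pairs


def pairs_with_elementtree(doc):
    # ElementTree style: attributes via .get(), text via .text.
    root = ET.fromstring(doc)
    return [(item.get("id"), item.text) for item in root.iter("item")]


print(pairs_with_elementtree(DOC))  # [('1', 'spam'), ('2', 'eggs')]
```

Both functions return the same result; the difference is how much DOM bookkeeping the caller has to write to get there.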
User (coming back after a while): thanks, that was exactly what [I didn't know] I was looking for. What does this tell us? 1) MiniDOM is what new users find first. It's highly visible because there are still lots of ancient "Python and XML" web pages out there that date back from the time before Python 2.5 (or rather something like 2.2), when it was the only XML tree library in the stdlib. It's also the first hit from the top when you search for "XML" on the stdlib docs page and contains the (to some people) familiar word "DOM", which lets users stop their search and start writing code, not expecting to find a separate alternative in the same stdlib, way further down. And the description as "mini", "simple" and "lightweight" suggests to users that it's going to be easy to use and efficient. 2) MiniDOM is not what users want. It leads to complicated, unpythonic code and lots of problems. It is neither easy to use, nor efficient, nor "lightweight", "simple" or "mini", not in absolute numbers (see http://bugs.python.org/issue11379#msg148584 and following for a recent discussion). It's also badly maintained in the sense that its performance characteristics could likely be improved, but no-one is seriously interested in doing that, because it would not lead to something that actually *is* fast or memory friendly compared to any of the 'real' alternatives that are available right now. 3) ElementTree is what users should use, MiniDOM is not. ElementTree was added to the stdlib in Py2.5 on popular demand, exactly because it is very easy to use, very fast, and very memory friendly. And because users did not want to use MiniDOM any more. Today, ElementTree has a rather straight upgrade path towards lxml.etree if more XML features like validation or XSLT are needed. MiniDOM has nothing like that to offer. It's a dead end. 4) In the stdlib, cElementTree is independent of ElementTree, but totally hidden in the documentation. 
In conversations like the above, it's unnecessarily complex to explain to users that there is ElementTree (which is documented in the stdlib), but that what they want to use is really cElementTree, which has the same API but does not have a stdlib documentation page that I can send them to. Note that the other Python implementations simply provide cElementTree as an alias for ElementTree. That leaves CPython as the only Python implementation that really has these two separate modules. So, there are many problems here. I think they make it unnecessarily complicated for users to process XML in Python, and that the current situation helps to turn new users away from Python as a language for XML processing. Python does have impressively great tools for working with XML. It's just that the stdlib and its documentation do not reflect or even appreciate that. What should change? a) The stdlib documentation should help users to choose the right tool right from the start. Instead of using the totally misleading wording that it uses now, it should be honest about the performance characteristics of MiniDOM and should actively suggest that those who don't know what to choose (or even *that* they can choose) should not use MiniDOM in the first place. I created a ticket (issue11379) for a minor step in this direction, but given the responses, I'm rather convinced that there's a lot more that can be done and should be done, and that it should be done now, in time for the next release. b) cElementTree should finally lose its "special" status as a separate library and disappear as an accelerator module behind ElementTree. This has been suggested a couple of times already, and AFAIR, there was some opposition because 1) ET was maintained outside of the stdlib and 2) the APIs of both were not identical. However, getting ET 1.3 into Py2.7 and 3.2 was a U-turn.
Today, ET is *only* being maintained in the stdlib by Florent Xicluna (who is doing a good job with it), and ET 1.3 has basically made the APIs of both implementations compatible again. So, 3.3 would be the right milestone for fixing the "two libs for one" quirk. Given that this is the third time during the last couple of years that I'm suggesting to finally fix the stdlib and its documentation, I won't provide any further patches before it has finally been accepted that a) this is a problem and b) it should be fixed, thus allowing the patches to actually serve a purpose. If we can agree on that, I'll happily help in making this change happen. Stefan From ncoghlan at gmail.com Fri Dec 9 09:09:46 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 9 Dec 2011 18:09:46 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323416285.2710.219.camel@thinko> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> <1323410470.2710.158.camel@thinko> <1323416285.2710.219.camel@thinko> Message-ID: Given that WSGI 1.0.1 is defined in terms of native strings and restoring u'' support allows that to be expressed clearly in a shared codebase, I at least understand the point of the suggestion now. 
I'm not quite convinced restoring u'' is the right answer as yet, but a solid use case is always a nice place to start :) -- Nick Coghlan (via Gmail on Android, so likely to be more terse than usual) On Dec 9, 2011 5:38 PM, "Chris McDonough" wrote: > On Fri, 2011-12-09 at 16:36 +1000, Nick Coghlan wrote: > > On Fri, Dec 9, 2011 at 4:01 PM, Chris McDonough > wrote: > > > On the consumer side, folks who want to run 2.6/2.7/3.3-only codebases > > > will have the wherewithal to compile their own Python 3 (or use a PPA > or > > > equivalent) until the distros catch up. > > > > > > So I'm not sure why 3.2 not having support for u'' should be a real > > > blocker for the change. > > > > If this argument was valid, people wouldn't be so worried about > > maintaining 2.5 compatibility in their libraries. Consider if I tried > > to make this argument to justify everyone dropping 2.5 and earlier > > support today: > > > > """On the consumer side, folks who want to run 2.6+ codebases on older > > Linux distros have the wherewithal to compile their own more recent > > Python 2 (or use a PPA or > > equivalent) until they can move to a more recent version of their > distro.""" > > Fair point. > > That said, personally, I have given up entirely on Python 2.4 and 2.5 > support for newer versions of my OSS libraries. I continue to backport > fixes and (some) features to older library versions so folks can run > those on systems that require older Pythons. I gave up 2.5 support > fairly recently across everything new, and I gave up support for 2.4 a > year ago or more in new releases with the same intent. > > In reality, there is only one major platform that requires 2.4: RHEL 5 > and folks who use it will just need to also use old versions of popular > libraries; trying to support it for all future feature work until it's > EOLed is not sane unless someone pays for it. 
Python 2.5 has slightly > more compelling platforms (GAE and Jython), but GAE is moving to Python > 2.7 and Jython is a bit moribund these days and is not really popular > enough that a critical mass of folks will clamor for new-and-shiny > releases that run on it. > > The upshot is that most newly created code only needs to run on Python > 2.6 and *some* version of Python 3. And being able to eventually write > that code in a nonsucky subset of Python 2/3 is important to me, because > I'm going to be developing software in that subset for many years (way > past the timeframe we're talking about in which Python 3.2 will rule the > roost). > > > It's simply not true in the general case - people don't maintain 2.4+ > > compatibility for fun, they do it because RHEL5 (and CentOS 5, etc) > > are still reasonably common and ship with 2.4 as the system Python. As > > soon as you switch away from the system provided Python, you're > > switching away from the vendors entire pre-packaged Python *stack*, > > not just the interpreter itself. You then have to install (and > > generally build) *everything* for yourself. While that is certainly > > possible these days (and a lot simpler than it used to be), it's still > > not trivial [1]. > > > > Since 3.2 is already quite usable for applications that aren't > > fighting with the "native strings" problem (which seems to be the > > common thread running through the complaints I've heard from web > > framework authors), and with it being included in at least the next > > Ubuntu LTS, current versions of Fedora, Arch, etc, it's going to be > > around for a long time. Ignoring 3.1 is a reasonable option. Ignoring > > 3.2 entirely is unlikely to be viable for anyone that is interested in > > supporting 3.x within the next couple of years - the 3.3 release is at > > least 9 months away, and it's also going to take a while for it to > > make its way into distros after the final release gets published on > > python.org. 
> > > > Hence my suggestion: perhaps the problem is the fact that PEP 3333/WSGI > > 1.0.1 introduced the "native string" concept as a minimalist hack to > > try to get a usable gateway interface in Python 3, and that just > > doesn't work in practice when attempting to straddle 2.x and 3.x > > (because the values WSGI is dealing with aren't really text, they're > > bytes, only *some* of which represent text). Perhaps a PEP 444 based > > model would be less painful and more coherent in the long run? > > Possibly. I was the original author of PEP 444 with help from Armin. > (although it has since been taken up by Alice and I do not support the > updates it has received since then). > > A bytes-oriented WSGI-like protocol was always the saner option. The > native string idea optimized in exactly the wrong place, which was to > make it easy to write WSGI middleware, where you're required to do lots > of textlike manipulation of header values. The idea of using bytes in > places where PEP 3333 now mandates native strings was rejected because > people were (somewhat justifiably) horrified at what they had to do in > order to attempt to treat bytes like strings in this context on Python 3 at > the time. It has gotten better, but maybe still not better enough to > appease the folks who blocked the idea originally. > > But all of that is just arguing with the umpire at this point. > Promoting and getting consensus about a different protocol will hurt a > lot. PEP 3333 was born of months of intense periods of arguing and > compromise. It is the way it is now because everyone was too exhausted > to argue about it any more. I don't think that has changed much since > it was accepted, and asking folks to go back to that particular drawing > board is unlikely to have promising results. Folks have already spent > many hours, and lots of money on implementations of the current PEP. > They may hunt us down and murder us one by one.
;-) PEP 3333, to its > credit, is also remarkably backwards compatible with PEP 333, requiring > very little change in existing Python 2 WSGI implementations, which > helps Python 2 folks a lot. > > Given an effective choice between enabling six lines of code in Python > 3.3 to support u'' and months of political wrangling and code rewriting, > I'll choose the former any day. If we were talking about a change to > Python that actually required nontrivial effort, had some sort of > nominal consequence, or had some sort of non-theoretical downside, I'd > be a lot less sanguine about it. But this is just a no-brainer in the > long term, AFAICT. > > - C From martin at v.loewis.de Fri Dec 9 09:20:42 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 09 Dec 2011 09:20:42 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <20111208223408.0e2e8bd1@limelight.wooz.org> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> Message-ID: <4EE1C4DA.9060809@v.loewis.de> > Sorry, I don't understand this. What does it mean to be "str in both > versions"? And why would you want that? One use case (and the only one I'm aware of) is to pass keyword parameters. Python 2 insists that they are str (and doesn't accept unicode), Python 3 insists that they are str (and doesn't accept bytes). This is fairly uncommon as a problem, though, and is also solved in Python 2.6, which does accept Unicode strings as keyword parameter names.
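The Python 3 half of Martin's keyword-parameter case is easy to check directly: keyword names must be `str`, and a `bytes` name is rejected with a `TypeError`. The sketch below uses a hypothetical `connect()` function purely for illustration; the Python 2 half (where 2.6+ also accepts `unicode` names) is not exercised when it runs on Python 3.

```python
def connect(**options):
    """Hypothetical function that simply returns the keyword arguments it received."""
    return options


# Keyword names spelled as plain (native) string literals work on every version.
settings = {"host": "localhost", "port": 8080}
print(connect(**settings))  # {'host': 'localhost', 'port': 8080}

# On Python 3, a bytes key cannot be used as a keyword name.
try:
    connect(**{b"host": "localhost"})
except TypeError as exc:
    print("rejected:", exc)
```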
Regards, Martin From martin at v.loewis.de Fri Dec 9 09:25:08 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 09 Dec 2011 09:25:08 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323405204.2710.139.camel@thinko> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323405204.2710.139.camel@thinko> Message-ID: <4EE1C5E4.6090602@v.loewis.de> > They fail to hear the "again" in that sentence. I've clearly already > thought about the distinction between bytes and text at least once: > that's *why* I'm using a u'' literal there. I shouldn't have to think > about it again to service syntax constraints. Code that is more > explicit than strictly necessary should not be needlessly punished. But you don't have to think about this *again* in any of the proposed alternatives (whether you use a u() function, whether you use the future import, or whether you use 2to3). They differ only (slightly) in how you spell Unicode literals, but all provide for explicit spelling of Unicode literals when applied. Regards, Martin From martin at v.loewis.de Fri Dec 9 09:32:03 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 09 Dec 2011 09:32:03 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323405204.2710.139.camel@thinko> Message-ID: <4EE1C783.8050306@v.loewis.de> > Or, alternatively, you use 'six' (or a similar compatibility module) > and ensure unicode at runtime, using native or binary strings > otherwise:
>
> ----------
> from six import u
> foo = u("this is a Unicode string in both Python 2.x and 3.x")
> bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
> baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
> ----------

An alternative here is to use a function for bar, not foo:

    from __future__ import unicode_literals
    from six.next import native_str

    foo = "this is a Unicode string in both Python 2.x and 3.x"
    bar = native_str("this is a 7-bit string in Python 2.x"
                     " and a Unicode string in 3.x")
    baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"

Which of them is "better" depends on which of the two string types is more common. Regards, Martin From martin at v.loewis.de Fri Dec 9 09:41:15 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 09 Dec 2011 09:41:15 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: Message-ID: <4EE1C9AB.2040301@v.loewis.de> > a) The stdlib documentation should help users to choose the right tool > right from the start. Instead of using the totally misleading wording > that it uses now, it should be honest about the performance > characteristics of MiniDOM and should actively suggest that those who > don't know what to choose (or even *that* they can choose) should not > use MiniDOM in the first place. I disagree. The right approach is not to document performance problems, but to fix them.
> b) cElementTree should finally lose its "special" status as a separate > library and disappear as an accelerator module behind ElementTree. This > has been suggested a couple of times already, and AFAIR, there was some > opposition because 1) ET was maintained outside of the stdlib and 2) the > APIs of both were not identical. However, getting ET 1.3 into Py2.7 and > 3.2 was a U-turn. Unfortunately (?), there is a near-contract-like agreement with Fredrik Lundh that any significant changes to ElementTree in the standard library have to be agreed by him. So whatever change you plan: make sure Fredrik gives his explicit support. Regards, Martin From martin at v.loewis.de Fri Dec 9 09:44:13 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 09 Dec 2011 09:44:13 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: <20111209013535.6fb38068@pitrou.net> References: <20111209013535.6fb38068@pitrou.net> Message-ID: <4EE1CA5D.70705@v.loewis.de> On 09.12.2011 01:35, Antoine Pitrou wrote: > On Fri, 09 Dec 2011 00:16:02 +0100 > victor.stinner wrote: >> >> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) >> + >> + Get a new copy of a Unicode object. >> + >> + .. versionadded:: 3.3 > > I'm not sure I understand. Why would you make a copy of an immutable > object? It can convert a unicode subtype object into an exact unicode object. I'd rename it to _PyUnicode_AsExactUnicode, and undocument it. Regards, Martin From stefan_ml at behnel.de Fri Dec 9 09:59:24 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 09 Dec 2011 09:59:24 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE1C9AB.2040301@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> Message-ID: "Martin v. Löwis", 09.12.2011 09:41: >> a) The stdlib documentation should help users to choose the right tool >> right from the start.
Instead of using the totally misleading wording >> that it uses now, it should be honest about the performance >> characteristics of MiniDOM and should actively suggest that those who >> don't know what to choose (or even *that* they can choose) should not >> use MiniDOM in the first place. > > I disagree. The right approach is not to document performance problems, > but to fix them. Here's the relevant part of my mail that you stripped: >> It's also badly maintained in the sense that its performance >> characteristics could likely be improved, but no-one is seriously >> interested in doing that, because it would not lead to something that >> actually *is* fast or memory friendly compared to any of the 'real' >> alternatives that are available right now. I can't recall anyone working on any substantial improvements during the last six years or so, and the reason for that seems obvious to me. >> b) cElementTree should finally lose its "special" status as a separate >> library and disappear as an accelerator module behind ElementTree. This >> has been suggested a couple of times already, and AFAIR, there was some >> opposition because 1) ET was maintained outside of the stdlib and 2) the >> APIs of both were not identical. However, getting ET 1.3 into Py2.7 and >> 3.2 was a U-turn. > > Unfortunately (?), there is a near-contract-like agreement with Fredrik > Lundh that any significant changes to ElementTree in the standard > library have to be agreed by him. So whatever change you plan: make sure > Fredrik gives his explicit support. Ok, I'll try to contact him. Stefan From solipsis at pitrou.net Fri Dec 9 09:54:15 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 09:54:15 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323405204.2710.139.camel@thinko> Message-ID: <20111209095415.1ae242d0@pitrou.net> On Fri, 9 Dec 2011 15:30:36 +1000 Nick Coghlan wrote: > > Or, alternatively, you use 'six' (or a similar compatibility module) > and ensure unicode at runtime, using native or binary strings > otherwise:
>
> ----------
> from six import u
> foo = u("this is a Unicode string in both Python 2.x and 3.x")
> bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
> baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
> ----------
>
> If you want to target 3.2, you *have* to use one of those mechanisms - > any potential restoration of u'' syntax support won't help you (and > even after 3.3 gets released in the latter half of next year, it's > still going to be a fair while before it makes its way into the > various distros, especially the ones that include long term support > from major vendors). > > So, instead of attempting to paper over the problem by reintroducing > u'', perhaps the discussion we should be having is whether or not PEP > 3333's superficially appealing concept of defining an API in terms of > "native strings" is a loser in practice, and we should instead be > looking more closely at PEP 444 It's not only PEP 3333. Many network protocol implementations will show the same characteristics (an FTP implementation accepting str in 2.x will also want to accept str in 3.x). But using six is a reasonable suggestion for those who want to share a single codebase across 2.x and 3.x. Regards Antoine.
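Both Nick's quoted example and Martin's `native_str` variant depend on small runtime helpers. The sketch below shows roughly what such helpers can look like in a single-codebase module. The names `u`, `b`, `text_type` and `native_str` follow the six convention but are defined locally here; this is not six's actual source. Only the Python 3 branch runs when it is executed on Python 3.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str

    def u(s):
        # A plain literal is already text on Python 3.
        return s

    def b(s):
        # Assumes the literal only contains code points below 256.
        return s.encode("latin-1")
else:  # Python 2 branch: never executed on Python 3
    text_type = unicode  # noqa: F821 (builtin only on Python 2)

    def u(s):
        # Interpret escapes such as \u00e9 the way a u'' literal would.
        return s.decode("unicode_escape")

    def b(s):
        # A plain literal is already bytes on Python 2.
        return s

# The builtin str is bytes on Python 2 and text on Python 3: the "native string".
native_str = str

foo = u("this is a Unicode string in both Python 2.x and 3.x")
bar = native_str("this is a native string: bytes in 2.x, text in 3.x")
baz = b("this is a byte string in both Python 2.x and 3.x")
```

The design choice is to pay a function call only for the less common string kind, which mirrors Martin's remark that the better spelling depends on which of the two string types is more common.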
From python-dev at masklinn.net Fri Dec 9 10:09:39 2011 From: python-dev at masklinn.net (Xavier Morel) Date: Fri, 9 Dec 2011 10:09:39 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE1C9AB.2040301@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> Message-ID: On 2011-12-09, at 09:41 , Martin v. Löwis wrote: >> a) The stdlib documentation should help users to choose the right tool >> right from the start. Instead of using the totally misleading wording >> that it uses now, it should be honest about the performance >> characteristics of MiniDOM and should actively suggest that those who >> don't know what to choose (or even *that* they can choose) should not >> use MiniDOM in the first place. > > I disagree. The right approach is not to document performance problems, > but to fix them. Even if performance problems "should not be documented", I think Stefan's point that users should be steered away from minidom and towards ET and cET is completely valid and worthy of support: the *only* advantage minidom has over ET is that it uses an interface familiar to Java users[0] (they are about the only people using actual W3C DOM, while the DOM exists in javascript I'd say most code out there actively tries to not touch it with anything less than a 10-foot library pole like jQuery). That interface is also, of course, absolutely dreadful. Minidom is inferior in interface flow and pythonicity, in terseness, in speed, in memory consumption (even more so using cElementTree, and that's not something which can be fixed unless minidom gets a C accelerator), etc. Even after fixing minidom (if anybody has the time and drive to commit to it), ET/cET should be preferred over it. And that's not even considering the ease of switching to lxml (if only for validators), which Stefan outlined.
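To make the interface comparison concrete, here is the same small task (collecting titles) written against both standard-library APIs, followed by the `.text`/`.tail` split that footnote [0] refers to. The sample document and function names are invented for illustration. On Python 3.3 and later, `xml.etree.ElementTree` uses its C accelerator automatically when available, so a separate `cElementTree` import is not needed.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

DOC = ("<catalog>"
       "<book id='1'><title>Alpha</title></book>"
       "<book id='2'><title>Beta</title></book>"
       "</catalog>")


def titles_etree(source):
    root = ET.fromstring(source)
    # findtext() returns the text content of the first matching child.
    return [book.findtext("title") for book in root.iter("book")]


def titles_minidom(source):
    dom = xml.dom.minidom.parseString(source)
    titles = []
    for book in dom.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # DOM text lives in separate Text child nodes that must be joined by hand.
        titles.append("".join(node.data for node in title.childNodes
                              if node.nodeType == node.TEXT_NODE))
    return titles


# Mixed content in ElementTree: text before the first child is the parent's
# .text, and text following a child element is that child's .tail.
para = ET.fromstring("<p>Hello <b>world</b>!</p>")
bold = para.find("b")
print(repr(para.text), repr(bold.text), repr(bold.tail))  # 'Hello ' 'world' '!'
```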
[0] not 100% true now that I think about it: handling mixed content is simpler in minidom as there is no .text/.tail duality and text nodes are nodes like every other, but I really can't think of another reason to prefer minidom. From ncoghlan at gmail.com Fri Dec 9 10:10:07 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 9 Dec 2011 19:10:07 +1000 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE1C9AB.2040301@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> Message-ID: On Fri, Dec 9, 2011 at 6:41 PM, "Martin v. Löwis" wrote: >> a) The stdlib documentation should help users to choose the right tool >> right from the start. Instead of using the totally misleading wording >> that it uses now, it should be honest about the performance >> characteristics of MiniDOM and should actively suggest that those who >> don't know what to choose (or even *that* they can choose) should not >> use MiniDOM in the first place. > > I disagree. The right approach is not to document performance problems, > but to fix them. When we offer a better way to do something that new users want to do, we generally redirect them to the more recent alternative. I believe the redirection from the getopt module to the argparse module strikes the right tone for that kind of thing: http://docs.python.org/library/getopt For the various XML libraries, a message along the lines of "Note: The <module> module is a <description>. If all you are trying to do is read and write XML files, consider using the xml.etree.ElementTree module instead". I'd also be +1 on adjusting the order of the XML pages in the main index such that xml.etree.ElementTree appeared before xml.parsers.expat and all the others slid down one entry. These are simple changes that don't harm current users of the modules in the least, while being up front and very helpful for beginners.
Again, I think argparse vs getopt is a good comparison: argparse appears first in the main index, and there's a redirection from getopt to argparse that says "if you don't have a specific reason to be using getopt, you probably want argparse instead". -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Dec 9 10:12:50 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 9 Dec 2011 19:12:50 +1000 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: <4EE1CA5D.70705@v.loewis.de> References: <20111209013535.6fb38068@pitrou.net> <4EE1CA5D.70705@v.loewis.de> Message-ID: On Fri, Dec 9, 2011 at 6:44 PM, "Martin v. Löwis" wrote: > On 09.12.2011 01:35, Antoine Pitrou wrote: >> On Fri, 09 Dec 2011 00:16:02 +0100 >> victor.stinner wrote: >>> >>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) >>> + >>> +   Get a new copy of a Unicode object. >>> + >>> +   .. versionadded:: 3.3 >> >> I'm not sure I understand. Why would you make a copy of an immutable >> object? > > It can convert a unicode subtype object into an exact unicode > object. > > I'd rename it to _PyUnicode_AsExactUnicode, and undocument it. Isn't it basically just exposing a C level version of the unicode() builtin's behaviour? While I agree the name could be better (and PyUnicode_AsExactUnicode would certainly work), why make it private? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Fri Dec 9 10:15:17 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 10:15:17 +0100 Subject: [Python-Dev] Fixing the XML batteries References: Message-ID: <20111209101517.47e03eae@pitrou.net> Mostly uninformed +1 to Stefan's suggestions from me. Regards Antoine. On Fri, 09 Dec 2011 09:02:35 +0100 Stefan Behnel wrote: > Hi everyone, > > I think Py3.3 would be a good milestone for cleaning up the stdlib support > for XML.
Note upfront: you may or may not know me as the maintainer of > lxml, the de-facto non-stdlib standard Python XML tool. This (lengthy) post > was triggered by the following kind of conversation that I keep having with > new XML users in Python (mostly on c.l.py), which hints at some serious > flaw in the stdlib. [etc.] From tjreedy at udel.edu Fri Dec 9 11:03:41 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Dec 2011 05:03:41 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On 12/8/2011 8:39 PM, Vinay Sajip wrote: > It's not the speed of 2to3 per se; this seems very reasonable for a > tool of its type > It's the overall process, which currently involves running 2to3 > on an > entire codebase (for example, using setup.py with flags to run 2to3 > during setup). Oh. That explains the 'slow' complaint. > However, 2to3 tools could be developed which are based on > 2to3/lib2to3 and are *incremental* in nature; then as you edit and > save a file, its processed version could be available very shortly > afterwards (since we only need to translate the file that was saved) I had assumed that people were already running 2to3 on a per-edited-file basis. On a multi-core machine, I would think it possible to run 2to3 and then a test on the result in a separate process while tests are running on the 2.x version. -- Terry Jan Reedy From ncoghlan at gmail.com Fri Dec 9 11:17:29 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 9 Dec 2011 20:17:29 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On Fri, Dec 9, 2011 at 8:03 PM, Terry Reedy wrote: > On 12/8/2011 8:39 PM, Vinay Sajip wrote: >> on an >> >> entire codebase (for example, using setup.py with flags to run 2to3 >> during setup). > > > Oh. That explains the 'slow' complaint. As Chris pointed out though, the real problem with the "repeatedly run 2to3" workflow is that it can make interpreting tracebacks from the field *really* hard. That's where Antoine's suggested approach may be better - use 2to3 to do the initial mechanical update in a new branch, then subsequently use a process similar to what we do ourselves for the standard library (i.e. update the 2.x and 3.x versions in parallel, perhaps using 2to3 on a few files if they have changed substantially in a particular patch). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Fri Dec 9 11:35:35 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 11:35:35 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> Message-ID: <20111209113535.2e9d2d1b@pitrou.net> On Fri, 9 Dec 2011 15:41:40 +1000 Nick Coghlan wrote: > On Fri, Dec 9, 2011 at 3:33 PM, Chris McDonough wrote: > > Even if it weren't slow, I still wouldn't use it to automatically > > convert code at install time; a single codebase is easier to reason > > about, and easier to support. Users send me tracebacks all the time; > > having them match the source is a wonderful thing. > > Yeah, if single source doesn't work, then I think Antoine's suggested > way (i.e.
convert once, then maintain two distinct branches and > builds, the way python-dev did for years with the standard library) is > a more sane option. My suggestion is actually to convert each time you pull changes from the 2.x sources. You have three branches: - the default 2.x branch - a branch containing changesets which are pristine 2to3 runs over the 2.x codebase - a branch containing the modified 3.x code The 2to3 branch can be updated through an automatic script. Each changeset should be a child of both the previous 2to3 changeset, and the 2.x changeset which 2to3 has been run on (in other words, each changeset - except the first one - is a merge). Then the changes from the 2to3 branch are simply merged to the 3.x branch. This is the only manual step, in that you have to fix potential conflicts and regressions. (I suppose the strategy can be reversed, i.e. maintain code primarily in the 3.x branch and use 3to2 to backport them to the 2.x codebase) Regards Antoine. From regebro at gmail.com Fri Dec 9 15:11:17 2011 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 9 Dec 2011 15:11:17 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> Message-ID: On Fri, Dec 9, 2011 at 03:53, Guido van Rossum wrote: > Are you saying that with that future import, b"..." is still a Unicode > literal? If I said that, this is not what I was trying to say. :-) //Lennart From stephen at xemacs.org Fri Dec 9 15:14:08 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 09 Dec 2011 23:14:08 +0900 Subject: [Python-Dev] readd u'' literal support in 3.3? 
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323405204.2710.139.camel@thinko> Message-ID: <87ty59zvbj.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > So, instead of attempting to paper over the problem by reintroducing > u'', perhaps the discussion we should be having is whether or not PEP > 3333's superficially appealing concept of defining an API in terms of > "native strings" is a loser in practice, +1 to that discussion. str is a different type in the two implementations, binary sludge with essentially undefined semantics in Python 2 and highly standardized text in Python 3. I don't understand how this can be expected to work well, and especially not in a code base that is trying to be portable across Python 2 and 3. I sympathize with Chris's complaint that he has to think about it "again", but in fact, it seems to me that may not be entirely true. AFAICS, having the WSGI APIs mask the difference between str and bytes (or unicode and str, depending on where you're working) requires that you think about it every time you pass something to a WSGI API. I could be wrong, of course (I don't do WSGI stuff, which is why I'm really surprised to hear this, and so I don't know the rationale for the WSGI API decision), but this description of the API just triggers all my alarms. I am somewhat sympathetic to the request for reintroduction of u'' (in my personal use it would just be cruft, so I'm -0.1 on that ground), but I can't see how the WSGI API is an argument for it. Making that case requires showing that the "native string" API makes pragmatic sense, and then that u'' can help. 
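For readers who have not worked with PEP 3333, the "native string" rule under discussion looks roughly like this on Python 3. The status line and the header names and values are `str`, the response body is an iterable of `bytes`, and `environ` values are `str` objects whose code points are the raw request bytes decoded as ISO-8859-1 (latin-1). The application below is a minimal hypothetical example, and it assumes that the client sent a UTF-8 encoded path.

```python
def request_path(environ):
    # PEP 3333 "bytes-in-unicode": undo the latin-1 decoding to get the raw
    # bytes back, then decode them with the encoding the app expects (UTF-8 here).
    raw = environ.get("PATH_INFO", "/")
    return raw.encode("latin-1").decode("utf-8")


def application(environ, start_response):
    path = request_path(environ)
    body = ("You asked for %s\n" % path).encode("utf-8")  # body chunks: bytes
    status = "200 OK"                                     # native str
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(body)))]        # native str pairs
    start_response(status, headers)
    return [body]


# Drive the application directly, without a server.
responses = []


def start_response(status, headers, exc_info=None):
    responses.append((status, headers))


chunks = application({"PATH_INFO": "/caf\xc3\xa9"}, start_response)
```

The `encode("latin-1").decode("utf-8")` round trip is the kind of text-versus-bytes juggling that the thread identifies as the cost of the native-string design.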
From regebro at gmail.com Fri Dec 9 15:18:33 2011 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 9 Dec 2011 15:18:33 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <20111208223408.0e2e8bd1@limelight.wooz.org> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> Message-ID: On Fri, Dec 9, 2011 at 04:34, Barry Warsaw wrote: > Sorry, I don't understand this. What does it mean to be "str in both > versions"? And why would you want that? It means that it's a str, that is a string of bytes, in Python 2, and a str, that is a string of Unicode characters, in Python 3. There are cases where you want this, for example not all libraries will accept both str and Unicode under Python 2. > As for "Unicode in Python 2 and str in Python 3", unadorned strings with the > future import in Python >= 2.6 does that just fine. Yes, but the future import will change this for *all* strings, making it impossible to have a string that is a "str" in both Python 2 and Python 3. For that reason, the future import is not enough as a solution (and I suspect, one major reason why I haven't actually seen anyone using it). For most cases, using something like six's b() and u() has turned out to be a better solution. It's uglier than having u'' support in Python 3, but has the benefit that b() works also in Python 2.5. > The > problem comes when you aren't or can't be sure, i.e. you have objects that are > sometimes one and sometimes the other. Such as email headers. In that case, > you're kind of screwed. Python 2's str type let you cheat, but not without > consequences. Those consequences are spelled "UnicodeErrors" and I'll be glad > to be rid of them. Me too.
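Barry's "sometimes one and sometimes the other" case is usually handled on Python 3 by decoding explicitly at the boundary instead of relying on Python 2's implicit coercion. A minimal sketch follows. `ensure_text` is a hypothetical helper, and UTF-8 is an assumed default encoding.

```python
def ensure_text(value, encoding="utf-8", errors="strict"):
    """Return value as str, decoding bytes explicitly; reject anything else."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    if isinstance(value, str):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# A header value might arrive as either type depending on where it came from.
decoded = [ensure_text(raw) for raw in (b"Caf\xc3\xa9 menu", "Caf\u00e9 menu")]

# Undecodable bytes fail loudly by default; errors="replace" substitutes U+FFFD.
replaced = ensure_text(b"\xff", errors="replace")
```

Failing with an explicit `UnicodeDecodeError` or `TypeError` at one known point replaces the scattered `UnicodeError`s that implicit coercion used to cause.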
From stephen at xemacs.org Fri Dec 9 15:27:56 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 09 Dec 2011 23:27:56 +0900 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323416285.2710.219.camel@thinko> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> <1323410470.2710.158.camel@thinko> <1323416285.2710.219.camel@thinko> Message-ID: <87sjktzuoj.fsf@uwakimon.sk.tsukuba.ac.jp> Chris McDonough writes: > Given an effective choice between enabling six lines of code in Python > 3.3 to support u'' and months of political wrangling and code rewriting, > I'll choose the former any day. Sure, but the real question is whether that *is* the effective choice. Maybe the effective choice is between enabling six lines of code in Python 3.3 to support u'' and not doing so, with both options eventually entailing months of political wrangling and code rewriting because it doesn't help with the underlying problems. From fuzzyman at voidspace.org.uk Fri Dec 9 15:42:43 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 9 Dec 2011 14:42:43 +0000 Subject: [Python-Dev] Unicode re support in Python 3 Message-ID: Hey python-devers, As I'm sure many of you are aware, Armin Ronacher posted a blog entry explaining the reasons he dislikes Python 3 in its current form. Whilst I don't agree with all of his complaints, he makes a fair point about the re module Unicode support. It seems that the specific issue he has could be fixed by accepting the re module improvement / overhaul implemented by mrab: http://bugs.python.org/issue2636 As it comes with an active maintainer, and is a big step forward for Python regex support, I'd like to see it in Python 3.3. 
Reading through the issue it's not clear to me what needs to be done for it to be accepted (or rejected), beyond a general "it's a big change". All the best, Michael Foord -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From regebro at gmail.com Fri Dec 9 15:45:39 2011 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 9 Dec 2011 15:45:39 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> Message-ID: Slightly OT: The slowness of running 2to3 during install time can be fixed by not doing so, but instead running it when the distribution is created, including both Python 2 and Python 3 code in the distribution. http://python3porting.com/2to3.html#distribution-section There are no tools that support this at the moment though. I guess it would be cool if Distribute supported making these kinds of distributions... //Lennart From solipsis at pitrou.net Fri Dec 9 15:43:49 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 15:43:49 +0100 Subject: [Python-Dev] Unicode re support in Python 3 References: Message-ID: <20111209154349.47eb6dcc@pitrou.net> On Fri, 9 Dec 2011 14:42:43 +0000 Michael Foord wrote: > > Whilst I don't agree with all of his complaints, he makes a fair point about the re module Unicode support. 
It seems that the specific issue he has could be fixed by accepting the re module improvement / overhaul implemented by mrab: > > http://bugs.python.org/issue2636 > > As it comes with an active maintainer, and is a big step forward for Python regex support, I'd like to see it in Python 3.3. Reading through the issue it's not clear to me what needs to be done for it to be accepted (or rejected), beyond a general "it's a big change". Reviewing. Do you volunteer? Regards Antoine. From dirkjan at ochtman.nl Fri Dec 9 16:09:51 2011 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 9 Dec 2011 16:09:51 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: Message-ID: On Fri, Dec 9, 2011 at 09:02, Stefan Behnel wrote: > a) The stdlib documentation should help users to choose the right tool right > from the start. > b) cElementTree should finally lose its "special" status as a separate > library and disappear as an accelerator module behind ElementTree. An at least somewhat informed +1 from me. The ElementTree API is a very good way to deal with XML from Python, and it deserves to be promoted over the included alternatives. Let's deprecate the NiCad batteries and try to guide users toward the Li-Ion ones. Cheers, Dirkjan From barry at python.org Fri Dec 9 16:11:23 2011 From: barry at python.org (Barry Warsaw) Date: Fri, 9 Dec 2011 10:11:23 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> Message-ID: <20111209101123.01e92326@limelight.wooz.org> On Dec 09, 2011, at 03:18 PM, Lennart Regebro wrote: >On Fri, Dec 9, 2011 at 04:34, Barry Warsaw wrote: >> Sorry, I don't understand this. What does it mean to be "str in both >> versions"?
And why would you want that? > >It means that it's a str, that is a string of bytes, in Python 2, and >a str, that is a string of Unicode characters, in Python 3. There are >cases where you want this, for example not all libraries will accept >both str and Unicode under Python 2. As Chris points out, this seems to be a use case tied to WSGI and PEP 3333. I guess it's an unfortunate choice for so recent a PEP, but maybe there was no way to do better. Still, it seems the "native string" discussion is an indication that the PEP is introducing a binary vs. text ambiguity when switching Python versions. My previous "you're screwed" comment comes back to mind. ;) >> As for "Unicode in Python 2 and str in Python 3", unadorned strings with the >> future import in Python >= 2.6 does that just fine. > >Yes, but the future import will change this for *all* strings, making >it impossible to have a string that is a "str" in both Python 2 and >Python 3. For that reason, the future import is not enough as a >solution (and I suspect, one major reason why I haven't actually seen >anyone using it). It can certainly be useful in many contexts outside of WSGI. -Barry From barry at python.org Fri Dec 9 16:13:17 2011 From: barry at python.org (Barry Warsaw) Date: Fri, 9 Dec 2011 10:13:17 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <4EE1C4DA.9060809@v.loewis.de> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <4EE1C4DA.9060809@v.loewis.de> Message-ID: <20111209101317.0f7f6db7@limelight.wooz.org> On Dec 09, 2011, at 09:20 AM, Martin v.
Löwis wrote: >One use case (and the only one I'm aware of) is to pass keyword >parameters. Python 2 insists that they are str (and doesn't accept >unicode), Python 3 insists that they are str (and doesn't accept bytes). > >This is fairly uncommon as a problem, though, and is also solved in >Python 2.6, which does accept Unicode strings as keyword parameter >names. Oh, I remember this one, because I think I reported and fixed it. But I take it as a given that Python 2.6 is the minimal (sane) version to target for one-codebase cross-Python code. -Barry From barry at python.org Fri Dec 9 16:23:56 2011 From: barry at python.org (Barry Warsaw) Date: Fri, 9 Dec 2011 10:23:56 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> <1323410470.2710.158.camel@thinko> <1323416285.2710.219.camel@thinko> Message-ID: <20111209102356.4ec6c646@limelight.wooz.org> On Dec 09, 2011, at 06:09 PM, Nick Coghlan wrote: >Given that WSGI 1.0.1 is defined in terms of native strings and restoring >u'' support allows that to be expressed clearly in a shared codebase, I at >least understand the point of the suggestion now. I'm not quite convinced >restoring u'' is the right answer as yet, but a solid use case is always a >nice place to start :) Maybe a more interesting approach would be to expand on the `six` idea and bring some of those concepts into the stdlib for 3.3. You could implement the u() function somewhat more efficiently in an extension module, and make that available for older Pythons via the Cheeseshop.
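The `six` approach Barry refers to replaces prefixed literals with small helper functions. A minimal sketch of such `u()` and `b()` helpers, loosely modeled on the functions of the same names in the `six` library (the Python 2 branch approximates six's behavior rather than copying its source, and the `PY3` flag name is illustrative):

```python
import sys

# True on Python 3.x; the implementation is chosen once, at import time.
PY3 = sys.version_info[0] >= 3

if PY3:
    def u(s):
        """Return text for a literal written without a u'' prefix."""
        # A plain literal is already text (str) on Python 3.
        return s

    def b(s):
        """Return bytes for a literal containing only code points 0-255."""
        # latin-1 maps each code point 0-255 to the byte with the same value.
        return s.encode("latin-1")
else:
    def u(s):
        # Python 2: decode escapes such as \u00e9 into a unicode object,
        # doubling existing backslash pairs so they survive the decoding.
        return unicode(s.replace(r"\\", r"\\\\"), "unicode_escape")  # noqa: F821

    def b(s):
        # Python 2: a plain literal is already a byte string.
        return s
```

Literals are then written as `u("text")` and `b("data")` in a shared codebase; each one costs a function call at run time, which is the overhead an extension-module version could reduce.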
I now also have a few more Python and C level compatibility hacks that could make it into such a module. -Barry From fuzzyman at voidspace.org.uk Fri Dec 9 16:35:05 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 9 Dec 2011 15:35:05 +0000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <20111209101317.0f7f6db7@limelight.wooz.org> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <4EE1C4DA.9060809@v.loewis.de> <20111209101317.0f7f6db7@limelight.wooz.org> Message-ID: <2DBA4A01-EA18-4F33-908F-D271633AB52D@voidspace.org.uk> On 9 Dec 2011, at 15:13, Barry Warsaw wrote: > On Dec 09, 2011, at 09:20 AM, Martin v. Löwis wrote: > >> One use case (and the only one I'm aware of) is to pass keyword >> parameters. Python 2 insists that they are str (and doesn't accept >> unicode), Python 3 insists that they are str (and doesn't accept bytes). >> >> This is fairly uncommon as a problem, though, and is also solved in >> Python 2.6, which does accept Unicode strings as keyword parameter >> names. > > Oh, I remember this one, because I think I reported and fixed it. But I take > it as a given that Python 2.6 is the minimal (sane) version to target for > one-codebase cross-Python code. > In mock (at least 5000 lines of code including tests) I target 2.4 -> 3.2+. Admittedly mock does little I/O but does some fairly crazy introspection (and even found bugs in Python 3 because of it). The exception handling is the worst - no compatible syntax between 2.4-5 and Python 3. So you have to use sys.exc_info. Other than that it isn't too hard / bad.
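The `sys.exc_info()` idiom Michael mentions is needed because `except ValueError, e:` is a syntax error on Python 3, while `except ValueError as e:` is a syntax error before Python 2.6. A minimal sketch, using a hypothetical `parse_port()` helper, of code that reads the active exception the same way on 2.4 through 3.x:

```python
import sys


def parse_port(value, default=8080):
    """Hypothetical helper: parse a TCP port, falling back to a default."""
    try:
        return int(value)
    except ValueError:
        # No 'as' clause and no comma form, so this parses on 2.4 through 3.x;
        # sys.exc_info() returns (type, value, traceback) for the active exception.
        exc = sys.exc_info()[1]
        sys.stderr.write("bad port %r: %s\n" % (value, exc))
        return default
```

Once support for Python 2.5 and earlier can be dropped, `except ValueError as exc:` is the cleaner spelling.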
All the best, Michael > -Barry > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk From barry at python.org Fri Dec 9 16:42:51 2011 From: barry at python.org (Barry Warsaw) Date: Fri, 9 Dec 2011 10:42:51 -0500 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: References: <20111209013535.6fb38068@pitrou.net> <4EE1CA5D.70705@v.loewis.de> Message-ID: <20111209104251.072d9766@limelight.wooz.org> On Dec 09, 2011, at 07:12 PM, Nick Coghlan wrote: >Isn't it basically just exposing a C level version of the unicode() >builtin's behaviour? While I agree the name could be better (and >PyUnicode_AsExactUnicode would certainly work), why make it private? Don't we already have that in PyObject_Str(), or in Python 2, PyObject_Unicode()? -Barry From carl at oddbird.net Fri Dec 9 17:34:47 2011 From: carl at oddbird.net (Carl Meyer) Date: Fri, 09 Dec 2011 09:34:47 -0700 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <2DBA4A01-EA18-4F33-908F-D271633AB52D@voidspace.org.uk> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <4EE1C4DA.9060809@v.loewis.de> <20111209101317.0f7f6db7@limelight.wooz.org> <2DBA4A01-EA18-4F33-908F-D271633AB52D@voidspace.org.uk> Message-ID: <4EE238A7.9050708@oddbird.net> On 12/09/2011 08:35 AM, Michael Foord wrote: > On 9 Dec 2011, at 15:13, Barry Warsaw wrote: >> Oh, I remember this one, because I think I reported and fixed it. >> But I take it as a given that Python 2.6 is the minimal (sane) >> version to target for one-codebase cross-Python code. >> > > In mock (at least 5000 lines of code including tests) I target 2.4 -> > 3.2+. Admittedly mock does little I/O but does some fairly crazy > introspection (and even found bugs in Python 3 because of it). pip and virtualenv also both support 2.4 - 3.2+ from a single codebase (pip is ~7300 lines of code including tests, virtualenv ~1600). I consider them a bit of a special case; since they are both early-stage bootstrapping tools, the inconvenience level for users of a 2to3 step or having to keep separate versions around would be higher than for an ordinary library. But I will say that the workarounds necessary to support 2.4 - 3.2 have not really been problematic enough to tempt me towards a more complex workflow, and I would probably take the single-codebase approach with another port, even if I needed to support pre-2.6. The sys.exc_info() business is ugly indeed, but (IMHO) not bad enough to warrant adding 2to3 hassles into the maintenance workflow. Carl From l at lrowe.co.uk Fri Dec 9 17:36:40 2011 From: l at lrowe.co.uk (Laurence Rowe) Date: Fri, 09 Dec 2011 17:36:40 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? 
References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> <1323330308.2710.52.camel@thinko> <20111208072720.0d243557@limelight.wooz.org> Message-ID: On Thu, 08 Dec 2011 13:27:20 +0100, Barry Warsaw wrote: > On Dec 08, 2011, at 11:01 AM, Vinay Sajip wrote: > >> Well, if 3.2 remains in use for a longish time, then it is relevant, in the broader context, isn't it? We know how conservative Linux distributions can be with their Python releases - although most are still releasing 2.x as their system Python, this could change at some point in the future. Even if it doesn't, there might be a fair user base of people stuck with 3.2 for any number of reasons, and to support them, the change you propose won't help, because some variant of a package will still have to use u() and b(), just for 3.2 support. > > Case in point: Ubuntu 12.04 is a long term support release, meaning 5 years of official support on both the desktop and server. It will ship with Python 2.7 and 3.2 only. From a Plone perspective, Python 3 support is something that I don't see becoming important for maybe 5 years, so support for 3.2 is simply not an issue for us. Before Plone can consider a move to Python 3 we first need support in the libraries we depend on. For those libraries under active development it seems that compatibility with both 2.x and 3.x is the best way to go. Adding support for u'' to Python 3.x certainly looks like it would cut down the amount of work required for libraries like the Zope Toolkit which already use unicode extensively. Laurence From merwok at netwok.org Fri Dec 9 17:38:56 2011 From: merwok at netwok.org (Éric Araujo) Date: Fri, 09 Dec 2011 17:38:56 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: <4EE239A0.2020004@netwok.org> Hi, When running 2to3 from a setup.py script, does it run on the whole codebase or only files that are found newer by the make-like timestamp-based dependency system? If it's the former, as some messages seem to show (sorry no time to test right now), ISTM we can fix distutils to do the latter (unless there are bugs due to import rewriting to use explicit relative imports when there are extension modules - blergh). Regards From carl at oddbird.net Fri Dec 9 17:23:18 2011 From: carl at oddbird.net (Carl Meyer) Date: Fri, 09 Dec 2011 09:23:18 -0700 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> Message-ID: <4EE235F6.6030001@oddbird.net> On 12/09/2011 07:45 AM, Lennart Regebro wrote: > The slowness of running 2to3 during install time can be fixed by not > doing so, but instead running it when the distribution is created, > including both Python 2 and Python 3 code in the distribution. > > http://python3porting.com/2to3.html#distribution-section > > There are no tools that support this at the moment though. I guess it > would be cool if Distribute supported making these kinds of > distributions... Doesn't this just move the problem to testing? Presumably one wants to test that changes to the code don't break under Python 3, and ideally at every change, not only at release time.
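Éric's question is about make-style dependency checking: converting only the files whose source is newer than the already converted copy. A minimal sketch of that freshness check, with hypothetical `needs_conversion()` and `stale_sources()` helpers and an assumed layout in which converted files live under a separate build directory mirroring the source tree (older distutils releases shipped a comparable `newer()` helper in `distutils.dep_util`); the 2to3 invocation itself is left out:

```python
import os


def needs_conversion(src_path, dst_path):
    """Return True if dst_path is missing or older than src_path."""
    if not os.path.exists(dst_path):
        return True
    return os.path.getmtime(src_path) > os.path.getmtime(dst_path)


def stale_sources(src_dir, dst_dir):
    """Yield (source, target) pairs whose converted copy is out of date."""
    for root, _dirs, files in os.walk(src_dir):
        for name in sorted(files):
            if not name.endswith(".py"):
                continue
            src = os.path.join(root, name)
            # Mirror the source tree layout under dst_dir.
            dst = os.path.join(dst_dir, os.path.relpath(src, src_dir))
            if needs_conversion(src, dst):
                yield src, dst
```

Only the yielded files would then be handed to the converter, so an unchanged codebase costs a directory walk and some `stat` calls rather than a full 2to3 run.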
Carl From solipsis at pitrou.net Fri Dec 9 17:46:31 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 17:46:31 +0100 Subject: [Python-Dev] 2to3 and timestamps References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> <4EE239A0.2020004@netwok.org> Message-ID: <20111209174631.68a311f5@pitrou.net> On Fri, 09 Dec 2011 17:38:56 +0100 Éric Araujo wrote: > Hi, > > When running 2to3 from a setup.py script, does it run on the whole > codebase or only files that are found newer by the make-like > timestamp-based dependency system? If it's the former, as some messages > seem to show (sorry no time to test right now), ISTM we can fix > distutils to do the latter (unless there are bugs due to import > rewriting to use explicit relative imports when there are extension > modules - blergh). It would be better to teach 2to3 to do it by itself. Not everybody runs 2to3 through a setup.py script. Regards Antoine. From anacrolix at gmail.com Fri Dec 9 18:02:26 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 10 Dec 2011 04:02:26 +1100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: Message-ID: +1 On Sat, Dec 10, 2011 at 2:09 AM, Dirkjan Ochtman wrote: > On Fri, Dec 9, 2011 at 09:02, Stefan Behnel wrote: >> a) The stdlib documentation should help users to choose the right tool right >> from the start. >> b) cElementTree should finally lose its "special" status as a separate >> library and disappear as an accelerator module behind ElementTree. > > An at least somewhat informed +1 from me.
The ElementTree API is a > very good way to deal with XML from Python, and it deserves to be > promoted over the included alternatives. > > Let's deprecate the NiCad batteries and try to guide users toward the > Li-Ion ones. > > Cheers, > > Dirkjan -- ?_? From status at bugs.python.org Fri Dec 9 18:07:33 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 9 Dec 2011 18:07:33 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20111209170733.899571CDE4@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-12-02 - 2011-12-09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3169 (+21) closed 22180 (+26) total 25349 (+47) Open issues with patches: 1351 Issues opened (39) ================== #5905: strptime fails in non-UTF locale http://bugs.python.org/issue5905 reopened by haypo #12555: PEP 3151 implementation http://bugs.python.org/issue12555 reopened by ncoghlan #13521: Make dict.setdefault() atomic http://bugs.python.org/issue13521 opened by rhettinger #13522: Document error return values for PyFloat_* and PyComplex_* http://bugs.python.org/issue13522 opened by skrah #13525: Tutorial: Example of Source Code Encoding triggers error http://bugs.python.org/issue13525 opened by nicolasg #13528: Rework performance FAQ http://bugs.python.org/issue13528 opened by pitrou #13530: Docs for os.lseek neglect to mention what it returns http://bugs.python.org/issue13530 opened by nedbat #13532: In IDLE, sys.stdout.write and sys.stderr can write any picklea http://bugs.python.org/issue13532 opened by maniram.maniram #13533: Would like Py_Initialize to play friendly with host app
http://bugs.python.org/issue13533 opened by dangermouseb #13535: Improved two's complement arithmetic support: to_signed() and http://bugs.python.org/issue13535 opened by ncoghlan #13537: Namedtuple instances can't be pickled in a daemonized process http://bugs.python.org/issue13537 opened by Popa.Claudiu #13538: Docstring of str() and/or behavior http://bugs.python.org/issue13538 opened by Guillaume.Bouchard #13539: A return is missing in TimeEncoding of calendar.py http://bugs.python.org/issue13539 opened by psam #13540: Document the Action API in argparse http://bugs.python.org/issue13540 opened by jason.coombs #13541: HTTPResponse (urllib) has no attribute read1 needed for TextIO http://bugs.python.org/issue13541 opened by maubp #13543: shlex with string ending in space gives "ValueError: No closin http://bugs.python.org/issue13543 opened by ekorn #13544: Add __qualname__ to functools.WRAPPER_ASSIGNMENTS http://bugs.python.org/issue13544 opened by ncoghlan #13545: Pydoc3.2: TypeError: unorderable types http://bugs.python.org/issue13545 opened by threewestwinds #13547: Clean Lib/_sysconfigdata.py and Modules/_testembed http://bugs.python.org/issue13547 opened by skrah #13548: Invalid 'line' tracer event on pass within else clause http://bugs.python.org/issue13548 opened by sdeibel #13549: Incorrect nested list comprehension documentation http://bugs.python.org/issue13549 opened by mattlong #13550: Rewrite logging hack of the threading module http://bugs.python.org/issue13550 opened by haypo #13551: pulldom doesn't populate DOM tree http://bugs.python.org/issue13551 opened by AchimGaedke #13552: Compilation issues of the curses module on OpenIndiana http://bugs.python.org/issue13552 opened by haypo #13553: Tkinter doesn't set proper application name http://bugs.python.org/issue13553 opened by th9 #13554: Tkinter doesn't use higher resolution app icon http://bugs.python.org/issue13554 opened by th9 #13555: cPickle MemoryError when loading large file (while pickle 
work http://bugs.python.org/issue13555 opened by phillies #13556: When tzinfo.utcoffset is out-of-bounds, the exception message http://bugs.python.org/issue13556 opened by exarkun #13557: exec of list comprehension fails on NameError http://bugs.python.org/issue13557 opened by sdeibel #13558: multiprocessing package incompatible with PyObjC http://bugs.python.org/issue13558 opened by mrmekon #13559: Use sendfile where possible in httplib http://bugs.python.org/issue13559 opened by benjamin.peterson #13560: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize http://bugs.python.org/issue13560 opened by haypo #13561: os.listdir documentation should mention surrogateescape http://bugs.python.org/issue13561 opened by michael.foord #13562: Notes about module load path http://bugs.python.org/issue13562 opened by Nam.Nguyen #13563: Make use of with statement in ftplib http://bugs.python.org/issue13563 opened by giampaolo.rodola #13564: ftplib and sendfile() http://bugs.python.org/issue13564 opened by giampaolo.rodola #13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le http://bugs.python.org/issue13565 opened by haypo #13566: Array objects pickled in 3.x with protocol <=2 are unpickled i http://bugs.python.org/issue13566 opened by sbt #13567: HTTPError interface changes / breaks depending on what was pas http://bugs.python.org/issue13567 opened by Keto Most recent 15 issues with no replies (15) ========================================== #13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le http://bugs.python.org/issue13565 #13564: ftplib and sendfile() http://bugs.python.org/issue13564 #13562: Notes about module load path http://bugs.python.org/issue13562 #13561: os.listdir documentation should mention surrogateescape http://bugs.python.org/issue13561 #13560: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize http://bugs.python.org/issue13560 #13556: When tzinfo.utcoffset is out-of-bounds, the exception message 
http://bugs.python.org/issue13556 #13554: Tkinter doesn't use higher resolution app icon http://bugs.python.org/issue13554 #13553: Tkinter doesn't set proper application name http://bugs.python.org/issue13553 #13544: Add __qualname__ to functools.WRAPPER_ASSIGNMENTS http://bugs.python.org/issue13544 #13540: Document the Action API in argparse http://bugs.python.org/issue13540 #13539: A return is missing in TimeEncoding of calendar.py http://bugs.python.org/issue13539 #13528: Rework performance FAQ http://bugs.python.org/issue13528 #13525: Tutorial: Example of Source Code Encoding triggers error http://bugs.python.org/issue13525 #13516: Gzip old log files in rotating handlers http://bugs.python.org/issue13516 #13507: Modify OS X installer builds to package liblzma for the new lz http://bugs.python.org/issue13507 Most recent 15 issues waiting for review (15) ============================================= #13567: HTTPError interface changes / breaks depending on what was pas http://bugs.python.org/issue13567 #13564: ftplib and sendfile() http://bugs.python.org/issue13564 #13563: Make use of with statement in ftplib http://bugs.python.org/issue13563 #13562: Notes about module load path http://bugs.python.org/issue13562 #13560: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize http://bugs.python.org/issue13560 #13552: Compilation issues of the curses module on OpenIndiana http://bugs.python.org/issue13552 #13550: Rewrite logging hack of the threading module http://bugs.python.org/issue13550 #13549: Incorrect nested list comprehension documentation http://bugs.python.org/issue13549 #13528: Rework performance FAQ http://bugs.python.org/issue13528 #13520: Patch to make pickle aware of __qualname__ http://bugs.python.org/issue13520 #13516: Gzip old log files in rotating handlers http://bugs.python.org/issue13516 #13515: Consistent documentation practices for security concerns and c http://bugs.python.org/issue13515 #13512: ~/.pypirc created insecurely 
http://bugs.python.org/issue13512 #13511: ./configure --includedir, --libdir accept multiple http://bugs.python.org/issue13511 #13508: ctypes' find_library breaks with ARM ABIs http://bugs.python.org/issue13508 Top 10 most discussed issues (10) ================================= #11051: system calls per import http://bugs.python.org/issue11051 9 msgs #13549: Incorrect nested list comprehension documentation http://bugs.python.org/issue13549 8 msgs #11816: Refactor the dis module to provide better building blocks for http://bugs.python.org/issue11816 6 msgs #12555: PEP 3151 implementation http://bugs.python.org/issue12555 6 msgs #6715: xz compressor support http://bugs.python.org/issue6715 5 msgs #11682: PEP 380 reference implementation for 3.3 http://bugs.python.org/issue11682 5 msgs #11838: IDLE: make interactive code savable as a runnable script http://bugs.python.org/issue11838 5 msgs #13515: Consistent documentation practices for security concerns and c http://bugs.python.org/issue13515 5 msgs #13538: Docstring of str() and/or behavior http://bugs.python.org/issue13538 5 msgs #13545: Pydoc3.2: TypeError: unorderable types http://bugs.python.org/issue13545 5 msgs Issues closed (26) ================== #3635: pickle.dumps cannot save instance of dict-derived class that o http://bugs.python.org/issue3635 closed by alexandre.vassalotti #9663: importlib should exclusively open bytecode files http://bugs.python.org/issue9663 closed by brett.cannon #11147: _Py_ANNOTATE_MEMORY_ORDER has unused argument, effects code wh http://bugs.python.org/issue11147 closed by barry #11894: test_multiprocessing failure on "AMD64 OpenIndiana 3.x": KeyEr http://bugs.python.org/issue11894 closed by haypo #12208: Glitches in email.policy docs http://bugs.python.org/issue12208 closed by eric.araujo #12567: curses implementation of Unicode is wrong in Python 3 http://bugs.python.org/issue12567 closed by haypo #12612: Valgrind suppressions http://bugs.python.org/issue12612 closed by neologix 
#12666: map semantic change not documented in What's New http://bugs.python.org/issue12666 closed by jason.coombs #13211: urllib2.HTTPError does not have 'reason' attribute. http://bugs.python.org/issue13211 closed by jason.coombs #13441: TestEnUSCollation.test_strxfrm() fails on Solaris http://bugs.python.org/issue13441 closed by haypo #13464: HTTPResponse is missing an implementation of readinto http://bugs.python.org/issue13464 closed by pitrou #13494: 'cast' any value to a Boolean? http://bugs.python.org/issue13494 closed by ezio.melotti #13499: uuid documentation example uses invalid REPL/doctest syntax http://bugs.python.org/issue13499 closed by ezio.melotti #13500: Hitting EOF gets cmd.py into a infinite EOF on return loop http://bugs.python.org/issue13500 closed by python-dev #13503: improved efficiency of bytearray pickling by using bytes type http://bugs.python.org/issue13503 closed by pitrou #13513: IOBase docs incorrectly link to the readline module http://bugs.python.org/issue13513 closed by meador.inge #13523: Python does not warn in module .py files does not exist if the http://bugs.python.org/issue13523 closed by ncoghlan #13524: critical error with import tempfile http://bugs.python.org/issue13524 closed by Andrey.Morozov #13526: Deprecate the old Unicode API http://bugs.python.org/issue13526 closed by loewis #13527: Remove obsolete mentions in the GUIs page http://bugs.python.org/issue13527 closed by pitrou #13529: Segfault inside of gc/weakref http://bugs.python.org/issue13529 closed by alex #13531: add test for defaultdict with non-callable first argument http://bugs.python.org/issue13531 closed by ezio.melotti #13534: test_cmath fails on ppc with glibc-2.14.90 due to buggy archit http://bugs.python.org/issue13534 closed by dmalcolm #13536: ast.literal_eval fails on sets http://bugs.python.org/issue13536 closed by benjamin.peterson #13542: Memory leak in multiprocessing.pool http://bugs.python.org/issue13542 closed by neologix #13546: 
sys.setrecursionlimit() crashes IDLE http://bugs.python.org/issue13546 closed by ned.deily From janssen at parc.com Fri Dec 9 18:47:11 2011 From: janssen at parc.com (Bill Janssen) Date: Fri, 9 Dec 2011 09:47:11 PST Subject: [Python-Dev] Unicode re support in Python 3 In-Reply-To: References: Message-ID: <67010.1323452831@parc.com> Michael Foord wrote: > Hey python-devers, > > As I'm sure many of you are aware, Armin Ronacher posted a blog entry > explaining the reasons he dislikes Python 3 in its current form. > > Whilst I don't agree with all of his complaints, he makes a fair point > about the re module Unicode support. It seems that the specific issue > he has could be fixed by accepting the re module improvement / > overhaul implemented by mrab: > > http://bugs.python.org/issue2636 > > As it comes with an active maintainer, and is a big step forward for > Python regex support, I'd like to see it in Python 3.3. Reading > through the issue it's not clear to me what needs to be done for it to > be accepted (or rejected), beyond a general "it's a big change". I've been using mrab's regex daily for about six months, and have found it stable and useful. It now includes two features which are both unusual and useful (very Pythonic!), named lists and fuzzy matching. Bill From mwm at mired.org Fri Dec 9 19:07:36 2011 From: mwm at mired.org (Mike Meyer) Date: Fri, 9 Dec 2011 10:07:36 -0800 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: Message-ID: <20111209100736.5f16419a@mikmeyer-vm-fedora> On Fri, 09 Dec 2011 09:02:35 +0100 Stefan Behnel wrote: > a) The stdlib documentation should help users to choose the right > tool right from the start. > b) cElementTree should finally lose its "special" status as a > separate library and disappear as an accelerator module behind > ElementTree. +1 and +1. I've done a lot of xml work in Python, and unless you've got a particular reason for wanting to use the dom, ElementTree is the only sane way to go.
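For comparison with the minidom approach discussed in this thread, a minimal sketch of building and serializing a small document with the standard library's `xml.etree.ElementTree`. The `catalog`/`item` structure and the `build_catalog()` name are invented for illustration, and `encoding="unicode"` is the Python 3 way to get a `str` rather than `bytes` back from `tostring()`:

```python
import xml.etree.ElementTree as ET


def build_catalog(items):
    """Build a <catalog> element from (name, price) pairs."""
    root = ET.Element("catalog")
    for name, price in items:
        # SubElement creates the child and appends it to its parent in one step.
        item = ET.SubElement(root, "item", name=name)
        item.text = str(price)
    return root


root = build_catalog([("widget", 3), ("gadget", 5)])
# encoding="unicode" returns a str without an XML declaration.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

This prints `<catalog><item name="widget">3</item><item name="gadget">5</item></catalog>`; the same tree object can be parsed, queried and written back with one API.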
I recently converted a middling-sized app from using the dom to using ElementTree, and wrote up some guidelines for the process for the client. I can try and shake it out of my client's lawyers if it would help with this or others are interested. References: <20111209100736.5f16419a@mikmeyer-vm-fedora> Message-ID: <68178.1323454554@parc.com> Mike Meyer wrote: > On Fri, 09 Dec 2011 09:02:35 +0100 > Stefan Behnel wrote: > > > a) The stdlib documentation should help users to choose the right > > tool right from the start. > > b) cElementTree should finally lose its "special" status as a > > separate library and disappear as an accelerator module behind > > ElementTree. > > +1 and +1. > > I've done a lot of xml work in Python, and unless you've got a > particular reason for wanting to use the dom, ElementTree is the only > sane way to go. I use ElementTree for parsing valid XML, but minidom for producing it. I think another thing that might go into "refreshing the batteries" is a feature comparison of BeautifulSoup and HTML5lib against the stdlib competition, to see what needs to be added/revised. Having to switch to an outside package for parsing possibly invalid HTML is a pain. Bill From p.f.moore at gmail.com Fri Dec 9 19:24:41 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 9 Dec 2011 18:24:41 +0000 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <68178.1323454554@parc.com> References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> Message-ID: On 9 December 2011 18:15, Bill Janssen wrote: > I use ElementTree for parsing valid XML, but minidom for producing it. > > I think another thing that might go into "refreshing the batteries" is a > feature comparison of BeautifulSoup and HTML5lib against the stdlib > competition, to see what needs to be added/revised. Having to switch to > an outside package for parsing possibly invalid HTML is a pain.
For what little use I make of XML/HTML parsing, I use lxml, simply because it has a parser that covers the sort of HTML I have to deal with in real life. As I have lxml installed, I use it for any XML parsing tasks, just because I'm used to it. Paul From glyph at twistedmatrix.com Fri Dec 9 19:39:20 2011 From: glyph at twistedmatrix.com (Glyph) Date: Fri, 9 Dec 2011 13:39:20 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> Message-ID: <7DEE32A7-1426-4E93-8708-BDF3B0CAF8EC@twistedmatrix.com> On Dec 9, 2011, at 12:43 AM, Guido van Rossum wrote: > Even if it weren't slow, I still wouldn't use it to automatically > convert code at install time; a single codebase is easier to reason > about, and easier to support. Users send me tracebacks all the time; > having them match the source is a wonderful thing. > > Even though 2to3 was my idea, I am gradually beginning to appreciate this approach. I skimmed the docs for "six" and liked it. Actually, maybe I like it a bit better than I thought. The biggest issue for the single-codebase approach is 'except ... as ...'. Peppering one's codebase with calls to sys.exc_info() can be a real performance problem, especially on PyPy. Not to mention how ugly it is. For some reason I thought that this syntax was only supported by 2.7 and up; I see now that it's 2.6 and up. This is still a problem for 2.5 support, of course, but 2.6-only may not be too far away for many projects; Twisted's support schedule for Python versions typically follows Ubuntu's, which means that we might be able to drop 2.5 as early as 2013! :). Even in the plans that involve 2to3 though, "drop everything prior to 2.6" was always supposed to be step 0, so "single codebase" adds much less of a burden than I thought. -glyph From python-dev at masklinn.net Fri Dec 9 19:39:17 2011 From: python-dev at masklinn.net (Xavier Morel) Date: Fri, 9 Dec 2011 19:39:17 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <68178.1323454554@parc.com> References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> Message-ID: On 2011-12-09, at 19:15 , Bill Janssen wrote: > I use ElementTree for parsing valid XML, but minidom for producing it. Could you expand on your reasons to use minidom for producing XML? From victor.stinner at haypocalc.com Fri Dec 9 19:51:14 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 09 Dec 2011 19:51:14 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: <20111209013535.6fb38068@pitrou.net> References: <20111209013535.6fb38068@pitrou.net> Message-ID: <4EE258A2.8020902@haypocalc.com> On 09/12/2011 01:35, Antoine Pitrou wrote: > On Fri, 09 Dec 2011 00:16:02 +0100 > victor.stinner wrote: >> >> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) >> + >> + Get a new copy of a Unicode object. >> + >> + .. versionadded:: 3.3 > > I'm not sure I understand. Why would you make a copy of an immutable > object?
Even in the plans that involve 2to3, though, "drop everything prior to 2.6" was always supposed to be step 0, so "single codebase" adds much less of a burden than I thought. -glyph From python-dev at masklinn.net Fri Dec 9 19:39:17 2011 From: python-dev at masklinn.net (Xavier Morel) Date: Fri, 9 Dec 2011 19:39:17 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <68178.1323454554@parc.com> References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> Message-ID: On 2011-12-09, at 19:15 , Bill Janssen wrote: > I use ElementTree for parsing valid XML, but minidom for producing it. Could you expand on your reasons to use minidom for producing XML? From victor.stinner at haypocalc.com Fri Dec 9 19:51:14 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 09 Dec 2011 19:51:14 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: <20111209013535.6fb38068@pitrou.net> References: <20111209013535.6fb38068@pitrou.net> Message-ID: <4EE258A2.8020902@haypocalc.com> On 09/12/2011 01:35, Antoine Pitrou wrote: > On Fri, 09 Dec 2011 00:16:02 +0100 > victor.stinner wrote: >> >> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) >> + >> + Get a new copy of a Unicode object. >> + >> + .. versionadded:: 3.3 > > I'm not sure I understand. Why would you make a copy of an immutable > object? PyUnicode_Copy() can be used to modify a string to create a new string with the same length. It is used for example by str.upper(), str.title(), ... (fixup()). It is also used by str.__getnewargs__(). I am not sure that str.__getnewargs__() must be a copy of str (s.__getnewargs__() is not x). As mentioned by Martin, PyUnicode_Copy() is also used to get "an exact" Unicode object when you have a subtype. We can maybe make the function private.
Victor From flying-sheep at web.de Thu Dec 8 14:31:26 2011 From: flying-sheep at web.de (Philipp A.) Date: Thu, 8 Dec 2011 14:31:26 +0100 Subject: [Python-Dev] re.findall() should return named tuple Message-ID: hi devs, just an idea that popped up in my mind: re.findall() returns a list of tuples, where every entry of each tuple represents a match group. since match groups can be named, we are able to use named tuples instead of plain tuples here, in the same fashion as namedtuple's rename works: missing group names get renamed to _1 and so on. i suggest adding a rename keyword option to findall, defaulting to True, since mixing positional and named groups is more common here than in the usual use cases of namedtuple. do you think it's a good idea? finally: should i join the mailing list to see answers? should i file a PEP? i have no idea how the inner workings of python development are, but i wanted to share this idea with you :) thanks for keeping python great, philipp From janssen at parc.com Fri Dec 9 20:33:17 2011 From: janssen at parc.com (Bill Janssen) Date: Fri, 9 Dec 2011 11:33:17 PST Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> Message-ID: <69816.1323459197@parc.com> Xavier Morel wrote: > On 2011-12-09, at 19:15 , Bill Janssen wrote: > > I use ElementTree for parsing valid XML, but minidom for producing it. > Could you expand on your reasons to use minidom for producing XML? Inertia, I guess. I tried that first, and it seems to work. I tend to use html5lib and/or BeautifulSoup instead of ElementTree, and that's mainly because I find the documentation for ElementTree is confusing and partial and inconsistent. Having various undated but obsolete tutorials and documentation still up on effbot.org doesn't help.
Bill From solipsis at pitrou.net Fri Dec 9 20:32:16 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 20:32:16 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() References: <20111209013535.6fb38068@pitrou.net> <4EE258A2.8020902@haypocalc.com> Message-ID: <20111209203216.2c627d61@pitrou.net> On Fri, 09 Dec 2011 19:51:14 +0100 Victor Stinner wrote: > On 09/12/2011 01:35, Antoine Pitrou wrote: > > On Fri, 09 Dec 2011 00:16:02 +0100 > > victor.stinner wrote: > >> > >> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) > >> + > >> + Get a new copy of a Unicode object. > >> + > >> + .. versionadded:: 3.3 > > > > I'm not sure I understand. Why would you make a copy of an immutable > > object? > > PyUnicode_Copy() can be used to modify a string to create a new string > with the same length. It is used for example by str.upper(), > str.title(), ... (fixup()). Then the doc should mention that the returned string can be modified. Otherwise it's a bit obscure why the function exists. Regards Antoine. From pje at telecommunity.com Fri Dec 9 20:58:10 2011 From: pje at telecommunity.com (PJ Eby) Date: Fri, 9 Dec 2011 14:58:10 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <20111209101123.01e92326@limelight.wooz.org> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> Message-ID: On Fri, Dec 9, 2011 at 10:11 AM, Barry Warsaw wrote: > As Chris points out, this seems to be a use case tied to WSGI and PEP > 3333. I > guess it's an unfortunate choice for so recent a PEP, but maybe there was > no > way to do better. 
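For readers unfamiliar with the PEP 3333 string rules under discussion, here is a minimal sketch. It assumes Python 3 and only the standard library's `wsgiref`. The status line and headers are plain (native) string literals, and only the response body uses a `b''` literal. `wsgiref.validate.validator` checks those types at runtime. `application` and the stand-in `start_response` are illustrative names.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator

def application(environ, start_response):
    body = b"Hello, WSGI\n"                        # response body: bytes, hence b''
    status = "200 OK"                               # status line: native str
    headers = [("Content-Type", "text/plain"),      # header names/values: native str
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]

# Minimal stand-in for a server's start_response: records what it receives.
captured = []
def start_response(status, headers, exc_info=None):
    captured.append((status, headers))
    return lambda data: None  # the legacy write() callable, unused here

environ = {"QUERY_STRING": ""}
setup_testing_defaults(environ)        # fills in the remaining required CGI/WSGI keys
result = validator(application)(environ, start_response)
try:
    body = b"".join(result)            # the validator rejects non-bytes body chunks
finally:
    result.close()                     # WSGI servers must call close() when present
print(captured[0][0], body)
```

Under Python 2 the same unprefixed literals would also be native `str`, which is why `u''` prefixes are not needed for the status and headers.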
For the record, "native strings" are defined the way they are because of IronPython and Jython, which had unicode strings long before CPython. At the time WSGI was developed, the approach for Python 3 (then called "3000") was expected to be similar, and the new I/O system was not (AFAIR) designed yet. All that changed in PEP 3333 was introducing *byte* strings (to accommodate the I/O changes), not native strings. In fact, I'm not sure why people are bringing it into this discussion at all: PEP 3333 was designed to work well with 2to3, which does the right thing for WSGI code: it converts 2.x "str" to 3.x "str", as it should. If you're writing 2.x WSGI code with 'u' literals, *your code is broken*. WSGI doesn't need 'u' literals and never has. It *does* need b'' literals for stuff that refers to request and response bodies, but everything else should be plain old string literals for the appropriate Python version. It can certainly be useful in many contexts outside of WSGI. > And *only* there, pretty much. ;-) PEP 3333 was designed to work with the official upgrade path (2to3), which is why it has a concept of native strings. Thing is, if you mark them with a 'u', you're writing incorrect code for 2.x. From manday at gmx.net Fri Dec 9 21:26:29 2011 From: manday at gmx.net (Cedric Sodhi) Date: Fri, 9 Dec 2011 21:26:29 +0100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ Message-ID: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> IF YOU THINK YOU MUST REPLY SOMETHING WITTY, REITERATE THAT THIS HAD BEEN DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GONNA HAPPEN", THAT "WHO DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING SIMILAR, JUST DON'T. Otherwise, read on. I know very well that this topic has been discussed before. On forums. Mailing lists. IRC. Blogs. From person to person, even.
And I know equally well, from all those years of experiencing argument-turned-debates on the internet, how a (minor|major) fraction of participants make up for their inability to lead a proper debate by speaking the loudest of all, so that eventually quantity triumphs over quality and logic. That said, I hope you can try not to fall into that category. Let reason instead prevail over sentimentalism, misguided purism, elitism, and all other sorts of isms which hinder advancement in the greater context. Python has surprised once already: The changes from 2 to 3 were not backward compatible because the core developers realized there is more to a sustainable language than constantly patching it up until it comes apart like the Roman Empire. Let's keep that spirit for a second and let us discuss braces, again, with the clear goal of improving the language. End of disclaimer? End of disclaimer! Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has reasons. What are those reasons? Well, primarily, it forces the programmer to maintain well-readable code. Then, some might argue, it is quicker to type. Two reasons, but of what importance are they? And are they actually reasons? You may have guessed from the questions themselves that I'm about to question that. I don't intend to connote brazen implications, so let me spell out what I just implied: I think anyone who thinks that exclusive WSB is a good alternative or even preferable to DB is actually deluding themselves for some personal version of one of those isms mentioned above. Let's examine these alleged advantages objectively one by one. But before that, just to calm troubled waters a little, allow me to bring forward the conclusion: Absolutely no intention to remove WSB from Python. Although one might have gotten that impression from the early paragraphs, no intention to break backward compatibility, either.
What Python needs is an alternative to WSB, and it can stay Python by still offering WSB to all those who happen to like it. Readable code, is it really an advantage? Two linebreaks, just for the suspense, then: Of course it is. Forcing the programmer to write readable code, is that an advantage? No suspense, the answer is: Of course not. Python may have started off as the casual scripting language for casual people. People who may not even have known programming. And perhaps it has made sense to force -- or shall we say motivate, since you can still produce perfectly obfuscated code with Python -- them to write readably. But Python has matured and so has its clientele. Python does not become a better language, either for beginners or for experienced programmers who also frequently use Python these days, by patronizing them and restricting their freedom. Readable code? Yes. Forcing people to write readable code by artificial means? No. Practice is evidence for the mischief of this policy: Does the FOSS community suffer from a notorious lack of proper indentation or readability of code? Of course it doesn't. I'm not a native speaker, but dict.cc tells me that what we call "mit Kanonen auf Spatzen schießen" (firing cannons at sparrows) is called breaking a fly on the wheel in English. I may lack the analogy for the fly on the wheel, which, if I'm not mistaken, used to be a device for torture in the Middle Ages, but I can tell you that the cannon ball which might have struck the sparrows coincidentally caused havoc in the hinterlands. For the widespread and professional language Python is today, the idea of forcing people to indent is misguided. These days, it may address a negligible minority of absolute beginners who have barely started programming and would not listen to the simple advice of indenting properly, but on the other hand it hurts/annoys/deters a great community of typical programmers for whom DB has long become a de facto standard.
For them, it is not merely an inconsistency without any apparent reason. It is more than the inconvenience of not being able to follow one's long-time practices, use the scripts one wrote for delimiters, the shortcuts that are usually offered by editors, etc. It also brings about a whole class of new problems which may be anticipated and prevented, yet bear a great potential for new, even hard-to-find bugs (just in case anyone would respond that we had eventually successfully redeemed the mismatched parenthesis problem - at what cost?!). Not just difficult to find: nearly impossible would be the right word for anyone who has to review someone else's patch. It is widely known in the programming community that spaces and tabs are remarkably similar to each other. So similar, even, that people fight wars about which to use in a non-py context. It might strike one as an equally remarkably nonsensical idea to give them programmatic meaning - two DIFFERENT meanings, to make things even worse. While it becomes a practical impossibility to spot these kinds of bugs while reviewing code -- optionally mangled through a medium which expands tabs to spaces, not so much of a rarity -- it is still a time-consuming and tedious job to find them locally. More or less easily rectified, but once you have spent a while trying to figure something like that out, you inevitably have the urge to ask: Why? Last of all, some might argue that it's convenient not to have to type delimiters. Well, be my guest. I also appreciate single-line conditionals or loops once in a while. I understand how not having to type delimiters if you don't want them lifts a burden. Hence I would not want to rid Python of them. WSB may come in handy. But equally, it may not. Proposing the actual changes that would have to be made to accommodate both WSB and DB is beyond the scope of this script.
It is the CONCLUSION that the current situation is undesirable and Python, although not apparent at the first glance, suffers from exclusive WSB, which is the goal of this thread. Discussing has its etymological roots in Discourse, which connotes a loosely guided conversation about a topic. Therefore, I conclude with a DEBATE!!!111 kind regards, -- MD (not proof-read) From brian at python.org Fri Dec 9 21:36:21 2011 From: brian at python.org (Brian Curtin) Date: Fri, 9 Dec 2011 14:36:21 -0600 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: On Fri, Dec 9, 2011 at 14:26, Cedric Sodhi wrote: > [...] You forgot the patch.
From regebro at gmail.com Fri Dec 9 21:46:42 2011 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 9 Dec 2011 21:46:42 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <4EE239A0.2020004@netwok.org> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> <4EE239A0.2020004@netwok.org> Message-ID: On Fri, Dec 9, 2011 at 17:38, ?ric Araujo wrote: > When running 2to3 from a setup.py script, does it run on the whole > codebase or only files that are found newer by the make-like > timestamp-based dependency system? Only on the ones that are newer. But since at install time, that's all of them, it doesn't really help. :-) //Lennart From jnoller at gmail.com Fri Dec 9 21:59:05 2011 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 9 Dec 2011 15:59:05 -0500 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <80BEA596DA974153981E972102B17257@gmail.com> On Friday, December 9, 2011 at 3:26 PM, Cedric Sodhi wrote: > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN > DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO > DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING > SIMILAR, JUST DON'T. > > Otherwise, read on. > > I know very well that this topic has been discussed before. On forums. > Mailing lists. IRC. Blogs. From person to person, even. > > And I know equally well, from all those years experiencing > argument-turned-debates on the internet, how a (minor|major) fraction of > participants make up for their inability to lead a proper debate by > speaking the loudest of all, so that eventually quantity triumphs over > quality and logic. > > That ahead; I hope you can try not to fall in that category. 
Let instead > reason prevail over sentimentalism, mislead purism, elitism, and all > other sorts of isms which hinder advancement in the greater context. > > Python has surprised once already: The changes from 2 to 3 were not > downwards compatible because the core developers realized there is more > to a sustainable language than constantly patching it up until it comes > apart like the roman empire. > > Let's keep that spirit for a second and let us discuss braces, again, > with the clear goal of improving the language. > > End of disclaimer? > > End of disclaimer! > > Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has > reasons. What are those reasons? Well, primarily, it forces the > programmer to maintain well readable code. Then, some might argue, it is > quicker to type. > > Two reasons, but of what importance are they? And are they actually > reasons? > > You may guessed it from the questions themselves that I'm about to > question that. > > I don't intend to connote brazen implications, so let me spell out what > I just implied: I think anyone who thinks that exclusive WSB is a good > alternative or even preferable to DB is actually deluding themselves for > some personal version of one of those isms mentioned above. > > Let's examine these alleged advantages objectively one for one. But > before that, just to calm troubled waters a little, allow me bring > forward the conclusion: > > Absolutely no intentions to remowe WSB from Python. Although one might > have gotten that impression from the early paragraphs, no intentions to > break downwards compatibility, either. > > What Python needs is an alternative to WSB and can stay Python by still > offering WSB to all those who happen to like it. > > Readable code, is it really an advantage? > > Two linebreaks, just for the suspense, then: > > Of course it is. > > Forcing the programmer to write readable code, is that an advantage? No > suspense, the answer is Of course not. 
> > Python may have started off as the casual scripting language for casual > people. People, who may not even have known programming. And perhaps it > has made sense to force -- or shall we say motivate, since you can still > produce perfectly obfuscated code with Python -- them to write readably. > > But Python has matured and so has its clientele. Python does not become > a better language, neither for beginners nor for experienced programmers > who also frequently use Python these days, by patronizing them and > restricting them in their freedom. > > Readable code? Yes. Forcing people to write readable code by artificial > means? No. > > Practice is evidence for the mischief of this policy: Does the FOSS > community suffer from a notorious lack of proper indention or > readability of code? Of course we don't. > > I'm not a native speaker, but dict.cc (http://dict.cc) tells me that what we call "mit > Kanonen auf Spatzen schie?en" (firing cannons at sparrows) is called > breaking a fly on the wheel in English. > > I may lack the analogy for the fly on the wheel, which, if I'm not > mistaken, used to be a device for torture in the Middle Ages, but I can > tell you that the cannon ball which might have struck the sparrows, > coincidently caused havoc in the hinterlands. > > For the wide-spread and professional language Python is today, the idea > of forcing people to indent is misguided. These days, it may address a > neglible minority of absolute beginners who barely started programming > and would not listen to the simple advice of indenting properly, but on > the other hand it hurts/annoys/deters a great community of typical > programmers for whom DB has long become a de facto standard. > > For them, it's not a mere inconsistency without, for them, any apparent > reason. It's more than the inconvenience not being able to follow ones > long time practices, using the scripts one wrote for delimiters, the > shortcuts that are usually offered by editor, etc. 
> > It also brings about a whole class of new problems which may be > anticipated and prevent, yet bear a great potential for new, even > hard-to-find bugs (just in case anyone would respond that we had > eventually successfully redeemed the mismatched parenthesis problem - at > what cost?!). > > Not just difficult to find, near to impossible would be the right word > for anyone who has to review someone else's patch. > > It is widely known among the programmer's community that spaces and tabs > are remarkably similar to eachother. So similar even, that people fight > wars about which to use in a non-py context. It might strike one as an > equally remarkably nonsensical idea to give them programmatic meaning - > two DIFFERENT meanings, to make things even worse. > > While it becomes a practical impossibility to spot these kind of bugs > while reviewing code -- optionally mangled through a medium which > expands tabs to whitespace, not so much of a rarity -- it is still a > time-consuming and tedious job to find them in a local situation. > > More or less easily rectified, but once you spent a while trying to > figure something like that out, you inevitably have the urge to ask: Why? > > Last of all, some might argue that it's convenient to not to have type > delimiters. Well, be my guest. I also appreciate single lined > conditional or loops once in a while. I understand how not having to > type delimiters if you don't want them lifts a burden. Hence I would not > want rid Python of them. WSB may come in handy. But equally, it may not. > > Proposing the actual changes that would have to be made to accomodate > both, WSB and DB is beyond the scope of this script. It is the > CONCLUSION that the current situation is undesirable and Python, > although not apparent at the first glance, suffers from exclusive WSB, > which is the goal of this thread. > > Discussing has its etymological roots in Discourse, which connotes a > loosely guided conversation about a topic. 
Therefore, I conclude with a > > DEBATE!!!111 > > kind regards, > -- MD > > (not proof-read) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org (mailto:Python-Dev at python.org) > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/jnoller%40gmail.com +1 From python-dev at masklinn.net Fri Dec 9 22:02:54 2011 From: python-dev at masklinn.net (Xavier Morel) Date: Fri, 9 Dec 2011 22:02:54 +0100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <85361AAC-2DB1-4EF4-8DB6-07AB8846BDDA@masklinn.net> On 2011-12-09, at 21:26 , Cedric Sodhi wrote: > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN > DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO > DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING > SIMILAR, JUST DON'T. > > Otherwise, read on. > > I know very well that this topic has been discussed before. On forums. > Mailing lists. IRC. Blogs. From person to person, even. > > And I know equally well, from all those years experiencing > argument-turned-debates on the internet, how a (minor|major) fraction of > participants make up for their inability to lead a proper debate by > speaking the loudest of all, so that eventually quantity triumphs over > quality and logic. > > That ahead; I hope you can try not to fall in that category. Let instead > reason prevail over sentimentalism, mislead purism, elitism, and all > other sorts of isms which hinder advancement in the greater context. > > Python has surprised once already: The changes from 2 to 3 were not > downwards compatible because the core developers realized there is more > to a sustainable language than constantly patching it up until it comes > apart like the roman empire. 
> > Let's keep that spirit for a second and let us discuss braces, again, > with the clear goal of improving the language. > > End of disclaimer? > > End of disclaimer! > > Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has > reasons. What are those reasons? Well, primarily, it forces the > programmer to maintain well readable code. Then, some might argue, it is > quicker to type. > > Two reasons, but of what importance are they? And are they actually > reasons? > > You may guessed it from the questions themselves that I'm about to > question that. > > I don't intend to connote brazen implications, so let me spell out what > I just implied: I think anyone who thinks that exclusive WSB is a good > alternative or even preferable to DB is actually deluding themselves for > some personal version of one of those isms mentioned above. > > Let's examine these alleged advantages objectively one for one. But > before that, just to calm troubled waters a little, allow me bring > forward the conclusion: > > Absolutely no intentions to remowe WSB from Python. Although one might > have gotten that impression from the early paragraphs, no intentions to > break downwards compatibility, either. > > What Python needs is an alternative to WSB and can stay Python by still > offering WSB to all those who happen to like it. > > Readable code, is it really an advantage? > > Two linebreaks, just for the suspense, then: > > Of course it is. > > Forcing the programmer to write readable code, is that an advantage? No > suspense, the answer is Of course not. > > Python may have started off as the casual scripting language for casual > people. People, who may not even have known programming. And perhaps it > has made sense to force -- or shall we say motivate, since you can still > produce perfectly obfuscated code with Python -- them to write readably. > > But Python has matured and so has its clientele. 
Python does not become > a better language, either for beginners or for experienced programmers > who also frequently use Python these days, by patronizing them and > restricting them in their freedom. > > Readable code? Yes. Forcing people to write readable code by artificial > means? No. > > Practice is evidence for the mischief of this policy: Does the FOSS > community suffer from a notorious lack of proper indentation or > readability of code? Of course we don't. > > I'm not a native speaker, but dict.cc tells me that what we call "mit > Kanonen auf Spatzen schießen" (firing cannons at sparrows) is called > breaking a fly on the wheel in English. > > I may lack the analogy for the fly on the wheel, which, if I'm not > mistaken, used to be a device for torture in the Middle Ages, but I can > tell you that the cannon ball which might have struck the sparrows > coincidentally caused havoc in the hinterlands. > > For the widespread and professional language Python is today, the idea > of forcing people to indent is misguided. These days, it may address a > negligible minority of absolute beginners who barely started programming > and would not listen to the simple advice of indenting properly, but on > the other hand it hurts/annoys/deters a great community of typical > programmers for whom DB has long become a de facto standard. > > For them, it's not a mere inconsistency without any apparent > reason. It's more than the inconvenience of not being able to follow one's > long-time practices, using the scripts one wrote for delimiters, the > shortcuts that are usually offered by editors, etc. > > It also brings about a whole class of new problems which may be > anticipated and prevented, yet bear a great potential for new, even > hard-to-find bugs (just in case anyone would respond that we had > eventually successfully redeemed the mismatched parenthesis problem - at > what cost?!).
> > Not just difficult to find: nearly impossible would be the right word > for anyone who has to review someone else's patch. > > It is widely known in the programming community that spaces and tabs > are remarkably similar to each other. So similar, even, that people fight > wars about which to use in a non-py context. It might strike one as an > equally remarkably nonsensical idea to give them programmatic meaning - > two DIFFERENT meanings, to make things even worse. > > While it becomes a practical impossibility to spot these kinds of bugs > while reviewing code -- optionally mangled through a medium which > expands tabs to whitespace, not so much of a rarity -- it is still a > time-consuming and tedious job to find them in a local situation. > > More or less easily rectified, but once you have spent a while trying to > figure something like that out, you inevitably have the urge to ask: Why? > > Last of all, some might argue that it's convenient not to have to type > delimiters. Well, be my guest. I also appreciate single lined > conditional or loops once in a while. I understand how not having to > type delimiters if you don't want them lifts a burden. Hence I would not > want to rid Python of them. WSB may come in handy. But equally, it may not. > > Proposing the actual changes that would have to be made to accommodate > both WSB and DB is beyond the scope of this script. It is the > CONCLUSION that the current situation is undesirable and Python, > although not apparent at the first glance, suffers from exclusive WSB, > which is the goal of this thread. > > Discussing has its etymological roots in Discourse, which connotes a > loosely guided conversation about a topic. Therefore, I conclude with a > > DEBATE!!!111 > > kind regards, > -- MD You do know braces are already in __future__, right? Also, why did you feel the need to post eleven hundred (1100) words on a topic settled more than a decade ago (Wed Feb 28 17:47:12 2001 +0000)?
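Xavier's remark about `__future__` refers to a long-standing CPython easter egg, and the quoted complaint about tabs versus spaces is partly handled by the Python 3 compiler itself. A minimal sketch, assuming CPython 3, that probes both with the built-in `compile()` without executing anything; the helper name `compile_error` and the sample sources are illustrative and do not come from the thread:

```python
# Sketch: ask CPython's compiler (via the built-in compile()) how it treats
# two things mentioned in this thread. Nothing is executed.

def compile_error(source):
    """Return the SyntaxError raised when compiling `source`, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # TabError is a subclass of SyntaxError
        return exc
    return None

# 1. "braces" in __future__ is an easter egg that always refuses.
braces_err = compile_error("from __future__ import braces\n")
print(type(braces_err).__name__, braces_err.msg)  # SyntaxError not a chance

# 2. Indentation whose meaning depends on the tab width is rejected:
#    one line indented with a tab, the next with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
tab_err = compile_error(mixed)
print(type(tab_err).__name__)  # TabError
```

A plain `compile_error("x = 1\n")` returns `None`. Python 3 only rejects indentation whose meaning would change with the width of a tab; code indented consistently with tabs or with spaces compiles normally.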
As far as I can see, you also don't provide any argument towards making the language more complex beyond your not liking its current state. Your whole email is word salad circling around that, and around your finding insufficient tooling a reason to alter the language to fit the tool instead. PS: Haskell's core syntax is defined through braces and semicolons, and "layout" (also called the "off-side rule") is added to allow getting rid of these when writing code. I do not remember *ever* seeing Haskell code written by a human which used braces. Not in repositories, not in blogs, not in forums, not in mailing lists, not anywhere. If the issues of indentation-based blocks were as dire as you seem to believe, this would not be possible, as developers would flock towards "safer" delimiter-block syntax, but facts belie your assertions, and explicit braces are instead relegated to generated code. Which could be a defense of adding braces to Python, but really, if you're generating Python code I think you should generate bytecode directly to ensure nobody will go editing generated code. Hence the issue of having to generate indentation is not an issue. From ben+python at benfinney.id.au Fri Dec 9 22:07:39 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 10 Dec 2011 08:07:39 +1100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <8762hp3144.fsf@benfinney.id.au> Cedric Sodhi writes: > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN > DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO > DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING > SIMILAR, JUST DON'T. If you're going to post a long screed on a settled subject, and try to lay a heap of special restrictions in an open discussion forum on how you want people to respond: just don't. -- \ "Don't be afraid of missing opportunities. Behind every failure | `\ is an opportunity somebody wishes they had missed." --Jane | _o__) Wagner, via Lily Tomlin | Ben Finney From mwm at mired.org Fri Dec 9 22:11:29 2011 From: mwm at mired.org (Mike Meyer) Date: Fri, 9 Dec 2011 13:11:29 -0800 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <20111209131129.1b804e52@mikmeyer-vm-fedora> On Fri, 9 Dec 2011 21:26:29 +0100 Cedric Sodhi wrote: > Readable code, is it really an advantage? > Of course it is. Ok, you got that right. > Forcing the programmer to write readable code, is that an advantage? > No suspense, the answer is Of course not. This is *not* an "Of course". Readable code is *important*. Giving programmers more power in exchange for less readable code is a bad trade. For an extended analysis, see: http://blog.mired.org/2011/10/more-power-is-not-always-good-thing.html One of Python's best points is that the community resists the urge to add things just to add things. The community generally applies three tests to any feature before accepting it: 1) It should have a good use case. 2) It should enable more readable code for that use case. 3) It shouldn't make writing unreadable code easy. DB fails all three of these tests. It doesn't have a good use case. The code you create using it is not more readable than the alternative. And it definitely makes writing unreadable code easy. And of course, it violates TOOWTDI. References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <20111209131129.1b804e52@mikmeyer-vm-fedora> Message-ID: <20111209212650.GA4346@slate.Speedport_W_723V_Typ_A> On Fri, Dec 09, 2011 at 01:11:29PM -0800, Mike Meyer wrote: > On Fri, 9 Dec 2011 21:26:29 +0100 > Cedric Sodhi wrote: > > Readable code, is it really an advantage? > > Of course it is. > > Ok, you got that right. Thank you.
It doesn't go unnoticed that you learned your Feedback Rules. > > > Forcing the programmer to write readable code, is that an advantage? > > No suspense, the answer is Of course not. > > This is *not* an "Of course". Readable code is *important*. Giving > programmers more power in exchange for less readable code is a bad > trade. For an extended analysis, see: > http://blog.mired.org/2011/10/more-power-is-not-always-good-thing.html And here is the catch. The typical ignoratio elenchi which is frequently put forward by those who want to depict WSB as a necessity, as a social contract -- Locke for the Python community -- by which they oblige themselves to write readable code. The fallacy is trivial, though, and even further supported by evidence presented by reality. Indeed, you pretty much serve the comeback on a silver plate: "Power in exchange for less readable code" There is no such exchange. Instead of further elaborating on why I say that, I leave it to you and possibly other readers to recognize the fallacy as a whole. Rather, let me support the argument by the apparent evidence which I already emphasized in the introductory script: Not a single language in the FOSS community suffers from a lack of proper indentation. The propagated fear of unreadable code is unjustified. That article you linked also completely ignores that. kind regards, Cedric From solipsis at pitrou.net Fri Dec 9 22:43:32 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Dec 2011 22:43:32 +0100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <20111209224332.2ad19a28@pitrou.net> Dear Cedric, I'm guessing you drank too much (perhaps you are training for New Year's Eve), ate some bad sausages or are simply very self-complacent. python-dev is not the place to post long unstructured ramblings with no practical value. Consider writing on your personal blog instead. Thank you Antoine.
From ethan at stoneleaf.us Fri Dec 9 22:36:25 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 09 Dec 2011 13:36:25 -0800 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <4EE27F59.6080308@stoneleaf.us> This belongs on python-ideas. Please take it there.
~Ethan~ From donald.stufft at gmail.com Fri Dec 9 22:53:32 2011 From: donald.stufft at gmail.com (Donald Stufft) Date: Fri, 9 Dec 2011 16:53:32 -0500 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209224332.2ad19a28@pitrou.net> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <20111209224332.2ad19a28@pitrou.net> Message-ID: <3202CD7C22604106874163646AD9E8D9@gmail.com> I don't always post to python-dev, but when I do I ask for braces. From marty at martyalchin.com Fri Dec 9 22:53:45 2011 From: marty at martyalchin.com (Marty Alchin) Date: Fri, 9 Dec 2011 13:53:45 -0800 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: You've really only given one reason why braces are a good idea: "I also appreciate single lined conditional or loops once in a while." Not only is this argument even weaker than the two you yourself gave in defense of whitespace, these two features are already supported in Python. If you're not aware of them, perhaps you should spend some quality time with the documentation rather than suggesting unnecessary changes. -Marty
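Marty's point that single-line conditionals and loops are already supported can be checked directly. A minimal sketch (the names `x`, `label` and `squares` are illustrative, not from the thread) showing simple statements on the same line as `if`/`else`/`for`, next to the conditional expression and list comprehension that are usually the more idiomatic spelling:

```python
# Sketch: "single lined" conditionals and loops in current Python.
# Names such as x, label and squares are illustrative only.

x = 5

# An if/else whose bodies sit on the same line as the keyword.
if x > 3: label = "big"
else: label = "small"

# A for loop whose body is a single simple statement on the same line.
squares = []
for n in range(4): squares.append(n * n)

# The usually preferred equivalents: a conditional expression
# and a list comprehension.
label2 = "big" if x > 3 else "small"
squares2 = [n * n for n in range(4)]

print(label, squares)                         # big [0, 1, 4, 9]
print(label == label2, squares == squares2)   # True True
```

One limitation: only simple statements may follow the colon on the same line, so a nested compound statement such as `for n in range(4): if n: pass` is a syntax error and needs an indented block.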
From guido at python.org Fri Dec 9 23:21:42 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Dec 2011 14:21:42 -0800 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi wrote: > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN > DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO > DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING > SIMILAR, JUST DON'T. > Every single response in this thread so far has ignored this request. The correct response honoring this should have been deafening silence. For me, if I had to design a new language today, I would probably use braces, not because they're better than whitespace, but because pretty much every other language uses them, and there are more interesting concepts to distinguish a new language. That said, I don't regret that Python uses indentation, and the rest I have to say about the topic would violate the above request. -- --Guido van Rossum (python.org/~guido) From manday at gmx.net Fri Dec 9 23:29:30 2011 From: manday at gmx.net (Cedric Sodhi) Date: Fri, 9 Dec 2011 23:29:30 +0100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A> On Fri, Dec 09, 2011 at 02:21:42PM -0800, Guido van Rossum wrote: > On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi <[1]manday at gmx.net> wrote: > > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN > DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO > DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING > SIMILAR, JUST DON'T.
> > Every single response in this thread so far has ignored this request. The > correct response honoring this should have been deafening silence. > > For me, if I had to design a new language today, I would probably use > braces, not because they're better than whitespace, but because pretty > much every other language uses them, and there are more interesting > concepts to distinguish a new language. That said, I don't regret that > Python uses indentation, and the rest I have to say about the topic would > violate the above request. > I think this deserves a reply. Thank you for contributing your opinion and respecting my request, and therefore honoring the rules of a civilized debate. -- Cedric From anacrolix at gmail.com Fri Dec 9 23:40:54 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 10 Dec 2011 09:40:54 +1100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A> Message-ID: If braces were introduced I would switch to Haskell, I can't stand the noise. If you want to see a language that allows both whitespace, semi colons and braces take a look at it. Nails it. From anacrolix at gmail.com Fri Dec 9 23:43:59 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 10 Dec 2011 09:43:59 +1100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <69816.1323459197@parc.com> References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> <69816.1323459197@parc.com> Message-ID: I second this. The doco is very bad. On Dec 10, 2011 6:34 AM, "Bill Janssen" wrote: > Xavier Morel wrote: > > > On 2011-12-09, at 19:15 , Bill Janssen wrote: > > > I use ElementTree for parsing valid XML, but minidom for producing it. > > Could you expand on your reasons to use minidom for producing XML? > > Inertia, I guess. I tried that first, and it seems to work. > > I tend to use html5lib and/or BeautifulSoup instead of ElementTree, and > that's mainly because I find the documentation for ElementTree is > confusing and partial and inconsistent. Having various undated but > obsolete tutorials and documentation still up on effbot.org doesn't > help.
> > > Bill From manday at gmx.net Fri Dec 9 23:58:06 2011 From: manday at gmx.net (Cedric Sodhi) Date: Fri, 9 Dec 2011 23:58:06 +0100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A> Message-ID: <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A> I reply to your contribution mainly because I see another, valid argument hidden in what you formulated as an opinion: Readability would be reduced by such "noise". To anticipate other people agreeing with that, let me say, that it would be exactly one more character, and the same amount of key presses. All that, assuming you use editor automatisms, which particularly the advocates of WSB tend to bring forth in defense of WSB and aforementioned problems associated with it. Only one more character and not more key presses? Yes, instead of opening a block with a colon, you open it with an opening bracket. And you close it with a closing one. Referring to "noise", I take it you are preferring naturally expressed languages (what Roff's PIC, for example, exemplifies to banality). How is a COLON, which, in natural language, PUNCTUATES a context, any more suited than braces, which naturally ENCLOSE a structure? Obviously, it by far is not, even from the standpoint of not intersparsing readable code with unnatural characters. On Sat, Dec 10, 2011 at 09:40:54AM +1100, Matt Joiner wrote: > If braces were introduced I would switch to Haskell, I can't stand the > noise. If you want to see a language that allows both whitespace, semi > colons and braces take a look at it.
Nails it. > > On Dec 10, 2011 9:31 AM, "Cedric Sodhi" <[1]manday at gmx.net> wrote: > > On Fri, Dec 09, 2011 at 02:21:42PM -0800, Guido van Rossum wrote: > > On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi > <[1][2]manday at gmx.net> wrote: > > > > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS > HAD BEEN > > DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", > THAT "WHO > > DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING > > SIMILAR, JUST DON'T. > > > > Every single response in this thread so far has ignored this > request. The > > correct response honoring this should have been deafening silence. > > > > For me, if I had to design a new language today, I would probably > use > > braces, not because they're better than whitespace, but because > pretty > > much every other lanugage uses them, and there are more interesting > > concepts to distinguish a new language. That said, I don't regret > that > > Python uses indentation, and the rest I have to say about the topic > would > > violate the above request. > > > > I think this deserves a reply. Thank you for contributing your opinion > and respecting my request and therefore honoring the rules of a > civilized debate. > > -- Cedric > _______________________________________________ > Python-Dev mailing list > [3]Python-Dev at python.org > [4]http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > [5]http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com > > References > > Visible links > 1. mailto:manday at gmx.net > 2. mailto:manday at gmx.net > 3. mailto:Python-Dev at python.org > 4. http://mail.python.org/mailman/listinfo/python-dev > 5.
http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com From guido at python.org Sat Dec 10 00:03:08 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Dec 2011 15:03:08 -0800 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A> <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A> Message-ID: Point of order (repeated), please move this thread to python-ideas. -- --Guido van Rossum (python.org/~guido) From vinay_sajip at yahoo.co.uk Sat Dec 10 00:12:09 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Fri, 9 Dec 2011 23:12:09 +0000 (UTC) Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> <7DEE32A7-1426-4E93-8708-BDF3B0CAF8EC@twistedmatrix.com> Message-ID: Glyph twistedmatrix.com> writes: > The biggest issue for the single-codebase approach is 'except ... as ...'. > Peppering one's codebase with calls to sys.exc_info() can be a real > performance problem, especially on PyPy. Not to mention how ugly it is. > For some reason I thought that this syntax was only supported by 2.7 and up; > I see now that it's 2.6 and up. Granted that it's ugly, but where is the evidence that it can be a real performance problem? I mean in practice on real projects, rather than in theory or on code contrived to show up a problem. Please note, I'm not saying it isn't a real performance problem, I'm just asking where the evidence is, whether running on PyPy or elsewhere.
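The two spellings being weighed here can be put side by side in a rough micro-benchmark. This is a sketch only: the function names are invented for illustration, and a timeit loop over an isolated raise/catch is not the real-project evidence Vinay is asking for; the numbers will also differ a lot between CPython and PyPy.

```python
import sys
import timeit


def catch_with_as():
    # Python 2.6+ and 3.x spelling: the exception is bound directly.
    try:
        raise ValueError("boom")
    except ValueError as exc:
        return exc


def catch_with_exc_info():
    # Spelling for code that must also run on 2.5 and earlier: fetch the
    # exception currently being handled from sys.exc_info().
    try:
        raise ValueError("boom")
    except ValueError:
        exc = sys.exc_info()[1]
        return exc


if __name__ == "__main__":
    # Timings depend heavily on the interpreter; compare them on your own setup.
    for func in (catch_with_as, catch_with_exc_info):
        seconds = timeit.timeit(func, number=100000)
        print("%-20s %.3f s" % (func.__name__, seconds))
```

Both functions return the same exception object type; only the way the handler gets hold of it differs.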
Regards, Vinay Sajip From steve at pearwood.info Sat Dec 10 01:01:04 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 10 Dec 2011 11:01:04 +1100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A> <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A> Message-ID: <4EE2A140.2010509@pearwood.info> Guido van Rossum wrote: > Point of order (repeated), please move this thread to python-ideas. Isn't that cruel to the people reading python-ideas? -- Steven From anacrolix at gmail.com Sat Dec 10 02:35:20 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 10 Dec 2011 12:35:20 +1100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A> <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A> Message-ID: Ditch the colon too. Also you're a troll. On Dec 10, 2011 9:58 AM, "Cedric Sodhi" wrote: > I reply to your contribution mainly because I see another, valid > argument hidden in what you formulated as an opinion: > > Readability would be reduced by such "noise". To anticipate other people > agreeing with that, let me say, that it would be exactly one more > character, and the same amount of key presses. All that, assuming you > use editor automatisms, which particularly the advocates of WSB tend to > bring forth in defense of WSB and aforementioned problems associated > with it. > > Only one more character and not more key presses? Yes, instead of > opening a block with a colon, you open it with an opening bracket. And > you close it with a closing one. > > Referring to "noise", I take it you are preferring naturally expressed > languages (what Roff's PIC, for example, exemplifies to banality). 
> > How is a COLON, which, in natural language, PUNCTUATES a context, any > more suited than braces, which naturally ENCLOSE a structure? > > Obviously, it by far is not, even from the standpoint of not > intersparsing readable code with unnatural characters. > > On Sat, Dec 10, 2011 at 09:40:54AM +1100, Matt Joiner wrote: > > If braces were introduced I would switch to Haskell, I can't stand the > > noise. If you want to see a language that allows both whitespace, semi > > colons and braces take a look at it. Nails it. > > > > On Dec 10, 2011 9:31 AM, "Cedric Sodhi" <[1]manday at gmx.net> wrote: > > > > On Fri, Dec 09, 2011 at 02:21:42PM -0800, Guido van Rossum wrote: > > > On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi > > <[1][2]manday at gmx.net> wrote: > > > > > > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT > THIS > > HAD BEEN > > > DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", > > THAT "WHO > > > DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR > SOMETHING > > > SIMILAR, JUST DON'T. > > > > > > Every single response in this thread so far has ignored this > > request. The > > > correct response honoring this should have been deafening > silence. > > > > > > For me, if I had to design a new language today, I would > probably > > use > > > braces, not because they're better than whitespace, but because > > pretty > > > much every other lanugage uses them, and there are more > interesting > > > concepts to distinguish a new language. That said, I don't > regret > > that > > > Python uses indentation, and the rest I have to say about the > topic > > would > > > violate the above request. > > > > > > > I think this deserves a reply. Thank you for contributing your > opinion > > and respecting my request and therefore honoring the rules of a > > civilized debate. 
> > > > -- Cedric > > _______________________________________________ > > Python-Dev mailing list > > [3]Python-Dev at python.org > > [4]http://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > > [5] > http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com > > > > References > > > > Visible links > > 1. mailto:manday at gmx.net > > 2. mailto:manday at gmx.net > > 3. mailto:Python-Dev at python.org > > 4. http://mail.python.org/mailman/listinfo/python-dev > > 5. > http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com From eliben at gmail.com Sat Dec 10 04:28:06 2011 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 10 Dec 2011 05:28:06 +0200 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> <69816.1323459197@parc.com> Message-ID: On Sat, Dec 10, 2011 at 00:43, Matt Joiner wrote: > I second this. The doco is very bad. > It would be constructive to open issues for specific problems in the documentation. I'm sure this won't be hard to fix. Documentation should not be the roadblock for using a library. Eli From tjreedy at udel.edu Sat Dec 10 05:01:00 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Dec 2011 23:01:00 -0500 Subject: [Python-Dev] re.findall() should return named tuple In-Reply-To: References: Message-ID: On 12/8/2011 8:31 AM, Philipp A. wrote: > hi devs, > > just an idea that popped up in my mind: re.findall() returns a list of > tuples, where every entry of each tuple represents a match group. > since match groups can be named, we are able to use named tuples instead > of plain tuples here, in the same fashion as namedtuple's rename works: > misssing group names get renamed to _1 and so on.
i suggest to add the > rename keyword option, to findall, defaulting to True, since mixed > positional and named tuples are more common than in usual use cases of > namedtuple. > > do you think it's a good idea? I have not used named tuples or re.findall (much), so I have no opinion). > finally: should i join the mailing list to see answers? should i file a > PEP? i have no idea how the inner workings of python development are, > but i wanted to share this idea with you :) Ideas like this should either go the the python-ideas list or to the tracker at bugs.python.org as a feature request. If you post to the list, you should either subscribe at mail.python.org or follow it as a newsgroup at news.gmane.org (which is what I do). Posting a tracker issue requires registration of some sort. -- Terry Jan Reedy From tjreedy at udel.edu Sat Dec 10 05:11:57 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Dec 2011 23:11:57 -0500 Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?) In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On 12/9/2011 5:17 AM, Nick Coghlan wrote: > As Chris pointed out though, the real problem with the "repeatedly run > 2to3" workflow is that it can make interpreting tracebacks from the > field *really* hard. This just gave me the idea of tagging tracebacks with the Python version number. Something like Traceback (Py3.2.2, most recent call last): and perhaps with the platform also Traceback (most recent call last) [Py3.2.2 on win23]: Since computation has stopped, the few extra milliseconds is trivial. This would certainly help on Python list and the tracker when people do post the traceback (which they do not always) without version and system (which they often do not, especially on Python list). It might suggest to people that this is important info to include.
I wonder if this would also help with tracebacks sent to library/app developers. -- Terry Jan Reedy From ncoghlan at gmail.com Sat Dec 10 06:55:45 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 10 Dec 2011 15:55:45 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> Message-ID: On Sat, Dec 10, 2011 at 5:58 AM, PJ Eby wrote: > In fact, I'm not sure why people are bringing it into this discussion at > all: PEP 3333 was designed to work well with 2to3, which does the right > thing for WSGI code: it converts 2.x "str" to 3.x "str", as it should. If > you're writing 2.x WSGI code with 'u' literals, *your code is broken*. > > WSGI doesn't need 'u' literals and never has. It *does* need b'' literals > for stuff that refers to request and response bodies, but everything else > should be plain old string literals for the appropriate Python version. The reason it came up is that the reason "from __future__ import unicode_literals" doesn't obviously help with doing single codebase style ports for a lot of WSGI related code is because such code actually has *3* string types to deal with: Actual text (u'', unicode -> str) Native strings for WSGI ('', str -> str) Binary data (b'', str -> bytes) That works fine with 2to3, since 2to3 will strip out the leading 'u' from the actual text literals, but presents a potential hassle for the single codebase approach. Most other contexts only need the binary->binary and text->text conversion, so the future import really helps out.
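The three kinds of string can be made concrete with a minimal PEP 3333 application sketch. The app function and its header values are illustrative assumptions rather than code from the thread, and native strings are spelled with str(...), the approach Nick goes on to argue for.

```python
# Sketch of a PEP 3333 application for a single 2.6+/3.x codebase.
# On Python 2 the module would start with
#     from __future__ import unicode_literals
# (omitted here because it is a no-op on Python 3), so bare '' literals
# are text on both major versions.

def app(environ, start_response):
    # Actual text: unicode on 2.x, str on 3.x.
    greeting = 'Hello, world'
    # Native strings: WSGI wants the platform's own str type for the status
    # line and headers. On 2.x, str() on a unicode literal encodes it as
    # ASCII, so this spelling only suits ASCII values.
    status = str('200 OK')
    headers = [(str('Content-Type'), str('text/plain; charset=utf-8'))]
    start_response(status, headers)
    # Binary data: bytes on 3.x (plain str on 2.x) for the response body.
    body = greeting.encode('utf-8') + b'\n'
    return [body]
```

Calling app({}, start_response) with any callable that records its arguments is enough to check which type each value ends up with.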
However, I just realised that there actually *is* a relatively clear way to spell this for all 2.6+ versions: the future import *doesn't* change the meaning of the 'str' builtin (it's still the 8-bit string type in 2.x), so the native way to spell the above distinction when "from __future__ import unicode_literals" is in effect is as follows: Actual text: '' Native strings for WSGI: str('') Binary data: b'' Calling a builtin is much lower overhead than calling a helper from a compatibility module, and this also makes it clear that native strings are the odd ones out. So I'm back to being -1 on the idea of adding back u'' literals for 3.3. Instead, people should explicitly call str() on any literals that they want to be actual str instances both in 3.x and in 2.x when the unicode literals future import is in effect. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sat Dec 10 08:38:35 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 10 Dec 2011 08:38:35 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <68178.1323454554@parc.com> References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> Message-ID: Bill Janssen, 09.12.2011 19:15: > I think another thing that might go into "refreshing the batteries" is a > feature comparison of BeautifulSoup and HTML5lib against the stdlib > competition, to see what needs to be added/revised. Having to switch to > an outside package for parsing possibly invalid HTML is a pain. Such a feature request should be worth a separate thread. Note, however, that html5lib is likely way too big to add it to the stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3, which would be the target release series for better HTML support.
So, whatever library or API you would want to use for HTML processing is currently only the second question as long as Py3 lacks a real-world HTML parser in the stdlib, as well as a robust character detection mechanism. I don't think that can be fixed all that easily. Stefan From timwintle at gmail.com Sat Dec 10 09:28:33 2011 From: timwintle at gmail.com (Tim Wintle) Date: Sat, 10 Dec 2011 08:28:33 +0000 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> Message-ID: <1323505713.13580.19.camel@tim-laptop> On Fri, 2011-12-09 at 19:39 +0100, Xavier Morel wrote: > On 2011-12-09, at 19:15 , Bill Janssen wrote: > > I use ElementTree for parsing valid XML, but minidom for producing it. > Could you expand on your reasons to use minidom for producing XML? To throw my 2c in here: I personally normally use minidom for manipulating (x)html data (through html5lib), and for writing XML. I think it's primarily because DOM: a) matches the way I think about XML documents. b) Provides the same API as I use in other languages. (FWIW, I do a lot of DOM manipulation in javascript) c) "Feels" (to me) more similar to other formats I work with. All three may be because I haven't spent enough time with ElementTree - again I've found the documentation lacking. Tim From ben+python at benfinney.id.au Sat Dec 10 13:15:07 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 10 Dec 2011 23:15:07 +1100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <871usc39o4.fsf@benfinney.id.au> Guido van Rossum writes: > On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi wrote: > > > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD > > BEEN DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", > > THAT "WHO DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR > > SOMETHING SIMILAR, JUST DON'T. 
> > Every single response in this thread so far has ignored this request. The request was completely unreasonable. Cedric does not get to unilaterally set restrictions on who and how people respond to a screed in a public forum. > the rest I have to say about the topic would violate the above > request. You have my permission to violate the above request. That should have at least as much authority as the request itself, so you are hereby empowered to respond as you like. -- \ "We can't depend for the long run on distinguishing one | `\ bitstream from another in order to figure out which rules | _o__) apply." -- Eben Moglen, _Anarchism Triumphant_, 1999 | Ben Finney From guido at python.org Sat Dec 10 17:06:57 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Dec 2011 08:06:57 -0800 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <871usc39o4.fsf@benfinney.id.au> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <871usc39o4.fsf@benfinney.id.au> Message-ID: On Sat, Dec 10, 2011 at 4:15 AM, Ben Finney wrote: > Guido van Rossum writes: > > > On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi wrote: > > > > > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD > > > BEEN DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", > > > THAT "WHO DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR > > > SOMETHING SIMILAR, JUST DON'T. > > > > Every single response in this thread so far has ignored this request. > > The request was completely unreasonable. Cedric does not get to > unilaterally set restrictions on who and how people respond to a screed > in a public forum. > Oh, of course. I was just playing along. But my real point was to berate the community for responding at all to such an obvious trolling post. > > the rest I have to say about the topic would violate the above > > request. > > You have my permission to violate the above request.
That should have at > least as much authority as the request itself, so you are hereby > empowered to respond as you like. > It would be utterly redundant. He said it all in his all-caps message. -- --Guido van Rossum (python.org/~guido) From francismb at email.de Sat Dec 10 12:14:13 2011 From: francismb at email.de (francis) Date: Sat, 10 Dec 2011 12:14:13 +0100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> Message-ID: <4EE33F05.4000907@email.de> Hi Cedric, On 12/09/2011 09:26 PM, Cedric Sodhi wrote: > It is widely known among the programmer's community that spaces and tabs > are remarkably similar to eachother. So similar even, that people fight > wars about which to use in a non-py context. It might strike one as an > equally remarkably nonsensical idea to give them programmatic meaning - > two DIFFERENT meanings, to make things even worse. > > While it becomes a practical impossibility to spot these kind of bugs > while reviewing code -- optionally mangled through a medium which > expands tabs to whitespace, not so much of a rarity -- it is still a > time-consuming and tedious job to find them in a local situation. > I'm not so experienced with python as the majority of people here, but I've read that the practice is: do not to mix them (spaces and tabs). If this is taking much of you time while reviewing I would recommend you to let some script run on you code first to spot that mixture. IMHO that is a rule that should go in the code rules of your project and the build process should break if this mixture if found. Don't let that code reach the sync repository. As I said I'm maybe failing to see some case. Formatting is like food, everyone has it's own taste. One has to use spicery to change it (if possible).
For me the view of the code (the layout) by the programmer should be automatically changed by the tool that reads the code. Here you could have a python with braces if you want... (I thing that 'go' has some autoformater or a standard way of formatting). -- francis From pje at telecommunity.com Sat Dec 10 18:09:59 2011 From: pje at telecommunity.com (PJ Eby) Date: Sat, 10 Dec 2011 12:09:59 -0500 Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?) In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On Fri, Dec 9, 2011 at 11:11 PM, Terry Reedy wrote: > This just gave me the idea of tagging tracebacks with the Python version > number. Something like > > Traceback (Py3.2.2, most recent call last): > > and perhaps with the platform also > > Traceback (most recent call last) [Py3.2.2 on win23]: > > Since computation has stopped, the few extra milliseconds is trivial. This > would certainly help on Python list and the tracker when people do post the > traceback (which they do not always) without version and system (which they > often do not, especially on Python list). It might suggest to people that > this is important info to include. I wonder if this would also help with > tracebacks sent to library/app developers. > Yes, but doctest will need to take this into account, both for its native traceback matcher, and for traceback matches using ellipses. Otherwise you introduce more Python version hell for doctest users. From steve at pearwood.info Sat Dec 10 18:52:37 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 11 Dec 2011 04:52:37 +1100 Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?)
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: <4EE39C65.60508@pearwood.info> Terry Reedy wrote: > On 12/9/2011 5:17 AM, Nick Coghlan wrote: > >> As Chris pointed out though, the real problem with the "repeatedly run >> 2to3" workflow is that it can make interpreting tracebacks from the >> field *really* hard. > > This just gave me the idea of tagging tracebacks with the Python version > number. Something like > > Traceback (Py3.2.2, most recent call last): > > and perhaps with the platform also > > Traceback (most recent call last) [Py3.2.2 on win23]: > > Since computation has stopped, the few extra milliseconds is trivial. > This would certainly help on Python list and the tracker when people do > post the traceback (which they do not always) without version and system > (which they often do not, especially on Python list). It might suggest > to people that this is important info to include. [...] But how often is it actually important information to include? I am active on both the tutor and the python-list lists, and it seems to me that this proposed feature won't be very useful in either place. In my experience, the version number is rarely important for the sorts of questions that are commonly asked. Python is quite a stable language, and alist = alist.append(1) has confused newbies since version 1.5 and will probably continue confusing them in version 4000. (Aside: I was reading historical What's New docs today, and was stunned to realise how many cool features go back all the way to version 2.0.) Obviously there are times where knowing the version is useful, but often you can often derive the version number from the error (at least to 1 significant figure): >>> map(chr, (40, 41, 42))[1] Traceback (most recent call last): File "", line 1, in TypeError: 'map' object is not subscriptable Assuming map has not been shadowed, this is obviously Python 3. 
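Terry's second format can be tried out without any interpreter change. The following is a minimal sketch, assuming sys.excepthook is an acceptable place to do it; the helper names are hypothetical. Since only uncaught exceptions reach sys.excepthook, the sketch leaves doctest's own exception matching alone.

```python
import platform
import sys
import traceback

HEADER = 'Traceback (most recent call last):\n'


def version_tag():
    # For example "Py3.2.2 on win32"; sys.platform supplies the platform name.
    return 'Py%s on %s' % (platform.python_version(), sys.platform)


def format_tagged(exc_type, exc_value, exc_tb):
    # Format the exception as the default hook would, then splice the
    # version/platform tag into the first traceback header line.
    lines = traceback.format_exception(exc_type, exc_value, exc_tb)
    if lines and lines[0] == HEADER:
        lines[0] = 'Traceback (most recent call last) [%s]:\n' % version_tag()
    return ''.join(lines)


def tagged_excepthook(exc_type, exc_value, exc_tb):
    # Called by the interpreter for uncaught exceptions only.
    sys.stderr.write(format_tagged(exc_type, exc_value, exc_tb))


if __name__ == '__main__':
    sys.excepthook = tagged_excepthook
```

With the hook installed, an uncaught error prints a first line such as "Traceback (most recent call last) [Py3.2.2 on win32]:", followed by the usual stack and message.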
If the question involves tracking down an actual bug in Python, the version number becomes important. E.g. "it works as documented in 2.6 on Linux, but not in 2.7 on OS-X" sort of thing. But that's quite unusual. Newbies barely read tracebacks at all. Adding the version number and platform will just add more text which they won't read and will probably discourage them further from reading it (more text = less chance they read it). Experienced coders tend to know when the version number is important and provided it only when necessary. So it's hard to see who this is aimed at... users experienced enough to pay attention to tracebacks but not experienced enough to know when to provide the version number? YMMV, but I don't see much value in this. If it comes at the cost of making doctest harder to use, I'm actively against it. Otherwise I'm just mildly "meh, why bother?". -- Steven From storchaka at gmail.com Sat Dec 10 20:50:15 2011 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 10 Dec 2011 21:50:15 +0200 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <4EE33F05.4000907@email.de> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <4EE33F05.4000907@email.de> Message-ID: 10.12.11 13:14, francis wrote: > Formatting is like food, everyone has it's own taste. One has > to use spicery to change it (if possible). For me the view of > the code (the layout) by the programmer should be automatically > changed by the tool that reads the code. Here you could have > a python with braces if you want... (I thing that 'go' has some > autoformater or a standard way of formatting).
pindent -c From python-dev at masklinn.net Sat Dec 10 21:35:40 2011 From: python-dev at masklinn.net (Xavier Morel) Date: Sat, 10 Dec 2011 21:35:40 +0100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <4EE33F05.4000907@email.de> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <4EE33F05.4000907@email.de> Message-ID: On 2011-12-10, at 12:14 , francis wrote: > > (I thing that 'go' has some > autoformater or a standard way of formatting). `gofmt` yes, it simply reformats all the code to match the style decided by the core go team, it does not provide support formatting-independent edition. Think of it as pep8.py editing the code in place instead of just reporting the stuff it does not like. From janssen at parc.com Sat Dec 10 21:54:09 2011 From: janssen at parc.com (Bill Janssen) Date: Sat, 10 Dec 2011 12:54:09 PST Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> <69816.1323459197@parc.com> Message-ID: <85935.1323550449@parc.com> Stefan Behnel wrote: > Bill Janssen, 09.12.2011 19:15: > > I think another thing that might go into "refreshing the batteries" is a > > feature comparison of BeautifulSoup and HTML5lib against the stdlib > > competition, to see what needs to be added/revised. Having to switch to > > an outside package for parsing possibly invalid HTML is a pain. > > Such a feature request should be worth a separate thread. > > Note, however, that html5lib is likely way too big to add it to the > stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML > in Python 3, which would be the target release series for better HTML > support. So, whatever library or API you would want to use for HTML > processing is currently only the second question as long as Py3 lacks > a real-world HTML parser in the stdlib, as well as a robust character > detection mechanism. I don't think that can be fixed all that easily. Sounds like it needs a PEP.
I'm only advocating spending some thought on what needs to be done -- whether outside libraries need to be adopted into the stdlib would be a step after that. But understanding *why* those libraries exist and are widely used should be a prerequisite to "refreshing" the stdlib's support. Bill From glyph at twistedmatrix.com Sat Dec 10 22:32:46 2011 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Sat, 10 Dec 2011 16:32:46 -0500 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> Message-ID: <4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com> On Dec 10, 2011, at 2:38 AM, Stefan Behnel wrote: > Note, however, that html5lib is likely way too big to add it to the stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3, which would be the target release series for better HTML support. So, whatever library or API you would want to use for HTML processing is currently only the second question as long as Py3 lacks a real-world HTML parser in the stdlib, as well as a robust character detection mechanism. I don't think that can be fixed all that easily. Here's the problem in a nutshell, I think: Everybody wants an HTML parser in the stdlib, because it's inconvenient to pull in a dependency for such a "simple" task. Everybody wants the stdlib to remain small, stable, and simple and not get "overcomplicated". Parsing arbitrary HTML5 is a monstrously complex problem, for which there exist rapidly-evolving standards and libraries to deal with it. Parsing 'the web' (which is rapidly growing to include stuff like SVG, MathML etc) is even harder. My personal opinion is that HTML5Lib gets this problem almost completely right, and so it should be absorbed by the stdlib. 
Trying to re-invent this from scratch, or even use something like BeautifulSoup which uses a bunch of heuristics and hacks rather than reference to the laboriously-crafted standard that says exactly how parsing malformed stuff has to go to be "like a browser", seems like it will just give the stdlib solution a reputation for working on the test input but not working in the real world. (No disrespect to BeautifulSoup: it was a great attempt in the pre-HTML5 world which it was born into, and I've used it numerous times to implement useful things. But much more effort has been poured into this problem since then, and the problems are better understood now.) -glyph From regebro at gmail.com Sat Dec 10 22:56:15 2011 From: regebro at gmail.com (Lennart Regebro) Date: Sat, 10 Dec 2011 22:56:15 +0100 Subject: [Python-Dev] [PATCH] Adding braces to __future__ In-Reply-To: <871usc39o4.fsf@benfinney.id.au> References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A> <871usc39o4.fsf@benfinney.id.au> Message-ID: On Sat, Dec 10, 2011 at 13:15, Ben Finney wrote: > Guido van Rossum writes: > >> On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi wrote: >> >> > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD >> > BEEN DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", >> > THAT "WHO DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR >> > SOMETHING SIMILAR, JUST DON'T. >> >> Every single response in this thread so far has ignored this request. > > The request was completely unreasonable. As it basically said "I will ignore everything everyone ever will say on this issue, and if you don't think I should do that, then you should ignore me", I find the request very reasonable. I wish more people would advertise that they not only know about the facts of the matter but completely ignore them.
It's basically a big sign saying "LALALALIMNOTLISTENING", which would shorten a lot of internet debates if it was more widely used. :-) //Lennart From tjreedy at udel.edu Sat Dec 10 23:30:49 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 10 Dec 2011 17:30:49 -0500 Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?) In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On 12/10/2011 12:09 PM, PJ Eby wrote: > On Fri, Dec 9, 2011 at 11:11 PM, Terry Reedy > wrote: > > This just gave me the idea of tagging tracebacks with the Python > version number. Something like > > Traceback (Py3.2.2, most recent call last): > > and perhaps with the platform also > > Traceback (most recent call last) [Py3.2.2 on win23]: > > Since computation has stopped, the few extra milliseconds is > trivial. This would certainly help on Python list and the tracker > when people do post the traceback (which they do not always) without > version and system (which they often do not, especially on Python > list). It might suggest to people that this is important info to > include. I wonder if this would also help with tracebacks sent to > library/app developers. > > > Yes, but doctest will need to take this into account, both for its > native traceback matcher, and for traceback matches using ellipses. > Otherwise you introduce more Python version hell for doctest users. Is doctest really insisting that the whole line Traceback (most recent call last): exactly match, with nothing added? It really should not, as that is not part of the language spec. This seems like the tail wagging the dog. -- Terry Jan Reedy From tjreedy at udel.edu Sat Dec 10 23:44:18 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 10 Dec 2011 17:44:18 -0500 Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?) 
In-Reply-To: <4EE39C65.60508@pearwood.info> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> <4EE39C65.60508@pearwood.info> Message-ID: On 12/10/2011 12:52 PM, Steven D'Aprano wrote: > Terry Reedy wrote: >> On 12/9/2011 5:17 AM, Nick Coghlan wrote: >> >>> As Chris pointed out though, the real problem with the "repeatedly run >>> 2to3" workflow is that it can make interpreting tracebacks from the >>> field *really* hard. >> >> This just gave me the idea of tagging tracebacks with the Python >> version number. Something like >> >> Traceback (Py3.2.2, most recent call last): >> >> and perhaps with the platform also >> >> Traceback (most recent call last) [Py3.2.2 on win23]: >> >> Since computation has stopped, the few extra milliseconds is trivial. >> This would certainly help on Python list and the tracker when people >> do post the traceback (which they do not always) without version and >> system (which they often do not, especially on Python list). It might >> suggest to people that this is important info to include. > [...] > > But how often is it actually important information to include? > > I am active on both the tutor and the python-list lists, and it seems to > me that this proposed feature won't be very useful in either place. In > my experience, the version number is rarely important for the sorts of > questions that are commonly asked. My experience on Python list is that version and platform are often important. But leave that aside. It is definitely important on the tracker, which I already mentioned. Just a few days ago, for instance, the opening message of http://bugs.python.org/issue13538 has " >>> bytes("foo") Traceback (most recent call last): File "", line 1, in TypeError: string argument without an encoding" with no indication of the version anywhere in the message. 
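Terry's tagging idea can be tried outside the interpreter. The sketch below is a hypothetical helper (`format_tagged` is not an existing API) built on `traceback.format_exception()`. It rewrites only the first header line into the second format proposed in the thread, using `platform.python_version()` and `sys.platform` for the tag. Treat it as an illustration under those assumptions, not as how CPython would implement the change.

```python
import platform
import sys
import traceback

HEADER = "Traceback (most recent call last):\n"

def format_tagged(exc_type, exc, tb):
    """Format an exception like the default hook, but tag the first
    traceback header with the Python version and platform.

    Hypothetical helper for illustration only.
    """
    lines = traceback.format_exception(exc_type, exc, tb)
    tag = "[Py%s on %s]" % (platform.python_version(), sys.platform)
    # Only the outermost header is rewritten; chained tracebacks keep theirs.
    if lines and lines[0] == HEADER:
        lines[0] = "Traceback (most recent call last) %s:\n" % tag
    return "".join(lines)

def tagged_excepthook(exc_type, exc, tb):
    # Could be installed with: sys.excepthook = tagged_excepthook
    sys.stderr.write(format_tagged(exc_type, exc, tb))
```

Any tool that compares the header line exactly, including doctests pasted from an interactive session, would see the changed first line. That is the compatibility cost PJ Eby raises in this thread.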
Perhaps in such cases the OP correctly marks the version up in the header, but it would be nice to have it right there in the traceback. As for doctest, it could/should be changed to check for s.startswith("Traceback (most recent call last)") (instead of s == ...) if it does not do that now. -- Terry Jan Reedy From jcea at jcea.es Sun Dec 11 00:30:49 2011 From: jcea at jcea.es (Jesus Cea) Date: Sun, 11 Dec 2011 00:30:49 +0100 Subject: [Python-Dev] Adding GNU conditional execution in the Makefile? Message-ID: <4EE3EBA9.2050600@jcea.es> Working on the DTRACE probes, I think I can simplify the build logic quite a bit using the GNU Makefile conditional execution: . Specifically, I have object files that must be compiled and linked, or not, according to a "configure" test result. But currently I think we are not using these features. Maybe because we don't want to force the use of GMAKE, I don't know. If this is a policy, I would like to know. And if somebody has a suggestion to cope with this difficulty... -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "Love is putting your happiness in the happiness of another" - Leibniz From tjreedy at udel.edu Sun Dec 11 00:30:34 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 10 Dec 2011 18:30:34 -0500 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com> References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> <4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com> Message-ID: On 12/10/2011 4:32 PM, Glyph Lefkowitz wrote: > On Dec 10, 2011, at 2:38 AM, Stefan Behnel wrote: > >> Note, however, that html5lib is likely way too big to add it to the >> stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML >> in Python 3, which would be the target release series for better HTML >> support. So, whatever library or API you would want to use for HTML >> processing is currently only the second question as long as Py3 lacks >> a real-world HTML parser in the stdlib, as well as a robust character >> detection mechanism. I don't think that can be fixed all that easily. > > Here's the problem in a nutshell, I think: > > 1. Everybody wants an HTML parser in the stdlib, because it's > inconvenient to pull in a dependency for such a "simple" task. > 2. Everybody wants the stdlib to remain small, stable, and simple and > not get "overcomplicated". > 3.
Parsing arbitrary HTML5 is a monstrously complex problem, for which > there exist rapidly-evolving standards and libraries to deal with > it. Parsing 'the web' (which is rapidly growing to include stuff > like SVG, MathML etc) is even harder. > > > My personal opinion is that HTML5Lib gets this problem almost completely > right, and so it should be absorbed by the stdlib. A little data: the HTML5lib project lives at https://code.google.com/p/html5lib/ It has 4 owners and 22 other committers. The most recent release, html5lib 0.90 for Python, is nearly 2 years old. Since there is a separate Python3 repository, and there is no mention of Python3 compatibility elsewhere that I saw, including the pypi listing, I assume that is for Python2 only. A comment on a recent (July 11) Python3 issue https://code.google.com/p/html5lib/issues/detail?id=187&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Port suggests that the Python3 version still has problems. "Merged in now, though still lots of errors and failures in the testsuite." -- Terry Jan Reedy From guido at python.org Sun Dec 11 02:02:28 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Dec 2011 17:02:28 -0800 Subject: [Python-Dev] Adding GNU conditional execution in the Makefile? In-Reply-To: <4EE3EBA9.2050600@jcea.es> References: <4EE3EBA9.2050600@jcea.es> Message-ID: I don't know how widespread gmake is, but I certainly don't want Python to be dependent on GNU tools exclusively. You don't have to use GCC to compile it. (Autoconfig is a different story, it only is needed when config.in changes. Similarly, readline is optional.) --Guido On Sat, Dec 10, 2011 at 3:30 PM, Jesus Cea wrote: > Working in the DTRACE probes, I think I can simplify the build logic > quite a bit using the GNU Makefile conditional execution: > . > > In concrete, I have object files that must be compiled and linked, or > not, according to a "configure" test result.
> > But currently I think we are not using these features. Maybe because > we don't want to force the use of GMAKE, I don't know. > > If this is a policy, I would like to know. > > And if somebody has a suggestion to cope with this difficulty... > > -- > Jesus Cea Avion _/_/ _/_/_/ _/_/_/ > jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ > jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ > . _/_/ _/_/ _/_/ _/_/ _/_/ > "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ > "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ > "Love is putting your happiness in the happiness of another" - Leibniz > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From glyph at twistedmatrix.com Sun Dec 11 03:25:33 2011 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Sat, 10 Dec 2011 21:25:33 -0500 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> <4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com> Message-ID: <489D0DBD-FE82-4200-97DF-5FA425E2AEF6@twistedmatrix.com> On Dec 10, 2011, at 6:30 PM, Terry Reedy wrote: > A little data: the HTML5lib project lives at > https://code.google.com/p/html5lib/ > It has 4 owners and 22 other committers.
> > The most recent release, html5lib 0.90 for Python, is nearly 2 years old. Since there is a separate Python3 repository, and there is no mention on Python3 compatibility elsewhere that I saw, including the pypi listing, I assume that is for Python2 only. I believe that you are correct. > A comment on a recent (July 11) Python3 issue > https://code.google.com/p/html5lib/issues/detail?id=187&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Port > suggest that the Python3 version still has problems. "Merged in now, though still lots of errors and failures in the testsuite." I don't see what bearing this has on the discussion. There are three possible ways I can imagine to interpret this information. First, you could believe that porting a codebase from Python 2 to Python 3 is much easier than solving a difficult domain-specific problem. In that case, html5lib has done the hard part and someone interested in html-in-the-stdlib should do the rest. Second, you could believe that porting a codebase from Python 2 to Python 3 is harder than solving a difficult domain-specific problem, in which case something is seriously wrong with Python 3 or its attendant migration tools and that needs to be fixed, so someone should fix that rather than worrying about parsing HTML right now. (I doubt that many subscribers to this list would share this opinion, though.) Third, you could believe that parsing HTML is not a difficult domain-specific problem. But only a crazy person would believe that, so you're left with one of the previous options :). -glyph
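For comparison, the stdlib does ship `html.parser.HTMLParser` (the `HTMLParser` module in Python 2). It is event-based and tolerates some sloppy markup, but it does not implement the HTML5 tree-construction algorithm Glyph refers to. Here is a minimal sketch with a hypothetical `LinkCollector` subclass, invented for this illustration. It records start tags and `href` values from markup that has unclosed tags and an unquoted attribute value.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect start-tag names and href values (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        self.tags.append(tag)
        self.hrefs.extend(value for name, value in attrs if name == "href")

parser = LinkCollector()
parser.feed("<P>one<p>two<a href=x.html>link")  # unclosed tags, unquoted value
parser.close()
```

The parser reports the events it sees. It builds no tree and synthesizes no missing end tags, and that kind of browser-compatible repair is the work html5lib takes on.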
From tjreedy at udel.edu Sun Dec 11 06:55:01 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 11 Dec 2011 00:55:01 -0500 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <489D0DBD-FE82-4200-97DF-5FA425E2AEF6@twistedmatrix.com> References: <20111209100736.5f16419a@mikmeyer-vm-fedora> <68178.1323454554@parc.com> <4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com> <489D0DBD-FE82-4200-97DF-5FA425E2AEF6@twistedmatrix.com> Message-ID: <4EE445B5.5090802@udel.edu> On 12/10/2011 9:25 PM, Glyph Lefkowitz wrote: > On Dec 10, 2011, at 6:30 PM, Terry Reedy wrote: >> A little data: the HTML5lib project lives at >> https://code.google.com/p/html5lib/ >> It has 4 owners and 22 other committers. If there really are 4 'owners' rather than 4 people with admin access to the site, then there are 4 people to negotiate with. >> The most recent release, html5lib 0.90 for Python, is nearly 2 years >> old. Since there is a separate Python3 repository, and there is no >> mention on Python3 compatibility elsewhere that I saw, including the >> pypi listing, I assume that is for Python2 only. > > I believe that you are correct. There are issues pointing to a 1.0 release, but I could not find any current timetable. The project looks a bit stagnant. That does not bode well for a commitment to future active maintenance. >> A comment on a recent (July 11) Python3 issue >> https://code.google.com/p/html5lib/issues/detail?id=187&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Port >> suggest that the Python3 version still has problems. "Merged in now, >> though still lots of errors and failures in the testsuite." > > I don't see what bearing this has on the discussion. I think both points above show that 'absorbing HTML5Lib in the stdlib' will involve more sociological and technical problems than doing so with an active one-person module that already runs on 3.2.
One is that the multiple version Python 2.x codebase is the reference version and that will not be incorporated. A serious plan will have to address the real situation. --- Terry Jan Reedy From python at mrabarnett.plus.com Sun Dec 11 21:12:41 2011 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 11 Dec 2011 20:12:41 +0000 Subject: [Python-Dev] Omission in re.sub? Message-ID: <4EE50EB9.3000606@mrabarnett.plus.com> I've just come across an omission in re.sub which I hadn't noticed before. In re.sub the replacement string can contain escape sequences, for example: >>> repr(re.sub(r"x", r"\n", "axb")) "'a\\nb'" However: >>> repr(re.sub(r"x", r"\x0A", "axb")) "'a\\\\x0Ab'" Yes, it doesn't recognise "\xNN". Is there a reason for this? The regex module does the same, but is there any objection to me fixing it in the regex module? (I'm thinking about compatibility with re here.) From pje at telecommunity.com Sun Dec 11 21:12:52 2011 From: pje at telecommunity.com (PJ Eby) Date: Sun, 11 Dec 2011 15:12:52 -0500 Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?) In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: On Sat, Dec 10, 2011 at 5:30 PM, Terry Reedy wrote: > Is doctest really insisting that the whole line > Traceback (most recent call last): > exactly match, with nothing added? It really should not, as that is not > part of the language spec. This seems like the tail wagging the dog. > It's a regular expression match, actually. The standard matcher ignores everything between the Traceback line (matched by a regex) and the first unindented line that follows in the doctest. However, if you explicitly try to match a traceback with the ellipsis matcher, intending to observe whether certain specific lines are printed, then you wouldn't be using doctest's built-in matcher, and that was the case I was concerned about. 
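Here is a concrete version of this compatibility concern. An exact header check rejects a tagged header, while the prefix test Terry suggests accepts it. The pattern below is a simplified stand-in written for this illustration. It is not doctest's actual (private) regular expression, which also accepts other header variants.

```python
import re

# Simplified stand-in for an exact traceback-header check
# (assumption: illustrative only, not doctest's real pattern).
EXACT_HEADER = re.compile(r"^Traceback \(most recent call last\):\s*$")
PREFIX = "Traceback (most recent call last)"

def exact_match(line):
    return EXACT_HEADER.match(line) is not None

def prefix_match(line):
    return line.startswith(PREFIX)

plain = "Traceback (most recent call last):"
tagged = "Traceback (most recent call last) [Py3.2.2 on win32]:"
```

The prefix test only helps with the second proposed format. The first, `Traceback (Py3.2.2, most recent call last):`, fails both checks.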
However, as it turns out, I was confused about when this latter case occurs: in order to do it, you have to actually intentionally print a traceback (e.g. via traceback.format_exception() and friends), rather than allowing the exception to propagate normally. This doesn't happen nearly as often in my doctests as I thought it did, but if format_exception() changes it'll still affect some people. The other piece I was pointing out was that if you change the message without changing the doctest regex, then pasting an interpreter transcript into a doctest will no longer work, because doctest will think it's trying to match non-error output. So that has to be changed when the exception format changes. So, no actual objection here; just saying that if you don't change that regex, people who create *new* doctests with tracebacks won't be able to get them to work without deleting the version info from their copy-pasted tracebacks. I was also concerned about a situation that, while it exists, does not occur anywhere near as frequently as I thought it would in my own tests, even for things that seriously abuse Python internals and likely can't be ported to Python 3 anyway. ;-) From guido at python.org Sun Dec 11 21:27:35 2011 From: guido at python.org (Guido van Rossum) Date: Sun, 11 Dec 2011 12:27:35 -0800 Subject: [Python-Dev] Omission in re.sub? In-Reply-To: <4EE50EB9.3000606@mrabarnett.plus.com> References: <4EE50EB9.3000606@mrabarnett.plus.com> Message-ID: As long as there's a way to place a single backslash in the output this seems fine to me, though I'm not sure it's important. Of course it will likely break some test... the test will then have to be fixed. I can't remember why we did this -- is there a full list of all the escapes that re.sub() interprets somewhere? I thought it was pretty limited. Maybe it's the related list of escapes that are supported in regular expressions?
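Here is a short sketch of how the replacement template behaves. It sticks to cases that do not depend on the outcome of MRAB's proposal, and it deliberately avoids `r"\x0A"` in the template because its treatment has varied across Python versions. It also shows that a single backslash in the output is easy to get, which was Guido's condition.

```python
import re

text = "axb"

# re.sub itself processes the two-character escape backslash-n in the template.
via_template = re.sub(r"x", r"\n", text)

# Non-raw literal: Python has already turned "\x0A" into a newline
# before re sees the template, so no template escape is involved.
via_literal = re.sub(r"x", "\x0A", text)

# A callable replacement is inserted verbatim, with no escape processing.
via_callable = re.sub(r"x", lambda m: "\x0A", text)

# A doubled backslash in the template yields exactly one backslash.
one_backslash = re.sub(r"x", r"\\", text)
```

The callable form is the usual way to insert text that must never be interpreted as a template.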
--Guido On Sun, Dec 11, 2011 at 12:12 PM, MRAB wrote: > I've just come across an omission in re.sub which I hadn't noticed > before. > > In re.sub the replacement string can contain escape sequences, for > example: > >>>> repr(re.sub(r"x", r"\n", "axb")) > "'a\\nb'" > > However: > >>>> repr(re.sub(r"x", r"\x0A", "axb")) > "'a\\\\x0Ab'" > > Yes, it doesn't recognise "\xNN". > > Is there a reason for this? > > The regex module does the same, but is there any objection to me fixing > it in the regex module? (I'm thinking about compatibility with re here.) -- --Guido van Rossum (python.org/~guido) From python at mrabarnett.plus.com Sun Dec 11 21:47:48 2011 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 11 Dec 2011 20:47:48 +0000 Subject: [Python-Dev] Omission in re.sub? In-Reply-To: References: <4EE50EB9.3000606@mrabarnett.plus.com> Message-ID: <4EE516F4.4000208@mrabarnett.plus.com> On 11/12/2011 20:27, Guido van Rossum wrote: > On Sun, Dec 11, 2011 at 12:12 PM, MRAB > wrote: >> I've just come across an omission in re.sub which I hadn't noticed >> before. >> >> In re.sub the replacement string can contain escape sequences, for >> example: >> >>>>> repr(re.sub(r"x", r"\n", "axb")) >> "'a\\nb'" >> >> However: >> >>>>> repr(re.sub(r"x", r"\x0A", "axb")) >> "'a\\\\x0Ab'" >> >> Yes, it doesn't recognise "\xNN". >> >> Is there a reason for this? >> >> The regex module does the same, but is there any objection to me >> fixing it in the regex module? (I'm thinking about compatibility >> with re here.) > > As long as there's a way to place a single backslash in the output > this seems fine to me, though I'm not sure it's important. Of course > it will likely break some test... the test will then have to be > fixed.
> > I can't remember why we did this -- is there a full list of all the > escapes that re.sub() interprets somewhere? I thought it was pretty > limited. Maybe it's the related list of escapes that are supported > in regular expressions? > The documentation says: """That is, \n is converted to a single newline character, \r is converted to a linefeed, and so forth.""" All of the other escape sequences work as expected, except for \uNNNN and \UNNNNNNNN which aren't supported at all in re. I should probably also add \N{...} to the list for completeness. From guido at python.org Sun Dec 11 22:04:56 2011 From: guido at python.org (Guido van Rossum) Date: Sun, 11 Dec 2011 13:04:56 -0800 Subject: [Python-Dev] Omission in re.sub? In-Reply-To: <4EE516F4.4000208@mrabarnett.plus.com> References: <4EE50EB9.3000606@mrabarnett.plus.com> <4EE516F4.4000208@mrabarnett.plus.com> Message-ID: I guess the current rule is that any escapes referring to characters by a numeric value are not supported; this probably made some kind of sense because \1 etc. are backreferences. But since we're discouraging octal escapes anyway I think it's fine to improve over this. On Sun, Dec 11, 2011 at 12:47 PM, MRAB wrote: > On 11/12/2011 20:27, Guido van Rossum wrote: >> >> On Sun, Dec 11, 2011 at 12:12 PM, MRAB >> wrote: >>> >>> I've just come across an omission in re.sub which I hadn't noticed >>> before. >>> >>> In re.sub the replacement string can contain escape sequences, for >>> example: >>> >>>>>> repr(re.sub(r"x", r"\n", "axb")) >>> >>> "'a\\nb'" >>> >>> However: >>> >>>>>> repr(re.sub(r"x", r"\x0A", "axb")) >>> >>> "'a\\\\x0Ab'" >>> >>> Yes, it doesn't recognise "\xNN". >>> >>> Is there a reason for this? >>> >>> The regex module does the same, but is there any objection to me >>> fixing it in the regex module? (I'm thinking about compatibility >>> with re here.) 
>> >> >> As long as there's a way to place a single backslash in the output >> this seems fine to me, though I'm not sure it's important. Of course >> it will likely break some test... the test will then have to be >> fixed. >> >> I can't remember why we did this -- is there a full list of all the >> escapes that re.sub() interprets somewhere? I thought it was pretty >> limited. Maybe it's the related list of escapes that are supported >> in regular expressions? >> > The documentation says: """That is, \n is converted to a single newline > character, \r is converted to a linefeed, and so forth.""" > > All of the other escape sequences work as expected, except for \uNNNN > and \UNNNNNNNN which aren't supported at all in re. > > I should probably also add \N{...} to the list for completeness. -- --Guido van Rossum (python.org/~guido) From martin at v.loewis.de Sun Dec 11 23:03:41 2011 From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 11 Dec 2011 23:03:41 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> Message-ID: <4EE528BD.2040102@v.loewis.de> On 09.12.2011 10:09, Xavier Morel wrote: > On 2011-12-09, at 09:41 , Martin v. Löwis wrote: >>> a) The stdlib documentation should help users to choose the right >>> tool right from the start. Instead of using the totally >>> misleading wording that it uses now, it should be honest about >>> the performance characteristics of MiniDOM and should actively >>> suggest that those who don't know what to choose (or even *that* >>> they can choose) should not use MiniDOM in the first place. >> [...]
> > Minidom is inferior in interface flow and pythonicity, in terseness, > in speed, in memory consumption (even more so using cElementTree, and > that's not something which can be fixed unless minidom gets a C > accelerator), etc... Even after fixing minidom (if anybody has the time > and drive to commit to it), ET/cET should be preferred over it. I don't mind pointing people to ElementTree, despite that I disagree whether the ET interface is "superior" to DOM. It's Stefan's reasoning as to *why* people should be pointed to ET, and what words should be used to do that. IOW, I detest bashing some part of the standard library, just to urge users to use some other part of the standard library. People are still using PyXML, despite its not being maintained anymore. Telling them to replace 4DOM with minidom is much more appropriate than telling them to rewrite in ET. Regards, Martin From martin at v.loewis.de Sun Dec 11 23:07:07 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 11 Dec 2011 23:07:07 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> Message-ID: <4EE5298B.8090908@v.loewis.de> > For the various XML libraries, a message along the lines of "Note: The > module is a . If all you > are trying to do is read and write XML files, consider using the > xml.etree.ElementTree module instead". I wouldn't mind such a wording. I still would mind the changes that Stefan proposed (which are actually different from yours).
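To make the interface comparison concrete, here is the same small extraction task written against both stdlib modules. The sample document and helper names are invented for this illustration. The sketch compares API style only and says nothing about the speed or memory claims above.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

SOURCE = '<doc><item name="a">1</item><item name="b">2</item></doc>'

def items_with_minidom(source):
    # DOM style: find Element nodes, read attributes and child Text nodes.
    dom = xml.dom.minidom.parseString(source)
    return [(el.getAttribute("name"), el.firstChild.data)
            for el in dom.getElementsByTagName("item")]

def items_with_etree(source):
    # ElementTree style: attributes via .get(), text content via .text.
    root = ET.fromstring(source)
    return [(el.get("name"), el.text) for el in root.iter("item")]
```

Both return `[("a", "1"), ("b", "2")]`. Which one reads better is the matter of taste Martin and Xavier disagree on.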
Regards, Martin From martin at v.loewis.de Sun Dec 11 23:14:57 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 11 Dec 2011 23:14:57 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: References: <20111209013535.6fb38068@pitrou.net> <4EE1CA5D.70705@v.loewis.de> Message-ID: <4EE52B61.40801@v.loewis.de> On 09.12.2011 10:12, Nick Coghlan wrote: > On Fri, Dec 9, 2011 at 6:44 PM, "Martin v. Löwis" wrote: >> On 09.12.2011 01:35, Antoine Pitrou wrote: >>> On Fri, 09 Dec 2011 00:16:02 +0100 >>> victor.stinner wrote: >>>> >>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) >>>> + >>>> + Get a new copy of a Unicode object. >>>> + >>>> + .. versionadded:: 3.3 >>> >>> I'm not sure I understand. Why would you make a copy of an immutable >>> object? >> >> It can convert a unicode subtype object into an exact unicode >> object. >> >> I'd rename it to _PyUnicode_AsExactUnicode, and undocument it. > > Isn't it basically just exposing a C level version of the unicode() > builtin's behaviour? No. To call the unicode() builtin, do PyObject_CallFunction(&PyUnicode_Type, "O", param) or some such. PyUnicode_Copy doesn't correspond to any Python-level API. > While I agree the name could be better (and > PyUnicode_AsExactUnicode would certainly work), why make it private? I suggest to be minimalistic in extensions to the API. There should be a demonstrated need for an API before adding it, which I don't see in this case. In general, it will be difficult to find a demonstrable need for new APIs, since the majority (more than 99%) of API use cases is already covered by the abstract object API (i.e. what ceval uses). The unicode type in particular has a bad tradition of adding tons of functions to the C API, only so we find out a few releases later that the API is obsolete (e.g.
needs additional/different parameters), so we carry unused functions around just because some extension module may use them. Regards, Martin From python at mrabarnett.plus.com Sun Dec 11 23:36:32 2011 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 11 Dec 2011 22:36:32 +0000 Subject: [Python-Dev] Omission in re.sub? In-Reply-To: References: <4EE50EB9.3000606@mrabarnett.plus.com> <4EE516F4.4000208@mrabarnett.plus.com> Message-ID: <4EE53070.9010702@mrabarnett.plus.com> On 11/12/2011 21:04, Guido van Rossum wrote: > On Sun, Dec 11, 2011 at 12:47 PM, MRAB wrote: >> On 11/12/2011 20:27, Guido van Rossum wrote: >>> >>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB >>> wrote: >>>> >>>> I've just come across an omission in re.sub which I hadn't noticed >>>> before. >>>> >>>> In re.sub the replacement string can contain escape sequences, for >>>> example: >>>> >>>>>>> repr(re.sub(r"x", r"\n", "axb")) >>>> >>>> "'a\\nb'" >>>> >>>> However: >>>> >>>>>>> repr(re.sub(r"x", r"\x0A", "axb")) >>>> >>>> "'a\\\\x0Ab'" >>>> >>>> Yes, it doesn't recognise "\xNN". >>>> >>>> Is there a reason for this? >>>> >>>> The regex module does the same, but is there any objection to me >>>> fixing it in the regex module? (I'm thinking about compatibility >>>> with re here.) >>> >>> >>> As long as there's a way to place a single backslash in the output >>> this seems fine to me, though I'm not sure it's important. Of course >>> it will likely break some test... the test will then have to be >>> fixed. >>> >>> I can't remember why we did this -- is there a full list of all the >>> escapes that re.sub() interprets somewhere? I thought it was pretty >>> limited. Maybe it's the related list of escapes that are supported >>> in regular expressions? 
>>> >> The documentation says: """That is, \n is converted to a single newline >> character, \r is converted to a linefeed, and so forth.""" >> >> All of the other escape sequences work as expected, except for \uNNNN >> and \UNNNNNNNN which aren't supported at all in re. >> >> I should probably also add \N{...} to the list for completeness. >> > I guess the current rule is that any escapes referring to characters > by a numeric value are not supported; this probably made some kind of > sense because \1 etc. are backreferences. But since we're discouraging > octal escapes anyway I think it's fine to improve over this. > A pattern can contain them, even octal escapes (must be 3 digits). From martin at v.loewis.de Sun Dec 11 23:39:53 2011 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 11 Dec 2011 23:39:53 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> Message-ID: <4EE53139.8020500@v.loewis.de> > I can't recall anyone working on any substantial improvements during the > last six years or so, and the reason for that seems obvious to me. What do you think is the reason? It's not at all obvious to me. Regards, Martin From martin at v.loewis.de Sun Dec 11 23:40:47 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 11 Dec 2011 23:40:47 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: Message-ID: <4EE5316F.9060004@v.loewis.de> On 09.12.2011 16:09, Dirkjan Ochtman wrote: > On Fri, Dec 9, 2011 at 09:02, Stefan Behnel wrote: >> a) The stdlib documentation should help users to choose the right tool right >> from the start. >> b) cElementTree should finally lose its "special" status as a separate >> library and disappear as an accelerator module behind ElementTree. > > An at least somewhat informed +1 from me.
The ElementTree API is a > very good way to deal with XML from Python, and it deserves to be > promoted over the included alternatives. > > Let's deprecate the NiCad batteries and try to guide users toward the > Li-Ion ones. If you are proposing to deprecate minidom: -1 Regards, Martin From solipsis at pitrou.net Sun Dec 11 23:45:06 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 11 Dec 2011 23:45:06 +0100 Subject: [Python-Dev] cpython: Issue #5689: Add support for lzma compression to the tarfile module. References: Message-ID: <20111211234506.071db305@pitrou.net> On Sat, 10 Dec 2011 20:40:17 +0100 lars.gustaebel wrote: > > The :mod:`tarfile` module makes it possible to read and write tar > -archives, including those using gzip or bz2 compression. > +archives, including those using gzip, bz2 and lzma compression. > (:file:`.zip` files can be read and written using the :mod:`zipfile` module.) Perhaps there should be a "versionchanged" directive for lzma support? Regards Antoine. From martin at v.loewis.de Sun Dec 11 23:44:50 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 11 Dec 2011 23:44:50 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: <20111209203216.2c627d61@pitrou.net> References: <20111209013535.6fb38068@pitrou.net> <4EE258A2.8020902@haypocalc.com> <20111209203216.2c627d61@pitrou.net> Message-ID: <4EE53262.1080405@v.loewis.de> On 09.12.2011 20:32, Antoine Pitrou wrote: > On Fri, 09 Dec 2011 19:51:14 +0100 > Victor Stinner wrote: >> On 09/12/2011 01:35, Antoine Pitrou wrote: >>> On Fri, 09 Dec 2011 00:16:02 +0100 >>> victor.stinner wrote: >>>> >>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) >>>> + >>>> + Get a new copy of a Unicode object. >>>> + >>>> + .. versionadded:: 3.3 >>> >>> I'm not sure I understand. Why would you make a copy of an immutable >>> object?
>> >> PyUnicode_Copy() can be used to modify a string to create a new string >> with the same length. It is used for example by str.upper(), >> str.title(), ... (fixup()). > > Then the doc should mention that the returned string can be modified. > Otherwise it's a bit obscure why the function exists. I'm skeptical about this modification part. If you make a copy, it's not clear at all that the new characters that you put in will fit in range with the width of the unicode string. Even decreasing the ordinal of a character may be incorrect as the result may not be canonical anymore. Regards, Martin From solipsis at pitrou.net Sun Dec 11 23:46:09 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 11 Dec 2011 23:46:09 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: <4EE53262.1080405@v.loewis.de> References: <20111209013535.6fb38068@pitrou.net> <4EE258A2.8020902@haypocalc.com> <20111209203216.2c627d61@pitrou.net> <4EE53262.1080405@v.loewis.de> Message-ID: <1323643569.3366.19.camel@localhost.localdomain> On Sunday, 11 December 2011 at 23:44 +0100, "Martin v. Löwis" wrote: > On 09.12.2011 20:32, Antoine Pitrou wrote: > > On Fri, 09 Dec 2011 19:51:14 +0100 > > Victor Stinner wrote: > >> On 09/12/2011 01:35, Antoine Pitrou wrote: > >>> On Fri, 09 Dec 2011 00:16:02 +0100 > >>> victor.stinner wrote: > >>>> > >>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode) > >>>> + > >>>> + Get a new copy of a Unicode object. > >>>> + > >>>> + .. versionadded:: 3.3 > >>> > >>> I'm not sure I understand. Why would you make a copy of an immutable > >>> object? > >> > >> PyUnicode_Copy() can be used to modify a string to create a new string > >> with the same length. It is used for example by str.upper(), > >> str.title(), ... (fixup()). > > > > Then the doc should mention that the returned string can be modified. > > Otherwise it's a bit obscure why the function exists.
> > I'm skeptical about this modification part. If you make a copy, it's > not clear at all that the new characters that you put in will fit > in range with the width of the unicode string. Even decreasing the > ordinal of a character may be incorrect as the result may not be > canonical anymore. Ah, good point. And perhaps a good reason to make the API private. Regards Antoine. From python-dev at masklinn.net Sun Dec 11 23:47:45 2011 From: python-dev at masklinn.net (Xavier Morel) Date: Sun, 11 Dec 2011 23:47:45 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE528BD.2040102@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> <4EE528BD.2040102@v.loewis.de> Message-ID: <4E7DB3D7-F4DF-40D8-981D-23F71658EBCB@masklinn.net> On 2011-12-11, at 23:03 , Martin v. Löwis wrote: > People are still using PyXML, despite it's not being maintained anymore. > Telling them to replace 4DOM with minidom is much more appropriate than > telling them to rewrite in ET. From my understanding, Stefan's suggestion is mostly aimed at "new" Python users trying to manipulate XML and not knowing what to use (yet). It's not about telling people to rewrite existing codebases (it's a good idea as well when possible, as far as I'm concerned, but it's a different issue). From martin at v.loewis.de Sun Dec 11 23:50:50 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 11 Dec 2011 23:50:50 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> Message-ID: <4EE533CA.4000605@v.loewis.de> On 09.12.2011 11:17, Nick Coghlan wrote: > On Fri, Dec 9, 2011 at 8:03 PM, Terry Reedy wrote: >> On 12/8/2011 8:39 PM, Vinay Sajip wrote: >>> on an >>> >>> entire codebase (for example, using setup.py with flags to run 2to3 >>> during setup). >> >> >> Oh. That explains the 'slow' complaint.
> > As Chris pointed out though, the real problem with the "repeatedly run > 2to3" workflow is that it can make interpreting tracebacks from the > field *really* hard. It's hard, but not *really* hard. In most cases, the line numbers in the 2to3 result are exactly the same as in the original, and if not, the quoted source in the traceback will give you enough context to find the source line of the problem. Regards, Martin From martin at v.loewis.de Sun Dec 11 23:58:42 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 11 Dec 2011 23:58:42 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <4EE239A0.2020004@netwok.org> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> <4EE239A0.2020004@netwok.org> Message-ID: <4EE535A2.1000002@v.loewis.de> > When running 2to3 from a setup.py script, does it run on the whole > codebase or only files that are found newer by the make-like > timestamp-based dependency system? If you run "build" repeatedly (e.g. in a development cycle), then it will process only the modified files (comparing time stamps between the build/ area and the original source). Regards, Martin From martin at v.loewis.de Mon Dec 12 00:00:43 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 12 Dec 2011 00:00:43 +0100 Subject: [Python-Dev] 2to3 and timestamps In-Reply-To: <20111209174631.68a311f5@pitrou.net> References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko> <1323325916.2710.39.camel@thinko> <4EE239A0.2020004@netwok.org> <20111209174631.68a311f5@pitrou.net> Message-ID: <4EE5361B.4050408@v.loewis.de> >> When running 2to3 from a setup.py script, does it run on the whole >> codebase or only files that are found newer by the make-like >> timestamp-based dependency system?
If it's the former, as some messages >> seem to show (sorry no time to test right now), ISTM we can fix >> distutils to do the latter (unless there are bugs due to import >> rewriting to use explicit relative imports when there are extension >> modules - blergh). > > It would be better to teach 2to3 to do it by itself. Not everybody runs > 2to3 through a setup.py script. For the 2to3 command line tool, the issue is where it shall place the output. It currently supports writing diffs to stdout (without saving any conversion result), and overwriting the original file (which means that it loses the original files). So before you try to consider incremental output, you need to consider original-preserving saves first. Regards, Martin From martin at v.loewis.de Mon Dec 12 00:04:24 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 12 Dec 2011 00:04:24 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <7DEE32A7-1426-4E93-8708-BDF3B0CAF8EC@twistedmatrix.com> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <1323408839.2710.143.camel@thinko> <7DEE32A7-1426-4E93-8708-BDF3B0CAF8EC@twistedmatrix.com> Message-ID: <4EE536F8.2010209@v.loewis.de> > Even in the plans that involve 2to3 > though, "drop everything prior to 2.6" was always supposed to be step 0, > so "single codebase" adds much less of a burden than I thought. Are you talking about general porting, or about Twisted? It is a common misconception that "drop everything prior to 2.6" was a recommended step 0 for porting to Python 3. That was never recommended. Instead, what *was* recommended is "port to Python 2.6", which for many projects already supporting, say, 2.5, was a no-op, so people read more into that than was actually necessary.
With the project ported to 2.6, you could then make use of the 3k warnings to learn what issues you would face when porting to 3k. Regards, Martin From victor.stinner at haypocalc.com Mon Dec 12 01:54:44 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 12 Dec 2011 01:54:44 +0100 Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage() In-Reply-To: <20111209203216.2c627d61@pitrou.net> References: <4EE258A2.8020902@haypocalc.com> <20111209203216.2c627d61@pitrou.net> Message-ID: <1618219.EXT6TC1vln@ned> On Friday, 9 December 2011 20:32:16, Antoine Pitrou wrote: > ... it's a bit obscure why the function exists. Yeah ok, I marked the function as private: renamed to _PyUnicode_Copy() and I undocumented it. Victor From guido at python.org Mon Dec 12 04:14:48 2011 From: guido at python.org (Guido van Rossum) Date: Sun, 11 Dec 2011 19:14:48 -0800 Subject: [Python-Dev] Omission in re.sub? In-Reply-To: <4EE53070.9010702@mrabarnett.plus.com> References: <4EE50EB9.3000606@mrabarnett.plus.com> <4EE516F4.4000208@mrabarnett.plus.com> <4EE53070.9010702@mrabarnett.plus.com> Message-ID: On Sun, Dec 11, 2011 at 2:36 PM, MRAB wrote: > On 11/12/2011 21:04, Guido van Rossum wrote: >> >> On Sun, Dec 11, 2011 at 12:47 PM, MRAB wrote: >>> >>> On 11/12/2011 20:27, Guido van Rossum wrote: >>>> >>>> >>>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB >>>> wrote: >>>>> >>>>> >>>>> I've just come across an omission in re.sub which I hadn't noticed >>>>> before. >>>>> >>>>> In re.sub the replacement string can contain escape sequences, for >>>>> example: >>>>> >>>>>>>> repr(re.sub(r"x", r"\n", "axb")) >>>>> >>>>> >>>>> "'a\\nb'" >>>>> >>>>> However: >>>>> >>>>>>>> repr(re.sub(r"x", r"\x0A", "axb")) >>>>> >>>>> >>>>> "'a\\\\x0Ab'" >>>>> >>>>> Yes, it doesn't recognise "\xNN". >>>>> >>>>> Is there a reason for this? >>>>> >>>>> The regex module does the same, but is there any objection to me >>>>> fixing it in the regex module?
(I'm thinking about compatibility >>>>> with re here.) >>>> >>>> >>>> >>>> As long as there's a way to place a single backslash in the output >>>> this seems fine to me, though I'm not sure it's important. Of course >>>> it will likely break some test... the test will then have to be >>>> fixed. >>>> >>>> I can't remember why we did this -- is there a full list of all the >>>> escapes that re.sub() interprets somewhere? I thought it was pretty >>>> limited. Maybe it's the related list of escapes that are supported >>>> in regular expressions? >>>> >>> The documentation says: """That is, \n is converted to a single newline >>> character, \r is converted to a linefeed, and so forth.""" >>> >>> All of the other escape sequences work as expected, except for \uNNNN >>> and \UNNNNNNNN which aren't supported at all in re. >>> >>> I should probably also add \N{...} to the list for completeness. >>> >> I guess the current rule is that any escapes referring to characters >> by a numeric value are not supported; this probably made some kind of >> sense because \1 etc. are backreferences. But since we're discouraging >> octal escapes anyway I think it's fine to improve over this. >> > A pattern can contain them, even octal escapes (must be 3 digits). Fine, then I think we should model this. Though I think that we could start deprecating octal escapes in patterns so that eventually we can support over 99 backreferences. So maybe we should just not start supporting octal in the substitution string now. -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Dec 12 08:32:37 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 11 Dec 2011 23:32:37 -0800 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE528BD.2040102@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> <4EE528BD.2040102@v.loewis.de> Message-ID: <4EE5AE15.7060208@stoneleaf.us> Martin, You seem heavily invested in minidom. 
In the near future I will need to parse and rewrite parts of an xml file created by a third-party program (PrintShopMail, for the curious). It contains both binary and textual data. Would you recommend minidom for this purpose? What other purposes would you recommend minidom for? xml-confused-ly yours, ~Ethan~ (Comments by others are, of course, also welcome. :) From chrism at plope.com Mon Dec 12 09:40:42 2011 From: chrism at plope.com (Chris McDonough) Date: Mon, 12 Dec 2011 03:40:42 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> Message-ID: <1323679242.2710.350.camel@thinko> On Sat, 2011-12-10 at 15:55 +1000, Nick Coghlan wrote: > So I'm back to being -1 on the idea of adding back u'' literals for > 3.3. Instead, people should explicitly call str() on any literals that > they want to be actual str instances both in 3.x and in 2.x when the > unicode literals future import is in effect. After thinking on it a while, I can't see anything wrong with this strategy except for the 10X performance hit for defining native literals. Truth be told, in the vast majority of WSGI apps only high-level WSGI libraries (like WebOb and Werkzeug) and standalone middleware really needs to work with native strings. And the middleware really should be using the high-level libraries to parse WSGI anyway. So there are a finite number of places where it's actually a real issue. As someone who ported WebOb and other stuff built on top of it to Python 3 without using "from __future__ import unicode_literals", I'm kinda sad that to be using best practice I'll have to go back and flip the polarity on everything. 
It's my cross to bear, though. If I have any issue with it in the future I'll bring u'' back up. - C From stefan_ml at behnel.de Mon Dec 12 10:04:22 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 12 Dec 2011 10:04:22 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE53139.8020500@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> <4EE53139.8020500@v.loewis.de> Message-ID: "Martin v. Löwis", 11.12.2011 23:39: >> I can't recall anyone working on any substantial improvements during the >> last six years or so, and the reason for that seems obvious to me. > > What do you think is the reason? It's not at all obvious to me. Just to repeat myself for the third time here: lack of interest. Stefan From lars at gustaebel.de Mon Dec 12 10:28:16 2011 From: lars at gustaebel.de (lars at gustaebel.de) Date: Mon, 12 Dec 2011 10:28:16 +0100 Subject: [Python-Dev] cpython: Issue #5689: Add support for lzma compression to the tarfile module. In-Reply-To: <20111211234506.071db305@pitrou.net> References: <20111211234506.071db305@pitrou.net> Message-ID: <20111212092815.GA19922@axis.g33x.de> On Sun, Dec 11, 2011 at 11:45:06PM +0100, Antoine Pitrou wrote: > On Sat, 10 Dec 2011 20:40:17 +0100 > lars.gustaebel wrote: > > > > The :mod:`tarfile` module makes it possible to read and write tar > > -archives, including those using gzip or bz2 compression. > > +archives, including those using gzip, bz2 and lzma compression. > > (:file:`.zip` files can be read and written using the :mod:`zipfile` module.) > > Perhaps there should be a "versionchanged" directive for lzma support? This is now fixed. -- Lars Gustäbel lars at gustaebel.de There's no present. There's only the immediate future and the recent past.
(George Carlin) From stefan_ml at behnel.de Mon Dec 12 10:59:23 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 12 Dec 2011 10:59:23 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE528BD.2040102@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> <4EE528BD.2040102@v.loewis.de> Message-ID: "Martin v. Löwis", 11.12.2011 23:03: > On 09.12.2011 10:09, Xavier Morel wrote: >> On 2011-12-09, at 09:41 , Martin v. Löwis wrote: >>>> a) The stdlib documentation should help users to choose the right >>>> tool right from the start. Instead of using the totally >>>> misleading wording that it uses now, it should be honest about >>>> the performance characteristics of MiniDOM and should actively >>>> suggest that those who don't know what to choose (or even *that* >>>> they can choose) should not use MiniDOM in the first place. >>> > [...] >> >> Minidom is inferior in interface flow and pythonicity, in terseness, >> in speed, in memory consumption (even more so using cElementTree, and >> that's not something which can be fixed unless minidom gets a C >> accelerator), etc... Even after fixing minidom (if anybody has the time >> and drive to commit to it), ET/cET should be preferred over it. > > I don't mind pointing people to ElementTree, despite that I disagree > whether the ET interface is "superior" to DOM. Yes, that's clearly a point where we agree to disagree, and I understand that you are as biased towards minidom as I am biased towards ElementTree. However, I think I made it clear that the implementation of cElementTree (and lxml.etree as well, for that purpose) is largely superior to MiniDOM in terms of performance, for any sensible meaning of the word performance. And I'm also convinced that the API is largely superior in terms of usability. ET certainly matches Python as a language much better than MiniDOM. But that's just my personal opinion.
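To make the API side of this comparison concrete, here is a small sketch that extracts the same data with `xml.etree.ElementTree` and with `xml.dom.minidom`. It is not a benchmark and says nothing about performance. The sample document and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Made-up sample document for the comparison.
SAMPLE = "<items><item name='a'>1</item><item name='b'>2</item></items>"


def values_with_elementtree(xml_text):
    root = ET.fromstring(xml_text)
    # Attributes come from .get(), text content from .text.
    return {item.get("name"): int(item.text) for item in root.iter("item")}


def values_with_minidom(xml_text):
    document = minidom.parseString(xml_text)
    result = {}
    for node in document.getElementsByTagName("item"):
        # DOM text lives in child Text nodes that have to be collected by hand.
        text = "".join(
            child.data for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        result[node.getAttribute("name")] = int(text)
    document.unlink()  # break DOM reference cycles once done
    return result


print(values_with_elementtree(SAMPLE))
print(values_with_minidom(SAMPLE))
```

Both functions return the same dictionary. The difference lies in how much node-level bookkeeping the DOM version needs.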
> It's Stefan's reasoning > as to *why* people should be pointed to ET, and what words should be > used to do that. IOW, I detest bashing some part of the standard > library, just to urge users to use some other part of the standard library. I'm all for finding a good way of putting it into words, as long as it keeps uninformed users from taking the wrong decision and getting the wrong idea of how complicated and slow Python is. > People are still using PyXML, despite it's not being maintained anymore. My experience with that is that it's only *new* users that are still running into PyXML by accident, because they didn't see that it's a dead project and they find it through ancient web pages that tell them that they need it because "it's the way to do XML in Python" and "if minidom is not enough, use PyXML". Maybe we should "misuse" the stdlib documentation to clear that up as well. "PyXML" is just too attractive a name for a dead project. Just look through the xml-sig page, basically all requests regarding PyXML during the last five years deal with problems in installing it, i.e. *before* even starting to use it. So you can't use this to claim that people really *are* still using it. > Telling them to replace 4DOM with minidom is much more appropriate Do you actually have any evidence that anyone is still actively using 4DOM? > than telling them to rewrite in ET. I usually encourage people to rewrite minidom code for ET. It makes the code simpler, more readable, more maintainable and much faster. Stefan From stefan_ml at behnel.de Mon Dec 12 11:08:44 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 12 Dec 2011 11:08:44 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> <4EE528BD.2040102@v.loewis.de> Message-ID: Stefan Behnel, 12.12.2011 10:59: > Just look through the xml-sig page Hmm, I meant "xml-sig mailing list archive" here ... 
Stefan From pje at telecommunity.com Mon Dec 12 15:50:46 2011 From: pje at telecommunity.com (PJ Eby) Date: Mon, 12 Dec 2011 09:50:46 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <1323679242.2710.350.camel@thinko> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> Message-ID: On Mon, Dec 12, 2011 at 3:40 AM, Chris McDonough wrote: > Truth be told, in the vast majority of WSGI apps only high-level WSGI > libraries (like WebOb and Werkzeug) and standalone middleware really > needs to work with native strings. And the middleware really should be > using the high-level libraries to parse WSGI anyway. So there are a > finite number of places where it's actually a real issue. > And those only if they're using "six" or a similar joint-codebase strategy, *and* using unicode_literals in a 2.x module that also does WSGI. If they're using 2to3 and stick with explicit u'', they'll be fine. Unfortunately, AFAIR, nobody in the PEP 3333 discussions brought up either the unicode_literals import OR the strategy of using a common codebase, so 2to3 on plain code and writing new Python3 code were the only porting scenarios discussed. (Not that I'm sure it would've made a difference, as I'm not sure what we could have done differently that would still support simple Python3 code and easy 2to3 porting.) As someone who ported WebOb and other stuff built on top of it to Python > 3 without using "from __future__ import unicode_literals", I'm kinda sad > that to be using best practice I'll have to go back and flip the > polarity on everything. Eh? If you don't need unicode_literals, what's the problem? 
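The native-string requirement discussed here can be sketched as a minimal WSGI application. The sketch assumes Python 3 and the PEP 3333 rules: the status line and header names/values passed to `start_response` are native `str`, and the response body is an iterable of `bytes`. `call_app` is a hypothetical test driver with a stub `start_response`, not part of any library.

```python
def app(environ, start_response):
    body = "caf\u00e9\n".encode("utf-8")  # the response body must be bytes
    # Status and header names/values must be native strings; wrapping the
    # literals in str() keeps them native even under unicode_literals on 2.x.
    status = str("200 OK")
    headers = [
        (str("Content-Type"), str("text/plain; charset=utf-8")),
        (str("Content-Length"), str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call_app(application):
    """Hypothetical test driver: calls the app with an empty environ and
    captures the arguments it passes to start_response."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # stand-in for the write() callable

    body = b"".join(application({}, start_response))
    return captured["status"], captured["headers"], body
```

In a single-codebase module that also uses `from __future__ import unicode_literals`, the `str()` wrappers are what keep the status and headers native on Python 2.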
From chrism at plope.com Mon Dec 12 22:18:40 2011 From: chrism at plope.com (Chris McDonough) Date: Mon, 12 Dec 2011 16:18:40 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> Message-ID: <1323724720.2710.388.camel@thinko> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote: > As someone who ported WebOb and other stuff built on top of it > to Python > 3 without using "from __future__ import unicode_literals", I'm > kinda sad > that to be using best practice I'll have to go back and flip > the > polarity on everything. > > > Eh? If you don't need unicode_literals, what's the problem? Porting the WebOb code sucked. It's only about 5K lines of code but the porting effort took me about 80 hours. Some of the problem is certainly my own idiocy, but some of it is just because straddling code across Python 2 and Python 3 currently requires that you change lots and lots of code for suspect benefit.
- C From ericsnowcurrently at gmail.com Mon Dec 12 22:44:56 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 12 Dec 2011 14:44:56 -0700 Subject: [Python-Dev] (no subject) Message-ID: Guido posted this on Google+: > IEEE/ISO are working on a draft document about Python vulunerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (its link is near the bottom of the web page). Will this document have a broad use, such that we should make sure it is accurate (to avoid any future confusion)? I skimmed through and found that it covers a lot of ground, not necessarily about vulnerabilities, with some inaccuracies but not a ton that I noticed. If it doesn't matter then no big deal. Just thought I'd bring it up. -eric From ericsnowcurrently at gmail.com Mon Dec 12 22:46:32 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 12 Dec 2011 14:46:32 -0700 Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities Message-ID: re-sending with subject :) On Mon, Dec 12, 2011 at 2:44 PM, Eric Snow wrote: > Guido posted this on Google+: > >> IEEE/ISO are working on a draft document about Python vulunerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (its link is near the bottom of the web page). > > Will this document have a broad use, such that we should make sure it > is accurate (to avoid any future confusion)? 
I skimmed through and > found that it covers a lot of ground, not necessarily about > vulnerabilities, with some inaccuracies but not a ton that I noticed. > If it doesn't matter then no big deal. Just thought I'd bring it up. > > -eric From guido at python.org Mon Dec 12 22:52:49 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 12 Dec 2011 13:52:49 -0800 Subject: [Python-Dev] (no subject) In-Reply-To: References: Message-ID: The authors are definitely interested in feedback! Best probably to post it to my G+ thread. On Mon, Dec 12, 2011 at 1:44 PM, Eric Snow wrote: > Guido posted this on Google+: > >> IEEE/ISO are working on a draft document about Python vulunerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (its link is near the bottom of the web page). > > Will this document have a broad use, such that we should make sure it > is accurate (to avoid any future confusion)? I skimmed through and > found that it covers a lot of ground, not necessarily about > vulnerabilities, with some inaccuracies but not a ton that I noticed. > If it doesn't matter then no big deal. Just thought I'd bring it up.
> > -eric -- --Guido van Rossum (python.org/~guido) From victor.stinner at haypocalc.com Mon Dec 12 23:56:50 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 12 Dec 2011 23:56:50 +0100 Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities In-Reply-To: References: Message-ID: <4EE686B2.6040806@haypocalc.com> >>> IEEE/ISO are working on a draft document about Python vulunerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (its link is near the bottom of the web page). Random comments. I didn't read everything. -- "Vulnerability descriptions for the language Python Standards and terminology based on the 3.x standard only." (...) "Automatic conversion also occurs when an integer becomes too large to fit within the constraints of the large integer specified in the language (typically C) used to create the Python interpreter. On a 32-bit machine this would be the range -2^30 to 2^30-1. When an integer becomes too large to fit into that range it is converted to an extended precision integer of arbitrary length." (...) "otherwise, if either argument is a floating point number, the other is converted to floating point; otherwise, if either argument is a long integer, the other is converted to long integer;" 10 and 2**1024 have the same type (int) in Python 3. I don't really understand what "extended precision" means. There are no more "long" integers. -- "Python.16 Wrap-around Error [XYY]" (...) "...
exception handling for floating point operations cannot be assumed to catch this type of error because they are not standardized in the underlying C language." Can you give me an example of such a problem? If there is really an issue, can we configure the FPU to catch such errors? pyfpe.h has PyFPE_START_PROTECT and PyFPE_END_PROTECT macros, but they do nothing by default. You can enable this protection using ./configure --with-fpectl. -- "if(y > 0):print(x)" Even if this example is valid, it is surprising to see parentheses around the condition in Python. "if y > 0: print(x)", or even "if y > 0:" followed by "print(x)" indented on its own line, would be better. -- "Python also encourages structured programming by not introducing any of the following constructs which could easily lead to unstructured code: - Labels and branching statements such as GO TO; - Case, GO TO DEPENDING, EVALUATE, switch and other statements that branch dependent on a variable's value; and - ALTER which changes GO TO label to branch to a different label." You have to modify the language (and so build your own interpreter) to add a "goto" instruction to Python. Or do you mean that someone may want to implement something like goto using exceptions, for example? -- "When sorting a list using the sort() method, attempting to inspect or mutate the content of the list will result in undefined behaviour." Oh... I never imagined such a "use case". Let's try:

$ ./python
Python 3.3.0a0 (default:3ad7d01acbf4+, Dec 12 2011, 21:07:55)
>>> def hack(x):
...     mylist.append(10)
...     return
...
>>> mylist=[1]
>>> mylist.sort(key=hack)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list modified during sort

Same behaviour with Python 2.7 and 3.2: so the Python behaviour is defined, you get a ValueError. Are there other ways to inspect or mutate a list while sorting it?
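Victor's experiment can be extended into a small sketch of what CPython actually does. This is an implementation detail, not a language guarantee; the documentation only says the effect is undefined. In CPython, the list appears empty to code that runs during the sort (here, the key function), and a mutation it can detect raises `ValueError` once sorting finishes. The function names are illustrative.

```python
def observe_length_during_sort():
    data = [3, 1, 2]
    seen = []

    def key(item):
        seen.append(len(data))  # CPython makes the list look empty here
        return item

    data.sort(key=key)
    return data, seen


def mutate_during_sort():
    data = [3, 1, 2]

    def key(item):
        data.append(10)  # a mutation CPython detects after sorting
        return item

    try:
        data.sort(key=key)
    except ValueError as exc:
        return str(exc)
    return None


print(observe_length_during_sort())
print(mutate_during_sort())
```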
-- "The sequence of keys in a dictionary is undefined because the hashing function used to index the keys is unspecified therefore different implementations are likely to yield different sequences." Correct. You might mention that collections.OrderedDict has a defined behaviour: it lists keys (and values) in the insertion order. -- "Mixing tabs and spaces to indent is defined differently for UNIX and non-UNIX platforms;" You can use the -tt command line option to raise an IndentationError (a block can still be indented using spaces and tabs). Victor From ncoghlan at gmail.com Tue Dec 13 01:14:08 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 13 Dec 2011 10:14:08 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> Message-ID: On Tue, Dec 13, 2011 at 12:50 AM, PJ Eby wrote: > Unfortunately, AFAIR, nobody in the PEP 3333 discussions brought up either > the unicode_literals import OR the strategy of using a common codebase, so > 2to3 on plain code and writing new Python3 code were the only porting > scenarios discussed. (Not that I'm sure it would've made a difference, as > I'm not sure what we could have done differently that would still support > simple Python3 code and easy 2to3 porting.) That's not web-sig's fault though - it's only as people have been trying it and *succeeding* that we've come to realise that single code base approaches are significantly more feasible than we originally anticipated.
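Here is a minimal sketch of the 2.6+ single-codebase literal idiom: text literals come from `unicode_literals`, native strings from `str()`, and binary data from `b''`. On Python 3 the future import is a no-op, so the types are `str`, `str` and `bytes`. On 2.6+ the same source should give `unicode`, the byte-based native `str`, and `str` respectively. The constant names and the `describe` helper are illustrative.

```python
from __future__ import unicode_literals  # must be the first statement in the module

TEXT = "caf\u00e9"            # text: unicode on 2.6+ with the import, str on 3.x
NATIVE = str("Content-Type")  # native string: byte-based str on 2.x, str on 3.x
BINARY = b"\x00\xff"          # binary data on both


def describe(value):
    """Return the name of a value's runtime type."""
    return type(value).__name__


print(describe(TEXT), describe(NATIVE), describe(BINARY))
```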
Now, depending on whether you need to support 2.5 and earlier, we even have a reasonable answer to the native strings problem:

If supporting only 2.6+, use "from __future__ import unicode_literals" and the 'str' builtin:

    Import at top of module: "from __future__ import unicode_literals"
    Text: ""
    Native: str("")
    Binary: b""

If also supporting 2.5 and earlier, use "six" (or an equivalent compatibility module):

    Import at top of module: "from six import u, b"
    Text: u("")
    Native: ""
    Binary: b("")

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From wolfson at gmail.com Tue Dec 13 04:56:16 2011 From: wolfson at gmail.com (Ben Wolfson) Date: Mon, 12 Dec 2011 19:56:16 -0800 Subject: [Python-Dev] str.format implementation Message-ID: Hi, I'm hoping to get some kind of consensus about the divergences between the implementation and documentation of str.format (http://mail.python.org/pipermail/python-dev/2011-June/111860.html and the linked bug report contain examples of the divergences). These pertain to the arg_name, attribute_name, and element_index fields of the grammar in the docs:

    replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
    field_name        ::= arg_name ("." attribute_name | "[" element_index "]")*
    arg_name          ::= [identifier | integer]
    attribute_name    ::= identifier
    element_index     ::= integer | index_string
    index_string      ::= <any source character except "]"> +

Nothing definitive emerged from the last round of discussion, and as far as I can recall there are now three proposals for what kind of changes might be worth making:

(1) the implementation should conform to the docs;*
(2) like (1) with the change that element_index should be changed to "integer | identifier" (rendering index_string otiose);
(3) like (1) with the change that index_string should be changed to ''.
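To make the divergence concrete, here is a short probe of how CPython's str.format currently interprets field_name, as a sketch. It records observed behaviour on CPython 3 (other versions may differ): an element_index made only of decimal digits is converted to an int, and anything else, including text that looks like an integer literal such as "0x10" or "-1", is used as a string key.

```python
# Observed CPython behaviour of str.format field names, not a statement of
# what the documented grammar promises.

# "." attribute_name: attribute lookup on the argument.
attr = "{0.real}".format(3)

# "[" element_index "]" consisting only of decimal digits becomes an int index.
by_int = "{0[1]}".format([10, 20])

# Anything else is passed through as a *string* key, even if it looks like an
# integer literal: "0x10" and "-1" are looked up as the strings "0x10" and "-1".
hex_key = "{0[0x10]}".format({"0x10": "hex"})
neg_key = "{0[-1]}".format({"-1": "neg"})

# Consequently a negative index on a list fails instead of counting from the end.
try:
    "{0[-1]}".format([10, 20])
    neg_on_list = "no error"
except TypeError as exc:
    neg_on_list = type(exc).__name__

print(attr, by_int, hex_key, neg_key, neg_on_list)
```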
* the docs link "integer" to http://docs.python.org/reference/lexical_analysis.html#grammar-token-integer but the current implementation only allows decimal integers, which seems reasonable and worth retaining. (2) was suggested by Greg Ewing on python-dev and (3) by Petri Lehtinen in the bug report. (Petri actually suggested that braces be disallowed except for the nesting in the format_spec, but it comes to the same thing.) None of these should be difficult to implement; patches exist for (1) and (2). (2) and (3) would lead to format strings that are easier for the programmer to parse visually; (1) would make the indexing part of the replacement field conform more closely to the way indexing with strings behaves in Python generally, where arbitrary strings can be used. (It wouldn't conform exactly, obviously, since ']' would still be excluded.) I personally would prefer (1) to (2) or (3), and (3) to (2), had I my druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of them to nothing (or to changing the docs to reflect the current batty behavior). -- Ben Wolfson "Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure."
[Larousse, "Drink" entry] From jimjjewett at gmail.com Tue Dec 13 08:09:02 2011 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 13 Dec 2011 02:09:02 -0500 Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions Message-ID: (see http://www.python.org/dev/peps/pep-0393/ and http://hg.python.org/cpython/file/6f097ff9ac04/Include/unicodeobject.h )

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    struct {
        unsigned int interned:2;
        unsigned int kind:2; /* now 3 in implementation */
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int ready:1;
    } state;
    wchar_t *wstr;
} PyASCIIObject;

typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} PyCompactUnicodeObject;

typedef struct {
    PyCompactUnicodeObject _base;
    union {
        void *any;
        Py_UCS1 *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } data;
} PyUnicodeObject;

(1) Why is PyObject_HEAD used instead of PyObject_VAR_HEAD? Is it because of the names (.length vs .size), or a holdover from when unicode (as opposed to str) did not expect to be compact, or is there a deeper reason?

(2) Why does PyASCIIObject have a wstr member, and why does PyCompactUnicodeObject have wstr_length? As best I can tell from the PEP or header file, wstr is only meaningful when either:

(2a) wstr is shared with (and redundant to) the canonical representation -- which will therefore not be ASCII. So wstr (and wstr_length) shouldn't need to be represented explicitly, and certainly not in the PyASCIIObject base.

or

(2b) The string is a "Legacy String" (and PyUnicode_READY has not been called). Because it is a Legacy String, the object header must already be a full PyUnicodeObject, and the wstr fields could at least be stored there.

I'm also not sure why wstr can't be stored in the existing .data member -- once PyUnicode_READY is called, it will either be there (shared) or be discarded. Are there other times when the wstr will be explicitly re-filled and cached?
(3) I would feel much less nervous if the remaining 4 values of PyUnicode_Kind were explicitly reserved, and the macros raised an error when they showed up. (Better still would be to allow other values, and to have the macros delegate to some attribute on the (sub) type object.) Discussion on py-ideas strongly suggested that people should not be rolling their own string representations, and that it won't really save as much as people think it will, etc ... but I'm not sure that saying "do it without inheritance" is the best solution -- and that is what treating kind as an exhaustive list does. -jJ From martin at v.loewis.de Tue Dec 13 08:55:02 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Tue, 13 Dec 2011 08:55:02 +0100 Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions In-Reply-To: References: Message-ID: <4EE704D6.5000901@v.loewis.de> > (1) Why is PyObject_HEAD used instead of PyObject_VAR_HEAD? It is > because of the names (.length vs .size), or a holdover from when > unicode (as opposed to str) did not expect to be compact, or is there > a deeper reason? The unicode object is not a var object. In a var object, tp_itemsize gives the element size, which is not possible for unicode objects, since the itemsize may vary by instance. In addition, not all instances have the items after the base object (plus the size of the base object in tp_basicsize is also not always correct). > (2) Why does PyASCIIObject have a wstr member, and why does > PyCompactUnicodeObject have wstr_length? As best I can tell from the > PEP or header file, wstr is only meaningful when either: No. wstr is most of all relevant if someone calls PyUnicode_AsUnicode(AndSize); any unicode object might get the wstr pointer filled out at some point. It can be shared only if sizeof(Py_UNICODE) matches the canonical width of the string. wstr_length is only relevant if wstr is not NULL.
For a pure ASCII string (and also for Latin-1 and other BMP strings), the wstr length will always equal the canonical length (number of code points). Only for ASCII objects was the optimization made to drop the wstr_length from the representation. > I'm also not sure why wstr can't be stored in the existing > .data member -- once PyUnicode_READY > is called, it will either be there (shared) or be discarded. Most objects won't have the .data member. For those that do, .data holds the canonical representation (and *only* after PyUnicode_READY has been called). > (3) I would feel much less nervous if the remaining 4 values of > PyUnicode_Kind were explicitly reserved, and the macros raised an > error when they showed up. (Better still would be to allow other > values, and to have the macros delegate to some attribute on the (sub) > type object.) > > Discussion on py-ideas strongly suggested that people should not be > rolling their own string string representations, and that it won't > really save as much as people think it will, etc ... but I'm not sure > that saying "do it without inheritance" is the best solution -- and > that is what treating kind as an exhaustive list does. If people use C, they can construct all kinds of "illegal" representations, for any object (e.g. lists where the stored length differs from the actual length, dictionaries where key and value are switched, and so on). If they do that, they likely get crashes and other failures, so they quickly stop doing it. In the specific case of kind values: many places will either work incorrectly, or have an assertion in debug mode already if an unexpected kind is encountered. I don't mind adding such checks to more places, but I also don't see a need to explicitly care about this specific class of bugs where people would have to deliberately try to "cheat".
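The "kinds" Martin describes can be observed from Python code, as a rough sketch. This assumes CPython 3.3 or later and treats sys.getsizeof as an implementation detail. Under PEP 393 a compact string stores every code point at the width of its widest one (1, 2 or 4 bytes), so the difference between the sizes of two strings of different lengths cancels out the fixed header (PyASCIIObject or PyCompactUnicodeObject) and leaves only the per-character width.

```python
import sys


def bytes_per_char(ch):
    """Marginal storage cost of one code point ch in a compact CPython str.

    Differencing the sizes of a 200-char and a 100-char string cancels the
    fixed object header, leaving the per-code-point width of the string's kind.
    """
    return (sys.getsizeof(ch * 200) - sys.getsizeof(ch * 100)) // 100


# ASCII, Latin-1, BMP (EURO SIGN) and non-BMP code points.
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    print(hex(ord(ch)), bytes_per_char(ch))
```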
Regards, Martin From raymond.hettinger at gmail.com Tue Dec 13 09:37:20 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 13 Dec 2011 00:37:20 -0800 Subject: [Python-Dev] str.format implementation In-Reply-To: References: Message-ID: On Dec 12, 2011, at 7:56 PM, Ben Wolfson wrote: > I personally would prefer (1) to (2) or (3), and (3) to (2), had I my > druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of > them to nothing (or to changing the docs to reflect the current batty > behavior). +1 on changing the batty behavior. Raymond From ncoghlan at gmail.com Tue Dec 13 11:11:07 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 13 Dec 2011 20:11:07 +1000 Subject: [Python-Dev] str.format implementation In-Reply-To: References: Message-ID: On Tue, Dec 13, 2011 at 6:37 PM, Raymond Hettinger wrote: > > On Dec 12, 2011, at 7:56 PM, Ben Wolfson wrote: > > I personally would prefer (1) to (2) or (3), and (3) to (2), had I my > druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of > them to nothing (or to changing the docs to reflect the current batty > behavior). > > > +1 on changing the batty behavior. Skimming my comments from last time this came up, +1 on just going with what the docs say. The PEP underspecified it, so taking the docs as the spec for this aspect seems like a reasonable course of action. Cheers, Nick. -- Nick Coghlan   |   ncoghlan at gmail.com   |
Brisbane, Australia From petri at digip.org Tue Dec 13 10:44:53 2011 From: petri at digip.org (Petri Lehtinen) Date: Tue, 13 Dec 2011 11:44:53 +0200 Subject: [Python-Dev] str.format implementation In-Reply-To: References: Message-ID: <20111213094453.GD27440@p16> Ben Wolfson wrote: > Hi, > > I'm hoping to get some kind of consensus about the divergences between > the implementation and documentation of str.format > (http://mail.python.org/pipermail/python-dev/2011-June/111860.html and > the linked bug report contain examples of the divergences). These > pertain to the arg_name, attribute_name, and element_index fields of > the grammar in the docs: > > replacement_field ::= "{" [field_name] ["!" conversion] [":" > format_spec] "}" > field_name ::= arg_name ("." attribute_name | "[" > element_index "]")* > arg_name ::= [identifier | integer] > attribute_name ::= identifier > element_index ::= integer | index_string > index_string ::= + > > Nothing definitive emerged from the last round of discussion, and as > far as I can recall there are now three proposals for what kind of > changes might be worth making: > > (1) the implementation should conform to the docs;* > (2) like (1) with the change that element_index should be changed to > "integer | identifier" (rendering index_string otiose); > (3) like (1) with the change that index_string should be changed to > ''. > > * the docs link "integer" to > http://docs.python.org/reference/lexical_analysis.html#grammar-token-integer > but the current implementation only allows decimal integers, which > seems reasonable and worth retaining. > > (2) was suggested by Greg Ewing on python-dev and (3) by Petri > Lehtinen in the bug report. (Petri actually suggested that braces be > disallowed except for the nesting in the format_spec, but it comes to > the same thing.) > > None of these should be difficult to implement; patches exist for (1) > and (2). 
(2) and (3) would lead to format strings that are easier to > for the programmer to visually parse; (1) would make the indexing part > of the replacement field conform more closely to the way indexing with > strings behaves in Python generally, where arbitrary strings can be > used. (It wouldn't conform exactly, obviously, since ']' would still > be excluded.) > > I personally would prefer (1) to (2) or (3), and (3) to (2), had I my > druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of > them to nothing (or to changing the docs to reflect the current batty > behavior). +1 for changing. And as I've said before, I prefer proposal (3). From amauryfa at gmail.com Tue Dec 13 11:37:32 2011 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 13 Dec 2011 11:37:32 +0100 Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities In-Reply-To: <4EE686B2.6040806@haypocalc.com> References: <4EE686B2.6040806@haypocalc.com> Message-ID: 2011/12/12 Victor Stinner > "When sorting a list using the sort() method, attempting to inspect or > mutate the content of the list will result in undefined behaviour." But is this even true? In listobject.c::listsort(), since 2002:

    /* The list is temporarily made empty, so that mutations performed
     * by comparison functions can't affect the slice of memory we're
     * sorting (allowing mutations during sorting is a core-dump
     * factory, since ob_item may change).
     */

So behaviour is not undefined at all... maybe this report is only based on note #10 of the documentation: http://docs.python.org/library/stdtypes.html#mutable-sequence-types and only considers Python 2.2 or older... -- Amaury Forgeot d'Arc From arigo at tunes.org Tue Dec 13 14:13:55 2011 From: arigo at tunes.org (Armin Rigo) Date: Tue, 13 Dec 2011 14:13:55 +0100 Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities In-Reply-To: References: <4EE686B2.6040806@haypocalc.com> Message-ID: Hi, On Tue, Dec 13, 2011 at 11:37, Amaury Forgeot d'Arc wrote: >> "When sorting a list using the sort() method, attempting to inspect or >> mutate the content of the list will result in undefined behaviour." > > (...) > So behaviour is not undefined at all... No, the behavior _is_ undefined. The comment you cited says that it cannot crash the Python interpreter; additionally, it makes a best-effort attempt at catching such accesses and raising ValueError. But I think I can build a strange-looking example where you mutate a list during sorting and don't get a ValueError (although admittedly it needs a lot of hacking to do that nowadays, e.g. multiple threads). A bientôt, Armin. From l at lrowe.co.uk Tue Dec 13 14:33:42 2011 From: l at lrowe.co.uk (Laurence Rowe) Date: Tue, 13 Dec 2011 14:33:42 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> Message-ID: On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough wrote: > On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote: > > >> As someone who ported WebOb and other stuff built on top of it >> to Python >> 3 without using "from __future__ import unicode_literals", I'm >> kinda sad >> that to be using best practice I'll have to go back and flip >> the >> polarity on everything. >> >> >> Eh? If you don't need unicode_literals, what's the problem?
> > Porting the WebOb code sucked. It's only about 5K lines of code but the > porting effort took me about 80 hours. Some of the problem is certainly > my own idiocy, but some of it is just because straddling code across > Python 2 and Python 3 currently requires that you change lots and lots > of code for suspect benefit. Could this manual work be cut down if there was a version of 2to3 that targeted the subset of the language that is compatible with both 2 and 3? That would seem to avoid most of the drawbacks to the current 2to3 approach. Laurence From amauryfa at gmail.com Tue Dec 13 14:35:08 2011 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 13 Dec 2011 14:35:08 +0100 Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities In-Reply-To: References: <4EE686B2.6040806@haypocalc.com> Message-ID: 2011/12/13 Armin Rigo > No, the behavior _is_ undefined. The comment you cited says that it > cannot crash the Python interpreter; additionally, it makes a > best-effort attempt at catching such accesses and raising ValueError. > But I think I can build a strange-looking example where you mutate a > list during sorting and don't get a ValueError (although admittedly it > needs a lot of hacking to do that nowadays, e.g. multiple threads). > I'm interested to see how! The current implementation installs an empty array in the list, and the initial array is only held by a local variable in listsort(). Even gc.get_referrers() can return the empty list... -- Amaury Forgeot d'Arc From fuzzyman at voidspace.org.uk Tue Dec 13 14:42:12 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 13 Dec 2011 13:42:12 +0000 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> Message-ID: <4EE75634.6000208@voidspace.org.uk> On 13/12/2011 13:33, Laurence Rowe wrote: > On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough > wrote: > >> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote: >> >> >>> As someone who ported WebOb and other stuff built on top of it >>> to Python >>> 3 without using "from __future__ import unicode_literals", I'm >>> kinda sad >>> that to be using best practice I'll have to go back and flip >>> the >>> polarity on everything. >>> >>> >>> Eh? If you don't need unicode_literals, what's the problem? >> >> Porting the WebOb code sucked. It's only about 5K lines of code but the >> porting effort took me about 80 hours. Some of the problem is certainly >> my own idiocy, but some of it is just because straddling code across >> Python 2 and Python 3 currently requires that you change lots and lots >> of code for suspect benefit. > > Could this manual work be cut down if there was a version of 2to3 that > targeted the subset of the language that is compatible with both 2 and > 3? That would seem to avoid most of the drawbacks to the current 2to3 > approach. > I'm not sure what you mean, but it *reads* as if you mean "a version of 2to3 that only converts code that doesn't need converting". Could you clarify? 
Thanks, Michael > Laurence > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From ncoghlan at gmail.com Tue Dec 13 15:24:16 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 14 Dec 2011 00:24:16 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <4EE75634.6000208@voidspace.org.uk> References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> Message-ID: Input = normal 2.x code; Output = code that runs on both 2.x and 3.x. That is, tinkering with what 2to3 produces, not what it accepts. -- Nick Coghlan (via Gmail on Android, so likely to be more terse than usual) On Dec 13, 2011 11:46 PM, "Michael Foord" wrote: > On 13/12/2011 13:33, Laurence Rowe wrote: > >> On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough >> wrote: >> >> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote: >>> >>> >>> As someone who ported WebOb and other stuff built on top of it >>>> to Python >>>> 3 without using "from __future__ import unicode_literals", I'm >>>> kinda sad >>>> that to be using best practice I'll have to go back and flip >>>> the >>>> polarity on everything. >>>> >>>> >>>> Eh? If you don't need unicode_literals, what's the problem? 
>>>> >>> >>> Porting the WebOb code sucked. It's only about 5K lines of code but the >>> porting effort took me about 80 hours. Some of the problem is certainly >>> my own idiocy, but some of it is just because straddling code across >>> Python 2 and Python 3 currently requires that you change lots and lots >>> of code for suspect benefit. >>> >> >> Could this manual work be cut down if there was a version of 2to3 that >> targeted the subset of the language that is compatible with both 2 and 3? >> That would seem to avoid most of the drawbacks to the current 2to3 approach. >> >> I'm not sure what you mean, but it *reads* as if you mean "a version of > 2to3 that only converts code that doesn't need converting". Could you > clarify? > > Thanks, > > Michael > > Laurence >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk >> >> > > -- > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > May you share freely, never taking more than you give. > -- the sqlite blessing http://www.sqlite.org/different.html > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > From fuzzyman at voidspace.org.uk Tue Dec 13 15:27:04 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 13 Dec 2011 14:27:04 +0000 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> Message-ID: <4EE760B8.50807@voidspace.org.uk> On 13/12/2011 14:24, Nick Coghlan wrote: > > Input = normal 2.x code; Output = code that runs on both 2.x and 3.x. > > That is, tinkering with what 2to3 produces, not what it accepts. > How is that different from what 2to3 currently does? Are you agreeing with Laurence, suggesting an alternative, or something else? Michael > -- > Nick Coghlan (via Gmail on Android, so likely to be more terse than usual) > > On Dec 13, 2011 11:46 PM, "Michael Foord" > wrote: > > On 13/12/2011 13:33, Laurence Rowe wrote: > > On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough > > wrote: > > On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote: > > > As someone who ported WebOb and other stuff > built on top of it > to Python > 3 without using "from __future__ import > unicode_literals", I'm > kinda sad > that to be using best practice I'll have to go > back and flip > the > polarity on everything. > > > Eh? If you don't need unicode_literals, what's the > problem? > > > Porting the WebOb code sucked. It's only about 5K lines > of code but the > porting effort took me about 80 hours. Some of the > problem is certainly > my own idiocy, but some of it is just because straddling > code across > Python 2 and Python 3 currently requires that you change > lots and lots > of code for suspect benefit. > > > Could this manual work be cut down if there was a version of > 2to3 that targeted the subset of the language that is > compatible with both 2 and 3? That would seem to avoid most of > the drawbacks to the current 2to3 approach. 
> > I'm not sure what you mean, but it *reads* as if you mean "a > version of 2to3 that only converts code that doesn't need > converting". Could you clarify? > > Thanks, > > Michael > > Laurence > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > > > > -- > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > May you share freely, never taking more than you give. > -- the sqlite blessing http://www.sqlite.org/different.html > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From l at lrowe.co.uk Tue Dec 13 15:28:31 2011 From: l at lrowe.co.uk (Laurence Rowe) Date: Tue, 13 Dec 2011 15:28:31 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> Message-ID: On Tue, 13 Dec 2011 14:42:12 +0100, Michael Foord wrote: > On 13/12/2011 13:33, Laurence Rowe wrote: >> On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough >> wrote: >> >>> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote: >>> >>> >>>> As someone who ported WebOb and other stuff built on top of it >>>> to Python >>>> 3 without using "from __future__ import unicode_literals", I'm >>>> kinda sad >>>> that to be using best practice I'll have to go back and flip >>>> the >>>> polarity on everything. >>>> >>>> >>>> Eh? If you don't need unicode_literals, what's the problem? >>> >>> Porting the WebOb code sucked. It's only about 5K lines of code but >>> the >>> porting effort took me about 80 hours. Some of the problem is >>> certainly >>> my own idiocy, but some of it is just because straddling code across >>> Python 2 and Python 3 currently requires that you change lots and lots >>> of code for suspect benefit. >> >> Could this manual work be cut down if there was a version of 2to3 that >> targeted the subset of the language that is compatible with both 2 and >> 3? That would seem to avoid most of the drawbacks to the current 2to3 >> approach. >> > I'm not sure what you mean, but it *reads* as if you mean "a version of > 2to3 that only converts code that doesn't need converting". Could you > clarify? > The approach that most people seem to have settled on for porting libraries to Python 3 is to make a single codebase that is compatible with both Python 2 and Python 3, perhaps making use of the six library. 
If I understand correctly, Chris' experience of porting WebOb was that there is a large amount of manual work required in this approach in part because of the many u'' strings in libraries that extensively use unicode. It should be possible to automate this with the same approach as 2to3, but instead of a transform from 2->3 it would transform code from 2->(2 & 3). In this case the transform would only have to be run once (rather than on every setup.py install) and would avoid the difficulties of debugging with tracebacks that do not exactly match the source code. Laurence From fuzzyman at voidspace.org.uk Tue Dec 13 15:34:00 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 13 Dec 2011 14:34:00 +0000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> Message-ID: <4EE76258.3040404@voidspace.org.uk> On 13/12/2011 14:28, Laurence Rowe wrote: > On Tue, 13 Dec 2011 14:42:12 +0100, Michael Foord > wrote: > >> On 13/12/2011 13:33, Laurence Rowe wrote: >>> On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough >>> wrote: >>> >>>> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote: >>>> >>>> >>>>> As someone who ported WebOb and other stuff built on top >>>>> of it >>>>> to Python >>>>> 3 without using "from __future__ import unicode_literals", >>>>> I'm >>>>> kinda sad >>>>> that to be using best practice I'll have to go back and flip >>>>> the >>>>> polarity on everything. >>>>> >>>>> >>>>> Eh? If you don't need unicode_literals, what's the problem? >>>> >>>> Porting the WebOb code sucked. It's only about 5K lines of code >>>> but the >>>> porting effort took me about 80 hours. 
Some of the problem is >>>> certainly >>>> my own idiocy, but some of it is just because straddling code across >>>> Python 2 and Python 3 currently requires that you change lots and lots >>>> of code for suspect benefit. >>> >>> Could this manual work be cut down if there was a version of 2to3 >>> that targeted the subset of the language that is compatible with >>> both 2 and 3? That would seem to avoid most of the drawbacks to the >>> current 2to3 approach. >>> >> I'm not sure what you mean, but it *reads* as if you mean "a version >> of 2to3 that only converts code that doesn't need converting". Could >> you clarify? >> > > The approach that most people seem to have settled on for porting > libraries to Python 3 is to make a single codebase that is compatible > with both Python 2 and Python 3, perhaps making use of the six > library. If I understand correctly, Chris' experience of porting WebOb > was that there is a large amount of manual work required in this > approach in part because of the many u'' strings in libraries that > extensively use unicode. It should be possible to automate this with > the same approach as 2to3, but instead of a transform from 2->3 it > would transform code from 2->(2 & 3). In this case the transform would > only have to be run once (rather than on every setup.py install) and > would avoid the difficulties of debugging with tracebacks that do not > exactly match the source code. Ah, you mean a 2toPython3compatible2 converter. Not a bad idea. Michael > > Laurence > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html From vinay_sajip at yahoo.co.uk Tue Dec 13 16:54:21 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Tue, 13 Dec 2011 15:54:21 +0000 (UTC) Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> Message-ID: Laurence Rowe lrowe.co.uk> writes: > The approach that most people seem to have settled on for porting > libraries to Python 3 is to make a single codebase that is compatible with > both Python 2 and Python 3, perhaps making use of the six library. If I > understand correctly, Chris' experience of porting WebOb was that there is > a large amount of manual work required in this approach in part because of > the many u'' strings in libraries that extensively use unicode. It should > be possible to automate this with the same approach as 2to3, but instead > of a transform from 2->3 it would transform code from 2->(2 & 3). In this > case the transform would only have to be run once (rather than on every > setup.py install) and would avoid the difficulties of debugging with > tracebacks that do not exactly match the source code. I started writing a tool today, tentatively called '2to23', which aims to do this. It's basically 2to3, but with a package of custom fixers in a package 'lib2to23.fixers' adapted from the corresponding fixers in lib2to3. It's experimental work in progress at the moment. 
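The fixers in the sample diff import helpers such as u, b, text_type, string_types, long_type and xrange from django.utils.py3, and the metaclass fixer emits a with_metaclass() call. The thread does not show that module. The sketch below is one possible stand-in, modeled loosely on Benjamin Peterson's six library. Its body is an assumption, not Django's or six's actual code.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    binary_type = bytes
    string_types = (str,)
    long_type = int
    xrange = range

    def u(s):
        # A plain '' literal is already unicode text on Python 3.
        return s

    def b(s):
        # Latin-1 maps code points 0-255 one-to-one onto byte values.
        return s.encode("latin-1")
else:
    # Python 2 builtins; this branch never runs on 3.x.
    text_type = unicode
    binary_type = str
    string_types = (basestring,)
    long_type = long
    xrange = xrange

    def u(s):
        # Python 2 leaves \uXXXX undecoded in byte-string literals,
        # so decode those escapes here.
        return unicode(s, "unicode_escape")

    def b(s):
        # A plain '' literal is already a byte string on Python 2.
        return s


def with_metaclass(meta, *bases):
    """Return a temporary base class created by `meta`, so that any
    class deriving from it gets `meta` as its metaclass on 2.x and 3.x."""
    return meta("NewBase", bases, {})
```

A fixer that emits with_metaclass() calls must also emit an import for it. The sample diff does not do that yet, which is part of the "bit of work to do" Vinay mentions.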
With a sample file like

import anything
import dummy

class CustomException(Exception):
    pass

def func1():
    a = u'abc'
    b = b'def'
    c = 'unchanged'
    c1 = u'abc' u'def'

def func2():
    try:
        d = 5L
        e = (int, long)
        f = (long, int)
        g = func3()
        if isinstance(g, basestring):
            print 'a string'
        elif isinstance(g, bytes):
            print 'some bytes'
        elif isinstance(g, unicode):
            print 'a unicode string'
        else:
            print
        for i in xrange(3):
            pass
    except Exception:
        e = sys.exc_info()
        raise CustomException, e[1], e[2]

class BaseClass:
    pass

class OtherBaseClass:
    pass

class MetaClass:
    pass

class DerivedClass(BaseClass, OtherBaseClass):
    __metaclass__ = MetaClass

2to23 gives the following suggested changes:

--- sample.py (original)
+++ sample.py (refactored)
@@ -1,34 +1,41 @@
 import anything
 import dummy
+from django.utils.py3 import long_type
+from django.utils.py3 import string_types
+from django.utils.py3 import binary_type
+from django.utils.py3 import b
+from django.utils.py3 import text_type
+from django.utils.py3 import u
+from django.utils.py3 import xrange
 
 class CustomException(Exception):
     pass
 
 def func1():
-    a = u'abc'
-    b = b'def'
+    a = u('abc')
+    b = b('def')
     c = 'unchanged'
-    c1 = u'abc' u'def'
+    c1 = u('abc') u('def')
 
 def func2():
     try:
-        d = 5L
+        d = long_type(5)
         e = (int, long)
         f = (long, int)
         g = func3()
-        if isinstance(g, basestring):
-            print 'a string'
-        elif isinstance(g, bytes):
-            print 'some bytes'
-        elif isinstance(g, unicode):
-            print 'a unicode string'
+        if isinstance(g, string_types):
+            print('a string')
+        elif isinstance(g, binary_type):
+            print('some bytes')
+        elif isinstance(g, text_type):
+            print('a unicode string')
         else:
-            print
+            print()
         for i in xrange(3):
             pass
     except Exception:
         e = sys.exc_info()
-        raise CustomException, e[1], e[2]
+        raise CustomException(e[1]).with_traceback(e[2])
 
 class BaseClass:
     pass
@@ -39,8 +46,8 @@
 class MetaClass:
     pass
 
-class DerivedClass(BaseClass, OtherBaseClass):
-    __metaclass__ = MetaClass
+class DerivedClass(with_metaclass(MetaClass, BaseClass, OtherBaseClass)):
+    pass

As you can see, there's still a bit of work to do, and the sample doesn't cover all use cases yet. I'll be cross-checking it using my recent Django porting work to confirm that it covers everything at least needed for that port, which is why the fixers currently generate imports from django.utils.py3. Obviously, I'll change this in due course. Regards, Vinay Sajip From solipsis at pitrou.net Tue Dec 13 17:24:23 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 13 Dec 2011 17:24:23 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? References: <1323320919.2710.24.camel@thinko> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> Message-ID: <20111213172423.2c567d8b@pitrou.net> On Tue, 13 Dec 2011 15:28:31 +0100 "Laurence Rowe" wrote: > > The approach that most people seem to have settled on for porting > libraries to Python 3 is to make a single codebase that is compatible with > both Python 2 and Python 3, perhaps making use of the six library. Do you have evidence that "most" people have settled on that approach? (besides the couple of library writers who have commented on this thread) From barry at python.org Tue Dec 13 17:21:04 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 13 Dec 2011 11:21:04 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111213172423.2c567d8b@pitrou.net> References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> Message-ID: <20111213112104.6b02cd30@resist.wooz.org> On Dec 13, 2011, at 05:24 PM, Antoine Pitrou wrote: >On Tue, 13 Dec 2011 15:28:31 +0100 >"Laurence Rowe" wrote: >> >> The approach that most people seem to have settled on for porting >> libraries to Python 3 is to make a single codebase that is compatible with >> both Python 2 and Python 3, perhaps making use of the six library. > >Do you have evidence that "most" people have settled on that approach? >(besides the couple of library writers who have commented on this >thread) I'm not sure there's any settling at all when it comes to Python 3 porting yet. ;) Sometimes, one code base works better, other times 2to3 works well. I tend to use the latter on pure-Python setuptools-based projects, and the former on projects with C extensions, autoconf-based libraries. -Barry From regebro at gmail.com Tue Dec 13 17:40:46 2011 From: regebro at gmail.com (Lennart Regebro) Date: Tue, 13 Dec 2011 17:40:46 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? 
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein> <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> Message-ID: On Tue, Dec 13, 2011 at 14:33, Laurence Rowe wrote: > Could this manual work be cut down if there was a version of 2to3 that > targeted the subset of the language that is compatible with both 2 and 3? Not really, but a 2to6, ie something that tries to keep Python 2 compatibility by using the six library, might be useful. //Lennart From pje at telecommunity.com Tue Dec 13 20:02:45 2011 From: pje at telecommunity.com (PJ Eby) Date: Tue, 13 Dec 2011 14:02:45 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <20111213172423.2c567d8b@pitrou.net> References: <1323320919.2710.24.camel@thinko> <3344831.JP9Cfj4Ety@einstein> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> Message-ID: On Tue, Dec 13, 2011 at 11:24 AM, Antoine Pitrou wrote: > On Tue, 13 Dec 2011 15:28:31 +0100 > "Laurence Rowe" wrote: > > > > The approach that most people seem to have settled on for porting > > libraries to Python 3 is to make a single codebase that is compatible > with > > both Python 2 and Python 3, perhaps making use of the six library. > > Do you have evidence that "most" people have settled on that approach? > (besides the couple of library writers who have commented on this > thread) > I've seen more projects doing it that way than maintaining dual code bases. 
In retrospect, it seems way more attractive than having to run a converter all the time, especially if I could run a "2to6" tool *once* and then simply write new code using six-isms Among other things, it means that: * There's only one codebase * If the conversion isn't perfect, you only have to fix it once * Line numbers are the same * There's no conversion step slowing down development So, I expect that if the approach is at all viable, it'll quickly become the One Obvious Way to do it. In effect, 2to3 is a "purity" solution, but six is more like a "practicality" solution. And if there's official support for it, so much the better. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Tue Dec 13 17:31:41 2011 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 13 Dec 2011 11:31:41 -0500 Subject: [Python-Dev] str.format implementation In-Reply-To: References: Message-ID: <4EE77DED.904@trueblade.com> On 12/12/2011 10:56 PM, Ben Wolfson wrote: > Hi, > > I'm hoping to get some kind of consensus about the divergences between > the implementation and documentation of str.format > (http://mail.python.org/pipermail/python-dev/2011-June/111860.html and > the linked bug report contain examples of the divergences). These > pertain to the arg_name, attribute_name, and element_index fields of > the grammar in the docs: > > replacement_field ::= "{" [field_name] ["!" conversion] [":" > format_spec] "}" > field_name ::= arg_name ("." 
attribute_name | "[" > element_index "]")* > arg_name ::= [identifier | integer] > attribute_name ::= identifier > element_index ::= integer | index_string > index_string ::= + > > Nothing definitive emerged from the last round of discussion, and as > far as I can recall there are now three proposals for what kind of > changes might be worth making: > > (1) the implementation should conform to the docs;* > (2) like (1) with the change that element_index should be changed to > "integer | identifier" (rendering index_string otiose); I've now learned what "otiose" means. Thanks! > (3) like (1) with the change that index_string should be changed to > ''. This is still on my plate. I just haven't had a lot of Python time recently. But I do plan to address this. Eric. From tjreedy at udel.edu Tue Dec 13 22:10:27 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 13 Dec 2011 16:10:27 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> Message-ID: On 12/13/2011 2:02 PM, PJ Eby wrote: > On Tue, Dec 13, 2011 at 11:24 AM, Antoine Pitrou > wrote: > > On Tue, 13 Dec 2011 15:28:31 +0100 > "Laurence Rowe" > wrote: > > > > The approach that most people seem to have settled on for porting > > libraries to Python 3 is to make a single codebase that is > compatible with > > both Python 2 and Python 3, perhaps making use of the six library. > > Do you have evidence that "most" people have settled on that approach? 
> (besides the couple of library writers who have commented on this > thread) I think there are clearly enough 'some' people to justify official support of a 2to23 (or more obscurely, 2to6, but I just got the point that 6=2*3). Beyond that, we don't know and don't need to know. > I've seen more projects doing it that way than maintaining dual code > bases. In retrospect, it seems way more attractive than having to run a > converter all the time, especially if I could run a "2to6" tool *once* > and then simply write new code using six-isms > > Among other things, it means that: > > * There's only one codebase > * If the conversion isn't perfect, you only have to fix it once > * Line numbers are the same > * There's no conversion step slowing down development > > So, I expect that if the approach is at all viable, it'll quickly become > the One Obvious Way to do it. In effect, 2to3 is a "purity" solution, > but six is more like a "practicality" solution. 2to3 is the practical solution for someone converting private Python 2 code to run on Python 3 *instead* of Python 2, without looking back. By the nature of things, such conversions will be private and scattered over the next decade or so. If 2to3 works well, we will never hear about them, except for the rare praise. Ditto for public code whose author wishes to abandon Py 2. But that seems to be rare so far. So we are really talking about upgrading public libraries and apps to work with Python 3 *as well as* 'recent' Python 2 versions. For that, a 'Python23' bridge seems to work for some. Looking ahead, there will in the future be a need for a 23to3 converter to remove the then extraneous bridge code. But that will need a semi-standard 'Python23' to work from.
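Terry's future 23to3 pass would mostly undo the scaffolding a 2to23 fixer inserted. The toy below only illustrates that idea. It uses a regex rather than a lib2to3 fixer, and the function name is hypothetical. It is also not token-aware, so it would rewrite matching text inside comments or other strings as well. It turns u('...') helper calls back into plain literals, which are text on Python 3. It turns b('...') calls back into b'...' literals.

```python
import re

# u('...'), u("..."), b('...') or b("...") wrapping one simple literal.
_COMPAT_CALL = re.compile(
    r"""\b(?P<fn>[ub])\((?P<lit>'(?:[^'\\\n]|\\.)*'|"(?:[^"\\\n]|\\.)*")\)"""
)


def _unwrap(match):
    # On Python 3 a plain literal is already text; bytes keep the b prefix.
    prefix = "b" if match.group("fn") == "b" else ""
    return prefix + match.group("lit")


def unwrap_compat_literals(source):
    """Rewrite helper calls as literals (valid only once 2.x is dropped)."""
    return _COMPAT_CALL.sub(_unwrap, source)
```

For example, unwrap_compat_literals("a = u('abc')") yields "a = 'abc'". A real tool would work on lib2to3's parse tree, as the existing fixers do.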
-- Terry Jan Reedy From jimjjewett at gmail.com Tue Dec 13 22:17:13 2011 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 13 Dec 2011 16:17:13 -0500 Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions In-Reply-To: <4EE704D6.5000901@v.loewis.de> References: <4EE704D6.5000901@v.loewis.de> Message-ID: On Tue, Dec 13, 2011 at 2:55 AM, "Martin v. Löwis" wrote: >> (1) Why is PyObject_HEAD used instead of PyObject_VAR_HEAD? > The unicode object is not a var object. In a var object, tp_itemsize > gives the element size, which is not possible for unicode objects, > since the itemsize may vary by instance. In addition, not all instances > have the items after the base object (plus the size of the base object > in tp_basicsize is also not always correct). That makes perfect sense. Any chance of adding the rationale to the code? Either inline, such as changing unicodeobject.h line 291 from PyObject_HEAD to something like: PyObject_HEAD /* Not VAR_HEAD, because tp_itemsize varies, and data may be elsewhere. */ or in the large comments around line 288: Note that Strings use PyObject_HEAD and a length field instead of PyObject_VAR_HEAD, because the tp_itemsize varies by instance, and the actual data is not always immediately after the PyASCIIObject header. >> (2) Why does PyASCIIObject have a wstr member, and why does >> PyCompactUnicodeObject have wstr_length? As best I can tell from the >> PEP or header file, wstr is only meaningful when either: > No. wstr is most of all relevant if someone calls > PyUnicode_AsUnicode(AndSize); any unicode object might get the > wstr pointer filled out at some point. I am willing to believe that requests for a wchar_t (or utf-8 or System Locale charset) representation are common enough to justify caching the data after the first request. But then why throw it away in the first place? Wouldn't programs that create unicode from wchar_t data also be the most likely to request wchar_t data back?
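You can check the relation between wstr_length and the code-point length from pure Python. Assume a platform with a 2-byte wchar_t, such as Windows. There the wstr buffer holds UTF-16 code units, and a character outside the Basic Multilingual Plane takes two of them. So the two lengths differ only for strings with such characters, which PEP 393 stores in the 4-byte kind. The helper is illustrative and does not touch CPython internals.

```python
def utf16_code_units(s):
    """Length of s as a UTF-16 buffer, i.e. a 2-byte wchar_t string."""
    # Each UTF-16 code unit is 2 bytes; characters above U+FFFF need
    # a surrogate pair (2 units), everything else needs 1 unit.
    return len(s.encode("utf-16-le")) // 2


for sample in ["abc", "caf\u00e9", "\u20ac100", "\U0001F600!"]:
    # Only the last sample has a code-point length different from its
    # UTF-16 length, because it contains a non-BMP character.
    print(ascii(sample), len(sample), utf16_code_units(sample))
```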
> wstr_length is only relevant if wstr is not NULL. For a pure ASCII > string (and also for Latin-1 and other BMP strings), the wstr length > will always equal the canonical length (number of code points). wstr_length != length exactly when: 2==sizeof(wchar_t) && PyUnicode_4BYTE_KIND == PyUnicode_KIND( str ) which can sometimes be eliminated at compile-time, and always by string creation time. In all other cases, (wstr_length == length), and wstr can be generated by widening the data without having to inspect it. Is it worth eliminating wstr_length (or even wstr) in those cases, or is that too much complexity? >> (3) I would feel much less nervous if the remaining 4 values of >> PyUnicode_Kind were explicitly reserved, and the macros raised an >> error when they showed up. ... > If people use C, they can construct all kinds of "illegal" ... > kind values: many places will either work incorrectly, or have > an assertion in debug mode already if an unexpected kind is > encountered. What I'm asking is that (1) The other values be documented as reserved, rather than as illegal. (2) The macros produce an error rather than silently corrupting data. This allows at least the possibility of a later change such that (3) The macros handle the new values correctly, if only by delegating back to type-supplied functions. -jJ From tjreedy at udel.edu Tue Dec 13 22:37:10 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 13 Dec 2011 16:37:10 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> Message-ID: On 12/13/2011 10:54 AM, Vinay Sajip wrote: > I started writing a tool today, tentatively called '2to23', which aims to do > this. It's basically 2to3, but with a package of custom fixers in a package > 'lib2to23.fixers' adapted from the corresponding fixers in lib2to3. When, some year in the future, people want to drop Python 2 compatibility from their Python23 code, they will need a 23to3 tool. You might keep this in mind when designing and documenting a bridge language. For each 2to23 fixer, is there a 23to3 fixer so that 23to3(2to23(code)) == 2to3(code) or close enough. (23to3 can and should assume that its input is the output of 2to23, and only look to convert the ultimately temporary scaffolding inserted by 2to23.) The point about documentation is to list the names that 2to23 introduces (with its special meanings) and that 23to3 will remove (assuming the special meanings). So these names should neither be in the 2 code before running 2to23 nor added to 23 code (with a different meaning) before running 23to3. If 2to23 were paired with a 23to3, so people knew that its output is not a deadend cul-de-sac, but a stepping stone to the future, it would be even more attractive. -- Terry Jan Reedy From fuzzyman at voidspace.org.uk Tue Dec 13 23:17:16 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 13 Dec 2011 22:17:16 +0000 Subject: [Python-Dev] readd u'' literal support in 3.3? 
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> Message-ID: <4EE7CEEC.2020306@voidspace.org.uk> On 13/12/2011 21:10, Terry Reedy wrote: > On 12/13/2011 2:02 PM, PJ Eby wrote: >> On Tue, Dec 13, 2011 at 11:24 AM, Antoine Pitrou > > wrote: >> >> On Tue, 13 Dec 2011 15:28:31 +0100 >> "Laurence Rowe" > wrote: >> > >> > The approach that most people seem to have settled on for porting >> > libraries to Python 3 is to make a single codebase that is >> compatible with >> > both Python 2 and Python 3, perhaps making use of the six library. >> >> Do you have evidence that "most" people have settled on that >> approach? >> (besides the couple of library writers who have commented on this >> thread) > > I think there is clearly enough 'some' people to justify official > support of a 2to23 (or more obscurely, 2to6, but I just got the point > that 6=2*3). More specifically "six" [1] is the name of Benjamin Peterson's support package to help write code that works on both 2 and 3. So the idea is that the conversion isn't just a straight syntax conversion - but that it [could] generate code using this library. All the best, Michael [1] http://packages.python.org/six/ > Beyond that, we don't know and don't need to know. > >> I've seen more projects doing it that way than maintaining dual code >> bases. 
In retrospect, it seems way more attractive than having to run a >> converter all the time, especially if I could run a "2to6" tool *once* >> and then simply write new code using six-isms >> >> Among other things, it means that: >> >> * There's only one codebase >> * If the conversion isn't perfect, you only have to fix it once >> * Line numbers are the same >> * There's no conversion step slowing down development >> >> So, I expect that if the approach is at all viable, it'll quickly become >> the One Obvious Way to do it. In effect, 2to3 is a "purity" solution, >> but six is more like a "practicality" solution. > > 2to3 is the practical solution for someone converting private Python 2 > code to run on Python 3 *instead* of Python 2, without looking back. > By the nature of things, such conversions will be private and > scattered over the next decade or so. If 2to3 works well, we will > never hear about them, except for the rare praise. Ditto for public > code whose author wishes to abandon Py 2. But that seems to rare so far. > > So we are really talking about upgrading public libraries and apps to > work with Python 3 *as well as* 'recent' Python 2 versions. For that, > a 'Python23' bridge seems to work for some. > > Looking ahead, there will in the future be a need for a 23to3 > converter to remove the then extraneous bridge code. But that will > need a semi-standard 'Python23' to work from. > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From ncoghlan at gmail.com Tue Dec 13 23:38:06 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 14 Dec 2011 08:38:06 +1000 Subject: [Python-Dev] readd u'' literal support in 3.3? 
In-Reply-To: <4EE7CEEC.2020306@voidspace.org.uk> References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> <4EE7CEEC.2020306@voidspace.org.uk> Message-ID: On Wed, Dec 14, 2011 at 8:17 AM, Michael Foord wrote: > More specifically "six" [1] is the name of Benjamin Peterson's support > package to help write code that works on both 2 and 3. So the idea is that > the conversion isn't just a straight syntax conversion - but that it [could] > generate code using this library. The thing is, the code you want to generate varies depending on whether you want to target 2.6+, or include 2.5 and earlier. For 2.6+, you can just use the print_function and unicode_literals __future__ imports to minimise overhead. But if 2.5 and earlier is in the mix, you need to lean more heavily on six (for u(), b() and print_()).

String translation is also an open question. For some codebases, you want both u"" and "" to translate to a Unicode "" (either in Py3k or via the future import), but if a code base deals with WSGI-style native strings (by means of u"" for text, "" for native, b"" for binary), then the more appropriate translation is to use the future import and map them to "", str("") and b"" respectively.

So, rather than an overall "2to6", it may be better to focus on *specific* fixers that can be tweaked or added to help with:

    2.4+ -> 2.4+, 3.2+
    2.4+ -> 2.6+, 3.2+
    2.6+ -> 2.6+, 3.2+
    2.6+, 3.2+ -> 3.2+

(with handling of string literals being the most significant, and likely most complicated)

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Tue Dec 13 23:52:54 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 13 Dec 2011 17:52:54 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> <4EE7CEEC.2020306@voidspace.org.uk> Message-ID: <20111213175254.7b2cd6d0@resist.wooz.org> On Dec 14, 2011, at 08:38 AM, Nick Coghlan wrote: >String translation is also an open question. For some codebases, you >want both u"" and "" to translate to a Unicode "" (either in Py3k or >via the future import) I have a fixer for this: http://bazaar.launchpad.net/~barry/flufl.i18n/devel/view/head:/myfixers/fix_ugettext.py (or maybe by "translation" you don't mean "gettext"). Cheers, -Barry From storchaka at gmail.com Wed Dec 14 00:16:34 2011 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 14 Dec 2011 01:16:34 +0200 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> <4EE7CEEC.2020306@voidspace.org.uk> Message-ID: 14.12.11 00:38, Nick Coghlan wrote: > String translation is also an open question. For some codebases, you > want both u"" and "" to translate to a Unicode "" (either in Py3k or > via the future import), but if a code base deals with WSGI-style > native strings (by means of u"" for text, "" for native, b"" for > binary), then the more appropriate translation is to use the future > import and map them to "", str("") and b"" respectively.
There are other places for native strings -- sys.argv.

if sys.argv[1] == str('-'):
    f = sys.stdin
else:
    f = open(sys.argv[1], 'r')

Yet another pitfall -- sys.stdin is a bytes stream in 2.x and a text stream in 3.x. For reading binary data:

if sys.argv[1] == str('-'):
    if sys.version_info[0] >= 3:
        f = sys.stdin.buffer.raw
    else:
        f = sys.stdin
else:
    f = open(sys.argv[1], 'rb')

Reading text data is even more complicated in Python 2.x. From exarkun at twistedmatrix.com Wed Dec 14 00:36:28 2011 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Tue, 13 Dec 2011 23:36:28 -0000 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> Message-ID: <20111213233628.1828.718618756.divmod.xquotient.271@localhost.localdomain> On 09:37 pm, tjreedy at udel.edu wrote: >On 12/13/2011 10:54 AM, Vinay Sajip wrote: >>I started writing a tool today, tentatively called '2to23', which aims >>to do >>this. It's basically 2to3, but with a package of custom fixers in a >>package >>'lib2to23.fixers' adapted from the corresponding fixers in lib2to3. > >When, some year in the future, people want to drop Python 2 >compatibility from their Python23 code, they will need a 23to3 tool. No, they will not. They only need a 2to3 or 2to6 tool because Python 2 and Python 3 are not compatible with each other, but they want one program to be valid in Python 2 and Python 3 simultaneously. When they decide they no longer care about Python 2, they can just stop taking care to keep their program valid as Python 2 and only take care to keep it a valid Python 3 program. There's no specific change to make, just a different approach to take with future maintenance.
You might say that they will *want* to immediately discard all of their legacy Python 2 support code. I suspect many of them will not want this; but either way it's a want, not a need. Jean-Paul From martin at v.loewis.de Wed Dec 14 01:01:40 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Wed, 14 Dec 2011 01:01:40 +0100 Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions In-Reply-To: References: <4EE704D6.5000901@v.loewis.de> Message-ID: <4EE7E764.5050008@v.loewis.de> > Any chance of adding the rationale to the code? I'm really short of time right now, so you need to find somebody else to make such a change. > I am willing to believe that requests for a wchar_t (or utf-8 or > System Locale charset) representation are common enough to justify > caching the data after the first request. That's not the issue; the real issue is memory management. > But then why throw it away in the first place? Wouldn't programs that > create unicode from wchar_t data also be the most likely to request > wchar_t data back? Perhaps. But are they likely to access the string they just created again at all? They know what's in it, so why look at it again? > In all other cases, (wstr_length == length), and wstr can be generated > by widening the data without having to inspect it. Is it worth > eliminating wstr_length (or even wstr) in those cases, or is that too > much complexity? It's too little saving. > What I'm asking is that > (1) The other values be documented as reserved, rather than as illegal. How is that different? > (2) The macros produce an error rather than silently corrupting data. In debug mode, or release mode? -1 on release mode. Regards, Martin From solipsis at pitrou.net Wed Dec 14 01:30:24 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 14 Dec 2011 01:30:24 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> Message-ID: <20111214013024.74addba7@pitrou.net> On Tue, 13 Dec 2011 14:02:45 -0500 PJ Eby wrote: > > Among other things, it means that: > > * There's only one codebase > * If the conversion isn't perfect, you only have to fix it once > * Line numbers are the same > * There's no conversion step slowing down development > > So, I expect that if the approach is at all viable, it'll quickly become > the One Obvious Way to do it. Well, with all due respect, this is hand-waving. Sure, if it's viable, then fine. The question is if it's "viable", precisely. That depends on which project we're talking about. > In effect, 2to3 is a "purity" solution, but > six is more like a "practicality" solution. This sounds like your personal interpretation. I see nothing "pure" in 2to3. Regards Antoine. From pje at telecommunity.com Wed Dec 14 03:42:48 2011 From: pje at telecommunity.com (PJ Eby) Date: Tue, 13 Dec 2011 21:42:48 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? 
In-Reply-To: <20111214013024.74addba7@pitrou.net> References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> <20111214013024.74addba7@pitrou.net> Message-ID: On Tue, Dec 13, 2011 at 7:30 PM, Antoine Pitrou wrote: > On Tue, 13 Dec 2011 14:02:45 -0500 > PJ Eby wrote: > > > > Among other things, it means that: > > > > * There's only one codebase > > * If the conversion isn't perfect, you only have to fix it once > > * Line numbers are the same > > * There's no conversion step slowing down development > > > > So, I expect that if the approach is at all viable, it'll quickly become > > the One Obvious Way to do it. > > Well, with all due respect, this is hand-waving. Sure, if it's > viable, then fine. The question is if it's "viable", precisely. That > depends on which project we're talking about. > What I'm saying is that it has many characteristics that are desirable for people who need to support Python 2 and 3 - which is likely the most common use case for library developers. > In effect, 2to3 is a "purity" solution, but > > six is more like a "practicality" solution. > > This sounds like your personal interpretation. I see nothing "pure" in > 2to3. > It's "pure" in being optimized for a world where you just stop using Python 2 one day, and start using 3 the next, without any crossover support. As someone else pointed out, this is a more common case for application developers than for library developers. However, until the libraries are ported, it's harder for the app developers to port their apps. Anyway, if you're supporting both 2 and 3, a common code base offers many attractions, so if it can be done, it will. 
From tjreedy at udel.edu Wed Dec 14 05:29:27 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 13 Dec 2011 23:29:27 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: <20111213233628.1828.718618756.divmod.xquotient.271@localhost.localdomain> References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213233628.1828.718618756.divmod.xquotient.271@localhost.localdomain> Message-ID: On 12/13/2011 6:36 PM, exarkun at twistedmatrix.com wrote: > On 09:37 pm, tjreedy at udel.edu wrote: >> On 12/13/2011 10:54 AM, Vinay Sajip wrote: >>> I started writing a tool today, tentatively called '2to23', which >>> aims to do >>> this. It's basically 2to3, but with a package of custom fixers in a >>> package >>> 'lib2to23.fixers' adapted from the corresponding fixers in lib2to3. >> >> When, some year in the future, people want to drop Python 2 >> compatibility from their Python23 code, they will need a 23to3 tool. > > No, they will not. Yes they will, if you read my conditional statement properly. Anyway, quibbling over the meaning of 'need' is quite useless. It has two shades of meaning: lack of something required, and lack of something desired. You could have made the valid part of your point without starting off as you did. But I already implied that removal is less urgent when I wrote "When, some year in the future...". > They only need a 2to3 or 2to6 tool because Python 2 > and Python 3 are not compatible with each other, but they want one > program to be valid in Python 2 and Python 3 simultaneously. They *need* the extra stuff inserted. They do not *want* to insert by hand.
So by your narrow meaning of 'need', one could say that having the insertion done by program is a want, not a need. > When they decide they no longer care about Python 2, they can just stop > taking care to keep their program valid as Python 2 and only take care > to keep it a valid Python 3 program. There's no specific change to make, > just a different approach to take with future maintenance. > > You might say that they will *want* to immediately discard all of their > legacy Python 2 support code. I suspect many of them will not want this; > but either way it's a want, not a need. If and when someone wants the extra stuff removed to eliminate both the extra run-time and mental overhead of having it around, and they do not want to remove it by hand, they will want, and therefore need in the more general sense, to have it done automatically. In both cases, addition and removal, the process is tedious and error-prone if done by hand. -- Terry Jan Reedy From tjreedy at udel.edu Wed Dec 14 05:51:00 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 13 Dec 2011 23:51:00 -0500 Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions In-Reply-To: <4EE7E764.5050008@v.loewis.de> References: <4EE704D6.5000901@v.loewis.de> <4EE7E764.5050008@v.loewis.de> Message-ID: On 12/13/2011 7:01 PM, "Martin v. Löwis" wrote: >> What I'm asking is that >> (1) The other values be documented as reserved, rather than as illegal. > How is that different? >> (2) The macros produce an error rather than silently corrupting data. > In debug mode, or release mode? -1 on release mode. These two requests seem slightly contradictory. Non-official __xxx__ names are reserved for future use but not illegal now for user use, and user-generated examples do not raise an exception. They simply do not get any special attention unless and until given an official meaning. Then too bad if that breaks code.
So by analogy, reserved type values would be ignored, neither corrupting data nor raising errors, until put in use. But I don't know how easy/practical that would be. Or maybe more to the point, how expensive a check would be. Not checking names for reservedness is the easiest thing to do. -- Terry Jan Reedy From regebro at gmail.com Wed Dec 14 08:21:07 2011 From: regebro at gmail.com (Lennart Regebro) Date: Wed, 14 Dec 2011 08:21:07 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? Message-ID: On Tue, Dec 13, 2011 at 23:38, Nick Coghlan wrote: > On Wed, Dec 14, 2011 at 8:17 AM, Michael Foord > wrote: >> More specifically "six" [1] is the name of Benjamin Peterson's support >> package to help write code that works on both 2 and 3. So the idea is that >> the conversion isn't just a straight syntax conversion - but that it [could] >> generate code using this library. > > The thing is, the code you want to generate varies depending on > whether you want to target 2.6+, or include 2.5 and earlier. Sure. These are different fixers, and the script that runs them could have a parameter for the version. I'd expect, though, that a 2to6 would first target 2.6+, and possibly never end up supporting 2.5 at all. I do realize there still is 2.4 out in the wild, but fewer and fewer people need to support it, and the effort to support it is much higher. > String translation is also an open question. For some codebases, you > want both u"" and "" to translate to a Unicode "" (either in Py3k or > via the future import), but if a code base deals with WSGI-style > native strings (by means of u"" for text, "" for native, b"" for > binary), then the more appropriate translation is to use the future > import and map them to "", str("") and b"" respectively. Yeah, that can't be done automatically. There is no generic way to determine if a string should be binary, unicode or native.
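Nick's WSGI-style mapping (u"" to "", "" to str(""), b"" to b"", plus the future import) can be sketched as below. This assumes Python 2.6+ on the 2.x side, where both `from __future__ import unicode_literals` and b"" literals exist. On Python 3 the future imports are accepted and have no effect. The variable names are illustrative, and as Lennart notes, deciding which literal belongs in which category is a human judgment, not something a fixer can infer.

```python
from __future__ import print_function, unicode_literals  # first statement

# Python 2 source               ->  translated form (2.6+ and 3.x)
# u"caf\u00e9"   (text)         ->  "caf\u00e9"
# "Content-Type" (native str)   ->  str("Content-Type")
# b"\x00\xff"    (binary)       ->  b"\x00\xff"
text = "caf\u00e9"            # unicode on 2.x, str on 3.x
native = str("Content-Type")  # byte str on 2.x, str on 3.x
binary = b"\x00\xff"          # byte str on 2.x, bytes on 3.x

for name, value in [("text", text), ("native", native), ("binary", binary)]:
    print(name, type(value).__name__)
```

One caveat of the str("") wrapper: on Python 2 it encodes with the default ASCII codec, so it only suits ASCII-only native strings such as header names.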
From victor.stinner at haypocalc.com Wed Dec 14 09:31:40 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 14 Dec 2011 09:31:40 +0100 Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions In-Reply-To: References: Message-ID: <2338734.i6kI0g9g3P@ned> On Tuesday 13 December 2011 02:09:02, Jim Jewett wrote: > (3) I would feel much less nervous if the remaining 4 values of > PyUnicode_Kind were explicitly reserved, and the macros raised an > error when they showed up. (Better still would be to allow other > values, and to have the macros delegate to some attribute on the (sub) > type object.) A macro is not supposed to raise an error. In debug mode, _PyUnicode_CheckConsistency() ensures that the kind is valid and PyUnicode_KIND() fails with an assertion error if kind is PyUnicode_WCHAR_KIND. Python cannot create a string with a kind different from PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND (the legacy API creates strings with a temporary PyUnicode_WCHAR_KIND kind, which PyUnicode_READY quickly replaces). If you write your own extension generating an invalid string, I don't think that Python can help you. Python cannot check all data; it would be too slow. If we change something, I would suggest removing PyUnicode_WCHAR_KIND from PyUnicode_Kind, so you can be sure that the PyUnicode_KIND() result is an enum with 3 possible values (PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND). It would help to quiet the compiler on switch/case ;-) Victor From martin at v.loewis.de Wed Dec 14 10:15:00 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Wed, 14 Dec 2011 10:15:00 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: References: <1323320919.2710.24.camel@thinko> <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> <20111214013024.74addba7@pitrou.net> Message-ID: <4EE86914.1050904@v.loewis.de> > > In effect, 2to3 is a "purity" solution, but > > six is more like a "practicality" solution. > > This sounds like your personal interpretation. I see nothing "pure" in > 2to3. > > > It's "pure" in being optimized for a world where you just stop using > Python 2 one day, and start using 3 the next, without any crossover support. That's not true. 2to3 is well suited for supporting both 2 and 3 from the same code base, and reduces the number of compromises you have to make compared to an identical-source approach (more dramatically so if you also want to support 2.5 or 2.4). > Anyway, if you're supporting both 2 and 3, a common code base offers > many attractions, so if it can be done, it will. And 2to3 is a good approach to maintaining a common code base. Regards, Martin From solipsis at pitrou.net Wed Dec 14 10:58:42 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 14 Dec 2011 10:58:42 +0100 Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions References: <4EE704D6.5000901@v.loewis.de> <4EE7E764.5050008@v.loewis.de> Message-ID: <20111214105842.1eea1ced@pitrou.net> On Tue, 13 Dec 2011 23:51:00 -0500 Terry Reedy wrote: > So by analogy, reserved type value would be ignored, neither corrupting > data or raising errors, until put in use. That simply doesn't make sense. Regards Antoine. From tseaver at palladion.com Wed Dec 14 17:33:32 2011 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 14 Dec 2011 11:33:32 -0500 Subject: [Python-Dev] readd u'' literal support in 3.3? 
In-Reply-To: <4EE86914.1050904@v.loewis.de> References: <1323320919.2710.24.camel@thinko> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> <20111214013024.74addba7@pitrou.net> <4EE86914.1050904@v.loewis.de> Message-ID: On 12/14/2011 04:15 AM, "Martin v. Löwis" wrote: >> It's "pure" in being optimized for a world where you just stop >> using Python 2 one day, and start using 3 the next, without any >> crossover support. > > That's not true. 2to3 is well suited for supporting both 2 and 3 from > the same code base, and reduces the number of compromises you have to > make compared to an identical-source approach (more dramatically so if > you also want to support 2.5 or 2.4). > >> Anyway, if you're supporting both 2 and 3, a common code base >> offers many attractions, so if it can be done, it will. > > And 2to3 is a good approach to maintaining a common code base. Not in the experience of the folks who are actually doing that task: the overhead of running 2to3 every time 'setup.py develop' etc. runs dooms the effort. For instance, we have a report that the 2to3 step takes more than half an hour (on at least one user's development machine) when installing / refreshing zope.interface in a Python 3.2 virtualenv. (Note that I'm in the process of getting that package's unit test coverage up to snuff before ripping out the 2to3 support in favor of a subset.) Using 2to3 during ongoing development makes Python feel like Java/C++, where "get a cup of coffee while we rebuild the world" is a frequent occurrence. Tres.
-- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From martin at v.loewis.de Wed Dec 14 18:23:12 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Wed, 14 Dec 2011 18:23:12 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> <20111214013024.74addba7@pitrou.net> <4EE86914.1050904@v.loewis.de> Message-ID: <4EE8DB80.5050007@v.loewis.de> >> And 2to3 is a good approach to maintaining a common code base. > > > Not in the experience of the folks who are actually doing that task: Well, I personally actually *did* the task, so that experience certainly isn't universal. > the > overhead of running 2to3 every time 'setup.py develop' etc. runs dooms > the effort. How so? Running 2to3 after every change is very fast. I never use setup.py develop, though. > Using 2to3 during ongoing development makes Python feel like Java/C++, > where "get a cup of coffee while we rebuild the world" is a frequent > occurence. Unfortunately, these issues never get reported. I worked on porting zope.interface, and it never took 30 minutes for me, not even remotely. Regards, Martin From stefan_ml at behnel.de Wed Dec 14 19:05:54 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 14 Dec 2011 19:05:54 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE8DB80.5050007@v.loewis.de> References: <1323320919.2710.24.camel@thinko> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111213172423.2c567d8b@pitrou.net> <20111214013024.74addba7@pitrou.net> <4EE86914.1050904@v.loewis.de> <4EE8DB80.5050007@v.loewis.de> Message-ID: "Martin v. Löwis", 14.12.2011 18:23: >> overhead of running 2to3 every time 'setup.py develop' etc. runs dooms >> the effort. > > How so? Running 2to3 after every change is very fast. I never use > setup.py develop, though. I think the problem starts with the fact that it needs to be run in the first place. It's not enough any more to just fire up the interpreter and run a test; you first have to build your code before you can get back to work, and it gets moved away into a separate directory and runs from there. So your workspace looks different depending on the environment you are currently testing with, and all your development tools have to support that as well. Even if the build step does not take half an hour, it's an otherwise unnecessary step that makes working and testing with Python 3 substantially less comfortable, and thus less likely to happen. And we all know where a reluctance to test leads us. And, just for the record, we use 2to3 for Cython's code base, and I'm not convinced that this was a good decision. Testing the code in Py3 is actually something that I avoid if not strictly necessary, and that I leave to our CI server in most cases. I'm much happier with lxml, which was ported before there even was a 2to3, so it works on 2 and 3 out of the box. That alone makes it much nicer to develop on, and I think that it was clearly worth the additional porting work at the time.
Stefan From martin at v.loewis.de Wed Dec 14 19:14:28 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Wed, 14 Dec 2011 19:14:28 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> <4EE53139.8020500@v.loewis.de> Message-ID: <4EE8E784.2050406@v.loewis.de> On 12.12.2011 10:04, Stefan Behnel wrote: > "Martin v. Löwis", 11.12.2011 23:39: >>> I can't recall anyone working on any substantial improvements during the >>> last six years or so, and the reason for that seems obvious to me. >> >> What do you think is the reason? It's not at all obvious to me. > > Just to repeat myself for the third time here: lack of interest. Ah, that's certainly wrong. I am interested in these libraries. Regards, Martin From martin at v.loewis.de Wed Dec 14 19:18:13 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Wed, 14 Dec 2011 19:18:13 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> <4EE528BD.2040102@v.loewis.de> Message-ID: <4EE8E865.9070307@v.loewis.de> > Just look through the xml-sig page, basically all requests regarding > PyXML during the last five years deal with problems in installing it, > i.e. *before* even starting to use it. So you can't use this to claim > that people really *are* still using it. I'm not so sure. In many of these cases, it turned out that they were trying to run some software that uses PyXML, and that they tried installing PyXML to satisfy the prerequisite. So while they may not be software developers, they are indirectly "users" of PyXML. Regards, Martin From cpmicropro at gmail.com Wed Dec 14 17:04:49 2011 From: cpmicropro at gmail.com (Hossein) Date: Wed, 14 Dec 2011 19:34:49 +0330 Subject: [Python-Dev] Compiling the source without stat Message-ID: <4EE8C921.9000503@gmail.com> Hi. I just started to port the latest Python 2.7.2 to another platform (don't think you're eager to know...
well it's CASIO ClassPad). And I faced a "minor" problem... It hasn't got stat or fstat or anything. (It supports only a very limited subset of the C standard library.) As pyport.h suggests, I defined both DONT_HAVE_STAT and DONT_HAVE_FSTAT, but that was where the problems began. It failed to compile most of import.c, particularly because it fell into the wrong `#if !defined(PYOS_Something)' blocks. Sometimes it just fell into an #else part which assumed stat is available. So although HAVE_STAT is meant to control file operations, most of the source code isn't written to respect it. You see how "minor" the problem was? So now I need advice from developers. Is there a fix for it? What a question... there's definitely no replacement for stat. It's 99% certain that I can't compile further without touching the source code. I have to #define my own PYOS_whatever and handle files in my own way. In that case, where should my special file-handling cases go? I saw some code in marshal.c which seemed intended to abstract away the platform's file handling from the rest of the source; but from what I understood, it can't be made to use alternate file-handling methods. If there is anything I should do (maybe show you my handmade pyconfig.h?), tell me. [My first post to a mailing list... Should I say] Best Regards, Hossein [in here?] From petri at digip.org Wed Dec 14 20:26:29 2011 From: petri at digip.org (Petri Lehtinen) Date: Wed, 14 Dec 2011 21:26:29 +0200 Subject: [Python-Dev] Compiling the source without stat In-Reply-To: <4EE8C921.9000503@gmail.com> References: <4EE8C921.9000503@gmail.com> Message-ID: <20111214192629.GA2054@ihaa> Hossein wrote: > Hi. I just started to port latest python 2.7.2 to another platform > (don't think you're eager to know... well it's CASIO ClassPad). > And I faced a "minor" problem... It hasn't got stat or fstat or > anything. (It supports a very limited set of c std lib). > As pyport.c suggested, i defined both DONT_HAVE_STAT and > DONT_HAVE_FSTAT, but problems only began.
> It failed to compile most of import.c, particularly because it fell > into the wrong `#if !defined(PYOS_Something)' blocks. Sometimes it > just fell into an #else part which assumed stat are available. So > although HAVE_STAT is meant to control file operations, most of the > source code aren't implement to use it. You see how "minor" the > problem was? > So now I need advice from developers. > Is there a fix for it? What a question... definitely no replacement to stat. > It's 99% definite that I can't compile further without touching the > source code. I have to #define my own PYOS_whatever and handle files > in my own way. In that case where should my special file handling > cases go? I saw some marshal.c code which seemed it wants to > abstract away platform's file handling from source code; but from > what I understood it can't be made to use alternate file handling > methods. > If there is anything I should do (maybe show you my handmade > pyconfig.h?) tell me. See http://bugs.python.org/issue12082. Currently neither Python 2.x nor 3.x can be compiled without stat() or fstat(). Python 2.7 almost compiles, but Python 3 depends heavily on them. The problem boils down to the fact that you cannot really check whether a filesystem entry is a directory without calling stat() or fstat(). My personal opinion is that support for the DONT_HAVE_STAT and DONT_HAVE_FSTAT defines should be dropped because they don't work, and would only be useful in a very limited set of cases. > [My first post in a mailing list... Should I say] Best Regards, > Hossein [in here?] Yeah, why not? :) Regards, Petri From stefan_ml at behnel.de Wed Dec 14 20:41:42 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 14 Dec 2011 20:41:42 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE8E784.2050406@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> <4EE53139.8020500@v.loewis.de> <4EE8E784.2050406@v.loewis.de> Message-ID: "Martin v.
Löwis", 14.12.2011 19:14: > On 12.12.2011 10:04, Stefan Behnel wrote: >> "Martin v. Löwis", 11.12.2011 23:39: >>>> I can't recall anyone working on any substantial improvements during the >>>> last six years or so, and the reason for that seems obvious to me. >>> >>> What do you think is the reason? It's not at all obvious to me. >> >> Just to repeat myself for the third time here: lack of interest. > > Ah, that's certainly wrong. I am interested in these libraries. I meant: "lack of interest in improving them". It's clear from the discussion that there are still users and that new code is still being written that uses MiniDOM. However, I would argue that this cannot possibly be performance-critical code and that it only deals with somewhat small documents. I say that because MiniDOM is evidently not suitable for large documents or performance-critical applications, so this is the only explanation I have for why the performance problems would not be obvious in the cases where it is still being used. And if they do show, it appears to be much more likely that users rewrite their code using ElementTree or lxml than that they try to fix MiniDOM's performance issues. Now, read my first quote above again (and preferably also its context, which I already emphasized in a previous post); it should be clearer now. Stefan From martin at v.loewis.de Wed Dec 14 20:51:15 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Wed, 14 Dec 2011 20:51:15 +0100 Subject: [Python-Dev] Compiling the source without stat In-Reply-To: <4EE8C921.9000503@gmail.com> References: <4EE8C921.9000503@gmail.com> Message-ID: <4EE8FE33.8040906@v.loewis.de> > It's 99% definite that I can't compile further without touching the > source code. I have to #define my own PYOS_whatever and handle files in > my own way. In that case where should my special file handling cases go? It's difficult to say how to proceed.
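For code that only needs to know whether a file is present, one stat-free fallback is to try opening it and, if that succeeds, close it again. The sketch below shows that logic in Python for readability. A real port would do this in C with `fopen()`/`fclose()`, and CPython's own `open()` may call `fstat()` internally, so treat this as an illustration of the control flow rather than stat-free code. The function name `file_exists` is hypothetical.

```python
def file_exists(path):
    """Return True if `path` can be opened for reading, without calling stat().

    Caveats: an unreadable file (permission denied) is reported as missing,
    and in C, fopen() on a directory succeeds on some platforms, so this
    cannot replace stat() for telling files from directories.
    """
    try:
        handle = open(path, "rb")
    except (IOError, OSError):  # missing, unreadable, or rejected by open()
        return False
    handle.close()
    return True
```

This covers only presence checks. Distinguishing a package directory from a module file, which is what import needs, still requires something stat-like.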
On one hand, I don't see an overwhelming need to support systems without stat, and am tempted to say that you are on your own. On the other hand, it appears that people keep asking for it, from time to time. So if it were possible to support such systems without making the code too convoluted, it may be worth supporting them. One thing seems clear: without stat(), we cannot possibly support .pyc files, at least not in __pycache__. So one consequence of a missing stat should be that all the code dealing with caching of byte code files is disabled. Supporting .pyc as modules might still be possible. It's questionable how to deal with path searching in the absence of stat. Testing for the presence of a file is possible in principle by trying to open the file, and closing it when it was found to be present. So in the places where we only check for the presence of a file, an alternative implementation could be provided. In any case, it needs someone to champion such a project, preferably in an ongoing manner (i.e. several years). So if you are interested, you should - volunteer to maintain stat-less systems for some time - create a port of Python 3 that works stat-less - come back to python-dev for review to determine whether it's worth supporting such systems. Alternatively, you can just make your own fork of Python, which you may or may not publish. Regards, Martin From catch-all at masklinn.net Wed Dec 14 20:54:50 2011 From: catch-all at masklinn.net (Xavier Morel) Date: Wed, 14 Dec 2011 20:54:50 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> <4EE53139.8020500@v.loewis.de> <4EE8E784.2050406@v.loewis.de> Message-ID: <4503F565-5CD6-476A-9697-16FC5517659A@masklinn.net> On 2011-12-14, at 20:41 , Stefan Behnel wrote: > I meant: "lack of interest in improving them". It's clear from the discussion that there are still users and that new code is still being written that uses MiniDOM.
However, I would argue that this cannot possibly be performance critical code and that it only deals with somewhat small documents. I say that because MiniDOM is evidently not suitable for large documents or performance critical applications, so this is the only explanation I have why the performance problems would not be obvious in the cases where it is still being used. And if they do show, it appears to be much more likely that users rewrite their code using ElementTree or lxml than that they try to fix MiniDOM's performance issues. Could also be because "XML is slow (and sucks)" is part of the global consciousness at this point, and that minidom is slow and verbose doesn't surprise much. From stefan_ml at behnel.de Wed Dec 14 20:59:06 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 14 Dec 2011 20:59:06 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4503F565-5CD6-476A-9697-16FC5517659A@masklinn.net> References: <4EE1C9AB.2040301@v.loewis.de> <4EE53139.8020500@v.loewis.de> <4EE8E784.2050406@v.loewis.de> <4503F565-5CD6-476A-9697-16FC5517659A@masklinn.net> Message-ID: Xavier Morel, 14.12.2011 20:54: > On 2011-12-14, at 20:41 , Stefan Behnel wrote: >> I meant: "lack of interest in improving them". It's clear from the >> discussion that there are still users and that new code is still being >> written that uses MiniDOM. However, I would argue that this cannot >> possibly be performance critical code and that it only deals with >> somewhat small documents. I say that because MiniDOM is evidently not >> suitable for large documents or performance critical applications, so >> this is the only explanation I have why the performance problems would >> not be obvious in the cases where it is still being used. And if they >> do show, it appears to be much more likely that users rewrite their >> code using ElementTree or lxml than that they try to fix MiniDOM's >> performance issues. 
> > Could also be because "XML is slow (and sucks)" is part of the global > consciousness at this point, and that minidom is slow and verbose > doesn't surprise much. Possibly, yes. Or that "Python is slow and sucks". But I think there are good counter arguments against both. Stefan From tjreedy at udel.edu Wed Dec 14 21:29:43 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 14 Dec 2011 15:29:43 -0500 Subject: [Python-Dev] Compiling the source without stat In-Reply-To: <20111214192629.GA2054@ihaa> References: <4EE8C921.9000503@gmail.com> <20111214192629.GA2054@ihaa> Message-ID: On 12/14/2011 2:26 PM, Petri Lehtinen wrote: > The problem boils down to the fact that you cannot really check > whether a filesystem entry is a directory without calling stat() or > fstat(). > > My personal opinion is that support for DONT_HAVE_STAT and > DONT_HAVE_FSTAT defines should be dropped because they don't work, and > would only be useful in a very limited set of cases. At present, it seems to be an attractive nuisance, tempting people like Hossein to try something that does not work. -- Terry Jan Reedy From martin at v.loewis.de Wed Dec 14 22:20:14 2011 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 14 Dec 2011 22:20:14 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> <4EE53139.8020500@v.loewis.de> <4EE8E784.2050406@v.loewis.de> Message-ID: <4EE9130E.7020104@v.loewis.de> Am 14.12.2011 20:41, schrieb Stefan Behnel: > "Martin v. L?wis", 14.12.2011 19:14: >> Am 12.12.2011 10:04, schrieb Stefan Behnel: >>> "Martin v. L?wis", 11.12.2011 23:39: >>>>> I can't recall anyone working on any substantial improvements >>>>> during the >>>>> last six years or so, and the reason for that seems obvious to me. >>>> >>>> What do you think is the reason? It's not at all obvious to me. >>> >>> Just to repeat myself for the third time here: lack of interest. >> >> Ah, that's certainly wrong. 
I am interested in these libraries. > > I meant: "lack of interest in improving them". That's also what I meant. I'm interested in improving them. > Now, read my first quote above again (and preferably also its context, > which I already emphasized in a previous post), it should be clearer now. I (now) know what you mean - but you are incorrect. Regards, Martin From stefan_ml at behnel.de Wed Dec 14 22:47:17 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 14 Dec 2011 22:47:17 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: <4EE9130E.7020104@v.loewis.de> References: <4EE1C9AB.2040301@v.loewis.de> <4EE53139.8020500@v.loewis.de> <4EE8E784.2050406@v.loewis.de> <4EE9130E.7020104@v.loewis.de> Message-ID: "Martin v. Löwis", 14.12.2011 22:20: > On 14.12.2011 20:41, Stefan Behnel wrote: >> "Martin v. Löwis", 14.12.2011 19:14: >>> On 12.12.2011 10:04, Stefan Behnel wrote: >>>> "Martin v. Löwis", 11.12.2011 23:39: >>>>>> I can't recall anyone working on any substantial improvements >>>>>> during the >>>>>> last six years or so, and the reason for that seems obvious to me. >>>>> >>>>> What do you think is the reason? It's not at all obvious to me. >>>> >>>> Just to repeat myself for the third time here: lack of interest. >>> >>> Ah, that's certainly wrong. I am interested in these libraries. >> >> I meant: "lack of interest in improving them". > > That's also what I meant. I'm interested in improving them. Then please do. I posted the numbers, so you know what the baseline is, both in terms of speed and memory usage. If you need further benchmarks of other areas of the API (e.g. tag search or whatever), just ask. Note, however, that even an improvement by an order of magnitude wouldn't solve the API issue for new users, so I'd still suggest adding an appropriate link to ET in the MiniDOM documentation.
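Stefan's point about the API, as opposed to speed, is easiest to see side by side. The sketch below extracts the same data with `xml.dom.minidom` and with `xml.etree.ElementTree`. The sample document and function names are made up for illustration, and the snippet says nothing about the performance numbers discussed in this thread.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

DOC = "<items><item>spam</item><item>eggs</item></items>"


def names_minidom(doc):
    """DOM style: walk node objects and read each child Text node's data."""
    dom = xml.dom.minidom.parseString(doc)
    try:
        return [item.firstChild.data for item in dom.getElementsByTagName("item")]
    finally:
        dom.unlink()  # explicitly break the DOM's internal reference cycles


def names_etree(doc):
    """ElementTree style: elements carry their text directly."""
    root = ET.fromstring(doc)
    return [item.text for item in root.iter("item")]


print(names_minidom(DOC))  # ['spam', 'eggs']
print(names_etree(DOC))    # ['spam', 'eggs']
```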
Stefan From regebro at gmail.com Wed Dec 14 22:57:37 2011 From: regebro at gmail.com (Lennart Regebro) Date: Wed, 14 Dec 2011 22:57:37 +0100 Subject: [Python-Dev] readd u'' literal support in 3.3? In-Reply-To: References: <1323320919.2710.24.camel@thinko> <20111208223408.0e2e8bd1@limelight.wooz.org> <20111209101123.01e92326@limelight.wooz.org> <1323679242.2710.350.camel@thinko> <1323724720.2710.388.camel@thinko> <4EE75634.6000208@voidspace.org.uk> <20111214013024.74addba7@pitrou.net> <4EE86914.1050904@v.loewis.de> Message-ID: On Wed, Dec 14, 2011 at 17:33, Tres Seaver wrote: > Not in the experience of the folks who are actually doing that task: the > overhead of running 2to3 every time 'setup.py develop' etc. runs dooms > the effort. For instance, we have a report that the 2to3 step takes more > than half an hour (on at least one user's development machine) when > installing / refreshing zope.interface in a Python 3.2 virtualenv. If that is true, then there has to be a bug somewhere... I might not have tried on 3.2 with virtualenv, but it doesn't take anywhere near that time normally, and this is not a normal runtime at all. When we talk about 2to3 being slow here, we are talking about it taking 10 seconds to install software that would have taken under a second to install on Python 2. (Yes, I'm thinking of Distribute; I just checked. ;-) ) //Lennart From techtonik at gmail.com Thu Dec 15 09:58:31 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 15 Dec 2011 11:58:31 +0300 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On Sat, Sep 24, 2011 at 11:27 AM, Georg Brandl wrote: > On 24.09.2011 01:32, Guido van Rossum wrote: > > On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik > wrote: > >> Currently if you work in console and define a function and then > >> immediately call it - it will fail with SyntaxError.
> >> For example, copy paste this completely valid Python script into
> >> console:
> >>
> >> def some():
> >>     print "XXX"
> >> some()
> >>
> >> There is an issue for that that was just closed by Eric. However, I'd
> >> like to know if there are people here that agree that if you paste a
> >> valid Python script into console - it should work without changes.
> >
> > You can't fix this without completely changing the way the interactive
> > console treats blank lines. Note that it's not just that a blank line
> > is required after a function definition -- you also *can't* have a
> > blank line *inside* a function definition.
>
> While the former could be changed (I think), the latter certainly cannot.
> So it's probably not worth changing established behavior.

I've just hit this UX bug once more, but now I'm more prepared. Despite Guido's proposal to move this to python-ideas, I am continuing the discussion here, because:

1. It is not a proposal, but a defect (well, you may argue, but please, don't)
2. This thread has a history of analysis of what's going wrong in the console
3. This thread also has the developers' decision that answers the questions "why is it so wrong?" and "why can't/won't it be fixed?"
4. Yesterday I heard from a Java person that Python is hard to pick up, and remembered how I struggled with indentation myself trying to 'learn by example' in the console

Right now I am trying to cope with point (3.). To summarize, let's speak code that is copy/pasted into the console. Two things that will make me happy if they behave consistently in the console and in a .py file:

---ex1---
def some():
    print "XXX"
some()
---/ex1---

--ex1.output--
[ex1.py]
XXX

[console]
  File "<stdin>", line 3
    some()
    ^
SyntaxError: invalid syntax
--/ex1.output--

--ex2--
def some():
pass
--/ex2--

--ex2.output--
[ex2.py]
  File "./ex2.py", line 2
    pass
       ^
IndentationError: expected an indented block

[console]
  File "<stdin>", line 2
    pass
       ^
IndentationError: expected an indented block
--/ex2.output--

The second example already works as expected.
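The script/console difference in ex1 can be reproduced without a terminal, because the interactive prompt compiles its input one statement at a time (single-statement mode, the same mode `compile(..., "single")` uses), while running a file compiles the whole module. A minimal sketch in Python 3 syntax; `SRC` and `compiles` are illustrative names:

```python
# ex1 from the message above, in Python 3 syntax; the 4-space indent matters.
SRC = 'def some():\n    print("XXX")\nsome()\n'


def compiles(source, mode):
    """Return True if compile() accepts source in the given mode."""
    try:
        compile(source, "<paste>", mode)
    except SyntaxError:
        return False
    return True


# Running a .py file compiles the whole module at once ("exec" mode) ...
print(compiles(SRC, "exec"))    # True
# ... whereas the prompt compiles one statement at a time ("single" mode),
# so the dedented call after the def makes the pasted chunk invalid.
print(compiles(SRC, "single"))  # False
```

This matches Guido's point in the quoted text: the console needs a blank line to know the function definition is finished before it accepts the next statement.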
Why is it not possible to fix ex1? Guido said: > You can't fix this without completely changing the way the interactive > console treats blank lines. But the fix doesn't require changing the way the interactive console treats blank lines at all. It only requires finishing the current block when a dedented line is encountered, instead of throwing an obviously confusing SyntaxError. At the very least it should not say it is a SyntaxError, because the code is perfectly valid Python code. If it appears to be invalid "Python Console code", the error message should say that explicitly. That would be a correct user-friendly fix for this UX issue, but I'd still like the behavior to be fixed, i.e. "allow dedented lines to end the current block in the console without a SyntaxError". Right now I don't see the reasons why it is not possible. Please speak code when replying about use cases/examples that will be broken - I didn't quite get the problem with "global scope if" statements. -- anatoly t. From g.rodola at gmail.com Thu Dec 15 10:40:41 2011 From: g.rodola at gmail.com (Giampaolo Rodolà) Date: Thu, 15 Dec 2011 10:40:41 +0100 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On 15 December 2011 09:58, anatoly techtonik wrote: > 1. It is not a proposal, but a defect (well, you may argue, but please, don't) You can't copy/paste multiline scripts into a system shell either, unless you append "\". It's likely that similar problems exist in a lot of other interactive shells (ruby?). And that makes sense to me, because they are supposed to be used interactively. Might it be good to change this? Maybe. Is the current behavior objectively wrong? No, in my opinion.
--- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ From hrvoje.niksic at avl.com Thu Dec 15 11:00:06 2011 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Thu, 15 Dec 2011 11:00:06 +0100 Subject: [Python-Dev] Compiling the source without stat In-Reply-To: <4EE8C921.9000503@gmail.com> References: <4EE8C921.9000503@gmail.com> Message-ID: <4EE9C526.3000404@avl.com> On 12/14/2011 05:04 PM, Hossein wrote: > If there is anything I should do You can determine what the code that calls stat() is trying to do, and implement that with other primitives that your platform provides. For example, you can determine whether a file exists by trying to open it in read-only mode and checking the error. You can find whether a filesystem path names a directory by trying to chdir into it and checking the error. You can find the size of a regular file by opening it and seeking to the end. These substitutions would not be acceptable for a desktop system, but may be perfectly adequate for an embedded one that doesn't provide stat() in the first place. Either way, I expect that you will need to modify the sources. Finally, are you 100% sure that your platform doesn't provide another API similar to stat()? From vinay_sajip at yahoo.co.uk Thu Dec 15 11:31:08 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Thu, 15 Dec 2011 10:31:08 +0000 (UTC) Subject: [Python-Dev] Proposed changes to provide compression support for rotated log files Message-ID: In response to http://bugs.python.org/issue13516 I'm thinking of implementing some changes in the rotating file handlers, as outlined here: http://plumberjack.blogspot.com/2011/12/improved-flexibility-for-log-file.html The changes (including tests) are almost ready to check in, but I thought I'd give any one here who's interested a chance to comment, in case they can spot any shortcomings of the approach I suggest. 
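Vinay's proposal was eventually released in Python 3.3 as the `namer` and `rotator` attributes of `logging.handlers.BaseRotatingHandler`, used by its `rotation_filename()` and `rotate()` methods. The sketch below uses that released API, which may differ in detail from the draft described in the blog post. It gzip-compresses each rotated file; the function names, size limit and backup count are illustrative choices:

```python
import gzip
import logging
import logging.handlers
import os
import shutil


def gz_namer(default_name):
    """Turn the default rotated name (e.g. app.log.1) into app.log.1.gz."""
    return default_name + ".gz"


def gz_rotator(source, dest):
    """Compress the just-closed log file into dest, then delete the original."""
    with open(source, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.remove(source)


def make_handler(path, max_bytes=1024, backups=3):
    """Size-based rotating handler whose backups are gzip-compressed."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    # The same two callables could be attached to a TimedRotatingFileHandler.
    handler.namer = gz_namer
    handler.rotator = gz_rotator
    return handler
```

Attach it with `logger.addHandler(make_handler("app.log"))`. Because the hooks are plain callables rather than subclass overrides, one pair serves both size-based and timed handlers, which is the flexibility Vinay argues for later in the thread.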
Regards, Vinay Sajip From cpmicropro at gmail.com Thu Dec 15 12:59:23 2011 From: cpmicropro at gmail.com (Hossein) Date: Thu, 15 Dec 2011 15:29:23 +0330 Subject: [Python-Dev] Compiling the source without stat In-Reply-To: <20111214192629.GA2054@ihaa> References: <20111214192629.GA2054@ihaa> Message-ID: <4EE9E11B.6090202@gmail.com> I wanted to say something on the bug page Petri pointed to (http://bugs.python.org/issue12082), but I thought I would discuss it here first. If faking a stat struct and a function to fill it solves the problem, and checking for existing files and folders is the only thing that Python needs in order to compile (I'm talking about 2.7), then it's possible to fail-check it by just trying to open the file. If you don't want to change the stat mechanism, you can create a new #define which lets users point it to their own fake stat function and struct. I'm currently trying to fake stat to see what happens next, but I guess I will have more problems with file handling later. By the way, some people with the same problem there said they "used" Python by setting the Py_DontWriteBytecodeFlag flag, but my problem here is that I can't compile it. I don't know what they actually did. From mhazadmanesh2009 at gmail.com Thu Dec 15 12:41:20 2011 From: mhazadmanesh2009 at gmail.com (Hossein Azadmanesh) Date: Thu, 15 Dec 2011 15:11:20 +0330 Subject: [Python-Dev] Compiling the source without stat In-Reply-To: <4EE9C526.3000404@avl.com> References: <4EE9C526.3000404@avl.com> Message-ID: <4EE9DCE0.3090701@gmail.com> It does have its own file handling functions: opening, getting the size, enumerating directories, etc. It has its own limitations too: no dates supported, folders only one level deep, a maximum of 99 files inside each folder, etc. There is no function called stat, but I am considering faking it; I will explain in another reply.
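Hrvoje's substitutions for stat() can be sketched as follows. They are written in Python for readability; a real port would apply the same logic with the platform's own C file API inside CPython. The helper names are hypothetical, and note that CPython's own `open()` typically calls `fstat()` internally, so this only illustrates the probing logic. None of these probes recovers a modification time, which Martin points out the import system also checks.

```python
import os


def exists_by_open(path):
    """Existence check for a regular file without stat(): try a read-only open."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


def is_dir_by_chdir(path):
    """Directory check without stat(): try to chdir into it, then change back."""
    previous = os.getcwd()
    try:
        os.chdir(path)
    except OSError:
        return False
    os.chdir(previous)
    return True


def size_by_seek(path):
    """Size of a regular file without stat(): seek to the end, report the offset."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)
```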
From solipsis at pitrou.net Thu Dec 15 14:58:26 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 15 Dec 2011 14:58:26 +0100 Subject: [Python-Dev] Proposed changes to provide compression support for rotated log files References: Message-ID: <20111215145826.770ddff1@pitrou.net> On Thu, 15 Dec 2011 10:31:08 +0000 (UTC) Vinay Sajip wrote: > In response to http://bugs.python.org/issue13516 I'm thinking of implementing > some changes in the rotating file handlers, as outlined here: > > http://plumberjack.blogspot.com/2011/12/improved-flexibility-for-log-file.html > > The changes (including tests) are almost ready to check in, but I thought I'd > give any one here who's interested a chance to comment, in case they can spot > any shortcomings of the approach I suggest. "def filename(self, name)" sounds like a poor method name. Regards Antoine. From victor.stinner at haypocalc.com Thu Dec 15 15:03:16 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 15 Dec 2011 15:03:16 +0100 Subject: [Python-Dev] Compiling the source without stat In-Reply-To: <4EE9E11B.6090202@gmail.com> References: <20111214192629.GA2054@ihaa> <4EE9E11B.6090202@gmail.com> Message-ID: <4615736.IsbhkMj20n@ned> On Thursday 15 December 2011 15:29:23, you wrote: > If faking a stat struct and a function to fill it > solves the problem, and checking for existing files and folders is the > only thing that python needs to be compiled (i'm talking about 2.7) then > it's possible to fail-check it by just trying to open the file. It's better to only work on Python 3.3. I consider "support for platforms without stat" a new feature, and new features are only accepted in Python 3.3.
Victor From tjreedy at udel.edu Thu Dec 15 19:40:57 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 15 Dec 2011 13:40:57 -0500 Subject: [Python-Dev] Proposed changes to provide compression support for rotated log files In-Reply-To: References: Message-ID: On 12/15/2011 5:31 AM, Vinay Sajip wrote: > In response to http://bugs.python.org/issue13516 I'm thinking of implementing > some changes in the rotating file handlers, as outlined here: > > http://plumberjack.blogspot.com/2011/12/improved-flexibility-for-log-file.html > > The changes (including tests) are almost ready to check in, but I thought I'd > give any one here who's interested a chance to comment, in case they can spot > any shortcomings of the approach I suggest. It appears you are adding two methods to do the same thing. One is to subclass and override one or two functions. The other is to define one or two custom functions and attach as attributes. Both seem equally easy. (Actually, subclassing takes one line less.) Are both really needed? -- Terry Jan Reedy From vinay_sajip at yahoo.co.uk Thu Dec 15 19:49:18 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Thu, 15 Dec 2011 18:49:18 +0000 (UTC) Subject: [Python-Dev] Proposed changes to provide compression support for rotated log files References: <20111215145826.770ddff1@pitrou.net> Message-ID: Antoine Pitrou pitrou.net> writes: > > "def filename(self, name)" sounds like a poor method name. > You're right - perhaps "def rotation_filename(self, default_name)" is better. Regards, Vinay Sajip From vinay_sajip at yahoo.co.uk Thu Dec 15 19:56:26 2011 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Thu, 15 Dec 2011 18:56:26 +0000 (UTC) Subject: [Python-Dev] Proposed changes to provide compression support for rotated log files References: Message-ID: Terry Reedy udel.edu> writes: > > It appears you are adding two methods to do the same thing. One is to > subclass and override one or two functions. 
The other is to define one > or two custom functions and attach as attributes. Both seem equally > easy. (Actually, subclassing takes one line less.) Are both really needed? > That's why I asked for comments. Subclassing can be avoided if the callable attributes are used, which is a win, for example, if you have both timed and non-timed rotating handlers: you can use the same callables in each case, whereas with subclassing you would have to subclass both the timed and non-timed handler classes. Also, in scenarios where one might want to use alternative compression formats based on an application's configuration, there would be less work because one wouldn't need to create multiple subclasses. So for most cases the strategy would be to use the callable attributes, and if they were inappropriate for some reason, they could subclass and override the methods. I've factored out the two methods from the existing implementation because at the moment, it's hard to subclass without copying the whole doRollover method (as in the ActiveState example). Regards, Vinay Sajip From tjreedy at udel.edu Thu Dec 15 20:06:30 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 15 Dec 2011 14:06:30 -0500 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On 12/15/2011 3:58 AM, anatoly techtonik wrote: > 1. It is not a proposal, but a defect (well, you may argue, but please, > don't) You state a controversial opinion as a fact and then request that others not discuss it. To me, this is a somewhat obnoxious hit-and-run tactic. If you do not want the point discussed, don't bring it up. Anyway, I will follow your request and not argue. Since that opinion is a central point, not discussing it does not leave much to say. 
-- Terry Jan Reedy From victor.stinner at haypocalc.com Thu Dec 15 20:45:42 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 15 Dec 2011 20:45:42 +0100 Subject: [Python-Dev] French sprint this week-end Message-ID: <4EEA4E66.3040008@haypocalc.com> Hi, I am organizing an online sprint on CPython this weekend with French developers. At least six developers will participate; some of them don't know C, but most know Python. Do you know of simple tasks to start contributing to Python? Something useful and not boring, if possible :-) There is the "easy" tag on the bug tracker, but many of those issues have a long history, already have a patch, etc. Do you know of other generic tasks, like improving code coverage or supporting some rare platforms? Eric Araujo, Antoine Pitrou and Charles-François Natali should help me, so I won't be organizing the sprint alone. Don't watch the buildbot until Monday. You can expect more activity on our bug tracker (and maybe on the #python-dev channel) ;-) -- If you speak French, join the #python-dev-fr IRC channel (on Freenode) and see the wiki page http://wiki.python.org/moin/SprintFranceDec2011 Victor From nadeem.vawda at gmail.com Thu Dec 15 22:07:55 2011 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Thu, 15 Dec 2011 23:07:55 +0200 Subject: Re: [Python-Dev] [Python-checkins] cpython: input() in this sense is gone In-Reply-To: References: Message-ID: On Thu, Dec 15, 2011 at 10:44 PM, benjamin.peterson wrote: > +#       eval_input is the input for the eval() functions. Shouldn't this be "function" rather than "functions"? From mark at hotpy.org Thu Dec 15 23:18:18 2011 From: mark at hotpy.org (Mark Shannon) Date: Thu, 15 Dec 2011 22:18:18 +0000 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: References: Message-ID: <4EEA722A.10403@hotpy.org> Hi all, The current dict implementation is getting pretty old; isn't it time we had a new one (for Xmas)? I have a new dict implementation which allows sharing of keys between objects of the same class.
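The key-sharing idea can be modelled in a few lines of Python: all instances of a class share one table of attribute names, and each instance stores only a compact list of values. This toy sketch illustrates the general "split table" technique, not Mark's actual C implementation. The class names are hypothetical, and a real version also has to handle deletion and instances whose attributes diverge. (The idea was later written up as PEP 412, "Key-Sharing Dictionary", and adopted in Python 3.3.)

```python
class SharedKeys:
    """One key table per class: maps each attribute name to a slot index."""

    def __init__(self):
        self.index = {}   # name -> slot number
        self.names = []   # slot number -> name, stored once for all instances

    def slot_for(self, name):
        # New names are appended once; every instance then reuses the slot.
        if name not in self.index:
            self.index[name] = len(self.names)
            self.names.append(name)
        return self.index[name]


_MISSING = object()  # marks slots an instance has not set


class SplitValues:
    """Per-instance half: just a list of values addressed via SharedKeys."""

    def __init__(self, keys):
        self.keys = keys
        self.values = []

    def __setitem__(self, name, value):
        slot = self.keys.slot_for(name)
        if slot >= len(self.values):
            self.values.extend([_MISSING] * (slot + 1 - len(self.values)))
        self.values[slot] = value

    def __getitem__(self, name):
        slot = self.keys.index.get(name)
        if slot is None or slot >= len(self.values) or self.values[slot] is _MISSING:
            raise KeyError(name)
        return self.values[slot]


point_keys = SharedKeys()  # shared by all "Point" instances
p1, p2 = SplitValues(point_keys), SplitValues(point_keys)
p1["x"], p1["y"] = 1, 2
p2["x"], p2["y"] = 3, 4
print(p1["y"], p2["x"])   # 2 3
print(point_keys.names)   # ['x', 'y']
```

The memory saving comes from storing each name (and, in C, its hash) once per class instead of once per instance.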
You can check it out here: http://bitbucket.org/markshannon/hotpy_new_dict Performance: For numerical applications, with few instances of user-defined classes, performance is pretty much unchanged, degrading about 1% for pystones. For applications that create lots of instances of user-defined classes, performance is improved and memory savings are large. For the gcbench benchmark (from unladen swallow), cpython with the new dict is about 9% faster and, more importantly, reduces memory use from 99 Mbytes to 61Mbytes (a 38% reduction). All tests were done on my ancient 32 bit intel linux machine, please try it out on your machines and let me know what sort of results you get. By the way it passes all the tests, but there are strange interactions with weakrefs and the GC. (Try running the tests, you'll see what I mean) Cheers, Mark From solipsis at pitrou.net Fri Dec 16 00:15:16 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 16 Dec 2011 00:15:16 +0100 Subject: [Python-Dev] A new dict for Xmas? References: <4EEA722A.10403@hotpy.org> Message-ID: <20111216001516.3698109e@pitrou.net> On Thu, 15 Dec 2011 22:18:18 +0000 Mark Shannon wrote: > > For the gcbench benchmark (from unladen swallow), > cpython with the new dict is about 9% faster and, more importantly, > reduces memory use from 99 Mbytes to 61Mbytes (a 38% reduction). > > All tests were done on my ancient 32 bit intel linux machine, > please try it out on your machines and let me know what sort of results > you get. 
Benchmark results under a Core i5, 64-bit Linux:

Report on Linux localhost.localdomain 2.6.38.8-desktop-8.mga #1 SMP Fri Nov 4 00:05:53 UTC 2011 x86_64 x86_64
Total CPU cores: 4

### call_method ###
Min: 0.292352 -> 0.274041: 1.07x faster
Avg: 0.292978 -> 0.277124: 1.06x faster
Significant (t=17.31)
Stddev: 0.00053 -> 0.00351: 6.5719x larger

### call_method_slots ###
Min: 0.284101 -> 0.273508: 1.04x faster
Avg: 0.285029 -> 0.274534: 1.04x faster
Significant (t=26.86)
Stddev: 0.00068 -> 0.00135: 1.9969x larger

### call_simple ###
Min: 0.225191 -> 0.222104: 1.01x faster
Avg: 0.227443 -> 0.222776: 1.02x faster
Significant (t=9.53)
Stddev: 0.00181 -> 0.00056: 3.2266x smaller

### fastpickle ###
Min: 0.482402 -> 0.493695: 1.02x slower
Avg: 0.486077 -> 0.496568: 1.02x slower
Significant (t=-5.35)
Stddev: 0.00340 -> 0.00276: 1.2335x smaller

### fastunpickle ###
Min: 0.394846 -> 0.433733: 1.10x slower
Avg: 0.397362 -> 0.436318: 1.10x slower
Significant (t=-23.73)
Stddev: 0.00234 -> 0.00283: 1.2129x larger

### float ###
Min: 0.052567 -> 0.051377: 1.02x faster
Avg: 0.053812 -> 0.052669: 1.02x faster
Significant (t=3.72)
Stddev: 0.00110 -> 0.00107: 1.0203x smaller

### json_dump ###
Min: 0.381395 -> 0.391053: 1.03x slower
Avg: 0.381937 -> 0.393219: 1.03x slower
Significant (t=-7.15)
Stddev: 0.00043 -> 0.00350: 8.1447x larger

### json_load ###
Min: 0.347112 -> 0.369763: 1.07x slower
Avg: 0.347490 -> 0.370317: 1.07x slower
Significant (t=-69.64)
Stddev: 0.00045 -> 0.00058: 1.2717x larger

### nbody ###
Min: 0.238068 -> 0.219208: 1.09x faster
Avg: 0.238951 -> 0.220000: 1.09x faster
Significant (t=36.09)
Stddev: 0.00076 -> 0.00090: 1.1863x larger

### nqueens ###
Min: 0.262282 -> 0.252576: 1.04x faster
Avg: 0.263835 -> 0.254497: 1.04x faster
Significant (t=7.12)
Stddev: 0.00117 -> 0.00269: 2.2914x larger

### regex_effbot ###
Min: 0.060298 -> 0.057791: 1.04x faster
Avg: 0.060435 -> 0.058128: 1.04x faster
Significant (t=17.82)
Stddev: 0.00012 -> 0.00026: 2.1761x larger

### richards ###
Min: 0.148266 -> 0.143755: 1.03x faster
Avg: 0.150677 -> 0.145003: 1.04x faster
Significant (t=5.74)
Stddev: 0.00200 -> 0.00094: 2.1329x smaller

### silent_logging ###
Min: 0.057191 -> 0.059082: 1.03x slower
Avg: 0.057335 -> 0.059194: 1.03x slower
Significant (t=-17.40)
Stddev: 0.00020 -> 0.00013: 1.4948x smaller

### unpack_sequence ###
Min: 0.000046 -> 0.000042: 1.10x faster
Avg: 0.000048 -> 0.000044: 1.09x faster
Significant (t=128.98)
Stddev: 0.00000 -> 0.00000: 1.8933x smaller

gcbench first showed no memory consumption difference (using "ps -u"). I then removed the "stretch tree" (which apparently reserves memory upfront) and I saw a ~30% memory saving as well as a 20% performance improvement on large sizes.

Regards

Antoine.

From martin at v.loewis.de Fri Dec 16 00:16:29 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 16 Dec 2011 00:16:29 +0100 Subject: [Python-Dev] Compiling the source without stat In-Reply-To: <4EE9E11B.6090202@gmail.com> References: <20111214192629.GA2054@ihaa> <4EE9E11B.6090202@gmail.com> Message-ID: <4EEA7FCD.3060805@v.loewis.de> On 15.12.2011 12:59, Hossein wrote: > I wanted to say something in the bug page petri showed ( > http://bugs.python.org/issue12082 ) however I though about first > discussing it here. If faking a stat struct and a function to fill it > solves the problem, and checking for existing files and folders is the > only thing that python needs to be compiled (i'm talking about 2.7) then > it's possible to fail-check it by just trying to open the file. That's not true. It also looks at the file modification time.
Regards, Martin From ncoghlan at gmail.com Fri Dec 16 00:18:16 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 16 Dec 2011 09:18:16 +1000 Subject: [Python-Dev] [Python-checkins] cpython: improve abstract property support (closes #11610) In-Reply-To: References: Message-ID: On Fri, Dec 16, 2011 at 6:34 AM, benjamin.peterson wrote:
> +abc
> +---
> +
> +Improved support for abstract base classes containing descriptors composed with
> +abstract methods. The recommended approach to declaring abstract descriptors is
> +now to provide :attr:`__isabstractmethod__` as a dynamically updated
> +property. The built-in descriptors have been updated accordingly.
> +
> +  * :class:`abc.abstractproperty` has been deprecated, use :class:`property`
> +    with :func:`abc.abstractmethod` instead.
> +  * :class:`abc.abstractclassmethod` has been deprecated, use
> +    :class:`classmethod` with :func:`abc.abstractmethod` instead.
> +  * :class:`abc.abstractstaticmethod` has been deprecated, use
> +    :class:`property` with :func:`abc.abstractmethod` instead.
> +
> +(Contributed by Darren Dale in :issue:`11610`)

s/property/staticmethod/ in the final bullet point here.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From mark at hotpy.org Fri Dec 16 00:43:35 2011 From: mark at hotpy.org (Mark Shannon) Date: Thu, 15 Dec 2011 23:43:35 +0000 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: <20111216001516.3698109e@pitrou.net> References: <4EEA722A.10403@hotpy.org> <20111216001516.3698109e@pitrou.net> Message-ID: <4EEA8627.5030500@hotpy.org> Antoine Pitrou wrote: > On Thu, 15 Dec 2011 22:18:18 +0000 > Mark Shannon wrote: >> For the gcbench benchmark (from unladen swallow), >> cpython with the new dict is about 9% faster and, more importantly, >> reduces memory use from 99 Mbytes to 61Mbytes (a 38% reduction).
>> >> All tests were done on my ancient 32 bit intel linux machine, >> please try it out on your machines and let me know what sort of results >> you get. > > Benchmark results under a Core i5, 64-bit Linux: > > Report on Linux localhost.localdomain 2.6.38.8-desktop-8.mga #1 SMP Fri > Nov 4 00:05:53 UTC 2011 x86_64 x86_64 Total CPU cores: 4 > > ### call_method ### > Min: 0.292352 -> 0.274041: 1.07x faster > Avg: 0.292978 -> 0.277124: 1.06x faster > Significant (t=17.31) > Stddev: 0.00053 -> 0.00351: 6.5719x larger > > ### call_method_slots ### > Min: 0.284101 -> 0.273508: 1.04x faster > Avg: 0.285029 -> 0.274534: 1.04x faster > Significant (t=26.86) > Stddev: 0.00068 -> 0.00135: 1.9969x larger > > ### call_simple ### > Min: 0.225191 -> 0.222104: 1.01x faster > Avg: 0.227443 -> 0.222776: 1.02x faster > Significant (t=9.53) > Stddev: 0.00181 -> 0.00056: 3.2266x smaller > > ### fastpickle ### > Min: 0.482402 -> 0.493695: 1.02x slower > Avg: 0.486077 -> 0.496568: 1.02x slower > Significant (t=-5.35) > Stddev: 0.00340 -> 0.00276: 1.2335x smaller > > ### fastunpickle ### > Min: 0.394846 -> 0.433733: 1.10x slower > Avg: 0.397362 -> 0.436318: 1.10x slower > Significant (t=-23.73) > Stddev: 0.00234 -> 0.00283: 1.2129x larger > > ### float ### > Min: 0.052567 -> 0.051377: 1.02x faster > Avg: 0.053812 -> 0.052669: 1.02x faster > Significant (t=3.72) > Stddev: 0.00110 -> 0.00107: 1.0203x smaller > > ### json_dump ### > Min: 0.381395 -> 0.391053: 1.03x slower > Avg: 0.381937 -> 0.393219: 1.03x slower > Significant (t=-7.15) > Stddev: 0.00043 -> 0.00350: 8.1447x larger > > ### json_load ### > Min: 0.347112 -> 0.369763: 1.07x slower > Avg: 0.347490 -> 0.370317: 1.07x slower > Significant (t=-69.64) > Stddev: 0.00045 -> 0.00058: 1.2717x larger > > ### nbody ### > Min: 0.238068 -> 0.219208: 1.09x faster > Avg: 0.238951 -> 0.220000: 1.09x faster > Significant (t=36.09) > Stddev: 0.00076 -> 0.00090: 1.1863x larger > > ### nqueens ### > Min: 0.262282 -> 0.252576: 1.04x faster > Avg: 
0.263835 -> 0.254497: 1.04x faster > Significant (t=7.12) > Stddev: 0.00117 -> 0.00269: 2.2914x larger > > ### regex_effbot ### > Min: 0.060298 -> 0.057791: 1.04x faster > Avg: 0.060435 -> 0.058128: 1.04x faster > Significant (t=17.82) > Stddev: 0.00012 -> 0.00026: 2.1761x larger > > ### richards ### > Min: 0.148266 -> 0.143755: 1.03x faster > Avg: 0.150677 -> 0.145003: 1.04x faster > Significant (t=5.74) > Stddev: 0.00200 -> 0.00094: 2.1329x smaller > > ### silent_logging ### > Min: 0.057191 -> 0.059082: 1.03x slower > Avg: 0.057335 -> 0.059194: 1.03x slower > Significant (t=-17.40) > Stddev: 0.00020 -> 0.00013: 1.4948x smaller > > ### unpack_sequence ### > Min: 0.000046 -> 0.000042: 1.10x faster > Avg: 0.000048 -> 0.000044: 1.09x faster > Significant (t=128.98) > Stddev: 0.00000 -> 0.00000: 1.8933x smaller Thanks for running the benchmarks. It's probably best not to attach too much significance to a few percent here and there, but it's good to see that performance is OK. > > > gcbench first showed no memory consumption difference (using "ps -u"). > I then removed the "stretch tree" (which apparently reserves memory > upfront) and I saw a ~30% memory saving as well as a 20% performance > improvement on large sizes. I should say how I did my memory tests. I did a search using ulimit to limit the maximum amount of memory the process was allowed. The given numbers were the minimum required to complete; I did not remove the "stretch tree". Cheers, Mark. From ron3200 at gmail.com Fri Dec 16 05:15:53 2011 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 15 Dec 2011 22:15:53 -0600 Subject: [Python-Dev] generators and ceval Message-ID: <1324008953.18721.18.camel@Gutsy> Hi, I just added issue 13607 with a patch that removes the generator-specific checks and code from the ceval PyEval_EvalFrameEx() function. Those parts were moved up into the generator gen_send_ex() function. Doing that removed the generator flag checks from the eval loop and made it a bit cleaner.
In order to do that, I needed to give generators a way to look at the 'why' value. Doing that also cleaned up the code in gen_send_ex(), as it can use the 'why' in a select instead of several indirect if tests. http://bugs.python.org/issue13607 Altogether it made yields about 10% faster, and everything else about 2%-3% faster (on average). But it does need to be checked. Cheers, Ron From greg.ewing at canterbury.ac.nz Fri Dec 16 06:57:06 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 16 Dec 2011 18:57:06 +1300 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: <4EEA722A.10403@hotpy.org> References: <4EEA722A.10403@hotpy.org> Message-ID: <4EEADDB2.2020202@canterbury.ac.nz> Mark Shannon wrote: > I have a new dict implementation which allows sharing of keys between > objects of the same class. We already have the __slots__ mechanism for memory savings. Have you done any comparisons with that? Seems to me that __slots__ ought to save even more memory, since it eliminates the per-instance dict altogether rather than just the keys half of it. -- Greg From stefan_ml at behnel.de Fri Dec 16 07:53:09 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 16 Dec 2011 07:53:09 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: Message-ID: Stefan Behnel, 09.12.2011 09:02: > I think Py3.3 would be a good milestone for cleaning up the stdlib support > for XML. > [...] I still think it is, so let me sum up the current discussion here. > What should change? > > a) The stdlib documentation should help users to choose the right tool > right from the start. It looks like there's agreement on this part. > Instead of using the totally misleading wording that > it uses now, it should be honest about the performance characteristics of > MiniDOM and should actively suggest that those who don't know what to > choose (or even *that* they can choose) should not use MiniDOM in the first > place.
There was some disagreement on whether MiniDOM should publicly disclose its performance characteristics in the documentation, and whether its use should be discouraged, even just for new users. However, it seemed that there was enough consensus to settle on Nick Coghlan's proposal for a compromise to move ElementTree up to the top of the list, and to add a visible note to the top of each of the XML modules like this: "Note: The module is a . If all you are trying to do is read and write XML files, consider using the xml.etree.ElementTree module instead" That template could (with a bit of peeking into the getopt documentation) be expanded into the following. """ [[Note: The xml.dom.minidom module provides an implementation of the W3C-DOM whose API is similar to that in other programming languages. Users who are unfamiliar with the W3C-DOM interface or who would like to write less code for processing XML files should consider using the xml.etree.ElementTree module instead.]] """ I think this should go on the xml.dom.minidom page as well as the xml.dom package page. Hand-wavingly, users who are new to the DOM are more likely to hit the package page first, whereas those who know it already will likely find the MiniDOM page directly. Note that I'd still encourage the removal of the misleading word "lightweight" until it makes sense to put it back in a meaningful way. I therefore propose the following minimalistic changes to the first paragraph on the minidom page: """ xml.dom.minidom is a [-XXX: light-weight] implementation of the Document Object Model interface. It is intended to be simpler than the full DOM and also [+XXX: provide a] significantly smaller [+XXX: API]. """ @Martin: note how the original paragraph does not refer to "4DOM" or "PyXML". It only generically mentions "the DOM interface".
It is certainly not true that MiniDOM is more "light-weight" and "significantly smaller" than (most) other DOM interface implementations outside of the Python world, for example. So the current wording actually makes no sense at all. Additionally, the documentation on the xml.sax page would benefit from the following paragraph: """ [[Note: The xml.sax package provides an implementation of the SAX interface whose API is similar to that in other programming languages. Users who are unfamiliar with the SAX interface or who would like to write less code for efficient stream processing of XML files should consider using the iterparse() function in the xml.etree.ElementTree module instead.]] """ If these changes are considered acceptable, I'll copy the above over to the documentation bug I opened at http://bugs.python.org/issue11379 Can these doc changes go into both 2.7 and 3.3? Given that there is no important difference between the implementations, I don't see why the documentation should differ in Py2. > b) cElementTree should finally lose its "special" status as a separate > library and disappear as an accelerator module behind ElementTree. There was no opposition and a general agreement on this in the thread, except for the warning that Fredrik Lundh should have a word in this. I wrote him an e-mail and didn't get a response so far. We can wait a little longer, I guess; there's still time before 3.3beta.
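For comparison with writing a SAX ContentHandler, here is a minimal sketch of the iterparse() style that the proposed xml.sax note points to. The function name, tag and in-memory input are illustrative. `clear()` empties each handled element but leaves the empty element attached to its parent, so long-running jobs usually also drop those references.

```python
import io
import xml.etree.ElementTree as ET


def count_items(source, tag="item"):
    """Count <tag> elements while streaming, discarding each one after use."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop the finished element's text and children
    return count


# Illustrative in-memory document; iterparse also accepts a filename.
data = io.BytesIO(b"<root>" + b"<item>x</item>" * 1000 + b"</root>")
print(count_items(data))  # 1000
```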
Given that there is no > important difference between the implementations, I don't see why the > documentation should differ in Py2. Your suggested tweaks look good to me and could go into all of 2.7, 3.2 and 3.3 >> b) cElementTree should finally loose it's "special" status as a separate >> library and disappear as an accelerator module behind ElementTree. > > There was no opposition and a general agreement on this in the thread, > except for the warning that Fredrik Lundh should have a word in this. I > wrote him an e-mail and didn't get a response so far. We can wait a little > longer, I guess, there's still time before 3.3beta. Having ElementTree implicitly do "from _elementtree import *" is a 3.3 only change, though. (Note that xml.etree.cElementTree isn't the actual acceleration module - that honor already goes to "_elementtree". The only bit missing is the automatic import in xml.etree.ElementTree and the appropriate test updates to ensure the Python version still gets tested) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stefan at bytereef.org Fri Dec 16 10:00:29 2011 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 16 Dec 2011 10:00:29 +0100 Subject: [Python-Dev] French sprint this week-end In-Reply-To: <4EEA4E66.3040008@haypocalc.com> References: <4EEA4E66.3040008@haypocalc.com> Message-ID: <20111216090029.GA30463@sleipnir.bytereef.org> Victor Stinner wrote: > Do you know simple task to start contributing to Python? Something > useful and not boring if possible :-) There is the "easy" tag on the bug > tracker, but many issues have a long history, already have a patch, etc. > Do know other generic task like improving code coverage or support of > some rare platforms? On some buildbots compiler warnings are starting to accumulate. Installing a recent version of gcc and fixing those might be a good task. If the participants are new to buildbot, it might even be interesting for them. 
:) Stefan Krah From eliben at gmail.com Fri Dec 16 10:17:33 2011 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 16 Dec 2011 11:17:33 +0200 Subject: [Python-Dev] French sprint this week-end In-Reply-To: <20111216090029.GA30463@sleipnir.bytereef.org> References: <4EEA4E66.3040008@haypocalc.com> <20111216090029.GA30463@sleipnir.bytereef.org> Message-ID: On Fri, Dec 16, 2011 at 11:00, Stefan Krah wrote: > Victor Stinner wrote: > > Do you know a simple task to start contributing to Python? Something > > useful and not boring if possible :-) There is the "easy" tag on the bug > > tracker, but many issues have a long history, already have a patch, etc. > > Do you know other generic tasks like improving code coverage or support for > > some rare platforms? > > On some buildbots compiler warnings are starting to accumulate. Installing > a recent version of gcc and fixing those might be a good task. If the > participants are new to buildbot, it might even be interesting for them. :) > > Do we have buildbots that build Python with Clang instead of GCC? The reason I'm asking is that Clang's diagnostics are usually better, and fixing all its warnings could nicely complement fixing GCC's qualms. Eli From dirkjan at ochtman.nl Fri Dec 16 10:32:11 2011 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 16 Dec 2011 10:32:11 +0100 Subject: [Python-Dev] French sprint this week-end In-Reply-To: References: <4EEA4E66.3040008@haypocalc.com> <20111216090029.GA30463@sleipnir.bytereef.org> Message-ID: On Fri, Dec 16, 2011 at 10:17, Eli Bendersky wrote: > Do we have buildbots that build Python with Clang instead of GCC? The reason > I'm asking is that Clang's diagnostics are usually better, and fixing all > its warnings could nicely complement fixing GCC's qualms. The box running my buildslave has clang installed, so someone with access to the buildmaster could probably set that up without too much trouble.
Cheers, Dirkjan From mark at hotpy.org Fri Dec 16 11:03:30 2011 From: mark at hotpy.org (Mark Shannon) Date: Fri, 16 Dec 2011 10:03:30 +0000 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: <4EEADDB2.2020202@canterbury.ac.nz> References: <4EEA722A.10403@hotpy.org> <4EEADDB2.2020202@canterbury.ac.nz> Message-ID: <4EEB1772.1030300@hotpy.org> Greg Ewing wrote: > Mark Shannon wrote: > >> I have a new dict implementation which allows sharing of keys between >> objects of the same class. > > We already have the __slots__ mechanism for memory savings. > Have you done any comparisons with that? > You can't make Python programmers use slots, neither can you automatically change existing programs. Are you suggesting that because the __slots__ mechanism exists, the dict implementation doesn't have to be efficient? > Seems to me that __slots__ ought to save even more memory, > since it eliminates the per-instance dict altogether rather > than just the keys half of it. > Of course using __slots__ saves more memory, but people don't use them much. Cheers, Mark. From stefan_ml at behnel.de Fri Dec 16 17:00:26 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 16 Dec 2011 17:00:26 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: <4EE1C9AB.2040301@v.loewis.de> <4EE53139.8020500@v.loewis.de> <4EE8E784.2050406@v.loewis.de> Message-ID: Stefan Behnel, 14.12.2011 20:41: > It's clear from the > discussion that there are still users and that new code is still being > written that uses MiniDOM. However, I would argue that this cannot possibly > be performance critical code and that it only deals with somewhat small > documents. I say that because MiniDOM is evidently not suitable for large > documents or performance critical applications, so this is the only > explanation I have why the performance problems would not be obvious in the > cases where it is still being used. 
And if they do show, it appears to be > much more likely that users rewrite their code using ElementTree or lxml > than that they try to fix MiniDOM's performance issues. Out of curiosity, I reran my benchmarks under PyPy 1.7. http://blog.behnel.de/index.php?p=210 In short: MiniDOM performs substantially better there, both in terms of time and space. That by itself doesn't make PyPy an interesting platform for XML processing (using lxml in CPython is way faster), but I found it interesting to note that the problem is not strictly inherent in MiniDOM. It also depends a lot on the runtime environment, even when it comes to memory usage. Stefan From devel at baptiste-carvello.net Fri Dec 16 17:40:02 2011 From: devel at baptiste-carvello.net (Baptiste Carvello) Date: Fri, 16 Dec 2011 17:40:02 +0100 Subject: [Python-Dev] Fixing the XML batteries In-Reply-To: References: Message-ID: On 16/12/2011 07:53, Stefan Behnel wrote: > Additionally, the documentation on the xml.sax page would benefit from > the following paragraph: > > """ > [[Note: The xml.sax package provides an implementation of the SAX > interface whose API is similar to that in other programming languages. > Users who are unfamiliar with the SAX interface or who would like to > write less code for efficient stream processing of XML files should > consider using the iterparse() function in the xml.etree.ElementTree > module instead.]] > """ > A small caveat to note about iterparse(), which I otherwise like a lot: when processing very big data (I encountered this with a region-wide OpenStreetMap XML dump), you have to remove the processed nodes from the root element. Otherwise, its memory footprint increases with the size of the document.
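Baptiste's caveat about iterparse() can be shown in a short sketch. The function name count_items and the tag "item" are hypothetical. The sketch assumes the records of interest are direct children of the root element, and it uses the common idiom of taking the root from the first "start" event and clearing it after each record is handled.

```python
import io
import xml.etree.ElementTree as ET


def count_items(source, tag="item"):
    """Stream-parse `source`, counting <tag> children of the root element.

    Assumes the records are direct children of the root. Processed records
    are dropped, so memory use stays roughly flat for large inputs.
    """
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    count = 0
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            count += 1  # real code would process elem here
            # Detach everything parsed so far. Without this, root keeps a
            # reference to every record and memory grows with the document.
            root.clear()
    return count


data = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"
print(count_items(io.BytesIO(data)))  # 1000
```

Element.clear() also removes the root's attributes and text. If you need anything from the root element itself, read it before the loop starts.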
From status at bugs.python.org Fri Dec 16 18:07:29 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 16 Dec 2011 18:07:29 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20111216170729.974CD1DEC6@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-12-09 - 2011-12-16) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3175 ( +6) closed 22220 (+40) total 25395 (+46) Open issues with patches: 1360 Issues opened (31) ================== #11886: test_time.test_tzset() fails on "x86 FreeBSD 7.2 3.x": AEST ti http://bugs.python.org/issue11886 reopened by haypo #13571: Backup files support in IDLE http://bugs.python.org/issue13571 opened by maniram.maniram #13572: import _curses fails because of UnicodeDecodeError('utf8' code http://bugs.python.org/issue13572 opened by haypo #13573: csv.writer uses str() for floats instead of repr() http://bugs.python.org/issue13573 opened by rhettinger #13574: refresh example in doc for Extending and Embedding http://bugs.python.org/issue13574 opened by flox #13576: Handling of broken condcoms in HTMLParser http://bugs.python.org/issue13576 opened by ezio.melotti #13577: __qualname__ is not present on builtin methods and functions http://bugs.python.org/issue13577 opened by meador.inge #13578: Add subprocess.iter_output() convenience function http://bugs.python.org/issue13578 opened by ncoghlan #13579: string.Formatter doesn't understand the !a conversion specifie http://bugs.python.org/issue13579 opened by ncoghlan #13581: help() appears to be broken; doesn't display __doc__ for class http://bugs.python.org/issue13581 opened by christopherthemagnificent #13582: IDLE and pythonw.exe stderr problem http://bugs.python.org/issue13582 opened by serwy #13583: sqlite3.Row doesn't support slice indexes http://bugs.python.org/issue13583 opened by xapple #13585: Add 
contextlib.CleanupManager http://bugs.python.org/issue13585 opened by Nikratio #13586: Replace selected not working/consistent with find http://bugs.python.org/issue13586 opened by marco #13587: Correcting the typos error in Doc/howto/urllib2.rst http://bugs.python.org/issue13587 opened by Bithin.A #13588: Change name of internal closure functions in importlib http://bugs.python.org/issue13588 opened by brett.cannon #13589: Aifc low level serialization primitives fix http://bugs.python.org/issue13589 opened by Oleg.Plakhotnyuk #13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex http://bugs.python.org/issue13590 opened by teamnoir #13592: repr(regex) doesn't include actual regex http://bugs.python.org/issue13592 opened by dwt #13594: Aifc markers write fix http://bugs.python.org/issue13594 opened by Oleg.Plakhotnyuk #13598: string.Formatter doesn't support empty curly braces "{}" http://bugs.python.org/issue13598 opened by maniram.maniram #13601: sys.stderr should be unbuffered (or always line-buffered) http://bugs.python.org/issue13601 opened by pitrou #13604: update PEP 393 (match implementation) http://bugs.python.org/issue13604 opened by Jim.Jewett #13605: document argparse's nargs=REMAINDER http://bugs.python.org/issue13605 opened by bethard #13607: Move generator specific sections out of ceval. http://bugs.python.org/issue13607 opened by ron_adam #13608: remove born-deprecated PyUnicode_AsUnicodeAndSize http://bugs.python.org/issue13608 opened by Jim.Jewett #13609: Add "os.get_terminal_size()" function http://bugs.python.org/issue13609 opened by denilsonsa #13610: On Python parsing numbers. 
http://bugs.python.org/issue13610 opened by Jean-Michel.Fauth #13611: Integrate ElementC14N module into xml.etree package http://bugs.python.org/issue13611 opened by scoder #13612: xml.etree.ElementTree says unknown encoding of a regular encod http://bugs.python.org/issue13612 opened by dongying #13613: Small error in regular expression poker hand example http://bugs.python.org/issue13613 opened by Eddie E Most recent 15 issues with no replies (15) ========================================== #13611: Integrate ElementC14N module into xml.etree package http://bugs.python.org/issue13611 #13608: remove born-deprecated PyUnicode_AsUnicodeAndSize http://bugs.python.org/issue13608 #13605: document argparse's nargs=REMAINDER http://bugs.python.org/issue13605 #13594: Aifc markers write fix http://bugs.python.org/issue13594 #13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex http://bugs.python.org/issue13590 #13587: Correcting the typos error in Doc/howto/urllib2.rst http://bugs.python.org/issue13587 #13586: Replace selected not working/consistent with find http://bugs.python.org/issue13586 #13583: sqlite3.Row doesn't support slice indexes http://bugs.python.org/issue13583 #13579: string.Formatter doesn't understand the !a conversion specifie http://bugs.python.org/issue13579 #13576: Handling of broken condcoms in HTMLParser http://bugs.python.org/issue13576 #13574: refresh example in doc for Extending and Embedding http://bugs.python.org/issue13574 #13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le http://bugs.python.org/issue13565 #13556: When tzinfo.utcoffset is out-of-bounds, the exception message http://bugs.python.org/issue13556 #13554: Tkinter doesn't use higher resolution app icon http://bugs.python.org/issue13554 #13553: Tkinter doesn't set proper application name http://bugs.python.org/issue13553 Most recent 15 issues waiting for review (15) ============================================= #13613: Small error in regular expression 
poker hand example http://bugs.python.org/issue13613 #13609: Add "os.get_terminal_size()" function http://bugs.python.org/issue13609 #13607: Move generator specific sections out of ceval. http://bugs.python.org/issue13607 #13604: update PEP 393 (match implementation) http://bugs.python.org/issue13604 #13598: string.Formatter doesn't support empty curly braces "{}" http://bugs.python.org/issue13598 #13594: Aifc markers write fix http://bugs.python.org/issue13594 #13589: Aifc low level serialization primitives fix http://bugs.python.org/issue13589 #13588: Change name of internal closure functions in importlib http://bugs.python.org/issue13588 #13585: Add contextlib.CleanupManager http://bugs.python.org/issue13585 #13583: sqlite3.Row doesn't support slice indexes http://bugs.python.org/issue13583 #13582: IDLE and pythonw.exe stderr problem http://bugs.python.org/issue13582 #13577: __qualname__ is not present on builtin methods and functions http://bugs.python.org/issue13577 #13576: Handling of broken condcoms in HTMLParser http://bugs.python.org/issue13576 #13567: HTTPError interface changes / breaks depending on what was pas http://bugs.python.org/issue13567 #13564: ftplib and sendfile() http://bugs.python.org/issue13564 Top 10 most discussed issues (10) ================================= #13521: Make dict.setdefault() atomic http://bugs.python.org/issue13521 18 msgs #13585: Add contextlib.CleanupManager http://bugs.python.org/issue13585 17 msgs #13577: __qualname__ is not present on builtin methods and functions http://bugs.python.org/issue13577 14 msgs #13405: Add DTrace probes http://bugs.python.org/issue13405 13 msgs #13609: Add "os.get_terminal_size()" function http://bugs.python.org/issue13609 9 msgs #13516: Gzip old log files in rotating handlers http://bugs.python.org/issue13516 8 msgs #13592: repr(regex) doesn't include actual regex http://bugs.python.org/issue13592 8 msgs #13248: deprecated in 3.2, should be removed in 3.3 http://bugs.python.org/issue13248 7 
msgs #13604: update PEP 393 (match implementation) http://bugs.python.org/issue13604 7 msgs #1559549: ImportError needs attributes for module and file name http://bugs.python.org/issue1559549 7 msgs Issues closed (37) ================== #2979: use_builtin_types in xmlrpc.server http://bugs.python.org/issue2979 closed by python-dev #4028: Problem compiling the multiprocessing module on sunos5 http://bugs.python.org/issue4028 closed by neologix #4625: IDLE won't open anymore, .idlerc unaccessible http://bugs.python.org/issue4625 closed by ned.deily #6570: Tutorial clarity: section 4.7.2, parameters and arguments http://bugs.python.org/issue6570 closed by ezio.melotti #6695: PyXXX_ClearFreeList for dict, set, and list http://bugs.python.org/issue6695 closed by pitrou #8373: socket: AF_UNIX socket paths not handled according to PEP 383 http://bugs.python.org/issue8373 closed by pitrou #8684: improvements to sched.py http://bugs.python.org/issue8684 closed by giampaolo.rodola #9404: IDLE won't launch on XP http://bugs.python.org/issue9404 closed by ned.deily #10350: errno is read too late http://bugs.python.org/issue10350 closed by pitrou #10364: IDLE: make .py default added extension on save http://bugs.python.org/issue10364 closed by terry.reedy #13449: sched - provide an "async" argument for run() method http://bugs.python.org/issue13449 closed by giampaolo.rodola #13479: pickle too picky on re-defined classes http://bugs.python.org/issue13479 closed by gvanrossum #13505: Bytes objects pickled in 3.x with protocol <=2 are unpickled i http://bugs.python.org/issue13505 closed by alexandre.vassalotti #13528: Rework performance FAQ http://bugs.python.org/issue13528 closed by pitrou #13543: shlex with string ending in space gives "ValueError: No closin http://bugs.python.org/issue13543 closed by ekorn #13544: Add __qualname__ to functools.WRAPPER_ASSIGNMENTS http://bugs.python.org/issue13544 closed by meador.inge #13545: Pydoc3.2: TypeError: unorderable types 
http://bugs.python.org/issue13545 closed by haypo #13547: Clean Lib/_sysconfigdata.py and Modules/_testembed http://bugs.python.org/issue13547 closed by skrah #13549: Incorrect nested list comprehension documentation http://bugs.python.org/issue13549 closed by ezio.melotti #13563: Make use of with statement in ftplib http://bugs.python.org/issue13563 closed by giampaolo.rodola #13568: sqlite3 convert_date error with DATE type http://bugs.python.org/issue13568 closed by sherpya #13569: Loggers cannot be pickled http://bugs.python.org/issue13569 closed by vinay.sajip #13570: Expose faster unicode<->ascii functions in the C-API http://bugs.python.org/issue13570 closed by skrah #13575: old style classes still alive http://bugs.python.org/issue13575 closed by flox #13580: Pre-linkage of CPython >=2.6 binary on Linux too fat (libssl, http://bugs.python.org/issue13580 closed by pitrou #13584: argparse doesn't respect double quotes http://bugs.python.org/issue13584 closed by bethard #13591: import_module potentially imports a module twice http://bugs.python.org/issue13591 closed by meador.inge #13593: importlib needs to be updated for __qualname__ http://bugs.python.org/issue13593 closed by meador.inge #13595: Weird behavior with generators with self-referencing output. 
http://bugs.python.org/issue13595 closed by amaury.forgeotdarc #13596: Only recompile Lib/_sysconfigdata.py when needed http://bugs.python.org/issue13596 closed by python-dev #13597: Improve documentation of stdout/stderr buffering in Python 3.x http://bugs.python.org/issue13597 closed by pitrou #13599: Compiled regexes don't show all attributes in dir() http://bugs.python.org/issue13599 closed by ezio.melotti #13600: rot_13 codec not working http://bugs.python.org/issue13600 closed by petri.lehtinen #13602: format string '%b' doesn't work as expected http://bugs.python.org/issue13602 closed by James.Classen #13603: Add prime-related and number theory functions to Python http://bugs.python.org/issue13603 closed by maniram.maniram #13606: test_clear_dict_in_ref_cycle in test_module only works by coin http://bugs.python.org/issue13606 closed by python-dev #11610: Improved support for abstract base classes with descriptors http://bugs.python.org/issue11610 closed by python-dev From jimjjewett at gmail.com Fri Dec 16 21:14:02 2011 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 16 Dec 2011 15:14:02 -0500 Subject: [Python-Dev] A new dict for Xmas? Message-ID: > Greg Ewing wrote: >> Mark Shannon wrote: >>> I have a new dict implementation which allows sharing of keys between >>> objects of the same class. >> We already have the __slots__ mechanism for memory savings. >> Have you done any comparisons with that? > You can't make Python programmers use slots, neither can you > automatically change existing programs. The automatic change is exactly what a dictionary upgrade provides. I haven't read your patch in detail yet, but it sounds like you're replacing the array of keys + array of values with just an array of values, and getting the numerical index from a single per-class array of keys. That would normally be sensible (so thanks!), but it isn't a drop-in replacement. 
If you have a "Data" class intended to take arbitrary per-instance attributes, it just forces them all to keep resizing up, even though individual instances would be small with the current dict. How is this more extreme than replacing a pure dict with some auto-calculated slots and an "other_attrs" dict that would normally remain empty? [It may be harder to implement, because of the difficulty of calculating the slots in advance ... but I don't see it as any worse, once implemented.] Of course, maybe your shared dict just points to sequential array positions (rather than matching the key position) ... in which case, it may well beat slots, though the "Data" class would still be a problem. -jJ From tjreedy at udel.edu Fri Dec 16 22:32:05 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 16 Dec 2011 16:32:05 -0500 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: <4EEB1772.1030300@hotpy.org> References: <4EEA722A.10403@hotpy.org> <4EEADDB2.2020202@canterbury.ac.nz> <4EEB1772.1030300@hotpy.org> Message-ID: On 12/16/2011 5:03 AM, Mark Shannon wrote: > Of course using __slots__ saves more memory, > but people don't use them much. Do you think the stdlib should be using __slots__ more? -- Terry Jan Reedy From mark at hotpy.org Fri Dec 16 22:32:44 2011 From: mark at hotpy.org (Mark Shannon) Date: Fri, 16 Dec 2011 21:32:44 +0000 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: References: Message-ID: <4EEBB8FC.2010405@hotpy.org> Jim Jewett wrote: >> Greg Ewing wrote: >>> Mark Shannon wrote: > >>>> I have a new dict implementation which allows sharing of keys between >>>> objects of the same class. > >>> We already have the __slots__ mechanism for memory savings. >>> Have you done any comparisons with that? > >> You can't make Python programmers use slots, neither can you >> automatically change existing programs. > > The automatic change is exactly what a dictionary upgrade provides.
> > I haven't read your patch in detail yet, but it sounds like you're > replacing the array of keys + array of values with just an array of > values, and getting the numerical index from a single per-class array > of keys. Each dictionary has key/hash/values as before, but instead of one array, they are broken into two: a key/hash array and a value array. The key/hash arrays can be shared amongst dicts. This happens for well-behaved classes and completely empty dicts; otherwise each dict gets two arrays. > > That would normally be sensible (so thanks!), but it isn't a drop-in > replacement. If you have a "Data" class intended to take arbitrary It is a drop-in replacement. It conforms to the current API. > per-instance attributes, it just forces them all to keep resizing up, > even though individual instances would be small with the current dict. There is a cut-off point. At the moment it's quite unsophisticated about how it does this, but it could easily be improved. Suggestions are welcome. > > How is this more extreme than replacing a pure dict with some > auto-calculated slots and an "other_attrs" dict that would normally > remain empty? It's less extreme, but equally effective. > > [It may be harder to implement, because of the difficulty of > calculating the slots in advance ... but I don't see it as any worse, > once implemented.] It's a trade-off between ease of implementation and effectiveness. I think the shared key/hash array approach gets most of the advantages of a full map implementation (like PyPy or V8) with much less hassle. > > Of course, maybe your shared dict just points to sequential array > positions (rather than matching the key position) ... in which case, > it may well beat slots, though the "Data" class would still be a > problem. It won't beat slots, mainly due to the extra space required to minimise collisions, but it is a lot more compact than the present approach.
For a well-behaved class with lots of instances, each with 3 or 4 attributes (i.e. the minimum-size dict), it cuts the space used by the per-instance dict from 136 bytes (32-bit machine) to 64 bytes plus the shared key/hash array. Slots would only require 12 or 16 bytes. (When verifying these numbers I found a bug in the resizing, which I have just fixed.) The next enhancement would be to store the naked value array directly into an instance, trimming the space cost down to just 32 bytes, but this would cause compatibility issues as the (internal) API would need to change. Cheers, Mark. From mark at hotpy.org Fri Dec 16 22:42:11 2011 From: mark at hotpy.org (Mark Shannon) Date: Fri, 16 Dec 2011 21:42:11 +0000 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: References: <4EEA722A.10403@hotpy.org> <4EEADDB2.2020202@canterbury.ac.nz> <4EEB1772.1030300@hotpy.org> Message-ID: <4EEBBB33.50306@hotpy.org> Terry Reedy wrote: > On 12/16/2011 5:03 AM, Mark Shannon wrote: > >> Of course using __slots__ saves more memory, >> but people don't use them much. > > Do you think the stdlib should be using __slots__ more? For some things, yes, but where it's critical, slots are already used. Take the ordered dict: the nodes in that use slots. The advantage of improving things in the VM is that we don't have to rewrite half of the stdlib. Cheers, Mark.
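Mark describes a split table: one key/hash array shared by the instances of a class, plus one values array per instance. A few lines of Python can model the idea. This is a toy sketch under simplifying assumptions, not Mark's patch and not CPython's dict. The class names SharedKeys and SplitDict are hypothetical, and the sketch ignores hashing, resizing and the cut-off where a dict falls back to its own key table.

```python
class SharedKeys:
    """Key table shared by every instance of one class (toy model)."""

    def __init__(self):
        self.keys = []   # attribute names in insertion order
        self.index = {}  # name -> position in each instance's values list

    def slot_for(self, key):
        # Reuse the existing position, or append a new one for all users.
        if key not in self.index:
            self.index[key] = len(self.keys)
            self.keys.append(key)
        return self.index[key]


class SplitDict:
    """Per-instance values list that borrows its keys from a SharedKeys."""

    _MISSING = object()  # marks a shared slot this instance never set

    def __init__(self, shared):
        self.shared = shared
        self.values = []

    def __setitem__(self, key, value):
        pos = self.shared.slot_for(key)
        if pos >= len(self.values):
            # Pad so positions line up with the shared key table.
            self.values.extend([self._MISSING] * (pos + 1 - len(self.values)))
        self.values[pos] = value

    def __getitem__(self, key):
        pos = self.shared.index.get(key)
        if pos is None or pos >= len(self.values) or self.values[pos] is self._MISSING:
            raise KeyError(key)
        return self.values[pos]

    def keys(self):
        # zip() stops at the shorter list, so slots that other instances
        # added later are simply absent here.
        return [k for k, v in zip(self.shared.keys, self.values)
                if v is not self._MISSING]


point_keys = SharedKeys()
a, b = SplitDict(point_keys), SplitDict(point_keys)
a["x"], a["y"] = 1, 2
b["x"], b["y"] = 10, 20
b["z"] = 3  # grows the shared table for every instance
print(point_keys.keys, a.keys(), b.keys())  # ['x', 'y', 'z'] ['x', 'y'] ['x', 'y', 'z']
```

The key set only on b still takes a slot in the shared table. This is the cost Jim raises for a "Data"-style class with arbitrary per-instance attributes.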
From mmueller at vigilantsw.com Sat Dec 17 10:55:55 2011 From: mmueller at vigilantsw.com (Michael Mueller) Date: Sat, 17 Dec 2011 01:55:55 -0800 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c Message-ID: Hi Guys, We've been analyzing CPython with our static analysis tool (Sentry) and a NULL pointer dereference popped up the other day, in Objects/descrobject.c: if (descr != NULL) { Py_XINCREF(type); descr->d_type = type; descr->d_name = PyUnicode_InternFromString(name); if (descr->d_name == NULL) { Py_DECREF(descr); descr = NULL; } descr->d_qualname = NULL; // Possible NULL pointer dereference } If the inner conditional block can be reached, descr will be set NULL and then dereferenced on the next line. The commented line above was added in this commit: http://hg.python.org/cpython/rev/73948#l4.92 Hopefully someone can take a look and determine the appropriate fix. Best, Mike -- Mike Mueller Phone: (401) 405-1525 Email: mmueller at vigilantsw.com http://www.vigilantsw.com/ Static Analysis for C and C++ From anacrolix at gmail.com Sat Dec 17 11:33:53 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 17 Dec 2011 21:33:53 +1100 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: References: Message-ID: ಠ_ಠ On Sat, Dec 17, 2011 at 8:55 PM, Michael Mueller wrote: > Hi Guys, > > We've been analyzing CPython with our static analysis tool (Sentry) > and a NULL pointer dereference popped up the other day, in > Objects/descrobject.c: > >    if (descr != NULL) { >        Py_XINCREF(type); >        descr->d_type = type; >        descr->d_name = PyUnicode_InternFromString(name); >        if (descr->d_name == NULL) { >            Py_DECREF(descr); >            descr = NULL; >        } >        descr->d_qualname = NULL; // Possible NULL pointer dereference >    } > > If the inner conditional block can be reached, descr will be set NULL > and then dereferenced on the next line.
The commented line above was > added in this commit: http://hg.python.org/cpython/rev/73948#l4.92 > > Hopefully someone can take a look and determine the appropriate fix. > > Best, > Mike > > -- > Mike Mueller > Phone: (401) 405-1525 > Email: mmueller at vigilantsw.com > > http://www.vigilantsw.com/ > Static Analysis for C and C++ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com -- ಠ_ಠ From fijall at gmail.com Sat Dec 17 12:53:05 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sat, 17 Dec 2011 13:53:05 +0200 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: References: <4EEA722A.10403@hotpy.org> <4EEADDB2.2020202@canterbury.ac.nz> <4EEB1772.1030300@hotpy.org> Message-ID: On Fri, Dec 16, 2011 at 11:32 PM, Terry Reedy wrote: > On 12/16/2011 5:03 AM, Mark Shannon wrote: > >> Of course using __slots__ saves more memory, >> but people don't use them much. > > > Do you think the stdlib should be using __slots__ more? Note that unlike some other more advanced approaches, slots do change semantics.
There are many cases out there where people would stuff > arbitrary things on stdlib objects and this works fine without > __slots__, but will stop working as soon as you introduce them. A > change from no slots to using slots is not only a performance issue. Yeah... This whole idea reeks of polymorphic inline caches (called "shapes" or "hidden classes" in SpiderMonkey and v8, respectively), where they dynamically try to infer what kind of class an object has, such that the __slots__ optimization can be done without making it visible in the semantics. The Unladen Swallow guys mention in their ProjectPlan that the overhead of opcode fetch/dispatch makes that hard, though. Cheers, Dirkjan From fijall at gmail.com Sat Dec 17 13:34:52 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sat, 17 Dec 2011 14:34:52 +0200 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: References: <4EEA722A.10403@hotpy.org> <4EEADDB2.2020202@canterbury.ac.nz> <4EEB1772.1030300@hotpy.org> Message-ID: On Sat, Dec 17, 2011 at 2:31 PM, Dirkjan Ochtman wrote: > On Sat, Dec 17, 2011 at 12:53, Maciej Fijalkowski wrote: >> Note that unlike some other more advanced approaches, slots do change >> semantics. There are many cases out there where people would stuff >> arbitrary things on stdlib objects and this works fine without >> __slots__, but will stop working as soon as you introduce them. A >> change from no slots to using slots is not only a performance issue. > > Yeah... This whole idea reeks of polymorphic inline caches (called > "shapes" or "hidden classes" in SpiderMonkey and v8, respectively), > where they dynamically try to infer what kind of class an object has, > such that the __slots__ optimization can be done without making it > visible in the semantics. The Unladen Swallow guys mention in their > ProjectPlan that the overhead of opcode fetch/dispatch makes that > hard, though. > > Cheers, > > Dirkjan It's done in PyPy btw. 
Works like a charm :) It's called a sharing dict, and the idea dates back to Self and its maps. There is also an ongoing effort to specialize on the types of fields, so you don't have to box, say, ints stored on classes. That's still in progress, though :) From g.brandl at gmx.net Sat Dec 17 13:57:53 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 17 Dec 2011 13:57:53 +0100 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: References: Message-ID: On 12/17/2011 11:33 AM, Matt Joiner wrote: > ಠ_ಠ Would you please stop this? It may have been funny the first time, but now it looks like pure trolling. Georg From benjamin at python.org Sat Dec 17 14:02:40 2011 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 17 Dec 2011 08:02:40 -0500 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: References: Message-ID: 2011/12/17 Michael Mueller : > > Hopefully someone can take a look and determine the appropriate fix. Fixed. -- Regards, Benjamin From elic at astllc.org Sat Dec 17 17:02:05 2011 From: elic at astllc.org (Eli Collins) Date: Sat, 17 Dec 2011 11:02:05 -0500 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: References: Message-ID: In that same code, right before "Py_DECREF(descr)", should there also be a "Py_XDECREF(type)"? It looks like it might leak a reference to "type" otherwise. The line in question: http://hg.python.org/cpython/file/8c355edc5b1d/Objects/descrobject.c#l628 - Eli Collins On Sat, Dec 17, 2011 at 8:02 AM, Benjamin Peterson wrote: > 2011/12/17 Michael Mueller : > > > > Hopefully someone can take a look and determine the appropriate fix. > > Fixed.
> > > -- > Regards, > Benjamin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/elic%40assurancetechnologies.com > -- Eli Collins elic at assurancetechnologies.com Software Development & I.T. Consulting Assurance Technologies www.assurancetechnologies.com From kevinjcoyne at hotmail.com Sat Dec 17 16:54:17 2011 From: kevinjcoyne at hotmail.com (Kevin Coyne) Date: Sat, 17 Dec 2011 15:54:17 +0000 (UTC) Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities Message-ID: Victor: Python.3 Type System [IHN] - The use of "extended precision" as a term to express Python's ability to create and manipulate integers of any size (within the memory limitations of the computer) is poor, since that term is used almost exclusively in reference to floating-point numbers. I will change it to "unlimited precision" in the revised annex. Python.16 Wrap-around Error [XYY] - My source for this is in the Python documentation under the 2nd reference to OverflowError in: http://docs.python.org/py3k/library/exceptions.html?highlight=overflowerror Python.23 Initialization of Variables [LAV] - Point taken on the unusual syntax (I am not a Python programmer) and I will change to the more common syntax as per your 2nd suggested syntax. Python.32 Structured Programming [EWD] - The point I was trying to make was that, unlike many early languages, Python has no constructs, like the ones mentioned, that can be used to create an unstructured program. I am not advocating, nor would it be proper in this kind of paper, that the Python language be extended to allow for unstructured statements. I will try to clarify this better in the revised version. Python.51 Undefined Behaviour [EWF] #1 -
I need to do more research, as your example does suggest that mutating, at least, does raise an exception. Here are a few references that claim that this is undefined: Refer to (10) under: http://docs.python.org/release/2.4/lib/typesseq-mutable.html Python.51 Undefined Behaviour [EWF] #2 - In regard to collections.OrderedDict, since I am only listing undefined behaviors, I don't think adding a defined behaviour here is appropriate. Python.52 Implementation-defined Behaviour [FAB] - In regard to mixing tabs and spaces, I will add your advice to the 52.2 Guidance section. Thanks for your excellent comments; the paper will be improved because of them. Kevin Coyne 703.901.6774 From benjamin at python.org Sat Dec 17 17:20:38 2011 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 17 Dec 2011 11:20:38 -0500 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: References: Message-ID: 2011/12/17 Eli Collins > > In that same code, right before "Py_DECREF(descr)", should there also be a "Py_XDECREF(type)"? It looks like it might leak a reference to "type" otherwise. No. The descr will deallocate it. PS. Please don't send HTML mail. -- Regards, Benjamin From elic at astllc.org Sat Dec 17 18:00:23 2011 From: elic at astllc.org (Eli Collins) Date: Sat, 17 Dec 2011 12:00:23 -0500 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: References: Message-ID: On Sat, Dec 17, 2011 at 11:20 AM, Benjamin Peterson wrote: > > No. The descr will deallocate it. > > PS. Please don't send HTML mail. > Thank you for the explanation. And my apologies to the entire list for the HTML; it's way too early for me, and I forgot to turn that mess off.
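Returning to the "sharing dict" remark at the start of this section: the idea behind Self-style maps is that instances which gain the same attributes in the same order can share one name-to-slot table, so each instance only stores a flat list of values. The following is a toy model written for illustration, not PyPy's actual code; the names `Map`, `Instance`, `read` and `write` are hypothetical, and it ignores attribute deletion and the per-type field specialization also mentioned above.

```python
class Map:
    """Shared layout (hypothetical toy model): attribute name -> slot index."""

    def __init__(self, names=()):
        self.names = tuple(names)
        self.index = {name: i for i, name in enumerate(self.names)}
        # Cache of "this layout plus one more attribute", so that instances
        # adding attributes in the same order end up sharing the same Map.
        self.transitions = {}

    def with_attr(self, name):
        nxt = self.transitions.get(name)
        if nxt is None:
            nxt = Map(self.names + (name,))
            self.transitions[name] = nxt
        return nxt


EMPTY_MAP = Map()


class Instance:
    """An object whose attribute values live in a flat list, located via its Map."""

    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def read(self, name):
        slot = self.map.index.get(name)
        if slot is None:
            raise AttributeError(name)
        return self.storage[slot]

    def write(self, name, value):
        slot = self.map.index.get(name)
        if slot is not None:
            self.storage[slot] = value  # existing attribute: layout unchanged
        else:
            # New attribute: follow (or create) the shared transition.
            self.map = self.map.with_attr(name)
            self.storage.append(value)


a = Instance()
a.write("x", 1)
a.write("y", 2)
b = Instance()
b.write("x", 10)
b.write("y", 20)
print(a.map is b.map)  # True: both instances share one layout
```

The per-instance cost here is just the value list; the name-to-index dictionary is paid for once per layout. A real VM can additionally use the identity of the map as a cheap check when optimizing attribute access, which this sketch does not attempt.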
From mmueller at vigilantsw.com Sat Dec 17 18:45:11 2011 From: mmueller at vigilantsw.com (Michael Mueller) Date: Sat, 17 Dec 2011 09:45:11 -0800 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: References: Message-ID: On Sat, Dec 17, 2011 at 5:02 AM, Benjamin Peterson wrote: > 2011/12/17 Michael Mueller : >> >> Hopefully someone can take a look and determine the appropriate fix. > > Fixed. > > -- > Regards, > Benjamin Excellent! -- Mike Mueller Phone: (401) 405-1525 Email: mmueller at vigilantsw.com http://www.vigilantsw.com/ Static Analysis for C and C++ From greg.ewing at canterbury.ac.nz Sun Dec 18 01:09:16 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 18 Dec 2011 13:09:16 +1300 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: References: Message-ID: <4EED2F2C.2070409@canterbury.ac.nz> Matt Joiner wrote: > ?_? What's up with these ?_? messages? -- Greg From solipsis at pitrou.net Sun Dec 18 01:20:57 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 18 Dec 2011 01:20:57 +0100 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c References: <4EED2F2C.2070409@canterbury.ac.nz> Message-ID: <20111218012057.25fe6903@pitrou.net> On Sun, 18 Dec 2011 13:09:16 +1300 Greg Ewing wrote: > Matt Joiner wrote: > > ?_? > > What's up with these ?_? messages? >>> print(ascii("ಠ_ಠ")) '\u0ca0_\u0ca0' Antoine. From steve at pearwood.info Sun Dec 18 01:33:20 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 18 Dec 2011 11:33:20 +1100 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: <4EED2F2C.2070409@canterbury.ac.nz> References: <4EED2F2C.2070409@canterbury.ac.nz> Message-ID: <4EED34D0.8030507@pearwood.info> Greg Ewing wrote: > Matt Joiner wrote: >> ?_? > > What's up with these ?_? messages?
> I think that, depending on the typeface you view it with, it is supposed to be some sort of smiley: two big wide open square eyes with tightly pursed lips. Presumably it is supposed to be a look of shock and surprise. As smileys go, it's pretty poor, because people are unlikely to see the same thing. The supposed eyes are probably intended to be square boxes; in my email client, the boxes contain tiny 0ca0 characters, which completely ruins the effect. Apparently you see a question mark instead of a box. Depending on the typeface, others might see a full box, an empty box, a diamond with a question mark in it, nothing at all, or some other glyph. -- Steven From steve at pearwood.info Sun Dec 18 01:39:11 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 18 Dec 2011 11:39:11 +1100 Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c In-Reply-To: <4EED34D0.8030507@pearwood.info> References: <4EED2F2C.2070409@canterbury.ac.nz> <4EED34D0.8030507@pearwood.info> Message-ID: <4EED362F.5060203@pearwood.info> Steven D'Aprano wrote: > Greg Ewing wrote: >> Matt Joiner wrote: >>> ?_? >> >> What's up with these ?_? messages? >> > > I think that, depending on the typeface you view it with, it is supposed > to be some sort of smiley: two big wide open square eyes with tightly > pursed lips. Presumably it is supposed to be a look of shock and surprise. 
Apparently it is supposed to be a look of disapproval: http://knowyourmeme.com/memes/%E0%B2%A0%E0%B2%A0-look-of-disapproval and the 0ca0 characters on either side of the underscore are KANNADA LETTER TTHA: http://www.fileformat.info/info/unicode/char/ca0/index.htm -- Steven From fperez.net at gmail.com Sun Dec 18 08:46:48 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 18 Dec 2011 07:46:48 +0000 (UTC) Subject: [Python-Dev] Inconsistent script/console behaviour References: Message-ID: On Fri, 23 Sep 2011 16:32:30 -0700, Guido van Rossum wrote: > You can't fix this without completely changing the way the interactive > console treats blank lines. Note that it's not just that a blank line is > required after a function definition -- you also *can't* have a blank > line *inside* a function definition. > > The interactive console is optimized for people entering code by typing, > not by copying and pasting large gobs of text. > > If you think you can have it both ways, show us the code. Apology for the advertising, but if the OP is really interested in that kind of behavior, then instead of asking for the default shell to be made more complex, he can use ipython, which supports what he's looking for:

In [5]: def some():
   ...:     print 'xxx'
   ...: some()
   ...:
xxx

and even blank lines inside functions (albeit only in certain locations):

In [6]: def some():
   ...:
   ...:     print 'xxx'
   ...: some()
   ...:
xxx

Now, the dances we have to do in ipython to achieve that are much more complex than what would be reasonable to have in the default '>>>' python shell, which should remain simple, light and robust. But ipython is a simple install for someone who wants fancier features for interactive work.
Cheers, f From roundup-admin at psf.upfronthosting.co.za Sun Dec 18 20:28:46 2011 From: roundup-admin at psf.upfronthosting.co.za (Python tracker) Date: Sun, 18 Dec 2011 19:28:46 +0000 Subject: [Python-Dev] Failed issue tracker submission Message-ID: <20111218192846.6098A1DE8A@psf.upfronthosting.co.za> An unexpected error occurred during the processing of your message. The tracker administrator is being notified. -------------- next part -------------- Date: Sun, 18 Dec 2011 20:23:39 +0100 From: python-dev at python.org To: report at bugs.python.org Subject: [issue7502] New changeset e37c71698409 by Antoine Pitrou in branch '3.2': Followup to #7502: add __hash__ method and tests. http://hg.python.org/cpython/rev/e37c71698409 New changeset 4ffa9992a7d8 by Antoine Pitrou in branch 'default': Followup to #7502: add __hash__ method and tests. http://hg.python.org/cpython/rev/4ffa9992a7d8 From martin at v.loewis.de Sun Dec 18 20:34:49 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Sun, 18 Dec 2011 20:34:49 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum In-Reply-To: References: Message-ID: <4EEE4059.5040807@v.loewis.de> > Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum What's the rationale for that change? It's a valid kind value, after all, and the C convention is that an enumeration lists all valid values (else there wouldn't be a need for an enumeration in the first place). Regards, Martin From victor.stinner at haypocalc.com Sun Dec 18 20:45:40 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 18 Dec 2011 20:45:40 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum In-Reply-To: <4EEE4059.5040807@v.loewis.de> References: <4EEE4059.5040807@v.loewis.de> Message-ID: <4EEE42E4.5020905@haypocalc.com> On 18/12/2011 20:34, "Martin v. Löwis" wrote: >> Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum > > What's the rationale for that change? It's a valid kind value, after > all, and the C convention is that an enumeration lists all valid values > (else there wouldn't be a need for an enumeration in the first place). PyUnicode_KIND() only returns PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND.
Outside unicodeobject.c, you are not supposed to see PyUnicode_WCHAR_KIND. For switch/case, it avoids the need to add a dummy PyUnicode_WCHAR_KIND case (or a default case). Victor From martin at v.loewis.de Sun Dec 18 21:04:24 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Sun, 18 Dec 2011 21:04:24 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum In-Reply-To: <4EEE42E4.5020905@haypocalc.com> References: <4EEE4059.5040807@v.loewis.de> <4EEE42E4.5020905@haypocalc.com> Message-ID: <4EEE4748.2010901@v.loewis.de> On 18.12.2011 20:45, Victor Stinner wrote: > On 18/12/2011 20:34, "Martin v. Löwis" wrote: >>> Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum >> >> What's the rationale for that change? It's a valid kind value, after >> all, and the C convention is that an enumeration lists all valid values >> (else there wouldn't be a need for an enumeration in the first place). > > PyUnicode_KIND() only returns PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND > or PyUnicode_4BYTE_KIND. Outside unicodeobject.c, you are not supposed > to see PyUnicode_WCHAR_KIND. Why do you say that? It can very well happen, assuming you call PyUnicode_KIND on a string that is not ready. That would be a bug in the module, but people do make bugs when programming. > For switch/case, it avoids the need to add a dummy > PyUnicode_WCHAR_KIND case (or a default case). ... and thus hides a potential source of errors, as people may forget to call ready, and then fall through the case, letting god-knows-what happen. If the rationale is to simplify silencing compiler errors, I vote for reverting the enumeration back to a macro list.
Regards, Martin From victor.stinner at haypocalc.com Sun Dec 18 21:16:19 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 18 Dec 2011 21:16:19 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum In-Reply-To: <4EEE4748.2010901@v.loewis.de> References: <4EEE4059.5040807@v.loewis.de> <4EEE42E4.5020905@haypocalc.com> <4EEE4748.2010901@v.loewis.de> Message-ID: <4EEE4A13.1040808@haypocalc.com> On 18/12/2011 21:04, "Martin v. Löwis" wrote: >> PyUnicode_KIND() only returns PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND >> or PyUnicode_4BYTE_KIND. Outside unicodeobject.c, you are not supposed >> to see PyUnicode_WCHAR_KIND. > > Why do you say that? It can very well happen, assuming you call > PyUnicode_KIND on a string that is not ready. That would be a > bug in the module, but people do make bugs when programming. I added assert(PyUnicode_IS_READY(op)) to the macro, so the bug will be quickly caught in debug mode. I forgot that it is just an assertion and few people use Python compiled in debug mode. > If the rationale is to simplify silencing compiler errors, I > vote for reverting the enumeration back to a macro list. I'm not sure that gcc will not complain if only 3 values are handled. I agree to revert the commit if that helps developers to write bugs. Victor From martin at v.loewis.de Sun Dec 18 21:36:44 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Sun, 18 Dec 2011 21:36:44 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum In-Reply-To: <4EEE4A13.1040808@haypocalc.com> References: <4EEE4059.5040807@v.loewis.de> <4EEE42E4.5020905@haypocalc.com> <4EEE4748.2010901@v.loewis.de> <4EEE4A13.1040808@haypocalc.com> Message-ID: <4EEE4EDC.2000606@v.loewis.de> On 18.12.2011 21:16, Victor Stinner wrote: > On 18/12/2011 21:04, "Martin v.
Löwis" wrote: >>> PyUnicode_KIND() only returns PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND >>> or PyUnicode_4BYTE_KIND. Outside unicodeobject.c, you are not supposed >>> to see PyUnicode_WCHAR_KIND. >> >> Why do you say that? It can very well happen, assuming you call >> PyUnicode_KIND on a string that is not ready. That would be a >> bug in the module, but people do make bugs when programming. > > I added assert(PyUnicode_IS_READY(op)) to the macro, so the bug will be > quickly caught in debug mode. I forgot that it is just an assertion and > few people use Python compiled in debug mode. > >> If the rationale is to simplify silencing compiler errors, I >> vote for reverting the enumeration back to a macro list. > > I'm not sure that gcc will not complain if only 3 values are handled. I > agree to revert the commit if that helps developers to write bugs. It helps to detect bugs. Users should be aware that there is an additional case, and put something like

    case PyUnicode_WCHAR_KIND:
        /* string is guaranteed to be ready here */
        assert(0);

into their code. Regards, Martin From solipsis at pitrou.net Sun Dec 18 23:55:16 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 18 Dec 2011 23:55:16 +0100 Subject: [Python-Dev] A new dict for Xmas? References: <4EEBB8FC.2010405@hotpy.org> Message-ID: <20111218235516.741cc14d@pitrou.net> On Fri, 16 Dec 2011 21:32:44 +0000 Mark Shannon wrote: > > > per-instance attributes, it just forces them all to keep resizing up, > > even though individual instances would be small with the current dict. > There is a cut-off point, at the moment it's quite unsophisticated about > how it does this, but it could easily be improved. > Suggestions are welcome. Can you open an issue on the bug tracker? There you can either give your repo URL, or upload a patch. Either should allow us to start reviewing the code :) Regards Antoine. From stephen at xemacs.org Mon Dec 19 05:47:57 2011 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Mon, 19 Dec 2011 13:47:57 +0900 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: <87y5u9jhfm.fsf@uwakimon.sk.tsukuba.ac.jp> Fernando Perez writes: > Apology for the advertising, If there's any apologizing to be done, it's on Anatoly's part. Your post was short, to the point, information-packed, and should put a big fat open-centered ideographic full stop period to this thread. From solipsis at pitrou.net Tue Dec 20 09:51:49 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 20 Dec 2011 09:51:49 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail References: Message-ID: <20111220095149.6187cca8@pitrou.net> On Mon, 19 Dec 2011 22:42:43 +0100 benjamin.peterson wrote: > http://hg.python.org/cpython/rev/d85efd73b0e1 > changeset: 74088:d85efd73b0e1 > branch: 3.2 > parent: 74082:71e5a083f9b1 > user: Benjamin Peterson > date: Mon Dec 19 16:41:11 2011 -0500 > summary: > don't mention implementation detail > > files: > Doc/library/operator.rst | 10 +++++----- > 1 files changed, 5 insertions(+), 5 deletions(-) > > > diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst > --- a/Doc/library/operator.rst > +++ b/Doc/library/operator.rst > @@ -12,11 +12,11 @@ > from operator import itemgetter, iadd > > > -The :mod:`operator` module exports a set of functions implemented in C > -corresponding to the intrinsic operators of Python. For example, > -``operator.add(x, y)`` is equivalent to the expression ``x+y``. The function > -names are those used for special class methods; variants without leading and > -trailing ``__`` are also provided for convenience. I disagree with this change. Knowing that they are written in C is important when deciding to pass them to e.g. sort() or sorted(), because you know it will be faster than an arbitrary pure Python function. 
You could tag it as a "CPython implementation detail" if you want, or talk about performance rather than mention "C". Regards Antoine. From solipsis at pitrou.net Tue Dec 20 09:54:40 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 20 Dec 2011 09:54:40 +0100 Subject: [Python-Dev] Difference between PyUnicode_IS_ASCII and PyUnicode_IS_COMPACT_ASCII ? Message-ID: <20111220095440.43ff9f41@pitrou.net> Hello, The include file (unicodeobject.h) seems to imply that some pure ASCII strings can be non-compact, but I don't understand how that can happen. Besides, the following comment also seems wrong: - compact: * structure = PyCompactUnicodeObject * test: PyUnicode_IS_ASCII(op) && !PyUnicode_IS_COMPACT(op) Regards Antoine. From fijall at gmail.com Tue Dec 20 11:01:04 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Tue, 20 Dec 2011 12:01:04 +0200 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: <20111220095149.6187cca8@pitrou.net> References: <20111220095149.6187cca8@pitrou.net> Message-ID: On Tue, Dec 20, 2011 at 10:51 AM, Antoine Pitrou wrote: > On Mon, 19 Dec 2011 22:42:43 +0100 > benjamin.peterson wrote: >> http://hg.python.org/cpython/rev/d85efd73b0e1 >> changeset: 74088:d85efd73b0e1 >> branch: 3.2 >> parent: 74082:71e5a083f9b1 >> user: Benjamin Peterson >> date: Mon Dec 19 16:41:11 2011 -0500 >> summary: >> don't mention implementation detail >> >> files: >> Doc/library/operator.rst | 10 +++++----- >> 1 files changed, 5 insertions(+), 5 deletions(-) >> >> >> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst >> --- a/Doc/library/operator.rst >> +++ b/Doc/library/operator.rst >> @@ -12,11 +12,11 @@ >> from operator import itemgetter, iadd >> >> >> -The :mod:`operator` module exports a set of functions implemented in C >> -corresponding to the intrinsic operators of Python. For example, >> -``operator.add(x, y)`` is equivalent to the expression ``x+y``.
The function >> -names are those used for special class methods; variants without leading and >> -trailing ``__`` are also provided for convenience. > > I disagree with this change. Knowing that they are written in C is > important when deciding to pass them to e.g. sort() or sorted(), > because you know it will be faster than an arbitrary pure Python > function. > > You could tag it as a "CPython implementation detail" if you want, or > talk about performance rather than mention "C". > > Regards > > Antoine. If this documentation is to be used by other python implementations, then mentions of performance are outright harmful, since the performance characteristics differ quite drastically. Written in C is also not part of the specification as far as I know :) Cheers, fijal From solipsis at pitrou.net Tue Dec 20 11:08:30 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 20 Dec 2011 11:08:30 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: References: <20111220095149.6187cca8@pitrou.net> Message-ID: <1324375710.3368.17.camel@localhost.localdomain> On Tuesday 20 December 2011 at 12:01 +0200, Maciej Fijalkowski wrote: > > If this documentation is to be used by other python implementations, > then mentions of performance are outright harmful, since the > performance characteristics differ quite drastically. Written in C is > also not part of the specification as far as I know :) But that's basically the only reason to invoke the `operator.attrgetter("foo")` ugliness, instead of writing the explicit and obvious `lambda x: x.foo`. So not mentioning that it provides a speed benefit on CPython hides the primary reason for using the operator module. Otherwise it's just a bunch of useless wrappers. --------- More generally, not talking about performance at all is more harmful than making CPython-specific comments in the documentation.
Implementation details *deserve* to be documented when they have an impact on behaviour (including performance / resource usage). Python is not just a platonic ideal. Do you suggest we also remove this part: http://docs.python.org/dev/library/io.html#performance ? Regards Antoine. From dirkjan at ochtman.nl Tue Dec 20 11:14:15 2011 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 20 Dec 2011 11:14:15 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: <1324375710.3368.17.camel@localhost.localdomain> References: <20111220095149.6187cca8@pitrou.net> <1324375710.3368.17.camel@localhost.localdomain> Message-ID: On Tue, Dec 20, 2011 at 11:08, Antoine Pitrou wrote: >> If this documentation is to be used by other python implementations, >> then mentions of performance are outright harmful, since the >> performance characteristics differ quite drastically. Written in C is >> also not part of the specification as far as I know :) > > But that's basically the only reason to invoke the > `operator.attrgetter("foo")` ugliness, instead of writing the explicit > and obvious `lambda x: x.foo`. > So not mentioning that it provides a speed benefit on CPython hides the > primary reason for using the operator module. Otherwise it's just a bunch > of useless wrappers. So the question is if the docs are Python documentation or CPython documentation? On PyPy, I'm guessing lambda x: x.foo might (some day) be just as fast as operator.attrgetter("foo"). > Implementation details *deserve* to be documented when they have an > impact on behaviour (including performance / resource usage). Python is > not just a platonic ideal. Do you suggest we also remove this part: > http://docs.python.org/dev/library/io.html#performance > ? I agree that it's good to document some implementation details, but it seems like the paragraph, as it was before, documented too many details.
It seems like a paragraph that mentions the specificity of this aspect for CPython and omits the reference to C as the VM implementation should be acceptable to all parties. Cheers, Dirkjan From solipsis at pitrou.net Tue Dec 20 11:22:28 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 20 Dec 2011 11:22:28 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: References: <20111220095149.6187cca8@pitrou.net> <1324375710.3368.17.camel@localhost.localdomain> Message-ID: <20111220112228.320c389b@pitrou.net> On Tue, 20 Dec 2011 11:14:15 +0100 Dirkjan Ochtman wrote: > On Tue, Dec 20, 2011 at 11:08, Antoine Pitrou wrote: > >> If this documentation is to be used by other python implementations, > >> then mentions of performance are outright harmful, since the > >> performance characteristics differ quite drastically. Written in C is > >> also not part of the specification as far as I know :) > > > > But that's basically the only reason to invoke the > > `operator.attrgetter("foo")` ugliness, instead of writing the explicit > > and obvious `lambda x: x.foo`. > > So not mentioning that it provides a speed benefit on CPython hides the > > primary reason for using the operator module. Otherwise it's just a bunch > > of useless wrappers. > > So the question is if the docs are Python documentation or CPython > documentation? On PyPy, I'm guessing lambda x: x.foo might (some day) > be just as fast as operator.attrgetter("foo"). I would expect it to be just as fast right now, although that's just an uninformed guess. That said, CPython is both the dominant implementation and the only one (AFAIR) to have stable 3.2 support. > > Implementation details *deserve* to be documented when they have an > > impact on behaviour (including performance / resource usage). Python is > > not just a platonic ideal. Do you suggest we also remove this part: > > http://docs.python.org/dev/library/io.html#performance > > ?
> > I agree that it's good to document some implementation details, but it > seems like the paragraph, as it was before, documented too many > details. It seems like a paragraph that mentions the specificity of > this aspect for CPython and omits the reference to C as the VM > implementation should be acceptable to all parties. Agreed. The original wording was poor since it mentioned C while what is really significant is performance. There are probably Python programmers who don't even know what C is. Regards Antoine. From python-dev at masklinn.net Tue Dec 20 11:25:32 2011 From: python-dev at masklinn.net (Xavier Morel) Date: Tue, 20 Dec 2011 11:25:32 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: <1324375710.3368.17.camel@localhost.localdomain> References: <20111220095149.6187cca8@pitrou.net> <1324375710.3368.17.camel@localhost.localdomain> Message-ID: <4E93C721-BD86-42C6-81A4-FD2ED92FD4C6@masklinn.net> On 2011-12-20, at 11:08 , Antoine Pitrou wrote: > But that's basically the only reason to invoke the > `operator.attrgetter("foo")` ugliness, instead of writing the explicit > and obvious `lambda x: x.foo`. I don't agree with this: an attrgetter in the current namespace can be clearer than an explicit lambda in place, and more importantly, when trying to fetch more than one attribute, attrgetter is far superior to lambdas as far as I'm concerned. I don't think I've ever seen `attrgetter` (or any of the other `operator` functions) advocated on the basis of speed. This mention does not even exist in the Python 2 docs, which does not prevent people from using `operator`.
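Xavier's point about multi-attribute getters, and the equivalence everyone is assuming between `attrgetter("foo")` and `lambda x: x.foo`, can be checked with the standard library alone. In this sketch `Point`, `points`, `pairs` and `time_keys` are illustrative names, not part of the thread. It shows that a single-field getter sorts exactly like the matching lambda, and that `attrgetter("x", "y")` or `itemgetter(1, 0)` returns a tuple key. Whether the getter is also faster depends on the interpreter, so the timing helper only measures and makes no claim.

```python
import timeit
from collections import namedtuple
from operator import attrgetter, itemgetter

Point = namedtuple("Point", ["x", "y"])
points = [Point(3, 1), Point(1, 2), Point(2, 0)]

# Single attribute: attrgetter("y") computes the same key as lambda p: p.y.
by_y_getter = sorted(points, key=attrgetter("y"))
by_y_lambda = sorted(points, key=lambda p: p.y)

# Several attributes: attrgetter("x", "y") returns the tuple (p.x, p.y).
get_xy = attrgetter("x", "y")
by_x_then_y = sorted(points, key=get_xy)

# itemgetter works the same way on sequences: sort by element 1, then element 0.
pairs = [("b", 2), ("a", 2), ("c", 1)]
by_second_then_first = sorted(pairs, key=itemgetter(1, 0))


def time_keys(number=100):
    """Return (attrgetter_seconds, lambda_seconds); results vary by interpreter."""
    data = [Point(i % 7, i % 5) for i in range(1000)]
    t_getter = timeit.timeit(lambda: sorted(data, key=attrgetter("y")), number=number)
    t_lambda = timeit.timeit(lambda: sorted(data, key=lambda p: p.y), number=number)
    return t_getter, t_lambda
```

On CPython the getters are implemented in C, so calling one does not set up a Python-level frame the way a lambda does; that is the basis of Antoine's speed argument. On PyPy, as Dirkjan suggests, the lambda may be just as fast.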
From tjreedy at udel.edu Tue Dec 20 11:27:41 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 20 Dec 2011 05:27:41 -0500 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: <20111220095149.6187cca8@pitrou.net> References: <20111220095149.6187cca8@pitrou.net> Message-ID: On 12/20/2011 3:51 AM, Antoine Pitrou wrote: > On Mon, 19 Dec 2011 22:42:43 +0100 > benjamin.peterson wrote: >> http://hg.python.org/cpython/rev/d85efd73b0e1 >> changeset: 74088:d85efd73b0e1 >> branch: 3.2 >> parent: 74082:71e5a083f9b1 >> user: Benjamin Peterson >> date: Mon Dec 19 16:41:11 2011 -0500 >> summary: >> don't mention implementation detail >> >> files: >> Doc/library/operator.rst | 10 +++++----- >> 1 files changed, 5 insertions(+), 5 deletions(-) >> >> >> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst >> --- a/Doc/library/operator.rst >> +++ b/Doc/library/operator.rst >> @@ -12,11 +12,11 @@ >> from operator import itemgetter, iadd >> >> >> -The :mod:`operator` module exports a set of functions implemented in C >> -corresponding to the intrinsic operators of Python. For example, >> -``operator.add(x, y)`` is equivalent to the expression ``x+y``. The function >> -names are those used for special class methods; variants without leading and >> -trailing ``__`` are also provided for convenience. > > I disagree with this change. Knowing that they are written in C is > important when deciding to pass them to e.g. sort() or sorted(), > because you know it will be faster than an arbitrary pure Python > function. > > You could tag it as a "CPython implementation detail" if you want, or > talk about performance rather than mention "C". The existence of operator and the behavior of its functions is not a C implementation detail. So some change was needed. I think a programmer can assume that they are written in the implementation language to be as fast as possible.
I do not think we should load the manual with "In CPython, this is implemented in C" notes all over. For instance, there is nothing in the library manual that I can see that specifies that the builtin functions and types are written in C (for CPython). And I remember that Guido has asked that the manual not discuss big O() behavior of the methods of builtin classes. I do see a note like "The binascii module contains low-level functions written in C for greater speed that are used by the higher-level modules." But that should be revised somehow for the same reason as operator. But I don't think this is typical. The heapq module makes no mention of _heapq. I think all this sort of stuff belongs in a separate CPython Notes document. Perhaps Python Setup and Usage could be renamed CPython Setup and Usage and expanded with more info on gc (ref counting), O() notes, Python vs. C code, etc. I presume that other implementations are not run with 'python script.py', so the very first section is CPython-specific anyway. In fact, I have the impression that for some *nix systems, that is CPython 2 specific. -- Terry Jan Reedy From solipsis at pitrou.net Tue Dec 20 11:57:08 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 20 Dec 2011 11:57:08 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail References: <20111220095149.6187cca8@pitrou.net> Message-ID: <20111220115708.40bd882e@pitrou.net> On Tue, 20 Dec 2011 05:27:41 -0500 Terry Reedy wrote: > > > > I disagree with this change. Knowing that they are written in C is > > important when deciding to pass them to e.g. sort() or sorted(), > > because you know it will be faster than an arbitrary pure Python > > function. > > > > You could tag it as a "CPython implementation detail" if you want, or > > talk about performance rather than mention "C". > > The existence of operator and the behavior of its functions is not a C > implementation detail. And?
> I think a programmer > can assume that they are written in the implementation language to > be as fast as possible. Yeah, you can assume anything, and then get bitten by the fact that e.g. OrderedDict is pure Python and thus massively slower than dict. But at least you've achieved some platonic ideal of how documentation should not talk about implementation details, which is great, right? Why you think we should leave users in the dark rather than inform them is beyond me. While we certainly should find a good compromise between readability and completeness, and should certainly tweak the doc's wording and layout adequately, removing useful information is nonsense. > For instance, > there is nothing in the library manual that I can see that specifies > that the builtin functions and types are written in C (for CPython). I guess everyone expects builtin functions and types to be reasonably fast, regardless of the language or implementation. (Even though I did see some beginner code rewrite its own slow "list" wrapper, so it's probably not a universal expectation.) > Perhaps Python Setup and Usage could be renamed CPython Setup and Usage > and expanded with more info on gc (ref counting), O() notes, Python vs. > C code, etc. Really? That's a perfectly inappropriate place to talk about performance details of *any* implementation. Regards Antoine. From dirkjan at ochtman.nl Tue Dec 20 12:24:58 2011 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 20 Dec 2011 12:24:58 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: References: <20111220095149.6187cca8@pitrou.net> Message-ID: On Tue, Dec 20, 2011 at 11:27, Terry Reedy wrote: > And I remember that Guido has > asked that the manual not discuss big O() > behavior of the methods of builtin classes. Do you know when/where he did that? It seems useful to know that on CPython, list.insert(0, x) will become slow as the list grows...
It probably shouldn't be upfront, but O() hints for some of the core stuff seems useful (though again, in some cases they should probably be limited to CPython). Cheers, Dirkjan From lukasz at langa.pl Tue Dec 20 13:27:11 2011 From: lukasz at langa.pl (Łukasz Langa) Date: Tue, 20 Dec 2011 13:27:11 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: <20111220115708.40bd882e@pitrou.net> References: <20111220095149.6187cca8@pitrou.net> <20111220115708.40bd882e@pitrou.net> Message-ID: <09213AF2-38D1-447C-BD23-7E30D2C54EBA@langa.pl> Message written by Antoine Pitrou on 20 Dec 2011, at 11:57: > Why you think we should leave users in the dark rather than inform them > is beyond me. While we certainly should find a good compromise between > readability and completeness, and should certainly tweak the doc's > wording and layout adequately, removing useful information is nonsense. +1 -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o. From lukasz at langa.pl Tue Dec 20 13:29:01 2011 From: lukasz at langa.pl (Łukasz Langa) Date: Tue, 20 Dec 2011 13:29:01 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: References: <20111220095149.6187cca8@pitrou.net> Message-ID: <2F828189-2D3F-4E10-A70F-7DBECC8870C9@langa.pl> Message written by Dirkjan Ochtman on 20 Dec 2011, at 12:24: > On Tue, Dec 20, 2011 at 11:27, Terry Reedy wrote: >> And I remember that Guido has >> asked that the manual not discuss big O() >> behavior of the methods of builtin classes. > > Do you know when/where he did that? http://mail.python.org/pipermail/python-dev/2008-March/077511.html -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o.
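Dirkjan's list.insert(0, x) remark is an example of the kind of O() hint being debated here. The following is a rough, CPython-oriented sketch that is not from the thread; the helper names prepend_list and prepend_deque are hypothetical. It contrasts prepending to a list, where each insert(0, x) shifts every existing element, with collections.deque.appendleft, which the collections documentation describes as roughly O(1) at either end. Absolute timings will vary by machine and implementation.

```python
import collections
import timeit


def prepend_list(n):
    """Build [n-1, ..., 1, 0] by inserting at the front of a list.

    On CPython each list.insert(0, x) moves all existing elements,
    so building n items this way costs O(n**2) overall.
    """
    items = []
    for i in range(n):
        items.insert(0, i)
    return items


def prepend_deque(n):
    """Build the same sequence with deque.appendleft(), which is O(1) per call."""
    items = collections.deque()
    for i in range(n):
        items.appendleft(i)
    return items


if __name__ == "__main__":
    # Both approaches produce the same ordering.
    assert prepend_list(5) == list(prepend_deque(5)) == [4, 3, 2, 1, 0]
    # Doubling n should roughly quadruple the list time but only double the deque time.
    for n in (2000, 4000, 8000):
        t_list = timeit.timeit(lambda: prepend_list(n), number=1)
        t_deque = timeit.timeit(lambda: prepend_deque(n), number=1)
        print("n=%d list=%.4fs deque=%.4fs" % (n, t_list, t_deque))
```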
From jxo6948 at rit.edu Tue Dec 20 13:29:31 2011 From: jxo6948 at rit.edu (John O'Connor) Date: Tue, 20 Dec 2011 07:29:31 -0500 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: References: <20111220095149.6187cca8@pitrou.net> Message-ID: On Tue, Dec 20, 2011 at 6:24 AM, Dirkjan Ochtman wrote: > On Tue, Dec 20, 2011 at 11:27, Terry Reedy wrote: >> And I remember that Guido has >> asked that the manual not discuss big O() >> behavior of the methods of builtin classes. > > Do you know when/where he did that? It seems useful to know that on > CPython, list.insert(0, x) will become slow as the list grows... It > probably shouldn't be upfront, but O() hints for some of the core > stuff seems useful (though again, in some cases they should probably > be limited to CPython). I think the question of the day is whether the documentation is targeting those who wish to have an understanding of what is happening under the hood, or those that want to take such details for granted. I much prefer the little notes and performance hints. - John From benjamin at python.org Tue Dec 20 16:57:06 2011 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 20 Dec 2011 10:57:06 -0500 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: <20111220095149.6187cca8@pitrou.net> References: <20111220095149.6187cca8@pitrou.net> Message-ID: 2011/12/20 Antoine Pitrou : > On Mon, 19 Dec 2011 22:42:43 +0100 > benjamin.peterson wrote: >> http://hg.python.org/cpython/rev/d85efd73b0e1 >> changeset: 74088:d85efd73b0e1 >> branch: 3.2 >> parent: 74082:71e5a083f9b1 >> user: Benjamin Peterson >> date: Mon Dec 19 16:41:11 2011 -0500 >> summary: >> don't mention implementation detail >> >> files: >> Doc/library/operator.rst | 10 +++++----- >>
1 files changed, 5 insertions(+), 5 deletions(-) >> >> >> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst >> --- a/Doc/library/operator.rst >> +++ b/Doc/library/operator.rst >> @@ -12,11 +12,11 @@ >> from operator import itemgetter, iadd >> >> >> -The :mod:`operator` module exports a set of functions implemented in C >> -corresponding to the intrinsic operators of Python. For example, >> -``operator.add(x, y)`` is equivalent to the expression ``x+y``. The function >> -names are those used for special class methods; variants without leading and >> -trailing ``__`` are also provided for convenience. > > I disagree with this change. Knowing that they are written in C is > important when deciding to pass them to e.g. sort() or sorted(), > because you know it will be faster than an arbitrary pure Python > function. In that case, I would rather speak of "fast" functions rather than "implemented in C" functions (a la the itertools docs). Would that be acceptable? -- Regards, Benjamin From solipsis at pitrou.net Tue Dec 20 17:10:50 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 20 Dec 2011 17:10:50 +0100 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: References: <20111220095149.6187cca8@pitrou.net> Message-ID: <1324397450.3368.25.camel@localhost.localdomain> On Tuesday 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote: > 2011/12/20 Antoine Pitrou : > > On Mon, 19 Dec 2011 22:42:43 +0100 > > benjamin.peterson wrote: > >> http://hg.python.org/cpython/rev/d85efd73b0e1 > >> changeset: 74088:d85efd73b0e1 > >> branch: 3.2 > >> parent: 74082:71e5a083f9b1 > >> user: Benjamin Peterson > >> date: Mon Dec 19 16:41:11 2011 -0500 > >> summary: > >> don't mention implementation detail > >> > >> files: > >> Doc/library/operator.rst | 10 +++++----- > >> 1 files changed, 5 insertions(+), 5 deletions(-) > >> > >> > >> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst > >> --- a/Doc/library/operator.rst > >> +++ b/Doc/library/operator.rst > >> @@ -12,11 +12,11 @@ > >> from operator import itemgetter, iadd > >> > >> > >> -The :mod:`operator` module exports a set of functions implemented in C > >> -corresponding to the intrinsic operators of Python. For example, > >> -``operator.add(x, y)`` is equivalent to the expression ``x+y``. The function > >> -names are those used for special class methods; variants without leading and > >> -trailing ``__`` are also provided for convenience. > > > > I disagree with this change. Knowing that they are written in C is > > important when deciding to pass them to e.g. sort() or sorted(), > > because you know it will be faster than an arbitrary pure Python > > function. > > In that case, I would rather speak of "fast" functions rather than > "implemented in C" functions (a la the itertools docs). Would that be > acceptable? Definitely. Regards Antoine. From benjamin at python.org Tue Dec 20 17:15:12 2011 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 20 Dec 2011 11:15:12 -0500 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: <1324397450.3368.25.camel@localhost.localdomain> References: <20111220095149.6187cca8@pitrou.net> <1324397450.3368.25.camel@localhost.localdomain> Message-ID: 2011/12/20 Antoine Pitrou : > On Tuesday 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote: >> In that case, I would rather speak of "fast" functions rather than >> "implemented in C" functions (a la the itertools docs). Would that be >> acceptable? > > Definitely. Done. -- Regards, Benjamin From techtonik at gmail.com Tue Dec 20 19:40:28 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 20 Dec 2011 21:40:28 +0300 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: <87y5u9jhfm.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87y5u9jhfm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Dec 19, 2011 at 7:47 AM, Stephen J. Turnbull wrote: > Fernando Perez writes: > > > Apology for the advertising, > > If there's any apologizing to be done, it's on Anatoly's part. Your > post was short, to the point, information-packed, and should put a big > fat open-centered ideographic full stop period to this thread. Fernando clearly showed that IPython rocks, because CPython suxx. I don't think anybody should apologize for the intention to fix this by enhancing CPython, so as a python-dev subscriber you should be ashamed of yourself for this proposal already. ;) Thanks everyone else for explaining the problem with the current implementation. I'll post a follow-up as soon as I have time to wrap my head around the details and see for myself why the IPython solution is so hard to implement. -- anatoly t. From victor.stinner at haypocalc.com Tue Dec 20 20:26:51 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 20 Dec 2011 20:26:51 +0100 Subject: [Python-Dev] Difference between PyUnicode_IS_ASCII and PyUnicode_IS_COMPACT_ASCII ?
In-Reply-To: <20111220095440.43ff9f41@pitrou.net> References: <20111220095440.43ff9f41@pitrou.net> Message-ID: <4EF0E17B.6010503@haypocalc.com> On 20/12/2011 09:54, Antoine Pitrou wrote: > > Hello, > > The include file (unicodeobject.h) seems to imply that some pure ASCII > strings can be non-compact, but I don't understand how that can happen. If you create a string from Py_UNICODE* or wchar_t* (using the legacy API), PyUnicode_READY() may create a non-compact but ASCII string. Such a string would be in the following state (extract of unicodeobject.h):

- legacy string, ready:
  * structure = PyUnicodeObject structure
  * test: !PyUnicode_IS_COMPACT(op) && kind != PyUnicode_WCHAR_KIND
  * kind = PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND
  * compact = 0
  * ready = 1
  * data.any is not NULL
  * utf8 is shared and utf8_length = length with data.any if ascii = 1
  * utf8_length = 0 if utf8 is NULL

> Besides, the following comment also seems wrong: > > - compact: > > * structure = PyCompactUnicodeObject > * test: PyUnicode_IS_ASCII(op)&& !PyUnicode_IS_COMPACT(op) I added the "test" lines recently because I always forget how to get the structure type. The correct test should be:

- compact:
  * structure = PyCompactUnicodeObject
  * test: PyUnicode_IS_COMPACT(op) && !PyUnicode_IS_ASCII(op)

Victor From fijall at gmail.com Tue Dec 20 21:22:04 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Tue, 20 Dec 2011 22:22:04 +0200 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: References: <20111220095149.6187cca8@pitrou.net> <1324375710.3368.17.camel@localhost.localdomain> Message-ID: On Tue, Dec 20, 2011 at 12:14 PM, Dirkjan Ochtman wrote: > On Tue, Dec 20, 2011 at 11:08, Antoine Pitrou wrote: >>> If this documentation is to be used by other python implementations, >>> then mentions of performance are outright harmful, since the >>> performance characteristics differ quite drastically.
Written in C is >>> also not a part of specification as far as I know :) >> >> But that's basically the only reason to invoke the >> `operator.attrgetter("foo")` ugliness, instead of writing the explicit >> and obvious `lambda x: x.foo`. >> So not mentioning that it provides a speed benefit on CPython hides the >> primary reason for using the operator module. Otherwise it's just a bunch >> of useless wrappers. > > So the question is if the docs are Python documentation or CPython > documentation? On PyPy, I'm guessing lambda x: x.foo might (some day) > be just as fast as operator.attrgetter("foo"). > As of now, lambda is much faster on PyPy for a constant name (there is no good reason why exactly attrgetter is slower, but it somehow loses the fact that the name is constant, if it is). I'm in general fine with saying that this is either Python documentation or CPython documentation, but leaving this intermingled has caused us quite some headaches in the past. For example, using attrgetter and map rather than just writing a loop is slower on PyPy, so knowing that it's *fast* in the operator module is misleading *in Python*. How about we somehow mark that whenever the Python documentation talks about performance, it is talking about CPython performance? Cheers, fijal From tjreedy at udel.edu Tue Dec 20 22:57:13 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 20 Dec 2011 16:57:13 -0500 Subject: [Python-Dev] cpython (3.2): don't mention implementation detail In-Reply-To: References: <20111220095149.6187cca8@pitrou.net> <1324397450.3368.25.camel@localhost.localdomain> Message-ID: On 12/20/2011 11:15 AM, Benjamin Peterson wrote: > 2011/12/20 Antoine Pitrou: >> On Tuesday 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote: >>> In that case, I would rather speak of "fast" functions rather than >>> "implemented in C" functions (a la the itertools docs). Would that be >>> acceptable? >> >> Definitely. > > Done. I like what you did too.
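The operator-module argument in this thread can be made concrete: Antoine's point about passing these functions to sort()/sorted(), and Maciej's point that PyPy behaves differently. The following is a minimal sketch that checks operator.attrgetter produces the same ordering as the equivalent lambda, then times both. The Record type and its fields are hypothetical. Which version is faster depends on the interpreter, so the script only prints the timings and assumes no result.

```python
import operator
import timeit
from collections import namedtuple

# Hypothetical record type used only for this illustration.
Record = namedtuple("Record", ["name", "size"])

records = [Record("b", 3), Record("a", 2), Record("c", 1)]


def sort_by_attrgetter(rows):
    # attrgetter("size") returns a callable equivalent to lambda r: r.size
    return sorted(rows, key=operator.attrgetter("size"))


def sort_by_lambda(rows):
    return sorted(rows, key=lambda r: r.size)


if __name__ == "__main__":
    assert sort_by_attrgetter(records) == sort_by_lambda(records)
    # itemgetter is the analogous helper for index or key lookups.
    assert operator.itemgetter(0)(("x", "y")) == "x"
    data = records * 1000
    for label, func in (("attrgetter", sort_by_attrgetter),
                        ("lambda", sort_by_lambda)):
        # Relative timings differ between CPython and PyPy; no winner is assumed.
        print(label, timeit.timeit(lambda: func(data), number=20))
```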
-- Terry Jan Reedy From stephen at xemacs.org Wed Dec 21 03:14:05 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 21 Dec 2011 11:14:05 +0900 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: <87y5u9jhfm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87iplak6xe.fsf@uwakimon.sk.tsukuba.ac.jp> anatoly techtonik writes: > Fernando clearly showed that IPython rocks, because CPython suxx. No, IPython rocks because it focuses on doing one thing well: providing an interactive environment that takes advantage of the many features that Python provides in support. CPython should do the same: specifically, focus on the *language* that we all consider excellent but still can be improved, and on the (still) leading implementation of the language and the stdlib.[1] > so as a python-dev subscriber you should be ashamed of yourself for > this proposal already. ;) ROTFLMAO! No, I still think you're making an awfully big deal of something that doesn't need fixing, and I wish you would stop. Footnotes: [1] Note that this *is* *one* task, because CPython has chosen a definition of "language excellence" that includes prototype implementation of proposed language features and "batteries included". From chris at simplistix.co.uk Wed Dec 21 08:16:06 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 21 Dec 2011 07:16:06 +0000 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: <4EF187A2.6070909@simplistix.co.uk> References: <4EF187A2.6070909@simplistix.co.uk> Message-ID: <4EF187B6.3080406@simplistix.co.uk> What's the python-dev view on this? -------- Original Message -------- Subject: Anyone still using Python 2.5? Date: Wed, 21 Dec 2011 07:15:46 +0000 From: Chris Withers To: Python List , "testing-in-python at lists.idyll.org" , simplistix at googlegroups.com Hi All, What's the general consensus on supporting Python 2.5 nowadays? 
Do people still have to use this in commercial environments or is everyone on 2.6+ nowadays? I'm finally getting some continuous integration set up for my packages and it's highlighting some 2.5 compatibility issues. I'm wondering whether to fix those (lots of ugly "from __future__ import with_statement" everywhere) or just to drop Python 2.5 support. What do people feel? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From python.leojay at gmail.com Wed Dec 21 09:50:46 2011 From: python.leojay at gmail.com (Leo Jay) Date: Wed, 21 Dec 2011 16:50:46 +0800 Subject: [Python-Dev] Cannot use multiprocessing and zip together on windows In-Reply-To: References: Message-ID: Hi All, I posted this several days ago in python mailing list but got no response and I think it might be a bug, so I post it here. Apologies if it's not appropriate. I have a file p.zip, there is a __main__.py in it, and the content of __main__.py is:

from multiprocessing import Process
import os

def f():
    print 'in f, pid:', os.getpid()

if __name__ == '__main__':
    print 'pid:', os.getpid()
    p = Process(target=f)
    p.start()
    p.join()

On Linux, I get the expected result when running "python p.zip". But on Windows XP, I got:

Traceback (most recent call last):
  File "", line 1, in
  File "C:\python27\lib\multiprocessing\forking.py", line 346, in main
    prepare(preparation_data)
  File "C:\python27\lib\multiprocessing\forking.py", line 454, in prepare
    assert main_name not in sys.modules, main_name
AssertionError: __main__

It seems that the situation described here is similar: http://bugs.python.org/issue10128 But the patch doesn't work for me. Anybody knows how to fix this? Thanks. -- Best Regards, Leo Jay From dirkjan at ochtman.nl Wed Dec 21 09:55:34 2011 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 21 Dec 2011 09:55:34 +0100 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: On Wed, Dec 21, 2011 at 08:16, Chris Withers wrote: > What's the general consensus on supporting Python 2.5 nowadays? > > Do people still have to use this in commercial environments or is > everyone on 2.6+ nowadays? This seems rather off-topic for python-dev. FWIW, on Gentoo we're just now getting to dropping 2.4, so we'll support 2.5 quite a bit longer. That's also the tendency I see from the ecosystem, at least insofar as I notice. On the other hand, we've had 2.7 as the default python on our stable branch since March 2011. I also know Mercurial is still supporting 2.4 (they tend to be conservative about dropping support for old releases). Cheers, Dirkjan From neologix at free.fr Wed Dec 21 10:42:07 2011 From: neologix at free.fr (Charles-François Natali) Date: Wed, 21 Dec 2011 10:42:07 +0100 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: <4EF187B6.3080406@simplistix.co.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: > Do people still have to use this in commercial environments or is > everyone on 2.6+ nowadays? RHEL 5.7 ships with Python 2.4.3. So no, not everybody is on 2.6+ today, and this won't happen before a couple years. cf From phd at phdru.name Wed Dec 21 11:29:15 2011 From: phd at phdru.name (Oleg Broytman) Date: Wed, 21 Dec 2011 14:29:15 +0400 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: <4EF187B6.3080406@simplistix.co.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: <20111221102915.GA18354@iskra.aviel.ru> On Wed, Dec 21, 2011 at 07:16:06AM +0000, Chris Withers wrote: > What's the general consensus on supporting Python 2.5 nowadays? > > Do people still have to use this in commercial environments I have to use it.
There is a rather large and complex intranet site with both 32- and 64-bit versions of Python and libraries, and there are about 70 copies of it at client sites so it'd be very hard to recompile and adapt it to Python 2.6, test and upgrade all clients. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From barry at python.org Wed Dec 21 13:42:45 2011 From: barry at python.org (Barry Warsaw) Date: Wed, 21 Dec 2011 07:42:45 -0500 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: <4EF187B6.3080406@simplistix.co.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: <20111221074245.6652c314@resist.wooz.org> On Dec 21, 2011, at 07:16 AM, Chris Withers wrote: >What's the general consensus on supporting Python 2.5 nowadays? FWIW, Ubuntu dropped 2.5 quite a while ago. The next LTS (long term support) release in April 2012 will have only Python 2.7 (and 3.2). The currently in-development next Debian release currently has only Python 2.6, 2.7, and 3.2 with 2.7 as the default. For my own code, Python 2.6 is the minimum, and I'm seeing more upstream libraries target 2.6 as a minimum also (e.g. dbus-python). When projects say they still need to target older Pythons, RHEL support is usually cited as the reason. Cheers, -Barry From fuzzyman at voidspace.org.uk Wed Dec 21 14:07:14 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 21 Dec 2011 13:07:14 +0000 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <20111221074245.6652c314@resist.wooz.org> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> Message-ID: <6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk> On 21 Dec 2011, at 12:42, Barry Warsaw wrote: > On Dec 21, 2011, at 07:16 AM, Chris Withers wrote: > >> What's the general consensus on supporting Python 2.5 nowadays? 
> > FWIW, Ubuntu dropped 2.5 quite a while ago. The next LTS (long term support) > release in April 2012 will have only Python 2.7 (and 3.2). The currently > in-development next Debian release currently has only Python 2.6, 2.7, and 3.2 > with 2.7 as the default. > > For my own code, Python 2.6 is the minimum, and I'm seeing more upstream > libraries target 2.6 as a minimum also (e.g. dbus-python). When projects say > they still need to target older Pythons, RHEL support is usually cited as the > reason. For "production work" I've been on 2.6 for a while and will soon be switching to 2.7 (I do my development on 2.7). For my libraries I'm still supporting 2.4. The *major* syntax feature you lose by targeting 2.4 is the with statement, so it will be nice to drop 2.4 support. The next releases of mock and unittest2 will still support 2.4, but the ones after that will be 2.5+. Thankfully tox makes testing across multiple versions (and implementations) easy. All the best, Michael Foord > > Cheers, > -Barry > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From solipsis at pitrou.net Wed Dec 21 14:20:44 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 21 Dec 2011 14:20:44 +0100 Subject: [Python-Dev] Anyone still using Python 2.5? References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: <20111221142044.53382f94@pitrou.net> On Wed, 21 Dec 2011 07:16:06 +0000 Chris Withers wrote: > What's the python-dev view on this? Python 2.5 is not supported by *us* anymore (*). 
Anyone still using it therefore relies on their OS vendor to apply potential security patches and other important fixes. Library authors can of course choose to still support it. I wouldn't care personally. I'm of the opinion that people who (by their choice of OS) have a preference for legacy software shouldn't ask for the latest versions of Python libraries. (*) From http://www.python.org/download/releases/2.5.6/ : "This release is the final release of Python 2.5; under the current release policy, no security issues in Python 2.5 will be fixed anymore." Regards Antoine. From jwzxgo at gmail.com Wed Dec 21 14:31:25 2011 From: jwzxgo at gmail.com (wang tiezhen) Date: Wed, 21 Dec 2011 14:31:25 +0100 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> <6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk> Message-ID: I am still working on projects based on Python 2.4 in commercial environments (limitation of OS: Solaris 5.10). And I don't think this will change soon. 2011/12/21 Michael Foord > > On 21 Dec 2011, at 12:42, Barry Warsaw wrote: > > > On Dec 21, 2011, at 07:16 AM, Chris Withers wrote: > > > >> What's the general consensus on supporting Python 2.5 nowadays? > > > > FWIW, Ubuntu dropped 2.5 quite a while ago. The next LTS (long term > support) > > release in April 2012 will have only Python 2.7 (and 3.2). The currently > > in-development next Debian release currently has only Python 2.6, 2.7, > and 3.2 > > with 2.7 as the default. > > > > For my own code, Python 2.6 is the minimum, and I'm seeing more upstream > > libraries target 2.6 as a minimum also (e.g. dbus-python). When > projects say > > they still need to target older Pythons, RHEL support is usually cited > as the > > reason.
> > > For "production work" I've been on 2.6 for a while and will soon be > switching to 2.7 (I do my development on 2.7). > > For my libraries I'm still supporting 2.4. The *major* syntax feature you > lose by targeting 2.4 is the with statement, so it will be nice to drop 2.4 > support. The next releases of mock and unittest2 will still support 2.4, > but the ones after that will be 2.5+. > > Thankfully tox makes testing across multiple versions (and > implementations) easy. > > All the best, > > Michael Foord > > > > > Cheers, > > -Barry > > > -- > http://www.voidspace.org.uk/ > > > May you do good and not evil > May you find forgiveness for yourself and forgive others > May you share freely, never taking more than you give. > -- the sqlite blessing > http://www.sqlite.org/different.html From charlesc-lists-python-dev2 at pyropus.ca Wed Dec 21 14:35:34 2011 From: charlesc-lists-python-dev2 at pyropus.ca (Charles Cazabon) Date: Wed, 21 Dec 2011 07:35:34 -0600 Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> <6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk> Message-ID: <20111221133534.GA27321@pyropus.ca> Michael Foord wrote: > On 21 Dec 2011, at 12:42, Barry Warsaw wrote: > > > > FWIW, Ubuntu dropped 2.5 quite a while ago. The next LTS (long term > > support) release in April 2012 will have only Python 2.7 (and 3.2). True, but 2.5 is still current on Hardy, an LTS release that is officially supported until April 2013. Lots of places still use 2.5 on Hardy (or on Lucid, the LTS release after Hardy, though they have to get it from the deadsnakes repository as it's not the normal version on Lucid). My workplace uses 2.5 for a lot of things, but is slowly transitioning to 2.6. > For "production work" I've been on 2.6 for a while and will soon be > switching to 2.7 (I do my development on 2.7). > > For my libraries I'm still supporting 2.4. My own personal software generally tries to stay compatible further back. getmail is used on lots of little network appliances and such that don't necessarily run a current OS, so getmail v4 targets 2.3.3 and up. If I'm writing something new today, I usually assume 2.6 and up. Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://pyropus.ca/software/ ----------------------------------------------------------------------- From techtonik at gmail.com Wed Dec 21 15:26:05 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 21 Dec 2011 17:26:05 +0300 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: <4EF187B6.3080406@simplistix.co.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: I believe most AppEngine applications in Python are still using 2.5 run-time.
So are development boxes for these applications. It may take another year or two for the transition. -- anatoly t. On Wed, Dec 21, 2011 at 10:16 AM, Chris Withers wrote: > What's the python-dev view on this? > > -------- Original Message -------- > Subject: Anyone still using Python 2.5? > Date: Wed, 21 Dec 2011 07:15:46 +0000 > From: Chris Withers > To: Python List , "testing-in-python at lists.idyll.org" , > simplistix at googlegroups.com > > Hi All, > > What's the general consensus on supporting Python 2.5 nowadays? > > Do people still have to use this in commercial environments or is > everyone on 2.6+ nowadays? > > I'm finally getting some continuous integration set up for my packages > and it's highlighting some 2.5 compatibility issues. I'm wondering > whether to fix those (lots of ugly "from __future__ import > with_statement" everywhere) or just to drop Python 2.5 support. > > What do people feel? > > cheers, > > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > - http://www.simplistix.co.uk From ncoghlan at gmail.com Wed Dec 21 15:28:15 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Dec 2011 00:28:15 +1000 Subject: [Python-Dev] Cannot use multiprocessing and zip together on windows In-Reply-To: References: Message-ID: On Wed, Dec 21, 2011 at 6:50 PM, Leo Jay wrote: > It seems that the situation described here is similar: > http://bugs.python.org/issue10128 > > But the patch doesn't work for me. > > Anybody knows how to fix this?
Try the patch from http://bugs.python.org/issue10845 (the one on #10128 only partially addresses the problem - a similarly incomplete answer was our first attempt at fixing this for 3.2) I've added a note to the issue you linked indicating that the change should also be backported to the 2.7 maintenance branch. (IIRC, the reason backporting to 2.7 didn't come up originally is that the only reason we found the bad interaction in 3.2 was because we added test.__main__, so the regression test suite can be executed via "python -m test". At the time, it didn't occur to me, or anyone else involved, that the underlying bug also affected 2.7). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dmalcolm at redhat.com Wed Dec 21 18:11:58 2011 From: dmalcolm at redhat.com (David Malcolm) Date: Wed, 21 Dec 2011 12:11:58 -0500 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: <1324487519.2461.1.camel@surprise> On Wed, 2011-12-21 at 10:42 +0100, Charles-François Natali wrote: > > Do people still have to use this in commercial environments or is > > everyone on 2.6+ nowadays? > > RHEL 5.7 ships with Python 2.4.3. So no, not everybody is on 2.6+ > today, and this won't happen before a couple years. (and RHEL 4.9 with Python 2.3.4, fwiw) From skippy.hammond at gmail.com Thu Dec 22 02:25:27 2011 From: skippy.hammond at gmail.com (Mark Hammond) Date: Thu, 22 Dec 2011 12:25:27 +1100 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: <4EF28707.405@gmail.com> FWIW, the most recent version of pywin32 has the following download counts (rounded to the nearest thousand):

Version     32bit     64bit
----------------------------
3.2        75,000     9,000
3.1         4,000     1,000
2.7       126,000    16,000
2.6        46,000     6,000
2.5        21,000       n/a
2.4         3,000       n/a
2.3         1,000       n/a

So ISTM that 2.5 isn't hugely popular these days, but also isn't insignificant. It probably means I could "safely" drop 2.3 and 2.4 support though... Mark On 21/12/2011 6:16 PM, Chris Withers wrote: > What's the python-dev view on this? > > -------- Original Message -------- > Subject: Anyone still using Python 2.5? > Date: Wed, 21 Dec 2011 07:15:46 +0000 > From: Chris Withers > To: Python List , > "testing-in-python at lists.idyll.org" , > simplistix at googlegroups.com > > Hi All, > > What's the general consensus on supporting Python 2.5 nowadays? > > Do people still have to use this in commercial environments or is > everyone on 2.6+ nowadays? > > I'm finally getting some continuous integration set up for my packages > and it's highlighting some 2.5 compatibility issues. I'm wondering > whether to fix those (lots of ugly "from __future__ import > with_statement" everywhere) or just to drop Python 2.5 support. > > What do people feel? > > cheers, > > Chris > From greg at krypto.org Thu Dec 22 02:41:17 2011 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 21 Dec 2011 17:41:17 -0800 Subject: [Python-Dev] Adding features to 2to3... cpython/default right? can I backport to 2.7? Message-ID: I have some features I need to add to lib2to3 to make it more useful for our purposes at work supporting our massive code base in a Python 2 to 3 transition. Which tree should I develop these and check these into? cpython/default? Can I backport this to 3.2 and 2.7?
It counts as a feature addition, which is normally a no-no for backports. But in this case I'm enhancing 2to3, which is a useful tool. No big deal to me _personally_ if I can't backport from 3.3 (cpython/default), as I'd apply the changes to our copy at work internally, but it seems wise to me for us to keep enhancing and improving 2to3 in a Python 2.x/3.x release-independent manner to make people's conversions easier. The features I want to commit (all pretty easy additions) are command-line flag / constructor option support for: 1) writing output files to a different directory tree instead of overwriting the input file; 2) modifying the output filename by altering the suffix (.py -> .py3, for example); 3) always writing output files even if there were no changes to make (useful in combination with the above to effectively act as a "copy library X to this directory, converting it to Python 3 syntax along the way"). The old http://hg.python.org/2to3/ tree exists, but it really looks like an out-of-date version. -gps From fuzzyman at voidspace.org.uk Thu Dec 22 02:49:37 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 22 Dec 2011 01:49:37 +0000 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <4EF28707.405@gmail.com> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <4EF28707.405@gmail.com> Message-ID: <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk> On 22 Dec 2011, at 01:25, Mark Hammond wrote: > FWIW, the most recent version of pywin32 has the following download counts (rounded to the nearest thousand)
> 
> Version    32bit     64bit
> --------------------------
> 3.2       75,000     9,000
> 3.1        4,000     1,000
> 2.7      126,000    16,000
> 2.6       46,000     6,000
> 2.5       21,000       n/a
> 2.4        3,000       n/a
> 2.3        1,000       n/a
> 
> So ISTM that 2.5 isn't hugely popular these days, but also isn't insignificant.
It probably means I could "safely" drop 2.3 and 2.4 support though... > These figures can't possibly be true. No-one is using Python 3 yet. ;-) FWIW I heard a few days ago about a UK government department, HMGCC (Her Majesty's Government Communication Centre - based in Milton Keynes), who use Python for research projects. They switched to using Python 3 a while ago. All the best, Michael Foord > Mark > > On 21/12/2011 6:16 PM, Chris Withers wrote: >> What's the python-dev view on this? >> >> -------- Original Message -------- >> Subject: Anyone still using Python 2.5? >> Date: Wed, 21 Dec 2011 07:15:46 +0000 >> From: Chris Withers >> To: Python List , >> "testing-in-python at lists.idyll.org" , >> simplistix at googlegroups.com >> >> Hi All, >> >> What's the general consensus on supporting Python 2.5 nowadays? >> >> Do people still have to use this in commercial environments or is >> everyone on 2.6+ nowadays? >> >> I'm finally getting some continuous integration set up for my packages >> and it's highlighting some 2.5 compatibility issues. I'm wondering >> whether to fix those (lots of ugly "from __future__ import >> with_statement" everywhere) or just to drop Python 2.5 support. >> >> What do people feel? >> >> cheers, >> >> Chris >> -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From greg at krypto.org Thu Dec 22 02:51:44 2011 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 21 Dec 2011 17:51:44 -0800 Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <20111218235516.741cc14d@pitrou.net> References: <4EEBB8FC.2010405@hotpy.org> <20111218235516.741cc14d@pitrou.net> Message-ID: On Sun, Dec 18, 2011 at 2:55 PM, Antoine Pitrou wrote: > On Fri, 16 Dec 2011 21:32:44 +0000 > Mark Shannon wrote: > > > > > per-instance attributes, it just forces them all to keep resizing up, > > > even though individual instances would be small with the current dict. > > There is a cut-off point, at the moment it's quite unsophisticated about > > how it does this, but it could easily be improved. > > Suggestions are welcome. > > Can you open an issue on the bug tracker? > There you can either give your repo URL, or upload a patch. > Both should allow to start reviewing the code :) > > Regards > > Antoine. > +1 I'm interested in seeing this as well. Anything that improves the memory overhead in CPython is appreciated, as it decreases the pain when moving an app from 32-bit to 64-bit. :) -gps From victor.stinner at haypocalc.com Thu Dec 22 02:49:06 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 22 Dec 2011 02:49:06 +0100 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: <4EF187B6.3080406@simplistix.co.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: <4EF28C92.1090501@haypocalc.com> > What's the general consensus on supporting Python 2.5 nowadays? There is no such consensus :-) > Do people still have to use this in commercial environments or is > everyone on 2.6+ nowadays? At work, we are still using Python 2.5. Six months ago, we started a project to upgrade to 2.7, but we now have more urgent tasks, so the upgrade has been postponed. Even if we upgrade new clients to 2.7, we will have to continue to support 2.5 for some more months (or years?). In a personal project (the IPy library), I dropped support of Python 2.5 in February 2011.
Recently, I got a mail asking me where the previous version of my library (supporting Python 2.4) can be downloaded! Someone is still using Python 2.4: "I'm stuck with python 2.4 in my work environment." > What do people feel? For a new project, try to support Python 2.5, especially if you would like to write a portable library. For a new application working on Mac OS X, Windows and Linux, you can get away with supporting only Python 2.6. Victor From victor.stinner at haypocalc.com Thu Dec 22 02:55:40 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 22 Dec 2011 02:55:40 +0100 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> Message-ID: <4EF28E1C.6080703@haypocalc.com> On 21/12/2011 15:26, anatoly techtonik wrote: > I believe most AppEngine applications in Python are still using 2.5 > run-time. So are development boxes for these applications. It may take > another year or two for the transition. App Engine 1.6 improved support for Python 2.7, so I hope that -slowly- everybody will move to Python 3. Oops, I mean Python 2.7 ;-) http://code.google.com/appengine/docs/python/python27/ Victor From benjamin at python.org Thu Dec 22 03:08:49 2011 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 21 Dec 2011 20:08:49 -0600 Subject: [Python-Dev] Adding features to 2to3... cpython/default right? can I backport to 2.7? In-Reply-To: References: Message-ID: 2011/12/21 Gregory P. Smith : > I have some features I need to add to lib2to3 to make it more useful for our > purposes at work supporting our massive code base in a Python 2 to 3 > transition. Which tree should I develop these and check these into? > > cpython/default? > > Can I backport this to 3.2 and 2.7? It counts as a feature addition which > is normally a no-no for backports. But in this case I'm enhancing 2to3 > which is a useful tool. You may backport things for 2to3.
It's exempt from the feature freeze. > > No big deal to me _personally_ if I can't backport from 3.3 > (cpython/default) as I'd apply the changes to our copy at work internally > but it seems wise to me for us to keep enhancing and improving 2to3 in a > Python 2.x/3.x release independent manner to make people's conversions > easier. > > The features I want to commit (all pretty easy additions) are command line > flag / constructor option support for: > 1) writing output files to a different directory tree instead of > overwriting the input file. > 2) modifying the output filename by altering the suffix (.py -> .py3 for > example) > 3) always writing output files even if there were no changes to make > (useful in combination with the above to effectively act as a "copy library > X to this directory converting it to python 3 syntax along the way"). > > The old http://hg.python.org/2to3/ tree exists but it really looks like an > out of date version. Indeed; I should probably just delete it. -- Regards, Benjamin From mwm at mired.org Thu Dec 22 03:45:50 2011 From: mwm at mired.org (Mike Meyer) Date: Wed, 21 Dec 2011 18:45:50 -0800 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <4EF28707.405@gmail.com> <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk> Message-ID: <20111221184550.00dc6b8a@bhuda.mired.org> On Thu, 22 Dec 2011 01:49:37 +0000 Michael Foord wrote: > These figures can't possibly be true. No-one is using Python 3 yet. ;-) Since you brought it up: is anyone paying people (or trying to hire people) to write Python 3? Thanks, -- Mike Meyer http://www.mired.org/ Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From anacrolix at gmail.com Thu Dec 22 04:36:41 2011 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 22 Dec 2011 14:36:41 +1100 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <20111221184550.00dc6b8a@bhuda.mired.org> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <4EF28707.405@gmail.com> <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk> <20111221184550.00dc6b8a@bhuda.mired.org> Message-ID: I'm paid to write Python 3. I've also been writing Python 3 for hobby projects since mid-2010. I'm on the verge of going back to 2.7 due to compatibility issues :( On Thu, Dec 22, 2011 at 1:45 PM, Mike Meyer wrote: > On Thu, 22 Dec 2011 01:49:37 +0000 > Michael Foord wrote: >> These figures can't possibly be true. No-one is using Python 3 yet. ;-) > > Since you brought it up. Is anyone paying people (or trying to hire > people) to write Python 3? > > Thanks, > -- > Mike Meyer http://www.mired.org/ > Independent Software developer/SCM consultant, email for more information. > > O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From a.badger at gmail.com Thu Dec 22 06:17:32 2011 From: a.badger at gmail.com (Toshio Kuratomi) Date: Wed, 21 Dec 2011 21:17:32 -0800 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF28C92.1090501@haypocalc.com> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <4EF28C92.1090501@haypocalc.com> Message-ID: <20111222051732.GF24681@unaka.lan> On Thu, Dec 22, 2011 at 02:49:06AM +0100, Victor Stinner wrote: > > >Do people still have to use this in commercial environments or is > >everyone on 2.6+ nowadays? > > At work, we are still using Python 2.5. Six months ago, we started a > project to upgrade to 2.7, but we have now more urgent tasks, so the > upgrade is delayed to later. Even if we upgrade new clients to 2.7, > we will have to continue to support 2.5 for some more months (or > years?). > At my work, I'm on RHEL5 and RHEL6, so I'm currently supporting python-2.4 and python-2.6. We're up to 75% RHEL6 (though not on the machines where most of our deployed, custom-written apps are running), so I shouldn't have to support python-2.4 for much longer. > In a personal project (the IPy library), I dropped support of Python > 2.5 in february 2011. Recently, I got a mail asking me where the > previous version of my library (supporting Python 2.4) can be > downloaded! Someone is still using Python 2.4: "I'm stuck with python > 2.4 in my work environment." > As part of work, I package for EPEL5 (add-on packages for RHEL5). Sometimes we need a new version of a package or a new package for RHEL5 and thus need to have python-2.4-compatible versions of the package and any of its dependencies. When I no longer need to maintain python-2.4 stuff for work, I'm hoping not to have to do quite so much of this, but I know I'll still sometimes get requests to update an existing package to fix a bug or fix a feature, and that will require updates of dependent libraries. I'll still be stuck looking for python-2.4-compatible versions of all of these :-( > >What do people feel? > > For a new project, try to support Python 2.5, especially if you would > like to write a portable library. For a new application working on
For a new application working on > Mac OS X, Windows and Linux, you can only support Python 2.6. > I agree that libraries have a need to go farther back than applications. I have one library that I support on python-2.3 (for RHEL4... I'm counting down the months on that one :-). Every other library I maintain, I make sure I support at least python-2.4. Application-wise, I currently have to support python-2.4+ but given that Linux distros seem to all have some version out that supports at least python-2.6, I don't think I'll be developing any applications that intentionally support less than that once I get moved away from RHEL-5 at my workplace. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From techtonik at gmail.com Thu Dec 22 08:56:47 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 22 Dec 2011 09:56:47 +0200 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <4EF28707.405@gmail.com> <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk> Message-ID: On Thu, Dec 22, 2011 at 4:49 AM, Michael Foord wrote: > > On 22 Dec 2011, at 01:25, Mark Hammond wrote: > > > FWIW, the most recent version of pywin32 has the following download > counts (rounded to the nearest thousand) > > > > Version 32bit 64bit > > ------------------------- > > 3.2 - 75,000 9,000 > > 3.1 - 4,000 1,000 > > 2.7 - 126,000 16,000 > > 2.6 - 46,000 6,000 > > 2.5 - 21,000 n/a > > 2.4 - 3,000 n/a > > 2.3 - 1,000 n/a > > > > So ISTM that 2.5 isn't hugely popular these days, but also isn't > insignificant. It probably means I could "safely" drop 2.3 and 2.4 support > though... > > > > > These figures can't possibly be true. No-one is using Python 3 yet. 
;-) > python.org should have a poll/settings for active python.org accounts to allow people to mark when they switch to Python 3. > FWIW I heard a few days ago about a UK government department, HMGCC (Her > Majesty's Government Communication Centre - based in Milton Keynes), who > use Python for research projects. They switched to using Python 3 a while > ago. > if that == True: front_page.response(news_template.render("News About Her Majesty switched to Python 3")) Can't stand to do a +1 for the news item. > All the best, > > Michael Foord > From techtonik at gmail.com Thu Dec 22 09:05:11 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 22 Dec 2011 10:05:11 +0200 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <20111221074245.6652c314@resist.wooz.org> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> Message-ID: <1324547072.30982.15.camel@tim-laptop> On Wed, 2011-12-21 at 07:42 -0500, Barry Warsaw wrote: > On Dec 21, 2011, at 07:16 AM, Chris Withers wrote: > > >What's the general consensus on supporting Python 2.5 nowadays? > > FWIW, Ubuntu dropped 2.5 quite a while ago. Some servers I deploy to run Ubuntu, but we're installing previous python versions to support our apps - OS support isn't a factor in which version we develop for. I work on applications in 2.4-2.6. Generally: 2.4 apps are legacy and a migration is planned in the next year (either to 2.7 or to pypy). 2.5 apps are the speed-critical ones. Our tests showed the performance was different enough between 2.5 and 2.6 for me to not update. They also have significant native extensions in them so are potentially the most difficult to port to python3. 2.6 apps are newish and (mainly) pure python. I can see myself still using 2.5 for many years, but porting the 2.6 and 2.4 code to either pypy or python3 in the not too distant future. I believe we're most likely to choose python3 for apps with heavy use of Unicode (and pick a version after the changes to internal unicode format landed). Tim Wintle From solipsis at pitrou.net Thu Dec 22 10:56:38 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 22 Dec 2011 10:56:38 +0100 Subject: [Python-Dev] Anyone still using Python 2.5? References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> <1324547072.30982.15.camel@tim-laptop> Message-ID: <20111222105638.59498a88@pitrou.net> On Thu, 22 Dec 2011 09:44:32 +0000 Tim Wintle wrote: > > 2.5 apps are the speed-critical ones. Our tests showed the performance > was different enough between 2.5 and 2.6 for me to not update. Really? Where's the regression? 
Regards Antoine. From stefan_ml at behnel.de Thu Dec 22 11:12:24 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 22 Dec 2011 11:12:24 +0100 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <20111222105638.59498a88@pitrou.net> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> <1324547072.30982.15.camel@tim-laptop> <20111222105638.59498a88@pitrou.net> Message-ID: Antoine Pitrou, 22.12.2011 10:56: > On Thu, 22 Dec 2011 09:44:32 +0000 > Tim Wintle wrote: >> >> 2.5 apps are the speed-critical ones. Our tests showed the performance >> was different enough between 2.5 and 2.6 for me to not update. > > Really? Where's the regression? That's not unexpected, at least, and matches my own (limited) experience here. My gut feeling is that Py2.6 added a lot of "new in Py3.0" overhead, but without all the optimisations that went into Py3.x since then. At least some of that came back later with Py2.7. It would be nice to (eventually) see Py2.[567] run on speed.python.org in order to get a better idea of the relative performance. Stefan From fijall at gmail.com Thu Dec 22 11:09:53 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 22 Dec 2011 12:09:53 +0200 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <20111222105638.59498a88@pitrou.net> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> <1324547072.30982.15.camel@tim-laptop> <20111222105638.59498a88@pitrou.net> Message-ID: On Thu, Dec 22, 2011 at 11:56 AM, Antoine Pitrou wrote: > On Thu, 22 Dec 2011 09:44:32 +0000 > Tim Wintle wrote: >> >> 2.5 apps are the speed-critical ones. Our tests showed the performance >> was different enough between 2.5 and 2.6 for me to not update. > > Really? Where's the regression? > > Regards > > Antoine. Sounds weird; as far as I know, 2.6 is faster than, or at least not slower than, 2.5.
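Whether a particular workload really slows down between 2.5 and 2.6 can be checked empirically. The sketch below is not from the thread: it times one snippet under several interpreters by running `timeit` inside each of them. The interpreter names (`python2.5`, `python2.6`) are assumptions about what happens to be installed, and the probe script is deliberately limited to syntax that Python 2.5 still accepts. A micro-benchmark like this says nothing definitive about a given application; only timings of the real code can settle that.

```python
import subprocess
import sys

# Executed inside each *target* interpreter, so it sticks to syntax that
# Python 2.5 also accepts (no print function, no f-strings).
PROBE = (
    "import sys, timeit\n"
    "t = timeit.Timer(sys.argv[1], sys.argv[2])\n"
    "sys.stdout.write(repr(min(t.repeat(3, int(sys.argv[3])))))\n"
)


def best_time(interpreter, stmt, setup="pass", number=100000):
    """Best of 3 runs: seconds taken for `number` executions of `stmt`."""
    out = subprocess.check_output(
        [interpreter, "-c", PROBE, stmt, setup, str(number)])
    return float(out.decode("ascii"))


if __name__ == "__main__":
    # Hypothetical interpreter names; substitute whatever is on your PATH.
    for exe in ("python2.5", "python2.6", sys.executable):
        try:
            secs = best_time(exe, "d['a'] = 1; d.get('a')", setup="d = {}")
        except (OSError, subprocess.CalledProcessError) as exc:
            print("%s: skipped (%s)" % (exe, exc))
            continue
        print("%s: %.4f s" % (exe, secs))
```

Differences of a few percent can fall within run-to-run noise, so it is worth repeating the comparison a few times before drawing conclusions.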
From timwintle at gmail.com Thu Dec 22 12:05:09 2011 From: timwintle at gmail.com (Tim Wintle) Date: Thu, 22 Dec 2011 11:05:09 +0000 Subject: [Python-Dev] Anyone still using Python 2.5? In-Reply-To: <20111222105638.59498a88@pitrou.net> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> <1324547072.30982.15.camel@tim-laptop> <20111222105638.59498a88@pitrou.net> Message-ID: <1324551909.32266.58.camel@tim-laptop> On Thu, 2011-12-22 at 10:56 +0100, Antoine Pitrou wrote: > On Thu, 22 Dec 2011 09:44:32 +0000 > Tim Wintle wrote: > > > > 2.5 apps are the speed-critical ones. Our tests showed the performance > > was different enough between 2.5 and 2.6 for me to not update. > > Really? Where's the regression? I'm not certain - IIRC there were several nice optimisations in 2.6, and I wasn't expecting that when I first looked. I was running code designed for 2.5 under 2.6, so it's possible that with sufficient tweaking for 2.6 I wouldn't see the same result. I tested this specific code with the Python builds we have in production, not general Python code - I don't mean this as a recommendation that anyone else assume 2.5 is faster for them. I suspect that Stefan's comments about newly added features without the optimisations in Python 3 might be partially true, but having the extra code to support them (while not using them) might also be part of the cause - ceval.c had over 1K lines of changes between r25 and r26, including cases for new opcodes, new opcode predictions, etc. - it's possible that my code just happens not to follow the optimal paths. I'm talking about a slow-down of under 10%, but enough that I couldn't justify moving these apps to 2.6 at the time for economic reasons, and PyPy would be the main incentive to move them to 2.7.
Tim From macsmith.us at gmail.com Thu Dec 22 12:10:32 2011 From: macsmith.us at gmail.com (Mac Smith) Date: Thu, 22 Dec 2011 16:40:32 +0530 Subject: [Python-Dev] reading multiline output Message-ID: <79DF8234-67BF-4297-ACE6-8D091D05B3E9@gmail.com> Hi, I have started HandBrakeCLI using subprocess.Popen, but the output is multiline and not terminated with \n, so I am not able to read it using readline() while HandBrakeCLI is running. Kindly suggest some alternative. I have attached the output in a file. -- Thanks Mac From phd at phdru.name Thu Dec 22 12:30:41 2011 From: phd at phdru.name (Oleg Broytman) Date: Thu, 22 Dec 2011 15:30:41 +0400 Subject: [Python-Dev] reading multiline output In-Reply-To: <79DF8234-67BF-4297-ACE6-8D091D05B3E9@gmail.com> References: <79DF8234-67BF-4297-ACE6-8D091D05B3E9@gmail.com> Message-ID: <20111222113041.GA18753@iskra.aviel.ru> Hello. We are sorry but we cannot help you. This mailing list is to work on developing Python (adding new features to Python itself and fixing bugs); if you're having problems learning, understanding or using Python, please find another forum. Probably python-list/comp.lang.python mailing list/news group is the best place; there are Python developers who participate in it; you may get a faster, and probably more complete, answer there. See http://www.python.org/community/ for other lists/news groups/fora. Thank you for understanding. On Thu, Dec 22, 2011 at 04:40:32PM +0530, Mac Smith wrote: > I have started HandBrakeCLI using subprocess.popen but the output is multiline and not terminated with \n so i am not able to read it using readline() while the HandBrakeCLI is running. kindly suggest some alternative. i have attached the output in a file. Oleg.
-- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From martin at v.loewis.de Thu Dec 22 14:34:00 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 22 Dec 2011 14:34:00 +0100 Subject: [Python-Dev] Adding GNU conditional execution in the Makefile? In-Reply-To: <4EE3EBA9.2050600@jcea.es> References: <4EE3EBA9.2050600@jcea.es> Message-ID: <4EF331C8.9000307@v.loewis.de> > If this is a policy, I would like to know. As Guido says, Python should work with "traditional make"; I think this is particularly relevant for the BSDs and Solaris. > And if somebody has a suggestion to cope with this difficulty... Why don't you use some @FOO@ replacement? Have something expand to either the object file name, or nothing. Regards, Martin From martin at v.loewis.de Thu Dec 22 15:09:25 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 22 Dec 2011 15:09:25 +0100 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: <4EEA722A.10403@hotpy.org> References: <4EEA722A.10403@hotpy.org> Message-ID: <4EF33A15.1040707@v.loewis.de> > The current dict implementation is getting pretty old, > isn't it time we had a new one (for xmas)? I like the approach, and I think something should be done indeed. If you don't contribute your approach, I'd like to drop at least ma_smalltable for 3.3. A number of things about your branch came to my mind: - it would be useful to have a specialized representation for all-keys-are-strings. In that case, me_hash could be dropped from the representation. You would get savings compared to the status quo even in the non-shared case. - why does _dictkeys need to be a full-blown Python object? We need refcounting and the size, but not the type slot. - I wonder whether the shared keys could be computed at compile time, considering all attribute names that get assigned for self.
The compiler could list those in the code object, and class creation could iterate over all methods (taking base classes into account). Regards, Martin From solipsis at pitrou.net Thu Dec 22 17:49:31 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 22 Dec 2011 17:49:31 +0100 Subject: [Python-Dev] hg.python.org mod_wsgi changes Message-ID: <20111222174931.20c1c693@pitrou.net> Hello, Today I've modified the WSGI configuration at hg.python.org. If you notice anything wrong (e.g. when cloning a repository), please tell me. For the curious: http://mercurial.selenic.com/bts/issue2595 Regards Antoine. From fijall at gmail.com Thu Dec 22 19:15:04 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 22 Dec 2011 20:15:04 +0200 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: <4EF33A15.1040707@v.loewis.de> References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> Message-ID: > - I wonder whether the shared keys could be computed at compile > time, considering all attribute names that get assigned for > self. The compiler could list those in the code object, and > class creation could iterate over all methods (taking base > classes into account). This is hard, because sometimes you don't even quite know what self *is*, especially if __init__ calls some methods or there is any sort of control flow. You can, however, track what gets assigned at runtime and have shapes associated with objects. From jafo at tummy.com Fri Dec 23 01:15:33 2011 From: jafo at tummy.com (Sean Reifschneider) Date: Thu, 22 Dec 2011 17:15:33 -0700 Subject: [Python-Dev] Fwd: Anyone still using Python 2.5? In-Reply-To: <20111221074245.6652c314@resist.wooz.org> References: <4EF187A2.6070909@simplistix.co.uk> <4EF187B6.3080406@simplistix.co.uk> <20111221074245.6652c314@resist.wooz.org> Message-ID: <20111223001533.GA2061@tummy.com> On Wed, Dec 21, 2011 at 07:42:45AM -0500, Barry Warsaw wrote: >FWIW, Ubuntu dropped 2.5 quite a while ago.
>The next LTS (long term support) That's true for *CURRENT* releases; however, Ubuntu still supports Python 2.5 via 8.04 LTS (end of life in April 2013). Lucid ships 2.6 and goes EOL in 2015. Red Hat Enterprise Linux is a more complicated situation. They currently still have active support for Python 2.3 in RHEL 4, but that reaches EOL in just a couple of months (Feb 2012). But they have this "extended life cycle" that ends in Feb 2015. RHEL 5 has Python 2.4.3 and has an EOL of April 2014 (April 2017 for the extended life cycle). There was a fairly large lag between RHEL 5 and RHEL 6 (almost 4 years), so there are a *LOT* of RHEL 5 systems out there. RHEL 6 has Python 2.6.6, BTW. This is why I recently released the "ineedpy2" package, so that your program can request and search for specific versions of Python on a multi-Python system. We have a number of systems that have Python 2.3 and older on them, but many of those systems also have newer Pythons available under alternate names. We recommend that, whenever possible, customers deploy against the system Python, meaning version 2.4.3 if they are deploying on CentOS 5, because otherwise security updates of Python and *all the libraries they depend on* need to be tracked manually. Some customers decide to go one route, some the other, but that is our recommendation. Ideally, you are building your apps to target a production environment, not just using the latest and greatest Python without compelling reasons. So, yes, people are still using Python 2.5 and 2.4. Mostly these are people who have already deployed apps and are either fixing/updating them, or are adding new applications that they want to target the same production environment rather than setting up a new environment. Sean -- Linux, because eventually you grow up enough to be trusted with a fork(). Sean Reifschneider, Member of Technical Staff tummy.com, ltd.
- Linux Consulting since 1995: Ask me about High Availability From martin at v.loewis.de Fri Dec 23 09:57:01 2011 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 23 Dec 2011 09:57:01 +0100 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> Message-ID: <4EF4425D.4060409@v.loewis.de> On 22.12.2011 19:15, Maciej Fijalkowski wrote: >> - I wonder whether the shared keys could be computed at compile >> time, considering all attribute names that get assigned for >> self. The compiler could list those in the code object, and >> class creation could iterate over all methods (taking base >> classes into account). > > This is hard, because sometimes you don't quite know what the self > *is* even, especially if __init__ calls some methods or there is any > sort of control flow. You can however track what gets assigned at > runtime at have shapes associated with objects. Actually, it's fairly easy, as it only needs to be heuristic. I am proposing exactly the heuristic specified above ("attribute names that get assigned for self"). I don't think that __init__ calling methods is much of an issue here, since those methods then still have attributes assigned to self. Regards, Martin From mark at hotpy.org Fri Dec 23 10:51:47 2011 From: mark at hotpy.org (Mark Shannon) Date: Fri, 23 Dec 2011 09:51:47 +0000 Subject: [Python-Dev] A new dict for Xmas? In-Reply-To: <4EF33A15.1040707@v.loewis.de> References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> Message-ID: <4EF44F33.4000508@hotpy.org> Martin v. Löwis wrote: >> The current dict implementation is getting pretty old, >> isn't it time we had a new one (for xmas)? > > I like the approach, and I think something should be done indeed. > If you don't contribute your approach, I'd like to drop at least > ma_smalltable for 3.3.
>
> A number of things about your branch came to my mind:
> - it would be useful to have a specialized representation for
> all-keys-are-strings. In that case, me_hash could be dropped
> from the representation. You would get savings compared to
> the status quo even in the non-shared case.

It might be tricky switching key tables, and I don't think it would save much memory, as keys that are widely shared take up very little memory anyway, and not many other dicts are long-lived. (It might improve performance for dicts used for keyword arguments.)

> - why does _dictkeys need to be a full-blown Python object?
> We need refcounting and the size, but not the type slot.

It doesn't. It's just a hangover from my original HotPy implementation, where all objects needed a type for the GC. So yes, the type slot could be removed.

> - I wonder whether the shared keys could be computed at compile
> time, considering all attribute names that get assigned for
> self. The compiler could list those in the code object, and
> class creation could iterate over all methods (taking base
> classes into account).
>

It probably wouldn't be that hard to make a guess at compile time as to what the shared keys would be, but it doesn't really matter. The generation of intermediate shared keys will only happen once per class, so the overhead would be negligible. To cut down on that overhead, we could use a ref-count trick: if the instance being updated and its class hold the only two refs to an immutable keys(-set -table -vector?), then just treat it as mutable.

I'll modify the repo to incorporate these changes when I have a chance.

Cheers,
Mark.

From martin at v.loewis.de Fri Dec 23 11:33:59 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 23 Dec 2011 11:33:59 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF44F33.4000508@hotpy.org>
References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> <4EF44F33.4000508@hotpy.org>
Message-ID: <4EF45917.10605@v.loewis.de>

>> - it would be useful to have a specialized representation for
>> all-keys-are-strings. In that case, me_hash could be dropped
>> from the representation. You would get savings compared to
>> the status quo even in the non-shared case.
> It might be tricky switching key tables, and I don't think it would save much
> memory, as keys that are widely shared take up very little memory anyway,
> and not many other dicts are long-lived.

Why do you say that? In a plain 3.3 interpreter, I counted 595 dict objects (see script below). Of these, 563 (so nearly all of them) had only strings as keys. Among those, I found 286 different key sets, where 231 key sets occurred only once (i.e. wouldn't be shared).

Together, the string dictionaries had 13282 keys, and you could save as many pointers (actually more, because there will be more key slots than keys).

I'm not sure why you think the string dicts with unshared keys would be short-lived. But even if they were, what matters is the steady-state number of dictionaries: if for every short-lived dictionary that gets released another one is created, any memory savings from reducing the dict size would still materialize.

>> - I wonder whether the shared keys could be computed at compile
>> time, considering all attribute names that get assigned for
>> self. The compiler could list those in the code object, and
>> class creation could iterate over all methods (taking base
>> classes into account).
>>
>
> It probably wouldn't be that hard to make a guess at compile time as to
> what the shared keys would be, but it doesn't really matter.
> The generation of intermediate shared keys will only happen once per
> class, so the overhead would be negligible.

I'm not so much concerned about overhead, but about the correctness/effectiveness of the heuristics.
For a class with dynamic attributes, you may well come up with a very large key set. With source analysis, you wouldn't attempt to grow the keyset beyond what likely is being shared.

Regards,
Martin

import sys
d = sys.getobjects(0,dict)
print(len(d), "dicts")
d2 = []
for o in d:
    keys = o.keys()
    if not keys:
        continue
    types = tuple(set(type(k) for k in keys))
    if types != (str,):
        continue
    d2.append(tuple(sorted(keys)))
print(len(d2), "str dicts")
freq = {}
for keys in d2:
    freq[keys] = freq.get(keys,0)+1
print(len(freq), "different key sets")
freq = sorted(freq.items(), key=lambda t:t[1])
print(len([o for o in freq if o[1]==1]), "unsharable")
print(sum(len(o[0]) for o in freq), "keys")
print(freq[-10:])

From mark at hotpy.org Fri Dec 23 12:21:26 2011
From: mark at hotpy.org (Mark Shannon)
Date: Fri, 23 Dec 2011 11:21:26 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF45917.10605@v.loewis.de>
References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> <4EF44F33.4000508@hotpy.org> <4EF45917.10605@v.loewis.de>
Message-ID: <4EF46436.2080904@hotpy.org>

Martin v. Löwis wrote:
>>> - it would be useful to have a specialized representation for
>>> all-keys-are-strings. In that case, me_hash could be dropped
>>> from the representation. You would get savings compared to
>>> the status quo even in the non-shared case.
>> It might be tricky switching key tables, and I don't think it would save much
>> memory, as keys that are widely shared take up very little memory anyway,
>> and not many other dicts are long-lived.
>
> Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
> objects (see script below). Of these, 563 (so nearly all of them) had
> only strings as keys. Among those, I found 286 different key sets,
> where 231 key sets occurred only once (i.e. wouldn't be shared).
>
> Together, the string dictionaries had 13282 keys, and you could save
> as many pointers (actually more, because there will be more key slots
> than keys).
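Martin's script relies on `sys.getobjects()`, which exists only in interpreters built with `Py_TRACE_REFS`. A rough, portable approximation can take the same kind of census from `gc.get_objects()`. This substitution is an assumption and is not his method. CPython stops tracking dicts that hold only atomic values, so the totals here are a lower bound. `str_key_dict_stats` is a hypothetical helper name. Unlike the script above, it counts keys across every str-keyed dict rather than once per distinct key set.

```python
import gc
from collections import Counter


def str_key_dict_stats(objects=None):
    """Count dicts whose keys are all str and how often each key set recurs.

    Assumption: gc.get_objects() stands in for the debug-only sys.getobjects();
    it only sees GC-tracked containers, so some dicts are missed.
    """
    if objects is None:
        objects = gc.get_objects()
    dicts = [o for o in objects if type(o) is dict]
    key_sets = Counter()
    total_keys = 0
    for d in dicts:
        # Skip empty dicts and any dict with at least one non-str key.
        if d and all(type(k) is str for k in d):
            key_sets[tuple(sorted(d))] += 1
            total_keys += len(d)
    return {
        "dicts": len(dicts),
        "str_dicts": sum(key_sets.values()),
        "key_sets": len(key_sets),
        # Key sets seen exactly once could not be shared between instances.
        "unsharable": sum(1 for count in key_sets.values() if count == 1),
        "keys": total_keys,
        "most_common": key_sets.most_common(10),
    }


if __name__ == "__main__":
    for name, value in str_key_dict_stats().items():
        if name != "most_common":
            print(name, value)
```

Passing an explicit list of objects makes the function easy to check on a small, known sample before pointing it at a live interpreter.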
The question is how much memory needs to be saved to be worth adding the complexity: 10kB, no; 100MB, yes. So data from "real" benchmarks would be useful.

Also, I'm assuming that it would be tricky to implement correctly due to implicit assumptions in the rest of the code. If I'm wrong and it's easy to implement, then please do.

>
> I'm not sure why you think the string dicts with unshared keys would be
> short-lived.

Not all, but most. Most dicts with unshared keys would most likely be for keyword parameters. Explicit dicts tend to be few in number. (When I say few, I mean up to 1k, not 100k or 1M.) Module dicts are very likely to have unshared keys; they number in the 10s or 100s, but they do tend to be large.

> But even if they were, what matters is the steady-state
> number of dictionaries: if for every short-lived dictionary that
> gets released another one is created, any memory savings from reducing
> the dict size would still materialize.

But only a few kB?

>
>>> - I wonder whether the shared keys could be computed at compile
>>> time, considering all attribute names that get assigned for
>>> self. The compiler could list those in the code object, and
>>> class creation could iterate over all methods (taking base
>>> classes into account).
>>>
>> It probably wouldn't be that hard to make a guess at compile time as to
>> what the shared keys would be, but it doesn't really matter.
>> The generation of intermediate shared keys will only happen once per
>> class, so the overhead would be negligible.
>
> I'm not so much concerned about overhead, but about correctness/
> effectiveness of the heuristics. For a class with dynamic attributes,
> you may well come up with a very large key set. With source analysis,
> you wouldn't attempt to grow the keyset beyond what likely is being
> shared.

I agree some sort of heuristic is required to limit excessive growth and prevent pathological behaviour.
The current implementation just has a cutoff at a certain size; it could definitely be improved.

As I said, I'll update the code soon and then, well, what's the phrase... Oh yes, "patches welcome" ;)

Thanks for the feedback.

Cheers,
Mark.

>
> Regards,
> Martin
>
> import sys
> d = sys.getobjects(0,dict)
> print(len(d), "dicts")
> d2 = []
> for o in d:
>     keys = o.keys()
>     if not keys:
>         continue
>     types = tuple(set(type(k) for k in keys))
>     if types != (str,):
>         continue
>     d2.append(tuple(sorted(keys)))
> print(len(d2), "str dicts")
> freq = {}
> for keys in d2:
>     freq[keys] = freq.get(keys,0)+1
> print(len(freq), "different key sets")
> freq = sorted(freq.items(), key=lambda t:t[1])
> print(len([o for o in freq if o[1]==1]), "unsharable")
> print(sum(len(o[0]) for o in freq), "keys")
> print(freq[-10:])

From stefan_ml at behnel.de Fri Dec 23 13:03:17 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 23 Dec 2011 13:03:17 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF46436.2080904@hotpy.org>
References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> <4EF44F33.4000508@hotpy.org> <4EF45917.10605@v.loewis.de> <4EF46436.2080904@hotpy.org>
Message-ID:

Mark Shannon, 23.12.2011 12:21:
> Martin v. Löwis wrote:
>>>> - it would be useful to have a specialized representation for
>>>> all-keys-are-strings. In that case, me_hash could be dropped
>>>> from the representation. You would get savings compared to
>>>> the status quo even in the non-shared case.
>>> It might be tricky switching key tables, and I don't think it would save much
>>> memory, as keys that are widely shared take up very little memory anyway,
>>> and not many other dicts are long-lived.
>>
>> Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
>> objects (see script below). Of these, 563 (so nearly all of them) had
>> only strings as keys. Among those, I found 286 different key sets,
>> where 231 key sets occurred only once (i.e. wouldn't be shared).
>>
>> Together, the string dictionaries had 13282 keys, and you could save
>> as many pointers (actually more, because there will be more key slots
>> than keys).
>
> The question is how much memory needs to be saved to be worth adding the
> complexity: 10kB, no; 100MB, yes.
> So data from "real" benchmarks would be useful.

Consider taking a parsed MiniDOM tree as a benchmark. It contains so many instances of just a couple of different classes that it just has to make a huge difference if each of those instances is even just a bit smaller. It should also make a clear difference for plain Python ElementTree.

I attached a benchmark script that measures the parsing speed as well as the total memory usage of the in-memory tree. You can get data files from the following places; just download them and pass their file names on the command line:

http://gnosis.cx/download/hamlet.xml
http://www.ibiblio.org/xml/examples/religion/ot/ot.xml

Here are some results from my own machine for comparison:

http://blog.behnel.de/index.php?p=197

Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: etbenchmark.py
Type: text/x-python
Size: 4760 bytes

From martin at v.loewis.de Fri Dec 23 14:05:22 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 23 Dec 2011 14:05:22 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF46436.2080904@hotpy.org>
References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> <4EF44F33.4000508@hotpy.org> <4EF45917.10605@v.loewis.de> <4EF46436.2080904@hotpy.org>
Message-ID: <4EF47C92.1020603@v.loewis.de>

> If I'm wrong and it's easy to implement, then please do.

OK, so I take it that you are not interested in the idea. No problem.

Regards,
Martin

From martin at v.loewis.de Fri Dec 23 14:07:57 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 23 Dec 2011 14:07:57 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To:
References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> <4EF44F33.4000508@hotpy.org> <4EF45917.10605@v.loewis.de> <4EF46436.2080904@hotpy.org>
Message-ID: <4EF47D2D.2080707@v.loewis.de>

> Consider taking a parsed MiniDOM tree as a benchmark. It contains so
> many instances of just a couple of different classes that it just has to
> make a huge difference if each of those instances is even just a bit
> smaller. It should also make a clear difference for plain Python
> ElementTree.

Of course, for minidom, Mark's current implementation should already save quite a lot of memory, since all elements and text nodes have the same attributes. Still, it would be good to see how Mark's implementation deals with that.

Regards,
Martin

From mark at hotpy.org Fri Dec 23 16:08:44 2011
From: mark at hotpy.org (Mark Shannon)
Date: Fri, 23 Dec 2011 15:08:44 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF47C92.1020603@v.loewis.de>
References: <4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de> <4EF44F33.4000508@hotpy.org> <4EF45917.10605@v.loewis.de> <4EF46436.2080904@hotpy.org> <4EF47C92.1020603@v.loewis.de>
Message-ID: <4EF4997C.9010808@hotpy.org>

Martin v. Löwis wrote:
>> If I'm wrong and it's easy to implement, then please do.
>
> OK, so I take it that you are not interested in the idea. No problem.

It's just that I don't think it would yield results commensurate with the effort. Also, I think it's worth keeping the initial version as simple as reasonably possible. Refinements can be added later.

Cheers,
Mark.
> > Regards, > Martin From status at bugs.python.org Fri Dec 23 18:07:32 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 23 Dec 2011 18:07:32 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20111223170732.0F1B81CEEE@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-12-16 - 2011-12-23) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3168 ( -7) closed 22272 (+52) total 25440 (+45) Open issues with patches: 1358 Issues opened (27) ================== #13614: `setup.py register` fails if long_description contains RST http://bugs.python.org/issue13614 opened by techtonik #13615: `setup.py register` fails with -r argument http://bugs.python.org/issue13615 opened by techtonik #13617: Reject embedded null characters in wchar* strings http://bugs.python.org/issue13617 opened by haypo #13619: Add a new codec: "locale", the current locale encoding http://bugs.python.org/issue13619 opened by haypo #13621: Unicode performance regression in python3.3 vs python3.2 http://bugs.python.org/issue13621 opened by Boris.FELD #13629: _PyParser_TokenNames does not match up with the token.h number http://bugs.python.org/issue13629 opened by meador.inge #13630: IDLE: Find(ed) text is not highlighted while dialog box is ope http://bugs.python.org/issue13630 opened by marco #13631: readline fails to parse some forms of .editrc under editline ( http://bugs.python.org/issue13631 opened by zvezdan #13632: Update token documentation to reflect actual token types http://bugs.python.org/issue13632 opened by meador.inge #13633: Handling of hex character references in HTMLParser.handle_char http://bugs.python.org/issue13633 opened by ezio.melotti #13636: Python SSL Stack doesn't have a Secure Default set of ciphers http://bugs.python.org/issue13636 opened by naif #13638: PyErr_SetFromErrnoWithFilenameObject is undocumented 
http://bugs.python.org/issue13638 opened by pitrou #13639: UnicodeDecodeError when creating tar.gz with unicode name http://bugs.python.org/issue13639 opened by jason.coombs #13640: add mimetype for application/vnd.apple.mpegurl http://bugs.python.org/issue13640 opened by Hiroaki.Kawai #13641: decoding functions in the base64 module could accept unicode s http://bugs.python.org/issue13641 opened by pitrou #13642: urllib incorrectly quotes username and password in https basic http://bugs.python.org/issue13642 opened by joneskoo #13643: 'ascii' is a bad filesystem default encoding http://bugs.python.org/issue13643 opened by gz #13644: Python 3 crashes (segfaults) with this code. http://bugs.python.org/issue13644 opened by maniram.maniram #13645: test_import fails after test_coding http://bugs.python.org/issue13645 opened by pitrou #13646: Document poor interaction between multiprocessing and -m on Wi http://bugs.python.org/issue13646 opened by ncoghlan #13647: Python SSL stack doesn't securely validate certificate (as cli http://bugs.python.org/issue13647 opened by naif #13649: termios.ICANON is not documented http://bugs.python.org/issue13649 opened by techtonik #13651: Improve redirection in urllib http://bugs.python.org/issue13651 opened by tom.kel #13653: reorder set.intersection parameters for better performance http://bugs.python.org/issue13653 opened by dalke #13655: Python SSL stack doesn't have a default CA Store http://bugs.python.org/issue13655 opened by naif #13657: IDLE doesn't support sys.ps1 and sys.ps2. http://bugs.python.org/issue13657 opened by maniram.maniram #13658: Extra clause in class grammar documentation http://bugs.python.org/issue13658 opened by Joshua.Landau Most recent 15 issues with no replies (15) ========================================== #13658: Extra clause in class grammar documentation http://bugs.python.org/issue13658 #13657: IDLE doesn't support sys.ps1 and sys.ps2. 
http://bugs.python.org/issue13657 #13649: termios.ICANON is not documented http://bugs.python.org/issue13649 #13642: urllib incorrectly quotes username and password in https basic http://bugs.python.org/issue13642 #13641: decoding functions in the base64 module could accept unicode s http://bugs.python.org/issue13641 #13640: add mimetype for application/vnd.apple.mpegurl http://bugs.python.org/issue13640 #13638: PyErr_SetFromErrnoWithFilenameObject is undocumented http://bugs.python.org/issue13638 #13633: Handling of hex character references in HTMLParser.handle_char http://bugs.python.org/issue13633 #13632: Update token documentation to reflect actual token types http://bugs.python.org/issue13632 #13631: readline fails to parse some forms of .editrc under editline ( http://bugs.python.org/issue13631 #13608: remove born-deprecated PyUnicode_AsUnicodeAndSize http://bugs.python.org/issue13608 #13605: document argparse's nargs=REMAINDER http://bugs.python.org/issue13605 #13594: Aifc markers write fix http://bugs.python.org/issue13594 #13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex http://bugs.python.org/issue13590 #13574: refresh example in doc for Extending and Embedding http://bugs.python.org/issue13574 Most recent 15 issues waiting for review (15) ============================================= #13651: Improve redirection in urllib http://bugs.python.org/issue13651 #13645: test_import fails after test_coding http://bugs.python.org/issue13645 #13643: 'ascii' is a bad filesystem default encoding http://bugs.python.org/issue13643 #13640: add mimetype for application/vnd.apple.mpegurl http://bugs.python.org/issue13640 #13639: UnicodeDecodeError when creating tar.gz with unicode name http://bugs.python.org/issue13639 #13636: Python SSL Stack doesn't have a Secure Default set of ciphers http://bugs.python.org/issue13636 #13632: Update token documentation to reflect actual token types http://bugs.python.org/issue13632 #13631: readline fails to parse 
some forms of .editrc under editline ( http://bugs.python.org/issue13631 #13629: _PyParser_TokenNames does not match up with the token.h number http://bugs.python.org/issue13629 #13619: Add a new codec: "locale", the current locale encoding http://bugs.python.org/issue13619 #13617: Reject embedded null characters in wchar* strings http://bugs.python.org/issue13617 #13609: Add "os.get_terminal_size()" function http://bugs.python.org/issue13609 #13607: Move generator specific sections out of ceval. http://bugs.python.org/issue13607 #13604: update PEP 393 (match implementation) http://bugs.python.org/issue13604 #13598: string.Formatter doesn't support empty curly braces "{}" http://bugs.python.org/issue13598 Top 10 most discussed issues (10) ================================= #13643: 'ascii' is a bad filesystem default encoding http://bugs.python.org/issue13643 31 msgs #13636: Python SSL Stack doesn't have a Secure Default set of ciphers http://bugs.python.org/issue13636 29 msgs #8604: Adding an atomic FS write API http://bugs.python.org/issue8604 12 msgs #8828: Atomic function to rename a file http://bugs.python.org/issue8828 12 msgs #13585: Add contextlib.ContextStack http://bugs.python.org/issue13585 11 msgs #5689: Support xz compression in tarfile module http://bugs.python.org/issue5689 8 msgs #11638: python setup.py sdist --formats tar* crashes if version is uni http://bugs.python.org/issue11638 8 msgs #13555: cPickle MemoryError when loading large file (while pickle work http://bugs.python.org/issue13555 8 msgs #13621: Unicode performance regression in python3.3 vs python3.2 http://bugs.python.org/issue13621 8 msgs #13647: Python SSL stack doesn't securely validate certificate (as cli http://bugs.python.org/issue13647 8 msgs Issues closed (49) ================== #1785: "inspect" gets broken by some descriptors http://bugs.python.org/issue1785 closed by pitrou #3932: HTMLParser cannot handle '&' and non-ascii characters in attri http://bugs.python.org/issue3932 
closed by ezio.melotti #5424: Packed IPaddr conversion tests should be extended http://bugs.python.org/issue5424 closed by pitrou #6321: Reload Python modules when running programs http://bugs.python.org/issue6321 closed by samwyse #7502: All DocTestCase instances compare and hash equal to each other http://bugs.python.org/issue7502 closed by pitrou #8035: urllib.request.urlretrieve hangs waiting for connection close http://bugs.python.org/issue8035 closed by neologix #8093: IDLE processes don't close http://bugs.python.org/issue8093 closed by ned.deily #9039: IDLE and module Doc http://bugs.python.org/issue9039 closed by terry.reedy #11006: warnings with subprocess and pipe2 http://bugs.python.org/issue11006 closed by rosslagerwall #11178: Running tests inside a package by module name fails http://bugs.python.org/issue11178 closed by michael.foord #11231: bytes() constructor is not correctly documented http://bugs.python.org/issue11231 closed by haypo #11764: inspect.getattr_static code execution w/ class body as non dic http://bugs.python.org/issue11764 closed by michael.foord #11813: inspect.getattr_static doesn't get module attributes http://bugs.python.org/issue11813 closed by python-dev #11829: inspect.getattr_static code execution with meta-metaclasses http://bugs.python.org/issue11829 closed by python-dev #11867: Make test_mailbox deterministic http://bugs.python.org/issue11867 closed by neologix #11870: test_3_join_in_forked_from_thread() of test_threading hangs 1 http://bugs.python.org/issue11870 closed by neologix #12231: regrtest: add -k and -K options to filter tests by function/fi http://bugs.python.org/issue12231 closed by pitrou #12708: multiprocessing.Pool is missing a starmap[_async]() method. 
http://bugs.python.org/issue12708 closed by pitrou #12798: Update mimetypes documentation http://bugs.python.org/issue12798 closed by orsenthil #12809: Missing new setsockopts in Linux (eg: IP_TRANSPARENT) http://bugs.python.org/issue12809 closed by neologix #13294: http.server: minor code style changes. http://bugs.python.org/issue13294 closed by orsenthil #13443: wrong links and examples in the functional HOWTO http://bugs.python.org/issue13443 closed by orsenthil #13522: Document error return values for PyFloat_* and PyComplex_* http://bugs.python.org/issue13522 closed by pitrou #13530: Docs for os.lseek neglect to mention what it returns http://bugs.python.org/issue13530 closed by haypo #13560: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize http://bugs.python.org/issue13560 closed by haypo #13571: Backup files support in IDLE http://bugs.python.org/issue13571 closed by terry.reedy #13576: Handling of broken condcoms in HTMLParser http://bugs.python.org/issue13576 closed by ezio.melotti #13577: __qualname__ is not present on builtin methods and functions http://bugs.python.org/issue13577 closed by pitrou #13581: help() appears to be broken; doesn't display __doc__ for class http://bugs.python.org/issue13581 closed by pitrou #13610: On Python parsing numbers. 
http://bugs.python.org/issue13610 closed by ezio.melotti #13613: Small error in regular expression poker hand example http://bugs.python.org/issue13613 closed by ezio.melotti #13616: Never ending loop in in update_refs Modules/gcmodule.c http://bugs.python.org/issue13616 closed by David.Butler #13618: bytes.decode() UnicodeEncodeError on Apple iOS (>16-bit) chara http://bugs.python.org/issue13618 closed by silverbacknet #13620: Support Chrome in webbrowser.py http://bugs.python.org/issue13620 closed by orsenthil #13622: Bytes performance regression in python3.3 vs python3.2 http://bugs.python.org/issue13622 closed by haypo #13623: Bytes performance regression in python3.3 vs python3.2 http://bugs.python.org/issue13623 closed by haypo #13624: UTF-8 encoder performance regression in python3.3 http://bugs.python.org/issue13624 closed by haypo #13625: multiprocessing.reduction gives OSError: [Errno 9] in 2.7.2 http://bugs.python.org/issue13625 closed by neologix #13626: Python SSL stack doesn't support DH ciphers http://bugs.python.org/issue13626 closed by pitrou #13627: Python SSL stack doesn't support Elliptic Curve ciphers http://bugs.python.org/issue13627 closed by pitrou #13628: python-gdb.py: patch to improve support of optimized Python http://bugs.python.org/issue13628 closed by haypo #13634: Python SSL stack doesn't support Compression configuration http://bugs.python.org/issue13634 closed by pitrou #13635: Python SSL stack doesn't support ordering of Ciphers http://bugs.python.org/issue13635 closed by pitrou #13637: binascii.a2b_* functions could accept unicode strings http://bugs.python.org/issue13637 closed by pitrou #13648: xml.sax.saxutils.escape does not escapes \x00 http://bugs.python.org/issue13648 closed by loewis #13650: urllib HTTPRedirectHandler does not implement documented behav http://bugs.python.org/issue13650 closed by tom.kel #13652: Creating lambda functions in a loop has unexpected results whe http://bugs.python.org/issue13652 closed by 
benjamin.peterson #13654: IDLE: Freezes and/or crash on SyntaxWarning... is used prior t http://bugs.python.org/issue13654 closed by ned.deily #13656: Document ctypes.util and ctypes.wintypes. http://bugs.python.org/issue13656 closed by maniram.maniram

From fuzzyman at voidspace.org.uk Thu Dec 29 02:28:45 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 29 Dec 2011 01:28:45 +0000
Subject: [Python-Dev] Hash collision security issue (now public)
Message-ID: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>

Hello all,

A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included:

    http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

Although it's a security issue, I'm posting it here because it is now public and seems important.

The issue they report can cause (for example) handling an HTTP POST to consume horrible amounts of CPU. For Python, the figures they quoted:

    reasonable-sized attack strings only for 32 bits
    Plone has max. POST size of 1 MB
    7 minutes of CPU usage for a 1 MB request
    ~20 kbits/s → keep one Core Duo core busy

This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).

The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.

Their recommended fix is to randomize the hash function.

All the best,

Michael

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing
http://www.sqlite.org/different.html

From jnoller at gmail.com Thu Dec 29 02:37:56 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Wed, 28 Dec 2011 20:37:56 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
Message-ID: <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>

On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:

> Hello all,
>
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included:
>
> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Although it's a security issue, I'm posting it here because it is now public and seems important.
>
> The issue they report can cause (for example) handling an HTTP POST to consume horrible amounts of CPU. For Python, the figures they quoted:
>
> reasonable-sized attack strings only for 32 bits
> Plone has max. POST size of 1 MB
> 7 minutes of CPU usage for a 1 MB request
> ~20 kbits/s → keep one Core Duo core busy
>
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>
> Their recommended fix is to randomize the hash function.
>
> All the best,
>
> Michael
>

Backup link for the PDF:
http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

oCERT disclosure:
http://www.ocert.org/advisories/ocert-2011-003.html

jesse

From jnoller at gmail.com Thu Dec 29 02:48:00 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Wed, 28 Dec 2011 20:48:00 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
Message-ID:

On Wednesday, December 28, 2011 at 8:37 PM, Jesse Noller wrote:

>
>
> On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:
>
> > Hello all,
> >
> > A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included:
> >
> > http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
> >
> > Although it's a security issue, I'm posting it here because it is now public and seems important.
> >
> > The issue they report can cause (for example) handling an HTTP POST to consume horrible amounts of CPU. For Python, the figures they quoted:
> >
> > reasonable-sized attack strings only for 32 bits
> > Plone has max. POST size of 1 MB
> > 7 minutes of CPU usage for a 1 MB request
> > ~20 kbits/s → keep one Core Duo core busy
> >
> > This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
> >
> > The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
> >
> > Their recommended fix is to randomize the hash function.
> >
> > All the best,
> >
> > Michael
> >
>
> Backup link for the PDF:
> http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> oCERT disclosure:
> http://www.ocert.org/advisories/ocert-2011-003.html

And more analysis/information:
http://cryptanalysis.eu/blog/2011/12/28/effective-dos-attacks-against-web-application-plattforms-hashdos/

From ericsnowcurrently at gmail.com Thu Dec 29 02:49:08 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 28 Dec 2011 18:49:08 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
Message-ID:

On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord wrote:
> Hello all,
>
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included:
>
>     http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Although it's a security issue, I'm posting it here because it is now public and seems important.
>
> The issue they report can cause (for example) handling an HTTP POST to consume horrible amounts of CPU. For Python, the figures they quoted:
>
>     reasonable-sized attack strings only for 32 bits
>     Plone has max. POST size of 1 MB
>     7 minutes of CPU usage for a 1 MB request
>     ~20 kbits/s → keep one Core Duo core busy
>
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>
> Their recommended fix is to randomize the hash function.
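The attack described above works by sending many keys that share a hash value. The minimal sketch below is not code from the thread, and `CountingKey` and `equality_checks` are hypothetical names. It forces every key onto one hash and counts how often the dict has to call `__eq__`. Exact counts depend on CPython's probing details. Still, each new colliding key has to be compared with every key already stored, so n inserts cost at least n(n-1)/2 comparisons.

```python
class CountingKey:
    """Dict key that counts __eq__ calls; optionally forces a shared hash."""

    eq_calls = 0

    def __init__(self, value, colliding):
        self.value = value
        self.colliding = colliding

    def __hash__(self):
        # Colliding keys all report the same hash, like a crafted attack payload.
        return 0 if self.colliding else hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


def equality_checks(n, colliding):
    """Insert n distinct keys into a fresh dict and return the number of __eq__ calls."""
    CountingKey.eq_calls = 0
    table = {}
    for i in range(n):
        table[CountingKey(i, colliding)] = i
    return CountingKey.eq_calls


if __name__ == "__main__":
    for n in (250, 500, 1000):
        print(n, equality_checks(n, colliding=True), equality_checks(n, colliding=False))
```

With distinct hashes, CPython compares the stored hash first and skips `__eq__` entirely, so the second count stays near zero. A randomized hash aims to stop an attacker from precomputing inputs that land in the first case.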
Ironically, this morning I ran across a discussion from about 8 years ago on basically the same thing: http://mail.python.org/pipermail/python-dev/2003-May/035874.html From what I read in the thread, it didn't seem like anyone here was too worried about it. Does this new research change anything? -eric From alex.gaynor at gmail.com Thu Dec 29 02:51:21 2011 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Thu, 29 Dec 2011 01:51:21 +0000 (UTC) Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: A few thoughts on this: a) This is not a new issue; I'm curious what the new interest in it is. b) Whatever the solution to this is, it is *not* CPython-specific, and any decision should be reflected in the Python language spec IMO. If CPython has the semantic that dicts aren't vulnerable to hash collisions, then users *will* rely on this, and another implementation with a different (valid) behavior opens those users up to security issues. c) I'm not convinced a randomized hash is appropriate for the default dict, for a number of reasons: it's a performance hit on every dict operation, a per-process seed means you can't compute the hash of an object at Python's compile time, and a per-dict seed inhibits a bunch of other optimizations. These may not be relevant to CPython, but they are to PyPy and probably to the invokedynamic work on Jython (pursuant to point b). Therefore I think these should be considered application issues. Since request limiting is difficult and error-prone, I'd recommend that the Python stdlib include a non-hash-based map (such as a binary tree).
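Alex's closing suggestion can be made concrete with a minimal sketch of a non-hash-based map. The class below is hypothetical (it is not a stdlib API) and, instead of a binary tree, uses two parallel sorted lists searched with the stdlib `bisect` module, so lookups cost O(log n) comparisons no matter how the keys hash. Insertion still shifts list elements (O(n) per insert), which a balanced tree would avoid, and keys must be mutually comparable (e.g. all `str`).

```python
import bisect


class SortedListMap:
    """Hypothetical non-hash-based mapping backed by two parallel sorted lists.

    Lookups use binary search, so their cost never depends on hash values.
    Keys must be mutually comparable (e.g. all str).
    """

    def __init__(self, items=()):
        self._keys = []
        self._values = []
        for key, value in items:
            self[key] = value

    def _index(self, key):
        # Position where key is, or would be inserted to keep _keys sorted.
        return bisect.bisect_left(self._keys, key)

    def __setitem__(self, key, value):
        i = self._index(key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value       # overwrite an existing key
        else:
            self._keys.insert(i, key)     # O(n) shift keeps the list sorted
            self._values.insert(i, value)

    def __getitem__(self, key):
        i = self._index(key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def __contains__(self, key):
        i = self._index(key)
        return i < len(self._keys) and self._keys[i] == key

    def __len__(self):
        return len(self._keys)

    def items(self):
        # Items come back in sorted key order, independent of hashing.
        return list(zip(self._keys, self._values))


params = SortedListMap([("b", "2"), ("a", "1"), ("b", "3")])
print(params["b"], len(params), params.items())  # 3 2 [('a', '1'), ('b', '3')]
```

Because placement comes from key comparison rather than `hash()`, keys engineered to share a hash value gain an attacker nothing here.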
Alex From raymond.hettinger at gmail.com Thu Dec 29 03:09:21 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Wed, 28 Dec 2011 18:09:21 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> Message-ID: FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue. It is believed that they give us better-than-random results for commonly encountered datasets. A change to randomized hashes would have a negative performance impact on those cases. Also, randomizing the hash wreaks havoc on doctests, book examples not matching actual dict reprs, and on efforts by users to optimize the insertion order into dicts with frequent lookups. Raymond On Dec 28, 2011, at 5:28 PM, Michael Foord wrote: > Hello all, > > A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: > > http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf > > Although it's a security issue I'm posting it here because it is now public and seems important. > > The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted: > > reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB > 7 minutes of CPU usage for a 1 MB request > ~20 kbits/s → keep one Core Duo core busy > > This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). > > The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
> > Their recommended fix is to randomize the hash function. > > All the best, > > Michael > > > -- > http://www.voidspace.org.uk/ > > > May you do good and not evil > May you find forgiveness for yourself and forgive others > May you share freely, never taking more than you give. > -- the sqlite blessing > http://www.sqlite.org/different.html > > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/raymond.hettinger%40gmail.com From lists at cheimes.de Thu Dec 29 04:04:17 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 Dec 2011 04:04:17 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4EFBD8B1.4020207@cheimes.de> On 29.12.2011 02:37, Jesse Noller wrote: > Back up link for the PDF: > http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf > > Ocert disclosure: > http://www.ocert.org/advisories/ocert-2011-003.html From http://www.nruns.com/_downloads/advisory28122011.pdf --- Python uses a hash function which is very similar to DJBX33X, which can be broken using a meet-in-the-middle attack. It operates on register size and is thus different for 64 and 32 bit machines. While generating multi-collisions efficiently is also possible for the 64 bit version of the function, the resulting colliding strings are too large to be relevant for anything more than an academic attack. Plone as the most prominent Python web framework accepts 1 MB of POST data, which it parses in about 7 minutes of CPU time in the worst case. This gives an attacker with about 20 kbit/s the possibility to keep one Core Duo core constantly busy.
If the attacker is in the position to have a Gigabit line available, he can keep about 50,000 Core Duo cores busy. --- If I remember correctly, CPython uses a C long for its hash, so 64-bit Windows uses a 32-bit hash. From lists at cheimes.de Thu Dec 29 03:55:22 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 Dec 2011 03:55:22 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> Message-ID: <4EFBD69A.9030903@cheimes.de> On 29.12.2011 03:09, Raymond Hettinger wrote: > FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue. > It is believed that they give us better-than-random results for commonly > encountered datasets. A change to randomized hashes would have a > negative performance impact on those cases. > > Also, randomizing the hash wreaks havoc on doctests, book examples > not matching actual dict reprs, and on efforts by users to optimize > the insertion order into dicts with frequent lookups. My five cents on the topic: I totally concur with Raymond. He, Tim and all the others did a fantastic job with the dict implementation and optimization. We shouldn't overreact and mess with the current hashing and dict code just because a well-known and old attack vector pops up again. The dict code is far too crucial for Python's overall performance. However, the issue should be documented in our docs. I've been dealing with web stuff and security for almost a decade. I've seen far worse attack vectors. This one can easily be solved with a couple of lines of Python code. For example, application developers can limit the maximum number of POST parameters to a sensible amount and limit the length of each key, too. The issue is less severe on platforms with 64-bit hashes, so it won't affect most people. I think only 32-bit Unix and Windows in general (32-bit long) are in trouble. CPython could aid developers with a special subclass of dict.
The crucial lookup function is already overridable per dict instance and on subclasses of dict through PyDictObject's struct member PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example, a specialized subclass could limit the search for a free slot to n iterations, or choose to ignore the hash argument and calculate its own hash of the key. Christian From brian at python.org Thu Dec 29 04:41:22 2011 From: brian at python.org (Brian Curtin) Date: Wed, 28 Dec 2011 21:41:22 -0600 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Wed, Dec 28, 2011 at 19:51, Alex Gaynor wrote: > A few thoughts on this: > > a) This is not a new issue, I'm curious what the new interest is in it. Well they (the presenters of the report) had to be accepted to that conference for *something*, otherwise we wouldn't know they exist. From solipsis at pitrou.net Thu Dec 29 11:32:44 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Dec 2011 11:32:44 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4EFBD8B1.4020207@cheimes.de> Message-ID: <20111229113244.58cb739c@pitrou.net> On Thu, 29 Dec 2011 04:04:17 +0100 Christian Heimes wrote: > On 29.12.2011 02:37, Jesse Noller wrote: > > Back up link for the PDF: > > http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf > > > > Ocert disclosure: > > http://www.ocert.org/advisories/ocert-2011-003.html > > From http://www.nruns.com/_downloads/advisory28122011.pdf > > --- > Python uses a hash function which is very similar to DJBX33X, which can > be broken using a > meet-in-the-middle attack. It operates on register size and is thus > different for 64 and 32 bit > machines.
While generating multi-collisions efficiently is also possible > for the 64 bit version > of the function, the resulting colliding strings are too large to be > relevant for anything more > than an academic attack. > > Plone as the most prominent Python web framework accepts 1 MB of POST > data, which it > parses in about 7 minutes of CPU time in the worst case. > This gives an attacker with about 20 kbit/s the possibility to keep one > Core Duo core > constantly busy. If the attacker is in the position to have a Gigabit > line available, he can keep > about 50.000 Core Duo cores busy. > --- > > If I remember correctly CPython uses the long for its hash function so > 64bit Windows uses a 32bit hash. Not anymore: Py_hash_t is now the same size as Py_ssize_t. Regards Antoine. From solipsis at pitrou.net Thu Dec 29 12:10:00 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Dec 2011 12:10:00 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFBD69A.9030903@cheimes.de> Message-ID: <20111229121000.582e8f31@pitrou.net> On Thu, 29 Dec 2011 03:55:22 +0100 Christian Heimes wrote: > > I've been dealing with web stuff and security for almost a decade. I've > seen far worse attack vectors. This one can easily be solved with a > couple of lines of Python code. For example Application developers can > limit the maximum amount of POST parameters to a sensible amount and > limit the length of each key, too. Shouldn't the setting be implemented by frameworks? > CPython could aid developers with a special subclass of dict. The > crucial lookup function is already overwrite-able per dict instance and > on subclasses of dict through PyDictObj's struct member PyDictEntry > *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash).
For example > specialized subclass could limit the seach for a free slot to n > recursions or choose to ignore the hash argument and calculate its own > hash of the key. Or, rather, the specialized subclass could implement hash randomization. Regards Antoine. From mark at hotpy.org Thu Dec 29 12:13:26 2011 From: mark at hotpy.org (Mark Shannon) Date: Thu, 29 Dec 2011 11:13:26 +0000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> Message-ID: <4EFC4B56.90709@hotpy.org> Michael Foord wrote: > Hello all, > > A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: > > http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf > > Although it's a security issue I'm posting it here because it is now public and seems important. > > The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted: > > reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB > 7 minutes of CPU usage for a 1 MB request > ~20 kbits/s → keep one Core Duo core busy > > This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). > > The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them. > > Their recommended fix is to randomize the hash function. > The attack relies on being able to predict the hash value for a given string. Randomising the string hash function is quite straightforward. There is no need to change the dictionary code.
A possible (*untested*) patch is attached. I'll leave it for those more familiar with unicodeobject.c to do properly. Cheers, Mark -------------- next part -------------- A non-text attachment was scrubbed... Name: hash.patch Type: text/x-diff Size: 1367 bytes Desc: not available URL: From mark at hotpy.org Thu Dec 29 12:25:03 2011 From: mark at hotpy.org (Mark Shannon) Date: Thu, 29 Dec 2011 11:25:03 +0000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> Message-ID: <4EFC4E0F.2070201@hotpy.org> Raymond Hettinger wrote: > FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue. > It is believed that they give us better-than-random results for commonly > encountered datasets. A change to randomized hashes would have a > negative performance impact on those cases. Tim Peters's analysis applies mainly to ints, which would be unchanged. A change to the hash function for strings would make no difference to the performance of the dict, as the ordering of the hash values is already quite different from the ordering of the strings for any string of more than 3 characters. > > Also, randomizing the hash wreaks havoc on doctests, book examples > not matching actual dict reprs, and on efforts by users to optimize > the insertion order into dicts with frequent lookups. The docs clearly state that the ordering of iteration over dicts is arbitrary. Perhaps changing it once in a while might be a good thing :) Cheers, Mark.
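Mark's point that the attack depends on predicting hash values is easier to see next to the string hash CPython used at the time. The following is a pure-Python sketch modelled on the multiply-by-1000003 and XOR loop of the Python 2 `string_hash()` function, assuming a 64-bit C `long`. It illustrates the algorithm and is not CPython's actual code. Nothing secret enters the computation, so anyone can precompute hash values (and hunt for collisions) offline.

```python
MASK64 = (1 << 64) - 1


def _to_signed64(x):
    # Reinterpret an unsigned 64-bit value as a signed C long.
    return x - (1 << 64) if x >= (1 << 63) else x


def old_string_hash(data: bytes) -> int:
    """Sketch of the pre-randomization Python 2 string hash (64-bit build)."""
    if not data:
        return 0
    x = data[0] << 7
    for byte in data:
        # C long arithmetic wraps around, emulated here with a 64-bit mask.
        x = ((1000003 * x) & MASK64) ^ byte
    x ^= len(data)
    x = _to_signed64(x)
    # -1 signals an error in the C API, so a real hash of -1 is remapped.
    return -2 if x == -1 else x


print(old_string_hash(b"a"))  # 12416037344
```

The result depends only on the input bytes, which is exactly what makes the function fast and deterministic, and exactly what the attack exploits.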
> > > Raymond > > > > > > On Dec 28, 2011, at 5:28 PM, Michael Foord wrote: > >> Hello all, >> >> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: >> >> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf >> >> Although it's a security issue I'm posting it here because it is now public and seems important. >> >> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted: >> >> reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB >> 7 minutes of CPU usage for a 1 MB request >> ~20 kbits/s → keep one Core Duo core busy >> >> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). >> >> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them. >> >> Their recommended fix is to randomize the hash function. >> >> All the best, >> >> Michael >> >> >> -- >> http://www.voidspace.org.uk/ >> >> >> May you do good and not evil >> May you find forgiveness for yourself and forgive others >> May you share freely, never taking more than you give.
>> -- the sqlite blessing >> http://www.sqlite.org/different.html >> >> >> >> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/raymond.hettinger%40gmail.com > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/mark%40hotpy.org From solipsis at pitrou.net Thu Dec 29 12:42:18 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Dec 2011 12:42:18 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4E0F.2070201@hotpy.org> Message-ID: <20111229124218.6affbebd@pitrou.net> On Thu, 29 Dec 2011 11:25:03 +0000 Mark Shannon wrote: > > > > Also, randomizing the hash wreaks havoc on doctests, book examples > > not matching actual dict reprs, and on efforts by users to optimize > > the insertion order into dicts with frequent lookups. > > The docs clearly state that the ordering of iteration over dicts is > arbitrary. Perhaps changing it once in a while might be a good thing :) We already change it once in a while. http://twistedmatrix.com/trac/ticket/5352 ;) Regards Antoine. 
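Raymond's concern about doctests, together with Antoine's note that dict ordering already changes between releases, suggests writing doctests that never depend on dict ordering in the first place. A minimal sketch, using a hypothetical `count_words()` helper: the doctest compares a sorted list of items instead of the raw dict repr, so it passes regardless of how the keys hash.

```python
from collections import Counter


def count_words(text):
    """Count whitespace-separated words.

    The doctest sorts the items, so it passes however the dict happens
    to order its keys:

    >>> sorted(count_words("spam eggs spam").items())
    [('eggs', 1), ('spam', 2)]
    """
    return dict(Counter(text.split()))


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```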
From armin.ronacher at active-4.com Thu Dec 29 12:29:53 2011 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Thu, 29 Dec 2011 12:29:53 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> Message-ID: <4EFC4F31.3090703@active-4.com> Hi, Just some extra thoughts about the whole topic in light of web applications running on Python (since this was hinted at in the talk): Yes, you can limit the maximum number of allowed parameters for POST data, but there are so many places where data is parsed into hashing containers that it's quite a worthless task. Here is a very brief list of things usually parsed into a dict or set, and where it happens: - URL parameters and URL-encoded form data. Generally this happens somewhere in a framework, but typically also in utility libraries that deal with URLs. For instance the stdlib's cgi.parse_qs, or urllib.parse.parse_qs on Python 3, does just that, and that code is used left and right. Even if a framework started limiting its own URL parsing, there is still a lot of code that does not, and the stdlib doesn't either. With form data it's worse, because you have multipart headers that need parsing, and that is usually abstracted away so far from the user that they cannot limit it themselves. Many frameworks just use the cgi module's parsing functions, which also feed directly into a dictionary. - HTTP headers. There is nothing a WSGI framework can do about that, since the headers are parsed into a dictionary by the WSGI server. - Incoming JSON data. Again, this is mostly outside of what the framework can do. simplejson can be made to stop parsing via its hooks, but nobody does that, and since users invoke simplejson's parsing routines themselves, most webapps would still be vulnerable even if all frameworks fixed the problem. - Hidden dict parameters.
Things like the parameter part of the Content-Type or Content-Disposition headers are generally also just parsed into a dictionary. Likewise, many frameworks parse some headers into sets (for instance incoming ETags). The cookie header is usually parsed into a dictionary as well. The issue is nothing new, and at least my POV on this topic so far was that your server should be guarded and should kill handlers of requests that go rogue. Dictionaries are not the only thing with a worst-case performance that could be triggered by user input. That said, considering that there are so many different places where nearly arbitrarily long input is parsed into a dictionary or other hashing structure, it's hard for a web application developer or framework to protect against this. In case the watchdog is not as viable a solution as I had assumed, I think it's more reasonable to indeed consider adding a flag to Python that optionally enables hash randomization at startup. However, as was said earlier, the attack is so much more complex to carry out on a 64-bit environment that it's probably (as it stands right now!) safe to ignore. The main problem is not that it's a new attack, but that some dickheads could now make prebaked attacks against websites to disrupt them, which might cause some negative publicity. In general though, there are so many more ways to DDoS a website than this that I would rate the whole issue very low. Regards, Armin From mal at egenix.com Thu Dec 29 13:49:44 2011 From: mal at egenix.com (M.-A.
Lemburg) Date: Thu, 29 Dec 2011 13:49:44 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC4B56.90709@hotpy.org> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> Message-ID: <4EFC61E8.7090100@egenix.com> Mark Shannon wrote: > Michael Foord wrote: >> Hello all, >> >> A paper (well, presentation) has been published highlighting security problems with the hashing >> algorithm (exploiting collisions) in many programming languages Python included: >> >> >> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf >> >> >> Although it's a security issue I'm posting it here because it is now public and seems important. >> >> The issue they report can cause (for example) handling an http post to consume horrible amounts of >> cpu. For Python the figures they quoted: >> >> reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB >> 7 minutes of CPU usage for a 1 MB request >> ~20 kbits/s → keep one Core Duo core busy >> >> This was apparently reported to the security list, but hasn't been responded to beyond an >> acknowledgement on November 24th (the original report didn't make it onto the security list >> because it was held in a moderation queue). >> The same vulnerability was reported against various languages and web frameworks, and is already >> fixed in some of them. >> >> Their recommended fix is to randomize the hash function. >> > > The attack relies on being able to predict the hash value for a given string. Randomising the string > hash function is quite straightforward. > There is no need to change the dictionary code. > > A possible (*untested*) patch is attached. I'll leave it for those more familiar with > unicodeobject.c to do properly. The paper mentions that several web frameworks work around this by limiting the number of parameters per GET/POST/HEAD request.
This sounds like a better alternative than randomizing the hash function of strings. Uncontrollable randomization has issues when you work with multi-process setups, since the processes would each use different hash values for identical strings. Putting the base_hash value under application control could be done to solve this problem, making sure that all processes use the same random base value. BTW: Since your randomization trick uses the current time, it would also be rather easy to tune an attack to find the currently used base_hash. To make this safe, you'd have to use a more random source for initializing the base_hash. Note that the same hash collision attack can be used for other key types as well, e.g. integers (where it's very easy to find hash collisions), so this kind of randomization would have to be applied to other basic types too. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 29 2011) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From armin.ronacher at active-4.com Thu Dec 29 13:57:07 2011 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Thu, 29 Dec 2011 13:57:07 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC4F31.3090703@active-4.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> Message-ID: <4EFC63A3.5010008@active-4.com> Hi, Something I should add to this, now that I've thought about it a bit more: Assuming this should be fixed at the language level, the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash, for obvious reasons. - Option a: Compiled-in salt + Easy to implement - Most likely breaks unittests (those were broken in the first place, but that's still a very annoying change to make) - Might cause problems with interoperability between Pythons compiled with different hash salts - You're not really solving the problem, because each Linux distribution (besides Gentoo, I guess) would have just one salt compiled in, and that would be popular enough to have the same issue. - Option b: Environment variable for the salt + Easy-ish to implement + Easy to synchronize across different machines - Initialization of base types happens early and unpredictably, which makes it hard for embedded Python interpreters (think mod_wsgi and other things) to specify the salt - Option c: Random salt at runtime + Easy to implement - Impossible to synchronize - Breaks unittests in the same way a compiled-in salt would Where to add the salt? Unicode strings and bytestrings (bytes objects), I guess, since those are the most common offenders. Sometimes tuples are keys of dictionaries, but in that case a contributing factor to the hash is the string in the tuple anyway.
Also related: since this is a security-related issue, would this be something that goes into Python 2? Does that affect what a fix would look like? Regards, Armin From lists at cheimes.de Thu Dec 29 14:04:05 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 Dec 2011 14:04:05 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20111229121000.582e8f31@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFBD69A.9030903@cheimes.de> <20111229121000.582e8f31@pitrou.net> Message-ID: <4EFC6545.5070904@cheimes.de> On 29.12.2011 12:10, Antoine Pitrou wrote: >> I've been dealing with web stuff and security for almost a decade. I've >> seen far worse attack vectors. This one can easily be solved with a >> couple of lines of Python code. For example Application developers can >> limit the maximum amount of POST parameters to a sensible amount and >> limit the length of each key, too. > > Shouldn't the setting be implemented by frameworks? Web frameworks like Django or CherryPy can be considered applications from the CPython core's point of view. ;) You are right, though; "framework" is the better word. >> CPython could aid developers with a special subclass of dict. The >> crucial lookup function is already overwrite-able per dict instance and >> on subclasses of dict through PyDictObj's struct member PyDictEntry >> *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example >> specialized subclass could limit the seach for a free slot to n >> recursions or choose to ignore the hash argument and calculate its own >> hash of the key. > > Or, rather, the specialized subclass could implement hash randomization. Yeah! I was thinking the same when I wrote "calculate its own hash", but I was too sloppy to carry my argument through. Please take 3am as my excuse.
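Christian's suggestion of capping the number of POST parameters and the length of each key, whether it ends up in an application or a framework, might look like the following minimal sketch. The function name and both limits are hypothetical values chosen for illustration. The field count is checked on the raw string before any dict is populated, and only then is parsing delegated to the stdlib's `urllib.parse.parse_qsl`, which returns a list of pairs rather than a dict.

```python
from urllib.parse import parse_qsl

MAX_FIELDS = 1000      # hypothetical cap on the number of parameters
MAX_KEY_LENGTH = 256   # hypothetical cap on each parameter name


def parse_limited(query: str) -> dict:
    """Parse a URL-encoded body into a dict, refusing oversized input."""
    # Older Python versions also split on ';', so count both separators
    # conservatively before anything is hashed.
    if query.count("&") + query.count(";") + 1 > MAX_FIELDS:
        raise ValueError("too many form fields")
    result = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        if len(key) > MAX_KEY_LENGTH:
            raise ValueError("form field name too long")
        result[key] = value
    return result


print(parse_limited("a=1&b=2"))  # {'a': '1', 'b': '2'}
```

As Armin points out, a cap like this only covers the code path it wraps; headers, cookies and JSON bodies parsed elsewhere remain exposed.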
From hs at ox.cx Thu Dec 29 14:11:49 2011 From: hs at ox.cx (Hynek Schlawack) Date: Thu, 29 Dec 2011 14:11:49 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC63A3.5010008@active-4.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> Message-ID: Hi, how about Option d: Host-based salt + Easy-ish to implement: how about basing it on the hostname, for example? + Transparent for all processes on the same host - Probably unit test breakage In fact, we could use a host-based salt as the default, with the option to specify your own, which would solve the sync problems. That said, I agree with Armin that fixing this in the frameworks isn't an option. Regards, Hynek On Thursday, 29 December 2011 at 13:57, Armin Ronacher wrote: > Hi, > > Something I should add to this now that I thought about it a bit more: > > Assuming this should be fixed on a language level the solution would > probably be to salt hashes. The most common hash to salt here is the > PyUnicode hash for obvious reasons. > > - Option a: Compiled in Salt > + Easy to implement > - Breaks unittests most likely (those were broken in the first place > but that's still a very annoying change to make) > - Might cause problems with interoperability of Pythons compiled with > different hash salts > - You're not really solving the problem because each linux > distribution (besides Gentoo I guess) would have just one salt > compiled in and that would be popular enough to have the same > issue.
> > - Option b: Environment variable for the salt > + Easy-ish to implement > + Easy to synchronize over different machines > - initialization for base types happens early and unpredictive which > makes it hard for embedded Python interpreters (think mod_wsgi and > other things) to specify the salt > > - Option c: Random salt at runtime > + Easy to implement > - impossible to synchronize > - breaks unittests in the same way as a compiled in salt would do > > Where to add the salt to? Unicode strings and bytestrings (byte > objects) I guess since those are the most common offenders. Sometimes > tuples are keys of dictionaries but in that case a contributing factor > to the hash is the string in the tuple anyways. > > Also related: since this is a security related issue, would this be > something that goes into Python 2? Does that affect how a fix would > look like? > > > Regards, > Armin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org (mailto:Python-Dev at python.org) > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/hs%40ox.cx From lists at cheimes.de Thu Dec 29 14:19:28 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 Dec 2011 14:19:28 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC4B56.90709@hotpy.org> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> Message-ID: <4EFC68E0.4000606@cheimes.de> On 29.12.2011 12:13, Mark Shannon wrote: > The attack relies on being able to predict the hash value for a given > string. Randomising the string hash function is quite straightforward. > There is no need to change the dictionary code. > > A possible (*untested*) patch is attached. I'll leave it for those more > familiar with unicodeobject.c to do properly.
I'm worried that hash randomization of str is going to break third-party software that relies on a stable hash across multiple Python instances. Persistence layers like ZODB and cross-interpreter communication channels used by multiprocessing may (!) rely on the fact that the hash of a string is fixed. Perhaps the dict code is a better place for randomization. The code in lookdict() and lookdict_unicode() could add a value to the hash. My approach is less intrusive and also closes the attack vector for all possible objects including str, bytes, int and so on. I also like Armin's idea of an optional hash randomization. Christian From solipsis at pitrou.net Thu Dec 29 14:21:19 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Dec 2011 14:21:19 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> Message-ID: <20111229142119.02e1ab50@pitrou.net> On Thu, 29 Dec 2011 13:57:07 +0100 Armin Ronacher wrote: > > - Option c: Random salt at runtime > + Easy to implement > - impossible to synchronize > - breaks unittests in the same way as a compiled-in salt would do This option would have my preference. I don't think hash() was ever meant to be "synchronizable". Simply using a 32-bit Python already gives you different results from a 64-bit Python. As for breaking unittests, these tests were broken in the first place. hash() does change from time to time. > Where to add the salt to? Unicode strings and bytestrings (bytes > objects), I guess, since those are the most common offenders. Sometimes > tuples are keys of dictionaries but in that case a contributing factor > to the hash is the string in the tuple anyways. Or it could be a process-wide constant for all dicts. If the constant is additive as proposed by Mark, the impact should be negligible (but the randomness must be good enough). Regards Antoine.
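A minimal sketch of where the salt could come from under the options discussed so far: option b (an environment variable), option c (a random value per process) and Hynek's host-based variant. It assumes a toy multiply/xor string hash, loosely modeled on CPython 2's but not identical to it, whose running state starts from the salt instead of having the salt added at the end. The variable name TOY_HASH_SALT and all function names are hypothetical; none of this is CPython code.

```python
import hashlib
import os
import socket

MASK64 = (1 << 64) - 1
MULTIPLIER = 1000003  # odd multiplier, borrowed from CPython 2's string hash


def host_salt(names=None):
    """Host-based salt (sketch): identical for every process on one host."""
    if names is None:
        names = [socket.getfqdn()]
    # Sort the names so every process derives the same salt, whatever the order.
    digest = hashlib.sha256("\n".join(sorted(names)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def load_salt(env_var="TOY_HASH_SALT"):
    """Option b if the (hypothetical) variable is set, otherwise option c."""
    value = os.environ.get(env_var)
    if value is not None:
        return int(value) & MASK64  # reproducible across processes and machines
    return int.from_bytes(os.urandom(8), "little")  # fresh salt per process


def seeded_hash(s, salt):
    """Toy multiply/xor string hash whose running state starts at the salt."""
    x = salt & MASK64
    for ch in s:
        x = ((x * MULTIPLIER) ^ ord(ch)) & MASK64
    return x ^ len(s)
```

Because multiplying by an odd constant and xoring in a character are both invertible modulo 2**64, two different salts always give two different hashes for the same string, so the salt influences every step of the calculation. The names are sorted so that processes which enumerate them in a different order still agree on the salt. The toy hash makes no claim to resist a determined attacker.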
From fdrake at acm.org Thu Dec 29 14:30:55 2011 From: fdrake at acm.org (Fred Drake) Date: Thu, 29 Dec 2011 08:30:55 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC68E0.4000606@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> Message-ID: On Thu, Dec 29, 2011 at 8:19 AM, Christian Heimes wrote: > Persistence layers like ZODB and cross-interpreter communication > channels used by multiprocessing may (!) rely on the fact that the hash > of a string is fixed. ZODB does not rely on a fixed hash function for strings; any application relying on a stable hash would run into problems when updating Python versions. -Fred -- Fred L. Drake, Jr. "A person who won't read has no advantage over one who can't read." --Samuel Langhorne Clemens From lists at cheimes.de Thu Dec 29 14:32:21 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 Dec 2011 14:32:21 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC63A3.5010008@active-4.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> Message-ID: <4EFC6BE5.6000400@cheimes.de> On 29.12.2011 13:57, Armin Ronacher wrote: > Hi, > > Something I should add to this now that I thought about it a bit more: > > Assuming this should be fixed on a language level, the solution would > probably be to salt hashes. The most common hash to salt here is the > PyUnicode hash for obvious reasons.
> > - Option a: Compiled in Salt > + Easy to implement > - Breaks unittests most likely (those were broken in the first place > but that's still a very annoying change to make) > - Might cause problems with interoperability of Pythons compiled with > different hash salts > - You're not really solving the problem because each Linux > distribution (besides Gentoo I guess) would have just one salt > compiled in and that would be popular enough to have the same > issue. > > - Option b: Environment variable for the salt > + Easy-ish to implement > + Easy to synchronize over different machines > - initialization for base types happens early and unpredictably, which > makes it hard for embedded Python interpreters (think mod_wsgi and > other things) to specify the salt > > - Option c: Random salt at runtime > + Easy to implement > - impossible to synchronize > - breaks unittests in the same way as a compiled-in salt would do - Option d: Don't change __hash__ but only use a randomized hash for PyDictEntry lookup + Easy to implement - breaks only software that relies on a fixed order of dict keys - breaks only a few to no unit tests IMHO we don't have to alter the outcome of hash("some string"), hash(1) and all other related types. We just need to reduce the chance that an attacker can produce collisions in the dict (and set?) code that looks up the slot (PyDictEntry). How about adding the random value in Objects/dictobject.c:lookdict() and lookdict_string() (Python 2.x) / lookdict_unicode() (Python 3.x)? With this approach the hash of all our objects stays the same and just the dict code needs to be altered. The approach also has the benefit that all possible objects gain a randomized hash. > Also related: since this is a security-related issue, would this be > something that goes into Python 2? Does that affect what a fix would > look like? IMHO it does affect the fix. A changed and randomized hash function may break software that relies on a stable hash() function.
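A rough sketch of option d, assuming one process-wide random salt that is added only where a dict turns a hash into a table index, so hash() itself is untouched. SALT, first_slot() and same_first_slot() are hypothetical names used for illustration, not CPython code.

```python
import os

# Hypothetical process-wide salt, drawn once when the module is imported.
SALT = int.from_bytes(os.urandom(8), "little")


def first_slot(key_hash, mask, salt=SALT):
    """Option d (sketch): hash() is untouched; the salt only shifts the slot index."""
    return (key_hash + salt) & mask


def same_first_slot(h1, h2, mask, salt=SALT):
    """True when two hashes start probing at the same table slot."""
    return first_slot(h1, mask, salt) == first_slot(h2, mask, salt)
```

Because the salt is added before masking, two hashes that already agree in their low bits (3 and 8003 with mask 7, for example) start in the same slot for every salt value: an additive salt relabels the slots but does not separate the first slot of keys whose masked hashes already collide.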
Christian From lists at cheimes.de Thu Dec 29 14:34:59 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 Dec 2011 14:34:59 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20111229113244.58cb739c@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4EFBD8B1.4020207@cheimes.de> <20111229113244.58cb739c@pitrou.net> Message-ID: <4EFC6C83.40207@cheimes.de> On 29.12.2011 11:32, Antoine Pitrou wrote: >> If I remember correctly CPython uses a long for its hash function, so >> 64-bit Windows uses a 32-bit hash. > > Not anymore, Py_hash_t is currently aligned with Py_ssize_t. Thanks for the update. Python 2.x still uses long, and several large frameworks like Zope/Plone require 2.x. Christian From solipsis at pitrou.net Thu Dec 29 15:14:33 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Dec 2011 15:14:33 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> <4EFC6BE5.6000400@cheimes.de> Message-ID: <20111229151433.17a14ee0@pitrou.net> On Thu, 29 Dec 2011 14:32:21 +0100 Christian Heimes wrote: > On 29.12.2011 13:57, Armin Ronacher wrote: > > Hi, > > > > Something I should add to this now that I thought about it a bit more: > > > > Assuming this should be fixed on a language level, the solution would > > probably be to salt hashes. The most common hash to salt here is the > > PyUnicode hash for obvious reasons.
> > > > - Option a: Compiled in Salt > > + Easy to implement > > - Breaks unittests most likely (those were broken in the first place > > but that's still a very annoying change to make) > > - Might cause problems with interoperability of Pythons compiled with > > different hash salts > > - You're not really solving the problem because each Linux > > distribution (besides Gentoo I guess) would have just one salt > > compiled in and that would be popular enough to have the same > > issue. > > > > - Option b: Environment variable for the salt > > + Easy-ish to implement > > + Easy to synchronize over different machines > > - initialization for base types happens early and unpredictably, which > > makes it hard for embedded Python interpreters (think mod_wsgi and > > other things) to specify the salt > > > > - Option c: Random salt at runtime > > + Easy to implement > > - impossible to synchronize > > - breaks unittests in the same way as a compiled-in salt would do > > - Option d: Don't change __hash__ but only use a randomized hash for > PyDictEntry lookup > + Easy to implement > - breaks only software that relies on a fixed order of dict keys > - breaks only a few to no unit tests > > IMHO we don't have to alter the outcome of hash("some string"), hash(1) > and all other related types. We just need to reduce the chance that an > attacker can produce collisions in the dict (and set?) code that looks > up the slot (PyDictEntry). How about adding the random value in > Objects/dictobject.c:lookdict() and lookdict_string() (Python 2.x) / > lookdict_unicode() (Python 3.x)? With this approach the hash of all our > objects stays the same and just the dict code needs to be altered. The > approach also has the benefit that all possible objects gain a > randomized hash. I basically agree with your proposal. The only downside is that custom hash-based containers (such as _pickle.c's memotable) don't automatically benefit.
That said, I think it would be difficult to craft an attack against the aforementioned memotable (you would basically have to choose the addresses of pickled objects). Regards Antoine. From debatem1 at gmail.com Thu Dec 29 16:41:49 2011 From: debatem1 at gmail.com (geremy condra) Date: Thu, 29 Dec 2011 10:41:49 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> Message-ID: On Wed, Dec 28, 2011 at 8:49 PM, Eric Snow wrote: > On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord > wrote: >> Hello all, >> >> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included: >> >> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf >> >> Although it's a security issue I'm posting it here because it is now public and seems important. >> >> The issue they report can cause (for example) handling an HTTP POST to consume horrible amounts of CPU. For Python the figures they quoted: >> >> reasonable-sized attack strings only for 32 bits; Plone has max. POST size of 1 MB >> 7 minutes of CPU usage for a 1 MB request >> ~20 kbit/s keeps one Core Duo core busy >> >> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). >> >> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them. >> >> Their recommended fix is to randomize the hash function.
> > Ironically, this morning I ran across a discussion from about 8 years > ago on basically the same thing: > > http://mail.python.org/pipermail/python-dev/2003-May/035874.html > > From what I read in the thread, it didn't seem like anyone here was > too worried about it. Does this new research change anything? Not really. It's actually somewhat behind previous work in that it doesn't exploit the timing deltas, just generates very large ones. Geremy Condra From ned at nedbatchelder.com Thu Dec 29 17:25:37 2011 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 29 Dec 2011 11:25:37 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> Message-ID: <4EFC9481.6010309@nedbatchelder.com> On 12/28/2011 9:09 PM, Raymond Hettinger wrote: > Also, randomizing the hash wreaks havoc on doctests, book examples > not matching actual dict reprs, and on efforts by users to optimize > the insertion order into dicts with frequent lookups. I don't have a strong opinion about what to do about this vulnerability, but I know that none of these three reasons is a good reason not to change anything. Dictionary key order has never been guaranteed, and changes from time to time. Any code relying on it is broken to begin with. This is one of the reasons not to use doctests in the first place: comparing dicts textually has always been silly. --Ned. From timothy.c.delaney at gmail.com Thu Dec 29 20:59:47 2011 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Fri, 30 Dec 2011 06:59:47 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> Message-ID: +1 to option d (host-based salt), but we would need to consistently order the hostnames/addresses to guarantee that all processes on the same machine get the same salt by default.
+1 to option c (environment variable) as an override. And/or maybe an override on the command line. +1 to implementing the salt in the dictionary hash as an additive value. +0 to exposing the salt as a constant (3.3+ only) - or alternatively expose a hash function that just takes an existing hash and returns the salted hash. That would make it very easy for anything that wanted a salted hash to get one. For choosing the default salt, I think something like: a. If IPv6 is enabled, take the link-local address of the interface with the default route. Pretty much guaranteed not to change, can't be determined externally (salting doesn't need a secret, but it doesn't hurt), large number so probably a good salt. (If it is likely to change, a salt override should be used instead.) Don't use any other IPv6 address. In particular, never use a "temporary" IPv6 address like Windows assigns - multiprocessing could end up with instances with different salts. b. Take the FQDN of the machine. Tim Delaney From timothy.c.delaney at gmail.com Thu Dec 29 21:00:33 2011 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Fri, 30 Dec 2011 07:00:33 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> Message-ID: > > +1 to option c (environment variable) as an override. And/or maybe an > override on the command line. > That obviously should have said option b (environment variable) ... Tim Delaney
From pje at telecommunity.com Thu Dec 29 21:07:59 2011 From: pje at telecommunity.com (PJ Eby) Date: Thu, 29 Dec 2011 15:07:59 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC6BE5.6000400@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> <4EFC6BE5.6000400@cheimes.de> Message-ID: On Thu, Dec 29, 2011 at 8:32 AM, Christian Heimes wrote: > IMHO we don't have to alter the outcome of hash("some string"), hash(1) > and all other related types. We just need to reduce the chance that an > attacker can produce collisions in the dict (and set?) code that looks > up the slot (PyDictEntry). How about adding the random value in > Objects/dictobject.c:lookdict() and lookdict_string() (Python 2.x) / > lookdict_unicode() (Python 3.x)? With this approach the hash of all our > objects stays the same and just the dict code needs to be altered. I don't understand how that helps against a collision attack. If you can still generate two strings with the same (pre-randomized) hash, what difference does it make that the dict adds a random number? The post-randomized number will still be the same, no? Or does this attack just rely on the hash *remainders* being the same? If so, I can see how hashing the hash would help. But since the attacker doesn't know the modulus, and it can change as the dictionary grows, I would expect the attack to require matching hashes, not just matching hash remainders... unless I'm just completely off base here.
From paul at mcmillan.ws Thu Dec 29 22:28:23 2011 From: paul at mcmillan.ws (Paul McMillan) Date: Thu, 29 Dec 2011 13:28:23 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> Message-ID: It's worth pointing out that if the salt is somehow exposed to an attacker, or is guessable, much of the benefit goes away. It's likely that a timing attack could be used to discover the salt if it is fixed per machine or process over a long period of time. If a salt is generally fixed per machine, but varies from machine to machine, I think we'll see an influx of frustrated devs who have something that works perfectly on their machine but not for others. It doesn't matter that they're doing it wrong; we'll still have to deal with them as a community. This seems like an argument in favor of randomizing it at runtime by default, so it fails early for them. Allowing an environment variable and command-line override makes sense, as it allows users to rotate the salt as frequently as their needs dictate. -Paul From lists at cheimes.de Thu Dec 29 22:31:05 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 Dec 2011 22:31:05 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> <4EFC6BE5.6000400@cheimes.de> Message-ID: <4EFCDC19.2070703@cheimes.de> On 29.12.2011 21:07, PJ Eby wrote: > I don't understand how that helps against a collision attack. If you can still > generate two strings with the same (pre-randomized) hash, what > difference does it make that the dict adds a random number? The > post-randomized number will still be the same, no? > > Or does this attack just rely on the hash *remainders* being the same?
> If so, I can see how hashing the hash would help. But since the > attacker doesn't know the modulus, and it can change as the dictionary > grows, I would expect the attack to require matching hashes, not just > matching hash remainders... unless I'm just completely off base here. The attack doesn't need perfect collisions. The attacker calculates strings so that their hashes result in as many collisions as possible in the dict code. An attacker succeeds when the initial slot for a hash is already filled, and so are as many as possible of the subsequent slots visited by the perturbed, masked hash. Also, an attacker can easily predict the size, and therefore the mask, for the hash remainder: a POST request parser usually starts with an empty dict, and the growth rate of Python's dicts is well documented. The changing mask makes the attack just a tiny bit more challenging. The hash randomization idea adds a salt to throw the attacker off course. Instead of

    position = hash & mask

it's now

    hash = salt + hash
    position = hash & mask

where salt is a random, process-global value that is fixed for the lifetime of the program. The salt also affects the perturbation during the search for new slots. As you already stated, this salt won't be effective against full hash collisions. The attack needs A LOT of problematic strings to become an issue, possibly hundreds of thousands or even millions of keys in a very large POST request. In reality an attacker won't reach the full theoretical O(n^2) performance degradation for a hash table. But even a partial degradation, with lookups costing noticeably more than O(1), has some impact on your servers' CPUs when each request carries a million keys. Some vendors have limited POST requests to 1 MB or the number of keys to 10,000 to work around the issue. One paper also states that attacks on 64-bit Python are merely academic for now.
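The lookup described above can be sketched in Python, modeled on the probe recurrence in CPython 2.7's lookdict() (i = 5*i + perturb + 1, with perturb shifted right by 5 bits after each step). The salt parameter follows the hash = salt + hash proposal and is an illustrative assumption, not a patch; probe_sequence() is a hypothetical helper.

```python
PERTURB_SHIFT = 5
WORD_MASK = (1 << 64) - 1  # model a 64-bit unsigned size_t


def probe_sequence(key_hash, mask, count, salt=0):
    """First `count` slots a lookup would visit for key_hash.

    Modeled on the recurrence in CPython 2.7's lookdict(); the optional
    salt is added to the hash before masking and perturbation.
    """
    h = (key_hash + salt) & WORD_MASK  # salted hash viewed as an unsigned word
    i = h & mask
    slots = [i]
    perturb = h
    while len(slots) < count:
        i = ((i << 2) + i + perturb + 1) & WORD_MASK
        slots.append(i & mask)
        perturb >>= PERTURB_SHIFT
    return slots
```

In CPython, hash(-1) == hash(-2) == -2, so those two keys are a ready-made full collision: their probe sequences are identical for every salt, which is why an additive salt cannot help against full collisions. Keys that share only their low bits, such as 5 and 5 + 2**20, start in the same slot and follow the same path for the first five probes, then separate once the perturbation reaches their differing high bits.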
Christian From tjreedy at udel.edu Thu Dec 29 23:19:58 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 29 Dec 2011 17:19:58 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC4F31.3090703@active-4.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> Message-ID: The talk was a presentation yesterday by Alexander Klink and Julian Wälde at the Chaos Communication Congress in Germany (hashDoS at alech.de). I read the non-technical summary at http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars and watched the video of the talk at https://www.youtube.com/watch?feature=player_embedded&v=_EEhviEO1Vo# My summary: hash table creation with N keys changes from amortized O(N) to O(N**2) time if the hash values of all the keys are the same. This should only happen for large N if done intentionally. This is easy to accomplish with a linear multiply-and-add hash function, such as the one used in PHP4 (but nowhere else that the authors found). A nonlinear multiply-and-xor hash function, used in one form or another by everything else, is much harder to break. It is *theoretically* vulnerable to brute-force search, and this has been known for years. With a more clever meet-in-the-middle strategy that builds a dict of suffixes and then searches for matching prefixes, 32-bit hashes are *practically* vulnerable. The attack depends on, for instance, 2**16 (64K) being 1/64K of 2**32. (I did not hear when this strategy was developed, but it is certainly more practical on a desktop now than even 8 years ago.) [64-bit hashes are much, much less vulnerable to attack, at least for now. So it seems to me that anyone who hashes potential attack data can avoid the problem by using 64-bit Python with 64-bit hash values. If I understood Antoine, that should be all 64-bit builds.]
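Terry's O(N**2) summary can be checked without timing anything, by counting how often a dict has to call __eq__ while it is built. The Key class below is a hypothetical test helper whose hash is chosen by the caller; the counts assume CPython, which only calls __eq__ when a stored hash equals the hash being looked up.

```python
class Key:
    """Hypothetical test key whose hash value is chosen by the caller."""

    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, value, hash_value):
        self.value = value
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def count_eq_calls(n, colliding):
    """Insert n distinct keys and report how many key comparisons the dict made."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        # Colliding keys all share hash 42; otherwise each key hashes to i.
        table[Key(i, 42 if colliding else i)] = i
    return Key.eq_calls
```

With every hash equal, inserting a new key means comparing it with each of the keys already stored, so the total is at least n*(n-1)/2 and grows roughly fourfold each time n doubles; with distinct hashes the dict never needs to call __eq__ at all.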
More summary: Years ago, Perl added a #define option to start the hash calculation with a non-zero value instead of 0, to "avoid algorithmic complexity attacks". The patch is at 47:20 in the video. The authors believe all should do similarly. [The change amounts to adding a char, unknown to attackers, to the beginning of every string before hashing. So it adds a small bit of time. The code patch shown did not show the source of the non-zero seed or the timing and scope of any randomization. As the discussion here has shown, this is an important issue for applications. So 'do the same' is inadequate and over-simplified advice. I believe Armin's patch is similar to the Perl patch.] Since the authors sent out a CERT alert around Nov 1, PHP has added to PHP5 a new function to limit the number of vars hashed. Microsoft will do something similar now with hash randomization to follow (maybe?). JRuby is going to do something. Java does not think it needs to change Java itself, but will leave all to the frameworks. [The discussion here suggests that this is an inadequate response for 32-bit systems like Java, since one person/group may not control all the pieces of a server system. However, a person or group can run all pieces on a system Python with an option turned on.] -- Terry Jan Reedy From tjreedy at udel.edu Thu Dec 29 23:28:22 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 29 Dec 2011 17:28:22 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFCDC19.2070703@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> <4EFC6BE5.6000400@cheimes.de> <4EFCDC19.2070703@cheimes.de> Message-ID: On 12/29/2011 4:31 PM, Christian Heimes wrote: > The hash randomization idea adds a salt to throw the attacker off course.
> Instead of > > position = hash & mask > > it's now > > hash = salt + hash As I understood the talk (actually, the bit of Perl interpreter C code shown), the randomization is to change hash(s) to hash(salt+s) so that the salt is completely mixed into the hash from the beginning, rather than just tacked on at the end. -- Terry Jan Reedy From lists at cheimes.de Thu Dec 29 23:50:16 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 Dec 2011 23:50:16 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> <4EFC6BE5.6000400@cheimes.de> <4EFCDC19.2070703@cheimes.de> Message-ID: <4EFCEEA8.8010206@cheimes.de> On 29.12.2011 23:28, Terry Reedy wrote: > As I understood the talk (actually, the bit of Perl interpreter C code > shown), the randomization is to change hash(s) to hash(salt+s) so that > the salt is completely mixed into the hash from the beginning, rather > than just tacked on at the end. Yes, the Perl and Ruby code uses a random seed as the IV (initial value) for hash generation. It's the best way to create randomized hashes, but it might not be a feasible fix for Python 2.x. I'm worried that it might break applications that rely on stable hash values. From timothy.c.delaney at gmail.com Fri Dec 30 01:55:45 2011 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Fri, 30 Dec 2011 11:55:45 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com> Message-ID: On 30 December 2011 06:59, Tim Delaney wrote: > +0 to exposing the salt as a constant (3.3+ only) - or alternatively > expose a hash function that just takes an existing hash and returns the > salted hash. That would make it very easy for anything that wanted a salted > hash to get one.
> Sorry - brain fart on my part there - the salt needs to be included right from the start. Tim Delaney From julien at tayon.net Fri Dec 30 15:26:29 2011 From: julien at tayon.net (julien tayon) Date: Fri, 30 Dec 2011 15:26:29 +0100 Subject: [Python-Dev] hello, new dict addition for new eve ? Message-ID: Hello, Sorry to annoy the very busy core devs :) out of the blue. I noticed that people were 1) wanting to have a new dict for Xmas 2) strongly resenting dict addition. Even though I am not a good developer, I have come to a definition of addition that would follow algebraic rules, and not something of a Dutch logic. (It is a jest, not a troll.) I propose the following code to validate my point of view regarding dictionary addition, as a proof of concept: https://github.com/jul/ADictAdd_iction/blob/master/test.py It follows all my dusty math books regarding addition, and it has the nice property of obeying conservation rules. I see a real advantage in this behaviour for functional programming (map/reduce; see demonstrate.py), and it makes sense if dicts can be seen as vectors. I have been told I am a troll, but I am quite serious. Since I coded with luck, no internet, intuition, and a complete ignorance of the real meaning of the magic methods most of the time, the actual implementation of the addition surely needs a complete refactoring. Cheers, and happy holidays, Julien From blendmaster1024 at gmail.com Fri Dec 30 15:49:11 2011 From: blendmaster1024 at gmail.com (lahwran) Date: Fri, 30 Dec 2011 07:49:11 -0700 Subject: [Python-Dev] Your email to the mailing list In-Reply-To: References: Message-ID: ...oops, I did not intend to send this to the mailing list. I apologize for the accidental off-topic post. On Fri, Dec 30, 2011 at 7:40 AM, lahwran wrote: > I don't want to post to the mailing list about this; but I must say, I > found your email very entertaining.
You have a good sense of humor. From blendmaster1024 at gmail.com Fri Dec 30 15:40:38 2011 From: blendmaster1024 at gmail.com (lahwran) Date: Fri, 30 Dec 2011 07:40:38 -0700 Subject: [Python-Dev] Your email to the mailing list Message-ID: I don't want to post to the mailing list about this; but I must say, I found your email very entertaining. You have a good sense of humor. From guido at python.org Fri Dec 30 17:40:06 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 30 Dec 2011 09:40:06 -0700 Subject: [Python-Dev] hello, new dict addition for new eve ? In-Reply-To: References: Message-ID: Hi Julien, Don't despair! I have tried to get people to warm up to dict addition too -- in fact it was my counter-proposal at the time when we were considering adding sets to the language. I will look at your proposal, but I have a point of order first: this should be discussed on python-ideas, not on python-dev. I have added python-ideas to the thread and moved python-dev to Bcc, so followups will hopefully all go to python-ideas. --Guido On Fri, Dec 30, 2011 at 7:26 AM, julien tayon wrote: > Hello, > Sorry to annoy the very busy core devs :) out of the blue. > > I noticed that people were > 1) wanting to have a new dict for Xmas > 2) strongly resenting dict addition. > > Even though I am not a good developer, I have come to a definition of > addition that would follow algebraic rules, and not something of a > Dutch logic. (It is a jest, not a troll.) > > I propose the following code to validate my point of view regarding > dictionary addition, as a proof of concept: > https://github.com/jul/ADictAdd_iction/blob/master/test.py > > It follows all my dusty math books regarding addition, and it has the > nice property of obeying conservation rules. > > I see a real advantage in this behaviour for functional > programming (map/reduce; see demonstrate.py), and it makes sense > if dicts can be seen as vectors. >
> > I have been told I am a troll, but I am quite serious. > > Since I coded with luck, no internet, intuition, and a complete > ignorance of the real meaning of the magic methods most of the time, > the actual implementation of the addition surely needs a complete > refactoring. > > Cheers, > and happy holidays, > Julien > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From status at bugs.python.org Fri Dec 30 18:07:34 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 30 Dec 2011 18:07:34 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20111230170734.025381CCBF@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-12-23 - 2011-12-30) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3178 (+10) closed 22288 (+16) total 25466 (+26) Open issues with patches: 1365 Issues opened (21) ================== #12760: Add create mode to open() http://bugs.python.org/issue12760 reopened by pitrou #13294: http.server: minor code style changes. http://bugs.python.org/issue13294 reopened by ezio.melotti #13659: Add a help() viewer for IDLE's Shell. http://bugs.python.org/issue13659 opened by ramchandra.apte #13663: pootle.python.org is outdated.
http://bugs.python.org/issue13663 opened by naoki #13664: UnicodeEncodeError in gzip when filename contains non-ascii http://bugs.python.org/issue13664 opened by jason.coombs #13665: TypeError: string or integer address expected instead of str i http://bugs.python.org/issue13665 opened by jason.coombs #13666: datetime documentation typos http://bugs.python.org/issue13666 opened by steveire #13668: mute ImportError in __del__ of _threading_local module http://bugs.python.org/issue13668 opened by Zhiping.Deng #13669: XATTR_SIZE_MAX and XATTR_LIST_MAX undefined on kfreebsd/debian http://bugs.python.org/issue13669 opened by zbysz #13670: Increase test coverage for pstats.py http://bugs.python.org/issue13670 opened by andrea.crotti #13672: Add co_qualname attribute in code objects http://bugs.python.org/issue13672 opened by Arfrever #13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi http://bugs.python.org/issue13673 opened by sbt #13674: crash in datetime.strftime http://bugs.python.org/issue13674 opened by patrick.vrijlandt #13676: sqlite3: Zero byte truncates string contents http://bugs.python.org/issue13676 opened by petri.lehtinen #13677: correct docstring for builtin compile http://bugs.python.org/issue13677 opened by Jim.Jewett #13679: Multiprocessing system crash http://bugs.python.org/issue13679 opened by Rock.Achu #13680: Aifc comptype write fix http://bugs.python.org/issue13680 opened by Oleg.Plakhotnyuk #13681: Aifc read compressed frames fix http://bugs.python.org/issue13681 opened by Oleg.Plakhotnyuk #13682: Documentation of os.fdopen() refers to non-existing bufsize ar http://bugs.python.org/issue13682 opened by petri.lehtinen #13683: Docs in Python 3:raise statement mistake http://bugs.python.org/issue13683 opened by ramchandra.apte #13684: httplib tunnel infinite loop http://bugs.python.org/issue13684 opened by luzakiru Most recent 15 issues with no replies (15) ========================================== #13684: httplib tunnel 
infinite loop http://bugs.python.org/issue13684 #13683: Docs in Python 3:raise statement mistake http://bugs.python.org/issue13683 #13682: Documentation of os.fdopen() refers to non-existing bufsize ar http://bugs.python.org/issue13682 #13680: Aifc comptype write fix http://bugs.python.org/issue13680 #13677: correct docstring for builtin compile http://bugs.python.org/issue13677 #13668: mute ImportError in __del__ of _threading_local module http://bugs.python.org/issue13668 #13666: datetime documentation typos http://bugs.python.org/issue13666 #13665: TypeError: string or integer address expected instead of str i http://bugs.python.org/issue13665 #13664: UnicodeEncodeError in gzip when filename contains non-ascii http://bugs.python.org/issue13664 #13649: termios.ICANON is not documented http://bugs.python.org/issue13649 #13641: decoding functions in the base64 module could accept unicode s http://bugs.python.org/issue13641 #13640: add mimetype for application/vnd.apple.mpegurl http://bugs.python.org/issue13640 #13638: PyErr_SetFromErrnoWithFilenameObject is undocumented http://bugs.python.org/issue13638 #13633: Handling of hex character references in HTMLParser.handle_char http://bugs.python.org/issue13633 #13631: readline fails to parse some forms of .editrc under editline ( http://bugs.python.org/issue13631 Most recent 15 issues waiting for review (15) ============================================= #13684: httplib tunnel infinite loop http://bugs.python.org/issue13684 #13681: Aifc read compressed frames fix http://bugs.python.org/issue13681 #13680: Aifc comptype write fix http://bugs.python.org/issue13680 #13677: correct docstring for builtin compile http://bugs.python.org/issue13677 #13676: sqlite3: Zero byte truncates string contents http://bugs.python.org/issue13676 #13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi http://bugs.python.org/issue13673 #13670: Increase test coverage for pstats.py http://bugs.python.org/issue13670 #13668: mute 
ImportError in __del__ of _threading_local module http://bugs.python.org/issue13668 #13651: Improve redirection in urllib http://bugs.python.org/issue13651 #13645: import machinery vulnerable to timestamp collisions http://bugs.python.org/issue13645 #13640: add mimetype for application/vnd.apple.mpegurl http://bugs.python.org/issue13640 #13636: Python SSL Stack doesn't have a Secure Default set of ciphers http://bugs.python.org/issue13636 #13631: readline fails to parse some forms of .editrc under editline ( http://bugs.python.org/issue13631 #13629: _PyParser_TokenNames does not match up with the token.h number http://bugs.python.org/issue13629 #13619: Add a new codec: "locale", the current locale encoding http://bugs.python.org/issue13619 Top 10 most discussed issues (10) ================================= #13679: Multiprocessing system crash http://bugs.python.org/issue13679 10 msgs #13674: crash in datetime.strftime http://bugs.python.org/issue13674 9 msgs #13669: XATTR_SIZE_MAX and XATTR_LIST_MAX undefined on kfreebsd/debian http://bugs.python.org/issue13669 8 msgs #8828: Atomic function to rename a file http://bugs.python.org/issue8828 7 msgs #9260: A finer grained import lock http://bugs.python.org/issue9260 5 msgs #13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi http://bugs.python.org/issue13673 5 msgs #6028: Interpreter aborts when chaining an infinite number of excepti http://bugs.python.org/issue6028 3 msgs #13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le http://bugs.python.org/issue13565 3 msgs #13657: IDLE doesn't support sys.ps1 and sys.ps2. 
http://bugs.python.org/issue13657 3 msgs #13672: Add co_qualname attribute in code objects http://bugs.python.org/issue13672 3 msgs Issues closed (17) ================== #3555: Regression: nested exceptions crash (Cannot recover from stack http://bugs.python.org/issue3555 closed by terry.reedy #7338: recursive __attribute__ -> Fatal Python error: Cannot recover http://bugs.python.org/issue7338 closed by terry.reedy #11638: python setup.py sdist --formats tar* crashes if version is uni http://bugs.python.org/issue11638 closed by jason.coombs #11812: transient socket failure to connect to 'localhost' http://bugs.python.org/issue11812 closed by neologix #13632: Update token documentation to reflect actual token types http://bugs.python.org/issue13632 closed by meador.inge #13639: UnicodeDecodeError when creating tar.gz with unicode name http://bugs.python.org/issue13639 closed by terry.reedy #13643: 'ascii' is a bad filesystem default encoding http://bugs.python.org/issue13643 closed by terry.reedy #13644: Python 3 aborts with this code. 
http://bugs.python.org/issue13644 closed by terry.reedy #13658: Extra clause in class grammar documentation http://bugs.python.org/issue13658 closed by python-dev #13660: maniandram maniandram wants to chat http://bugs.python.org/issue13660 closed by pitrou #13661: maniandram maniandram wants to chat http://bugs.python.org/issue13661 closed by pitrou #13662: os.listdir bug http://bugs.python.org/issue13662 closed by ezio.melotti #13667: __contains__ method behavior http://bugs.python.org/issue13667 closed by benjamin.peterson #13671: double comma cant be parsed in config module http://bugs.python.org/issue13671 closed by lukasz.langa #13675: IDLE won't open if it can't read recent-files.lst http://bugs.python.org/issue13675 closed by michael.foord #13678: way to prevent accidental variable overriding http://bugs.python.org/issue13678 closed by benjamin.peterson #12715: Add symlink support to shutil functions http://bugs.python.org/issue12715 closed by pitrou From brian at python.org Fri Dec 30 20:29:36 2011 From: brian at python.org (Brian Curtin) Date: Fri, 30 Dec 2011 13:29:36 -0600 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile, In-Reply-To: References: Message-ID: On Thu, Dec 29, 2011 at 11:55, antoine.pitrou wrote: > http://hg.python.org/cpython/rev/cf57ef65bcd0 > changeset: 74194:cf57ef65bcd0 > user: Antoine Pitrou > date: Thu Dec 29 18:54:15 2011 +0100 > summary: > Issue #12715: Add an optional symlinks argument to shutil functions (copyfile, copymode, copystat, copy, copy2). > When that parameter is true, symlinks aren't dereferenced and the operation > instead acts on the symlink itself (or creates one, if relevant). > > Patch by Hynek Schlawack. > > files: > Doc/library/shutil.rst | 46 ++++- > Lib/shutil.py | 101 +++++++++--- > Lib/test/test_shutil.py | 219 ++++++++++++++++++++++++++++ > Misc/NEWS | 5 + > 4 files changed, 333 insertions(+), 38 deletions(-) > > > diff --git a/Doc/library/shutil.rst b/Doc/library/shutil.rst > --- a/Doc/library/shutil.rst > +++ b/Doc/library/shutil.rst > @@ -45,7 +45,7 @@ > be copied. > > > -.. function:: copyfile(src, dst) > +.. function:: copyfile(src, dst[, symlinks=False]) > > Copy the contents (no metadata) of the file named *src* to a file named *dst*. > *dst* must be the complete target file name; look at :func:`copy` for a copy that > @@ -56,37 +56,56 @@ > such as character or block devices and pipes cannot be copied with this > function. *src* and *dst* are path names given as strings. > > + If *symlinks* is true and *src* is a symbolic link, a new symbolic link will > + be created instead of copying the file *src* points to. > + > .. versionchanged:: 3.3 > :exc:`IOError` used to be raised instead of :exc:`OSError`. > + Added *symlinks* argument. Can we expect that readers on Windows know how os.symlink works, or should the stipulations of os.symlink usage also be laid out or at least linked to from there? Basically, almost everyone is going to get an OSError if they call this on Windows. You have to be on Windows Vista or beyond *and* the calling process has to have the proper privileges (typically gained through elevation - "Run as Administrator"). From solipsis at pitrou.net Fri Dec 30 20:39:20 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 Dec 2011 20:39:20 +0100 Subject: [Python-Dev] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile, References: Message-ID: <20111230203920.0cf28b1e@pitrou.net> On Fri, 30 Dec 2011 13:29:36 -0600 Brian Curtin wrote: > > Can we expect that readers on Windows know how os.symlink works, or > should the stipulations of os.symlink usage also be laid out or at > least linked to from there? I assume it won't make a difference in real life, since symlinks are quite rare under Windows.
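Brian's point is that on Windows `os.symlink` usually raises `OSError` unless the process holds the symlink privilege. A minimal sketch of how calling code might cope, assuming a plain copy is an acceptable fallback; the helper name `link_or_copy` is hypothetical, and it uses only the standard `os.symlink` and `shutil.copy2` calls, not the new *symlinks* argument from the patch under review.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Make dst refer to src: a symlink where possible, otherwise a copy.

    Hypothetical helper. os.symlink can raise OSError (for example, a missing
    privilege on Windows), NotImplementedError (older Windows versions), or
    be absent altogether (AttributeError).
    """
    if os.path.lexists(dst):
        # Refuse to clobber an existing path; the copy fallback would overwrite it.
        raise FileExistsError(dst)
    try:
        os.symlink(src, dst)
        return "symlink"
    except (OSError, NotImplementedError, AttributeError):
        # No symlink support or privilege: copy data and metadata instead.
        shutil.copy2(src, dst)
        return "copy"
```

The return value tells the caller which path was taken, so code that later relies on the link following changes to *src* can warn when it only got a copy.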
> Basically, almost everyone is going to get an OSError if they call > this on Windows. You have to be on Windows Vista or beyond *and* the > calling process has to have the proper privileges (typically gained > through elevation - "Run as Administrator"). I still haven't managed to use symlinks under Windows 7, myself. The recipes I've tried didn't work. Regards Antoine. From brian at python.org Fri Dec 30 20:51:33 2011 From: brian at python.org (Brian Curtin) Date: Fri, 30 Dec 2011 13:51:33 -0600 Subject: [Python-Dev] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile, In-Reply-To: <20111230203920.0cf28b1e@pitrou.net> References: <20111230203920.0cf28b1e@pitrou.net> Message-ID: On Fri, Dec 30, 2011 at 13:39, Antoine Pitrou wrote: > On Fri, 30 Dec 2011 13:29:36 -0600 > Brian Curtin wrote: >> >> Can we expect that readers on Windows know how os.symlink works, or >> should the stipulations of os.symlink usage also be laid out or at >> least linked to from there? > > I assume it won't make a difference in real life, since symlinks are > quite rare under Windows. > >> Basically, almost everyone is going to get an OSError if they call >> this on Windows. You have to be on Windows Vista or beyond *and* the >> calling process has to have the proper privileges (typically gained >> through elevation - "Run as Administrator"). > > I still haven't managed to use symlinks under Windows 7, myself. > The recipes I've tried didn't work. This might be a place where an image in the documentation would be helpful. I don't think we do that anywhere else, but maybe I could add it to the (sorely out of date and in need of a rebuild) Windows FAQ? What you need to do on Win7 is go to Start > All Programs > Accessories > Command Prompt, but right click on it instead of left click. Choose "Run as Administrator", then it'll make you choose yes or no to elevate privileges. 
At that point, deep in the heart of everyone's favorite operating system, it should acquire the SeCreateSymbolicLink user privilege. After that, os.symlink should work fine. From jimjjewett at gmail.com Sat Dec 31 02:04:39 2011 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 30 Dec 2011 20:04:39 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: In http://mail.python.org/pipermail/python-dev/2011-December/115138.html, Christian Heimes pointed out that > ... we don't have to alter the outcome of hash ... We just need to reduce the chance that > an attacker can produce collisions in the dict (and set?) I'll state it more strongly. hash probably should not change (at least for this), but we may want to consider a different conflict resolution strategy when the first slot is already filled. Remember that there was a fair amount of thought and timing effort put into selecting the current strategy; it is deliberately sub-optimal for random input, in order to do better with typical input. < http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt > If there is a change, it would currently be needed in three places for each of set and dict (the lookdict functions and insertdict_clean). It may be worth adding some macros just to keep those six in sync. Once those macros are in place, that allows a compile-time switch. My personal opinion is that accepting *and parsing* enough data for this to be a problem is enough of an edge case that I don't want normal dicts slowed down at all for this; I would therefore prefer that the change be restricted to such a compile-time switch, with current behavior the default. http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictobject.c#l571 583 for (perturb = hash; ep->me_key != NULL; perturb >>= PERTURB_SHIFT) { 584 i = (i << 2) + i + perturb + 1; PERTURB_SHIFT is already a private #define to 5; per dictnotes, 4 and 6 perform almost as well. 
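The loop quoted above can be mirrored in a few lines of Python to see which slots are chosen. This is a sketch that assumes a 64-bit `size_t` and the loop shown (first slot `hash & mask`, then `i = 5*i + perturb + 1`, with `perturb` shifted right by `PERTURB_SHIFT` after each step); `probe_sequence` is a hypothetical helper, not a CPython API. It illustrates why the probing strategy alone cannot defend against full-hash collisions: two keys with equal hashes walk exactly the same sequence of slots, while keys that merely share their low bits diverge once the higher bits are folded in.

```python
PERTURB_SHIFT = 5            # the private #define in Objects/dictobject.c
SIZE_T_MASK = (1 << 64) - 1  # assumption: emulate a 64-bit unsigned size_t


def probe_sequence(h, mask, count):
    """Return the first `count` slot indices probed for hash `h`.

    `mask` is table_size - 1 (table sizes are powers of two).
    """
    perturb = h & SIZE_T_MASK     # perturb starts as the full (unsigned) hash
    i = perturb & mask            # first probe uses only the low bits
    slots = [i]
    while len(slots) < count:
        i = ((i << 2) + i + perturb + 1) & SIZE_T_MASK  # i = 5*i + perturb + 1
        perturb >>= PERTURB_SHIFT                       # fold in higher hash bits
        slots.append(i & mask)
    return slots


# Hashes 0 and 1024 share their low 3 bits, so they start in the same slot of
# an 8-slot table; the perturbation separates them on the fourth probe.
print(probe_sequence(0, 7, 4))     # [0, 1, 6, 7]
print(probe_sequence(1024, 7, 4))  # [0, 1, 6, 0]
```

Once `perturb` reaches zero the recurrence reduces to `i = 5*i + 1`, which visits every slot of a power-of-two table, so a free slot is always found eventually.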
Someone worried can easily make that change today, and be protected from "generic" anti-Python attacks. I believe the salt suggestions are equivalent to replacing perturb = hash; with something like perturb = hash + salt; Changing i = (i << 2) + i + perturb + 1; would allow effectively replacing the initial hash, but risks spoiling performance in the non-adversary case. Would there be objections to replacing those two lines with something like: for (perturb = FIRST_PERTURB(hash, key); ep->me_key != NULL; perturb = NEXT_PERTURB(hash, key, perturb)) { i = NEXT_SLOT(i, perturb); The default macro definitions should keep things as they are: #define FIRST_PERTURB(hash, key) (hash) #define NEXT_PERTURB(hash, key, perturb) ((perturb) >> PERTURB_SHIFT) #define NEXT_SLOT(i, perturb) (((i) << 2) + (i) + (perturb) + 1) while allowing #ifdefs for (slower but) safer things like adding a salt, or even using alternative hashes. -jJ From victor.stinner at haypocalc.com Sat Dec 31 03:22:24 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sat, 31 Dec 2011 03:22:24 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> Message-ID: <4EFE71E0.2000505@haypocalc.com> On 29/12/2011 02:28, Michael Foord wrote: > A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: > > http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf This PDF doesn't explain exactly what the problem is or how it can be solved. Let's try to summarize this "vulnerability". The creation of a Python dictionary has a complexity of O(n) in most cases, but O(n^2) in the *worst* case. The attack tries to force the worst case.
It requires computing a set of N keys having the same hash value (hash(key1) == hash(key2) == ... == hash(keyN)). The attacker only has to compute these keys once. It looks like it is now cheap enough in practice to compute this dataset for Python (and other languages). A countermeasure would be to check that we don't have more than X keys with the same hash value. But in practice, we don't know in advance how data are processed, and there are too many input vectors in various formats. If we want to fix something, it should be done in the implementation of the dict type or in the hash algorithm. We could implement dict differently to avoid this issue, using a binary tree for example. Because dict is a fundamental type in Python, I don't think that we can change its implementation (without breaking backward compatibility and therefore applications in production). Another possibility would be to add a *new* type, but all libraries and applications would need to be changed to fix the vulnerability. The last choice is to change the hash algorithm. The *idea* is the same as adding a salt to hashed passwords (in practice it will be a little bit different): if a pseudo-random salt is added, the attacker cannot prepare a single dataset; he/she will have to regenerate a new dataset for each possible salt value. If the salt is big enough (size in bits), the attacker will need too much CPU to generate the dataset (compute N keys with the same hash value). Basically, it slows down the attack by a factor of 2^(size of the salt). -- Another possibility would be to replace our fast hash function with a better hash function like MD5 or SHA1 (so creating the dataset would be too slow, and therefore too expensive, in practice), but cryptographic hash functions are much slower (and so would slow down Python too much). Limiting the size of the POST data doesn't solve the problem because there are many other input vectors and data formats.
It may block the simplest attacks, because the attacker cannot inject enough keys to slow down your CPU. Victor From steve at pearwood.info Sat Dec 31 03:19:01 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 31 Dec 2011 13:19:01 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: Message-ID: <4EFE7115.9070502@pearwood.info> Jim Jewett wrote: > My personal opinion is that accepting *and parsing* enough data for > this to be a problem > is enough of an edge case that I don't want normal dicts slowed down > at all for this; I would > therefore prefer that the change be restricted to such a compile-time > switch, with current behavior the default. By compile-time, do you mean when the byte-code is compiled, i.e. just before runtime, rather than a switch when compiling the Python executable from source? I will assume so. I'm not a big fan of compile-time (runtime) switches. It makes it too hard to compare before-and-after behaviour within a single session, and impossible to have fine control over which objects have which behaviour. I don't like all-or-nothing settings. (E.g. I'd love to be able to turn -O optimization on and off on a per-function basis, but can't.) How about using a similar strategy to the current dict behaviour with __missing__ and defaultdict? Here's my suggestion: - If a dict subclass defines __salt__, then it is called to salt the hash value before lookups. If __salt__ is undefined or None, the current behaviour remains unchanged. - Add a dict subclass (saltdict, for lack of a better name) that defines __salt__ appropriately to the collections module. In this case, I don't know enough to suggest what an appropriate salt would be. I leave that to the security experts to argue about. - Update the relevant standard library modules to use saltdict where needed. This allows a single application or framework to use saltdict where necessary, without slowing down all dict accesses.
Dicts which never see user-generated input (e.g. globals) can remain full-speed. If there is no consensus about the best salting strategy, then apps can choose their own by subclassing dict. Responsibility for doing the right thing falls onto the library author, rather than Python itself. Some people may consider that a minus. -- Steven From victor.stinner at haypocalc.com Sat Dec 31 03:31:03 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sat, 31 Dec 2011 03:31:03 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC68E0.4000606@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> Message-ID: <4EFE73E7.3070500@haypocalc.com> On 29/12/2011 14:19, Christian Heimes wrote: > Perhaps the dict code is a better place for randomization. The problem is the creation of a dict with keys all having the same hash value. The current implementation of dict uses a linked-list. Adding a new item requires comparing the new key to all existing keys (comparing the values, not the hashes, which is much slower). We would have to completely change how dict is implemented to be able to fix this issue. I don't think that we can change the dict implementation without breaking backward compatibility or breaking applications. Changing the implementation would change dict properties, and applications rely on the properties of the current implementation. Tell me if I am wrong.
Victor From victor.stinner at haypocalc.com Sat Dec 31 03:39:45 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sat, 31 Dec 2011 03:39:45 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC4F31.3090703@active-4.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4F31.3090703@active-4.com> Message-ID: <4EFE75F1.9030305@haypocalc.com> > In case the watchdog is not a viable solution as I had assumed it was, I > think it's more reasonable to indeed consider adding a flag to Python > that allows randomization of hashes optionally before startup. A flag will only be needed if the overhead of the fix is too high. > However as it was said earlier, the attack is a lot more complex to > carry out on a 64bit environment that it's probably (as it stands right > now!) safe to ignore. I suppose that there are still servers running 32 bits Python. > The main problem there however is not that it's a new attack but that > some dickheads could now make prebaked attacks against websites to > disrupt them that might cause some negative publicity. In general > though there are so many more ways to DDOS a website than this that I > would rate the whole issue very low. There are countermeasures for low level DDOS (ICMP ping flood, TCP syn flood, etc.). An application (or a firewall) cannot implement a countermeasure for this high level issue. It can only be fixed in Python directly (by changing the implementation of the dict type or of the hash function). 
Victor From lists at cheimes.de Sat Dec 31 04:24:15 2011 From: lists at cheimes.de (Christian Heimes) Date: Sat, 31 Dec 2011 04:24:15 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFE73E7.3070500@haypocalc.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <4EFE73E7.3070500@haypocalc.com> Message-ID: <4EFE805F.6000302@cheimes.de> On 31.12.2011 03:31, Victor Stinner wrote: > On 29/12/2011 14:19, Christian Heimes wrote: >> Perhaps the dict code is a better place for randomization. > > The problem is the creation of a dict with keys all having the same hash > value. The current implementation of dict uses a linked-list. Adding a > new item requires to compare the new key to all existing keys (compare > the value, not the hash, which is much slower). > > We had to change completly how dict is implemented to be able to fix > this issue. I don't think that we can change the dict implementation > without breaking backward compatibility or breaking applications. Change > the implementation would change dict properties, and applications rely > on the properties of the current implementation. > > Tell me if I am wrong. You are right and I was wrong. We can't do the randomization inside the dict code. The randomization factor must be used as the initialization vector (IV) of the hashing algorithm. At first I thought the attack used the unique property of Python's dict implementation (perturbed hash instead of buckets for equal hashes) but I was wrong. It took me several hours of reading and digging into the math until I figured out my mistake. Sorry! :) This means we can't implement a salted dict unless the salted dict re-implements the hash algorithm for unicode, bytes and memoryview. I doubt that's a wise idea. I've checked in my first draft of a possible solution: http://hg.python.org/features/randomhash/ .
The pseudo RNG has to be replaced with something better and it's missing an option to feed a seed, too. Christian From lists at cheimes.de Sat Dec 31 04:28:18 2011 From: lists at cheimes.de (Christian Heimes) Date: Sat, 31 Dec 2011 04:28:18 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFE7115.9070502@pearwood.info> References: <4EFE7115.9070502@pearwood.info> Message-ID: <4EFE8152.20109@cheimes.de> On 31.12.2011 03:19, Steven D'Aprano wrote: > How about using a similar strategy to the current dict behaviour with > __missing__ and defaultdict? Here's my suggestion: > > > - If a dict subclass defines __salt__, then it is called to salt the hash > value before lookups. If __salt__ is undefined or None, the current > behaviour remains unchanged. This was my initial proposal, too. It took me a while to figure out that it won't work. Post-salting won't fix the issue. The random seed must be used as the IV inside the hashing algorithm. My brain was still in holiday mode and it took me a while to figure out the math. Sorry for any confusion! Christian From lists at cheimes.de Sat Dec 31 04:59:41 2011 From: lists at cheimes.de (Christian Heimes) Date: Sat, 31 Dec 2011 04:59:41 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFE71E0.2000505@haypocalc.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> Message-ID: <4EFE88AD.2060505@cheimes.de> On 31.12.2011 03:22, Victor Stinner wrote: > The creation of a Python dictionary has a complexity of O(n) in most > cases, but O(n^2) in the *worst* case. The attack tries to go into the > worst case. It requires to compute a set of N keys having the same hash > value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to > compute these keys once. It looks like it is now cheap enough in > practice to compute this dataset for Python (and other languages). Correct.
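How cheap such a dataset can be is easy to see for a multiply-and-add string hash of the DJBX33A family that Christian refers to (h = h*33 + c). Equal-length blocks that collide keep colliding when concatenated, so n two-character blocks give 2**n colliding keys with no search at all. The sketch below assumes a 32-bit DJBX33A variant, which is *not* CPython's str hash; the block pair "Ab"/"BA" and all helper names are illustrative. It also shows why salting the finished hash value, as in the __salt__ proposal, cannot help: any function of hash values that are already equal returns equal results.

```python
from itertools import product

MASK32 = (1 << 32) - 1


def djbx33a(s):
    """DJBX33A-style hash, h = h*33 + ord(c), truncated to 32 bits.

    Illustrative only: this is not the hash CPython uses for str.
    """
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & MASK32
    return h


# 33*ord('A') + ord('b') == 33*ord('B') + ord('A') == 2243, so the two
# blocks have exactly the same effect on any running hash value.
BLOCKS = ("Ab", "BA")


def colliding_keys(n_blocks):
    """All 2**n_blocks distinct strings made of colliding 2-char blocks."""
    return ["".join(parts) for parts in product(BLOCKS, repeat=n_blocks)]


def post_salt(h, salt):
    """A salt applied *after* hashing: any function of h alone (hypothetical)."""
    return ((h ^ salt) * 1000003) & MASK32


keys = colliding_keys(10)
print(len(keys), len({djbx33a(k) for k in keys}))          # 1024 1
print(len({post_salt(djbx33a(k), 0x5EED) for k in keys}))  # 1
```

Because the salt only sees the already-colliding value, all 1024 keys still land on one hash, which is the reason the thread concludes the seed has to enter the hashing algorithm itself.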
The meet-in-the-middle attack and the unique properties of algorithms that are similar to DJBX33A and DJBX33X make the attack easy on platforms with a 32-bit hash. > A countermeasure would be to check that we don't have more than X keys > with the same hash value. But in practice, we don't know in advance how > data are processed, and there are too many input vectors in various formats. > > If we want to fix something, it should be done in the implementation of > the dict type or in the hash algorithm. We can implement dict > differently to avoid this issue, using a binary tree for example. > Because dict is a fundamental type in Python, I don't think that we can > change its implementation (without breaking backward compatibility and > so applications in production). A possibility would be to add a *new* > type, but all libraries and applications would need to be changed to fix > the vulnerability. A BTree is too slow for common operations: it's O(log n) instead of O(1) on average. We can't replace our dict with a btree type. A new btree type is a lot of work, too. The unique structure of CPython's dict implementation makes it harder to count the number of values with equal hash. The academic hash map (the one I learnt about at university) uses a bucket to store all elements with equal hash (more precisely, equal hash mod mask). Python's dict, however, perturbs the hash until it finds a free slot in its array. The second, third, ... collision can be caused by a legitimate and completely different (!) hash. > The last choice is to change the hash algorithm. The *idea* is the same > than adding salt to hashed password (in practice it will be a little bit > different): if a pseudo-random salt is added, the attacker cannot > prepare a single dataset, he/she will have to regenerate a new dataset > for each possible salt value. If the salt is big enough (size in bits), > the attacker will need too much CPU to generate the dataset (compute N > keys with the same hash value).
Basically, it slows down the attack by > 2^(size of the salt). That's the idea of randomized hashing functions as implemented by Ruby 1.8, Perl and others. The random seed is used as IV. Multiple rounds of multiply, XOR and MOD (integer overflows) cause a deviation. In your other posting you were worried about the performance implication. A randomized hash function just adds a single ADD operation, that's all. Downside: With randomization all hashes are unpredictable and change after every restart of the interpreter. This has some subtle side effects like a different outcome of {a:1, b:1, c:1}.keys() after a restart of the interpreter. > Another possibility would be to replace our fast hash function by a > better hash function like MD5 or SHA1 (so the creation of the dataset > would be too slow in practice = too expensive), but cryptographic hash > functions are much slower (and so would slow down Python too much). I agree with your analysis. Cryptographic hash functions are far too slow for our use case. During my research I found another hash function that claims to be fast and that may not be vulnerable to this kind of attack: http://isthe.com/chongo/tech/comp/fnv/ Christian From tjreedy at udel.edu Sat Dec 31 06:02:43 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 31 Dec 2011 00:02:43 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: Message-ID: On 12/30/2011 8:04 PM, Jim Jewett wrote: > I'll state it more strongly. hash probably should not change (at > least for this), I agree, especially since the vulnerability can be avoided by using 64 bit servers and will generally abate as more switch anyway. > but we may > want to consider a different conflict resolution strategy when the > first slot is already filled. 
> > Remember that there was a fair amount of thought and timing effort put > into selecting the > current strategy; it is deliberately sub-optimal for random input, in > order to do better with > typical input. < http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt > It would be good to have a set of attack strings to see how vulnerable Python dicts actually are (Python may not actually have been tested with such data) and the effect of any change. I gave the project email address of the two presenters in my first post. They apparently want to work with language developers to improve defenses against the attack. -- Terry Jan Reedy From stephen at xemacs.org Sat Dec 31 13:03:22 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 31 Dec 2011 21:03:22 +0900 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFE71E0.2000505@haypocalc.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> Message-ID: <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> Victor Stinner writes: > Let's try to summarize this "vulnerability". > > The creation of a Python dictionary has a complexity of O(n) in most > cases, but O(n^2) in the *worst* case. The attack tries to go into the > worst case. It requires to compute a set of N keys having the same hash > value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to > compute these keys once. It looks like it is now cheap enough in > practice to compute this dataset for Python (and other languages). I don't know the implementation issues well enough to claim it is a solution, but this hasn't been mentioned before AFAICS: While the dictionary probe has to start with a hash for backward compatibility reasons, is there a reason the overflow strategy for insertion has to be buckets containing lists? How about double-hashing, etc?
From lists at cheimes.de Sat Dec 31 15:16:24 2011 From: lists at cheimes.de (Christian Heimes) Date: Sat, 31 Dec 2011 15:16:24 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4EFF1938.6080809@cheimes.de> On 31.12.2011 13:03, Stephen J. Turnbull wrote: > I don't know the implementation issues well enough to claim it is a > solution, but this hasn't been mentioned before AFAICS: > > While the dictionary probe has to start with a hash for backward > compatibility reasons, is there a reason the overflow strategy for > insertion has to be buckets containing lists? How about > double-hashing, etc? Python's dict implementation doesn't use buckets but open addressing (aka closed hashing). The algorithm for conflict resolution doesn't use double hashing. Instead it takes the original and (in most cases) cached hash and perturbs it with a series of add, multiply and bit-shift operations. From martin at v.loewis.de Sat Dec 31 15:40:34 2011 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sat, 31 Dec 2011 15:40:34 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFE73E7.3070500@haypocalc.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <4EFE73E7.3070500@haypocalc.com> Message-ID: <20111231154034.Horde.Cc8gWML8999O-x7iTQRBreA@webmail.df.eu> Quoting Victor Stinner: > The current implementation of dict uses a linked-list. [...] > Tell me if I am wrong. At least with this statement, you are wrong: the current implementation does *not* use a linked-list. Instead, it uses open addressing.
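Martin's correction can be checked from pure Python, and the cost of a full-hash collision observed directly. A minimal sketch, relying only on documented dict semantics: the hypothetical `Colliding` class gives every key the same hash, and its `__eq__` counts how often the dict has to compare keys. With open addressing, every key with that hash follows the same probe sequence, so the k-th insertion compares against all k earlier keys and building the dict costs at least n(n-1)/2 comparisons. The exact count can be slightly higher, because a probe sequence may revisit a slot before finding a free one.

```python
class Colliding:
    """Key type whose instances all share one hash value (hypothetical)."""

    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every instance follows the same probe sequence

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return isinstance(other, Colliding) and self.value == other.value


def comparisons_to_build(n):
    """Insert n distinct colliding keys; return the number of __eq__ calls."""
    Colliding.eq_calls = 0
    d = {}
    for k in range(n):
        d[Colliding(k)] = k
    return Colliding.eq_calls


for n in (100, 200, 400):
    # The count grows roughly quadratically with n.
    print(n, comparisons_to_build(n))
```

Changing what happens after a collision (buckets, double hashing) does not remove these comparisons; only keys with different hashes avoid them.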
Regards, Martin From pje at telecommunity.com Sat Dec 31 19:04:28 2011 From: pje at telecommunity.com (PJ Eby) Date: Sat, 31 Dec 2011 13:04:28 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull wrote: > While the dictionary probe has to start with a hash for backward > compatibility reasons, is there a reason the overflow strategy for > insertion has to be buckets containing lists? How about > double-hashing, etc? > This won't help, because the keys still have the same hash value. ANYTHING you do to them after they're generated will result in them still colliding. The *only* thing that works is to change the hash function in such a way that the strings end up with different hashes in the first place. Otherwise, you'll still end up with (deliberate) collisions. (Well, technically, you could use trees or some other O(log n) data structure as a fallback once you have too many collisions, for some value of "too many". Seems a bit wasteful for the purpose, though.)
From jyasskin at gmail.com Sat Dec 31 22:04:02 2011 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sat, 31 Dec 2011 13:04:02 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Wed, Dec 28, 2011 at 5:37 PM, Jesse Noller wrote: > > > On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote: > >> Hello all, >> >> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included: >> >> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf >> >> Although it's a security issue I'm posting it here because it is now public and seems important. >> >> The issue they report can cause (for example) handling an HTTP POST to consume horrible amounts of CPU. For Python the figures they quoted: >> >> reasonable-sized attack strings only for 32 bits; Plone has max. POST size of 1 MB >> 7 minutes of CPU usage for a 1 MB request >> ~20 kbits/s → keep one Core Duo core busy >> >> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). >> >> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them. >> >> Their recommended fix is to randomize the hash function.
>> >> All the best, >> >> Michael >> > Backup link for the PDF: > http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf > > oCERT disclosure: > http://www.ocert.org/advisories/ocert-2011-003.html Discussion of hash functions in general: http://burtleburtle.net/bob/hash/doobs.html Two of the best hash functions that currently exist: http://code.google.com/p/cityhash/ and http://code.google.com/p/smhasher/wiki/MurmurHash. I'm not sure exactly what problem the paper is primarily complaining about: 1. Multiply+add and multiply+xor hashes are weak: this would be solved by changing to either of the better-and-faster hashes I linked to above. On the other hand: http://mail.python.org/pipermail/python-3000/2007-September/010327.html 2. It's possible to find collisions in any hash function in a 32-bit space: only solved by picking a varying seed at startup or compile time. If you decide to change to a stronger hash function overall, it might also be useful to change the advice "to somehow mix together (e.g. using exclusive or) the hash values for the components" in http://docs.python.org/py3k/reference/datamodel.html#object.__hash__. hash(tuple(components)) will likely be better if tuple's hash is improved. Hash functions are already unstable across Python versions. Making them unstable across interpreter processes (multiprocessing doesn't share dicts, right?) doesn't sound like a big additional problem. Users who want a distributed hash table will need to pull their own hash function out of hashlib or re-implement a non-cryptographic hash instead of using the built-in one, but they probably need to do that already to allow themselves to upgrade Python.
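Jeffrey's point about the "mix together (e.g. using exclusive or)" advice can be illustrated with a short sketch. `xor_combined_hash` and `Point` are hypothetical names: the first mixes component hashes with XOR as that advice suggested, the second delegates to `hash(tuple(...))`. XOR is commutative and cancels equal values, so reordered or repeated components collide no matter how strong the component hashes are.

```python
def xor_combined_hash(components):
    """Mix component hashes with XOR, following the old data model advice."""
    result = 0
    for component in components:
        result ^= hash(component)
    return result


class Point:
    """Hypothetical value type that hashes via a tuple of its fields."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate mixing to tuple's hash instead of XOR-ing field hashes.
        return hash((self.x, self.y))


if __name__ == "__main__":
    print(xor_combined_hash((1, 2)) == xor_combined_hash((2, 1)))  # order is lost
    print(xor_combined_hash((7, 7)))  # equal components cancel out
    print(hash(Point(1, 2)) == hash(Point(2, 1)))
```

On current CPython, tuple hashing is order-sensitive, so `Point(1, 2)` and `Point(2, 1)` get different hashes, whereas the XOR version maps both orderings to the same value and every pair of equal components to 0.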
Jeffrey From pje at telecommunity.com Sat Dec 31 22:43:00 2011 From: pje at telecommunity.com (PJ Eby) Date: Sat, 31 Dec 2011 16:43:00 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Sat, Dec 31, 2011 at 4:04 PM, Jeffrey Yasskin wrote: > Hash functions are already unstable across Python versions. Making > them unstable across interpreter processes (multiprocessing doesn't > share dicts, right?) doesn't sound like a big additional problem. > Users who want a distributed hash table will need to pull their own > hash function out of hashlib or re-implement a non-cryptographic hash > instead of using the built-in one, but they probably need to do that > already to allow themselves to upgrade Python. > Here's an idea. Suppose we add a sys.hash_seed or some such, that's settable to an int, and defaults to whatever we're using now. Then programs that want a fix can just set it to a random number, and on Python versions that support it, it takes effect. Everywhere else it's a silent no-op. Downside: sys has to have slots for this to work; does sys actually have slots? My memory's hazy on that. I guess actually it'd have to be sys.set_hash_seed(). But same basic idea. Anyway, this would make fixing the problem *possible*, while still pushing off the hard decisions to the app/framework developers. ;-) (Downside: every hash operation includes one extra memory access, but strings only compute their hash once anyway.) Given that changing dict won't help, and changing the default hash is a non-starter, an option to set the seed is probably the way to go. (Maybe with an environment variable and/or command line option so users can work around old code.)
From tjreedy at udel.edu Sat Dec 31 23:38:48 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 31 Dec 2011 17:38:48 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On 12/31/2011 4:43 PM, PJ Eby wrote: > Here's an idea. Suppose we add a sys.hash_seed or some such, that's > settable to an int, and defaults to whatever we're using now. Then > programs that want a fix can just set it to a random number, I do not think we can allow that to change once hashed dictionaries exist. -- Terry Jan Reedy
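PJ Eby's proposal and Terry's objection can both be seen in a toy model. Everything here is hypothetical: `set_hash_seed()` is a stand-in for the proposed `sys.set_hash_seed()` (no such function exists in `sys`), and `seeded_hash` is FNV-1a with the seed XOR-ed into its starting state, chosen for brevity. It is neither CPython's string hash nor a vetted keyed hash. Because a table stores each entry at a position derived from the hash in effect at insertion time, changing the seed afterwards sends lookups to different buckets.

```python
MASK64 = (1 << 64) - 1
FNV_OFFSET = 0xcbf29ce484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001b3        # standard 64-bit FNV prime

_hash_seed = 0  # hypothetical stand-in for the proposed sys.hash_seed


def set_hash_seed(seed):
    """Hypothetical analogue of the proposed sys.set_hash_seed(); not a real API."""
    global _hash_seed
    _hash_seed = seed


def seeded_hash(data):
    """FNV-1a over bytes, with the current seed mixed into the starting state."""
    h = (FNV_OFFSET ^ _hash_seed) & MASK64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


class ToyHashTable:
    """Tiny chained hash table that hashes keys with the current global seed."""

    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[seeded_hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])

    def get(self, key):
        for stored_key, value in self._bucket(key):
            if stored_key == key:
                return value
        raise KeyError(key)


def count_missing(table, keys):
    """Count keys whose lookup raises KeyError."""
    missing = 0
    for key in keys:
        try:
            table.get(key)
        except KeyError:
            missing += 1
    return missing


def demo():
    keys = [f"key-{i}".encode() for i in range(32)]
    set_hash_seed(0)
    table = ToyHashTable()
    for n, key in enumerate(keys):
        table.put(key, n)
    before = count_missing(table, keys)
    set_hash_seed(12345)  # change the seed after the table is populated
    after = count_missing(table, keys)
    set_hash_seed(0)
    return before, after


if __name__ == "__main__":
    print(demo())
```

With this particular hash and a power-of-two bucket count, a seed change that alters the low bits moves every key to a different bucket, so all previously stored entries become unreachable. That is the hazard Terry points to: a seed would have to be fixed before the first hashed dictionary is built.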