From tjreedy at udel.edu Sun Jun 1 00:21:00 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 31 May 2014 18:21:00 -0400 Subject: [Python-Dev] Updating turtle.py In-Reply-To: <538A1A03.4020405@v.loewis.de> References: <538A1A03.4020405@v.loewis.de> Message-ID: On 5/31/2014 2:05 PM, "Martin v. Löwis" wrote: > Am 31.05.14 05:32, schrieb Terry Reedy: >> I have two areas of questions about updating turtle.py. First the module >> itself, then a turtle tracker issue versus code cleanup policies. >> >> A. Unlike most stdlib modules, turtle is copyrighted and licensed by an >> individual. >> ''' >> # turtle.py: a Tkinter based turtle graphics module for Python >> # Version 1.1b - 4. 5. 2009 >> # Copyright (C) 2006 - 2010 Gregor Lingl >> # email: glingl at aon.at >> ''' >> I am not sure what the copyright covers other than the exact text >> contributed, with updates, by Gregor. It certainly does not cover the >> API and whatever code he copied from the previous version (unless that >> was also by him, and I have no idea how much he copied when >> reimplementing). I don't think it should cover additions made by others >> either. Should there be another line to cover these? > > He should provide a contributor form, covering his past contributions. > Would you like to contact him about this? Thank you for the advice. I emailed him about the contributor form, the change notice in the file, and maintenance. > Adding a license up-front (as you propose) is counter-productive; the > author may not agree to your specific licensing terms. If he was > unwilling to agree to the contributor form (which I doubt, knowing > him personally), the only option would be to remove the code from the > distribution. > >> Responding today, I cautioned that clean-up only patches, such as she >> apparently would like to start with, are not in favor. > > I would not say that. 
I recall that I asked Gregor to make a number of > style changes before he submitted the code, and eventually agreed to the > code when I thought it was "good enough". However, continuing on that > path sounds reasonable to me. I am not sure what you mean by 'that path', to be continued on. > It is the mixing of clean-up patches with functional changes that is not > in favor. What I have understood from Guido is that 'blind' format changes, not part of working on the file, are not good as they could cause harm without direct benefit. On the other hand, you are saying that if the code is reviewed, then the format changes should be separate, possibly with a commit note that they are not 'blind'. >> Since she only marked the issue for 3.5, I also cautioned that 3.5-only >> cleanups would make fixing bugs in other issues harder. Is the code >> clean-up policy the same for all branches? > > I don't think that we should be taken hostage by merging restrictions > of the DVCS - we switched to the DVCS precisely with the promise that > merging would be easier. Given the number of bug fixes that the turtle > module has seen, which is minuscule in the last few years... I ran diff on the 3.4 and 3.5 versions of turtle.py and did not see any differences. So at the moment, forward porting is trivial. > I'd suggest that it is less work to restrict cleanup > to 3.5, and then deal with any forward-porting of bug fixing when it > actually happens. This would make it non-trivial for any patch hitting a difference. -- Terry Jan Reedy From steve at pearwood.info Sun Jun 1 10:11:39 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 1 Jun 2014 18:11:39 +1000 Subject: [Python-Dev] Should standard library modules optimize for CPython? Message-ID: <20140601081139.GO10355@ando> I think I know the answer to this, but I'm going to ask it anyway... I know that there is a general policy of trying to write code in the standard library that does not disadvantage other implementations. 
How far does that go the other way? Should the standard library accept slower code because it will be much faster in other implementations? Briefly, I have a choice of algorithm for the median function in the statistics module. If I target CPython, I will use a naive but simple O(N log N) implementation based on sorting the list and returning the middle item. (That's what the module currently does.) But if I target PyPy, I will use an O(N) algorithm which knocks the socks off the naive version even for smaller lists. In CPython that's typically 2-5 times slower; in PyPy it's typically 3-8 times faster, and the bigger the data set the more the advantage. For the specific details, see http://bugs.python.org/issue21592 My feeling is that the CPython standard library should be written for CPython, that is, it should stick to the current naive implementation of median, and if PyPy wants to speed the function up, they can provide their own version of the module. I should *not* complicate the implementation by trying to detect which Python the code is running under and changing algorithms accordingly. However, I should put a comment in the module pointing at the tracker issue. Does this sound right to others? Thanks, -- Steve From stefan_ml at behnel.de Sun Jun 1 11:02:56 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 01 Jun 2014 11:02:56 +0200 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: <20140601081139.GO10355@ando> References: <20140601081139.GO10355@ando> Message-ID: Steven D'Aprano, 01.06.2014 10:11: > Briefly, I have a choice of algorithm for the median function in the > statistics module. If I target CPython, I will use a naive but simple > O(N log N) implementation based on sorting the list and returning the > middle item. (That's what the module currently does.) But if I target > PyPy, I will use an O(N) algorithm which knocks the socks off the naive > version even for smaller lists. 
In CPython that's typically 2-5 times > slower; in PyPy it's typically 3-8 times faster, and the bigger the data > set the more the advantage. > > For the specific details, see http://bugs.python.org/issue21592 > > My feeling is that the CPython standard library should be written for > CPython, that is, it should stick to the current naive implementation of > median, and if PyPy wants to speed the function up, they can provide > their own version of the module. Note that if you compile the module with Cython, CPython heavily benefits from the new implementation, too, by a factor of 2-5x. So there isn't really a reason to choose between two implementations because of the two runtimes, just use the new one for both and compile it for CPython. I added the necessary bits to the ticket. Stefan From ncoghlan at gmail.com Sun Jun 1 14:31:17 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 1 Jun 2014 22:31:17 +1000 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: <20140601081139.GO10355@ando> References: <20140601081139.GO10355@ando> Message-ID: On 1 Jun 2014 18:13, "Steven D'Aprano" wrote: > > My feeling is that the CPython standard library should be written for > CPython, that is, it should stick to the current naive implementation of > median, and if PyPy wants to speed the function up, they can provide > their own version of the module. I should *not* complicate the > implementation by trying to detect which Python the code is running > under and changing algorithms accordingly. However, I should put a > comment in the module pointing at the tracker issue. Does this sound > right to others? One option is to set the pure Python module up to be paired with an accelerator module (and update the test suite accordingly), even if we *don't provide* an accelerator in CPython. 
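The pairing Nick describes is usually spelled as a pure-Python module that lets an optional accelerator shadow its definitions. A schematic sketch (module and accelerator names here are hypothetical, not actual stdlib layout):

```python
# Sketch of the pure-Python-module-plus-optional-accelerator pattern.
# "_median_accel" is a hypothetical accelerator module; in Nick's inverted
# case, CPython simply would not ship it, while another implementation could.

def median(data):
    """Naive pure-Python median: sort and take the middle item(s)."""
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    if n % 2 == 1:
        return data[n // 2]
    return (data[n // 2 - 1] + data[n // 2]) / 2

# If an accelerator is present, its definitions replace the ones above;
# otherwise the pure-Python versions remain in effect.
try:
    from _median_accel import median  # hypothetical accelerator module
except ImportError:
    pass
```

The test suite can then be parameterized to exercise both the pure-Python and (where available) accelerated versions, which is what "update the test suite accordingly" refers to.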
That just inverts the more common case (where we have an accelerator written in C, but another implementation either doesn't need one, or just doesn't have one yet). Cheers, Nick. > > > Thanks, > > > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Jun 1 18:17:22 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 01 Jun 2014 18:17:22 +0200 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: <20140601081139.GO10355@ando> References: <20140601081139.GO10355@ando> Message-ID: Le 01/06/2014 10:11, Steven D'Aprano a écrit : > > My feeling is that the CPython standard library should be written for > CPython, that is, it should stick to the current naive implementation of > median, and if PyPy wants to speed the function up, they can provide > their own version of the module. I should *not* complicate the > implementation by trying to detect which Python the code is running > under and changing algorithms accordingly. However, I should put a > comment in the module pointing at the tracker issue. Does this sound > right to others? It sounds ok to me. Regards Antoine. From benjamin at python.org Mon Jun 2 01:02:03 2014 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 01 Jun 2014 16:02:03 -0700 Subject: [Python-Dev] [RELEASE] Python 2.7.7 Message-ID: <1401663723.32188.123962509.15B80988@webmail.messagingengine.com> I'm happy to announce the immediate availability of Python 2.7.7. Python 2.7.7 is a regularly scheduled bugfix release for the Python 2.7 series. This release includes months of accumulated bugfixes. 
All the changes in Python 2.7.7 are described in detail in the Misc/NEWS file of the source tarball. You can view it online at http://hg.python.org/cpython/raw-file/f89216059edf/Misc/NEWS The 2.7.7 release also contains fixes for two severe, if arcane, potential security vulnerabilities. The first was the possibility of reading arbitrary process memory using JSONDecoder.raw_decode. [1] (No other json APIs are affected.) The second security issue is an integer overflow in the strop module. [2] (You actually have no reason whatsoever to use the strop module.) Another security note for 2.7.7 is that the release includes a backport from Python 3 of hmac.compare_digest. This begins the implementation of PEP 466, Network Security Enhancements for Python 2.7.x. Downloads are at https://python.org/download/releases/2.7.7/ This is a production release. As always, please report bugs to http://bugs.python.org/ Build great things, Benjamin Peterson 2.7 Release Manager (on behalf of all of Python's contributors) [1] http://bugs.python.org/issue21529 [2] http://bugs.python.org/issue21530 From raymond.hettinger at gmail.com Mon Jun 2 02:13:54 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 1 Jun 2014 17:13:54 -0700 Subject: [Python-Dev] Updating turtle.py In-Reply-To: References: Message-ID: <455F20E5-429E-49A2-A652-CED483334BA1@gmail.com> On May 30, 2014, at 8:32 PM, Terry Reedy wrote: > B. Let's assume that turtle.py is, at least to some degree, fair game for fixes and enhancements. PSF Python PyLadies (Jessica Keller, Lynn Root) are participating in the 2014 GNOME Outreach Program for Women (OPW) https://wiki.python.org/moin/OPW/2014 . One of the projects (bottom of that page) is Graphical Python, in particular Turtle. > > A few days ago, Jessica posted > http://bugs.python.org/issue21573 Clean up turtle.py code formatting > "Lib/turtle.py has some code formatting issues. 
Let's clean them up to make the module easier to read as interns start working on it this summer." She wants to follow cleanup with code examination, fixes, and enhancements. If these modules are going to change (and Gregor gives us the go-ahead), I suggest we do real clean-ups, not shallow pep8/pylint micro-changes. I use these modules as part of a program to teach adults how to teach programming to children. I've had good success but think the code for several of the modules needs to be simplified. At some point, kids wrote some of this code but along the way it got "adultified", making it less useful for teaching younger kids. I would like to be involved in helping to improve these modules in a substantive way and would be happy to coach anyone who wants to undertake the effort and bring a useful patch to fruition. One thing I would not like to see happen is telling interns that their time is being well spent by pep-8 checking code in the standard library. It sends the wrong message about what constitutes an actual contribution to the core. There are plenty of useful things to do instead (we have an "easy" tag on tracker to highlight a few of them). Another thought is that there are tons of python projects that could use real help and those would likely be a better place to start than trying to patch mature standard library code (where the chance of regression, code churn, or rejection is much higher). Over the past few years, I've taught Python to over three thousand programmers and have gotten a number of them started in open source (a number of them are now active contributors to OpenStack for example), but I almost never direct them to take their baby steps in the Python core (unless they've found an actual defect or room for improvement). It's a bummer, but in mature code, almost every idea that occurs to a beginner is something that makes the code worse in some way -- that isn't always true but it happens often enough to be discouraging. 
Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Mon Jun 2 02:19:06 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 1 Jun 2014 17:19:06 -0700 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: References: <20140601081139.GO10355@ando> Message-ID: <8556310D-E314-466E-9BED-E66FCD4C04F1@gmail.com> On Jun 1, 2014, at 9:17 AM, Antoine Pitrou wrote: > Le 01/06/2014 10:11, Steven D'Aprano a écrit : >> >> My feeling is that the CPython standard library should be written for >> CPython, that is, it should stick to the current naive implementation of >> median, and if PyPy wants to speed the function up, they can provide >> their own version of the module. I should *not* complicate the >> implementation by trying to detect which Python the code is running >> under and changing algorithms accordingly. However, I should put a >> comment in the module pointing at the tracker issue. Does this sound >> right to others? > > It sounds ok to me. That makes sense. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Jun 2 06:03:09 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 02 Jun 2014 13:03:09 +0900 Subject: [Python-Dev] Updating turtle.py In-Reply-To: <455F20E5-429E-49A2-A652-CED483334BA1@gmail.com> References: <455F20E5-429E-49A2-A652-CED483334BA1@gmail.com> Message-ID: <878upg556q.fsf@uwakimon.sk.tsukuba.ac.jp> Raymond Hettinger writes: > One thing I would not like to see happen is telling interns that > their time is being well spent by pep-8 checking code in the > standard library. It sends the wrong message about what > constitutes an actual contribution to the core. There are plenty > of useful things to do instead (we have an "easy" tag on tracker to > highlight a few of them). 
I have to ask for a qualification here, at least in the case of GSoC interns. Of course the intern should contribute to the code, but they are also supposed to become developers in the community. Spending a few hours checking code for PEP-8-correctness is useful training in writing good core code going forward. I agree with you that if they don't move on after a day or so, they should be told to do so. OTOH, I haven't yet met an intern who was willing and able to write in good PEP 8 style to start with, let alone one who was willing to "waste his time" doing style-checking on existing code -- is this really a problem? I agree that they should be told that this is an investment in *their* skills, and at best of marginal value to *Python*, of course. As you point out, directing them away from core code to other projects requiring PEP 8 in their style guides is usually a good idea, too. > It's a bummer, but in mature code, almost every idea that occurs to > a beginner is something that makes the code worse in some way -- > that isn't always true but it happens often enough to be > discouraging. This is precisely why style-checking in the core may be a good idea for interns: assume the code is *good* code (it probably is), don't mess with the algorithms, but make the code "look right" according to project standards. The risk you cite is still there, but much less. It shows them what Pythonicity looks like at a deeper level than the relatively superficial[1] guidelines in PEP 8. Footnotes: [1] Not deprecatory. Consistent good looks are important. From ncoghlan at gmail.com Mon Jun 2 09:12:47 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jun 2014 17:12:47 +1000 Subject: [Python-Dev] Updating turtle.py In-Reply-To: <878upg556q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <455F20E5-429E-49A2-A652-CED483334BA1@gmail.com> <878upg556q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 2 June 2014 14:03, Stephen J. 
Turnbull wrote: > Raymond Hettinger writes: > > It's a bummer, but in mature code, almost every idea that occurs to > > a beginner is something that makes the code worse in some way -- > > that isn't always true but it happens often enough to be > > discouraging. > > This is precisely why style-checking in the core may be a good idea > for interns: assume the code is *good* code (it probably is), don't > mess with the algorithms, but make the code "look right" according to > project standards. The risk you cite is still there, but much less. > It shows them what Pythonicity looks like at a deeper level than the > relatively superficial[1] guidelines in PEP 8. The problem from my perspective is that the standard library contains code where it's either old enough to predate the evolution of the conventions now documented in PEP 8, or else we declared some code (especially test code) "good enough" for inclusion because we *really* wanted the functionality it provided (the original ipaddr tests come to mind - I suspect that tracker issue is one of the specific cases Raymond is thinking of as well). Even if we had unlimited reviewer resources (which we don't), mechanical code cleanups tend to fall under the "if it ain't broke, don't fix it" guideline. That then sets us up for a conflict between folks just getting started and trying to be helpful, and those of us that are of the school of thought that sees a difference between "cleaning code up to make it easier to work on a subsequent bug fix or feature request" and "cleaning code up for the sake of cleaning it up". The latter is generally a bad idea, while the former may be a good idea, but it can be hard to explain the difference to folks that are more familiar with code bases started in the modern era where the ability to easily run automated tests and code analysis on every commit is almost assumed, rather than being seen as an exceptional situation. 
There's a reason the desire to "throw it out and start again with a clean slate" is a common trait amongst developers: green field programming is genuinely *more fun* than maintenance programming in most cases. I believe Raymond's concern (and mine) is that if the challenges of maintenance programming aren't made clear to potential contributors up front, they're going to be disappointed when their patches that might be fine for a green field project, or as part of the development of a particular feature or fix, are instead rejected as imposing too much risk for not enough gain when considered in isolation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From martin at v.loewis.de Mon Jun 2 09:14:11 2014 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 02 Jun 2014 09:14:11 +0200 Subject: [Python-Dev] Updating turtle.py In-Reply-To: References: <538A1A03.4020405@v.loewis.de> Message-ID: <538C2443.3070702@v.loewis.de> Am 01.06.14 00:21, schrieb Terry Reedy: >>> Responding today, I cautioned that clean-up only patches, such as she >>> apparently would like to start with, are not in favor. >> >> I would not say that. I recall that I asked Gregor to make a number of >> style changes before he submitted the code, and eventually agreed to the >> code when I thought it was "good enough". However, continuing on that >> path sounds reasonable to me. > > I am not sure what you mean by 'that path', to be continued on. The path of improving the coding style of the turtle module. >> I'd suggest that it is less work to restrict cleanup >> to 3.5, and then deal with any forward-porting of bug fixing when it >> actually happens. > > This would make it non-trivial for any patch hitting a difference. Indeed. OTOH, it's simpler for anybody doing the code cleanup to do it only on one branch. 
Regards, Martin From victor.stinner at gmail.com Mon Jun 2 10:43:47 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 2 Jun 2014 10:43:47 +0200 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: <20140601081139.GO10355@ando> References: <20140601081139.GO10355@ando> Message-ID: 2014-06-01 10:11 GMT+02:00 Steven D'Aprano : > My feeling is that the CPython standard library should be written for > CPython, Right. PyPy, Jython and IronPython already have their "own" standard library when they need a different implementation. PyPy: "lib_pypy" directory (lib-python is the CPython stdlib): https://bitbucket.org/pypy/pypy/src/ac52eb7bbbb059d0b8d001a2103774917cf7396f/lib_pypy/?at=default Jython: "Lib" directory (lib-python is the CPython stdlib): https://bitbucket.org/jython/jython/src/9cd9ab75eadea898e2e74af82ae414925d6a1135/Lib/?at=default IronPython: "IronPython.Modules" directory: http://ironpython.codeplex.com/SourceControl/latest#IronPython_Main/Languages/IronPython/IronPython.Modules/ See for example the _fsum.py module of Jython: https://bitbucket.org/jython/jython/src/9cd9ab75eadea898e2e74af82ae414925d6a1135/Lib/_fsum.py?at=default Victor From stephen at xemacs.org Mon Jun 2 10:46:40 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 02 Jun 2014 17:46:40 +0900 Subject: [Python-Dev] Updating turtle.py In-Reply-To: References: <455F20E5-429E-49A2-A652-CED483334BA1@gmail.com> <878upg556q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87zjhv4s27.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Even if we had unlimited reviewer resources (which we don't), Raymond said "interns". We at least have a mentor. > There's a reason the desire to "throw it out and start again with a > clean slate" is a common trait amongst developers: You mean the Cascade of Attention-Deficit Teenagers development model? 
> I believe Raymond's concern (and mine) is that if the challenges of > maintenance programming aren't made clear to potential contributors > up front, So make it clear when the assignment is given. Remember, the point I'm making is that it's an investment for the intern, not for Python. If their code eventually gets relegated to a branch the may never ever get merged, that's a learning experience too -- they may have been told, and *thought* they signed up for that up front, but it's different when you actually get told, "it could be useful, but on balance let's not touch this code" or even "the 'owner' of the code doesn't have time to look at changes". It's not something I suggest as a "rite of initiation" for *all* interns. I just think it would be overkill to prohibit it in principle -- I have a couple of (non-Python) interns who would benefit from the exercise (their projects are greenfield code, so they have no "model code" to start from). It wasn't clear to me whether Raymond meant to go that far as a general prohibition. Regards, From fijall at gmail.com Mon Jun 2 10:48:20 2014 From: fijall at gmail.com (Maciej Fijalkowski) Date: Mon, 2 Jun 2014 10:48:20 +0200 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: References: <20140601081139.GO10355@ando> Message-ID: On Mon, Jun 2, 2014 at 10:43 AM, Victor Stinner wrote: > 2014-06-01 10:11 GMT+02:00 Steven D'Aprano : >> My feeling is that the CPython standard library should be written for >> CPython, > > Right. PyPy, Jython and IronPython already have their "own" standard > library when they need a different implement. > > PyPy: "lib_pypy" directory (lib-python is the CPython stdlib): > https://bitbucket.org/pypy/pypy/src/ac52eb7bbbb059d0b8d001a2103774917cf7396f/lib_pypy/?at=default it's for stuff that's in CPython implemented in C, not a reimplementation of python stuff. 
we patched the most obvious CPython-specific hacks, but it's a losing battle, you guys will go way out of your way to squeeze an extra 2% by doing very obscure hacks. > > Jython: "Lib" directory (lib-python is the CPython stdlib): > https://bitbucket.org/jython/jython/src/9cd9ab75eadea898e2e74af82ae414925d6a1135/Lib/?at=default > > IronPython: "IronPython.Modules" directory: > http://ironpython.codeplex.com/SourceControl/latest#IronPython_Main/Languages/IronPython/IronPython.Modules/ > > See for example the _fsum.py module of Jython: > https://bitbucket.org/jython/jython/src/9cd9ab75eadea898e2e74af82ae414925d6a1135/Lib/_fsum.py?at=default > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com From tjreedy at udel.edu Mon Jun 2 11:17:37 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 02 Jun 2014 05:17:37 -0400 Subject: [Python-Dev] Updating turtle.py In-Reply-To: References: <455F20E5-429E-49A2-A652-CED483334BA1@gmail.com> <878upg556q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6/2/2014 3:12 AM, Nick Coghlan wrote: > Even if we had unlimited reviewer resources (which we don't), > mechanical code cleanups tend to fall under the "if it ain't broke, > don't fix it" guideline. That then sets us up for a conflict between > folks just getting started and trying to be helpful, and those of us > that are of the school of thought that sees a difference between > "cleaning code up to make it easier to work on a subsequent bug fix or In the case of turtle, Jessica said from the beginning that code cleanup would be for the purpose of understanding the code and making it easier to do bug fixes and enhancements. > feature request" and "cleaning code up for the sake of cleaning it > up". As you know, many outsiders think that we take PEP 8 more seriously than we do. 
The latter is generally a bad idea, while the former may be a > good idea, Lita seemed to quickly understand that being able to test a bug fix is more important than making it look pretty. In any case, I believe she is doing something else until we hear from Gregor or otherwise decide how to proceed with turtle. > but it can be hard to explain the difference to folks that > are more familiar with code bases started in the modern era where the > ability to easily run automated tests and code analysis on every > commit is almost assumed, rather than being seen as an exceptional > situation. -- Terry Jan Reedy From stefan_ml at behnel.de Mon Jun 2 12:32:36 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 02 Jun 2014 12:32:36 +0200 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: References: <20140601081139.GO10355@ando> Message-ID: Maciej Fijalkowski, 02.06.2014 10:48: > On Mon, Jun 2, 2014 at 10:43 AM, Victor Stinner wrote: >> 2014-06-01 10:11 GMT+02:00 Steven D'Aprano : >>> My feeling is that the CPython standard library should be written for >>> CPython, >> >> Right. PyPy, Jython and IronPython already have their "own" standard >> library when they need a different implement. >> >> PyPy: "lib_pypy" directory (lib-python is the CPython stdlib): >> https://bitbucket.org/pypy/pypy/src/ac52eb7bbbb059d0b8d001a2103774917cf7396f/lib_pypy/?at=default > > it's for stuff that's in CPython implemented in C, not a > reimplementation of python stuff. we patched the most obvious > CPython-specific hacks, but it's a loosing battle, you guys will go > way out of your way to squeeze extra 2% by doing very obscure hacks. Thus my proposal to compile the modules in CPython with Cython, rather than duplicating their code or making/keeping them CPython specific. I think reducing the urge to reimplement something in C is a good thing. 
Stefan From michael.haubenwallner at ssi-schaefer.com Mon Jun 2 20:11:15 2014 From: michael.haubenwallner at ssi-schaefer.com (Michael Haubenwallner) Date: Mon, 02 Jun 2014 20:11:15 +0200 Subject: [Python-Dev] use cases for "python-config" versus "pkg-config python" In-Reply-To: <5385F7E7.9090408@ssi-schaefer.com> References: <5385F7E7.9090408@ssi-schaefer.com> Message-ID: <538CBE43.7070303@ssi-schaefer.com> Hi, following up myself with a patch proposal: On 05/28/2014 04:51 PM, Michael Haubenwallner wrote: > Stumbling over problems on AIX (Modules/python.exp not found) building libxml2 as python module > let me wonder about the intended use-cases for 'python-config' and 'pkg-config python'. > > FWIW, I can see these distinct use cases here, and I'm kindly asking if I got them right: > > * Build an application containing a python interpreter (like python$EXE itself): > + link against libpython.so > + re-export symbols from libpython.so for python-modules (platform-specific) > + This is similar to build against any other library, thus > = 'python.pc' is installed (for 'pkg-config python'). > > * Build a python-module (like build/lib.-/*.so): > + no need to link against libpython.so, instead > + expect symbols from libpython.so to be available at runtime, platform-specific either as > + undefined symbols at build-time (Linux, others), or > + a list of symbols to import from "the main executable" (AIX) > + This is specific to python-modules, thus > = 'python-config' is installed. > Based on these use-cases, I'm on a trip towards a patch improving AIX support here, where the attached one is a draft against python-tip (next step is to have python-config not print $LIBS, but $LINKFORMODULE only). Thoughts? Thank you! 
/haubi/
-------------- next part --------------
diff -r dc3afbee4ad1 Makefile.pre.in
--- a/Makefile.pre.in   Mon Jun 02 01:32:23 2014 -0700
+++ b/Makefile.pre.in   Mon Jun 02 19:57:54 2014 +0200
@@ -87,6 +87,9 @@
 SGI_ABI=       @SGI_ABI@
 CCSHARED=      @CCSHARED@
 LINKFORSHARED= @LINKFORSHARED@
+BLINKFORSHARED=        @BLINKFORSHARED@
+LINKFORMODULE= @LINKFORMODULE@
+BLINKFORMODULE=        @BLINKFORMODULE@
 ARFLAGS=       @ARFLAGS@
 # Extra C flags added for building the interpreter object files.
 CFLAGSFORSHARED=@CFLAGSFORSHARED@
@@ -540,7 +543,7 @@
 # Build the interpreter
 $(BUILDPYTHON): Modules/python.o $(LIBRARY) $(LDLIBRARY) $(PY3LIBRARY)
-       $(LINKCC) $(PY_LDFLAGS) $(LINKFORSHARED) -o $@ Modules/python.o $(BLDLIBRARY) $(LIBS) $(MODLIBS) $(SYSLIBS) $(LDLAST)
+       $(LINKCC) $(PY_LDFLAGS) $(BLINKFORSHARED) -o $@ Modules/python.o $(BLDLIBRARY) $(LIBS) $(MODLIBS) $(SYSLIBS) $(LDLAST)

 platform: $(BUILDPYTHON) pybuilddir.txt
        $(RUNSHARED) $(PYTHON_FOR_BUILD) -c 'import sys ; from sysconfig import get_platform ; print(get_platform()+"-"+sys.version[0:3])' >platform
@@ -666,7 +669,7 @@
        fi

 Modules/_testembed: Modules/_testembed.o $(LIBRARY) $(LDLIBRARY) $(PY3LIBRARY)
-       $(LINKCC) $(PY_LDFLAGS) $(LINKFORSHARED) -o $@ Modules/_testembed.o $(BLDLIBRARY) $(LIBS) $(MODLIBS) $(SYSLIBS) $(LDLAST)
+       $(LINKCC) $(PY_LDFLAGS) $(BLINKFORSHARED) -o $@ Modules/_testembed.o $(BLDLIBRARY) $(LIBS) $(MODLIBS) $(SYSLIBS) $(LDLAST)

 ############################################################################
 # Importlib
@@ -1310,7 +1313,7 @@
 # pkgconfig directory
 LIBPC=         $(LIBDIR)/pkgconfig

-libainstall:   all python-config
+libainstalldirs:
        @for i in $(LIBDIR) $(LIBPL) $(LIBPC); \
        do \
                if test ! -d $(DESTDIR)$$i; then \
@@ -1319,6 +1322,16 @@
                else    true; \
                fi; \
        done
+
+# resolve Makefile variables eventually found in configured python.pc values
+$(DESTDIR)$(LIBPC)/python-$(VERSION).pc: Misc/python.pc Makefile libainstalldirs
+       @echo "Resolving more values for $(LIBPC)/python-$(VERSION).pc"; \
+       if test set = "$${PYTHON_PC_CONTENT:+set}"; \
+       then echo '$(PYTHON_PC_CONTENT)' | tr '@' '\n' > $@; \
+       else PYTHON_PC_CONTENT="`awk -v ORS='@' '{print $0}' < Misc/python.pc`" $(MAKE) $@ `grep = Misc/python.pc`; \
+       fi
+
+libainstall:   all python-config libainstalldirs $(DESTDIR)$(LIBPC)/python-$(VERSION).pc
        @if test -d $(LIBRARY); then :; else \
                if test "$(PYTHONFRAMEWORKDIR)" = no-framework; then \
                        if test "$(SHLIB_SUFFIX)" = .dll; then \
@@ -1338,7 +1351,6 @@
        $(INSTALL_DATA) Modules/Setup $(DESTDIR)$(LIBPL)/Setup
        $(INSTALL_DATA) Modules/Setup.local $(DESTDIR)$(LIBPL)/Setup.local
        $(INSTALL_DATA) Modules/Setup.config $(DESTDIR)$(LIBPL)/Setup.config
-       $(INSTALL_DATA) Misc/python.pc $(DESTDIR)$(LIBPC)/python-$(VERSION).pc
        $(INSTALL_SCRIPT) $(srcdir)/Modules/makesetup $(DESTDIR)$(LIBPL)/makesetup
        $(INSTALL_SCRIPT) $(srcdir)/install-sh $(DESTDIR)$(LIBPL)/install-sh
        $(INSTALL_SCRIPT) python-config.py $(DESTDIR)$(LIBPL)/python-config.py
@@ -1540,6 +1552,7 @@
        -rm -rf build platform
        -rm -rf $(PYTHONFRAMEWORKDIR)
        -rm -f python-config.py python-config
+       -rm -f Misc/python.pc

 # Make things extra clean, before making a distribution:
 # remove all generated files, even Makefile[.pre]
@@ -1612,7 +1625,7 @@
 .PHONY: frameworkinstallmaclib frameworkinstallapps frameworkinstallunixtools
 .PHONY: frameworkaltinstallunixtools recheck autoconf clean clobber distclean
 .PHONY: smelly funny patchcheck touch altmaninstall commoninstall
-.PHONY: gdbhooks
+.PHONY: gdbhooks libainstalldirs

 # IF YOU PUT ANYTHING HERE IT WILL GO AWAY
 
 # Local Variables:
diff -r dc3afbee4ad1 Misc/python-config.in
--- a/Misc/python-config.in     Mon Jun 02 01:32:23 2014 -0700
+++ b/Misc/python-config.in     Mon Jun 02 19:57:54 2014 +0200
@@ -55,7 +55,7 @@
         if not getvar('Py_ENABLE_SHARED'):
             libs.insert(0, '-L' + getvar('LIBPL'))
         if not getvar('PYTHONFRAMEWORK'):
-            libs.extend(getvar('LINKFORSHARED').split())
+            libs.extend(getvar('LINKFORMODULE').split())
         print(' '.join(libs))

     elif opt == '--extension-suffix':
diff -r dc3afbee4ad1 Misc/python-config.sh.in
--- a/Misc/python-config.sh.in  Mon Jun 02 01:32:23 2014 -0700
+++ b/Misc/python-config.sh.in  Mon Jun 02 19:57:54 2014 +0200
@@ -43,7 +43,6 @@
 LIBS="@LIBS@ $SYSLIBS -lpython${VERSION}${ABIFLAGS}"
 BASECFLAGS="@BASECFLAGS@"
 LDLIBRARY="@LDLIBRARY@"
-LINKFORSHARED="@LINKFORSHARED@"
 OPT="@OPT@"
 PY_ENABLE_SHARED="@PY_ENABLE_SHARED@"
 LDVERSION="@LDVERSION@"
@@ -53,6 +52,7 @@
 PYTHONFRAMEWORK="@PYTHONFRAMEWORK@"
 INCDIR="-I$includedir/python${VERSION}${ABIFLAGS}"
 PLATINCDIR="-I$includedir/python${VERSION}${ABIFLAGS}"
+LINKFORMODULE="@LINKFORMODULE@"

 # Scan for --help or unknown argument.
 for ARG in $*
@@ -88,15 +88,15 @@
         echo "$LIBS"
         ;;
     --ldflags)
-        LINKFORSHAREDUSED=
+        LINKFORMODULEUSED=
         if [ -z "$PYTHONFRAMEWORK" ] ; then
-            LINKFORSHAREDUSED=$LINKFORSHARED
+            LINKFORMODULEUSED=$LINKFORMODULE
         fi
         LIBPLUSED=
         if [ "$PY_ENABLE_SHARED" = "0" ] ; then
             LIBPLUSED="-L$LIBPL"
         fi
-        echo "$LIBPLUSED -L$libdir $LIBS $LINKFORSHAREDUSED"
+        echo "$LIBPLUSED -L$libdir $LIBS $LINKFORMODULEUSED"
         ;;
     --extension-suffix)
         echo "$SO"
diff -r dc3afbee4ad1 Misc/python.pc.in
--- a/Misc/python.pc.in Mon Jun 02 01:32:23 2014 -0700
+++ b/Misc/python.pc.in Mon Jun 02 19:57:54 2014 +0200
@@ -9,5 +9,5 @@
 Requires:
 Version: @VERSION@
 Libs.private: @LIBS@
-Libs: -L${libdir} -lpython@VERSION@@ABIFLAGS@
+Libs: -L${libdir} -lpython@VERSION@@ABIFLAGS@ @LINKFORSHARED@
 Cflags: -I${includedir}/python@VERSION@@ABIFLAGS@
diff -r dc3afbee4ad1 configure.ac
--- a/configure.ac      Mon Jun 02 01:32:23 2014 -0700
+++ b/configure.ac      Mon Jun 02 19:57:54 2014 +0200
@@ -1948,6 +1948,9 @@
 AC_SUBST(BLDSHARED)
 AC_SUBST(CCSHARED)
 AC_SUBST(LINKFORSHARED)
+AC_SUBST(BLINKFORSHARED)
+AC_SUBST(LINKFORMODULE) +AC_SUBST(BLINKFORMODULE) # SHLIB_SUFFIX is the extension of shared libraries `(including the dot!) # -- usually .so, .sl on HP-UX, .dll on Cygwin @@ -1975,8 +1978,8 @@ then case $ac_sys_system/$ac_sys_release in AIX*) - BLDSHARED="Modules/ld_so_aix \$(CC) -bI:Modules/python.exp" - LDSHARED="\$(BINLIBDEST)/config/ld_so_aix \$(CC) -bI:\$(BINLIBDEST)/config/python.exp" + BLDSHARED="Modules/ld_so_aix \$(CC) \$(BLINKFORMODULE)" + LDSHARED="\$(LIBPL)/ld_so_aix \$(CC) \$(LINKFORMODULE)" ;; IRIX/5*) LDSHARED="ld -shared";; IRIX*/6*) LDSHARED="ld ${SGI_ABI} -shared -all";; @@ -2136,13 +2139,21 @@ esac fi AC_MSG_RESULT($CCSHARED) -# LINKFORSHARED are the flags passed to the $(CC) command that links -# the python executable -- this is only needed for a few systems +# LINKFORSHARED are the flags passed to the $(CC) command that links an +# application using a python interpreter -- this is only needed for a few systems +# BLINKFORSHARED is for the python executable -- defaults to LINKFORSHARED +# LINKFORMODULE are the flags passed to the $(CC) command that links a +# modules to be imported by the python interpreter of such an application. +# BLINKFORMODULE is for modules built in this python's Modules/ directory. +# Use ${} here if necessary, as these end up in python-config.sh too. 
AC_MSG_CHECKING(LINKFORSHARED) if test -z "$LINKFORSHARED" then case $ac_sys_system/$ac_sys_release in - AIX*) LINKFORSHARED='-Wl,-bE:Modules/python.exp -lld';; + AIX*) BLINKFORSHARED='-Wl,-bE:Modules/python.exp -lld' + LINKFORSHARED='-Wl,-bE:${LIBPL}/python.exp -lld' + BLINKFORMODULE='-Wl,-bI:Modules/python.exp' + LINKFORMODULE='-Wl,-bI:${LIBPL}/python.exp';; hp*|HP*) LINKFORSHARED="-Wl,-E -Wl,+s";; # LINKFORSHARED="-Wl,-E -Wl,+s -Wl,+b\$(BINLIBDEST)/lib-dynload";; @@ -2193,6 +2204,9 @@ fi AC_MSG_RESULT($LINKFORSHARED) +test -n "${BLINKFORSHARED}" || BLINKFORSHARED="${LINKFORSHARED}" +test -n "${LINKFORMODULE}" || LINKFORMODULE="${LINKFORSHARED}" +test -n "${BLINKFORMODULE}" || BLINKFORMODULE="${LINKFORMODULE}" AC_SUBST(CFLAGSFORSHARED) AC_MSG_CHECKING(CFLAGSFORSHARED) From bcannon at gmail.com Mon Jun 2 20:28:40 2014 From: bcannon at gmail.com (Brett Cannon) Date: Mon, 02 Jun 2014 18:28:40 +0000 Subject: [Python-Dev] use cases for "python-config" versus "pkg-config python" References: <5385F7E7.9090408@ssi-schaefer.com> <538CBE43.7070303@ssi-schaefer.com> Message-ID: Patches sent to python-dev are typically ignored. Could you open an issue on bugs.python.org and upload it there? On Mon Jun 02 2014 at 2:20:43 PM, Michael Haubenwallner < michael.haubenwallner at ssi-schaefer.com> wrote: > Hi, > > following up myself with a patch proposal: > > On 05/28/2014 04:51 PM, Michael Haubenwallner wrote: > > Stumbling over problems on AIX (Modules/python.exp not found) building > libxml2 as python module > > let me wonder about the intended use-cases for 'python-config' and > 'pkg-config python'. 
> > > > FWIW, I can see these distinct use cases here, and I'm kindly asking if > I got them right: > > > > * Build an application containing a python interpreter (like python$EXE > itself): > > + link against libpython.so > > + re-export symbols from libpython.so for python-modules > (platform-specific) > > + This is similar to build against any other library, thus > > = 'python.pc' is installed (for 'pkg-config python'). > > > > * Build a python-module (like build/lib.-/*.so): > > + no need to link against libpython.so, instead > > + expect symbols from libpython.so to be available at runtime, > platform-specific either as > > + undefined symbols at build-time (Linux, others), or > > + a list of symbols to import from "the main executable" (AIX) > > + This is specific to python-modules, thus > > = 'python-config' is installed. > > > > Based on these use-cases, I'm on a trip towards a patch improving AIX > support here, > where the attached one is a draft against python-tip (next step is to have > python-config > not print $LIBS, but $LINKFORMODULE only). > > Thoughts? > > Thank you! > /haubi/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From doko at ubuntu.com Mon Jun 2 21:57:31 2014 From: doko at ubuntu.com (Matthias Klose) Date: Mon, 02 Jun 2014 21:57:31 +0200 Subject: [Python-Dev] use cases for "python-config" versus "pkg-config python" In-Reply-To: <538CBE43.7070303@ssi-schaefer.com> References: <5385F7E7.9090408@ssi-schaefer.com> <538CBE43.7070303@ssi-schaefer.com> Message-ID: <538CD72B.7030402@ubuntu.com> Am 02.06.2014 20:11, schrieb Michael Haubenwallner: > Hi, > > following up myself with a patch proposal: > > On 05/28/2014 04:51 PM, Michael Haubenwallner wrote: >> Stumbling over problems on AIX (Modules/python.exp not found) building libxml2 as python module >> let me wonder about the intended use-cases for 'python-config' and 'pkg-config python'. >> >> FWIW, I can see these distinct use cases here, and I'm kindly asking if I got them right: >> >> * Build an application containing a python interpreter (like python$EXE itself): >> + link against libpython.so >> + re-export symbols from libpython.so for python-modules (platform-specific) >> + This is similar to build against any other library, thus >> = 'python.pc' is installed (for 'pkg-config python'). >> >> * Build a python-module (like build/lib.-/*.so): >> + no need to link against libpython.so, instead >> + expect symbols from libpython.so to be available at runtime, platform-specific either as >> + undefined symbols at build-time (Linux, others), or >> + a list of symbols to import from "the main executable" (AIX) >> + This is specific to python-modules, thus >> = 'python-config' is installed. >> > > Based on these use-cases, I'm on a trip towards a patch improving AIX support here, > where the attached one is a draft against python-tip (next step is to have python-config > not print $LIBS, but $LINKFORMODULE only). > > Thoughts? there is http://bugs.python.org/issue15590 I think it is worth improving, together with adding documentation, and maybe distinguishing the two use cases linking for a module or an embedded interpreter. 
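One way to make that distinction concrete would be to ship two pkg-config files, one per use case; a purely hypothetical sketch (no such split existed in 3.4, and the file names and fields here are invented for illustration):

```
# hypothetical python-embed.pc: for applications embedding an interpreter
Name: Python (embedding)
Description: Embed the Python interpreter
Libs: -L${libdir} -lpython3.4m @LINKFORSHARED@
Cflags: -I${includedir}/python3.4m

# hypothetical python.pc: for building extension modules
Name: Python (extensions)
Description: Build Python extension modules
Cflags: -I${includedir}/python3.4m
```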
Matthias From sturla.molden at gmail.com Tue Jun 3 17:13:11 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 3 Jun 2014 15:13:11 +0000 (UTC) Subject: [Python-Dev] Should standard library modules optimize for CPython? References: <20140601081139.GO10355@ando> Message-ID: <1521177704423500642.020210sturla.molden-gmail.com@news.gmane.org> Stefan Behnel wrote: > Thus my proposal to compile the modules in CPython with Cython, rather than > duplicating their code or making/keeping them CPython specific. I think > reducing the urge to reimplement something in C is a good thing. For algorithmic and numerical code, Numba has already proven that Python can be JIT compiled comparable to -O2 in C. For non-algorithmic code, the speed determinants are usually outside Python (e.g. the network connection). Numba is becoming what the "dead swallow" should have been. The question is rather should the standard library use a JIT compiler like Numba? Cython is great for writing C extensions while avoiding all the details of the Python C API. But for speeding up algorithmic code, Numba is easier to use. Sturla From stefan_ml at behnel.de Tue Jun 3 19:00:16 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 03 Jun 2014 19:00:16 +0200 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: <1521177704423500642.020210sturla.molden-gmail.com@news.gmane.org> References: <20140601081139.GO10355@ando> <1521177704423500642.020210sturla.molden-gmail.com@news.gmane.org> Message-ID: Sturla Molden, 03.06.2014 17:13: > Stefan Behnel wrote: > >> Thus my proposal to compile the modules in CPython with Cython, rather than >> duplicating their code or making/keeping them CPython specific. I think >> reducing the urge to reimplement something in C is a good thing. > > For algorithmic and numerical code, Numba has already proven that Python > can be JIT compiled comparable to -O2 in C.
For non-algorithmic code, the > speed determinants are usually outside Python (e.g. the network > connection). Numba is becoming what the "dead swallow" should have been. > The question is rather should the standard library use a JIT compiler like > Numba? Cython is great for writing C extensions while avoiding all the > details of the Python C API. But for speeding up algorithmic code, Numba is > easier to use. I certainly agree that a JIT compiler can do much better optimisations on Python code than a static compiler, especially data driven optimisations. However, Numba comes with major dependencies, even runtime dependencies. From previous discussions on this list, I gathered that there are major objections against adding such a large dependency to CPython since it can also just be installed as an external package if users want to have it. Static compilation, on the other hand, is a build time thing that adds no dependencies that CPython doesn't have already. Distributions can even package up the compiled .so files separately from the original .py/.pyc files, if they feel like it, to make them selectively installable. So the argument in favour is mostly a pragmatic one. If you can have 2-5x faster code essentially for free, why not just go for it? Stefan From sturla.molden at gmail.com Tue Jun 3 22:51:30 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 3 Jun 2014 20:51:30 +0000 (UTC) Subject: [Python-Dev] Should standard library modules optimize for CPython? References: <20140601081139.GO10355@ando> <1521177704423500642.020210sturla.molden-gmail.com@news.gmane.org> Message-ID: <1437293580423521164.103866sturla.molden-gmail.com@news.gmane.org> Stefan Behnel wrote: > So the > argument in favour is mostly a pragmatic one. If you can have 2-5x faster > code essentially for free, why not just go for it? It would be easier if the GIL or Cython's use of it was redesigned. Cython just grabs the GIL and holds on to it until it is manually released.
The standard lib cannot have packages that hold the GIL forever, as a Cython compiled module would do. Cython has to start sharing access to the GIL like the interpreter does. Sturla From rosuav at gmail.com Tue Jun 3 23:38:00 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 07:38:00 +1000 Subject: [Python-Dev] %x formatting of floats - behaviour change since 3.4 Message-ID: I'm helping out with the micropython project and am finding that one of their tests fails on CPython 3.5 (fresh build from Mercurial this morning). It comes down to this: Python 3.4.1rc1 (default, May 5 2014, 14:28:34) [GCC 4.8.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> "%x"%16.0 '10' Python 3.5.0a0 (default:88814d1f8c32, Jun 4 2014, 07:29:32) [GCC 4.7.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> "%x"%16.0 Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: %x format: an integer is required, not float Is this an intentional change? And if so, is it formally documented somewhere? I don't recall seeing anything about it, but my recollection doesn't mean much. ChrisA From victor.stinner at gmail.com Wed Jun 4 00:03:07 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 4 Jun 2014 00:03:07 +0200 Subject: [Python-Dev] %x formatting of floats - behaviour change since 3.4 In-Reply-To: References: Message-ID: Hi, 2014-06-03 23:38 GMT+02:00 Chris Angelico : > Is this an intentional change? And if so, is it formally documented > somewhere? I don't recall seeing anything about it, but my > recollection doesn't mean much. Yes, it's intentional. See the issue for the rationale: http://bugs.python.org/issue19995 Victor From eric at trueblade.com Wed Jun 4 00:02:31 2014 From: eric at trueblade.com (Eric V.
Smith) Date: Tue, 03 Jun 2014 18:02:31 -0400 Subject: [Python-Dev] %x formatting of floats - behaviour change since 3.4 In-Reply-To: References: Message-ID: <538E45F7.6060209@trueblade.com> On 6/3/2014 5:38 PM, Chris Angelico wrote: > I'm helping out with the micropython project and am finding that one > of their tests fails on CPython 3.5 (fresh build from Mercurial this > morning). It comes down to this: > > Python 3.4.1rc1 (default, May 5 2014, 14:28:34) > [GCC 4.8.2] on linux > Type "help", "copyright", "credits" or "license" for more information. >>>> "%x"%16.0 > '10' > > Python 3.5.0a0 (default:88814d1f8c32, Jun 4 2014, 07:29:32) > [GCC 4.7.2] on linux > Type "help", "copyright", "credits" or "license" for more information. >>>> "%x"%16.0 > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: %x format: an integer is required, not float > > Is this an intentional change? And if so, is it formally documented > somewhere? I don't recall seeing anything about it, but my > recollection doesn't mean much. http://bugs.python.org/issue19995 From rosuav at gmail.com Wed Jun 4 00:05:58 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 08:05:58 +1000 Subject: [Python-Dev] %x formatting of floats - behaviour change since 3.4 In-Reply-To: References: Message-ID: On Wed, Jun 4, 2014 at 8:03 AM, Victor Stinner wrote: > 2014-06-03 23:38 GMT+02:00 Chris Angelico : >> Is this an intentional change? And if so, is it formally documented >> somewhere? I don't recall seeing anything about it, but my >> recollection doesn't mean much. > > Yes, it's intentional. See the issue for the rationale: > http://bugs.python.org/issue19995 Thanks! I'll fix (in this case, simply remove) the test and cite that issue.
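One way to keep such a test while staying version-agnostic is to convert integral floats explicitly before formatting; a hypothetical sketch (the helper name is invented here, not from micropython's test suite):

```python
def hex_format(value):
    # Hypothetical helper: make a %x-based test give the same output on
    # 3.4 and 3.5 by converting integral floats explicitly, since 3.5
    # no longer does the conversion implicitly (see issue 19995).
    if isinstance(value, float):
        if not value.is_integer():
            raise TypeError("%x needs an integral value")
        value = int(value)
    return "%x" % value

print(hex_format(16))     # same result on both 3.4 and 3.5
print(hex_format(16.0))   # works, instead of a TypeError on 3.5
```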
ChrisA From v+python at g.nevcal.com Wed Jun 4 00:26:00 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 03 Jun 2014 15:26:00 -0700 Subject: [Python-Dev] %x formatting of floats - behaviour change since 3.4 In-Reply-To: References: Message-ID: <538E4B78.4070503@g.nevcal.com> On 6/3/2014 3:05 PM, Chris Angelico wrote: > On Wed, Jun 4, 2014 at 8:03 AM, Victor Stinner wrote: >> 2014-06-03 23:38 GMT+02:00 Chris Angelico : >>> Is this an intentional change? And if so, is it formally documented >>> somewhere? I don't recall seeing anything about it, but my >>> recollection doesn't mean much. >> Yes, it's intentional. See the issue for the rationale: >> http://bugs.python.org/issue19995 > Thanks! I'll fix (in this case, simply remove) the test and cite that issue. Wouldn't it be better to keep the test, but expect the operation to fail? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Jun 4 00:41:30 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 08:41:30 +1000 Subject: [Python-Dev] %x formatting of floats - behaviour change since 3.4 In-Reply-To: <538E4B78.4070503@g.nevcal.com> References: <538E4B78.4070503@g.nevcal.com> Message-ID: On Wed, Jun 4, 2014 at 8:26 AM, Glenn Linderman wrote: > On 6/3/2014 3:05 PM, Chris Angelico wrote: > > On Wed, Jun 4, 2014 at 8:03 AM, Victor Stinner > wrote: > > 2014-06-03 23:38 GMT+02:00 Chris Angelico : > > Is this an intentional change? And if so, is it formally documented > somewhere? I don't recall seeing anything about it, but my > recollection doesn't mean much. > > Yes, it's intentional. See the issue for the rationale: > http://bugs.python.org/issue19995 > > Thanks! I'll fix (in this case, simply remove) the test and cite that issue. > > > Wouldn't it be better to keep the test, but expect the operation to fail? The way micropython does its tests is: Run CPython on a script, then run micropython on the same script. 
If the output differs, it's an error. The problem is, CPython 3.3 and CPython 3.5 give different output (one gives an exception, the other works as if int(x) had been given), so it's impossible for the test to be done right. My question was mainly to ascertain whether it's the tests or my system that needed fixing. ChrisA From steve at pearwood.info Wed Jun 4 03:01:43 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 4 Jun 2014 11:01:43 +1000 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: <20140601081139.GO10355@ando> References: <20140601081139.GO10355@ando> Message-ID: <20140604010143.GC10355@ando> On Sun, Jun 01, 2014 at 06:11:39PM +1000, Steven D'Aprano wrote: > I think I know the answer to this, but I'm going to ask it anyway... > > I know that there is a general policy of trying to write code in the > standard library that does not disadvantage other implementations. How > far does that go the other way? Should the standard library accept > slower code because it will be much faster in other implementations? [...] Thanks to everyone who replied! I just wanted to make a brief note to say that although I haven't been very chatty in this thread, I have been reading it, so thanks for the advice, it is appreciated. -- Steven From steve at pearwood.info Wed Jun 4 03:17:18 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 4 Jun 2014 11:17:18 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython Message-ID: <20140604011718.GD10355@ando> There is a discussion over at MicroPython about the internal representation of Unicode strings. Micropython is aimed at embedded devices, and so minimizing memory use is important, possibly even more important than performance. (I'm not speaking on their behalf, just commenting as an interested outsider.) At the moment, their Unicode support is patchy. 
They are talking about either: * Having a build-time option to restrict all strings to ASCII-only. (I think what they mean by that is that strings will be like Python 2 strings, ASCII-plus-arbitrary-bytes, not actually ASCII.) * Implementing Unicode internally as UTF-8, and giving up O(1) indexing operations. https://github.com/micropython/micropython/issues/657 Would either of these trade-offs be acceptable while still claiming "Python 3.4 compatibility"? My own feeling is that O(1) string indexing operations are a quality of implementation issue, not a deal breaker to call it a Python. I can't see any requirement in the docs that str[n] must take O(1) time, but perhaps I have missed something. -- Steven From donald at stufft.io Wed Jun 4 03:46:22 2014 From: donald at stufft.io (Donald Stufft) Date: Tue, 3 Jun 2014 21:46:22 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604011718.GD10355@ando> References: <20140604011718.GD10355@ando> Message-ID: <7B966E20-909B-4DC6-9DCC-2206A93763E9@stufft.io> I think UTF8 is the best option. > On Jun 3, 2014, at 9:17 PM, Steven D'Aprano wrote: > > There is a discussion over at MicroPython about the internal > representation of Unicode strings. Micropython is aimed at embedded > devices, and so minimizing memory use is important, possibly even > more important than performance. > > (I'm not speaking on their behalf, just commenting as an interested > outsider.) > > At the moment, their Unicode support is patchy. They are talking about > either: > > * Having a build-time option to restrict all strings to ASCII-only. > > (I think what they mean by that is that strings will be like Python 2 > strings, ASCII-plus-arbitrary-bytes, not actually ASCII.) > > * Implementing Unicode internally as UTF-8, and giving up O(1) > indexing operations. 
> > https://github.com/micropython/micropython/issues/657 > > > Would either of these trade-offs be acceptable while still claiming > "Python 3.4 compatibility"? > > My own feeling is that O(1) string indexing operations are a quality of > implementation issue, not a deal breaker to call it a Python. I can't > see any requirement in the docs that str[n] must take O(1) time, but > perhaps I have missed something. > > > > > -- > Steven > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io From rosuav at gmail.com Wed Jun 4 04:32:12 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 12:32:12 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604011718.GD10355@ando> References: <20140604011718.GD10355@ando> Message-ID: On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano wrote: > * Having a build-time option to restrict all strings to ASCII-only. > > (I think what they mean by that is that strings will be like Python 2 > strings, ASCII-plus-arbitrary-bytes, not actually ASCII.) What I was actually suggesting along those lines was that the str type still be notionally a Unicode string, but that any codepoints >127 would either raise an exception or blow an assertion, and all the code to handle multibyte representations would be compiled out. So there'd still be a difference between strings of text and streams of bytes, but all encoding and decoding to/from ASCII-compatible encodings would just point to the same bytes in RAM. Risk: Someone would implement that with assertions, then compile with assertions disabled, test only with ASCII, and have lurking bugs. 
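For reference, the linear scan that makes indexing O(N) under the UTF-8 option can be sketched in a few lines of Python (the helper name is invented, and a real implementation would be C, but the boundary-counting logic is the same):

```python
def utf8_codepoint_at(data, index):
    # Count lead bytes (anything except a 0b10xxxxxx continuation byte)
    # until the wanted code point is reached; this linear walk is the
    # O(N) cost of indexing a UTF-8 representation.
    seen = -1
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:          # lead byte: starts a code point
            seen += 1
            if seen == index:
                end = pos + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError("string index out of range")

buf = "héllo".encode("utf-8")            # 6 bytes, but only 5 code points
print(utf8_codepoint_at(buf, 1))         # é
```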
ChrisA From ncoghlan at gmail.com Wed Jun 4 07:17:00 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 4 Jun 2014 15:17:00 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604011718.GD10355@ando> References: <20140604011718.GD10355@ando> Message-ID: On 4 June 2014 11:17, Steven D'Aprano wrote: > My own feeling is that O(1) string indexing operations are a quality of > implementation issue, not a deal breaker to call it a Python. If string indexing & iteration is still presented to the user as "an array of code points", it should still avoid the bugs that plagued both Python 2 narrow builds and direct use of UTF-8 encoded Py2 strings. If they don't try to offer C API compatibility, it should be feasible to do it that way. If they *do* try to offer C API compatibility, they may have a problem. > I can't > see any requirement in the docs that str[n] must take O(1) time, but > perhaps I have missed something. There's a general expectation that indexing will be O(1) because all the builtin containers that support that syntax use it for O(1) lookup operations. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Jun 4 07:23:07 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 3 Jun 2014 22:23:07 -0700 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: On Tue, Jun 3, 2014 at 7:32 PM, Chris Angelico wrote: > On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano > wrote: > > * Having a build-time option to restrict all strings to ASCII-only. > > > > (I think what they mean by that is that strings will be like Python 2 > > strings, ASCII-plus-arbitrary-bytes, not actually ASCII.) 
> > What I was actually suggesting along those lines was that the str type > still be notionally a Unicode string, but that any codepoints >127 > would either raise an exception or blow an assertion, and all the code > to handle multibyte representations would be compiled out. That would be a pretty lousy option. So there'd > still be a difference between strings of text and streams of bytes, > but all encoding and decoding to/from ASCII-compatible encodings would > just point to the same bytes in RAM. > I suppose this is why you propose to reject 128-255? > Risk: Someone would implement that with assertions, then compile with > assertions disabled, test only with ASCII, and have lurking bugs. > Never mind disabling assertions -- even with enabled assertions you'd have to expect most Python programs to fail with non-ASCII input. Then again the UTF-8 option would be pretty devastating too for anything manipulating strings (especially since many Python APIs are defined using indexes, e.g. the re module). Why not support variable-width strings like CPython 3.4? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Jun 4 08:51:12 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 16:51:12 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: On Wed, Jun 4, 2014 at 3:17 PM, Nick Coghlan wrote: > On 4 June 2014 11:17, Steven D'Aprano wrote: >> My own feeling is that O(1) string indexing operations are a quality of >> implementation issue, not a deal breaker to call it a Python. > > If string indexing & iteration is still presented to the user as "an > array of code points", it should still avoid the bugs that plagued > both Python 2 narrow builds and direct use of UTF-8 encoded Py2 > strings. It would. 
The downsides of a UTF-8 representation would be slower iteration and much slower (O(N)) indexing/slicing. ChrisA From martin at v.loewis.de Wed Jun 4 09:02:13 2014 From: martin at v.loewis.de (martin at v.loewis.de) Date: Wed, 04 Jun 2014 09:02:13 +0200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604011718.GD10355@ando> References: <20140604011718.GD10355@ando> Message-ID: <20140604090213.Horde.iGDQDjno1ZQixW4P-6T4Mw1@webmail.df.eu> Zitat von Steven D'Aprano : > * Having a build-time option to restrict all strings to ASCII-only. > > (I think what they mean by that is that strings will be like Python 2 > strings, ASCII-plus-arbitrary-bytes, not actually ASCII.) An ASCII-plus-arbitrary-bytes type called "str" would prevent claiming "Python 3.4 compatibility" for sure. Restricting strings to ASCII (as Chris apparently actually suggested) would allow to claim compatibility with a stretch: existing Python code might not run on such an implementation. However, since a lot of existing Python code wouldn't run on MicroPython, anyway, one might claim to implement a Python 3.4 subset. > * Implementing Unicode internally as UTF-8, and giving up O(1) > indexing operations. > > Would either of these trade-offs be acceptable while still claiming > "Python 3.4 compatibility"? > > My own feeling is that O(1) string indexing operations are a quality of > implementation issue, not a deal breaker to call it a Python. I can't > see any requirement in the docs that str[n] must take O(1) time, but > perhaps I have missed something. I agree. It's an open question whether such an implementation would be practical, both in terms of existing Python code, and in terms of existing C extension modules that people might want to port to MicroPython. There are more things to consider for the internal implementation, in particular how the string length is implemented. Several alternatives exist: 1. store the UTF-8 length (i.e. memory size) 2. 
store the number of code points (i.e. Python len()) 3. store both 4. store neither, but use null termination instead Variant 3 is most run-time efficient, but could easily use 8 bytes just for the length, which could outweigh the storage of the actual data. Variants 1 and 2 lose on some operations (1 loses on computing len(), 2 loses on string concatenation). 4 would add the restriction of not allowing U+0000 in a string (which would be reasonable IMO), and make all length computations inefficient. However, it wouldn't be worse than standard C. Regards, Martin From rosuav at gmail.com Wed Jun 4 09:03:22 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 17:03:22 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: On Wed, Jun 4, 2014 at 3:23 PM, Guido van Rossum wrote: > On Tue, Jun 3, 2014 at 7:32 PM, Chris Angelico wrote: >> >> On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano >> wrote: >> > * Having a build-time option to restrict all strings to ASCII-only. >> > >> > (I think what they mean by that is that strings will be like Python 2 >> > strings, ASCII-plus-arbitrary-bytes, not actually ASCII.) >> >> What I was actually suggesting along those lines was that the str type >> still be notionally a Unicode string, but that any codepoints >127 >> would either raise an exception or blow an assertion, and all the code >> to handle multibyte representations would be compiled out. > > > That would be a pretty lousy option. > >> So there'd >> still be a difference between strings of text and streams of bytes, >> but all encoding and decoding to/from ASCII-compatible encodings would >> just point to the same bytes in RAM. > > I suppose this is why you propose to reject 128-255? Correct.
It would allow small devices to guarantee that strings are compact (MicroPython is aimed primarily at an embedded controller), guarantee identity transformations in several common encodings (and maybe this sort of build wouldn't ship with any non-ASCII-compat encodings at all), and never demonstrate behaviour different from CPython's except by explicitly failing. >> Risk: Someone would implement that with assertions, then compile with >> assertions disabled, test only with ASCII, and have lurking bugs. > > > Never mind disabling assertions -- even with enabled assertions you'd have > to expect most Python programs to fail with non-ASCII input. Right, which is why I don't like the idea. But you don't need non-ASCII characters to blink an LED or turn a servo, and there is significant resistance to the notion that appending a non-ASCII character to a long ASCII-only string requires the whole string to be copied and doubled in size (lots of heap space used). > Then again the UTF-8 option would be pretty devastating too for anything > manipulating strings (especially since many Python APIs are defined using > indexes, e.g. the re module). That's what I thought, too, but a quick poll on python-list suggests that indexing isn't nearly as common as I had thought it to be. On a smallish device, you won't have megabytes of string to index, so even O(N) indexing can't get pathological. (This would be an acknowledged limitation of micropython as a Unix Python - "it's designed for small programs, and it's performance-optimized for small programs, so it might get pathologically slow on certain large data manipulations".) > Why not support variable-width strings like CPython 3.4? That was my first recommendation, and in fact I started writing code to implement parts of PEP 393, with a view to basically doing it the same way in both Pythons. 
But discussion on the tracker issue showed a certain amount of hostility toward the potential expansion of strings, particularly in the worst-case example of appending a single SMP character onto a long ASCII string. ChrisA From rosuav at gmail.com Wed Jun 4 09:06:25 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 17:06:25 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604090213.Horde.iGDQDjno1ZQixW4P-6T4Mw1@webmail.df.eu> References: <20140604011718.GD10355@ando> <20140604090213.Horde.iGDQDjno1ZQixW4P-6T4Mw1@webmail.df.eu> Message-ID: On Wed, Jun 4, 2014 at 5:02 PM, wrote: > There are more things to consider for the internal implementation, > in particular how the string length is implemented. Several alternatives > exist: > 1. store the UTF-8 length (i.e. memory size) > 2. store the number of code points (i.e. Python len()) > 3. store both > 4. store neither, but use null termination instead > > Variant 3 is most run-time efficient, but could easily use 8 bytes > just for the length, which could outweigh the storage of the actual > data. Variants 1 and 2 lose on some operations (1 loses on computing > len(), 2 loses on string concatenation). 3 would add the restriction > of not allowing U+0000 in a string (which would be reasonable IMO), > and make all length computations inefficient. However, it wouldn't > be worse than standard C. The current implementation stores a 16-bit length, which is both the memory size and the len(). As far as I can see, the memory size is never needed, so I'd just go for option 2; string concatenation is already known to be one of those operations that can be slow if you do it badly, and an optimized str.join() would cover the recommended use-case. 
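To make the trade-off in "option 2" concrete, here is a rough sketch (not MicroPython's actual code) of string operations over raw UTF-8 bytes: len() comes from a stored code-point count, while indexing has to scan for lead bytes, giving the O(N) behaviour discussed above.

```python
# Sketch only: in UTF-8, a code point starts at every byte that is NOT a
# continuation byte (continuation bytes match the bit pattern 10xxxxxx).

def utf8_len(data: bytes) -> int:
    # "Option 2": the Python-level len(), computable in one O(N) pass
    # at construction time (then stored, so len() itself stays O(1)).
    return sum(1 for b in data if b & 0xC0 != 0x80)

def utf8_index(data: bytes, index: int) -> str:
    # O(N) indexing: walk the bytes to find where character `index` starts.
    starts = [i for i, b in enumerate(data) if b & 0xC0 != 0x80]
    if index < 0:
        index += len(starts)
    if not 0 <= index < len(starts):
        raise IndexError("string index out of range")
    end = starts[index + 1] if index + 1 < len(starts) else len(data)
    return data[starts[index]:end].decode("utf-8")

s = "héllo ωorld"
data = s.encode("utf-8")
assert utf8_len(data) == len(s)      # 11 code points in 13 bytes
assert utf8_index(data, 6) == s[6]   # 'ω'
```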
ChrisA From dw+python-dev at hmmz.org Wed Jun 4 07:39:04 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Wed, 4 Jun 2014 05:39:04 +0000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: <20140604053904.GA5309@k2> On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote: > There's a general expectation that indexing will be O(1) because all > the builtin containers that support that syntax use it for O(1) lookup > operations. Depending on your definition of built in, there is at least one standard library container that does not - collections.deque. Given the specialized kinds of application this Python implementation is targeted at, it seems UTF-8 is ideal considering the huge memory savings resulting from the compressed representation, and the reduced likelihood of there being any real need for serious text processing on the device. It is also unlikely that software or libraries like Django or Werkzeug would run on a microcontroller; more likely all the Python code would be custom, in which case replacing string indexing with iteration, or temporary conversion to a list, is easily done. In this context, while a fixed-width encoding may be the "correct" choice, it would also likely be the wrong choice. David From ja.py at farowl.co.uk Wed Jun 4 09:41:12 2014 From: ja.py at farowl.co.uk (Jeff Allen) Date: Wed, 04 Jun 2014 08:41:12 +0100 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604011718.GD10355@ando> References: <20140604011718.GD10355@ando> Message-ID: <538ECD98.5030309@farowl.co.uk> Jython uses UTF-16 internally -- probably the only sensible choice in a Python that can call Java. Indexing is O(N), fundamentally. By "fundamentally", I mean for those strings that have not yet noticed that they contain no supplementary (>0xffff) characters. I've toyed with making this O(1) universally.
Like Steven, I understand this to be a freedom afforded to implementers, rather than an issue of conformity. Jeff Allen On 04/06/2014 02:17, Steven D'Aprano wrote: > There is a discussion over at MicroPython about the internal > representation of Unicode strings. ... > My own feeling is that O(1) string indexing operations are a quality of > implementation issue, not a deal breaker to call it a Python. I can't > see any requirement in the docs that str[n] must take O(1) time, but > perhaps I have missed something. > From stephen at xemacs.org Wed Jun 4 11:36:20 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 04 Jun 2014 18:36:20 +0900 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604053904.GA5309@k2> References: <20140604011718.GD10355@ando> <20140604053904.GA5309@k2> Message-ID: <87bnu93tkb.fsf@uwakimon.sk.tsukuba.ac.jp> dw+python-dev at hmmz.org writes: > Given the specialized kinds of application this Python > implementation is targetted at, it seems UTF-8 is ideal considering > the huge memory savings resulting from the compressed > representation, I think you really need to check what the applications are in detail. UTF-8 costs about 35% more storage for Japanese, and even more for Chinese, than does UTF-16. So if you might be using a lot of Asian localized strings, it might even be worth implementing PEP-393 to get the best of both worlds for most strings. From juraj.sukop at gmail.com Wed Jun 4 11:53:43 2014 From: juraj.sukop at gmail.com (Juraj Sukop) Date: Wed, 4 Jun 2014 11:53:43 +0200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <87bnu93tkb.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140604011718.GD10355@ando> <20140604053904.GA5309@k2> <87bnu93tkb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jun 4, 2014 at 11:36 AM, Stephen J. Turnbull wrote: > > I think you really need to check what the applications are in detail. 
> UTF-8 costs about 35% more storage for Japanese, and even more for > Chinese, than does UTF-16. "UTF-8 can be smaller even for Asian languages, e.g.: front page of Wikipedia Japan: 83 kB in UTF-8, 144 kB in UTF-16" From http://www.lua.org/wshop12/Ierusalimschy.pdf (p. 12) From dholth at gmail.com Wed Jun 4 12:41:05 2014 From: dholth at gmail.com (Daniel Holth) Date: Wed, 4 Jun 2014 06:41:05 -0400 Subject: [Python-Dev] Some notes about MicroPython from an observer Message-ID: - micropython is designed to run on a machine with 192 kilobytes of RAM and perhaps a megabyte of FLASH. The controller can execute read-only code directly from FLASH. There is no dynamic linker in this environment. (It also has a UNIX port). - However it does include a full Python parser and REPL, so the board can be programmed without a separate computer as opposed to, say, having to upload bytecode compiled on a regular computer. - It's definitely going to be a subset of Python. For example, func.__name__ is not supported - to make it more micro? - They have a C API. It is much different than the CPython C API. - It has more than one code emitter. A certain decorator causes a function to be compiled to ARM Thumb code instead of bytecode. - It even has an inline assembler that translates Python-syntax ARM assembly (to re-use the same parser) into machine code.
Most information from https://www.kickstarter.com/projects/214379695/micro-python-python-for-microcontrollers/posts and http://micropython.org/ From rosuav at gmail.com Wed Jun 4 12:51:36 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 20:51:36 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604133857.13a0f0b9@x34f> References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> Message-ID: On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky wrote: > That's another reason why people don't like Unicode enforced upon them > - all the talk about supporting all languages and scripts is demagogy > and hypocrisy, given a choice, Unicode zealots would rather limit > people to Latin script then give up on their arbitrarily chosen, > one-among-thousands, > soon-to-be-replaced-by-apples'-and-microsofts'-"exciting-new" encoding. Wrong. I use and recommend Unicode, with UTF-8 for transmission, and I do not ever want to limit people to Latin-1 or any other such subset. Even though English is the only language I speak, I am *frequently* using non-ASCII characters (eg when I discuss mathematics on a MUD), and if I could be absolutely sure that everyone in the conversation correctly comprehended Unicode, I could do this with a lot more confidence. Unfortunately, the server I use just passes bytes in and out, and some clients assume CP-1252, others assume Latin-1, and others (including my Gypsum) try UTF-8 first and fall back on an eight-bit encoding (currently CP-1252 because of the first group). But in an ideal world, server and clients would all speak Unicode everywhere, and transmit and receive UTF-8. This is not hypocrisy, this is the way to work reliably. > Once again, my claim is what MicroPython implements now is more correct > - in a sense wider than technical - handling. We don't provide Unicode > encoding support, because it's highly bloated, but let people use any > encoding they like. 
That comes at some price, like length of strings in > characters are not know to runtime, only in bytes, but quite a lot of > applications can be written by having just that. The current implementation is flat-out lying, actually. It claims that it's storing Unicode codepoints (as per the Python spec) while actually storing bytes, and then it transmits those bytes to the console etc as-is. This is a bug. It needs to be fixed. The only question is, what form will the fix take? Will it be PEP 393's flexible fixed-width representation? UTF-8? UTF-16 (I hope not!)? A hybrid of Latin-1 where possible and UTF-8 otherwise? But something has to be done. ChrisA From rosuav at gmail.com Wed Jun 4 12:53:46 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 20:53:46 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604133857.13a0f0b9@x34f> References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> Message-ID: On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky wrote: > And I'm saying that not to discourage Unicode addition to MicroPython, > but to hint that "force-force" approach implemented by CPython3 and > causing rage and split in the community is not appreciated. FWIW, it's Python 3 (the language) and not CPython 3.x (the implementation) that specifies Unicode strings in this way. I don't know why it has to cause a split in the community; this is the one way to make sure *everyone's* strings work perfectly, rather than having ASCII strings work fine and others start tripping over problems in various APIs. 
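The mismatch Chris describes is easy to demonstrate from the CPython side: a str that merely wraps its raw bytes stops agreeing with Unicode semantics at the first non-ASCII character.

```python
# Length and indexing diverge between code points and raw bytes as soon
# as a multi-byte character appears.
text = "naïve"
raw = text.encode("utf-8")

assert len(text) == 5       # 5 code points
assert len(raw) == 6        # 6 bytes: 'ï' encodes as 0xC3 0xAF
assert text[2] == "ï"       # indexing yields the character...
assert raw[2] == 0xC3       # ...or half of its byte sequence
```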
ChrisA From pmiscml at gmail.com Wed Jun 4 12:38:57 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 13:38:57 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: <20140604133857.13a0f0b9@x34f> Hello, On Wed, 4 Jun 2014 12:32:12 +1000 Chris Angelico wrote: > On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano > wrote: > > * Having a build-time option to restrict all strings to ASCII-only. > > > > (I think what they mean by that is that strings will be like > > Python 2 strings, ASCII-plus-arbitrary-bytes, not actually ASCII.) > > What I was actually suggesting along those lines was that the str type > still be notionally a Unicode string, but that any codepoints >127 > would either raise an exception or blow an assertion, That's another reason why people don't like Unicode enforced upon them - all the talk about supporting all languages and scripts is demagogy and hypocrisy, given a choice, Unicode zealots would rather limit people to Latin script then give up on their arbitrarily chosen, one-among-thousands, soon-to-be-replaced-by-apples'-and-microsofts'-"exciting-new" encoding. Once again, my claim is what MicroPython implements now is more correct - in a sense wider than technical - handling. We don't provide Unicode encoding support, because it's highly bloated, but let people use any encoding they like. That comes at some price, like length of strings in characters are not know to runtime, only in bytes, but quite a lot of applications can be written by having just that. And I'm saying that not to discourage Unicode addition to MicroPython, but to hint that "force-force" approach implemented by CPython3 and causing rage and split in the community is not appreciated. > and all the code > to handle multibyte representations would be compiled out. 
So there'd > still be a difference between strings of text and streams of bytes, > but all encoding and decoding to/from ASCII-compatible encodings would > just point to the same bytes in RAM. > > Risk: Someone would implement that with assertions, then compile with > assertions disabled, test only with ASCII, and have lurking bugs. > > ChrisA -- Best regards, Paul mailto:pmiscml at gmail.com From pmiscml at gmail.com Wed Jun 4 12:53:14 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 13:53:14 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: <20140604135314.4bb31d75@x34f> Hello, On Tue, 3 Jun 2014 22:23:07 -0700 Guido van Rossum wrote: [] > Never mind disabling assertions -- even with enabled assertions you'd > have to expect most Python programs to fail with non-ASCII input. > > Then again the UTF-8 option would be pretty devastating too for > anything manipulating strings (especially since many Python APIs are > defined using indexes, e.g. the re module). If the Unicode is slow (*), then obvious choice is not using Unicode when not needed. Too bad that's a bit hard in Python3, as it enforces Unicode everywhere, and dealing with efficient strings requires prefixing them with funny characters like "b", etc. * If Unicode if slow because it causes heap to bloat and go swap, the choice is still the same. > > Why not support variable-width strings like CPython 3.4? Because, like good deal of community, we hope that Python4 will get back to reality, and strings will be efficient (both for processing and storage) by default, and niche and marginal "Unicode string" type will be used explicitly (using funny prefixes, etc.), only when really needed. Ah, all these not so funny geek jokes about internals of language implementation, hope they didn't make somebody's day dull! 
> > -- > --Guido van Rossum (python.org/~guido) -- Best regards, Paul mailto:pmiscml at gmail.com From pmiscml at gmail.com Wed Jun 4 13:12:31 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 14:12:31 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: <20140604141231.3cdd4fdd@x34f> Hello, On Wed, 4 Jun 2014 17:03:22 +1000 Chris Angelico wrote: [] > > Why not support variable-width strings like CPython 3.4? > > That was my first recommendation, and in fact I started writing code > to implement parts of PEP 393, with a view to basically doing it the > same way in both Pythons. But discussion on the tracker issue showed a > certain amount of hostility toward the potential expansion of strings, > particularly in the worst-case example of appending a single SMP > character onto a long ASCII string. An alternative view is that the discussion on the tracker showed Python developers' mind-fixation on implementing something the way CPython does it. And I didn't yet go to that argument, but in the end, MicroPython does not try to rewrite CPython or compete with it. So, having few choices with pros and cons leading approximately to the tie among them, it's the least productive to make the same choice as CPython did. Even having "rule of thumb" of choosing not-a-CPython way would be more productive than having the same rule of thumb for blindly choosing CPython way. (Of course, actually it should be technical discussion based on the target requirements, like we hopefully did, with strong arguments against using something else but the de-facto standard transfer encoding for Unicode). 
> > ChrisA -- Best regards, Paul mailto:pmiscml at gmail.com From rosuav at gmail.com Wed Jun 4 13:17:12 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 4 Jun 2014 21:17:12 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604141231.3cdd4fdd@x34f> References: <20140604011718.GD10355@ando> <20140604141231.3cdd4fdd@x34f> Message-ID: On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky wrote: > An alternative view is that the discussion on the tracker showed Python > developers' mind-fixation on implementing something the way CPython does > it. And I didn't yet go to that argument, but in the end, MicroPython > does not try to rewrite CPython or compete with it. So, having few > choices with pros and cons leading approximately to the tie among them, > it's the least productive to make the same choice as CPython did. I'm not a CPython dev, nor a Python dev, and I don't think any of the big names of CPython or Python has showed up on that tracker as yet. But why is "be different from CPython" such a valuable choice? CPython works. It's had many hours of dev time put into it. Problems have been identified and avoided. Throwing that out means throwing away a freely-given shoulder to stand on, in an Isaac Newton way. http://www.joelonsoftware.com/articles/fog0000000069.html ChrisA From dholth at gmail.com Wed Jun 4 13:35:28 2014 From: dholth at gmail.com (Daniel Holth) Date: Wed, 4 Jun 2014 07:35:28 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604141231.3cdd4fdd@x34f> Message-ID: Can of worms, opened. On Jun 4, 2014 7:20 AM, "Chris Angelico" wrote: > On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky wrote: > > An alternative view is that the discussion on the tracker showed Python > > developers' mind-fixation on implementing something the way CPython does > > it. 
And I didn't yet go to that argument, but in the end, MicroPython > > does not try to rewrite CPython or compete with it. So, having few > > choices with pros and cons leading approximately to the tie among them, > > it's the least productive to make the same choice as CPython did. > > I'm not a CPython dev, nor a Python dev, and I don't think any of the > big names of CPython or Python has showed up on that tracker as yet. > But why is "be different from CPython" such a valuable choice? CPython > works. It's had many hours of dev time put into it. Problems have been > identified and avoided. Throwing that out means throwing away a > freely-given shoulder to stand on, in an Isaac Newton way. > > http://www.joelonsoftware.com/articles/fog0000000069.html > > ChrisA > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kristjan at ccpgames.com Wed Jun 4 13:15:34 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Wed, 4 Jun 2014 11:15:34 +0000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <7B966E20-909B-4DC6-9DCC-2206A93763E9@stufft.io> References: <20140604011718.GD10355@ando> <7B966E20-909B-4DC6-9DCC-2206A93763E9@stufft.io> Message-ID: For those that haven't seen this: http://www.utf8everywhere.org/ > -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Donald Stufft > Sent: 4. j?n? 2014 01:46 > To: Steven D'Aprano > Cc: python-dev at python.org > Subject: Re: [Python-Dev] Internal representation of strings and > Micropython > > I think UTF8 is the best option. 
> From pmiscml at gmail.com Wed Jun 4 13:49:33 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 14:49:33 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> Message-ID: <20140604144933.66e6c2f4@x34f> Hello, On Wed, 4 Jun 2014 20:53:46 +1000 Chris Angelico wrote: > On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky > wrote: > > And I'm saying that not to discourage Unicode addition to > > MicroPython, but to hint that "force-force" approach implemented by > > CPython3 and causing rage and split in the community is not > > appreciated. > > FWIW, it's Python 3 (the language) and not CPython 3.x (the > implementation) that specifies Unicode strings in this way. Yeah, but it's CPython what dictates how language evolves (some people even think that it dictates how language should be implemented!), so all good parts belong to Python3, and all bad parts - to CPython3, right? ;-) > I don't > know why it has to cause a split in the community; this is the one way > to make sure *everyone's* strings work perfectly, rather than having > ASCII strings work fine and others start tripping over problems in > various APIs. It did cause split in the community, that's the fact, that's why Python2 and Python3 are at the respective positions. Anyway, I'm not interested in participating in that split, I did not yet uttered my opinion on that publicly enough, so I seized a chance to drop some witty remarks, but I don't want to start yet another Unicode flame. So, let's please be back to Unicode storage representation in MicroPython. So, https://github.com/micropython/micropython/issues/657 discussed technical aspects, in a recent mail on this list I expressed my opinion why following CPython way is not productive (for development satisfaction and evolution of Python community, to be explicit). 
Final argument I would have is that you certainly can implement Unicode support the PEP393 way - it would be enormous help and would be gladly accepted. The question, how useful it will be for MicroPython. It certainly will be useful to report passing of testsuites. But will it be *really* used? For microcontroller board, it might be too heavy (put simple, with it, people will be able to do less (== heap running out sooner)), than without it, so one may expect it to be disabled by default. Then POSIX port is there surely not to let people replace "python" command with "micropython" and run Django, but to let people develop and debug their apps with more comfort than on embedded board. So, it should behave close to MCU version, and would follow with MCU choice re: Unicode. That's actually the reason why I keep up this discussion - not for the sake of argument or to bash Python3's Unicode choices. With recent MicroPython announcement, we surely looked for more people to contribute to its development. But then we (or at least I can speak for myself), would like to make sure that these contribution are actually the most useful ones (for both MicroPython, and Python community in general, which gets more choices, rather than just getting N% smaller CPython rewrite). So, you're not sure how O(N) string indexing will work? But MicroPython offers a great opportunity to try! And it's something new and exciting, which surely will be useful (== will save people memory), not just something old and boring ;-). 
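For a rough, implementation-neutral sense of the memory argument, one can compare plain byte counts of the candidate representations (the fixed-width column below mimics PEP 393's rule of sizing the whole string for its widest character; allocator and header overhead is ignored):

```python
# Byte counts only; no claims about any particular implementation.
samples = {
    "ascii":    "blink the LED",
    "latin-1":  "naïve café",
    "japanese": "こんにちは世界",
    "smp":      "a long ascii string plus \U0001F40D",  # snake emoji
}

for name, s in samples.items():
    utf8 = len(s.encode("utf-8"))
    utf16 = len(s.encode("utf-16-le"))
    widest = max(map(ord, s))
    # PEP 393: 1, 2, or 4 bytes per code point for the *whole* string.
    fixed = len(s) * (1 if widest < 0x100 else 2 if widest < 0x10000 else 4)
    print(f"{name:9} utf-8={utf8:3}  utf-16={utf16:3}  fixed={fixed:3}")
```

Note how the Japanese sample reproduces Stephen Turnbull's point (UTF-8 costs roughly half again as much as UTF-16 there), while the SMP sample reproduces Chris's worst case (one astral character quadruples the fixed-width size of an otherwise-ASCII string).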
> > ChrisA -- Best regards, Paul mailto:pmiscml at gmail.com From dholth at gmail.com Wed Jun 4 14:17:16 2014 From: dholth at gmail.com (Daniel Holth) Date: Wed, 4 Jun 2014 08:17:16 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604144933.66e6c2f4@x34f> References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> <20140604144933.66e6c2f4@x34f> Message-ID: If we're voting I think representing Unicode internally in micropython as utf-8 with O(N) indexing is a great idea, partly because I'm not sure indexing into strings is a good idea - lots of Unicode code points don't make sense by themselves; see also grapheme clusters. It would probably work great. On Wed, Jun 4, 2014 at 7:49 AM, Paul Sokolovsky wrote: > Hello, > > On Wed, 4 Jun 2014 20:53:46 +1000 > Chris Angelico wrote: > >> On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky >> wrote: >> > And I'm saying that not to discourage Unicode addition to >> > MicroPython, but to hint that "force-force" approach implemented by >> > CPython3 and causing rage and split in the community is not >> > appreciated. >> >> FWIW, it's Python 3 (the language) and not CPython 3.x (the >> implementation) that specifies Unicode strings in this way. > > Yeah, but it's CPython what dictates how language evolves (some people > even think that it dictates how language should be implemented!), so all > good parts belong to Python3, and all bad parts - to CPython3, > right? ;-) > >> I don't >> know why it has to cause a split in the community; this is the one way >> to make sure *everyone's* strings work perfectly, rather than having >> ASCII strings work fine and others start tripping over problems in >> various APIs. > > It did cause split in the community, that's the fact, that's why > Python2 and Python3 are at the respective positions. 
Anyway, I'm not > interested in participating in that split, I did not yet uttered my > opinion on that publicly enough, so I seized a chance to drop some > witty remarks, but I don't want to start yet another Unicode flame. > > > > So, let's please be back to Unicode storage representation in > MicroPython. So, https://github.com/micropython/micropython/issues/657 > discussed technical aspects, in a recent mail on this list I expressed > my opinion why following CPython way is not productive (for development > satisfaction and evolution of Python community, to be explicit). > > Final argument I would have is that you certainly can implement Unicode > support the PEP393 way - it would be enormous help and would be gladly > accepted. The question, how useful it will be for MicroPython. It > certainly will be useful to report passing of testsuites. But will it > be *really* used? > > For microcontroller board, it might be too heavy (put simple, with it, > people will be able to do less (== heap running out sooner)), than > without it, so one may expect it to be disabled by default. Then POSIX > port is there surely not to let people replace "python" command > with "micropython" and run Django, but to let people develop and debug > their apps with more comfort than on embedded board. So, it should > behave close to MCU version, and would follow with MCU choice > re: Unicode. > > That's actually the reason why I keep up this discussion - not for the > sake of argument or to bash Python3's Unicode choices. With recent > MicroPython announcement, we surely looked for more people to > contribute to its development. But then we (or at least I can speak for > myself), would like to make sure that these contribution are actually > the most useful ones (for both MicroPython, and Python community in > general, which gets more choices, rather than just getting N% smaller > CPython rewrite). > > So, you're not sure how O(N) string indexing will work? 
But MicroPython > offers a great opportunity to try! And it's something new and exciting, > which surely will be useful (== will save people memory), not just > something old and boring ;-). > > >> >> ChrisA > > > -- > Best regards, > Paul mailto:pmiscml at gmail.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com From pmiscml at gmail.com Wed Jun 4 14:18:01 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 15:18:01 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604141231.3cdd4fdd@x34f> Message-ID: <20140604151801.4a08d40d@x34f> Hello, On Wed, 4 Jun 2014 21:17:12 +1000 Chris Angelico wrote: > On Wed, Jun 4, 2014 at 9:12 PM, Paul Sokolovsky > wrote: > > An alternative view is that the discussion on the tracker showed > > Python developers' mind-fixation on implementing something the way > > CPython does it. And I didn't yet go to that argument, but in the > > end, MicroPython does not try to rewrite CPython or compete with > > it. So, having few choices with pros and cons leading approximately > > to the tie among them, it's the least productive to make the same > > choice as CPython did. > > I'm not a CPython dev, nor a Python dev, and I don't think any of the > big names of CPython or Python has showed up on that tracker as yet. > But why is "be different from CPython" such a valuable choice? CPython > works. It's had many hours of dev time put into it. Exactly, CPython (already) exists, and it works, so people can just use it. MicroPython's aim is to go where CPython didn't, and couldn't, go. For that, it's got to be different, or it literally won't fit there, like CPython doesn't. 
[] -- Best regards, Paul mailto:pmiscml at gmail.com From Steve.Dower at microsoft.com Wed Jun 4 15:14:04 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 4 Jun 2014 13:14:04 +0000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> <20140604144933.66e6c2f4@x34f>, Message-ID: I'm agree with Daniel. Directly indexing into text suggests an attempted optimization that is likely to be incorrect for a set of strings. Splitting, regex, concatenation and formatting are really the main operations that matter, and MicroPython can optimize their implementation of these easily enough for O(N) indexing. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Daniel Holth Sent: ?6/?4/?2014 5:17 To: Paul Sokolovsky Cc: python-dev Subject: Re: [Python-Dev] Internal representation of strings and Micropython If we're voting I think representing Unicode internally in micropython as utf-8 with O(N) indexing is a great idea, partly because I'm not sure indexing into strings is a good idea - lots of Unicode code points don't make sense by themselves; see also grapheme clusters. It would probably work great. On Wed, Jun 4, 2014 at 7:49 AM, Paul Sokolovsky wrote: > Hello, > > On Wed, 4 Jun 2014 20:53:46 +1000 > Chris Angelico wrote: > >> On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky >> wrote: >> > And I'm saying that not to discourage Unicode addition to >> > MicroPython, but to hint that "force-force" approach implemented by >> > CPython3 and causing rage and split in the community is not >> > appreciated. >> >> FWIW, it's Python 3 (the language) and not CPython 3.x (the >> implementation) that specifies Unicode strings in this way. 
> > Yeah, but it's CPython that dictates how the language evolves (some people > even think that it dictates how the language should be implemented!), so all > good parts belong to Python3, and all bad parts - to CPython3, > right? ;-) > >> I don't >> know why it has to cause a split in the community; this is the one way >> to make sure *everyone's* strings work perfectly, rather than having >> ASCII strings work fine and others start tripping over problems in >> various APIs. > > It did cause a split in the community, that's the fact, that's why > Python2 and Python3 are at the respective positions. Anyway, I'm not > interested in participating in that split, I have not yet uttered my > opinion on that publicly enough, so I seized a chance to drop some > witty remarks, but I don't want to start yet another Unicode flame. > > > > So, let's please get back to Unicode storage representation in > MicroPython. So, https://github.com/micropython/micropython/issues/657 > discussed technical aspects, in a recent mail on this list I expressed > my opinion why following the CPython way is not productive (for development > satisfaction and evolution of the Python community, to be explicit). > > The final argument I would have is that you certainly can implement Unicode > support the PEP393 way - it would be an enormous help and would be gladly > accepted. The question is how useful it will be for MicroPython. It > certainly will be useful to report passing of testsuites. But will it > be *really* used? > > For a microcontroller board, it might be too heavy (put simply, with it, > people will be able to do less (== heap running out sooner)), than > without it, so one may expect it to be disabled by default. Then the POSIX > port is there surely not to let people replace the "python" command > with "micropython" and run Django, but to let people develop and debug > their apps with more comfort than on an embedded board. So, it should > behave close to the MCU version, and would follow the MCU choice > re: Unicode.
> > That's actually the reason why I keep up this discussion - not for the > sake of argument or to bash Python3's Unicode choices. With the recent > MicroPython announcement, we surely looked for more people to > contribute to its development. But then we (or at least I can speak for > myself) would like to make sure that these contributions are actually > the most useful ones (for both MicroPython and the Python community in > general, which gets more choices, rather than just getting an N% smaller > CPython rewrite). > > So, you're not sure how O(N) string indexing will work? But MicroPython > offers a great opportunity to try! And it's something new and exciting, > which surely will be useful (== will save people memory), not just > something old and boring ;-). > > >> >> ChrisA > > > -- > Best regards, > Paul mailto:pmiscml at gmail.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40microsoft.com -------------- next part -------------- An HTML attachment was scrubbed...
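[Editorial note: Paul's heap concern above is easy to make concrete on CPython itself. Under PEP 393, appending a single non-ASCII character to a long ASCII string forces the whole string into a wider representation, while a UTF-8 byte string grows only by the bytes of the new character. A small sketch follows; the exact `sys.getsizeof` numbers vary by CPython version, so only the relative sizes matter:]

```python
import sys

ascii_s = "a" * 1000
wide_s = ascii_s + "\u20ac"   # one euro sign widens every character's storage

# Under PEP 393 the widened copy roughly doubles in size...
print(sys.getsizeof(ascii_s), sys.getsizeof(wide_s))

# ...while a UTF-8 encoding grows by just the 3 bytes of U+20AC.
print(len(ascii_s.encode("utf-8")), len(wide_s.encode("utf-8")))
```

On a heap measured in single-digit kilobytes, the difference between "add 3 bytes" and "recopy everything at two bytes per character" is precisely the trade-off being argued here.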
URL: From breamoreboy at yahoo.co.uk Wed Jun 4 15:29:51 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 04 Jun 2014 14:29:51 +0100 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604135314.4bb31d75@x34f> References: <20140604011718.GD10355@ando> <20140604135314.4bb31d75@x34f> Message-ID: On 04/06/2014 11:53, Paul Sokolovsky wrote: > Hello, > > On Tue, 3 Jun 2014 22:23:07 -0700 > Guido van Rossum wrote: > > [] >> Never mind disabling assertions -- even with enabled assertions you'd >> have to expect most Python programs to fail with non-ASCII input. >> >> Then again the UTF-8 option would be pretty devastating too for >> anything manipulating strings (especially since many Python APIs are >> defined using indexes, e.g. the re module). > > If the Unicode is slow (*), then obvious choice is not using Unicode > when not needed. Too bad that's a bit hard in Python3, as it enforces > Unicode everywhere, and dealing with efficient strings requires > prefixing them with funny characters like "b", etc. > > * If Unicode if slow because it causes heap to bloat and go swap, the > choice is still the same. Where is your evidence that (presumably) CPython unicode is slow? What is your response to this message http://bugs.python.org/issue16061#msg171413 from the bug tracker? > >> >> Why not support variable-width strings like CPython 3.4? > > Because, like good deal of community, we hope that Python4 will get > back to reality, and strings will be efficient (both for processing and > storage) by default, and niche and marginal "Unicode string" type will > be used explicitly (using funny prefixes, etc.), only when really > needed. Where is your evidence that supports the above claim? > > > Ah, all these not so funny geek jokes about internals of language > implementation, hope they didn't make somebody's day dull! 
> >> >> -- >> --Guido van Rossum (python.org/~guido) > > > -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com From ncoghlan at gmail.com Wed Jun 4 15:33:01 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 4 Jun 2014 23:33:01 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604053904.GA5309@k2> References: <20140604011718.GD10355@ando> <20140604053904.GA5309@k2> Message-ID: On 4 June 2014 15:39, wrote: > On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote: > >> There's a general expectation that indexing will be O(1) because all >> the builtin containers that support that syntax use it for O(1) lookup >> operations. > > Depending on your definition of built in, there is at least one standard > library container that does not - collections.deque. > > Given the specialized kinds of application this Python implementation is > targetted at, it seems UTF-8 is ideal considering the huge memory > savings resulting from the compressed representation, and the reduced > likelihood of there being any real need for serious text processing on > the device. Right - I wasn't clear that I think storing text internally as UTF-8 sounds fine for MicroPython. Anything where the O(N) nature of indexing by code point matters probably won't be run in that environment anyway. Cheers, Nick. 
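[Editorial note: the O(N) cost Nick mentions comes straight from UTF-8's self-describing lead bytes: to find the n-th code point you must walk past the first n. A pure-Python sketch of that walk over a `bytes` buffer, standing in for a hypothetical internal representation, assuming valid UTF-8 input:]

```python
def utf8_width(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, read off its lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead < 0xE0:
        return 2          # 110xxxxx: 2-byte sequence
    if lead < 0xF0:
        return 3          # 1110xxxx: 3-byte sequence
    return 4              # 11110xxx: 4-byte sequence

def utf8_index(buf: bytes, n: int) -> str:
    """Return the n-th code point of valid UTF-8 data in buf: an O(n) scan."""
    pos = 0
    for _ in range(n):
        pos += utf8_width(buf[pos])
    return buf[pos:pos + utf8_width(buf[pos])].decode("utf-8")
```

Iterating a whole string this way is still O(N) overall, since each step just advances the byte position; it is only *random* access that pays the scan each time.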
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Wed Jun 4 15:39:46 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 04 Jun 2014 16:39:46 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604011718.GD10355@ando> References: <20140604011718.GD10355@ando> Message-ID: 04.06.14 04:17, Steven D'Aprano wrote: > Would either of these trade-offs be acceptable while still claiming > "Python 3.4 compatibility"? > > My own feeling is that O(1) string indexing operations are a quality of > implementation issue, not a deal breaker to call it a Python. I can't > see any requirement in the docs that str[n] must take O(1) time, but > perhaps I have missed something. I think that breaking the O(1) expectation for indexing makes the implementation significantly incompatible with Python. Virtually all string operations in Python operate with indices. O(1) indexing operations can be kept with minimal memory requirements if Unicode is implemented internally as modified UTF-8 plus an optional array of offsets for every, say, 32nd character (which can even be compressed to an array of 16-bit or 32-bit integers). From dholth at gmail.com Wed Jun 4 16:01:12 2014 From: dholth at gmail.com (Daniel Holth) Date: Wed, 4 Jun 2014 10:01:12 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: MicroPython is going to be significantly incompatible with Python anyway. But you should be able to run your mp code on regular Python. On Wed, Jun 4, 2014 at 9:39 AM, Serhiy Storchaka wrote: > 04.06.14 04:17, Steven D'Aprano wrote: > >> Would either of these trade-offs be acceptable while still claiming >> "Python 3.4 compatibility"? >> >> My own feeling is that O(1) string indexing operations are a quality of >> implementation issue, not a deal breaker to call it a Python.
I can't >> see any requirement in the docs that str[n] must take O(1) time, but >> perhaps I have missed something. > > > I think than breaking O(1) expectation for indexing makes the implementation > significant incompatible with Python. Virtually all string operations in > Python operates with indices. > > O(1) indexing operations can be kept with minimal memory requirements if > implement Unicode internally as modified UTF-8 plus optional array of > offsets for every, say, 32th character (which even can be compressed to an > array of 16-bit or 32-bit integers). > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com From p.f.moore at gmail.com Wed Jun 4 16:02:48 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 4 Jun 2014 15:02:48 +0100 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: On 4 June 2014 14:39, Serhiy Storchaka wrote: > I think than breaking O(1) expectation for indexing makes the implementation > significant incompatible with Python. Virtually all string operations in > Python operates with indices. I don't use indexing on strings except in rare situations. Sure I use lots of operations that may well use indexing *internally* but that's the point. MicroPython can optimise those operations without needing to guarantee O(1) indexing, and I'd be fine with that. 
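[Editorial note: Serhiy's compromise quoted above - UTF-8 storage plus a sparse array of offsets - is straightforward to sketch. With a byte-offset checkpoint every K code points, `s[n]` costs at most K-1 forward steps instead of n, at the price of one stored offset per K characters. The class and names below are illustrative, not any real implementation:]

```python
def _width(lead: int) -> int:
    # Width in bytes of a UTF-8 sequence, read off its lead byte.
    return 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4

class IndexedUTF8:
    """UTF-8 text with a byte-offset checkpoint every K code points."""
    K = 32

    def __init__(self, text: str):
        self._buf = text.encode("utf-8")
        self._len = len(text)
        self._offsets = []          # byte offsets of code points 0, K, 2K, ...
        pos = 0
        for i, ch in enumerate(text):
            if i % self.K == 0:
                self._offsets.append(pos)
            pos += len(ch.encode("utf-8"))

    def __len__(self):
        return self._len

    def __getitem__(self, n: int) -> str:
        if not 0 <= n < self._len:
            raise IndexError(n)
        pos = self._offsets[n // self.K]   # jump to the nearest checkpoint
        for _ in range(n % self.K):        # then walk at most K-1 characters
            pos += _width(self._buf[pos])
        return self._buf[pos:pos + _width(self._buf[pos])].decode("utf-8")
```

For an all-ASCII string the offset array is pure overhead, which is presumably why Serhiy makes it optional; a flag bit saying "no multi-byte sequences seen" would restore plain O(1) arithmetic in that common case.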
Paul From steve at pearwood.info Wed Jun 4 16:12:45 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 5 Jun 2014 00:12:45 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> Message-ID: <20140604141245.GF10355@ando> On Wed, Jun 04, 2014 at 01:14:04PM +0000, Steve Dower wrote: > I'm agree with Daniel. Directly indexing into text suggests an > attempted optimization that is likely to be incorrect for a set of > strings. I'm afraid I don't understand this argument. The language semantics says that a string is an array of code points. Every index relates to a single code point, no code point extends over two or more indexes. There's a 1:1 relationship between code points and indexes. How is direct indexing "likely to be incorrect"? e.g. s = "---?---" offset = s.index('?') assert s[offset] == '?' That cannot fail with Python's semantics. [Aside: it does fail in Python 2, showing that the idea that "strings are bytes" is fatally broken. Fortunately Python has moved beyond that.] > Splitting, regex, concatenation and formatting are really the > main operations that matter, and MicroPython can optimize their > implementation of these easily enough for O(N) indexing. Really? Well, it will be a nice experiment. Fortunately MicroPython runs under Linux as well as on embedded systems (a clever decision, by the way) so I look forward to seeing how their internal-utf8 implementation stacks up against CPython's FSR implementation. Out of curiosity, when the FSR was proposed, did anyone consider an internal UTF-8 representation? If so, why was it rejected? 
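[Editorial note: the invariant Steven states holds even outside the Basic Multilingual Plane on Python 3.3+, which is exactly where byte-oriented indexing (and, before PEP 393, narrow-build UTF-16 indexing) used to go wrong. A small demonstration with a four-byte character:]

```python
s = "---\U0001F600---"          # U+1F600 lies outside the BMP

assert len(s) == 7              # one code point, one index
offset = s.index("\U0001F600")
assert s[offset] == "\U0001F600"

# The equivalent *byte* view is where the 1:1 relationship breaks down:
b = s.encode("utf-8")
assert len(b) == 10             # the emoji occupies bytes 3..6
assert b[3:7].decode("utf-8") == "\U0001F600"
```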
-- Steven From storchaka at gmail.com Wed Jun 4 16:17:29 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 04 Jun 2014 17:17:29 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: 04.06.14 10:03, Chris Angelico wrote: > Right, which is why I don't like the idea. But you don't need > non-ASCII characters to blink an LED or turn a servo, and there is > significant resistance to the notion that appending a non-ASCII > character to a long ASCII-only string requires the whole string to be > copied and doubled in size (lots of heap space used). But you need non-ASCII characters to display a title of MP3 track. From rosuav at gmail.com Wed Jun 4 16:26:10 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 5 Jun 2014 00:26:10 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka wrote: > 04.06.14 10:03, Chris Angelico wrote: > >> Right, which is why I don't like the idea. But you don't need >> non-ASCII characters to blink an LED or turn a servo, and there is >> significant resistance to the notion that appending a non-ASCII >> character to a long ASCII-only string requires the whole string to be >> copied and doubled in size (lots of heap space used). > > > But you need non-ASCII characters to display a title of MP3 track. Agreed. IMO, any Python, no matter how micro, needs full Unicode support; but there is resistance from uPy's devs.
ChrisA From storchaka at gmail.com Wed Jun 4 16:40:14 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 04 Jun 2014 17:40:14 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: 04.06.14 17:02, Paul Moore wrote: > On 4 June 2014 14:39, Serhiy Storchaka wrote: >> I think that breaking the O(1) expectation for indexing makes the implementation >> significantly incompatible with Python. Virtually all string operations in >> Python operate with indices. > > I don't use indexing on strings except in rare situations. Sure I use > lots of operations that may well use indexing *internally* but that's > the point. MicroPython can optimise those operations without needing > to guarantee O(1) indexing, and I'd be fine with that. Any non-trivial text parsing uses indices or regular expressions (and regular expressions themselves use indices internally). It would be interesting to collect a statistic about how many indexing operations happen during the life of a string in a typical (Micro)Python program. From steve at pearwood.info Wed Jun 4 16:40:53 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 5 Jun 2014 00:40:53 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604133857.13a0f0b9@x34f> References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> Message-ID: <20140604144053.GG10355@ando> On Wed, Jun 04, 2014 at 01:38:57PM +0300, Paul Sokolovsky wrote: > That's another reason why people don't like Unicode enforced upon them Enforcing design and language decisions is the job of the programming language. You might as well complain that Python forces C doubles as the floating point type, or that it forces Bignums as the integer type, or that it forces significant indentation, or "class" as a keyword. Or that C forces you to use braces and manage your own memory.
That's the purpose of the language, to make those decisions as to what features to provide and what not to provide. > - all the talk about supporting all languages and scripts is demagogy > and hypocrisy, given a choice, Unicode zealots would rather limit > people to Latin script I have no words to describe how ridiculous this accusation is. > than give up on their arbitrarily chosen, one-among-thousands, > soon-to-be-replaced-by-apples'-and-microsofts'-"exciting-new" encoding. > Once again, my claim is that what MicroPython implements now is more correct > - in a sense wider than technical - handling. We don't provide Unicode > encoding support, because it's highly bloated, but let people use any > encoding they like. That comes at some price, like lengths of strings in > characters not being known at runtime, only in bytes What does uPy return for the length of '?'? If the answer is anything but 1, that's a bug. -- Steven From pmiscml at gmail.com Wed Jun 4 16:49:30 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 17:49:30 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: <20140604174930.3a5af45f@x34f> Hello, On Thu, 5 Jun 2014 00:26:10 +1000 Chris Angelico wrote: > On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka > wrote: > > 04.06.14 10:03, Chris Angelico wrote: > > > >> Right, which is why I don't like the idea. But you don't need > >> non-ASCII characters to blink an LED or turn a servo, and there is > >> significant resistance to the notion that appending a non-ASCII > >> character to a long ASCII-only string requires the whole string to > >> be copied and doubled in size (lots of heap space used). > > > > > > But you need non-ASCII characters to display a title of MP3 track.
Yes, but to display a title, you don't need to do codepoint access at random - you need to either take a block of memory (length in bytes) and do something with it (pass to a C function, transfer over some bus, etc.), or *iterate in order* over codepoints in a string. All these operations are as efficient (O-notation) for UTF-8 as for UTF-32. Some operations are not going to be as fast, so - oops - avoid doing them without good reason. And kindly drop expectations that doing arbitrary operations on *Unicode* are as efficient as you imagined. (Note the *Unicode* in general, not particular flavor of which you got used to, up to thinking it's the one and only "right" flavor.) > Agreed. IMO, any Python, no matter how micro, needs full Unicode > support; but there is resistance from uPy's devs. FUD ;-). > > ChrisA -- Best regards, Paul mailto:pmiscml at gmail.com From rosuav at gmail.com Wed Jun 4 17:00:52 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 5 Jun 2014 01:00:52 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604174930.3a5af45f@x34f> References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> Message-ID: On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky wrote: >> > But you need non-ASCII characters to display a title of MP3 track. > > Yes, but to display a title, you don't need to do codepoint access at > random - you need to either take a block of memory (length in bytes) and > do something with it (pass to a C function, transfer over some bus, > etc.), or *iterate in order* over codepoints in a string. All these > operations are as efficient (O-notation) for UTF-8 as for UTF-32. Suppose you have a long title, and you need to abbreviate it by dropping out words (delimited by whitespace), such that you keep the first word (always) and the last (if possible) and as many as possible in between. How are you going to write that? 
With PEP 393 or UTF-32 strings, you can simply record the index of every whitespace you find, count off lengths, and decide what to keep and what to ellipsize. > Some operations are not going to be as fast, so - oops - avoid doing > them without good reason. And kindly drop expectations that doing > arbitrary operations on *Unicode* are as efficient as you imagined. > (Note the *Unicode* in general, not particular flavor of which you got > used to, up to thinking it's the one and only "right" flavor.) Not sure what you mean by flavors of Unicode. Unicode is a mapping of codepoints to characters, not an in-memory representation. And I've been working with Python 3.3 since before it came out, and with Pike (which has a very similar model) for longer, and in both of them, I casually perform operations on Unicode strings in the same way that I used to perform operations on REXX strings (which were eight-bit in the current system codepage - 437 for us). I do expect those operations to be efficient, and I get what I expect. Maybe they won't be in uPy, but that would be a limitation of uPy, not a fundamental problem with Unicode. ChrisA From dholth at gmail.com Wed Jun 4 17:31:17 2014 From: dholth at gmail.com (Daniel Holth) Date: Wed, 4 Jun 2014 11:31:17 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604141245.GF10355@ando> References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> <20140604141245.GF10355@ando> Message-ID: On Wed, Jun 4, 2014 at 10:12 AM, Steven D'Aprano wrote: > On Wed, Jun 04, 2014 at 01:14:04PM +0000, Steve Dower wrote: >> I'm agree with Daniel. Directly indexing into text suggests an >> attempted optimization that is likely to be incorrect for a set of >> strings. > > I'm afraid I don't understand this argument. The language semantics says > that a string is an array of code points. Every index relates to a > single code point, no code point extends over two or more indexes. 
> There's a 1:1 relationship between code points and indexes. How is > direct indexing "likely to be incorrect"? "Useful" is probably a better word. When you get into the complicated languages and you want to know how wide something is, and you might have y with two dots on it as one code point or two and left-to-right and right-to-left indicators and who knows what else... then looking at individual code points only works sometimes. I get the slicing idea. I like the idea that encoding to utf-8 would be the fastest thing you can do with a string. You could consider doing regexps in that domain, and other implementation specific optimizations in exactly the same way that any Python implementation has them. None of this would make it harder to move a servo. From Steve.Dower at microsoft.com Wed Jun 4 17:32:25 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 4 Jun 2014 15:32:25 +0000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> Message-ID: <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> Steven D'Aprano wrote: > The language semantics says that a string is an array of code points. Every > index relates to a single code point, no code point extends over two or more > indexes. > There's a 1:1 relationship between code points and indexes. How is direct > indexing "likely to be incorrect"? We're discussing the behaviour under a different (hypothetical) design decision than a 1:1 relationship between code points and indexes, so arguing from that stance doesn't make much sense. > e.g. > > s = "---?---" > offset = s.index('?') > assert s[offset] == '?' > > That cannot fail with Python's semantics. Agreed, and it shouldn't (I was actually referring to the optimization being incorrect for the goal, not the language semantics). 
What you'd probably find is that sizeof('?') == sizeof(s[offset]) == 2, which may be surprising, but is also correct. But what are you trying to achieve (why are you writing this code)? All this example really shows is that you're only using indexing for trivial purposes. Chris's example of an actual case where it may look like a good idea to use indexing for optimization makes this more obvious IMHO: Chris Angelico wrote: > Suppose you have a long title, and you need to abbreviate it by dropping out > words (delimited by whitespace), such that you keep the first word (always) and > the last (if possible) and as many as possible in between. How are you going to > write that? With PEP 393 or UTF-32 strings, you can simply record the index of > every whitespace you find, count off lengths, and decide what to keep and what > to ellipsize. "Recording the index" is where the optimization comes in. With a variable-length encoding - heck, even with a fixed-length one - I'd just use str.split(' ') (or re.split('\\s', string), depending on how much I care about the type of delimiter) and manipulate the list. If copying into a separate list is a problem (memory-wise), re.finditer('\\S+', string) also provides the same behaviour and gives me the sliced string, so there's no need to index for anything. The downside is that it isn't as easy to teach as the 1:1 relationship, and currently it doesn't perform as well *in CPython*. But if MicroPython is focusing on size over speed, I don't see any reason why they shouldn't permit different performance characteristics and require a slightly different approach to highly-optimized coding. In any case, this is an interesting discussion with a genuine effect on the Python interpreter ecosystem. 
Jython and IronPython already have different string implementations from CPython - having official (and hopefully flexible) guidance on deviations from the reference implementation would I think help other implementations provide even more value, which is only a good thing for Python. Cheers, Steve From pmiscml at gmail.com Wed Jun 4 17:38:31 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 18:38:31 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: <20140604183831.7226448c@x34f> Hello, On Wed, 04 Jun 2014 17:40:14 +0300 Serhiy Storchaka wrote: > 04.06.14 17:02, Paul Moore wrote: > > On 4 June 2014 14:39, Serhiy Storchaka wrote: > >> I think that breaking the O(1) expectation for indexing makes the > >> implementation significantly incompatible with Python. Virtually all > >> string operations in Python operate with indices. > > > > I don't use indexing on strings except in rare situations. Sure I > > use lots of operations that may well use indexing *internally* but > > that's the point. MicroPython can optimise those operations without > > needing to guarantee O(1) indexing, and I'd be fine with that. > > Any non-trivial text parsing uses indices or regular expressions (and > regular expressions themselves use indices internally). I keep hearing this stuff, and unfortunately so far don't have enough time to collect all that stuff and provide a detailed response. So, here's a spur-of-the-moment response - hopefully we're in the same context so it is easy to understand. So, gentlemen, you keep mixing up character-by-character random access to a string and taking substrings of a string. Character-by-character random access implies that you would need to scan thru (possibly almost) all chars in a string. That's O(N) (N = length of string). With a varlength encoding (taking O(N) to index an arbitrary char), there's thus concern that this would be an O(N^2) op.
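[Editorial note: the O(N^2) concern above is specifically about loops that index every position. The two functions below compute the same thing; the first performs N independent indexing operations (quadratic under a plain UTF-8 representation with no offset table), the second is a single forward pass under any representation:]

```python
def count_upper_by_index(s: str) -> int:
    # N indexing operations: O(N**2) if each s[i] is itself an O(N) scan.
    return sum(1 for i in range(len(s)) if s[i].isupper())

def count_upper_by_iteration(s: str) -> int:
    # One pass over the string: O(N) regardless of internal encoding.
    return sum(1 for ch in s if ch.isupper())
```

Both give the same answer; only the second idiom is representation-friendly, which is the style Paul argues for below.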
But show me a real-world case for that. The common use case is scanning a string left-to-right, which should be done using an iterator and is thus O(N). Right-to-left scanning would be order(s) of magnitude less frequent, and is also handled by an iterator. What's next? You're doing some funky anagrams and need to swap every 2 adjacent chars? Sorry, the naive implementation will be slow. If you're in serious anagram business, you'll need to code a C extension. No, wait! Instead you should learn Python better. You should run a string windowing iterator which will return adjacent pairs and swap those constant-length strings. More cases, anyone? Implementing DES and doing arbitrary permutations? Kindly drop doing that on strings; do it on bytes or lists. Hopefully, the idea is clear - if you *scan* thru a string using indexes in *random* order, you're doing a weird thing and *want* weird performance. Doing things like s[0] or s[-1] - there's a finite (and small) number of such operations per string. Now about taking substrings of strings (which in Python is often expressed by slice indexing). Well, this is quite different from scanning each character of a string. Just like s[0]/s[-1], this usually happens a finite number of times for a particular string, independent of its length, i.e. O(1) times (e.g. you take a string and split it in 3 parts), or maybe the number of substrings is not bound-fixed, but has a different growth order, O(M) (for example, you split a string into tokens; tokens can be long, but there are usually external limits on how many it's sensible to have on one line). So, again, you're not going to get quadratic time unless you're unlucky or sloppy. And again, you should brush up your Python skills and use the regex functions which return iterators to get your parsed tokens, etc. (To clarify the obvious - "you" here is an abstract pronoun, not referring to the respectable Python developers who actually made it possible to write efficient Python programs).
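[Editorial note: the "string windowing iterator" Paul gestures at for the anagram case can be written in a few lines. The string is consumed strictly left-to-right in one pass, with no random indexing, so it is O(N) under a UTF-8 representation too:]

```python
from itertools import zip_longest

def swap_adjacent(s: str) -> str:
    """Swap each pair of adjacent characters in a single forward pass."""
    it = iter(s)
    out = []
    # zip_longest over the same iterator yields (s[0], s[1]), (s[2], s[3]), ...
    for a, b in zip_longest(it, it, fillvalue=""):
        out.append(b)
        out.append(a)
    return "".join(out)

print(swap_adjacent("abcdef"))  # -> 'badcfe'
print(swap_adjacent("abcde"))   # -> 'badce'
```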
So, hopefully the point is conveyed - you can write inefficient Python programs. CPython goes out of the way to hide many inefficiencies (using unbelievably bloated heap usage - from uPy's point of view, which starts up in 2K heap). You just shouldn't write inefficient programs, voila. But if you want, you can keep writing inefficient programs, they just will be inefficient. Peace. > It would be interesting to collect a statistic about how many > indexing operations happened during the life of a string in typical > (Micro)Python program. Yup. -- Best regards, Paul mailto:pmiscml at gmail.com From Steve.Dower at microsoft.com Wed Jun 4 17:51:38 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 4 Jun 2014 15:51:38 +0000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604183831.7226448c@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> Message-ID: <2a9cca53b0be4b41b425a8239d2dea77@BLUPR03MB389.namprd03.prod.outlook.com> Paul Sokolovsky wrote: > You just shouldn't write inefficient programs, voila. But if you want, you can keep writing inefficient programs, they just will be inefficient. Peace. Can I nominate this for QOTD? :) Cheers, Steve From breamoreboy at yahoo.co.uk Wed Jun 4 17:52:26 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 04 Jun 2014 16:52:26 +0100 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On 04/06/2014 16:32, Steve Dower wrote: > > If copying into a separate list is a problem (memory-wise), re.finditer('\\S+', string) also provides the same behaviour and gives me the sliced string, so there's no need to index for anything. 
> Out of idle curiosity is there anything that stops MicroPython, or any other implementation for that matter, from providing views of a string rather than copying every time? IIRC memoryviews in CPython rely on the buffer protocol at the C API level, so since strings don't support this protocol you can't take a memoryview of them. Could this actually be implemented in the future, is the underlying C code just too complicated, or what? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From pmiscml at gmail.com Wed Jun 4 17:53:52 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 18:53:52 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> Message-ID: <20140604185352.36f52959@x34f> Hello, On Thu, 5 Jun 2014 01:00:52 +1000 Chris Angelico wrote: > On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky > wrote: > >> > But you need non-ASCII characters to display a title of MP3 > >> > track. > > > > Yes, but to display a title, you don't need to do codepoint access > > at random - you need to either take a block of memory (length in > > bytes) and do something with it (pass to a C function, transfer > > over some bus, etc.), or *iterate in order* over codepoints in a > > string. All these > > operations are as efficient (O-notation) for > > UTF-8 as for UTF-32. > > Suppose you have a long title, and you need to abbreviate it by > dropping out words (delimited by whitespace), such that you keep the > first word (always) and the last (if possible) and as many as possible > in between. How are you going to write that?
With PEP 393 or UTF-32 > strings, you can simply record the index of every whitespace you find, > count off lengths, and decide what to keep and what to ellipsize. I'll submit angry bugreport along the lines of "WWWHAT, it's 3.5 and there's still no str.isplit()??!!11", then do it with re.finditer() (while submitting another report on inconsistent naming scheme). [] -- Best regards, Paul mailto:pmiscml at gmail.com From songofacandy at gmail.com Wed Jun 4 18:45:51 2014 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 5 Jun 2014 01:45:51 +0900 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538ECD98.5030309@farowl.co.uk> References: <20140604011718.GD10355@ando> <538ECD98.5030309@farowl.co.uk> Message-ID: For Jython and IronPython, UTF-16 may be best internal encoding. Recent languages (Swift, Golang, Rust) chose UTF-8 as internal encoding. Using utf-8 is simple and efficient. For example, no need for utf-8 copy of the string when writing to file and serializing to JSON. When implementing Python using these languages, UTF-8 will be best internal encoding. To allow Python implementations other than CPython can use UTF-8 or UTF-16 as internal encoding efficiently, I think adding internal position based API is the best solution. >>> s = "\U00100000x" >>> len(s) 2 >>> s[1:] 'x' >>> s.find('x') 1 >>> # s.isize() # Internal length. 5 for utf-8, 3 for utf-16 >>> # s.ifind('x') # Internal position, 4 for utf-8, 2 for utf-16 >>> # s.islice(s.ifind('x')) => 'x' (I like design of golang and Rust. I hope CPython uses utf-8 as internal encoding in the future. But this is off-topic.) On Wed, Jun 4, 2014 at 4:41 PM, Jeff Allen wrote: > Jython uses UTF-16 internally -- probably the only sensible choice in a > Python that can call Java. Indexing is O(N), fundamentally. By > "fundamentally", I mean for those strings that have not yet noticed that > they contain no supplementary (>0xffff) characters.
> > I've toyed with making this O(1) universally. Like Steven, I understand this > to be a freedom afforded to implementers, rather than an issue of > conformity. > > Jeff Allen > > > On 04/06/2014 02:17, Steven D'Aprano wrote: >> >> There is a discussion over at MicroPython about the internal >> representation of Unicode strings. > > ... > >> My own feeling is that O(1) string indexing operations are a quality of >> implementation issue, not a deal breaker to call it a Python. I can't >> see any requirement in the docs that str[n] must take O(1) time, but >> perhaps I have missed something. >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com -- INADA Naoki From storchaka at gmail.com Wed Jun 4 18:49:18 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 04 Jun 2014 19:49:18 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604183831.7226448c@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> Message-ID: 04.06.14 18:38, Paul Sokolovsky wrote: >> Any non-trivial text parsing uses indices or regular expressions (and >> regular expressions themself use indices internally). > > I keep hearing this stuff, and unfortunately so far don't have enough > time to collect all that stuff and provide detailed response. So, > here's spur of the moment response - hopefully we're in the same > context so it is easy to understand. > > So, gentlemen, you keep mixing up character-by-character random access > to string and taking substrings of a string. > > Character-by-character random access imply that you would need to scan > thru (possibly almost) all chars in a string. That's O(N) (N-length of string).
With varlength encoding (taking O(N) to index arbitrary char), > there's thus concern that this would be O(N^2) op. > > But show me real-world case for that. Common usecase is scanning string > left-to-right, that should be done using iterator and thus O(N). > Right-to-left scanning would be order(s) of magnitude less frequent, as > and also handled by iterator. html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. Common use case is quickly find substring starting from current position using str.find or re.search, process found token, advance position and repeat. From python at mrabarnett.plus.com Wed Jun 4 18:52:17 2014 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 04 Jun 2014 17:52:17 +0100 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604053904.GA5309@k2> Message-ID: <538F4EC1.1030509@mrabarnett.plus.com> On 2014-06-04 14:33, Nick Coghlan wrote: > On 4 June 2014 15:39, wrote: >> On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote: >> >>> There's a general expectation that indexing will be O(1) because >>> all the builtin containers that support that syntax use it for >>> O(1) lookup operations. >> >> Depending on your definition of built in, there is at least one >> standard library container that does not - collections.deque. >> >> Given the specialized kinds of application this Python >> implementation is targetted at, it seems UTF-8 is ideal considering >> the huge memory savings resulting from the compressed >> representation, and the reduced likelihood of there being any real >> need for serious text processing on the device. > > Right - I wasn't clear that I think storing text internally as UTF-8 > sounds fine for MicroPython. Anything where the O(N) nature of > indexing by code point matters probably won't be run in that > environment anyway. 
> In order to avoid indexing, you could use some kind of 'cursor' class to > step forwards and backwards along strings. The cursor could include > both the codepoint index and the byte index. From pmiscml at gmail.com Wed Jun 4 19:05:20 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 20:05:20 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> Message-ID: <20140604200520.1d432329@x34f> Hello, On Wed, 04 Jun 2014 19:49:18 +0300 Serhiy Storchaka wrote: [] > > But show me real-world case for that. Common usecase is scanning > > string left-to-right, that should be done using iterator and thus > > O(N). Right-to-left scanning would be order(s) of magnitude less > > frequent, as and also handled by iterator. > > html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize > don't use iterators. They use indices, str.find and/or regular > expressions. Common use case is quickly find substring starting from > current position using str.find or re.search, process found token, > advance position and repeat. That's sad, I agree. -- Best regards, Paul mailto:pmiscml at gmail.com From storchaka at gmail.com Wed Jun 4 19:11:11 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 04 Jun 2014 20:11:11 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538F4EC1.1030509@mrabarnett.plus.com> References: <20140604011718.GD10355@ando> <20140604053904.GA5309@k2> <538F4EC1.1030509@mrabarnett.plus.com> Message-ID: 04.06.14 19:52, MRAB wrote: > In order to avoid indexing, you could use some kind of 'cursor' class to > step forwards and backwards along strings. The cursor could include > both the codepoint index and the byte index. So you need different string library and different regular expression library.
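A minimal sketch of MRAB's cursor idea, assuming a hypothetical `Utf8Cursor` class (none of these names exist in any real implementation): the cursor tracks the codepoint index and the byte index together, so stepping one character in either direction is O(1) even though random indexing into UTF-8 is not.

```python
class Utf8Cursor:
    """Hypothetical cursor over a valid UTF-8 byte buffer."""

    def __init__(self, data):
        self.data = data      # bytes, assumed to be valid UTF-8
        self.byte_index = 0   # current offset in bytes
        self.char_index = 0   # current offset in codepoints

    def advance(self):
        """Step one codepoint forward; return False at end of buffer."""
        if self.byte_index >= len(self.data):
            return False
        b = self.data[self.byte_index]
        # The length of a UTF-8 sequence is determined by its lead byte.
        if b < 0x80:
            self.byte_index += 1
        elif b < 0xE0:
            self.byte_index += 2
        elif b < 0xF0:
            self.byte_index += 3
        else:
            self.byte_index += 4
        self.char_index += 1
        return True

    def retreat(self):
        """Step one codepoint back; return False at start of buffer."""
        if self.byte_index == 0:
            return False
        self.byte_index -= 1
        # Continuation bytes are 0b10xxxxxx, so scan back to the lead byte.
        while self.data[self.byte_index] & 0xC0 == 0x80:
            self.byte_index -= 1
        self.char_index -= 1
        return True
```

Serhiy's objection still stands, of course: to benefit from this, string methods and the regex engine would have to accept and return cursors rather than plain integer indices.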
From storchaka at gmail.com Wed Jun 4 19:35:06 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 04 Jun 2014 20:35:06 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604174930.3a5af45f@x34f> References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> Message-ID: 04.06.14 17:49, Paul Sokolovsky wrote: > On Thu, 5 Jun 2014 00:26:10 +1000 > Chris Angelico wrote: >> On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka >> wrote: >>> 04.06.14 10:03, Chris Angelico wrote: >>>> Right, which is why I don't like the idea. But you don't need >>>> non-ASCII characters to blink an LED or turn a servo, and there is >>>> significant resistance to the notion that appending a non-ASCII >>>> character to a long ASCII-only string requires the whole string to >>>> be copied and doubled in size (lots of heap space used). >>> But you need non-ASCII characters to display a title of MP3 track. > > Yes, but to display a title, you don't need to do codepoint access at > random - you need to either take a block of memory (length in bytes) and > do something with it (pass to a C function, transfer over some bus, > etc.), or *iterate in order* over codepoints in a string. All these > operations are as efficient (O-notation) for UTF-8 as for UTF-32. Several previous comments discuss first option, ASCII-only strings. From storchaka at gmail.com Wed Jun 4 19:52:14 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 04 Jun 2014 20:52:14 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604200520.1d432329@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> Message-ID: 04.06.14 20:05, Paul Sokolovsky wrote: > On Wed, 04 Jun 2014 19:49:18 +0300 > Serhiy Storchaka wrote: >> html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize >> don't use iterators.
They use indices, str.find and/or regular >> expressions. Common use case is quickly find substring starting from >> current position using str.find or re.search, process found token, >> advance position and repeat. > > That's sad, I agree. Other languages (Go, Rust) can be happy without O(1) indexing of strings. All string and regex operations work with iterators or cursors, and I believe this approach is not significant worse than implementing strings as O(1)-indexable arrays of characters (for some applications it can be worse, for other it can be better). But Python is different language, it has different operations for strings and different idioms. A language which doesn't support O(1) indexing is not Python, it is only Python-like language. From stephen at xemacs.org Wed Jun 4 19:57:39 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 05 Jun 2014 02:57:39 +0900 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> Serhiy Storchaka writes: > It would be interesting to collect a statistic about how many indexing > operations happened during the life of a string in typical (Micro)Python > program. Probably irrelevant (I doubt anybody is going to be writing programmers' editors in MicroPython), but by far the most frequently called functions in XEmacs are byte_to_char_index and its inverse. From guido at python.org Wed Jun 4 20:25:51 2014 From: guido at python.org (Guido van Rossum) Date: Wed, 4 Jun 2014 11:25:51 -0700 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: This thread has devolved into a flame war. I think we should trust the Micropython implementers (whoever they are -- are they participating here?) 
to know their users and let them do what feels right to them. We should just ask them not to claim full compatibility with any particular Python version -- that seems the most contentious point. Realistically, most Python code that works on Python 3.4 won't work on Micropython (for various reasons, not just the string behavior) and neither does it need to. -- --Guido van Rossum (python.org/~guido) From pmiscml at gmail.com Wed Jun 4 20:29:29 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 4 Jun 2014 21:29:29 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> Message-ID: <20140604212929.7a1175f6@x34f> Hello, On Wed, 04 Jun 2014 20:52:14 +0300 Serhiy Storchaka wrote: [] > > That's sad, I agree. > > Other languages (Go, Rust) can be happy without O(1) indexing of > strings. All string and regex operations work with iterators or > cursors, and I believe this approach is not significant worse than > implementing strings as O(1)-indexable arrays of characters (for some > applications it can be worse, for other it can be better). But Python > is different language, it has different operations for strings and > different idioms. A language which doesn't support O(1) indexing is > not Python, it is only Python-like language. Sorry, but that's just your personal opinion, not shared by other developers, as this thread showed. And let's not pretend we live in happy-ever world of Python 1.5.2 which doesn't need anything more because it's perfect as it is. Somebody added all those iterators and iterator-returning functions to Pythons. And then the problem Python has is a typical "last mile" problem, that iterators were not applied completely everywhere. There's little choice but to move in that direction, though.
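The in-order scan Paul describes can be sketched as a plain generator over a UTF-8 buffer: each step is O(1), so a full left-to-right pass is O(N), the same complexity a fixed-width encoding gives. `iter_utf8` is a hypothetical helper (not an API of CPython or MicroPython) and assumes valid UTF-8 input.

```python
def iter_utf8(buf):
    """Yield the characters of a UTF-8 encoded buffer in order, O(N) total."""
    i = 0
    while i < len(buf):
        b = buf[i]
        # Sequence length is encoded in the lead byte.
        if b < 0x80:
            n = 1
        elif b < 0xE0:
            n = 2
        elif b < 0xF0:
            n = 3
        else:
            n = 4
        yield buf[i:i + n].decode("utf-8")
        i += n
```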
What you call "idioms", other people call "sloppy programming practices". There's common suggestion how to be at peace with Python's indentation for those who find it a problem - "get over it". Well, somehow it itches to say same for people who think that Python3 should be used the same way as Python1: Get over the fact that Python is no longer little funny language being laughed at by Perl crowd for being order of magnitude slower at processing text files. While you still can do little funny tricks we all love Python for, it now also offers framework to do it right, and it makes little sense saying that doing it little funny way is the definitive trait of Python. (And for me it's easy to be such categorical - the only way I could subscribe to idea of running Python on an MCU and not be laughable is by trusting Python to provide framework for being efficient. I quit working on another language because I have trusted that iterator, generator, buffer protocols are not little funny things but thoroughly engineered efficient concepts, and I don't feel betrayed.) -- Best regards, Paul mailto:pmiscml at gmail.com From stefan_ml at behnel.de Wed Jun 4 21:14:54 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 04 Jun 2014 21:14:54 +0200 Subject: [Python-Dev] Should standard library modules optimize for CPython? In-Reply-To: <1437293580423521164.103866sturla.molden-gmail.com@news.gmane.org> References: <20140601081139.GO10355@ando> <1521177704423500642.020210sturla.molden-gmail.com@news.gmane.org> <1437293580423521164.103866sturla.molden-gmail.com@news.gmane.org> Message-ID: Sturla Molden, 03.06.2014 22:51: > Stefan Behnel wrote: >> So the >> argument in favour is mostly a pragmatic one. If you can have 2-5x faster >> code essentially for free, why not just go for it? > > It would be easier if the GIL or Cython's use of it was redesigned. Cython > just grabs the GIL and holds on to it until it is manually released.
The > standard lib cannot have packages that holds the GIL forever, as a Cython > compiled module would do. Cython has to start sharing access the GIL like > the interpreter does. Granted. This shouldn't be all that difficult to add as a special case when compiling .py (not .pyx) files. Properly tuning it (i.e. avoiding to inject the GIL release-acquire cycle in the wrong spots) may take a while, but that can be improved over time. (It's not required in .pyx files because users should rather explicitly write "with nogil: pass" there to manually enable thread switches in safe and desirable places.) Stefan From steve at pearwood.info Wed Jun 4 22:10:40 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 5 Jun 2014 06:10:40 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <20140604201040.GI10355@ando> On Wed, Jun 04, 2014 at 03:32:25PM +0000, Steve Dower wrote: > Steven D'Aprano wrote: > > The language semantics says that a string is an array of code points. Every > > index relates to a single code point, no code point extends over two or more > > indexes. > > There's a 1:1 relationship between code points and indexes. How is direct > > indexing "likely to be incorrect"? > > We're discussing the behaviour under a different (hypothetical) design > decision than a 1:1 relationship between code points and indexes, so > arguing from that stance doesn't make much sense. I'm open to different implementations. I earlier even suggested that the choice of O(1) indexing versus O(N) indexing was a quality of implementation issue, not a make-or-break issue for whether something can call itself Python (or even 99% compatible with Python"). 
But I don't believe that exposing that implementation at the Python level is valid: regardless of whether it is efficient or not, I should be able to write code like this: a = [mystring[i] for i in range(len(mystring))] b = list(mystring) assert a == b That is not the case if you expose the underlying byte-level implementation at the Python level, and treat strings as an array of *bytes*. Paul seems to want to do this, or at least he wants Python 4 to do this. I think it is *completely* inappropriate to do so. I *think* you may agree with me, (correct me if I'm wrong) because you go on to agree with me: > > e.g. > > > > s = "---?---" > > offset = s.index('?') > > assert s[offset] == '?' > > > > That cannot fail with Python's semantics. > > Agreed, and it shouldn't but I'm not actually sure. > (I was actually referring to the optimization > being incorrect for the goal, not the language semantics). What you'd > probably find is that sizeof('?') == sizeof(s[offset]) == 2, which may > be surprising, but is also correct. You don't seem to be taking about sys.getsizeof, so I guess you're talking about something at the C level (or other underlying implementation), ignoring the object overhead. I don't know why you think I'd find that surprising -- one cannot fit 0x10FFFF Unicode code points in a single byte, so whether you use UTF-32, UTF-16, UTF-8, Python 3.3's FSR or some other implementation, at least some code points are going to use more than one byte. > But what are you trying to achieve (why are you writing this code)? > All this example really shows is that you're only using indexing for > trivial purposes. I'm trying to understand what point you are trying to make, because I'm afraid I don't quite get it. [...] > If copying into a separate list is a problem (memory-wise), > re.finditer('\\S+', string) also provides the same behaviour and gives > me the sliced string, so there's no need to index for anything. 
finditer returns a bunch of MatchObjects, which give you the indexes of the found substring. Whether you do it yourself, or get the re module to do it, you're indexing somewhere. > The downside is that it isn't as easy to teach as the 1:1 > relationship, and currently it doesn't perform as well *in CPython*. > But if MicroPython is focusing on size over speed, I don't see any > reason why they shouldn't permit different performance characteristics > and require a slightly different approach to highly-optimized coding. I don't have a problem with different implementations, so long as that implementation isn't exposed at the Python level with changes of semantics such as breaking the promise that a string is an array of code points, not of bytes. > In any case, this is an interesting discussion with a genuine effect > on the Python interpreter ecosystem. Jython and IronPython already > have different string implementations from CPython - having official > (and hopefully flexible) guidance on deviations from the reference > implementation would I think help other implementations provide even > more value, which is only a good thing for Python. Yes, agreed. -- Steven From v+python at g.nevcal.com Wed Jun 4 22:50:42 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 04 Jun 2014 13:50:42 -0700 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> <20140604144933.66e6c2f4@x34f>, Message-ID: <538F86A2.4080802@g.nevcal.com> On 6/4/2014 6:14 AM, Steve Dower wrote: > I agree with Daniel. Directly indexing into text suggests an > attempted optimization that is likely to be incorrect for a set of > strings. Splitting, regex, concatenation and formatting are really the > main operations that matter, and MicroPython can optimize their > implementation of these easily enough for O(N) indexing.
> > Cheers, > Steve > > Top-posted from my Windows Phone > ------------------------------------------------------------------------ > From: Daniel Holth > Sent: ?6/?4/?2014 5:17 > To: Paul Sokolovsky > Cc: python-dev > Subject: Re: [Python-Dev] Internal representation of strings and > Micropython > > If we're voting I think representing Unicode internally in micropython > as utf-8 with O(N) indexing is a great idea, partly because I'm not > sure indexing into strings is a good idea - lots of Unicode code > points don't make sense by themselves; see also grapheme clusters. It > would probably work great. I think native UTF-8 support is the most promising route for a micropython Unicode support. It would be an interesting proof-of-concept to implement an alternative CPython with PEP-393 replaced by UTF-8 internally... doing conversions for APIs that require a different encoding, but always maintaining and computing with the UTF-8 representation. 1) The first proof-of-concept implementation should implement codepoint indexing as a O(N) operation, searching from the beginning of the string for the Nth codepoint. Other Proof-of-concept implementation could implement a codepoint boundary cache, there could be a variety of caching algorithms. 2) (Least space efficient) An array that could be indexed by codepoint position and result in byte position. (This would use more space than a UTF-32 representation!) 3) (Most space efficient) One cached entry, that caches the last codepoint/byte position referenced. UTF-8 is able to be traversed in either direction, so "next/previous" codepoint access would be relatively fast (and such are very common operations, even when indexing notation is used: "for ix in range( len( str_x )): func( str_x[ ix ])".) 4) (Fixed size caches) N entries, one for the last codepoint, and others at Codepoint_Length/N intervals. N could be tunable. 5) (Fixed size caches) Like 4, plus an extra entry like 3. 
6) (Variable size caches) Like 2, but only indexing every Nth code point. N could be tunable. 7) (Variable size caches) Like 6, plus an extra entry like 3. 8) (Content specific variable size caches) Index each codepoint that is a different byte size than the previous codepoint, allowing indexing to be used in the intervals. Worst case size is like 2, best case size is a single entry for the end, when all code points are represented by the same number of bytes. 9) (Content specific variable size caches) Like 8, only cache entries could indicate fixed or variable size characters in the next interval, with a scheme like 4 or 6 used to prevent one interval from covering the whole string. Other hybrid schemes may present themselves as useful once experience is gained with some of these. It might be surprising how few algorithms need more than algorithm 3 to get reasonable performance. Glenn From pmiscml at gmail.com Wed Jun 4 23:14:32 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 00:14:32 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140605001432.126a0a08@x34f> Hello, On Wed, 4 Jun 2014 11:25:51 -0700 Guido van Rossum wrote: > This thread has devolved into a flame war. I think we should trust the > Micropython implementers (whoever they are -- are they participating > here?) I'm a regular contributor. I'm not sure if the author, Damien George, is on the list. In either case, he's a nice guy who prefer to do development rather than participate in flame wars ;-). And for the record, all opinions expressed are solely mine, and not official position of MicroPython project. > to know their users and let them do what feels right to them.
> We should just ask them not to claim full compatibility with any > particular Python version -- that seems the most contentious point. "Full" compatibility is never claimed, and understanding it as such is optimistic, "between the lines" reading of some users. All of: announcement posted on python-list (which prompted current inflow of MicroPython-related discussions), README at https://github.com/micropython/micropython , and detailed differences doc https://github.com/micropython/micropython/wiki/Differences make it clear there's no talk about "full" compatibility, and only specific compatibility (and incompatibility) points are claimed. That said, and unlike previous attempts to develop a small Python implementations (which of course existed), we're striving to be exactly a Python language implementation, not a Python-like language implementation. As there's no formal, implementation-independent language spec, what constitutes a compatible language implementation is subject to opinions, and we welcome and appreciate independent review, like this thread did. > Realistically, most Python code that works on Python 3.4 won't work > on Micropython (for various reasons, not just the string behavior) > and neither does it need to. That's true. However, as was said, we're striving to provide a compatible implementation, and compatibility claims must be validated. While we have simple "in-house" testsuite, more serious compatibility validation requires running a testsuite for reference implementation (CPython), and that's gradually being approached. 
> > -- > --Guido van Rossum (python.org/~guido) -- Best regards, Paul mailto:pmiscml at gmail.com From tjreedy at udel.edu Wed Jun 4 23:19:29 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 04 Jun 2014 17:19:29 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538ECD98.5030309@farowl.co.uk> References: <20140604011718.GD10355@ando> <538ECD98.5030309@farowl.co.uk> Message-ID: On 6/4/2014 3:41 AM, Jeff Allen wrote: > Jython uses UTF-16 internally -- probably the only sensible choice in a > Python that can call Java. Indexing is O(N), fundamentally. By > "fundamentally", I mean for those strings that have not yet noticed that > they contain no supplementary (>0xffff) characters. > > I've toyed with making this O(1) universally. Like Steven, I understand > this to be a freedom afforded to implementers, rather than an issue of > conformity. > > Jeff Allen > > On 04/06/2014 02:17, Steven D'Aprano wrote: >> There is a discussion over at MicroPython about the internal >> representation of Unicode strings. > ... >> My own feeling is that O(1) string indexing operations are a quality of >> implementation issue, not a deal breaker to call it a Python. I can't >> see any requirement in the docs that str[n] must take O(1) time, but >> perhaps I have missed something. >> > -- Terry Jan Reedy From tjreedy at udel.edu Wed Jun 4 23:21:20 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 04 Jun 2014 17:21:20 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538ECD98.5030309@farowl.co.uk> References: <20140604011718.GD10355@ando> <538ECD98.5030309@farowl.co.uk> Message-ID: On 6/4/2014 3:41 AM, Jeff Allen wrote: > Jython uses UTF-16 internally -- probably the only sensible choice in a > Python that can call Java. Indexing is O(N), fundamentally. By > "fundamentally", I mean for those strings that have not yet noticed that > they contain no supplementary (>0xffff) characters. 
Indexing can be made O(log(k)) where k is the number of astral chars, and is usually small. -- Terry Jan Reedy From rosuav at gmail.com Wed Jun 4 23:28:00 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 5 Jun 2014 07:28:00 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538F86A2.4080802@g.nevcal.com> References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> <20140604144933.66e6c2f4@x34f> <538F86A2.4080802@g.nevcal.com> Message-ID: On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman wrote: > 8) (Content specific variable size caches) Index each codepoint that is a > different byte size than the previous codepoint, allowing indexing to be > used in the intervals. Worst case size is like 2, best case size is a single > entry for the end, when all code points are represented by the same number > of bytes. Conceptually interesting, and I'd love to know how well that'd perform in real-world usage. Would do very nicely on blocks of text that are all from the same range of codepoints, but if you intersperse high and low codepoints it'll be like 2 but with significantly more complicated lookups (imagine a "name=value\nname=value\n" stream where the names and values are all in the same language - you'll have a lot of transitions). ChrisA From rdmurray at bitdance.com Wed Jun 4 23:54:08 2014 From: rdmurray at bitdance.com (R.
David Murray) Date: Wed, 04 Jun 2014 17:54:08 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605001432.126a0a08@x34f> References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> Message-ID: <20140604215409.3C071250DE3@webabinitio.net> On Thu, 05 Jun 2014 00:14:32 +0300, Paul Sokolovsky wrote: > That said, and unlike previous attempts to develop a small Python > implementations (which of course existed), we're striving to be exactly > a Python language implementation, not a Python-like language > implementation. As there's no formal, implementation-independent > language spec, what constitutes a compatible language implementation is > subject to opinions, and we welcome and appreciate independent review, > like this thread did. The language reference is also the language specification. I don't know what you mean by 'formal', so presumably it doesn't qualify :) That said, if there are places that are not correctly marked as implementation specific, those are bugs in the reference and should be fixed. There almost certainly are still such bugs, and I suspect MicroPython can help us fix them, just as PyPy did/does. --David From v+python at g.nevcal.com Wed Jun 4 23:57:36 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 04 Jun 2014 14:57:36 -0700 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> <20140604144933.66e6c2f4@x34f> <538F86A2.4080802@g.nevcal.com> Message-ID: <538F9650.106@g.nevcal.com> On 6/4/2014 2:28 PM, Chris Angelico wrote: > On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman wrote: >> 8) (Content specific variable size caches) Index each codepoint that is a >> different byte size than the previous codepoint, allowing indexing to be >> used in the intervals. 
Worst case size is like 2, best case size is a single >> entry for the end, when all code points are represented by the same number >> of bytes. > Conceptually interesting, and I'd love to know how well that'd perform > in real-world usage. So would I :) > Would do very nicely on blocks of text that are > all from the same range of codepoints, but if you intersperse high and > low codepoints it'll be like 2 but with significantly more complicated > lookups (imagine a "name=value\nname=value\n" stream where the names > and values are all in the same language - you'll have a lot of > transitions). Lookup is binary search on code point index or a search for same in some tree structure, I would think. "like 2 but ..." well, the data structure would be bigger than for 2, but your example shows 4-5 high codepoints per low codepoint (for some languages). I did just think of another refinement to this technique (my list was not intended to be all-inclusive... just a bunch of variations I thought of then). 10) (Content specific variable size caches) Like 8, but the last character in a run is allowed (but not required) to be a different number of bytes than prior characters, because the offset calculation will still work for the first character of a different size. So #10 would halve the size of your imagined stream that intersperses one low-byte charater with each sequence of high-byte characters. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Thu Jun 5 00:04:52 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 04 Jun 2014 18:04:52 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605001432.126a0a08@x34f> References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> Message-ID: On 6/4/2014 5:14 PM, Paul Sokolovsky wrote: > That said, and unlike previous attempts to develop a small Python > implementations (which of course existed), we're striving to be exactly > a Python language implementation, not a Python-like language > implementation. As there's no formal, implementation-independent > language spec, what constitutes a compatible language implementation is > subject to opinions, and we welcome and appreciate independent review, > like this thread did. > >> Realistically, most Python code that works on Python 3.4 won't work >> on Micropython (for various reasons, not just the string behavior) >> and neither does it need to. > > That's true. However, as was said, we're striving to provide a > compatible implementation, and compatibility claims must be validated. > While we have simple "in-house" testsuite, more serious compatibility > validation requires running a testsuite for reference implementation > (CPython), and that's gradually being approached. I would call what you are doing a 'Python 3.n subset, with limitations', where n should be a specific number, which I would urge should be at least 3, if not 4 ('yield from'). To me, that would mean that every Micropython program (that does not use a clearly non-Python addon like inline assembly) would run the same* on CPython 3.n. Conversely, a Python 3.n program should either run the same* on MicroPython as CPython, or raise. What most to avoid is giving different* answers. *'same' does not include timing differences or normal float variations or bug fixes in MicroPython not in CPython. 
As for unicode: I would see ascii-only (very limited codepoints) or bare utf-8 (limited speed == expanded time) as possibly fitting the definition above. Just be clear what the limitations are. And accept that there will be people who do not bother to read the limitations and then complain when they bang into them. PS. You do not seem to be aware of how well the current PEP393 implementation works. If you are going to write any more about it, I suggest you run Tools/Stringbench/stringbench.py for timings. -- Terry Jan Reedy From ericsnowcurrently at gmail.com Thu Jun 5 00:12:23 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 4 Jun 2014 16:12:23 -0600 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605001432.126a0a08@x34f> References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> Message-ID: On Wed, Jun 4, 2014 at 3:14 PM, Paul Sokolovsky wrote: > That said, and unlike previous attempts to develop a small Python > implementations (which of course existed), we're striving to be exactly > a Python language implementation, not a Python-like language > implementation. As there's no formal, implementation-independent > language spec, what constitutes a compatible language implementation is > subject to opinions, and we welcome and appreciate independent review, > like this thread did. Actually, there is a "formal, implementation-independent language spec": https://docs.python.org/3/reference/ > >> Realistically, most Python code that works on Python 3.4 won't work >> on Micropython (for various reasons, not just the string behavior) >> and neither does it need to. > > That's true. However, as was said, we're striving to provide a > compatible implementation, and compatibility claims must be validated. 
> While we have simple "in-house" testsuite, more serious compatibility > validation requires running a testsuite for reference implementation > (CPython), and that's gradually being approached. To a large extent the test suite in http://hg.python.org/cpython/file/default/Lib/test effectively validates (full) compliance with the corresponding release (change "default" to the release branch of your choice). With that goal, no small effort has been made to mark implementation-specific tests as such. So uPy could consider using the test suite (and explicitly skip the tests for features that uPy doesn't support). -eric From pmiscml at gmail.com Thu Jun 5 00:52:53 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 01:52:53 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> Message-ID: <20140605015253.301e72e7@x34f> Hello, On Wed, 04 Jun 2014 18:04:52 -0400 Terry Reedy wrote: > On 6/4/2014 5:14 PM, Paul Sokolovsky wrote: > > > That said, and unlike previous attempts to develop a small Python > > implementations (which of course existed), we're striving to be > > exactly a Python language implementation, not a Python-like language > > implementation. As there's no formal, implementation-independent > > language spec, what constitutes a compatible language > > implementation is subject to opinions, and we welcome and > > appreciate independent review, like this thread did. > > > >> Realistically, most Python code that works on Python 3.4 won't work > >> on Micropython (for various reasons, not just the string behavior) > >> and neither does it need to. > > > > That's true. However, as was said, we're striving to provide a > > compatible implementation, and compatibility claims must be > > validated. 
While we have simple "in-house" testsuite, more serious > > compatibility validation requires running a testsuite for reference > > implementation (CPython), and that's gradually being approached. > > I would call what you are doing a 'Python 3.n subset, with Thanks, that's what we call it ourselves in the docs linked in the original message, and use n=4. Note that being a subset is not a design requirement, but there's higher-priority requirement of staying lean, so realistically uPy will always stay a subset. > limitations', where n should be a specific number, which I would urge > should be at least 3, if not 4 ('yield from'). To me, that would mean > that every Micropython program (that does not use a clearly > non-Python addon like inline assembly) would run the same* on CPython > 3.n. Conversely, a Python 3.n program should either run the same* on > MicroPython as CPython, or raise. What most to avoid is giving > different* answers. That's nice aim, to implement which we don't have enough resources, so would appreciate any help from interested parties. > *'same' does not include timing differences or normal float > variations or bug fixes in MicroPython not in CPython. > > As for unicode: I would see ascii-only (very limited codepoints) or > bare utf-8 (limited speed == expanded time) as possibly fitting the > definition above. Just be clear what the limitations are. And accept > that there will be people who do not bother to read the limitations > and then complain when they bang into them. > > PS. You do not seem to be aware of how well the current PEP393 > implementation works. If you are going to write any more about it, I > suggest you run Tools/Stringbench/stringbench.py for timings. "Well" is subjective (or should be defined formally based on the requirements). 
With my MicroPython hat on, an implementation which receives a string, transcodes it, leading to bigger size, just to immediately transcode back and send out - is awful, environment unfriendly implementation ;-). -- Best regards, Paul mailto:pmiscml at gmail.com From storchaka at gmail.com Thu Jun 5 00:43:59 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 05 Jun 2014 01:43:59 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> Message-ID: 05.06.14 01:04, Terry Reedy wrote: > PS. You do not seem to be aware of how well the current PEP393 > implementation works. If you are going to write any more about it, I > suggest you run Tools/Stringbench/stringbench.py for timings. AFAIK stringbench is ASCII-only, so it likely is compatible with current and any future MicroPython implementations, but unlikely will expose non-ASCII limitations or performance. From rosuav at gmail.com Thu Jun 5 01:05:33 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 5 Jun 2014 09:05:33 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605015253.301e72e7@x34f> References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> <20140605015253.301e72e7@x34f> Message-ID: On Thu, Jun 5, 2014 at 8:52 AM, Paul Sokolovsky wrote: > "Well" is subjective (or should be defined formally based on the > requirements). With my MicroPython hat on, an implementation which > receives a string, transcodes it, leading to bigger size, just to > immediately transcode back and send out - is awful, environment > unfriendly implementation ;-). Be careful of confusing correctness and performance, though. The transcoding you describe is inefficient, but (presumably) correct; something that's fast but wrong is straight-up buggy.
You can always fix inefficiency in a later release, but buggy behaviour sometimes is relied on (which is why ECMAScript still exposes UTF-16 to scripts, and why Windows window messages have a WPARAM and an LPARAM, and why Python's threading module has duplicate names for a lot of functions, because it's just not worth changing). I'd be much more comfortable releasing something where "everything works fine, but if you use astral characters in your strings, memory usage blows out by a factor of four" (or "... the len() function takes O(N) time") than one where "everything works fine as long as you use BMP only, but SMP characters result in tests failing". ChrisA From pmiscml at gmail.com Thu Jun 5 01:11:10 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 02:11:10 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> Message-ID: <20140605021110.485c0ca1@x34f> Hello, On Wed, 4 Jun 2014 16:12:23 -0600 Eric Snow wrote: > On Wed, Jun 4, 2014 at 3:14 PM, Paul Sokolovsky > wrote: > > That said, and unlike previous attempts to develop a small Python > > implementations (which of course existed), we're striving to be > > exactly a Python language implementation, not a Python-like language > > implementation. As there's no formal, implementation-independent > > language spec, what constitutes a compatible language > > implementation is subject to opinions, and we welcome and > > appreciate independent review, like this thread did. > > Actually, there is a "formal, implementation-independent language > spec": > > https://docs.python.org/3/reference/ Opening that link in browser, pressing Ctrl+F and pasting your quote gives zero hits, so it's not exactly what you claim it to be. It's also pretty far from being formal (unambiguous, covering all choices, etc.) and comprehensive. 
Also, please point me at "conformance" section. That said, all of us Pythoneers treat it as the best formal reference available, no news here. > >> Realistically, most Python code that works on Python 3.4 won't work > >> on Micropython (for various reasons, not just the string behavior) > >> and neither does it need to. > > > > That's true. However, as was said, we're striving to provide a > > compatible implementation, and compatibility claims must be > > validated. While we have simple "in-house" testsuite, more serious > > compatibility validation requires running a testsuite for reference > > implementation (CPython), and that's gradually being approached. > > To a large extent the test suite in > http://hg.python.org/cpython/file/default/Lib/test effectively > validates (full) compliance with the corresponding release (change > "default" to the release branch of your choice). With that goal, no > small effort has been made to mark implementation-specific tests as > such. So uPy could consider using the test suite (and explicitly skip > the tests for features that uPy doesn't support). That's exactly what we do, per the previous paragraph. And we face a lot of questionable tests, just like you say. Shameless plug: if anyone interested to run existing code on MicroPython, please help us with CPython testsuite! ;-) > > -eric -- Best regards, Paul mailto:pmiscml at gmail.com From storchaka at gmail.com Thu Jun 5 00:54:42 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 05 Jun 2014 01:54:42 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <538ECD98.5030309@farowl.co.uk> Message-ID: 05.06.14 00:21, Terry Reedy wrote: > On 6/4/2014 3:41 AM, Jeff Allen wrote: >> Jython uses UTF-16 internally -- probably the only sensible choice in a >> Python that can call Java. Indexing is O(N), fundamentally.
By >> "fundamentally", I mean for those strings that have not yet noticed that >> they contain no supplementary (>0xffff) characters. > > Indexing can be made O(log(k)) where k is the number of astral chars, > and is usually small. I like your idea and think it would be great if Jython will implement it. Unfortunately it is too late to do this in CPython. From ericsnowcurrently at gmail.com Thu Jun 5 02:01:23 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 4 Jun 2014 18:01:23 -0600 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605021110.485c0ca1@x34f> References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> <20140605021110.485c0ca1@x34f> Message-ID: On Wed, Jun 4, 2014 at 5:11 PM, Paul Sokolovsky wrote: > On Wed, 4 Jun 2014 16:12:23 -0600 > Eric Snow wrote: >> Actually, there is a "formal, implementation-independent language >> spec": >> >> https://docs.python.org/3/reference/ > > Opening that link in browser, pressing Ctrl+F and pasting your quote > gives zero hits, so it's not exactly what you claim it to be. It's also > pretty far from being formal (unambiguous, covering all choices, etc.) > and comprehensive. Also, please point me at "conformance" section. > > That said, all of us Pythoneers treat it as the best formal reference > available, no news here. It's not just the best formal reference. It's the official specification. I agree it is not so "formal" as other language specifications and it does not enumerate every facet of the language. However, underspecified parts are worth improving (as we've done with the import system portion in the last few years). Incidentally, the efforts of other Python implementors have often resulted in such improvements to the language reference. Those improvements typically come as a result of questions to this very list. :) That's essentially what this email thread is! 
-eric From greg.ewing at canterbury.ac.nz Thu Jun 5 02:03:17 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 05 Jun 2014 12:03:17 +1200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> Message-ID: <538FB3C5.6010104@canterbury.ac.nz> Serhiy Storchaka wrote: > html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't > use iterators. They use indices, str.find and/or regular expressions. > Common use case is quickly find substring starting from current position > using str.find or re.search, process found token, advance position and > repeat. For that kind of thing, you don't need an actual character index, just some way of referring to a place in a string. Instead of an integer, str.find() etc. could return a StringPosition, which would be an opaque reference to a particular point in a particular string. You would be able to pass StringPositions to indexing and slicing operations to get fast indexing into the string that they were derived from. StringPositions could support the following operations: StringPosition + int --> StringPosition StringPosition - int --> StringPosition StringPosition - StringPosition --> int These would be computed by counting characters forwards or backwards in the string, which would be slower than int arithmetic but still faster than counting from the beginning of the string every time. In other contexts, StringPositions would coerce to ints (maybe being an int subclass?) allowing them to be used in any existing algorithm that slices strings using ints. 
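Greg's StringPosition arithmetic can be sketched in plain Python. This is a hypothetical illustration of the proposal only (none of these names are an existing API): a position subclasses int, so it coerces wherever an int is expected, while also carrying the owning UTF-8 buffer and the matching byte offset, so stepping costs O(distance walked) instead of O(len(string)).

```python
def utf8_char_len(lead_byte):
    # Length in bytes of a UTF-8 sequence, determined from its lead byte.
    if lead_byte < 0x80:
        return 1
    if lead_byte < 0xE0:
        return 2
    if lead_byte < 0xF0:
        return 3
    return 4

class StringPosition(int):
    """Hypothetical sketch of the proposal above: behaves as a codepoint
    index (it *is* an int), but also remembers the owning UTF-8 buffer
    and the byte offset of that codepoint, avoiding O(N) rescans."""

    def __new__(cls, cp_index, owner, byte_offset):
        self = super().__new__(cls, cp_index)
        self.owner = owner              # bytes: the UTF-8 buffer
        self.byte_offset = byte_offset  # byte offset of this codepoint
        return self

    def __add__(self, n):               # StringPosition + int -> StringPosition
        off = self.byte_offset
        for _ in range(n):              # walk forward n codepoints
            off += utf8_char_len(self.owner[off])
        return StringPosition(int(self) + n, self.owner, off)

    def __sub__(self, other):
        if isinstance(other, StringPosition):
            return int(self) - int(other)           # SP - SP -> plain int
        off = self.byte_offset                      # SP - int -> SP
        for _ in range(other):                      # walk backward
            off -= 1
            while self.owner[off] & 0xC0 == 0x80:   # skip continuation bytes
                off -= 1
        return StringPosition(int(self) - other, self.owner, off)
```

A position then lets the owner decode the codepoint it names in O(1) via `buf[p.byte_offset : p.byte_offset + utf8_char_len(buf[p.byte_offset])]`. Note that each position keeps a reference to its owner buffer alive, which has memory-retention implications.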
-- Greg From greg.ewing at canterbury.ac.nz Thu Jun 5 02:08:21 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 05 Jun 2014 12:08:21 +1200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> Message-ID: <538FB4F5.9070500@canterbury.ac.nz> Serhiy Storchaka wrote: > A language which doesn't support O(1) indexing is not Python, it is only > Python-like language. That's debatable, but even if it's true, I don't think there's anything wrong with MicroPython being only a "Python-like language". As has been pointed out, fitting Python onto a small device is always going to necessitate some compromises. -- Greg From v+python at g.nevcal.com Thu Jun 5 02:08:33 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 04 Jun 2014 17:08:33 -0700 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB3C5.6010104@canterbury.ac.nz> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <538FB3C5.6010104@canterbury.ac.nz> Message-ID: <538FB501.2040601@g.nevcal.com> On 6/4/2014 5:03 PM, Greg Ewing wrote: > Serhiy Storchaka wrote: >> html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize >> don't use iterators. They use indices, str.find and/or regular >> expressions. Common use case is quickly find substring starting from >> current position using str.find or re.search, process found token, >> advance position and repeat. > > For that kind of thing, you don't need an actual character > index, just some way of referring to a place in a string. I think you meant codepoint index, rather than character index. > > Instead of an integer, str.find() etc. could return a > StringPosition, which would be an opaque reference to a > particular point in a particular string. 
You would be > able to pass StringPositions to indexing and slicing > operations to get fast indexing into the string that > they were derived from. > > StringPositions could support the following operations: > > StringPosition + int --> StringPosition > StringPosition - int --> StringPosition > StringPosition - StringPosition --> int > > These would be computed by counting characters forwards > or backwards in the string, which would be slower than > int arithmetic but still faster than counting from the > beginning of the string every time. > > In other contexts, StringPositions would coerce to ints > (maybe being an int subclass?) allowing them to be used > in any existing algorithm that slices strings using ints. > This starts to diverge from Python codepoint indexing via integers. Calculating or caching the codepoint index to byte offset as part of the str implementation stays compatible with Python. Introducing StringPosition makes a Python-like language. Or so it seems to me. -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Thu Jun 5 02:13:37 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 04 Jun 2014 17:13:37 -0700 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB501.2040601@g.nevcal.com> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <538FB3C5.6010104@canterbury.ac.nz> <538FB501.2040601@g.nevcal.com> Message-ID: <538FB631.4000401@g.nevcal.com> On 6/4/2014 5:08 PM, Glenn Linderman wrote: > On 6/4/2014 5:03 PM, Greg Ewing wrote: >> Serhiy Storchaka wrote: >>> html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize >>> don't use iterators. They use indices, str.find and/or regular >>> expressions. Common use case is quickly find substring starting from >>> current position using str.find or re.search, process found token, >>> advance position and repeat. 
>> >> For that kind of thing, you don't need an actual character >> index, just some way of referring to a place in a string. > > I think you meant codepoint index, rather than character index. > >> >> Instead of an integer, str.find() etc. could return a >> StringPosition, which would be an opaque reference to a >> particular point in a particular string. You would be >> able to pass StringPositions to indexing and slicing >> operations to get fast indexing into the string that >> they were derived from. >> >> StringPositions could support the following operations: >> >> StringPosition + int --> StringPosition >> StringPosition - int --> StringPosition >> StringPosition - StringPosition --> int >> >> These would be computed by counting characters forwards >> or backwards in the string, which would be slower than >> int arithmetic but still faster than counting from the >> beginning of the string every time. >> >> In other contexts, StringPositions would coerce to ints >> (maybe being an int subclass?) allowing them to be used >> in any existing algorithm that slices strings using ints. >> > This starts to diverge from Python codepoint indexing via integers. > Calculating or caching the codepoint index to byte offset as part of > the str implementation stays compatible with Python. Introducing > StringPosition makes a Python-like language. Or so it seems to me. Another thought is that StringPosition only works (quickly, at least), as you point out, for the string that they were derived from... so algorithms that walk two strings at a time cannot use the same StringPosition to do so... yep, this is quite divergent from CPython and Python. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Thu Jun 5 02:52:03 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 05 Jun 2014 12:52:03 +1200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB501.2040601@g.nevcal.com> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <538FB3C5.6010104@canterbury.ac.nz> <538FB501.2040601@g.nevcal.com> Message-ID: <538FBF33.6020607@canterbury.ac.nz> Glenn Linderman wrote: > >> For that kind of thing, you don't need an actual character >> index, just some way of referring to a place in a string. > > I think you meant codepoint index, rather than character index. Probably, but what I said is true either way. > This starts to diverge from Python codepoint indexing via integers. That's true, although most programs would have to go out of their way to tell the difference, especially if StringPosition were a subclass of int. I agree that cacheing indexes would be more transparent, though. -- Greg From greg.ewing at canterbury.ac.nz Thu Jun 5 02:57:16 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 05 Jun 2014 12:57:16 +1200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB631.4000401@g.nevcal.com> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <538FB3C5.6010104@canterbury.ac.nz> <538FB501.2040601@g.nevcal.com> <538FB631.4000401@g.nevcal.com> Message-ID: <538FC06C.2070802@canterbury.ac.nz> Glenn Linderman wrote: > > so algorithms that walk two strings at a time cannot use the same > StringPosition to do so... yep, this is quite divergent from CPython and > Python. They can, it's just that at most one of the indexing operations would be fast; the StringPosition would devolve into an int for the other one. Such an algorithm would be of dubious correctness anyway, since as you pointed out, codepoints and characters are not quite the same thing. 
A codepoint index in one string doesn't necessarily count off the same number of characters in another string. So to be safe, you should really walk each string individually. -- Greg From pmiscml at gmail.com Thu Jun 5 03:01:38 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 04:01:38 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB3C5.6010104@canterbury.ac.nz> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <538FB3C5.6010104@canterbury.ac.nz> Message-ID: <20140605040138.4e5a944f@x34f> Hello, On Thu, 05 Jun 2014 12:03:17 +1200 Greg Ewing wrote: > Serhiy Storchaka wrote: > > html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize > > don't use iterators. They use indices, str.find and/or regular > > expressions. Common use case is quickly find substring starting > > from current position using str.find or re.search, process found > > token, advance position and repeat. > > For that kind of thing, you don't need an actual character > index, just some way of referring to a place in a string. > > Instead of an integer, str.find() etc. could return a > StringPosition, That's braver than I had in mind, but definitely shows what alternative implementations have in store to fight back if some performance problems are actually detected. My own thought was, for example, as a response to people who (quoting) "slice strings for living", some form of "extended slicing" like str[(0, 4, 6, 8, 15)]. But I really think that providing an iterator interface for common string operations would cover most real-world cases, and would actually be beneficial for the Python language in general.
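The iterator interface Paul suggests can be sketched on top of today's str. `ifind` below is a hypothetical helper, not an existing builtin; the point is that a consumer written against it never does index arithmetic itself, so an O(N)-indexing implementation (e.g. over UTF-8) could resume each scan from a cached internal byte offset.

```python
def ifind(s, sub):
    """Hypothetical iterator form of str.find: yield the position of
    each non-overlapping occurrence of sub in s.  Written here with
    integer indices; a UTF-8 implementation could instead carry a byte
    offset between yields and never pay for random access."""
    start = 0
    while True:
        i = s.find(sub, start)
        if i == -1:
            return
        yield i
        start = i + len(sub)

def parse_pairs(text):
    # Serhiy's scan-advance-repeat pattern, written without manual
    # index arithmetic: "name=value\nname=value\n" -> dict.
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)
```

For example, `list(ifind("abcabcabc", "bc"))` walks the string once, left to right, exactly as the str.find/advance/repeat idiom does today.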
> > -- > Greg -- Best regards, Paul mailto:pmiscml at gmail.com From rosuav at gmail.com Thu Jun 5 03:17:04 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 5 Jun 2014 11:17:04 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB3C5.6010104@canterbury.ac.nz> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <538FB3C5.6010104@canterbury.ac.nz> Message-ID: On Thu, Jun 5, 2014 at 10:03 AM, Greg Ewing wrote: > StringPositions could support the following operations: > > StringPosition + int --> StringPosition > StringPosition - int --> StringPosition > StringPosition - StringPosition --> int > > These would be computed by counting characters forwards > or backwards in the string, which would be slower than > int arithmetic but still faster than counting from the > beginning of the string every time. The SP would have to keep track of which string it's associated with, which might make for some surprising retentions of large strings. (Imagine returning what you think is an integer, but actually turns out to be a SP, and you're trying to work out why your program is eating up so much more memory than it should. This int-like object is so much more than an int.) ChrisA From pmiscml at gmail.com Thu Jun 5 03:19:13 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 04:19:13 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB4F5.9070500@canterbury.ac.nz> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> Message-ID: <20140605041913.14886264@x34f> Hello, On Thu, 05 Jun 2014 12:08:21 +1200 Greg Ewing wrote: > Serhiy Storchaka wrote: > > A language which doesn't support O(1) indexing is not Python, it is > > only Python-like language. 
> That's debatable, but even if it's true, I don't think > there's anything wrong with MicroPython being only a > "Python-like language". As has been pointed out, fitting > Python onto a small device is always going to necessitate > some compromises. Thanks. I mentioned in another mail that we are trying to develop exactly a minimalistic, but real, Python implementation, not a Python-like language. Here is what "Python-like" means to me. The other most well-known and mature (as in "started quite some time ago") "small Python" implementation is PyMite aka Python-on-a-chip https://code.google.com/p/python-on-a-chip/ . It implements a good deal of the Python 2 language. It doesn't implement exception handling (try/except). Can a Python be without exception handling? For me, the clear answer is "no". Please put that in perspective when raising alarms over O(1) indexing of an inherently problematic niche datatype. (Again, it's not my or MicroPython's fault that it was forced as the standard string type. Maybe if CPython had seriously considered the now-standard UTF-8 encoding, the resulting "str" type might be different. But CPython has gigabytes of heap to spare, and for MicroPython, every half-bit is precious).
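The size trade-off Paul is pointing at is easy to quantify with the encodings themselves; a small illustration (the byte counts below depend only on the encodings, not on any particular implementation's overhead):

```python
text = "héllo wörld " * 100          # 1200 codepoints, mostly ASCII

utf8 = text.encode("utf-8")          # é and ö take 2 bytes each -> 1400 bytes
utf16 = text.encode("utf-16-le")     # exactly 2 bytes/codepoint here -> 2400
utf32 = text.encode("utf-32-le")     # always 4 bytes/codepoint -> 4800

print(len(text), len(utf8), len(utf16), len(utf32))
```

For mostly-ASCII text like this, a UTF-8 representation stays close to one byte per codepoint, which is the kind of saving a heap-constrained port cares about; the price, as discussed throughout this thread, is O(N) codepoint indexing.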
> > -- > Greg > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/pmiscml%40gmail.com -- Best regards, Paul mailto:pmiscml at gmail.com From tjreedy at udel.edu Thu Jun 5 04:15:30 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 04 Jun 2014 22:15:30 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605015253.301e72e7@x34f> References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> <20140605015253.301e72e7@x34f> Message-ID: On 6/4/2014 6:52 PM, Paul Sokolovsky wrote: > "Well" is subjective (or should be defined formally based on the > requirements). With my MicroPython hat on, an implementation which > receives a string, transcodes it, leading to bigger size, just to > immediately transcode back and send out - is awful, environment > unfriendly implementation ;-). I am not sure what you concretely mean by 'receive a string', but I think you are again batting at a strawman. If you mean 'read from a file', and all you want to do is read bytes from and write bytes to external 'files', then there is obviously no need to transcode and neither Python 2 nor 3 makes you do so. -- Terry Jan Reedy From tjreedy at udel.edu Thu Jun 5 04:25:03 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 04 Jun 2014 22:25:03 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <538ECD98.5030309@farowl.co.uk> Message-ID: On 6/4/2014 6:54 PM, Serhiy Storchaka wrote: > 05.06.14 00:21, Terry Reedy wrote: >> On 6/4/2014 3:41 AM, Jeff Allen wrote: >>> Jython uses UTF-16 internally -- probably the only sensible choice in a >>> Python that can call Java. Indexing is O(N), fundamentally.
By >>> "fundamentally", I mean for those strings that have not yet noticed that >>> they contain no supplementary (>0xffff) characters. >> >> Indexing can be made O(log(k)) where k is the number of astral chars, >> and is usually small. > > I like your idea and think it would be great if Jython implemented > it. A proof of concept implementation in Python that handles both indexing and slicing is on the tracker. It is simpler than I initially expected. > Unfortunately it is too late to do this in CPython. I mentioned it as an alternative during the '393 discussion. I more than half agree that the FSR is the better choice for CPython, which had no particular attachment to UTF-16 in the way that I think Jython, for instance, does. -- Terry Jan Reedy From stephen at xemacs.org Thu Jun 5 09:00:01 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 05 Jun 2014 16:00:01 +0900 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538F86A2.4080802@g.nevcal.com> References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> <20140604144933.66e6c2f4@x34f> <538F86A2.4080802@g.nevcal.com> Message-ID: <87ppin3kpa.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > 3) (Most space efficient) One cached entry, that caches the last > codepoint/byte position referenced. UTF-8 is able to be traversed in > either direction, so "next/previous" codepoint access would be > relatively fast (and such are very common operations, even when indexing > notation is used: "for ix in range( len( str_x )): func( str_x[ ix ])".) Been there, tried that (Emacsen). Either it's a YAGNI (moving forward or backward over UTF-8 by characters for short distances is plenty fast, especially if you've got a lot of ASCII you can move by words for somewhat longer distances), or it's not good enough. There *may* be a sweet spot, but it's definitely smaller than the one on Sharapova's racket.
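A minimal sketch of that single-entry cache idea, in Python (entirely hypothetical code, not from Emacsen or any existing implementation): one (codepoint index, byte offset) pair is remembered from the last access, and lookups walk forward or backward from it by skipping UTF-8 continuation bytes.

```python
# Hypothetical sketch: a UTF-8 string with a single cached
# (codepoint index, byte offset) pair, traversable in both directions.

class CachedUTF8Str:
    def __init__(self, s):
        self._buf = s.encode("utf-8")
        self._cache = (0, 0)  # (codepoint index, byte offset)

    @staticmethod
    def _is_cont(byte):
        # UTF-8 continuation bytes look like 0b10xxxxxx
        return byte & 0xC0 == 0x80

    def _seek(self, index):
        ci, bo = self._cache
        buf = self._buf
        while ci < index:          # walk forward one code point at a time
            bo += 1
            while bo < len(buf) and self._is_cont(buf[bo]):
                bo += 1
            ci += 1
        while ci > index:          # walk backward, skipping continuations
            bo -= 1
            while self._is_cont(buf[bo]):
                bo -= 1
            ci -= 1
        self._cache = (ci, bo)
        return bo

    def __getitem__(self, index):
        start = self._seek(index)
        end = start + 1
        while end < len(self._buf) and self._is_cont(self._buf[end]):
            end += 1
        return self._buf[start:end].decode("utf-8")
```

With this scheme, the sequential `for ix in range(len(str_x))` pattern quoted above costs O(1) amortized per step, while a cold random access degrades to a scan from the cached position.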
> 4) (Fixed size caches) N entries, one for the last codepoint, and > others at Codepoint_Length/N intervals. N could be tunable. To achieve space saving, the cache has to be quite small, and the bigger your integers, the smaller it gets. A naive implementation on a 64-bit machine would give you 16 bytes/cache entry. Using a non-native size will be a space win, but needs care in implementation. Initializing the cache is very expensive for small strings, so you need conditional and maybe lazy initialization (for large strings). By the way, there's also 10) Keep counts of the leading and trailing number of ASCII (one-octet) characters. This is often a *huge* win; it's quite common to encounter documents where size - lc - tc = 2 (ie, there's only one two-octet character in the document). 11) Keep a list (or tree) of most-recently-accessed positions. Despite my negative experience with multibyte encodings in Emacsen, I'm persuaded by the arguments that there probably aren't all that many places in core Python where indexing is used in an essential way, so MicroPython itself can probably optimize those "behind the scenes". Application programmers in the embedded context may be expected to deal with the need to avoid random access algorithms and use iterators and generators to accomplish most tasks. From storchaka at gmail.com Thu Jun 5 09:26:19 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 05 Jun 2014 10:26:19 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB4F5.9070500@canterbury.ac.nz> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> Message-ID: 05.06.14 03:08, Greg Ewing wrote: > Serhiy Storchaka wrote: >> A language which doesn't support O(1) indexing is not Python, it is >> only Python-like language.
> > That's debatable, but even if it's true, I don't think > there's anything wrong with MicroPython being only a > "Python-like language". As has been pointed out, fitting > Python onto a small device is always going to necessitate > some compromises. Agreed, there's nothing wrong with that. I think that even limiting integers to 32 or 64 bits is an acceptable compromise for a Python-like language targeted at small devices. But programming in such a language requires different techniques and habits. From storchaka at gmail.com Thu Jun 5 09:39:18 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 05 Jun 2014 10:39:18 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538FB3C5.6010104@canterbury.ac.nz> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <538FB3C5.6010104@canterbury.ac.nz> Message-ID: 05.06.14 03:03, Greg Ewing wrote: > Serhiy Storchaka wrote: >> html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't >> use iterators. They use indices, str.find and/or regular expressions. >> Common use case is quickly find substring starting from current >> position using str.find or re.search, process found token, advance >> position and repeat. > > For that kind of thing, you don't need an actual character > index, just some way of referring to a place in a string. Of course. But _existing_ Python interfaces all work with indices. And it is too late to change this, that train left 20 years ago. There is no need for yet another way to do string operations. One obvious way is enough.
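The scanning idiom described above can be sketched as follows (a toy example; the function name is mine, not from any stdlib parser module): find the next delimiter from the current index, slice out the token, advance past it, and repeat.

```python
# The index-based scanning idiom used by parsers such as json and html:
# str.find with an explicit start position, then advance and repeat.

def split_records(text, sep=";"):
    pos = 0
    records = []
    while True:
        nxt = text.find(sep, pos)   # search starts at the current index
        if nxt == -1:               # no more separators: emit the tail
            records.append(text[pos:])
            return records
        records.append(text[pos:nxt])
        pos = nxt + len(sep)        # advance past the separator
```

The point is that `pos` and `nxt` are code-point indices, so an implementation where indexing and `find` are O(N) makes every step of such a loop pay a scan from the start of the string unless it caches positions internally.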
From storchaka at gmail.com Thu Jun 5 09:54:03 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 05 Jun 2014 10:54:03 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <538F86A2.4080802@g.nevcal.com> References: <20140604011718.GD10355@ando> <20140604133857.13a0f0b9@x34f> <20140604144933.66e6c2f4@x34f>, <538F86A2.4080802@g.nevcal.com> Message-ID: 04.06.14 23:50, Glenn Linderman wrote: > 3) (Most space efficient) One cached entry, that caches the last > codepoint/byte position referenced. UTF-8 is able to be traversed in > either direction, so "next/previous" codepoint access would be > relatively fast (and such are very common operations, even when indexing > notation is used: "for ix in range( len( str_x )): func( str_x[ ix ])".) Great idea! It should cover most real-world cases. Note that we can scan a UTF-8 string left-to-right and right-to-left. From stephen at xemacs.org Thu Jun 5 09:54:11 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 05 Jun 2014 16:54:11 +0900 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605041913.14886264@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> Message-ID: <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Sokolovsky writes: > Please put that in perspective when alarming over O(1) indexing of > inherently problematic niche datatype. (Again, it's not my or > MicroPython's fault that it was forced as standard string type. Maybe > if CPython seriously considered now-standard UTF-8 encoding, results > of what is "str" type might be different. But CPython has gigabytes of > heap to spare, and for MicroPython, every half-bit is precious). Would you please stop trolling?
The reasons for adopting Unicode as a separate data type were good and sufficient in 2000, and they remain so today, even if you have been fortunate enough not to burn yourself on character-byte conflation yet. What matters to you is that str (unicode) is an opaque type -- there is no specification of the internal representation in the language reference, and in fact several different ones coexist happily across existing Python implementations -- and you're free to use a UTF-8 implementation if that suits the applications you expect for MicroPython. PEP 393 exists, of course, and specifies the current internal representation for CPython 3. But I don't see anything in it that suggests it's mandated for any other implementation. From storchaka at gmail.com Thu Jun 5 10:08:21 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 05 Jun 2014 11:08:21 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <538ECD98.5030309@farowl.co.uk> Message-ID: 05.06.14 05:25, Terry Reedy wrote: > I mentioned it as an alternative during the '393 discussion. I more than > half agree that the FSR is the better choice for CPython, which had no > particular attachment to UTF-16 in the way that I think Jython, for > instance, does. Yes, I remember. I think that a hybrid FSR-UTF16 (like FSR, but UTF-16 is used instead of UCS4) is the better choice for CPython. I suppose that with emoticons and other icon characters becoming popular in the next 5 or 10 years, even English text will often contain astral characters. And spending 4 bytes per character when a long text contains one astral character looks too wasteful. From stephen at xemacs.org Thu Jun 5 12:00:01 2014 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 05 Jun 2014 19:00:01 +0900 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <538ECD98.5030309@farowl.co.uk> Message-ID: <87ha3z3cda.fsf@uwakimon.sk.tsukuba.ac.jp> Serhiy Storchaka writes: > Yes, I remember. I thing that hybrid FSR-UTF16 (like FSR, but UTF-16 is > used instead of UCS4) is the better choice for CPython. I suppose that > with populating emoticons and other icon characters in nearest 5 or 10 > years, even English text will often contain astral characters. And > spending 4 bytes per character if long text contains one astral > character looks too prodigally. Why use something that complex if you don't have to? For the use case you have in mind, just map them into private space. If you really want to be aggressive, use surrogate space, too (anything that cares what a scalar represents should be trapping on non-scalars, catch that exception and look up the char -- dangerous, though, because such exceptions are probably all over the place). From victor.stinner at gmail.com Thu Jun 5 12:03:15 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 5 Jun 2014 12:03:15 +0200 Subject: [Python-Dev] Request: new "Asyncio" component on the bug tracker Message-ID: Hi, Would it be possible to add a new "Asyncio" component on bugs.python.org? If this component is selected, the default nosy list for asyncio would be used (guido, yury and me, there is already such list in the nosy list completion). Full text search for "asyncio" returns too many results. 
Victor From pmiscml at gmail.com Thu Jun 5 12:10:39 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 13:10:39 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> <20140605015253.301e72e7@x34f> Message-ID: <20140605131039.4f5b74d6@x34f> Hello, On Wed, 04 Jun 2014 22:15:30 -0400 Terry Reedy wrote: > On 6/4/2014 6:52 PM, Paul Sokolovsky wrote: > > > "Well" is subjective (or should be defined formally based on the > > requirements). With my MicroPython hat on, an implementation which > > receives a string, transcodes it, leading to bigger size, just to > > immediately transcode back and send out - is awful, environment > > unfriendly implementation ;-). > > I am not sure what you concretely mean by 'receive a string', but I I (surely) mean an abstract input (as an Input/Output aka I/O) operation. > think you are again batting at a strawman. If you mean 'read from a > file', and all you want to do is read bytes from and write bytes to > external 'files', then there is obviously no need to transcode and > neither Python 2 or 3 make you do so. But most files and network protocols are text-based, and I (and many other people) don't want to artificially use the "binary data" type for them, with all the attached funny things, like the "b" prefix. And then Python2 indeed doesn't transcode anything, and Python3 does, without being asked, and for no good purpose, because in most cases, Input data will be Output as-is (maybe in byte-boundary-split chunks). So, it all goes in circles - ignoring the forced-Unicode problem (after a week of subscription to python-list, half of the traffic there appears to be dedicated to Unicode-related flames) on python-dev's behalf is not going to help the Python community.
[] -- Best regards, Paul mailto:pmiscml at gmail.com From pmiscml at gmail.com Thu Jun 5 13:25:28 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 14:25:28 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140605142528.39e0e5fc@x34f> Hello, On Thu, 05 Jun 2014 16:54:11 +0900 "Stephen J. Turnbull" wrote: > Paul Sokolovsky writes: > > > Please put that in perspective when alarming over O(1) indexing of > > inherently problematic niche datatype. (Again, it's not my or > > MicroPython's fault that it was forced as standard string type. > > Maybe if CPython seriously considered now-standard UTF-8 encoding, > > results of what is "str" type might be different. But CPython has > > gigabytes of heap to spare, and for MicroPython, every half-bit is > > precious). > > Would you please stop trolling? If it had been kept as a "separate data type", there wouldn't be any problem. But it was made the "one and only string type", and all the strife started then. And there is going to be "trolling" as long as Python developers and decision-makers ignore (troll?) the outcry from the community (again, I was surprised and not surprised to see that ~50% of the traffic on python-list touches Unicode issues). Well, I understand the plan - hoping that people will "get over this". And I'm personally happy to stay away from this "trolling", but any discussion related to Unicode goes in circles and returns to the feeling that the central role Unicode was given by Python3 is misplaced.
Then for me, it's just a matter of job security and personal future - I don't want to spend the rest of my days as a javascript (or other idiotic language) monkey. And the message is clear in the air (http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ and elsewhere): if Python's old strings are now in Go, and Python itself now has Java's strings, all causing strife, why not go cruising around and see what's up, instead of staying a strong, and growing bigger, community. > so today, even if you have been fortunate enough not to burn yourself > on character-byte conflation yet. > > What matters to you is that str (unicode) is an opaque type -- there > is no specification of the internal representation in the language > reference, and in fact several different ones coexist happily across > existing Python implementations -- and you're free to use a UTF-8 > implementation if that suits the applications you expect for > MicroPython. > > PEP 393 exists, of course, and specifies the current internal > representation for CPython 3. But I don't see anything in it that > suggests it's mandated for any other implementation. I already knew all this very well. What's strange is that other developers don't know, or don't take seriously, all of the above. That's why the gentleman who was kindly interested in adding Unicode support to MicroPython started with the idea of dragging in the CPython implementation. And the only effect that persuading him that it's not necessarily the best solution had, was that he started to feel he was being manipulated into writing something ugly, instead of the bright idea he had. That's why another gentleman reduces it to: "O(1) on string indexing or not a Python!". And that's why another gentleman, who agrees with the UTF-8 arguments, still gives an excuse (https://mail.python.org/pipermail/python-dev/2014-June/134727.html): "In this context, while a fixed-width encoding may be the correct choice it would also likely be the wrong choice."
In this regard, I'm glad to participate in this mind-resetting discussion. So, let's reiterate - there's nothing like "the best", "the only right", "the only correct", "righter than", "more correct than" in CPython's implementation of Unicode storage. It is *arbitrary*. Well, sure, it's not arbitrary, but based on requirements, and these requirements match CPython's (implied) usage model well enough. But among all possible sets of requirements, CPython's requirements are no more valid than any other possible set. And another set of requirements fairly clearly leads to a situation where the CPython implementation is rejected as not correct for those requirements at all. -- Best regards, Paul mailto:pmiscml at gmail.com From ncoghlan at gmail.com Thu Jun 5 13:32:19 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 5 Jun 2014 21:32:19 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5 June 2014 17:54, Stephen J. Turnbull wrote: > What matters to you is that str (unicode) is an opaque type -- there > is no specification of the internal representation in the language > reference, and in fact several different ones coexist happily across > existing Python implementations -- and you're free to use a UTF-8 > implementation if that suits the applications you expect for > MicroPython.
That's what happened with narrow builds of Python 2 and pre-PEP-393 releases of Python 3 (effectively using UTF-16 internally), and it was the cause of a sufficiently large number of bugs that the Linux distributions tend to instead accept the memory cost of using wide builds (4 bytes for all code points) for affected versions. Preserving the "the Python 3 str type is an immutable array of code points" semantics matters significantly more than whether or not indexing by code point is O(1). The various caching tricks suggested in this thread (especially "leading ASCII characters", "trailing ASCII characters" and "position & index of last lookup") could keep the typical lookup performance well below O(N). > PEP 393 exists, of course, and specifies the current internal > representation for CPython 3. But I don't see anything in it that > suggests it's mandated for any other implementation. CPython is constrained by C API compatibility requirements, as well as implementation constraints due to the amount of internal code that would need to be rewritten to handle a variable width encoding as the canonical internal representation (since the problems with Python 2 narrow builds mean we already know variable width encodings aren't handled correctly by the current code). Implementations that share code with CPython, or try to mimic the C API especially closely, may face similar restrictions. Outside that, I think we're better off if alternative implementations are free to experiment with different internal string representations. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 5 13:43:16 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 5 Jun 2014 21:43:16 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605142528.39e0e5fc@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> Message-ID: On 5 June 2014 21:25, Paul Sokolovsky wrote: > Well, I understand the plan - hoping that people will "get over this". > And I'm personally happy to stay away from this "trolling", but any > discussion related to Unicode goes in circles and returns to feeling > that Unicode at the central role as put there by Python3 is misplaced. Many of the challenges network programmers face in Python 3 are around binary data being more inconvenient to work with than it needs to be, not the fact we decentralised boundary code by offering a strict binary/text separation as the default mode of operation. Aside from some of the POSIX locale handling issues on Linux, many of the concerns are with the usability of bytes and bytearray, not with str - that's why binary interpolation is coming back in 3.5, and there will likely be other usability tweaks for those types as well. More on that at http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#what-actually-changed-in-the-text-model-between-python-2-and-python-3 Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pmiscml at gmail.com Thu Jun 5 14:01:21 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 15:01:21 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> Message-ID: <20140605150121.286032df@x34f> Hello, On Thu, 5 Jun 2014 21:43:16 +1000 Nick Coghlan wrote: > On 5 June 2014 21:25, Paul Sokolovsky wrote: > > Well, I understand the plan - hoping that people will "get over > > this". And I'm personally happy to stay away from this "trolling", > > but any discussion related to Unicode goes in circles and returns > > to feeling that Unicode at the central role as put there by Python3 > > is misplaced. > > Many of the challenges network programmers face in Python 3 are around > binary data being more inconvenient to work with than it needs to be, > not the fact we decentralised boundary code by offering a strict > binary/text separation as the default mode of operation. Just to clarify - (many) other gentlemen and I (in that order, I'm not taking the lead) don't call for going back to Python2 behavior with implicit conversion between byte-oriented strings and Unicode, etc. They just point out that perhaps Python3 went too far with Unicode by making it the default string type. Strict separation is surely mostly a good thing (I may sigh that it leads to Java-like dichotomical bloat for all I/O classes, but well, I was able to put up with that in MicroPython already).
> Aside from > some of the POSIX locale handling issues on Linux, many of the > concerns are with the usability of bytes and bytearray, not with str - > that's why binary interpolation is coming back in 3.5, and there will > likely be other usability tweaks for those types as well. All these changes are what let me dream on and speculate on the possibility that Python4 could offer an encoding-neutral string type (which means based on bytes), while moving unicode back to an explicit type to be used explicitly only when needed (bloated frameworks like Django can force users to it anyway, but that will be forcing at the framework level, not the language level, against which people rebel.) People can dream, right? Thanks, Paul mailto:pmiscml at gmail.com From stefan at bytereef.org Thu Jun 5 14:10:54 2014 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 5 Jun 2014 14:10:54 +0200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605142528.39e0e5fc@x34f> References: <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> Message-ID: <20140605121054.GA348@sleipnir.bytereef.org> Paul Sokolovsky wrote: > In this regard, I'm glad to participate in mind-resetting discussion. > So, let's reiterate - there's nothing like "the best", "the only right", > "the only correct", "righter than", "more correct than" in CPython's > implementation of Unicode storage. It is *arbitrary*. Well, sure, it's > not arbitrary, but based on requirements, and these requirements match > CPython's (implied) usage model well enough. But among all possible > sets of requirements, CPython's requirements are no more valid that > other possible. And other set of requirement fairly clearly lead to > situation where CPython implementation is rejected as not correct for > those requirements at all.
Several core-devs have said that using UTF-8 for MicroPython is perfectly okay. I also think it's the right choice and I hope that you guys come up with a very efficient implementation. Stefan Krah From ncoghlan at gmail.com Thu Jun 5 14:20:04 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 5 Jun 2014 22:20:04 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605150121.286032df@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> Message-ID: On 5 June 2014 22:01, Paul Sokolovsky wrote: >> Aside from >> some of the POSIX locale handling issues on Linux, many of the >> concerns are with the usability of bytes and bytearray, not with str - >> that's why binary interpolation is coming back in 3.5, and there will >> likely be other usability tweaks for those types as well. > > All these changes are what let me dream on and speculate on > possibility that Python4 could offer an encoding-neutral string type > (which means based on bytes), while move unicode back to an explicit > type to be used explicitly only when needed (bloated frameworks like > Django can force users to it anyway, but that will be forcing on > framework level, not on language level, against which people rebel.) > People can dream, right? If you don't model strings as arrays of code points, or at least assume a particular universal encoding (like UTF-8), you have to give up string concatenation in order to tolerate arbitrary encodings - otherwise you end up with unintelligible data that nobody can decode because it switches encodings without notice. 
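The failure mode Nick describes is easy to demonstrate (a self-contained illustration; the sample strings are arbitrary): two chunks of text in different, unlabelled encodings, joined as raw bytes, can no longer be decoded by any single codec without either an error or silent mojibake.

```python
# Concatenating "strings" that are really bytes in different encodings.

latin1_chunk = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
utf8_chunk = "\u65e5\u672c".encode("utf-8")    # UTF-8 bytes
mixed = latin1_chunk + utf8_chunk              # encoding switches mid-stream

def try_decode(data, encoding):
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# UTF-8 fails outright: 0xE9 starts an invalid sequence here.
utf8_result = try_decode(mixed, "utf-8")

# Latin-1 "succeeds" (it maps every byte), but silently turns the
# UTF-8 portion into mojibake.
latin1_result = try_decode(mixed, "latin-1")
```

With a code-point-based str type, the equivalent concatenation is always well-defined; the encoding question is confined to the I/O boundary.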
That's a viable model if your OS guarantees it (Mac OS X does, for example, so Python 3 assumes UTF-8 for all OS interfaces there), but Linux currently has no such guarantee - many runtimes just decide they don't care, and assume UTF-8 anyway (Python 3 may even join them some day, due to the problems caused by trusting the locale encoding to be correct, but the startup code will need non-trivial changes for that to happen - the C.UTF-8 locale may even become widespread before we get there). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From timothy.c.delaney at gmail.com Thu Jun 5 14:21:30 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Thu, 5 Jun 2014 22:21:30 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605150121.286032df@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> Message-ID: On 5 June 2014 22:01, Paul Sokolovsky wrote: > > All these changes are what let me dream on and speculate on > possibility that Python4 could offer an encoding-neutral string type > (which means based on bytes) > To me, an "encoding neutral string type" means roughly "characters are atomic", and the best representation we have for a "character" is a Unicode code point. Through any interface that provides "characters" each individual "character" (code point) is indivisible. To me, Python 3 has exactly an "encoding-neutral string type". It also has a bytes type that is just that - bytes which can represent anything at all. It might be the UTF-8 representation of a string, but you have the freedom to manipulate it however you like - including making it no longer valid UTF-8.
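That distinction can be shown in a few lines (a trivial, implementation-independent illustration): at the str level a code point is atomic, while bytes and bytearray expose - and let you corrupt - the underlying representation.

```python
# str: one indivisible code point; bytes: the mutable representation.

s = "\u03c0"                         # GREEK SMALL LETTER PI, one code point
assert len(s) == 1                   # indivisible at the str level

raw = bytearray(s.encode("utf-8"))   # two bytes: 0xCF 0x80
assert len(raw) == 2

del raw[1]                           # perfectly legal byte manipulation...

try:                                 # ...which leaves invalid UTF-8 behind
    bytes(raw).decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
```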
Whilst I think O(1) indexing of strings is important, I don't think it's as important as the property that "characters" are indivisible and would be quite happy for MicroPython to use UTF-8 as the underlying string representation (or some more clever thing, several ideas in this thread) so long as: 1. It maintains a string type that presents code points as indivisible elements; 2. The performance consequences of using UTF-8 are documented, as well as any optimisations, tricks, etc that are used to overcome those consequences (and what impact if any they would have if code written for MicroPython was run in CPython). Cheers, Tim Delaney From pmiscml at gmail.com Thu Jun 5 14:37:08 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Thu, 5 Jun 2014 15:37:08 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> Message-ID: <20140605153708.7f27412e@x34f> Hello, On Thu, 5 Jun 2014 22:20:04 +1000 Nick Coghlan wrote: [] > problems caused by trusting the locale encoding to be correct, but the > startup code will need non-trivial changes for that to happen - the > C.UTF-8 locale may even become widespread before we get there). ... And until those golden times come, it would be nice if Python did not force its perfect world model, which unfortunately is not based on surrounding reality, and let users solve their encoding problems themselves - when they need to, because again, one can go quite a long way without dealing with encodings at all.
Whereas now Python3 forces users to deal with encoding almost universally, but forces a particular one for all strings (which, again, doesn't correspond to the state of surrounding reality). I already hear the response that it's good that users are taught to deal with encoding, that it will make them write correct programs, but that's a bit far away from the original aim of making it easy and pleasant to write "correct" programs. (And definitions of "correct" vary.) But all that is just an opinion. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Best regards, Paul mailto:pmiscml at gmail.com From ncoghlan at gmail.com Thu Jun 5 14:38:13 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 5 Jun 2014 22:38:13 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605121054.GA348@sleipnir.bytereef.org> References: <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605121054.GA348@sleipnir.bytereef.org> Message-ID: On 5 June 2014 22:10, Stefan Krah wrote: > Paul Sokolovsky wrote: >> In this regard, I'm glad to participate in mind-resetting discussion. >> So, let's reiterate - there's nothing like "the best", "the only right", >> "the only correct", "righter than", "more correct than" in CPython's >> implementation of Unicode storage. It is *arbitrary*. Well, sure, it's >> not arbitrary, but based on requirements, and these requirements match >> CPython's (implied) usage model well enough. But among all possible >> sets of requirements, CPython's requirements are no more valid that >> other possible. And other set of requirement fairly clearly lead to >> situation where CPython implementation is rejected as not correct for >> those requirements at all. > > Several core-devs have said that using UTF-8 for MicroPython is perfectly okay.
> I also think it's the right choice and I hope that you guys come up with a very > efficient implementation. Based on this discussion, I've also posted a draft patch aimed at clarifying the relevant aspects of the data model section of the language reference (http://bugs.python.org/issue21667). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 5 15:15:54 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 5 Jun 2014 23:15:54 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605153708.7f27412e@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140605153708.7f27412e@x34f> Message-ID: On 5 June 2014 22:37, Paul Sokolovsky wrote: > On Thu, 5 Jun 2014 22:20:04 +1000 > Nick Coghlan wrote: >> problems caused by trusting the locale encoding to be correct, but the >> startup code will need non-trivial changes for that to happen - the >> C.UTF-8 locale may even become widespread before we get there). > > ... And until those golden times come, it would be nice if Python did > not force its perfect world model, which unfortunately is not based on > surrounding reality, and let users solve their encoding problems > themselves - when they need, because again, one can go quite a long way > without dealing with encodings at all. Whereas now Python3 forces users > to deal with encoding almost universally, but forcing a particular for > all strings (which is again, doesn't correspond to the state of > surrounding reality).
I already hear response that it's good that users > taught to deal with encoding, that will make them write correct > programs, but that's a bit far away from the original aim of making it > write "correct" programs easy and pleasant. (And definition of > "correct" vary.) As I've said before in other contexts, find me Windows, Mac OS X and JVM developers, or educators and scientists that are as concerned by the text model changes as folks that are primarily focused on Linux system (including network) programming, and I'll be more willing to concede the point. Windows, Mac OS X, and the JVM are all opinionated about the text encodings to be used at platform boundaries (using UTF-16, UTF-8 and UTF-16, respectively). By contrast, Linux (or, more accurately, POSIX) says "well, it's configurable, but we won't provide a reliable mechanism for finding out what the encoding is. So either guess as best you can based on the info the OS *does* provide, assume UTF-8, assume 'some ASCII compatible encoding', or don't do anything that requires knowing the encoding of the data being exchanged with the OS, like, say, displaying file names to users or accepting arbitrary text as input, transforming it in a content aware fashion, and echoing it back in a console application". None of those options are perfectly good choices. 6(ish) years ago, we chose the first option, because it has the best chance of working properly on Linux systems that use ASCII incompatible encodings like ShiftJIS, ISO-2022, and various other East Asian codecs. For normal user space programming, Linux is pretty reliable when it comes to ensuring the locale encoding is set to something sensible, but the price we currently pay for that decision is interoperability issues with things like daemons not receiving any configuration settings and hence falling back to the POSIX locale, and ssh environment forwarding moving a client's encoding settings to a session on a server with different settings.
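The settings in question can be inspected from Python itself; here is a quick sketch (the printed values depend entirely on the host environment, which is exactly the problem):

```python
import locale
import sys

# Two of the values CPython derives from the environment on POSIX systems.
# In a stripped-down daemon or forwarded-ssh environment with no LANG/LC_*
# variables set, the preferred encoding falls back toward the POSIX
# locale's ASCII, regardless of what the data actually is.
print(sys.getfilesystemencoding())         # e.g. 'utf-8'
print(locale.getpreferredencoding(False))  # e.g. 'UTF-8' or 'ANSI_X3.4-1968'
```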
I still consider it preferable to impose inconveniences like that based on use case (situations where Linux systems don't provide sensible encoding settings) than geographic region (locales where ASCII incompatible encodings are likely to still be in common use). If I (or someone else) ever find the time to implement PEP 432 (or something like it) to address some of the limitations of the interpreter startup sequence that currently make it difficult to avoid relying on the POSIX locale encoding on Linux, then we'll be in a position to reassess that decision based on the increased adoption of UTF-8 by Linux distributions in recent years. As the major community Linux distributions complete the migration of their system utilities to Python 3, we'll get to see if they decide it's better to make their locale settings more reliable, or help make it easier for Python 3 to ignore them when they're wrong. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Thu Jun 5 15:23:12 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 5 Jun 2014 23:23:12 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140604011718.GD10355@ando> References: <20140604011718.GD10355@ando> Message-ID: <20140605132312.GK10355@ando> On Wed, Jun 04, 2014 at 11:17:18AM +1000, Steven D'Aprano wrote: > There is a discussion over at MicroPython about the internal > representation of Unicode strings. Micropython is aimed at embedded > devices, and so minimizing memory use is important, possibly even > more important than performance. [...] Wow! I'm amazed at the response here, since I expected it would have received a fairly brief "Yes" or "No" response, not this long thread. Here is a summary (as best as I am able) of a few points which I think are important: (1) I asked if it would be okay for MicroPython to *optionally* use nominally Unicode strings limited to ASCII. 
Pretty much the only response to this has been Guido saying "That would be a pretty lousy option", and since nobody has really defended the suggestion, I think we can assume that it's off the table.

(2) I asked if it would be okay for µPy to use a UTF-8 implementation even though it would lead to O(N) indexing operations instead of O(1). There's been some opposition to this, including Guido's:

    Then again the UTF-8 option would be pretty devastating too for
    anything manipulating strings (especially since many Python APIs are
    defined using indexes, e.g. the re module).

but unless Guido wants to say different, I think the consensus is that a UTF-8 implementation is allowed, even at the cost of O(N) indexing operations. Saving memory -- assuming that it does save memory, which I think is an assumption and not proven -- over time is allowed.

(3) It seems to me that there's been a lot of theorizing about what implementation will be obviously more efficient. Folks, how about some benchmarks before making claims about code efficiency? :-)

(4) Similarly, there have been many suggestions more suited in my opinion to python-ideas, or even python-list, for ways to implement O(1) indexing on top of UTF-8. Some of them involve per-string mutable state (e.g. the last index seen), or complicated int sub-classes that need to know what string they come from. Remember your Zen please:

    Simple is better than complex.
    Complex is better than complicated.
    ...
    If the implementation is hard to explain, it's a bad idea.

(5) I'm not convinced that UTF-8 internally is *necessarily* more efficient, but look forward to seeing the results of benchmarks. The rationale of internal UTF-8 is that the use of any other encoding internally will be inefficient since those strings will need to be transcoded to UTF-8 before they can be written or printed, so keeping them as UTF-8 in the first place saves the transcoding step.
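For concreteness, the O(N) indexing in point (2) can be sketched directly over a raw UTF-8 buffer (illustrative code, not from any actual implementation): finding the i-th code point means walking every leading byte before it.

```python
def utf8_index(buf, i):
    """Return the i-th code point of UTF-8 bytes buf by linear scan: O(N)."""
    def seq_len(b):
        # Width of a UTF-8 sequence, determined by its leading byte.
        return 1 if b < 0x80 else 2 if b < 0xE0 else 3 if b < 0xF0 else 4

    pos = 0
    for _ in range(i):
        pos += seq_len(buf[pos])  # skip one whole code point
    return buf[pos:pos + seq_len(buf[pos])].decode("utf-8")

buf = "a\u00e9\u20ac\U0001d11e".encode("utf-8")  # 1-, 2-, 3- and 4-byte code points
print([utf8_index(buf, i) for i in range(4)])    # ['a', 'é', '€', '𝄞']
```

A fixed-width representation makes the same lookup a single O(1) offset calculation, which is exactly the trade-off being debated here.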
Well, yes, but many strings may never be written out:

    print(prefix + s[1:].strip().lower().center(80) + suffix)

creates five strings that are never written out and one that is. So if the internal encoding of strings is more efficient than UTF-8, and most of them never need transcoding to UTF-8, a non-UTF-8 internal format might be a nett win. So I'm looking forward to seeing the results of µPy's experiments with it. Thanks to all who have commented. -- Steven From rdmurray at bitdance.com Thu Jun 5 17:05:19 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Thu, 05 Jun 2014 11:05:19 -0400 Subject: [Python-Dev] Request: new "Asyncio" component on the bug tracker In-Reply-To: References: Message-ID: <20140605150519.9F40B250DE7@webabinitio.net> On Thu, 05 Jun 2014 12:03:15 +0200, Victor Stinner wrote: > Would it be possible to add a new "Asyncio" component on > bugs.python.org? If this component is selected, the default nosy list > for asyncio would be used (guido, yury and me, there is already such > list in the nosy list completion). Done. There are two other people in the nosy list (Giampaolo and Antoine). If either of those wish to be auto-nosy, let me know.
--David From p.f.moore at gmail.com Thu Jun 5 17:59:51 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 5 Jun 2014 16:59:51 +0100 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140605153708.7f27412e@x34f> Message-ID: On 5 June 2014 14:15, Nick Coghlan wrote: > As I've said before in other contexts, find me Windows, Mac OS X and > JVM developers, or educators and scientists that are as concerned by > the text model changes as folks that are primarily focused on Linux > system (including network) programming, and I'll be more willing to > concede the point. There is once again a strong selection bias in this discussion, by its very nature. People who like the new model don't have anything to complain about, and so are not heard. Just to support Nick's point, I for one find the Python 3 text model a huge benefit, both in practical terms of making my programs more robust, and educationally, as I have a far better understanding of encodings and their issues than I ever did under Python 2. Whenever a discussion like this occurs, I find it hard not to resent the people arguing that the new model should be taken away from me and replaced with a form of the old error-prone (for me) approach - as if it was in my best interests. Internal details don't bother me - using UTF8 and having indexing be potentially O(N) is of little relevance. But make me work with a string type that *doesn't* abstract a string as a sequence of Unicode code points and I'll get very upset. 
Paul From dholth at gmail.com Thu Jun 5 20:41:28 2014 From: dholth at gmail.com (Daniel Holth) Date: Thu, 5 Jun 2014 14:41:28 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140605153708.7f27412e@x34f> Message-ID: On Thu, Jun 5, 2014 at 11:59 AM, Paul Moore wrote: > On 5 June 2014 14:15, Nick Coghlan wrote: >> As I've said before in other contexts, find me Windows, Mac OS X and >> JVM developers, or educators and scientists that are as concerned by >> the text model changes as folks that are primarily focused on Linux >> system (including network) programming, and I'll be more willing to >> concede the point. > > There is once again a strong selection bias in this discussion, by its > very nature. People who like the new model don't have anything to > complain about, and so are not heard. > > Just to support Nick's point, I for one find the Python 3 text model a > huge benefit, both in practical terms of making my programs more > robust, and educationally, as I have a far better understanding of > encodings and their issues than I ever did under Python 2. Whenever a > discussion like this occurs, I find it hard not to resent the people > arguing that the new model should be taken away from me and replaced > with a form of the old error-prone (for me) approach - as if it was in > my best interests. > > Internal details don't bother me - using UTF8 and having indexing be > potentially O(N) is of little relevance. But make me work with a > string type that *doesn't* abstract a string as a sequence of Unicode > code points and I'll get very upset. 
Once you get past whether str + bytes throws an exception which seems to be the divide most people focus on, you can discover new things like dance-encoded strings, bytes decoded using an incorrect encoding intended to be transcoded into the correct encoding later, surrogates that work perfectly until .encode(), str(bytes), APIs that disagree with you about whether the result should be str or bytes, APIs that return either string or bytes depending on their initializers and so on. Unicode can still be complicated in Python 3 independent of any judgement about whether it is worse, better, or different than Python 2. From v+python at g.nevcal.com Thu Jun 5 20:48:45 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 05 Jun 2014 11:48:45 -0700 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605131039.4f5b74d6@x34f> References: <20140604011718.GD10355@ando> <87vbsg36cs.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605001432.126a0a08@x34f> <20140605015253.301e72e7@x34f> <20140605131039.4f5b74d6@x34f> Message-ID: <5390BB8D.7030305@g.nevcal.com> On 6/5/2014 3:10 AM, Paul Sokolovsky wrote: > Hello, > > On Wed, 04 Jun 2014 22:15:30 -0400 > Terry Reedy wrote: > >> think you are again batting at a strawman. If you mean 'read from a >> file', and all you want to do is read bytes from and write bytes to >> external 'files', then there is obviously no need to transcode and >> neither Python 2 or 3 make you do so. > But most files, network protocols are text-based, and I (and many other > people) don't want to artificially use "binary data" type for them, > with all attached funny things, like "b" prefix. And then Python2 > indeed doesn't transcode anything, and Python3 does, without being > asked, and for no good purpose, because in most cases, Input data will > be Output as-is (maybe in byte-boundary-split chunks). 
> > So, it all goes in rounds - ignoring the forced-Unicode problem (after a > week of subscription to python-list, half of traffic there appear to be > dedicated to Unicode-related flames) on python-dev behalf is not > going to help (Python community). If all your program is doing is reading and writing data (input data will be output as-is), then use of binary doesn't require "b" prefix, because you aren't manipulating the data. Then you have no unnecessary transcoding. If you actually wish to examine or manipulate the content as it flows by, then there are choices.

1) If you need to examine/manipulate only a small fraction of the text data within the file, you can pay the small price of a few "b" prefixes to get high performance, and explicitly transcode only the portions that need to be manipulated.

2) If you are examining the bulk of the data as it flows by, but not manipulating it, just examining/extracting, then a full transcoding may be useful for that purpose... but you can perhaps do it explicitly, so that you keep the binary form for I/O. Careful of the block boundaries, in this case, however.

3) If you are actually manipulating the bulk of the data, then the double transcoding (once on input, and once on output) allows you to work in units of codepoints, rather than bytes, which generally makes the manipulation algorithms easier.

4) If you truly cannot afford the processor cost of the double transcoding, and need to do all your manipulations at the byte level, then you could avoid the need for "b" prefix by use of a preprocessor for those sections of code that are doing all and only bytes processing... and you'll have lots of arcane, error-prone code to write to manipulate the bytes rather than the codepoints.
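Option 1) might look like the following sketch (the header/body framing here is invented purely for illustration, not any real protocol):

```python
# Keep the bulk of the stream as bytes; transcode only the small portion
# that is actually examined.  The message layout is hypothetical.
raw = b"Content-Type: text/plain\r\n\r\n" + bytes(range(256)) * 4

head, _, body = raw.partition(b"\r\n\r\n")
headers = head.decode("ascii")                   # decode just the header text
content_type = headers.split(":", 1)[1].strip()  # -> 'text/plain'
# body stays bytes and can be written back out byte-for-byte, untranscoded
```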
On the other hand, if you can convince your data sources and sinks to deal in UTF-8, and implement a UTF-8 str in µPy, then you can both avoid transcoding, and make the arcane algorithms part of the implementation of µPy rather than of the application code, and support full Unicode. And it seems to me that the world is moving that way... towards UTF-8 as the standard interchange format. Encourage it. Glenn -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Thu Jun 5 21:11:51 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 05 Jun 2014 12:11:51 -0700 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> Message-ID: <5390C0F7.7050709@g.nevcal.com> On 6/5/2014 11:41 AM, Daniel Holth wrote: > discover new things > like dance-encoded strings, bytes decoded using an incorrect encoding > intended to be transcoded into the correct encoding later, surrogates > that work perfectly until .encode(), str(bytes), APIs that disagree > with you about whether the result should be str or bytes, APIs that > return either string or bytes depending on their initializers and so > on. Unicode can still be complicated in Python 3 independent of any > judgement about whether it is worse, better, or different than Python > 2. Yes, people can find ways to write bad code in any language. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From antoine at python.org Thu Jun 5 21:55:54 2014 From: antoine at python.org (Antoine Pitrou) Date: Thu, 05 Jun 2014 15:55:54 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> Message-ID: On 04/06/2014 02:51, Chris Angelico wrote: > On Wed, Jun 4, 2014 at 3:17 PM, Nick Coghlan wrote: > > It would. The downsides of a UTF-8 representation would be slower > iteration and much slower (O(N)) indexing/slicing. There's no reason for iteration to be slower. Slicing would get O(slice offset + slice size) instead of O(slice size). Regards Antoine. From njs at pobox.com Thu Jun 5 22:51:41 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 5 Jun 2014 21:51:41 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes Message-ID: Hi all, There's a very valuable optimization -- temporary elision -- which numpy can *almost* do. It gives something like a 10-30% speedup for lots of common real-world expressions. It would probably be useful for non-numpy code too. (In fact it generalizes the str += str special case that's currently hardcoded in ceval.c.) But it can't be done safely without help from the interpreter, and possibly not even then. So I thought I'd raise it here and see if we can get any consensus on whether and how CPython could support this.

=== The dream ===

Here's the idea. Take an innocuous expression like:

    result = (a + b + c) / c

This gets evaluated as:

    tmp1 = a + b
    tmp2 = tmp1 + c
    result = tmp2 / c

All these temporaries are very expensive. Suppose that a, b, c are arrays with N bytes each, and N is large. For simple arithmetic like this, the costs are dominated by memory access. Allocating an N byte array requires the kernel to clear the memory, which incurs N bytes of memory traffic.
If all the operands are already allocated, then performing a three-operand operation like tmp1 = a + b involves 3N bytes of memory traffic (reading the two inputs plus writing the output). In total our example does 3 allocations and has 9 operands, so it does 12N bytes of memory access. If our arrays are small, then the kernel doesn't get involved and some of these accesses will hit the cache, but OTOH the overhead of things like malloc won't be amortized out; the best case starting from a cold cache is 3 mallocs and 6N bytes worth of cache misses (or maybe 5N if we get lucky and malloc'ing 'result' returns the same memory that tmp1 used, and it's still in cache).

There's an obvious missed optimization in this code, though, which is that it keeps allocating new temporaries and throwing away old ones. It would be better to just allocate a temporary once and re-use it:

    tmp1 = a + b
    tmp1 += c
    tmp1 /= c
    result = tmp1

Now we have only 1 allocation and 7 operands, so we touch only 8N bytes of memory. For large arrays -- that don't fit into cache, and for which per-op overhead is amortized out -- this gives a theoretical 33% speedup, and we can realistically get pretty close to this. For smaller arrays, the re-use of tmp1 means that in the best case we have only 1 malloc and 4N bytes worth of cache misses, and we also have a smaller cache footprint, which means this best case will be achieved more often in practice. For small arrays it's harder to estimate the total speedup here, but 66% fewer mallocs and 33% fewer cache misses is certainly enough to make a practical difference. Such optimizations are important enough that numpy operations always give the option of explicitly specifying the output array (like in-place operators but more general and with clumsier syntax). Here's an example small-array benchmark that IIUC uses Jacobi iteration to solve Laplace's equation.
It's been written in both natural and hand-optimized formats (compare "num_update" to "num_inplace"): https://yarikoptic.github.io/numpy-vbench/vb_vb_app.html#laplace-inplace num_inplace is totally unreadable, but because we've manually elided temporaries, it's 10-15% faster than num_update. With our prototype automatic temporary elision turned on, this difference disappears -- the natural code gets 10-15% faster, *and* we remove the temptation to write horrible things like num_inplace.

What do I mean by "automatic temporary elision"? It's *almost* possible for numpy to automatically convert the first example into the second. The idea is: we want to replace

    tmp2 = tmp1 + c

with

    tmp1 += c
    tmp2 = tmp1

And we can do this by defining

    def __add__(self, other):
        if is_about_to_be_thrown_away(self):
            return self.__iadd__(other)
        else:
            ...

now tmp1.__add__(c) does an in-place add and returns tmp1, no allocation occurs, woohoo. The only little problem is implementing is_about_to_be_thrown_away().

=== The sneaky-but-flawed approach ===

The following implementation may make you cringe, but it comes tantalizingly close to working:

    bool is_about_to_be_thrown_away(PyObject *obj) {
        return (Py_REFCNT(obj) == 1);
    }

In fact, AFAICT it's 100% correct for libraries being called by regular python code (which is why I'm able to quote benchmarks at you :-)). The bytecode eval loop always holds a reference to all operands, and then immediately DECREFs them after the operation completes. If one of our arguments has no other references besides this one, then we can be sure that it is a dead obj walking, and steal its corpse. But this has a fatal flaw: people are unreasonable creatures, and sometimes they call Python libraries without going through ceval.c :-(. It's legal for random C code to hold an array object with a single reference count, and then call PyNumber_Add on it, and then expect the original array object to still be valid. But who writes code like that in practice?
Well, Cython does. So, this is no-go.

=== A better (?) approach ===

This is a pretty arcane bit of functionality that we need, and it interacts with ceval.c, so I'm not at all confident about the best way to do it. (We even have an implementation using libunwind to walk the C stack and make sure that we're being called from ceval.c, which... works, actually, but is unsatisfactory in other ways.) I do have an idea that I *think* might work and be acceptable, but you tell me:

Proposal: We add an API call PyEval_LastOpDefinitelyMatches(frame, optype, *args) which checks whether the last instruction executed in 'frame' was in fact an 'optype' instruction and did in fact have arguments 'args'. If it was, then it returns True. If it wasn't, or if we aren't sure, it returns False. The intention is that 'optype' is a semantic encoding of the instruction (like "+" or "function call") and thus can be preserved even if the bytecode details change.

Then, in numpy's __add__, we do:

1) fetch the current stack frame from TLS
2) check PyEval_LastOpDefinitelyMatches(frame, "+", arg1, arg2)
3) check for arguments with refcnt == 1
4) check that all arguments are base-class numpy array objects (i.e., PyArray_CheckExact)

The logic here is that step (2) tells us that someone did 'arg1 + arg2', so ceval.c is holding a temporary reference to the arguments, and step (3) tells us that at the time of the opcode evaluation there were no other references to these arguments, and step (4) tells us that 'arg1 + arg2' dispatched directly to ndarray.__add__ so there's no chance that anyone else has borrowed a reference in the mean time. AFAICT PyEval_LastOpDefinitelyMatches can *almost* be implemented now; the only problem is that stack_pointer is a local variable in PyEval_EvalFrameEx, and we would need it to be accessible via the frame object. The easy way would be to just move it in there.
I don't know if this would have any weird effects on speed due to cache effects, but I guess we could arrange to put it into the same cache line as f_lasti, which is also updated on every opcode? OTOH someone has gone to some trouble to make sure that f_stacktop usually *doesn't* point to the top of the stack, and I guess there must have been some reason for this. Alternatively we could stash a pointer to stack_pointer in the frame object, and that would only need to be updated once per entry/exit to PyEval_EvalFrameEx. Obviously there are a lot of details to work out here, like what the calling convention for PyEval_LastOpDefinitelyMatches should really be, but:

* Does this approach seem like it would successfully solve the problem?
* Does this approach seem like it would be acceptable in CPython?
* Is there a better idea I'm missing?

-n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From p.f.moore at gmail.com Thu Jun 5 23:37:26 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 5 Jun 2014 22:37:26 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: On 5 June 2014 21:51, Nathaniel Smith wrote: > Is there a better idea I'm missing? Just a thought, but the temporaries come from the stack manipulation done by the likes of the BINARY_ADD opcode. (After all the bytecode doesn't use temporaries, it's a stack machine). Maybe BINARY_ADD and friends could allow for an alternative fast calling convention for __add__ implementations that uses the stack slots directly? This may be something that's only plausible from C code, though. Or may not be plausible at all. I haven't looked at ceval.c for many years...
If this is an insane idea, please feel free to ignore me :-) Paul From njs at pobox.com Thu Jun 5 23:47:54 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 5 Jun 2014 22:47:54 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: On Thu, Jun 5, 2014 at 10:37 PM, Paul Moore wrote: > On 5 June 2014 21:51, Nathaniel Smith wrote: >> Is there a better idea I'm missing? > > Just a thought, but the temporaries come from the stack manipulation > done by the likes of the BINARY_ADD opcode. (After all the bytecode > doesn't use temporaries, it's a stack machine). Maybe BINARY_ADD and > friends could allow for an alternative fast calling convention for > __add__implementations that uses the stack slots directly? This may be > something that's only plausible from C code, though. Or may not be > plausible at all. I haven't looked at ceval.c for many years... > > If this is an insane idea, please feel free to ignore me :-) To make sure I understand correctly, you're suggesting something like adding a new set of special method slots, __te_add__, __te_mul__, etc., which BINARY_ADD and friends would check for and if found, dispatch to without going through PyNumber_Add? And this way, a type like numpy's array could have a special implementation for __te_add__ that works the same as __add__, except with the added wrinkle that it knows that it will only be called by the interpreter and thus any arguments with refcnt 1 must be temporaries? -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From p.f.moore at gmail.com Fri Jun 6 00:12:04 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 5 Jun 2014 23:12:04 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: On 5 June 2014 22:47, Nathaniel Smith wrote: > To make sure I understand correctly, you're suggesting something like > adding a new set of special method slots, __te_add__, __te_mul__, > etc. I wasn't thinking in that much detail, TBH. I'm not sure adding a whole set of new slots is sensible for such a specialised case. I think I was more assuming that the special method implementations could use an alternative calling convention, METH_STACK in place of METH_VARARGS, for example. That would likely only be viable for types implemented in C. But either way, it may be more complicated than the advantages would justify... Paul From tjreedy at udel.edu Fri Jun 6 00:57:32 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 05 Jun 2014 18:57:32 -0400 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: On 6/5/2014 4:51 PM, Nathaniel Smith wrote: > In fact, AFAICT it's 100% correct for libraries being called by > regular python code (which is why I'm able to quote benchmarks at you > :-)). The bytecode eval loop always holds a reference to all operands, > and then immediately DECREFs them after the operation completes. If > one of our arguments has no other references besides this one, then we > can be sure that it is a dead obj walking, and steal its corpse. > > But this has a fatal flaw: people are unreasonable creatures, and > sometimes they call Python libraries without going through ceval.c > :-(. 
It's legal for random C code to hold an array object with a > single reference count, and then call PyNumber_Add on it, and then > expect the original array object to still be valid. But who writes > code like that in practice? Well, Cython does. So, this is no-go. I understand that a lot of numpy/scipy code is compiled with Cython, so you really want the optimization to continue working when so compiled. Is there a simple change to Cython that would work, perhaps in coordination with a change to numpy? Is so, you could get the result before 3.5 comes out. I realized that there are other compilers than Cython and non-numpy code that could benefit, so that a more generic solution would also be good. In particular > Here's the idea. Take an innocuous expression like: > > result = (a + b + c) / c > > This gets evaluated as: > > tmp1 = a + b > tmp2 = tmp1 + c > result = tmp2 / c ... > There's an obvious missed optimization in this code, though, which is > that it keeps allocating new temporaries and throwing away old ones. > It would be better to just allocate a temporary once and re-use it: > tmp1 = a + b > tmp1 += c > tmp1 /= c > result = tmp1 Could this transformation be done in the ast? And would that help? A prolonged discussion might be better on python-ideas. See what others say. -- Terry Jan Reedy From njs at pobox.com Fri Jun 6 00:22:17 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 5 Jun 2014 23:22:17 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: On Thu, Jun 5, 2014 at 11:12 PM, Paul Moore wrote: > On 5 June 2014 22:47, Nathaniel Smith wrote: >> To make sure I understand correctly, you're suggesting something like >> adding a new set of special method slots, __te_add__, __te_mul__, >> etc. > > I wasn't thinking in that much detail, TBH. I'm not sure adding a > whole set of new slots is sensible for such a specialised case. 
I > think I was more assuming that the special method implementations > could use an alternative calling convention, METH_STACK in place of > METH_VARARGS, for example. That would likely only be viable for types > implemented in C. > > But either way, it may be more complicated than the advantages would justify... Oh, I see, that's clever. But, unfortunately most __special__ methods at the C level don't use METH_*, they just have hard-coded calling conventions: https://docs.python.org/3/c-api/typeobj.html#number-structs -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ncoghlan at gmail.com Fri Jun 6 01:36:09 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 6 Jun 2014 09:36:09 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <5390C0F7.7050709@g.nevcal.com> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140605153708.7f27412e@x34f> <5390C0F7.7050709@g.nevcal.com> Message-ID: On 6 Jun 2014 05:13, "Glenn Linderman" wrote: > > On 6/5/2014 11:41 AM, Daniel Holth wrote: >> >> discover new things >> like dance-encoded strings, bytes decoded using an incorrect encoding >> intended to be transcoded into the correct encoding later, surrogates >> that work perfectly until .encode(), str(bytes), APIs that disagree >> with you about whether the result should be str or bytes, APIs that >> return either string or bytes depending on their initializers and so >> on. Unicode can still be complicated in Python 3 independent of any >> judgement about whether it is worse, better, or different than Python >> 2. > > Yes, people can find ways to write bad code in any language. 
Note that several of the issues Daniel mentions here are due to the lack of reliable encoding settings on Linux and the challenges of the Py2->3 migration, rather than users writing bad code. Several of them represent bugs to be fixed or serve as indicators of missing features that would make it easier to work around an imperfect world. Cheers, Nick. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Fri Jun 6 02:51:11 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 06 Jun 2014 12:51:11 +1200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605132312.GK10355@ando> References: <20140604011718.GD10355@ando> <20140605132312.GK10355@ando> Message-ID: <5391107F.1010500@canterbury.ac.nz> Steven D'Aprano wrote: > (1) I asked if it would be okay for MicroPython to *optionally* use > nominally Unicode strings limited to ASCII. Pretty much the only > response to this as been Guido saying "That would be a pretty lousy > option", It would be limiting to have this as the *only* way of dealing with unicode, but I don't see anything wrong with having this available as an option for applications that truly don't need anything more than ascii. There must be plenty of those; the controller that runs my car engine, for example, doesn't exchange text with the outside world at all. > The > rationale of internal UTF-8 is that the use of any other encoding > internally will be inefficient since those strings will need to be > transcoded to UTF-8 before they can be written or printed, No, I think the rationale is that UTF-8 is likely to use less memory than UTF-16 or UTF-32. 
-- Greg From Nikolaus at rath.org Fri Jun 6 03:15:42 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Thu, 05 Jun 2014 18:15:42 -0700 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: (Nathaniel Smith's message of "Thu, 5 Jun 2014 21:51:41 +0100") References: Message-ID: <8761kevnwh.fsf@vostro.rath.org> Nathaniel Smith writes: > Such optimizations are important enough that numpy operations always > give the option of explicitly specifying the output array (like > in-place operators but more general and with clumsier syntax). Here's > an example small-array benchmark that IIUC uses Jacobi iteration to > solve Laplace's equation. It's been written in both natural and > hand-optimized formats (compare "num_update" to "num_inplace"): > > https://yarikoptic.github.io/numpy-vbench/vb_vb_app.html#laplace-inplace > > num_inplace is totally unreadable, but because we've manually elided > temporaries, it's 10-15% faster than num_update. Does it really have to be that ugly? Shouldn't using tmp += u[2:,1:-1] tmp *= dy2 instead of np.add(tmp, u[2:,1:-1], out=tmp) np.multiply(tmp, dy2, out=tmp) give the same performance? (yes, not as nice as what you're proposing, but I'm still curious). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana."
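Nikolaus's question — whether the `+=` spelling avoids the temporary just as well as the explicit `out=` form — can be checked with a small pure-Python stand-in for an array type. The `Buf` class below is hypothetical (not numpy; numpy's in-place operators dispatch similarly but through C-level ufunc machinery): a type that defines `__iadd__` lets `+=` mutate the existing buffer, so no new object is allocated.

```python
# Toy stand-in for an array type, counting how many buffers get allocated.
# (Hypothetical illustration; not numpy's actual implementation.)
class Buf:
    allocs = 0  # class-wide allocation counter

    def __init__(self, data):
        Buf.allocs += 1          # every new Buf "allocates" a buffer
        self.data = list(data)

    def __add__(self, other):
        # Out-of-place add: produces a fresh Buf for the result.
        return Buf(x + y for x, y in zip(self.data, other.data))

    def __iadd__(self, other):
        # In-place add: reuses self.data, so no new Buf is created.
        for i, y in enumerate(other.data):
            self.data[i] += y
        return self

a, b = Buf([1, 2, 3]), Buf([4, 5, 6])
before = Buf.allocs
a += b                            # dispatches to __iadd__: zero new allocations
assert Buf.allocs == before
c = a + b                         # dispatches to __add__: one new allocation
assert Buf.allocs == before + 1
assert c.data == [9, 12, 15]
```

So for a type with a true in-place `__iadd__`, the augmented-assignment spelling does elide the temporary; the residual difference discussed in the thread is only how numpy happens to implement its in-place operators internally.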
From njs at pobox.com Fri Jun 6 03:26:26 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 6 Jun 2014 02:26:26 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: <8761kevnwh.fsf@vostro.rath.org> References: <8761kevnwh.fsf@vostro.rath.org> Message-ID: On 6 Jun 2014 02:16, "Nikolaus Rath" wrote: > > Nathaniel Smith writes: > > Such optimizations are important enough that numpy operations always > > give the option of explicitly specifying the output array (like > > in-place operators but more general and with clumsier syntax). Here's > > an example small-array benchmark that IIUC uses Jacobi iteration to > > solve Laplace's equation. It's been written in both natural and > > hand-optimized formats (compare "num_update" to "num_inplace"): > > > > https://yarikoptic.github.io/numpy-vbench/vb_vb_app.html#laplace-inplace > > > > num_inplace is totally unreadable, but because we've manually elided > > temporaries, it's 10-15% faster than num_update. > > Does it really have to be that ugly? Shouldn't using > > tmp += u[2:,1:-1] > tmp *= dy2 > > instead of > > np.add(tmp, u[2:,1:-1], out=tmp) > np.multiply(tmp, dy2, out=tmp) > > give the same performance? (yes, not as nice as what you're proposing, > but I'm still curious). Yes, only the last line actually requires the out= syntax, everything else could use in place operators instead (and automatic temporary elision wouldn't work for the last line anyway). I guess whoever wrote it did it that way for consistency (and perhaps in hopes of eking out a tiny bit more speed - in numpy currently the in-place operators are implemented by dispatching to function calls like those). Not sure how much difference it really makes in practice though. It'd still be 8 statements and two named temporaries to do the work of one infix expression, with order of operations implicit. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Fri Jun 6 03:47:50 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 6 Jun 2014 02:47:50 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: On 5 Jun 2014 23:58, "Terry Reedy" wrote: > > On 6/5/2014 4:51 PM, Nathaniel Smith wrote: > >> In fact, AFAICT it's 100% correct for libraries being called by >> regular python code (which is why I'm able to quote benchmarks at you >> :-)). The bytecode eval loop always holds a reference to all operands, >> and then immediately DECREFs them after the operation completes. If >> one of our arguments has no other references besides this one, then we >> can be sure that it is a dead obj walking, and steal its corpse. >> >> But this has a fatal flaw: people are unreasonable creatures, and >> sometimes they call Python libraries without going through ceval.c >> :-(. It's legal for random C code to hold an array object with a >> single reference count, and then call PyNumber_Add on it, and then >> expect the original array object to still be valid. But who writes >> code like that in practice? Well, Cython does. So, this is no-go. > > > I understand that a lot of numpy/scipy code is compiled with Cython, so you really want the optimization to continue working when so compiled. Is there a simple change to Cython that would work, perhaps in coordination with a change to numpy? If so, you could get the result before 3.5 comes out. Unfortunately we don't actually know whether Cython is the only culprit (such code *could* be written by hand), and even if we fixed Cython it would take some unknowable amount of time before all downstream users upgraded their Cythons. (It's pretty common for projects to check in Cython-generated .c files, and only regenerate when the Cython source actually gets modified.) Pretty risky for an optimization.
> I realized that there are other compilers than Cython and non-numpy code that could benefit, so that a more generic solution would also be good. In particular > > > Here's the idea. Take an innocuous expression like: > > > > result = (a + b + c) / c > > > > This gets evaluated as: > > > > tmp1 = a + b > > tmp2 = tmp1 + c > > result = tmp2 / c > ... > > > There's an obvious missed optimization in this code, though, which is > > that it keeps allocating new temporaries and throwing away old ones. > > It would be better to just allocate a temporary once and re-use it: > > tmp1 = a + b > > tmp1 += c > > tmp1 /= c > > result = tmp1 > > Could this transformation be done in the ast? And would that help? I don't think it could be done in the ast because I don't think you can work with anonymous temporaries there. But, now that you mention it, it could be done on the fly in the implementation of the relevant opcodes. I.e., BIN_ADD could do if (Py_REFCNT(left) == 1) result = PyNumber_InPlaceAdd(left, right); else result = PyNumber_Add(left, right) Upside: all packages automagically benefit! Potential downsides to consider: - Subtle but real and user-visible change in Python semantics. I'd be a little nervous about whether anyone has implemented, say, an iadd with side effects such that you can tell whether a copy was made, even if the object being copied is immediately destroyed. Maybe this doesn't make sense though. - Only works when left operand is the temporary ("remember that a*b+c is faster than c+a*b"), and only for arithmetic (no benefit for np.sin(a + b)). Probably does cover the majority of cases though. > A prolonged discussion might be better on python-ideas. See what others say. Yeah, I wasn't sure which list to use for this one, happy to move if it would work better. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Fri Jun 6 03:51:13 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 6 Jun 2014 11:51:13 +1000 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: On Fri, Jun 6, 2014 at 11:47 AM, Nathaniel Smith wrote: > Unfortunately we don't actually know whether Cython is the only culprit > (such code *could* be written by hand), and even if we fixed Cython it would > take some unknowable amount of time before all downstream users upgraded > their Cythons. (It's pretty common for projects to check in Cython-generated > .c files, and only regenerate when the Cython source actually gets > modified.) Pretty risky for an optimization. But code will still work, right? I mean, you miss out on an optimization, but it won't actually be wrong code? It should be possible to say "After upgrading to Cython version x.y, regenerate all your .c files to take advantage of this new optimization". ChrisA From greg.ewing at canterbury.ac.nz Fri Jun 6 04:17:20 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 06 Jun 2014 14:17:20 +1200 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: <539124B0.1070701@canterbury.ac.nz> Nathaniel Smith wrote: > I.e., BIN_ADD could do > > if (Py_REFCNT(left) == 1) > result = PyNumber_InPlaceAdd(left, right); > else > result = PyNumber_Add(left, right) > > Upside: all packages automagically benefit! > > Potential downsides to consider: > - Subtle but real and user-visible change in Python semantics. That would be a real worry. Even if such cases were rare, they'd be damnably difficult to debug when they did occur. I think for safety's sake this should only be done if the type concerned opts in somehow, perhaps by a tp_flag indicating that the type is eligible for temporary elision. 
-- Greg From sturla.molden at gmail.com Fri Jun 6 04:18:05 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 06 Jun 2014 04:18:05 +0200 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: On 05/06/14 22:51, Nathaniel Smith wrote: > This gets evaluated as: > > tmp1 = a + b > tmp2 = tmp1 + c > result = tmp2 / c > > All these temporaries are very expensive. Suppose that a, b, c are > arrays with N bytes each, and N is large. For simple arithmetic like > this, then costs are dominated by memory access. Allocating an N byte > array requires the kernel to clear the memory, which incurs N bytes of > memory traffic. It seems to be the case that a large portion of the run-time in Python code using NumPy can be spent in the kernel zeroing pages (which the kernel does for security reasons). I think this can also be seen as a 'malloc problem'. It comes about because each new NumPy array starts with a fresh buffer allocated by malloc. Perhaps buffers can be reused? Sturla From greg.ewing at canterbury.ac.nz Fri Jun 6 04:26:35 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 06 Jun 2014 14:26:35 +1200 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: <539126DB.8010306@canterbury.ac.nz> Nathaniel Smith wrote: > I'd be a > little nervous about whether anyone has implemented, say, an iadd with > side effects such that you can tell whether a copy was made, even if the > object being copied is immediately destroyed. I can think of at least one plausible scenario where this could occur: the operand is a view object that wraps another object, and its __iadd__ method updates that other object. In fact, now that I think about it, exactly this kind of thing happens in numpy when you slice an array! 
So the opt-in indicator would need to be dynamic, on a per-object basis, rather than a type flag. -- Greg From Nikolaus at rath.org Fri Jun 6 04:27:20 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Thu, 05 Jun 2014 19:27:20 -0700 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: (Nathaniel Smith's message of "Fri, 6 Jun 2014 02:47:50 +0100") References: Message-ID: <87tx7yu60n.fsf@vostro.rath.org> Nathaniel Smith writes: >> > tmp1 = a + b >> > tmp1 += c >> > tmp1 /= c >> > result = tmp1 >> >> Could this transformation be done in the ast? And would that help? > > I don't think it could be done in the ast because I don't think you can > work with anonymous temporaries there. But, now that you mention it, it > could be done on the fly in the implementation of the relevant opcodes. > I.e., BIN_ADD could do > > if (Py_REFCNT(left) == 1) > result = PyNumber_InPlaceAdd(left, right); > else > result = PyNumber_Add(left, right) > > Upside: all packages automagically benefit! > > Potential downsides to consider: > - Subtle but real and user-visible change in Python semantics. I'd be a > little nervous about whether anyone has implemented, say, an iadd with side > effects such that you can tell whether a copy was made, even if the object > being copied is immediately destroyed. Maybe this doesn't make sense > though. Hmm. I don't think this is as unlikely as it may sound. Consider eg the h5py module: with h5py.File('database.h5') as fh: result = fh['key'] + np.ones(42) if this were transformed to with h5py.File('database.h5') as fh: tmp = fh['key'] tmp += np.ones(42) result = tmp then the database.h5 file would get modified, *and* result would be of type h5py.Dataset rather than np.array. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana."
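The hazard Nikolaus describes can be reduced to a few lines of pure Python. The `Dataset` class below is a hypothetical stand-in for something like h5py (not the real library — the side effect is a write counter instead of file I/O): any type whose `__iadd__` has an observable side effect, or returns a different type than `__add__`, can tell whether `x + y` was silently rewritten as `x += y`.

```python
# Hypothetical stand-in for an h5py-like type whose __iadd__ has a
# visible side effect (a write counter standing in for file writes).
class Dataset:
    def __init__(self, data):
        self.data = list(data)
        self.writes = 0

    def __add__(self, other):
        # Out-of-place: returns a plain list, the way h5py addition
        # yields an ndarray rather than another Dataset.
        return [x + y for x, y in zip(self.data, other)]

    def __iadd__(self, other):
        # In-place: mutates the underlying storage ("writes to the file").
        self.writes += 1
        self.data = [x + y for x, y in zip(self.data, other)]
        return self

ds = Dataset([1, 2, 3])
result = ds + [10, 10, 10]       # out-of-place: new list, no write happens
assert result == [11, 12, 13]
assert ds.writes == 0 and ds.data == [1, 2, 3]

tmp = ds
tmp += [10, 10, 10]              # the "elided" rewrite: mutates ds itself
assert tmp is ds and ds.writes == 1 and ds.data == [11, 12, 13]
```

Because `result` differs in type and `ds.writes` changes, an interpreter-level refcount trick that substituted the in-place slot for the binary one would be user-visible here — which is why the thread converges on an explicit opt-in rather than applying the rewrite blindly.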
From greg.ewing at canterbury.ac.nz Fri Jun 6 01:06:57 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 06 Jun 2014 11:06:57 +1200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140605150121.286032df@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> Message-ID: <5390F811.90300@canterbury.ac.nz> Paul Sokolovsky wrote: > All these changes are what let me dream on and speculate on > possibility that Python4 could offer an encoding-neutral string type > (which means based on bytes) Can you elaborate on exactly what you have in mind? You seem to want something different from Python 3 str, Python 3 bytes and Python 2 str, but it's far from clear what you want this type to be like. -- Greg From jimjjewett at gmail.com Fri Jun 6 05:54:55 2014 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Thu, 05 Jun 2014 20:54:55 -0700 (PDT) Subject: [Python-Dev] Internal representation of strings and Micropython (Steven D'Aprano's summary) In-Reply-To: <20140605132312.GK10355@ando> Message-ID: <53913b8f.4d16e00a.2ba2.44f0@mx.google.com> Steven D'Aprano wrote: > (1) I asked if it would be okay for MicroPython to *optionally* use > nominally Unicode strings limited to ASCII. Pretty much the only > response to this as been Guido saying "That would be a pretty lousy > option", and since nobody has really defended the suggestion, I think we > can assume that it's off the table. Lousy is not quite the same as forbidden. Doing it in good faith would require making the limit prominent in the documentation, and raising some sort of CharacterNotSupported exception (or at least a warning) whenever there is an attempt to create a non-ASCII string, even via the C API. > (2) I asked if it would be okay ... 
to use an UTF-8 implementation > even though it would lead to O(N) indexing operations instead of O(1). > There's been some opposition to this, including Guido's: [Non-ASCII character removed.] It is bad when quirks -- even good quirks -- of one implementation lead people to write code that will perform badly on a different Python implementation. Cpython has at least delayed obvious optimizations for this reason. Changing idiomatic operations from O(1) to O(N) is big enough to cause a concern. That said, the target environment itself apparently limits N to small enough that the problem should be mostly theoretical. If you want to be good citizens, then do put a note in the documentation warning that particularly long strings are likely to cause performance issues unique to the MicroPython implementation. (Frankly, my personal opinion is that if you're really optimizing for space, then long strings will start getting awkward long before N is big enough for algorithmic complexity to overcome constant factors.) > ... those strings will need to be transcoded to UTF-8 before they > can be written or printed, so keeping them as UTF-8 ... That all assumes that the external world is using UTF-8 anyhow. Which is more likely to be true if you document it as a limitation of MicroPython. > ... but many strings may never be written out: print(prefix + s[1:].strip().lower().center(80) + suffix) > creates five strings that are never written out and one that is. But looking at the actual strings -- UTF-8 doesn't really hurt much. Only the slice and center() are more complex, and for a string less than 80 characters long, O(N) is irrelevant. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. 
-jJ From steve at pearwood.info Fri Jun 6 08:37:57 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 6 Jun 2014 16:37:57 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <5391107F.1010500@canterbury.ac.nz> References: <20140604011718.GD10355@ando> <20140605132312.GK10355@ando> <5391107F.1010500@canterbury.ac.nz> Message-ID: <20140606063757.GM10355@ando> On Fri, Jun 06, 2014 at 12:51:11PM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > >(1) I asked if it would be okay for MicroPython to *optionally* use > >nominally Unicode strings limited to ASCII. Pretty much the only > >response to this as been Guido saying "That would be a pretty lousy > >option", > > It would be limiting to have this as the *only* way of > dealing with unicode, but I don't see anything wrong with > having this available as an option for applications that > truly don't need anything more than ascii. There must be > plenty of those; the controller that runs my car engine, > for example, doesn't exchange text with the outside world > at all. I don't know about car engine controllers, but presumably they have diagnostic ports, and they may sometimes output text. If they output text, then at least hypothetically car mechanics in Russia might prefer their car to output "??????" and "??????" rather than "true" and "false". I think that opportunities for ASCII-only optimizations are shrinking, not getting bigger, as more people come to expect that their computing devices speak their language rather than Foreign. > >The > >rationale of internal UTF-8 is that the use of any other encoding > >internally will be inefficient since those strings will need to be > >transcoded to UTF-8 before they can be written or printed, > > No, I think the rationale is that UTF-8 is likely to use > less memory than UTF-16 or UTF-32. Right. I was talking about memory efficiency. 
Instead of this, which requires two copies of the string at one time: 1) accept UTF-8 bytes 2) transcode to internal representation 3) discard UTF-8 bytes you could have: 1) accept UTF-8 bytes and be done. -- Steve From breamoreboy at yahoo.co.uk Fri Jun 6 10:32:25 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 06 Jun 2014 09:32:25 +0100 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On 04/06/2014 16:52, Mark Lawrence wrote: > On 04/06/2014 16:32, Steve Dower wrote: >> >> If copying into a separate list is a problem (memory-wise), >> re.finditer('\\S+', string) also provides the same behaviour and gives >> me the sliced string, so there's no need to index for anything. >> > > Out of idle curiosity is there anything that stops MicroPython, or any > other implementation for that matter, from providing views of a string > rather than copying every time? IIRC memoryviews in CPython rely on the > buffer protocol at the C API level, so since strings don't support this > protocol you can't take a memoryview of them. Could this actually be > implemented in the future, is the underlying C code just too > complicated, or what? > Anybody? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
From jtaylor.debian at googlemail.com Fri Jun 6 10:01:17 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 06 Jun 2014 10:01:17 +0200 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: Message-ID: <5391754D.8000607@googlemail.com> On 06.06.2014 04:18, Sturla Molden wrote: > On 05/06/14 22:51, Nathaniel Smith wrote: > >> This gets evaluated as: >> >> tmp1 = a + b >> tmp2 = tmp1 + c >> result = tmp2 / c >> >> All these temporaries are very expensive. Suppose that a, b, c are >> arrays with N bytes each, and N is large. For simple arithmetic like >> this, then costs are dominated by memory access. Allocating an N byte >> array requires the kernel to clear the memory, which incurs N bytes of >> memory traffic. > > It seems to be the case that a large portion of the run-time in Python > code using NumPy can be spent in the kernel zeroing pages (which the > kernel does for security reasons). > > I think this can also be seen as a 'malloc problem'. It comes about > because each new NumPy array starts with a fresh buffer allocated by > malloc. Perhaps buffers can be reused? > > Sturla > > Caching memory inside of numpy would indeed solve this issue too. There has even been a paper written on this which contains some more serious benchmarks than the laplace case which runs on very old hardware (and the inplace and out of place cases are actually not the same, one computes array/scalar the other array * (1 / scalar)): hiperfit.dk/pdf/Doubling.pdf "The result is an improvement of as much as 2.29 times speedup, on average 1.32 times speedup across a benchmark suite of 15 applications" The problem with this approach is that it is already difficult enough to handle memory in numpy. Having a cache that potentially stores gigabytes of memory out of the users sight will just make things worse.
This would not be needed if we can come up with a way on how python can help out numpy in eliding the temporaries. From hrvoje.niksic at avl.com Fri Jun 6 10:53:50 2014 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Fri, 6 Jun 2014 10:53:50 +0200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <5391819E.3060300@avl.com> On 06/04/2014 05:52 PM, Mark Lawrence wrote: > On 04/06/2014 16:32, Steve Dower wrote: >> >> If copying into a separate list is a problem (memory-wise), re.finditer('\\S+', string) also provides the same behaviour and gives me the sliced string, so there's no need to index for anything. >> > > Out of idle curiosity is there anything that stops MicroPython, or any > other implementation for that matter, from providing views of a string > rather than copying every time? IIRC memoryviews in CPython rely on the > buffer protocol at the C API level, so since strings don't support this > protocol you can't take a memoryview of them. Could this actually be > implemented in the future, is the underlying C code just too > complicated, or what? > Memory view of Unicode strings is controversial for two reasons: 1. It exposes the internal representation of the string. If memoryviews of strings were supported in Python 3, PEP 393 would not have been possible (without breaking that feature). 2. Even if it were OK to expose the internal representation, it might not be what the users expect. For example, memoryview("Hrvoje") would return a view of a 6-byte buffer, while memoryview("Nikšić") would return a view of a 12-byte UCS-2 buffer. The user of a memory view might expect to get UCS-2 (or UCS-4, or even UTF-8) in all cases.
An implementation that decided to export strings as memory views might be forced to make a decision about internal representation of strings, and then stick to it. The byte objects don't have these issues, which is why in Python 2.7 memoryview("foo") works just fine, as does memoryview(b"foo") in Python 3. From pmiscml at gmail.com Fri Jun 6 11:13:06 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 6 Jun 2014 12:13:06 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> Message-ID: <20140606121306.06783df6@x34f> Hello, On Thu, 5 Jun 2014 22:21:30 +1000 Tim Delaney wrote: > On 5 June 2014 22:01, Paul Sokolovsky wrote: > > > > > All these changes are what let me dream on and speculate on > > possibility that Python4 could offer an encoding-neutral string type > > (which means based on bytes) > > > > To me, an "encoding neutral string type" means roughly "characters are > atomic", and the best representation we have for a "character" is a And for me it means exactly what "encoding neutral string type" moniker promises - that you should not make any assumption about its encoding. That kinda means "string is atomic", instead of your "characters are atomic". That's the most basic level, and you can write a big enough set of applications using it - for example, get some information from user, store in database, then show back to user at later time. 
[] > > Cheers, > > Tim Delaney -- Best regards, Paul mailto:pmiscml at gmail.com From victor.stinner at gmail.com Fri Jun 6 11:31:23 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jun 2014 11:31:23 +0200 Subject: [Python-Dev] asyncio/Tulip: use CPython as the new upstream Message-ID: Hi, I added a new BaseEventLoop.is_closed() method to Tulip and Python 3.5 to fix an issue (see Tulip issue 169 for the detail). The problem is that I don't want to add this method to Python 3.4 because usually we don't add new methods in minor versions of Python (future version 3.4.2 in this case). Guido just wrote in the issue: "Actually for asyncio we have special dispensation to push new features to minor releases (until 3.5). Please push to 3.4 so the source code is the same everywhere (except selectors.py, which is not covered by the exception)." I disagree with Guido. I would prefer to start to maintain a different branch for Python 3.4, because I consider that only bugfixes should be applied to Python 3.4. It's not the first change that cannot be applied on Python 3.4 (only in Tulip and Python 3.5): the selectors module now also supports devpoll on Solaris. It's annoying because the Tulip script "update_stdlib.sh" used to synchronize Tulip and Python wants to replace Lib/selectors.py in Python 3.4. I have to revert the change each time. I propose a new workflow: use Python default (future version 3.5) as the new asyncio "upstream". Bugfixes would be applied as other Python bugfixes: first in Python 3.4, then in Python 3.5. The "update_stdlib.sh" script of Tulip should be modified to copy files from Python default to Tulip (opposite of the current direction). Workflow: New feature: Python 3.5 => Tulip => Trollius Bugfix: Python 3.4 => Python 3.5 => Tulip => Trollius I don't think that Tulip should have minor releases just for bugfixes, it would be a pain to maintain. Tulip is a third party module, it doesn't have the same constraints as the Python stdlib.
What do you think? Victor From pmiscml at gmail.com Fri Jun 6 12:15:31 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 6 Jun 2014 13:15:31 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605121054.GA348@sleipnir.bytereef.org> Message-ID: <20140606131531.2f8431c1@x34f> Hello, On Thu, 5 Jun 2014 22:38:13 +1000 Nick Coghlan wrote: > On 5 June 2014 22:10, Stefan Krah wrote: > > Paul Sokolovsky wrote: > >> In this regard, I'm glad to participate in mind-resetting > >> discussion. So, let's reiterate - there's nothing like "the best", > >> "the only right", "the only correct", "righter than", "more > >> correct than" in CPython's implementation of Unicode storage. It > >> is *arbitrary*. Well, sure, it's not arbitrary, but based on > >> requirements, and these requirements match CPython's (implied) > >> usage model well enough. But among all possible sets of > >> requirements, CPython's requirements are no more valid that other > >> possible. And other set of requirement fairly clearly lead to > >> situation where CPython implementation is rejected as not correct > >> for those requirements at all. > > > > Several core-devs have said that using UTF-8 for MicroPython is > > perfectly okay. I also think it's the right choice and I hope that > > you guys come up with a very efficient implementation. > > Based on this discussion , I've also posted a draft patch aimed at > clarifying the relevant aspects of the data model section of the > language reference (http://bugs.python.org/issue21667). Thanks, it's very much appreciated. Though, the discussion there opened another can of worms. 
I'm sorry if I was somehow related to that; my bringing in the formal language spec was more a rhetorical figure, a response to people claiming an O(1) requirement. So, it should either be in the spec, or the spec should be treated as such - something not specified is underspecified and implementation-dependent. I'm glad that the last point is now explicitly pronounced by the BDFL in the last comment of that ticket (http://bugs.python.org/issue21667#msg219824) > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/pmiscml%40gmail.com -- Best regards, Paul mailto:pmiscml at gmail.com From greg.ewing at canterbury.ac.nz Fri Jun 6 12:48:59 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 06 Jun 2014 22:48:59 +1200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140606063757.GM10355@ando> References: <20140604011718.GD10355@ando> <20140605132312.GK10355@ando> <5391107F.1010500@canterbury.ac.nz> <20140606063757.GM10355@ando> Message-ID: <53919C9B.6000100@canterbury.ac.nz> Steven D'Aprano wrote: > I don't know about car engine controllers, but presumably they have > diagnostic ports, and they may sometimes output text. If they output > text, then at least hypothetically car mechanics in Russia might prefer > their car to output "??????" and "??????" rather than "true" and > "false". From a bit of googling, it seems that engine controller diagnostic ports typically speak some kind of binary protocol. So it would be up to the software running on whatever was plugged into the port to display the information in the user's native language. E.g.
this document lists a big pile of hex byte values and little or no text that I can see: https://law.resource.org/pub/us/cfr/ibr/005/sae.j1979.2002.pdf -- Greg From rdmurray at bitdance.com Fri Jun 6 13:00:40 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Fri, 06 Jun 2014 07:00:40 -0400 Subject: [Python-Dev] asyncio/Tulip: use CPython as the new upstream In-Reply-To: References: Message-ID: <20140606110041.08C03250DE6@webabinitio.net> On Fri, 06 Jun 2014 11:31:23 +0200, Victor Stinner wrote: > Hi, > > I added a new BaseEventLoop.is_closed() method to Tulip and Python 3.5 > to fix an issue (see Tulip issue 169 for the detail). The problem is > that I don't want to add this method to Python 3.4 because usually we > don't add new methods in minor versions of Python (future version > 3.4.2 in this case). > > Guido just wrote in the issue: "Actually for asyncio we have special > dispensation to push new features to minor releases (until 3.5). > Please push to 3.4 so the source code is the same everywhere (except > selectors.py, which is not covered by the exception)." > > I disagree with Guido. I would prefer to start to maintain a different > branch for Python 3.4, because I consider that only bugfixes should be > applied to Python 3.4. > > It's not the first change that cannot be applied on Python 3.4 (only > in Tulip and Python 3.5): the selectors module now also supports > devpoll on Solaris. It's annoying because the Tulip script > "update_stdlib.sh" used to synchronize Tulip and Python wants to > replace Lib/selectors.py in Python 3.4. I have to revert the change each time. > > I propose a new workflow: use Python default (future version 3.5) as > the new asyncio "upstream". Bugfixes would be applied as other Python > bugfixes: first in Python 3.4, than in Python 3.5. The > "update_stdlib.sh" script of Tulip should be modified to copy files > from Python default to Tulip (opposite of the current direction). 
> > Workflow: > > New feature: Python 3.5 => Tulip => Trollius > Bugfix: Python 3.4 => Python 3.5 => Tulip => Trollius > > I don't think that Tulip should have minor release just for bugfixes, > it would be a pain to maintain. Tulip is a third party module, it > doesn't have the same constraints than Python stdlib. > > What do you think? I don't have any opinion on the workflow. My understanding is that part of the purpose of the "provisional" designation is to allow faster evolution (read: fixing) of an API before the library becomes non-provisional. Thus I agree with Guido here, and will be doing something similar with at least one of the minor provisional email API features in 3.4.2 (unless I miss the cutoff again ... :( --David From ncoghlan at gmail.com Fri Jun 6 13:10:49 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 6 Jun 2014 21:10:49 +1000 Subject: [Python-Dev] asyncio/Tulip: use CPython as the new upstream In-Reply-To: References: Message-ID: On 6 June 2014 19:31, Victor Stinner wrote: > Guido just wrote in the issue: "Actually for asyncio we have special > dispensation to push new features to minor releases (until 3.5). > Please push to 3.4 so the source code is the same everywhere (except > selectors.py, which is not covered by the exception)." > > I disagree with Guido. I would prefer to start to maintain a different > branch for Python 3.4, because I consider that only bugfixes should be > applied to Python 3.4. This is why PEP 411 was thrashed out: to let us split the dates of "make broadly available in the standard library" and "get ultra conservative with API changes". asyncio was added as a provisional module, so it can still get new features in 3.4.x maintenance releases - that's a far more minor change than the backwards compatibility breaks permitted by the PEP. The difference with selectors is that it was *not* added as a provisional module - it's subject to all the normal stability requirements. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Fri Jun 6 13:11:27 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 06 Jun 2014 20:11:27 +0900 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140606121306.06783df6@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140606121306.06783df6@x34f> Message-ID: <8761ke2syo.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Sokolovsky writes: > That kinda means "string is atomic", instead of your "characters are > atomic". I would be very surprised if a language that behaved that way was called a "Python subset". No indexing, no slicing, no regexps, no .split(), no .startswith(), no sorted() or .sort(), ...!? If that's not what you mean by "string is atomic", I think you're using very confusing terminology. 
From pmiscml at gmail.com Fri Jun 6 13:15:35 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 6 Jun 2014 14:15:35 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140605153708.7f27412e@x34f> Message-ID: <20140606141535.69e8bab0@x34f> Hello, On Thu, 5 Jun 2014 23:15:54 +1000 Nick Coghlan wrote: > On 5 June 2014 22:37, Paul Sokolovsky wrote: > > On Thu, 5 Jun 2014 22:20:04 +1000 > > Nick Coghlan wrote: > >> problems caused by trusting the locale encoding to be correct, but > >> the startup code will need non-trivial changes for that to happen > >> - the C.UTF-8 locale may even become widespread before we get > >> there). > > > > ... And until those golden times come, it would be nice if Python > > did not force its perfect world model, which unfortunately is not > > based on surrounding reality, and let users solve their encoding > > problems themselves - when they need, because again, one can go > > quite a long way without dealing with encodings at all. Whereas now > > Python3 forces users to deal with encoding almost universally, but > > forcing a particular for all strings (which is again, doesn't > > correspond to the state of surrounding reality). I already hear > > response that it's good that users taught to deal with encoding, > > that will make them write correct programs, but that's a bit far > > away from the original aim of making it write "correct" programs > > easy and pleasant. (And definition of "correct" vary.) 
> > As I've said before in other contexts, find me Windows, Mac OS X and > JVM developers, or educators and scientists that are as concerned by > the text model changes as folks that are primarily focused on Linux > system (including network) programming, and I'll be more willing to > concede the point. Well, but this question reduces to finding out (or specifying) who the target audiences of Python are. It has always been (with a bow to Guido) an outpost of scientific users (and it will probably remain prominent in that role even if there were a mass exodus of other categories of users). But Python has always had its share as a system scripting language among Perl-haters, and with Perl going flatline, I guess it's fair to say that Python is a major system scripting and service implementation language. To whom do all features like memoryview, array.array, in-place input operations, etc. cater? To scientists? I'm sure most of them are just happy with stuffing "@jit" on their kernel functions. And scientists who bother with memoryviews for their data structures are system-level-ish programmers too. So, no wonder that the Linux crowd cries at Python3 - it makes doing simple things unnecessarily complicated. > Windows, Mac OS X, and the JVM are all opinionated about the text > encodings to be used at platform boundaries (using UTF-16, UTF-8 and > UTF-16, respectively). By contrast, Linux (or, more accurately, POSIX) > says "well, it's configurable, but we won't provide a reliable > mechanism for finding out what the encoding is. So either guess as [] Yes, I understand the complexity of developing a cross-platform language with advanced features. But I may offer another look at all this activity: Python3 was brave enough to stage a revolution in its own world (catching a lot of its users by surprise), but surely not brave enough to stage a revolution around itself, by saying something like "We choose ONE, the most right, and even the most used (per bytes transferred) encoding as our standard I/O encoding.
Grow up or explicitly specify the encoding which you personally need.". Surely, it didn't do that - it makes no sense to fight the world. But then Python3 is sympathetic about Java's desire to use "UTF-16" instead of the "right" encoding, and not so about the Unix desire to treat encodings as a level separate from content (and to treat Unicode as nothing else than yet another arbitrary encoding, which it is formally, and will be for a long time de facto, however sad it is). So, maybe "cross-platform" should have meant "don't do implicit conversions". Because see, Python2 had a problem with implicit encoding conversion when str and unicode objects were mixed, and Python3 has a problem with implicit conversions whenever str is used at all. Anyway, I appreciate the detailed responses, and understand what you (Python3 developers) are trying to achieve, and appreciate your work, and hope it all works out. Each user has their own concerns about Unicode. Mine are efficiency and layering. But once MicroPython has UTF-8 support I will be much more relaxed about it. Layering is harder to accept, but hopefully can be tackled too, on both the mental and technical sides. I hope other users will find their peace with Unicode too! [] -- Best regards, Paul mailto:pmiscml at gmail.com From pmiscml at gmail.com Fri Jun 6 13:34:01 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 6 Jun 2014 14:34:01 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <8761ke2syo.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140606121306.06783df6@x34f> <8761ke2syo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140606143401.79a7b0ee@x34f> Hello, On Fri, 06 Jun 2014 20:11:27 +0900 "Stephen J.
Turnbull" wrote: > Paul Sokolovsky writes: > > > That kinda means "string is atomic", instead of your "characters > > are atomic". > > I would be very surprised if a language that behaved that way was > called a "Python subset". No indexing, no slicing, no regexps, no > .split(), no .startswith(), no sorted() or .sort(), ...!? > > If that's not what you mean by "string is atomic", I think you're > using very confusing terminology. I'm sorry if I didn't mention it, or didn't make it clear enough - it's all about layering. On level 0, you treat strings verbatim, and can write some subset of apps (my point is that even this level allows to write lot enough apps). Let's call this set A0. On level 1, you accept that there's some universal enough conventions for some chars, like space or newline. And you can write set of apps A1 > A0. On level 2, you add len(), and - oh magic - you now can center a string within fixed-size field, something you probably to as often as once a month, so hopefully that will keep you busy for few. On level 3, it indeed starts to smell Unicode, we get isdigit(), isalpha(), which require long boring tables, which hopefully can be compressed enough to fit in your pocket. On level 4, it's pumping up, with tolower() and friends, tables for which you carry around in suitcase. On level 5, everything is Unicode, what a bliss! You can even start pretending that no other levels exist (God created Unicode on a second day). On level 6, there're mind-boggling, ugly manual-use utilities to deal with internals of "magic" "working on its own for everyone" encoding to deal with stuff like code-point vs charecters vs surrogate pair vs grapheme separation, etc. So, once again, for me and some other people, it's not that bright idea to shoot for level 5 if levels 0-4 exist and well-proven pragmatic model. And level 6 is still there anyway. 
-- Best regards, Paul mailto:pmiscml at gmail.com From ncoghlan at gmail.com Fri Jun 6 13:35:49 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 6 Jun 2014 21:35:49 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140606141535.69e8bab0@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140605153708.7f27412e@x34f> <20140606141535.69e8bab0@x34f> Message-ID: On 6 June 2014 21:15, Paul Sokolovsky wrote: > Hello, > > On Thu, 5 Jun 2014 23:15:54 +1000 > Nick Coghlan wrote: > >> On 5 June 2014 22:37, Paul Sokolovsky wrote: >> > On Thu, 5 Jun 2014 22:20:04 +1000 >> > Nick Coghlan wrote: >> >> problems caused by trusting the locale encoding to be correct, but >> >> the startup code will need non-trivial changes for that to happen >> >> - the C.UTF-8 locale may even become widespread before we get >> >> there). >> > >> > ... And until those golden times come, it would be nice if Python >> > did not force its perfect world model, which unfortunately is not >> > based on surrounding reality, and let users solve their encoding >> > problems themselves - when they need, because again, one can go >> > quite a long way without dealing with encodings at all. Whereas now >> > Python3 forces users to deal with encoding almost universally, but >> > forcing a particular for all strings (which is again, doesn't >> > correspond to the state of surrounding reality). I already hear >> > response that it's good that users taught to deal with encoding, >> > that will make them write correct programs, but that's a bit far >> > away from the original aim of making it write "correct" programs >> > easy and pleasant. (And definition of "correct" vary.) 
>> >> As I've said before in other contexts, find me Windows, Mac OS X and >> JVM developers, or educators and scientists that are as concerned by >> the text model changes as folks that are primarily focused on Linux >> system (including network) programming, and I'll be more willing to >> concede the point. > > Well, but this question reduces to finding out (or specifying) who are > target audiences of Python. It always has been (with a bow to Guido) > forpost of scientific users (and probably even if there was mass exodus > of other categories of users will remain prominent in that role). But > Python has always had its share as system scripting language among > Perl-haters, and with Perl going flatline, I guess it's fair to say > that Python is major system scripting and service implementation > language. Correct - and the efforts of a number of core developers are focused on getting the Linux distros and major projects like OpenStack migrated. If other Linux users say "I'm not switching to Python 3 until after my distro has switched their own Python applications over", that's a perfectly reasonable course of action for them to take. After all, that approach to the adoption of new Python versions is a large part of why Python 2.6 is still so widely supported by library and framework developers: enterprise Linux distros haven't even finished migrating to Python 2.7 yet, let alone Python 3. (The other reason is that the language moratorium that was applied to Python 2.7 and 3.2 means that supporting back to Python 2.6 isn't that much harder than supporting 2.7 at this point in time). That said, the feedback from the early adopters of Python 3 on Linux is proving invaluable, and Linux users in general will benefit from their work as the distros move their infrastructure applications over. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From timothy.c.delaney at gmail.com Fri Jun 6 13:48:41 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Fri, 6 Jun 2014 21:48:41 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140606143401.79a7b0ee@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140606121306.06783df6@x34f> <8761ke2syo.fsf@uwakimon.sk.tsukuba.ac.jp> <20140606143401.79a7b0ee@x34f> Message-ID: On 6 June 2014 21:34, Paul Sokolovsky wrote: > > On Fri, 06 Jun 2014 20:11:27 +0900 > "Stephen J. Turnbull" wrote: > > > Paul Sokolovsky writes: > > > > > That kinda means "string is atomic", instead of your "characters > > > are atomic". > > > > I would be very surprised if a language that behaved that way was > > called a "Python subset". No indexing, no slicing, no regexps, no > > .split(), no .startswith(), no sorted() or .sort(), ...!? > > > > If that's not what you mean by "string is atomic", I think you're > > using very confusing terminology. > > I'm sorry if I didn't mention it, or didn't make it clear enough - it's > all about layering. > > On level 0, you treat strings verbatim, and can write some subset of > apps (my point is that even this level allows to write lot enough > apps). Let's call this set A0. > > On level 1, you accept that there's some universal enough conventions > for some chars, like space or newline. And you can write set of > apps A1 > A0. > At heart, this is exactly what the Python 3 "str" type is. The universal convention is "code points". It's got nothing to do with encodings, or bytes. A Python string is simply a finite sequence of atomic code points - it is indexable, and it has a length. 
Once you have that, everything is layered on top of it. How the code points themselves are implemented is opaque and irrelevant other than the memory and performance consequences of the implementation decisions (for example, a string could be indexable by iterating from the start until you find the nth code point). Similarly the "bytes" type is a sequence of 8-bit bytes. Encodings are simply a way to transport code points via a byte-oriented transport. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Fri Jun 6 15:18:38 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 6 Jun 2014 16:18:38 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <20140606161838.2fe38114@x34f> Hello, On Fri, 06 Jun 2014 09:32:25 +0100 Mark Lawrence wrote: > On 04/06/2014 16:52, Mark Lawrence wrote: > > On 04/06/2014 16:32, Steve Dower wrote: > >> > >> If copying into a separate list is a problem (memory-wise), > >> re.finditer('\\S+', string) also provides the same behaviour and > >> gives me the sliced string, so there's no need to index for > >> anything. > >> > > > > Out of idle curiosity is there anything that stops MicroPython, or > > any other implementation for that matter, from providing views of a > > string rather than copying every time? IIRC memoryviews in CPython > > rely on the buffer protocol at the C API level, so since strings > > don't support this protocol you can't take a memoryview of them. > > Could this actually be implemented in the future, is the underlying > > C code just too complicated, or what? > > > > Anybody? I'd like to address this, and other, buffer manipulation optimization ideas I have for MicroPython at some time later. 
But as you suggest, it would be possible to transparently have "strings-by-reference". The reason MicroPython doesn't have such so far (and why I'm, as a uPy contributor, not ready to discuss them) is that they're an optimization, and everyone knows what premature optimization is. [] -- Best regards, Paul mailto:pmiscml at gmail.com From breamoreboy at yahoo.co.uk Fri Jun 6 15:30:18 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 06 Jun 2014 14:30:18 +0100 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <5391819E.3060300@avl.com> References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> <5391819E.3060300@avl.com> Message-ID: On 06/06/2014 09:53, Hrvoje Niksic wrote: > On 06/04/2014 05:52 PM, Mark Lawrence wrote: >> On 04/06/2014 16:32, Steve Dower wrote: >>> >>> If copying into a separate list is a problem (memory-wise), >>> re.finditer('\\S+', string) also provides the same behaviour and >>> gives me the sliced string, so there's no need to index for anything. >>> >> >> Out of idle curiosity is there anything that stops MicroPython, or any >> other implementation for that matter, from providing views of a string >> rather than copying every time? IIRC memoryviews in CPython rely on the >> buffer protocol at the C API level, so since strings don't support this >> protocol you can't take a memoryview of them. Could this actually be >> implemented in the future, is the underlying C code just too >> complicated, or what? >> > > Memory view of Unicode strings is controversial for two reasons: > > 1. It exposes the internal representation of the string. If memoryviews > of strings were supported in Python 3, PEP 393 would not have been > possible (without breaking that feature). > > 2. Even if it were OK to expose the internal representation, it might > not be what the users expect.
For example, memoryview("Hrvoje") would > return a view of a 6-byte buffer, while memoryview("Nikšić") would > return a view of a 12-byte UCS-2 buffer. The user of a memory view might > expect to get UCS-2 (or UCS-4, or even UTF-8) in all cases. > > An implementation that decided to export strings as memory views might > be forced to make a decision about internal representation of strings, > and then stick to it. > > The byte objects don't have these issues, which is why in Python 2.7 > memoryview("foo") works just fine, as does memoryview(b"foo") in Python 3. > Thanks for the explanation :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com From antoine at python.org Fri Jun 6 16:05:52 2014 From: antoine at python.org (Antoine Pitrou) Date: Fri, 06 Jun 2014 10:05:52 -0400 Subject: [Python-Dev] asyncio/Tulip: use CPython as the new upstream In-Reply-To: <20140606110041.08C03250DE6@webabinitio.net> References: <20140606110041.08C03250DE6@webabinitio.net> Message-ID: Le 06/06/2014 07:00, R. David Murray a écrit : > > I don't have any opinion on the workflow. > > My understanding is that part of the purpose of the "provisional" > designation is to allow faster evolution (read: fixing) of an API before > the library becomes non-provisional. Thus I agree with Guido here, and > will be doing something similar with at least one of the minor provisional > email API features in 3.4.2 (unless I miss the cutoff again ... :( I would personally distinguish API fixes (compatibility-breaking changes) from feature additions (new APIs). Regards Antoine. From rdmurray at bitdance.com Fri Jun 6 16:37:39 2014 From: rdmurray at bitdance.com (R.
David Murray) Date: Fri, 06 Jun 2014 10:37:39 -0400 Subject: [Python-Dev] asyncio/Tulip: use CPython as the new upstream In-Reply-To: References: <20140606110041.08C03250DE6@webabinitio.net> Message-ID: <20140606143739.67FEB250DE6@webabinitio.net> On Fri, 06 Jun 2014 10:05:52 -0400, Antoine Pitrou wrote: > Le 06/06/2014 07:00, R. David Murray a écrit : > > > > I don't have any opinion on the workflow. > > > > My understanding is that part of the purpose of the "provisional" > > designation is to allow faster evolution (read: fixing) of an API before > > the library becomes non-provisional. Thus I agree with Guido here, and > > will be doing something similar with at least one of the minor provisional > > email API features in 3.4.2 (unless I miss the cutoff again ... :( > > I would personally distinguish API fixes (compatibility-breaking > changes) from feature additions (new APIs). It doesn't look like the PEP directly addresses API changes in maintenance releases, and I suppose that should be fixed. I specifically want to fix this API before someone depends on it working the wrong way, which they would have to if I left it alone for the whole of the 3.4 series. (Issue 21091 for the curious.)
--David From pmiscml at gmail.com Fri Jun 6 16:52:17 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 6 Jun 2014 17:52:17 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140606121306.06783df6@x34f> <8761ke2syo.fsf@uwakimon.sk.tsukuba.ac.jp> <20140606143401.79a7b0ee@x34f> Message-ID: <20140606175217.766b781c@x34f> Hello, On Fri, 6 Jun 2014 21:48:41 +1000 Tim Delaney wrote: > On 6 June 2014 21:34, Paul Sokolovsky wrote: > > > > > On Fri, 06 Jun 2014 20:11:27 +0900 > > "Stephen J. Turnbull" wrote: > > > > > Paul Sokolovsky writes: > > > > > > > That kinda means "string is atomic", instead of your > > > > "characters are atomic". > > > > > > I would be very surprised if a language that behaved that way was > > > called a "Python subset". No indexing, no slicing, no regexps, no > > > .split(), no .startswith(), no sorted() or .sort(), ...!? > > > > > > If that's not what you mean by "string is atomic", I think you're > > > using very confusing terminology. > > > > I'm sorry if I didn't mention it, or didn't make it clear enough - > > it's all about layering. > > > > On level 0, you treat strings verbatim, and can write some subset of > > apps (my point is that even this level allows to write lot enough > > apps). Let's call this set A0. > > > > On level 1, you accept that there's some universal enough > > conventions for some chars, like space or newline. And you can > > write set of apps A1 > A0. > > > > At heart, this is exactly what the Python 3 "str" type is. The > universal convention is "code points". Yes. Except for one small detail - Python3 specifies these code points to be Unicode code points. And Unicode is a very bloated thing. 
But if we drop that "Unicode" stipulation, then it's also exactly what MicroPython implements. Its "str" type consists of code points; we don't have pet names for them yet, like Unicode does, but their numeric values are 0-255. Note that this in no way limits the encodings, characters, or scripts which can be used with MicroPython, because just like Unicode, it supports a concept of "surrogate pairs" (though we don't call it that) - specifically, smaller code points may comprise bigger groupings. But unlike Unicode, we don't stipulate format, value or other constraints on how these "surrogate pairs"-alikes are formed, leaving that to users. -- Best regards, Paul mailto:pmiscml at gmail.com From rosuav at gmail.com Fri Jun 6 17:14:30 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 7 Jun 2014 01:14:30 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140606131531.2f8431c1@x34f> References: <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605121054.GA348@sleipnir.bytereef.org> <20140606131531.2f8431c1@x34f> Message-ID: On Fri, Jun 6, 2014 at 8:15 PM, Paul Sokolovsky wrote: > I'm sorry if I was somehow related to that, my > bringing in the formal language spec was more a rhetorical figure, a > response to people claiming O(1) requirement. This was exactly why this whole discussion came up, though. We were debating on the uPy bug tracker about how important O(1) indexing is; I then came to python-list to try to get some solid data from which to debate; and then the discussion jumped here to python-dev for more solid explanations. The spec wasn't perfectly clear, and now it's being made clearer: O(N) indexing does not violate Python's spec, ergo uPy is allowed to use UTF-8 as its internal representation, as long as script-visible behaviour is correct.
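The O(N) code-point indexing over a UTF-8 buffer that this exchange keeps returning to can be sketched as follows - an editorial illustration in Python, not MicroPython's actual implementation:

```python
def codepoint_at(buf: bytes, index: int) -> str:
    """Return the index-th code point of UTF-8 data by linear scan:
    O(index) time, no side tables - the trade-off under discussion.
    (Editorial sketch; the real uPy code is C and more careful.)"""
    i = 0
    for _ in range(index):
        i += 1
        # Skip continuation bytes (0b10xxxxxx) of the current character.
        while i < len(buf) and buf[i] & 0xC0 == 0x80:
            i += 1
    j = i + 1
    while j < len(buf) and buf[j] & 0xC0 == 0x80:
        j += 1
    return buf[i:j].decode("utf-8")

data = "Nik\u0161i\u0107".encode("utf-8")  # 6 code points, 8 bytes
print(codepoint_at(data, 3))               # U+0161, found by scanning
```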
It'll be interesting to see when it's done (I'm currently working on that implementation, bit by bit) and to run the CPython benchmarks on it. It's been a fruitful and interesting discussion, and the formal language spec is key to it. No need to apologize! ChrisA From regex at mrabarnett.plus.com Fri Jun 6 17:47:24 2014 From: regex at mrabarnett.plus.com (MRAB) Date: Fri, 06 Jun 2014 16:47:24 +0100 Subject: [Python-Dev] asyncio/Tulip: use CPython as the new upstream In-Reply-To: References: Message-ID: <5391E28C.1000406@mrabarnett.plus.com> On 2014-06-06 10:31, Victor Stinner wrote: > Hi, > > I added a new BaseEventLoop.is_closed() method to Tulip and Python > 3.5 to fix an issue (see Tulip issue 169 for the detail). The problem > is that I don't want to add this method to Python 3.4 because usually > we don't add new methods in minor versions of Python (future version > 3.4.2 in this case). > > Guido just wrote in the issue: "Actually for asyncio we have special > dispensation to push new features to minor releases (until 3.5). > Please push to 3.4 so the source code is the same everywhere (except > selectors.py, which is not covered by the exception)." > > I disagree with Guido. I would prefer to start to maintain a > different branch for Python 3.4, because I consider that only > bugfixes should be applied to Python 3.4. > [snip] Isn't this a little like when bool, True and False were added to Python 2.2.1, a bugfix release, an act that is, I believe, now regarded as a mistake not to be repeated? From Steve.Dower at microsoft.com Fri Jun 6 17:41:22 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Fri, 6 Jun 2014 15:41:22 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler Message-ID: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Hi all I would like to propose moving Python 3.5 to use Visual C++ 14.0 as the main compiler. 
The first CTP of Visual Studio "14" was released earlier this week: http://blogs.msdn.com/b/vcblog/archive/2014/06/03/visual-studio-14-ctp.aspx The major feature of interest in this version of MSVC is a new policy to maintain binary compatibility for the CRT into the future. (There will be a blog about this soon, but I didn't want to hold up getting the discussion started here.) What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. Those who are aware of the current state of affairs where you need to use a matching compiler will hopefully see how big an improvement this will be. It is also likely that other compilers will have an easier time providing compatibility with this new CRT, making it simpler and more reliable to build extensions with LLVM or GCC against an MSVC CPython. The other major benefit is that both products are at points in their development where changes can be made. Being a Microsoft employee, I have the ability to test Python builds regularly against the daily MSVC builds and to file bugs directly to the VC team (crashes, incorrect code generation, incorrect linking, performance regressions, etc.). This is a great opportunity to make sure that our needs are covered by the compiler team - it's also a good chance to raise any particular missing features that would be beneficial. My internal testing shows that the core code is almost fully compatible and builds successfully with only trivial modifications (some CRT variables are now macros with a leading underscore). The project files need updating, but I am willing to do this as part of any migration. There may also be some work required for external dependencies, since I did not test these, but I am also willing to do that. 
Basically, what I am offering to do is: * Update the files in PCBuild to work with Visual Studio "14" * Make any code changes necessary to build with VC14 * Regularly test the latest Python source with the latest MSVC builds and report issues/suggestions to the MSVC team * Keep all changes in a separate (public) repo until early next year when we're getting close to the final VS "14" release What I am asking anyone else to do is: * Nothing Thoughts/comments/concerns? Cheers, Steve From tjreedy at udel.edu Fri Jun 6 17:59:31 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 06 Jun 2014 11:59:31 -0400 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <5391819E.3060300@avl.com> References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> <5391819E.3060300@avl.com> Message-ID: On 6/6/2014 4:53 AM, Hrvoje Niksic wrote: > On 06/04/2014 05:52 PM, Mark Lawrence wrote: >> Out of idle curiosity is there anything that stops MicroPython, or any >> other implementation for that matter, from providing views of a string >> rather than copying every time? IIRC memoryviews in CPython rely on the >> buffer protocol at the C API level, so since strings don't support this >> protocol you can't take a memoryview of them. Could this actually be >> implemented in the future, is the underlying C code just too >> complicated, or what? >> > > Memory view of Unicode strings is controversial for two reasons: > > 1. It exposes the internal representation of the string. If memoryviews > of strings were supported in Python 3, PEP 393 would not have been > possible (without breaking that feature). > > 2. Even if it were OK to expose the internal representation, it might > not be what the users expect. For example, memoryview("Hrvoje") would > return a view of a 6-byte buffer, while memoryview("Nikšić") would > return a view of a 12-byte UCS-2 buffer.
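The asymmetry behind this point is easy to check in Python 3, where bytes implement the buffer protocol and str deliberately does not (a quick sketch):

```python
# bytes support the buffer protocol, so a zero-copy view works:
v = memoryview(b"Hrvoje")
assert len(v) == 6            # a view over the 6 raw bytes
assert v[0] == ord("H")

# str exposes no buffer, so there is nothing to view:
try:
    memoryview("Hrvoje")
except TypeError as exc:
    print("str is not viewable:", exc)
```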
The user of a memory view might > expect to get UCS-2 (or UCS-4, or even UTF-8) in all cases. > > An implementation that decided to export strings as memory views might > be forced to make a decision about internal representation of strings, > and then stick to it. > > The byte objects don't have these issues, which is why in Python 2.7 > memoryview("foo") works just fine, as does memoryview(b"foo") in Python 3. The other problem is that a small slice view of a large object keeps the large object alive, so a view user needs to think carefully about whether to make a copy or create a view, and later to copy views to delete the base object. This is not for beginners. -- Terry Jan Reedy From rosuav at gmail.com Fri Jun 6 18:01:08 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 7 Jun 2014 02:01:08 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On Sat, Jun 7, 2014 at 1:41 AM, Steve Dower wrote: > What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. Oh, if only this had been available for 2.7!! Actually... this means that 14.0 would be a good target for a compiler change for 2.7.x, if such a change is ever acceptable. To what extent is this compatibility going to be maintained? Is there a guarantee that there'll be X versions (or X years) of cross-compilation support? 
ChrisA From donald at stufft.io Fri Jun 6 18:01:52 2014 From: donald at stufft.io (Donald Stufft) Date: Fri, 6 Jun 2014 12:01:52 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <40554C37-5379-4D35-8EB8-93481436A8D0@stufft.io> On Jun 6, 2014, at 11:41 AM, Steve Dower wrote: > words +1 from me. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From guido at python.org Fri Jun 6 18:04:38 2014 From: guido at python.org (Guido van Rossum) Date: Fri, 6 Jun 2014 09:04:38 -0700 Subject: [Python-Dev] asyncio/Tulip: use CPython as the new upstream In-Reply-To: <5391E28C.1000406@mrabarnett.plus.com> References: <5391E28C.1000406@mrabarnett.plus.com> Message-ID: On Fri, Jun 6, 2014 at 8:47 AM, MRAB wrote: > On 2014-06-06 10:31, Victor Stinner wrote: > >> Hi, >> >> I added a new BaseEventLoop.is_closed() method to Tulip and Python >> 3.5 to fix an issue (see Tulip issue 169 for the detail). The problem >> is that I don't want to add this method to Python 3.4 because usually >> we don't add new methods in minor versions of Python (future version >> 3.4.2 in this case). >> >> Guido just wrote in the issue: "Actually for asyncio we have special >> dispensation to push new features to minor releases (until 3.5). >> Please push to 3.4 so the source code is the same everywhere (except >> selectors.py, which is not covered by the exception)." >> >> I disagree with Guido. I would prefer to start to maintain a >> different branch for Python 3.4, because I consider that only >> bugfixes should be applied to Python 3.4. 
>> >> [snip] > > Isn't this a little like when bool, True and False were added to > Python 2.2.1, a bugfix release, an act that is, I believe, now regarded > as a mistake not to be repeated? > It's a little like that, but it's also a little unlike that -- asyncio is explicitly accepted in the stdlib with "provisional" status which allows changes like this. Regarding the workflow, I'd really like asyncio to be able to move faster than the rest of the stdlib, at least until 3.5 is fixed. Working in the Tulip repo is much easier for me than working in the CPython repo, so I'd like to keep the workflow of Tulip -> 3.4 -> 3.5 as long as possible. I also specifically consider selectors.py subject to a *different* workflow -- for that module the workflow should be 3.5 -> Tulip. If Tulip's update_stdlib.sh script's prompts to copy this file are too distracting, I can hack the script to be silent about this file if it detects that the CPython repo is 3.4. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Fri Jun 6 18:06:18 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 6 Jun 2014 16:06:18 +0000 (UTC) Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes References: <5391754D.8000607@googlemail.com> Message-ID: <1842263445423761298.493568sturla.molden-gmail.com@news.gmane.org> Julian Taylor wrote: > The problem with this approach is that it is already difficult enough to > handle memory in numpy. I would not do this in a way that complicates memory management in NumPy. I would just replace malloc and free with temporarily cached versions. From the perspective of NumPy the API should be the same. > Having a cache that potentially stores gigabytes > of memory out of the user's sight will just make things worse. Buffers don't need to stay in the cache forever, just long enough to allow reuse within an expression.
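Such a temporarily-cached malloc/free pair might be sketched like this (an illustration of the idea only, with made-up names; not actual or proposed NumPy code):

```python
# Sketch: freed buffers linger briefly in a pool so the next temporary
# of the same size can reuse them; a background thread flushes the
# pool shortly after each free.
import threading
import time
from collections import defaultdict

_lock = threading.Lock()
_cond = threading.Condition(_lock)
_pool = defaultdict(list)      # buffer size -> recently freed buffers
FLUSH_DELAY = 0.05             # the "N microseconds" magic number

def cached_malloc(size):
    with _lock:
        if _pool[size]:
            return _pool[size].pop()   # reuse a recent temporary
    return bytearray(size)             # fall back to the system allocator

def cached_free(buf):
    with _cond:
        _pool[len(buf)].append(buf)    # delay the real free
        _cond.notify()                 # wake the flusher thread

def _flusher():
    while True:
        with _cond:
            _cond.wait()               # sleep until something is freed
        time.sleep(FLUSH_DELAY)        # grace period for reuse
        with _lock:
            _pool.clear()              # return everything to the system

threading.Thread(target=_flusher, daemon=True).start()

a = cached_malloc(1024)
cached_free(a)
assert cached_malloc(1024) is a        # reused, no new allocation
```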
We are probably talking about delaying the call to free with just a few microseconds. We could e.g. have a setup like this: NumPy thread on "malloc": - tries to grab memory off the internal heap - calls system malloc on failure NumPy thread on "free": - returns a buffer to the internal heap - signals a condition Background daemonic GC thread: - wakes after sleeping on the condition - sleeps for another N microseconds (N = magic number) - flushes or shrinks the internal heap with system free - goes back to sleeping on the condition It can be implemented with the same API as malloc and free, and plugged directly into the existing NumPy code. We would in total need two mutexes, one condition variable, a pthread, and a heap. Sturla From status at bugs.python.org Fri Jun 6 18:07:55 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 6 Jun 2014 18:07:55 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140606160755.1EFB156A46@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-05-30 - 2014-06-06) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 4650 (+15) closed 28802 (+52) total 33452 (+67) Open issues with patches: 2127 Issues opened (48) ================== #21614: Case sensitivity problem in multiprocessing. 
http://bugs.python.org/issue21614 opened by ColinPDavidson #21615: Curses bug report for Python 2.7 and Python 3.2 http://bugs.python.org/issue21615 opened by eclectic9509 #21616: argparse explodes with nargs='*' and a tuple metavar http://bugs.python.org/issue21616 opened by vvas #21617: importlib reload can fail with AttributeError if module remove http://bugs.python.org/issue21617 opened by ned.deily #21619: Cleaning up a subprocess with a broken pipe http://bugs.python.org/issue21619 opened by vadmium #21621: Add note to 3.x What's New re Idle changes in bugfix releases http://bugs.python.org/issue21621 opened by terry.reedy #21622: ctypes.util incorrectly fails for libraries without DT_SONAME http://bugs.python.org/issue21622 opened by Jeremy.Huntwork #21623: build ssl failed use vs2010 express http://bugs.python.org/issue21623 opened by Mo.Jia #21624: Idle: polish htests http://bugs.python.org/issue21624 opened by terry.reedy #21625: help()'s more-mode is frustrating http://bugs.python.org/issue21625 opened by nedbat #21626: Add options width and compact to pickle cli http://bugs.python.org/issue21626 opened by barcc #21627: Concurrently closing files and iterating over the open files d http://bugs.python.org/issue21627 opened by sstewartgallus #21629: clinic.py --converters fails http://bugs.python.org/issue21629 opened by serhiy.storchaka #21632: Idle: sychronize text files across versions as appropriate. 
http://bugs.python.org/issue21632 opened by terry.reedy #21633: Argparse does not propagate HelpFormatter class to subparsers http://bugs.python.org/issue21633 opened by Michael.Cohen #21635: difflib.SequenceMatcher stores matching blocks as tuples, not http://bugs.python.org/issue21635 opened by drevicko #21642: "_ if 1else _" does not compile http://bugs.python.org/issue21642 opened by Joshua.Landau #21644: Optimize bytearray(int) constructor to use calloc() http://bugs.python.org/issue21644 opened by haypo #21645: test_read_all_from_pipe_reader() of test_asyncio hangs on Free http://bugs.python.org/issue21645 opened by haypo #21646: Add tests for turtle.ScrolledCanvas http://bugs.python.org/issue21646 opened by ingrid #21647: Idle unittests: make gui, mock switching easier. http://bugs.python.org/issue21647 opened by terry.reedy #21648: urllib urlopener leaves open sockets for FTP connection http://bugs.python.org/issue21648 opened by Claudiu.Popa #21649: Mention "Recommendations for Secure Use of TLS and DTLS" http://bugs.python.org/issue21649 opened by pitrou #21650: add json.tool option to avoid alphabetic sort of fields http://bugs.python.org/issue21650 opened by Pavel.Kazlou #21652: Python 2.7.7 regression in mimetypes module on Windows http://bugs.python.org/issue21652 opened by foom #21655: Write Unit Test for Vec2 class in the Turtle Module http://bugs.python.org/issue21655 opened by Lita.Cho #21656: Create test coverage for TurtleScreenBase in Turtle http://bugs.python.org/issue21656 opened by Lita.Cho #21657: pip.get_installed_distributions() Does not return packages in http://bugs.python.org/issue21657 opened by Adam.Matan #21658: __m128, can't build 3.4.1 with intel 14.0.0 http://bugs.python.org/issue21658 opened by aom #21659: IDLE: One corner calltip case http://bugs.python.org/issue21659 opened by serhiy.storchaka #21660: Substitute @TOKENS@ from sysconfig variables, for python-confi http://bugs.python.org/issue21660 opened by haubi #21664: 
multiprocessing leaks temporary directories pymp-xxx http://bugs.python.org/issue21664 opened by yjhong #21665: 2.7.7 ttk widgets not themed http://bugs.python.org/issue21665 opened by les.bothwell #21666: Argparse exceptions should include which argument has a proble http://bugs.python.org/issue21666 opened by v+python #21667: Clarify status of O(1) indexing semantics of str objects http://bugs.python.org/issue21667 opened by ncoghlan #21668: The select and time modules uses libm functions without linkin http://bugs.python.org/issue21668 opened by fornwall #21669: Custom error messages when print & exec are used as statements http://bugs.python.org/issue21669 opened by ncoghlan #21670: Add repr to shelve.Shelf http://bugs.python.org/issue21670 opened by Claudiu.Popa #21671: CVE-2014-0224: OpenSSL upgrade to 1.0.1h on Windows required http://bugs.python.org/issue21671 opened by lambacck #21672: Python for Windows 2.7.7: Path Configuration File No Longer Wo http://bugs.python.org/issue21672 opened by jblairpdx #21673: Idle: hilite search terms in hits in Find in Files output wind http://bugs.python.org/issue21673 opened by terry.reedy #21674: Idle: Add 'find all' in current file http://bugs.python.org/issue21674 opened by terry.reedy #21675: Library - Introduction - paragraph 5 - wrong ordering http://bugs.python.org/issue21675 opened by AnthonyBartoli #21676: IDLE - Test Replace Dialog http://bugs.python.org/issue21676 opened by sahutd #21677: Exception context set to string by BufferedWriter.close() http://bugs.python.org/issue21677 opened by vadmium #21678: Add operation "plus" for dictionaries http://bugs.python.org/issue21678 opened by Pix #21679: Prevent extraneous fstat during open() http://bugs.python.org/issue21679 opened by bkabrda #21680: asyncio: document event loops http://bugs.python.org/issue21680 opened by haypo Most recent 15 issues with no replies (15) ========================================== #21680: asyncio: document event loops 
http://bugs.python.org/issue21680 #21679: Prevent extraneous fstat during open() http://bugs.python.org/issue21679 #21677: Exception context set to string by BufferedWriter.close() http://bugs.python.org/issue21677 #21676: IDLE - Test Replace Dialog http://bugs.python.org/issue21676 #21675: Library - Introduction - paragraph 5 - wrong ordering http://bugs.python.org/issue21675 #21674: Idle: Add 'find all' in current file http://bugs.python.org/issue21674 #21673: Idle: hilite search terms in hits in Find in Files output wind http://bugs.python.org/issue21673 #21670: Add repr to shelve.Shelf http://bugs.python.org/issue21670 #21666: Argparse exceptions should include which argument has a proble http://bugs.python.org/issue21666 #21660: Substitute @TOKENS@ from sysconfig variables, for python-confi http://bugs.python.org/issue21660 #21657: pip.get_installed_distributions() Does not return packages in http://bugs.python.org/issue21657 #21656: Create test coverage for TurtleScreenBase in Turtle http://bugs.python.org/issue21656 #21655: Write Unit Test for Vec2 class in the Turtle Module http://bugs.python.org/issue21655 #21652: Python 2.7.7 regression in mimetypes module on Windows http://bugs.python.org/issue21652 #21649: Mention "Recommendations for Secure Use of TLS and DTLS" http://bugs.python.org/issue21649 Most recent 15 issues waiting for review (15) ============================================= #21679: Prevent extraneous fstat during open() http://bugs.python.org/issue21679 #21676: IDLE - Test Replace Dialog http://bugs.python.org/issue21676 #21670: Add repr to shelve.Shelf http://bugs.python.org/issue21670 #21669: Custom error messages when print & exec are used as statements http://bugs.python.org/issue21669 #21668: The select and time modules uses libm functions without linkin http://bugs.python.org/issue21668 #21660: Substitute @TOKENS@ from sysconfig variables, for python-confi http://bugs.python.org/issue21660 #21650: add json.tool option to avoid 
alphabetic sort of fields http://bugs.python.org/issue21650 #21648: urllib urlopener leaves open sockets for FTP connection http://bugs.python.org/issue21648 #21627: Concurrently closing files and iterating over the open files d http://bugs.python.org/issue21627 #21626: Add options width and compact to pickle cli http://bugs.python.org/issue21626 #21610: load_module not closing opened files http://bugs.python.org/issue21610 #21600: mock.patch.stopall doesn't work with patch.dict to sys.modules http://bugs.python.org/issue21600 #21599: Argument transport in attach and detach method in Server class http://bugs.python.org/issue21599 #21596: asyncio.wait fails when futures list is empty http://bugs.python.org/issue21596 #21595: asyncio: Creating many subprocess generates lots of internal B http://bugs.python.org/issue21595 Top 10 most discussed issues (10) ================================= #21667: Clarify status of O(1) indexing semantics of str objects http://bugs.python.org/issue21667 15 msgs #21427: installer not working http://bugs.python.org/issue21427 12 msgs #21476: Inconsitent behaviour between BytesParser.parse and Parser.par http://bugs.python.org/issue21476 11 msgs #21592: Make statistics.median run in linear time http://bugs.python.org/issue21592 11 msgs #21573: Clean up turtle.py code formatting http://bugs.python.org/issue21573 9 msgs #21623: build ssl failed use vs2010 express http://bugs.python.org/issue21623 9 msgs #15590: --libs is inconsistent for python-config --libs and pkgconfig http://bugs.python.org/issue15590 8 msgs #21665: 2.7.7 ttk widgets not themed http://bugs.python.org/issue21665 8 msgs #10740: sqlite3 module breaks transactions and potentially corrupts da http://bugs.python.org/issue10740 7 msgs #21671: CVE-2014-0224: OpenSSL upgrade to 1.0.1h on Windows required http://bugs.python.org/issue21671 7 msgs Issues closed (51) ================== #6181: Tkinter.Listbox several minor issues http://bugs.python.org/issue6181 closed by 
serhiy.storchaka #11387: Tkinter, callback functions http://bugs.python.org/issue11387 closed by terry.reedy #13630: IDLE: Find(ed) text is not highlighted while dialog box is ope http://bugs.python.org/issue13630 closed by terry.reedy #17095: Modules/Setup *shared* support broken http://bugs.python.org/issue17095 closed by ned.deily #18292: Idle: test AutoExpand.py http://bugs.python.org/issue18292 closed by terry.reedy #18409: Idle: test AutoComplete.py http://bugs.python.org/issue18409 closed by terry.reedy #18492: Allow all resources if not running under regrtest.py http://bugs.python.org/issue18492 closed by zach.ware #18910: IDle: test textView.py http://bugs.python.org/issue18910 closed by terry.reedy #19656: Add Py3k warning for non-ascii bytes literals http://bugs.python.org/issue19656 closed by serhiy.storchaka #20336: test_asyncio: relax timings even more http://bugs.python.org/issue20336 closed by skrah #20383: Add a keyword-only spec argument to types.ModuleType http://bugs.python.org/issue20383 closed by brett.cannon #20475: pystone.py in 3.4 still uses time.clock(), even though it's ma http://bugs.python.org/issue20475 closed by gvanrossum #21119: asyncio create_connection resource warning http://bugs.python.org/issue21119 closed by haypo #21180: Efficiently create empty array.array, consistent with bytearra http://bugs.python.org/issue21180 closed by gvanrossum #21233: Add *Calloc functions to CPython memory allocation API http://bugs.python.org/issue21233 closed by haypo #21252: Lib/asyncio/events.py has tons of docstrings which are just "X http://bugs.python.org/issue21252 closed by haypo #21304: PEP 466: Backport hashlib.pbkdf2_hmac to Python 2.7 http://bugs.python.org/issue21304 closed by python-dev #21344: save scores or ratios in difflib get_close_matches http://bugs.python.org/issue21344 closed by zach.ware #21462: PEP 466: upgrade OpenSSL in the Python 2.7 Windows builds http://bugs.python.org/issue21462 closed by benjamin.peterson #21477: 
Idle: improve idle_test.htest http://bugs.python.org/issue21477 closed by terry.reedy #21504: can the subprocess module war using os.wait4 and so return usa http://bugs.python.org/issue21504 closed by r.david.murray #21533: built-in types dict docs - construct dict from iterable, not i http://bugs.python.org/issue21533 closed by terry.reedy #21552: String length overflow in Tkinter http://bugs.python.org/issue21552 closed by serhiy.storchaka #21572: Use generic license web page rather than requiring release-spe http://bugs.python.org/issue21572 closed by ned.deily #21576: Overwritten (custom) uuid inside dictionary http://bugs.python.org/issue21576 closed by r.david.murray #21583: use support.captured_stderr context manager - test_logging http://bugs.python.org/issue21583 closed by python-dev #21593: Clarify re.search documentation first match http://bugs.python.org/issue21593 closed by terry.reedy #21594: asyncio.create_subprocess_exec raises OSError http://bugs.python.org/issue21594 closed by haypo #21601: Cancel method for Asyncio Task is not documented http://bugs.python.org/issue21601 closed by haypo #21604: Misleading 2to3 fixer name in documentation: standard_error http://bugs.python.org/issue21604 closed by python-dev #21605: Add tests for Tkinter images http://bugs.python.org/issue21605 closed by serhiy.storchaka #21612: IDLE should not open multiple instances of one file http://bugs.python.org/issue21612 closed by terry.reedy #21618: POpen does not close fds when fds have been inherited from a p http://bugs.python.org/issue21618 closed by gregory.p.smith #21620: OrderedDict KeysView set operations not supported http://bugs.python.org/issue21620 closed by serhiy.storchaka #21628: 2to3 does not fix zip in some cases http://bugs.python.org/issue21628 closed by berker.peksag #21630: List Dict bug? 
http://bugs.python.org/issue21630 closed by Robert.w #21631: List/Dict Combination Bug http://bugs.python.org/issue21631 closed by rhettinger #21634: Pystone uses floats http://bugs.python.org/issue21634 closed by haypo #21636: test_logging fails on Windows for Unix tests http://bugs.python.org/issue21636 closed by haypo #21637: Add a warning section exaplaining that tempfiles are opened in http://bugs.python.org/issue21637 closed by r.david.murray #21638: Seeking to EOF is too inefficient! http://bugs.python.org/issue21638 closed by yanlinlin82 #21639: tracemalloc crashes with floating point exception when using S http://bugs.python.org/issue21639 closed by haypo #21640: References to other Python version in sidebar of documentation http://bugs.python.org/issue21640 closed by orsenthil #21641: smtplib leaves open sockets around if SMTPResponseException is http://bugs.python.org/issue21641 closed by orsenthil #21643: "File exists" error during venv --upgrade http://bugs.python.org/issue21643 closed by python-dev #21651: asyncio tests ResourceWarning http://bugs.python.org/issue21651 closed by haypo #21653: Row.keys() in sqlite3 returns a list, not a tuple http://bugs.python.org/issue21653 closed by r.david.murray #21654: IDLE call tips emitting future warnings about ElementTree obje http://bugs.python.org/issue21654 closed by rhettinger #21661: setuptools documentation: typo http://bugs.python.org/issue21661 closed by python-dev #21662: datamodel documentation: fix typo and phrasing http://bugs.python.org/issue21662 closed by r.david.murray #21663: venv upgrade fails on Windows when copying TCL files http://bugs.python.org/issue21663 closed by python-dev From barry at python.org Fri Jun 6 18:10:37 2014 From: barry at python.org (Barry Warsaw) Date: Fri, 6 Jun 2014 12:10:37 -0400 Subject: [Python-Dev] asyncio/Tulip: use CPython as the new upstream In-Reply-To: <5391E28C.1000406@mrabarnett.plus.com> References: <5391E28C.1000406@mrabarnett.plus.com> Message-ID: 
<20140606121037.7b95fc3f@anarchist.wooz.org> On Jun 06, 2014, at 04:47 PM, MRAB wrote: >Isn't this a little like when bool, True and False were added to >Python 2.2.1, a bugfix release, an act that is, I believe, now regarded >as a mistake not to be repeated? Yes, that was a mistake, but the case under discussion is different. With True/False, it was a runtime-wide change that affected every Python program, and there was no such "special dispensation". -Barry From hrvoje.niksic at avl.com Fri Jun 6 18:11:56 2014 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Fri, 6 Jun 2014 18:11:56 +0200 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> <5391819E.3060300@avl.com> Message-ID: <5391E84C.1060100@avl.com> On 06/06/2014 05:59 PM, Terry Reedy wrote: > The other problem is that a small slice view of a large object keeps the > large object alive, so a view user needs to think carefully about > whether to make a copy or create a view, and later to copy views to > delete the base object. This is not for beginners. And this was important enough that Java 7 actually removed the long-standing feature of String.substring creating a string that shares the character array with the original. 
http://java-performance.info/changes-to-string-java-1-7-0_06/ From p.f.moore at gmail.com Fri Jun 6 18:16:07 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 6 Jun 2014 17:16:07 +0100 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On 6 June 2014 16:41, Steve Dower wrote: > Basically, what I am offering to do is: > > * Update the files in PCBuild to work with Visual Studio "14" > * Make any code changes necessary to build with VC14 > * Regularly test the latest Python source with the latest MSVC builds and report issues/suggestions to the MSVC team > * Keep all changes in a separate (public) repo until early next year when we're getting close to the final VS "14" release > > What I am asking anyone else to do is: > > * Nothing +1 from me. Paul From zachary.ware+pydev at gmail.com Fri Jun 6 18:22:47 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Fri, 6 Jun 2014 11:22:47 -0500 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On Fri, Jun 6, 2014 at 10:41 AM, Steve Dower wrote: > Thoughts/comments/concerns? My only concern is support for elderly versions of Windows, in particular: XP. I seem to recall the last "let's update our MSVC version" discussion dying off because of XP support. Even though MS has abandoned it, I'm not sure whether we can yet. If that's a non-issue, or if we can actually drop XP support, I'm all for it. 
-- Zach From pmiscml at gmail.com Fri Jun 6 18:25:03 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 6 Jun 2014 19:25:03 +0300 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: References: <20140604011718.GD10355@ando> <20140604174930.3a5af45f@x34f> <02b9a61658c04b11a21317da5b78bad6@BLUPR03MB389.namprd03.prod.outlook.com> <5391819E.3060300@avl.com> Message-ID: <20140606192503.6a22d236@x34f> Hello, On Fri, 06 Jun 2014 11:59:31 -0400 Terry Reedy wrote: [] > The other problem is that a small slice view of a large object keeps > the large object alive, so a view user needs to think carefully about > whether to make a copy or create a view, and later to copy views to > delete the base object. This is not for beginners. Yes, so it doesn't make sense to add such a feature to any of the existing APIs. However, as I pointed out in another mail, it would make a lot of sense to add an iterator-based string API (because if dict methods were *switched* to iterators, why can't strings have them *as an alternative*), and for their return values it would be ~natural to return "string views", especially if it's clearly and explicitly described that if a user wants to store them, they should be explicitly copied via str(view). One reason against this would be of course API bloat. But API bloat happens all the time, for example compare this modest proposal http://bugs.python.org/issue21180 with what's going to be actually implemented: http://legacy.python.org/dev/peps/pep-0467/#alternate-constructors .
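For illustration, such an iterator-plus-views API might be sketched like this (hypothetical names and signatures; nothing here is a real proposal's API):

```python
# Sketch of the "string view" idea: a lazy slice that keeps a
# reference to the base string until explicitly copied with str().
class StrView:
    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start, stop):
        self._base, self._start, self._stop = base, start, stop

    def __iter__(self):
        for i in range(self._start, self._stop):
            yield self._base[i]

    def __len__(self):
        return self._stop - self._start

    def __str__(self):                 # the explicit copy detaches the base
        return self._base[self._start:self._stop]

def iter_split(s, sep=" "):
    """A hypothetical iterator-based split() yielding views, not copies."""
    start = 0
    while True:
        i = s.find(sep, start)
        if i < 0:
            yield StrView(s, start, len(s))
            return
        yield StrView(s, start, i)
        start = i + len(sep)

words = [str(v) for v in iter_split("spam ham eggs")]
assert words == ["spam", "ham", "eggs"]
```

The str(view) call is the explicit-copy step mentioned above: once taken, the copy no longer keeps the (possibly huge) base string alive.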
-- Best regards, Paul mailto:pmiscml at gmail.com From dw+python-dev at hmmz.org Fri Jun 6 18:37:01 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Fri, 6 Jun 2014 16:37:01 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <20140606163701.GA10004@k2> On Fri, Jun 06, 2014 at 03:41:22PM +0000, Steve Dower wrote: > [snip] Speaking as a third party who aims to provide binary distributions for recent Python releases on Windows, every new compiler introduces a licensing and configuration headache. So I guess the questions are: * Does the ABI stability address some historical real world problem with Python binary builds? (I guess possibly) * Is the existing solution of third parties building under e.g. Mingw as an option of last resort causing real world issues? It seems to work for a lot of people, although I personally avoid it. * Have other compiler vendors indicated they will change their ABI environment to match VS under this new stability guarantee? If not, then as yet there is no real world benefit here. * Has Python ever hit a showstopper release issue as a result of a bug in MSVC? (I guess probably not). * Will VS 14 be golden prior to Python 3.5's release? It would suck to rely on a beta compiler.. :) Sorry for pouring cold water on this, but I've recently spent a ton of time getting a Microsoft build environment running, and it seems possible a new compiler may not yet justify more effort if there is little tangible benefit.
David From bcannon at gmail.com Fri Jun 6 18:37:30 2014 From: bcannon at gmail.com (Brett Cannon) Date: Fri, 06 Jun 2014 16:37:30 +0000 Subject: [Python-Dev] Division of tool labour in porting Python 2 code to 2/3 Message-ID: After Glyph and Alex's email about their asks for assisting in writing Python 2/3 code, it got me thinking about where in the toolchain various warnings and such should go in order to help direct energy toward developing whatever future toolchain will assist in porting. There seem to be three places where issues are/can be caught once a project has embarked down the road of 2/3 source compatibility:

1. -3 warnings
2. Some linter tool
3. Failing tests

-3 warnings are things that we know are flat-out wrong and do not cause massive compatibility issues in the stdlib. For instance, the warning that buffer() is not in Python 3 is already a py3k warning -- Glyph made a mistake when he asked for it as a new warning -- and it is a perfect example of something that isn't excessively noisy and won't cause issues when people run with it. But what about warning about classic classes? The stdlib is full of them and they were purposefully left alone for compatibility reasons. But there is a subtle semantic difference between classic and new-style classes, and so 2/3 code should consider switching (this is when people chime in saying "this is why we want a 2.8 release!", but that still isn't happening). If this were made a py3k warning in 2.7 then the stdlib itself would spew out warnings which we can't change due to compatibility, so that makes it not useful (http://bugs.python.org/issue21231). But as part of a lint tool specific to Python 2.7 that kind of warning would not be an issue and is easily managed and integrated into CI setups to make sure there are no regressions. Lastly, there are things like string/unicode comparisons. http://bugs.python.org/issue21401 has a patch from Victor which warns when comparing strings and unicode in Python 2.7.
Much like the classic classes example, the stdlib becomes rather noisy due to APIs that handle either/or, etc. But unlike the classic classes example, you just can't systematically verify that two variables are always going to be str vs. unicode in Python 2.7 if they aren't literals. If people want to implement type constraint graphs for 2.7 code to help find them then that's great, but I personally don't have that kind of time. In this instance it would seem like relying on a project's unit tests to find this sort of problem is the best option. With those three levels in mind, where do we draw the line between these levels? Take for instance the print statement. Right now there is no warning with -3. Do we add one and then update the 2.7 stdlib to prevent warnings being generated by the stdlib? Or do we add it to some linter tool to pick up when people accidentally leave one in their code? The reason I ask is that once this is clear, I'm willing to spearhead the tooling work we talked about at the language summit to make sure there's a clear path for people wanting to port which is as easy as (reasonably) possible, but I don't want to start on it until I have a clear indication of what people are going to be okay with. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Fri Jun 6 18:57:43 2014 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 6 Jun 2014 18:57:43 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <20140606163701.GA10004@k2> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <20140606163701.GA10004@k2> Message-ID: <20140606165743.GA11669@sleipnir.bytereef.org> dw+python-dev at hmmz.org wrote: > * Has Python ever hit a showstopper release issue as a result of a bug > in MSVC? (I guess probably not).
Yes, a PGO issue: http://bugs.python.org/issue15993 To be fair, in that issue I did not look if there's some undefined behavior in longobject.c. > * Will VS 14 be golden prior to Python 3.5's release? It would suck to > rely on a beta compiler.. :) This is my only concern, too. Otherwise, +1 for the switch. Stefan Krah From rdmurray at bitdance.com Fri Jun 6 19:05:25 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Fri, 06 Jun 2014 13:05:25 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <20140606163701.GA10004@k2> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <20140606163701.GA10004@k2> Message-ID: <20140606170526.6B4AA250DCD@webabinitio.net> On Fri, 06 Jun 2014 16:37:01 -0000, dw+python-dev at hmmz.org wrote: > On Fri, Jun 06, 2014 at 03:41:22PM +0000, Steve Dower wrote: > > > [snip] > > Speaking as a third party who aims to provide binary distributions for > recent Python releases on Windows, every new compiler introduces a > licensing and configuration headache. So I guess the questions are: > > * Does the ABI stability address some historical real world problem with > Python binary builds? (I guess possibly) > > * Is the existing solution of third parties building under e.g. Mingw as > an option of last resort causing real world issues? It seems to work > for a lot of people, although I personally avoid it. > > * Have other compiler vendors indicated they will change their ABI > environment to match VS under this new stability guarantee? If not, > then as yet there is no real world benefit here. > > * Has Python ever hit a showstopper release issue as a result of a bug > in MSVC? (I guess probably not). > > * Will VS 14 be golden prior to Python 3.5's release? It would suck to > rely on a beta compiler.. 
:) > > > Sorry for dunking water on this, but I've recently spent a ton of time > getting a Microsoft build environment running, and it seems possible a > new compiler may not yet justify more effort if there is little tangible > benefit. If I understand correctly (but I may not, as I'm not a Windows dev), we're going to want to switch VS versions for 3.5 anyway, so switching to the cutting-edge one, where Steve can be and is willing to be in a tight feedback loop with the developers, sounds like a win to me. --David From njs at pobox.com Fri Jun 6 18:55:04 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 6 Jun 2014 17:55:04 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: <1842263445423761298.493568sturla.molden-gmail.com@news.gmane.org> References: <5391754D.8000607@googlemail.com> <1842263445423761298.493568sturla.molden-gmail.com@news.gmane.org> Message-ID: On 6 Jun 2014 17:07, "Sturla Molden" wrote: > We would in total need two mutexes, one condition variable, a pthread, and > a heap. The proposal in my initial email requires zero pthreads, and is substantially more effective. (Your proposal reduces only the alloc overhead for large arrays; mine reduces both alloc and memory access overhead for both large and small arrays.) -n -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jtaylor.debian at googlemail.com Fri Jun 6 19:21:40 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 06 Jun 2014 19:21:40 +0200 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: <539126DB.8010306@canterbury.ac.nz> References: <539126DB.8010306@canterbury.ac.nz> Message-ID: <5391F8A4.70401@googlemail.com> On 06.06.2014 04:26, Greg Ewing wrote: > Nathaniel Smith wrote: > >> I'd be a little nervous about whether anyone has implemented, say, an >> iadd with side effects such that you can tell whether a copy was made, >> even if the object being copied is immediately destroyed. > > I can think of at least one plausible scenario where > this could occur: the operand is a view object that > wraps another object, and its __iadd__ method updates > that other object. > > In fact, now that I think about it, exactly this > kind of thing happens in numpy when you slice an > array! > > So the opt-in indicator would need to be dynamic, on > a per-object basis, rather than a type flag. > Yes, an opt-in indicator would need to receive both operand objects, so it would need to be a slot in the object or number type object. Would the addition of a tp_can_elide slot to the object types be acceptable for this rather specialized case? tp_can_elide receives two objects and returns one of three values:

* can work inplace, operation is associative
* can work inplace but not associative
* cannot work inplace

Implementation could e.g.
look something like this:

    TARGET(BINARY_SUBTRACT) {
        fl = left->obj_type->tp_can_elide;
        fr = right->obj_type->tp_can_elide;
        elide = 0;
        if (unlikely(fl)) {
            elide = fl(left, right);
        }
        else if (unlikely(fr)) {
            elide = fr(left, right);
        }
        if (unlikely(elide == YES) && left->refcnt == 1) {
            PyNumber_InPlaceSubtract(left, right);
        }
        else if (unlikely(elide == SWAPPABLE) && right->refcnt == 1) {
            PyNumber_InPlaceSubtract(right, left);
        }
        else {
            PyNumber_Subtract(left, right);
        }
    }

From stefan at bytereef.org Fri Jun 6 19:24:40 2014 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 6 Jun 2014 19:24:40 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <20140606165743.GA11669@sleipnir.bytereef.org> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <20140606163701.GA10004@k2> <20140606165743.GA11669@sleipnir.bytereef.org> Message-ID: <20140606172440.GA11927@sleipnir.bytereef.org> Stefan Krah wrote: > > * Will VS 14 be golden prior to Python 3.5's release? It would suck to > > rely on a beta compiler.. :) > > This is my only concern, too. Otherwise, +1 for the switch. One more thing: Will the SDK 64-bit tools be available for the Express Versions? Stefan Krah From brian at python.org Fri Jun 6 19:31:53 2014 From: brian at python.org (Brian Curtin) Date: Fri, 6 Jun 2014 21:31:53 +0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On Fri, Jun 6, 2014 at 8:22 PM, Zachary Ware wrote: > On Fri, Jun 6, 2014 at 10:41 AM, Steve Dower wrote: >> Thoughts/comments/concerns? > > My only concern is support for elderly versions of Windows, in > particular: XP. I seem to recall the last "let's update our MSVC > version" discussion dying off because of XP support. Even though MS > has abandoned it, I'm not sure whether we can yet.
> > If that's a non-issue, or if we can actually drop XP support, I'm all for it. Extended support ended in April of this year, so I think we should put XP as unsupported for 3.5 in PEP 11 - http://legacy.python.org/dev/peps/pep-0011/ I seem to remember that we were waiting for this anyway. From Steve.Dower at microsoft.com Fri Jun 6 19:40:03 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Fri, 6 Jun 2014 17:40:03 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <20140606163701.GA10004@k2> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <20140606163701.GA10004@k2> Message-ID: dw+python-dev at hmmz.org wrote: > Speaking as a third party who aims to provide binary distributions for recent > Python releases on Windows, every new compiler introduces a licensing and > configuration headache. So I guess the questions are: > > * Does the ABI stability address some historical real world problem with > Python binary builds? (I guess possibly) Yes. It's very hard to explain to users that even though they've gone out and paid for Visual Studio 2013 Ultimate, they don't really have a C compiler that works with Python. This stability will eventually get us to a place where it doesn't matter what version of the compiler you have, though for a while people will obviously need the latest. (Another thing I'm working on is making sure that it's really easy to get the latest... lots of pieces to this puzzle.) > * Is the existing solution of third parties building under e.g. Mingw as > an option of last resort causing real world issues? It seems to work > for a lot of people, although I personally avoid it. I think it actually tends to solve more issues than it causes :( I want to fix that by making MSVC better for Python, rather than switching away to another toolset. > * Have other compiler vendors indicated they will change their ABI > environment to match VS under this new stability guarantee? 
If not, > then as yet there is no real world benefit here. I have no idea, but I hope they do (eventually they almost certainly will). I've already mentioned to our team that they should reach out to the other projects and try to help them move it along, though I have no idea if they have the time or contacts to manage that. FWIW, the stability guarantee was only announced this week, so there's a good chance that the gcc/clang/etc. teams aren't even aware of it yet. > * Has Python ever hit a showstopper release issue as a result of a bug > in MSVC? (I guess probably not). Not to my knowledge, and I'm certainly hoping to avoid it by keeping the builds coming regularly. I can't do an official buildbot for it (and probably can't even reuse the infrastructure) since I'm going to work against the latest internal version as much as I can and we get new builds almost daily. More likely, building Python will reveal showstopper issues that actually get fixed (and it has done in the past, though that was never publicised :) ) > * Will VS 14 be golden prior to Python 3.5's release? It would suck to > rely on a beta compiler.. :) I sure hope so. The current planning looks like it will (I'm assuming that Python 3.5 is going to be late next year, but I couldn't find a good reference). If things slip here, I'm going to be surrounded by very stressed people, which is not much fun. So I hope it'll be done! At worst, VS 14 RC (or whatever label it gets) will probably be released under a "go live" licence. If anything is dramatically broken at that point, we'll know and it should be fixed, or we know that it's going to be around for a while regardless and we can make the decision to either stick with VC10 or work around the issues. > Sorry for dunking water on this, but I've recently spent a ton of time getting a > Microsoft build environment running, and it seems possible a new compiler may > not yet justify more effort if there is little tangible benefit. Not at all. 
I've spent far more time than I wanted to getting a build environment running for producing the Python 2.7 installers, and I spent just as long getting an environment for default going too. I'm personally a big fan of automating things like this, so you can also expect scripts (probably PowerShell) that will configure as much as possible. Cheers, Steve > David From Steve.Dower at microsoft.com Fri Jun 6 19:42:45 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Fri, 6 Jun 2014 17:42:45 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <20140606172440.GA11927@sleipnir.bytereef.org> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <20140606163701.GA10004@k2> <20140606165743.GA11669@sleipnir.bytereef.org> <20140606172440.GA11927@sleipnir.bytereef.org> Message-ID: Stefan Krah wrote: >Stefan Krah wrote: >> > * Will VS 14 be golden prior to Python 3.5's release? It would suck to >> > rely on a beta compiler.. :) >> >> This is my only concern, too. Otherwise, +1 for the switch. > >One more thing: Will the SDK 64-bit tools be available for the Express Versions? They should be. If they're not, I'll certainly be making a noise about it (unless there's another, easier way to get the tools by then...) From Steve.Dower at microsoft.com Fri Jun 6 20:12:04 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Fri, 6 Jun 2014 18:12:04 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> Chris Angelico wrote: > On Sat, Jun 7, 2014 at 1:41 AM, Steve Dower wrote: >> What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. > > Oh, if only this had been available for 2.7!! Actually... 
this means that 14.0 would be a good target for a compiler change for 2.7.x, if such a change is ever acceptable. Maybe, but I doubt it will ever be acceptable :) > To what extent is this compatibility going to be maintained? Is there a guarantee that there'll be X versions (or X years) of cross-compilation support? There are a few breaking changes in this version that are designed to standardize on a function-call based ABI, which should effectively be a life-long guarantee. The only promise I can make is this: when cross-compilation support is eventually broken, it will be due to something that nobody has been able to predict up until now. (Hopefully that's better than promising that it will be broken in the very next release.) > ChrisA From rosuav at gmail.com Fri Jun 6 20:19:32 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 7 Jun 2014 04:19:32 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On Sat, Jun 7, 2014 at 4:12 AM, Steve Dower wrote: > Chris Angelico wrote: >> On Sat, Jun 7, 2014 at 1:41 AM, Steve Dower wrote: >>> What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. >> >> Oh, if only this had been available for 2.7!! Actually... this means that 14.0 would be a good target for a compiler change for 2.7.x, if such a change is ever acceptable. > > Maybe, but I doubt it will ever be acceptable :) Well, there were discussions. Since Python 2.7's support is far exceeding the Microsoft promise of support for the compiler it was built on, there's going to be a problem, one way or the other. I don't know how that's going to end up being resolved. 
ChrisA From brian at python.org Fri Jun 6 20:25:14 2014 From: brian at python.org (Brian Curtin) Date: Fri, 6 Jun 2014 22:25:14 +0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On Fri, Jun 6, 2014 at 10:19 PM, Chris Angelico wrote: > On Sat, Jun 7, 2014 at 4:12 AM, Steve Dower wrote: >> Chris Angelico wrote: >>> On Sat, Jun 7, 2014 at 1:41 AM, Steve Dower wrote: >>>> What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. >>> >>> Oh, if only this had been available for 2.7!! Actually... this means that 14.0 would be a good target for a compiler change for 2.7.x, if such a change is ever acceptable. >> >> Maybe, but I doubt it will ever be acceptable :) > > Well, there were discussions. Since Python 2.7's support is far > exceeding the Microsoft promise of support for the compiler it was > built on, there's going to be a problem, one way or the other. I don't > know how that's going to end up being resolved. We're going to have to change it at some point, otherwise we're going to have people in 2018 scrambling to find VS2008, which will be 35 versions too old by then. No matter what we do here, we're going to have a tough PR situation, but we have to make something workable. I'd rather cause a hassle than outright kill extensions. I would probably prefer we aim for VS 14 for 3.5, and then explore making the same change for the 2.7.x release that comes after 3.5.0 comes out. Lessons learned and all that. 
From tjreedy at udel.edu Fri Jun 6 20:28:29 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 06 Jun 2014 14:28:29 -0400 Subject: [Python-Dev] Division of tool labour in porting Python 2 code to 2/3 In-Reply-To: References: Message-ID: On 6/6/2014 12:37 PM, Brett Cannon wrote: > After Glyph and Alex's email about their asks for assisting in writing > Python 2/3 code, it got me thinking about where in the toolchain various > warnings and such should go in order to help direct energy to help > develop whatever future toolchain to assist in porting. > > There seems to be three places where issues are/can be caught once a > project has embarked down the road of 2/3 source compatibility: > > 1. -3 warnings > 2. Some linter tool > 3. Failing tests > > -3 warnings are things that we know are flat-out wrong and do not cause > massive compatibility issues in the stdlib. For instance, warning that > buffer() is not in Python 3 is a py3k warning -- Glyph made a mistake > when he asked for it as a new warning -- is a perfect example of > something that isn't excessively noisy and won't cause issues when > people run with it. > > But what about warning about classic classes? The stdlib is full of them > and they were purposefully left alone for compatibility reasons. But > there is a subtle semantic difference between classic and new-style > classes, A non-subtle difference is that old-style classes do not have .__new__. I just ran into this when backporting an Idle test to 2.7. (I rewrote the test to avoid diverging the code.) In retrospect, perhaps we should have added a global 'new-class future' -C switch, like -Q, and made sure that the stdlib worked either way. People running 2and3 code could then run 2.x with the switch. Is it possible to add this now? > and so 2/3 code should consider switching I do not understand what you mean without a switch or future statement available to switch from old to new in 2.7.
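To make the .__new__ difference concrete (my own example, not from the thread), the same file behaves differently under the two interpreters: an old-style class under Python 2 has no __new__ at all, so any idiom relying on it (instance caching, pickling support) silently does nothing:

```python
import sys

class Classic:            # no explicit base: old-style under Python 2
    pass

class NewStyle(object):   # derives from object: new-style everywhere
    pass

# New-style classes always expose __new__.
assert hasattr(NewStyle, "__new__")

if sys.version_info[0] == 2:
    # Old-style classes lack __new__ entirely under Python 2.
    assert not hasattr(Classic, "__new__")
else:
    # In Python 3 the distinction is gone; every class is new-style.
    assert hasattr(Classic, "__new__")
```

The snippet passes on both 2 and 3, which is why a static check (a linter) rather than a runtime warning is the natural place to flag classic classes.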
> (this is when people > chime in saying "this is why we want a 2.8 release!", but that still > isn't happening). If this were made a py3k warning in 2.7 then the > stdlib itself would spew out warnings which we can't change due to > compatibility, so that makes it not useful > (http://bugs.python.org/issue21231). Don't issue the warning if the class is in the stdlib. If the warning is issued *after* creating class C:

    import sys
    # C.__module__ is a module *name*, so look the module object up first
    f = sys.modules[C.__module__].__file__
    if classic(C) and ('lib' not in f or 'site_packages' in f):
        warn(...)

On Windows, the directory is 'Lib'; I presume it is lowercased everywhere. If not, adjust. > But as part of a lint tool specific > to Python 2.7 that kind of warning would not be an issue and is easily > managed and integrated into CI setups to make sure there are no regressions. > > Lastly, there are things like string/unicode comparisons. > http://bugs.python.org/issue21401 has a patch from VIctor which warns > when comparing strings and unicode in Python 2.7. Much like the classic > classes example, the stdlib becomes rather noisy due to APIs that handle > either/or, etc. But unlike the classic classes example, you just can't > systematically verify that two variables are always going to be str vs. > unicode in Python 2.7 if they aren't literals. If people want to > implement type constraint graphs for 2.7 code to help find them then > that's great, but I personally don't have that kind of time. In this > instance it would seem like relying on a project's unit tests to find > this sort of problem is the best option. > > With those three levels in mind, where do we draw the line between these > levels? Take for instance the print statement. Right now there is no > warning with -3. Do we add one and then update the 2.7 stdlib to prevent > warnings being generated by the stdlib? Make conditional as with class.
We *could* change 'print s' to the exactly equivalent 'print(s)' (perhaps half the cases); 'print r, s' to "print('%s %s' % (r, s))"; "print 'xxxx', y" to "print('xxxx %s' % y)"; and so on. However, 'print >>self.stdout, x', etc., does not translate to a pseudo-call. It would need translation to "self.stdout.write(x + '\n')". Grepping 2.7.6 lib/*.py for print gives 1341 hits, with at least 1000 being actual print statements. > Or do we add it to some linter > tool to pick up when people accidentally leave one in their code? > The reason I ask is since this is clear I'm willing to spearhead the > tooling work we talked about at the language summit to make sure there's > a clear path for people wanting to port which is as easy as (reasonably) > possible, but I don't want to start on it until I have a clear > indication of what people are going to be okay with. -- Terry Jan Reedy From mal at egenix.com Fri Jun 6 20:41:16 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 06 Jun 2014 20:41:16 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <53920B4C.8020700@egenix.com> On 06.06.2014 20:25, Brian Curtin wrote: > On Fri, Jun 6, 2014 at 10:19 PM, Chris Angelico wrote: >> On Sat, Jun 7, 2014 at 4:12 AM, Steve Dower wrote: >>> Chris Angelico wrote: >>>> On Sat, Jun 7, 2014 at 1:41 AM, Steve Dower wrote: >>>>> What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. >>>> >>>> Oh, if only this had been available for 2.7!! Actually... this means that 14.0 would be a good target for a compiler change for 2.7.x, if such a change is ever acceptable. >>> >>> Maybe, but I doubt it will ever be acceptable :) >> >> Well, there were discussions.
Since Python 2.7's support is far >> exceeding the Microsoft promise of support for the compiler it was >> built on, there's going to be a problem, one way or the other. I don't >> know how that's going to end up being resolved. > > We're going to have to change it at some point, otherwise we're going > to have people in 2018 scrambling to find VS2008, which will be 35 > versions too old by then. No matter what we do here, we're going to > have a tough PR situation, but we have to make something workable. I'd > rather cause a hassle than outright kill extensions. > > I would probably prefer we aim for VS 14 for 3.5, and then explore > making the same change for the 2.7.x release that comes after 3.5.0 > comes out. Lessons learned and all that. Are you sure that's an option ? Changing the compiler the stock Python from python.org is built with will most likely render existing Python extensions built for 2.7.x with x < (release that comes after 3.5.0) broken, so users and installation tools will end up having to pay close attention to the patch level version of Python they are using... which is something we wanted to avoid after we ran into this situation with 1.5.1 and 1.5.2 a few years ago. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 06 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-05-28: Released mxODBC.Connect 2.1.0 ... http://egenix.com/go56 2014-07-02: Python Meeting Duesseldorf ... 26 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From brian at python.org Fri Jun 6 20:49:24 2014 From: brian at python.org (Brian Curtin) Date: Fri, 6 Jun 2014 22:49:24 +0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <53920B4C.8020700@egenix.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> Message-ID: On Fri, Jun 6, 2014 at 10:41 PM, M.-A. Lemburg wrote: > On 06.06.2014 20:25, Brian Curtin wrote: >> On Fri, Jun 6, 2014 at 10:19 PM, Chris Angelico wrote: >>> On Sat, Jun 7, 2014 at 4:12 AM, Steve Dower wrote: >>>> Chris Angelico wrote: >>>>> On Sat, Jun 7, 2014 at 1:41 AM, Steve Dower wrote: >>>>>> What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. >>>>> >>>>> Oh, if only this had been available for 2.7!! Actually... this means that 14.0 would be a good target for a compiler change for 2.7.x, if such a change is ever acceptable. >>>> >>>> Maybe, but I doubt it will ever be acceptable :) >>> >>> Well, there were discussions. Since Python 2.7's support is far >>> exceeding the Microsoft promise of support for the compiler it was >>> built on, there's going to be a problem, one way or the other. I don't >>> know how that's going to end up being resolved. >> >> We're going to have to change it at some point, otherwise we're going >> to have people in 2018 scrambling to find VS2008, which will be 35 >> versions too old by then. No matter what we do here, we're going to >> have a tough PR situation, but we have to make something workable. I'd >> rather cause a hassle than outright kill extensions. >> >> I would probably prefer we aim for VS 14 for 3.5, and then explore >> making the same change for the 2.7.x release that comes after 3.5.0 >> comes out. 
Lessons learned and all that. > > Are you sure that's an option ? Changing the compiler the stock > Python from python.org is built with will most likely render > existing Python extensions built for 2.7.x with x < (release that comes > after 3.5.0) broken, so users and installation tools will end up > having to pay close attention to the patch level version of Python > they are using... which is something we wanted to avoid after > we ran into this situation with 1.5.1 and 1.5.2 a few years ago. None of the options are particularly good, but yes, I think that's an option we have to consider. We're supporting 2.7.x for 6 more years on a compiler that is already 6 years old. Something less than awesome for everyone involved is going to have to happen to make that possible. From mal at egenix.com Fri Jun 6 20:52:49 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 06 Jun 2014 20:52:49 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> Message-ID: <53920E01.6080300@egenix.com> On 06.06.2014 20:49, Brian Curtin wrote: > On Fri, Jun 6, 2014 at 10:41 PM, M.-A. Lemburg wrote: >> On 06.06.2014 20:25, Brian Curtin wrote: >>> On Fri, Jun 6, 2014 at 10:19 PM, Chris Angelico wrote: >>>> On Sat, Jun 7, 2014 at 4:12 AM, Steve Dower wrote: >>>>> Chris Angelico wrote: >>>>>> On Sat, Jun 7, 2014 at 1:41 AM, Steve Dower wrote: >>>>>>> What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. >>>>>> >>>>>> Oh, if only this had been available for 2.7!! Actually... this means that 14.0 would be a good target for a compiler change for 2.7.x, if such a change is ever acceptable. >>>>> >>>>> Maybe, but I doubt it will ever be acceptable :) >>>> >>>> Well, there were discussions. 
Since Python 2.7's support is far >>>> exceeding the Microsoft promise of support for the compiler it was >>>> built on, there's going to be a problem, one way or the other. I don't >>>> know how that's going to end up being resolved. >>> >>> We're going to have to change it at some point, otherwise we're going >>> to have people in 2018 scrambling to find VS2008, which will be 35 >>> versions too old by then. No matter what we do here, we're going to >>> have a tough PR situation, but we have to make something workable. I'd >>> rather cause a hassle than outright kill extensions. >>> >>> I would probably prefer we aim for VS 14 for 3.5, and then explore >>> making the same change for the 2.7.x release that comes after 3.5.0 >>> comes out. Lessons learned and all that. >> >> Are you sure that's an option ? Changing the compiler the stock >> Python from python.org is built with will most likely render >> existing Python extensions built for 2.7.x with x < (release that comes >> after 3.5.0) broken, so users and installation tools will end up >> having to pay close attention to the patch level version of Python >> they are using... which is something we wanted to avoid after >> we ran into this situation with 1.5.1 and 1.5.2 a few years ago. > > None of the options are particularly good, but yes, I think that's an > option we have to consider. We're supporting 2.7.x for 6 more years on > a compiler that is already 6 years old. Something less than awesome > for everyone involved is going to have to happen to make that > possible. Perhaps we could combine this with the breakage that a Python 2.7.10 would introduce due to the two digit patch level release version ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 06 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2014-05-28: Released mxODBC.Connect 2.1.0 ... http://egenix.com/go56 2014-07-02: Python Meeting Duesseldorf ... 26 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From dw+python-dev at hmmz.org Fri Jun 6 20:56:31 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Fri, 6 Jun 2014 18:56:31 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> Message-ID: <20140606185631.GA11094@k2> On Fri, Jun 06, 2014 at 10:49:24PM +0400, Brian Curtin wrote: > None of the options are particularly good, but yes, I think that's an > option we have to consider. We're supporting 2.7.x for 6 more years on > a compiler that is already 6 years old. Surely that is infinitely less desirable than simply bumping the minor version? David From bcannon at gmail.com Fri Jun 6 21:03:58 2014 From: bcannon at gmail.com (Brett Cannon) Date: Fri, 06 Jun 2014 19:03:58 +0000 Subject: [Python-Dev] Division of tool labour in porting Python 2 code to 2/3 References: Message-ID: On Fri Jun 06 2014 at 2:29:13 PM, Terry Reedy wrote: > On 6/6/2014 12:37 PM, Brett Cannon wrote: > > After Glyph and Alex's email about their asks for assisting in writing > > Python 2/3 code, it got me thinking about where in the toolchain various > > warnings and such should go in order to help direct energy to help > > develop whatever future toolchain to assist in porting. 
> > > > There seems to be three places where issues are/can be caught once a > > project has embarked down the road of 2/3 source compatibility: > > > > 1. -3 warnings > > 2. Some linter tool > > 3. Failing tests > > > > -3 warnings are things that we know are flat-out wrong and do not cause > > massive compatibility issues in the stdlib. For instance, warning that > > buffer() is not in Python 3 is a py3k warning -- Glyph made a mistake > > when he asked for it as a new warning -- is a perfect example of > > something that isn't excessively noisy and won't cause issues when > > people run with it. > > > > But what about warning about classic classes? The stdlib is full of them > > and they were purposefully left alone for compatibility reasons. But > > there is a subtle semantic difference between classic and new-style > > classes, > > A non-subtle difference is that old style classes do not have .__new__. > I just ran into this when backporting an Idle test to 2.7. (I rewrote > the test to avoid diverging the code). In retrospect, perhaps we should > have added a global 'new-class future' -C switch, like -Q, and made sure > that stdlib worked either way. People running 2and3 code could then run > 2.x with the switch. Is it possible to add this now? > I consider changing the CLI out of bounds in a bugfix release as it's part of the API of Python. > > > and so 2/3 code should consider switching > > I do not understand what you mean without a switch or future > statement available to switch from old to new in 2.7. > Run a 2to3 fixer that changes all of their classes to new-style. > > > (this is when people > > chime in saying "this is why we want a 2.8 release!", but that still > > isn't happening). If this were made a py3k warning in 2.7 then the > > stdlib itself would spew out warnings which we can't change due to > > compatibility, so that makes it not useful > > (http://bugs.python.org/issue21231).
> If the warning is issued *after* creating class C: > > f = C.__module__.__file__ > if classic(C) and (not 'lib' in f or 'site_packages' in f): > warn(...) > > On Windows, the directory is 'Lib'; I presume it is lowercased > everywhere. If not, adjust. > That's just asking for trouble. I don't want to be import-dependent like that in the stdlib. > > > But as part of a lint tool specific > > to Python 2.7 that kind of warning would not be an issue and is easily > > managed and integrated into CI setups to make sure there are no > regressions. > > > > Lastly, there are things like string/unicode comparisons. > > http://bugs.python.org/issue21401 has a patch from VIctor which warns > > when comparing strings and unicode in Python 2.7. Much like the classic > > classes example, the stdlib becomes rather noisy due to APIs that handle > > either/or, etc. But unlike the classic classes example, you just can't > > systematically verify that two variables are always going to be str vs. > > unicode in Python 2.7 if they aren't literals. If people want to > > implement type constraint graphs for 2.7 code to help find them then > > that's great, but I personally don't have that kind of time. In this > > instance it would seem like relying on a project's unit tests to find > > this sort of problem is the best option. > > > > With those three levels in mind, where do we draw the line between these > > levels? Take for instance the print statement. Right now there is no > > warning with -3. Do we add one and then update the 2.7 stdlib to prevent > > warnings being generated by the stdlib? > > Make conditional as with class. > > We *could* change 'print s' to the exactly equivalent 'print(s)' > (perhaps half the cases); 'print r, s' to "print('%s %s' % (r,s)), > 'print 'xxxx', y' to "print('xxxx %s' % y), and so on. However, 'print > >>self.stdout, x', etc, does not translate to a pseudo-call. It would > need transltion to "self.stdout.write(x+'\n')". 
Grepping 2.7.6 lib/*.py > for print gives 1341 hits, with at least 1000 being actual print > statements. > Yep, which is why I don't want to do a 2to3 run on the stdlib to get rid of them. I also want to minimize conditional checks as it leads to potential issues of people thinking it's okay not to change things when there actually are differences (e.g. I don't want to promote classic classes or native strings if it can be helped for the vast majority of users). -Brett > > > Or do we add it to some linter > tool to pick up when people accidentally leave one in their code? > > The reason I ask is since this is clear I'm willing to spearhead the > tooling work we talked about at the language summit to make sure there's > a clear path for people wanting to port which is as easy as (reasonably) > possible, but I don't want to start on it until I have a clear > indication of what people are going to be okay with. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian at python.org Fri Jun 6 21:04:24 2014 From: brian at python.org (Brian Curtin) Date: Fri, 6 Jun 2014 23:04:24 +0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <20140606185631.GA11094@k2> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> Message-ID: On Fri, Jun 6, 2014 at 10:56 PM, wrote: > On Fri, Jun 06, 2014 at 10:49:24PM +0400, Brian Curtin wrote: > >> None of the options are particularly good, but yes, I think that's an >> option we have to consider.
We're supporting 2.7.x for 6 more years on >> a compiler that is already 6 years old. > > Surely that is infinitely less desirable than simply bumping the minor > version? It's definitely not desirable, but "simply" bumping the minor version is not A Thing. From bcannon at gmail.com Fri Jun 6 21:05:05 2014 From: bcannon at gmail.com (Brett Cannon) Date: Fri, 06 Jun 2014 19:05:05 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> Message-ID: On Fri Jun 06 2014 at 2:59:24 PM, wrote: > On Fri, Jun 06, 2014 at 10:49:24PM +0400, Brian Curtin wrote: > > > None of the options are particularly good, but yes, I think that's an > > option we have to consider. We're supporting 2.7.x for 6 more years on > > a compiler that is already 6 years old. > > Surely that is infinitely less desirable than simply bumping the minor > version? > Nope. A new minor release of Python is a massive undertaking which is why we have saved ourselves the hassle of doing a Python 2.8 or not giving a clear signal as to when Python 2.x will end as a language. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From donald at stufft.io Fri Jun 6 21:08:19 2014 From: donald at stufft.io (Donald Stufft) Date: Fri, 6 Jun 2014 15:08:19 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> Message-ID: <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> On Jun 6, 2014, at 3:04 PM, Brian Curtin wrote: > On Fri, Jun 6, 2014 at 10:56 PM, wrote: >> On Fri, Jun 06, 2014 at 10:49:24PM +0400, Brian Curtin wrote: >> >>> None of the options are particularly good, but yes, I think that's an >>> option we have to consider. We're supporting 2.7.x for 6 more years on >>> a compiler that is already 6 years old. >> >> Surely that is infinitely less desirable than simply bumping the minor >> version? > > It's definitely not desirable, but "simply" bumping the minor version > is not A Thing. Why? I mean even if it's the same thing as 2.7 just with an updated compiler that seems like a better answer than having to deal with 2.7.whatever suddenly breaking all C exts. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From brian at python.org Fri Jun 6 21:09:56 2014 From: brian at python.org (Brian Curtin) Date: Fri, 6 Jun 2014 23:09:56 +0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> Message-ID: On Fri, Jun 6, 2014 at 11:08 PM, Donald Stufft wrote: > > On Jun 6, 2014, at 3:04 PM, Brian Curtin wrote: > >> On Fri, Jun 6, 2014 at 10:56 PM, wrote: >>> On Fri, Jun 06, 2014 at 10:49:24PM +0400, Brian Curtin wrote: >>> >>>> None of the options are particularly good, but yes, I think that's an >>>> option we have to consider. We're supporting 2.7.x for 6 more years on >>>> a compiler that is already 6 years old. >>> >>> Surely that is infinitely less desirable than simply bumping the minor >>> version? >> >> It's definitely not desirable, but "simply" bumping the minor version >> is not A Thing. > > Why? I mean even if it's the same thing as 2.7 just with an updated > compiler that seems like a better answer than having to deal with > 2.7.whatever suddenly breaking all C exts. Because then we have to maintain 2.8 at a time when no one even wants to maintain 2.7?
From donald at stufft.io Fri Jun 6 21:11:46 2014 From: donald at stufft.io (Donald Stufft) Date: Fri, 6 Jun 2014 15:11:46 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> Message-ID: On Jun 6, 2014, at 3:09 PM, Brian Curtin wrote: > On Fri, Jun 6, 2014 at 11:08 PM, Donald Stufft wrote: >> >> On Jun 6, 2014, at 3:04 PM, Brian Curtin wrote: >> >>> On Fri, Jun 6, 2014 at 10:56 PM, wrote: >>>> On Fri, Jun 06, 2014 at 10:49:24PM +0400, Brian Curtin wrote: >>>> >>>>> None of the options are particularly good, but yes, I think that's an >>>>> option we have to consider. We're supporting 2.7.x for 6 more years on >>>>> a compiler that is already 6 years old. >>>> >>>> Surely that is infinitely less desirable than simply bumping the minor >>>> version? >>> >>> It's definitely not desirable, but "simply" bumping the minor version >>> is not A Thing. >> >> Why? I mean even if it's the same thing as 2.7 just with an updated >> compiler that seems like a better answer than having to deal with >> 2.7.whatever suddenly breaking all C exts. > > Because then we have to maintain 2.8 at a time when no one even wants > to maintain 2.7? Is it really any difference in maintenance if you just stop applying updates to 2.7 and switch to 2.8? If 2.8 is really just 2.7 with a new compiler then there should be no functional difference between doing that and doing a 2.7.whatever except all of the tooling that relies on the compiler not to change in micro releases won't suddenly break and freak out. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From martin at v.loewis.de Fri Jun 6 21:20:04 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 06 Jun 2014 21:20:04 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <53921464.7030400@v.loewis.de> Am 06.06.14 17:41, schrieb Steve Dower: > Hi all > > I would like to propose moving Python 3.5 to use Visual C++ 14.0 as > the main compiler. This is fine with me, but I'm worried about the precise timing of doing so. I assume that you would plan to do this moving before VC++ 14 is actually released. This worries me for three reasons: 1. what is the availability of the compiler during the testing phase, and what will it be immediately after the testing ends (where traditionally people would have to buy licenses, or wait for VS Express to be released)? 2. what is the risk of installing a beta compiler on what might otherwise be a "production" developer system? In particular, could it interfere with other VS installations, and could it require a complete system reinstall when the final release of VC 14 is available? 3. what is the chance of the final release being delayed beyond the planned release date of Python 3.5? Microsoft has a bad track record of meeting release dates (or the tradition of not announcing any for that reason); the blog says that it will be available "sometime in 2015". Now, Python 3.5 might appear November 2015, so what do we do if VS 2015 is not released by the time 3.5b1 is planned? 
Regards, Martin From martin at v.loewis.de Fri Jun 6 21:22:11 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 06 Jun 2014 21:22:11 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <539214E3.5010308@v.loewis.de> Am 06.06.14 19:31, schrieb Brian Curtin: >> If that's a non-issue, or if we can actually drop XP support, I'm all for it. > > Extended support ended in April of this year, so I think we should put > XP as unsupported for 3.5 in PEP 11 - > http://legacy.python.org/dev/peps/pep-0011/ > > I seem to remember that we were waiting for this anyway. We don't actually need to explicitly put XP there, as PEP 11 ties our support to the Microsoft product life cycle. XP is not supported by Python anymore. Regards, Martin From rosuav at gmail.com Fri Jun 6 21:33:45 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 7 Jun 2014 05:33:45 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> Message-ID: On Sat, Jun 7, 2014 at 5:11 AM, Donald Stufft wrote: > Is it really any difference in maintenance if you just stop applying updates to > 2.7 and switch to 2.8? If 2.8 is really just 2.7 with a new compiler then there > should be no functional difference between doing that and doing a 2.7.whatever > except all of the tooling that relies on the compiler not to change in micro > releases won't suddenly break and freak out. If the only difference between 2.7 and 2.8 is the compiler used on Windows, what happens on Linux and other platforms?
A Python 2.8 would have to be materially different from Python 2.7, not just binarily incompatible on one platform. ChrisA From donald at stufft.io Fri Jun 6 21:36:59 2014 From: donald at stufft.io (Donald Stufft) Date: Fri, 6 Jun 2014 15:36:59 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> Message-ID: On Jun 6, 2014, at 3:33 PM, Chris Angelico wrote: > On Sat, Jun 7, 2014 at 5:11 AM, Donald Stufft wrote: >> Is it really any difference in maintenance if you just stop applying updates to >> 2.7 and switch to 2.8? If 2.8 is really just 2.7 with a new compiler then there >> should be no functional difference between doing that and doing a 2.7.whatever >> except all of the tooling that relies on the compiler not to change in micro >> releases won't suddenly break and freak out. > > If the only difference between 2.7 and 2.8 is the compiler used on > Windows, what happens on Linux and other platforms? A Python 2.8 would > have to be materially different from Python 2.7, not just binarily > incompatible on one platform. > > ChrisA > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io Well it'd contain bug fixes and whatever other sorts of things you'd put into a 2.7.whatever release. So they'd still want to upgrade to 2.8 since that'll have bug fixes. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From martin at v.loewis.de Fri Jun 6 21:37:55 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 06 Jun 2014 21:37:55 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <53921893.4080200@v.loewis.de> Am 06.06.14 20:25, schrieb Brian Curtin: > We're going to have to change it at some point, otherwise we're going > to have people in 2018 scrambling to find VS2008, which will be 35 > versions too old by then. Not sure whether you picked 2018 deliberately: extended support for VS2008 Professional ends on April 10, 2018. In any case, the extension problem will occur regardless of what you do: - if you switch compilers within 2.7, applications may crash - if you switch compilers and declare it 2.8, your extensions might not be available precompiled for some time (in particular if the developers of some package have abandoned 2.7) - if you don't switch compilers, availability of the tool chain will be terrible. Regards, Martin From rosuav at gmail.com Fri Jun 6 21:46:48 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 7 Jun 2014 05:46:48 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> Message-ID: On Sat, Jun 7, 2014 at 5:36 AM, Donald Stufft wrote: > Well it'd contain bug fixes and whatever other sorts of things you'd put > into a 2.7.whatever release.
So they'd still want to upgrade to 2.8 since > that'll have bug fixes. But it's not a potentially-breaking change. For example, on Debian Wheezy, there are a huge number of packages that depend on "python (<< 2.8)", because they expect Python 2.7 and *not* Python 2.8. A newer version 2.7 will satisfy that; a version 2.8 won't, because it's entirely possible that 2.8 will have something that's significantly different. That's what version numbers mean; Python follows the standard three-part convention, where you upgrade automatically only within the last part of the number. ChrisA From guido at python.org Fri Jun 6 21:46:50 2014 From: guido at python.org (Guido van Rossum) Date: Fri, 6 Jun 2014 12:46:50 -0700 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <53921893.4080200@v.loewis.de> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53921893.4080200@v.loewis.de> Message-ID: A reminder: https://lh5.googleusercontent.com/-d4rF0qJPskQ/U0qpNjP5GoI/AAAAAAAAPW0/4RF_7zy3esY/w1118-h629-no/Python28.jpg -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Fri Jun 6 22:13:30 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 6 Jun 2014 21:13:30 +0100 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <53921464.7030400@v.loewis.de> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <53921464.7030400@v.loewis.de> Message-ID: On 6 June 2014 20:20, "Martin v. Löwis" wrote: > 2. what is the risk of installing a beta compiler on what might > otherwise be a "production" developer system? In particular, could > it interfere with other VS installations, and could it require a > complete system reinstall when the final release of VC 14 is > available?
From http://www.visualstudio.com/en-us/downloads/visual-studio-14-ctp-vs """ Currently, Visual Studio "14" CTPs have known compatibility issues with previous releases of Visual Studio and should not be installed side-by-side on the same computer. """ It also states that installing the CTP on a PC puts that PC into "Unsupported" state. Paul From martin at v.loewis.de Fri Jun 6 22:12:34 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 06 Jun 2014 22:12:34 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <53921464.7030400@v.loewis.de> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <53921464.7030400@v.loewis.de> Message-ID: <539220B2.4030901@v.loewis.de> Am 06.06.14 21:20, schrieb "Martin v. Löwis": > 2. what is the risk of installing a beta compiler on what might > otherwise be a "production" developer system? In particular, could > it interfere with other VS installations, and could it require a > complete system reinstall when the final release of VC 14 is > available? I found an official answer here: http://www.visualstudio.com/en-us/downloads/visual-studio-14-ctp-vs "Installing a CTP release will place a computer in an unsupported state. For that reason, we recommend only installing CTP releases in a virtual machine, or on a computer that is available for reformatting." So there is no promise that you will not need to reformat the system during the evolution of the compiler.
Regards, Martin From dw+python-dev at hmmz.org Fri Jun 6 21:42:52 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Fri, 6 Jun 2014 19:42:52 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> Message-ID: <20140606194252.GA11482@k2> On Sat, Jun 07, 2014 at 05:33:45AM +1000, Chris Angelico wrote: > > Is it really any difference in maintenance if you just stop applying > > updates to 2.7 and switch to 2.8? If 2.8 is really just 2.7 with a > > new compiler then there should be no functional difference between > > doing that and doing a 2.7.whatever except all of the tooling that > > relies on the compiler not to change in micro releases won't > > suddenly break and freak out. > If the only difference between 2.7 and 2.8 is the compiler used on > Windows, what happens on Linux and other platforms? A Python 2.8 would > have to be materially different from Python 2.7, not just binarily > incompatible on one platform. Grrmph, that's fair. Perhaps a final alternative is simply continuing the 2.7 series with a stale compiler, as a kind of carrot on a stick to encourage users to upgrade? Gating 2.7 life on the natural decline of its supported compiler/related ecosystem seems somehow quite a gradual and natural demise..
:) David From martin at v.loewis.de Fri Jun 6 22:23:06 2014 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 06 Jun 2014 22:23:06 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <53921464.7030400@v.loewis.de> Message-ID: <5392232A.2000102@v.loewis.de> Am 06.06.14 22:13, schrieb Paul Moore: > From http://www.visualstudio.com/en-us/downloads/visual-studio-14-ctp-vs > > """ > Currently, Visual Studio "14" CTPs have known compatibility issues > with previous releases of Visual Studio and should not be installed > side-by-side on the same computer. > """ I also found http://support.microsoft.com/kb/2967191 which is more specific about this issue: '''There are known issues when you install Visual Studio "14" CTP 14.0.21730.1 DP on the same computer as Visual Studio 2013. While we expect that an uninstallation of Visual Studio "14" and then a repair of Visual Studio 2013 should fix these issues, our safest recommendation is to install Visual Studio "14" in a VM, a VHD, a fresh computer, or another non-production test-only computer that does not have Visual Studio 2013 on it. All of these Visual Studio side-by-side issues are expected to be fixed soon. There is an installation block in this Visual Studio "14" CTP that will prevent installation on a computer where an earlier version of Visual Studio is already installed. To disable the block that will put the computer in an un-recommended state, add the value "BlockerOverride" to the registry: HKLM\SOFTWARE\Microsoft\DevDiv\vs\Servicing''' So it seems to me that switching to VS 14 at this point in time is not possible. Of course, Steve could certainly maintain a Mercurial clone in his hg.python.org sandbox that has all the necessary changes done, so people won't have to redo the porting over and over. 
Regards, Martin From rosuav at gmail.com Fri Jun 6 22:23:40 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 7 Jun 2014 06:23:40 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <20140606194252.GA11482@k2> References: <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> <20140606194252.GA11482@k2> Message-ID: On Sat, Jun 7, 2014 at 5:42 AM, wrote: > Perhaps a final alternative is simply continuing > the 2.7 series with a stale compiler, as a kind of carrot on a stick to > encourage users to upgrade? More likely, what would happen is that there'd be an alternate distribution of Python 2.7 (eg ActiveState), which would be language-compatible with python.org 2.7, but built with a different compiler, and therefore unable to use extensions built for python.org's 2.7. One way or another, pain will happen. ChrisA From brian at python.org Fri Jun 6 22:28:10 2014 From: brian at python.org (Brian Curtin) Date: Sat, 7 Jun 2014 00:28:10 +0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <20140606194252.GA11482@k2> References: <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> <20140606194252.GA11482@k2> Message-ID: On Fri, Jun 6, 2014 at 11:42 PM, wrote: > On Sat, Jun 07, 2014 at 05:33:45AM +1000, Chris Angelico wrote: > >> > Is it really any difference in maintenance if you just stop applying >> > updates to 2.7 and switch to 2.8? If 2.8 is really just 2.7 with a >> > new compiler then there should be no functional difference between >> > doing that and doing a 2.7.whatever except all of the tooling that >> > relies on the compiler not to change in micro releases won't >> > suddenly break and freak out. > >> If the only difference between 2.7 and 2.8 is the compiler used on >> Windows, what happens on Linux and other platforms?
A Python 2.8 would >> have to be materially different from Python 2.7, not just binarily >> incompatible on one platform. > > Grrmph, that's fair. Perhaps a final alternative is simply continuing > the 2.7 series with a stale compiler, as a kind of carrot on a stick to > encourage users to upgrade? Gating 2.7 life on the natural decline of > its supported compiler/related ecosystem seems somehow quite a gradual > and natural demise.. :) Adding features into 3.x is already not enough of a carrot on the stick for many users. Intentionally leaving 2.7 on a dead compiler is like beating them with the stick. From jurko.gospodnetic at pke.hr Fri Jun 6 22:28:01 2014 From: jurko.gospodnetic at pke.hr (=?UTF-8?B?SnVya28gR29zcG9kbmV0acSH?=) Date: Fri, 06 Jun 2014 22:28:01 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53921893.4080200@v.loewis.de> Message-ID: Hi. On 6.6.2014. 21:46, Guido van Rossum wrote: > A reminder: > https://lh5.googleusercontent.com/-d4rF0qJPskQ/U0qpNjP5GoI/AAAAAAAAPW0/4RF_7zy3esY/w1118-h629-no/Python28.jpg *ROFL* Subtle, ain't he? *gdr* Best regards, Jurko Gospodnetić
From timothy.c.delaney at gmail.com Fri Jun 6 23:35:59 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Sat, 7 Jun 2014 07:35:59 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140606175217.766b781c@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140606121306.06783df6@x34f> <8761ke2syo.fsf@uwakimon.sk.tsukuba.ac.jp> <20140606143401.79a7b0ee@x34f> <20140606175217.766b781c@x34f> Message-ID: On 7 June 2014 00:52, Paul Sokolovsky wrote: > > At heart, this is exactly what the Python 3 "str" type is. The > > universal convention is "code points". > > Yes. Except for one small detail - Python3 specifies these code points > to be Unicode code points. And Unicode is a very bloated thing. > > But if we drop that "Unicode" stipulation, then it's also exactly what > MicroPython implements. Its "str" type consists of codepoints, we don't > have pet names for them yet, like Unicode does, but their numeric > values are 0-255. Note that it in no way limits encodings, characters, > or scripts which can be used with MicroPython, because just like > Unicode, it support concept of "surrogate pairs" (but we don't call it > like that) - specifically, smaller code points may comprise bigger > groupings. But unlike Unicode, we don't stipulate format, value or > other constraints on how these "surrogate pairs"-alikes are formed, > leaving that to users. I think you've missed my point. There is absolutely nothing conceptually bloaty about what a Python 3 string is. It's just like a 7-bit ASCII string, except each entry can be from a larger table. When you index into a Python 3 string, you get back exactly *one valid entry* from the Unicode code point table. 
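[The point Tim is making can be seen directly at an interpreter prompt. A hedged sketch, assuming a CPython 3.3+ build with PEP 393 flexible string storage; pre-3.3 "narrow" builds behaved differently for astral characters:]

```python
# Indexing a Python 3 str yields exactly one code point, even for a
# character outside the Basic Multilingual Plane.
s = "a\U0001F40D"  # 'a' followed by U+1F40D

assert len(s) == 2             # two code points, independent of any encoding
assert s[1] == "\U0001F40D"    # one whole code point, never half a surrogate
assert ord(s[1]) == 0x1F40D

# Code *units*, by contrast, are a property of a particular encoding,
# not of str itself:
assert len(s.encode("utf-8")) == 5      # 1 byte + 4 bytes
assert len(s.encode("utf-16-le")) == 6  # 2 bytes + a 4-byte surrogate pair
```

[The encoded lengths differ per encoding while len(s) does not, which is exactly the code point/code unit distinction under discussion.]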
That plus the length of the string, plus the guarantee of immutability gives everything needed to layer the rest of the string functionality on top. There are no surrogate pairs - each code point is standalone (unlike code *units*). It is conceptually very simple. The implementation may be difficult (if you're trying to do better than 4 bytes per code point) but the concept is dead simple. If the MicroPython string type requires people *using* it to deal with surrogates (i.e. indexing could return a value that is not a valid Unicode code point) then it will have broken the conceptual simplicity of the Python 3 string type (and most certainly can't be considered in any way compatible). Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jun 7 00:33:29 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 6 Jun 2014 22:33:29 +0000 (UTC) Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes References: <5391754D.8000607@googlemail.com> <1842263445423761298.493568sturla.molden-gmail.com@news.gmane.org> Message-ID: <1064279801423785627.049284sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > The proposal in my initial email requires zero pthreads, and is > substantially more effective. (Your proposal reduces only the alloc > overhead for large arrays; mine reduces both alloc and memory access > overhead for both large and small arrays.) My suggestion prevents the kernel from zeroing pages in the middle of a computation, which is an important part. It would also be an optimization the Python interpreter could benefit from independently of NumPy, by allowing reuse of allocated memory pages within CPU-bound portions of the Python code. And no, the method I suggested does not only work for large arrays. If we really want to take out the memory access overhead, we need to consider lazy evaluation. E.g.
a context manager that collects a symbolic expression and triggers evaluation on exit:

with numpy.accelerate:
    x =
    y =
    z =
# evaluation of x,y,z happens here

Sturla From sturla.molden at gmail.com Sat Jun 7 00:43:34 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 6 Jun 2014 22:43:34 +0000 (UTC) Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> Message-ID: <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Brett Cannon wrote: > Nope. A new minor release of Python is a massive undertaking which is why > we have saved ourselves the hassle of doing a Python 2.8 or not giving a > clear signal as to when Python 2.x will end as a language. Why not just define Python 2.8 as Python 2.7 except with a newer compiler? I cannot see why that would be a massive undertaking, if changing compiler for 2.7 is necessary anyway. Sturla From sturla.molden at gmail.com Sat Jun 7 01:01:31 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 6 Jun 2014 23:01:31 +0000 (UTC) Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler References: <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> <20140606194252.GA11482@k2> Message-ID: <830540067423787719.509957sturla.molden-gmail.com@news.gmane.org> Brian Curtin wrote: > Adding features into 3.x is already not enough of a carrot on the > stick for many users. Intentionally leaving 2.7 on a dead compiler is > like beating them with the stick. Those who want to build extensions on Windows will just use MinGW (currently GCC 4.8.2) instead. NumPy and SciPy are planning a switch to a GCC based toolchain with static linkage of the MinGW runtime on Windows.
It is carefully configured to be binary compatible with VS2008 on Python 2.7. The major reason for this is to use gfortran also on Windows. But the result will be a GCC based toolchain that anyone can use to build extensions on Windows. Sturla From Steve.Dower at microsoft.com Sat Jun 7 01:01:53 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Fri, 6 Jun 2014 23:01:53 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <5392232A.2000102@v.loewis.de> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <53921464.7030400@v.loewis.de> <5392232A.2000102@v.loewis.de> Message-ID: <438e8a27e8e643f4841a22b24447b956@BLUPR03MB389.namprd03.prod.outlook.com> Martin v. Löwis wrote: > On 06.06.14 at 22:13, Paul Moore wrote: >> From >> http://www.visualstudio.com/en-us/downloads/visual-studio-14-ctp-vs >> >> """ >> Currently, Visual Studio "14" CTPs have known compatibility issues >> with previous releases of Visual Studio and should not be installed >> side-by-side on the same computer. >> """ > > I also found > > http://support.microsoft.com/kb/2967191 > > which is more specific about this issue: > > '''There are known issues when you install Visual Studio "14" CTP > 14.0.21730.1 DP on the same computer as Visual Studio 2013. While we expect that > an uninstallation of Visual Studio "14" and then a repair of Visual Studio 2013 > should fix these issues, our safest recommendation is to install Visual Studio > "14" in a VM, a VHD, a fresh computer, or another non-production test-only > computer that does not have Visual Studio 2013 on it. All of these Visual Studio > side-by-side issues are expected to be fixed soon. Somebody ran a test to see how well the install/uninstall/repair scenario works, and it isn't that great. There are a lot of teams who contribute to Visual Studio, and not all of them have updated their installers yet (my team included...).
Unfortunately, it all happened too close to the release to fix it for this version, hence the recommendation. Eventually, VS 14 will be safe to install side-by-side with earlier versions. Chances are it is safe enough with VS 2010 or VS 2012 - it's the one-version-prior that's causing the most trouble. > There is an installation block in this Visual Studio "14" CTP that will prevent > installation on a computer where an earlier version of Visual Studio is already > installed. To disable the block that will put the computer in an un-recommended > state, add the value "BlockerOverride" to the registry: > HKLM\SOFTWARE\Microsoft\DevDiv\vs\Servicing''' > > So it seems to me that switching to VS 14 at this point in time is not possible. > > Of course, Steve could certainly maintain a Mercurial clone in his hg.python.org > sandbox that has all the necessary changes done, so people won't have to redo > the porting over and over. That's what I had in mind. [Earlier post] > 1. what is the availability of the compiler during the testing phase, > and what will it be immediately after the testing ends (where > traditionally people would have to buy licenses, or wait for VS > Express to be released)? It's freely available now as part of Visual Studio, and all the pre-release releases will include everything. The last release (RC or whatever they decide to call it this time) should have a go-live license, though it will also be time bombed. I believe Express will be released at the same time as the paid versions. > 2. what is the risk of installing a beta compiler on what might > otherwise be a "production" developer system? In particular, could > it interfere with other VS installations, and could it require a > complete system reinstall when the final release of VC 14 is > available? Answered above. It's as risky as it always is, though as I mentioned, VC 14 may well be fine against VC 10. 
Build-to-build upgrades may not be supported between pre-release versions, but typically RC to RTM upgrades are supported. > 3. what is the chance of the final release being delayed beyond > the planned release date of Python 3.5? Microsoft has a bad > track record of meeting release dates (or the tradition of not > announcing any for that reason); the blog says that it will > be available "sometime in 2015". Now, Python 3.5 might appear > November 2015, so what do we do if VS 2015 is not released > by the time 3.5b1 is planned? We keep the VS 2010 files around and make sure they keep working. This is the biggest risk of the whole plan, but I believe that there's enough of a gap between when VS 14 is planned to release (which I know, but can't share) and when Python 3.5 is planned (which I don't know, but have a semi-informed guess). Is Python 3.5b1 being built with VS 14 RC (hypothetically) a blocking issue? Do we need to resolve that now or can it wait until it happens? > Regards, > Martin Cheers, Steve From brian at python.org Sat Jun 7 01:05:52 2014 From: brian at python.org (Brian Curtin) Date: Sat, 7 Jun 2014 03:05:52 +0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <830540067423787719.509957sturla.molden-gmail.com@news.gmane.org> References: <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> <20140606194252.GA11482@k2> <830540067423787719.509957sturla.molden-gmail.com@news.gmane.org> Message-ID: On Jun 6, 2014 6:01 PM, "Sturla Molden" wrote: > > Brian Curtin wrote: > > > Adding features into 3.x is already not enough of a carrot on the > > stick for many users. Intentionally leaving 2.7 on a dead compiler is > > like beating them with the stick. > > Those who want to build extensions on Windows will just use MinGW > (currently GCC 2.8.2) instead. Well we're certainly not going to assume such a thing. I know people do that, but many don't (I never have). 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jun 7 01:32:50 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 6 Jun 2014 23:32:50 +0000 (UTC) Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler References: <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> <20140606194252.GA11482@k2> <830540067423787719.509957sturla.molden-gmail.com@news.gmane.org> Message-ID: <1817978993423789902.560333sturla.molden-gmail.com@news.gmane.org> Brian Curtin wrote: > Well we're certainly not going to assume such a thing. I know people do > that, but many don't (I never have). If Python 2.7 users are left with a dead compiler on Windows, they will find a solution. For example, Enthought is already bundling their Python distribution with gcc 4.8.1 on Windows. Sturla From njs at pobox.com Sat Jun 7 00:47:49 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 6 Jun 2014 23:47:49 +0100 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: On Fri, Jun 6, 2014 at 11:43 PM, Sturla Molden wrote: > Brett Cannon wrote: > >> Nope. A new minor release of Python is a massive undertaking which is why >> we have saved ourselves the hassle of doing a Python 2.8 or not giving a >> clear signal as to when Python 2.x will end as a language. > > Why not just define Python 2.8 as Python 2.7 except with a newer compiler? > I cannot see why that would be a massive undertaking, if changing compiler > for 2.7 is necessary anyway. This would require recompiling all packages on OS X and Linux, even though nothing had changed.
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Sat Jun 7 00:53:25 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 6 Jun 2014 23:53:25 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: <1064279801423785627.049284sturla.molden-gmail.com@news.gmane.org> References: <5391754D.8000607@googlemail.com> <1842263445423761298.493568sturla.molden-gmail.com@news.gmane.org> <1064279801423785627.049284sturla.molden-gmail.com@news.gmane.org> Message-ID: On Fri, Jun 6, 2014 at 11:33 PM, Sturla Molden wrote: > Nathaniel Smith wrote: > >> The proposal in my initial email requires zero pthreads, and is >> substantially more effective. (Your proposal reduces only the alloc >> overhead for large arrays; mine reduces both alloc and memory access >> overhead for boyh large and small arrays.) > > My suggestion prevents the kernel from zeroing pages in the middle of a > computation, which is an important part. It would also be an optimiation > the Python interpreter could benefit from indepently of NumPy, by allowing > reuse of allocated memory pages within CPU bound portions of the Python > code. And no, the method I suggested does not only work for large arrays. Small allocations are already recycled within process and don't touch the kernel, so your method doesn't affect them at all. My guess is that PyMalloc is unlikely to start spawning background threads any time soon, but if you'd like to propose it maybe you should start a new thread for that? > If we really want to take out the memory access overhead, we need to > consider lazy evaluation. E.g. 
a context manager that collects a symbolic > expression and triggers evaluation on exit: > > with numpy.accelerate: > x = > y = > z = > # evaluation of x,y,z happens here Using an alternative evaluation engine is indeed another way to optimize execution, which is why projects like numexpr, numba, theano, etc. exist. But this is basically switching to a different language in a different VM. I think people will keep running plain-old-CPython code for some time yet, and the point of this thread is that there's some low-hanging fruit for making all *that* code faster. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From eliben at gmail.com Sat Jun 7 01:45:06 2014 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 6 Jun 2014 16:45:06 -0700 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <1817978993423789902.560333sturla.molden-gmail.com@news.gmane.org> References: <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> <20140606194252.GA11482@k2> <830540067423787719.509957sturla.molden-gmail.com@news.gmane.org> <1817978993423789902.560333sturla.molden-gmail.com@news.gmane.org> Message-ID: On Fri, Jun 6, 2014 at 4:32 PM, Sturla Molden wrote: > Brian Curtin wrote: > > > Well we're certainly not going to assume such a thing. I know people do > > that, but many don't (I never have). > > If Python 2.7 users are left with a dead compiler on Windows, they will > find a solution. For example, Enthought is already bundling their Python > distribution with gcc 4.8.1 on Windows. > While we're at it, Clang is nearing a stage where it can compile C and C++ on Windows *with ABI-compatibility to MSVC* (yes, even C++) -- see http://clang.llvm.org/docs/MSVCCompatibility.html for more details. Could this help? Eli -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sturla.molden at gmail.com Sat Jun 7 02:05:44 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 7 Jun 2014 00:05:44 +0000 (UTC) Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler References: <20140606194252.GA11482@k2> <830540067423787719.509957sturla.molden-gmail.com@news.gmane.org> <1817978993423789902.560333sturla.molden-gmail.com@news.gmane.org> Message-ID: <468120726423791445.579744sturla.molden-gmail.com@news.gmane.org> Eli Bendersky wrote: > While we're at it, Clang is nearing a stage where it can compile C and C++ > on Windows *with ABI-compatibility to MSVC* (yes, even C++) -- see > http://clang.llvm.org/docs/MSVCCompatibility.html > for more details. Could > this help? Possibly. "cl-clang" is exciting and I hope distutils will support it one day. Clang is not as well known among Windows users as it is among users of "Unix" (Apple, Linux, FreeBSD, et al.) It would be even better if Python were bundled with Clang on Windows. The MinGW-based "SciPy toolchain" has ABI compatibility with MSVC only for C (and Fortran), not C++. Differences from vanilla MinGW are mainly static linkage of the MinGW runtime, different stack alignment (4 bytes instead of 16), and it links with msvcr90.dll instead of msvcrt.dll.
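For what it's worth, distutils can already be pointed at such a toolchain once it is on PATH; a setup.cfg fragment using the standard distutils option (nothing toolchain-specific assumed):

```ini
# setup.cfg -- make distutils build extensions with the MinGW compiler
[build]
compiler = mingw32
```

or per invocation: python setup.py build_ext --compiler=mingw32. The ABI caveats above (runtime DLL, stack alignment) still apply; the option only selects the compiler class.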
Sturla From greg.ewing at canterbury.ac.nz Sat Jun 7 02:37:14 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 Jun 2014 12:37:14 +1200 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: <5391F8A4.70401@googlemail.com> References: <539126DB.8010306@canterbury.ac.nz> <5391F8A4.70401@googlemail.com> Message-ID: <53925EBA.4050306@canterbury.ac.nz> Julian Taylor wrote: > tp_can_elide receives two objects and returns one of three values: > * can work inplace, operation is associative > * can work inplace but not associative > * cannot work inplace Does it really need to be that complicated? Isn't it sufficient just to ask the object potentially being overwritten whether it's okay to overwrite it? I.e. a parameterless method returning a boolean. -- Greg From ncoghlan at gmail.com Sat Jun 7 02:37:28 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jun 2014 10:37:28 +1000 Subject: [Python-Dev] Internal representation of strings and Micropython In-Reply-To: <20140606175217.766b781c@x34f> References: <20140604011718.GD10355@ando> <20140604183831.7226448c@x34f> <20140604200520.1d432329@x34f> <538FB4F5.9070500@canterbury.ac.nz> <20140605041913.14886264@x34f> <87oay73i70.fsf@uwakimon.sk.tsukuba.ac.jp> <20140605142528.39e0e5fc@x34f> <20140605150121.286032df@x34f> <20140606121306.06783df6@x34f> <8761ke2syo.fsf@uwakimon.sk.tsukuba.ac.jp> <20140606143401.79a7b0ee@x34f> <20140606175217.766b781c@x34f> Message-ID: On 7 Jun 2014 00:53, "Paul Sokolovsky" wrote: > > Yes. Except for one small detail - Python3 specifies these code points > to be Unicode code points. And Unicode is a very bloated thing. I rather suspect users of East Asian & African scripts might have a different notion of what constitutes "bloated" vs "can actually represent this language properly, unlike 8-bit code spaces". > But if we drop that "Unicode" stipulation, then it's also exactly what > MicroPython implements. 
Its "str" type consists of codepoints, we don't > have pet names for them yet, like Unicode does, but their numeric > values are 0-255. Note that it in no way limits encodings, characters, > or scripts which can be used with MicroPython, because just like > Unicode, it support concept of "surrogate pairs" (but we don't call it > like that) - specifically, smaller code points may comprise bigger > groupings. But unlike Unicode, we don't stipulate format, value or > other constraints on how these "surrogate pairs"-alikes are formed, > leaving that to users. This is effectively what the Python 2 str type does, and it's a recipe for data driven latent defects. You inevitably end up concatenating strings using different code spaces, or else splitting strings between surrogate pairs rather than on the proper boundaries, etc. The abstraction presented to users by the str type *must* be the full range of Unicode code points as atomic units. Storing those internally as UTF-8 rather than as fixed width code points as CPython does is an experiment worth trying, since you don't have the same C level backwards compatibility constraints we do. But limiting the str type to a single code page per process is not an acceptable constraint in a Python 3 implementation. Regards, Nick. > > > -- > Best regards, > Paul mailto:pmiscml at gmail.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Sat Jun 7 03:05:53 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 06 Jun 2014 21:05:53 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: On 6/6/2014 6:47 PM, Nathaniel Smith wrote: > On Fri, Jun 6, 2014 at 11:43 PM, Sturla Molden wrote: >> Brett Cannon wrote: >> >>> Nope. A new minor release of Python is a massive undertaking which is why >>> we have saved ourselves the hassle of doing a Python 2.8 or not giving a >>> clear signal as to when Python 2.x will end as a language. >> >> Why not just define Python 2.8 as Python 2.7 except with a newer compiler? >> I cannot see why that would be a massive undertaking, if changing compiler >> for 2.7 is necessary anyway. > > This would require recompiling all packages on OS X and Linux, even > though nothing had changed. If you are suggesting that a Windows compiler change should be invisible to non-Windows users, I agree. Let us assume that /pcbuild remains for those who have vc2008 and that /pcbuild14 is added (and everything else remains as is). Then the only other thing that would change is the Windows installer released on Python.org. Call that 2.7.9W or whatever on the download site and in the interactive startup message to signal that something is different.
-- Terry Jan Reedy From brian at python.org Sat Jun 7 03:09:36 2014 From: brian at python.org (Brian Curtin) Date: Sat, 7 Jun 2014 05:09:36 +0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <1817978993423789902.560333sturla.molden-gmail.com@news.gmane.org> References: <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> <20140606194252.GA11482@k2> <830540067423787719.509957sturla.molden-gmail.com@news.gmane.org> <1817978993423789902.560333sturla.molden-gmail.com@news.gmane.org> Message-ID: On Jun 6, 2014 6:33 PM, "Sturla Molden" wrote: > > Brian Curtin wrote: > > > Well we're certainly not going to assume such a thing. I know people do > > that, but many don't (I never have). > > If Python 2.7 users are left with a dead compiler on Windows, they will > find a solution. For example, Enthought is already bundling their Python > distribution with gcc 2.8.1 on Windows. Again, not something I think we should depend on. A lot of people use python.org installers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Sat Jun 7 03:13:32 2014 From: donald at stufft.io (Donald Stufft) Date: Fri, 6 Jun 2014 21:13:32 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: <4A7F1D64-2F36-428E-9682-9861B05DEFAD@stufft.io> On Jun 6, 2014, at 9:05 PM, Terry Reedy wrote: > On 6/6/2014 6:47 PM, Nathaniel Smith wrote: >> On Fri, Jun 6, 2014 at 11:43 PM, Sturla Molden wrote: >>> Brett Cannon wrote: >>> >>>> Nope. 
A new minor release of Python is a massive undertaking which is why >>>> we have saved ourselves the hassle of doing a Python 2.8 or not giving a >>>> clear signal as to when Python 2.x will end as a language. >>> >>> Why not just define Python 2.8 as Python 2.7 except with a newer compiler? >>> I cannot see why that would be a massive undertaking, if changing compiler >>> for 2.7 is necessary anyway. >> >> This would require recompiling all packages on OS X and Linux, even >> though nothing had changed. > > If you are suggesting that a Windows compiler change should be invisible to non-Windows users, I agree. > > Let us assume that /pcbuild remains for those who have vc2008 and that /pcbuild14 is added (and everything else remains as is). Then the only other thing that would change is the Windows installer released on Python.org. Call that 2.7.9W or whatever on the download site and in the interactive startup message to signal that something is different. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io How are packaging tools supposed to cope with this? AFAIK there is nothing in most of them to deal with an X.Y.Z release suddenly dealing with a different compiler. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From sturla.molden at gmail.com Sat Jun 7 03:35:40 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 7 Jun 2014 01:35:40 +0000 (UTC) Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler References: <20140606194252.GA11482@k2> <830540067423787719.509957sturla.molden-gmail.com@news.gmane.org> <1817978993423789902.560333sturla.molden-gmail.com@news.gmane.org> Message-ID: <495727245423796617.857480sturla.molden-gmail.com@news.gmane.org> Brian Curtin wrote: >> If Python 2.7 users are left with a dead compiler on Windows, they will >> find a solution. For example, Enthought is already bundling their Python >> distribution with gcc 2.8.1 on Windows. > > Again, not something I think we should depend on. A lot of people use > python.org installers. I am not talking about changing the python.org installers. Let it remain on VS2008 for Python 2.7. I am only suggesting we make it easier to find a free C compiler compatible with the python.org installers. The NumPy/SciPy dev team have taken the burden to build a MinGW toolchain that is configured to be 100 % ABI compatible with the python.org installer. I am only suggesting a link to it or something like that, perhaps even host it as a separate download. (It is GPL, so anyone can do that.) That way it would be easy to find a compatible C compiler. We have to consider that VS2008 will be unobtainable abandonware long before the promised Python 2.7 support expires. When that happens, users of Python 2.7 will need to find another compiler to build C extensions. If Python.org makes this easier it would hurt less to have Python 2.7 remain on VS2008 forever. 
Sturla From sturla.molden at gmail.com Sat Jun 7 03:40:34 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 7 Jun 2014 01:40:34 +0000 (UTC) Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes References: <539126DB.8010306@canterbury.ac.nz> <5391F8A4.70401@googlemail.com> <53925EBA.4050306@canterbury.ac.nz> Message-ID: <224483517423797963.412727sturla.molden-gmail.com@news.gmane.org> Greg Ewing wrote: > Julian Taylor wrote: >> tp_can_elide receives two objects and returns one of three values: >> * can work inplace, operation is associative >> * can work inplace but not associative >> * cannot work inplace > > Does it really need to be that complicated? Isn't it > sufficient just to ask the object potentially being > overwritten whether it's okay to overwrite it? How can it know this without help from the interpreter? Sturla From sturla.molden at gmail.com Sat Jun 7 04:18:35 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 7 Jun 2014 02:18:35 +0000 (UTC) Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes References: <5391754D.8000607@googlemail.com> <1842263445423761298.493568sturla.molden-gmail.com@news.gmane.org> <1064279801423785627.049284sturla.molden-gmail.com@news.gmane.org> Message-ID: <1245574759423799281.221143sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: >> with numpy.accelerate: >> x = >> y = >> z = >> # evaluation of x,y,z happens here > > Using an alternative evaluation engine is indeed another way to > optimize execution, which is why projects like numexpr, numba, theano, > etc. exist. But this is basically switching to a different language in > a different VM. I was not thinking that complicated. Let us focus on what an unmodified CPython can do. A compound expression with arrays can also be seen as a pipeline. 
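That pipeline view can be sketched in plain NumPy without any interpreter support. A rough illustration only; `evaluate_chunked` and its `blocksize` are hypothetical names for this sketch, not a real NumPy API:

```python
import numpy as np

def evaluate_chunked(expr, out, *arrays, blocksize=4096):
    # Evaluate expr block by block so every temporary array stays small
    # enough to fit in cache, instead of materializing full-size temporaries.
    n = len(out)
    for start in range(0, n, blocksize):
        sl = slice(start, min(start + blocksize, n))
        out[sl] = expr(*(a[sl] for a in arrays))
    return out

y = np.linspace(0.0, 1.0, 10000)
z = np.linspace(1.0, 2.0, 10000)
x = evaluate_chunked(lambda y, z: 2.0 * y + z * z, np.empty_like(y), y, z)
assert np.allclose(x, 2.0 * y + z * z)
```

Coroutine-returning operators would amount to doing this blocking implicitly, streaming one cache-sized block at a time through the whole expression.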
Imagine what would happen if in "NumPy 2.0" arithmetic operators returned coroutines instead of temporary arrays. That way an expression could be evaluated chunkwise, and the chunks would be small enough to fit in cache. Sturla From tjreedy at udel.edu Sat Jun 7 04:23:41 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 06 Jun 2014 22:23:41 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <4A7F1D64-2F36-428E-9682-9861B05DEFAD@stufft.io> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> <4A7F1D64-2F36-428E-9682-9861B05DEFAD@stufft.io> Message-ID: On 6/6/2014 9:13 PM, Donald Stufft wrote: > > On Jun 6, 2014, at 9:05 PM, Terry Reedy wrote: >> If you are suggesting that a Windows compiler change should be >> invisible to non-Windows users, I agree. >> >> Let us assume that /pcbuild remains for those who have vc2008 and >> that /pcbuild14 is added (and everything else remains as is). Then >> the only other thing that would change is the Windows installer >> released on Python.org. Call that 2.7.9W or whatever on the >> download site and in the interactive startup message to signal that >> something is different. > How are packaging tools supposed to cope with this? AFAIK there is > nothing in most of them to deal with an X.Y.Z release suddenly dealing > with a different compiler. For this option, packaging tools on Windows would have to gain a special rule to cope with a special, hopefully unique, not-to-be-repeated series of releases. If VC2008 ceases to be available to those who do not already have it, and whose machines do not break or get replaced, dealing with a different easily available compiler would be easier than dealing with having no compiler.
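The gap Donald describes is visible in the PEP 427 wheel filename itself: the tags record interpreter, ABI and platform, but nothing about the compiler or C runtime. A simplified parser (example filename only; it ignores optional build tags and assumes no extra dashes in name or version):

```python
def parse_wheel_name(filename):
    # PEP 427: {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    # Simplified: assumes no optional build tag and no extra dashes.
    name, version, python_tag, abi_tag, platform_tag = \
        filename[:-len(".whl")].split("-")
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}

tags = parse_wheel_name("numpy-1.8.1-cp27-none-win32.whl")
assert tags["python"] == "cp27" and tags["platform"] == "win32"
# Nothing here distinguishes a VS2008 build from a VS2014 build of the
# same cp27/win32 wheel -- which is the hole the tools would have to fill.
```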
-- Terry Jan Reedy From chris.barker at noaa.gov Sat Jun 7 06:01:58 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 6 Jun 2014 21:01:58 -0700 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: > > > Why not just define Python 2.8 as Python 2.7 except with a newer compiler? > I cannot see why that would be a massive undertaking, if changing compiler > for 2.7 is necessary anyway. > A reminder that this was brought up a few months ago, as a proposal by the stackless team, as they wanted to use a newer compiler for binaries. IIRC, there was a pretty resounding "don't do that" from this list. Makes sense to me -- we have how many different binaries of 2.7 on how many platforms, with how many compilers? Sure, python.org has been nicely consistent about what compiler (run time, really) they use to distribute Windows binaries, but the python version has NOTHING to do with what compiler is used. (for that matter there is 32 bit and 64 bit 2.7 on Windows ...) I think, at the time, it was thought that pip, wheel, and the metadata standards should be extended to allow multiple binaries of the same version with different compilers to be in the wild. Those projects have had bigger fish to fry, but maybe it's time to get ahead of the game with that, so we can accommodate this change. It's already getting hard to find VS2008 Express, and building 64 bit extensions is a serious pain. -Chris -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ncoghlan at gmail.com Sat Jun 7 06:41:34 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jun 2014 14:41:34 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: On 7 June 2014 08:43, Sturla Molden wrote: > Brett Cannon wrote: > >> Nope. A new minor release of Python is a massive undertaking which is why >> we have saved ourselves the hassle of doing a Python 2.8 or not giving a >> clear signal as to when Python 2.x will end as a language. > > Why not just define Python 2.8 as Python 2.7 except with a newer compiler? > I cannot see why that would be a massive undertaking, if changing compiler > for 2.7 is necessary anyway. It's honestly astonishing the number of people that tell us doing a new minor release of Python 2 is easy, and then refuse to believe us when we tell them it isn't. It's 2014 and Python *2.7*, which was released in *2010*, is STILL BEING ROLLED OUT. One part of the rollout that is near & dear to my own heart is the fact that Red Hat Enterprise Linux 7 and CentOS 7 are still in their respective release candidate phases, and it is the 6 -> 7 transition that finally upgrades their system Pythons from 2.6 to 2.7.
Maya 2014 & MotionBuilder 2014 are also the first versions Autodesk are shipping that use 2.7 rather than 2.6 as the scripting engine (although my understanding is that Autodesk don't guarantee compatibility with Python C extensions that aren't built specifically for use with their products, so they already use a newer C runtime on Windows than we do). And once those two dominoes fall, then there'll be some additional follow on upgrade work in some parts of the developer community as the *users* that receive their Python through those channels rather than directly from upstream switch from 2.6 to 2.7 and stumble over the small compatibility breaks between those two releases. Words like "just", or "simple", or "easy" really have no place being applied to a task where the time required to fully execute it with *no significant problems* is still measured in years. That said, there are definitely problems with toolchain availability on Windows for Python 2, and it isn't clear yet how that will be addressed in the long run. Steve is working on ensuring the official toolchain and C runtime binaries are more readily available from MS. Other folks are independently looking into ensuring that open source toolchains (like mingw) can be used effectively to at least build Python C extensions for Windows (and ironing out some of the glitches with that approach that others have mentioned). The Python Packaging Authority are continuing to work on the wheel based infrastructure to help avoid end users having to compile anything in the first place, and redistributors like ActiveState, Enthought & Continuum Analytics also make it possible for many end users to just ignore these upstream concerns. 
An extension compatibility break would be an absolute last resort, pursued only if all other attempts at resolving the challenges have demonstrably failed - even at the best of times it can take months for C extension authors to start publishing compatible binaries for a new minor release, so we'd have to assume that time would be even longer for a Python 2.7 maintenance release, if they published updated binaries at all. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Sat Jun 7 06:47:38 2014 From: donald at stufft.io (Donald Stufft) Date: Sat, 7 Jun 2014 00:47:38 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: On Jun 7, 2014, at 12:41 AM, Nick Coghlan wrote: > On 7 June 2014 08:43, Sturla Molden wrote: >> Brett Cannon wrote: >> >>> Nope. A new minor release of Python is a massive undertaking which is why >>> we have saved ourselves the hassle of doing a Python 2.8 or not giving a >>> clear signal as to when Python 2.x will end as a language. >> >> Why not just define Python 2.8 as Python 2.7 except with a newer compiler? >> I cannot see why that would be a massive undertaking, if changing compiler >> for 2.7 is necessary anyway. > > It's honestly astonishing the number of people that tell us doing a > new minor release of Python 2 is easy, and then refuse to believe us > when we tell them it isn't. > > It's 2014 and Python *2.7*, which was released in *2010*, is STILL > BEING ROLLED OUT.
One part of the rollout that is near & dear to my > own heart is the fact that Red Hat Enterprise Linux 7 and CentOS 7 are > still in their respective release candidate phases, and it is the 6 -> > 7 transition that finally upgrades their system Pythons from 2.6 to > 2.7. Maya 2014 & MotionBuilder 2014 are also the first versions > Autodesk are shipping that use 2.7 rather than 2.6 as the scripting > engine (although my understanding is that Autodesk don't guarantee > compatibility with Python C extensions that aren't built specifically > for use with their products, so they already use a newer C runtime on > Windows than we do). > > And once those two dominoes fall, then there'll be some additional > follow on upgrade work in some parts of the developer community as the > *users* that receive their Python through those channels rather than > directly from upstream switch from 2.6 to 2.7 and stumble over the > small compatibility breaks between those two releases. > > Words like "just", or "simple", or "easy" really have no place being > applied to a task where the time required to fully execute it with *no > significant problems* is still measured in years. How much of that time exists because there were actual significant changes from 2.6 to 2.7, and how much of it would not need to exist if 2.8 was literally 2.7.Z with a new compiler on Windows? IOW is it the *version* number that causes the slow upgrade, or is it the fact that there are enough changes that it can't be safely applied automatically?
> Other folks are independently looking into ensuring that open source > toolchains (like mingw) can be used effectively to at least build > Python C extensions for Windows (and ironing out some of the glitches > with that approach that others have mentioned). The Python Packaging > Authority are continuing to work on the wheel based infrastructure to > help avoid end users having to compile anything in the first place, > and redistributors like ActiveState, Enthought & Continuum Analytics > also make it possible for many end users to just ignore these upstream > concerns. An extension compatibility break would be an absolute last > resort, pursued only if all other attempts at resolving the challenges > have demonstrably failed - even at the best of times it can take > months for C extension authors to start publishing compatible binaries > for a new minor release, so we'd have to assume that time would be > even longer for a Python 2.7 maintenance release, if they published > updated binaries at all. > > Regards, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
From ncoghlan at gmail.com Sat Jun 7 06:49:24 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jun 2014 14:49:24 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: On 7 June 2014 14:01, Chris Barker wrote: >> >> Why not just define Python 2.8 as Python 2.7 except with a newer compiler? >> I cannot see why that would be a massive undertaking, if changing compiler >> for 2.7 is necessary anyway. > > > A reminder that this was brought up a few months ago, as a proposal by the > Stackless team, as they wanted to use a newer compiler for binaries. IIRC, > there was a pretty resounding "don't do that" from this list. Makes sense to > me -- we have how many different binaries of 2.7 on how many platforms, with > how many compilers? Sure, python.org has been nicely consistent about what > compiler (run time, really) they use to distribute Windows binaries, but the > python version has NOTHING to do with what compiler is used. (for that matter > there is 32 bit and 64 bit 2.7 on Windows ...) Supported by python-dev? We have two: 32-bit and 64-bit, both depending on the Microsoft C runtime, and both published as binary installers on python.org. That's it. > I think, at the time, it was thought that pip, wheel, and the metadata > standards should be extended to allow multiple binaries of the same version > with different compilers to be in the wild. Those projects have had bigger > fish to fry, but maybe it's time to get ahead of the game with that, so we > can accommodate this change.
It's already getting hard to find VS2008 > Express, and building 64 bit extensions is a serious pain. That was a largely independent discussion, noting that if we come up with a mechanism for dealing with Linux distro variances, it may also be useful for dealing with Windows C runtime variances. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jun 7 06:58:18 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jun 2014 14:58:18 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: On 7 June 2014 14:47, Donald Stufft wrote: > On Jun 7, 2014, at 12:41 AM, Nick Coghlan wrote: >> >> Words like "just", or "simple", or "easy" really have no place being >> applied to a task where the time required to fully execute it with *no >> significant problems* is still measured in years. > > How much of that time exists because there were actual significant > changes from 2.6 to 2.7 and how much of it would not need to exist > if 2.8 was literally 2.7.Z with a new compiler on Windows? IOW is it > the *version* number that causes the slow upgrade, or is it the fact > that there are enough changes that it can't be safely applied > automatically? It's the version number change itself. Python 2.7 was covered by the language moratorium, so it consists almost entirely of standard library changes, and the porting notes are minimal: https://docs.python.org/2/whatsnew/2.7.html#porting-to-python-2-7 We didn't even switch compilers on Windows (both 2.6 and 2.7 use VS 2008).
I can't think of a better demonstration than the slow pace of the Python 2.7 rollout that the challenges with doing a new minor release of Python really aren't technical ones at the language level - they're technical and administrative challenges in the way the language version number interacts with the broader Python ecosystem, especially the various redistribution channels. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Sat Jun 7 07:05:19 2014 From: donald at stufft.io (Donald Stufft) Date: Sat, 7 Jun 2014 01:05:19 -0400 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> Message-ID: <2FCC7CC7-8D23-45BF-8157-1C92B9566A16@stufft.io> On Jun 7, 2014, at 12:58 AM, Nick Coghlan wrote: > On 7 June 2014 14:47, Donald Stufft wrote: >> On Jun 7, 2014, at 12:41 AM, Nick Coghlan wrote: >>> >>> Words like "just", or "simple", or "easy" really have no place being >>> applied to a task where the time required to fully execute it with *no >>> significant problems* is still measured in years. >> >> How much of that time exists because there were actual significant >> changes from 2.6 to 2.7 and how much of it would not need to exist >> if 2.8 was literally 2.7.Z with a new compiler on Windows? IOW is it >> the *version* number that causes the slow upgrade, or is it the fact >> that there are enough changes that it can't be safely applied >> automatically? > > It's the version number change itself.
Python 2.7 was covered by the > language moratorium, so it consists almost entirely of standard > library changes, and the porting notes are minimal: > https://docs.python.org/2/whatsnew/2.7.html#porting-to-python-2-7 I'm not sure I agree; the porting docs only show a subset of changes, you also have a lot of new stuff like OrderedDict, dict comprehensions, set literals, argparse, dict views, memory views, etc. AFAIK stable releases don't jump versions because all of these new features are risks, not because a number didn't change. I don't particularly care too much though, I just think that bumping the compiler in a 2.7.Z release is a really bad idea and that either of the other two options is massively better. > > We didn't even switch compilers on Windows (both 2.6 and 2.7 use VS 2008). > > I can't think of a better demonstration than the slow pace of the > Python 2.7 rollout that the challenges with doing a new minor release > of Python really aren't technical ones at the language level - they're > technical and administrative challenges in the way the language > version number interacts with the broader Python ecosystem, especially > the various redistribution channels. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
From ncoghlan at gmail.com Sat Jun 7 07:18:49 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jun 2014 15:18:49 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <2FCC7CC7-8D23-45BF-8157-1C92B9566A16@stufft.io> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> <2FCC7CC7-8D23-45BF-8157-1C92B9566A16@stufft.io> Message-ID: On 7 June 2014 15:05, Donald Stufft wrote: > I don't particularly care too much though, I just think that bumping > the compiler in a 2.7.Z release is a really bad idea and that either > of the other two options is massively better. It is *incredibly* unlikely that backwards compatibility with binary extensions will be broken within the Python 2.7 series - there's a reason we said "No" when the Stackless folks were asking about it a while back. Instead, the toolchain availability problem is currently being tackled by trying to make suitable build toolchains more readily available (both the official VS 2008 toolchain and alternative open source toolchains), and by reducing the reliance on building from source for end users. Both of those courses of action are likely to bear fruit. It's only in the case where those approaches *don't* solve the problem that we'll need to come back and revisit the question of a compatibility break for binary extensions - it is, as you say, a really bad idea, and hence not something we would pursue when there are better options available (I think a Python 2.8 release would be an *even worse* idea in terms of souring our relationships with redistributors, but fortunately, those aren't our only two choices). Regards, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jun 7 07:28:47 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jun 2014 15:28:47 +1000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On 7 June 2014 01:41, Steve Dower wrote: > > What this means for Python is that C extensions for Python 3.5 and later can be built using any version of MSVC from 14.0 and later. Those who are aware of the current state of affairs where you need to use a matching compiler will hopefully see how big an improvement this will be. It is also likely that other compilers will have an easier time providing compatibility with this new CRT, making it simpler and more reliable to build extensions with LLVM or GCC against an MSVC CPython. \o/ That's great news. (I'm assuming that change in policy includes figuring out a solution to the file descriptor problem, since we determined during the Stackless 2.8 discussion that file descriptor mismatches were actually our biggest stumbling block when it came to mixing and matching different CRT versions in one process) > Basically, what I am offering to do is: > > * Update the files in PCBuild to work with Visual Studio "14" > * Make any code changes necessary to build with VC14 > * Regularly test the latest Python source with the latest MSVC builds and report issues/suggestions to the MSVC team > * Keep all changes in a separate (public) repo until early next year when we're getting close to the final VS "14" release > > What I am asking anyone else to do is: > > * Nothing > > Thoughts/comments/concerns? As long as we're also keeping the VS10 files up to date as a fallback option, which we will be, since the VS14 work will be in a separate repo, this sounds like a fine idea to me. Cheers, Nick. 
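For anyone wanting to see where the matching-compiler rule comes from today, the interpreter records the compiler it was built with in `sys.version`. The snippet below is an illustrative probe rather than an official API: the MSC version numbers are real (v.1500 is VS 2008, v.1600 is VS 2010, and the VS "14" toolchain reports v.1900 or later), but the policy logic is only a sketch of the rule Steve's change would relax.

```python
import re
import sys

def msc_version():
    # On MSVC builds sys.version looks like
    # '3.4.1 ... [MSC v.1600 64 bit (AMD64)]'; builds made with GCC,
    # Clang, etc. name those compilers instead, and this returns None.
    match = re.search(r"MSC v\.(\d+)", sys.version)
    return int(match.group(1)) if match else None

msc = msc_version()
if msc is None:
    print("not an MSVC build; the matching-compiler rule does not apply")
elif msc >= 1900:
    print("VS '14' era: any MSVC from 14.0 on shares the universal CRT")
else:
    print("classic rule: extensions must match MSC v.%d exactly" % msc)
```

The same probe is roughly what distutils relies on when it insists on a matching Visual Studio version before building an extension.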
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Sat Jun 7 08:09:08 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 07 Jun 2014 15:09:08 +0900 Subject: [Python-Dev] Moving Python 2.7 [was: 3.5] on Windows to a new compiler In-Reply-To: References: <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <99A4C614-FAC9-4201-859B-B698744A5DB9@stufft.io> <20140606194252.GA11482@k2> Message-ID: <87zjhp1caj.fsf@uwakimon.sk.tsukuba.ac.jp> Brian Curtin writes: > Adding features into 3.x is already not enough of a carrot on the > stick for many users. Intentionally leaving 2.7 on a dead compiler is > like beating them with the stick. No, it's like a New Year's resolution to stop self-flagellating, and handing the whip to the users to use on themselves, or not, as they choose. Remember, the users *chose* to remain locked-in to 2.7, hoping that we would continue to provide support, maybe 2.8. They had alternatives: contributing resources (in full-time developer support units!) to the PSF earmarked for Python 2, porting their dependencies to Python 3, etc. All expensive, yes, but eventually they need to pay the price of support or switching. Staying with Python 2 was always a bet that switching would be cheaper in the future, or that they'd have more resources in the future, or both. Who knows about the private resources, but not only does Python 3 acquire more features steadily, but efforts in core by folks like Ethan, distutils, and Nick (just to name those I've followed personally), along with steadily expanding ports of 3rd party libraries, are quickly making switching cheaper. Cheap *enough*? That's for the users themselves to decide. So I'm not arguing against support; this kind of support (*and* the people who argue that it's worth doing, and then *do* it!) is one reason I have *no* hesitation in recommending Python (3!) vs.
any comparable language.[1] But whatever is decided here, we're doing it for pride or for our own use, not because we owe the users anything. Footnotes: [1] I don't know enough about languages like Ruby or Perl to say Python provides strictly better support. I just can't imagine that it gets better than this! From breamoreboy at yahoo.co.uk Sat Jun 7 09:57:59 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 07 Jun 2014 08:57:59 +0100 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <4A7F1D64-2F36-428E-9682-9861B05DEFAD@stufft.io> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> <4A7F1D64-2F36-428E-9682-9861B05DEFAD@stufft.io> Message-ID: On 07/06/2014 02:13, Donald Stufft wrote: > > On Jun 6, 2014, at 9:05 PM, Terry Reedy wrote: > >> On 6/6/2014 6:47 PM, Nathaniel Smith wrote: >>> On Fri, Jun 6, 2014 at 11:43 PM, Sturla Molden wrote: >>>> Brett Cannon wrote: >>>> >>>>> Nope. A new minor release of Python is a massive undertaking which is why >>>>> we have saved ourselves the hassle of doing a Python 2.8 or not giving a >>>>> clear signal as to when Python 2.x will end as a language. >>>> >>>> Why not just define Python 2.8 as Python 2.7 except with a newer compiler? >>>> I cannot see why that would be a massive undertaking, if changing compiler >>>> for 2.7 is necessary anyway. >>> >>> This would require recompiling all packages on OS X and Linux, even >>> though nothing had changed. >> >> If you are suggesting that a Windows compiler change should be invisible to non-Windows users, I agree. >> >> Let us assume that /pcbuild remains for those who have vc2008 and that /pcbuild14 is added (and everything else remains as is).
Then the only other thing that would change is the Windows installer released on Python.org. Call that 2.7.9W or whatever on the download site and interactive startup message to signal that something is different. >> >> -- >> Terry Jan Reedy >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io > > How are packaging tools supposed to cope with this? AFAIK there is nothing in most of them to deal with a X.Y.Z release suddenly dealing with a different compiler. > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > Potentially completely stupid suggestion to get people thinking (or die laughing :) , but would it be possible to use hex digits, such that 2.7.A was the first release on Windows with the different compiler? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
From g.rodola at gmail.com Sat Jun 7 11:41:31 2014 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Sat, 7 Jun 2014 11:41:31 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <2FCC7CC7-8D23-45BF-8157-1C92B9566A16@stufft.io> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> <2FCC7CC7-8D23-45BF-8157-1C92B9566A16@stufft.io> Message-ID: On Sat, Jun 7, 2014 at 7:05 AM, Donald Stufft wrote: > > I don't particularly care too much though, I just think that bumping > the compiler in a 2.7.Z release is a really bad idea and that either > of the other two options is massively better. +1 -- Giampaolo - http://grodola.blogspot.com From ncoghlan at gmail.com Sat Jun 7 11:50:50 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jun 2014 19:50:50 +1000 Subject: [Python-Dev] [Python-ideas] Expose `itertools.count.start` and implement `itertools.count.__eq__` based on it, like `range`. In-Reply-To: References: <082cd87a-aeb5-49bf-9f79-d99a6d18e402@googlegroups.com> Message-ID: On 7 June 2014 19:36, Ram Rachum wrote: > My need is to have an infinite immutable sequence. I did this for myself by > creating a simple `count`-like stateless class, but it would be nice if that > behavior was part of `range`. Handling esoteric use cases like the one yours sounds to be is *why* user-defined classes exist. It does not follow that "I had to write a custom class to solve my problem" should lead to a standard library or builtin changing unless you can make a compelling case for: * the change being a solution to a common problem that a lot of other people also have.
"I think it might be nice" and "it would have been useful to me to help solve this weird problem I had that one time" isn't enough. * the change fitting in *conceptually* with the existing language and tools. In this case, "infinite sequence" is a fundamentally incoherent concept in Python - len() certainly won't work, and negative indexing behaviour is hence not defined. By contrast, since iterables and iterators aren't required to support len() the way sequences are, infinite iterable and infinite iterator are both perfectly well defined. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From chris at simplistix.co.uk Fri Jun 6 20:50:57 2014 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 06 Jun 2014 19:50:57 +0100 Subject: [Python-Dev] namedtuple implementation grumble Message-ID: <53920D91.3060207@simplistix.co.uk> Hi All, I've been trying to add support for explicit comparison of namedtuples into testfixtures and hit a problem which led me to read the source and be sad. Rather than the mixin and class assembly in the function I expected to find, I'm greeted by an exec of a string. Curious as to what led to that implementation approach? What does it buy that couldn't have been obtained by a mixin providing the functionality? In my case, that's somewhat irrelevant, I'm looking to store a comparer in a registry that would get used for all namedtuples, but I have nothing to key that off, there are no shared bases other than object and tuple. I guess I could duck-type it based on the _fields attribute but that feels implicit and fragile. What do you guys suggest? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From rdmurray at bitdance.com Sat Jun 7 15:25:24 2014 From: rdmurray at bitdance.com (R.
David Murray) Date: Sat, 07 Jun 2014 09:25:24 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <53920D91.3060207@simplistix.co.uk> References: <53920D91.3060207@simplistix.co.uk> Message-ID: <20140607132525.2A2F9250D5C@webabinitio.net> On Fri, 06 Jun 2014 19:50:57 +0100, Chris Withers wrote: > I've been trying to add support for explicit comparison of namedtuples > into testfixtures and hit a problem which led me to read the source and > be sad. > > Rather than the mixin and class assembly in the function I expected to > find, I'm greeted by an exec of a string. > > Curious as to what led to that implementation approach? What does it > buy that couldn't have been obtained by a mixin providing the functionality? > > In my case, that's somewhat irrelevant, I'm looking to store a comparer > in a registry that would get used for all namedtuples, but I have > nothing to key that off, there are no shared bases other than object and > tuple. > > I guess I could duck-type it based on the _fields attribute but that > feels implicit and fragile. > > What do you guys suggest? I seem to remember a previous discussion that concluded that duck typing based on _fields was the way to go. (It's a public API, despite the _, due to namedtuple's attribute namespacing issues.) --David From steve at pearwood.info Sat Jun 7 16:29:55 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 8 Jun 2014 00:29:55 +1000 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <53920D91.3060207@simplistix.co.uk> References: <53920D91.3060207@simplistix.co.uk> Message-ID: <20140607142955.GQ10355@ando> On Fri, Jun 06, 2014 at 07:50:57PM +0100, Chris Withers wrote: > Hi All, > > I've been trying to add support for explicit comparison of namedtuples > into testfixtures and hit a problem which led me to read the source and > be sad. > > Rather than the mixin and class assembly in the function I expected to > find, I'm greeted by an exec of a string.
> > Curious as to what led to that implementation approach? What does it > buy that couldn't have been obtained by a mixin providing the functionality? namedtuple started off as a recipe on ActiveState by Raymond Hettinger. Start here: http://code.activestate.com/recipes/500261-named-tuples/?in=user-178123 -- Steven From ncoghlan at gmail.com Sat Jun 7 16:46:47 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 8 Jun 2014 00:46:47 +1000 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <53920D91.3060207@simplistix.co.uk> References: <53920D91.3060207@simplistix.co.uk> Message-ID: On 7 June 2014 04:50, Chris Withers wrote: > Curious as to what led to that implementation approach? What does it buy > that couldn't have been obtained by a mixin providing the functionality? In principle, you could get the equivalent of collections.namedtuple through dynamically constructed classes. In practice, that's actually easier said than done, so the fact the current implementation works fine for almost all purposes acts as a powerful disincentive to rewriting it. The current implementation is also *really* easy to understand, while writing out the dynamic type creation explicitly would likely require much deeper knowledge of the type machinery to follow. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From antoine at python.org Sat Jun 7 16:50:16 2014 From: antoine at python.org (Antoine Pitrou) Date: Sat, 07 Jun 2014 10:50:16 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <20140607132525.2A2F9250D5C@webabinitio.net> References: <53920D91.3060207@simplistix.co.uk> <20140607132525.2A2F9250D5C@webabinitio.net> Message-ID: Le 07/06/2014 09:25, R. David Murray a écrit : > On Fri, 06 Jun 2014 19:50:57 +0100, Chris Withers wrote: >> I've been trying to add support for explicit comparison of namedtuples >> into testfixtures and hit a problem which led me to read the source and >> be sad.
>> >> Rather than the mixin and class assembly in the function I expected to >> find, I'm greeted by an exec of a string. >> >> Curious as to what lead to that implementation approach? What does it >> buy that couldn't have been obtained by a mixin providing the functionality? >> >> In my case, that's somewhat irrelevant, I'm looking to store a comparer >> in a registry that would get used for all namedtuples, but I have >> nothing to key that off, there are no shared bases other than object and >> tuple. >> >> I guess I could duck-type it based on the _fields attribute but that >> feels implicit and fragile. >> >> What do you guys suggest? > > I seem to remember a previous discussion that concluded that duck typing > based on _fields was the way to go. (It's a public API, despite the _, > due to name-tuple's attribute namespacing issues.) There could be many third-party classes with a _fields member, so that sounds rather fragile. There doesn't seem to be any technical reason barring the addition of a common base class for namedtuples. Regards Antoine. From njs at pobox.com Sat Jun 7 09:23:46 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 7 Jun 2014 08:23:46 +0100 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> <2FCC7CC7-8D23-45BF-8157-1C92B9566A16@stufft.io> Message-ID: On 7 Jun 2014 06:19, "Nick Coghlan" wrote: > > On 7 June 2014 15:05, Donald Stufft wrote: > > I don't particularly care too much though, I just think that bumping > > the compiler in a 2.7.Z release is a really bad idea and that either > > of the other two options are massively better.
> > It is *incredibly* unlikely that backwards compatibility with binary > extensions will be broken within the Python 2.7 series - there's a > reason we said "No" when the Stackless folks were asking about it a > while back. Instead, the toolchain availability problem is currently > being tackled by trying to make suitable build toolchains more readily > available (both the official VS 2008 toolchain and alternative open > source toolchains), and by reducing the reliance on building from > source for end users. A third piece of the puzzle could potentially be the availability of automated wheel-building services. (Personally I still haven't successfully managed to build windows wheels for my own packages, and envy my R-using colleagues whose PyPi equivalent does the building for them.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From pcmanticore at gmail.com Sat Jun 7 15:11:54 2014 From: pcmanticore at gmail.com (Claudiu Popa) Date: Sat, 7 Jun 2014 16:11:54 +0300 Subject: [Python-Dev] Division of tool labour in porting Python 2 code to 2/3 In-Reply-To: References: Message-ID: On Fri, Jun 6, 2014 at 7:37 PM, Brett Cannon wrote: > After Glyph and Alex's email about their asks for assisting in writing > Python 2/3 code, it got me thinking about where in the toolchain various > warnings and such should go in order to help direct energy to help develop > whatever future toolchain to assist in porting. > > There seems to be three places where issues are/can be caught once a project > has embarked down the road of 2/3 source compatibility: > > -3 warnings > Some linter tool Pylint could help here. 
We already have a couple of checks which addresses the issue of porting between Python 2 and 3, checks like: raising-string old-style-class slots-on-old-class super-on-old-class old-raise-syntax old-ne-operator lowercase-l-suffix backtick unpacking-in-except indexing-exception property-on-old-class There was an idea on Pylint's bugtracker to implement a plugin for Python 2, with warnings dedicated to porting and this solution seems easier than the alternatives. From bcannon at gmail.com Sat Jun 7 17:37:47 2014 From: bcannon at gmail.com (Brett Cannon) Date: Sat, 07 Jun 2014 15:37:47 +0000 Subject: [Python-Dev] Division of tool labour in porting Python 2 code to 2/3 References: Message-ID: On Sat Jun 07 2014 at 9:11:54 AM, Claudiu Popa wrote: > On Fri, Jun 6, 2014 at 7:37 PM, Brett Cannon wrote: > > After Glyph and Alex's email about their asks for assisting in writing > > Python 2/3 code, it got me thinking about where in the toolchain various > > warnings and such should go in order to help direct energy to help > develop > > whatever future toolchain to assist in porting. > > > > There seems to be three places where issues are/can be caught once a > project > > has embarked down the road of 2/3 source compatibility: > > > > -3 warnings > > Some linter tool > > > Pylint could help here. We already have a couple of checks which > addresses the issue of porting between Python 2 and 3, checks like: > > raising-string > old-style-class > slots-on-old-class > super-on-old-class > old-raise-syntax > old-ne-operator > lowercase-l-suffix > backtick > unpacking-in-except > indexing-exception > property-on-old-class > > There was an idea on Pylint's bugtracker to implement a plugin for > Python 2, with warnings dedicated to porting and this solution seems > easier than the alternatives. > Yes, pylint is definitely an option. 
I have not looked at how hard it would be to write the rules, though, and how easy it would be to run with just those rules (if I remember correctly pylint can take a config, but I have not run it manually in a while). Having something which walked the 2.7 CST or AST wouldn't be difficult to write either, so it's just a matter of balance of work required. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Sat Jun 7 17:38:41 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sat, 7 Jun 2014 15:38:41 +0000 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> <2FCC7CC7-8D23-45BF-8157-1C92B9566A16@stufft.io> , Message-ID: <1402155524095.94474@microsoft.com> One more possible concern that I just thought of is the availability of the build tools on Windows Vista and Windows 7 RTM (that is, without SP1). I'd have to check, but I don't believe anything after VS 2012 is supported on Vista and it's entirely possible that installation is blocked. This may be a non-issue. VC14 still has the "XP mode" that avoids using new APIs, so compiled Python will run fine, but it may be the case that the compiler doesn't (if we manage to get a separate, compiler-only package, that is. VS itself is definitely unusable). I assume gcc/clang will continue to support earlier OSs, so hopefully by the time 3.5 is getting early releases there will be an option for building extensions. I doubt anyone on this list is stuck on Vista or in a position where they can't keep Win7 updated, but do we know of any environments where this may be a problem? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sat Jun 7 18:56:16 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 7 Jun 2014 17:56:16 +0100 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: <53925EBA.4050306@canterbury.ac.nz> References: <539126DB.8010306@canterbury.ac.nz> <5391F8A4.70401@googlemail.com> <53925EBA.4050306@canterbury.ac.nz> Message-ID: On Sat, Jun 7, 2014 at 1:37 AM, Greg Ewing wrote: > Julian Taylor wrote: >> >> tp_can_elide receives two objects and returns one of three values: >> * can work inplace, operation is associative >> * can work inplace but not associative >> * cannot work inplace > > > Does it really need to be that complicated? Isn't it > sufficient just to ask the object potentially being > overwritten whether it's okay to overwrite it? > I.e. a parameterless method returning a boolean. For the numpy case, we really need to see all the operands, *and* know what the operation in question is. Consider tmp1 = np.ones((3, 1)) tmp2 = np.ones((1, 3)) tmp1 + tmp2 which returns an array with shape (3, 3). Both input arrays are temporaries, but neither of them can be stolen to use for the output array. Or suppose 'a' is an array of integers and 'b' is an array of floats, then 'a + b' and 'a += b' have very different results (the former upcasts 'a' to float, the latter has to either downcast 'b' to int or raise an error). But the casting rules depend on the particular input types and the particular operation -- operations like & and << want to cast to int, < and > return bools, etc. So one really needs to know all the details of the operation before one can determine whether temporary elision is possible. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From mistersheik at gmail.com Sat Jun 7 19:57:05 2014 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 7 Jun 2014 13:57:05 -0400 Subject: [Python-Dev] [Python-ideas] Expose `itertools.count.start` and implement `itertools.count.__eq__` based on it, like `range`. In-Reply-To: References: <082cd87a-aeb5-49bf-9f79-d99a6d18e402@googlegroups.com> Message-ID: On Sat, Jun 7, 2014 at 5:50 AM, Nick Coghlan wrote: > On 7 June 2014 19:36, Ram Rachum wrote: > > My need is to have an infinite immutable sequence. I did this for myself > by > > creating a simple `count`-like stateless class, but it would be nice if > that > > behavior was part of `range`. > > Handling esoteric use cases like it sounds yours was is *why* user > defined classes exist. It does not follow that "I had to write a > custom class to solve my problem" should lead to a standard library or > builtin changing unless you can make a compelling case for: > > * the change being a solution to a common problem that a lot of other > people also have. "I think it might be nice" and "it would have been > useful to me to help solve this weird problem I had that one time" > isn't enough. > * the change fitting in *conceptually* with the existing language and > tools. In this case, "infinite sequence" is a fundamentally incoherent > concept in Python - len() certainly won't work, and negative indexing > behaviour is hence not defined. By contrast, since iterables and > iterators aren't required to support len() the way sequences are, > infinite iterable and infinite iterator are both perfectly well > defined. > With all due respect, '"infinite sequence" is a fundamentally incoherent concept in Python' is a bit hyperbolic. It would be perfectly reasonable to have them, but they're not defined (yet). > > Cheers, > Nick.
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Sat Jun 7 21:42:32 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 07 Jun 2014 12:42:32 -0700 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: References: <53920D91.3060207@simplistix.co.uk> Message-ID: <53936B28.1080605@g.nevcal.com> On 6/7/2014 7:50 AM, Antoine Pitrou wrote: > Le 07/06/2014 09:25, R. David Murray a écrit : >> On Fri, 06 Jun 2014 19:50:57 +0100, Chris Withers >> wrote: >>> I guess I could duck-type it based on the _fields attribute but that >>> feels implicit and fragile. >>> >>> What do you guys suggest? >> >> I seem to remember a previous discussion that concluded that duck typing >> based on _fields was the way to go. (It's a public API, despite the _, >> due to name-tuple's attribute namespacing issues.) > > There could be many third-party classes with a _fields member, so that > sounds rather fragile. > There doesn't seem to be any technical reason barring the addition of > a common base class for namedtuples. > > Regards > > Antoine. A common base class sounds like a good idea, to me, at a minimum, to help identify all the namedtuple derivatives. On 6/7/2014 7:46 AM, Nick Coghlan wrote: > On 7 June 2014 04:50, Chris Withers wrote: >> Curious as to what lead to that implementation approach? What does it buy >> that couldn't have been obtained by a mixin providing the functionality? > In principle, you could get the equivalent of collections.namedtuple > through dynamically constructed classes. In practice, that's actually > easier said than done, so the fact the current implementation works > fine for almost all purposes acts as a powerful disincentive to > rewriting it.
The current implementation is also *really* easy to > understand, while writing out the dynamic type creation explicitly > would likely require much deeper knowledge of the type machinery to > follow. I wonder if the dynamically constructed classes approach could lead to the same space and time efficiencies... seems like I recall there being a discussion of efficiency, I think primarily space efficiency, as a justification for the present implementation. namedtuple predates of the improvements in metaclasses, also, which may be a justification for the present implementation. I bumped into namedtuple when I first started coding in Python, I was looking for _some way_, _any way_ to achieve an unmutable class with named members, and came across Raymond's recipe, which others have linked to... and learned, at the time, that he was putting it into Python stdlib. I found it far from "*really* easy to understand", although at that point in my Python knowledge, I highly doubt a metaclass implementation would have been easier to understand... but learning metaclasses earlier than I did might have been good for my general understanding of Python, and more useful in the toolbox than an implementation like namedtuple. I did, however, find and suggest a fix for a bug in the namedtuple implementation that Raymond was rather surprised that he had missed, although I would have to pick through the email archives to remember now what it was, or any other details about it... but it was in time to get fixed before the first release of Python that included namedtuple, happily. I wouldn't be opposed to someone rewriting namedtuple using metaclasses, to compare the implementations from an understandability and from an efficiency standpoint... but I don't think my metaclass skills are presently sufficient to make the attempt myself. 
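[As a point of comparison for the "dynamically constructed classes" approach wondered about above, here is one way a namedtuple-like factory can be written without exec. This is an illustrative sketch only -- the name `make_namedtuple` is made up, and it omits `_make`, `_replace`, `_asdict`, `_source` and the careful argument checking of the real `collections.namedtuple`:]

```python
from operator import itemgetter

def make_namedtuple(typename, field_names):
    # Hypothetical non-exec sketch of a namedtuple factory, for comparison only.
    fields = tuple(field_names.split())

    def __new__(cls, *args, **kwargs):
        # Positional arguments first, then remaining fields filled by keyword.
        values = args + tuple(kwargs[name] for name in fields[len(args):])
        if len(values) != len(fields):
            raise TypeError('%s() expects %d arguments' % (typename, len(fields)))
        return tuple.__new__(cls, values)

    def __repr__(self):
        body = ', '.join('%s=%r' % (n, v) for n, v in zip(fields, self))
        return '%s(%s)' % (typename, body)

    namespace = {'__slots__': (), '_fields': fields,
                 '__new__': __new__, '__repr__': __repr__}
    # Each field name becomes a read-only property over tuple indexing.
    for index, name in enumerate(fields):
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)
```

[Whether a reader finds this easier to follow than the exec'd template is exactly the judgment call Nick describes earlier in the thread.]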
I also seem to recall that somewhere in the (lengthy) Enum discussions, that Enum uses a technique similar to namedtuple, again for an efficiency reason, even though it also uses metaclasses in its implementation. I wonder if, if the reasons were well understood by someone that understand Python internals far better than I do, if they point out some capability that is missing from metaclasses that lead to these decisions to use string parsing and manipulation as a basis for implementing classes with metaclass-like behaviors, yet not use the metaclass feature set to achieve those behaviors. Glenn -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Sat Jun 7 22:00:15 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Sat, 7 Jun 2014 23:00:15 +0300 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <53936B28.1080605@g.nevcal.com> References: <53920D91.3060207@simplistix.co.uk> <53936B28.1080605@g.nevcal.com> Message-ID: <20140607230015.71fbc213@x34f> Hello, On Sat, 07 Jun 2014 12:42:32 -0700 Glenn Linderman wrote: > On 6/7/2014 7:50 AM, Antoine Pitrou wrote: > > Le 07/06/2014 09:25, R. David Murray a écrit : > >> On Fri, 06 Jun 2014 19:50:57 +0100, Chris Withers > >> wrote: > >>> I guess I could duck-type it based on the _fields attribute but > >>> that feels implicit and fragile. > >>> > >>> What do you guys suggest? > >> > >> I seem to remember a previous discussion that concluded that duck > >> typing based on _fields was the way to go. (It's a public API, > >> despite the _, due to name-tuple's attribute namespacing issues.)
> > A common base class sounds like a good idea, to me, at a minimum, to > help identify all the namedtuple derivatives. I'm perplexed - isn't "tuple" such common base class? And checking for both "tuple" base class and "_fields" member will identify it with ~same probability as a check for special base type (because it's fair to say that if someone *both* subclassed a builtin type and add _fields member, then they wanted it to be treated as namedtuple). [] -- Best regards, Paul mailto:pmiscml at gmail.com From lpanl09 at gmail.com Sat Jun 7 23:50:02 2014 From: lpanl09 at gmail.com (Le Pa) Date: Sat, 7 Jun 2014 21:50:02 +0000 (UTC) Subject: [Python-Dev] cpython and python debugger documentation Message-ID: Hi, I am interested in learning how the cpython interpreter is designed and implemented, and also how the python debugger works internally. My ultimate purpose is to modify them for my distributed computing needs. Are there any documentations on these please? I have done some goggling but failed to find anything useful. Thanks you very much for your help! -Le From greg.ewing at canterbury.ac.nz Sun Jun 8 01:02:51 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 08 Jun 2014 11:02:51 +1200 Subject: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes In-Reply-To: References: <539126DB.8010306@canterbury.ac.nz> <5391F8A4.70401@googlemail.com> <53925EBA.4050306@canterbury.ac.nz> Message-ID: <53939A1B.9060705@canterbury.ac.nz> Nathaniel Smith wrote: > For the numpy case, we really need to see all the operands, *and* know > what the operation in question is... Okay, I see what you mean now. Given all that, it might be simpler just to have the method perform the operation itself if it can. It has all the information necessary to do so, after all. This would also make it possible for the inplace operators to have different semantics from temp-elided non-inplace ones if desired. 
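[Greg's "let the object decide and perform the operation itself" idea is often illustrated with CPython reference counts: inside `__add__`, a true temporary is reachable only through the interpreter stack, so its refcount is tiny. The toy sketch below assumes CPython; the `Vec` class and the `<= 3` threshold are illustrative assumptions, not NumPy's actual mechanism:]

```python
import sys

class Vec:
    """Toy vector that recycles its own storage when it looks like a temporary."""
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # A temporary left operand is typically seen here holding only the
        # caller's stack reference, this frame's `self` binding, and
        # getrefcount's own argument reference -- hence the small threshold.
        if sys.getrefcount(self) <= 3:
            self.data = [a + b for a, b in zip(self.data, other.data)]
            return self  # elide: reuse the temporary in place
        # Named (still-referenced) operand: allocate a fresh result.
        return Vec(a + b for a, b in zip(self.data, other.data))
```

[In an expression like `(u + v) + w` the intermediate result can take the in-place branch, while named variables keep the allocating branch -- and, as Nathaniel notes, a real implementation would additionally have to consider dtypes, broadcasting, and the particular operation.]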
-- Greg From ncoghlan at gmail.com Sun Jun 8 01:35:58 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 8 Jun 2014 09:35:58 +1000 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <53936B28.1080605@g.nevcal.com> References: <53920D91.3060207@simplistix.co.uk> <53936B28.1080605@g.nevcal.com> Message-ID: On 8 Jun 2014 05:44, "Glenn Linderman" wrote: > > I wonder if the dynamically constructed classes approach could lead to the same space and time efficiencies... seems like I recall there being a discussion of efficiency, I think primarily space efficiency, as a justification for the present implementation. namedtuple predates of the improvements in metaclasses, also, which may be a justification for the present implementation. As far as I am aware, there's nothing magical in the classes namedtuple creates that would require a custom metaclass - it's just that what it does would likely be even harder to read if written out explicitly rather than letting the compiler & eval loop deal with it. However, we've drifted off topic for python-dev at this point. If anyone wanted to experiment with alternative implementations, python-ideas would be the place to discuss that. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sun Jun 8 21:13:55 2014 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 08 Jun 2014 15:13:55 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: References: <53920D91.3060207@simplistix.co.uk> Message-ID: <5394B5F3.6050403@trueblade.com> On 6/7/2014 10:46 AM, Nick Coghlan wrote: > On 7 June 2014 04:50, Chris Withers wrote: >> Curious as to what lead to that implementation approach? What does it buy >> that couldn't have been obtained by a mixin providing the functionality? > > In principle, you could get the equivalent of collections.namedtuple > through dynamically constructed classes. 
In practice, that's actually > easier said than done, so the fact the current implementation works > fine for almost all purposes acts as a powerful disincentive to > rewriting it. The current implementation is also *really* easy to > understand, while writing out the dynamic type creation explicitly > would likely require much deeper knowledge of the type machinery to > follow. As proof that it's harder to understand, here's an example of that dynamically creating functions and types: https://pypi.python.org/pypi/namedlist https://bitbucket.org/ericvsmith/namedlist/src/163d0d05e94f9cc0af8e269015b9ac3bf9a83826/namedlist.py?at=default#cl-155 It uses the ast module to build an __init__ (or __new__) function dynamically, without exec. Then it creates a type using that function to initialize the new type. namedlist.namedtuple passes all collections.namedtuple tests, except for those using the _source attribute (of course). namedlist.namedlist and namedlist.namedtuple both support a clunky interface to specify default values for member fields. The reasons I didn't use the collections.namedtuple exec-based approach are: - specify default values to __init__ or __new__ became very complex - 2.x and 3.x support is harder with exec Eric. From dw+python-dev at hmmz.org Sun Jun 8 21:37:46 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Sun, 8 Jun 2014 19:37:46 +0000 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <5394B5F3.6050403@trueblade.com> References: <53920D91.3060207@simplistix.co.uk> <5394B5F3.6050403@trueblade.com> Message-ID: <20140608193746.GA1687@k2> On Sun, Jun 08, 2014 at 03:13:55PM -0400, Eric V. Smith wrote: > > The current implementation is also *really* easy to understand, > > while writing out the dynamic type creation explicitly would likely > > require much deeper knowledge of the type machinery to follow. 
> As proof that it's harder to understand, here's an example of that > dynamically creating functions and types: Probably I'm missing something, but there's a much simpler non-exec approach, something like: class _NamedTuple(...): ... def namedtuple(name, fields): cls = tuple(name, (_NamedTuple,), { '_fields': fields.split() }) for i, field_name in enumerate(cls._fields): prop = property(functools.partial(_NamedTuple.__getitem__, i) functools.partial(_NamedTuple.__setitem__, i)) setattr(cls, field_name, prop) return cls David From dw+python-dev at hmmz.org Sun Jun 8 21:38:47 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Sun, 8 Jun 2014 19:38:47 +0000 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <20140608193746.GA1687@k2> References: <53920D91.3060207@simplistix.co.uk> <5394B5F3.6050403@trueblade.com> <20140608193746.GA1687@k2> Message-ID: <20140608193847.GB1687@k2> On Sun, Jun 08, 2014 at 07:37:46PM +0000, dw+python-dev at hmmz.org wrote: > cls = tuple(name, (_NamedTuple,), { Ugh, this should of course have been type(). David From eric at trueblade.com Sun Jun 8 23:27:41 2014 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 08 Jun 2014 17:27:41 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <20140608193746.GA1687@k2> References: <53920D91.3060207@simplistix.co.uk> <5394B5F3.6050403@trueblade.com> <20140608193746.GA1687@k2> Message-ID: <5394D54D.9020507@trueblade.com> On 6/8/2014 3:37 PM, dw+python-dev at hmmz.org wrote: > On Sun, Jun 08, 2014 at 03:13:55PM -0400, Eric V. Smith wrote: > >>> The current implementation is also *really* easy to understand, >>> while writing out the dynamic type creation explicitly would likely >>> require much deeper knowledge of the type machinery to follow. 
> >> As proof that it's harder to understand, here's an example of that >> dynamically creating functions and types: > > Probably I'm missing something, but there's a much simpler non-exec > approach, something like: > > class _NamedTuple(...): > ... > > def namedtuple(name, fields): > cls = tuple(name, (_NamedTuple,), { > '_fields': fields.split() > }) > for i, field_name in enumerate(cls._fields): > prop = property(functools.partial(_NamedTuple.__getitem__, i) > functools.partial(_NamedTuple.__setitem__, i)) > setattr(cls, field_name, prop) > return cls How would you write _Namedtuple.__new__? From dw+python-dev at hmmz.org Sun Jun 8 23:51:35 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Sun, 8 Jun 2014 21:51:35 +0000 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <5394D54D.9020507@trueblade.com> References: <53920D91.3060207@simplistix.co.uk> <5394B5F3.6050403@trueblade.com> <20140608193746.GA1687@k2> <5394D54D.9020507@trueblade.com> Message-ID: <20140608215135.GA2970@k2> On Sun, Jun 08, 2014 at 05:27:41PM -0400, Eric V. Smith wrote: > How would you write _Namedtuple.__new__? Knew something must be missing :) Obviously it's possible, but not nearly as efficiently as reusing the argument parsing machinery as in the original implementation. I guess especially the kwargs implementation below would suck.. _undef = object() class _NamedTuple(...): def __new__(cls, *a, **kw): if kw: a = list(a) + ([_undef] * (len(self._fields)-len(a))) for k, v in kw.iteritems(): i = cls._name_id_map[k] if a[i] is not _undef: raise TypeError(...) a[i] = v if _undef not in a: return tuple.__new__(cls, a) raise TypeError(...) else: if len(a) == len(self._fields): return tuple.__new__(cls, a) raise TypeError(...) 
def namedtuple(name, fields): fields = fields.split() cls = type(name, (_NamedTuple,), { '_fields': fields, '_name_id_map': {k: i for i, k in enumerate(fields)} }) for i, field_name in enumerate(fields): getter = functools.partial(_NamedTuple.__getitem__, i) setattr(cls, field_name, property(getter)) return cls David From rdmurray at bitdance.com Mon Jun 9 00:44:02 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 08 Jun 2014 18:44:02 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: References: <53920D91.3060207@simplistix.co.uk> <20140607132525.2A2F9250D5C@webabinitio.net> Message-ID: <20140608224402.834E5250D4E@webabinitio.net> On Sat, 07 Jun 2014 10:50:16 -0400, Antoine Pitrou wrote: > Le 07/06/2014 09:25, R. David Murray a écrit : > > On Fri, 06 Jun 2014 19:50:57 +0100, Chris Withers wrote: > >> I've been trying to add support for explicit comparison of namedtuples > >> into testfixtures and hit a problem which lead me to read the source and > >> be sad. > >> > >> Rather than the mixin and class assembly in the function I expected to > >> find, I'm greeted by an exec of a string. > >> > >> Curious as to what lead to that implementation approach? What does it > >> buy that couldn't have been obtained by a mixin providing the functionality? > >> > >> In my case, that's somewhat irrelevant, I'm looking to store a comparer > >> in a registry that would get used for all namedtuples, but I have > >> nothing to key that off, there are no shared bases other than object and > >> tuple. > >> > >> I guess I could duck-type it based on the _fields attribute but that > >> feels implicit and fragile. > >> > >> What do you guys suggest? > > > > I seem to remember a previous discussion that concluded that duck typing > > based on _fields was the way to go. (It's a public API, despite the _, > > due to name-tuple's attribute namespacing issues.)
> > There could be many third-party classes with a _fields member, so that > sounds rather fragile. > There doesn't seem to be any technical reason barring the addition of a > common base class for namedtuples. For what it is worth, I found the discussion I was remembering: http://bugs.python.org/issue7796 And as someone pointed out down thread, the actual check is "inherits from tuple and has a _fields attribute". That gets you a duck type, which is generally what you want in Python. --David From antoine at python.org Mon Jun 9 01:32:11 2014 From: antoine at python.org (Antoine Pitrou) Date: Sun, 08 Jun 2014 19:32:11 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <20140608224402.834E5250D4E@webabinitio.net> References: <53920D91.3060207@simplistix.co.uk> <20140607132525.2A2F9250D5C@webabinitio.net> <20140608224402.834E5250D4E@webabinitio.net> Message-ID: Le 08/06/2014 18:44, R. David Murray a écrit : > > For what it is worth, I found the discussion I was remembering: > > http://bugs.python.org/issue7796 > > And as someone pointed out down thread, the actual check is "inherits > from tuple and has a _fields attribute". > > That gets you a duck type, which is generally what you want in Python. I think it's a bit complicated (and not obviously discoverable) as far as duck-typing goes. Regards Antoine. From steve at pearwood.info Mon Jun 9 01:31:17 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 9 Jun 2014 09:31:17 +1000 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <5394B5F3.6050403@trueblade.com> References: <53920D91.3060207@simplistix.co.uk> <5394B5F3.6050403@trueblade.com> Message-ID: <20140608233117.GS10355@ando> On Sun, Jun 08, 2014 at 03:13:55PM -0400, Eric V. Smith wrote: > On 6/7/2014 10:46 AM, Nick Coghlan wrote: > > On 7 June 2014 04:50, Chris Withers wrote: > >> Curious as to what lead to that implementation approach?
What does it buy > >> that couldn't have been obtained by a mixin providing the functionality? > > > > In principle, you could get the equivalent of collections.namedtuple > > through dynamically constructed classes. In practice, that's actually > > easier said than done, so the fact the current implementation works > > fine for almost all purposes acts as a powerful disincentive to > > rewriting it. The current implementation is also *really* easy to > > understand, while writing out the dynamic type creation explicitly > > would likely require much deeper knowledge of the type machinery to > > follow. > > As proof that it's harder to understand, here's an example of that > dynamically creating functions and types: [...] I wonder how a hybrid approach would work? Use a dynamically-created class, but then construct the __new__ method using exec and inject it into the new class. As far as I can see, it's only __new__ that benefits from the exec approach. Anyone tried this yet? Is it worth an experiment? -- Steven From raymond.hettinger at gmail.com Mon Jun 9 02:03:11 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 8 Jun 2014 17:03:11 -0700 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <20140607132525.2A2F9250D5C@webabinitio.net> References: <53920D91.3060207@simplistix.co.uk> <20140607132525.2A2F9250D5C@webabinitio.net> Message-ID: On Jun 7, 2014, at 6:25 AM, R. David Murray wrote: >> I guess I could duck-type it based on the _fields attribute but that >> feels implicit and fragile. >> >> What do you guys suggest? > > I seem to remember a previous discussion that concluded that duck typing > based on _fields was the way to go. (It's a public API, despite the _, > due to name-tuple's attribute namespacing issues.) Yes. That is the recommended approach. 
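[The recommended duck-type check can be wrapped in a tiny helper. The function name below is made up; the test itself is exactly the thread's "inherits from tuple and has a _fields attribute":]

```python
from collections import namedtuple

def is_named_tuple(obj):
    # Duck-type check recommended in this thread: a tuple subclass
    # that exposes the public _fields attribute.
    return isinstance(obj, tuple) and hasattr(obj, '_fields')

Point = namedtuple('Point', 'x y')
print(is_named_tuple(Point(1, 2)))  # True
print(is_named_tuple((1, 2)))       # False
```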
IIRC that was Guido's suggestion rather than creating an abstract base class for a named tuple (any tuple-like class with indexable elements that are also accessible using named attributes). Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Mon Jun 9 03:21:42 2014 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 08 Jun 2014 21:21:42 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <20140608233117.GS10355@ando> References: <53920D91.3060207@simplistix.co.uk> <5394B5F3.6050403@trueblade.com> <20140608233117.GS10355@ando> Message-ID: <53950C26.10006@trueblade.com> On 6/8/2014 7:31 PM, Steven D'Aprano wrote: > On Sun, Jun 08, 2014 at 03:13:55PM -0400, Eric V. Smith wrote: >> On 6/7/2014 10:46 AM, Nick Coghlan wrote: >>> On 7 June 2014 04:50, Chris Withers wrote: >>>> Curious as to what lead to that implementation approach? What does it buy >>>> that couldn't have been obtained by a mixin providing the functionality? >>> >>> In principle, you could get the equivalent of collections.namedtuple >>> through dynamically constructed classes. In practice, that's actually >>> easier said than done, so the fact the current implementation works >>> fine for almost all purposes acts as a powerful disincentive to >>> rewriting it. The current implementation is also *really* easy to >>> understand, while writing out the dynamic type creation explicitly >>> would likely require much deeper knowledge of the type machinery to >>> follow. >> >> As proof that it's harder to understand, here's an example of that >> dynamically creating functions and types: > [...] > > > I wonder how a hybrid approach would work? Use a dynamically-created > class, but then construct the __new__ method using exec and inject it > into the new class. As far as I can see, it's only __new__ that benefits > from the exec approach. > > Anyone tried this yet? Is it worth an experiment? 
I'm not sure what the benefit would be. Other than the ast manipulations for __new__, the rest of the non-exec code is easy to understand. Eric. From ncoghlan at gmail.com Mon Jun 9 03:42:59 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 9 Jun 2014 11:42:59 +1000 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: References: <53920D91.3060207@simplistix.co.uk> <20140607132525.2A2F9250D5C@webabinitio.net> Message-ID: On 9 Jun 2014 10:04, "Raymond Hettinger" wrote: > > > On Jun 7, 2014, at 6:25 AM, R. David Murray wrote: > >>> I guess I could duck-type it based on the _fields attribute but that >>> feels implicit and fragile. >>> >>> What do you guys suggest? >> >> >> I seem to remember a previous discussion that concluded that duck typing >> based on _fields was the way to go. (It's a public API, despite the _, >> due to name-tuple's attribute namespacing issues.) > > > Yes. That is the recommended approach. > > IIRC that was Guido's suggestion rather than creating an abstract > base class for a named tuple (any tuple-like class with indexable > elements that are also accessible using named attributes). Given the somewhat periodic recurrence of the question, might it be worth making an ABC after all, with "subclass of tuple with a _fields attribute" as its default check? "isinstance(obj, collections.NamedTupleABC)" is quite a bit more self-documenting than "isinstance(obj, tuple) and hasattr(obj, '_fields')" Cheers, Nick. > > > Raymond > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From raymond.hettinger at gmail.com Mon Jun 9 06:05:17 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 8 Jun 2014 21:05:17 -0700 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: References: <53920D91.3060207@simplistix.co.uk> <20140607132525.2A2F9250D5C@webabinitio.net> Message-ID: <97F551CA-E992-43F9-B988-B040DF1B7D76@gmail.com> On Jun 8, 2014, at 6:42 PM, Nick Coghlan wrote: > >> I seem to remember a previous discussion that concluded that duck typing > >> based on _fields was the way to go. (It's a public API, despite the _, > >> due to name-tuple's attribute namespacing issues.) > > > > > > Yes. That is the recommended approach. > > > > IIRC that was Guido's suggestion rather than creating an abstract > > base class for a named tuple (any tuple-like class with indexable > > elements that are also accessible using named attributes). > > Given the somewhat periodic recurrence of the question, might it be worth making an ABC after all, with "subclass of tuple with a _fields attribute" as its default check? > > "isinstance(obj, collections.NamedTupleABC)" is quite a bit more self-documenting than "isinstance(obj, tuple) and hasattr(obj, '_fields')" > The "isinstance(obj, tuple)" part isn't a requirement. The concept of a named tuple is meant to include structseq objects or user defined classes that are "tuple-like with indexable elements that are also accessible using named attributes" (see the definition in the glossary). I could add a note to the docs saying that hasattr(obj, '_fields') is the preferred way to check for named tuples produced by the namedtuple() factory function, but it would be a waste to introduce an ABC for this. (Consider the failure of the Callable() abc leading to us deciding to reintroduce the callable() builtin function, and consider the general unwillingness to test for iterability using the Iterable abc). Another issue is that a straight abc wouldn't be sufficient.
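For concreteness, the straight ABC being debated, with "subclass of tuple with a _fields attribute" as its default check, might be sketched via a __subclasshook__ (the class name is hypothetical, not a stdlib API):

```python
from abc import ABCMeta
from collections import namedtuple

class NamedTupleABC(metaclass=ABCMeta):
    """Hypothetical ABC: a tuple subclass that carries _fields."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is NamedTupleABC:
            # Note what this does NOT verify: that every name listed in
            # _fields is actually defined, or that structseq-style
            # non-tuple classes qualify.
            if issubclass(C, tuple) and hasattr(C, "_fields"):
                return True
        return NotImplemented

Point = namedtuple("Point", "x y")
```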
What we would really want to check for is: 1) the presence of a _fields tuple (an abc can do this) 2) that all of the attribute names specified in _fields are defined (ABCMeta doesn't do this) 3) and that the type is a Sequence (ABCMeta can do this). A tricked-out ABC extension might be worth it if it provided some non-trivial mixin capabilities for implementing homegrown named tuples (not created by the factory function), but I don't think we want to go there. The problem isn't important enough to warrant throwing this much code and a new API at it (duck-typing the attributes and checking for _fields is a practical solution that works even on older pythons). Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From sstewartgallus00 at mylangara.bc.ca Sun Jun 8 23:22:19 2014 From: sstewartgallus00 at mylangara.bc.ca (Steven Stewart-Gallus) Date: Sun, 08 Jun 2014 21:22:19 +0000 (GMT) Subject: [Python-Dev] Help with the build system and my first patch Message-ID: Hello, I would like some help understanding the build system. I am currently working on an issue (http://bugs.python.org/issue21627) and plan to create some common functionality in Python/setcloexec.c and Include/setcloexec.h that is conditionally compiled in on POSIX systems and not on Windows systems. I need to extract this functionality out from _Py_set_inheritable because it needs to run in the dangerous context of right after a fork and I don't believe it can throw exceptions. How can I conditionally compile some library code for certain platforms only? Thank you, Steven Stewart-Gallus From berker.peksag at gmail.com Mon Jun 9 11:31:14 2014 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Mon, 9 Jun 2014 12:31:14 +0300 Subject: [Python-Dev] [Python-checkins] cpython: Closes #21256: Printout of keyword args in deterministic order in mock calls.
In-Reply-To: <3gn6jt4bdPz7LjP@mail.python.org> References: <3gn6jt4bdPz7LjP@mail.python.org> Message-ID: On Mon, Jun 9, 2014 at 11:16 AM, kushal.das wrote: > http://hg.python.org/cpython/rev/8e05e15901a8 > changeset: 91102:8e05e15901a8 > user: Kushal Das > date: Mon Jun 09 13:45:56 2014 +0530 > summary: > Closes #21256: Printout of keyword args in deterministic order in mock calls. > > Printout of keyword args should be in deterministic order in > a mock function call. This will help to write better doctests. > > files: > Lib/unittest/mock.py | 2 +- > Lib/unittest/test/testmock/testmock.py | 6 ++++++ > Misc/NEWS | 3 +++ > 3 files changed, 10 insertions(+), 1 deletions(-) > > > diff --git a/Lib/unittest/mock.py b/Lib/unittest/mock.py > --- a/Lib/unittest/mock.py > +++ b/Lib/unittest/mock.py > @@ -1894,7 +1894,7 @@ > formatted_args = '' > args_string = ', '.join([repr(arg) for arg in args]) > kwargs_string = ', '.join([ > - '%s=%r' % (key, value) for key, value in kwargs.items() > + '%s=%r' % (key, value) for key, value in sorted(kwargs.items()) > ]) > if args_string: > formatted_args = args_string > diff --git a/Lib/unittest/test/testmock/testmock.py b/Lib/unittest/test/testmock/testmock.py > --- a/Lib/unittest/test/testmock/testmock.py > +++ b/Lib/unittest/test/testmock/testmock.py > @@ -1206,6 +1206,12 @@ > with self.assertRaises(AssertionError): > m.hello.assert_not_called() > > + #Issue21256 printout of keyword args should be in deterministic order > + def test_sorted_call_signature(self): > + m = Mock() > + m.hello(name='hello', daddy='hero') > + text = "call(daddy='hero', name='hello')" > + self.assertEquals(repr(m.hello.call_args), text) Should this be assertEqual instead? --Berker > > def test_mock_add_spec(self): > class _One(object): > diff --git a/Misc/NEWS b/Misc/NEWS > --- a/Misc/NEWS > +++ b/Misc/NEWS > @@ -92,6 +92,9 @@ > Library > ------- > > +- Issue #21256: Printout of keyword args should be in deterministic order in > + a mock function call. 
This will help to write better doctests. > + > - Issue #21677: Fixed chaining nonnormalized exceptions in io close() methods. > > - Issue #11709: Fix the pydoc.help function to not fail when sys.stdin is not a > > -- > Repository URL: http://hg.python.org/cpython > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > https://mail.python.org/mailman/listinfo/python-checkins > From antoine at python.org Mon Jun 9 13:40:41 2014 From: antoine at python.org (Antoine Pitrou) Date: Mon, 09 Jun 2014 07:40:41 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <97F551CA-E992-43F9-B988-B040DF1B7D76@gmail.com> References: <53920D91.3060207@simplistix.co.uk> <20140607132525.2A2F9250D5C@webabinitio.net> <97F551CA-E992-43F9-B988-B040DF1B7D76@gmail.com> Message-ID: Le 09/06/2014 00:05, Raymond Hettinger a ?crit : > > Another issue is that a straight abc wouldn't be sufficient. What we > would really want is to check for is: > 1) the presence of a _fields tuple (an abc can do this) > 2) to check that all of the attribute names specified in _fields are > defined (ABCMeta doesn't do this) > 3) and that the type is a Sequence (ABCMeta can do this). > > An tricked-out ABC extension might be worth it if it provided some > non-trivial mixin capabilities for implementing homegrown named tuples > (not created by the factory function), but I don't think we want to go > there. Instead of an ABC, why not a simple is_namedtuple() function? Regards Antoine. From bcannon at gmail.com Mon Jun 9 16:01:18 2014 From: bcannon at gmail.com (Brett Cannon) Date: Mon, 09 Jun 2014 14:01:18 +0000 Subject: [Python-Dev] cpython and python debugger documentation References: Message-ID: On Sat Jun 07 2014 at 5:55:29 PM, Le Pa wrote: > Hi, > > I am interested in learning how the cpython interpreter is designed and > implemented, > and also how the python debugger works internally. 
My ultimate purpose is > to > modify > them for my distributed computing needs. Are there any documentations > on these please? I have done some goggling but failed to find anything > useful. > > Thanks you very much for your help! > The only documentation we have is (roughly) how the parser and compiler work, not the interpreter. As for pdb, it's written in Python so you can look at the source to see how that works without much issue. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bcannon at gmail.com Mon Jun 9 16:03:01 2014 From: bcannon at gmail.com (Brett Cannon) Date: Mon, 09 Jun 2014 14:03:01 +0000 Subject: [Python-Dev] Help with the build system and my first patch References: Message-ID: On Mon Jun 09 2014 at 2:07:22 AM, Steven Stewart-Gallus < sstewartgallus00 at mylangara.bc.ca> wrote: > Hello, > > I would like some help understanding the build system. I am currently > working on an issue (http://bugs.python.org/issue21627) and plan to > create some common functionality in Python/setcloexec.c and > Include/setcloexec.h that is conditionally compiled in on POSIX > systems and not on Windows systems. I need to extract this > functionality out from _Py_set_inheritable because it needs to run in > the dangerous context of right after a fork and I don't believe it can > throw exceptions. How can I conditionally compile some library code > for certain platforms only? > Do you mean other than potentially detecting something in the configure script and using an #ifdef guard? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pmiscml at gmail.com Mon Jun 9 16:50:02 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Mon, 9 Jun 2014 17:50:02 +0300 Subject: [Python-Dev] cpython and python debugger documentation In-Reply-To: References: Message-ID: <20140609175002.6ad27c91@x34f> Hello, On Mon, 09 Jun 2014 14:01:18 +0000 Brett Cannon wrote: > On Sat Jun 07 2014 at 5:55:29 PM, Le Pa wrote: > > > Hi, > > > > I am interested in learning how the cpython interpreter is designed > > and implemented, > > and also how the python debugger works internally. My ultimate > > purpose is to > > modify > > them for my distributed computing needs. Are there any > > documentations on these please? I have done some goggling but > > failed to find anything useful. > > > > Thanks you very much for your help! > > > > The only documentation we have is (roughly) how the parser and > compiler work, not the interpreter. As for pdb, it's written in > Python so you can look at the source to see how that works without > much issue. But doing attentive googling will turn out a lot of 3rd-party blog posts which discuss various implementation aspects of CPython (and even alternative implementations). Some random links: http://tech.blog.aknin.name/category/my-projects/pythons-innards/ http://eli.thegreenplace.net/2010/09/18/python-internals-symbol-tables-part-1/ http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html One should keep in mind that implementation evolves all the time, and any info in older docs may be obsolete. So, the ultimate reference is the source itself, but posts like above can be a good help to understand it more easily and effectively. 
-- Best regards, Paul mailto:pmiscml at gmail.com From eliben at gmail.com Mon Jun 9 18:26:25 2014 From: eliben at gmail.com (Eli Bendersky) Date: Mon, 9 Jun 2014 09:26:25 -0700 Subject: [Python-Dev] cpython and python debugger documentation In-Reply-To: <20140609175002.6ad27c91@x34f> References: <20140609175002.6ad27c91@x34f> Message-ID: On Mon, Jun 9, 2014 at 7:50 AM, Paul Sokolovsky wrote: > Hello, > > On Mon, 09 Jun 2014 14:01:18 +0000 > Brett Cannon wrote: > > > On Sat Jun 07 2014 at 5:55:29 PM, Le Pa wrote: > > > > > Hi, > > > > > > I am interested in learning how the cpython interpreter is designed > > > and implemented, > > > and also how the python debugger works internally. My ultimate > > > purpose is to > > > modify > > > them for my distributed computing needs. Are there any > > > documentations on these please? I have done some goggling but > > > failed to find anything useful. > > > > > > Thanks you very much for your help! > > > > > > > The only documentation we have is (roughly) how the parser and > > compiler work, not the interpreter. As for pdb, it's written in > > Python so you can look at the source to see how that works without > > much issue. > > But doing attentive googling will turn out a lot of 3rd-party blog > posts which discuss various implementation aspects of CPython (and even > alternative implementations). Some random links: > > http://tech.blog.aknin.name/category/my-projects/pythons-innards/ > > http://eli.thegreenplace.net/2010/09/18/python-internals-symbol-tables-part-1/ > FWIW I have a bunch of those, and the symbol table one is probably not the best for beginners. The whole category is here: http://eli.thegreenplace.net/category/programming/python/python-internals/ Eli -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From raymond.hettinger at gmail.com Mon Jun 9 18:34:31 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 9 Jun 2014 09:34:31 -0700 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: References: <53920D91.3060207@simplistix.co.uk> <20140607132525.2A2F9250D5C@webabinitio.net> <97F551CA-E992-43F9-B988-B040DF1B7D76@gmail.com> Message-ID: On Jun 9, 2014, at 4:40 AM, Antoine Pitrou wrote: > Instead of an ABC, why not a simple is_namedtuple() function? That would work. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From sstewartgallus00 at mylangara.bc.ca Mon Jun 9 19:48:27 2014 From: sstewartgallus00 at mylangara.bc.ca (Steven Stewart-Gallus) Date: Mon, 09 Jun 2014 17:48:27 +0000 (GMT) Subject: [Python-Dev] Help with the build system and my first patch In-Reply-To: References: Message-ID: > Do you mean other than potentially detecting something in the > configurescript and using an #ifdef guard? Yes, that works on a static function inside a file level but I need to conditionally include a whole file into the build. From tjreedy at udel.edu Mon Jun 9 20:44:03 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 09 Jun 2014 14:44:03 -0400 Subject: [Python-Dev] cpython and python debugger documentation In-Reply-To: References: <20140609175002.6ad27c91@x34f> Message-ID: On 6/9/2014 12:26 PM, Eli Bendersky wrote: > > > > On Mon, Jun 9, 2014 at 7:50 AM, Paul Sokolovsky > wrote: > > Hello, > > On Mon, 09 Jun 2014 14:01:18 +0000 > Brett Cannon > wrote: > > > On Sat Jun 07 2014 at 5:55:29 PM, Le Pa > wrote: > > > > > Hi, > > > > > > I am interested in learning how the cpython interpreter is designed > > > and implemented, > > > and also how the python debugger works internally. My ultimate > > > purpose is to > > > modify > > > them for my distributed computing needs. Are there any > > > documentations on these please? I have done some goggling but > > > failed to find anything useful. 
> > > > > > Thanks you very much for your help! > > > > > > > The only documentation we have is (roughly) how the parser and > > compiler work, not the interpreter. As for pdb, it's written in > > Python so you can look at the source to see how that works without > > much issue. > > But doing attentive googling will turn out a lot of 3rd-party blog > posts which discuss various implementation aspects of CPython (and even > alternative implementations). Some random links: > > http://tech.blog.aknin.name/category/my-projects/pythons-innards/ > http://eli.thegreenplace.net/2010/09/18/python-internals-symbol-tables-part-1/ > > > FWIW I have a bunch of those, and the symbol table one is probably not > the best for beginners. The whole category is here: > http://eli.thegreenplace.net/category/programming/python/python-internals/ Perhaps someone could make a wiki entry such as PythonInternals with links such as these. -- Terry Jan Reedy From bcannon at gmail.com Mon Jun 9 20:45:46 2014 From: bcannon at gmail.com (Brett Cannon) Date: Mon, 09 Jun 2014 18:45:46 +0000 Subject: [Python-Dev] Help with the build system and my first patch References: Message-ID: On Mon Jun 09 2014 at 1:48:27 PM, Steven Stewart-Gallus < sstewartgallus00 at mylangara.bc.ca> wrote: > > Do you mean other than potentially detecting something in the > > configure script and using an #ifdef guard? > > Yes, that works on a static function inside a file level but I need to > conditionally include a whole file into the build. > Why specifically does the file itself need to be conditional? Typically you unconditionally include the whole file and then put the entire contents of it in an #ifdef guard. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From pmiscml at gmail.com Tue Jun 10 04:23:12 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Tue, 10 Jun 2014 05:23:12 +0300 Subject: [Python-Dev] Criticism of execfile() removal in Python3 Message-ID: <20140610052312.280e49c9@x34f> Hello, I was pleasantly surprised with the response to recent post about MicroPython implementation details (https://mail.python.org/pipermail/python-dev/2014-June/134718.html). I hope that discussion means that posts about alternative implementations are not unwelcome here, so I would like to bring up another (of many) issues we faced while implementing MicroPython. execfile() builtin function was removed in 3.0. This brings few problems: 1. It hampers interactive mode - instead of short and easy to type execfile("file.py") one needs to use exec(open("file.py").read()). I'm sure that's not going to bother a lot of people - after all, the easiest way to execute a Python file is to drop back to shell and restart python with file name, using all wonders of tab completion. But now imagine that Python interpreter runs on bare hardware, and its REPL is the only shell. That's exactly what we have with MicroPython's Cortex-M port. But it's not really MicroPython-specific, there's CPython port to baremetal either - http://www.pycorn.org/ . 2. Ok, assuming that exec(open().read()) idiom is still a way to go, there's a problem - it requires to load entire file to memory. But there can be not enough memory. Consider 1Mb file with 900Kb comments (autogenerated, for example). execfile() could easily parse it, using small buffer. But exec() requires to slurp entire file into memory, and 1Mb is much more than heap sizes that we target. Comments, suggestions? Just to set a productive direction, please kindly don't consider the problems above as MicroPython's. 
I very much liked how last discussion went: I was pointed that https://docs.python.org/3/reference/index.html is not really a CPython reference, it's a *Python* reference, and there was even a motion to clarify in it some points which came out of the MicroPython discussion. So, what about https://docs.python.org/3/library/index.html - is it the CPython, or the Python, standard library specification? Assuming the latter, what we have is that, by the removal of a previously available feature, *Python* became less friendly for interactive usage and less scalable. Thanks, Paul mailto:pmiscml at gmail.com From steve at pearwood.info Tue Jun 10 05:03:03 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 10 Jun 2014 13:03:03 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140610052312.280e49c9@x34f> References: <20140610052312.280e49c9@x34f> Message-ID: <20140610030303.GU10355@ando> On Tue, Jun 10, 2014 at 05:23:12AM +0300, Paul Sokolovsky wrote: > execfile() builtin function was removed in 3.0. This brings few > problems: > > 1. It hampers interactive mode - instead of short and easy to type > execfile("file.py") one needs to use exec(open("file.py").read()). If the amount of typing is the problem, that's easy to solve: # do this once def execfile(name): exec(open(name).read()) Another possibility is: os.system("python file.py") > 2. Ok, assuming that exec(open().read()) idiom is still a way to go, > there's a problem - it requires to load entire file to memory. But > there can be not enough memory. Consider 1Mb file with 900Kb comments > (autogenerated, for example). execfile() could easily parse it, using > small buffer. But exec() requires to slurp entire file into memory, and > 1Mb is much more than heap sizes that we target. There's nothing stopping alternative implementations having their own implementation-specific standard library modules.
steve at orac:/home/s$ jython Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19) [OpenJDK Server VM (Sun Microsystems Inc.)] on java1.6.0_27 Type "help", "copyright", "credits" or "license" for more information. >>> import java >>> So you could do this: from upy import execfile execfile("file.py") So long as you make it clear that this is a platform specific module, and don't advertise it as a language feature, I see no reason why you cannot do that. -- Steven From tjreedy at udel.edu Tue Jun 10 05:56:09 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 09 Jun 2014 23:56:09 -0400 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140610030303.GU10355@ando> References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> Message-ID: On 6/9/2014 11:03 PM, Steven D'Aprano wrote: > On Tue, Jun 10, 2014 at 05:23:12AM +0300, Paul Sokolovsky wrote: > >> execfile() builtin function was removed in 3.0. Because it was hardly ever used. For short bits of code, it is usually inferior to exec with a string in the file. For substantial bits of code, it is generally inferior to 'from file import *' and does not have the option of other forms of import. For startup code that you want every session, it is inferior to PYTHONSTARTUP or custom site module. >> This brings few problems: >> 1. It hampers interactive mode - instead of short and easy to type >> execfile("file.py") one needs to use exec(open("file.py").read()) > If the amount of typing is the problem, that's easy to solve: > > # do this once > def execfile(name): > exec(open(name).read()) > > Another possibility is: > > os.system("python file.py") > > >> 2. Ok, assuming that exec(open().read()) idiom is still a way to go, >> there's a problem - it requires to load entire file to memory. But >> there can be not enough memory. Consider 1Mb file with 900Kb comments >> (autogenerated, for example). execfile() could easily parse it, using >> small buffer.
But exec() requires to slurp entire file into memory, and >> 1Mb is much more than heap sizes that we target. Execfile could slurp the whole file into memory too. Next parse the entire file. Then execute the entire bytecode. Finally toss the bytecode so that the file has to be reparsed next time it is used. > There's nothing stopping alternative implementations having their own > implementation-specific standard library modules. ... > So you could do this: > > from upy import execfile > execfile("file.py") > > So long as you make it clear that this is a platform specific module, > and don't advertise it as a language feature, I see no reason why you > cannot do that. If you want execfile as a substitute for 'python -i file' on the unavailable command console, you should have the option to restore globals to initial condition. Something like (untested) # startup entries in globals in CPython 3.4.1 startnames = {'__spec__', '__name__', '__builtins__', '__doc__', '__loader__', '__package__'} def execfile(file, encoding='utf-8', restart=True): glodict = globals() code = open(file, 'r', encoding=encoding) # don't restart if the file does not open if restart: for name in list(glodict): if name not in startnames: del glodict[name] for statement in statements(code): # statements is a statement iterator exec(statement, glodict, glodict) # exec takes globals/locals positionally -- Terry Jan Reedy From benhoyt at gmail.com Tue Jun 10 06:02:14 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 10 Jun 2014 00:02:14 -0400 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() Message-ID: Hi folks, As pointed out to me recently in an issue report [1] on my scandir module, Python's os.stat() simply discards most of the file attribute information fetched via the Win32 system calls. On Windows, os.stat() calls CreateFile to open the file and get the dwFileAttributes value, but it throws it all away except the FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_READONLY bits.
See CPython source at [2]. Given that os.stat() returns extended, platform-specific file attributes on Linux and OS X platforms (see [3] -- for example, st_blocks, st_rsize, etc), it seems that Windows is something of a second-class citizen here. There are several questions on StackOverflow about how to get this information on Windows, and one has to resort to ctypes. For example, [4]. To solve this problem, what do people think about adding an "st_winattrs" attribute to the object returned by os.stat() on Windows? Then, similarly to existing code like hasattr(st, 'st_blocks') on Linux, you could write a cross-platform function to determine if a file was hidden, something like so: FILE_ATTRIBUTE_HIDDEN = 2 # constant defined in Windows.h def is_hidden(path): if os.path.basename(path).startswith('.'): return True st = os.stat(path) if hasattr(st, 'st_winattrs') and st.st_winattrs & FILE_ATTRIBUTE_HIDDEN: return True return False I'd be interested to hear people's thoughts on this. Thanks, Ben. [1]: https://github.com/benhoyt/scandir/issues/22 [2]: https://github.com/python/cpython/blob/master/Modules/posixmodule.c#L1462 [3]: https://docs.python.org/3.4/library/os.html#os.stat [4]: http://stackoverflow.com/a/6365265 From jim.baker at zyasoft.com Tue Jun 10 06:41:19 2014 From: jim.baker at zyasoft.com (Jim Baker) Date: Mon, 9 Jun 2014 22:41:19 -0600 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140610030303.GU10355@ando> References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> Message-ID: On Mon, Jun 9, 2014 at 9:03 PM, Steven D'Aprano wrote: > ... > There's nothing stopping alternative implementations having their own > implementation-specific standard library modules. > > steve at orac:/home/s$ jython > Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19) > [OpenJDK Server VM (Sun Microsystems Inc.)] on java1.6.0_27 > Type "help", "copyright", "credits" or "license" for more information.
> >>> import java > >>> > > Small nit: Jython does implement a number of implementation-specific modules in its version of the standard library; jarray comes to mind, which is mostly but not completely superseded by the standard array module. However, the java package namespace is not part of the standard library, it's part of the standard Java ecosystem and it's due to a builtin import hook: Jython 2.7b3+ (default:6cee6fef06f0, Jun 9 2014, 22:29:14) [Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path ['', '/home/jbaker/jythondev/jython27/dist/Lib', '__classpath__', '__pyclasspath__/', '/home/jbaker/.local/lib/jython2.7/site-packages', '/home/jbaker/jythondev/jython27/dist/Lib/site-packages'] The entry __classpath__ means search CLASSPATH for Java packages; this includes the Java runtime, rt.jar, from which you get package namespaces as java.*, javax.*, sun.*, etc. Another behavior that you get for free in Jython is being able to also import the org.python.* namespace, which is Jython's own runtime. Some of the implementations of standard library modules, such as threading, take advantage of this support. - Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jun 10 09:36:02 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 Jun 2014 17:36:02 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140610052312.280e49c9@x34f> References: <20140610052312.280e49c9@x34f> Message-ID: On 10 June 2014 12:23, Paul Sokolovsky wrote: > 1. It hampers interactive mode - instead of short and easy to type > execfile("file.py") one needs to use exec(open("file.py").read()). 
I'm > sure that's not going to bother a lot of people - after all, the > easiest way to execute a Python file is to drop back to shell and > restart python with file name, using all wonders of tab completion. But > now imagine that Python interpreter runs on bare hardware, and its REPL > is the only shell. That's exactly what we have with MicroPython's > Cortex-M port. But it's not really MicroPython-specific, there's > CPython port to baremetal either - http://www.pycorn.org/ . https://docs.python.org/3/library/runpy.html#runpy.run_path import runpy file_globals = runpy.run_path("file.py") The standard implementation of run_path reads the whole file into memory, but MicroPython would be free to optimise that and do statement by statement execution instead (while that will pose some challenges in terms of handling encoding cookies, future imports, etc correctly, it's certainly feasible). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Tue Jun 10 10:37:12 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 10 Jun 2014 09:37:12 +0100 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: Message-ID: On 10 June 2014 05:02, Ben Hoyt wrote: > To solve this problem, what do people think about adding an > "st_winattrs" attribute to the object returned by os.stat() on > Windows? +1. Given the precedent of Linux- and OS X-specific attributes, this seems like a no-brainer to me. 
Paul From p.f.moore at gmail.com Tue Jun 10 10:41:16 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 10 Jun 2014 09:41:16 +0100 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> Message-ID: On 10 June 2014 08:36, Nick Coghlan wrote: > The standard implementation of run_path reads the whole file into > memory, but MicroPython would be free to optimise that and do > statement by statement execution instead (while that will pose some > challenges in terms of handling encoding cookies, future imports, etc > correctly, it's certainly feasible). ... and if they did optimise that way, I would imagine that the patch would be a useful contribution back to the core Python stdlib, rather than remaining a MicroPython-specific optimisation. Paul From ncoghlan at gmail.com Tue Jun 10 11:07:40 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 Jun 2014 19:07:40 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> Message-ID: On 10 Jun 2014 18:41, "Paul Moore" wrote: > > On 10 June 2014 08:36, Nick Coghlan wrote: > > The standard implementation of run_path reads the whole file into > > memory, but MicroPython would be free to optimise that and do > > statement by statement execution instead (while that will pose some > > challenges in terms of handling encoding cookies, future imports, etc > > correctly, it's certainly feasible). > > ... and if they did optimise that way, I would imagine that the patch > would be a useful contribution back to the core Python stdlib, rather > than remaining a MicroPython-specific optimisation. I believe it's a space/speed trade-off, so I'd be surprised if it made sense for CPython in general. There are also some behavioural differences when it comes to handling syntax errors. 
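A statement-at-a-time executor of the kind being discussed can be sketched as below. This is a hypothetical illustration only: it still parses the whole file up front, so it does not by itself fix the memory problem, but it makes the execution semantics concrete, in particular that statements before a failing one have already run.

```python
import ast

def run_path_by_statement(path, init_globals=None):
    # Hypothetical helper, not a runpy API: execute a file one
    # top-level statement at a time, sharing one globals dict.
    globs = {"__name__": "<run_path_by_statement>"}
    if init_globals:
        globs.update(init_globals)
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    for node in tree.body:
        # Wrap each top-level statement in its own module and run it.
        module = ast.Module(body=[node], type_ignores=[])
        exec(compile(module, path, "exec"), globs)
    return globs
```

A truly incremental reader would replace the single ast.parse() call with a buffered tokenizer, and would also need the special handling for encoding cookies and __future__ imports mentioned above.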
Now that I think about the idea a bit more, if the MicroPython folks can get a low memory usage incremental file execution model working, the semantic differences mean it would likely make the most sense as a separate API in runpy, rather than as an implicit change to run_path. Cheers, Nick. > > Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Jun 10 11:34:57 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 10 Jun 2014 11:34:57 +0200 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: Message-ID: 2014-06-10 6:02 GMT+02:00 Ben Hoyt : > To solve this problem, what do people think about adding an > "st_winattrs" attribute to the object returned by os.stat() on > Windows? > (...) > FILE_ATTRIBUTE_HIDDEN = 2 # constant defined in Windows.h > > if hasattr(st, 'st_winattrs') and st.st_winattrs & FILE_ATTRIBUTE_HIDDEN: I don't like such API, it requires to import constants, use masks, etc. I would prefer something like: if st.win_hidden: ... Or maybe: if st.winattrs.hidden: ... 
Victor From python at mrabarnett.plus.com Tue Jun 10 14:03:14 2014 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 10 Jun 2014 13:03:14 +0100 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: Message-ID: <5396F402.3030309@mrabarnett.plus.com> On 2014-06-10 05:02, Ben Hoyt wrote: [snip] > > FILE_ATTRIBUTE_HIDDEN = 2 # constant defined in Windows.h > > def is_hidden(path): > if os.path.basename(path).startswith('.'): > return True > st = os.stat(path) > if hasattr(st, 'st_winattrs') and st.st_winattrs & FILE_ATTRIBUTE_HIDDEN: That could be written more succinctly as: if getattr(st, 'st_winattrs', 0) & FILE_ATTRIBUTE_HIDDEN: > return True > return False > From benhoyt at gmail.com Tue Jun 10 14:19:54 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 10 Jun 2014 08:19:54 -0400 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: Message-ID: > > FILE_ATTRIBUTE_HIDDEN = 2 # constant defined in Windows.h > > > > if hasattr(st, 'st_winattrs') and st.st_winattrs & FILE_ATTRIBUTE_HIDDEN: > > I don't like such API, it requires to import constants, use masks, etc. > > I would prefer something like: > > if st.win_hidden: ... > > Or maybe: > > if st.winattrs.hidden: ... Yes, fair call. However, it looks like the precedent for the attributes in os.stat()'s return value has long since been set -- this is OS-specific stuff. For example, what's in "st_flags"? It's not documented, but comes straight from the OS. Same with st_rdev, st_type, etc -- the documentation doesn't define them, and it looks like they're OS-specific values. I don't think the st.win_hidden approach gains us much, because the next person is going to ask for the FILE_ATTRIBUTE_ENCRYPTED or FILE_ATTRIBUTE_COMPRESSED flag. So we really need all the bits or nothing. I don't mind the st.st_winattrs.hidden approach, except that we'd need 17 sub-attributes, and they'd all have to be documented.
And if Windows added another attribute, Python wouldn't have it, etc. So I think the OS-defined constant is the way to go. Because these are fixed-forever constants, I suspect in library code and the like people would just KISS and use an integer literal and a comment, avoiding the import/constant thing: if getattr(st, 'st_winattrs', 0) & 2: # FILE_ATTRIBUTE_HIDDEN ... -Ben From benhoyt at gmail.com Tue Jun 10 14:20:56 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 10 Jun 2014 08:20:56 -0400 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: <5396F402.3030309@mrabarnett.plus.com> References: <5396F402.3030309@mrabarnett.plus.com> Message-ID: >> if hasattr(st, 'st_winattrs') and st.st_winattrs & >> FILE_ATTRIBUTE_HIDDEN: > > That could be written more succinctly as: > > if getattr(st, 'st_winattrs', 0) & FILE_ATTRIBUTE_HIDDEN: > >> return True >> return False Yes, good call. Or one further: return getattr(st, 'st_winattrs', 0) & FILE_ATTRIBUTE_HIDDEN != 0 -Ben From p.f.moore at gmail.com Tue Jun 10 14:44:43 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 10 Jun 2014 13:44:43 +0100 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: Message-ID: On 10 June 2014 13:19, Ben Hoyt wrote: > Because these are fixed-forever constants, I suspect in library code > and the like people would just KISS and use an integer literal and a > comment, avoiding the import/constant thing: The stat module exposes a load of constants - why not add the (currently known) ones there? Finding the values of Windows constants if you don't have access to the C headers can be a pain, so having them defined *somewhere* as named values is useful. 
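[Editor's note: pulling the thread's suggestions together, a hedged sketch of the proposed usage. Both st_winattrs and the attribute value are the proposal's hypothetical names taken from the discussion, not an existing API; the getattr() default makes the check a harmless no-op on platforms without the field.]

```python
import os

# Proposed constant; value taken from Windows.h per the thread.
FILE_ATTRIBUTE_HIDDEN = 2

def is_hidden(path):
    # Unix convention: dotfiles are hidden.
    if os.path.basename(path).startswith('.'):
        return True
    st = os.stat(path)
    # On platforms without the proposed st_winattrs field, getattr()
    # falls back to 0 and the bitmask test is simply skipped.
    return getattr(st, 'st_winattrs', 0) & FILE_ATTRIBUTE_HIDDEN != 0
```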
Paul From benhoyt at gmail.com Tue Jun 10 14:58:19 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 10 Jun 2014 08:58:19 -0400 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: Message-ID: > The stat module exposes a load of constants - why not add the > (currently known) ones there? Finding the values of Windows constants > if you don't have access to the C headers can be a pain, so having > them defined *somewhere* as named values is useful. So stat.FILE_ATTRIBUTES_HIDDEN and the like? Alternatively they could go in ctypes.wintypes, but I think stat makes more sense in this case. -Ben From rdmurray at bitdance.com Tue Jun 10 15:05:55 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 10 Jun 2014 09:05:55 -0400 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> Message-ID: <20140610130555.7B71A250D5E@webabinitio.net> On Tue, 10 Jun 2014 19:07:40 +1000, Nick Coghlan wrote: > On 10 Jun 2014 18:41, "Paul Moore" wrote: > > > > On 10 June 2014 08:36, Nick Coghlan wrote: > > > The standard implementation of run_path reads the whole file into > > > memory, but MicroPython would be free to optimise that and do > > > statement by statement execution instead (while that will pose some > > > challenges in terms of handling encoding cookies, future imports, etc > > > correctly, it's certainly feasible). > > > > ... and if they did optimise that way, I would imagine that the patch > > would be a useful contribution back to the core Python stdlib, rather > > than remaining a MicroPython-specific optimisation. > > I believe it's a space/speed trade-off, so I'd be surprised if it made > sense for CPython in general. There are also some behavioural differences > when it comes to handling syntax errors. 
> > Now that I think about the idea a bit more, if the MicroPython folks can > get a low memory usage incremental file execution model working, the > semantic differences mean it would likely make the most sense as a separate > API in runpy, rather than as an implicit change to run_path. If it is a separate API, it seems like there's no reason it couldn't be contributed back to CPython. There might be other contexts in which low memory would be the right tradeoff. Although, if key bits end up working at the C level, "contributing back" might require writing separate C for CPython, so that might not happen. --David From ncoghlan at gmail.com Tue Jun 10 15:11:18 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 Jun 2014 23:11:18 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140610130555.7B71A250D5E@webabinitio.net> References: <20140610052312.280e49c9@x34f> <20140610130555.7B71A250D5E@webabinitio.net> Message-ID: On 10 June 2014 23:05, R. David Murray wrote: > On Tue, 10 Jun 2014 19:07:40 +1000, Nick Coghlan wrote: >> I believe it's a space/speed trade-off, so I'd be surprised if it made >> sense for CPython in general. There are also some behavioural differences >> when it comes to handling syntax errors. >> >> Now that I think about the idea a bit more, if the MicroPython folks can >> get a low memory usage incremental file execution model working, the >> semantic differences mean it would likely make the most sense as a separate >> API in runpy, rather than as an implicit change to run_path. > > If it is a separate API, it seems like there's no reason it couldn't be > contributed back to CPython. There might be other contexts in which > low memory would be the right tradeoff. Although, if key bits end > up working at the C level, "contributing back" might require writing > separate C for CPython, so that might not happen. 
Yeah, as a separate API it could make sense in CPython - I just didn't go back and revise the first paragraph after writing the second one :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Tue Jun 10 15:22:04 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 10 Jun 2014 14:22:04 +0100 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: Message-ID: On 10 June 2014 13:58, Ben Hoyt wrote: > So stat.FILE_ATTRIBUTES_HIDDEN and the like? Yep. (Maybe WIN_FILE_ATTRIBUTES_HIDDEN, but the Unix ones don't have an OA name prefix, so I'd go with your original). Paul From Steve.Dower at microsoft.com Tue Jun 10 18:30:24 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 10 Jun 2014 16:30:24 +0000 Subject: [Python-Dev] Python 3.5 on VC14 - update Message-ID: For anyone who is interested in more details on the CRT changes, there's a blog post from my colleague who worked on most of them at http://blogs.msdn.com/b/vcblog/archive/2014/06/10/the-great-crt-refactoring.aspx I wanted to call out one section and add some details: In order to unify these different CRTs [desktop, phone, etc], we have split the CRT into three pieces: 1. VCRuntime (vcruntime140.dll): This DLL contains all of the runtime functionality required for things like process startup and exception handling, and functionality that is coupled to the compiler for one reason or another. We may need to make breaking changes to this library in the future. 2. AppCRT (appcrt140.dll): This DLL contains all of the functionality that is usable on all platforms. This includes the heap, the math library, the stdio and locale libraries, most of the string manipulation functions, the time library, and a handful of other functions. We will maintain backwards compatibility for this part of the CRT. 3. 
DesktopCRT (desktopcrt140.dll): This DLL contains all of the functionality that is usable only by desktop apps. Notably, this includes the functions for working with multibyte strings, the exec and spawn process management functions, and the direct-to-console I/O functions. We will maintain backwards compatibility for this part of the CRT. The builds of Python I've already made are indeed linked against these three DLLs, though it happens transparently. Most of the APIs are from the AppCRT, which is a good sign as it will simplify portability to other Windows-based platforms (though the direct references to the Win32 API will arise again to complicate this). Very few functions are imported from VCRuntime, which is the only part that *may* have breaking changes in the future (that's the current promise, and I'd expect it to be strengthened one way or the other by release). Apart from the standard memcpy/strcpy type functions (which may be moved in later builds), these other imports are compiler helpers: * void terminate(void) (currently exported as a decorated C++ function, but that's going to be fixed) * __vcrt_TerminateProcess * __vcrt_UnhandledException * __vcrt_cleanup_type_info_names * _except_handler4_common * _local_unwind4 I've checked with our CRT dev and he says that these don't keep any state (and won't cause problems like we've seen in the past with FILE*), and are only there to deal with potential C++ exceptions - they are included at a point where it is impossible to tell whether C++ is involved, and so can't be removed. My builds pass almost all of regrtest.py and the only issues are with Tcl/tk and OpenSSL, which need to update their compiler version detection. I've built them with changes, though as usual Tcl/tk is a real pain.
I ran a quick test with profile-guided optimization (PGO, pronounced "pogo"), which has supposedly been improved since VC9, and saw a very unscientific 20% speed improvement on pybench.py and 10% size reduction in python35.dll. I'm not sure what we used to get from VC9, but it certainly seems worth enabling provided it doesn't break anything. (Interestingly, PGO decided that only 1% of functions needed to be compiled for speed. Not sure if I can find out which ones those are but if anyone's interested I can give it a shot?) Cheers, Steve From ethan at stoneleaf.us Tue Jun 10 19:17:28 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 10 Jun 2014 10:17:28 -0700 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: Message-ID: <53973DA8.1090602@stoneleaf.us> On 06/09/2014 09:02 PM, Ben Hoyt wrote: > > To solve this problem, what do people think about adding an > "st_winattrs" attribute to the object returned by os.stat() on > Windows? +1 to the idea, whatever the exact implementation. -- ~Ethan~ From zachary.ware+pydev at gmail.com Tue Jun 10 20:02:51 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Tue, 10 Jun 2014 13:02:51 -0500 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: <53973DA8.1090602@stoneleaf.us> References: <53973DA8.1090602@stoneleaf.us> Message-ID: On Tue, Jun 10, 2014 at 12:17 PM, Ethan Furman wrote: > On 06/09/2014 09:02 PM, Ben Hoyt wrote: >> To solve this problem, what do people think about adding an >> "st_winattrs" attribute to the object returned by os.stat() on >> Windows? > > > +1 to the idea, whatever the exact implementation. Agreed. 
-- Zach From antoine at python.org Tue Jun 10 20:26:33 2014 From: antoine at python.org (Antoine Pitrou) Date: Tue, 10 Jun 2014 14:26:33 -0400 Subject: [Python-Dev] Python 3.5 on VC14 - update In-Reply-To: References: Message-ID: Le 10/06/2014 12:30, Steve Dower a ?crit : > > I ran a quick test with profile-guided optimization (PGO, pronounced "pogo"), which has supposedly been improved since VC9, and saw a very unscientific 20% speed improvement on pybench.py and 10% size reduction in python35.dll. I'm not sure what we used to get from VC9, but it certainly seems worth enabling provided it doesn't break anything. (Interestingly, PGO decided that only 1% of functions needed to be compiled for speed. Not sure if I can find out which ones those are but if anyone's interested I can give it a shot?) I would recommend using the non-trivial suite of benchmarks at http://hg.python.org/benchmarks (both for the profiling and the benchmarking, though you may want to use additional workloads for profiling too) Regards Antoine. From eric at trueblade.com Tue Jun 10 20:33:27 2014 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 10 Jun 2014 14:33:27 -0400 Subject: [Python-Dev] namedtuple implementation grumble In-Reply-To: <53950C26.10006@trueblade.com> References: <53920D91.3060207@simplistix.co.uk> <5394B5F3.6050403@trueblade.com> <20140608233117.GS10355@ando> <53950C26.10006@trueblade.com> Message-ID: <53974F77.8000302@trueblade.com> >> I wonder how a hybrid approach would work? Use a dynamically-created >> class, but then construct the __new__ method using exec and inject it >> into the new class. As far as I can see, it's only __new__ that benefits >> from the exec approach. >> >> Anyone tried this yet? Is it worth an experiment? > > I'm not sure what the benefit would be. Other than the ast manipulations > for __new__, the rest of the non-exec code is easy to understand. I misread this, sorry. 
This might work for collections.namedtuple, but is probably not worth the hassle or churn of changing it. The main reason I switched to ast for namedlist is because generating the text version of __new__ or __init__ with default parameter values was extremely difficult, so an approach of exec-ing that one function wouldn't work for me. Eric. From hasan.diwan at gmail.com Tue Jun 10 20:51:16 2014 From: hasan.diwan at gmail.com (Hasan Diwan) Date: Tue, 10 Jun 2014 11:51:16 -0700 Subject: [Python-Dev] Documentation Oversight Message-ID: From the csv module pydoc: "The optional "dialect" parameter is discussed below" The discussion is actually above the method. Present in 2.7.6. -- H -- Sent from my mobile device Envoyé de mon portable -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Tue Jun 10 20:37:10 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 10 Jun 2014 18:37:10 +0000 Subject: [Python-Dev] Python 3.5 on VC14 - update In-Reply-To: References: Message-ID: > Antoine Pitrou wrote: > Le 10/06/2014 12:30, Steve Dower a écrit : >> >> I ran a quick test with profile-guided optimization (PGO, pronounced >> "pogo"), which has supposedly been improved since VC9, and saw a very >> unscientific 20% speed improvement on pybench.py and 10% size reduction in >> python35.dll. I'm not sure what we used to get from VC9, but it certainly seems >> worth enabling provided it doesn't break anything. >> (Interestingly, PGO decided that only 1% of functions needed to be compiled for >> speed. Not sure if I can find out which ones those are but if anyone's >> interested I can give it a shot?) > > I would recommend using the non-trivial suite of benchmarks at > http://hg.python.org/benchmarks (both for the profiling and the benchmarking, > though you may want to use additional workloads for profiling too) > > Regards > > Antoine. > Thanks.
I knew there was a proper set somewhere, but didn't manage to track it down in the minute or so I spent looking :) Cheers, Steve From benhoyt at gmail.com Tue Jun 10 21:04:12 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 10 Jun 2014 15:04:12 -0400 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: <53973DA8.1090602@stoneleaf.us> References: <53973DA8.1090602@stoneleaf.us> Message-ID: >> To solve this problem, what do people think about adding an >> "st_winattrs" attribute to the object returned by os.stat() on >> Windows? > > +1 to the idea, whatever the exact implementation. Cool. I think we should add a st_winattrs integer attribute (on Windows) and then also add the FILE_ATTRIBUTES_* constants to stat.py per Paul Moore. What would be the next steps to get this to happen? Open an issue on bugs.python.org and submit a patch with tests? -Ben From zachary.ware+pydev at gmail.com Tue Jun 10 21:08:26 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Tue, 10 Jun 2014 14:08:26 -0500 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: <53973DA8.1090602@stoneleaf.us> Message-ID: On Tue, Jun 10, 2014 at 2:04 PM, Ben Hoyt wrote: >>> To solve this problem, what do people think about adding an >>> "st_winattrs" attribute to the object returned by os.stat() on >>> Windows? >> >> +1 to the idea, whatever the exact implementation. > > Cool. > > I think we should add a st_winattrs integer attribute (on Windows) and > then also add the FILE_ATTRIBUTES_* constants to stat.py per Paul > Moore. Add to _stat.c rather than stat.py. > What would be the next steps to get this to happen? Open an issue on > bugs.python.org and submit a patch with tests? Yep! 
-- Zach From tjreedy at udel.edu Tue Jun 10 21:49:26 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 10 Jun 2014 15:49:26 -0400 Subject: [Python-Dev] Documentation Oversight In-Reply-To: References: Message-ID: <53976146.6040007@udel.edu> On 6/10/2014 2:51 PM, Hasan Diwan wrote: > From the csv module pydoc: > "The optional "dialect" parameter is discussed below" > > The discussion is actually above the method. Present in 2.7.6. Bug reports should be posted on the tracker rather than sent here. Short doc reports like this can be sent to docs at python.org. Also, the docs are continuously updated. Reports should be based on the current version at docs.python.org. As it turns out, this sentence is not in the current Doc/library/csv.rst or the online version at https://docs.python.org/3/library/csv.html#module-csv If this is what you meant, something has been changed. -- Terry Jan Reedy From victor.stinner at gmail.com Tue Jun 10 22:29:07 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 10 Jun 2014 22:29:07 +0200 Subject: [Python-Dev] Python 3.5 on VC14 - update In-Reply-To: References: Message-ID: 2014-06-10 18:30 GMT+02:00 Steve Dower : > I ran a quick test with profile-guided optimization (PGO, pronounced "pogo"), which has supposedly been improved since VC9, and saw a very unscientific 20% speed improvement on pybench.py and 10% size reduction in python35.dll. I'm not sure what we used to get from VC9, but it certainly seems worth enabling provided it doesn't break anything. (Interestingly, PGO decided that only 1% of functions needed to be compiled for speed. Not sure if I can find out which ones those are but if anyone's interested I can give it a shot?) If we upgrade the compiler on Windows, some optimizer options can maybe be enabled again. Previous Visual Studio (2010?)
bugs: * http://bugs.python.org/issue15993 * http://bugs.python.org/issue8847#msg166935 Victor From martin at v.loewis.de Wed Jun 11 00:05:42 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 11 Jun 2014 00:05:42 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <438e8a27e8e643f4841a22b24447b956@BLUPR03MB389.namprd03.prod.outlook.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <53921464.7030400@v.loewis.de> <5392232A.2000102@v.loewis.de> <438e8a27e8e643f4841a22b24447b956@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <53978136.4000307@v.loewis.de> Am 07.06.14 01:01, schrieb Steve Dower: > We keep the VS 2010 files around and make sure they keep working. > This is the biggest risk of the whole plan, but I believe that > there's enough of a gap between when VS 14 is planned to release > (which I know, but can't share) and when Python 3.5 is planned (which > I don't know, but have a semi-informed guess). By "keep around", I'd be fine with "in a subdirectory of PC". PCbuild should either switch for sure, or not switch at all. People had proposed to come up with a "PCbuildN" directory (N=10, N=14, or whatever) to maintain two build environments simultaneously; I'd be -1 on such a plan. There needs to be one official toolset to build Python X.Y with, and it needs to be either VS 2010 or VS 2014, but not both. > Is Python 3.5b1 being built with VS 14 RC (hypothetically) a blocking > issue? Do we need to resolve that now or can it wait until it > happens? It's up to the release manager, but I'd personally see it as a blocking issue: we shouldn't use a beta compiler for the final release, and we shouldn't switch compilers (back) after b1. The RM *could* opt to bet on VS 14 RTM appearing before 3.5rc1 is released (or otherwise blocking rc1 until VS 14 is released); I would consider this risky, but possibly worth it.
We certainly don't need to resolve this now. We should discuss it again when the release schedule for 3.5 is proposed. Regards, Martin From martin at v.loewis.de Wed Jun 11 00:15:15 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 11 Jun 2014 00:15:15 +0200 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <1402155524095.94474@microsoft.com> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <4bad156ff9f145b792191327736e672d@BLUPR03MB389.namprd03.prod.outlook.com> <53920B4C.8020700@egenix.com> <20140606185631.GA11094@k2> <896772508423787267.391031sturla.molden-gmail.com@news.gmane.org> <2FCC7CC7-8D23-45BF-8157-1C92B9566A16@stufft.io> , <1402155524095.94474@microsoft.com> Message-ID: <53978373.5050701@v.loewis.de> Am 07.06.14 17:38, schrieb Steve Dower: > One more possible concern that I just thought of is the availability of > the build tools on Windows Vista and Windows 7 RTM (that is, without > SP1). I'd have to check, but I don't believe anything after VS 2012 is > supported on Vista and it's entirely possible that installation is blocked. I wouldn't worry about that. People can be asked to update their build machines (within reason), as long as the resulting binaries should work on older systems still. There are testing issues, of course, but they show up even in other cases, like testing whether a 32-bit installer actually runs on a 32-bit system when the build system is a 64-bit system; such issues will always exist. 
Regards, Martin From martin at v.loewis.de Wed Jun 11 00:24:48 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 11 Jun 2014 00:24:48 +0200 Subject: [Python-Dev] Python 3.5 on VC14 - update In-Reply-To: References: Message-ID: <539785B0.8030909@v.loewis.de> Am 10.06.14 18:30, schrieb Steve Dower: > I ran a quick test with profile-guided optimization (PGO, pronounced > "pogo"), which has supposedly been improved since VC9, and saw a very > unscientific 20% speed improvement on pybench.py and 10% size > reduction in python35.dll. I'm not sure what we used to get from VC9, > but it certainly seems worth enabling provided it doesn't break > anything. (Interestingly, PGO decided that only 1% of functions > needed to be compiled for speed. Not sure if I can find out which > ones those are but if anyone's interested I can give it a shot?) You probably ran too little Python code. See PCbuild/build_pgo.bat for what used to be part of the release process. It takes quite some time, but it rebuilt more than 1% (IIRC). FWIW, I stopped using PGO for the official releases when it was demonstrated to generate bad code. In my experience, a compiler that generates bad code has lost trust "forever", so it will be hard to justify re-enabling PGO (like "but it really works this time"). I wasn't sad when I found a justification to skip the profiling, since it significantly held up the release process. Regards, Martin From Steve.Dower at microsoft.com Wed Jun 11 00:48:21 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 10 Jun 2014 22:48:21 +0000 Subject: [Python-Dev] Python 3.5 on VC14 - update In-Reply-To: <539785B0.8030909@v.loewis.de> References: <539785B0.8030909@v.loewis.de> Message-ID: Martin v. 
L?wis wrote: > Am 10.06.14 18:30, schrieb Steve Dower: >> I ran a quick test with profile-guided optimization (PGO, pronounced >> "pogo"), which has supposedly been improved since VC9, and saw a very >> unscientific 20% speed improvement on pybench.py and 10% size >> reduction in python35.dll. I'm not sure what we used to get from VC9, >> but it certainly seems worth enabling provided it doesn't break >> anything. (Interestingly, PGO decided that only 1% of functions needed >> to be compiled for speed. Not sure if I can find out which ones those >> are but if anyone's interested I can give it a shot?) > > You probably ran too little Python code. See PCbuild/build_pgo.bat for what used > to be part of the release process. It takes quite some time, but it rebuilt more > than 1% (IIRC). That's almost certainly the case. I didn't run anywhere near enough to call it good, though I'd only really expect the size to get worse and the speed to get better. > FWIW, I stopped using PGO for the official releases when it was demonstrated to > generate bad code. In my experience, a compiler that generates bad code has lost > trust "forever", so it will be hard to justify re-enabling PGO (like "but it > really works this time"). I wasn't sad when I found a justification to skip the > profiling, since it significantly held up the release process. Yeah, and it seems the bad code is still there. I suspect it's actually due to optimizing for space rather than speed, and not due to PGO directly, but either way I'll be trying to get it fixed. [EARLIER EMAIL] > By "keep around", I'd be fine with "in a subdirectory of PC". PCbuild should > either switch for sure, or not switch at all. People had proposed to come up > with a "PCbuildN" directory (N=10, N=14, or whatever) to maintain two build > environments simultaneously; I'd be -1 on such a plan. There needs to be one > official toolset to build Python X.Y with, and it needs to be either VS 2010 or > VS 2014, but not both. 
That's what I have planned. Right now it's in my sandbox and I've just replaced the existing PCbuild contents (rather wholesale - I took the opportunity to simplify the files, which is important to me as I spend most of my time editing them by hand rather than through VS). When/if I merge, the version in PC\VS10.0 will be exactly what was there at merge time. > Regards, > Martin And thanks, I appreciate the context and suggestions. Cheers, Steve From thomas at python.org Wed Jun 11 03:10:43 2014 From: thomas at python.org (Thomas Wouters) Date: Tue, 10 Jun 2014 18:10:43 -0700 Subject: [Python-Dev] Python 3.5 on VC14 - update In-Reply-To: References: Message-ID: On Tue, Jun 10, 2014 at 9:30 AM, Steve Dower wrote: > > I ran a quick test with profile-guided optimization (PGO, pronounced > "pogo"), which has supposedly been improved since VC9, and saw a very > unscientific 20% speed improvement on pybench.py and 10% size reduction in > python35.dll. I'm not sure what we used to get from VC9, but it certainly > seems worth enabling provided it doesn't break anything. (Interestingly, > PGO decided that only 1% of functions needed to be compiled for speed. Not > sure if I can find out which ones those are but if anyone's interested I > can give it a shot?) > For what it's worth, we build Google's internal Python interpreters with gcc's flavour of PGO and are seeing somewhat more than 20% performance increase for Python 2.7. (We train using most of the testsuite, not pybench, and I believe the Debian/Ubuntu packages also do this.) I believe almost all of that is from speedups to the main eval loop, which is a huge switch in a bigger loop with complicated jump logic. It wouldn't surprise me if VS's PGO only decided to optimize that eval loop :) -- Thomas Wouters Hi! I'm an email virus! Think twice before sending your email to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Nikolaus at rath.org Wed Jun 11 03:30:49 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Tue, 10 Jun 2014 18:30:49 -0700 Subject: [Python-Dev] Why does IOBase.__del__ call .close? Message-ID: <87d2egnsfq.fsf@vostro.rath.org> Hello, I recently noticed (after some rather protacted debugging) that the io.IOBase class comes with a destructor that calls self.close(): [0] nikratio at vostro:~/tmp$ cat test.py import io class Foo(io.IOBase): def close(self): print('close called') r = Foo() del r [0] nikratio at vostro:~/tmp$ python3 test.py close called To me, this came as quite a surprise, and the best "documentation" of this feature seems to be the following note (from the io library reference): "The abstract base classes also provide default implementations of some methods in order to help implementation of concrete stream classes. For example, BufferedIOBase provides unoptimized implementations of readinto() and readline()." For me, having __del__ call close() does not qualify as a reasonable default implementation unless close() is required to be idempotent (which one could deduce from the documentation if one tries to, but it's far from clear). Is this behavior an accident, or was that a deliberate decision? Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F ?Time flies like an arrow, fruit flies like a Banana.? From python at mrabarnett.plus.com Wed Jun 11 03:51:43 2014 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 11 Jun 2014 02:51:43 +0100 Subject: [Python-Dev] Why does IOBase.__del__ call .close? 
In-Reply-To: <87d2egnsfq.fsf@vostro.rath.org> References: <87d2egnsfq.fsf@vostro.rath.org> Message-ID: <5397B62F.80004@mrabarnett.plus.com> On 2014-06-11 02:30, Nikolaus Rath wrote: > Hello, > > I recently noticed (after some rather protacted debugging) that the > io.IOBase class comes with a destructor that calls self.close(): > > [0] nikratio at vostro:~/tmp$ cat test.py > import io > class Foo(io.IOBase): > def close(self): > print('close called') > r = Foo() > del r > [0] nikratio at vostro:~/tmp$ python3 test.py > close called > > To me, this came as quite a surprise, and the best "documentation" of > this feature seems to be the following note (from the io library > reference): > > "The abstract base classes also provide default implementations of some > methods in order to help implementation of concrete stream classes. For > example, BufferedIOBase provides unoptimized implementations of > readinto() and readline()." > > For me, having __del__ call close() does not qualify as a reasonable > default implementation unless close() is required to be idempotent > (which one could deduce from the documentation if one tries to, but it's > far from clear). > > Is this behavior an accident, or was that a deliberate decision? > To me, it makes sense. You want to make sure that it's closed, releasing any resources it might be holding, even if you haven't done so explicitly. From antoine at python.org Wed Jun 11 04:28:17 2014 From: antoine at python.org (Antoine Pitrou) Date: Tue, 10 Jun 2014 22:28:17 -0400 Subject: [Python-Dev] Why does IOBase.__del__ call .close? In-Reply-To: <87d2egnsfq.fsf@vostro.rath.org> References: <87d2egnsfq.fsf@vostro.rath.org> Message-ID: Le 10/06/2014 21:30, Nikolaus Rath a ?crit : > > For me, having __del__ call close() does not qualify as a reasonable > default implementation unless close() is required to be idempotent > (which one could deduce from the documentation if one tries to, but it's > far from clear). 
close() should indeed be idempotent on all bundled IO class implementations (otherwise it's a bug), and so should it preferably on third-party IO class implementations. If you want to improve the documentation on this, you're welcome to provide a patch! Regards Antoine. From ncoghlan at gmail.com Wed Jun 11 14:38:13 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 11 Jun 2014 22:38:13 +1000 Subject: [Python-Dev] Why does IOBase.__del__ call .close? In-Reply-To: References: <87d2egnsfq.fsf@vostro.rath.org> Message-ID: On 11 Jun 2014 12:31, "Antoine Pitrou" wrote: > > Le 10/06/2014 21:30, Nikolaus Rath a ?crit : > >> >> For me, having __del__ call close() does not qualify as a reasonable >> default implementation unless close() is required to be idempotent >> (which one could deduce from the documentation if one tries to, but it's >> far from clear). > > > close() should indeed be idempotent on all bundled IO class implementations (otherwise it's a bug), and so should it preferably on third-party IO class implementations. > > If you want to improve the documentation on this, you're welcome to provide a patch! We certainly assume idempotent close() behaviour in various places, so if that expectation isn't currently clear in the docs, suggestions for improved wording would definitely be appreciated! Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benhoyt at gmail.com Wed Jun 11 15:27:25 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 11 Jun 2014 09:27:25 -0400 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: <53973DA8.1090602@stoneleaf.us> Message-ID: >> What would be the next steps to get this to happen? Open an issue on >> bugs.python.org and submit a patch with tests? > > Yep! 
Okay, I've done step one (opened an issue on bugs.python.org), and hope to provide a patch in the next few weeks if no-one else does (I've never compiled CPython on Windows before): http://bugs.python.org/issue21719 -Ben From victor.stinner at gmail.com Wed Jun 11 16:28:53 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 11 Jun 2014 16:28:53 +0200 Subject: [Python-Dev] Issue #21205: add __qualname__ to generators Message-ID: Hi, I'm working on asyncio and it's difficult to debug code because @asyncio.coroutine decorator removes the name of the function if the function is not a generator (if it doesn't use yield from). I propose to add new gi_name and gi_qualname fields to the C structure PyGenObject, add a new __qualname__ (= gi_qualname) attribute to the Python API of generator, and change how the default value of __name__ (= gi_name) of generators. Instead of getting the name from the code object, I propose to get the name from the function (if the generator was created from a function). So if the function name was modified, you get the new name instead of getting the name from the code object (as done in Python 3.4). I also propose to display the qualified name in repr(generator) instead of the name. All these changes should make my life easier to debug asyncio, but it should help any project using generators. Issues describing the problem, I attached a patch implementing my ideas: http://bugs.python.org/issue21205 Would you be ok with these (minor) incompatible changes? By the way, it looks like generator attributes were never documented :-( My patch also adds a basic documentation (at least, it lists all attributes in the documentation of the inspect module). 
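Victor's proposal can be sketched concretely. The output shown is from an interpreter where the change has landed; on 3.4 and earlier, a generator's `__name__` always came from the code object and generators had no `__qualname__` at all:

```python
def ticker():
    yield 1

# A decorator such as asyncio.coroutine may rename the function it wraps:
ticker.__name__ = "wrapped_ticker"
ticker.__qualname__ = "wrapped_ticker"

g = ticker()
print(g.gi_code.co_name)             # -> ticker (compile-time name, always available)
print(g.__name__)                    # -> wrapped_ticker (taken from the function at call time)
print(g.__qualname__)                # -> wrapped_ticker
print("wrapped_ticker" in repr(g))   # -> True (repr() shows the qualified name)
```

Debuggers and tracebacks then report the name the user actually sees on the function, while `gen.gi_code.co_name` still preserves the original for anyone who needs it.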
Victor From antoine at python.org Wed Jun 11 18:17:40 2014 From: antoine at python.org (Antoine Pitrou) Date: Wed, 11 Jun 2014 12:17:40 -0400 Subject: [Python-Dev] Issue #21205: add __qualname__ to generators In-Reply-To: References: Message-ID: Le 11/06/2014 10:28, Victor Stinner a ?crit : > Hi, > > I'm working on asyncio and it's difficult to debug code because > @asyncio.coroutine decorator removes the name of the function if the > function is not a generator (if it doesn't use yield from). > > I propose to add new gi_name and gi_qualname fields to the C structure > PyGenObject, add a new __qualname__ (= gi_qualname) attribute to the > Python API of generator, and change how the default value of __name__ > (= gi_name) of generators. > > Instead of getting the name from the code object, I propose to get the > name from the function (if the generator was created from a function). > So if the function name was modified, you get the new name instead of > getting the name from the code object (as done in Python 3.4). > > I also propose to display the qualified name in repr(generator) > instead of the name. > > All these changes should make my life easier to debug asyncio, but it > should help any project using generators. > > Issues describing the problem, I attached a patch implementing my ideas: > http://bugs.python.org/issue21205 > > Would you be ok with these (minor) incompatible changes? +1 from me. Regards Antoine. From tjreedy at udel.edu Wed Jun 11 18:24:35 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 11 Jun 2014 12:24:35 -0400 Subject: [Python-Dev] Returning Windows file attribute information via os.stat() In-Reply-To: References: <53973DA8.1090602@stoneleaf.us> Message-ID: On 6/11/2014 9:27 AM, Ben Hoyt wrote: >>> What would be the next steps to get this to happen? Open an issue on >>> bugs.python.org and submit a patch with tests? >> >> Yep! 
> > Okay, I've done step one (opened an issue on bugs.python.org), and > hope to provide a patch in the next few weeks if no-one else does > (I've never compiled CPython on Windows before): > > http://bugs.python.org/issue21719 If you have problems compiling, the core-mentorship list is one place to ask. For 3.4+, I believe the devguide instructions are correct. If not, say something. -- Terry Jan Reedy From techtonik at gmail.com Wed Jun 11 22:26:26 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 11 Jun 2014 23:26:26 +0300 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character Message-ID: I am banned from tracker, so I post the bug here: Normal Windows behavior: >hg status --rev ".^1" M mercurial\commands.py ? pysptest.py >hg status --rev .^1 abort: unknown revision '.1'! So, ^ is an escape character. See http://www.tomshardware.co.uk/forum/35565-45-when-special-command-line But subprocess doesn't escape it, making cross-platform command fail on Windows. ---[cut pysptest.py]-- import subprocess as sp # this fails with # abort: unknown revision '.1'! cmd = ['hg', 'status', '--rev', '.^1'] # this works #cmd = 'hg status --rev ".^1"' # this works too #cmd = ['hg', 'status', '--rev', '.^^1'] try: print sp.check_output(cmd, stderr=sp.STDOUT, shell=True) except Exception as e: print e.output ------------------------------ -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Jun 11 23:58:30 2014 From: rymg19 at gmail.com (Ryan) Date: Wed, 11 Jun 2014 16:58:30 -0500 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: Of course! And, why not escape everything else, too? abc -> ^a^b^c echo %PATH% -> ^e^c^h^o^ ^%^P^A^T^H^% In all seriousness, to me this is obvious. When you pass a command to the shell, naturally, certain details are shell-specific. -10000. Bad idea. Very bad idea. 
If you want the ^ to be escaped, do it yourself. Or better yet, don't pass shell=True. anatoly techtonik wrote: >I am banned from tracker, so I post the bug here: > >Normal Windows behavior: > > >hg status --rev ".^1" > M mercurial\commands.py > ? pysptest.py > > >hg status --rev .^1 > abort: unknown revision '.1'! > >So, ^ is an escape character. See >http://www.tomshardware.co.uk/forum/35565-45-when-special-command-line > > >But subprocess doesn't escape it, making cross-platform command fail on >Windows. > >---[cut pysptest.py]-- >import subprocess as sp > ># this fails with ># abort: unknown revision '.1'! >cmd = ['hg', 'status', '--rev', '.^1'] ># this works >#cmd = 'hg status --rev ".^1"' ># this works too >#cmd = ['hg', 'status', '--rev', '.^^1'] > >try: > print sp.check_output(cmd, stderr=sp.STDOUT, shell=True) >except Exception as e: > print e.output >------------------------------ > >-- >anatoly t. > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >https://mail.python.org/mailman/listinfo/python-dev >Unsubscribe: >https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Jun 12 00:30:30 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 12 Jun 2014 08:30:30 +1000 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: On Thu, Jun 12, 2014 at 7:58 AM, Ryan wrote: > In all seriousness, to me this is obvious. When you pass a command to the > shell, naturally, certain details are shell-specific. > > -10000. Bad idea. Very bad idea. If you want the ^ to be escaped, do it > yourself. Or better yet, don't pass shell=True. Definitely the latter. 
Why pass shell=True when executing a single command? I don't get it. ChrisA From benjamin at python.org Thu Jun 12 00:34:51 2014 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 11 Jun 2014 15:34:51 -0700 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: <1402526091.15771.127814637.2B184603@webmail.messagingengine.com> On Wed, Jun 11, 2014, at 13:26, anatoly techtonik wrote: > I am banned from tracker, so I post the bug here: Being banned from the tracker is not an invitation to use python-dev@ as one. From rdmurray at bitdance.com Thu Jun 12 01:00:29 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 11 Jun 2014 19:00:29 -0400 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: <20140611230030.6F56F250DC4@webabinitio.net> Also notice that using a list with shell=True is using the API incorrectly. It wouldn't even work on Linux, so that torpedoes the cross-platform concern already :) This kind of confusion is why I opened http://bugs.python.org/issue7839. On Wed, 11 Jun 2014 16:58:30 -0500, Ryan wrote: > Of course! And, why not escape everything else, too? > > abc -> ^a^b^c > > echo %PATH% -> ^e^c^h^o^ ^%^P^A^T^H^% > > In all seriousness, to me this is obvious. When you pass a command to the shell, naturally, certain details are shell-specific. > > -10000. Bad idea. Very bad idea. If you want the ^ to be escaped, do it yourself. Or better yet, don't pass shell=True. > > anatoly techtonik wrote: > >I am banned from tracker, so I post the bug here: > > > >Normal Windows behavior: > > > > >hg status --rev ".^1" > > M mercurial\commands.py > > ? pysptest.py > > > > >hg status --rev .^1 > > abort: unknown revision '.1'! > > > >So, ^ is an escape character. 
See > >http://www.tomshardware.co.uk/forum/35565-45-when-special-command-line > > > > > >But subprocess doesn't escape it, making cross-platform command fail on > >Windows. > > > >---[cut pysptest.py]-- > >import subprocess as sp > > > ># this fails with > ># abort: unknown revision '.1'! > >cmd = ['hg', 'status', '--rev', '.^1'] > ># this works > >#cmd = 'hg status --rev ".^1"' > ># this works too > >#cmd = ['hg', 'status', '--rev', '.^^1'] > > > >try: > > print sp.check_output(cmd, stderr=sp.STDOUT, shell=True) > >except Exception as e: > > print e.output > >------------------------------ > > > >-- > >anatoly t. > > > > > >------------------------------------------------------------------------ > > > >_______________________________________________ > >Python-Dev mailing list > >Python-Dev at python.org > >https://mail.python.org/mailman/listinfo/python-dev > >Unsubscribe: > >https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/rdmurray%40bitdance.com From techtonik at gmail.com Thu Jun 12 00:53:20 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 12 Jun 2014 01:53:20 +0300 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: On Thu, Jun 12, 2014 at 1:30 AM, Chris Angelico wrote: > On Thu, Jun 12, 2014 at 7:58 AM, Ryan wrote: > > In all seriousness, to me this is obvious. When you pass a command to the > > shell, naturally, certain details are shell-specific. > On Windows cmd.exe is used by default: http://hg.python.org/cpython/file/38a325c84564/Lib/subprocess.py#l1108 so it makes sense to make default behavior cross-platform. > > -10000. Bad idea. Very bad idea. 
If you want the ^ to be escaped, do it > > yourself. Or better yet, don't pass shell=True. > > Definitely the latter. Why pass shell=True when executing a single > command? I don't get it. > This is a complete use case using Rietveld upload script: http://techtonik.rainforce.org/2013/07/code-review-with-rietveld-and-mercurial.html I am interested to know how to modify upload script without kludges: https://code.google.com/p/rietveld/source/browse/upload.py#1056 I expect many people are facing with the same problem trying to wrap Git and HG with Python scripts. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Thu Jun 12 01:00:55 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 12 Jun 2014 02:00:55 +0300 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: On Thu, Jun 12, 2014 at 1:30 AM, Chris Angelico wrote: > Why pass shell=True when executing a single > command? I don't get it. > I don't know about Linux, but on Windows programs are not directly available as /usr/bin/python, so you need to find command in PATH directories. Passing shell=True makes this lookup done by shell and not manually. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nikolaus at rath.org Thu Jun 12 02:11:53 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 11 Jun 2014 17:11:53 -0700 Subject: [Python-Dev] Why does IOBase.__del__ call .close? 
In-Reply-To: <5397B62F.80004@mrabarnett.plus.com> (MRAB's message of "Wed, 11 Jun 2014 02:51:43 +0100") References: <87d2egnsfq.fsf@vostro.rath.org> <5397B62F.80004@mrabarnett.plus.com> Message-ID: <87a99jnfzq.fsf@vostro.rath.org> MRAB writes: > On 2014-06-11 02:30, Nikolaus Rath wrote: >> Hello, >> >> I recently noticed (after some rather protacted debugging) that the >> io.IOBase class comes with a destructor that calls self.close(): >> >> [0] nikratio at vostro:~/tmp$ cat test.py >> import io >> class Foo(io.IOBase): >> def close(self): >> print('close called') >> r = Foo() >> del r >> [0] nikratio at vostro:~/tmp$ python3 test.py >> close called >> >> To me, this came as quite a surprise, and the best "documentation" of >> this feature seems to be the following note (from the io library >> reference): >> >> "The abstract base classes also provide default implementations of some >> methods in order to help implementation of concrete stream classes. For >> example, BufferedIOBase provides unoptimized implementations of >> readinto() and readline()." >> >> For me, having __del__ call close() does not qualify as a reasonable >> default implementation unless close() is required to be idempotent >> (which one could deduce from the documentation if one tries to, but it's >> far from clear). >> >> Is this behavior an accident, or was that a deliberate decision? >> > To me, it makes sense. You want to make sure that it's closed, releasing > any resources it might be holding, even if you haven't done so > explicitly. I agree with your intentions, but I come to the opposite conclusion: automatically calling close() in the destructor will hide that there's a problem in the code. Without that automatic cleanup, there's at least a good chance that a ResourceWarning will be emitted so the problem gets noticed. "Silently work around bugs in caller's code" doesn't seem like a very useful default to me... Best, -Nikolaus -- GPG encrypted emails preferred. 
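The idempotency requirement Antoine states can be made concrete with a toy subclass; the class and list names here are invented for illustration, and the immediate finalization on `del` assumes CPython's reference counting:

```python
import io

events = []

class Stream(io.RawIOBase):
    """Toy stream whose close() is idempotent, as the io layer expects."""
    def close(self):
        if not self.closed:          # guard makes repeated calls harmless
            events.append("closed")
        super().close()              # IOBase.close() marks the stream closed

s = Stream()
s.close()
s.close()                            # no-op: the guard sees self.closed is True
print(events)                        # -> ['closed']

s2 = Stream()
del s2                               # IOBase.__del__ calls close() for us
print(events)                        # -> ['closed', 'closed']
```

Written this way, the destructor's automatic close() is safe whether or not the caller already closed the stream explicitly.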
Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F ?Time flies like an arrow, fruit flies like a Banana.? From techtonik at gmail.com Thu Jun 12 02:00:42 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 12 Jun 2014 03:00:42 +0300 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <20140611230030.6F56F250DC4@webabinitio.net> References: <20140611230030.6F56F250DC4@webabinitio.net> Message-ID: On Thu, Jun 12, 2014 at 2:00 AM, R. David Murray wrote: > Also notice that using a list with shell=True is using the API > incorrectly. It wouldn't even work on Linux, so that torpedoes > the cross-platform concern already :) > > This kind of confusion is why I opened http://bugs.python.org/issue7839. I thought exactly about that. Usually separate arguments are used to avoid problems with escaping of quotes and other stuff. I'd deprecate subprocess and split it into separate modules. One is about shell execution and another one is for secure process control. shell execution module then could build on top of process control and be insecure by design. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Thu Jun 12 02:54:53 2014 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 11 Jun 2014 17:54:53 -0700 Subject: [Python-Dev] Why does IOBase.__del__ call .close? 
In-Reply-To: <87a99jnfzq.fsf@vostro.rath.org> References: <87d2egnsfq.fsf@vostro.rath.org> <5397B62F.80004@mrabarnett.plus.com> <87a99jnfzq.fsf@vostro.rath.org> Message-ID: <1402534493.31346.127850065.34AEEDD2@webmail.messagingengine.com> On Wed, Jun 11, 2014, at 17:11, Nikolaus Rath wrote: > MRAB writes: > > On 2014-06-11 02:30, Nikolaus Rath wrote: > >> Hello, > >> > >> I recently noticed (after some rather protacted debugging) that the > >> io.IOBase class comes with a destructor that calls self.close(): > >> > >> [0] nikratio at vostro:~/tmp$ cat test.py > >> import io > >> class Foo(io.IOBase): > >> def close(self): > >> print('close called') > >> r = Foo() > >> del r > >> [0] nikratio at vostro:~/tmp$ python3 test.py > >> close called > >> > >> To me, this came as quite a surprise, and the best "documentation" of > >> this feature seems to be the following note (from the io library > >> reference): > >> > >> "The abstract base classes also provide default implementations of some > >> methods in order to help implementation of concrete stream classes. For > >> example, BufferedIOBase provides unoptimized implementations of > >> readinto() and readline()." > >> > >> For me, having __del__ call close() does not qualify as a reasonable > >> default implementation unless close() is required to be idempotent > >> (which one could deduce from the documentation if one tries to, but it's > >> far from clear). > >> > >> Is this behavior an accident, or was that a deliberate decision? > >> > > To me, it makes sense. You want to make sure that it's closed, releasing > > any resources it might be holding, even if you haven't done so > > explicitly. > > I agree with your intentions, but I come to the opposite conclusion: > automatically calling close() in the destructor will hide that there's a > problem in the code. Without that automatic cleanup, there's at least a > good chance that a ResourceWarning will be emitted so the problem gets > noticed. 
"Silently work around bugs in caller's code" doesn't seem like > a very useful default to me... Things which actually hold system resources (like FileIO) give ResourceWarning if they close in __del__, so I don't understand your point. From rosuav at gmail.com Thu Jun 12 04:07:19 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 12 Jun 2014 12:07:19 +1000 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: <20140611230030.6F56F250DC4@webabinitio.net> Message-ID: On Thu, Jun 12, 2014 at 10:00 AM, anatoly techtonik wrote: > I thought exactly about that. Usually separate arguments are used to avoid > problems with escaping of quotes and other stuff. > > I'd deprecate subprocess and split it into separate modules. One is about > shell execution and another one is for secure process control. ISTM what you want is not shell=True, but a separate function that follows the system policy for translating a command name into a path-to-binary. That's something that, AFAIK, doesn't currently exist in the Python 2 stdlib, but Python 3 has shutil.which(). If there's a PyPI backport of that for Py2, you should be able to use that to figure out the command name, and then avoid shell=False. ChrisA From rosuav at gmail.com Thu Jun 12 04:12:48 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 12 Jun 2014 12:12:48 +1000 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: <20140611230030.6F56F250DC4@webabinitio.net> Message-ID: On Thu, Jun 12, 2014 at 12:07 PM, Chris Angelico wrote: > ISTM what you want is not shell=True, but a separate function that > follows the system policy for translating a command name into a > path-to-binary. That's something that, AFAIK, doesn't currently exist > in the Python 2 stdlib, but Python 3 has shutil.which(). 
If there's a > PyPI backport of that for Py2, you should be able to use that to > figure out the command name, and then avoid shell=False. Huh. Next time, Chris, search the web before you post. Via a StackOverflow post, learned about distutils.spawn.find_executable(). Python 2.7.4 (default, Apr 6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import distutils.spawn >>> distutils.spawn.find_executable("python") 'C:\\Program Files\\LilyPond\\usr\\bin\\python.exe' So that would be the way to go. Render the short-form into an executable name, then skip the shell. ChrisA From ethan at stoneleaf.us Thu Jun 12 04:43:49 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 11 Jun 2014 19:43:49 -0700 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: <20140611230030.6F56F250DC4@webabinitio.net> Message-ID: <539913E5.3050007@stoneleaf.us> On 06/11/2014 07:12 PM, Chris Angelico wrote: > On Thu, Jun 12, 2014 at 12:07 PM, Chris Angelico wrote: >> ISTM what you want is not shell=True, but a separate function that >> follows the system policy for translating a command name into a >> path-to-binary. That's something that, AFAIK, doesn't currently exist >> in the Python 2 stdlib, but Python 3 has shutil.which(). If there's a >> PyPI backport of that for Py2, you should be able to use that to >> figure out the command name, and then avoid shell=False. > > Huh. Next time, Chris, search the web before you post. Via a > StackOverflow post, learned about distutils.spawn.find_executable(). 
--> import sys --> sys.executable '/usr/bin/python' From brian at python.org Thu Jun 12 06:27:23 2014 From: brian at python.org (Brian Curtin) Date: Wed, 11 Jun 2014 23:27:23 -0500 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <539913E5.3050007@stoneleaf.us> References: <20140611230030.6F56F250DC4@webabinitio.net> <539913E5.3050007@stoneleaf.us> Message-ID: On Wed, Jun 11, 2014 at 9:43 PM, Ethan Furman wrote: > On 06/11/2014 07:12 PM, Chris Angelico wrote: >> >> On Thu, Jun 12, 2014 at 12:07 PM, Chris Angelico wrote: >>> >>> ISTM what you want is not shell=True, but a separate function that >>> follows the system policy for translating a command name into a >>> path-to-binary. That's something that, AFAIK, doesn't currently exist >>> in the Python 2 stdlib, but Python 3 has shutil.which(). If there's a >>> PyPI backport of that for Py2, you should be able to use that to >>> figure out the command name, and then avoid shell=False. >> >> >> Huh. Next time, Chris, search the web before you post. Via a >> StackOverflow post, learned about distutils.spawn.find_executable(). > > > --> import sys > --> sys.executable > '/usr/bin/python' For finding the Python executable, yes, but the discussion and example are about a 2.x version of shutil.which From me at the-compiler.org Thu Jun 12 06:34:59 2014 From: me at the-compiler.org (Florian Bruhin) Date: Thu, 12 Jun 2014 06:34:59 +0200 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: <20140612043459.GA19485@lupin> * anatoly techtonik [2014-06-12 02:00:55 +0300]: > On Thu, Jun 12, 2014 at 1:30 AM, Chris Angelico wrote: > > > Why pass shell=True when executing a single > > command? I don't get it. > > > > I don't know about Linux, but on Windows programs are not directly > available as /usr/bin/python, so you need to find command in PATH > directories. 
Passing shell=True makes this lookup done by shell and not > manually. As it's been said, the whole *point* of shell=True is to be able to use shell features, so ^ being escaped automatically just would be... broken. How would I escape > then, for example ;) You basically have two options: - Do the lookup in PATH yourself, it's not like that's rocket science. I haven't checked if there's a ready function for it in the stdlib, but even when not: Get os.environ['PATH'], split it by os.pathsep, then for every directory check if your binary is in there. There's also some environment variable on Windows which contains the possible extensions for a binary in PATH, add that, and that's all. - Use shell=True and a cross-platform shell escape function. I've wrote one for a project of mine: [1] I've written some tests[2] but I haven't checked all corner-cases, so I can't guarantee it'll always work, as the interpretation of special chars by cmd.exe *is* black magic, at least to me. Needless to say this is probably the worse choice of the two. [1] http://git.the-compiler.org/qutebrowser/tree/qutebrowser/utils/misc.py?id=dffec73db76c867d261ec3416de011becb209f13#n154 [2] http://git.the-compiler.org/qutebrowser/tree/qutebrowser/test/utils/test_misc.py?id=dffec73db76c867d261ec3416de011becb209f13#n195 Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG 0xFD55A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ -------------- next part -------------- A non-text attachment was scrubbed... 
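Florian's first option — resolve the command on PATH yourself and then skip the shell entirely — is nearly a one-liner with `shutil.which()` (Python 3.3+, and PATHEXT-aware on Windows). The `hg` invocation in the comment mirrors Anatoly's example and assumes Mercurial is installed; the helper name is invented:

```python
import shutil
import subprocess
import sys

def run_without_shell(program, *args):
    """Look `program` up on PATH ourselves, then invoke it with shell=False,
    so no cmd.exe/sh quoting rules ever see the arguments."""
    exe = shutil.which(program)      # returns None if the program is not found
    if exe is None:
        raise FileNotFoundError(program)
    return subprocess.check_output([exe] + list(args))

# Arguments like '.^1' now pass through byte-for-byte, e.g.:
#   run_without_shell("hg", "status", "--rev", ".^1")
print(run_without_shell(sys.executable, "-c", "print('.^1')").strip())   # -> b'.^1'
```

With shell=False the argument list is handed to the child process as-is, so the `^` escaping question never arises.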
Name: not available Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From p.f.moore at gmail.com Thu Jun 12 08:57:41 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 Jun 2014 07:57:41 +0100 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <20140612043459.GA19485@lupin> References: <20140612043459.GA19485@lupin> Message-ID: On 12 June 2014 05:34, Florian Bruhin wrote: > Do the lookup in PATH yourself, it's not like that's rocket science. Am I missing something here? I routinely do subprocess.check_call(['hg', 'update']) or whatever, and it finds the hg executable fine. Paul From victor.stinner at gmail.com Thu Jun 12 11:41:22 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 12 Jun 2014 11:41:22 +0200 Subject: [Python-Dev] Issue #21205: add __qualname__ to generators In-Reply-To: References: Message-ID: 2014-06-11 18:17 GMT+02:00 Antoine Pitrou : > Le 11/06/2014 10:28, Victor Stinner a ?crit : >> (...) >> Issues describing the problem, I attached a patch implementing my ideas: >> http://bugs.python.org/issue21205 >> >> Would you be ok with these (minor) incompatible changes? > > +1 from me. > > Regards > Antoine. I asked myself if this change can cause issues with serialization. The marshal and pickle modules cannot serialize a generator. Marshal only supports a few types. For pickle, I found this explanation: http://peadrop.com/blog/2009/12/29/why-you-cannot-pickle-generators/ So I consider that my change is safe. It changes the representation of a generator, but repr() is usually only checked in unit tests, tests can be fixed. It also changes the value of the __name__ attribute if the name of the function was changed, but I don't think that anyone relies on it. If you really want the original name of the code object, you can still get gen.gi_code.co_name. Another recent change in the Python API was the __wrapped__ attribute set by functools.wraps(). 
It is now chain wrapper functions, and I'm not aware of anyone complaining of such change. So I'm confident in my change :) Victor From storchaka at gmail.com Thu Jun 12 15:16:38 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 12 Jun 2014 16:16:38 +0300 Subject: [Python-Dev] close() questions In-Reply-To: References: <87d2egnsfq.fsf@vostro.rath.org> Message-ID: 11.06.14 05:28, Antoine Pitrou ???????(??): > close() should indeed be idempotent on all bundled IO class > implementations (otherwise it's a bug), and so should it preferably on > third-party IO class implementations. There are some questions about close(). 1. If object owns several resources, should close() try to clean up all them if error is happened during cleaning up some resource. E.g. should BufferedRWPair.close() close reader if closing writer failed? 2. If close() raises an exception, should repeated call of close() raise an exception or do nothing? E.g. if GzipFile.close() fails during writing gzip tail (CRC and size), should repeated call of it try to write this tail again? 3. If close() raises an exception, should the closed attribute (if exists) be True or False? From yselivanov.ml at gmail.com Thu Jun 12 18:34:47 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 12 Jun 2014 12:34:47 -0400 Subject: [Python-Dev] Issue #21205: add __qualname__ to generators In-Reply-To: References: Message-ID: <5399D6A7.8050609@gmail.com> Hello Victor, On 2014-06-11, 10:28 AM, Victor Stinner wrote: > Hi, > > I'm working on asyncio and it's difficult to debug code because > @asyncio.coroutine decorator removes the name of the function if the > function is not a generator (if it doesn't use yield from). > > I propose to add new gi_name and gi_qualname fields to the C structure > PyGenObject, add a new __qualname__ (= gi_qualname) attribute to the > Python API of generator, and change how the default value of __name__ > (= gi_name) of generators. 
> > Instead of getting the name from the code object, I propose to get the > name from the function (if the generator was created from a function). > So if the function name was modified, you get the new name instead of > getting the name from the code object (as done in Python 3.4). > > I also propose to display the qualified name in repr(generator) > instead of the name. > > All these changes should make my life easier to debug asyncio, but it > should help any project using generators. > > Issues describing the problem, I attached a patch implementing my ideas: > http://bugs.python.org/issue21205 > > Would you be ok with these (minor) incompatible changes? I'm +1 for your proposal. This change will indeed make debugging asyncio (and any generator-heavy code) easier. I wouldn't worry too much about compatibility, as the change is fairly minimal, and the feature will only land in 3.5, where people expect new things and are generally OK with slightly updated behaviors. Yury > > By the way, it looks like generator attributes were never documented > :-( My patch also adds a basic documentation (at least, it lists all > attributes in the documentation of the inspect module). > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com From donspauldingii at gmail.com Fri Jun 13 00:38:26 2014 From: donspauldingii at gmail.com (Don Spaulding) Date: Thu, 12 Jun 2014 17:38:26 -0500 Subject: [Python-Dev] Backwards Incompatibility in logging module in 3.4? Message-ID: Hi there, I just started testing a project of mine on Python 3.4.0b1. I ran into a change that broke compatibility with the logging module in 3.3. 
The basic test is: $ py34/bin/python -c 'import logging; print(logging.getLevelName("debug".upper()))' Level DEBUG $ py33/bin/python -c 'import logging; print(logging.getLevelName("debug".upper()))' 10 I quickly stumbled upon this webpage: http://aazza.github.io/2014/05/31/testing-on-multiple-versions-of-Python/ Which led me to this ticket regarding the change: http://bugs.python.org/issue18046 Is this a bug or an intentional break? If it's the latter, shouldn't this at least be mentioned in the "What's new in Python 3.4" document? If it's the former, should I file a bug? Thanks, Don -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jun 13 01:10:16 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 13 Jun 2014 09:10:16 +1000 Subject: [Python-Dev] Backwards Incompatibility in logging module in 3.4? In-Reply-To: References: Message-ID: On 13 Jun 2014 08:59, "Don Spaulding" wrote: > > Hi there, > > I just started testing a project of mine on Python 3.4.0b1. I ran into a change that broke compatibility with the logging module in 3.3. > > The basic test is: > > $ py34/bin/python -c 'import logging; print(logging.getLevelName("debug".upper()))' > Level DEBUG > > $ py33/bin/python -c 'import logging; print(logging.getLevelName("debug".upper()))' > 10 > > I quickly stumbled upon this webpage: > > http://aazza.github.io/2014/05/31/testing-on-multiple-versions-of-Python/ > > Which led me to this ticket regarding the change: > > http://bugs.python.org/issue18046 > > Is this a bug or an intentional break? If it's the latter, shouldn't this at least be mentioned in the "What's new in Python 3.4" document? If it's the former, should I file a bug? Yes, it sounds like a bug to me - there's no indication of an intent to change behaviour with that cleanup patch. Cheers, Nick. 
> > Thanks, > Don > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Fri Jun 13 01:45:13 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 13 Jun 2014 01:45:13 +0200 Subject: [Python-Dev] Backwards Incompatibility in logging module in 3.4? In-Reply-To: References: Message-ID: Hi, 2014-06-13 0:38 GMT+02:00 Don Spaulding : > Is this a bug or an intentional break? If it's the latter, shouldn't this > at least be mentioned in the "What's new in Python 3.4" document? IMO the change is intentional. The previous behaviour was not really expected. The Python 3.3 documentation is explicit: the result is a string and the input parameter is an integer. logging.getLevelName("DEBUG") was more an implementation detail: https://docs.python.org/3.3/library/logging.html#logging.getLevelName "Returns the textual representation of logging level lvl. If the level is one of the predefined levels CRITICAL, ERROR, WARNING, INFO or DEBUG then you get the corresponding string. If you have associated levels with names using addLevelName() then the name you have associated with lvl is returned. If a numeric value corresponding to one of the defined levels is passed in, the corresponding string representation is returned. Otherwise, the string 'Level %s' % lvl is returned." If your code uses something like logger.setLevel(logging.getLevelName("DEBUG")), use logger.setLevel("DEBUG") directly.
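For code that must run on both 3.3 and 3.4, a small helper can paper over the difference (a sketch; `level_from_name` is a hypothetical name, not part of the logging API):

```python
import logging

def level_from_name(name):
    """Return the numeric level for a level name on 3.3 and 3.4 alike."""
    value = logging.getLevelName(name)
    if isinstance(value, int):
        # 3.3 behaviour: a known level name maps back to its number.
        return value
    # 3.4.0 behaviour: the name is echoed back as "Level <name>",
    # so fall back to the module-level constants instead.
    return getattr(logging, name)

logging.getLogger('demo').setLevel(level_from_name('DEBUG'))
```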
This issue was fixed in OpenStack with this change: https://review.openstack.org/#/c/94028/6/openstack/common/log.py,cm https://review.openstack.org/#/c/94028/6 Victor From rymg19 at gmail.com Fri Jun 13 01:55:08 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Thu, 12 Jun 2014 18:55:08 -0500 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: SHELLS ARE NOT CROSS-PLATFORM!!!! Seriously, there are going to be differences. If you really must:

escape = lambda s: s.replace('^', '^^') if os.name == 'nt' else s

Voilà. On Wed, Jun 11, 2014 at 5:53 PM, anatoly techtonik wrote: > On Thu, Jun 12, 2014 at 1:30 AM, Chris Angelico wrote: >> On Thu, Jun 12, 2014 at 7:58 AM, Ryan wrote: >> > In all seriousness, to me this is obvious. When you pass a command to >> the >> > shell, naturally, certain details are shell-specific. >> > > On Windows cmd.exe is used by default: > http://hg.python.org/cpython/file/38a325c84564/Lib/subprocess.py#l1108 > so it makes sense to make the default behavior cross-platform. > > >> > -10000. Bad idea. Very bad idea. If you want the ^ to be escaped, do it >> > yourself. Or better yet, don't pass shell=True. >> >> Definitely the latter. Why pass shell=True when executing a single >> command? I don't get it. >> > > This is a complete use case using the Rietveld upload script: > > http://techtonik.rainforce.org/2013/07/code-review-with-rietveld-and-mercurial.html > > I am interested to know how to modify the upload script without kludges: > https://code.google.com/p/rietveld/source/browse/upload.py#1056 > I expect many people are facing the same problem trying to wrap > Git and HG with Python scripts. > -- > anatoly t.
> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com > > -- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nikolaus at rath.org Fri Jun 13 03:06:20 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Thu, 12 Jun 2014 18:06:20 -0700 Subject: [Python-Dev] Why does IOBase.__del__ call .close? In-Reply-To: <1402534493.31346.127850065.34AEEDD2@webmail.messagingengine.com> (Benjamin Peterson's message of "Wed, 11 Jun 2014 17:54:53 -0700") References: <87d2egnsfq.fsf@vostro.rath.org> <5397B62F.80004@mrabarnett.plus.com> <87a99jnfzq.fsf@vostro.rath.org> <1402534493.31346.127850065.34AEEDD2@webmail.messagingengine.com> Message-ID: <877g4lobxv.fsf@vostro.rath.org> Benjamin Peterson writes: > On Wed, Jun 11, 2014, at 17:11, Nikolaus Rath wrote: >> MRAB writes: >> > On 2014-06-11 02:30, Nikolaus Rath wrote: >> >> Hello, >> >> >> >> I recently noticed (after some rather protacted debugging) that the >> >> io.IOBase class comes with a destructor that calls self.close(): >> >> >> >> [0] nikratio at vostro:~/tmp$ cat test.py >> >> import io >> >> class Foo(io.IOBase): >> >> def close(self): >> >> print('close called') >> >> r = Foo() >> >> del r >> >> [0] nikratio at vostro:~/tmp$ python3 test.py >> >> close called >> >> >> >> To me, this came as quite a surprise, and the best "documentation" of >> >> this feature seems to be the following note (from the io library >> >> reference): >> >> >> >> "The abstract base classes also provide default implementations of some >> >> methods in order to help implementation of concrete stream classes. 
For >> >> example, BufferedIOBase provides unoptimized implementations of >> >> readinto() and readline()." >> >> >> >> For me, having __del__ call close() does not qualify as a reasonable >> >> default implementation unless close() is required to be idempotent >> >> (which one could deduce from the documentation if one tries to, but it's >> >> far from clear). >> >> >> >> Is this behavior an accident, or was that a deliberate decision? >> >> >> > To me, it makes sense. You want to make sure that it's closed, releasing >> > any resources it might be holding, even if you haven't done so >> > explicitly. >> >> I agree with your intentions, but I come to the opposite conclusion: >> automatically calling close() in the destructor will hide that there's a >> problem in the code. Without that automatic cleanup, there's at least a >> good chance that a ResourceWarning will be emitted so the problem gets >> noticed. "Silently work around bugs in caller's code" doesn't seem like >> a very useful default to me... > > Things which actually hold system resources (like FileIO) give > ResourceWarning if they close in __del__, so I don't understand your > point. Consider this simple example: $ cat test.py import io import warnings class StridedStream(io.IOBase): def __init__(self, name, stride=2): super().__init__() self.fh = open(name, 'rb') self.stride = stride def read(self, len_): return self.fh.read(self.stride*len_)[::self.stride] def close(self): self.fh.close() class FixedStridedStream(StridedStream): def __del__(self): # Prevent IOBase.__del__ frombeing called. pass warnings.resetwarnings() warnings.simplefilter('error') print('Creating & loosing StridedStream..') r = StridedStream('/dev/zero') del r print('Creating & loosing FixedStridedStream..') r = FixedStridedStream('/dev/zero') del r $ python3 test.py Creating & loosing StridedStream.. Creating & loosing FixedStridedStream.. 
Exception ignored in: <_io.FileIO name='/dev/zero' mode='rb'> ResourceWarning: unclosed file <_io.BufferedReader name='/dev/zero'> In the first case, the destructor inherited from IOBase actually prevents the ResourceWarning from being emitted. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." From Nikolaus at rath.org Fri Jun 13 04:11:07 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Thu, 12 Jun 2014 19:11:07 -0700 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <20140611230030.6F56F250DC4@webabinitio.net> (R. David Murray's message of "Wed, 11 Jun 2014 19:00:29 -0400") References: <20140611230030.6F56F250DC4@webabinitio.net> Message-ID: <874mzpo8xw.fsf@vostro.rath.org> "R. David Murray" writes: > Also notice that using a list with shell=True is using the API > incorrectly. It wouldn't even work on Linux, so that torpedoes > the cross-platform concern already :) > > This kind of confusion is why I opened http://bugs.python.org/issue7839. Can someone describe a use case where shell=True actually makes sense at all? It seems to me that whenever you need a shell, the arguments that you pass to it will be shell-specific. So instead of e.g.

Popen('for i in `seq 42`; do echo $i; done', shell=True)

you almost certainly want to do

Popen(['/bin/sh', 'for i in `seq 42`; do echo $i; done'], shell=False)

because if your shell happens to be tcsh or cmd.exe, things are going to break. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana."
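One nit on the explicit-shell form quoted above: sh reads its script from a file unless given -c, so a runnable version of that second snippet looks like this (a POSIX-only sketch):

```python
import subprocess

# Naming the shell explicitly pins down which dialect runs the script;
# note the -c flag, without which sh would treat the script as a file name.
out = subprocess.check_output(
    ['/bin/sh', '-c', 'for i in 1 2 3; do echo $i; done'],
    universal_newlines=True)
print(out.split())  # ['1', '2', '3']
```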
From rosuav at gmail.com Fri Jun 13 04:25:36 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 13 Jun 2014 12:25:36 +1000 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <874mzpo8xw.fsf@vostro.rath.org> References: <20140611230030.6F56F250DC4@webabinitio.net> <874mzpo8xw.fsf@vostro.rath.org> Message-ID: On Fri, Jun 13, 2014 at 12:11 PM, Nikolaus Rath wrote: > Can someone describe an use case where shell=True actually makes sense > at all? > > It seems to me that whenever you need a shell, the argument's that you > pass to it will be shell specific. So instead of e.g. > > Popen('for i in `seq 42`; do echo $i; done', shell=True) > > you almost certainly want to do > > Popen(['/bin/sh', 'for i in `seq 42`; do echo $i; done'], shell=False) > > because if your shell happens to be tcsh or cmd.exe, things are going to > break. Some features, while technically shell-specific, are supported across a lot of shells. You should be able to pipe output from one command into another in most shells, for instance. But yes, I generally don't use it. ChrisA From ncoghlan at gmail.com Fri Jun 13 04:43:56 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 13 Jun 2014 12:43:56 +1000 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <874mzpo8xw.fsf@vostro.rath.org> References: <20140611230030.6F56F250DC4@webabinitio.net> <874mzpo8xw.fsf@vostro.rath.org> Message-ID: On 13 Jun 2014 12:12, "Nikolaus Rath" wrote: > > "R. David Murray" writes: > > Also notice that using a list with shell=True is using the API > > incorrectly. It wouldn't even work on Linux, so that torpedoes > > the cross-platform concern already :) > > > > This kind of confusion is why I opened http://bugs.python.org/issue7839. > > Can someone describe an use case where shell=True actually makes sense > at all? When you're writing platform specific code, it's occasionally useful. It's generally best avoided, though. 
Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From me at the-compiler.org Fri Jun 13 06:18:52 2014 From: me at the-compiler.org (Florian Bruhin) Date: Fri, 13 Jun 2014 06:18:52 +0200 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <874mzpo8xw.fsf@vostro.rath.org> References: <20140611230030.6F56F250DC4@webabinitio.net> <874mzpo8xw.fsf@vostro.rath.org> Message-ID: <20140613041852.GD19485@lupin> * Nikolaus Rath [2014-06-12 19:11:07 -0700]: > "R. David Murray" writes: > > Also notice that using a list with shell=True is using the API > > incorrectly. It wouldn't even work on Linux, so that torpedoes > > the cross-platform concern already :) > > > > This kind of confusion is why I opened http://bugs.python.org/issue7839. > > Can someone describe an use case where shell=True actually makes sense > at all? > > It seems to me that whenever you need a shell, the argument's that you > pass to it will be shell specific. So instead of e.g. > > Popen('for i in `seq 42`; do echo $i; done', shell=True) > > you almost certainly want to do > > Popen(['/bin/sh', 'for i in `seq 42`; do echo $i; done'], shell=False) > > because if your shell happens to be tcsh or cmd.exe, things are going to > break. My usecase is a spawn-command in a GUI application, which the user can use to spawn an executable. I want the user to be able to use the usual shell features from there. However, I also pass an argument to that command, and that should be escaped. Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG 0xFD55A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From greg.ewing at canterbury.ac.nz Fri Jun 13 06:57:49 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 13 Jun 2014 16:57:49 +1200 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <874mzpo8xw.fsf@vostro.rath.org> References: <20140611230030.6F56F250DC4@webabinitio.net> <874mzpo8xw.fsf@vostro.rath.org> Message-ID: <539A84CD.90104@canterbury.ac.nz> Nikolaus Rath wrote: > you almost certainly want to do > > Popen(['/bin/sh', 'for i in `seq 42`; do echo $i; done'], shell=False) > > because if your shell happens to be tcsh or cmd.exe, things are going to > break. On Unix, the C library's system() and popen() functions always use /bin/sh, NOT the user's current login shell, for this very reason. I would hope that the Python versions of these, and also the new subprocess stuff, do the same. That still leaves differences between Unix and Windows, but explicitly naming the shell won't help with that. -- Greg From benjamin at python.org Fri Jun 13 07:27:49 2014 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 12 Jun 2014 22:27:49 -0700 Subject: [Python-Dev] Why does IOBase.__del__ call .close? 
In-Reply-To: <877g4lobxv.fsf@vostro.rath.org> References: <87d2egnsfq.fsf@vostro.rath.org> <5397B62F.80004@mrabarnett.plus.com> <87a99jnfzq.fsf@vostro.rath.org> <1402534493.31346.127850065.34AEEDD2@webmail.messagingengine.com> <877g4lobxv.fsf@vostro.rath.org> Message-ID: <1402637269.29254.128319501.4C662871@webmail.messagingengine.com> On Thu, Jun 12, 2014, at 18:06, Nikolaus Rath wrote: > Consider this simple example:
>
> $ cat test.py
> import io
> import warnings
>
> class StridedStream(io.IOBase):
>     def __init__(self, name, stride=2):
>         super().__init__()
>         self.fh = open(name, 'rb')
>         self.stride = stride
>
>     def read(self, len_):
>         return self.fh.read(self.stride*len_)[::self.stride]
>
>     def close(self):
>         self.fh.close()
>
> class FixedStridedStream(StridedStream):
>     def __del__(self):
>         # Prevent IOBase.__del__ from being called.
>         pass
>
> warnings.resetwarnings()
> warnings.simplefilter('error')
>
> print('Creating & losing StridedStream..')
> r = StridedStream('/dev/zero')
> del r
>
> print('Creating & losing FixedStridedStream..')
> r = FixedStridedStream('/dev/zero')
> del r
>
> $ python3 test.py
> Creating & losing StridedStream..
> Creating & losing FixedStridedStream..
> Exception ignored in: <_io.FileIO name='/dev/zero' mode='rb'>
> ResourceWarning: unclosed file <_io.BufferedReader name='/dev/zero'>
>
> In the first case, the destructor inherited from IOBase actually > prevents the ResourceWarning from being emitted. Ah, I see. I don't see any good ways to fix it, though, besides setting some flag if close() is called from __del__.
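That flag idea could be prototyped in pure Python along these lines (a sketch only; `NoisyIOBase` and `_closed_from_del` are made-up names, not what the io module actually does):

```python
import io
import warnings

class NoisyIOBase(io.IOBase):
    """close() still runs from __del__, but no longer hides the leak."""

    def __del__(self):
        self._closed_from_del = True   # the flag consulted in close()
        self.close()

    def close(self):
        if getattr(self, '_closed_from_del', False) and not self.closed:
            warnings.warn('%r collected before being closed' % self,
                          ResourceWarning, stacklevel=2)
        super().close()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    r = NoisyIOBase()
    del r   # CPython collects immediately; __del__ emits the warning
print(any(issubclass(w.category, ResourceWarning) for w in caught))  # True
```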
From mail at timgolden.me.uk Fri Jun 13 09:35:41 2014 From: mail at timgolden.me.uk (Tim Golden) Date: Fri, 13 Jun 2014 08:35:41 +0100 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <874mzpo8xw.fsf@vostro.rath.org> References: <20140611230030.6F56F250DC4@webabinitio.net> <874mzpo8xw.fsf@vostro.rath.org> Message-ID: <539AA9CD.5090306@timgolden.me.uk> On 13/06/2014 03:11, Nikolaus Rath wrote: > "R. David Murray" writes: >> Also notice that using a list with shell=True is using the API >> incorrectly. It wouldn't even work on Linux, so that torpedoes >> the cross-platform concern already :) >> >> This kind of confusion is why I opened http://bugs.python.org/issue7839. > > Can someone describe a use case where shell=True actually makes sense > at all? On Windows (where I think the OP is), Popen & friends ultimately invoke CreateProcess. In the case where shell=True, subprocess invokes the command interpreter explicitly under the covers and tweaks a few other things to avoid a Brief Flash of Unstyled Console. This is the relevant snippet from subprocess.py:

if shell:
    startupinfo.dwFlags |= _winapi.STARTF_USESHOWWINDOW
    startupinfo.wShowWindow = _winapi.SW_HIDE
    comspec = os.environ.get("COMSPEC", "cmd.exe")
    args = '{} /c "{}"'.format(comspec, args)

That's all. It's more or less equivalent to prefixing your commands with "cmd.exe /c". The only reasons you should need to do this are: * If you're using one of the few commands which are actually built in to cmd.exe. I can't quickly find an online source for these, but typical examples will be: "dir" or "copy". * In some situations -- and I've never been able to nail this -- if you're trying to run a .bat/.cmd file. I've certainly been able to run batch files without shell=True but other people have failed within what appears to be the same configuration unless invoking cmd.exe via shell=True.
I use hg.exe (from TortoiseHg) but ISTR that the base Mercurial install supplies a .bat/.cmd. If that's the OP's case then he might find it necessary to pass shell=True. TJG From taleinat at gmail.com Fri Jun 13 12:24:38 2014 From: taleinat at gmail.com (Tal Einat) Date: Fri, 13 Jun 2014 13:24:38 +0300 Subject: [Python-Dev] Raspberry Pi Buildbot Message-ID: Is there one? If not, would you like me to set one up? I've got one at home with Raspbian installed not doing anything, it could easily run a buildslave. Poking around on buildbot.python.org/all/builders, I can only see one ARM buildbot[1], and it's just called "ARM v7". I also found this python-dev thread[2] along with a blog.python.org blog post[3] from 2012, which mentioned that Trent Nelson would be receiving a Raspberry Pi and setting up a buildslave on it. But I can't find mention of it on buildbot.python.org. So I can't tell what the current state is. If anyone is interested, just let me know! .. [1]: http://buildbot.python.org/all/builders/ARMv7%203.x .. [2]: http://thread.gmane.org/gmane.comp.python.devel/136388 .. [3]: http://blog.python.org/2012/12/pandaboard-raspberry-pi-coming-to.html - Tal Einat From rdmurray at bitdance.com Fri Jun 13 14:40:17 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Fri, 13 Jun 2014 08:40:17 -0400 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <539A84CD.90104@canterbury.ac.nz> References: <20140611230030.6F56F250DC4@webabinitio.net> <874mzpo8xw.fsf@vostro.rath.org> <539A84CD.90104@canterbury.ac.nz> Message-ID: <20140613124017.9081A250D0C@webabinitio.net> On Fri, 13 Jun 2014 16:57:49 +1200, Greg Ewing wrote: > Nikolaus Rath wrote: > > you almost certainly want to do > > > > Popen(['/bin/sh', 'for i in `seq 42`; do echo $i; done'], shell=False) > > > > because if your shell happens to be tcsh or cmd.exe, things are going to > > break. 
> > On Unix, the C library's system() and popen() functions > always use /bin/sh, NOT the user's current login shell, > for this very reason. > > I would hope that the Python versions of these, and also > the new subprocess stuff, do the same. They do. > That still leaves differences between Unix and Windows, > but explicitly naming the shell won't help with that. There are some non-Windows platforms where /bin/sh doesn't work (notably Android, where it is /system/bin/sh). See http://bugs.python.org/issue16353 for a proposal to create a standard way to figure out what the system shell should be for Popen's use. (The conclusion for Windows was to hardcode cmd.exe, though that isn't what the most recent patch there implements.) --David From larry at hastings.org Fri Jun 13 14:55:30 2014 From: larry at hastings.org (Larry Hastings) Date: Fri, 13 Jun 2014 05:55:30 -0700 Subject: [Python-Dev] Moving Python 3.5 on Windows to a new compiler In-Reply-To: <53978136.4000307@v.loewis.de> References: <529cffa5961d4b5bb57d554affe9643c@BLUPR03MB389.namprd03.prod.outlook.com> <53921464.7030400@v.loewis.de> <5392232A.2000102@v.loewis.de> <438e8a27e8e643f4841a22b24447b956@BLUPR03MB389.namprd03.prod.outlook.com> <53978136.4000307@v.loewis.de> Message-ID: <539AF4C2.2090900@hastings.org> On 06/10/2014 03:05 PM, "Martin v. Löwis" wrote: > We certainly don't need to resolve this now. We should discuss it again > when the release schedule for 3.5 is proposed. I anticipate 3.5 should be released about 18 months after the release of 3.4, putting it mid-September 2015. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From saimadhavheblikar at gmail.com Fri Jun 13 16:41:29 2014 From: saimadhavheblikar at gmail.com (Saimadhav Heblikar) Date: Fri, 13 Jun 2014 20:11:29 +0530 Subject: [Python-Dev] [Idle-dev] KeyConfig, KeyBinding and other related issues.
In-Reply-To: References: <539805D6.8020201@udel.edu> <5399FB2D.1050208@udel.edu> <539A4C05.8060100@udel.edu> Message-ID: Hi, I would like the keyseq validator to be reviewed. The diff file: https://gist.github.com/sahutd/0a471db8138383fd73b2#file-test-keyseq-diff A sample test runner file: https://gist.github.com/sahutd/0a471db8138383fd73b2#file-test-keyseq-runner-py In its current form, it supports/has modifiers = ['Shift', 'Control', 'Alt', 'Meta'] alpha_uppercase = ['A'] alpha_lowercase = ['a'] direction = ['Up',] direction_key = ['Key-Up'] It supports validating combinations up to 4 in length.
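For reviewers who want to experiment without applying the patch, the general shape of such an RE-based validator is roughly this (a hypothetical, much smaller pattern than the one under review; unlike the real patch it does not reject repeated modifiers):

```python
import re

MODIFIER = r'(?:Shift|Control|Alt|Meta)'
KEY = r'(?:Key-[A-Za-z]|Key-Up|Up)'
# Up to four dash-separated modifiers followed by exactly one key,
# e.g. <Control-Shift-Key-A> or <Alt-Up>.
KEYSEQ_RE = re.compile(r'^<%s(?:-%s){0,3}-%s>$' % (MODIFIER, MODIFIER, KEY))

def is_valid_keyseq(seq):
    return KEYSEQ_RE.match(seq) is not None

print(is_valid_keyseq('<Control-Shift-Key-A>'))  # True
print(is_valid_keyseq('<Key-A-Control>'))        # False
```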
Regards On 13 June 2014 17:15, Saimadhav Heblikar wrote: > On 13 June 2014 16:58, Tal Einat wrote: >> On Fri, Jun 13, 2014 at 2:22 PM, Saimadhav Heblikar >> wrote: >>> Just a heads up to both: I am writing a keyseq validator method. >>> It currently works for over 800 permutations of ['Shift', 'Control', >>> 'Alt', 'Meta', 'Key-a', 'Key-A', 'Up', 'Key-Up', 'a', 'A']. It works >>> for permutations of length 2 and 3. Beyond that its not worth it IMO. >>> I am currently trying to integrate it with test_configuration.py and >>> catching permutations i missed out. >>> >>> I post this, so that we dont duplicate work. I hope it to be ready by >>> the end of the day.(UTC +5.5) >> >> What is the method you are using? > > Regex. It is not something elegant. The permutations are coded in.(Not > all 800+ obviously, but around 15-20 general ones.). The only > advantage is it can be used without creating a new Tk instance. > > >> >> What do you mean by "permutations"? If you mean what I think, then I'm >> not sure I agree with >3 not being worth it. I've used keyboard >> bindings with more than 2 modifiers before, and we should certainly >> support this properly. >> > I am sorry. I meant to write >3 modifier permutations. > (i.eControl-Shift-Alt-Meta+Key-X is not covered. But > Control-Shift-Alt-Key-X is.) > > > > > -- > Regards > Saimadhav Heblikar -- Regards Saimadhav Heblikar From helou.pedro at gmail.com Fri Jun 13 11:21:04 2014 From: helou.pedro at gmail.com (Pedro Helou) Date: Fri, 13 Jun 2014 11:21:04 +0200 Subject: [Python-Dev] python-dev for MAC OS X Message-ID: Hey, does anybody know how to install the python-dev headers and libraries for MAC OS X? -- Pedro Issa Helou Network Communication Engineering + 36 20 262 9274 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From taleinat at gmail.com Fri Jun 13 17:32:48 2014 From: taleinat at gmail.com (Tal Einat) Date: Fri, 13 Jun 2014 18:32:48 +0300 Subject: [Python-Dev] python-dev for MAC OS X In-Reply-To: References: Message-ID: On Fri, Jun 13, 2014 at 12:21 PM, Pedro Helou wrote: > Hey, > > does anybody know how to install the python-dev headers and libraries for > MAC OS X? Hi, This list is for discussing the development *of* Python, not *with* Python. Please ask on the python list, python-list at python.org (more info here[1]) or on the #python channel on the Freenode IRC server. StackOverflow is also a good place to search for information and ask questions. But while we're on the subject, on OSX I recommend using a binary package manager such as Homebrew[2] or Macports[3] for this. I have had good experiences using Homebrew. Good luck, - Tal Einat .. [1]: https://mail.python.org/mailman/listinfo/python-list .. [2]: http://brew.sh/ .. [3]: http://www.macports.org/ From saimadhavheblikar at gmail.com Fri Jun 13 17:44:04 2014 From: saimadhavheblikar at gmail.com (Saimadhav Heblikar) Date: Fri, 13 Jun 2014 21:14:04 +0530 Subject: [Python-Dev] [Idle-dev] KeyConfig, KeyBinding and other related issues. In-Reply-To: References: <539805D6.8020201@udel.edu> <5399FB2D.1050208@udel.edu> <539A4C05.8060100@udel.edu> Message-ID: Apologies for the accidental cross post. I intended to send it to idle-dev. I am sorry again :( On 13 June 2014 20:11, Saimadhav Heblikar wrote: > Hi, > > I would like the keyseq validator to be reviewed. > > The diff file: https://gist.github.com/sahutd/0a471db8138383fd73b2#file-test-keyseq-diff > A sample test runner file: > https://gist.github.com/sahutd/0a471db8138383fd73b2#file-test-keyseq-runner-py > > In its current form, it supports/has > modifiers = ['Shift', 'Control', 'Alt', 'Meta'] > alpha_uppercase = ['A'] > alpha_lowercase = ['a'] > direction = ['Up',] > direction_key = ['Key-Up'] > > It supports validating combinations upto 4 in length. 
> > Please test for the above set only. (It will extended easily to fully > represent the respective complete sets. The reason it cant be done > *now* is the due to how RE optionals are coded differently in my > patch. See CLEANUP below). I will also add remaining keys like > Backspace, Slash etc tomorrow. > > # Cleanup: > If we decide to go ahead with RE validating keys as in the above patch, > > 0. I made the mistake of not coding RE optionals -> ((pat)|(pat)) same > for all sets. The result is that, extending the current key set is not > possible without making all RE optional patterns similar.(Read the > starting lines of is_valid_keyseq method). > > 1. There is a lot of places where refactoring can be done and > appropriate comment added. > > 2. I left the asserts as-is. They can be used in testing the validator > method itself. > > 3. The above patch still needs support for Backspace, slash etc to be > added. I decided to add, once I am sure we will use it. > > 4. I would like to know how it will affect Mac? What are system > specific differences? Please run the test-runner script on it and do > let me know. > > --- > My friend told that this thing can be done by "defining a grammar and > automata." I did read up about it, but found it hard to grasp > everything. Can you say whether it would be easier to solve it that > way than RE? > > Regards > > > > On 13 June 2014 17:15, Saimadhav Heblikar wrote: >> On 13 June 2014 16:58, Tal Einat wrote: >>> On Fri, Jun 13, 2014 at 2:22 PM, Saimadhav Heblikar >>> wrote: >>>> Just a heads up to both: I am writing a keyseq validator method. >>>> It currently works for over 800 permutations of ['Shift', 'Control', >>>> 'Alt', 'Meta', 'Key-a', 'Key-A', 'Up', 'Key-Up', 'a', 'A']. It works >>>> for permutations of length 2 and 3. Beyond that its not worth it IMO. >>>> I am currently trying to integrate it with test_configuration.py and >>>> catching permutations i missed out. 
>>>> >>>> I post this, so that we dont duplicate work. I hope it to be ready by >>>> the end of the day.(UTC +5.5) >>> >>> What is the method you are using? >> >> Regex. It is not something elegant. The permutations are coded in.(Not >> all 800+ obviously, but around 15-20 general ones.). The only >> advantage is it can be used without creating a new Tk instance. >> >> >>> >>> What do you mean by "permutations"? If you mean what I think, then I'm >>> not sure I agree with >3 not being worth it. I've used keyboard >>> bindings with more than 2 modifiers before, and we should certainly >>> support this properly. >>> >> I am sorry. I meant to write >3 modifier permutations. >> (i.eControl-Shift-Alt-Meta+Key-X is not covered. But >> Control-Shift-Alt-Key-X is.) >> >> >> >> >> -- >> Regards >> Saimadhav Heblikar > > > > -- > Regards > Saimadhav Heblikar -- Regards Saimadhav Heblikar From status at bugs.python.org Fri Jun 13 18:07:57 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 13 Jun 2014 18:07:57 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140613160757.9965156A83@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-06-06 - 2014-06-13) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 4662 (+12) closed 28859 (+57) total 33521 (+69) Open issues with patches: 2150 Issues opened (52) ================== #15993: Windows: 3.3.0-rc2.msi: test_buffer fails http://bugs.python.org/issue15993 reopened by skrah #18910: IDle: test textView.py http://bugs.python.org/issue18910 reopened by ned.deily #20043: test_multiprocessing_main_handling fails --without-threads http://bugs.python.org/issue20043 reopened by berker.peksag #20578: BufferedIOBase.readinto1 is missing http://bugs.python.org/issue20578 reopened by benjamin.peterson #21684: inspect.signature bind doesn't include defaults or empty tuple http://bugs.python.org/issue21684 opened by rmccampbell7 #21686: IDLE - Test hyperparser http://bugs.python.org/issue21686 opened by sahutd #21687: Py_SetPath: Path components separated by colons http://bugs.python.org/issue21687 opened by fwalch #21690: re documentation: re.compile links to re.search / re.match ins http://bugs.python.org/issue21690 opened by jdg #21694: IDLE - Test ParenMatch http://bugs.python.org/issue21694 opened by sahutd #21696: Idle: test configuration files http://bugs.python.org/issue21696 opened by terry.reedy #21697: shutil.copytree() handles symbolic directory incorrectly http://bugs.python.org/issue21697 opened by shajunxing #21699: Windows Python 3.4.1 pyvenv doesn't work in directories with s http://bugs.python.org/issue21699 opened by Justin.Engel #21702: asyncio: remote_addr of create_datagram_endpoint() is not docu http://bugs.python.org/issue21702 opened by haypo #21703: IDLE: Test UndoDelegator http://bugs.python.org/issue21703 opened by sahutd #21704: _multiprocessing module builds incorrectly when POSIX semaphor http://bugs.python.org/issue21704 opened by Arfrever #21705: cgi.py: Multipart with more than one file is misparsed http://bugs.python.org/issue21705 opened by smurfix #21706: Add base for enumerations (Functional API) http://bugs.python.org/issue21706 opened by dkorchem #21707: 
modulefinder uses wrong CodeType signature in .replace_paths_i http://bugs.python.org/issue21707 opened by lemburg #21708: Deprecate nonstandard behavior of a dumbdbm database http://bugs.python.org/issue21708 opened by serhiy.storchaka #21710: --install-base option ignored? http://bugs.python.org/issue21710 opened by pitrou #21714: Path.with_name can construct invalid paths http://bugs.python.org/issue21714 opened by Antony.Lee #21715: Chaining exceptions at C level http://bugs.python.org/issue21715 opened by serhiy.storchaka #21716: 3.4.1 download page link for OpenPGP signatures has no sigs http://bugs.python.org/issue21716 opened by grossdm #21717: Exclusive mode for ZipFile and TarFile http://bugs.python.org/issue21717 opened by Antony.Lee #21718: sqlite3 cursor.description seems to rely on incomplete stateme http://bugs.python.org/issue21718 opened by zzzeek #21719: Returning Windows file attribute information via os.stat() http://bugs.python.org/issue21719 opened by benhoyt #21720: "TypeError: Item in ``from list'' not a string" message http://bugs.python.org/issue21720 opened by davidszotten #21721: socket.sendfile() should use TransmitFile on Windows http://bugs.python.org/issue21721 opened by giampaolo.rodola #21722: teach distutils "upload" to exit with code != 0 when error occ http://bugs.python.org/issue21722 opened by mdengler #21723: Float maxsize is treated as infinity in asyncio.Queue http://bugs.python.org/issue21723 opened by vajrasky #21724: resetwarnings doesn't reset warnings registry http://bugs.python.org/issue21724 opened by pitrou #21725: RFC 6531 (SMTPUTF8) support in smtpd http://bugs.python.org/issue21725 opened by r.david.murray #21726: Unnecessary line in documentation http://bugs.python.org/issue21726 opened by Reid.Price #21728: Confusing error message when initialising type inheriting obje http://bugs.python.org/issue21728 opened by Gerrit.Holl #21729: Use `with` statement in dbm.dumb http://bugs.python.org/issue21729 opened by 
Claudiu.Popa #21730: test_socket fails --without-threads http://bugs.python.org/issue21730 opened by berker.peksag #21731: Calendar Problem with Windows (XP) http://bugs.python.org/issue21731 opened by Juebo #21732: SubprocessTestsMixin.test_subprocess_terminate() hangs on "AMD http://bugs.python.org/issue21732 opened by haypo #21734: compilation of the _ctypes module fails on OpenIndiana: ffi_pr http://bugs.python.org/issue21734 opened by haypo #21735: test_threading.test_main_thread_after_fork_from_nonmain_thread http://bugs.python.org/issue21735 opened by haypo #21736: Add __file__ attribute to frozen modules http://bugs.python.org/issue21736 opened by lemburg #21737: runpy.run_path() fails with frozen __main__ modules http://bugs.python.org/issue21737 opened by lemburg #21738: Enum docs claim replacing __new__ is not possible http://bugs.python.org/issue21738 opened by ethan.furman #21739: Add hint about expression in list comprehensions (https://docs http://bugs.python.org/issue21739 opened by krichter #21740: doctest doesn't allow duck-typing callables http://bugs.python.org/issue21740 opened by pitrou #21741: Convert most of the test suite to using unittest.main() http://bugs.python.org/issue21741 opened by zach.ware #21742: WatchedFileHandler can fail due to race conditions or file ope http://bugs.python.org/issue21742 opened by vishvananda #21743: Create tests for RawTurtleScreen http://bugs.python.org/issue21743 opened by Lita.Cho #21744: itertools.islice() goes over all the pre-initial elements even http://bugs.python.org/issue21744 opened by jcea #21746: urlparse.BaseResult no longer exists http://bugs.python.org/issue21746 opened by mgilson #21748: glob.glob does not sort its results http://bugs.python.org/issue21748 opened by drj #21749: pkgutil ImpLoader does not support frozen modules http://bugs.python.org/issue21749 opened by lemburg Most recent 15 issues with no replies (15) ========================================== #21749: pkgutil ImpLoader 
does not support frozen modules http://bugs.python.org/issue21749 #21743: Create tests for RawTurtleScreen http://bugs.python.org/issue21743 #21740: doctest doesn't allow duck-typing callables http://bugs.python.org/issue21740 #21738: Enum docs claim replacing __new__ is not possible http://bugs.python.org/issue21738 #21737: runpy.run_path() fails with frozen __main__ modules http://bugs.python.org/issue21737 #21735: test_threading.test_main_thread_after_fork_from_nonmain_thread http://bugs.python.org/issue21735 #21734: compilation of the _ctypes module fails on OpenIndiana: ffi_pr http://bugs.python.org/issue21734 #21730: test_socket fails --without-threads http://bugs.python.org/issue21730 #21726: Unnecessary line in documentation http://bugs.python.org/issue21726 #21720: "TypeError: Item in ``from list'' not a string" message http://bugs.python.org/issue21720 #21717: Exclusive mode for ZipFile and TarFile http://bugs.python.org/issue21717 #21716: 3.4.1 download page link for OpenPGP signatures has no sigs http://bugs.python.org/issue21716 #21715: Chaining exceptions at C level http://bugs.python.org/issue21715 #21710: --install-base option ignored? 
http://bugs.python.org/issue21710 #21708: Deprecate nonstandard behavior of a dumbdbm database http://bugs.python.org/issue21708 Most recent 15 issues waiting for review (15) ============================================= #21749: pkgutil ImpLoader does not support frozen modules http://bugs.python.org/issue21749 #21746: urlparse.BaseResult no longer exists http://bugs.python.org/issue21746 #21742: WatchedFileHandler can fail due to race conditions or file ope http://bugs.python.org/issue21742 #21741: Convert most of the test suite to using unittest.main() http://bugs.python.org/issue21741 #21737: runpy.run_path() fails with frozen __main__ modules http://bugs.python.org/issue21737 #21736: Add __file__ attribute to frozen modules http://bugs.python.org/issue21736 #21730: test_socket fails --without-threads http://bugs.python.org/issue21730 #21729: Use `with` statement in dbm.dumb http://bugs.python.org/issue21729 #21725: RFC 6531 (SMTPUTF8) support in smtpd http://bugs.python.org/issue21725 #21723: Float maxsize is treated as infinity in asyncio.Queue http://bugs.python.org/issue21723 #21722: teach distutils "upload" to exit with code != 0 when error occ http://bugs.python.org/issue21722 #21719: Returning Windows file attribute information via os.stat() http://bugs.python.org/issue21719 #21715: Chaining exceptions at C level http://bugs.python.org/issue21715 #21708: Deprecate nonstandard behavior of a dumbdbm database http://bugs.python.org/issue21708 #21707: modulefinder uses wrong CodeType signature in .replace_paths_i http://bugs.python.org/issue21707 Top 10 most discussed issues (10) ================================= #18910: IDle: test textView.py http://bugs.python.org/issue18910 8 msgs #21722: teach distutils "upload" to exit with code != 0 when error occ http://bugs.python.org/issue21722 8 msgs #17822: Save on Close windows (IDLE) http://bugs.python.org/issue17822 7 msgs #20577: IDLE: Remove FormatParagraph's width setting from config dialo 
http://bugs.python.org/issue20577 7 msgs #21205: Add __qualname__ attribute to Python generators and change def http://bugs.python.org/issue21205 6 msgs #20578: BufferedIOBase.readinto1 is missing http://bugs.python.org/issue20578 5 msgs #21652: Python 2.7.7 regression in mimetypes module on Windows http://bugs.python.org/issue21652 5 msgs #21669: Custom error messages when print & exec are used as statements http://bugs.python.org/issue21669 5 msgs #21719: Returning Windows file attribute information via os.stat() http://bugs.python.org/issue21719 5 msgs #21725: RFC 6531 (SMTPUTF8) support in smtpd http://bugs.python.org/issue21725 5 msgs Issues closed (59) ================== #1253: IDLE - Percolator overhaul http://bugs.python.org/issue1253 closed by terry.reedy #3938: Clearing globals; interpreter -- IDLE difference http://bugs.python.org/issue3938 closed by terry.reedy #7424: NetBSD: segmentation fault in listextend during install http://bugs.python.org/issue7424 closed by ned.deily #8378: PYTHONSTARTUP is not run by default when Idle is started http://bugs.python.org/issue8378 closed by terry.reedy #10498: calendar.LocaleHTMLCalendar.formatyearpage() results in traceb http://bugs.python.org/issue10498 closed by r.david.murray #10503: os.getuid() documentation should be clear on what kind of uid http://bugs.python.org/issue10503 closed by python-dev #11709: help-method crashes if sys.stdin is None http://bugs.python.org/issue11709 closed by python-dev #12063: tokenize module appears to treat unterminated single and doubl http://bugs.python.org/issue12063 closed by python-dev #12561: Compiler workaround for wide string constants in Modules/getpa http://bugs.python.org/issue12561 closed by Jim.Jewett #13111: Error 2203 when installing Python/Perl? 
http://bugs.python.org/issue13111 closed by loewis #13223: pydoc removes 'self' in HTML for method docstrings with exampl http://bugs.python.org/issue13223 closed by python-dev #14758: SMTPServer of smptd does not support binding to an IPv6 addres http://bugs.python.org/issue14758 closed by r.david.murray #15780: IDLE (windows) with PYTHONPATH and multiple python versions http://bugs.python.org/issue15780 closed by terry.reedy #17457: Unittest discover fails with namespace packages and builtin mo http://bugs.python.org/issue17457 closed by berker.peksag #17552: Add a new socket.sendfile() method http://bugs.python.org/issue17552 closed by giampaolo.rodola #18039: dbm.open(..., flag="n") does not work and does not give a warn http://bugs.python.org/issue18039 closed by serhiy.storchaka #18141: tkinter.Image.__del__ can throw an exception if module globals http://bugs.python.org/issue18141 closed by JanKanis #19662: smtpd.py should not decode utf-8 http://bugs.python.org/issue19662 closed by r.david.murray #19840: shutil.move(): Add ability to use custom copy function to allo http://bugs.python.org/issue19840 closed by r.david.murray #20903: smtplib.SMTP raises socket.timeout http://bugs.python.org/issue20903 closed by r.david.murray #21230: imghdr does not accept adobe photoshop mime type http://bugs.python.org/issue21230 closed by r.david.murray #21256: Sort keyword arguments in mock _format_call_signature http://bugs.python.org/issue21256 closed by python-dev #21310: ResourceWarning when open() fails with io.UnsupportedOperation http://bugs.python.org/issue21310 closed by serhiy.storchaka #21372: multiprocessing.util.register_after_fork inconsistency http://bugs.python.org/issue21372 closed by sbt #21404: Document options used to control compression level in tarfile http://bugs.python.org/issue21404 closed by python-dev #21463: RuntimeError when URLopener.ftpcache is full http://bugs.python.org/issue21463 closed by python-dev #21515: Use Linux O_TMPFILE flag in 
tempfile.TemporaryFile? http://bugs.python.org/issue21515 closed by haypo #21569: PEP 466: Python 2.7 What's New preamble changes http://bugs.python.org/issue21569 closed by ncoghlan #21596: asyncio.wait fails when futures list is empty http://bugs.python.org/issue21596 closed by haypo #21629: clinic.py --converters fails http://bugs.python.org/issue21629 closed by larry #21642: "_ if 1else _" does not compile http://bugs.python.org/issue21642 closed by python-dev #21656: Create test coverage for TurtleScreenBase in Turtle http://bugs.python.org/issue21656 closed by Lita.Cho #21659: IDLE: One corner calltip case http://bugs.python.org/issue21659 closed by python-dev #21667: Clarify status of O(1) indexing semantics of str objects http://bugs.python.org/issue21667 closed by ncoghlan #21671: CVE-2014-0224: OpenSSL upgrade to 1.0.1h on Windows required http://bugs.python.org/issue21671 closed by zach.ware #21677: Exception context set to string by BufferedWriter.close() http://bugs.python.org/issue21677 closed by serhiy.storchaka #21678: Add operation "plus" for dictionaries http://bugs.python.org/issue21678 closed by terry.reedy #21681: version string printed on STDERR http://bugs.python.org/issue21681 closed by r.david.murray #21682: Refleak in idle_test test_autocomplete http://bugs.python.org/issue21682 closed by terry.reedy #21683: Add Tix to the Windows buildbot scripts http://bugs.python.org/issue21683 closed by zach.ware #21685: zipfile module doesn't properly compress odt documents http://bugs.python.org/issue21685 closed by r.david.murray #21688: Improved error msg for make.bat htmlhelp http://bugs.python.org/issue21688 closed by zach.ware #21689: Docs for "Using Python on a Macintosh" needs to be updated. 
http://bugs.python.org/issue21689 closed by ned.deily #21691: set() returns random output with Python 3.4.1, in non-interact http://bugs.python.org/issue21691 closed by benjamin.peterson #21692: Wrong order of expected/actual for assert_called_once_with http://bugs.python.org/issue21692 closed by michael.foord #21693: Broken link to Pylons in the HOWTO TurboGears documentation http://bugs.python.org/issue21693 closed by orsenthil #21695: Idle 3.4.1-: closing Find in Files while in progress closes Id http://bugs.python.org/issue21695 closed by terry.reedy #21698: Platform.win32_ver() shows different values than expected on W http://bugs.python.org/issue21698 closed by haypo #21700: Missing mention of DatagramProtocol having connection_made and http://bugs.python.org/issue21700 closed by haypo #21701: create_datagram_endpoint does not receive when both local_addr http://bugs.python.org/issue21701 closed by ariddell #21709: logging.__init__ assumes that __file__ is always set http://bugs.python.org/issue21709 closed by python-dev #21711: Remove site-python support http://bugs.python.org/issue21711 closed by pitrou #21712: fractions.gcd failure http://bugs.python.org/issue21712 closed by rhettinger #21713: a mistype comment in PC/pyconfig.h http://bugs.python.org/issue21713 closed by python-dev #21727: Ambiguous sentence explaining `cycle` in itertools documentati http://bugs.python.org/issue21727 closed by rhettinger #21733: "mmap(size=9223372036854779904) failed" message when running t http://bugs.python.org/issue21733 closed by ned.deily #21745: Devguide: mention requirement to install Visual Studio SP1 on http://bugs.python.org/issue21745 closed by zach.ware #21747: argvars: error while parsing under windows http://bugs.python.org/issue21747 closed by r.david.murray #1517993: IDLE: config-main.def contains windows-specific settings http://bugs.python.org/issue1517993 closed by terry.reedy From 4kir4.1i at gmail.com Fri Jun 13 18:18:45 2014 From: 4kir4.1i at 
gmail.com (Akira Li) Date: Fri, 13 Jun 2014 20:18:45 +0400 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character References: <20140611230030.6F56F250DC4@webabinitio.net> <874mzpo8xw.fsf@vostro.rath.org> <20140613041852.GD19485@lupin> Message-ID: <87zjhgkcka.fsf@gmail.com> Florian Bruhin writes: > * Nikolaus Rath [2014-06-12 19:11:07 -0700]: >> "R. David Murray" writes: >> > Also notice that using a list with shell=True is using the API >> > incorrectly. It wouldn't even work on Linux, so that torpedoes >> > the cross-platform concern already :) >> > >> > This kind of confusion is why I opened http://bugs.python.org/issue7839. >> >> Can someone describe an use case where shell=True actually makes sense >> at all? >> >> It seems to me that whenever you need a shell, the argument's that you >> pass to it will be shell specific. So instead of e.g. >> >> Popen('for i in `seq 42`; do echo $i; done', shell=True) >> >> you almost certainly want to do >> >> Popen(['/bin/sh', 'for i in `seq 42`; do echo $i; done'], shell=False) >> >> because if your shell happens to be tcsh or cmd.exe, things are going to >> break. > > My usecase is a spawn-command in a GUI application, which the user can > use to spawn an executable. I want the user to be able to use the > usual shell features from there. However, I also pass an argument to > that command, and that should be escaped. You should pass the command as a string and use cmd.exe quote rules [1] (note: they are different from the one provided by `subprocess.list2cmdline()` [2] that follows Microsoft C/C++ startup code rules [3] e.g., `^` is not special unlike in cmd.exe case). 
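[Editor's note: the distinction Akira draws — cmd.exe quoting rules versus the Microsoft C/C++ argv rules that `subprocess.list2cmdline()` implements — can be sketched in a few lines. This is an illustrative helper, not code from the thread; the metacharacter set follows the MSDN article cited as [1] in this message, and the name `cmd_escape` is made up.]

```python
import subprocess

# cmd.exe metacharacters, per the MSDN post cited as [1] in this message
CMD_META = set('()%!^"<>&|')

def cmd_escape(arg):
    """Escape a string for cmd.exe by prefixing each metacharacter with '^'."""
    return ''.join('^' + c if c in CMD_META else c for c in arg)

# cmd.exe rules double the caret...
print(cmd_escape('a^b'))                 # -> a^^b
# ...while the MSVC argv rules used by list2cmdline() leave it alone,
# which is why list2cmdline() output is unsafe to hand to cmd.exe
print(subprocess.list2cmdline(['a^b']))  # -> a^b
```

With shell=True on Windows the command would then be built as a single escaped string (e.g. `'echo ' + cmd_escape(arg)`), rather than as a list.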
[1]: http://blogs.msdn.com/b/twistylittlepassagesallalike/archive/2011/04/23/everyone-quotes-arguments-the-wrong-way.aspx [2]: https://docs.python.org/3.4/library/subprocess.html#converting-an-argument-sequence-to-a-string-on-windows [3]: http://msdn.microsoft.com/en-us/library/17w5ykft%28v=vs.85%29.aspx -- akira From donspauldingii at gmail.com Fri Jun 13 21:11:31 2014 From: donspauldingii at gmail.com (Don Spaulding) Date: Fri, 13 Jun 2014 14:11:31 -0500 Subject: [Python-Dev] Backwards Incompatibility in logging module in 3.4? In-Reply-To: References: Message-ID: On Thu, Jun 12, 2014 at 6:45 PM, Victor Stinner wrote: > Hi, > > 2014-06-13 0:38 GMT+02:00 Don Spaulding : > > Is this a bug or an intentional break? If it's the latter, shouldn't > this > > at least be mentioned in the "What's new in Python 3.4" document? > > IMO the change is intentional. The previous behaviour was not really > expected. > I agree that the change seems intentional. However, as Nick mentioned, the ticket doesn't really discuss the repercussions of changing the output of the function. As far as I can tell, this function has returned an int when given a string since it was introduced in Python 2.3. I think it's reasonable to call a function's behavior "expected" after 11 years in the wild. > > Python 3.3 documentation is explicit: the result is a string and the > input parameter is an integer. logging.getLevelName("DEBUG") was more > an implementation > > https://docs.python.org/3.3/library/logging.html#logging.getLevelName > "Returns the textual representation of logging level lvl. If the level > is one of the predefined levels CRITICAL, ERROR, WARNING, INFO or > DEBUG then you get the corresponding string. If you have associated > levels with names using addLevelName() then the name you have > associated with lvl is returned. If a numeric value corresponding to > one of the defined levels is passed in, the corresponding string > representation is returned.
Otherwise, the string 'Level %s' % lvl is > returned." > > If your code uses something like > logger.setLevel(logging.getLevelName("DEBUG")), use directly > logger.setLevel("DEBUG"). > > This issue was fixed in OpenStack with this change: > https://review.openstack.org/#/c/94028/6/openstack/common/log.py,cm > https://review.openstack.org/#/c/94028/6 > > Victor > I appreciate the pointer to the OpenStack fix. I've actually already worked around the issue in my project (although without much elegance, I'll readily admit). I opened up an issue on the tracker for this: http://bugs.python.org/issue21752 I apologize if that was out of turn. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nad at acm.org Sat Jun 14 00:34:33 2014 From: nad at acm.org (Ned Deily) Date: Fri, 13 Jun 2014 15:34:33 -0700 Subject: [Python-Dev] python-dev for MAC OS X References: Message-ID: In article , Tal Einat wrote: > On Fri, Jun 13, 2014 at 12:21 PM, Pedro Helou wrote: > > does anybody know how to install the python-dev headers and libraries for > > MAC OS X? > This list is for discussing the development *of* Python, not *with* > Python. Please ask on the python list, python-list at python.org (more > info here[1]) or on the #python channel on the Freenode IRC server. > StackOverflow is also a good place to search for information and ask > questions. Like Tal said. But I'm guessing you are asking about the headers for the Apple-supplied System Pythons. On recent versions of OS X, they are not installed by default; you need to install the Command Line Tools component to install system headers, including those for Python. How you do that varies by OS X release. In OS X 10.9 Mavericks, you can run "xcode-select --install". For earlier releases, there may be an option in Xcode.app's Preferences. Or you may be able to download the right Command Line Tools package from the Apple Developer Connection site.
-- Ned Deily, nad at acm.org From ncoghlan at gmail.com Sat Jun 14 01:28:22 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 14 Jun 2014 09:28:22 +1000 Subject: [Python-Dev] Backwards Incompatibility in logging module in 3.4? In-Reply-To: References: Message-ID: On 14 Jun 2014 06:18, "Don Spaulding" wrote: > > I opened up an issue on the tracker for this: http://bugs.python.org/issue21752 > > I apologize if that was out of turn. At the very least, there should be a note in the "porting to Python 3.4" section of the What's New and a versionchanged note on the API docs, so a docs bug report is appropriate to add those. Cheers, Nick. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nikolaus at rath.org Sat Jun 14 05:04:23 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Fri, 13 Jun 2014 20:04:23 -0700 Subject: [Python-Dev] Why does IOBase.__del__ call .close? 
In-Reply-To: <1402637269.29254.128319501.4C662871@webmail.messagingengine.com> (Benjamin Peterson's message of "Thu, 12 Jun 2014 22:27:49 -0700") References: <87d2egnsfq.fsf@vostro.rath.org> <5397B62F.80004@mrabarnett.plus.com> <87a99jnfzq.fsf@vostro.rath.org> <1402534493.31346.127850065.34AEEDD2@webmail.messagingengine.com> <877g4lobxv.fsf@vostro.rath.org> <1402637269.29254.128319501.4C662871@webmail.messagingengine.com> Message-ID: <871tusnqdk.fsf@vostro.rath.org> Benjamin Peterson writes: > On Thu, Jun 12, 2014, at 18:06, Nikolaus Rath wrote: >> Consider this simple example: >> >> $ cat test.py >> import io >> import warnings >> >> class StridedStream(io.IOBase): >> def __init__(self, name, stride=2): >> super().__init__() >> self.fh = open(name, 'rb') >> self.stride = stride >> >> def read(self, len_): >> return self.fh.read(self.stride*len_)[::self.stride] >> >> def close(self): >> self.fh.close() >> >> class FixedStridedStream(StridedStream): >> def __del__(self): >> # Prevent IOBase.__del__ frombeing called. >> pass >> >> warnings.resetwarnings() >> warnings.simplefilter('error') >> >> print('Creating & loosing StridedStream..') >> r = StridedStream('/dev/zero') >> del r >> >> print('Creating & loosing FixedStridedStream..') >> r = FixedStridedStream('/dev/zero') >> del r >> >> $ python3 test.py >> Creating & loosing StridedStream.. >> Creating & loosing FixedStridedStream.. >> Exception ignored in: <_io.FileIO name='/dev/zero' mode='rb'> >> ResourceWarning: unclosed file <_io.BufferedReader name='/dev/zero'> >> >> In the first case, the destructor inherited from IOBase actually >> prevents the ResourceWarning from being emitted. > > Ah, I see. I don't see any good ways to fix it, though, besides setting > some flag if close() is called from __del__. How about not having IOBase.__del__ call self.close()? Any resources acquired by the derived class would still clean up after themselves when they are garbage collected. 
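[Editor's note: the "flag" approach Benjamin mentions above can be sketched outside the io module. The following is only an illustration of the idea under discussion — the class names are invented and this is not how CPython's `IOBase` is actually implemented: the base class records whether close() has run, so its destructor can still surface a ResourceWarning before cleaning up an object that was never closed explicitly.]

```python
import os
import warnings

class ClosingBase:
    _closed = False  # flag recording whether close() has run

    def close(self):
        self._closed = True

    def __del__(self):
        if not self._closed:
            # Never closed explicitly: surface a ResourceWarning
            # *before* cleaning up, so the leak stays visible even
            # though the resource does get released.
            warnings.warn('unclosed %r' % self, ResourceWarning)
            self.close()

class Stream(ClosingBase):
    def __init__(self):
        self.fh = open(os.devnull, 'rb')  # stand-in for a real resource

    def close(self):
        self.fh.close()
        super().close()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    Stream()          # created and dropped without close(): warns
    s = Stream()
    s.close()
    del s             # closed explicitly: no second warning
print(len(caught))    # -> 1
```

This keeps the derived class's resources from leaking while still reporting the missing close() call, which is the trade-off the thread is weighing against simply dropping the close() call from the destructor.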
Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." From benjamin at python.org Sat Jun 14 05:26:02 2014 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 13 Jun 2014 20:26:02 -0700 Subject: [Python-Dev] Why does IOBase.__del__ call .close? In-Reply-To: <871tusnqdk.fsf@vostro.rath.org> References: <87d2egnsfq.fsf@vostro.rath.org> <5397B62F.80004@mrabarnett.plus.com> <87a99jnfzq.fsf@vostro.rath.org> <1402534493.31346.127850065.34AEEDD2@webmail.messagingengine.com> <877g4lobxv.fsf@vostro.rath.org> <1402637269.29254.128319501.4C662871@webmail.messagingengine.com> <871tusnqdk.fsf@vostro.rath.org> Message-ID: <1402716362.12324.128662021.0B640B3D@webmail.messagingengine.com> On Fri, Jun 13, 2014, at 20:04, Nikolaus Rath wrote: > Benjamin Peterson writes: > > On Thu, Jun 12, 2014, at 18:06, Nikolaus Rath wrote: > >> Consider this simple example: > >> > >> $ cat test.py > >> import io > >> import warnings > >> > >> class StridedStream(io.IOBase): > >> def __init__(self, name, stride=2): > >> super().__init__() > >> self.fh = open(name, 'rb') > >> self.stride = stride > >> > >> def read(self, len_): > >> return self.fh.read(self.stride*len_)[::self.stride] > >> > >> def close(self): > >> self.fh.close() > >> > >> class FixedStridedStream(StridedStream): > >> def __del__(self): > >> # Prevent IOBase.__del__ frombeing called. > >> pass > >> > >> warnings.resetwarnings() > >> warnings.simplefilter('error') > >> > >> print('Creating & loosing StridedStream..') > >> r = StridedStream('/dev/zero') > >> del r > >> > >> print('Creating & loosing FixedStridedStream..') > >> r = FixedStridedStream('/dev/zero') > >> del r > >> > >> $ python3 test.py > >> Creating & loosing StridedStream.. > >> Creating & loosing FixedStridedStream..
> >> Exception ignored in: <_io.FileIO name='/dev/zero' mode='rb'> > >> ResourceWarning: unclosed file <_io.BufferedReader name='/dev/zero'> > >> > >> In the first case, the destructor inherited from IOBase actually > >> prevents the ResourceWarning from being emitted. > > > > Ah, I see. I don't see any good ways to fix it, though, besides setting > > some flag if close() is called from __del__. > > How about not having IOBase.__del__ call self.close()? Any resources > acquired by the derived class would still clean up after themselves when > they are garbage collected. Well, yes, but that's probably a backwards compat problem. From greg at krypto.org Sat Jun 14 05:38:04 2014 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 13 Jun 2014 20:38:04 -0700 Subject: [Python-Dev] Raspberry Pi Buildbot In-Reply-To: References: Message-ID: On Fri, Jun 13, 2014 at 3:24 AM, Tal Einat wrote: > Is there one? If not, would you like me to set one up? I've got one at > home with Raspbian installed not doing anything, it could easily run a > buildslave. > > Poking around on buildbot.python.org/all/builders, I can only see one > ARM buildbot[1], and it's just called "ARM v7". > The ARM v7 buildbot is mine. It's a Samsung chromebook with a dual core exynos5 cpu and usb3 SSD. ie: It's *at least* 10x faster than a raspberry pi. I don't think a pi buildbot would add much value but if you want to run one, feel free. It should live in the unstable pool. -gps -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pmiscml at gmail.com Sat Jun 14 22:11:44 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Sat, 14 Jun 2014 23:11:44 +0300 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140610030303.GU10355@ando> References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> Message-ID: <20140614231144.639bf852@x34f> Hello, On Tue, 10 Jun 2014 13:03:03 +1000 Steven D'Aprano wrote: > On Tue, Jun 10, 2014 at 05:23:12AM +0300, Paul Sokolovsky wrote: > > > execfile() builtin function was removed in 3.0. This brings few > > problems: > > > > 1. It hampers interactive mode - instead of short and easy to type > > execfile("file.py") one needs to use exec(open("file.py").read()). > > If the amount of typing is the problem, that's easy to solve: > > # do this once > def execfile(name): > exec(open("file.py").read()) So, you here propose to workaround removal of core language feature either a) on end user side, or b) on "system integrator" side. But such solution is based on big number of assumptions, like: user wants to workaround that at all (hint: they don't, they just want to use it); you say "do this once", but actually it's "do it in each interactive session again and again", and user may not have knowledge to "do it once" instead; that if system integrator does that, the the function is called "execfile": if system integrator didn't have enough Python experience, and read only Python3 spec, they might call it something else, and yet users with a bit of Python experience will expect it be called exactly "execfile" and not anything else. > > Another possibility is: > > os.system("python file.py") > > > > 2. Ok, assuming that exec(open().read()) idiom is still a way to go, > > there's a problem - it requires to load entire file to memory. But > > there can be not enough memory. Consider 1Mb file with 900Kb > > comments (autogenerated, for example). execfile() could easily > > parse it, using small buffer. 
But exec() requires to slurp entire > > file into memory, and 1Mb is much more than heap sizes that we > > target. > > There's nothing stopping alternative implementations having their own > implementation-specific standard library modules. And here you propose to workaround it on particular implementation's level. But in my original mail, in excerpt that you removed, I kindly asked to skip obvious suggestions (like that particular implementation can do anything it wants). I don't see how working around the issue on user, particular distribution, or particular implementation level help *Python* language in general, and *Python community* in general. So, any bright ideas how to workaround the issue of execfile() removal on *language level*? [] > So you could do this: > > from upy import execfile > execfile("file.py") > > So long as you make it clear that this is a platform specific module, > and don't advertise it as a language feature, I see no reason why you > cannot do that. The case we discuss is clearly different. It's not about "platform specific module", it's about functionality which was in Python all the time, and was suddenly removed in Python3, for not fully clear, or alternatively, not severe enough, reasons. If some implementation is to re-add it, the description like above seems the most truthful way to represent that function. -- Best regards, Paul mailto:pmiscml at gmail.com From techtonik at gmail.com Sat Jun 14 21:54:01 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 14 Jun 2014 22:54:01 +0300 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: On Fri, Jun 13, 2014 at 2:55 AM, Ryan Gonzalez wrote: > SHELLS ARE NOT CROSS-PLATFORM!!!! Seriously, there are going to be > differences. 
If you really must: > > escape = lambda s: s.replace('^', '^^') if os.name == 'nt' else s > It is not about generic shell problem, it is about specific behavior that on Windows Python already uses cmd.exe shell hardcoded in its sources. So for crossplatform behavior on Windows, it should escape symbols on command passed to cmd.exe that are special to this shell to avoid breaking Python scripts. What you propose is a bad workaround, because it assumes that all Python users who use subprocess to execute hg or git should possess apriori knowledge about default subprocess behaviour with default shell on Windows and implement workaround for that. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Sat Jun 14 22:04:23 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 14 Jun 2014 23:04:23 +0300 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <874mzpo8xw.fsf@vostro.rath.org> References: <20140611230030.6F56F250DC4@webabinitio.net> <874mzpo8xw.fsf@vostro.rath.org> Message-ID: On Fri, Jun 13, 2014 at 5:11 AM, Nikolaus Rath wrote: > "R. David Murray" writes: > > Also notice that using a list with shell=True is using the API > > incorrectly. It wouldn't even work on Linux, so that torpedoes > > the cross-platform concern already :) > > > > This kind of confusion is why I opened http://bugs.python.org/issue7839. > > Can someone describe an use case where shell=True actually makes sense > at all? > You need to write a wrapper script to automate several user commands. It is quite common to use shell pipe redirection for joining many utils and calls together than to rewrite data pipes in Python. -------------- next part -------------- An HTML attachment was scrubbed... 
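As a hypothetical illustration of that trade-off, here is the same two-stage pipeline spelled both ways on a POSIX system (the commands are arbitrary examples; on Windows the string form goes through cmd.exe, which is exactly where the caret-escaping problem above comes from):

```python
import subprocess

# shell=True: the shell parses the string and wires up the pipe itself
out = subprocess.check_output("echo hello | tr a-z A-Z", shell=True)
print(out.decode().strip())  # HELLO

# shell=False equivalent: build the pipe explicitly in Python
p1 = subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["tr", "a-z", "A-Z"], stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p1.stdout.close()  # allow p1 to receive SIGPIPE if p2 exits first
result = p2.communicate()[0]
print(result.decode().strip())  # HELLO
```

The explicit form is more typing, but no shell quoting rules apply to the arguments at all.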
URL: From techtonik at gmail.com Sat Jun 14 22:07:27 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 14 Jun 2014 23:07:27 +0300 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: <20140611230030.6F56F250DC4@webabinitio.net> Message-ID: On Thu, Jun 12, 2014 at 5:12 AM, Chris Angelico wrote: > On Thu, Jun 12, 2014 at 12:07 PM, Chris Angelico wrote: > > ISTM what you want is not shell=True, but a separate function that > > follows the system policy for translating a command name into a > > path-to-binary. That's something that, AFAIK, doesn't currently exist > > in the Python 2 stdlib, but Python 3 has shutil.which(). If there's a > > PyPI backport of that for Py2, you should be able to use that to > > figure out the command name, and then avoid shell=False. > > Huh. Next time, Chris, search the web before you post. Via a > StackOverflow post, learned about distutils.spawn.find_executable(). > I remember I even wrote a patch for it, but I forgot about it already. Still feels like a hack that is difficult to find and understand that you need really it. In Rietveld case it won't work, because upload.py script allows user to specify arbitrary diff command to send change for review. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Sat Jun 14 22:52:15 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Sat, 14 Jun 2014 23:52:15 +0300 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> Message-ID: <20140614235215.621e7571@x34f> Hello, On Tue, 10 Jun 2014 17:36:02 +1000 Nick Coghlan wrote: > On 10 June 2014 12:23, Paul Sokolovsky wrote: > > 1. It hampers interactive mode - instead of short and easy to type > > execfile("file.py") one needs to use exec(open("file.py").read()). 
> > I'm sure that's not going to bother a lot of people - after all, the > > easiest way to execute a Python file is to drop back to shell and > > restart python with file name, using all wonders of tab completion. > > But now imagine that Python interpreter runs on bare hardware, and > > its REPL is the only shell. That's exactly what we have with > > MicroPython's Cortex-M port. But it's not really > > MicroPython-specific, there's CPython port to baremetal either - > > http://www.pycorn.org/ . > > https://docs.python.org/3/library/runpy.html#runpy.run_path > > import runpy > file_globals = runpy.run_path("file.py") Thanks, it's the most productive response surely. So, at least there's alternative to removed execfile(). Unfortunately, I don't think it's good alternative to execfile() in all respects. It clearly provides API for that functionality, but is that solution of least surprise and is it actually known by users at all (to be useful for them)? Googling for "execfile python 3", top 3 hits I see are stackoverflow questions, *none* of which mentions runpy. So, people either don't consider it viable alternative to execfile, or don't know about it at all (my guess it's the latter). Like with previous discussion, its meaning goes beyond just Python realm - there's competition all around. And internets bring funny examples, like for example http://www.red-lang.org/p/contributions.html (scroll down to diagram, or here's direct link: http://3.bp.blogspot.com/-xhOP35Dm99w/UuXFKgY2dlI/AAAAAAAAAGA/YQu98_pPDjw/s1600/reichart-abstraction-diagram.png) So, didn't you know that Ruby can be used for OS-level development, and Python can't? Or that JavaScript DSL capabilities are better than Python's (that's taking into account that JavaScript DSL capabilities are represented by JSON, whose creators were so arrogant as to disallow even usage of comments in it). 
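For reference, the run_path() API quoted at the top of this message behaves like this (a minimal sketch; the script is a throwaway temporary file created just for the demonstration):

```python
import os
import runpy
import tempfile

# write a tiny script, then execute it in a fresh namespace
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("answer = 6 * 7\n")

globs = runpy.run_path(f.name)  # returns the resulting module globals
print(globs["answer"])          # 42
os.unlink(f.name)
```

Unlike the Python 2 execfile(), the executed code does not see or mutate the caller's namespace; the results come back as a dict.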
So, now suppose there's a discussion of how good different languages are for interactive usage (out of the box apparently). It would be a little hard to defend claim that Python is *excellent* interactive language, if its latest series got -1 on that scale, by removing feature which may be indispensable at times. Knowing that, one subconsciously may start to wonder if Ruby or JavaScript are doing it (in wide sense) better than Python. -- Best regards, Paul mailto:pmiscml at gmail.com From markus at unterwaditzer.net Sat Jun 14 23:00:59 2014 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Sat, 14 Jun 2014 23:00:59 +0200 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140610052312.280e49c9@x34f> References: <20140610052312.280e49c9@x34f> Message-ID: <20140614210059.GB20710@chromebot.lan> On Tue, Jun 10, 2014 at 05:23:12AM +0300, Paul Sokolovsky wrote: > Hello, > > I was pleasantly surprised with the response to recent post about > MicroPython implementation details > (https://mail.python.org/pipermail/python-dev/2014-June/134718.html). I > hope that discussion means that posts about alternative implementations > are not unwelcome here, so I would like to bring up another (of many) > issues we faced while implementing MicroPython. > > execfile() builtin function was removed in 3.0. This brings few > problems: > > 1. It hampers interactive mode - instead of short and easy to type > execfile("file.py") one needs to use exec(open("file.py").read()). I'm > sure that's not going to bother a lot of people - after all, the > easiest way to execute a Python file is to drop back to shell and > restart python with file name, using all wonders of tab completion. But > now imagine that Python interpreter runs on bare hardware, and its REPL > is the only shell. That's exactly what we have with MicroPython's > Cortex-M port. But it's not really MicroPython-specific, there's > CPython port to baremetal either - http://www.pycorn.org/ . 
As far as i can see, minimizing the amount of characters to type was never a design goal of the Python language. And because that goal never mattered as much for the designers as it seems to do for you, the reason for it to get removed -- reducing the amount of builtins without reducing functionality -- was the only one left. > 2. Ok, assuming that exec(open().read()) idiom is still a way to go, > there's a problem - it requires to load entire file to memory. But > there can be not enough memory. Consider 1Mb file with 900Kb comments > (autogenerated, for example). execfile() could easily parse it, using > small buffer. But exec() requires to slurp entire file into memory, and > 1Mb is much more than heap sizes that we target. That is a valid concern, but i believe violating the language specification and adding your own execfile implementation (either as a builtin or in a new stdlib module) here is justified, even if it means you will have to modify your existing Python 3 code to use it -- i don't think the majority of software written in Python will be able to run under such memory constraints without major modifications anyway. > Comments, suggestions? Just to set a productive direction, please > kindly don't consider the problems above as MicroPython's. A new (not MicroPython-specific) stdlib module containing functions such as execfile could be considered. Not really for Python-2-compatibility, but for performance-critical situations. I am not sure if this is a good solution. Not at all. Even though it's separated from the builtins, i think it would still sacrifice the purity of the the language (by which i mean having a minimal composable API), because people are going to use it anyway. It reminds me of the situation in Python 2 where developers are trying to use cStringIO with a fallback to StringIO as a matter of principle, not because they actually need that kind of performance. 
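The accelerator-fallback idiom referred to above looks like the following, spelled so that it also runs on Python 3 (where the cStringIO import always fails and the fallback branch is taken):

```python
try:
    from cStringIO import StringIO  # C-accelerated module, CPython 2 only
except ImportError:
    from io import StringIO         # Python 3 (cStringIO no longer exists)

buf = StringIO()
buf.write("no premature optimization required")
print(buf.getvalue())
```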
Another, IMO better idea which shifts the problem to the MicroPython devs is to "just" detect code using exec(open(...).read()) and transparently rewrite it to something more memory-efficient. This is the idea i actually think is a good one. > I very much liked how last discussion went: I was pointed that > https://docs.python.org/3/reference/index.html is not really a CPython > reference, it's a *Python* reference, and there were even motion to > clarify in it some points which came out from MicroPython discussion. > So, what about https://docs.python.org/3/library/index.html - is it > CPython, or Python standard library specification? Assuming the latter, > what we have is that, by removal of previously available feature, > *Python* became less friendly for interactive usage and less scalable. "Less friendly for interactive usage" is a strong and vague statement. If you're going after the amount of characters required to type, yes, absolutely, but by that terms one could declare Bash and Perl to be superior languages. Look at it from a different perspective: There are fewer builtins to remember. 
> > > Thanks, > Paul mailto:pmiscml at gmail.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/markus%40unterwaditzer.net From fabiofz at gmail.com Sun Jun 15 00:15:37 2014 From: fabiofz at gmail.com (Fabio Zadrozny) Date: Sat, 14 Jun 2014 19:15:37 -0300 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140614210059.GB20710@chromebot.lan> References: <20140610052312.280e49c9@x34f> <20140614210059.GB20710@chromebot.lan> Message-ID: On Sat, Jun 14, 2014 at 6:00 PM, Markus Unterwaditzer < markus at unterwaditzer.net> wrote: > On Tue, Jun 10, 2014 at 05:23:12AM +0300, Paul Sokolovsky wrote: > > Hello, > > > > I was pleasantly surprised with the response to recent post about > > MicroPython implementation details > > (https://mail.python.org/pipermail/python-dev/2014-June/134718.html). I > > hope that discussion means that posts about alternative implementations > > are not unwelcome here, so I would like to bring up another (of many) > > issues we faced while implementing MicroPython. > > > > execfile() builtin function was removed in 3.0. This brings few > > problems: > > > > 1. It hampers interactive mode - instead of short and easy to type > > execfile("file.py") one needs to use exec(open("file.py").read()). I'm > > sure that's not going to bother a lot of people - after all, the > > easiest way to execute a Python file is to drop back to shell and > > restart python with file name, using all wonders of tab completion. But > > now imagine that Python interpreter runs on bare hardware, and its REPL > > is the only shell. That's exactly what we have with MicroPython's > > Cortex-M port. But it's not really MicroPython-specific, there's > > CPython port to baremetal either - http://www.pycorn.org/ . 
> > As far as i can see, minimizing the amount of characters to type was never > a > design goal of the Python language. And because that goal never mattered as > much for the designers as it seems to do for you, the reason for it to get > removed -- reducing the amount of builtins without reducing functionality > -- > was the only one left. > > > 2. Ok, assuming that exec(open().read()) idiom is still a way to go, > > there's a problem - it requires to load entire file to memory. But > > there can be not enough memory. Consider 1Mb file with 900Kb comments > > (autogenerated, for example). execfile() could easily parse it, using > > small buffer. But exec() requires to slurp entire file into memory, and > > 1Mb is much more than heap sizes that we target. > > That is a valid concern, but i believe violating the language > specification and > adding your own execfile implementation (either as a builtin or in a new > stdlib > module) here is justified, even if it means you will have to modify your > existing Python 3 code to use it -- i don't think the majority of software > written in Python will be able to run under such memory constraints without > major modifications anyway. > > > Comments, suggestions? Just to set a productive direction, please > > kindly don't consider the problems above as MicroPython's. > > A new (not MicroPython-specific) stdlib module containing functions such as > execfile could be considered. Not really for Python-2-compatibility, but > for > performance-critical situations. > > I am not sure if this is a good solution. Not at all. Even though it's > separated from the builtins, i think it would still sacrifice the purity > of the > the language (by which i mean having a minimal composable API), because > people > are going to use it anyway. 
It reminds me of the situation in Python 2 > where > developers are trying to use cStringIO with a fallback to StringIO as a > matter > of principle, not because they actually need that kind of performance. > > Another, IMO better idea which shifts the problem to the MicroPython devs > is to > "just" detect code using > > exec(open(...).read()) > > and transparently rewrite it to something more memory-efficient. This is > the > idea i actually think is a good one. > > > > I very much liked how last discussion went: I was pointed that > > https://docs.python.org/3/reference/index.html is not really a CPython > > reference, it's a *Python* reference, and there were even motion to > > clarify in it some points which came out from MicroPython discussion. > > So, what about https://docs.python.org/3/library/index.html - is it > > CPython, or Python standard library specification? Assuming the latter, > > what we have is that, by removal of previously available feature, > > *Python* became less friendly for interactive usage and less scalable. > > "Less friendly for interactive usage" is a strong and vague statement. If > you're going after the amount of characters required to type, yes, > absolutely, > but by that terms one could declare Bash and Perl to be superior languages. > Look at it from a different perspective: There are fewer builtins to > remember. > > > > Well, I must say that the exec(open().read()) is not really a proper execfile implementation because it may fail because of encoding issues... (i.e.: one has to check the file encoding to do the open with the proper encoding, otherwise it's possible to end up with gibberish). The PyDev debugger has an implementation (see: https://github.com/fabioz/Pydev/blob/development/plugins/org.python.pydev/pysrc/_pydev_execfile.py) which considers the encoding so that the result is ok (but it still has a bug related to utf-8 with bom: https://sw-brainwy.rhcloud.com/tracker/PyDev/346 which I plan to fix soon...) 
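A sketch of what such an encoding-aware helper can look like — tokenize.open() detects the PEP 263 coding comment and a UTF-8 BOM before decoding. The name and signature are illustrative; this is not the PyDev implementation:

```python
import tokenize

def execfile(path, globs=None):
    """Execute a Python source file, honoring its declared encoding."""
    if globs is None:
        globs = {"__name__": "__main__"}
    globs.setdefault("__file__", path)
    with tokenize.open(path) as f:      # decodes per coding comment / BOM
        source = f.read()
    # compiling with the real filename keeps tracebacks pointing at it
    exec(compile(source, path, "exec"), globs)
    return globs
```

tokenize.open() only appeared in Python 3.2, so code that must run on anything earlier still has to detect the encoding by hand — which is the tricky part being discussed here.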
Personally, it's one thing that I think should be restored as the proper implementation is actually quite tricky and the default recommended solution does not work properly on some situations (and if micropython can provide an optimized implementation which'd conform to Python, that'd be one more point to add it back)... Best Regards, Fabio -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nikolaus at rath.org Sun Jun 15 00:39:19 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Sat, 14 Jun 2014 15:39:19 -0700 Subject: [Python-Dev] Why does _pyio.*.readinto have to work with 'b' arrays? Message-ID: <87y4wzm7zc.fsf@vostro.rath.org> Hello, The _pyio.BufferedIOBase class contains the following hack to make sure that you can read-into array objects with format 'b': try: b[:n] = data except TypeError as err: import array if not isinstance(b, array.array): raise err b[:n] = array.array('b', data) I am now wondering if I should implement the same hack in BufferedReader (cf. issue 20578). Is there anything special about 'b' arrays that justifies to treat them this way? Note that readinto is supposed to work with any object implementing the buffer protocol, but the Python implementation only works with bytearrays and (with the above hack) 'b' arrays. 
Even using a 'B' array fails: >>> import _pyio >>> from array import array >>> buf = array('b', b'x' * 10) >>> _pyio.open('/dev/zero', 'rb').readinto(buf) 10 >>> buf = array('B', b'x' * 10) >>> _pyio.open('/dev/zero', 'rb').readinto(buf) Traceback (most recent call last): File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 662, in readinto b[:n] = data TypeError: can only assign array (not "bytes") to array slice During handling of the above exception, another exception occurred: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 667, in readinto b[:n] = array.array('b', data) TypeError: bad argument type for built-in operation It seems to me that a much cleaner solution would be to simply declare _pyio's readinto to only work with bytearrays, and to explicitly raise a (more helpful) TypeError if anything else is passed in. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." From greg.ewing at canterbury.ac.nz Sun Jun 15 01:18:51 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 15 Jun 2014 11:18:51 +1200 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> <20140614210059.GB20710@chromebot.lan> Message-ID: <539CD85B.1060104@canterbury.ac.nz> Fabio Zadrozny wrote: > Well, I must say that the exec(open().read()) is not really a proper > execfile implementation because it may fail because of encoding > issues... It's not far off, though -- all it needs is an optional encoding parameter.
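One way to accept any writable buffer without such per-type special cases is to go through memoryview; a sketch of what a generic readinto helper could look like (illustrative, not the actual _pyio code):

```python
import array

def readinto_any(read, buf):
    """Fill writable buffer `buf` from callable `read`; return byte count."""
    view = memoryview(buf).cast("B")  # uniform unsigned-byte view
    data = read(len(view))
    n = len(data)
    view[:n] = data
    return n

# both array typecodes from the failing example above behave the same
for typecode in ("b", "B"):
    a = array.array(typecode, b"x" * 10)
    n = readinto_any(lambda k: b"\x00" * k, a)
    print(typecode, n, bytes(a) == b"\x00" * 10)  # prints "b 10 True" then "B 10 True"
```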
-- Greg From Steve.Dower at microsoft.com Sun Jun 15 01:36:15 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sat, 14 Jun 2014 23:36:15 +0000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <539CD85B.1060104@canterbury.ac.nz> References: <20140610052312.280e49c9@x34f> <20140614210059.GB20710@chromebot.lan> , <539CD85B.1060104@canterbury.ac.nz> Message-ID: I think the point is that the encoding may be embedded in the file as a coding comment and there's no obvious way to deal with that. Top-posted from my Windows Phone ________________________________ From: Greg Ewing Sent: 6/14/2014 16:19 To: python-dev at python.org Subject: Re: [Python-Dev] Criticism of execfile() removal in Python3 Fabio Zadrozny wrote: > Well, I must say that the exec(open().read()) is not really a proper > execfile implementation because it may fail because of encoding > issues... It's not far off, though -- all it needs is an optional encoding parameter. -- Greg _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40microsoft.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From skip.montanaro at gmail.com Sun Jun 15 02:01:09 2014 From: skip.montanaro at gmail.com (Skip Montanaro) Date: Sat, 14 Jun 2014 19:01:09 -0500 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140614231144.639bf852@x34f> References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> <20140614231144.639bf852@x34f> Message-ID: > you say "do this once", but actually it's "do it in each interactive > session again and again", ... That's what your Python startup file is for. I have been running with several tweaked builtin functions for years. Never have to consciously load them. If I wanted execfile badly enough, I'd define it there.
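Concretely, Skip's approach means putting something like this in the file named by the PYTHONSTARTUP environment variable (the path and the UTF-8 assumption here are illustrative). Python runs that file once at the start of every interactive session, so the helper is always available at the prompt:

```python
# contents of e.g. ~/.pythonrc.py, with PYTHONSTARTUP pointing at it
def execfile(path, globs=None):
    # minimal version: assumes the source file is UTF-8
    if globs is None:
        globs = globals()
    with open(path, encoding="utf-8") as f:
        exec(compile(f.read(), path, "exec"), globs)
```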
I don't think I've used execfile more than a handful of times in the 20-odd years I've been using Python. Perhaps our personal approaches to executing code at the interpreter prompt are radically different, but I think if the lack of execfile is such a big deal for you, you might want to check around to see how other people use interactive mode. Skip -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Sun Jun 15 02:41:44 2014 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 14 Jun 2014 17:41:44 -0700 Subject: [Python-Dev] Why does _pyio.*.readinto have to work with 'b' arrays? In-Reply-To: <87y4wzm7zc.fsf@vostro.rath.org> References: <87y4wzm7zc.fsf@vostro.rath.org> Message-ID: <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> On Sat, Jun 14, 2014, at 15:39, Nikolaus Rath wrote: > It seems to me that a much cleaner solution would be to simply declare > _pyio's readinto to only work with bytearrays, and to explicitly raise a > (more helpful) TypeError if anything else is passed in. That seems reasonable. I don't think _pyio's behavior is terribly important compared to the C _io module. From ncoghlan at gmail.com Sun Jun 15 03:28:29 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Jun 2014 11:28:29 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140614235215.621e7571@x34f> References: <20140610052312.280e49c9@x34f> <20140614235215.621e7571@x34f> Message-ID: On 15 Jun 2014 06:52, "Paul Sokolovsky" wrote: > > Hello, > > On Tue, 10 Jun 2014 17:36:02 +1000 > Nick Coghlan wrote: > > > On 10 June 2014 12:23, Paul Sokolovsky wrote: > > > 1. It hampers interactive mode - instead of short and easy to type > > > execfile("file.py") one needs to use exec(open("file.py").read()). 
> > > I'm sure that's not going to bother a lot of people - after all, the > > > easiest way to execute a Python file is to drop back to shell and > > > restart python with file name, using all wonders of tab completion. > > > But now imagine that Python interpreter runs on bare hardware, and > > > its REPL is the only shell. That's exactly what we have with > > > MicroPython's Cortex-M port. But it's not really > > > MicroPython-specific, there's CPython port to baremetal either - > > > http://www.pycorn.org/ . > > > > https://docs.python.org/3/library/runpy.html#runpy.run_path > > > > import runpy > > file_globals = runpy.run_path("file.py") > > Thanks, it's the most productive response surely. So, at least there's > alternative to removed execfile(). Unfortunately, I don't think it's > good alternative to execfile() in all respects. It clearly provides API > for that functionality, but is that solution of least surprise and is > it actually known by users at all (to be useful for them)? We don't want people instinctively reaching for execfile (or run_path for that matter). It's almost always the wrong answer to a problem (because it runs code in a weird, ill-defined environment and has undefined behaviour when used inside a function), meeting the definition of "attractive nuisance". We moved reload() to imp.reload() and reduce() to functools.reduce() for similar reasons - they're too rarely the right answer to justify having them globally available by default. > Googling for "execfile python 3", top 3 hits I see are stackoverflow > questions, *none* of which mentions runpy. So, people either don't > consider it viable alternative to execfile, or don't know about it at > all (my guess it's the latter). Given the relative age of the two APIs, that seems likely. Adding answers pointing users to the runpy APIs could be useful. > Like with previous discussion, its meaning goes beyond just Python > realm - there's competition all around. 
And internets bring funny > examples, like for example http://www.red-lang.org/p/contributions.html > (scroll down to diagram, or here's direct link: > http://3.bp.blogspot.com/-xhOP35Dm99w/UuXFKgY2dlI/AAAAAAAAAGA/YQu98_pPDjw/s1600/reichart-abstraction-diagram.png ) > So, didn't you know that Ruby can be used for OS-level development, and > Python can't? Or that JavaScript DSL capabilities are better than > Python's (that's taking into account that JavaScript DSL capabilities > are represented by JSON, whose creators were so arrogant as to disallow > even usage of comments in it). There's a lot of misinformation on the internet. While there is certainly room for the PSF to do more in terms of effectively communicating Python's ubiquity and strengths (and we're working on that), "people with no clue post stuff on the internet" doesn't make a compelling *technical* argument (which is what is needed to get new builtins added). > So, now suppose there's a discussion of how good different languages are > for interactive usage (out of the box apparently). It would be a little > hard to defend claim that Python is *excellent* interactive language, > if its latest series got -1 on that scale, by removing feature which > may be indispensable at times. Knowing that, one subconsciously may > start to wonder if Ruby or JavaScript are doing it (in wide sense) > better than Python. Yes, people get upset when we tell them we consider some aspects of their software designs to be ill-advised. Running other code in the *current* namespace is such a thing - it is typically preferable to run it in a *different* namespace and then access the results, rather than implicitly overwriting the contents of the current namespace. That said, a question still worth asking is whether there is scope for additional runpy APIs that are designed to more easily implement Python 2 and IPython style modes of operation where independent units of code manipulate a shared namespace? 
That's actually a possibility, but any such proposals need to be presented on python-ideas in terms of the *use case* to be addressed, rather than the fact that execfile() happened to be the preferred solution in Python 2. Regards, Nick. > > > -- > Best regards, > Paul mailto:pmiscml at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jun 15 03:31:44 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Jun 2014 11:31:44 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> <20140614210059.GB20710@chromebot.lan> <539CD85B.1060104@canterbury.ac.nz> Message-ID: On 15 Jun 2014 09:37, "Steve Dower" wrote: > > I think the point is that the encoding may be embedded in the file as a coding comment and there's no obvious way to deal with that. Opening source files correctly is the intended use case for tokenize.open(). Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Sun Jun 15 03:22:52 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 14 Jun 2014 20:22:52 -0500 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: Of course cmd.exe is hardcoded; there are no other shells on Windows! (I'm purposely ignoring MinGW, Cygwin, command.com, etc.) If anything, auto-escaping will break scripts that are already designed to escape carets on Windows. On Sat, Jun 14, 2014 at 2:54 PM, anatoly techtonik wrote: > On Fri, Jun 13, 2014 at 2:55 AM, Ryan Gonzalez wrote: > >> SHELLS ARE NOT CROSS-PLATFORM!!!! Seriously, there are going to be >> differences. If you really must: >> >> escape = lambda s: s.replace('^', '^^') if os.name == 'nt' else s >> > > It is not about generic shell problem, it is about specific behavior that > on Windows Python already uses cmd.exe shell hardcoded in its sources. 
So > for crossplatform behavior on Windows, it should escape symbols on command > passed to cmd.exe that are special to this shell to avoid breaking Python > scripts. What you propose is a bad workaround, because it assumes that all > Python users who use subprocess to execute hg or git should possess apriori > knowledge about default subprocess behaviour with default shell on Windows > and implement workaround for that. > -- > anatoly t. > -- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sun Jun 15 01:15:27 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 15 Jun 2014 11:15:27 +1200 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: <20140611230030.6F56F250DC4@webabinitio.net> Message-ID: <539CD78F.9020908@canterbury.ac.nz> > On Thu, Jun 12, 2014 at 12:07 PM, Chris Angelico > wrote: > > ISTM what you want is not shell=True, but a separate function that > > follows the system policy for translating a command name into a > > path-to-binary. According to the docs, subprocess.Popen should already be doing this on Unix: On Unix, with shell=False (default): In this case, the Popen class uses os.execvp() to execute the child program. and execvp() searches the user's PATH to find the program. However, it says the Windows version uses CreateProcess, which doesn't use PATH. This seems like an unfortunate platform difference to me. It would be better if PATH were searched on both platforms, or better still, make it an option independent of shell=True. 
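Since 3.3 the stdlib does expose that lookup as a separate, shell-free step: shutil.which() searches PATH (and honors PATHEXT on Windows) on both platforms. A sketch, with the command name as just an example:

```python
import shutil
import subprocess
import sys

exe = shutil.which("git")  # PATH search, same semantics on Unix and Windows
if exe is None:
    print("git is not on PATH")
else:
    # pass the resolved path, so shell=False behaves identically everywhere
    subprocess.check_call([exe, "--version"])

# a path containing a directory component is returned as-is if executable
print(shutil.which(sys.executable) == sys.executable)  # True
```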
-- Greg From Steve.Dower at microsoft.com Sun Jun 15 05:15:12 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sun, 15 Jun 2014 03:15:12 +0000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> <20140614210059.GB20710@chromebot.lan> <539CD85B.1060104@canterbury.ac.nz> , Message-ID: <481b9af010ac4134a3ecbafd32f3be31@BLUPR03MB389.namprd03.prod.outlook.com> So is exec(tokenize.open(file).read()) the actual replacement for execfile()? Not too bad, but still not obvious (or widely promoted - I'd never heard of it). Top-posted from my Windows Phone ________________________________ From: Nick Coghlan Sent: ?6/?14/?2014 18:31 To: Steve Dower Cc: Greg Ewing; python-dev at python.org Subject: Re: [Python-Dev] Criticism of execfile() removal in Python3 On 15 Jun 2014 09:37, "Steve Dower" > wrote: > > I think the point is that the encoding may be embedded in the file as a coding comment and there's no obvious way to deal with that. Opening source files correctly is the intended use case for tokenize.open(). Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jun 15 06:31:36 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Jun 2014 14:31:36 +1000 Subject: [Python-Dev] Why does _pyio.*.readinto have to work with 'b' arrays? In-Reply-To: <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> References: <87y4wzm7zc.fsf@vostro.rath.org> <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> Message-ID: On 15 June 2014 10:41, Benjamin Peterson wrote: > On Sat, Jun 14, 2014, at 15:39, Nikolaus Rath wrote: >> It seems to me that a much cleaner solution would be to simply declare >> _pyio's readinto to only work with bytearrays, and to explicitly raise a >> (more helpful) TypeError if anything else is passed in. > > That seems reasonable. 
I don't think _pyio's behavior is terribly > important compared to the C _io module. _pyio was written before the various memoryview fixes that were implemented in Python 3.3 - it seems to me it would make more sense to use memoryview to correctly handle arbitrary buffer exporters (we implemented similar fixes for the base64 module in 3.4). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 15 06:42:50 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Jun 2014 14:42:50 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <481b9af010ac4134a3ecbafd32f3be31@BLUPR03MB389.namprd03.prod.outlook.com> References: <20140610052312.280e49c9@x34f> <20140614210059.GB20710@chromebot.lan> <539CD85B.1060104@canterbury.ac.nz> <481b9af010ac4134a3ecbafd32f3be31@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On 15 June 2014 13:15, Steve Dower wrote: > So is exec(tokenize.open(file).read()) the actual replacement for > execfile()? Not too bad, but still not obvious (or widely promoted - I'd > never heard of it). Yes, that's pretty close. It's still a dubious idea due to the implicit modification of the local namespace (and the resulting differences in behaviour at function level due to the fact that writing to locals() doesn't actually update the local namespace). That said, the "implicit changes to the local namespace are a bad idea" concern applies to exec() in general, so it was the "it's just a shorthand for a particular use of exec" aspect that tipped in the balance in the demise of execfile (this is also implied by the phrasing of the relevant bullet point in PEP 3100). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From Nikolaus at rath.org Sun Jun 15 06:57:12 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Sat, 14 Jun 2014 21:57:12 -0700 Subject: [Python-Dev] Why does _pyio.*.readinto have to work with 'b' arrays? 
In-Reply-To: References: <87y4wzm7zc.fsf@vostro.rath.org> <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> Message-ID: <539D27A8.7070505@rath.org> On 06/14/2014 09:31 PM, Nick Coghlan wrote: > On 15 June 2014 10:41, Benjamin Peterson wrote: >> On Sat, Jun 14, 2014, at 15:39, Nikolaus Rath wrote: >>> It seems to me that a much cleaner solution would be to simply declare >>> _pyio's readinto to only work with bytearrays, and to explicitly raise a >>> (more helpful) TypeError if anything else is passed in. >> >> That seems reasonable. I don't think _pyio's behavior is terribly >> important compared to the C _io module. > > _pyio was written before the various memoryview fixes that were > implemented in Python 3.3 - it seems to me it would make more sense to > use memoryview to correctly handle arbitrary buffer exporters (we > implemented similar fixes for the base64 module in 3.4). Definitely. But is there a way to do that without writing C code? My attempts failed: >>> from array import array >>> a = array('b', b'x'*10) >>> am = memoryview(a) >>> am[:3] = b'foo' Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: memoryview assignment: lvalue and rvalue have different structures >>> am[:3] = memoryview(b'foo') Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: memoryview assignment: lvalue and rvalue have different structures >>> am.format = 'B' Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: attribute 'format' of 'memoryview' objects is not writable The only thing that works is: >>> am[:3] = array('b', b'foo') but that's again specific to a being a 'b'-array. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F “Time flies like an arrow, fruit flies like a Banana.”
From mail at timgolden.me.uk Sun Jun 15 08:07:18 2014 From: mail at timgolden.me.uk (Tim Golden) Date: Sun, 15 Jun 2014 07:07:18 +0100 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: <539D3816.3080402@timgolden.me.uk> On 15/06/2014 02:22, Ryan Gonzalez wrote: > Of course cmd.exe is hardcoded; Of course it's not: (from Lib/subprocess.py) comspec = os.environ.get("COMSPEC", "cmd.exe") I don't often expect, in these post-command.com days, to get anything other than cmd.exe. But alternative command processors are certainly possible. TJG From ncoghlan at gmail.com Sun Jun 15 08:37:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Jun 2014 16:37:48 +1000 Subject: [Python-Dev] Why does _pyio.*.readinto have to work with 'b' arrays? In-Reply-To: <539D27A8.7070505@rath.org> References: <87y4wzm7zc.fsf@vostro.rath.org> <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> <539D27A8.7070505@rath.org> Message-ID: On 15 June 2014 14:57, Nikolaus Rath wrote: > On 06/14/2014 09:31 PM, Nick Coghlan wrote: >> On 15 June 2014 10:41, Benjamin Peterson wrote: >>> On Sat, Jun 14, 2014, at 15:39, Nikolaus Rath wrote: >>>> It seems to me that a much cleaner solution would be to simply declare >>>> _pyio's readinto to only work with bytearrays, and to explicitly raise a >>>> (more helpful) TypeError if anything else is passed in. >>> >>> That seems reasonable. I don't think _pyio's behavior is terribly >>> important compared to the C _io module. >> >> _pyio was written before the various memoryview fixes that were >> implemented in Python 3.3 - it seems to me it would make more sense to >> use memoryview to correctly handle arbitrary buffer exporters (we >> implemented similar fixes for the base64 module in 3.4). > > Definitely. But is there a way to do that without writing C code? 
Yes, Python level reshaping and typecasting of memory views is one of the key enhancements Stefan implemented for 3.3. >>> from array import array >>> a = array('b', b'x'*10) >>> am = memoryview(a) >>> a array('b', [120, 120, 120, 120, 120, 120, 120, 120, 120, 120]) >>> am[:3] = memoryview(b'foo').cast('b') >>> a array('b', [102, 111, 111, 120, 120, 120, 120, 120, 120, 120]) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Sun Jun 15 09:54:50 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 15 Jun 2014 08:54:50 +0100 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: <539CD78F.9020908@canterbury.ac.nz> References: <20140611230030.6F56F250DC4@webabinitio.net> <539CD78F.9020908@canterbury.ac.nz> Message-ID: On 15 June 2014 00:15, Greg Ewing wrote: > However, it says the Windows version uses CreateProcess, which > doesn't use PATH. Huh? CreateProcess uses PATH: >py -3.4 Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:25:23) [MSC v.1600 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import subprocess >>> subprocess.check_call(['echo', 'hello']) hello 0 "echo" is an executable "C:\Utils\GnuWin64\echo.exe" which is on PATH but not in the current directory... Paul From mail at timgolden.me.uk Sun Jun 15 10:58:32 2014 From: mail at timgolden.me.uk (Tim Golden) Date: Sun, 15 Jun 2014 09:58:32 +0100 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: <20140611230030.6F56F250DC4@webabinitio.net> <539CD78F.9020908@canterbury.ac.nz> Message-ID: <539D6038.2080309@timgolden.me.uk> On 15/06/2014 08:54, Paul Moore wrote: > On 15 June 2014 00:15, Greg Ewing wrote: >> However, it says the Windows version uses CreateProcess, which >> doesn't use PATH. > > Huh? 
CreateProcess uses PATH: Just to be precise: CreateProcess *doesn't* use PATH if you pass an lpApplicationName parameter. It *does* use PATH if you pass a lpCommandLine parameter without an lpApplicationName parameter. It's possible to do either via the subprocess module, but the latter is the default. If you call: subprocess.Popen(['program.exe', 'a', 'b']) or subprocess.Popen('program.exe a b') Then CreateProcess will be called with a lpCommandLine but no lpApplicationName and PATH will be searched. If, however, you call: subprocess.Popen(['a', 'b'], executable="program.exe") then CreateProcess will be called with lpApplicationName="program.exe" and lpCommandLine="a b" and the PATH will not be searched. TJG From victor.stinner at gmail.com Sun Jun 15 11:31:43 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 15 Jun 2014 11:31:43 +0200 Subject: [Python-Dev] Why does _pyio.*.readinto have to work with 'b' arrays? In-Reply-To: <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> References: <87y4wzm7zc.fsf@vostro.rath.org> <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> Message-ID: Le 15 juin 2014 02:42, "Benjamin Peterson" a écrit : > On Sat, Jun 14, 2014, at 15:39, Nikolaus Rath wrote: > > It seems to me that a much cleaner solution would be to simply declare > > _pyio's readinto to only work with bytearrays, and to explicitly raise a > > (more helpful) TypeError if anything else is passed in. > > That seems reasonable. I don't think _pyio's behavior is terribly > important compared to the C _io module. Which types are accepted by the readinto() method of the C io module? If the C module only accepts bytearray, the array hack must be removed from _pyio. The _pyio module is mostly used for testing purposes, it's much slower. I hope that nobody uses it in production, the module is private (underscore prefix). So it's fine to break backward compatibility to have the same behaviour than the C module.
Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sun Jun 15 12:47:54 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 15 Jun 2014 22:47:54 +1200 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: <20140611230030.6F56F250DC4@webabinitio.net> <539CD78F.9020908@canterbury.ac.nz> Message-ID: <539D79DA.8030608@canterbury.ac.nz> Paul Moore wrote: > Huh? CreateProcess uses PATH: Hmm, in that case Microsoft's documentation is lying, or subprocess is doing something itself before passing the command name to CreateProcess. Anyway, looks like there's no problem. -- Greg From Nikolaus at rath.org Sun Jun 15 21:03:28 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Sun, 15 Jun 2014 12:03:28 -0700 Subject: [Python-Dev] Why does _pyio.*.readinto have to work with 'b' arrays? In-Reply-To: (Nick Coghlan's message of "Sun, 15 Jun 2014 16:37:48 +1000") References: <87y4wzm7zc.fsf@vostro.rath.org> <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> <539D27A8.7070505@rath.org> Message-ID: <87vbs2m1vj.fsf@vostro.rath.org> Nick Coghlan writes: > On 15 June 2014 14:57, Nikolaus Rath wrote: >> On 06/14/2014 09:31 PM, Nick Coghlan wrote: >>> On 15 June 2014 10:41, Benjamin Peterson wrote: >>>> On Sat, Jun 14, 2014, at 15:39, Nikolaus Rath wrote: >>>>> It seems to me that a much cleaner solution would be to simply declare >>>>> _pyio's readinto to only work with bytearrays, and to explicitly raise a >>>>> (more helpful) TypeError if anything else is passed in. >>>> >>>> That seems reasonable. I don't think _pyio's behavior is terribly >>>> important compared to the C _io module. 
>>> >>> _pyio was written before the various memoryview fixes that were >>> implemented in Python 3.3 - it seems to me it would make more sense to >>> use memoryview to correctly handle arbitrary buffer exporters (we >>> implemented similar fixes for the base64 module in 3.4). >> >> Definitely. But is there a way to do that without writing C code? > > Yes, Python level reshaping and typecasting of memory views is one of > the key enhancements Stefan implemented for 3.3. [..] Ah, nice. I'll use that. Thank you Stefan :-). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F “Time flies like an arrow, fruit flies like a Banana.” From Nikolaus at rath.org Sun Jun 15 21:05:09 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Sun, 15 Jun 2014 12:05:09 -0700 Subject: [Python-Dev] Why does _pyio.*.readinto have to work with 'b' arrays? In-Reply-To: (Victor Stinner's message of "Sun, 15 Jun 2014 11:31:43 +0200") References: <87y4wzm7zc.fsf@vostro.rath.org> <1402792904.16337.128859457.27BBE77A@webmail.messagingengine.com> Message-ID: <87sin6m1sq.fsf@vostro.rath.org> Victor Stinner writes: > Le 15 juin 2014 02:42, "Benjamin Peterson" a écrit : >> On Sat, Jun 14, 2014, at 15:39, Nikolaus Rath wrote: >> > It seems to me that a much cleaner solution would be to simply declare >> > _pyio's readinto to only work with bytearrays, and to explicitly raise a >> > (more helpful) TypeError if anything else is passed in. >> >> That seems reasonable. I don't think _pyio's behavior is terribly >> important compared to the C _io module. > > Which types are accepted by the readinto() method of the C io module? Everything implementing the buffer protocol. > If the C module only accepts bytearray, the array hack must be removed > from _pyio. _pyio currently accepts only bytearray and 'b'-type arrays. But it seems with memoryview.cast() we now have a way to make it behave like the C module.
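The memoryview.cast() approach discussed here can be sketched as a standalone function (an illustration of the idea only, not the actual _pyio method):

```python
from array import array

def readinto_any(buf, data):
    """Copy data into any writable buffer exporter, returning the byte count.

    Viewing the caller's buffer as unsigned bytes ('B', the format that
    bytes objects export) sidesteps the "different structures" errors that
    a 'b'-format array raises on direct memoryview assignment.
    """
    view = memoryview(buf).cast('B')
    n = min(len(view), len(data))
    view[:n] = data[:n]
    return n

target = array('b', b'\x00' * 10)   # a 'b'-format array, not a bytearray
filled = readinto_any(target, b'hello')
```

Here `filled` is 5 and the first five bytes of `target` hold b'hello'; the same function accepts a bytearray or any other writable exporter.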
Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F “Time flies like an arrow, fruit flies like a Banana.” From chris.barker at noaa.gov Mon Jun 16 19:40:03 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 16 Jun 2014 10:40:03 -0700 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <20140614231144.639bf852@x34f> References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> <20140614231144.639bf852@x34f> Message-ID: On Sat, Jun 14, 2014 at 1:11 PM, Paul Sokolovsky wrote: > > > 1. It hampers interactive mode - instead of short and easy to type > > > execfile("file.py") one needs to use exec(open("file.py").read()). > > > If the amount of typing is the problem, that's easy to solve: > > > > # do this once > > def execfile(name): > > exec(open("file.py").read()) > FWIW, when I started using python (15?) years ago -- the first thing I looked for was a way to "just run a file", at the interactive prompt, like I had in MATLAB. I found and used execfile(). However, it wasn't long before I discovered that execfile() was really kind of a pain: you've got namespaces, and all sorts of stuff that made it often not work like I wanted, and was a pain to type. I stopped using it altogether. More recently, I discovered iPython and its "run" function -- very nice, it does the obvious stuff for you the way you'd expect. My conclusions: 1) runfile() is not really very useful, it's fine to have removed it. 2) the built-in interactive python interpreter is really pretty lame. If you want a good interactive experience, you need something more anyway (iPython, for instance) -- putting execfile() back is only one tiny improvement that's not worth it. So if this is about micropython -- I think it would serve the project very well to have a micropython-specific interactive mode. iPython is fabulous, though I imagine too heavyweight.
But perhaps you could borrow some things from it -- like "run", for example. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jun 16 19:52:18 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 16 Jun 2014 10:52:18 -0700 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> <20140614231144.639bf852@x34f> Message-ID: <539F2ED2.5080105@stoneleaf.us> On 06/16/2014 10:40 AM, Chris Barker wrote: > > My conclusions: > > 1) runfile() is not really very useful, it's fine to have removed it. s/runfile/execfile -- ~Ethan~ From victor.stinner at gmail.com Mon Jun 16 23:12:18 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 16 Jun 2014 23:12:18 +0200 Subject: [Python-Dev] Windows XP, Python 3.5 and PEP 11 Message-ID: Hi, I would like to know if Python 3.5 will still support Windows XP or not. Almost all flavors of Windows XP reached end-of-life in April, 2014 except "Windows XP Embedded". There is even a hack to use Windows upgrades on the desktop flavor using the embedded flavor (by changing a key in the registry). Extracts of the Wikipedia page: "As of January 2014, at least 49% of all computers in China still ran XP.
" "In January 2014, it was estimated that more than 95% of the 3 million automated teller machines in the world were still running Windows XP (which largely replaced IBM's OS/2 as the predominant operating system on ATMs)" http://en.wikipedia.org/wiki/Windows_XP A few months ago, I installed an ISO of Windows XP, downloaded from MSDN, to investigate a bug (something related to timer and HPET), but then I realized that I can use my Windows 7 VM to reproduce the issue. Now I cannot use my Windows XP VM anymore because I have to enter a product key (before I had a delay of 30 days), but I don't have this product key and my MSDN account expired. I don't want to waste my time and money with the registration thing, so I just gave up. Any of you plan to invest time on issues specific to Windows XP and produce binaries working on Windows XP? Or can we just provide binaries without testing them? For example, it looks like the following issue is specific to Windows XP: http://bugs.python.org/issue6926 Oh, and the PEP 11: http://legacy.python.org/dev/peps/pep-0011/#microsoft-windows "Microsoft has established a policy called product support lifecycle (...) Python's Windows support now follows this lifecycle." Victor From ncoghlan at gmail.com Tue Jun 17 00:39:29 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 17 Jun 2014 08:39:29 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> <20140614231144.639bf852@x34f> Message-ID: On 17 Jun 2014 03:42, "Chris Barker" wrote: > > On Sat, Jun 14, 2014 at 1:11 PM, Paul Sokolovsky wrote: > >> >> > > 1. It hampers interactive mode - instead of short and easy to type >> > > execfile("file.py") one needs to use exec(open("file.py").read()). 
>> >> > >> > If the amount of typing is the problem, that's easy to solve: >> > >> > # do this once >> > def execfile(name): >> > exec(open("file.py").read()) > > > FWIW, when I started using python (15?) years ago -- the first thing I looked for was a way to "just run a file", at the interactive prompt, like I had in MATLAB. I found and used execfile(). Yes, if people are looking for a MATLAB replacement, they want IPython rather than the default REPL. The default one is deliberately minimal, IPython is designed to be a comprehensive numeric and scientific workspace. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.ware+pydev at gmail.com Tue Jun 17 05:08:28 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Mon, 16 Jun 2014 22:08:28 -0500 Subject: [Python-Dev] Windows XP, Python 3.5 and PEP 11 In-Reply-To: References: Message-ID: On Mon, Jun 16, 2014 at 4:12 PM, Victor Stinner wrote: > Hi, > > I would like to know if Python 3.5 will still support Windows XP or > not. Almost all flavors of Windows XP reached the end-of-life in > April, 2014 except "Windows XP Embedded". There is even an hack to use > Windows upgrades on the desktop flavor using the embedded flavor (by > changing a key in the registry). Extracts of the Wikipedia page: This was recently discussed in the "Moving Python 3.5 on Windows to a new compiler" thread, where Martin declared XP support to be ended [1]. I believe Tim Golden is the only resident Windows dev from whom I haven't seen at least implicit agreement that XP doesn't need further support, so I'd say our support for XP is well and truly dead :) In any case, surely anyone stuck with XP can be happy with Python 3.4. I'm perfectly fine with 3.2 on Win2k! 
-- Zach [1] https://mail.python.org/pipermail/python-dev/2014-June/134903.html From mail at timgolden.me.uk Tue Jun 17 07:01:29 2014 From: mail at timgolden.me.uk (Tim Golden) Date: Tue, 17 Jun 2014 06:01:29 +0100 Subject: [Python-Dev] Windows XP, Python 3.5 and PEP 11 In-Reply-To: References: Message-ID: <539FCBA9.2010903@timgolden.me.uk> On 17/06/2014 04:08, Zachary Ware wrote: > On Mon, Jun 16, 2014 at 4:12 PM, Victor Stinner > wrote: >> Hi, >> >> I would like to know if Python 3.5 will still support Windows XP or >> not. Almost all flavors of Windows XP reached the end-of-life in >> April, 2014 except "Windows XP Embedded". There is even a hack to use >> Windows upgrades on the desktop flavor using the embedded flavor (by >> changing a key in the registry). Extracts of the Wikipedia page: > > This was recently discussed in the "Moving Python 3.5 on Windows to a > new compiler" thread, where Martin declared XP support to be ended > [1]. I believe Tim Golden is the only resident Windows dev from whom > I haven't seen at least implicit agreement that XP doesn't need > further support, so I'd say our support for XP is well and truly dead > :) > > In any case, surely anyone stuck with XP can be happy with Python 3.4. > I'm perfectly fine with 3.2 on Win2k! > I think we're justified in dropping XP support, for all the reasons others have given. Like most people, I suppose, I'm supporting WinXP in various ways (including embedded) because "not supported" != "not working". But those are all running 2.x versions of Python. It'll be good to be able to stretch a little on the Windows API front without having to double-think about where a particular API came in.
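Once XP support is dropped, code that wants the newer Windows APIs can guard on the reported OS version; this is a hypothetical launcher-style check for illustration, not something CPython itself does:

```python
import sys

def vista_or_later():
    """True on Windows Vista (6.0) or newer; XP reports 5.1/5.2.

    Non-Windows platforms are simply not subject to this check, so they
    pass trivially.
    """
    if not sys.platform.startswith('win'):
        return True
    major, minor = sys.getwindowsversion()[:2]
    return (major, minor) >= (6, 0)
```

A 3.5-era installer or launcher could refuse to proceed when this returns False instead of failing later on a missing API.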
TJG From victor.stinner at gmail.com Tue Jun 17 09:03:54 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 17 Jun 2014 09:03:54 +0200 Subject: [Python-Dev] Windows XP, Python 3.5 and PEP 11 In-Reply-To: <539FCBA9.2010903@timgolden.me.uk> References: <539FCBA9.2010903@timgolden.me.uk> Message-ID: 2014-06-17 7:01 GMT+02:00 Tim Golden : > On 17/06/2014 04:08, Zachary Ware wrote: >> This was recently discussed in the "Moving Python 3.5 on Windows to a >> new compiler" thread, where Martin declared XP support to be ended >> [1]. I believe Tim Golden is the only resident Windows dev from whom >> I haven't seen at least implicit agreement that XP doesn't need >> further support, so I'd say our support for XP is well and truly dead >> :) >> >> In any case, surely anyone stuck with XP can be happy with Python 3.4. >> I'm perfectly fine with 3.2 on Win2k! >> > > I think we're justified in dropping XP support, for all the reasons others > have given. Would you be ok to make this official by adding Windows XP explicitly to the PEP 11? (I can do the change, I'm just asking for a confirmation.) > Like most people, I suppose, I'm support WinXP in various ways > (including embedded) because "not supported" != "not working". But those are > all running 2.x versions of Python. I'm ok to provide a best-effort support of Windows XP on Python 2.7 (and maybe also Python 3.4), especially if there are Windows XP buildbots. We can drop Windows XP support in Python 3.5 only. Victor From victor.stinner at gmail.com Tue Jun 17 09:11:45 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 17 Jun 2014 09:11:45 +0200 Subject: [Python-Dev] Commit "avoid a deadlock with the interpreter head lock and the GIL during finalization" Message-ID: Hi, I just saw a change in Python finalization related to threads. I'm not sure that it is correct to not call tstate_delete_common(). Is this change related to an issue? I don't see any specific test. 
--- changeset 91234:5ccb6901cf95 3.4 avoid a deadlock with the interpreter head lock and the GIL during finalization author Benjamin Peterson date Mon, 16 Jun 2014 23:07:49 -0700 (61 minutes ago) parents d1d1ed421717 children 2ed64ea19d81 fceb3a907260 files Python/pystate.c diffstat 1 files changed, 8 insertions(+), 0 deletions(-) [+] http://hg.python.org/cpython/rev/5ccb6901cf95 --- Victor From breamoreboy at yahoo.co.uk Tue Jun 17 09:53:09 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 17 Jun 2014 08:53:09 +0100 Subject: [Python-Dev] Windows XP, Python 3.5 and PEP 11 In-Reply-To: References: <539FCBA9.2010903@timgolden.me.uk> Message-ID: On 17/06/2014 08:03, Victor Stinner wrote: > 2014-06-17 7:01 GMT+02:00 Tim Golden : >> On 17/06/2014 04:08, Zachary Ware wrote: >>> This was recently discussed in the "Moving Python 3.5 on Windows to a >>> new compiler" thread, where Martin declared XP support to be ended >>> [1]. I believe Tim Golden is the only resident Windows dev from whom >>> I haven't seen at least implicit agreement that XP doesn't need >>> further support, so I'd say our support for XP is well and truly dead >>> :) >>> >>> In any case, surely anyone stuck with XP can be happy with Python 3.4. >>> I'm perfectly fine with 3.2 on Win2k! >>> >> >> I think we're justified in dropping XP support, for all the reasons others >> have given. > > Would you be ok to make this official by adding Windows XP explicitly > to the PEP 11? (I can do the change, I'm just asking for a > confirmation.) > From PEP 11 the entire "Microsoft Windows" section. Please see the third paragraph. "Microsoft has established a policy called product support lifecycle [1]. Each product's lifecycle has a mainstream support phase, where the product is generally commercially available, and an extended support phase, where paid support is still available, and certain bug fixes are released (in particular security fixes). Python's Windows support now follows this lifecycle. 
A new feature release X.Y.0 will support all Windows releases whose extended support phase is not yet expired. Subsequent bug fix releases will support the same Windows releases as the original feature release (even if the extended support phase has ended). Because of this policy, no further Windows releases need to be listed in this PEP. Each feature release is built by a specific version of Microsoft Visual Studio. That version should have mainstream support when the release is made. Developers of extension modules will generally need to use the same Visual Studio release; they are concerned both with the availability of the versions they need to use, and with keeping the zoo of versions small. The Python source tree will keep unmaintained build files for older Visual Studio releases, for which patches will be accepted. Such build files will be removed from the source tree 3 years after the extended support for the compiler has ended (but continue to remain available in revision control)." -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com From chris.barker at noaa.gov Tue Jun 17 17:59:13 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 17 Jun 2014 08:59:13 -0700 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> <20140614231144.639bf852@x34f> Message-ID: On Mon, Jun 16, 2014 at 3:39 PM, Nick Coghlan wrote: > > FWIW, when I started using python (15?) years ago -- the first thing I > looked for was a way to "just run a file", at the interactive prompt, like > I had in MATLAB. I found and used execfile(). > > Yes, if people are looking for a MATLAB replacement, they want IPython > rather than the default REPL. 
> I didn't mean to distract the conversation here -- what I meant was that even before iPython existed, I still dropped using execfile("") -- it was hardly ever the right thing. And for the micropython example, I'm proposing that a micropython interactive environment would be a really nice thing to build -- and worth doing, even if execfile() was still there. By the way: iPython, while coming from, and heavily used by, the scientific/numeric computing community, is a great tool for all sorts of other python development as well. But probably too heavyweight for micropython. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ayates at hp.com Tue Jun 17 18:41:23 2014 From: ayates at hp.com (Yates, Andy (CS Houston, TX)) Date: Tue, 17 Jun 2014 16:41:23 +0000 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required Message-ID: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> Python Dev, Andy here. I have a Windows product based on Python and I'm getting hammered to release a version that includes the fix in OpenSSL 1.0.1h. My product is built on a Windows system using Python installed from the standard Python installer at Python.org. I would be grateful if I could get some advice on my options. Will Python.org be releasing a Windows installer with the fix any time soon or will it be at the next scheduled release in November? If it is November, there's no way I can wait that long. Now what? Would it be best to build my own Python? Is it possible to drop in new OpenSSL versions on Windows without rebuilding Python? Looking for some guidance on how to handle these OpenSSL issues on Windows. Thanks!
Andy Yates -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Tue Jun 17 20:27:30 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 17 Jun 2014 18:27:30 +0000 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> Message-ID: <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> Yates, Andy (CS Houston, TX) wrote: > Python Dev, > Andy here. I have a Windows product based on Python and I'm getting hammered to > release a version that includes the fix in OpenSSL 1.0.1h. My product is built > on a Windows system using Python installed from the standard Python installer at > Python.org. I would be grateful if I could get some advice on my options. Will > Python.org be releasing a Windows installer with the fix any time soon or will > it be at the next scheduled release in November? If it is November, there's no > way I can wait that long. Now what? Would it be best to build my own Python? Is > it possible to drop in new OpenSSL versions on Windows without rebuilding > Python? Looking for some guidance on how to handle these OpenSSL issues on > Windows. You'll only need to rebuild the _ssl and _hashlib extension modules with the new OpenSSL version. The easiest way to do this is to build from source (which has already been updated for 1.0.1h if you use the externals scripts in Tools\buildbot), and you should just be able to drop _ssl.pyd and _hashlib.pyd on top of a normal install. Aside: I wonder if it's worth changing to dynamically linking to OpenSSL? It would make this kind of in-place upgrade easier when people need to do it. Any thoughts? (Does OpenSSL even support it?) Cheers, Steve > Thanks! 
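One way to confirm that replacement _ssl/_hashlib modules actually took effect is to ask the interpreter which OpenSSL it was linked against; `require_openssl` below is an illustrative helper, with the 1.0.1 target reflecting this thread's CVE fix:

```python
import ssl

def require_openssl(major, minor, fix):
    """Fail loudly if the linked OpenSSL is older than (major, minor, fix).

    ssl.OPENSSL_VERSION_INFO starts with (major, minor, fix); on 1.0.x
    builds the letter suffix (the 'h' in 1.0.1h) is encoded in the
    following patch field.
    """
    if ssl.OPENSSL_VERSION_INFO[:3] < (major, minor, fix):
        raise RuntimeError('linked against %s, which is too old'
                           % ssl.OPENSSL_VERSION)

require_openssl(1, 0, 1)
print(ssl.OPENSSL_VERSION)   # the human-readable version string
```

Running this after dropping in the rebuilt _ssl.pyd/_hashlib.pyd gives a quick sanity check that the upgraded library is the one actually loaded.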
> Andy Yates From mal at egenix.com Tue Jun 17 20:55:54 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 17 Jun 2014 20:55:54 +0200 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <53A08F3A.30908@egenix.com> On 17.06.2014 20:27, Steve Dower wrote: > Yates, Andy (CS Houston, TX) wrote: >> Python Dev, >> Andy here. I have a Windows product based on Python and I'm getting hammered to >> release a version that includes the fix in OpenSSL 1.0.1h. My product is built >> on a Windows system using Python installed from the standard Python installer at >> Python.org. I would be grateful if I could get some advice on my options. Will >> Python.org be releasing a Windows installer with the fix any time soon or will >> it be at the next scheduled release in November? If it is November, there's no >> way I can wait that long. Now what? Would it be best to build my own Python? Is >> it possible to drop in new OpenSSL versions on Windows without rebuilding >> Python? Looking for some guidance on how to handle these OpenSSL issues on >> Windows. > > You'll only need to rebuild the _ssl and _hashlib extension modules with the new OpenSSL version. The easiest way to do this is to build from source (which has already been updated for 1.0.1h if you use the externals scripts in Tools\buildbot), and you should just be able to drop _ssl.pyd and _hashlib.pyd on top of a normal install. > > Aside: I wonder if it's worth changing to dynamically linking to OpenSSL? It would make this kind of in-place upgrade easier when people need to do it. Any thoughts? (Does OpenSSL even support it?) 
Yes, no problem at all, but you'd still have to either do a new release every time a new OpenSSL problem is found (don't think that's an option for Python) or provide new compiled versions compatible with the Python modules needing the OpenSSL libs or instructions on how to build these. Note that the hash routines are rarely affected by these OpenSSL bugs. They usually only affect the SSL/TLS protocol parts. Alternatively, you could make use of our pyOpenSSL distribution, which includes pyOpenSSL and the OpenSSL libs (also for Windows): http://www.egenix.com/products/python/pyOpenSSL/ We created this to address the problem of having to update OpenSSL rather often. It doesn't support Python 3 yet, but on the plus side, you do get OpenSSL libs which are compiled with the same compiler versions used for the Python.org installers. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From nad at acm.org Tue Jun 17 21:03:40 2014 From: nad at acm.org (Ned Deily) Date: Tue, 17 Jun 2014 12:03:40 -0700 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: In article <81f84430ce0242e5bfa5b2264777df56 at BLUPR03MB389.namprd03.prod.outlook.com >, Steve Dower wrote: > You'll only need to rebuild the _ssl and _hashlib extension modules with the > new OpenSSL version. The easiest way to do this is to build from source > (which has already been updated for 1.0.1h if you use the externals scripts > in Tools\buildbot), and you should just be able to drop _ssl.pyd and > _hashlib.pyd on top of a normal install. Should we consider doing a re-spin of the Windows installers for 2.7.7 with 1.0.1h? Or consider doing a 2.7.8 in the near future to address this and various 2.7.7 regressions that have been identified so far (Issues 21652 and 21672)? > Aside: I wonder if it's worth changing to dynamically linking to OpenSSL? It > would make this kind of in-place upgrade easier when people need to do it. > Any thoughts? (Does OpenSSL even support it?) OpenSSL is often dynamically linked in Python builds on various other platforms, for example, on Linux or OS X. 
-- Ned Deily, nad at acm.org From benjamin at python.org Tue Jun 17 21:07:06 2014 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 17 Jun 2014 12:07:06 -0700 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <1403032026.581.129869537.3E053BB6@webmail.messagingengine.com> On Tue, Jun 17, 2014, at 12:03, Ned Deily wrote: > In article > <81f84430ce0242e5bfa5b2264777df56 at BLUPR03MB389.namprd03.prod.outlook.com > >, > Steve Dower wrote: > > You'll only need to rebuild the _ssl and _hashlib extension modules with the > > new OpenSSL version. The easiest way to do this is to build from source > > (which has already been updated for 1.0.1h if you use the externals scripts > > in Tools\buildbot), and you should just be able to drop _ssl.pyd and > > _hashlib.pyd on top of a normal install. > > Should we consider doing a re-spin of the Windows installers for 2.7.7 > with 1.0.1h? Or consider doing a 2.7.8 in the near future to address > this and various 2.7.7 regressions that have been identified so far > (Issues 21652 and 21672)? I think we should do a 2.7.8 soon to pick up the openssl upgrade and recent CGI security fix. I would like to see those two regressions fixed first, though. From antoine at python.org Tue Jun 17 22:36:23 2014 From: antoine at python.org (Antoine Pitrou) Date: Tue, 17 Jun 2014 16:36:23 -0400 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: <53A08F3A.30908@egenix.com> References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> <53A08F3A.30908@egenix.com> Message-ID: Le 17/06/2014 14:55, M.-A. 
Lemburg a écrit : > > Alternatively, you could make use of our pyOpenSSL distribution, > which includes pyOpenSSL and the OpenSSL libs (also for Windows): > > http://www.egenix.com/products/python/pyOpenSSL/ > > We created this to address the problem of having to update > OpenSSL rather often. This is very nice, but does it also upgrade the OpenSSL version used by the _ssl and _hashlib modules? Regards Antoine. From mal at egenix.com Tue Jun 17 22:58:45 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 17 Jun 2014 22:58:45 +0200 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> <53A08F3A.30908@egenix.com> Message-ID: <53A0AC05.8050708@egenix.com> On 17.06.2014 22:36, Antoine Pitrou wrote: > Le 17/06/2014 14:55, M.-A. Lemburg a écrit : >> >> Alternatively, you could make use of our pyOpenSSL distribution, >> which includes pyOpenSSL and the OpenSSL libs (also for Windows): >> >> http://www.egenix.com/products/python/pyOpenSSL/ >> >> We created this to address the problem of having to update >> OpenSSL rather often. > > This is very nice, but does it also upgrade the OpenSSL version used by the _ssl and _hashlib modules? On Unix, it will if you load pyOpenSSL before importing _ssl or _hashlib (and those modules are built as shared libs). Alternatively, you can set LD_LIBRARY_PATH to lib/python2.7/OpenSSL to have the system linker use the embedded libs before starting Python. Then it will always use the up-to-date libs. On Windows, this won't work, because _ssl and _hashlib are statically linked against the OpenSSL libs. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 17 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-06-17: Released eGenix PyRun 2.0.0 ... http://egenix.com/go58 2014-06-09: Released eGenix pyOpenSSL 0.13.3 ... http://egenix.com/go57 2014-07-02: Python Meeting Duesseldorf ... 15 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Wed Jun 18 00:00:49 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Jun 2014 08:00:49 +1000 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: References: <20140610052312.280e49c9@x34f> <20140610030303.GU10355@ando> <20140614231144.639bf852@x34f> Message-ID: On 18 Jun 2014 01:59, "Chris Barker" wrote: > > By the way: iPython, while coming from, and heavily used by, the scientific/numeric computing community, is a great tool for all sorts of other python development as well. But probably too heavyweight for micropython. (we're drifting off topic, so this will be my last addition to this subthread) Yes, as great as IPython is, when it's considered out of scope for the standard installers, it's unlikely to be a good fit for a version of Python aimed at running *on* a microcontroller. Running on a Raspberry Pi or remote PC and *talking* to an associated microcontroller is a different story, though. Cheers, Nick. > > -CHB > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cory at lukasa.co.uk Wed Jun 18 09:18:24 2014 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 18 Jun 2014 08:18:24 +0100 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> Message-ID: On 17 June 2014 17:41, Yates, Andy (CS Houston, TX) wrote: > Is it possible to drop in new OpenSSL versions > on Windows without rebuilding Python? If you think this is a problem you're going to have more than once, you'll want to look hard at whether it's worth using pyOpenSSL (either the egenix version or the PyCA one[1]) instead, and delivering binary releases with a bundled copy of OpenSSL. PyOpenSSL from PyCA is actually considering bundling OpenSSL on Windows anyway[2], so you might find this problem goes away. [1] https://github.com/pyca/pyopenssl [2] https://github.com/pyca/cryptography/issues/1121 From martin at v.loewis.de Wed Jun 18 11:32:33 2014 From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 18 Jun 2014 11:32:33 +0200 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> Message-ID: <53A15CB1.5070401@v.loewis.de> Am 17.06.14 18:41, schrieb Yates, Andy (CS Houston, TX): > Python Dev, > > Andy here. I have a Windows product based on Python and I?m getting > hammered to release a version that includes the fix in OpenSSL 1.0.1h. > My product is built on a Windows system using Python installed from the > standard Python installer at Python.org. I would be grateful if I could > get some advice on my options. Can you please report - what version of Python you are distributing? 
- why it absolutely has to be 1.0.1h that is included? According to the CVE, 0.9.8za and 1.0.0m would work as well (and in our case, would be preferred for older versions of Python). Regards, Martin From martin at v.loewis.de Wed Jun 18 11:46:46 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 18 Jun 2014 11:46:46 +0200 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <53A16006.3090801@v.loewis.de> Am 17.06.14 20:27, schrieb Steve Dower: > You'll only need to rebuild the _ssl and _hashlib extension modules > with the new OpenSSL version. The easiest way to do this is to build > from source (which has already been updated for 1.0.1h if you use the > externals scripts in Tools\buildbot), and you should just be able to > drop _ssl.pyd and _hashlib.pyd on top of a normal install. > > Aside: I wonder if it's worth changing to dynamically linking to > OpenSSL? It would make this kind of in-place upgrade easier when > people need to do it. Any thoughts? (Does OpenSSL even support it?) We originally considered using prebuilt binaries, such as http://slproweb.com/products/Win32OpenSSL.html This is tricky because of CRT issues: they will likely bind to a different version of the CRT, and a) it is unclear whether this would reliably work, and b) requires the Python installer to include a different version of the CRT, which we would not have a license to include (as the CRT redistribution license only applies to the version of the CRT that Python was built with) There was also the desire to use the same compiler for all code distributed, to use the same optimizations on all of it. 
In addition, for OpenSSL, there is compile time configuration wrt. to the algorithms built into the binaries where Python's build deviates from the default. Having a separate project to build a DLL within pcbuild.sln was never implemented. Doing so possibly increases the risk of DLL hell, if Python picks up the wrong version of OpenSSL (e.g. if Python gets embedded into some other application). Regards, Martin From Steve.Dower at microsoft.com Wed Jun 18 15:07:02 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 18 Jun 2014 13:07:02 +0000 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: <53A16006.3090801@v.loewis.de> References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com>, <53A16006.3090801@v.loewis.de> Message-ID: <5fd7795d324f4bc59b2b09bb217502cc@BLUPR03MB389.namprd03.prod.outlook.com> Yeah, the fact that it really has to be our own copy of the DLL negates the advantage. If someone can rebuild that, they could rebuild the modules that statically link it. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Martin v. L?wis Sent: ?6/?18/?2014 2:46 To: Steve Dower; Yates, Andy (CS Houston, TX); Python-Dev at python.org Subject: Re: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required Am 17.06.14 20:27, schrieb Steve Dower: > You'll only need to rebuild the _ssl and _hashlib extension modules > with the new OpenSSL version. The easiest way to do this is to build > from source (which has already been updated for 1.0.1h if you use the > externals scripts in Tools\buildbot), and you should just be able to > drop _ssl.pyd and _hashlib.pyd on top of a normal install. > > Aside: I wonder if it's worth changing to dynamically linking to > OpenSSL? It would make this kind of in-place upgrade easier when > people need to do it. 
Any thoughts? (Does OpenSSL even support it?) We originally considered using prebuilt binaries, such as http://slproweb.com/products/Win32OpenSSL.html This is tricky because of CRT issues: they will likely bind to a different version of the CRT, and a) it is unclear whether this would reliably work, and b) requires the Python installer to include a different version of the CRT, which we would not have a license to include (as the CRT redistribution license only applies to the version of the CRT that Python was built with) There was also the desire to use the same compiler for all code distributed, to use the same optimizations on all of it. In addition, for OpenSSL, there is compile time configuration wrt. to the algorithms built into the binaries where Python's build deviates from the default. Having a separate project to build a DLL within pcbuild.sln was never implemented. Doing so possibly increases the risk of DLL hell, if Python picks up the wrong version of OpenSSL (e.g. if Python gets embedded into some other application). Regards, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From ayates at hp.com Thu Jun 19 20:06:51 2014 From: ayates at hp.com (Yates, Andy (CS Houston, TX)) Date: Thu, 19 Jun 2014 18:06:51 +0000 Subject: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required In-Reply-To: <1403032026.581.129869537.3E053BB6@webmail.messagingengine.com> References: <8E2E2A615DD11C4CAFD572AA1370481675707068@G9W0341.americas.hpqcorp.net> <81f84430ce0242e5bfa5b2264777df56@BLUPR03MB389.namprd03.prod.outlook.com> <1403032026.581.129869537.3E053BB6@webmail.messagingengine.com> Message-ID: <8E2E2A615DD11C4CAFD572AA1370481675708954@G9W0341.americas.hpqcorp.net> Thanks for all the good information. We ended up building _ssl and _hashlib and dropping those into the existing Python on our build server. That seems to be working fine. >From my perspective ssl libraries are a special case. 
I think I could handle any other included library having a flaw for weeks or months, but my management and customers are sensitive to releasing software with known ssl vulnerabilities. For Windows Python it looks like the only option for updating OpenSSL is to build from source. For us that turned out to be no big deal. However, it may be beyond the reach of some, either technically or due to the lack of access to Dev Studio. There's also some concern that a custom build of Python may not have some secret sauce or complier switch that could cause unexpected behavior. That said, I'd like to see Python spin within a short period of time after a recognized OpenSSL vulnerability is fixed if is statically linked. This would limit exposure to the unsuspecting user who downloads Windows Python from Python.org. The next best thing would be to dynamically link to Windows OpenSSL DLLs allowing users to drop in which ever version they like. Thanks again!! Andy -----Original Message----- From: Python-Dev [mailto:python-dev-bounces+ayates=hp.com at python.org] On Behalf Of Benjamin Peterson Sent: Tuesday, June 17, 2014 2:07 PM To: Ned Deily; python-dev at python.org Subject: Re: [Python-Dev] Issue 21671: CVE-2014-0224 OpenSSL upgrade to 1.0.1h on Windows required On Tue, Jun 17, 2014, at 12:03, Ned Deily wrote: > In article > <81f84430ce0242e5bfa5b2264777df56 at BLUPR03MB389.namprd03.prod.outlook.c > om > >, > Steve Dower wrote: > > You'll only need to rebuild the _ssl and _hashlib extension modules > > with the new OpenSSL version. The easiest way to do this is to build > > from source (which has already been updated for 1.0.1h if you use > > the externals scripts in Tools\buildbot), and you should just be > > able to drop _ssl.pyd and _hashlib.pyd on top of a normal install. > > Should we consider doing a re-spin of the Windows installers for 2.7.7 > with 1.0.1h? 
Or consider doing a 2.7.8 in the near future to address > this and various 2.7.7 regressions that have been identified so far > (Issues 21652 and 21672)? I think we should do a 2.7.8 soon to pick up the openssl upgrade and recent CGI security fix. I would like to see those two regressions fixed first, though. _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ayates%40hp.com From joseph.martinot-lagarde at m4x.org Thu Jun 19 21:39:20 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Thu, 19 Jun 2014 21:39:20 +0200 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <481b9af010ac4134a3ecbafd32f3be31@BLUPR03MB389.namprd03.prod.outlook.com> References: <20140610052312.280e49c9@x34f> <20140614210059.GB20710@chromebot.lan> <539CD85B.1060104@canterbury.ac.nz> , <481b9af010ac4134a3ecbafd32f3be31@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: <53A33C68.2070006@m4x.org> Le 15/06/2014 05:15, Steve Dower a ?crit : > So is exec(tokenize.open(file).read()) the actual replacement for > execfile()? Not too bad, but still not obvious (or widely promoted - I'd > never heard of it). > Another way is to open the file in binary, then exec() checks itself if an encoding is defined in the file. This is what is used in spyder: exec(open(file, 'rb').read()) Here is the discussion for reference: https://bitbucket.org/spyder-ide/spyderlib/pull-request/3/execution-on-current-spyder-interpreter/diff This behavior is not indicated in the documentation but is somehow confirmed on stackoverflow: http://stackoverflow.com/questions/6357361/alternative-to-execfile-in-python-3-2/6357418?noredirect=1#comment30467918_6357418 --- Ce courrier ?lectronique ne contient aucun virus ou logiciel malveillant parce que la protection avast! Antivirus est active. 
http://www.avast.com From p.f.moore at gmail.com Thu Jun 19 22:46:02 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 19 Jun 2014 21:46:02 +0100 Subject: [Python-Dev] Criticism of execfile() removal in Python3 In-Reply-To: <53A33C68.2070006@m4x.org> References: <20140610052312.280e49c9@x34f> <20140614210059.GB20710@chromebot.lan> <539CD85B.1060104@canterbury.ac.nz> <481b9af010ac4134a3ecbafd32f3be31@BLUPR03MB389.namprd03.prod.outlook.com> <53A33C68.2070006@m4x.org> Message-ID: On 19 June 2014 20:39, Joseph Martinot-Lagarde wrote: > Another way is to open the file in binary, then exec() checks itself if an > encoding is defined in the file. This is what is used in spyder: > > exec(open(file, 'rb').read()) > > Here is the discussion for reference: > https://bitbucket.org/spyder-ide/spyderlib/pull-request/3/execution-on-current-spyder-interpreter/diff It would be good to document this. Could you open a docs bug to get this added? Paul From status at bugs.python.org Fri Jun 20 18:07:58 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 20 Jun 2014 18:07:58 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140620160758.1EDB456920@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-06-13 - 2014-06-20) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 4655 ( -7) closed 28932 (+73) total 33587 (+66) Open issues with patches: 2152 Issues opened (49) ================== #8110: subprocess.py doesn't correctly detect Windows machines http://bugs.python.org/issue8110 reopened by r.david.murray #21750: mock_open data is visible only once for the life of the class http://bugs.python.org/issue21750 opened by pkoning #21753: Windows cmd.exe character escaping function http://bugs.python.org/issue21753 opened by Jim.Jewett #21754: Add tests for turtle.TurtleScreenBase http://bugs.python.org/issue21754 opened by ingrid #21755: test_importlib.test_locks fails --without-threads http://bugs.python.org/issue21755 opened by berker.peksag #21756: IDLE - ParenMatch fails to find closing paren of multi-line st http://bugs.python.org/issue21756 opened by taleinat #21760: inspect documentation describes module type inaccurately http://bugs.python.org/issue21760 opened by eric.snow #21761: language reference describes the role of module.__file__ inacc http://bugs.python.org/issue21761 opened by eric.snow #21762: update the import machinery to only use __spec__ http://bugs.python.org/issue21762 opened by eric.snow #21763: Clarify requirements for file-like objects http://bugs.python.org/issue21763 opened by nikratio #21765: Idle: make 3.x HyperParser work with non-ascii identifiers. 
http://bugs.python.org/issue21765 opened by terry.reedy #21767: singledispatch docs should explicitly mention support for abst http://bugs.python.org/issue21767 opened by ncoghlan #21768: Fix a NameError in test_pydoc http://bugs.python.org/issue21768 opened by Claudiu.Popa #21769: Fix a NameError in test_descr http://bugs.python.org/issue21769 opened by Claudiu.Popa #21770: Module not callable in script_helper.py http://bugs.python.org/issue21770 opened by Claudiu.Popa #21772: platform.uname() not EINTR safe http://bugs.python.org/issue21772 opened by Tor.Colvin #21775: shutil.copytree() crashes copying to VFAT on Linux: AttributeE http://bugs.python.org/issue21775 opened by gward #21776: distutils.upload uses the wrong order of exceptions http://bugs.python.org/issue21776 opened by Claudiu.Popa #21777: Separate out documentation of binary sequence methods http://bugs.python.org/issue21777 opened by ncoghlan #21778: PyBuffer_FillInfo() from 3.3 http://bugs.python.org/issue21778 opened by arigo #21779: test_multiprocessing_spawn fails when ran with -Werror http://bugs.python.org/issue21779 opened by serhiy.storchaka #21780: make unicodedata module 64-bit safe http://bugs.python.org/issue21780 opened by haypo #21781: make _ssl module 64-bit clean http://bugs.python.org/issue21781 opened by haypo #21782: hashable documentation error: shouldn't mention id http://bugs.python.org/issue21782 opened by Giacomo.Alzetta #21783: smtpd.py does not allow multiple helo/ehlo commands http://bugs.python.org/issue21783 opened by zvyn #21784: __init__.py can be a directory http://bugs.python.org/issue21784 opened by abraithwaite #21785: __getitem__ and __setitem__ try to be smart when invoked with http://bugs.python.org/issue21785 opened by kt #21786: Use assertEqual in test_pydoc http://bugs.python.org/issue21786 opened by Claudiu.Popa #21787: Idle: make 3.x Hyperparser.get_expression recognize ... 
http://bugs.python.org/issue21787 opened by terry.reedy #21788: Rework Python finalization http://bugs.python.org/issue21788 opened by haypo #21790: Change blocksize in http.client to the value of resource.getpa http://bugs.python.org/issue21790 opened by dbrecht #21791: Proper return status of os.WNOHANG is not always (0, 0) http://bugs.python.org/issue21791 opened by eradman #21793: httplib client/server status refactor http://bugs.python.org/issue21793 opened by dbrecht #21795: smtpd.SMTPServer should announce 8BITMIME when supported http://bugs.python.org/issue21795 opened by zvyn #21796: tempfile.py", line 83, in once_lock = _allocate_lock( http://bugs.python.org/issue21796 opened by pythonbug1shal #21799: Py_SetPath() gives compile error: undefined reference to '__im http://bugs.python.org/issue21799 opened by Pat.Le.Cat #21800: Implement RFC 6855 (IMAP Support for UTF-8) in imaplib. http://bugs.python.org/issue21800 opened by zvyn #21801: inspect.signature doesn't always return a signature http://bugs.python.org/issue21801 opened by Claudiu.Popa #21802: Reader of BufferedRWPair is not closed if writer's close() fai http://bugs.python.org/issue21802 opened by serhiy.storchaka #21803: Remove macro indirections in complexobject http://bugs.python.org/issue21803 opened by pitrou #21804: Implement thr UTF8 command (RFC 6856) in poplib. 
http://bugs.python.org/issue21804 opened by zvyn #21806: Add tests for turtle.TPen class http://bugs.python.org/issue21806 opened by ingrid #21807: SysLogHandler closes TCP connection after first message http://bugs.python.org/issue21807 opened by Omer.Katz #21809: Building Python3 on VMS - External repository http://bugs.python.org/issue21809 opened by John.Malmberg #21811: Anticipate fixes to 3.x and 2.7 for OS X 10.10 Yosemite suppor http://bugs.python.org/issue21811 opened by ned.deily #21812: turtle.shapetransform doesn't transform the turtle on the firs http://bugs.python.org/issue21812 opened by Lita.Cho #21813: Enhance doc of os.stat_result http://bugs.python.org/issue21813 opened by haypo #21814: object.__setattr__ or super(...).__setattr__? http://bugs.python.org/issue21814 opened by b9 #21815: imaplib truncates some untagged responses http://bugs.python.org/issue21815 opened by rafales Most recent 15 issues with no replies (15) ========================================== #21814: object.__setattr__ or super(...).__setattr__? http://bugs.python.org/issue21814 #21812: turtle.shapetransform doesn't transform the turtle on the firs http://bugs.python.org/issue21812 #21806: Add tests for turtle.TPen class http://bugs.python.org/issue21806 #21804: Implement thr UTF8 command (RFC 6856) in poplib. http://bugs.python.org/issue21804 #21803: Remove macro indirections in complexobject http://bugs.python.org/issue21803 #21802: Reader of BufferedRWPair is not closed if writer's close() fai http://bugs.python.org/issue21802 #21801: inspect.signature doesn't always return a signature http://bugs.python.org/issue21801 #21800: Implement RFC 6855 (IMAP Support for UTF-8) in imaplib. 
http://bugs.python.org/issue21800 #21799: Py_SetPath() gives compile error: undefined reference to '__im http://bugs.python.org/issue21799 #21796: tempfile.py", line 83, in once_lock = _allocate_lock( http://bugs.python.org/issue21796 #21795: smtpd.SMTPServer should announce 8BITMIME when supported http://bugs.python.org/issue21795 #21791: Proper return status of os.WNOHANG is not always (0, 0) http://bugs.python.org/issue21791 #21787: Idle: make 3.x Hyperparser.get_expression recognize ... http://bugs.python.org/issue21787 #21783: smtpd.py does not allow multiple helo/ehlo commands http://bugs.python.org/issue21783 #21781: make _ssl module 64-bit clean http://bugs.python.org/issue21781 Most recent 15 issues waiting for review (15) ============================================= #21813: Enhance doc of os.stat_result http://bugs.python.org/issue21813 #21811: Anticipate fixes to 3.x and 2.7 for OS X 10.10 Yosemite suppor http://bugs.python.org/issue21811 #21806: Add tests for turtle.TPen class http://bugs.python.org/issue21806 #21804: Implement thr UTF8 command (RFC 6856) in poplib. 
http://bugs.python.org/issue21804 #21803: Remove macro indirections in complexobject http://bugs.python.org/issue21803 #21802: Reader of BufferedRWPair is not closed if writer's close() fai http://bugs.python.org/issue21802 #21801: inspect.signature doesn't always return a signature http://bugs.python.org/issue21801 #21793: httplib client/server status refactor http://bugs.python.org/issue21793 #21790: Change blocksize in http.client to the value of resource.getpa http://bugs.python.org/issue21790 #21786: Use assertEqual in test_pydoc http://bugs.python.org/issue21786 #21781: make _ssl module 64-bit clean http://bugs.python.org/issue21781 #21780: make unicodedata module 64-bit safe http://bugs.python.org/issue21780 #21777: Separate out documentation of binary sequence methods http://bugs.python.org/issue21777 #21776: distutils.upload uses the wrong order of exceptions http://bugs.python.org/issue21776 #21772: platform.uname() not EINTR safe http://bugs.python.org/issue21772 Top 10 most discussed issues (10) ================================= #14534: Add method to mark unittest.TestCases as "do not run". http://bugs.python.org/issue14534 14 msgs #21763: Clarify requirements for file-like objects http://bugs.python.org/issue21763 14 msgs #10740: sqlite3 module breaks transactions and potentially corrupts da http://bugs.python.org/issue10740 10 msgs #21741: Convert most of the test suite to using unittest.main() http://bugs.python.org/issue21741 10 msgs #19495: Enhancement for timeit: measure time to run blocks of code usi http://bugs.python.org/issue19495 8 msgs #21772: platform.uname() not EINTR safe http://bugs.python.org/issue21772 8 msgs #15993: Windows: 3.3.0-rc2.msi: test_buffer fails http://bugs.python.org/issue15993 7 msgs #21765: Idle: make 3.x HyperParser work with non-ascii identifiers. 
http://bugs.python.org/issue21765 6 msgs #21784: __init__.py can be a directory http://bugs.python.org/issue21784 6 msgs #5207: extend strftime/strptime format for RFC3339 and RFC2822 http://bugs.python.org/issue5207 5 msgs Issues closed (67) ================== #3425: posixmodule.c always using res = utime(path, NULL) http://bugs.python.org/issue3425 closed by r.david.murray #5904: strftime docs do not explain locale effect on result string http://bugs.python.org/issue5904 closed by r.david.murray #6133: LOAD_CONST followed by LOAD_ATTR can be optimized to just be a http://bugs.python.org/issue6133 closed by terry.reedy #6916: Remove deprecated items from asynchat http://bugs.python.org/issue6916 closed by giampaolo.rodola #6966: Ability to refer to arguments in TestCase.fail* methods http://bugs.python.org/issue6966 closed by r.david.murray #9693: asynchat push_callable() patch http://bugs.python.org/issue9693 closed by giampaolo.rodola #9727: Add callbacks to be invoked when locale changes http://bugs.python.org/issue9727 closed by loewis #9972: PyGILState_XXX missing in Python builds without threads http://bugs.python.org/issue9972 closed by ned.deily #10002: Installer doesn't install on Windows Server 2008 DataCenter R2 http://bugs.python.org/issue10002 closed by loewis #10084: SSL support for asyncore http://bugs.python.org/issue10084 closed by giampaolo.rodola #10136: kill_python doesn't work with short path http://bugs.python.org/issue10136 closed by zach.ware #10310: signed:1 bitfields rarely make sense http://bugs.python.org/issue10310 closed by berker.peksag #10524: Patch to add Pardus to supported dists in platform http://bugs.python.org/issue10524 closed by berker.peksag #11287: Add context manager support to dbm modules http://bugs.python.org/issue11287 closed by Claudiu.Popa #11394: Tools/demo, etc. 
are not installed http://bugs.python.org/issue11394 closed by terry.reedy #11736: windows installers ssl module / openssl broken for some sites http://bugs.python.org/issue11736 closed by loewis #11792: asyncore module print to stdout http://bugs.python.org/issue11792 closed by giampaolo.rodola #12617: Mutable Sequence Type can work not only with iterable in slice http://bugs.python.org/issue12617 closed by Claudiu.Popa #13102: xml.dom.minidom does not support default namespaces http://bugs.python.org/issue13102 closed by ezio.melotti #13779: os.walk: bottom-up http://bugs.python.org/issue13779 closed by python-dev #16587: Py_Initialize breaks wprintf on Windows http://bugs.python.org/issue16587 closed by haypo #18612: More elaborate documentation on how list comprehensions and ge http://bugs.python.org/issue18612 closed by uglemat #18703: To change the doc of html/faq/gui.html http://bugs.python.org/issue18703 closed by r.david.murray #19362: Documentation for len() fails to mention that it works on sets http://bugs.python.org/issue19362 closed by terry.reedy #19493: Report skipped ctypes tests as skipped http://bugs.python.org/issue19493 closed by zach.ware #19768: Not so correct error message when giving incorrect type to max http://bugs.python.org/issue19768 closed by rhettinger #19898: No tests for dequereviter_new http://bugs.python.org/issue19898 closed by rhettinger #20062: Remove emacs page from devguide http://bugs.python.org/issue20062 closed by ezio.melotti #20068: collections.Counter documentation leaves out interesting useca http://bugs.python.org/issue20068 closed by rhettinger #20091: An index entry for __main__ in "30.5 runpy" is missing http://bugs.python.org/issue20091 closed by orsenthil #20457: Use partition and enumerate make getopt easier http://bugs.python.org/issue20457 closed by ezio.melotti #20708: commands has no "RANDOM" environment? 
http://bugs.python.org/issue20708 closed by zach.ware #20880: Windows installation problem with 3.3.5 http://bugs.python.org/issue20880 closed by BreamoreBoy #20915: Add "pip" section to experts list in devguide http://bugs.python.org/issue20915 closed by ezio.melotti #21205: Add __qualname__ attribute to Python generators and change def http://bugs.python.org/issue21205 closed by haypo #21326: asyncio: request clearer error message when event loop closed http://bugs.python.org/issue21326 closed by haypo #21559: OverflowError should not happen for integer operations http://bugs.python.org/issue21559 closed by terry.reedy #21595: asyncio: Creating many subprocess generates lots of internal B http://bugs.python.org/issue21595 closed by python-dev #21669: Custom error messages when print & exec are used as statements http://bugs.python.org/issue21669 closed by ncoghlan #21686: IDLE - Test hyperparser http://bugs.python.org/issue21686 closed by terry.reedy #21690: re documentation: re.compile links to re.search / re.match ins http://bugs.python.org/issue21690 closed by ezio.melotti #21694: IDLE - Test ParenMatch http://bugs.python.org/issue21694 closed by terry.reedy #21719: Returning Windows file attribute information via os.stat() http://bugs.python.org/issue21719 closed by zach.ware #21722: teach distutils "upload" to exit with code != 0 when error occ http://bugs.python.org/issue21722 closed by pitrou #21723: Float maxsize is treated as infinity in asyncio.Queue http://bugs.python.org/issue21723 closed by haypo #21726: Unnecessary line in documentation http://bugs.python.org/issue21726 closed by terry.reedy #21730: test_socket fails --without-threads http://bugs.python.org/issue21730 closed by terry.reedy #21742: WatchedFileHandler can fail due to race conditions or file ope http://bugs.python.org/issue21742 closed by python-dev #21744: itertools.islice() goes over all the pre-initial elements even http://bugs.python.org/issue21744 closed by rhettinger #21751: 
Expand zipimport to support bzip2 and lzma http://bugs.python.org/issue21751 closed by serhiy.storchaka #21752: Document Backwards Incompatible change to logging in 3.4 http://bugs.python.org/issue21752 closed by python-dev #21757: Can't reenable menus in Tkinter on Mac http://bugs.python.org/issue21757 closed by ned.deily #21758: Not so correct documentation about asyncio.subprocess_shell me http://bugs.python.org/issue21758 closed by python-dev #21759: URL Typo in Documentation FAQ http://bugs.python.org/issue21759 closed by python-dev #21764: Document that IOBase.__del__ calls self.close http://bugs.python.org/issue21764 closed by python-dev #21766: CGIHTTPServer File Disclosure http://bugs.python.org/issue21766 closed by python-dev #21771: name of 2nd parameter to itertools.groupby() http://bugs.python.org/issue21771 closed by rhettinger #21773: Fix a NameError in test_enum http://bugs.python.org/issue21773 closed by haypo #21774: Fix a NameError in xml.dom.minidom http://bugs.python.org/issue21774 closed by rhettinger #21789: Broken link to PEP 263 in Python 2.7 error message http://bugs.python.org/issue21789 closed by ned.deily #21792: Spam http://bugs.python.org/issue21792 closed by SilentGhost #21794: stack frame contains name of wrapper method, not that of wrapp http://bugs.python.org/issue21794 closed by zach.ware #21797: mmap read of single byte accesses more that just that byte http://bugs.python.org/issue21797 closed by jcea #21798: Allow adding Path or str to Path http://bugs.python.org/issue21798 closed by pitrou #21805: Argparse Revert config_file defaults http://bugs.python.org/issue21805 closed by r.david.murray #21808: 65001 code page not supported http://bugs.python.org/issue21808 closed by r.david.murray #21810: SIGSEGV in PyObject_Malloc when ARENAS_USE_MMAP http://bugs.python.org/issue21810 closed by neologix From ezio.melotti at gmail.com Fri Jun 20 19:30:55 2014 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Fri, 20 Jun 2014 20:30:55 
+0300 Subject: [Python-Dev] Tracker Stats Message-ID: Hi, I added a new "stats" page to the bug tracker: http://bugs.python.org/issue?@template=stats The page can be reached from the sidebar of the bug tracker: Summaries -> Stats The data are updated once a week, together with the Summary of Python tracker issues. Best Regards, Ezio Melotti From raymond.hettinger at gmail.com Fri Jun 20 20:23:54 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 20 Jun 2014 11:23:54 -0700 Subject: [Python-Dev] Tracker Stats In-Reply-To: References: Message-ID: <320108CE-AAEE-4B57-BD89-281BDB84A07D@gmail.com> On Jun 20, 2014, at 10:30 AM, Ezio Melotti wrote: > I added a new "stats" page to the bug tracker: > http://bugs.python.org/issue?@template=stats > The page can be reached from the sidebar of the bug tracker: Summaries -> Stats > The data are updated once a week, together with the Summary of Python > tracker issues. Thank you. That gives nice visibility to all the work being done on the tracker. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From pjenvey at underboss.org Fri Jun 20 22:32:10 2014 From: pjenvey at underboss.org (Philip Jenvey) Date: Fri, 20 Jun 2014 13:32:10 -0700 Subject: [Python-Dev] PyPy3 2.3.1 released Message-ID: <42C29176-D101-483F-97DC-91C18443D393@underboss.org> ===================== PyPy3 2.3.1 - Fulcrum ===================== We're pleased to announce the first stable release of PyPy3. PyPy3 targets Python 3 (3.2.5) compatibility. We would like to thank all of the people who donated_ to the `py3k proposal`_ for supporting the work that went into this. You can download the PyPy3 2.3.1 release here: http://pypy.org/download.html#pypy3-2-3-1 Highlights ========== * The first stable release of PyPy3: support for Python 3! 
* The stdlib has been updated to Python 3.2.5 * Additional support for the u'unicode' syntax (`PEP 414`_) from Python 3.3 * Updates from the default branch, such as incremental GC and various JIT improvements * Resolved some notable JIT performance regressions from PyPy2: - Re-enabled the previously disabled collection (list/dict/set) strategies - Resolved performance of iteration over range objects - Resolved handling of Python 3's exception __context__ unnecessarily forcing frame object overhead .. _`PEP 414`: http://legacy.python.org/dev/peps/pep-0414/ What is PyPy? ============== PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.6 or 3.2.5. It's fast due to its integrated tracing JIT compiler. This release supports x86 machines running Linux 32/64, Mac OS X 64, Windows, and OpenBSD, as well as newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux. While we support 32 bit python on Windows, work on the native Windows 64 bit python is still stalling, we would welcome a volunteer to `handle that`_. .. _`handle that`: http://doc.pypy.org/en/latest/windows.html#what-is-missing-for-a-full-64-bit-translation How to use PyPy? ================= We suggest using PyPy from a `virtualenv`_. Once you have a virtualenv installed, you can follow instructions from `pypy documentation`_ on how to proceed. This document also covers other `installation schemes`_. .. _donated: http://morepypy.blogspot.com/2012/01/py3k-and-numpy-first-stage-thanks-to.html .. _`py3k proposal`: http://pypy.org/py3donate.html .. _`pypy documentation`: http://doc.pypy.org/en/latest/getting-started.html#installing-using-virtualenv .. _`virtualenv`: http://www.virtualenv.org/en/latest/ .. 
_`installation schemes`: http://doc.pypy.org/en/latest/getting-started.html#installing-pypy Cheers, the PyPy team -- Philip Jenvey From ericsnowcurrently at gmail.com Sat Jun 21 01:18:16 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 20 Jun 2014 17:18:16 -0600 Subject: [Python-Dev] PyPy3 2.3.1 released In-Reply-To: <42C29176-D101-483F-97DC-91C18443D393@underboss.org> References: <42C29176-D101-483F-97DC-91C18443D393@underboss.org> Message-ID: On Fri, Jun 20, 2014 at 2:32 PM, Philip Jenvey wrote: > ===================== > PyPy3 2.3.1 - Fulcrum > ===================== > > We're pleased to announce the first stable release of PyPy3. PyPy3 > targets Python 3 (3.2.5) compatibility. Awesome! -eric From ncoghlan at gmail.com Sat Jun 21 03:24:58 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 Jun 2014 11:24:58 +1000 Subject: [Python-Dev] PyPy3 2.3.1 released In-Reply-To: <42C29176-D101-483F-97DC-91C18443D393@underboss.org> References: <42C29176-D101-483F-97DC-91C18443D393@underboss.org> Message-ID: On 21 Jun 2014 06:39, "Philip Jenvey" wrote: > > ===================== > PyPy3 2.3.1 - Fulcrum > ===================== > > We're pleased to announce the first stable release of PyPy3. PyPy3 > targets Python 3 (3.2.5) compatibility. Congratulations, that's another critical milestone in the Python 3 migration reached! :) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wizzat at gmail.com Sat Jun 21 05:12:19 2014 From: wizzat at gmail.com (Mark Roberts) Date: Fri, 20 Jun 2014 20:12:19 -0700 Subject: [Python-Dev] PyPy3 2.3.1 released In-Reply-To: <42C29176-D101-483F-97DC-91C18443D393@underboss.org> References: <42C29176-D101-483F-97DC-91C18443D393@underboss.org> Message-ID: <2125E459-F4A3-4F2E-A8E1-77263D0BEE5C@gmail.com> That's fantastic! 
Great job - that's a lot of work :) -Mark > On Jun 20, 2014, at 13:32, Philip Jenvey wrote: > > ===================== > PyPy3 2.3.1 - Fulcrum > ===================== > > We're pleased to announce the first stable release of PyPy3. PyPy3 > targets Python 3 (3.2.5) compatibility. From mal at egenix.com Sat Jun 21 12:27:17 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 21 Jun 2014 12:27:17 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit Message-ID: <53A55E05.5020906@egenix.com> With PEP 466 and the constant flow of OpenSSL security fixes which are currently being handled via Python patch level releases, we will soon reach 2.7.10 and quickly go beyond that (also see http://bugs.python.org/issue21308). This opens up a potential backwards incompatibility with existing tools that assume the Python release version number to use the "x.y.z" single digit approach, e.g. code that uses sys.version[:5] for the Python version or relies on the lexicographic ordering of the version string (sys.version > '2.7.2'). Some questions we should probably ask ourselves (I've added my thoughts inline): * Is it a good strategy to ship new Python releases for every single OpenSSL security release or is there a better way to handle these 3rd party issues ?
I think we should link to the OpenSSL libs dynamically rather than statically in Python 2.7 for Windows so that it's possible to provide drop-in updates for such issues. * Should we try to avoid two digit patch level release numbers by using some other mechanism such as e.g. a release date after 2.7.9 ? Grepping through our code, this will introduce some breakage, but not much. Most older code branches on minor versions, not patch levels. More recent code uses sys.python_info so is not affected. * Should we make use of the potential breakage with 2.7.10 to introduce a new Windows compiler version for Python 2.7 ? I think this would be a good chance to update the compiler to whatever we use for Python 3 at the time. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 21 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-06-17: Released eGenix PyRun 2.0.0 ... http://egenix.com/go58 2014-06-09: Released eGenix pyOpenSSL 0.13.3 ... http://egenix.com/go57 2014-07-02: Python Meeting Duesseldorf ... 11 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Sat Jun 21 12:51:54 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 Jun 2014 20:51:54 +1000 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A55E05.5020906@egenix.com> References: <53A55E05.5020906@egenix.com> Message-ID: On 21 June 2014 20:27, M.-A. 
Lemburg wrote: > With PEP 466 and the constant flow of OpenSSL security fixes > which are currently being handled via Python patch level releases, > we will soon reach 2.7.10 and quickly go beyond that (also see > http://bugs.python.org/issue21308). > > This opens up a potential backwards incompatibility with existing > tools that assume the Python release version number to use the > "x.y.z" single digit approach, e.g. code that uses sys.version[:5] > for the Python version or relies on the lexicographic ordering > of the version string (sys.version > '2.7.2'). Such code has an easy fix available, though, as sys.version_info has existed since 2.0, and handles two digit micro releases just fine. The docs for sys.version also have this explicit disclaimer: "Do not extract version information out of it, rather, use version_info and the functions provided by the platform module." Making it harder to tell whether or not someone's Python installation is affected by an OpenSSL CVE is also an undesirable outcome. On a Linux distro, folks will check the distro package database directly for the OpenSSL version, but on Windows, no such centralised audit mechanism is available by default. With OpenSSL statically linked, Python versions can just be mapped to OpenSSL versions (so, for example, 2.7.7 has 1.0.1g) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Sat Jun 21 18:40:29 2014 From: barry at python.org (Barry Warsaw) Date: Sat, 21 Jun 2014 12:40:29 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A55E05.5020906@egenix.com> References: <53A55E05.5020906@egenix.com> Message-ID: <20140621124029.1314ead6@limelight.wooz.org> On Jun 21, 2014, at 12:27 PM, M.-A. Lemburg wrote: >This opens up a potential backwards incompatibility with existing >tools that assume the Python release version number to use the >"x.y.z" single digit approach, e.g. 
code that uses sys.version[:5] >for the Python version or relies on the lexicographic ordering >of the version string (sys.version > '2.7.2'). Patient: Doctor, it hurts when I do this. Doctor: Don't do that! > * Should we try to avoid two digit patch level release numbers > by using some other mechanism such as e.g. a release date > after 2.7.9 ? > > Grepping through our code, this will introduce some breakage, > but not much. Most older code branches on minor versions, > not patch levels. More recent code uses sys.python_info so > is not affected. s/sys.python_info/sys.version_info/ and yes the latter has been preferred for a long time now. Given that 2.7 is a long term support release, it's inevitable that we'll break the 2-digit micro release number barrier. So be it. A 2.7.10 isn't the end of the world. -Barry From mal at egenix.com Sat Jun 21 18:57:57 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 21 Jun 2014 18:57:57 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> Message-ID: <53A5B995.6040802@egenix.com> On 21.06.2014 12:51, Nick Coghlan wrote: > On 21 June 2014 20:27, M.-A. Lemburg wrote: >> With PEP 466 and the constant flow of OpenSSL security fixes >> which are currently being handled via Python patch level releases, >> we will soon reach 2.7.10 and quickly go beyond that (also see >> http://bugs.python.org/issue21308). >> >> This opens up a potential backwards incompatibility with existing >> tools that assume the Python release version number to use the >> "x.y.z" single digit approach, e.g. code that uses sys.version[:5] >> for the Python version or relies on the lexicographic ordering >> of the version string (sys.version > '2.7.2'). > > Such code has an easy fix available, though, as sys.version_info has > existed since 2.0, and handles two digit micro releases just fine. 
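A short sketch of the contrast being discussed here (the 2.7.10 version string below is a fabricated example, not taken from any real build):

```python
import sys

# Fragile: slicing sys.version assumes single-digit version components.
version_string = "2.7.10 (default, Jun 22 2014, 10:00:00)"  # hypothetical sys.version
print(version_string[:5])  # "2.7.1" -- silently reports the wrong version

# Robust: sys.version_info is a tuple compared element-wise,
# so a two-digit micro release orders correctly.
print(sys.version_info >= (2, 0))  # True on any supported interpreter
print((2, 7, 10) > (2, 7, 9))      # True -- tuple comparison gets the ordering right
```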
The > docs for sys.version also have this explicit disclaimer: "Do not > extract version information out of it, rather, use version_info and > the functions provided by the platform module." I don't think that's a good argument. Of course, there are better ways to figure out the version number, but fact is, existing code, even in the stdlib, does use and parse the sys.version string version. During Python's lifetime, we've always avoided two digit version numbers, so people have been relying on this, even if it was never (AFAIK) documented anywhere. > Making it harder to tell whether or not someone's Python installation > is affected by an OpenSSL CVE is also an undesirable outcome. On a > Linux distro, folks will check the distro package database directly > for the OpenSSL version, but on Windows, no such centralised audit > mechanism is available by default. With OpenSSL statically linked, > Python versions can just be mapped to OpenSSL versions (so, for > example, 2.7.7 has 1.0.1g) I have to disagree here as well :-) If people cannot upgrade to a higher patch level for whatever reason (say a patch level release introduced some other bugs), but still need to upgrade to the current OpenSSL version, they'd be stuck if we continue to bind the Python version number to some OpenSSL release version. We should definitely make it possible to address OpenSSL bugs without having to upgrade Python and it's not hard to do: just replace the static binding with dynamic binding and include the two OpenSSL DLLs with the Windows installer. People can then drop in new versions of those DLLs as needed, without having the core devs do a complete new release every time someone finds a new problem those libs. Security libs simply have a much higher release rate (if they are well maintained) than most other software. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 21 2014) >>> Python Projects, Consulting and Support ... 
http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-06-17: Released eGenix PyRun 2.0.0 ... http://egenix.com/go58 2014-06-09: Released eGenix pyOpenSSL 0.13.3 ... http://egenix.com/go57 2014-07-02: Python Meeting Duesseldorf ... 11 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From nad at acm.org Sat Jun 21 20:47:08 2014 From: nad at acm.org (Ned Deily) Date: Sat, 21 Jun 2014 11:47:08 -0700 Subject: [Python-Dev] Python 2.7 patch levels turning two digit References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> Message-ID: In article <53A5B995.6040802 at egenix.com>, "M.-A. Lemburg" wrote: > > Making it harder to tell whether or not someone's Python installation > > is affected by an OpenSSL CVE is also an undesirable outcome. On a > > Linux distro, folks will check the distro package database directly > > for the OpenSSL version, but on Windows, no such centralised audit > > mechanism is available by default. With OpenSSL statically linked, > > Python versions can just be mapped to OpenSSL versions (so, for > > example, 2.7.7 has 1.0.1g) > > I have to disagree here as well :-) > > If people cannot upgrade to a higher patch level for whatever > reason (say a patch level release introduced some other bugs), > but still need to upgrade to the current OpenSSL version, they'd > be stuck if we continue to bind the Python version number to > some OpenSSL release version. 
> > We should definitely make it possible to address OpenSSL > bugs without having to upgrade Python and it's not hard to > do: just replace the static binding with dynamic binding > and include the two OpenSSL DLLs with the Windows installer. > > People can then drop in new versions of those DLLs > as needed, without having the core devs do a complete > new release every time someone finds a new problem those > libs. Security libs simply have a much higher release > rate (if they are well maintained) than most other > software. I agree that with Nick and Barry that, due to the extended support period for 2.7, we have no choice but to bite the bullet and deal with micro levels exceeding 9. On the other hand, it would also be good to be better able to deal with third-party library revisions that only affect the Windows or OS X binary installers we supply. I don't know that we've ever had a process/policy in place to do that, certainly not recently. Even for statically linked libraries, we presumably could supply replacement re-linked files or even carefully repacked installers with the updated files. This might be something to discuss and add to PEP 101 or a new PEP. Up to now, this hasn't been a major concern since there have usually been other reasons to do full releases as well, e.g. source regressions. Given the still relatively high churn rate for changes going into 2.7 and the growing distance between the 2.7 and 3.x code bases (among other things, leading to more frequent inadvertent backporting errors), we'll probably need to keep making relatively frequent 2.7 releases unless we can further slow down the 2.7 change rate. 
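As a side note to the audit question raised in this thread: since Python 2.7 the ssl module exposes the OpenSSL version it was linked against, so an installation can be checked directly rather than by mapping Python release numbers to OpenSSL releases (the outputs shown are illustrative, not guaranteed values):

```python
import ssl

# Version string of the OpenSSL library the interpreter was built against.
print(ssl.OPENSSL_VERSION)       # e.g. "OpenSSL 1.0.1g 7 Apr 2014"

# Tuple form, convenient for programmatic comparison.
print(ssl.OPENSSL_VERSION_INFO)  # e.g. (1, 0, 1, 7, 15)
```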
-- Ned Deily, nad at acm.org From rosuav at gmail.com Sat Jun 21 22:34:23 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 22 Jun 2014 06:34:23 +1000 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A5B995.6040802@egenix.com> References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> Message-ID: On Sun, Jun 22, 2014 at 2:57 AM, M.-A. Lemburg wrote: > On 21.06.2014 12:51, Nick Coghlan wrote: >> Such code has an easy fix available, though, as sys.version_info has >> existed since 2.0, and handles two digit micro releases just fine. The >> docs for sys.version also have this explicit disclaimer: "Do not >> extract version information out of it, rather, use version_info and >> the functions provided by the platform module." > > I don't think that's a good argument. Of course, there are > better ways to figure out the version number, but fact is, > existing code, even in the stdlib, does use and parse > the sys.version string version. > > During Python's lifetime, we've always avoided two digit > version numbers, so people have been relying on this, even > if it was never (AFAIK) documented anywhere. It's going to be a broken-code-breaking change that's introduced in a point release, but since PEP 404 implicitly says that there won't be a 2.10.0, there's no way around that. Although actually, a glance at the stdlib suggests that 2.10.0 (or 3.10.0) would break a lot more than 2.7.10 would break - there are places where sys.version[:3] is used (or equivalents like "... %.3s ..." % sys.version), or a whole-string comparison is done against a two-part version string (eg: sys.version >= "2.6"), and at least one place that checks sys.version[0] for the major version number, but I didn't find any that look at sys.version[:5] or equivalent. Everything that cares about the three-part version number seems to either look at sys.version.split()[0] or sys.version_info. Do you know where this problematic code is? 
I checked this in the 2.7.3 stdlib as packaged on my Debian Wheezy system, for what it's worth. ChrisA From phd at phdru.name Sat Jun 21 22:58:04 2014 From: phd at phdru.name (Oleg Broytman) Date: Sat, 21 Jun 2014 22:58:04 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> Message-ID: <20140621205804.GA12098@phdru.name> On Sun, Jun 22, 2014 at 06:34:23AM +1000, Chris Angelico wrote: > Do you know where this problematic code is? In many places: https://encrypted.google.com/search?q=%22sys.version[%3A3]%22 https://encrypted.google.com/search?q=%22sys.version[%3A5]%22 Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From mal at egenix.com Sat Jun 21 23:37:21 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 21 Jun 2014 23:37:21 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> Message-ID: <53A5FB11.5020302@egenix.com> On 21.06.2014 22:34, Chris Angelico wrote: > On Sun, Jun 22, 2014 at 2:57 AM, M.-A. Lemburg wrote: >> On 21.06.2014 12:51, Nick Coghlan wrote: >>> Such code has an easy fix available, though, as sys.version_info has >>> existed since 2.0, and handles two digit micro releases just fine. The >>> docs for sys.version also have this explicit disclaimer: "Do not >>> extract version information out of it, rather, use version_info and >>> the functions provided by the platform module." >> >> I don't think that's a good argument. Of course, there are >> better ways to figure out the version number, but fact is, >> existing code, even in the stdlib, does use and parse >> the sys.version string version. >> >> During Python's lifetime, we've always avoided two digit >> version numbers, so people have been relying on this, even >> if it was never (AFAIK) documented anywhere. 
> > It's going to be a broken-code-breaking change that's introduced in a > point release, but since PEP 404 implicitly says that there won't be a > 2.10.0, there's no way around that. Although actually, a glance at the > stdlib suggests that 2.10.0 (or 3.10.0) would break a lot more than > 2.7.10 would break - there are places where sys.version[:3] is used > (or equivalents like "... %.3s ..." % sys.version), or a whole-string > comparison is done against a two-part version string (eg: sys.version >> = "2.6"), and at least one place that checks sys.version[0] for the > major version number, but I didn't find any that look at > sys.version[:5] or equivalent. Everything that cares about the > three-part version number seems to either look at > sys.version.split()[0] or sys.version_info. Do you know where this > problematic code is? > > I checked this in the 2.7.3 stdlib as packaged on my Debian Wheezy > system, for what it's worth. There are no places in the stdlib that parse sys.version in a way that would break with 2.7.10, AFAIK. I was just referring to the statement that Nick quoted. sys.version *is* used for parsing the Python version or using parts of it to build e.g. filenames and that's really no surprise. That said, and I also included this in my answers to the questions that Nick removed in his reply, I don't think that a lot of code would be affected by this. I do believe that we can use this potential breakage as a chance for improvement. See the last question (listed here again)... 1. Is it a good strategy to ship two Python releases for every single OpenSSL security release or is there a better way to handle these 3rd party issues ? 2. Should we try to avoid two digit patch level release numbers by using some other mechanism such as e.g. a release date after 2.7.9 ? 3. Should we make use of the potential breakage with 2.7.10 to introduce a new Windows compiler version for Python 2.7 ? My answers to these are: 1.
We should use dynamic linking instead and not let OpenSSL bugs trigger Python releases; 2. It's not a big problem; 3. Yes, please, since it is difficult for people to develop and debug their extensions with a 2008 compiler, when the rest of the world has long moved on. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 21 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-06-17: Released eGenix PyRun 2.0.0 ... http://egenix.com/go58 2014-06-09: Released eGenix pyOpenSSL 0.13.3 ... http://egenix.com/go57 2014-07-02: Python Meeting Duesseldorf ... 11 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From phil at riverbankcomputing.com Sat Jun 21 23:57:38 2014 From: phil at riverbankcomputing.com (Phil Thompson) Date: Sat, 21 Jun 2014 22:57:38 +0100 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A5FB11.5020302@egenix.com> References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> <53A5FB11.5020302@egenix.com> Message-ID: <880dda6fcfa7666894993fd515d889d3@www.riverbankcomputing.com> On 21/06/2014 10:37 pm, M.-A. Lemburg wrote: > That said, and I also included this in my answers to the questions > that Nick removed in his reply, I don't think that a lot of > code would be affected by this. I do believe that we can use > this potential breakage as a chance for improvement. See the last > question (listed here again)... > > 1.
Is it a good strategy to ship two Python releases for every > single OpenSSL security release or is there a better way to > handle these 3rd party issues ? Isn't this only a packaging issue? There is no change to the Python API or implementation, so there is no need to change the version number. So just make new Windows packages. The precedent is to add a dash and a package number. I can't remember what version this was applied to before - but I got a +1 from Guido for suggesting it :) Phil From ethan at stoneleaf.us Sat Jun 21 23:48:34 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 21 Jun 2014 14:48:34 -0700 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A5FB11.5020302@egenix.com> References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> <53A5FB11.5020302@egenix.com> Message-ID: <53A5FDB2.1080000@stoneleaf.us> On 06/21/2014 02:37 PM, M.-A. Lemburg wrote: > > My answers to these are: 1. We should use dynamic linking > instead and not let OpenSSL bugs trigger Python releases; 2. > It's not a big problem; 3. Yes, please, since it is difficult > for people to develop and debug their extensions with a > 2008 compiler, when the rest of the world has long moved on. +1 (assuming not incredibly difficult and those that can are willing ;) -- ~Ethan~ From Steve.Dower at microsoft.com Sun Jun 22 00:00:14 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sat, 21 Jun 2014 22:00:14 +0000 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A5FB11.5020302@egenix.com> References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> , <53A5FB11.5020302@egenix.com> Message-ID: <51068485f2924c599a6fe238ea81c8bd@BLUPR03MB389.namprd03.prod.outlook.com> We can always lie about the version in sys.version.
doesn't make it a great idea, but it works). Changing compiler without changing at least the install directory and preventing in place upgrades is a really bad idea, and with those mitigations is only pretty bad. I'm torn here, because I know the current situation hurts, but it'd probably only move to VC10 which will hurt just as much in a few years... there are better tooling solutions (yes, I'm working on some behind the scenes). A separate distro of _ssl and _hashlib wouldn't be too hard and has the same effect as a dynamically linked OpenSSL. Maybe we can make these PyPI updateable? Top-posted from my Windows Phone ________________________________ From: M.-A. Lemburg Sent: ?6/?21/?2014 14:38 To: Chris Angelico Cc: Python-Dev Subject: Re: [Python-Dev] Python 2.7 patch levels turning two digit On 21.06.2014 22:34, Chris Angelico wrote: > On Sun, Jun 22, 2014 at 2:57 AM, M.-A. Lemburg wrote: >> On 21.06.2014 12:51, Nick Coghlan wrote: >>> Such code has an easy fix available, though, as sys.version_info has >>> existed since 2.0, and handles two digit micro releases just fine. The >>> docs for sys.version also have this explicit disclaimer: "Do not >>> extract version information out of it, rather, use version_info and >>> the functions provided by the platform module." >> >> I don't think that's a good argument. Of course, there are >> better ways to figure out the version number, but fact is, >> existing code, even in the stdlib, does use and parse >> the sys.version string version. >> >> During Python's lifetime, we've always avoided two digit >> version numbers, so people have been relying on this, even >> if it was never (AFAIK) documented anywhere. > > It's going to be a broken-code-breaking change that's introduced in a > point release, but since PEP 404 implicitly says that there won't be a > 2.10.0, there's no way around that. 
Although actually, a glance at the > stdlib suggests that 2.10.0 (or 3.10.0) would break a lot more than > 2.7.10 would break - there are places where sys.version[:3] is used > (or equivalents like "... %.3s ..." % sys.version), or a whole-string > comparison is done against a two-part version string (eg: sys.version >> = "2.6"), and at least one place that checks sys.version[0] for the > major version number, but I didn't find any that look at > sys.version[:5] or equivalent. Everything that cares about the > three-part version number seems to either look at > sys.version.split()[0] or sys.version_info. Do you know where this > problematic code is? > > I checked this in the 2.7.3 stdlib as packaged on my Debian Wheezy > system, for what it's worth. There are no places in the stdlib that parse sys.version in a way that would break with 2.7.10, AFAIK. I was just referring to the statement that Nick quoted. sys.version *is* used for parsing the Python version or using parts of it to build e.g. filenames and that's really no surprise. That said, and I also included this in my answers to the questions that Nick removed in his reply, I don't think that a lot of code would be affected by this. I do believe that we can use this potential breakage as a chance for improvement. See the last question (listed here again)... 1. Is it a good strategy to ship two Python releases for every single OpenSSL security release or is there a better way to handle these 3rd party issues ? 2. Should we try to avoid two digit patch level release numbers by using some other mechanism such as e.g. a release date after 2.7.9 ? 3. Should we make use of the potential breakage with 2.7.10 to introduce a new Windows compiler version for Python 2.7 ? My answers to these are: 1. We should use dynamic linking instead and not let OpenSSL bugs trigger Python releases; 2. It's not a big problem; 3.
Yes, please, since it is difficult for people to develop and debug their extensions with a 2008 compiler, when the rest of the world has long moved on. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 21 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-06-17: Released eGenix PyRun 2.0.0 ... http://egenix.com/go58 2014-06-09: Released eGenix pyOpenSSL 0.13.3 ... http://egenix.com/go57 2014-07-02: Python Meeting Duesseldorf ... 11 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40microsoft.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Sun Jun 22 00:18:05 2014 From: donald at stufft.io (Donald Stufft) Date: Sat, 21 Jun 2014 18:18:05 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <51068485f2924c599a6fe238ea81c8bd@BLUPR03MB389.namprd03.prod.outlook.com> References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> , <53A5FB11.5020302@egenix.com> <51068485f2924c599a6fe238ea81c8bd@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On Jun 21, 2014, at 6:00 PM, Steve Dower wrote: > We can always lie about the version in sys.version. 
Existing code is unaffected and new code will have to use version_info (Windows developers will know that Windows pulls tricks like this every other version... doesn't make it a great idea, but it works). > > Changing compiler without changing at least the install directory and preventing in place upgrades is a really bad idea, and with those mitigations is only pretty bad. I'm torn here, because I know the current situation hurts, but it'd probably only move to VC10 which will hurt just as much in a few years... there are better tooling solutions (yes, I'm working on some behind the scenes). > > A separate distro of _ssl and _hashlib wouldn't be too hard and has the same effect as a dynamically linked OpenSSL. Maybe we can make these PyPI updateable? Stuff from PyPI installs later on in the sys.path than the stdlib. I wish it were different but it means without sys.path shenanigans you can't replace the stdlib with something from PyPI. > > Top-posted from my Windows Phone > From: M.-A. Lemburg > Sent: 6/21/2014 14:38 > To: Chris Angelico > Cc: Python-Dev > Subject: Re: [Python-Dev] Python 2.7 patch levels turning two digit > > On 21.06.2014 22:34, Chris Angelico wrote: > > On Sun, Jun 22, 2014 at 2:57 AM, M.-A. Lemburg wrote: > >> On 21.06.2014 12:51, Nick Coghlan wrote: > >>> Such code has an easy fix available, though, as sys.version_info has > >>> existed since 2.0, and handles two digit micro releases just fine. The > >>> docs for sys.version also have this explicit disclaimer: "Do not > >>> extract version information out of it, rather, use version_info and > >>> the functions provided by the platform module." > >> > >> I don't think that's a good argument. Of course, there are > >> better ways to figure out the version number, but fact is, > >> existing code, even in the stdlib, does use and parse > >> the sys.version string version.
> >> > >> During Python's lifetime, we've always avoided two digit > >> version numbers, so people have been relying on this, even > >> if it was never (AFAIK) documented anywhere. > > > > It's going to be a broken-code-breaking change that's introduced in a > > point release, but since PEP 404 implicitly says that there won't be a > > 2.10.0, there's no way around that. Although actually, a glance at the > > stdlib suggests that 2.10.0 (or 3.10.0) would break a lot more than > > 2.7.10 would break - there are places where sys.version[:3] is used > > (or equivalents like "... %.3s ..." % sys.version), or a whole-string > > comparison is done against a two-part version string (eg: sys.version > >> = "2.6"), and at least one place that checks sys.version[0] for the > > major version number, but I didn't find any that look at > > sys.version[:5] or equivalent. Everything that cares about the > > three-part version number seems to either look at > > sys.version.split()[0] or sys.version_info. Do you know where this > > problematic code is? > > > > I checked this in the 2.7.3 stdlib as packaged on my Debian Wheezy > > system, for what it's worth. > > There are no places in the stdlib that parse sys.version in a > way that would break with 2.7.10, AFAIK. I was just referring > to the statement that Nick quoted. sys.version *is* used for > parsing the Python version or using parts of it to build > e.g. filenames and that's really no surprise. > > That said, and I also included this in my answers to the questions > that Nick removed in his reply, I don't think that a lot of > code would be affected by this. I do believe that we can use > this potential breakage as a chance for improvement. See the last > question (listed here again)... > > 1. Is it a good strategy to ship two Python releases for every > single OpenSSL security release or is there a better way to > handle these 3rd party issues ? > > 2.
Should we try to avoid two digit patch level release numbers > by using some other mechanism such as e.g. a release date > after 2.7.9 ? > > 3. Should we make use of the potential breakage with 2.7.10 > to introduce a new Windows compiler version for Python 2.7 ? > > My answers to these are: 1. We should use dynamic linking > instead and not let OpenSSL bugs trigger Python releases; 2. > It's not a big problem; 3. Yes, please, since it is difficult > for people to develop and debug their extensions with a > 2008 compiler, when the rest of the world has long moved on. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Jun 21 2014) > >>> Python Projects, Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > 2014-06-17: Released eGenix PyRun 2.0.0 ... http://egenix.com/go58 > 2014-06-09: Released eGenix pyOpenSSL 0.13.3 ... http://egenix.com/go57 > 2014-07-02: Python Meeting Duesseldorf ... 11 days to go > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40microsoft.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From rosuav at gmail.com Sun Jun 22 01:10:41 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 22 Jun 2014 09:10:41 +1000 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <51068485f2924c599a6fe238ea81c8bd@BLUPR03MB389.namprd03.prod.outlook.com> References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> <53A5FB11.5020302@egenix.com> <51068485f2924c599a6fe238ea81c8bd@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On Sun, Jun 22, 2014 at 8:00 AM, Steve Dower wrote: > We can always lie about the version in sys.version. Existing code is > unaffected and new code will have to use version_info (Windows developers > will know that Windows pulls tricks like this every other version... doesn't > make it a great idea, but it works). I'd prefer a change of format to an outright lie. Something like "2.7._10" which will sort after "2.7.9". But ideally, nothing at all - just go smoothly to "2.7.10" and let broken code be broken. 
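A quick sanity check of the orderings at stake here (pure string and tuple comparisons; nothing depends on a 2.7.10 actually existing):

```python
# Lexicographic string comparison misorders a two-digit micro release,
# while tuple comparison (the shape of sys.version_info) orders correctly.
assert "2.7.10" < "2.7.9"      # '1' sorts before '9', so the order is wrong
assert (2, 7, 10) > (2, 7, 9)  # numeric comparison gets it right

# The suggested "_" spelling restores string ordering: '_' sorts after '9'.
assert "2.7._10" > "2.7.9"
```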
It'll think it's running on 2.7.1, and if anything needs to distinguish between 2.7.1 and 2.7.x, hopefully it's using version_info. ChrisA From rosuav at gmail.com Sun Jun 22 01:11:34 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 22 Jun 2014 09:11:34 +1000 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A5FB11.5020302@egenix.com> References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> <53A5FB11.5020302@egenix.com> Message-ID: On Sun, Jun 22, 2014 at 7:37 AM, M.-A. Lemburg wrote: > There are no places in the stdlib that parse sys.version in a > way that would break wtih 2.7.10, AFAIK. I was just referring > to the statement that Nick quoted. sys.version *is* used for > parsing the Python version or using parts of it to build > e.g. filenames and that's really no surprise. Right, good to know. I thought you were implying that stuff would break. Yes, stuff definitely does parse out the version number from sys.version, lots of that happens. ChrisA From breamoreboy at yahoo.co.uk Sun Jun 22 11:17:11 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 22 Jun 2014 10:17:11 +0100 Subject: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character In-Reply-To: References: Message-ID: On 11/06/2014 21:26, anatoly techtonik wrote: > I am banned from tracker, so I post the bug here: > The OP's approach to the Python community is beautifully summarised here http://bugs.python.org/issue8940 -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email is free from viruses and malware because avast! Antivirus protection is active. 
http://www.avast.com From martin at v.loewis.de Mon Jun 23 08:09:32 2014 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 23 Jun 2014 08:09:32 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A55E05.5020906@egenix.com> References: <53A55E05.5020906@egenix.com> Message-ID: <53A7C49C.1090107@v.loewis.de> > * Is it a good strategy to ship two Python releases for every > single OpenSSL security release or is there a better way to > handle these 3rd party issues ? At least for Windows, a new release certainly needs to be made. It could be possible to produce MSI patch files, but this would still be a new release. > I think we should link to the OpenSSL libs dynamically rather > than statically in Python 2.7 for Windows so that it's possible > to provide drop-in updates for such issues. It is possible to provide drop-in updates regardless of whether the OpenSSL libs are dynamically linked, as the _ssl module itself is a dynamic lib. > * Should we try to avoid two digit patch level release numbers > by using some other mechanism such as e.g. a release date > after 2.7.9 ? If it was for me, then yes, certainly: the development of 2.7 should just stop :-) > * Should we make use of the potential breakage with 2.7.10 > to introduce a new Windows compiler version for Python 2.7 ? Assuming it is a good idea to continue producing Windows binaries for 2.7, I think it would be a bad idea to switch compilers. It will cause severe breakage of 2.7 installations, much more problematic than switching to two-digit version numbers.
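As an aside on the drop-in-update point: whichever way the libraries end up linked, the OpenSSL a given interpreter actually uses can be inspected at runtime — the ssl module has exposed this since Python 2.7. A minimal sketch (the values in the comments are examples only; any real build will differ):

```python
import ssl

# Version of the OpenSSL library the _ssl module was compiled against
# and loaded at runtime; handy for verifying that a drop-in update of
# the SSL libraries actually took effect.
print(ssl.OPENSSL_VERSION)       # e.g. 'OpenSSL 1.0.1h 5 Jun 2014'
print(ssl.OPENSSL_VERSION_INFO)  # e.g. (1, 0, 1, 8, 15)
```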
Regards, Martin From francismb at email.de Mon Jun 23 17:52:33 2014 From: francismb at email.de (francis) Date: Mon, 23 Jun 2014 17:52:33 +0200 Subject: [Python-Dev] Tracker Stats In-Reply-To: References: Message-ID: <53A84D41.6070508@email.de> > Hi, > I added a new "stats" page to the bug tracker: > http://bugs.python.org/issue?@template=stats Thanks Ezio, Two questions: how hard would it be to add (or enhance) a chart with the "open issues type enhancement" and "open issues type bug" info? In the summaries there is a link to "Issues with patch", means that the ones not listed there are in "needs patch" or "new" status? Regards, francis From donald at stufft.io Mon Jun 23 18:09:28 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 23 Jun 2014 12:09:28 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A7C49C.1090107@v.loewis.de> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> Message-ID: On Jun 23, 2014, at 2:09 AM, Martin v. Löwis wrote: >> >> * Should we make use of the potential breakage with 2.7.10 >> to introduce a new Windows compiler version for Python 2.7 ? > > Assuming it is a good idea to continue producing Windows binaries > for 2.7, I think it would be a bad idea to switch compilers. It will > cause severe breakage of 2.7 installations, much more problematic > than switching to two-digit version numbers. I agree with this, we've just finally started getting things to the point where it makes a lot of sense for binary distributions for Windows. Breaking all of them on 2.7 would be very bad. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From mal at egenix.com Mon Jun 23 21:27:47 2014 From: mal at egenix.com (M.-A.
Lemburg) Date: Mon, 23 Jun 2014 21:27:47 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> Message-ID: <53A87FB3.2000100@egenix.com> On 23.06.2014 18:09, Donald Stufft wrote: > > On Jun 23, 2014, at 2:09 AM, Martin v. L?wis wrote: > >>> >>> * Should we make use of the potential breakage with 2.7.10 >>> to introduce a new Windows compiler version for Python 2.7 ? >> >> Assuming it is a good idea to continue producing Windows binaries >> for 2.7, I think it would be a bad idea to switch compilers. It will >> cause severe breakage of 2.7 installations, much more problematic >> than switching to two-digit version numbers. > > I agree with this, we?ve just finally started getting things to the point where > it makes a lot of sense for binary distributions for Windows. Breaking all > of them on 2.7 would be very bad. Not sure what you mean. We've had binary wininst distributions for Windows for more than a decade, and egg and msi distributions for 8 years :-) But without access to the VS 2008 compiler that is needed to compile those extensions, it will become increasingly difficult for package authors to provide such binary packages, so we have to ask ourselves: What's worse: breaking old Windows binaries for Python 2.7 or not having updated and new Windows binaries for Python 2.7 at all in a few years ? Switching to a newer compiler will make things easier for everyone and we'd see more binary packages for Windows again. Given that Python 2.7 support was extended for another 5 years at the recent Python Language Summit to 2020, we have to face this breakage sooner or later anyway. Extended support for VS 2008 will end in 2018 (but then: Python developers usually don't have extended support contracts with MS). Service pack support has already ended in 2009. Depending on how you see it, using such an old compiler also poses security risks. 
The last security update for VS 2008 dates back to 2011 (http://support.microsoft.com/kb/2538243). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 23 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-06-17: Released eGenix PyRun 2.0.0 ... http://egenix.com/go58 2014-07-02: Python Meeting Duesseldorf ... 9 days to go 2014-07-21: EuroPython 2014, Berlin, Germany ... 28 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From nad at acm.org Mon Jun 23 21:53:14 2014 From: nad at acm.org (Ned Deily) Date: Mon, 23 Jun 2014 12:53:14 -0700 Subject: [Python-Dev] Python 2.7 patch levels turning two digit References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> Message-ID: In article <53A87FB3.2000100 at egenix.com>, "M.-A. Lemburg" wrote: [...] > But without access to the VS 2008 compiler that is needed to > compile those extensions, it will become increasingly difficult > for package authors to provide such binary packages, so we have to > ask ourselves: > > What's worse: breaking old Windows binaries for Python 2.7 > or not having updated and new Windows binaries for Python 2.7 > at all in a few years ? > > Switching to a newer compiler will make things easier for everyone > and we'd see more binary packages for Windows again. It does seem like a conundrum. 
As I have no deep Windows experience to be able to have an appreciation of all of the technical issues involved, I ask out of ignorance: would it be possible and desirable to provide a transition period of n 2.7.x maintenance releases (where n is between 1 and, say, 3) where we would provide 2 sets of Windows installers, one set (32- and 64-bit) with the older compiler and CRT and one with the newer, and campaign to get users and packagers who provide binary extensions to migrate? Would that mitigate the pain, assuming that Steve (or someone else) would be willing to build the additional installers for the transition period? I've done something similar on a smaller scale with the OS X 32-bit installer for 2.7.x but that impact is much less as the audience for that installer is much smaller. -- Ned Deily, nad at acm.org From antoine at python.org Mon Jun 23 22:04:29 2014 From: antoine at python.org (Antoine Pitrou) Date: Mon, 23 Jun 2014 16:04:29 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A87FB3.2000100@egenix.com> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> Message-ID: Le 23/06/2014 15:27, M.-A. Lemburg a ?crit : > > Not sure what you mean. We've had binary wininst distributions > for Windows for more than a decade, and egg and msi distributions > for 8 years :-) > > But without access to the VS 2008 compiler that is needed to > compile those extensions, It does seem to be available: http://www.microsoft.com/en-us/download/details.aspx?id=13276 What am I missing? Regards Antoine. From rdmurray at bitdance.com Mon Jun 23 22:12:24 2014 From: rdmurray at bitdance.com (R. 
David Murray) Date: Mon, 23 Jun 2014 16:12:24 -0400 Subject: [Python-Dev] Tracker Stats In-Reply-To: <53A84D41.6070508@email.de> References: <53A84D41.6070508@email.de> Message-ID: <20140623201225.0DA80250DE6@webabinitio.net> On Mon, 23 Jun 2014 17:52:33 +0200, francis wrote: > > > Hi, > > I added a new "stats" page to the bug tracker: > > http://bugs.python.org/issue?@template=stats > Thanks Ezio, > > Two questions: > how hard would it be to add (or enhance) a chart with the > "open issues type enhancement" and "open issues type bug" > info ? > > In the summaries there is a link to "Issues with patch", > means that the ones not listed there are in "needs patch" > or "new" status? The stats graphs are based on the data generated for the weekly issue report. I have a patched version of that report that adds the bug/enhancement info. I'll try to dig it up this week; someone ping me if I forget :) I think the patch will need to be updated based on Ezio's changes. --David From donald at stufft.io Mon Jun 23 22:20:30 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 23 Jun 2014 16:20:30 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A87FB3.2000100@egenix.com> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> Message-ID: <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> On Jun 23, 2014, at 3:27 PM, M.-A. Lemburg wrote: > On 23.06.2014 18:09, Donald Stufft wrote: >> >> On Jun 23, 2014, at 2:09 AM, Martin v. Löwis wrote: >> >>>> >>>> * Should we make use of the potential breakage with 2.7.10 >>>> to introduce a new Windows compiler version for Python 2.7 ? >>> >>> Assuming it is a good idea to continue producing Windows binaries >>> for 2.7, I think it would be a bad idea to switch compilers. It will >>> cause severe breakage of 2.7 installations, much more problematic >>> than switching to two-digit version numbers.
>> >> I agree with this, we've just finally started getting things to the point where >> it makes a lot of sense for binary distributions for Windows. Breaking all >> of them on 2.7 would be very bad. Err, sorry that "We" was with my pip hat on. > > Not sure what you mean. We've had binary wininst distributions > for Windows for more than a decade, and egg and msi distributions > for 8 years :-) Nonetheless, changing the compiler will not only break pip, but every automated installer tool (easy_install, buildout) that I'm aware of. The blowback for binary installation is going to be huge I think. > > But without access to the VS 2008 compiler that is needed to > compile those extensions, it will become increasingly difficult > for package authors to provide such binary packages, so we have to > ask ourselves: > > What's worse: breaking old Windows binaries for Python 2.7 > or not having updated and new Windows binaries for Python 2.7 > at all in a few years ? > > Switching to a newer compiler will make things easier for everyone > and we'd see more binary packages for Windows again. At the risk of getting Guido to post his slide again, I still think the solution to the old compiler is to just roll a 2.8 with minimal changes. It could even be a good place to move the ssl backport changes to as well, since they were the riskier set of changes in PEP 466. But either way, if a compiler does change in a 2.7 release we'll need to update a lot of tooling to cope with that, so any plan to do that should include that and a timeline for adoption of that. > > Switching to a newer compiler will make things easier for everyone > and we'd see more binary packages for Windows again. > > Given that Python 2.7 support was extended for another 5 years at the > recent Python Language Summit to 2020, we have to face this > breakage sooner or later anyway. Extended support for VS 2008 > will end in 2018 (but then: Python developers usually don't have > extended support contracts with MS). Service pack support has already > ended in 2009.
> > Depending on how you see it, using such an old compiler also > poses security risks. The last security update for VS 2008 dates > back to 2011 (http://support.microsoft.com/kb/2538243). > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Jun 23 2014) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > 2014-06-17: Released eGenix PyRun 2.0.0 ... http://egenix.com/go58 > 2014-07-02: Python Meeting Duesseldorf ... 9 days to go > 2014-07-21: EuroPython 2014, Berlin, Germany ... 28 days to go > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From barry at python.org Mon Jun 23 22:31:03 2014 From: barry at python.org (Barry Warsaw) Date: Mon, 23 Jun 2014 16:31:03 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> Message-ID: <20140623163103.75073882@anarchist.wooz.org> On Jun 23, 2014, at 04:20 PM, Donald Stufft wrote: >At the risk of getting Guido to post his slide again, I still think the >solution to the old compiler is to just roll a 2.8 with minimal changes. No. 
It's not going to happen, for all the reasons discussed previously. Python 2.8 is not a solution to anything. If a new, incompatible compiler suite is required, why can't there just be multiple Windows downloads on https://www.python.org/download/releases/2.7.7/ ? Well, one reason is that you'd have to convince MvL or someone else to take over the work that would require, but that's gotta be *much* lighter weight than releasing a Python 2.8. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From ethan at stoneleaf.us Mon Jun 23 22:12:15 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 23 Jun 2014 13:12:15 -0700 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> Message-ID: <53A88A1F.308@stoneleaf.us> On 06/23/2014 01:04 PM, Antoine Pitrou wrote: > Le 23/06/2014 15:27, M.-A. Lemburg a écrit : >> >> Not sure what you mean. We've had binary wininst distributions >> for Windows for more than a decade, and egg and msi distributions >> for 8 years :-) >> >> But without access to the VS 2008 compiler that is needed to >> compile those extensions, > > It does seem to be available: > http://www.microsoft.com/en-us/download/details.aspx?id=13276 > > What am I missing? Is that VS 2008 /with/ the SP, or just the SP? -- ~Ethan~ From martin at v.loewis.de Mon Jun 23 22:40:30 2014 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 23 Jun 2014 22:40:30 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> Message-ID: <53A890BE.3000102@v.loewis.de> Am 23.06.14 22:04, schrieb Antoine Pitrou: > Le 23/06/2014 15:27, M.-A.
Lemburg a écrit : >> >> Not sure what you mean. We've had binary wininst distributions >> for Windows for more than a decade, and egg and msi distributions >> for 8 years :-) >> >> But without access to the VS 2008 compiler that is needed to >> compile those extensions, > > It does seem to be available: > http://www.microsoft.com/en-us/download/details.aspx?id=13276 > > What am I missing? I believe (without testing) that this is just the service pack. Installing it requires a pre-existing installation of Visual Studio 2008, or else the installer will refuse to do anything. Note that it also won't install on top of Visual Studio Express: you need a licensed copy of Visual Studio to install the service pack. Visual Studio 2008 still *is* available to MSDN users. It's just not available through regular sales channels anymore. Regards, Martin From donald at stufft.io Mon Jun 23 22:43:31 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 23 Jun 2014 16:43:31 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <20140623163103.75073882@anarchist.wooz.org> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <20140623163103.75073882@anarchist.wooz.org> Message-ID: On Jun 23, 2014, at 4:31 PM, Barry Warsaw wrote: > On Jun 23, 2014, at 04:20 PM, Donald Stufft wrote: > >> At the risk of getting Guido to post his slide again, I still think the >> solution to the old compiler is to just roll a 2.8 with minimal changes. > > No. It's not going to happen, for all the reasons discussed previously. > Python 2.8 is not a solution to anything. > > If a new, incompatible compiler suite is required, why can't there just be > multiple Windows downloads on https://www.python.org/download/releases/2.7.7/ > ?
Well, one reason is that you'd have to convince MvL or someone else to take > over the work that would require, but that's gotta be *much* lighter weight > than releasing a Python 2.8. As far as I am aware, a 2.7 with a different compiler, even if it's just an option, is an attractive nuisance. None of the tooling right now differentiates between binary compatibility by anything other than "CPython 2.7". The end result of having a 2.7 which is built with the old compiler, and a 2.7 built with the new compiler, is that you'll end up with binary distributions which work sometimes, if you're lucky and the creator of the binary distribution and you happened to pick the same "variant" of 2.7. The most likely result is that all the binary distributions will *mostly* still depend on using the old compiler because of the corpus of existing binary packages that depend on that. Which means that the 2.7 with new compiler will exist entirely to act as a footgun to anyone who picks it and also wants to use binary packages. > > -Barry > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From martin at v.loewis.de Mon Jun 23 22:42:40 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 23 Jun 2014 22:42:40 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <20140623163103.75073882@anarchist.wooz.org> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <20140623163103.75073882@anarchist.wooz.org> Message-ID: <53A89140.80609@v.loewis.de> Am 23.06.14 22:31, schrieb Barry Warsaw: > On Jun 23, 2014, at 04:20 PM, Donald Stufft wrote: > >> At the risk of getting Guido to post his slide again, I still think the >> solution to the old compiler is to just roll a 2.8 with minimal changes. > > No. It's not going to happen, for all the reasons discussed previously. > Python 2.8 is not a solution to anything. > > If a new, incompatible compiler suite is required, why can't there just be > multiple Windows downloads on https://www.python.org/download/releases/2.7.7/ > ? Well, on reason is that you'd have to convince MvL or someone else to take > over the work that would require, but that's gotta be *much* lighter weight > than releasing a Python 2.8. See my other message. It's actually heavier, since it requires changes to distutils, PyPI, pip, buildout etc., all which know how to deal with Python minor version numbers, but are unaware of the notion of competing ABIs on Windows (except that they know how to deal with 32-bit vs. 64-bit). 
Regards, Martin From martin at v.loewis.de Mon Jun 23 22:31:41 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 23 Jun 2014 22:31:41 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> Message-ID: <53A88EAD.6040600@v.loewis.de> Am 23.06.14 21:53, schrieb Ned Deily: > It does seem like a conundrum. As I have no deep Windows experience to > be able to have an appreciation of all of the technical issues involved, > I ask out of ignorance: would it be possible and desirable to provide a > transition period of n 2.7.x maintenance releases (where n is between 1 > and, say, 3) where we would provide 2 sets of Windows installers, one > set (32- and 64-bit) with the older compiler and CRT and one with the > newer, and campaign to get users and packagers who provide binary > extensions to migrate? The question is how exactly you implement the transition. I see two alternatives: 1. "Hijack" the 2.7 name space, in particular the name "python27.dll", along with registry keys, the .pyd extension, etc. Doing so would permit users to mix binaries from different compilers, and doing so would lead to crashes. Users would have to be careful to either install packages built for the old compiler or packages for the new compiler, and never mix. 2. "Sandbox" the 2.7 name space; come up with new names for everything. E.g. "python27vs13.dll", ".pydvs13" (or "_vs13.pyd"), "C:\Python27vs13", along with the corresponding changes to PyPI, pip, buildout, etc. which would need to learn to look for the right variant of a Python 2.7 package. This should work, but might take several years to implement: you need to find all the places in existing code that refer to the "old" names. 
If you do it right, you are done about the time when VS 13 becomes unavailable, so you'd then do another such transition to VS 2015, which promises forward-binary compatibility to future releases of Visual Studio. > Would that mitigate the pain, assuming that > Steve (or someone else) would be willing to build the additional > installers for the transition period? I've done something similar on a > smaller scale with the OS X 32-bit installer for 2.7.x but that impact > is much less as the audience for that installer is much smaller. Well, the question really is whether precompiled extension modules available from PyPI would work on both compilers. I understand that for OSX, you typically don't have precompiled binaries for the extension modules, so installation compiles the modules from scratch. This is easier, as it can use the ABI of the Python which will be installed to. If you go the "parallel ABIs" route, extension authors have to provide two parallel sets of packages as well. Given 32-bit and 64-bit packages, this will make actually two additional packages - just as if they had to support another Python version. Regards, Martin From Steve.Dower at microsoft.com Mon Jun 23 22:31:19 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 23 Jun 2014 20:31:19 +0000 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> Message-ID: <6622306b01cc4446a2df6ae90c3c087c@BLUPR03MB389.namprd03.prod.outlook.com> > Antoine Pitrou wrote: > Le 23/06/2014 15:27, M.-A. Lemburg a ?crit : >> >> Not sure what you mean. 
We've had binary wininst distributions for >> Windows for more than a decade, and egg and msi distributions for 8 >> years :-) >> >> But without access to the VS 2008 compiler that is needed to compile >> those extensions, > > It does seem to be available: > http://www.microsoft.com/en-us/download/details.aspx?id=13276 > > What am I missing? That's the service pack, which will only install if you already have VS 2008 installed. The only official source for VS 2008 these days is through an MSDN subscription, though there's a link floating around that will get to an ISO of VC 2008 Express (but it could disappear or move at any time, which would break the link for good). It's also possible to get VC9 standalone through the Windows SDK for Windows 7 and .NET 3.5, but this installer has bugs, and distutils does not detect VC installs properly (it only detects Visual Studio and then assumes VC). This is fixable with a few extra files and registry keys, but not simple. The best answer here is making VC9 available in a long-term, unsupported manner (support is the main MSFT concern - simply throwing products out there and forgetting about them is very counter-cultural). I'm working on getting people to recognize the importance of keeping the old compilers available, but it's an uphill battle. Obviously I'll post here as soon as I have something I can officially share. :) Cheers, Steve > Regards > > Antoine. From donald at stufft.io Mon Jun 23 22:55:55 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 23 Jun 2014 16:55:55 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A88EAD.6040600@v.loewis.de> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <53A88EAD.6040600@v.loewis.de> Message-ID: <14DE41E2-5314-4E49-BE93-85EEEDDDEEAD@stufft.io> On Jun 23, 2014, at 4:31 PM, Martin v. 
Löwis wrote: >> >> Would that mitigate the pain, assuming that >> Steve (or someone else) would be willing to build the additional >> installers for the transition period? I've done something similar on a >> smaller scale with the OS X 32-bit installer for 2.7.x but that impact >> is much less as the audience for that installer is much smaller. > > Well, the question really is whether precompiled extension modules > available from PyPI would work on both compilers. I understand that > for OSX, you typically don't have precompiled binaries for the extension > modules, so installation compiles the modules from scratch. This is > easier, as it can use the ABI of the Python which will be installed > to. > > If you go the "parallel ABIs" route, extension authors have to provide > two parallel sets of packages as well. Given 32-bit and 64-bit packages, > this will make actually two additional packages - just as if they had > to support another Python version. As far as I know, stuff on OSX is generally built for "X compiler or later" so binary compatibility is kept as long as you're using an "or later" but I could be wrong about that. Using binary packages on OSX is a much less frequent thing I think though since getting a working compiler toolchain is easier there. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From martin at v.loewis.de Mon Jun 23 22:49:32 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 23 Jun 2014 22:49:32 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <20140623163103.75073882@anarchist.wooz.org> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <20140623163103.75073882@anarchist.wooz.org> Message-ID: <53A892DC.7080106@v.loewis.de> Am 23.06.14 22:31, schrieb Barry Warsaw: > Well, on reason is that you'd have to convince MvL or someone else to take > over the work that would require, but that's gotta be *much* lighter weight > than releasing a Python 2.8. Just to point this out in a separate message: it will have to be somebody else. I stepped down as the Windows release maintainer for 2.7 when I learned about the extended life of 2.7, much because I feared that exactly the thing would happen that we see happen now - and I didn't want to be the one who would have to deal with it. It is a mess, and it will get bigger the more time passes. Playing-the-role-of-Cassandra-ly y'rs, Martin From mal at egenix.com Mon Jun 23 23:07:02 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 23 Jun 2014 23:07:02 +0200 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> Message-ID: <53A896F6.2030401@egenix.com> On 23.06.2014 22:20, Donald Stufft wrote: > > On Jun 23, 2014, at 3:27 PM, M.-A. Lemburg wrote: > >> On 23.06.2014 18:09, Donald Stufft wrote: >>> >>> On Jun 23, 2014, at 2:09 AM, Martin v. 
Löwis wrote: >>> >>>>> >>>>>> * Should we make use of the potential breakage with 2.7.10 >>>>>> to introduce a new Windows compiler version for Python 2.7 ? >>>>> >>>>> Assuming it is a good idea to continue producing Windows binaries >>>>> for 2.7, I think it would be a bad idea to switch compilers. It will >>>>> cause severe breakage of 2.7 installations, much more problematic >>>>> than switching to two-digit version numbers. >>>> >>>> I agree with this, we've just finally started getting things to the point where >>>> it makes a lot of sense for binary distributions for Windows. Breaking all >>>> of them on 2.7 would be very bad. >> >> Err, sorry that "We" was with my pip hat on. >> >>> >>> Not sure what you mean. We've had binary wininst distributions >>> for Windows for more than a decade, and egg and msi distributions >>> for 8 years :-) >> >> Nonetheless, changing the compiler will not only break pip, but every >> automated installer tool (easy_install, buildout) that I'm aware of. The >> blow back for binary installation is going to be huge I think. >> >>> But without access to the VS 2008 compiler that is needed to >>> compile those extensions, it will become increasingly difficult >>> for package authors to provide such binary packages, so we have to >>> ask ourselves: >>> >>> What's worse: breaking old Windows binaries for Python 2.7 >>> or not having updated and new Windows binaries for Python 2.7 >>> at all in a few years ? >> >> At the risk of getting Guido to post his slide again, I still think the >> solution to the old compiler is to just roll a 2.8 with minimal changes. >> It could even be a good place to move to the ssl backport changes >> too since they were the riskier set of changes in PEP466. >> >> But either way, if a compiler does change in a 2.7 release we'll need >> to update a lot of tooling to cope with that, so any plan to do that should >> include that and a timeline for adoption of that.
Sure, and we'd need to hash out possible solutions to minimize breakage, but first we'll have to see whether we want to consider this step or not. BTW: It's strange that I'm arguing for breaking things. I'm usually on the other side of such arguments :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From nad at acm.org Mon Jun 23 23:14:41 2014 From: nad at acm.org (Ned Deily) Date: Mon, 23 Jun 2014 14:14:41 -0700 Subject: [Python-Dev] Python 2.7 patch levels turning two digit References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <53A88EAD.6040600@v.loewis.de> <14DE41E2-5314-4E49-BE93-85EEEDDDEEAD@stufft.io> Message-ID: In article <14DE41E2-5314-4E49-BE93-85EEEDDDEEAD at stufft.io>, Donald Stufft wrote: > On Jun 23, 2014, at 4:31 PM, Martin v. Lowis wrote: > > >> > >> Would that mitigate the pain, assuming that > >> Steve (or someone else) would be willing to build the additional > >> installers for the transition period? I've done something similar on a > >> smaller scale with the OS X 32-bit installer for 2.7.x but that impact > >> is much less as the audience for that installer is much smaller. > > > > Well, the question really is whether precompiled extension modules > > available from PyPI would work on both compilers. 
I understand that > > for OSX, you typically don't have precompiled binaries for the extension > > modules, so installation compiles the modules from scratch. This is > > easier, as it can use the ABI of the Python which will be installed > > to. > > > > If you go the "parallel ABIs" route, extension authors have to provide > > two parallel sets of packages as well. Given 32-bit and 64-bit packages, > > this will make actually two additional packages - just as if they had > > to support another Python version. > > As far as I know, stuff on OSX is generally built for "X compiler or later" > so binary compatibility is kept as long as you're using an "or later" but > I could be wrong about that. Using binary packages on OSX is a much > less frequent thing I think though since getting a working compiler toolchain > is easier there. Both points are generally true on OS X so, yes, binary extensions are much less of an issue there. Where binary distributions on OS X are most used, I think, is when there are dependencies on third-party non-Python libraries that are not shipped by Apple with OS X. But, yes, if we were to go down the route of two sets of installers, that could mean two sets of third-party packages. I suppose there could potentially be some pip / wheel / possibly Distutils help by conditioning the platform name or other component used to generate the egg / wheel / and/or bdist file names on the CRT version (or compiler version), much as what we do today with OS X deployment target. Again, I'm speculating in ignorance here. If that were feasible, things built with the old toolchain could have unchanged names. And, clearly, we would want to keep that "n" number of releases with two sets of installers to be as small as possible, like 1. 
While there would be a certain amount of unavoidable disruption for some folks, others *might* welcome the opportunity to no longer have to keep around old versions of the tool chain, particularly if they now could use the same tool chain to produce binaries for both Py2 and Py3. -- Ned Deily, nad at acm.org From donald at stufft.io Mon Jun 23 23:15:05 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 23 Jun 2014 17:15:05 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A896F6.2030401@egenix.com> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <53A896F6.2030401@egenix.com> Message-ID: <685B2505-BABC-41DE-9A2E-BE64D2CE6AF8@stufft.io> On Jun 23, 2014, at 5:07 PM, M.-A. Lemburg wrote: > On 23.06.2014 22:20, Donald Stufft wrote: >> >> On Jun 23, 2014, at 3:27 PM, M.-A. Lemburg wrote: >> >>> On 23.06.2014 18:09, Donald Stufft wrote: >>>> >>>> On Jun 23, 2014, at 2:09 AM, Martin v. Löwis wrote: >>>> >>>>>> >>>>>> * Should we make use of the potential breakage with 2.7.10 >>>>>> to introduce a new Windows compiler version for Python 2.7 ? >>>>> >>>>> Assuming it is a good idea to continue producing Windows binaries >>>>> for 2.7, I think it would be a bad idea to switch compilers. It will >>>>> cause severe breakage of 2.7 installations, much more problematic >>>>> than switching to two-digit version numbers. >>>> >>>> I agree with this, we've just finally started getting things to the point where >>>> it makes a lot of sense for binary distributions for Windows. Breaking all >>>> of them on 2.7 would be very bad. >> >> Err, sorry that "We" was with my pip hat on. >> >>> >>> Not sure what you mean.
We've had binary wininst distributions >>> for Windows for more than a decade, and egg and msi distributions >>> for 8 years :-) >> >> Nonetheless, changing the compiler will not only break pip, but every >> automated installer tool (easy_install, buildout) that I'm aware of. The >> blow back for binary installation is going to be huge I think. >> >>> But without access to the VS 2008 compiler that is needed to >>> compile those extensions, it will become increasingly difficult >>> for package authors to provide such binary packages, so we have to >>> ask ourselves: >>> >>> What's worse: breaking old Windows binaries for Python 2.7 >>> or not having updated and new Windows binaries for Python 2.7 >>> at all in a few years ? >> >> At the risk of getting Guido to post his slide again, I still think the >> solution to the old compiler is to just roll a 2.8 with minimal changes. >> It could even be a good place to move to the ssl backport changes >> too since they were the riskier set of changes in PEP466. >> >> But either way, if a compiler does change in a 2.7 release we'll need >> to update a lot of tooling to cope with that, so any plan to do that should >> include that and a timeline for adoption of that. > > Sure, and we'd need to hash out possible solutions to minimize > breakage, but first we'll have to see whether we want to consider > this step or not. > > > BTW: It's strange that I'm arguing for breaking things. I'm usually > on the other side of such arguments :-) Ok. I'm just making sure that any proposal to do this includes how it plans to work around/minimize that. I agree with Martin (I think) that trying to fix the entire ecosystem to cope with that change is going to be far more work than folks realize and that it needs to be an explicit part of the discussion when deciding how to solve the problem.
Normally when I see someone suggest that switching compilers in 2.7.x is likely to be less work than releasing a 2.8, it normally appears to me they haven't looked at the impact on the packaging tooling. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source >>>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: > > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From ckaynor at zindagigames.com Mon Jun 23 23:14:44 2014 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Mon, 23 Jun 2014 14:14:44 -0700 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A896F6.2030401@egenix.com> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <53A896F6.2030401@egenix.com> Message-ID: Not being a Python developer, I normally just lurk on Py-Dev, but I figured I'd throw this out there for this thread: Recent versions of Maya embed Python 2.x, and the newer versions of Maya (I believe 2012 was the first version) embed a Python 2.7 compiled with VS 2010.
From my experience, most C extensions work across compiler versions, however when they don't, it's generally a fairly difficult to debug issue - at least unless you know what to look for in the call stacks, and have access to the symbol files. Chris On Mon, Jun 23, 2014 at 2:07 PM, M.-A. Lemburg wrote: > On 23.06.2014 22:20, Donald Stufft wrote: > > > > On Jun 23, 2014, at 3:27 PM, M.-A. Lemburg wrote: > > > >> On 23.06.2014 18:09, Donald Stufft wrote: > >>> > >>> On Jun 23, 2014, at 2:09 AM, Martin v. L?wis > wrote: > >>> > >>>>> > >>>>> * Should we make use of the potential breakage with 2.7.10 > >>>>> to introduce a new Windows compiler version for Python 2.7 ? > >>>> > >>>> Assuming it is a good idea to continue producing Windows binaries > >>>> for 2.7, I think it would be a bad idea to switch compilers. It will > >>>> cause severe breakage of 2.7 installations, much more problematic > >>>> than switching to two-digit version numbers. > >>> > >>> I agree with this, we?ve just finally started getting things to the > point where > >>> it makes a lot of sense for binary distributions for Windows. Breaking > all > >>> of them on 2.7 would be very bad. > > > > Err, sorry that ?We? was with my pip hat on. > > > >> > >> Not sure what you mean. We've had binary wininst distributions > >> for Windows for more than a decade, and egg and msi distributions > >> for 8 years :-) > > > > Nonetheless, changing the compiler will not only break pip, but every > > automated installer tool (easy_install, buildout) that i?m aware of. The > > blow back for binary installation is going to be huge I think. 
> > > >> But without access to the VS 2008 compiler that is needed to > >> compile those extensions, it will become increasingly difficult > >> for package authors to provide such binary packages, so we have to > >> ask ourselves: > >> > >> What's worse: breaking old Windows binaries for Python 2.7 > >> or not having updated and new Windows binaries for Python 2.7 > >> at all in a few years ? > > > > At the risk of getting Guido to post his slide again, I still think the > > solution to the old compiler is to just roll a 2.8 with minimal changes. > > It could even be a good place to move to the ssl backport changes > > too since they were the riskier set of changes in PEP466. > > > > But either way, if a compiler does change in a 2.7 release we?ll need > > to update a lot of tooling to cope with that, so any plan to do that > should > > include that and a timeline for adoption of that. > > Sure, and we'd need to hash out possible solutions to minimize > breakage, but first we'll have to see whether we want to consider > this step or not. > > > BTW: It's strange that I'm arguing for breaking things. I'm usually > on the other side of such arguments :-) > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source > >>> Python/Zope Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: > > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/ckaynor%40zindagigames.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Mon Jun 23 23:22:27 2014 From: barry at python.org (Barry Warsaw) Date: Mon, 23 Jun 2014 17:22:27 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <685B2505-BABC-41DE-9A2E-BE64D2CE6AF8@stufft.io> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <53A896F6.2030401@egenix.com> <685B2505-BABC-41DE-9A2E-BE64D2CE6AF8@stufft.io> Message-ID: <20140623172227.3c58884a@anarchist.wooz.org> On Jun 23, 2014, at 05:15 PM, Donald Stufft wrote: >Normally when I see someone suggest that switching compilers >in 2.7.x is likely to be less work than releasing a 2.8, it normally >appears to me they haven't looked at the impact on the packaging >tooling. Just to be clear, releasing a Python 2.8 has enormous impact outside of just the amount of work to do it. It's an exceedingly bad idea. -Barry -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From donald at stufft.io Mon Jun 23 23:28:23 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 23 Jun 2014 17:28:23 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <20140623172227.3c58884a@anarchist.wooz.org> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <53A896F6.2030401@egenix.com> <685B2505-BABC-41DE-9A2E-BE64D2CE6AF8@stufft.io> <20140623172227.3c58884a@anarchist.wooz.org> Message-ID: <09813B4B-373A-4058-B5FE-887939E07B55@stufft.io> On Jun 23, 2014, at 5:22 PM, Barry Warsaw wrote: > On Jun 23, 2014, at 05:15 PM, Donald Stufft wrote: > >> Normally when I see someone suggest that switching compilers >> in 2.7.x is likely to be less work than releasing a 2.8, it normally >> appears to me they haven't looked at the impact on the packaging >> tooling. > > Just to be clear, releasing a Python 2.8 has enormous impact outside of just > the amount of work to do it. It's an exceedingly bad idea. Can you clarify? Also FWIW I'm not really married to the 2.8 thing; it's mostly that, on Windows, the X.Y release prior to the ABI thing in 3.x _was_ the ABI, so all the tooling builds on that. So you need to either 1) Stick with the old compiler 2) Release 2.8 3) Do all the work to fix all the tooling to cope with the fact that X.Y isn't the ABI on 2.x anymore I don't think a reasonable option is: 4) Just switch compilers and leave it on someone else's doorstep to fix the entire packaging tool chain to cope. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From ethan at stoneleaf.us Mon Jun 23 23:19:13 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 23 Jun 2014 14:19:13 -0700 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A5FDB2.1080000@stoneleaf.us> References: <53A55E05.5020906@egenix.com> <53A5B995.6040802@egenix.com> <53A5FB11.5020302@egenix.com> <53A5FDB2.1080000@stoneleaf.us> Message-ID: <53A899D1.6050306@stoneleaf.us> On 06/21/2014 02:48 PM, Ethan Furman wrote: > On 06/21/2014 02:37 PM, M.-A. Lemburg wrote: >> >> My answers to these are: 1. We should use dynamic linking >> instead and not let OpenSSL bugs trigger Python releases; 2. >> It's not a big problem; 3. Yes, please, since it is difficult >> for people to develop and debug their extensions with a >> 2008 compiler, when the rest of the world has long moved on. > > +1 (assuming not incredibly difficult and those that can are willing ;) Revising this to: +1, -0, -1 It seems to me the intention of supporting 2.7 for so long was not to give ourselves additional nightmares, but to provide a basic level of support for those who need a longer time before migrating. One of the reasons to migrate is to avoid future pain (pain is an excellent motivator -- it's why we don't go to the doctor when we're healthy, right? ;). If getting new or updated modules becomes more painful, then that's motivation to upgrade -- not motivation for us to make both our lives (with the extra work) and everyone else's lives (why isn't this module working? oh, wrong compiler) more difficult.
-- ~Ethan~ From barry at python.org Mon Jun 23 23:47:27 2014 From: barry at python.org (Barry Warsaw) Date: Mon, 23 Jun 2014 17:47:27 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <09813B4B-373A-4058-B5FE-887939E07B55@stufft.io> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <53A896F6.2030401@egenix.com> <685B2505-BABC-41DE-9A2E-BE64D2CE6AF8@stufft.io> <20140623172227.3c58884a@anarchist.wooz.org> <09813B4B-373A-4058-B5FE-887939E07B55@stufft.io> Message-ID: <20140623174727.5ab8feb4@anarchist.wooz.org> On Jun 23, 2014, at 05:28 PM, Donald Stufft wrote: >Can you clarify? What support guarantees will we make about Python 2.8? Will it be supported as long as Python 2.7? Longer? Will we now have two long-term support versions, or change *years* of expectations that users should transition off of Python 2.7 onto Python 2.8? Will all the LTS policies for 2.7 (e.g. PEP 466) be retired for 2.7 and/or adopted completely into 2.8? What should Linux distros do? Should they support both 2.7 and 2.8, or begin the long and potentially arduous process of certifying and transitioning to 2.8? What about other operating systems and package managers, including commercial redistributors? Who is going to do the work to make sure patches are forward-ported from 2.7 to 2.8? Who is going to be the 2.8 release manager? Will they be strong enough to reject any and all new features that wouldn't have already made it into 2.7 (due to the already approved, narrow exemptions)? Or will we open the floodgates to Just One More Little New Feature To Make It Easier To Port to Python 3? How will we manage the PR surrounding our backtracking on Python 2.8? How will we manage expectations that it's only released to support a new Windows compiler? Should non-Windows users just ignore it (much like the Python 1.6 releases were mostly ignored)?
How do you know which tools, workflows, and processes will break with a Python 2.8 release? What assumptions about 2.7 being EOL for Python 2 are baked into the ecosystems outside of core Python? I could probably go on, but I'm exhausted. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From rosuav at gmail.com Mon Jun 23 23:48:06 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 24 Jun 2014 07:48:06 +1000 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <53A89140.80609@v.loewis.de> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <20140623163103.75073882@anarchist.wooz.org> <53A89140.80609@v.loewis.de> Message-ID: On Tue, Jun 24, 2014 at 6:42 AM, "Martin v. Löwis" wrote: > See my other message. It's actually heavier, since it requires changes > to distutils, PyPI, pip, buildout etc., all of which know how to deal with > Python minor version numbers, but are unaware of the notion of competing > ABIs on Windows (except that they know how to deal with 32-bit vs. 64-bit). Is it possible to hijack the "deal with 32-bit vs 64-bit"-ness of things to handle the different compilers? So, for instance, there might be a "32-bit-NewCompiler" and a "64-bit-NewCompiler", two new architectures, just as if someone came out with a 128-bit Windows and built Python 2.7 for it. Would packaging be able to handle that more easily than a compiler change within the same architecture?
ChrisA From donald at stufft.io Tue Jun 24 00:04:26 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 23 Jun 2014 18:04:26 -0400 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <20140623163103.75073882@anarchist.wooz.org> <53A89140.80609@v.loewis.de> Message-ID: <7D629B76-5800-40D6-A071-D26CB0F794E7@stufft.io> On Jun 23, 2014, at 5:48 PM, Chris Angelico wrote: > On Tue, Jun 24, 2014 at 6:42 AM, "Martin v. Löwis" wrote: >> See my other message. It's actually heavier, since it requires changes >> to distutils, PyPI, pip, buildout etc., all of which know how to deal with >> Python minor version numbers, but are unaware of the notion of competing >> ABIs on Windows (except that they know how to deal with 32-bit vs. 64-bit). > > Is it possible to hijack the "deal with 32-bit vs 64-bit"-ness of > things to handle the different compilers? So, for instance, there > might be a "32-bit-NewCompiler" and a "64-bit-NewCompiler", two new > architectures, just as if someone came out with a 128-bit Windows and > built Python 2.7 for it. Would packaging be able to handle that more > easily than a compiler change within the same architecture? > > ChrisA > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io I'm not sure about this, FWIW. I'd have to look at the implementations of stuff to see how they'd cope with a new thing like that. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
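[Editor's illustration: Chris's suggestion amounts to encoding the compiler in the platform portion of the compatibility tags that wheel-based tooling already compares (PEP 425). A minimal sketch of that matching, where the compiler-qualified platform names like `win32_vc140` are hypothetical, invented for illustration only - real tools define no such tags:]

```python
# Sketch of PEP 425-style tag matching. The compiler-qualified platform
# tags (e.g. "win32_vc140") are hypothetical, standing in for Chris's
# "new architecture" idea; real tooling only knows plain "win32" etc.

def parse_wheel_tags(filename):
    """Split a simple wheel filename into (name, version, python, abi, platform)."""
    stem = filename[:-len(".whl")]
    name, version, python, abi, platform = stem.split("-")
    return name, version, python, abi, platform

def is_compatible(filename, supported):
    """A wheel is installable if its (python, abi, platform) triple is supported."""
    _, _, python, abi, platform = parse_wheel_tags(filename)
    return (python, abi, platform) in supported

# An installer built for a hypothetical VC14-compiled 2.7 would advertise a
# different platform string, so wheels built with the old compiler no longer match:
supported = {("cp27", "none", "win32_vc140")}
print(is_compatible("demo-1.0-cp27-none-win32_vc140.whl", supported))  # True
print(is_compatible("demo-1.0-cp27-none-win32.whl", supported))        # False
```

The point of the sketch is that the *mechanism* is just string comparison; the hard part Donald alludes to is teaching every producer and consumer of those strings about the new values.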
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From amk at amk.ca Tue Jun 24 00:25:14 2014 From: amk at amk.ca (A.M. Kuchling) Date: Mon, 23 Jun 2014 18:25:14 -0400 Subject: [Python-Dev] Tracker Stats In-Reply-To: <20140623201225.0DA80250DE6@webabinitio.net> References: <53A84D41.6070508@email.de> <20140623201225.0DA80250DE6@webabinitio.net> Message-ID: <20140623222514.GA74324@datlandrewk.home> On Mon, Jun 23, 2014 at 04:12:24PM -0400, R. David Murray wrote: > The stats graphs are based on the data generated for the > weekly issue report. I have a patched version of that > report that adds the bug/enhancement info. After PyCon, I started working on a scraper that would produce a bunch of different lists and charts. My ideas were: * pie charts of issues by status and type. * list or histogram of open library issues by module, perhaps limited to the top N modules * list of N oldest issues with no subsequent activity (the unreviewed ones) * list of N people with the most open issues assigned to them The idea is to provide charts that help us direct effort to particular subsets of bugs. 
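[Editor's illustration: the last two lists in AMK's sketch are simple sorts once the issue data is in hand. A minimal version over made-up issue records - the real data would come from the roundup tracker, not from literals like these:]

```python
from datetime import date

# Made-up issue records standing in for real tracker data.
issues = [
    {"id": 101, "assignee": "alice", "last_activity": date(2010, 3, 1), "messages": 1},
    {"id": 102, "assignee": "bob",   "last_activity": date(2013, 7, 9), "messages": 4},
    {"id": 103, "assignee": "alice", "last_activity": date(2009, 1, 5), "messages": 1},
    {"id": 104, "assignee": None,    "last_activity": date(2014, 6, 1), "messages": 2},
]

def oldest_unreviewed(issues, n):
    """Issues with no follow-up (a single message), oldest activity first."""
    untouched = [i for i in issues if i["messages"] == 1]
    return sorted(untouched, key=lambda i: i["last_activity"])[:n]

def top_assignees(issues, n):
    """People with the most open issues assigned to them."""
    counts = {}
    for i in issues:
        if i["assignee"]:
            counts[i["assignee"]] = counts.get(i["assignee"], 0) + 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

print([i["id"] for i in oldest_unreviewed(issues, 2)])  # [103, 101]
print(top_assignees(issues, 1))                         # [('alice', 2)]
```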
--amk From ncoghlan at gmail.com Tue Jun 24 01:42:26 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 24 Jun 2014 09:42:26 +1000 Subject: [Python-Dev] Python 2.7 patch levels turning two digit In-Reply-To: <09813B4B-373A-4058-B5FE-887939E07B55@stufft.io> References: <53A55E05.5020906@egenix.com> <53A7C49C.1090107@v.loewis.de> <53A87FB3.2000100@egenix.com> <91E82F8F-339A-403C-8EEA-997E27FBEE59@stufft.io> <53A896F6.2030401@egenix.com> <685B2505-BABC-41DE-9A2E-BE64D2CE6AF8@stufft.io> <20140623172227.3c58884a@anarchist.wooz.org> <09813B4B-373A-4058-B5FE-887939E07B55@stufft.io> Message-ID: On 24 Jun 2014 07:29, "Donald Stufft" wrote: > > > On Jun 23, 2014, at 5:22 PM, Barry Warsaw wrote: > > > On Jun 23, 2014, at 05:15 PM, Donald Stufft wrote: > > > >> Normally when I see someone suggest that switching compilers > >> in 2.7.x is likely to be less work than releasing a 2.8, it normally > >> appears to me they haven't looked at the impact on the packaging > >> tooling. > > > > Just to be clear, releasing a Python 2.8 has enormous impact outside of just > > the amount of work to do it. It's an exceedingly bad idea. > > Can you clarify? > > Also FWIW I'm not really married to the 2.8 thing; it's mostly that, on Windows, the X.Y release > prior to the ABI thing in 3.x _was_ the ABI, so all the tooling builds on that. So you need to > either > > 1) Stick with the old compiler This is what we're going with. Steve is working on making that more manageable from the Visual Studio side, and there are some folks in the numeric/scientific community looking at improving the usability of the MinGW toolchain for the purpose of building Python 2.7 C extensions. > 2) Release 2.8 Impractical for the various reasons Barry listed. > 3) Do all the work to fix all the tooling to cope with the fact that X.Y isn't the ABI on 2.x anymore Impractical for the various reasons you listed.
> I don't think a reasonable option is: > > 4) Just switch compilers and leave it on someone else's doorstep to fix the entire packaging > tool chain to cope. Agreed. We discussed this option in detail when the Stackless folks asked about it a while ago, and the conclusion was that the risk of obscure breakage was just too high. Cheers, Nick. > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ezio.melotti at gmail.com Tue Jun 24 03:50:53 2014 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Tue, 24 Jun 2014 04:50:53 +0300 Subject: [Python-Dev] Tracker Stats In-Reply-To: <53A84D41.6070508@email.de> References: <53A84D41.6070508@email.de> Message-ID: On Mon, Jun 23, 2014 at 6:52 PM, francis wrote: > >> Hi, >> I added a new "stats" page to the bug tracker: >> http://bugs.python.org/issue?@template=stats > > Thanks Ezio, > > Two questions: > how hard would it be to add (or enhance) a chart with the > "open issues type enhancement" and "open issues type bug" > info ? > Not particularly hard, but I won't have time to get back to this project for a while (contributions are welcome though!). > In the summaries there is a link to "Issues with patch"; > does that mean that the ones not listed there are in "needs patch" > or "new" status? That summary lists all the issues with the "patch" keyword, and the ones not listed simply don't have it. The keyword is added automatically whenever an attachment is added to the issue, so there might be false positives (e.g. if the attachment is a script to reproduce the issue rather than a patch, or if the available patches are outdated).
There might also be issues with patches that are not included in the summary (e.g. if someone accidentally removed the keyword), but that shouldn't be very common. From the first graph you can see that out of the 4500+ open issues, about 2000 have a patch. We need more reviewers and committers :) Best Regards, Ezio Melotti > > Regards, > francis > From ezio.melotti at gmail.com Tue Jun 24 04:33:52 2014 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Tue, 24 Jun 2014 05:33:52 +0300 Subject: [Python-Dev] Tracker Stats In-Reply-To: <20140623222514.GA74324@datlandrewk.home> References: <53A84D41.6070508@email.de> <20140623201225.0DA80250DE6@webabinitio.net> <20140623222514.GA74324@datlandrewk.home> Message-ID: On Tue, Jun 24, 2014 at 1:25 AM, A.M. Kuchling wrote: > On Mon, Jun 23, 2014 at 04:12:24PM -0400, R. David Murray wrote: >> The stats graphs are based on the data generated for the >> weekly issue report. I have a patched version of that >> report that adds the bug/enhancement info. > > After PyCon, I started working on a scraper that would produce a bunch > of different lists and charts. My ideas were: > > * pie charts of issues by status and type. > > * list or histogram of open library issues by module, perhaps limited to the > top N modules > We don't have module-specific tags yet (see the core-workflow ML for discussions about that), but I have other scripts that analyze all the patches and divide them by module. I didn't have time to integrate this in the tracker though. > * list of N oldest issues with no subsequent activity (the unreviewed ones) > You can search for issues with only one message: http://bugs.python.org/issue?%40sort0=activity&%40sort1=&%40group0=&%40group1=&%40columns=title%2Cid%2Cactivity%2Cstatus&%40filter=status%2Cmessage_count&status=1&message_count=1&%40pagesize=50&%40startwith=0 > * list of N people with the most open issues assigned to them > And then poke them with a goad until they fix them?
:) > The idea is to provide charts that help us direct effort to particular > subsets of bugs. > If someone wants to experiment with and/or improve the tracker stats, this is how it works: 1) The roundup-summary script [0] analyzes the issues once a week and produces the weekly report and a static JSON file [1]; 2) The stats page [2] requests the JSON file and uses the data to generate the charts client-side. There are two ways to improve it: 1) the easy way is just to use the roundup-summary script to expose more of its data or to find new data and add it to the JSON file (and possibly to the summary too); 2) the hard way is to decouple the roundup-summary script and the stats page, and either make another weekly (or daily/hourly) script to generate the JSON file, or a template page that generates the data in real time. Once the data are in the JSON file, it is quite easy to use jqPlot [3] to make any kind of chart. Keep in mind that some things are trivial to get out of the DB (e.g. the number of issues for each status/type), but other things are a bit more complicated (e.g. things involving specific periods of time), and currently roundup-summary takes a few minutes to analyze all the issues. I also tried to include just a few useful charts on the stats page -- at first I had several more charts but then I removed them. Feel free to ping me on IRC (#python-dev at Freenode) if you have questions.
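[Editor's illustration: since the stats page is just jqPlot drawing a static JSON file, adding a chart is mostly a matter of reshaping that data into the [label, value] pairs a pie renderer consumes. The field names below are invented for illustration; the actual schema of issue.stats.json may differ:]

```python
import json

# Hypothetical payload shaped like a weekly stats dump; the real
# issue.stats.json schema is not shown in the thread, so these keys are assumed.
payload = json.loads('{"open_by_type": {"bug": 2600, "enhancement": 1500, "other": 450}}')

def chart_series(stats):
    """Turn a {label: count} mapping into [(label, value), ...] pairs,
    largest first, as a jqPlot-style pie series."""
    return sorted(stats.items(), key=lambda kv: kv[1], reverse=True)

series = chart_series(payload["open_by_type"])
total = sum(v for _, v in series)
print(series)  # [('bug', 2600), ('enhancement', 1500), ('other', 450)]
print(total)   # 4550
```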
Best Regards, Ezio Melotti [0]: http://hg.python.org/tracker/python-dev/file/default/scripts/roundup-summary [1]: http://bugs.python.org/@@file/issue.stats.json [2]: http://hg.python.org/tracker/python-dev/file/bbbe6c190a99/html/issue.stats.html#l20 [3]: http://www.jqplot.com/tests/ > --amk From storchaka at gmail.com Tue Jun 24 10:22:11 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 24 Jun 2014 11:22:11 +0300 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 Message-ID: I submitted a number of patches which fixes currently broken Unicode-disabled build of Python 2.7 (built with --disable-unicode configure option). I suppose this was broken in 2.7 when C implementation of the io module was introduced. http://bugs.python.org/issue21833 -- main patch which fixes the io module and adds helpers for testing. http://bugs.python.org/issue21834 -- a lot of minor fixes for tests. Following issues fix different modules and related tests: http://bugs.python.org/issue21854 -- cookielib http://bugs.python.org/issue21838 -- ctypes http://bugs.python.org/issue21855 -- decimal http://bugs.python.org/issue21839 -- distutils http://bugs.python.org/issue21843 -- doctest http://bugs.python.org/issue21851 -- gettext http://bugs.python.org/issue21844 -- HTMLParser http://bugs.python.org/issue21850 -- httplib and SimpleHTTPServer http://bugs.python.org/issue21842 -- IDLE http://bugs.python.org/issue21853 -- inspect http://bugs.python.org/issue21848 -- logging http://bugs.python.org/issue21849 -- multiprocessing http://bugs.python.org/issue21852 -- optparse http://bugs.python.org/issue21840 -- os.path http://bugs.python.org/issue21845 -- plistlib http://bugs.python.org/issue21836 -- sqlite3 http://bugs.python.org/issue21837 -- tarfile http://bugs.python.org/issue21835 -- Tkinter http://bugs.python.org/issue21847 -- xmlrpc http://bugs.python.org/issue21841 -- xml.sax http://bugs.python.org/issue21846 -- zipfile Most fixes are trivial and are only several lines 
of a code. From victor.stinner at gmail.com Tue Jun 24 10:55:21 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 24 Jun 2014 10:55:21 +0200 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: Message-ID: Hi, I don't know anyone building Python without Unicode. I would prefer to modify configure to raise an error, and drop #ifdef in the code. (Stop supporting building Python 2 without Unicode.) Building Python 2 without Unicode support is not an innocent change. Python is moving strongly to Unicode: Python 3 uses Unicode by default. So to me it sounds really weird to work on building Python 2 without Unicode support. It means that you may have "Python 2" and "Python 2 without Unicode" which are not exactly the same language. IMO u"unicode" is part of the Python 2 language. --disable-unicode is an old option added while Python 1.5 was very slowly moving to Unicode. I have the same opinion on --without-thread option (we should stop supporting it, this option is useless). I worked in the embedded world, Python used for the UI of a TV set top box. Even if the hardware was slow and old, Python was compiled with threads and Unicode. Unicode was mandatory to handle correctly letters with diacritics, threads were used to handle network and D-Bus for examples. Victor 2014-06-24 10:22 GMT+02:00 Serhiy Storchaka : > I submitted a number of patches which fixes currently broken > Unicode-disabled build of Python 2.7 (built with --disable-unicode configure > option). I suppose this was broken in 2.7 when C implementation of the io > module was introduced. > > http://bugs.python.org/issue21833 -- main patch which fixes the io module > and adds helpers for testing. > > http://bugs.python.org/issue21834 -- a lot of minor fixes for tests. 
> > Following issues fix different modules and related tests: > > http://bugs.python.org/issue21854 -- cookielib > http://bugs.python.org/issue21838 -- ctypes > http://bugs.python.org/issue21855 -- decimal > http://bugs.python.org/issue21839 -- distutils > http://bugs.python.org/issue21843 -- doctest > http://bugs.python.org/issue21851 -- gettext > http://bugs.python.org/issue21844 -- HTMLParser > http://bugs.python.org/issue21850 -- httplib and SimpleHTTPServer > http://bugs.python.org/issue21842 -- IDLE > http://bugs.python.org/issue21853 -- inspect > http://bugs.python.org/issue21848 -- logging > http://bugs.python.org/issue21849 -- multiprocessing > http://bugs.python.org/issue21852 -- optparse > http://bugs.python.org/issue21840 -- os.path > http://bugs.python.org/issue21845 -- plistlib > http://bugs.python.org/issue21836 -- sqlite3 > http://bugs.python.org/issue21837 -- tarfile > http://bugs.python.org/issue21835 -- Tkinter > http://bugs.python.org/issue21847 -- xmlrpc > http://bugs.python.org/issue21841 -- xml.sax > http://bugs.python.org/issue21846 -- zipfile > > Most fixes are trivial and are only several lines of a code. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com From skip at pobox.com Tue Jun 24 13:04:41 2014 From: skip at pobox.com (Skip Montanaro) Date: Tue, 24 Jun 2014 06:04:41 -0500 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: Message-ID: I can't see any reason to make a backwards-incompatible change to Python 2 to only support Unicode. You're bound to break somebody's setup. Wouldn't it be better to fix bugs as Serhiy has done? 
Skip From antoine at python.org Tue Jun 24 13:47:37 2014 From: antoine at python.org (Antoine Pitrou) Date: Tue, 24 Jun 2014 07:47:37 -0400 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: Message-ID: On 24/06/2014 07:04, Skip Montanaro wrote: > I can't see any reason to make a backwards-incompatible change to > Python 2 to only support Unicode. You're bound to break somebody's > setup. Apparently, that setup would already have been broken for years. Regards Antoine. From victor.stinner at gmail.com Tue Jun 24 13:50:25 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 24 Jun 2014 13:50:25 +0200 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: Message-ID: 2014-06-24 13:04 GMT+02:00 Skip Montanaro : > I can't see any reason to make a backwards-incompatible change to > Python 2 to only support Unicode. You're bound to break somebody's > setup. Wouldn't it be better to fix bugs as Serhiy has done? According to the long list of issues, I don't think that it's possible to compile and use the Python stdlib when Python is compiled without Unicode support. So I'm not sure that we can say that it's a backward-incompatible change. Who is somebody? Who compiles Python without Unicode support? Which version of Python? With Python 2.6, ./configure --disable-unicode fails with: "checking what type to use for unicode... configure: error: invalid value for --enable-unicode. Use either ucs2 or ucs4 (lowercase)." So I'm not sure that anyone has used this option recently. The configure script was fixed 2 years ago in Python 2.7 (2 years after the release of Python 2.7.0): http://hg.python.org/cpython/rev/d7aff4423172 http://bugs.python.org/issue21833 "./configure --disable-unicode" works on Python 2.5.6: the unicode type doesn't exist, and u'abc' is a bytes string. It works with Python 2.7.7+ too.
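[Editor's illustration: code that must run both on a --disable-unicode build and on a normal one cannot reference the `unicode` builtin unconditionally. The usual guard looks like the following - an illustrative pattern, not taken from the patches under discussion; it happens to run on Python 3 as well, where the `unicode` builtin is likewise absent:]

```python
# On a --disable-unicode Python 2 (and on Python 3) the "unicode" builtin
# does not exist, so portable code guards on its availability.
try:
    unicode
    HAVE_UNICODE = True
except NameError:
    unicode = str          # fall back to the plain string type
    HAVE_UNICODE = False

def to_text(value):
    """Coerce a value to the best available text type on this build."""
    if isinstance(value, unicode):
        return value
    return unicode(value)

print(HAVE_UNICODE, to_text(42))
```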
Victor From storchaka at gmail.com Tue Jun 24 14:10:07 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 24 Jun 2014 15:10:07 +0300 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: Message-ID: On 24.06.14 14:50, Victor Stinner wrote: > 2014-06-24 13:04 GMT+02:00 Skip Montanaro : >> I can't see any reason to make a backwards-incompatible change to >> Python 2 to only support Unicode. You're bound to break somebody's >> setup. Wouldn't it be better to fix bugs as Serhiy has done? > > According to the long list of issues, I don't think that it's possible > to compile and use the Python stdlib when Python is compiled without > Unicode support. So I'm not sure that we can say that it's a > backward-incompatible change. Python has about 300 modules; my patches fix about 30 of them (only 8 of which cause a compile error). And that's almost all of them. Left are only pickle, json, etree, email, and the unicode-specific modules (codecs, unicodedata and encodings). Besides pickle, I'm not sure that the others can be fixed. The fact that only a small fraction of modules needs fixes means that Python without Unicode support can be pretty usable. The main problem was with testing itself. The test suite depends on tempfile, which now uses io.open, which didn't work without Unicode support (at least since 2.7). From tjreedy at udel.edu Tue Jun 24 16:24:01 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 24 Jun 2014 10:24:01 -0400 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: Message-ID: On 6/24/2014 4:22 AM, Serhiy Storchaka wrote: > I submitted a number of patches which fix the currently broken > Unicode-disabled build of Python 2.7 (built with the --disable-unicode > configure option). I suppose this was broken in 2.7 when the C > implementation of the io module was introduced. > > http://bugs.python.org/issue21833 -- main patch which fixes the io > module and adds helpers for testing.
> > http://bugs.python.org/issue21834 -- a lot of minor fixes for tests. > > Following issues fix different modules and related tests: This list and more to follow suggests that --disable-unicode was somewhat broken long before 2.7 and the introduction of _io. > http://bugs.python.org/issue21854 -- cookielib > http://bugs.python.org/issue21838 -- ctypes > http://bugs.python.org/issue21855 -- decimal > http://bugs.python.org/issue21839 -- distutils > http://bugs.python.org/issue21843 -- doctest > http://bugs.python.org/issue21851 -- gettext > http://bugs.python.org/issue21844 -- HTMLParser > http://bugs.python.org/issue21850 -- httplib and SimpleHTTPServer > http://bugs.python.org/issue21842 -- IDLE > http://bugs.python.org/issue21853 -- inspect > http://bugs.python.org/issue21848 -- logging > http://bugs.python.org/issue21849 -- multiprocessing > http://bugs.python.org/issue21852 -- optparse > http://bugs.python.org/issue21840 -- os.path > http://bugs.python.org/issue21845 -- plistlib > http://bugs.python.org/issue21836 -- sqlite3 > http://bugs.python.org/issue21837 -- tarfile > http://bugs.python.org/issue21835 -- Tkinter > http://bugs.python.org/issue21847 -- xmlrpc > http://bugs.python.org/issue21841 -- xml.sax > http://bugs.python.org/issue21846 -- zipfile > > Most fixes are trivial and are only several lines of a code. > -- Terry Jan Reedy From benjamin at python.org Tue Jun 24 18:06:10 2014 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 24 Jun 2014 09:06:10 -0700 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: Message-ID: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> If Serhiy wants to spend his time supporting this arcane feature, he can do that. It doesn't really seem worth risking regressions to do this, though. On Tue, Jun 24, 2014, at 01:55, Victor Stinner wrote: > Hi, > > I don't know anyone building Python without Unicode. 
I would prefer to > modify configure to raise an error, and drop #ifdef in the code. (Stop > supporting building Python 2 without Unicode.) > > Building Python 2 without Unicode support is not an innocent change. > Python is moving strongly to Unicode: Python 3 uses Unicode by > default. So to me it sounds really weird to work on building Python 2 > without Unicode support. It means that you may have "Python 2" and > "Python 2 without Unicode" which are not exactly the same language. > IMO u"unicode" is part of the Python 2 language. > > --disable-unicode is an old option added while Python 1.5 was very > slowly moving to Unicode. > > I have the same opinion on --without-thread option (we should stop > supporting it, this option is useless). I worked in the embedded > world, Python used for the UI of a TV set top box. Even if the > hardware was slow and old, Python was compiled with threads and > Unicode. Unicode was mandatory to handle correctly letters with > diacritics, threads were used to handle network and D-Bus for > examples. > > Victor > > > 2014-06-24 10:22 GMT+02:00 Serhiy Storchaka : > > I submitted a number of patches which fixes currently broken > > Unicode-disabled build of Python 2.7 (built with --disable-unicode configure > > option). I suppose this was broken in 2.7 when C implementation of the io > > module was introduced. > > > > http://bugs.python.org/issue21833 -- main patch which fixes the io module > > and adds helpers for testing. > > > > http://bugs.python.org/issue21834 -- a lot of minor fixes for tests. 
> > > > Following issues fix different modules and related tests: > > > > http://bugs.python.org/issue21854 -- cookielib > > http://bugs.python.org/issue21838 -- ctypes > > http://bugs.python.org/issue21855 -- decimal > > http://bugs.python.org/issue21839 -- distutils > > http://bugs.python.org/issue21843 -- doctest > > http://bugs.python.org/issue21851 -- gettext > > http://bugs.python.org/issue21844 -- HTMLParser > > http://bugs.python.org/issue21850 -- httplib and SimpleHTTPServer > > http://bugs.python.org/issue21842 -- IDLE > > http://bugs.python.org/issue21853 -- inspect > > http://bugs.python.org/issue21848 -- logging > > http://bugs.python.org/issue21849 -- multiprocessing > > http://bugs.python.org/issue21852 -- optparse > > http://bugs.python.org/issue21840 -- os.path > > http://bugs.python.org/issue21845 -- plistlib > > http://bugs.python.org/issue21836 -- sqlite3 > > http://bugs.python.org/issue21837 -- tarfile > > http://bugs.python.org/issue21835 -- Tkinter > > http://bugs.python.org/issue21847 -- xmlrpc > > http://bugs.python.org/issue21841 -- xml.sax > > http://bugs.python.org/issue21846 -- zipfile > > > > Most fixes are trivial and are only several lines of a code. 
> > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/benjamin%40python.org From francismb at email.de Tue Jun 24 20:43:41 2014 From: francismb at email.de (francis) Date: Tue, 24 Jun 2014 20:43:41 +0200 Subject: [Python-Dev] Tracker Stats In-Reply-To: References: <53A84D41.6070508@email.de> Message-ID: <53A9C6DD.7070505@email.de> On 06/24/2014 03:50 AM, Ezio Melotti wrote: > From the first graph you can see that out of the 4500+ open issues, > about 2000 have a patch. One would like to start with the ones that are bugs ;-) and see some status line trying to drop to 0 (is that possible :-) ?) > We need more reviewers and committers :) more patch writers: yes, more patch reviewers: yes, more committers: ?? automate!! :-) Regards, francis From rdmurray at bitdance.com Tue Jun 24 20:58:09 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 24 Jun 2014 14:58:09 -0400 Subject: [Python-Dev] Tracker Stats In-Reply-To: <53A9C6DD.7070505@email.de> References: <53A84D41.6070508@email.de> <53A9C6DD.7070505@email.de> Message-ID: <20140624185809.ED50B250E00@webabinitio.net> On Tue, 24 Jun 2014 20:43:41 +0200, francis wrote: > On 06/24/2014 03:50 AM, Ezio Melotti wrote: > > From the first graph you can see that out of the 4500+ open issues, > > about 2000 have a patch. > One would like to start with the ones that are bugs ;-) and see some > status line trying to drop to 0 (is that possible :-) ?)
> > > We need more reviewers and committers :) > more patch writers: yes, > more patch reviewers: yes, Anyone can review patches, in case that isn't clear. > more committers: ?? automate!! :-) That's a goal of the python-workflow interest group. Unfortunately between billable work and GSOC mentoring I haven't had time to do much there lately. Our first goal is to make the review step easier to manage (know which patches really need review, be able to list patches where community review is thought to be complete) by improving the tracker, then we'll look at creating the patch gating system Nick has talked about previously. Still needs a committer to approve the patch, but it should increase the throughput considerably. In the meantime, something that would help would be if people would do reviews and say on the issue "I think this is commit ready" and have the issue moved to 'commit review' stage. Do that a few times where people who are already triagers/committers agree with you, and you'll get triage privileges on the tracker. --David From nad at acm.org Tue Jun 24 21:54:29 2014 From: nad at acm.org (Ned Deily) Date: Tue, 24 Jun 2014 12:54:29 -0700 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: In article <1403625970.6550.133062453.693ECDEA at webmail.messagingengine.com>, Benjamin Peterson wrote: > If Serhiy wants to spend his time supporting this arcane feature, he can > do that. It doesn't really seem worth risking regressions to do this, > though. That's why I'm concerned about applying these 20+ patches that touch many parts of the code base. I don't have any objection to the "arcane feature" per se and I appreciate the obvious effort that Serhiy put into the patches but, at this stage of the life of Python 2, our overriding concern should be stability. That's really why most users of Python 2.7 continue to use it. 
As I see it, maintenance mode is a promise from us to our users that we will try our best, in general, to only make changes that fix serious problems, either due to bugs in Python itself or changes in the external world (new OS releases, etc). We don't automatically fix all bugs. Any time we make a change, we're making an engineering decision with cost-benefit tradeoffs. The more lines of code changed, the greater the risk that we introduce new bugs; inadvertently adding regressions has been an issue over a number of the 2.7.x releases, including the most recent one. The cost-benefit of this set of changes seems to me to be:

Costs:
- Code changes in many modules:
  - careful review -> additional work for multiple core developers
  - careful testing on all platforms, including this option that we don't currently test at all, AFAIK -> added work for platform experts
  - risk of regressions not caught prior to release, at worst requiring another early followup release -> added work for release team, third-party packagers, users
- possibly making backporting of other issues more difficult due to merge conflicts
- possible invalidation of waiting-for-review patches, forcing patch refreshes and retests -> added work for potential contributors
- possible invalidation of user local patches -> added work for users
- may encourage use of an apparently little-used feature that has no equivalent in Python 3, another incentive to stay with Py2?

Benefit:
- Fixes a documented feature that may be of benefit to users of Python in applications with very limited memory available, although there aren't any open issues from users requesting this (AFAIK). No benefit to the overwhelming majority of Python users, who only use Unicode-enabled builds.

That just doesn't seem like a good trade-off to me.
I'll certainly abide by the release manager's decision but I think we all need to be thinking more about these kinds of cost-benefit tradeoffs and recognize that there are often non-obvious costs of making changes, costs that can affect our entire community. Yes, we are committed to maintaining Python 2.7 for multiple years but that doesn't mean we have to fix every open issue or even most open issues. Any or all of the above costs may apply to any changes we make. For many of our users, the best maintenance policy for Python 2.7 would be the least change possible. -- Ned Deily, nad at acm.org From ethan at stoneleaf.us Tue Jun 24 22:10:48 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 24 Jun 2014 13:10:48 -0700 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: <53A9DB48.4050506@stoneleaf.us> On 06/24/2014 12:54 PM, Ned Deily wrote: > > Yes, we are committed to maintaining > Python 2.7 for multiple years but that doesn't mean we have to fix every > open issue or even most open issues. Any or all of the above costs may > apply to any changes we make. For many of our users, the best > maintenance policy for Python 2.7 would be the least change possible. +1 We need to keep 2.7 running, but we don't need to kill ourselves doing it. If a bug has been there for a while, the affected users are probably working around it by now. ;) -- ~Ethan~ From jimjjewett at gmail.com Tue Jun 24 23:03:27 2014 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Tue, 24 Jun 2014 14:03:27 -0700 (PDT) Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: Message-ID: <53a9e79f.455ce00a.4e6e.4680@mx.google.com> On 6/24/2014 4:22 AM, Serhiy Storchaka wrote: > I submitted a number of patches which fixes currently broken > Unicode-disabled build of Python 2.7 (built with --disable-unicode > configure option). 
I suppose this was broken in 2.7 when C > implementation of the io module was introduced. It has frequently been broken. Without a buildbot, it will continue to break. I have given at least a quick look at all your proposed changes; most are fixes to test code, such as skip decorators. People checked in tests without the right guards because it did work on their own builds, and on all stable buildbots. That will probably continue to happen unless/until a --disable-unicode buildbot is added. It would be good to fix the tests (and actual library issues). Unfortunately, some of the specifically proposed changes (such as defining and using _unicode instead of unicode within Python code) look to me as though they would trigger problems in the normal build (where the unicode object *does* exist, but would no longer be used). Other changes, such as the use of \x escapes, appear correct, but make the tests harder to read -- and might end up removing a test for correct unicode functionality across different spellings. Even if we assume that the tests are fine, and I'm just an idiot who misread them, the fact that there is any confusion means that these particular changes may be tricky enough to be a bad tradeoff for 2.7. It *might* work if you could make a more focused change. For example, instead of leaving the 'unicode' name unbound, provide an object that simply returns false for isinstance and raises a UnicodeError for any other method call. Even *this* might be too aggressive for 2.7, but the fact that it would only appear in the --disable-unicode builds, and would make them more similar to the regular build, are points in its favor. Before doing that, though, please document what the --disable-unicode mode is actually *supposed* to do when interacting with byte-streams that a standard defines as UTF-8.
(For example, are the changes to _xml_dumps and _xml_loads at http://bugs.python.org/file35758/multiprocessing.patch correct, or do those functions assume they get bytes as input, or should the functions raise an exception any time they are called?) -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ From ncoghlan at gmail.com Wed Jun 25 01:15:27 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 25 Jun 2014 09:15:27 +1000 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: <53A9DB48.4050506@stoneleaf.us> References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> <53A9DB48.4050506@stoneleaf.us> Message-ID: On 25 Jun 2014 07:05, "Ethan Furman" wrote: > > On 06/24/2014 12:54 PM, Ned Deily wrote: >> >> >> Yes, we are committed to maintaining >> Python 2.7 for multiple years but that doesn't mean we have to fix every >> open issue or even most open issues. Any or all of the above costs may >> apply to any changes we make. For many of our users, the best >> maintenance policy for Python 2.7 would be the least change possible. > > > +1 > > We need to keep 2.7 running, but we don't need to kill ourselves doing it. If a bug has been there for a while, the affected users are probably working around it by now. ;) Aye, in this case, I'm in the "officially deprecate the feature" camp. Don't actively try to break it further, just slap a warning in the docs to say it is no longer a supported configuration. In my own personal case, I not only wasn't aware that there was still an option to turn off the Unicode support, but I also wouldn't really class a build with it turned off as still being Python. As Jim noted, there are quite a lot of APIs that don't make sense if there's no Unicode type available. Cheers, Nick. 
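Jim's "more focused change" proposal above (keep the name unicode bound, but to a stand-in that answers isinstance checks with False and loudly rejects any real use) could be sketched like this. This is purely an illustration of the idea, not code from any patch; it uses an explicit metaclass so the isinstance hook works:

```python
class _DisabledUnicodeMeta(type):
    # isinstance(x, unicode) remains a valid question; the answer is
    # simply always False in a build without a real unicode type.
    def __instancecheck__(cls, instance):
        return False

    # Any attempt to actually construct a unicode object fails loudly
    # instead of raising a confusing NameError.
    def __call__(cls, *args, **kwargs):
        raise UnicodeError("Python was built without Unicode support")

# In a real --disable-unicode build this object would be installed as
# the builtin 'unicode'; here it is just a module-level stand-in.
unicode_stub = _DisabledUnicodeMeta('unicode', (object,), {})

print(isinstance("abc", unicode_stub))   # False: type checks keep working
try:
    unicode_stub("abc")
except UnicodeError as exc:
    print(exc)
```

The point of the design is that code written for a normal build keeps running up to the moment it truly needs Unicode, at which point it fails with a clear error rather than a NameError.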
> > -- > ~Ethan~ From skip at pobox.com Wed Jun 25 14:20:49 2014 From: skip at pobox.com (Skip Montanaro) Date: Wed, 25 Jun 2014 07:20:49 -0500 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> <53A9DB48.4050506@stoneleaf.us> Message-ID: On Tue, Jun 24, 2014 at 6:15 PM, Nick Coghlan wrote: > Aye, in this case, I'm in the "officially deprecate the feature" camp. Definitely preferable to the suggestion to remove the configure flag. Skip From storchaka at gmail.com Wed Jun 25 14:55:35 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 25 Jun 2014 15:55:35 +0300 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: <53a9e79f.455ce00a.4e6e.4680@mx.google.com> References: <53a9e79f.455ce00a.4e6e.4680@mx.google.com> Message-ID: 25.06.14 00:03, Jim J. Jewett wrote: > It would be good to fix the tests (and actual library issues). > Unfortunately, some of the specifically proposed changes (such as > defining and using _unicode instead of unicode within python code) > look to me as though they would trigger problems in the normal build > (where the unicode object *does* exist, but would no longer be used). This is an idiom recommended by MvL [1] and widely used (19 times in the source code). [1] http://bugs.python.org/issue8767#msg159473 > Other changes, such as the use of \x escapes, appear correct, but make > the tests harder to read -- and might end up removing a test for > correct unicode functionality across different spellings.
> > Even if we assume that the tests are fine, and I'm just an idiot who > misread them, the fact that there is any confusion means that these > particular changes may be tricky enough to be a bad tradeoff for 2.7. > > It *might* work if you could make a more focused change. For example, > instead of leaving the 'unicode' name unbound, provide an object that > simply returns false for isinstance and raises a UnicodeError for any > other method call. Even *this* might be too aggressive for 2.7, but the > fact that it would only appear in the --disable-unicode builds, and > would make them more similar to the regular build, are points in its > favor. No, the existing code uses a different approach. "unicode" doesn't exist, while encode/decode methods exist but are useless. If my memory doesn't fail me, there is even a special explanatory comment about this historical decision somewhere. This decision was made many years ago. > Before doing that, though, please document what the --disable-unicode > mode is actually *supposed* to do when interacting with byte-streams > that a standard defines as UTF-8. (For example, are the changes to > _xml_dumps and _xml_loads at > http://bugs.python.org/file35758/multiprocessing.patch > correct, or do those functions assume they get bytes as input, or > should the functions raise an exception any time they are called?) Looking more carefully, I see that there is a bug in the Unicode-enabled build (wrong backporting from 3.x). In 2.x, xmlrpclib.dumps already produces a UTF-8-encoded string; in 3.x, xmlrpc.client.dumps produces a unicode string. multiprocessing should fail with a non-ASCII str or unicode. A side benefit of my patches is that they expose existing errors in the Unicode-enabled build.
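The xmlrpclib/multiprocessing mistake described here is the classic double-encode bug: calling .encode('utf-8') on data that is already UTF-8-encoded bytes. Seen in isolation (illustrative values, not the actual stdlib code):

```python
text = u'caf\xe9'                 # a unicode string
data = text.encode('utf-8')       # correct: unicode -> UTF-8 bytes
assert data == b'caf\xc3\xa9'

# The backporting mistake: encoding what is *already* bytes.  Python 2
# implicitly decodes the str as ASCII first, so the call only blows up
# (with UnicodeDecodeError) once non-ASCII data shows up, which is why
# such bugs lurk; Python 3 refuses outright, since bytes has no .encode.
try:
    data.encode('utf-8')
except (AttributeError, UnicodeDecodeError):
    print("double encode rejected")
```

This is also why pure-ASCII test data hides the bug on Python 2: the redundant encode succeeds silently until real non-ASCII input arrives.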
From storchaka at gmail.com Wed Jun 25 14:58:02 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 25 Jun 2014 15:55:35 +0300 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: 24.06.14 22:54, Ned Deily wrote: > Benefit: > - Fixes documented feature that may be of benefit to users of Python in > applications with very limited memory available, although there aren't > any open issues from users requesting this (AFAIK). No benefit to the > overwhelming majority of Python users, who only use Unicode-enabled > builds. Another benefit: the patches exposed several bugs in the code (mainly errors in backporting from 3.x). From victor.stinner at gmail.com Wed Jun 25 15:29:01 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 25 Jun 2014 15:29:01 +0200 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: 2014-06-25 14:58 GMT+02:00 Serhiy Storchaka : > 24.06.14 22:54, Ned Deily wrote: > >> Benefit: >> - Fixes documented feature that may be of benefit to users of Python in >> applications with very limited memory available, although there aren't >> any open issues from users requesting this (AFAIK). No benefit to the >> overwhelming majority of Python users, who only use Unicode-enabled >> builds. > > > Another benefit: the patches exposed several bugs in the code (mainly errors in > backporting from 3.x). Oh, interesting. Do you have examples of such bugs?
Victor From storchaka at gmail.com Wed Jun 25 16:00:42 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 25 Jun 2014 17:00:42 +0300 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: 25.06.14 16:29, Victor Stinner wrote: > 2014-06-25 14:58 GMT+02:00 Serhiy Storchaka : >> Another benefit: the patches exposed several bugs in the code (mainly errors in >> backporting from 3.x). > > Oh, interesting. Do you have examples of such bugs? In posixpath, the branches for unicode and str should be reversed. In multiprocessing, .encode('utf-8') is applied to an already UTF-8-encoded str (this is a unicode string in Python 3). And there is a similar error in at least one other place. Tests for bytearray actually test bytes, not bytearray. That is what I remember. From nad at acm.org Wed Jun 25 20:35:36 2014 From: nad at acm.org (Ned Deily) Date: Wed, 25 Jun 2014 11:35:36 -0700 Subject: [Python-Dev] cpython (3.3): Closes #20872: dbm/gdbm/ndbm close methods are not documented References: <3gz1lK1lYkz7Lk0@mail.python.org> Message-ID: In article <3gz1lK1lYkz7Lk0 at mail.python.org>, jesus.cea wrote: > http://hg.python.org/cpython/rev/cf156cfb12e7 > changeset: 91398:cf156cfb12e7 > branch: 3.3 > parent: 91384:92d691c3ca00 > user: Jesus Cea > date: Wed Jun 25 13:05:31 2014 +0200 > summary: > Closes #20872: dbm/gdbm/ndbm close methods are not documented The 3.3 branch is open only to security fixes. Please don't backport other patches to there.
https://docs.python.org/devguide/devcycle.html#summary -- Ned Deily, nad at acm.org From jcea at jcea.es Thu Jun 26 00:56:39 2014 From: jcea at jcea.es (Jesus Cea) Date: Thu, 26 Jun 2014 00:56:39 +0200 Subject: [Python-Dev] cpython (3.3): Closes #20872: dbm/gdbm/ndbm close methods are not documented In-Reply-To: References: <3gz1lK1lYkz7Lk0@mail.python.org> Message-ID: <53AB53A7.6050403@jcea.es> On 25/06/14 20:35, Ned Deily wrote: > The 3.3 branch is open only to security fixes. Please don't backport > other patches to there. > > https://docs.python.org/devguide/devcycle.html#summary Ned, I am aware. It is a doc-only fix, like fixing a typo or correcting an incorrect statement. If that is against policy, let me know. That said, it looks like the 3.3 documentation is no longer "sphinxed" to the webpage, which actually makes the point loud and clear. I have a browser tab open to check a 3.3 doc fix and it is not showing. Thanks for the heads up. Sorry for the inconvenience. -- Jesús Cea Avión _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ Twitter: @jcea _/_/ _/_/ _/_/_/_/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" ("Love is putting your happiness in the happiness of another") - Leibniz
From ncoghlan at gmail.com Thu Jun 26 01:28:35 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 26 Jun 2014 09:28:35 +1000 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: On 26 Jun 2014 01:13, "Serhiy Storchaka" wrote: > > 25.06.14 16:29, Victor Stinner wrote: >> >> 2014-06-25 14:58 GMT+02:00 Serhiy Storchaka : >>> >>> Another benefit: the patches exposed several bugs in the code (mainly errors in >>> backporting from 3.x). >> >> >> Oh, interesting. Do you have examples of such bugs? > > > In posixpath, the branches for unicode and str should be reversed. > In multiprocessing, .encode('utf-8') is applied to an already UTF-8-encoded str (this is a unicode string in Python 3). And there is a similar error in at least one other place. Tests for bytearray actually test bytes, not bytearray. That is what I remember. OK, *that* sounds like an excellent reason to keep the Unicode disabled builds functional, and make sure they stay that way with a buildbot: to help make sure we're not accidentally running afoul of the implicit interoperability between str and unicode when backporting fixes from Python 3. Helping to ensure correct handling of str values makes this capability something of benefit to *all* Python 2 users, not just those that turn off the Unicode support. It also makes it a potentially useful testing tool when assessing str/unicode handling in general. Regards, Nick.
From nad at acm.org Thu Jun 26 08:38:46 2014 From: nad at acm.org (Ned Deily) Date: Wed, 25 Jun 2014 23:38:46 -0700 Subject: [Python-Dev] cpython (3.3): Closes #20872: dbm/gdbm/ndbm close methods are not documented References: <3gz1lK1lYkz7Lk0@mail.python.org> <53AB53A7.6050403@jcea.es> Message-ID: In article <53AB53A7.6050403 at jcea.es>, Jesus Cea wrote: > On 25/06/14 20:35, Ned Deily wrote: > > The 3.3 branch is open only to security fixes. Please don't backport > > other patches to there. > > > > https://docs.python.org/devguide/devcycle.html#summary > > Ned, I am aware. It is a doc-only fix, like fixing a typo or correcting > an incorrect statement. If that is against policy, let me know. My understanding is that doc changes are treated the same as any other code changes. As you noticed, after a release leaves maintenance mode, its documentation is no longer updated on the web site. -- Ned Deily, nad at acm.org From storchaka at gmail.com Thu Jun 26 09:15:06 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 26 Jun 2014 10:15:06 +0300 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: 26.06.14 02:28, Nick Coghlan wrote: > OK, *that* sounds like an excellent reason to keep the Unicode disabled > builds functional, and make sure they stay that way with a buildbot: to > help make sure we're not accidentally running afoul of the implicit > interoperability between str and unicode when backporting fixes from > Python 3. > > Helping to ensure correct handling of str values makes this capability > something of benefit to *all* Python 2 users, not just those that turn > off the Unicode support. It also makes it a potentially useful testing > tool when assessing str/unicode handling in general. Do you want to do some patch reviews?
From antoine at python.org Thu Jun 26 13:04:53 2014 From: antoine at python.org (Antoine Pitrou) Date: Thu, 26 Jun 2014 07:04:53 -0400 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: On 25/06/2014 19:28, Nick Coghlan wrote: > > OK, *that* sounds like an excellent reason to keep the Unicode disabled > builds functional, and make sure they stay that way with a buildbot: to > help make sure we're not accidentally running afoul of the implicit > interoperability between str and unicode when backporting fixes from > Python 3. > > Helping to ensure correct handling of str values makes this capability > something of benefit to *all* Python 2 users, not just those that turn > off the Unicode support. It also makes it a potentially useful testing > tool when assessing str/unicode handling in general. Hmmm... From my perspective, trying to enforce unicode-disabled builds will only lower the (already low) chance that I may want to write / backport bug fixes for 2.7. For the same reason, I agree with Victor that we should ditch the threading-disabled builds. It's too much of a hassle for no actual, practical benefit. People who want a threadless unicodeless Python can install Python 1.5.2 for all I care. Regards Antoine. From rosuav at gmail.com Thu Jun 26 14:49:40 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 26 Jun 2014 22:49:40 +1000 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: On Thu, Jun 26, 2014 at 9:04 PM, Antoine Pitrou wrote: > For the same reason, I agree with Victor that we should ditch the > threading-disabled builds. It's too much of a hassle for no actual, > practical benefit. People who want a threadless unicodeless Python can > install Python 1.5.2 for all I care. Or some other implementation of Python.
It's looking like micropython will be permanently supporting a non-Unicode build (although I stepped away from the project after a strong disagreement over what would and would not make sense, and haven't been following it since). If someone wants a Python that doesn't have stuff that the core CPython devs treat as essential, s/he probably wants something like uPy anyway. ChrisA From benjamin at python.org Thu Jun 26 18:21:46 2014 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 26 Jun 2014 09:21:46 -0700 Subject: [Python-Dev] cpython (3.3): Closes #20872: dbm/gdbm/ndbm close methods are not documented In-Reply-To: References: <3gz1lK1lYkz7Lk0@mail.python.org> <53AB53A7.6050403@jcea.es> Message-ID: <1403799706.1921.134890741.5AFCC6E2@webmail.messagingengine.com> On Wed, Jun 25, 2014, at 23:38, Ned Deily wrote: > In article <53AB53A7.6050403 at jcea.es>, Jesus Cea wrote: > > > On 25/06/14 20:35, Ned Deily wrote: > > > The 3.3 branch is open only to security fixes. Please don't backport > > > other patches to there. > > > > > > https://docs.python.org/devguide/devcycle.html#summary > > > > Ned, I am aware. It is a doc-only fix, like fixing a typo or correcting > > an incorrect statement. It that is against policy, let me know. > > My understanding is that doc changes are treated the same as any other > code changes. As you noticed, after a release leaves maintenance mode, > its documentation is no longer updated on the web site. To echo Ned, committing a doc change to 3.3 isn't the end of the world. We just want to make sure energy is focused on the 3 branches we do fully maintain. 
From petertbrady at gmail.com Thu Jun 26 17:38:50 2014 From: petertbrady at gmail.com (Peter Brady) Date: Thu, 26 Jun 2014 09:38:50 -0600 Subject: [Python-Dev] C version of functools.lru_cache Message-ID: Hello python devs, I was recently in need of some faster caching and thought this would be a good opportunity to familiarize myself with the Python/C api so I wrote a C extension for the lru_cache in functools. The source is at https://github.com/pbrady/fastcache.git and I've posted it as a package on PyPI (fastcache). There are some simple benchmarks on the github page showing about 9x speedup. I would like to submit this for incorporation into the standard library. Is there any interest in this? I suspect it probably requires some changes/cleanup especially since I haven't addressed thread-safety at all. Thanks, Peter. P.S. This was the motivation for the faster caching https://github.com/sympy/sympy/pull/7464. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Thu Jun 26 18:33:29 2014 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 26 Jun 2014 09:33:29 -0700 Subject: [Python-Dev] C version of functools.lru_cache In-Reply-To: References: Message-ID: <1403800409.4541.134895341.6BD26137@webmail.messagingengine.com> You might look at https://bugs.python.org/issue14373 On Thu, Jun 26, 2014, at 08:38, Peter Brady wrote: > Hello python devs, > > I was recently in need of some faster caching and thought this would be a > good opportunity to familiarize myself with the Python/C api so I wrote a > C > extension for the lru_cache in functools. The source is at > https://github.com/pbrady/fastcache.git and I've posted it as a package > on > PyPI (fastcache). There are some simple benchmarks on the github page > showing about 9x speedup. I would like to submit this for incorporation > into the standard library. Is there any interest in this? 
I suspect it > probably requires some changes/cleanup especially since I haven't > addressed > thread-safety at all. > > Thanks, > Peter. > > P.S. This was the motivation for the faster caching > https://github.com/sympy/sympy/pull/7464. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/benjamin%40python.org From petertbrady at gmail.com Thu Jun 26 19:23:06 2014 From: petertbrady at gmail.com (Peter Brady) Date: Thu, 26 Jun 2014 11:23:06 -0600 Subject: [Python-Dev] C version of functools.lru_cache In-Reply-To: <1403800409.4541.134895341.6BD26137@webmail.messagingengine.com> References: <1403800409.4541.134895341.6BD26137@webmail.messagingengine.com> Message-ID: Looks like it's already in the works! Nevermind On Thu, Jun 26, 2014 at 10:33 AM, Benjamin Peterson wrote: > You might look at https://bugs.python.org/issue14373 > > On Thu, Jun 26, 2014, at 08:38, Peter Brady wrote: > > Hello python devs, > > > > I was recently in need of some faster caching and thought this would be a > > good opportunity to familiarize myself with the Python/C api so I wrote a > > C > > extension for the lru_cache in functools. The source is at > > https://github.com/pbrady/fastcache.git and I've posted it as a package > > on > > PyPI (fastcache). There are some simple benchmarks on the github page > > showing about 9x speedup. I would like to submit this for incorporation > > into the standard library. Is there any interest in this? I suspect it > > probably requires some changes/cleanup especially since I haven't > > addressed > > thread-safety at all. > > > > Thanks, > > Peter. > > > > P.S. This was the motivation for the faster caching > > https://github.com/sympy/sympy/pull/7464. 
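For context, functools.lru_cache (added to Python 3.2 as pure Python; issue 14373 and fastcache aim to provide drop-in C implementations) is used like this:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded cache; a bounded LRU is the default
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040, computed in linear rather than exponential time
print(fib.cache_info())  # hit/miss statistics; identical API for a C version
```

Because the decorator is pure API surface (the wrapped callable plus cache_info/cache_clear), a C reimplementation can be swapped in without touching user code, which is what makes the ~9x speedup attractive.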
From gregory.szorc at gmail.com Thu Jun 26 20:34:03 2014 From: gregory.szorc at gmail.com (Gregory Szorc) Date: Thu, 26 Jun 2014 11:34:03 -0700 Subject: [Python-Dev] Binary CPython distribution for Linux Message-ID: <53AC679B.1000408@gmail.com> I'm an advocate of getting users and projects to move to modern Python versions. I believe dropping support for end-of-lifed Python versions is important for the health of the Python community. If you've done any amount of Python 3 porting work, you know things get much harder the more 2.x legacy versions you need to support. I led the successful charge to drop support for Python 2.6 and below from Firefox's build system. I failed to win the argument that Mercurial should drop 2.4 and 2.5 [1]. A few years ago, I started a similar conversation with the LLVM project [2]. I wrote a blog post on the subject [3] that even got Slashdotted [4] (although I don't think that's the honor it was a decade ago). While much of the opposition to dropping Python <2.7 stems from the RHEL community (they still have 2.4 in extended support and 2.7 wasn't in a release until a few weeks ago), a common objection from the users is "I can't install a different Python" or "it's too difficult to install a different Python." The former is a legit complaint - if you are on shared hosting and don't have root, as easy as it is to add an alternate package repository that provides 2.7 (or newer), you don't have the permissions so you can't do it. This leaves users with attempting a userland install of Python. Personally, I think installing Python in userland is relatively simple. Tools like pyenv make this turnkey.
Worst case you fall back to configure + make. But I'm an experienced developer and have a compiler toolchain and library dependencies on my machine. What about less experienced users or people that don't have the necessary build dependencies? And, even if they do manage to find or build a Python distribution, we all know that there's enough finicky behavior with things like site-packages default paths to cause many headaches, even for experienced Python hackers. I'd like to propose a solution to this problem: a pre-built distribution of CPython for Linux available via www.python.org in the list of downloads for a particular release [5]. This distribution could be downloaded and unarchived into the user's home directory and users could start running it immediately by setting an environment variable or two, creating a symlink, or even running a basic installer script. This would hopefully remove the hurdles of obtaining a (sane) Python distribution on Linux. This would allow projects to more easily drop end-of-life Python versions and would speed adoption of modern Python, including Python 3 (because porting is much easier if you only have to target 2.7). I understand there may be technical challenges with doing this for some distributions and with producing a universal binary distribution. I would settle for a binary distribution that was targeted towards RHEL users and variant distros, as that is the user population that I perceive to be the most conservative and responsible for holding modern Python adoption back. 
[1] http://permalink.gmane.org/gmane.comp.version-control.mercurial.devel/68902 [2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/056545.html [3] http://gregoryszorc.com/blog/2014/01/08/why-do-projects-support-old-python-releases/ [4] http://developers.slashdot.org/story/14/01/09/1940232/why-do-projects-continue-to-support-old-python-releases [5] https://www.python.org/download/releases/2.7.7/ From joseph.martinot-lagarde at m4x.org Thu Jun 26 21:23:10 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Thu, 26 Jun 2014 21:23:10 +0200 Subject: [Python-Dev] Binary CPython distribution for Linux In-Reply-To: <53AC679B.1000408@gmail.com> References: <53AC679B.1000408@gmail.com> Message-ID: <53AC731E.5010604@m4x.org> Le 26/06/2014 20:34, Gregory Szorc a ?crit : > I'm an advocate of getting users and projects to move to modern Python > versions. I believe dropping support for end-of-lifed Python versions is > important for the health of the Python community. If you've done any > amount of Python 3 porting work, you know things get much harder the > more 2.x legacy versions you need to support. > > I led the successful charge to drop support for Python 2.6 and below > from Firefox's build system. I failed to win the argument that Mercurial > should drop 2.4 and 2.5 [1]. A few years ago, I started a similar > conversation with the LLVM project [2]. I wrote a blog post on the > subject [3] that even got Slashdotted [4] (although I don't think that's > the honor it was a decade ago). > > While much of the opposition to dropping Python <2.7 stems from the RHEL > community (they still have 2.4 in extended support and 2.7 wasn't in a > release until a few weeks ago), a common objection from the users is "I > can't install a different Python" or "it's too difficult to install a > different Python." 
> [snip]
> > [1] http://permalink.gmane.org/gmane.comp.version-control.mercurial.devel/68902
> > [2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-December/056545.html
> > [3] http://gregoryszorc.com/blog/2014/01/08/why-do-projects-support-old-python-releases/
> > [4] http://developers.slashdot.org/story/14/01/09/1940232/why-do-projects-continue-to-support-old-python-releases
> > [5] https://www.python.org/download/releases/2.7.7/

Just today I installed Anaconda
(https://store.continuum.io/cshop/anaconda/) on Linux servers running
CentOS 6.4. It installs in a directory anywhere in the filesystem (no
need to be root), and using it globally was just a matter of prepending
a folder to the PATH.

Of course Anaconda is oriented towards scientific applications, but it
is proof that a pre-built binary installer works and can be simple to
use. If someone wants to try it without all the scientific libraries,
they provide Miniconda (http://conda.pydata.org/miniconda.html), which
contains only Python and the Python package manager conda.

Joseph

From a.cavallo at cavallinux.eu  Thu Jun 26 22:00:38 2014
From: a.cavallo at cavallinux.eu (Antonio Cavallo)
Date: Thu, 26 Jun 2014 21:00:38 +0100
Subject: [Python-Dev] Binary CPython distribution for Linux
In-Reply-To: <53AC731E.5010604@m4x.org>
References: <53AC679B.1000408@gmail.com> <53AC731E.5010604@m4x.org>
Message-ID: <53AC7BE6.6060207@cavallinux.eu>

I have a little pet project for building rpms of Python 2.7 (it should
be trivial to port to 3.x):

https://build.opensuse.org/project/show/home:cavallo71:opt-python-modules

If there's enough interest I can help to integrate it with python.org.

>> I understand there may be technical challenges with doing this for some
>> distributions and with producing a universal binary distribution.

openSUSE has provided the VMs to build binaries for multiple platforms
for a very long time already.
> Of course Anaconda is oriented towards scientific applications, but it
> is proof that a pre-built binary installer works and can be simple to
> use.

RPMs are the "blessed" way to install software on Linux: they support
what most sysadmins expect (easy to list the installed packages, easy to
validate whether tampering with a package occurred, which file belongs
to which package, etc.).

Anaconda might appeal to some groups of users, but for company-wide
deployment, rpm is the best technical solution given its integration
into Linux.

I hope this helps,
Antonio

From joseph.martinot-lagarde at m4x.org  Thu Jun 26 23:27:39 2014
From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde)
Date: Thu, 26 Jun 2014 23:27:39 +0200
Subject: [Python-Dev] Binary CPython distribution for Linux
In-Reply-To: <53AC7BE6.6060207@cavallinux.eu>
References: <53AC679B.1000408@gmail.com> <53AC731E.5010604@m4x.org>
 <53AC7BE6.6060207@cavallinux.eu>
Message-ID: <53AC904B.7090907@m4x.org>

Le 26/06/2014 22:00, Antonio Cavallo a écrit :
> [snip]
> RPMs are the "blessed" way to install software on Linux: they support
> what most sysadmins expect [...] for company-wide deployment, rpm is
> the best technical solution given its integration into Linux.

1. Not all Linux distros use rpm (Debian, Ubuntu, Arch Linux...)
2. You need to be root to install an rpm.

Btw, Anaconda is multiplatform and can be installed on Linux, Windows
and Mac.
Joseph

From benhoyt at gmail.com  Fri Jun 27 00:59:45 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Thu, 26 Jun 2014 18:59:45 -0400
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and
 faster directory iterator
Message-ID:

Hi Python dev folks,

I've written a PEP proposing a specific os.scandir() API for a directory
iterator that returns the stat-like info from the OS, the main advantage
of which is to speed up os.walk() and similar operations between 4-20x,
depending on your OS and file system. Full details, background info, and
context links are in the PEP, which Victor Stinner has uploaded at the
following URL, and I've also copied inline below.

http://legacy.python.org/dev/peps/pep-0471/

Would love feedback on the PEP, but also of course on the proposal
itself.

-Ben


PEP: 471
Title: os.scandir() function -- a better and faster directory iterator
Version: $Revision$
Last-Modified: $Date$
Author: Ben Hoyt
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-May-2014
Python-Version: 3.5

Abstract
========

This PEP proposes including a new directory iteration function,
``os.scandir()``, in the standard library. This new function adds useful
functionality and increases the speed of ``os.walk()`` by 2-10 times
(depending on the platform and file system) by significantly reducing
the number of times ``stat()`` needs to be called.

Rationale
=========

Python's built-in ``os.walk()`` is significantly slower than it needs to
be, because -- in addition to calling ``os.listdir()`` on each directory
-- it executes the system call ``os.stat()`` or ``GetFileAttributes()``
on each file to determine whether the entry is a directory or not.

But the underlying system calls -- ``FindFirstFile`` / ``FindNextFile``
on Windows and ``readdir`` on Linux and OS X -- already tell you whether
the files returned are directories or not, so no further system calls
are needed.
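The stat()-per-entry cost described in the Rationale is easy to see from
Python itself; this sketch uses the os.scandir() that eventually shipped
in Python 3.5 (so it post-dates this thread), and the directory layout
is made up purely for the demonstration:

```python
import os
import tempfile

# A made-up tree: three subdirectories and three files.
root = tempfile.mkdtemp()
for d in ("pkg", "docs", "tests"):
    os.mkdir(os.path.join(root, d))
for f in ("setup.py", "README", "LICENSE"):
    open(os.path.join(root, f), "w").close()

# listdir-style traversal: one directory scan PLUS a stat-family call
# (os.path.isdir) for every single entry -- the "2N" pattern.
extra_stats = 0
listdir_dirs = []
for name in os.listdir(root):
    extra_stats += 1  # os.path.isdir() does a stat() under the hood
    if os.path.isdir(os.path.join(root, name)):
        listdir_dirs.append(name)

# scandir-style traversal: the scan itself already carries the type
# information, so no per-entry stat is needed -- the "N" pattern.
scandir_dirs = [e.name for e in os.scandir(root) if e.is_dir()]

print(extra_stats)                                   # 6 avoidable stat calls
print(sorted(listdir_dirs) == sorted(scandir_dirs))  # True
```

The count only tallies Python-level isdir() calls, but each of those is
exactly the extra OS-level stat the PEP is trying to eliminate.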
In short, you can reduce the number of system calls from approximately
2N to N, where N is the total number of files and directories in the
tree. (And because directory trees are usually much wider than they are
deep, it's often much better than this.)

In practice, removing all those extra system calls makes ``os.walk()``
about **8-9 times as fast on Windows**, and about **2-3 times as fast on
Linux and Mac OS X**. So we're not talking about micro-optimizations.
See more `benchmarks`_.

.. _`benchmarks`: https://github.com/benhoyt/scandir#benchmarks

Somewhat relatedly, many people (see Python `Issue 11406`_) are also
keen on a version of ``os.listdir()`` that yields filenames as it
iterates instead of returning them as one big list. This improves memory
efficiency for iterating very large directories.

So as well as providing a ``scandir()`` iterator function for calling
directly, Python's existing ``os.walk()`` function could be sped up a
huge amount.

.. _`Issue 11406`: http://bugs.python.org/issue11406

Implementation
==============

The implementation of this proposal was written by Ben Hoyt (initial
version) and Tim Golden (who helped a lot with the C extension module).
It lives on GitHub at `benhoyt/scandir`_.

.. _`benhoyt/scandir`: https://github.com/benhoyt/scandir

Note that this module has been used and tested (see "Use in the wild"
section in this PEP), so it's more than a proof-of-concept. However, it
is marked as beta software and is not extensively battle-tested. It will
need some cleanup and more thorough testing before going into the
standard library, as well as integration into `posixmodule.c`.
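The os.walk() speedup comes from exactly this substitution. A minimal
sketch of a scandir-based walk, using the os.scandir() that later landed
in Python 3.5, and deliberately omitting the real os.walk()'s topdown,
onerror, and symlink handling (the function name and directory layout
are invented for the example):

```python
import os
import tempfile

def scandir_walk(top):
    """Minimal os.walk()-style generator built on os.scandir().

    Sketch only: no onerror, followlinks, or topdown handling like
    the real os.walk() has.
    """
    dirs, files = [], []
    for entry in os.scandir(top):
        # Type info comes from the directory scan; no extra stat here.
        (dirs if entry.is_dir() else files).append(entry.name)
    yield top, dirs, files
    for name in dirs:
        yield from scandir_walk(os.path.join(top, name))

# Tiny made-up tree to walk.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
open(os.path.join(root, "a", "notes.txt"), "w").close()

for top, dirs, files in scandir_walk(root):
    print(top, dirs, files)
```

Each directory costs one scan instead of one scan plus one stat per
entry, which is where the benchmark numbers above come from.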
Specifics of proposal
=====================

Specifically, this PEP proposes adding a single function to the ``os``
module in the standard library, ``scandir``, that takes a single,
optional string as its argument::

    scandir(path='.') -> generator of DirEntry objects

Like ``listdir``, ``scandir`` calls the operating system's directory
iteration system calls to get the names of the files in the ``path``
directory, but it's different from ``listdir`` in two ways:

* Instead of bare filename strings, it returns lightweight ``DirEntry``
  objects that hold the filename string and provide simple methods that
  allow access to the stat-like data the operating system returned.

* It returns a generator instead of a list, so that ``scandir`` acts as
  a true iterator instead of returning the full list immediately.

``scandir()`` yields a ``DirEntry`` object for each file and directory
in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
pseudo-directories are skipped, and the entries are yielded in
system-dependent order. Each ``DirEntry`` object has the following
attributes and methods:

* ``name``: the entry's filename, relative to ``path`` (corresponds to
  the return values of ``os.listdir``)

* ``is_dir()``: like ``os.path.isdir()``, but requires no system calls
  on most systems (Linux, Windows, OS X)

* ``is_file()``: like ``os.path.isfile()``, but requires no system calls
  on most systems (Linux, Windows, OS X)

* ``is_symlink()``: like ``os.path.islink()``, but requires no system
  calls on most systems (Linux, Windows, OS X)

* ``lstat()``: like ``os.lstat()``, but requires no system calls on
  Windows

The ``DirEntry`` attribute and method names were chosen to be the same
as those in the new ``pathlib`` module for consistency.
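Because scandir is specified as a true iterator, a large directory can
be sampled without materialising the whole listing. A short sketch
against the os.scandir() that eventually shipped in Python 3.5 (the file
names are invented; the explicit close() of a half-consumed scan was
added in 3.6):

```python
import os
import tempfile
from itertools import islice

# An invented directory with many files and one subdirectory.
root = tempfile.mkdtemp()
for i in range(100):
    open(os.path.join(root, "file%03d.txt" % i), "w").close()
os.mkdir(os.path.join(root, "sub"))

it = os.scandir(root)
sample = list(islice(it, 5))   # only five entries are pulled from the OS
for entry in sample:
    # name and is_dir() come straight from the directory scan.
    print(entry.name, entry.is_dir())
it.close()                     # release the OS directory handle (3.6+)
```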
Notes on caching
----------------

The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
is obviously always cached, and the ``is_X`` and ``lstat`` methods cache
their values (immediately on Windows via ``FindNextFile``, and on first
use on Linux / OS X via a ``stat`` call) and never refetch from the
system.

For this reason, ``DirEntry`` objects are intended to be used and thrown
away after iteration, not stored in long-lived data structures with the
methods called again and again. If a user wants to do that (for example,
for watching a file's size change), they'll need to call the regular
``os.lstat()`` or ``os.path.getsize()`` functions which force a new
system call each time.

Examples
========

Here's a good usage pattern for ``scandir``. This is in fact almost
exactly how the scandir module's faster ``os.walk()`` implementation
uses it::

    dirs = []
    non_dirs = []
    for entry in scandir(path):
        if entry.is_dir():
            dirs.append(entry)
        else:
            non_dirs.append(entry)

The above ``os.walk()``-like code will be significantly faster with
scandir on both Windows and Linux or OS X.

Or, for getting the total size of files in a directory tree -- showing
use of the ``DirEntry.lstat()`` method::

    def get_tree_size(path):
        """Return total size of files in path and subdirs."""
        size = 0
        for entry in scandir(path):
            if entry.is_dir():
                sub_path = os.path.join(path, entry.name)
                size += get_tree_size(sub_path)
            else:
                size += entry.lstat().st_size
        return size

Note that ``get_tree_size()`` will get a huge speed boost on Windows,
because no extra stat calls are needed, but on Linux and OS X the size
information is not returned by the directory iteration functions, so
this function won't gain anything there.

Support
=======

The scandir module on GitHub has been forked and used quite a bit (see
"Use in the wild" in this PEP), but there's also been a fair bit of
direct support for a scandir-like function from core developers and
others on the python-dev and python-ideas mailing lists.
A sampling:

* **Nick Coghlan**, a core Python developer: "I've had the local Red Hat
  release engineering team express their displeasure at having to stat
  every file in a network mounted directory tree for info that is
  present in the dirent structure, so a definite +1 to os.scandir from
  me, so long as it makes that info available." [`source1 `_]

* **Tim Golden**, a core Python developer, supports scandir enough to
  have spent time refactoring and significantly improving scandir's C
  extension module. [`source2 `_]

* **Christian Heimes**, a core Python developer: "+1 for something like
  yielddir()" [`source3 `_] and "Indeed! I'd like to see the feature in
  3.4 so I can remove my own hack from our code base." [`source4 `_]

* **Gregory P. Smith**, a core Python developer: "As 3.4beta1 happens
  tonight, this isn't going to make 3.4 so i'm bumping this to 3.5. I
  really like the proposed design outlined above." [`source5 `_]

* **Guido van Rossum** on the possibility of adding scandir to Python
  3.5 (as it was too late for 3.4): "The ship has likewise sailed for
  adding scandir() (whether to os or pathlib). By all means experiment
  and get it ready for consideration for 3.5, but I don't want to add it
  to 3.4." [`source6 `_]

Support for this PEP itself (meta-support?) was given by Nick Coghlan on
python-dev: "A PEP reviewing all this for 3.5 and proposing a specific
os.scandir API would be a good thing." [`source7 `_]

Use in the wild
===============

To date, ``scandir`` is definitely useful, but has been clearly marked
"beta", so it's uncertain how much use of it there is in the wild. Ben
Hoyt has had several reports from people using it. For example:

* Chris F: "I am processing some pretty large directories and was half
  expecting to have to modify getdents. So thanks for saving me the
  effort." [via personal email]

* bschollnick: "I wanted to let you know about this, since I am using
  Scandir as a building block for this code.
  Here's a good example of scandir making a radical performance
  improvement over os.listdir." [`source8 `_]

* Avram L: "I'm testing our scandir for a project I'm working on. Seems
  pretty solid, so first thing, just want to say nice work!" [via
  personal email]

Others have `requested a PyPI package`_ for it, which has been created.
See `PyPI package`_.

.. _`requested a PyPI package`: https://github.com/benhoyt/scandir/issues/12
.. _`PyPI package`: https://pypi.python.org/pypi/scandir

GitHub stats don't mean too much, but scandir does have several
watchers, issues, forks, etc. Here's the run-down of the stats as of
June 5, 2014:

* Watchers: 17
* Stars: 48
* Forks: 15
* Issues: 2 open, 19 closed

**However, the much larger point is this:** if this PEP is accepted,
``os.walk()`` can easily be reimplemented using ``scandir`` rather than
``listdir`` and ``stat``, increasing the speed of ``os.walk()`` very
significantly. There are thousands of developers, scripts, and
production code that would benefit from this large speedup of
``os.walk()``. For example, on GitHub, there are almost as many uses of
``os.walk`` (194,000) as there are of ``os.mkdir`` (230,000).

Open issues and optional things
===============================

There are a few open issues or optional additions:

Should scandir be in its own module?
------------------------------------

Should the function be included in the standard library in a new module,
``scandir.scandir()``, or just as ``os.scandir()`` as discussed? The
preference of this PEP's author (Ben Hoyt) would be ``os.scandir()``, as
it's just a single function.

Should there be a way to access the full path?
----------------------------------------------

Should ``DirEntry``'s have a way to get the full path without using
``os.path.join(path, entry.name)``? This is a pretty common pattern, and
it may be useful to add pathlib-like ``str(entry)`` functionality. This
functionality has also been requested in `issue 13`_ on GitHub.
.. _`issue 13`: https://github.com/benhoyt/scandir/issues/13

Should it expose Windows wildcard functionality?
------------------------------------------------

Should ``scandir()`` have a way of exposing the wildcard functionality
in the Windows ``FindFirstFile`` / ``FindNextFile`` functions? The
scandir module on GitHub exposes this as a ``windows_wildcard`` keyword
argument, allowing Windows power users the option to pass a custom
wildcard to ``FindFirstFile``, which may avoid the need to use
``fnmatch`` or similar on the resulting names. It is named the unwieldy
``windows_wildcard`` to remind you you're writing power-user,
Windows-only code if you use it.

This boils down to whether ``scandir`` should be about exposing all of
the system's directory iteration features, or simply providing a fast,
simple, cross-platform directory iteration API. This PEP's author votes
for not including ``windows_wildcard`` in the standard library version,
because even though it could be useful in rare cases (say the Windows
Dropbox client?), it'd be too easy to use it just because you're a
Windows developer, and create code that is not cross-platform.

Possible improvements
=====================

There are many possible improvements one could make to scandir, but here
is a short list of some this PEP's author has in mind:

* scandir could potentially be further sped up by calling ``readdir`` /
  ``FindNextFile`` say 50 times per ``Py_BEGIN_ALLOW_THREADS`` block so
  that it stays in the C extension module for longer, and may be
  somewhat faster as a result. This approach hasn't been tested, but was
  suggested on Issue 11406 by Antoine Pitrou.
  [`source9 `_]

Previous discussion
===================

* `Original thread Ben Hoyt started on python-ideas`_ about speeding up
  ``os.walk()``

* Python `Issue 11406`_, which includes the original proposal for a
  scandir-like function

* `Further thread Ben Hoyt started on python-dev`_ that refined the
  ``scandir()`` API, including Nick Coghlan's suggestion of scandir
  yielding ``DirEntry``-like objects

* `Final thread Ben Hoyt started on python-dev`_ to discuss the
  interaction between scandir and the new ``pathlib`` module

* `Question on StackOverflow`_ about why ``os.walk()`` is slow and
  pointers on how to fix it (this inspired the author of this PEP early
  on)

* `BetterWalk`_, this PEP's author's previous attempt at this, on which
  the scandir code is based

.. _`Original thread Ben Hoyt started on python-ideas`: https://mail.python.org/pipermail/python-ideas/2012-November/017770.html
.. _`Further thread Ben Hoyt started on python-dev`: https://mail.python.org/pipermail/python-dev/2013-May/126119.html
.. _`Final thread Ben Hoyt started on python-dev`: https://mail.python.org/pipermail/python-dev/2013-November/130572.html
.. _`Question on StackOverflow`: http://stackoverflow.com/questions/2485719/very-quickly-getting-total-size-of-folder
.. _`BetterWalk`: https://github.com/benhoyt/betterwalk

Copyright
=========

This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From python at mrabarnett.plus.com  Fri Jun 27 01:28:20 2014
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 27 Jun 2014 00:28:20 +0100
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and
 faster directory iterator
In-Reply-To: References: Message-ID: <53ACAC94.1050206@mrabarnett.plus.com>

On 2014-06-26 23:59, Ben Hoyt wrote:
> Hi Python dev folks,
>
> I've written a PEP proposing a specific os.scandir() API for a
> directory iterator that returns the stat-like info from the OS, the
> main advantage of which is to speed up os.walk() and similar
> operations between 4-20x, depending on your OS and file system. Full
> details, background info, and context links are in the PEP, which
> Victor Stinner has uploaded at the following URL, and I've also
> copied inline below.
>
> http://legacy.python.org/dev/peps/pep-0471/
>
> Would love feedback on the PEP, but also of course on the proposal
> itself.
>
[snip]
Personally, I'd prefer the name 'iterdir' because it emphasises that
it's an iterator.

From timothy.c.delaney at gmail.com  Fri Jun 27 01:36:28 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Fri, 27 Jun 2014 09:36:28 +1000
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and
 faster directory iterator
In-Reply-To: <53ACAC94.1050206@mrabarnett.plus.com>
References: <53ACAC94.1050206@mrabarnett.plus.com>
Message-ID:

On 27 June 2014 09:28, MRAB wrote:

> Personally, I'd prefer the name 'iterdir' because it emphasises that
> it's an iterator.

Exactly what I was going to post (with the added note that there's an
obvious symmetry with listdir).
+1 for iterdir rather than scandir

Other than that:

+1 for adding scandir to the stdlib
-1 for windows_wildcard (it would be an attractive nuisance to write
   windows-only code)

Tim Delaney

From pmiscml at gmail.com  Fri Jun 27 02:07:46 2014
From: pmiscml at gmail.com (Paul Sokolovsky)
Date: Fri, 27 Jun 2014 03:07:46 +0300
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and
 faster directory iterator
In-Reply-To: References: Message-ID: <20140627030746.15641d7e@x34f>

Hello,

On Thu, 26 Jun 2014 18:59:45 -0400
Ben Hoyt wrote:

> Hi Python dev folks,
>
> I've written a PEP proposing a specific os.scandir() API for a
> directory iterator that returns the stat-like info from the OS, the
> main advantage of which is to speed up os.walk() and similar
> operations between 4-20x, depending on your OS and file system. Full
> details, background info, and context links are in the PEP, which
> Victor Stinner has uploaded at the following URL, and I've also copied
> inline below.

I noticed the obvious inefficiency of os.walk() implemented in terms of
os.listdir() when I worked on the "os" module for MicroPython. I
essentially did what your PEP suggests - introduced an internal
generator function (ilistdir_ex() in
https://github.com/micropython/micropython-lib/blob/master/os/os/__init__.py#L85
), in terms of which both os.listdir() and os.walk() are implemented.

With my MicroPython hat on, os.scandir() would make things only worse.
With the current interface, one can either have an inefficient
implementation (like CPython chose) or an efficient implementation (like
MicroPython chose) - all transparently. os.scandir() supposedly opens up
an efficient implementation for everyone, but at the price of bloating
the API and introducing heavy-weight objects to wrap the info. The PEP
calls them "lightweight DirEntry objects", but that cannot be true,
because all Python objects are heavy-weight, especially those which have
methods.
It would be better if os.scandir() was specified to return a struct
(named tuple) compatible with the return value of os.stat() (with only
the fields relevant to the underlying readdir()-like system call). The
grounds for that are obvious: it's an already existing data interface in
module "os", which is also based on an open standard for operating
systems - POSIX - so if one is to expect something about file
attributes, it's what one can reasonably base expectations on.

But reusing the os.stat struct is glaringly not what's proposed. And
it's clear where that comes from - "[DirEntry.]lstat(): like os.lstat(),
but requires no system calls on Windows". Nice, but OS "FooBar" can do
much more than Windows - it has a system call to send a file by email,
right when scanning a directory containing it. So, why not have a
DirEntry.send_by_email(recipient) method? I hear the answer - it's
because CPython strives to support Windows well, while it doesn't care
about the "FooBar" OS.

And then it again leads to the question I posed several times - where's
the line between "CPython" and "Python"? Is it grounded for CPython to
add to (or remove from) the Python stdlib something which is useful for
its users, but useless or complicating for other Python implementations?
Especially taking into account that there's the "win32api" module
allowing Windows users to use all the wonders of its API? Especially
that the os.stat struct is itself pretty extensible
(https://docs.python.org/3.4/library/os.html#os.stat : "On other Unix
systems (such as FreeBSD), the following attributes may be available
...", "On Mac OS systems...") - so extra fields can be added for Windows
just the same, if really needed.

> > http://legacy.python.org/dev/peps/pep-0471/
> >
> > Would love feedback on the PEP, but also of course on the proposal
> > -Ben > [] -- Best regards, Paul mailto:pmiscml at gmail.com From ethan at stoneleaf.us Fri Jun 27 01:43:43 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 26 Jun 2014 16:43:43 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53ACAC94.1050206@mrabarnett.plus.com> Message-ID: <53ACB02F.4020402@stoneleaf.us> On 06/26/2014 04:36 PM, Tim Delaney wrote: > On 27 June 2014 09:28, MRAB wrote: >> >> Personally, I'd prefer the name 'iterdir' because it emphasises that >> it's an iterator. > > Exactly what I was going to post (with the added note that thee's an obvious symmetry with listdir). > > +1 for iterdir rather than scandir > > Other than that: > > +1 for adding [it] to the stdlib +1 for all of above -- ~Ethan~ From benjamin at python.org Fri Jun 27 02:35:21 2014 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 26 Jun 2014 17:35:21 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <20140627030746.15641d7e@x34f> References: <20140627030746.15641d7e@x34f> Message-ID: <1403829321.29631.135045201.6BD5CF6A@webmail.messagingengine.com> On Thu, Jun 26, 2014, at 17:07, Paul Sokolovsky wrote: > > With my MicroPython hat on, os.scandir() would make things only worse. > With current interface, one can either have inefficient implementation > (like CPython chose) or efficient implementation (like MicroPython > chose) - all transparently. os.scandir() supposedly opens up efficient > implementation for everyone, but at the price of bloating API and > introducing heavy-weight objects to wrap info. PEP calls it > "lightweight DirEntry objects", but that cannot be true, because all > Python objects are heavy-weight, especially those which have methods. Why do you think methods make an object more heavyweight? namedtuples have methods. 
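Paul's alternative, yielding names plus an os.stat-compatible partial
struct, can be sketched concretely. The names ilistdir and DirStat below
are hypothetical, loosely modeled on the MicroPython helper he linked;
it is built here on the os.scandir() that shipped in Python 3.5, and
symlinks are lumped in with regular files for brevity:

```python
import os
import stat as stat_mod
import tempfile
from collections import namedtuple

# Hypothetical partial-stat record reusing os.stat_result field names,
# holding only what a readdir()-style scan actually returns.
DirStat = namedtuple("DirStat", "st_mode st_ino")

def ilistdir(path="."):
    """Yield (name, DirStat) pairs -- a sketch of the tuple-based API
    argued for above. Symlinks are treated as regular files here."""
    for entry in os.scandir(path):
        mode = stat_mod.S_IFDIR if entry.is_dir() else stat_mod.S_IFREG
        yield entry.name, DirStat(st_mode=mode, st_ino=entry.inode())

# Invented layout for the demonstration.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
open(os.path.join(root, "file.txt"), "w").close()

info = dict(ilistdir(root))
print(stat_mod.S_ISDIR(info["sub"].st_mode))       # True
print(stat_mod.S_ISREG(info["file.txt"].st_mode))  # True
```

Consumers then use the familiar stat module helpers (S_ISDIR and
friends) or plain numeric indexing, with no new object type to learn.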
From pmiscml at gmail.com  Fri Jun 27 02:47:08 2014
From: pmiscml at gmail.com (Paul Sokolovsky)
Date: Fri, 27 Jun 2014 03:47:08 +0300
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and
 faster directory iterator
In-Reply-To: <1403829321.29631.135045201.6BD5CF6A@webmail.messagingengine.com>
References: <20140627030746.15641d7e@x34f>
 <1403829321.29631.135045201.6BD5CF6A@webmail.messagingengine.com>
Message-ID: <20140627034708.04dc58f1@x34f>

Hello,

On Thu, 26 Jun 2014 17:35:21 -0700
Benjamin Peterson wrote:

> On Thu, Jun 26, 2014, at 17:07, Paul Sokolovsky wrote:
> > [snip]
>
> Why do you think methods make an object more heavyweight?

Because you need to call them. And if the only thing they do is return
an object field, the call overhead is rather noticeable.

> namedtuples have methods.

Yes, unfortunately. But fortunately, a named tuple is a subclass of
tuple, so a user who cares about efficiency can just use numeric
indexing, which has existed for os.stat values all the time, blissfully
ignoring the cruft which has been accumulating there since Python 1.5
times.
-- Best regards, Paul mailto:pmiscml at gmail.com From rymg19 at gmail.com Fri Jun 27 03:01:18 2014 From: rymg19 at gmail.com (Ryan) Date: Thu, 26 Jun 2014 20:01:18 -0500 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53ACAC94.1050206@mrabarnett.plus.com> Message-ID: <00665839-bca9-4a5d-ae09-b75b6a2abb0e@email.android.com> +1 for scandir. -1 for iterdir (scandir sounds fancier). -99999999 for windows_wildcard. Tim Delaney wrote: >On 27 June 2014 09:28, MRAB wrote: > >> Personally, I'd prefer the name 'iterdir' because it emphasises that >> it's an iterator. > > >Exactly what I was going to post (with the added note that there's an >obvious symmetry with listdir). > >+1 for iterdir rather than scandir > >Other than that: > >+1 for adding scandir to the stdlib >-1 for windows_wildcard (it would be an attractive nuisance to write >windows-only code) > >Tim Delaney > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >https://mail.python.org/mailman/listinfo/python-dev >Unsubscribe: >https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From benhoyt at gmail.com Fri Jun 27 03:37:50 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 26 Jun 2014 21:37:50 -0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53ACB02F.4020402@stoneleaf.us> References: <53ACAC94.1050206@mrabarnett.plus.com> <53ACB02F.4020402@stoneleaf.us> Message-ID: I don't mind iterdir() and would take it :-), but I'll just say why I chose the name scandir() -- though it wasn't my suggestion originally: iterdir() sounds like just an iterator version of listdir(), kinda like keys() and iterkeys() in Python 2. Whereas in actual fact the return values are quite different (DirEntry objects vs strings), and so the name change reflects that difference a little. I'm also -1 on windows_wildcard. I think it's asking for trouble, and wouldn't gain much on Windows in most cases anyway. -Ben On Thu, Jun 26, 2014 at 7:43 PM, Ethan Furman wrote: > On 06/26/2014 04:36 PM, Tim Delaney wrote: >> >> On 27 June 2014 09:28, MRAB wrote: >>> >>> >>> Personally, I'd prefer the name 'iterdir' because it emphasises that >>> it's an iterator. >> >> >> Exactly what I was going to post (with the added note that there's an >> obvious symmetry with listdir). 
>> >> +1 for iterdir rather than scandir >> >> Other than that: >> >> +1 for adding [it] to the stdlib > > > +1 for all of above > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/benhoyt%40gmail.com From python at mrabarnett.plus.com Fri Jun 27 03:50:38 2014 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 27 Jun 2014 02:50:38 +0100 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53ACAC94.1050206@mrabarnett.plus.com> <53ACB02F.4020402@stoneleaf.us> Message-ID: <53ACCDEE.9070906@mrabarnett.plus.com> On 2014-06-27 02:37, Ben Hoyt wrote: > I don't mind iterdir() and would take it :-), but I'll just say why I > chose the name scandir() -- though it wasn't my suggestion originally: > > iterdir() sounds like just an iterator version of listdir(), kinda > like keys() and iterkeys() in Python 2. Whereas in actual fact the > return values are quite different (DirEntry objects vs strings), and > so the name change reflects that difference a little. > [snip] The re module has 'findall', which returns a list of strings, and 'finditer', which returns an iterator that yields match objects, so there's a precedent. :-) From benhoyt at gmail.com Fri Jun 27 03:52:43 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 26 Jun 2014 21:52:43 -0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <20140627030746.15641d7e@x34f> References: <20140627030746.15641d7e@x34f> Message-ID: > os.listdir() when I worked on "os" module for MicroPython. 
I essentially > did what your PEP suggests - introduced internal generator function > (ilistdir_ex() in > https://github.com/micropython/micropython-lib/blob/master/os/os/__init__.py#L85 > ), in terms of which both os.listdir() and os.walk() are implemented. Nice (though I see the implementation is very *nix specific). > With my MicroPython hat on, os.scandir() would make things only worse. > With current interface, one can either have inefficient implementation > (like CPython chose) or efficient implementation (like MicroPython > chose) - all transparently. os.scandir() supposedly opens up efficient > implementation for everyone, but at the price of bloating API and > introducing heavy-weight objects to wrap info. PEP calls it > "lightweight DirEntry objects", but that cannot be true, because all > Python objects are heavy-weight, especially those which have methods. It's a fair point that os.walk() can be implemented efficiently without adding a new function and API. However, often you'll want more info, like the file size, which scandir() can give you via DirEntry.lstat(), which is free on Windows. So opening up this efficient API is beneficial. In CPython, I think the DirEntry objects are as lightweight as stat_result objects. I'm an embedded developer by background, so I know the constraints here, but I really don't think Python's development should be tailored to fit MicroPython. If os.scandir() is not very efficient on MicroPython, so be it -- 99% of all desktop/server users will gain from it. > It would be better if os.scandir() was specified to return a struct > (named tuple) compatible with return value of os.stat() (with only > fields relevant to underlying readdir()-like system call). The grounds > for that are obvious: it's already existing data interface in module > "os", which is also based on open standard for operating systems - > POSIX, so if one is to expect something about file attributes, it's > what one can reasonably base expectations on. 
Yes, we considered this early on (see the python-ideas and python-dev threads referenced in the PEP), but decided it wasn't a great API to overload stat_result further, and have most of the attributes None or not present on Linux. > Especially that os.stat struct is itself pretty extensible > (https://docs.python.org/3.4/library/os.html#os.stat : "On other Unix > systems (such as FreeBSD), the following attributes may be > available ...", "On Mac OS systems...", - so extra fields can be added > for Windows just the same, if really needed). Yes. Incidentally, I just submitted an (accepted) patch for Python 3.5 that adds the full Win32 file attribute data to stat_result objects on Windows (see https://docs.python.org/3.5/whatsnew/3.5.html#os). However, for scandir() to be useful, you also need the name. My original version of this directory iterator returned two-tuples of (name, stat_result). But most people didn't like the API, and I don't really either. You could overload stat_result with a .name attribute in this case, but it still isn't a nice API to have most of the attributes None, and then you have to test for that, etc. So basically we tweaked the API to do what was best, and ended up with it returning DirEntry objects with is_file() and similar methods. Hope that helps give a bit more context. If you haven't read the relevant python-ideas and python-dev threads, those are interesting too. -Ben From greg at krypto.org Fri Jun 27 04:04:16 2014 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 26 Jun 2014 19:04:16 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53ACAC94.1050206@mrabarnett.plus.com> <53ACB02F.4020402@stoneleaf.us> Message-ID: +1 on getting this in for 3.5. If the only objection people are having is the stupid paint color of the name I don't care what it's called! scandir matches the libc API of the same name. iterdir also makes sense to anyone reading it. 
Whoever checks this in can pick one and be done with it. We have other Python APIs with iter in the name and tend not to be trying to mirror C so much these days, so the iterdir folks do have a valid point. I'm not a huge fan of the DirEntry object and the method calls on it instead of simply yielding tuples of (filename, partially_filled_in_stat_result) but I don't *really* care which is used as they both work fine and it is trivial to wrap with another generator expression to turn it into exactly what you want anyways. Python not having the ability to operate on large directories means Python simply cannot be used for common system maintenance tasks. Python being slow to walk a file system due to unnecessary stat calls (often each an entire io op. requiring a disk seek!) due to the existing information that it throws away not being used via listdir is similarly a problem. This addresses both. IMNSHO, it is a single function, it belongs in the os module right next to listdir. -gps On Thu, Jun 26, 2014 at 6:37 PM, Ben Hoyt wrote: > I don't mind iterdir() and would take it :-), but I'll just say why I > chose the name scandir() -- though it wasn't my suggestion originally: > > iterdir() sounds like just an iterator version of listdir(), kinda > like keys() and iterkeys() in Python 2. Whereas in actual fact the > return values are quite different (DirEntry objects vs strings), and > so the name change reflects that difference a little. > > I'm also -1 on windows_wildcard. I think it's asking for trouble, and > wouldn't gain much on Windows in most cases anyway. > > -Ben > > On Thu, Jun 26, 2014 at 7:43 PM, Ethan Furman wrote: > > On 06/26/2014 04:36 PM, Tim Delaney wrote: > >> > >> On 27 June 2014 09:28, MRAB wrote: > >>> > >>> > >>> Personally, I'd prefer the name 'iterdir' because it emphasises that > >>> it's an iterator. > >> > >> > >> Exactly what I was going to post (with the added note that there's an > >> obvious symmetry with listdir). 
> >> > >> +1 for iterdir rather than scandir > >> > >> Other than that: > >> > >> +1 for adding [it] to the stdlib > > > > > > +1 for all of above > > > > -- > > ~Ethan~ > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > > https://mail.python.org/mailman/options/python-dev/benhoyt%40gmail.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Jun 27 04:08:41 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 27 Jun 2014 12:08:41 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <20140627030746.15641d7e@x34f> References: <20140627030746.15641d7e@x34f> Message-ID: <20140627020841.GD13014@ando> On Fri, Jun 27, 2014 at 03:07:46AM +0300, Paul Sokolovsky wrote: > With my MicroPython hat on, os.scandir() would make things only worse. > With current interface, one can either have inefficient implementation > (like CPython chose) or efficient implementation (like MicroPython > chose) - all transparently. os.scandir() supposedly opens up efficient > implementation for everyone, but at the price of bloating API and > introducing heavy-weight objects to wrap info. os.scandir is not part of the Python API, it is not a built-in function. It is part of the CPython standard library. That means (in my opinion) that there is an expectation that other Pythons should provide it, but not an absolute requirement. Especially for the os module, which by definition is platform-specific. In my opinion that means you have four options: 1. 
provide os.scandir, with exactly the same semantics as on CPython; 2. provide os.scandir, but change its semantics to be more lightweight (e.g. return an ordinary tuple, as you already suggest); 3. don't provide os.scandir at all; or 4. do something different depending on whether the platform is Linux or an embedded system. I would consider any of those acceptable for a library feature, but not for a language feature. [...] > But reusing os.stat struct is glaringly not what's proposed. And > it's clear where that comes from - "[DirEntry.]lstat(): like os.lstat(), > but requires no system calls on Windows". Nice, but OS "FooBar" can do > much more than Windows - it has a system call to send a file by email, > right when scanning a directory containing it. So, why not have a > DirEntry.send_by_email(recipient) method? I hear the answer - it's > because CPython strives to support Windows well, while it doesn't care > about "FooBar" OS. Correct. If there is sufficient demand for FooBar, then CPython may support it. Until then, FooBarPython can support it, and offer whatever platform-specific features are needed within its standard library. > And then it again leads to the question I posed several times - where's > the line between "CPython" and "Python"? Is it grounded for CPython to add > (or remove) to Python stdlib something which is useful for its users, > but useless or complicating for other Python implementations? I think so. And other implementations are free to do the same thing. Of course there is an expectation that the standard library of most implementations will be broadly similar, but not that they will be identical. 
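[Editor's aside: code that wants to run unchanged across implementations can treat the proposed function as optional and fall back to listdir(). A minimal sketch — `iter_names` is a hypothetical helper, not part of any stdlib:]

```python
import os

def iter_names(path="."):
    """Yield entry names, using os.scandir() where the platform offers it.

    Falls back to os.listdir() on implementations that don't provide
    scandir (the function is only proposed for CPython 3.5 here).
    """
    scandir = getattr(os, "scandir", None)
    if scandir is not None:
        # scandir yields DirEntry objects; we extract just the names
        for entry in scandir(path):
            yield entry.name
    else:
        # listdir returns plain name strings directly
        for name in os.listdir(path):
            yield name
```

Either branch yields the same name strings, so callers are insulated from which option (1-4 above) an implementation picked.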
I am surprised that both Jython and IronPython offer a non-functioning dis module: you can import it successfully, but if there's a way to actually use it, I haven't found it: steve at orac:~$ jython Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19) [OpenJDK Server VM (Sun Microsystems Inc.)] on java1.6.0_27 Type "help", "copyright", "credits" or "license" for more information. >>> import dis >>> dis.dis(lambda x: x+1) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/share/jython/Lib/dis.py", line 42, in dis disassemble(x) File "/usr/share/jython/Lib/dis.py", line 64, in disassemble linestarts = dict(findlinestarts(co)) File "/usr/share/jython/Lib/dis.py", line 183, in findlinestarts byte_increments = [ord(c) for c in code.co_lnotab[0::2]] AttributeError: 'tablecode' object has no attribute 'co_lnotab' IronPython gives a different exception: steve at orac:~$ ipy IronPython 2.6 Beta 2 DEBUG (2.6.0.20) on .NET 2.0.50727.1433 Type "help", "copyright", "credits" or "license" for more information. >>> import dis >>> dis.dis(lambda x: x+1) Traceback (most recent call last): TypeError: don't know how to disassemble code objects It's quite annoying, I would rather they had just removed the module altogether. Better still would have been to disassemble code objects to whatever byte code the Java and .Net platforms use. But there's surely no requirement to disassemble to CPython byte code! 
-- Steven From steve at pearwood.info Fri Jun 27 04:21:15 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 27 Jun 2014 12:21:15 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53ACAC94.1050206@mrabarnett.plus.com> <53ACB02F.4020402@stoneleaf.us> Message-ID: <20140627022115.GE13014@ando> On Thu, Jun 26, 2014 at 09:37:50PM -0400, Ben Hoyt wrote: > I don't mind iterdir() and would take it :-), but I'll just say why I > chose the name scandir() -- though it wasn't my suggestion originally: > > iterdir() sounds like just an iterator version of listdir(), kinda > like keys() and iterkeys() in Python 2. Whereas in actual fact the > return values are quite different (DirEntry objects vs strings), and > so the name change reflects that difference a little. +1 I think that's a good objective reason to prefer scandir, which suits me, because my subjective opinion is that "iterdir" is an inelegant and less than attractive name. -- Steven From v+python at g.nevcal.com Fri Jun 27 04:43:34 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 26 Jun 2014 19:43:34 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: <53ACDA56.9050803@g.nevcal.com> I'm generally +1, with opinions noted below on these two topics. On 6/26/2014 3:59 PM, Ben Hoyt wrote: > Should there be a way to access the full path? > ---------------------------------------------- > > Should ``DirEntry``'s have a way to get the full path without using > ``os.path.join(path, entry.name)``? This is a pretty common pattern, > and it may be useful to add pathlib-like ``str(entry)`` functionality. > This functionality has also been requested in `issue 13`_ on GitHub. > > .. _`issue 13`:https://github.com/benhoyt/scandir/issues/13 +1 > Should it expose Windows wildcard functionality? 
> ------------------------------------------------ > > Should ``scandir()`` have a way of exposing the wildcard functionality > in the Windows ``FindFirstFile`` / ``FindNextFile`` functions? The > scandir module on GitHub exposes this as a ``windows_wildcard`` > keyword argument, allowing Windows power users the option to pass a > custom wildcard to ``FindFirstFile``, which may avoid the need to use > ``fnmatch`` or similar on the resulting names. It is named the > unwieldy ``windows_wildcard`` to remind you you're writing > power-user, Windows-only code if you use it. > > This boils down to whether ``scandir`` should be about exposing all of > the system's directory iteration features, or simply providing a fast, > simple, cross-platform directory iteration API. > > This PEP's author votes for not including ``windows_wildcard`` in the > standard library version, because even though it could be useful in > rare cases (say the Windows Dropbox client?), it'd be too easy to use > it just because you're a Windows developer, and create code that is > not cross-platform. Because another common pattern is to check whether a name matches a pattern, I think it would be good to have a feature that provides such. I do that in my own private directory listing extensions, and some command-line tools also expose it to the user. Where exposed to the user, I use -p windows-pattern and -P regexp. My implementation converts the windows-pattern to a regexp, and then uses common code, but for this particular API, because the windows_wildcard can be optimized by the Windows API call used, it would make more sense to pass windows_wildcard directly to FindFirst on Windows, but on *nix convert it to a regexp. Both Windows and *nix would call re to process pattern matches except for the case on Windows of having a Windows pattern passed in. The alternate parameter could simply be called wildcard, and would be a regexp. If desired, other flavors of wildcard bsd_wildcard? 
could also be implemented, but I'm not sure there are any benefits to them, as there are, as far as I am aware, no optimizations for those patterns in those systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Fri Jun 27 08:47:21 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 27 Jun 2014 07:47:21 +0100 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: On 26 June 2014 23:59, Ben Hoyt wrote: > Would love feedback on the PEP, but also of course on the proposal itself. A solid +1 from me. Some specific points: - I'm in favour of it being in the os module. It's more discoverable there, as well as the other reasons mentioned. - I prefer scandir as the name, for the reason you gave (the output isn't the same as an iterator version of listdir) - I'm mildly against windows_wildcard (even though I'm a windows user) - You mention the caching behaviour of DirEntry objects. The limitations should be clearly covered in the final docs, as it's the sort of thing people will get wrong otherwise. Paul From bkabrda at redhat.com Fri Jun 27 09:07:23 2014 From: bkabrda at redhat.com (Bohuslav Kabrda) Date: Fri, 27 Jun 2014 03:07:23 -0400 (EDT) Subject: [Python-Dev] Binary CPython distribution for Linux In-Reply-To: <53AC679B.1000408@gmail.com> References: <53AC679B.1000408@gmail.com> Message-ID: <1792609645.45624621.1403852843057.JavaMail.zimbra@redhat.com> ----- Original Message ----- > While much of the opposition to dropping Python <2.7 stems from the RHEL > community (they still have 2.4 in extended support and 2.7 wasn't in a > release until a few weeks ago), a common objection from the users is "I > can't install a different Python" or "it's too difficult to install a > different Python." 
The former is a legit complaint - if you are on > shared hosting and don't have root, as easy as it is to add an alternate > package repository that provides 2.7 (or newer), you don't have the > permissions so you can't do it. It's not true that 2.7 wasn't released until few weeks ago. It was released few weeks ago as part of RHEL 7, but Red Hat has been shipping Red Hat Software Collections (RHSCL) 1.0, that contain Python 2.7 and Python 3.3, for almost a year now [1] - RHSCL is installable on RHEL 6; RHSCL 1.1 (also with 2.7 and 3.3) has been released few weeks ago and is supported on RHEL 6 and 7. Also, these collections now have their community rebuilds at [2], so you can just download them without needing to talk to Red Hat at all. But yeah, these are all RPMs, so you have to be root to install them. > I'd like to propose a solution to this problem: a pre-built distribution > of CPython for Linux available via www.python.org in the list of > downloads for a particular release [5]. This distribution could be > downloaded and unarchived into the user's home directory and users could > start running it immediately by setting an environment variable or two, > creating a symlink, or even running a basic installer script. This would > hopefully remove the hurdles of obtaining a (sane) Python distribution > on Linux. This would allow projects to more easily drop end-of-life > Python versions and would speed adoption of modern Python, including > Python 3 (because porting is much easier if you only have to target 2.7). > > I understand there may be technical challenges with doing this for some > distributions and with producing a universal binary distribution. I > would settle for a binary distribution that was targeted towards RHEL > users and variant distros, as that is the user population that I > perceive to be the most conservative and responsible for holding modern > Python adoption back. 
Speaking with my Fedora/RHEL/RHSCL Python maintainer's hat on, prebuilding Python is not as easy a task as it may seem :) Someone has to write the build scripts (e.g. a sort of specfile, but rpm/specfile wouldn't really work for you, since you want to install in users' home dirs). Someone has to update them when a new Python comes out, so in the worst case you end up with slightly different build scripts for different versions of Python. Someone has to do rebuilds when there is a CVE. Or a bug. Or a user requests a feature that makes sense. Someone has to do that for *each packaged version* - and each packaged version needs to be maintained for some amount of time so that it all actually makes sense. Maintaining a prebuilt distribution of Python is a time-consuming task even if you do it just for one Linux distro. If you want to maintain a *universal* prebuilt Python distribution, then you'll find out that it's a) undoable, b) consumes so many resources and is so fragile that it's probably not worth it. You could just bundle all Python dependencies into your distribution to make it "easier", but that would just make the result grow in size (perhaps significantly) and you would then also need to update/bugfix/securityfix the bundled dependencies (which would consume even more time). Please don't take this as a criticism of your ideas, I see what you're trying to solve. I just think the way you're trying to solve it is unachievable or would consume so many community resources that it would end up unmaintained and buggy most of the time. 
[1] http://developerblog.redhat.com/2013/09/12/rhscl1-ga/ [2] https://www.softwarecollections.org/en/scls/ From victor.stinner at gmail.com Fri Jun 27 09:44:17 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 27 Jun 2014 09:44:17 +0200 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: Hi, You wrote a great PEP Ben, thanks :-) But it's now time for comments! > But the underlying system calls -- ``FindFirstFile`` / > ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X -- What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide readdir? You should add a link to FindFirstFile doc: http://msdn.microsoft.com/en-us/library/windows/desktop/aa364418%28v=vs.85%29.aspx It looks like WIN32_FIND_DATA has a dwFileAttributes field. So we should mimic stat_result's recent addition: the new stat_result.file_attributes field. Add DirEntry.file_attributes which would only be available on Windows. The Windows structure also contains FILETIME ftCreationTime; FILETIME ftLastAccessTime; FILETIME ftLastWriteTime; DWORD nFileSizeHigh; DWORD nFileSizeLow; It would be nice to expose them as well. I'm no longer surprised that the exact API is different depending on the OS for functions of the os module. > * Instead of bare filename strings, it returns lightweight > ``DirEntry`` objects that hold the filename string and provide > simple methods that allow access to the stat-like data the operating > system returned. Does your implementation use a free list to avoid the cost of memory allocation? A short free list of 10 or maybe just 1 may help. The free list may be stored directly in the generator object. > ``scandir()`` yields a ``DirEntry`` object for each file and directory > in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'`` > pseudo-directories are skipped, and the entries are yielded in > system-dependent order. 
Each ``DirEntry`` object has the following > attributes and methods: Does it also support bytes filenames on UNIX? Python now supports undecodable filenames thanks to PEP 383 (surrogateescape). I prefer to use the same type for filenames on Linux and Windows, so Unicode is better. But some users might prefer bytes for other reasons. > The ``DirEntry`` attribute and method names were chosen to be the same > as those in the new ``pathlib`` module for consistency. Great! That's exactly what I expected :-) Consistency with other modules. > Notes on caching > ---------------- > > The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute > is obviously always cached, and the ``is_X`` and ``lstat`` methods > cache their values (immediately on Windows via ``FindNextFile``, and > on first use on Linux / OS X via a ``stat`` call) and never refetch > from the system. > > For this reason, ``DirEntry`` objects are intended to be used and > thrown away after iteration, not stored in long-lived data structures > and the methods called again and again. > > If a user wants to do that (for example, for watching a file's size > change), they'll need to call the regular ``os.lstat()`` or > ``os.path.getsize()`` functions which force a new system call each > time. Crazy idea: would it be possible to "convert" a DirEntry object to a pathlib.Path object without losing the cache? I guess that pathlib.Path expects a full stat_result object. 
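[Editor's aside: the caching behaviour the PEP describes can be sketched as a tiny pure-Python model. `CachedEntry` is a hypothetical toy class, not the PEP's actual implementation: the lstat result is fetched from the OS once, on first use, and every later call answers from the cache.]

```python
import os

class CachedEntry:
    """Toy model of DirEntry caching: ``name`` is always available,
    ``lstat()`` hits the OS at most once and then reuses the result."""

    def __init__(self, dirpath, name):
        self.name = name
        self._path = os.path.join(dirpath, name)
        self._lstat = None  # filled in lazily on first use

    def lstat(self):
        if self._lstat is None:     # first call: one os.lstat() system call
            self._lstat = os.lstat(self._path)
        return self._lstat          # later calls: cached, possibly stale

# Demonstrate on the current directory, if it has any entries.
names = os.listdir(".")
if names:
    entry = CachedEntry(".", names[0])
    first = entry.lstat().st_size   # system call happens here
    second = entry.lstat().st_size  # served from the cache
    assert first == second
```

The flip side, as the quoted "Notes on caching" text says, is staleness: if the file changes after the first `lstat()`, the cached value does not reflect it.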
> Or, for getting the total size of files in a directory tree -- showing > use of the ``DirEntry.lstat()`` method:: > > def get_tree_size(path): > """Return total size of files in path and subdirs.""" > size = 0 > for entry in scandir(path): > if entry.is_dir(): > sub_path = os.path.join(path, entry.name) > size += get_tree_size(sub_path) > else: > size += entry.lstat().st_size > return size > > Note that ``get_tree_size()`` will get a huge speed boost on Windows, > because no extra stat calls are needed, but on Linux and OS X the size > information is not returned by the directory iteration functions, so > this function won't gain anything there. I don't understand how you can build a full lstat() result without really calling stat. I see that WIN32_FIND_DATA contains the size, but here you call lstat(). If you know that it's not a symlink, you already know the size, but you still have to call stat() to retrieve all fields required to build a stat_result, no? > Support > ======= > > The scandir module on GitHub has been forked and used quite a bit (see > "Use in the wild" in this PEP), Do you plan to continue to maintain your module for Python < 3.5, but upgrade your module for the final PEP? > Should scandir be in its own module? > ------------------------------------ > > Should the function be included in the standard library in a new > module, ``scandir.scandir()``, or just as ``os.scandir()`` as > discussed? The preference of this PEP's author (Ben Hoyt) would be > ``os.scandir()``, as it's just a single function. Yes, put it in the os module which is already bloated :-) > Should there be a way to access the full path? > ---------------------------------------------- > > Should ``DirEntry``'s have a way to get the full path without using > ``os.path.join(path, entry.name)``? This is a pretty common pattern, > and it may be useful to add pathlib-like ``str(entry)`` functionality. > This functionality has also been requested in `issue 13`_ on GitHub. > > .. 
_`issue 13`: https://github.com/benhoyt/scandir/issues/13 I think that it would be very convenient to store the directory name in the DirEntry. It should be light, it's just a reference. And provide a fullname() method which would just return os.path.join(path, entry.name) without trying to resolve path to get an absolute path. > Should it expose Windows wildcard functionality? > ------------------------------------------------ > > Should ``scandir()`` have a way of exposing the wildcard functionality > in the Windows ``FindFirstFile`` / ``FindNextFile`` functions? The > scandir module on GitHub exposes this as a ``windows_wildcard`` > keyword argument, allowing Windows power users the option to pass a > custom wildcard to ``FindFirstFile``, which may avoid the need to use > ``fnmatch`` or similar on the resulting names. It is named the > unwieldy ``windows_wildcard`` to remind you you're writing > power-user, Windows-only code if you use it. > > This boils down to whether ``scandir`` should be about exposing all of > the system's directory iteration features, or simply providing a fast, > simple, cross-platform directory iteration API. Would it be hard to implement the wildcard feature on UNIX to compare the performance of scandir('*.jpg') with and without the wildcard built into os.scandir? I implemented it in C for the tracemalloc module (Filter object): http://hg.python.org/features/tracemalloc Get the revision 69fd2d766005 and search for match_filename_joker() in Modules/_tracemalloc.c. The function matches the filename backward because in most cases, the last letter is enough to reject a filename (ex: "*.jpg" => reject filenames not ending with "g"). The filename is normalized before matching the pattern: converted to lowercase and / is replaced with \ on Windows. It was decided to drop the Filter object to keep the tracemalloc module as simple as possible. Charles-François was not convinced by the speedup. 
But the tracemalloc case is different because the OS didn't provide an API for that. Victor From nad at acm.org Fri Jun 27 11:14:52 2014 From: nad at acm.org (Ned Deily) Date: Fri, 27 Jun 2014 02:14:52 -0700 Subject: [Python-Dev] buildbot.python.org down? Message-ID: The buildbot web site seems to have been down for some hours and still is as of 0915 UTC. I'm not sure who is watching over it but I'll ping the infrastructure team as well. -- Ned Deily, nad at acm.org From ncoghlan at gmail.com Fri Jun 27 12:54:18 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 Jun 2014 20:54:18 +1000 Subject: [Python-Dev] Binary CPython distribution for Linux In-Reply-To: <1792609645.45624621.1403852843057.JavaMail.zimbra@redhat.com> References: <53AC679B.1000408@gmail.com> <1792609645.45624621.1403852843057.JavaMail.zimbra@redhat.com> Message-ID: On 27 Jun 2014 17:33, "Bohuslav Kabrda" wrote: > > It's not true that 2.7 wasn't released until a few weeks ago. It was released a few weeks ago as part of RHEL 7, but Red Hat has been shipping Red Hat Software Collections (RHSCL) 1.0, which contains Python 2.7 and Python 3.3, for almost a year now [1] - RHSCL is installable on RHEL 6; RHSCL 1.1 (also with 2.7 and 3.3) was released a few weeks ago and is supported on RHEL 6 and 7. Also, these collections now have their community rebuilds at [2], so you can just download them without needing to talk to Red Hat at all. But yeah, these are all RPMs, so you have to be root to install them. Indeed, while there are still some rough edges, software collections look like the best approach to doing maintainable system installs of Python runtimes other than the system Python into Fedora/RHEL/CentOS et al (and I say that while wearing both my upstream and downstream hats).
Collections solve this problem in a general (rather than CPython specific) way, since they can be used to get upgraded versions of language runtimes, databases, web servers, etc, all without risking the stability of the OS itself. I hope to see someone put together collections for PyPy and PyPy3 as well. The approaches used for runtime isolation of software collections should also be applicable to Debian systems, but (as far as I am aware) the tooling to build them as debs rather than RPMs doesn't exist yet. > Please don't take this as a criticism of your ideas, I see what you're trying to solve. I just think the way you're trying to solve it is unachievable or would consume so much community resources, that it would end up unmaintained and buggy most of the time. For prebuilt userland installs on Linux, I think "miniconda" is the current best available approach. It has its challenges (especially around its handling of security concerns), but it's designed to offer a full cross platform package management system that makes it well suited to the task of managing prebuilt language runtimes in user space. Cheers, Nick. > > -- > Regards, > Bohuslav "Slavek" Kabrda. > > [1] http://developerblog.redhat.com/2013/09/12/rhscl1-ga/ > [2] https://www.softwarecollections.org/en/scls/ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Fri Jun 27 02:09:01 2014 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 26 Jun 2014 17:09:01 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53ACAC94.1050206@mrabarnett.plus.com> Message-ID: <3153899029119710294@unknownmsgid> On Jun 26, 2014, at 4:38 PM, Tim Delaney wrote: On 27 June 2014 09:28, MRAB wrote: > > -1 for windows_wildcard (it would be an attractive nuisance to write windows-only code) Could you emulate it on other platforms? +1 on the rest of it. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Fri Jun 27 13:48:17 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 27 Jun 2014 14:48:17 +0300 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <20140627030746.15641d7e@x34f> Message-ID: <20140627144817.2290a544@x34f> Hello, On Thu, 26 Jun 2014 21:52:43 -0400 Ben Hoyt wrote: [] > It's a fair point that os.walk() can be implemented efficiently > without adding a new function and API. However, often you'll want more > info, like the file size, which scandir() can give you via > DirEntry.lstat(), which is free on Windows. So opening up this > efficient API is beneficial. > > In CPython, I think the DirEntry objects are as lightweight as > stat_result objects. > > I'm an embedded developer by background, so I know the constraints > here, but I really don't think Python's development should be tailored > to fit MicroPython. If os.scandir() is not very efficient on > MicroPython, so be it -- 99% of all desktop/server users will gain > from it. Surely, tailoring Python to MicroPython's needs is not at all what I suggest. It was an example of an alternative implementation which optimizes os.walk() without the need for any additional public module APIs.
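For reference, the kind of scandir()-based os.walk() being discussed can be sketched roughly as follows (a simplified version using the os.scandir() spelling the PEP proposes; it ignores the real os.walk()'s topdown, onerror, and followlinks options):

```python
import os

def simple_walk(top):
    """Bare-bones sketch of os.walk() on top of a scandir()-style API.

    Classifies entries without any per-entry stat() call on platforms
    where the directory iteration already returns the file type.
    """
    dirs, files = [], []
    for entry in os.scandir(top):
        if entry.is_dir(follow_symlinks=False):
            dirs.append(entry.name)
        else:
            files.append(entry.name)
    yield top, dirs, files
    for name in dirs:
        # Recurse into subdirectories after yielding the current level.
        yield from simple_walk(os.path.join(top, name))
```

This is only a sketch of the technique, not the stdlib implementation; the real os.walk() also has to cope with permission errors and symlink loops.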
Vice versa, the high-level nature of an API call like os.walk() and the underspecification of low-level details (like which function is implemented in terms of which others) allow MicroPython to provide an optimized implementation even with its resource constraints. So, the power of high-level interfaces and underspecification should not be underestimated ;-). But I don't want to argue that os.scandir() is "not needed", because that's hardly productive. Something I'd like to prototype in uPy and ideally lead further up to PEP status is to add iterator-based string methods, and I can pretty much expect a "we lived without it" response, so I don't want to go the same way regarding the addition of other iterator-based APIs - it's clear that more iterator/generator-based APIs are a good direction for Python to evolve. > > It would be better if os.scandir() was specified to return a struct > > (named tuple) compatible with return value of os.stat() (with only > > fields relevant to underlying readdir()-like system call). The > > grounds for that are obvious: it's already existing data interface > > in module "os", which is also based on open standard for operating > > systems - POSIX, so if one is to expect something about file > > attributes, it's what one can reasonably base expectations on. > > Yes, we considered this early on (see the python-ideas and python-dev > threads referenced in the PEP), but decided it wasn't a great API to > overload stat_result further, and have most of the attributes None or > not present on Linux. > [] > > However, for scandir() to be useful, you also need the name. My > original version of this directory iterator returned two-tuples of > (name, stat_result). But most people didn't like the API, and I don't > really either. You could overload stat_result with a .name attribute > in this case, but it still isn't a nice API to have most of the > attributes None, and then you have to test for that, etc.
Yes, returning (name, stat_result) would be my first motion too; I don't see why someone wouldn't like a pair of two values, with each value of obvious type and semantics within the "os" module. Regarding the stat result, os.stat() provides full information about a file, and intuitively, one may expect that os.scandir() would provide a subset of that info, asymptotically reaching the volume of what os.stat() may provide, depending on OS capabilities. So, if a truly OS-independent interface is wanted to salvage more data from directory scanning, using the os.stat struct as the data interface is hard to ignore. But well, if it was rejected already, what can be said? Perhaps, at least, the PEP could be extended to explicitly mention other approaches which were discussed and rejected, not just link to a discussion archive (from experience with reading other PEPs, they oftentimes contained such subsections, so I hope this suggestion is not ungrounded). > > So basically we tweaked the API to do what was best, and ended up with > it returning DirEntry objects with is_file() and similar methods. > > Hope that helps give a bit more context. If you haven't read the > relevant python-ideas and python-dev threads, those are interesting > too. > > -Ben -- Best regards, Paul mailto:pmiscml at gmail.com From pmiscml at gmail.com Fri Jun 27 14:13:13 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 27 Jun 2014 15:13:13 +0300 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <20140627020841.GD13014@ando> References: <20140627030746.15641d7e@x34f> <20140627020841.GD13014@ando> Message-ID: <20140627151313.6f2ff34d@x34f> Hello, On Fri, 27 Jun 2014 12:08:41 +1000 Steven D'Aprano wrote: > On Fri, Jun 27, 2014 at 03:07:46AM +0300, Paul Sokolovsky wrote: > > > With my MicroPython hat on, os.scandir() would make things only
With current interface, one can either have inefficient > > implementation (like CPython chose) or efficient implementation > > (like MicroPython chose) - all transparently. os.scandir() > > supposedly opens up efficient implementation for everyone, but at > > the price of bloating API and introducing heavy-weight objects to > > wrap info. > > os.scandir is not part of the Python API, it is not a built-in > function. It is part of the CPython standard library. Ok, so standard library also has API, and that's the API being discussed. > That means (in > my opinion) that there is an expectation that other Pythons should > provide it, but not an absolute requirement. Especially for the os > module, which by definition is platform-specific. Yes, that's intuitive, but not strict and formal, so is subject to interpretations. As a developer working on alternative Python implementation, I'd like to have better understanding of what needs to be done to be a compliant implementation (in particular, because I need to pass that info down to the users). So, I was told that https://docs.python.org/3/reference/index.html describes Python, not CPython. Next step is figuring out whether https://docs.python.org/3/library/index.html describes Python or CPython, and if the latter, how to separate Python's stdlib essence from extended library CPython provides? > In my opinion that > means you have four options: > > 1. provide os.scandir, with exactly the same semantics as on CPython; > > 2. provide os.scandir, but change its semantics to be more > lightweight (e.g. return an ordinary tuple, as you already suggest); > > 3. don't provide os.scandir at all; or > > 4. do something different depending on whether the platform is Linux > or an embedded system. > > I would consider any of those acceptable for a library feature, but > not for a language feature. Good, thanks. 
If that represents the shared opinion of (C)Python developers (so, there won't be claims like "MicroPython is not Python because it doesn't provide os.scandir()" (or hundreds of other missing stdlib functions ;-) )) that's good enough already. With that in mind, I wish that any Python implementation were as complete and as efficient as possible, and one way to achieve that is to not add stdlib entities without real need (be it more API calls or more data types). So, I'm glad to know that os.scandir() passed through Occam's razor in this respect and is specified the way it is really for the common good. [] -- Best regards, Paul mailto:pmiscml at gmail.com From j.wielicki at sotecware.net Fri Jun 27 12:28:27 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Fri, 27 Jun 2014 12:28:27 +0200 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53ACCDEE.9070906@mrabarnett.plus.com> References: <53ACAC94.1050206@mrabarnett.plus.com> <53ACB02F.4020402@stoneleaf.us> <53ACCDEE.9070906@mrabarnett.plus.com> Message-ID: <53AD474B.4020204@sotecware.net> On 27.06.2014 03:50, MRAB wrote: > On 2014-06-27 02:37, Ben Hoyt wrote: >> I don't mind iterdir() and would take it :-), but I'll just say why I >> chose the name scandir() -- though it wasn't my suggestion originally: >> >> iterdir() sounds like just an iterator version of listdir(), kinda >> like keys() and iterkeys() in Python 2. Whereas in actual fact the >> return values are quite different (DirEntry objects vs strings), and >> so the name change reflects that difference a little. >> > [snip] > > The re module has 'findall', which returns a list of strings, and > 'finditer', which returns an iterator that yields match objects, so > there's a precedent. :-) A bad precedent in my opinion though -- I was just recently bitten by that, and I find it very untypical for Python.
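The asymmetry in question is easy to demonstrate:

```python
import re

text = "cat cot cut"

# findall() returns a list of matched strings...
as_strings = re.findall(r"c.t", text)

# ...while finditer() yields match objects, which must be unpacked
# with .group() to get the same strings back.
as_matches = [m.group() for m in re.finditer(r"c.t", text)]

assert as_strings == as_matches == ["cat", "cot", "cut"]
```

(Note that findall() returns tuples rather than full-match strings as soon as the pattern contains capturing groups, which adds to the asymmetry.)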
regards, Jonas From j.wielicki at sotecware.net Fri Jun 27 12:44:35 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Fri, 27 Jun 2014 12:44:35 +0200 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: <53AD4B13.8070100@sotecware.net> On 27.06.2014 00:59, Ben Hoyt wrote: > Specifics of proposal > ===================== > [snip] Each ``DirEntry`` object has the following > attributes and methods: > [snip] > Notes on caching > ---------------- > > The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute > is obviously always cached, and the ``is_X`` and ``lstat`` methods > cache their values (immediately on Windows via ``FindNextFile``, and > on first use on Linux / OS X via a ``stat`` call) and never refetch > from the system. I find this behaviour a bit misleading: using methods and having them return cached results. How much (implementation and/or performance and/or memory) overhead would be incurred by using property-like access here? I think this would underline the static nature of the data. This would break the semantics with respect to pathlib, but they're only marginally equal anyways -- and as far as I understand it, pathlib won't cache, so I think this has a fair point here. regards, jwi From status at bugs.python.org Fri Jun 27 18:07:57 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 27 Jun 2014 18:07:57 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140627160757.D267E56A2F@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-06-20 - 2014-06-27) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message.
Issues counts and deltas: open 4643 (-12) closed 29004 (+72) total 33647 (+60) Open issues with patches: 2162 Issues opened (50) ================== #6916: Remove deprecated items from asynchat http://bugs.python.org/issue6916 reopened by ezio.melotti #10312: intcatcher() can deadlock http://bugs.python.org/issue10312 reopened by Claudiu.Popa #21817: `concurrent.futures.ProcessPoolExecutor` swallows tracebacks http://bugs.python.org/issue21817 opened by cool-RR #21818: cookielib documentation references Cookie module, not cookieli http://bugs.python.org/issue21818 opened by Ajtag #21820: unittest: unhelpful truncating of long strings. http://bugs.python.org/issue21820 opened by cjw296 #21821: The function cygwinccompiler.is_cygwingcc leads to FileNotFoun http://bugs.python.org/issue21821 opened by paugier #21822: KeyboardInterrupt during Thread.join hangs that Thread http://bugs.python.org/issue21822 opened by tupl #21825: Embedding-Python example code from documentation crashes http://bugs.python.org/issue21825 opened by Pat.Le.Cat #21826: Performance issue (+fix) AIX ctypes.util with no /sbin/ldconfi http://bugs.python.org/issue21826 opened by tw.bert #21827: textwrap.dedent() fails when largest common whitespace is a su http://bugs.python.org/issue21827 opened by robertjli #21830: ssl.wrap_socket fails on Windows 7 when specifying ca_certs http://bugs.python.org/issue21830 opened by David.M.Noriega #21833: Fix unicodeless build of Python http://bugs.python.org/issue21833 opened by serhiy.storchaka #21834: Fix a number of tests in unicodeless build http://bugs.python.org/issue21834 opened by serhiy.storchaka #21835: Fix Tkinter in unicodeless build http://bugs.python.org/issue21835 opened by serhiy.storchaka #21836: Fix sqlite3 in unicodeless build http://bugs.python.org/issue21836 opened by serhiy.storchaka #21837: Fix tarfile in unicodeless build http://bugs.python.org/issue21837 opened by serhiy.storchaka #21838: Fix ctypes in unicodeless build 
http://bugs.python.org/issue21838 opened by serhiy.storchaka #21839: Fix distutils in unicodeless build http://bugs.python.org/issue21839 opened by serhiy.storchaka #21840: Fix os.path in unicodeless build http://bugs.python.org/issue21840 opened by serhiy.storchaka #21841: Fix xml.sax in unicodeless build http://bugs.python.org/issue21841 opened by serhiy.storchaka #21842: Fix IDLE in unicodeless build http://bugs.python.org/issue21842 opened by serhiy.storchaka #21843: Fix doctest in unicodeless build http://bugs.python.org/issue21843 opened by serhiy.storchaka #21844: Fix HTMLParser in unicodeless build http://bugs.python.org/issue21844 opened by serhiy.storchaka #21845: Fix plistlib in unicodeless build http://bugs.python.org/issue21845 opened by serhiy.storchaka #21846: Fix zipfile in unicodeless build http://bugs.python.org/issue21846 opened by serhiy.storchaka #21847: Fix xmlrpc in unicodeless build http://bugs.python.org/issue21847 opened by serhiy.storchaka #21848: Fix logging in unicodeless build http://bugs.python.org/issue21848 opened by serhiy.storchaka #21849: Fix multiprocessing for non-ascii data http://bugs.python.org/issue21849 opened by serhiy.storchaka #21850: Fix httplib and SimpleHTTPServer in unicodeless build http://bugs.python.org/issue21850 opened by serhiy.storchaka #21851: Fix gettext in unicodeless build http://bugs.python.org/issue21851 opened by serhiy.storchaka #21852: Fix optparse in unicodeless build http://bugs.python.org/issue21852 opened by serhiy.storchaka #21853: Fix inspect in unicodeless build http://bugs.python.org/issue21853 opened by serhiy.storchaka #21854: Fix cookielib in unicodeless build http://bugs.python.org/issue21854 opened by serhiy.storchaka #21855: Fix decimal in unicodeless build http://bugs.python.org/issue21855 opened by serhiy.storchaka #21856: memoryview: no overflow on large slice values (start, stop, st http://bugs.python.org/issue21856 opened by haypo #21857: assert that functions clearing the current 
exception are not c http://bugs.python.org/issue21857 opened by haypo #21859: Add Python implementation of FileIO http://bugs.python.org/issue21859 opened by serhiy.storchaka #21860: Correct FileIO docstrings http://bugs.python.org/issue21860 opened by serhiy.storchaka #21861: io class name are hardcoded in reprs http://bugs.python.org/issue21861 opened by serhiy.storchaka #21862: cProfile command-line should accept "-m module_name" as an alt http://bugs.python.org/issue21862 opened by pitrou #21863: Display module names of C functions in cProfile http://bugs.python.org/issue21863 opened by pitrou #21864: Error in documentation of point 9.8 'Exceptions are classes to http://bugs.python.org/issue21864 opened by Peibolvig #21865: Improve invalid category exception for warnings.filterwarnings http://bugs.python.org/issue21865 opened by berker.peksag #21866: zipfile.ZipFile.close() doesn't respect allowZip64 http://bugs.python.org/issue21866 opened by bgilbert #21867: Turtle returns TypeError when undobuffer is set to 0 (aka no u http://bugs.python.org/issue21867 opened by Lita.Cho #21868: Tbuffer in turtle allows negative size http://bugs.python.org/issue21868 opened by Lita.Cho #21869: Clean up quopri, correct method names encodestring and decodes http://bugs.python.org/issue21869 opened by orsenthil #21871: Python 2.7.7 regression in mimetypes read_windows_registry http://bugs.python.org/issue21871 opened by agolde #21872: LZMA library sometimes fails to decompress a file http://bugs.python.org/issue21872 opened by vnummela #21874: test_strptime fails on rhel/centos/fedora systems http://bugs.python.org/issue21874 opened by boblfoot Most recent 15 issues with no replies (15) ========================================== #21874: test_strptime fails on rhel/centos/fedora systems http://bugs.python.org/issue21874 #21871: Python 2.7.7 regression in mimetypes read_windows_registry http://bugs.python.org/issue21871 #21865: Improve invalid category exception for 
warnings.filterwarnings http://bugs.python.org/issue21865 #21861: io class name are hardcoded in reprs http://bugs.python.org/issue21861 #21859: Add Python implementation of FileIO http://bugs.python.org/issue21859 #21855: Fix decimal in unicodeless build http://bugs.python.org/issue21855 #21854: Fix cookielib in unicodeless build http://bugs.python.org/issue21854 #21853: Fix inspect in unicodeless build http://bugs.python.org/issue21853 #21852: Fix optparse in unicodeless build http://bugs.python.org/issue21852 #21851: Fix gettext in unicodeless build http://bugs.python.org/issue21851 #21850: Fix httplib and SimpleHTTPServer in unicodeless build http://bugs.python.org/issue21850 #21847: Fix xmlrpc in unicodeless build http://bugs.python.org/issue21847 #21846: Fix zipfile in unicodeless build http://bugs.python.org/issue21846 #21845: Fix plistlib in unicodeless build http://bugs.python.org/issue21845 #21843: Fix doctest in unicodeless build http://bugs.python.org/issue21843 Most recent 15 issues waiting for review (15) ============================================= #21868: Tbuffer in turtle allows negative size http://bugs.python.org/issue21868 #21865: Improve invalid category exception for warnings.filterwarnings http://bugs.python.org/issue21865 #21863: Display module names of C functions in cProfile http://bugs.python.org/issue21863 #21862: cProfile command-line should accept "-m module_name" as an alt http://bugs.python.org/issue21862 #21860: Correct FileIO docstrings http://bugs.python.org/issue21860 #21859: Add Python implementation of FileIO http://bugs.python.org/issue21859 #21857: assert that functions clearing the current exception are not c http://bugs.python.org/issue21857 #21855: Fix decimal in unicodeless build http://bugs.python.org/issue21855 #21854: Fix cookielib in unicodeless build http://bugs.python.org/issue21854 #21853: Fix inspect in unicodeless build http://bugs.python.org/issue21853 #21852: Fix optparse in unicodeless build 
http://bugs.python.org/issue21852 #21851: Fix gettext in unicodeless build http://bugs.python.org/issue21851 #21850: Fix httplib and SimpleHTTPServer in unicodeless build http://bugs.python.org/issue21850 #21849: Fix multiprocessing for non-ascii data http://bugs.python.org/issue21849 #21848: Fix logging in unicodeless build http://bugs.python.org/issue21848 Top 10 most discussed issues (10) ================================= #14460: In re's positive lookbehind assertion repetition works http://bugs.python.org/issue14460 10 msgs #21163: asyncio doesn't warn if a task is destroyed during its executi http://bugs.python.org/issue21163 10 msgs #21765: Idle: make 3.x HyperParser work with non-ascii identifiers. http://bugs.python.org/issue21765 9 msgs #21820: unittest: unhelpful truncating of long strings. http://bugs.python.org/issue21820 9 msgs #6916: Remove deprecated items from asynchat http://bugs.python.org/issue6916 8 msgs #11406: There is no os.listdir() equivalent returning generator instea http://bugs.python.org/issue11406 7 msgs #12750: datetime.strftime('%s') should respect tzinfo http://bugs.python.org/issue12750 7 msgs #19351: python msi installers - silent mode http://bugs.python.org/issue19351 7 msgs #20092: type() constructor should bind __int__ to __index__ when __ind http://bugs.python.org/issue20092 6 msgs #21331: Reversing an encoding with unicode-escape returns a different http://bugs.python.org/issue21331 6 msgs Issues closed (70) ================== #2213: build_tkinter.py does not handle paths with spaces http://bugs.python.org/issue2213 closed by loewis #4346: PyObject_CallMethod changes the exception message already set http://bugs.python.org/issue4346 closed by python-dev #4613: Can't figure out where SyntaxError: can not delete variable 'x http://bugs.python.org/issue4613 closed by ned.deily #4735: An error occurred during the installation of assembly http://bugs.python.org/issue4735 closed by zach.ware #5235: distutils seems to only work with 
VC++ 2008 (9.0) http://bugs.python.org/issue5235 closed by loewis #6305: islice doesn't accept large stop values http://bugs.python.org/issue6305 closed by loewis #6362: multiprocessing: handling of errno after signals in sem_acquir http://bugs.python.org/issue6362 closed by loewis #8192: SQLite3 PRAGMA table_info doesn't respect database on Win32 http://bugs.python.org/issue8192 closed by loewis #8343: improve re parse error messages for named groups http://bugs.python.org/issue8343 closed by rhettinger #10217: python-2.7.amd64.msi install fails http://bugs.python.org/issue10217 closed by zach.ware #10747: Include version info in Windows shortcuts http://bugs.python.org/issue10747 closed by loewis #10798: test_concurrent_futures fails on FreeBSD http://bugs.python.org/issue10798 closed by haypo #11974: Class definition gotcha.. should this be documented somewhere? http://bugs.python.org/issue11974 closed by rhettinger #12066: Empty ('') xmlns attribute is not properly handled by xml.dom. 
http://bugs.python.org/issue12066 closed by ned.deily #12860: http client attempts to send a readable object twice http://bugs.python.org/issue12860 closed by ned.deily #13143: os.path.islink documentation is ambiguous http://bugs.python.org/issue13143 closed by python-dev #14457: Unattended Install doesn't populate registry http://bugs.python.org/issue14457 closed by loewis #14477: Rietveld test issue http://bugs.python.org/issue14477 closed by loewis #14540: Crash in Modules/_ctypes/libffi/src/dlmalloc.c on ia64-hp-hpux http://bugs.python.org/issue14540 closed by pda #14561: python-2.7.2-r3 suffers test failure at test_mhlib http://bugs.python.org/issue14561 closed by ned.deily #15588: quopri: encodestring and decodestring handle bytes, not string http://bugs.python.org/issue15588 closed by orsenthil #16667: timezone docs need "versionadded: 3.2" http://bugs.python.org/issue16667 closed by python-dev #16976: Asyncore/asynchat hangs when used with ssl sockets http://bugs.python.org/issue16976 closed by giampaolo.rodola #17170: string method lookup is too slow http://bugs.python.org/issue17170 closed by pitrou #17424: help() should use the class signature http://bugs.python.org/issue17424 closed by yselivanov #17449: dev guide appears not to cover the benchmarking suite http://bugs.python.org/issue17449 closed by python-dev #19145: Inconsistent behaviour in itertools.repeat when using negative http://bugs.python.org/issue19145 closed by rhettinger #19897: Use python as executable instead of python3 in Python 2 docs http://bugs.python.org/issue19897 closed by berker.peksag #20155: Regression test test_httpservers fails, hangs on Windows http://bugs.python.org/issue20155 closed by r.david.murray #20295: imghdr add openexr support http://bugs.python.org/issue20295 closed by r.david.murray #20446: ipaddress: hash similarities for ipv4 and ipv6 http://bugs.python.org/issue20446 closed by tim.peters #20753: disable test_robotparser test that uses an invalid URL 
http://bugs.python.org/issue20753 closed by orsenthil #20756: Segmentation fault with unoconv http://bugs.python.org/issue20756 closed by Sworddragon #20872: dbm/gdbm/ndbm close methods are not document http://bugs.python.org/issue20872 closed by python-dev #20939: test_geturl of test_urllibnet fails with 'https://www.python.o http://bugs.python.org/issue20939 closed by ned.deily #21030: pip usable only by administrators on Windows and SELinux http://bugs.python.org/issue21030 closed by loewis #21158: Windows installer service could not be accessed http://bugs.python.org/issue21158 closed by loewis #21216: getaddrinfo is wrongly considered thread safe on linux http://bugs.python.org/issue21216 closed by gregory.p.smith #21441: Buffer Protocol Documentation Error http://bugs.python.org/issue21441 closed by python-dev #21476: Inconsistent behaviour between BytesParser.parse and Parser.pa http://bugs.python.org/issue21476 closed by r.david.murray #21491: race condition in SocketServer.py ForkingMixIn collect_childre http://bugs.python.org/issue21491 closed by neologix #21532: 2.7.7rc1 msi is lacking libpython27.a http://bugs.python.org/issue21532 closed by loewis #21635: difflib.SequenceMatcher stores matching blocks as tuples, not http://bugs.python.org/issue21635 closed by rhettinger #21670: Add repr to shelve.Shelf http://bugs.python.org/issue21670 closed by rhettinger #21672: Python for Windows 2.7.7: Path Configuration File No Longer Wo http://bugs.python.org/issue21672 closed by python-dev #21684: inspect.signature bind doesn't include defaults or empty tuple http://bugs.python.org/issue21684 closed by yselivanov #21716: 3.4.1 download page link for OpenPGP signatures has no sigs http://bugs.python.org/issue21716 closed by ned.deily #21729: Use `with` statement in dbm.dumb http://bugs.python.org/issue21729 closed by serhiy.storchaka #21768: Fix a NameError in test_pydoc http://bugs.python.org/issue21768 closed by terry.reedy #21769: Fix a NameError in test_descr 
http://bugs.python.org/issue21769 closed by terry.reedy #21770: Module not callable in script_helper.py http://bugs.python.org/issue21770 closed by terry.reedy #21786: Use assertEqual in test_pydoc http://bugs.python.org/issue21786 closed by rhettinger #21799: python34.dll is not installed http://bugs.python.org/issue21799 closed by loewis #21801: inspect.signature doesn't always return a signature http://bugs.python.org/issue21801 closed by python-dev #21807: SysLogHandler closes TCP connection after first message http://bugs.python.org/issue21807 closed by vinay.sajip #21809: Building Python3 on VMS - External repository http://bugs.python.org/issue21809 closed by terry.reedy #21812: turtle.shapetransform doesn't transform the turtle on the firs http://bugs.python.org/issue21812 closed by rhettinger #21814: object.__setattr__ or super(...).__setattr__? http://bugs.python.org/issue21814 closed by rhettinger #21816: OverflowError: Python int too large to convert to C long http://bugs.python.org/issue21816 closed by ned.deily #21819: Remaining buffer from socket isn't available anymore after cal http://bugs.python.org/issue21819 closed by neologix #21823: Catch turtle.Terminator exceptions in turtledemo http://bugs.python.org/issue21823 closed by terry.reedy #21824: Make turtledemo 2.7 help show file contents, not file name. 
http://bugs.python.org/issue21824 closed by terry.reedy #21828: added/corrected containment relationship for networks in lib i http://bugs.python.org/issue21828 closed by nlm #21829: Wrong test in ctypes http://bugs.python.org/issue21829 closed by zach.ware #21831: integer overflow in 'buffer' type allows reading memory http://bugs.python.org/issue21831 closed by python-dev #21832: collections.namedtuple does questionable things when passed qu http://bugs.python.org/issue21832 closed by rhettinger #21858: Enhance error handling in the sqlite module http://bugs.python.org/issue21858 closed by haypo #21870: Ctrl-C doesn't interrupt simple loop http://bugs.python.org/issue21870 closed by r.david.murray #21873: Tuple comparisons with NaNs are broken http://bugs.python.org/issue21873 closed by rhettinger #21875: Remove vestigial references to Classic Mac OS attributes in os http://bugs.python.org/issue21875 closed by ned.deily From benjamin at python.org Fri Jun 27 18:50:19 2014 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 27 Jun 2014 09:50:19 -0700 Subject: [Python-Dev] buildbot.python.org down? In-Reply-To: References: Message-ID: <1403887819.13904.135300541.288F8C33@webmail.messagingengine.com> On Fri, Jun 27, 2014, at 02:14, Ned Deily wrote: > The buildbot web site seems to have been down for some hours and still > is as of 0915 UTC. I'm not sure who is watching over it but I'll ping > the infrastructure team as well. Fixed. The VM crashed, and Ernest rebooted it. From python at mrabarnett.plus.com Fri Jun 27 18:56:28 2014 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 27 Jun 2014 17:56:28 +0100 Subject: [Python-Dev] LZO bug Message-ID: <53ADA23C.5000801@mrabarnett.plus.com> Is this something that we need to worry about? 
Raising Lazarus - The 20 Year Old Bug that Went to Mars http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html From raymond.hettinger at gmail.com Fri Jun 27 20:13:53 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 27 Jun 2014 11:13:53 -0700 Subject: [Python-Dev] LZO bug In-Reply-To: <53ADA23C.5000801@mrabarnett.plus.com> References: <53ADA23C.5000801@mrabarnett.plus.com> Message-ID: On Jun 27, 2014, at 9:56 AM, MRAB wrote: > Is this something that we need to worry about? > > Raising Lazarus - The 20 Year Old Bug that Went to Mars > http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html Debunking the LZ4 "20 years old bug" myth http://fastcompression.blogspot.com/2014/06/debunking-lz4-20-years-old-bug-myth.html Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jun 27 23:58:50 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jun 2014 07:58:50 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53AD4B13.8070100@sotecware.net> References: <53AD4B13.8070100@sotecware.net> Message-ID: On 28 Jun 2014 01:27, "Jonas Wielicki" wrote: > > On 27.06.2014 00:59, Ben Hoyt wrote: > > Specifics of proposal > > ===================== > > [snip] Each ``DirEntry`` object has the following > > attributes and methods: > > [snip] > > Notes on caching > > ---------------- > > > > The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute > > is obviously always cached, and the ``is_X`` and ``lstat`` methods > > cache their values (immediately on Windows via ``FindNextFile``, and > > on first use on Linux / OS X via a ``stat`` call) and never refetch > > from the system. > > I find this behaviour a bit misleading: using methods and have them > return cached results. 
How much (implementation and/or performance > and/or memory) overhead would incur by using property-like access here? > I think this would underline the static nature of the data. > > This would break the semantics with respect to pathlib, but they're only > marginally equal anyways -- and as far as I understand it, pathlib won't > cache, so I think this has a fair point here. Indeed - using properties rather than methods may help emphasise the deliberate *difference* from pathlib in this case (i.e. value when the result was retrieved from the OS, rather than the value right now). The main benefit is that switching from using the DirEntry object to a pathlib Path will require touching all the places where the performance characteristics switch from "memory access" to "system call". This benefit is also the main downside, so I'd actually be OK with either decision on this one. Other comments: * +1 on the general idea * +1 on scandir() over iterdir, since it *isn't* just an iterator version of listdir * -1 on including Windows specific globbing support in the API * -0 on including cross platform globbing support in the initial iteration of the API (that could be done later as a separate RFE instead) * +1 on a new section in the PEP covering rejected design options (calling it iterdir, returning a 2-tuple instead of a dedicated DirEntry type) * regarding "why not a 2-tuple", we know from experience that operating systems evolve and we end up wanting to add additional info to this kind of API.
A dedicated DirEntry type lets us adjust the information returned over time, without breaking backwards compatibility and without resorting to ugly hacks like those in some of the time and stat APIs (or even our own codec info APIs) * it would be nice to see some relative performance numbers for NFS and CIFS network shares - the additional network round trips can make excessive stat calls absolutely brutal from a speed perspective when using a network drive (that's why the stat caching added to the import system in 3.3 dramatically sped up the case of having network drives on sys.path, and why I thought AJ had a point when he was complaining about the fact we didn't expose the dirent data from os.listdir) Regards, Nick. > > regards, > jwi > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Sat Jun 28 01:51:44 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 28 Jun 2014 01:51:44 +0200 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: 2014-06-26 13:04 GMT+02:00 Antoine Pitrou : > For the same reason, I agree with Victor that we should ditch the > threading-disabled builds. It's too much of a hassle for no actual, > practical benefit. People who want a threadless unicodeless Python can > install Python 1.5.2 for all I care. By the way, adding a buildbot for testing Python without thread support is not enough. 
The buildbot is currently broken since more than one month and nobody noticed :-p http://buildbot.python.org/all/builders/AMD64%20Fedora%20without%20threads%203.x/ Ok, I noticed, but I consider that I spent too much time on this minor use case. I prefer to leave such task to someone else :-) Victor From greg at krypto.org Sat Jun 28 08:17:55 2014 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 27 Jun 2014 23:17:55 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> Message-ID: On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan wrote: > > * -1 on including Windows specific globbing support in the API > * -0 on including cross platform globbing support in the initial iteration > of the API (that could be done later as a separate RFE instead) > Agreed. Globbing or filtering support should not hold this up. If that part isn't settled, just don't include it and work out what it should be as a future enhancement. > * +1 on a new section in the PEP covering rejected design options (calling > it iterdir, returning a 2-tuple instead of a dedicated DirEntry type) > +1. IMNSHO, one of the most important part of PEPs: capturing the entire decision process to document the "why nots". > * regarding "why not a 2-tuple", we know from experience that operating > systems evolve and we end up wanting to add additional info to this kind of > API. 
A dedicated DirEntry type lets us adjust the information returned over > time, without breaking backwards compatibility and without resorting to > ugly hacks like those in some of the time and stat APIs (or even our own > codec info APIs) > * it would be nice to see some relative performance numbers for NFS and > CIFS network shares - the additional network round trips can make excessive > stat calls absolutely brutal from a speed perspective when using a network > drive (that's why the stat caching added to the import system in 3.3 > dramatically sped up the case of having network drives on sys.path, and why > I thought AJ had a point when he was complaining about the fact we didn't > expose the dirent data from os.listdir) > fwiw, I wouldn't wait for benchmark numbers. A needless stat call when you've got the information from an earlier API call is already brutal. It is easy to compute from existing ballparks remote file server / cloud access: ~100ms, local spinning disk seek+read: ~10ms. fetch of stat info cached in memory on file server on the local network: ~500us. You can go down further to local system call overhead which can vary wildly but should likely be assumed to be at least 10us. You don't need a benchmark to tell you that adding needless >= 500us-100ms blocking operations to your program is bad. :) -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jun 28 11:19:23 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jun 2014 19:19:23 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> Message-ID: On 28 June 2014 19:17, Nick Coghlan wrote: > Agreed, but walking even a moderately large tree over the network can > really hammer home the point that this offers a significant > performance enhancement as the latency of access increases. 
I've found > that kind of comparison can be eye-opening for folks that are used to > only operating on local disks (even spinning disks, let alone SSDs) > and/or relatively small trees (distro build trees aren't *that* big, > but they're big enough for this kind of difference in access overhead > to start getting annoying). Oops, forgot to add - I agree this isn't a blocking issue for the PEP, it's definitely only in "nice to have" territory. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jun 28 11:17:12 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jun 2014 19:17:12 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> Message-ID: On 28 June 2014 16:17, Gregory P. Smith wrote: > On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan wrote: >> * it would be nice to see some relative performance numbers for NFS and >> CIFS network shares - the additional network round trips can make excessive >> stat calls absolutely brutal from a speed perspective when using a network >> drive (that's why the stat caching added to the import system in 3.3 >> dramatically sped up the case of having network drives on sys.path, and why >> I thought AJ had a point when he was complaining about the fact we didn't >> expose the dirent data from os.listdir) > > fwiw, I wouldn't wait for benchmark numbers. > > A needless stat call when you've got the information from an earlier API > call is already brutal. It is easy to compute from existing ballparks remote > file server / cloud access: ~100ms, local spinning disk seek+read: ~10ms. > fetch of stat info cached in memory on file server on the local network: > ~500us. You can go down further to local system call overhead which can > vary wildly but should likely be assumed to be at least 10us. 
> > You don't need a benchmark to tell you that adding needless >= 500us-100ms > blocking operations to your program is bad. :) Agreed, but walking even a moderately large tree over the network can really hammer home the point that this offers a significant performance enhancement as the latency of access increases. I've found that kind of comparison can be eye-opening for folks that are used to only operating on local disks (even spinning disks, let alone SSDs) and/or relatively small trees (distro build trees aren't *that* big, but they're big enough for this kind of difference in access overhead to start getting annoying). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pmiscml at gmail.com Sat Jun 28 12:58:54 2014 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Sat, 28 Jun 2014 13:58:54 +0300 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: <20140628135854.72f0ab28@x34f> Hello, On Thu, 26 Jun 2014 22:49:40 +1000 Chris Angelico wrote: > On Thu, Jun 26, 2014 at 9:04 PM, Antoine Pitrou > wrote: > > For the same reason, I agree with Victor that we should ditch the > > threading-disabled builds. It's too much of a hassle for no actual, > > practical benefit. People who want a threadless unicodeless Python > > can install Python 1.5.2 for all I care. > > Or some other implementation of Python. It's looking like micropython > will be permanently supporting a non-Unicode build Yes. > (although I stepped > away from the project after a strong disagreement over what would and > would not make sense, and haven't been following it since). Your patches with my further additions were finally merged. Unicode strings still cannot be enabled by default due to https://github.com/micropython/micropython/issues/726 . Any help with reviewing/testing what's currently available is welcome. 
> If someone > wants a Python that doesn't have stuff that the core CPython devs > treat as essential, s/he probably wants something like uPy anyway. I hinted it during previous discussions of MicroPython, and would like to say it again, that MicroPython already embraced a lot of ideas rejected from CPython, like GC-only operation (which alone not something to be proud of, but can you start up and do something in 2K heap?) or tagged pointers (https://mail.python.org/pipermail/python-dev/2004-July/046139.html). So, it should be good vehicle to try any unorthodox ideas(*) or implementations. * MicroPython already implements intra-module constants for example. -- Best regards, Paul mailto:pmiscml at gmail.com From 4kir4.1i at gmail.com Sat Jun 28 15:05:31 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Sat, 28 Jun 2014 17:05:31 +0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator References: Message-ID: <877g412nhg.fsf@gmail.com> Ben Hoyt writes: > Hi Python dev folks, > > I've written a PEP proposing a specific os.scandir() API for a > directory iterator that returns the stat-like info from the OS, *the > main advantage of which is to speed up os.walk() and similar > operations between 4-20x, depending on your OS and file system.* > ... > http://legacy.python.org/dev/peps/pep-0471/ > ... > Specifically, this PEP proposes adding a single function to the ``os`` > module in the standard library, ``scandir``, that takes a single, > optional string as its argument:: > > scandir(path='.') -> generator of DirEntry objects > Have you considered adding support for paths relative to directory descriptors [1] via keyword only dir_fd=None parameter if it may lead to more efficient implementations on some platforms? 
[1]: https://docs.python.org/3.4/library/os.html#dir-fd -- akira From rosuav at gmail.com Sat Jun 28 17:27:44 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 29 Jun 2014 01:27:44 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <877g412nhg.fsf@gmail.com> References: <877g412nhg.fsf@gmail.com> Message-ID: On Sat, Jun 28, 2014 at 11:05 PM, Akira Li <4kir4.1i at gmail.com> wrote: > Have you considered adding support for paths relative to directory > descriptors [1] via keyword only dir_fd=None parameter if it may lead to > more efficient implementations on some platforms? > > [1]: https://docs.python.org/3.4/library/os.html#dir-fd Potentially more efficient and also potentially safer (see 'man openat')... but an enhancement that can wait, if necessary. ChrisA From benhoyt at gmail.com Sat Jun 28 21:48:03 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Sat, 28 Jun 2014 15:48:03 -0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: >> But the underlying system calls -- ``FindFirstFile`` / >> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X -- > > What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide readdir? I guess it'd be better to say "Windows" and "Unix-based OSs" throughout the PEP? Because all of these (including Mac OS X) are Unix-based. > It looks like the WIN32_FIND_DATA has a dwFileAttributes field. So we > should mimic stat_result recent addition: the new > stat_result.file_attributes field. Add DirEntry.file_attributes which > would only be available on Windows. > > The Windows structure also contains > > FILETIME ftCreationTime; > FILETIME ftLastAccessTime; > FILETIME ftLastWriteTime; > DWORD nFileSizeHigh; > DWORD nFileSizeLow; > > It would be nice to expose them as well. 
I'm no more surprised that > the exact API is different depending on the OS for functions of the os > module. I think you've misunderstood how DirEntry.lstat() works on Windows -- it's basically a no-op, as Windows returns the full stat information with the original FindFirst/FindNext OS calls. This is fairly explict in the PEP, but I'm sure I could make it clearer: DirEntry.lstat(): "like os.lstat(), but requires no system calls on Windows So you can already get the dwFileAttributes for free by saying entry.lstat().st_file_attributes. You can also get all the other fields you mentioned for free via .lstat() with no additional OS calls on Windows, for example: entry.lstat().st_size. Feel free to suggest changes to the PEP or scandir docs if this isn't clear. Note that is_dir()/is_file()/is_symlink() are free on all systems, but .lstat() is only free on Windows. > Does your implementation uses a free list to avoid the cost of memory > allocation? A short free list of 10 or maybe just 1 may help. The free > list may be stored directly in the generator object. No, it doesn't. I might add this to the PEP under "possible improvements". However, I think the speed increase by removing the extra OS call and/or disk seek is going to be way more than memory allocation improvements, so I'm not sure this would be worth it. > Does it support also bytes filenames on UNIX? > Python now supports undecodable filenames thanks to the PEP 383 > (surrogateescape). I prefer to use the same type for filenames on > Linux and Windows, so Unicode is better. But some users might prefer > bytes for other reasons. I forget exactly now what my scandir module does, but for os.scandir() I think this should behave exactly like os.listdir() does for Unicode/bytes filenames. > Crazy idea: would it be possible to "convert" a DirEntry object to a > pathlib.Path object without losing the cache? I guess that > pathlib.Path expects a full stat_result object. 
The main problem is that pathlib.Path objects explicitly don't cache stat info (and Guido doesn't want them to, for good reason I think). There's a thread on python-dev about this earlier. I'll add it to a "Rejected ideas" section. > I don't understand how you can build a full lstat() result without > really calling stat. I see that WIN32_FIND_DATA contains the size, but > here you call lstat(). See above. > Do you plan to continue to maintain your module for Python < 3.5, but > upgrade your module for the final PEP? Yes, I intend to maintain the standalone scandir module for 2.6 <= Python < 3.5, at least for a good while. For integration into the Python 3.5 stdlib, the implementation will be integrated into posixmodule.c, of course. >> Should there be a way to access the full path? >> ---------------------------------------------- >> >> Should ``DirEntry``'s have a way to get the full path without using >> ``os.path.join(path, entry.name)``? This is a pretty common pattern, >> and it may be useful to add pathlib-like ``str(entry)`` functionality. >> This functionality has also been requested in `issue 13`_ on GitHub. >> >> .. _`issue 13`: https://github.com/benhoyt/scandir/issues/13 > > I think that it would be very convinient to store the directory name > in the DirEntry. It should be light, it's just a reference. > > And provide a fullname() name which would just return > os.path.join(path, entry.name) without trying to resolve path to get > an absolute path. Yeah, fair suggestion. I'm still slightly on the fence about this, but I think an explicit fullname() is a good suggestion. Ideally I think it'd be better to mimic pathlib.Path.__str__() which is kind of the equivalent of fullname(). But how does pathlib deal with unicode/bytes issues if it's the str function which has to return a str object? Or at least, it'd be very weird if __str__() returned bytes. But I think it'd need to if you passed bytes into scandir(). Do others have thoughts? 
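For concreteness, a minimal sketch of the fullname() idea being discussed here -- an entry object that keeps a reference to the directory passed to scandir() and joins it with the entry's name, with no path resolution and no extra system calls. The class name and attribute layout below are purely illustrative, not the PEP's proposed API:

```python
import os.path

class EntrySketch:
    """Illustrative stand-in for the proposed DirEntry (not the PEP's API)."""
    def __init__(self, scandir_path, name):
        self.scandir_path = scandir_path  # directory originally passed to scandir()
        self.name = name                  # bare entry name as returned by the OS

    def fullname(self):
        # A plain join -- deliberately no abspath()/realpath(), so no
        # symlink resolution and no additional system calls.
        return os.path.join(self.scandir_path, self.name)

entry = EntrySketch("photos", "cat.jpg")
print(entry.fullname())  # 'photos/cat.jpg' on POSIX, 'photos\\cat.jpg' on Windows
```

Note the bytes/str question above applies here too: os.path.join() preserves whichever type it is given, so a bytes scandir() path would naturally yield a bytes fullname(), sidestepping the __str__() awkwardness.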
> Would it be hard to implement the wildcard feature on UNIX to compare > performances of scandir('*.jpg') with and without the wildcard built > in os.scandir? It's a good idea, the problem with this is that the Windows wildcard implementation has a bunch of crazy edge cases where *.ext will catch more things than just a simple regex/glob. This was discussed on python-dev or python-ideas previously, so I'll dig it up and add to a Rejected Ideas section. In any case, this could be added later if there's a way to iron out the Windows quirks. -Ben From benhoyt at gmail.com Sat Jun 28 21:55:00 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Sat, 28 Jun 2014 15:55:00 -0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> Message-ID: Re is_dir etc being properties rather than methods: >> I find this behaviour a bit misleading: using methods and have them >> return cached results. How much (implementation and/or performance >> and/or memory) overhead would incur by using property-like access here? >> I think this would underline the static nature of the data. >> >> This would break the semantics with respect to pathlib, but they're only >> marginally equal anyways -- and as far as I understand it, pathlib won't >> cache, so I think this has a fair point here. > > Indeed - using properties rather than methods may help emphasise the > deliberate *difference* from pathlib in this case (i.e. value when the > result was retrieved from the OS, rather than the value right now). The main > benefit is that switching from using the DirEntry object to a pathlib Path > will require touching all the places where the performance characteristics > switch from "memory access" to "system call". This benefit is also the main > downside, so I'd actually be OK with either decision on this one. 
The problem with this is that properties "look free", they look just like attribute access, so you wouldn't normally handle exceptions when accessing them. But .lstat() and .is_dir() etc may do an OS call, so if you're needing to be careful with error handling, you may want to handle errors on them. Hence I think it's best practice to make them functions(). Some of us discussed this on python-dev or python-ideas a while back, and I think there was general agreement with what I've stated above and therefore they should be methods. But I'll dig up the links and add to a Rejected ideas section. > * +1 on a new section in the PEP covering rejected design options (calling > it iterdir, returning a 2-tuple instead of a dedicated DirEntry type) Great idea. I'll add a bunch of stuff, including the above, to a new section, Rejected Design Options. > * regarding "why not a 2-tuple", we know from experience that operating > systems evolve and we end up wanting to add additional info to this kind of > API. A dedicated DirEntry type lets us adjust the information returned over > time, without breaking backwards compatibility and without resorting to ugly > hacks like those in some of the time and stat APIs (or even our own codec > info APIs) Fully agreed. 
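To make the "look free" point concrete: on POSIX, is_dir() may have to do an lstat() the first time it is called, so it can raise OSError (e.g. if the entry was removed between the scan and the call), and a method call gives you an obvious place to put the try/except. A rough sketch of that lazy POSIX-side behaviour -- names are illustrative, this is not the PEP's implementation, and the real DirEntry can often answer is_dir() from the d_type field without any stat at all:

```python
import os
import stat

class LazyEntry:
    """Illustrative POSIX-style entry: stat info fetched on first use, then cached."""
    def __init__(self, path):
        self.path = path
        self._stat = None

    def lstat(self):
        if self._stat is None:
            self._stat = os.lstat(self.path)  # may raise OSError
        return self._stat  # cached: never refetched

    def is_dir(self):
        # Looks innocent, but the first call can hit the filesystem and fail
        return stat.S_ISDIR(self.lstat().st_mode)

entry = LazyEntry("does-not-exist")
try:
    entry.is_dir()
except OSError:
    print("entry vanished (or never existed)")
```

With property syntax (entry.is_dir) the same failure mode would be hidden behind what reads as a plain attribute access.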
> * it would be nice to see some relative performance numbers for NFS and CIFS > network shares - the additional network round trips can make excessive stat > calls absolutely brutal from a speed perspective when using a network drive > (that's why the stat caching added to the import system in 3.3 dramatically > sped up the case of having network drives on sys.path, and why I thought AJ > had a point when he was complaining about the fact we didn't expose the > dirent data from os.listdir) Don't know if you saw, but there are actually some benchmarks, including one over NFS, on the scandir GitHub page: https://github.com/benhoyt/scandir#benchmarks os.walk() was 23 times faster with scandir() than the current listdir() + stat() implementation on the Windows NFS file system I tried. Pretty good speedup! -Ben From ncoghlan at gmail.com Sun Jun 29 06:59:19 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Jun 2014 14:59:19 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: On 29 June 2014 05:48, Ben Hoyt wrote: >>> But the underlying system calls -- ``FindFirstFile`` / >>> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X -- >> >> What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide readdir? > > I guess it'd be better to say "Windows" and "Unix-based OSs" > throughout the PEP? Because all of these (including Mac OS X) are > Unix-based. *nix and POSIX-based are the two conventions I use. >> Crazy idea: would it be possible to "convert" a DirEntry object to a >> pathlib.Path object without losing the cache? I guess that >> pathlib.Path expects a full stat_result object. > > The main problem is that pathlib.Path objects explicitly don't cache > stat info (and Guido doesn't want them to, for good reason I think). > There's a thread on python-dev about this earlier. I'll add it to a > "Rejected ideas" section. 
The key problem with caches on pathlib.Path objects is that you could end up with two separate path objects that referred to the same filesystem location but returned different answers about the filesystem state because their caches might be stale. DirEntry is different, as the content is generally *assumed* to be stale (referring to when the directory was scanned, rather than the current filesystem state). DirEntry.lstat() on POSIX systems will be an exception to that general rule (referring to the time of first lookup, rather than when the directory was scanned, so the answer from lstat() may be inconsistent with other data stored directly on the DirEntry object), but one we can probably live with. More generally, as part of the pathlib PEP review, we figured out that a *per-object* cache of filesystem state would be an inherently bad idea, but a string based *process global* cache might make sense for modules like walkdir (not part of the stdlib - it's an iterator pipeline based approach to file tree scanning I wrote a while back, that currently suffers badly from the performance impact of repeated stat calls at different stages of the pipeline). We realised this was getting into a space where application and library specific concerns are likely to start affecting the caching design, though, so the current status of standard library level stat caching is "it's not clear if there's an available approach that would be sufficiently general purpose to be appropriate for inclusion in the standard library". Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 29 07:03:27 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Jun 2014 15:03:27 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> Message-ID: On 29 June 2014 05:55, Ben Hoyt wrote: > Re is_dir etc being properties rather than methods: > >>> I find this behaviour a bit misleading: using methods and have them >>> return cached results. How much (implementation and/or performance >>> and/or memory) overhead would incur by using property-like access here? >>> I think this would underline the static nature of the data. >>> >>> This would break the semantics with respect to pathlib, but they're only >>> marginally equal anyways -- and as far as I understand it, pathlib won't >>> cache, so I think this has a fair point here. >> >> Indeed - using properties rather than methods may help emphasise the >> deliberate *difference* from pathlib in this case (i.e. value when the >> result was retrieved from the OS, rather than the value right now). The main >> benefit is that switching from using the DirEntry object to a pathlib Path >> will require touching all the places where the performance characteristics >> switch from "memory access" to "system call". This benefit is also the main >> downside, so I'd actually be OK with either decision on this one. > > The problem with this is that properties "look free", they look just > like attribute access, so you wouldn't normally handle exceptions when > accessing them. But .lstat() and .is_dir() etc may do an OS call, so > if you're needing to be careful with error handling, you may want to > handle errors on them. Hence I think it's best practice to make them > functions(). 
> > Some of us discussed this on python-dev or python-ideas a while back, > and I think there was general agreement with what I've stated above > and therefore they should be methods. But I'll dig up the links and > add to a Rejected ideas section. Yes, only the stuff that *never* needs a system call (regardless of OS) would be a candidate for handling as a property rather than a method call. Consistency of access would likely trump that idea anyway, but it would still be worth ensuring that the PEP is clear on which values are guaranteed to reflect the state at the time of the directory scanning and which may imply an additional stat call. >> * it would be nice to see some relative performance numbers for NFS and CIFS >> network shares - the additional network round trips can make excessive stat >> calls absolutely brutal from a speed perspective when using a network drive >> (that's why the stat caching added to the import system in 3.3 dramatically >> sped up the case of having network drives on sys.path, and why I thought AJ >> had a point when he was complaining about the fact we didn't expose the >> dirent data from os.listdir) > > Don't know if you saw, but there are actually some benchmarks, > including one over NFS, on the scandir GitHub page: > > https://github.com/benhoyt/scandir#benchmarks No, I hadn't seen those - may be worth referencing explicitly from the PEP (and if there's already a reference... oops!) > os.walk() was 23 times faster with scandir() than the current > listdir() + stat() implementation on the Windows NFS file system I > tried. Pretty good speedup! Ah, nice! Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg at krypto.org Sun Jun 29 08:26:24 2014 From: greg at krypto.org (Gregory P. 
Smith) Date: Sat, 28 Jun 2014 23:26:24 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: On Jun 28, 2014 12:49 PM, "Ben Hoyt" wrote: > > >> But the underlying system calls -- ``FindFirstFile`` / > >> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X -- > > > > What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide readdir? > > I guess it'd be better to say "Windows" and "Unix-based OSs" > throughout the PEP? Because all of these (including Mac OS X) are > Unix-based. No, Just say POSIX. > > > It looks like the WIN32_FIND_DATA has a dwFileAttributes field. So we > > should mimic stat_result recent addition: the new > > stat_result.file_attributes field. Add DirEntry.file_attributes which > > would only be available on Windows. > > > > The Windows structure also contains > > > > FILETIME ftCreationTime; > > FILETIME ftLastAccessTime; > > FILETIME ftLastWriteTime; > > DWORD nFileSizeHigh; > > DWORD nFileSizeLow; > > > > It would be nice to expose them as well. I'm no more surprised that > > the exact API is different depending on the OS for functions of the os > > module. > > I think you've misunderstood how DirEntry.lstat() works on Windows -- > it's basically a no-op, as Windows returns the full stat information > with the original FindFirst/FindNext OS calls. This is fairly explict > in the PEP, but I'm sure I could make it clearer: > > DirEntry.lstat(): "like os.lstat(), but requires no system calls on Windows > > So you can already get the dwFileAttributes for free by saying > entry.lstat().st_file_attributes. You can also get all the other > fields you mentioned for free via .lstat() with no additional OS calls > on Windows, for example: entry.lstat().st_size. > > Feel free to suggest changes to the PEP or scandir docs if this isn't > clear. Note that is_dir()/is_file()/is_symlink() are free on all > systems, but .lstat() is only free on Windows. 
> > > Does your implementation uses a free list to avoid the cost of memory > > allocation? A short free list of 10 or maybe just 1 may help. The free > > list may be stored directly in the generator object. > > No, it doesn't. I might add this to the PEP under "possible > improvements". However, I think the speed increase by removing the > extra OS call and/or disk seek is going to be way more than memory > allocation improvements, so I'm not sure this would be worth it. > > > Does it support also bytes filenames on UNIX? > > > Python now supports undecodable filenames thanks to the PEP 383 > > (surrogateescape). I prefer to use the same type for filenames on > > Linux and Windows, so Unicode is better. But some users might prefer > > bytes for other reasons. > > I forget exactly now what my scandir module does, but for os.scandir() > I think this should behave exactly like os.listdir() does for > Unicode/bytes filenames. > > > Crazy idea: would it be possible to "convert" a DirEntry object to a > > pathlib.Path object without losing the cache? I guess that > > pathlib.Path expects a full stat_result object. > > The main problem is that pathlib.Path objects explicitly don't cache > stat info (and Guido doesn't want them to, for good reason I think). > There's a thread on python-dev about this earlier. I'll add it to a > "Rejected ideas" section. > > > I don't understand how you can build a full lstat() result without > > really calling stat. I see that WIN32_FIND_DATA contains the size, but > > here you call lstat(). > > See above. > > > Do you plan to continue to maintain your module for Python < 3.5, but > > upgrade your module for the final PEP? > > Yes, I intend to maintain the standalone scandir module for 2.6 <= > Python < 3.5, at least for a good while. For integration into the > Python 3.5 stdlib, the implementation will be integrated into > posixmodule.c, of course. > > >> Should there be a way to access the full path? 
> >> ---------------------------------------------- > >> > >> Should ``DirEntry``'s have a way to get the full path without using > >> ``os.path.join(path, entry.name)``? This is a pretty common pattern, > >> and it may be useful to add pathlib-like ``str(entry)`` functionality. > >> This functionality has also been requested in `issue 13`_ on GitHub. > >> > >> .. _`issue 13`: https://github.com/benhoyt/scandir/issues/13 > > > > I think that it would be very convenient to store the directory name > in the DirEntry. It should be light, it's just a reference. > > > > And provide a fullname() method which would just return > > os.path.join(path, entry.name) without trying to resolve path to get > > an absolute path. > > Yeah, fair suggestion. I'm still slightly on the fence about this, but > I think an explicit fullname() is a good suggestion. Ideally I think > it'd be better to mimic pathlib.Path.__str__() which is kind of the > equivalent of fullname(). But how does pathlib deal with unicode/bytes > issues if it's the str function which has to return a str object? Or > at least, it'd be very weird if __str__() returned bytes. But I think > it'd need to if you passed bytes into scandir(). Do others have > thoughts? > > > Would it be hard to implement the wildcard feature on UNIX to compare > > performances of scandir('*.jpg') with and without the wildcard built > > in os.scandir? > > It's a good idea; the problem with this is that the Windows wildcard > implementation has a bunch of crazy edge cases where *.ext will catch > more things than just a simple regex/glob. This was discussed on > python-dev or python-ideas previously, so I'll dig it up and add it to a > Rejected Ideas section. In any case, this could be added later if > there's a way to iron out the Windows quirks.
> > -Ben > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg%40krypto.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From walter at livinglogic.de Sun Jun 29 10:23:42 2014 From: walter at livinglogic.de (Walter =?utf-8?q?D=C3=B6rwald?=) Date: Sun, 29 Jun 2014 10:23:42 +0200 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: <17DF953A-91A9-41E8-AFF7-FC612B2FB4BE@livinglogic.de> On 28 Jun 2014, at 21:48, Ben Hoyt wrote: > [...] >> Crazy idea: would it be possible to "convert" a DirEntry object to a >> pathlib.Path object without losing the cache? I guess that >> pathlib.Path expects a full stat_result object. > > The main problem is that pathlib.Path objects explicitly don't cache > stat info (and Guido doesn't want them to, for good reason I think). > There's a thread on python-dev about this earlier. I'll add it to a > "Rejected ideas" section. However, it would be bad to have two implementations of the concept of "filename" with different attribute and method names. The best way to ensure compatible APIs would be if one class was derived from the other. > [...] Servus, Walter From steve at pearwood.info Sun Jun 29 12:52:40 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 29 Jun 2014 20:52:40 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> Message-ID: <20140629105235.GM13014@ando> On Sat, Jun 28, 2014 at 03:55:00PM -0400, Ben Hoyt wrote: > Re is_dir etc being properties rather than methods: [...] 
> The problem with this is that properties "look free", they look just > like attribute access, so you wouldn't normally handle exceptions when > accessing them. But .lstat() and .is_dir() etc may do an OS call, so > if you're needing to be careful with error handling, you may want to > handle errors on them. Hence I think it's best practice to make them > functions(). I think this one could go either way. Methods look like they actually re-test the value each time you call them. I can easily see people not realising that the value is cached and writing code like this toy example: # Detect a file change. t = the_file.lstat().st_mtime while the_file.lstat().st_mtime == t: sleep(0.1) print("Changed!") I know that's not the best way to detect file changes, but I'm sure people will do something like that and not realise that the call to lstat is cached. Personally, I would prefer a property. If I forget to wrap a call in a try...except, it will fail hard and I will get an exception. But with a method call, the failure is silent and I keep getting the cached result. Speaking of caching, is there a way to freshen the cached values? -- Steven From ncoghlan at gmail.com Sun Jun 29 13:08:36 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Jun 2014 21:08:36 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <20140629105235.GM13014@ando> References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> Message-ID: On 29 June 2014 20:52, Steven D'Aprano wrote: > Speaking of caching, is there a way to freshen the cached values? Switch to a full Path object instead of relying on the cached DirEntry data. This is what makes me wary of including lstat, even though Windows offers it without the extra stat call. Caching behaviour is *really* hard to make intuitive, especially when it *sometimes* returns data that looks fresh (as it does on first call on POSIX systems).
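A sketch of the explicit refresh both replies point at: re-stat the path rather than trusting the cache. The entry.path attribute used here is the full-path attribute the API eventually grew in Python 3.5; with the API as drafted in the PEP, os.path.join(scan_dir, entry.name) would serve the same purpose.

```python
import os

def fresh_mtime(entry):
    # Bypass DirEntry's cached stat data with a real lstat() call.
    return os.lstat(entry.path).st_mtime
```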
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Sun Jun 29 13:45:49 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Jun 2014 12:45:49 +0100 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> Message-ID: On 29 June 2014 12:08, Nick Coghlan wrote: > This is what makes me wary of including lstat, even though Windows > offers it without the extra stat call. Caching behaviour is *really* > hard to make intuitive, especially when it *sometimes* returns data > that looks fresh (as it on first call on POSIX systems). If it matters that much we *could* simply call it cached_lstat(). It's ugly, but I really don't like the idea of throwing the information away - after all, the fact that we currently throw data away is why there's even a need for scandir. Let's not make the same mistake again... Paul From ncoghlan at gmail.com Sun Jun 29 14:28:14 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Jun 2014 22:28:14 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> Message-ID: On 29 June 2014 21:45, Paul Moore wrote: > On 29 June 2014 12:08, Nick Coghlan wrote: >> This is what makes me wary of including lstat, even though Windows >> offers it without the extra stat call. Caching behaviour is *really* >> hard to make intuitive, especially when it *sometimes* returns data >> that looks fresh (as it on first call on POSIX systems). > > If it matters that much we *could* simply call it cached_lstat(). It's > ugly, but I really don't like the idea of throwing the information > away - after all, the fact that we currently throw data away is why > there's even a need for scandir. Let's not make the same mistake > again... 
Future-proofing is the reason DirEntry is a full-fledged class in the first place, though. Effectively communicating the behavioural difference between DirEntry and pathlib.Path is the main thing that makes me nervous about adhering too closely to the Path API. To restate the problem and the alternative proposal, these are the DirEntry methods under discussion: is_dir(): like os.path.isdir(), but requires no system calls on at least POSIX and Windows is_file(): like os.path.isfile(), but requires no system calls on at least POSIX and Windows is_symlink(): like os.path.islink(), but requires no system calls on at least POSIX and Windows lstat(): like os.lstat(), but requires no system calls on Windows For the almost-certain-to-be-cached items, the suggestion is to make them properties (or just ordinary attributes): is_dir is_file is_symlink What to do with lstat() is currently less clear, since POSIX directory scanning doesn't provide that level of detail by default. The PEP also doesn't currently state whether the is_dir(), is_file() and is_symlink() results would be updated if a call to lstat() produced different answers than the original directory scanning process, which further suggests to me that allowing the stat call to be delayed on POSIX systems is a potentially problematic and inherently confusing design. We would have two options: - update them, meaning calling lstat() may change those results from being a snapshot of the setting at the time the directory was scanned - leave them alone, meaning the DirEntry object and the DirEntry.lstat() result may give different answers Those both sound ugly to me. So, here's my alternative proposal: add an "ensure_lstat" flag to scandir() itself, and don't have *any* methods on DirEntry, only attributes.
That would make the DirEntry attributes: is_dir: boolean, always populated is_file: boolean, always populated is_symlink: boolean, always populated lstat_result: stat result, may be None on POSIX systems if ensure_lstat is False (I'm not particularly sold on "lstat_result" as the name, but "lstat" reads as a verb to me, so doesn't sound right as an attribute name) What this would allow: - by default, scanning is efficient everywhere, but lstat_result may be None on POSIX systems - if you always need the lstat result, setting "ensure_lstat" will trigger the extra system call implicitly - if you only sometimes need the stat result, you can call os.lstat() explicitly when the DirEntry lstat attribute is None Most importantly, *regardless of platform*, the cached stat result (if not None) would reflect the state of the entry at the time the directory was scanned, rather than at some arbitrary later point in time when lstat() was first called on the DirEntry object. There'd still be a slight window of discrepancy (since the filesystem state may change between reading the directory entry and making the lstat() call), but this could be effectively eliminated from the perspective of the Python code by making the result of the lstat() call authoritative for the whole DirEntry object. Regards, Nick. P.S. We'd be generating quite a few of these, so we can use __slots__ to keep the memory overhead to a minimum (that's just a general comment - it's really irrelevant to the methods-or-attributes question).
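As a rough illustration only, the proposed semantics can be emulated on top of the scandir() that later shipped; the ensure_lstat flag and the lstat_result attribute below are hypothetical names from this proposal, not part of any released API:

```python
import os
from collections import namedtuple

# Hypothetical attribute-only entry, per the proposal above.
SnapshotEntry = namedtuple(
    'SnapshotEntry', 'name is_dir is_file is_symlink lstat_result')

def scandir_snapshot(path='.', ensure_lstat=False):
    for entry in os.scandir(path):
        # Snapshot the stat data at scan time, so the cached result
        # reflects the state when the directory was scanned.
        lstat_result = (entry.stat(follow_symlinks=False)
                        if ensure_lstat else None)
        yield SnapshotEntry(entry.name,
                            entry.is_dir(follow_symlinks=False),
                            entry.is_file(follow_symlinks=False),
                            entry.is_symlink(),
                            lstat_result)
```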
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From j.wielicki at sotecware.net Sun Jun 29 13:12:55 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Sun, 29 Jun 2014 13:12:55 +0200 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> Message-ID: <53AFF4B7.9030200@sotecware.net> On 29.06.2014 13:08, Nick Coghlan wrote: > On 29 June 2014 20:52, Steven D'Aprano wrote: >> Speaking of caching, is there a way to freshen the cached values? > > Switch to a full Path object instead of relying on the cached DirEntry data. > > This is what makes me wary of including lstat, even though Windows > offers it without the extra stat call. Caching behaviour is *really* > hard to make intuitive, especially when it *sometimes* returns data > that looks fresh (as it on first call on POSIX systems). This bugs me too. An idea I had was adding a keyword argument to scandir which specifies whether stat data should be added to the direntry or not. If the flag is set to True, this would implicitly call lstat on POSIX before returning the DirEntry, and use the available data on Windows. If the flag is set to False, all the fields in the DirEntry will be None, for consistency, even on Windows. This is not optimal in cases where the stat information is needed only for some of the DirEntry objects, but would also reduce the required logic in the DirEntry object. Thoughts? > > Regards, > Nick.
> > From ethan at stoneleaf.us Sun Jun 29 19:02:16 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 29 Jun 2014 10:02:16 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> Message-ID: <53B04698.90600@stoneleaf.us> On 06/29/2014 05:28 AM, Nick Coghlan wrote: > > So, here's my alternative proposal: add an "ensure_lstat" flag to > scandir() itself, and don't have *any* methods on DirEntry, only > attributes. > > That would make the DirEntry attributes: > > is_dir: boolean, always populated > is_file: boolean, always populated > is_symlink boolean, always populated > lstat_result: stat result, may be None on POSIX systems if > ensure_lstat is False > > (I'm not particularly sold on "lstat_result" as the name, but "lstat" > reads as a verb to me, so doesn't sound right as an attribute name) +1 -- ~Ethan~ From ethan at stoneleaf.us Sun Jun 29 19:04:19 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 29 Jun 2014 10:04:19 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53AFF4B7.9030200@sotecware.net> References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53AFF4B7.9030200@sotecware.net> Message-ID: <53B04713.1070700@stoneleaf.us> On 06/29/2014 04:12 AM, Jonas Wielicki wrote: > > If the flag is set to False, all the fields in the DirEntry will be > None, for consistency, even on Windows. -1 This consistency is unnecessary. 
-- ~Ethan~ From 4kir4.1i at gmail.com Sun Jun 29 20:32:53 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Sun, 29 Jun 2014 22:32:53 +0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator References: <877g412nhg.fsf@gmail.com> Message-ID: <87zjgv1s8a.fsf@gmail.com> Chris Angelico writes: > On Sat, Jun 28, 2014 at 11:05 PM, Akira Li <4kir4.1i at gmail.com> wrote: >> Have you considered adding support for paths relative to directory >> descriptors [1] via keyword only dir_fd=None parameter if it may lead to >> more efficient implementations on some platforms? >> >> [1]: https://docs.python.org/3.4/library/os.html#dir-fd > > Potentially more efficient and also potentially safer (see 'man > openat')... but an enhancement that can wait, if necessary. > Introducing the feature later creates unnecessary incompatibilities. It should either be explicitly rejected in PEP 471, with something like `os.scandir(os.open(relative_path, dir_fd=fd))` recommended instead (assuming `os.scandir in os.supports_fd`, like `os.listdir()`), or supported from the start. At C level it could be implemented using fdopendir/openat or scandirat. Here's the function description using Argument Clinic DSL: /*[clinic input] os.scandir path : path_t(allow_fd=True, nullable=True) = '.' *path* can be specified as either str or bytes. On some platforms, *path* may also be specified as an open file descriptor; the file descriptor must refer to a directory. If this functionality is unavailable, using it raises NotImplementedError. * dir_fd : dir_fd = None If not None, it should be a file descriptor open to a directory, and *path* should be a relative string; path will then be relative to that directory. If *dir_fd* is unavailable, using it raises NotImplementedError. Yield a DirEntry object for each file and directory in *path*. Just like os.listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order.
{parameters} It's an error to use *dir_fd* when specifying *path* as an open file descriptor. [clinic start generated code]*/ And corresponding tests (from test_posix:PosixTester), to show the compatibility with os.listdir argument parsing in detail: def test_scandir_default(self): # When scandir is called without argument, # it's the same as scandir(os.curdir). self.assertIn(support.TESTFN, [e.name for e in posix.scandir()]) def _test_scandir(self, curdir): filenames = sorted(e.name for e in posix.scandir(curdir)) self.assertIn(support.TESTFN, filenames) #NOTE: assume listdir, scandir accept the same types on the platform self.assertEqual(sorted(posix.listdir(curdir)), filenames) def test_scandir(self): self._test_scandir(os.curdir) def test_scandir_none(self): # it's the same as scandir(os.curdir). self._test_scandir(None) def test_scandir_bytes(self): # When scandir is called with a bytes object, # the returned entries names are still of type str. # Call `os.fsencode(entry.name)` to get bytes self.assertIn('a', {'a'}) self.assertNotIn(b'a', {'a'}) self._test_scandir(b'.') @unittest.skipUnless(posix.scandir in os.supports_fd, "test needs fd support for posix.scandir()") def test_scandir_fd_minus_one(self): # it's the same as scandir(os.curdir). 
self._test_scandir(-1) def test_scandir_float(self): # invalid args self.assertRaises(TypeError, posix.scandir, -1.0) @unittest.skipUnless(posix.scandir in os.supports_fd, "test needs fd support for posix.scandir()") def test_scandir_fd(self): fd = posix.open(posix.getcwd(), posix.O_RDONLY) self.addCleanup(posix.close, fd) self._test_scandir(fd) self.assertEqual( sorted(posix.scandir('.')), sorted(posix.scandir(fd))) # call 2nd time to test rewind self.assertEqual( sorted(posix.scandir('.')), sorted(posix.scandir(fd))) @unittest.skipUnless(posix.scandir in os.supports_dir_fd, "test needs dir_fd support for os.scandir()") def test_scandir_dir_fd(self): relpath = 'relative_path' with support.temp_dir() as parent: fullpath = os.path.join(parent, relpath) with support.temp_dir(path=fullpath): support.create_empty_file(os.path.join(parent, 'a')) support.create_empty_file(os.path.join(fullpath, 'b')) fd = posix.open(parent, posix.O_RDONLY) self.addCleanup(posix.close, fd) self.assertEqual( sorted(posix.scandir(relpath, dir_fd=fd)), sorted(posix.scandir(fullpath))) # check that fd is still useful self.assertEqual( sorted(posix.scandir(relpath, dir_fd=fd)), sorted(posix.scandir(fullpath))) -- Akira From j.wielicki at sotecware.net Sun Jun 29 23:04:09 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Sun, 29 Jun 2014 23:04:09 +0200 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53B04713.1070700@stoneleaf.us> References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53AFF4B7.9030200@sotecware.net> <53B04713.1070700@stoneleaf.us> Message-ID: <53B07F49.6010300@sotecware.net> On 29.06.2014 19:04, Ethan Furman wrote: > On 06/29/2014 04:12 AM, Jonas Wielicki wrote: >> >> If the flag is set to False, all the fields in the DirEntry will be >> None, for consistency, even on Windows. > > -1 > > This consistency is unnecessary. 
I'm not sure -- similar to the windows_wildcard option this might be a temptation to write platform-dependent code, although possibly by accident (i.e. not reading the docs carefully). > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/j.wielicki%40sotecware.net > From berker.peksag at gmail.com Mon Jun 30 02:08:24 2014 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Mon, 30 Jun 2014 03:08:24 +0300 Subject: [Python-Dev] Fix Unicode-disabled build of Python 2.7 In-Reply-To: References: <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com> Message-ID: On Sat, Jun 28, 2014 at 2:51 AM, Victor Stinner wrote: > 2014-06-26 13:04 GMT+02:00 Antoine Pitrou : >> For the same reason, I agree with Victor that we should ditch the >> threading-disabled builds. It's too much of a hassle for no actual, >> practical benefit. People who want a threadless unicodeless Python can >> install Python 1.5.2 for all I care. > > By the way, adding a buildbot for testing Python without thread > support is not enough. The buildbot is currently broken since more > than one month and nobody noticed :-p I opened http://bugs.python.org/issue21755 a couple of weeks ago to fix the test. --Berker
I prefer to leave such task to someone else :-) > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/berker.peksag%40gmail.com From v+python at g.nevcal.com Mon Jun 30 04:33:33 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 29 Jun 2014 19:33:33 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> Message-ID: <53B0CC7D.6090609@g.nevcal.com> On 6/29/2014 5:28 AM, Nick Coghlan wrote: > There'd still be a slight window of discrepancy (since the filesystem > state may change between reading the directory entry and making the > lstat() call), but this could be effectively eliminated from the > perspective of the Python code by making the result of the lstat() > call authoritative for the whole DirEntry object. +1 to this in particular, but this whole refresh of the semantics sounds better overall. Finally, for the case where someone does want to keep the DirEntry around, a .refresh() API could rerun lstat() and update all the data. And with that (initial data potentially always populated, or None, and an explicit refresh() API), the data could all be returned as properties, implying that they aren't fetching new data themselves, because they wouldn't be. Glenn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From benhoyt at gmail.com Mon Jun 30 19:05:54 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 30 Jun 2014 13:05:54 -0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> Message-ID: > So, here's my alternative proposal: add an "ensure_lstat" flag to > scandir() itself, and don't have *any* methods on DirEntry, only > attributes. > > That would make the DirEntry attributes: > > is_dir: boolean, always populated > is_file: boolean, always populated > is_symlink boolean, always populated > lstat_result: stat result, may be None on POSIX systems if > ensure_lstat is False > > (I'm not particularly sold on "lstat_result" as the name, but "lstat" > reads as a verb to me, so doesn't sound right as an attribute name) > > What this would allow: > > - by default, scanning is efficient everywhere, but lstat_result may > be None on POSIX systems > - if you always need the lstat result, setting "ensure_lstat" will > trigger the extra system call implicitly > - if you only sometimes need the stat result, you can call os.lstat() > explicitly when the DirEntry lstat attribute is None > > Most importantly, *regardless of platform*, the cached stat result (if > not None) would reflect the state of the entry at the time the > directory was scanned, rather than at some arbitrary later point in > time when lstat() was first called on the DirEntry object. > > There'd still be a slight window of discrepancy (since the filesystem > state may change between reading the directory entry and making the > lstat() call), but this could be effectively eliminated from the > perspective of the Python code by making the result of the lstat() > call authoritative for the whole DirEntry object. Yeah, I quite like this. It does make the caching more explicit and consistent. 
It's slightly annoying that it's less like pathlib.Path now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't matter. The differences in naming may highlight the difference in caching, so maybe it's a good thing. Two further questions from me: 1) How does error handling work? Now os.stat() will/may be called during iteration, so in __next__. But it's hard to catch errors because you don't call __next__ explicitly. Is this a problem? How do other iterators that make system calls or raise errors handle this? 2) There's still the open question in the PEP of whether to include a way to access the full path. This is cheap to build, it has to be built anyway on POSIX systems, and it's quite useful for further operations on the file. I think the best way to handle this is a .fullname or .full_name attribute as suggested elsewhere. Thoughts? -Ben
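On the second question, the full path is indeed cheap to rebuild from the scan directory and the entry name. The full_name helper below is hypothetical (the attribute ultimately shipped in Python 3.5 as DirEntry.path):

```python
import os

def full_name(scan_dir, entry):
    # Rebuild the entry's full path without any extra system calls.
    return os.path.join(scan_dir, entry.name)
```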