From rdmurray at bitdance.com Mon Feb 1 11:40:22 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 01 Feb 2016 11:40:22 -0500 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <20160201031226.GF31806@ando.pearwood.info> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> Message-ID: <20160201164023.CC500B200A1@webabinitio.net> On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano wrote: > On Sun, Jan 31, 2016 at 08:23:00PM +0000, Brett Cannon wrote: > > So freezing the stdlib helps on UNIX and not on OS X (if my old testing is > > still accurate). I guess the next question is what it does on Windows and > > if we would want to ever consider freezing the stdlib as part of the build > > process (and if we would want to change the order of importers on > > sys.meta_path so frozen modules came after file-based ones). > > I find that being able to easily open stdlib .py files in a text editor > to read the source is extremely valuable. I've learned much more from > reading the source than from (e.g.) StackOverflow. Likewise, it's often > handy to do a grep over the stdlib. When you talk about freezing the > stdlib, what exactly does that mean? > > - will the source files still be there? Well, Brett said it would be optional, though perhaps the above paragraph is asking about doing it in our Windows build. But the Linux distros might also make use of the option if it exists, so the question is very meaningful. However, you'd have to ask the distro if the source would be shipped in the Linux case, and I'd guess not in most cases. I don't know about anyone else, but on my own development systems it is not that unusual for me to *edit* the stdlib files (to add debug prints) while debugging my own programs. Freeze would definitely interfere with that. 
I could, of course, install a separate source build on my dev system, but I thought it worth mentioning as a factor. On the other hand, if the distros go the way Nick has (I think) been advocating, and have a separate 'system python for system scripts' that is independent of the one installed for user use, having the system-only python be frozen and sourceless would actually make sense on a couple of levels. --David From barry at python.org Mon Feb 1 11:54:41 2016 From: barry at python.org (Barry Warsaw) Date: Mon, 1 Feb 2016 11:54:41 -0500 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <20160201164023.CC500B200A1@webabinitio.net> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> Message-ID: <20160201115441.7984a500@subdivisions.wooz.org> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote: >Well, Brett said it would be optional, though perhaps the above >paragraph is asking about doing it in our Windows build. But the linux >distros might make also use the option if it exists, so the question is >very meaningful. However, you'd have to ask the distro if the source >would be shipped in the linux case, and I'd guess not in most cases. It's very likely the .py files would still be shipped, but perhaps in a -dev package that isn't normally installed. >I don't know about anyone else, but on my own development systems it is >not that unusual for me to *edit* the stdlib files (to add debug prints) >while debugging my own programs. Freeze would definitely interfere with >that. I could, of course, install a separate source build on my dev >system, but I thought it worth mentioning as a factor. I do this too, though usually in a VM or chroot and not in my live system. A very common situation for me though is pdb stepping through my own code and landing in -or passing through- stdlib. 
>On the other hand, if the distros go the way Nick has (I think) been >advocating, and have a separate 'system python for system scripts' that >is independent of the one installed for user use, having the system-only >python be frozen and sourceless would actually make sense on a couple of >levels. Yep, we've talked about it in Debian-land too, but never quite gotten around to doing anything. Certainly I'd like to see some consistency among Linux distros there (i.e. discussed on linux-sig@). But even with system scripts, I do need to step through them occasionally. If it were a matter of changing a shebang or invoking the script with a different Python (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full unpacked source, that would be fine. Cheers, -Barry From yselivanov.ml at gmail.com Mon Feb 1 11:54:17 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 11:54:17 -0500 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: <20160130042835.GJ4619@ando.pearwood.info> References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> Message-ID: <56AF8DB9.30900@gmail.com> On 2016-01-29 11:28 PM, Steven D'Aprano wrote: > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: >> Hi, >> >> >> tl;dr The summary is that I have a patch that improves CPython >> performance up to 5-10% on macro benchmarks. Benchmarks results on >> Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available >> at [1]. There are no slowdowns that I could reproduce consistently. > Have you looked at Cesare Di Mauro's wpython? As far as I know, it's now > unmaintained, and the project repo on Google Code appears to be dead (I > get a 404), but I understand that it was significantly faster than > CPython back in the 2.6 days. > > https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf > > Thanks for bringing this up! IIRC wpython was about using "fat" bytecodes, i.e. 
using 64bits per bytecode instead of 8. That allows to minimize the number of bytecodes, thus having some performance increase. TBH, I don't think it was "significantly faster". If I were to do some big refactoring of the ceval loop, I'd probably consider implementing a register VM. While register VMs are a bit faster than stack VMs (up to 20-30%), they would also allow us to apply more optimizations, and even bolt on a simple JIT compiler. Yury From brett at python.org Mon Feb 1 12:18:34 2016 From: brett at python.org (Brett Cannon) Date: Mon, 01 Feb 2016 17:18:34 +0000 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: <56AF8DB9.30900@gmail.com> References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> <56AF8DB9.30900@gmail.com> Message-ID: On Mon, 1 Feb 2016 at 09:08 Yury Selivanov wrote: > > > On 2016-01-29 11:28 PM, Steven D'Aprano wrote: > > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: > >> Hi, > >> > >> > >> tl;dr The summary is that I have a patch that improves CPython > >> performance up to 5-10% on macro benchmarks. Benchmarks results on > >> Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available > >> at [1]. There are no slowdowns that I could reproduce consistently. > > Have you looked at Cesare Di Mauro's wpython? As far as I know, it's now > > unmaintained, and the project repo on Google Code appears to be dead (I > > get a 404), but I understand that it was significantly faster than > > CPython back in the 2.6 days. > > > > > https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf > > > > > > Thanks for bringing this up! > > IIRC wpython was about using "fat" bytecodes, i.e. using 64bits per > bytecode instead of 8. That allows to minimize the number of bytecodes, > thus having some performance increase. TBH, I don't think it was > "significantly faster". 
> > If I were to do some big refactoring of the ceval loop, I'd probably > consider implementing a register VM. While register VMs are a bit > faster than stack VMs (up to 20-30%), they would also allow us to apply > more optimizations, and even bolt on a simple JIT compiler. > If you did tackle the register VM approach that would also settle a long-standing question of whether a certain optimization works for Python. As for bolting on a JIT, the whole point of Pyjion is to see if that's worth it for CPython, so that's already being taken care of (and is actually easier with a stack-based VM since the JIT engine we're using is stack-based itself). -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Feb 1 12:20:25 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 01 Feb 2016 09:20:25 -0800 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <20160201164023.CC500B200A1@webabinitio.net> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> Message-ID: <56AF93D9.2040104@stoneleaf.us> On 02/01/2016 08:40 AM, R. David Murray wrote: > On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano wrote: >> I find that being able to easily open stdlib .py files in a text editor >> to read the source is extremely valuable. I've learned much more from >> reading the source than from (e.g.) StackOverflow. Likewise, it's often >> handy to do a grep over the stdlib. When you talk about freezing the >> stdlib, what exactly does that mean? >> >> - will the source files still be there? > > Well, Brett said it would be optional, though perhaps the above > paragraph is asking about doing it in our Windows build. But the linux > distros might make also use the option if it exists, so the question is > very meaningful. 
However, you'd have to ask the distro if the source > would be shipped in the linux case, and I'd guess not in most cases. > > I don't know about anyone else, but on my own development systems it is > not that unusual for me to *edit* the stdlib files (to add debug prints) > while debugging my own programs. Freeze would definitely interfere with > that. I could, of course, install a separate source build on my dev > system, but I thought it worth mentioning as a factor. Yup, so do I. > On the other hand, if the distros go the way Nick has (I think) been > advocating, and have a separate 'system python for system scripts' that > is independent of the one installed for user use, having the system-only > python be frozen and sourceless would actually make sense on a couple of > levels. Agreed. -- ~Ethan~ From brett at python.org Mon Feb 1 12:23:55 2016 From: brett at python.org (Brett Cannon) Date: Mon, 01 Feb 2016 17:23:55 +0000 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <20160201164023.CC500B200A1@webabinitio.net> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> Message-ID: On Mon, 1 Feb 2016 at 08:48 R. David Murray wrote: > On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano > wrote: > > On Sun, Jan 31, 2016 at 08:23:00PM +0000, Brett Cannon wrote: > > > So freezing the stdlib helps on UNIX and not on OS X (if my old > testing is > > > still accurate). I guess the next question is what it does on Windows > and > > > if we would want to ever consider freezing the stdlib as part of the > build > > > process (and if we would want to change the order of importers on > > > sys.meta_path so frozen modules came after file-based ones). > > > > I find that being able to easily open stdlib .py files in a text editor > > to read the source is extremely valuable. 
I've learned much more from > > reading the source than from (e.g.) StackOverflow. Likewise, it's often > > handy to do a grep over the stdlib. When you talk about freezing the > > stdlib, what exactly does that mean? > > > > - will the source files still be there? > > Well, Brett said it would be optional, though perhaps the above > paragraph is asking about doing it in our Windows build. Nope, it would probably need to be across all OSs to have consistent semantics. > But the linux > distros might make also use the option if it exists, so the question is > very meaningful. However, you'd have to ask the distro if the source > would be shipped in the linux case, and I'd guess not in most cases. > > I don't know about anyone else, but on my own development systems it is > not that unusual for me to *edit* the stdlib files (to add debug prints) > while debugging my own programs. Freeze would definitely interfere with > that. I could, of course, install a separate source build on my dev > system, but I thought it worth mentioning as a factor. > This is what would need to be discussed in terms of how to handle this. For instance, we already do stuff in (I believe) site.py when we detect the build is in a checkout, so we could in that instance make sure the stdlib file directory takes precedence over any frozen code (hence why I wondered if the frozen importer on sys.meta_path should come after the sys.path importer). If we did that then we could make installing the stdlib files optional but still take precedence. It's all workable, it's just a question of if we want to. This is why I think we should get concrete benchmark numbers on Windows, Linux, and OS X to see if this is even worth considering as something we provide in our own binaries. 
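The sys.meta_path reordering in question can be sketched in a few lines (a hypothetical tweak, not anything CPython does today): demote the frozen-module finder so the path-based finder gets first crack, letting .py files on disk shadow frozen copies.

```python
import sys
from importlib.machinery import FrozenImporter

# Demote the frozen-module finder to the end of sys.meta_path so that
# PathFinder (which searches sys.path) is consulted before it.
finders = [f for f in sys.meta_path if f is not FrozenImporter]
finders.append(FrozenImporter)
sys.meta_path = finders
```

With this ordering an installed stdlib .py file would always win over a frozen copy of the same module.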
> > On the other hand, if the distros go the way Nick has (I think) been > advocating, and have a separate 'system python for system scripts' that > is independent of the one installed for user use, having the system-only > python be frozen and sourceless would actually make sense on a couple of > levels. > It at least wouldn't hurt anything. -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Feb 1 12:54:28 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 12:54:28 -0500 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> <56AF8DB9.30900@gmail.com> Message-ID: <56AF9BD4.40607@gmail.com> Hi Brett, On 2016-02-01 12:18 PM, Brett Cannon wrote: > > On Mon, 1 Feb 2016 at 09:08 Yury Selivanov > wrote: > > > [..] > > If I were to do some big refactoring of the ceval loop, I'd probably > consider implementing a register VM. While register VMs are a bit > faster than stack VMs (up to 20-30%), they would also allow us to > apply > more optimizations, and even bolt on a simple JIT compiler. > > > [..] > > As for bolting on a JIT, the whole point of Pyjion is to see if that's > worth it for CPython, so that's already being taken care of (and is > actually easier with a stack-based VM since the JIT engine we're using > is stack-based itself). Sure, I have very high hopes for Pyjion and Pyston. I really hope that Microsoft and Dropbox will keep pushing. 
Yury From mike.romberg at comcast.net Mon Feb 1 12:59:39 2016 From: mike.romberg at comcast.net (mike.romberg at comcast.net) Date: Mon, 1 Feb 2016 10:59:39 -0700 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <20160201115441.7984a500@subdivisions.wooz.org> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> Message-ID: <22191.40203.430978.404940@lrd.home.lan> >>>>> " " == Barry Warsaw writes: >> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote: >> I don't know about anyone else, but on my own development >> systems it is not that unusual for me to *edit* the stdlib >> files (to add debug prints) while debugging my own programs. >> Freeze would definitely interfere with that. I could, of >> course, install a separate source build on my dev system, but I >> thought it worth mentioning as a factor. [snip] > But even with system scripts, I do need to step through them > occasionally. If it were a matter of changing a shebang or > invoking the script with a different Python > (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full > unpacked source, that would be fine. If the stdlib were to use implicit namespace packages ( https://www.python.org/dev/peps/pep-0420/ ) and the various loaders/importers as well, then python could do what I've done with an embedded python application for years. Freeze the stdlib (or put it in a zipfile or whatever is fast). Then arrange PYTHONPATH to first look on the filesystem and then look in the frozen/ziped storage. Normally the filesystem part is empty. So, modules are loaded from the frozen/zip area. But if you wanna override one of the frozen modules simply copy one or more .py files onto the file system. I've been doing this only with modules in the global scope. 
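The arrangement just described can be demonstrated with plain sys.path entries — a sketch using a zip archive as the stand-in for the frozen storage and a hypothetical `demo` module (not Mike's actual setup):

```python
import importlib
import os
import sys
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
override_dir = os.path.join(tmp, "override")   # normally empty
os.mkdir(override_dir)

archive = os.path.join(tmp, "stdlib.zip")      # stand-in for frozen/zipped storage
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo.py", "WHO = 'zip'\n")

# Filesystem first, zipped storage second.
sys.path[:0] = [override_dir, archive]

import demo
first = demo.WHO                               # loaded from the zip

# Dropping a .py into the override directory shadows the zipped copy.
with open(os.path.join(override_dir, "demo.py"), "w") as f:
    f.write("WHO = 'filesystem'\n")
importlib.invalidate_caches()                  # FileFinder caches directory listings
del sys.modules["demo"]
import demo
second = demo.WHO                              # now loaded from the file on disk
print(first, second)
```

The override directory stays empty in normal operation, so imports come from the zip; copying a source file in is all it takes to debug one module.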
But implicit namespace packages seem to open the door for this with packages. Mike From srkunze at mail.de Mon Feb 1 13:11:28 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 1 Feb 2016 19:11:28 +0100 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> Message-ID: <56AF9FD0.6030603@mail.de> Thanks, Brett. Wasn't aware of lazy imports as well. I think that one is even better reducing startup time as freezing stdlib. On 31.01.2016 18:57, Brett Cannon wrote: > I have opened http://bugs.python.org/issue26252 to track writing the > example (and before ppl go playing with the lazy loader, be aware of > http://bugs.python.org/issue26186). > > On Sun, 31 Jan 2016 at 09:26 Brett Cannon > wrote: > > There are no example docs for it yet, but enough people have asked > this week about how to set up a custom importer that I will write > up a generic example case which will make sense for a lazy loader > (need to file the issue before I forget). > > > On Sun, 31 Jan 2016, 09:11 Donald Stufft > wrote: > > >> On Jan 31, 2016, at 12:02 PM, Brett Cannon > > wrote: >> >> A lazy importer was added in Python 3.5 > > Is there any docs on how to actually use the LazyLoader in > 3.5? I can?t seem to find any but I don?t really know the > import system that well. > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F > 6E3C BCE9 3372 DCFA > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/srkunze%40mail.de -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Nikolaus at rath.org Mon Feb 1 13:12:47 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Mon, 01 Feb 2016 10:12:47 -0800 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <22191.40203.430978.404940@lrd.home.lan> (mike romberg's message of "Mon, 1 Feb 2016 10:59:39 -0700") References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> Message-ID: <87lh74l2ow.fsf@thinkpad.rath.org> On Feb 01 2016, mike.romberg at comcast.net wrote: >>>>>> " " == Barry Warsaw writes: > > >> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote: > > >> I don't know about anyone else, but on my own development > >> systems it is not that unusual for me to *edit* the > >> stdlib files (to add debug prints) while debugging my own > >> programs. Freeze would definitely interfere with that. > >> I could, of course, install a separate source build on my > >> dev system, but I thought it worth mentioning as a > >> factor. > > [snip] > > > But even with system scripts, I do need to step through > > them occasionally. If it were a matter of changing a > > shebang or invoking the script with a different Python > > (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the > > full unpacked source, that would be fine. > > If the stdlib were to use implicit namespace packages > ( https://www.python.org/dev/peps/pep-0420/ ) and the various > loaders/importers as well, then python could do what I've done > with an embedded python application for years. Freeze the > stdlib (or put it in a zipfile or whatever is fast). Then > arrange PYTHONPATH to first look on the filesystem and then look > in the frozen/ziped storage. 
Presumably that would eliminate the performance advantages of the frozen/zipped storage because now Python would still have to issue all the stat calls to first check for the existence of a .py file. Best, -Nikolaus (No Cc on replies please, I'm reading the list) -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F ?Time flies like an arrow, fruit flies like a Banana.? From srkunze at mail.de Mon Feb 1 13:21:13 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 1 Feb 2016 19:21:13 +0100 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> <56AF8DB9.30900@gmail.com> Message-ID: <56AFA219.9020103@mail.de> On 01.02.2016 18:18, Brett Cannon wrote: > > > On Mon, 1 Feb 2016 at 09:08 Yury Selivanov > wrote: > > > > On 2016-01-29 11:28 PM, Steven D'Aprano wrote: > > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: > >> Hi, > >> > >> > >> tl;dr The summary is that I have a patch that improves CPython > >> performance up to 5-10% on macro benchmarks. Benchmarks results on > >> Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are > available > >> at [1]. There are no slowdowns that I could reproduce > consistently. > > Have you looked at Cesare Di Mauro's wpython? As far as I know, > it's now > > unmaintained, and the project repo on Google Code appears to be > dead (I > > get a 404), but I understand that it was significantly faster than > > CPython back in the 2.6 days. > > > > > https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf > > > > > > Thanks for bringing this up! > > IIRC wpython was about using "fat" bytecodes, i.e. using 64bits per > bytecode instead of 8. That allows to minimize the number of > bytecodes, > thus having some performance increase. TBH, I don't think it was > "significantly faster". 
> > If I were to do some big refactoring of the ceval loop, I'd probably > consider implementing a register VM. While register VMs are a bit > faster than stack VMs (up to 20-30%), they would also allow us to > apply > more optimizations, and even bolt on a simple JIT compiler. > > > If you did tackle the register VM approach that would also settle a > long-standing question of whether a certain optimization works for Python. Are there some resources on why register machines are considered faster than stack machines? > As for bolting on a JIT, the whole point of Pyjion is to see if that's > worth it for CPython, so that's already being taken care of (and is > actually easier with a stack-based VM since the JIT engine we're using > is stack-based itself). Interesting. Haven't noticed these projects, yet. So, it could be that we will see a jitted CPython when Pyjion appears to be successful? Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Mon Feb 1 13:22:42 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 1 Feb 2016 19:22:42 +0100 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: <56AF8DB9.30900@gmail.com> References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> <56AF8DB9.30900@gmail.com> Message-ID: <56AFA272.7030400@mail.de> On 01.02.2016 17:54, Yury Selivanov wrote: > If I were to do some big refactoring of the ceval loop, I'd probably > consider implementing a register VM. While register VMs are a bit > faster than stack VMs (up to 20-30%), they would also allow us to > apply more optimizations, and even bolt on a simple JIT compiler. How do JIT and register machine related to each other? 
:) Best, Sven From brett at python.org Mon Feb 1 13:28:32 2016 From: brett at python.org (Brett Cannon) Date: Mon, 01 Feb 2016 18:28:32 +0000 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: <56AFA219.9020103@mail.de> References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> <56AF8DB9.30900@gmail.com> <56AFA219.9020103@mail.de> Message-ID: On Mon, 1 Feb 2016 at 10:21 Sven R. Kunze wrote: > > > On 01.02.2016 18:18, Brett Cannon wrote: > > > > On Mon, 1 Feb 2016 at 09:08 Yury Selivanov < > yselivanov.ml at gmail.com> wrote: > >> >> >> On 2016-01-29 11:28 PM, Steven D'Aprano wrote: >> > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: >> >> Hi, >> >> >> >> >> >> tl;dr The summary is that I have a patch that improves CPython >> >> performance up to 5-10% on macro benchmarks. Benchmarks results on >> >> Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available >> >> at [1]. There are no slowdowns that I could reproduce consistently. >> > Have you looked at Cesare Di Mauro's wpython? As far as I know, it's now >> > unmaintained, and the project repo on Google Code appears to be dead (I >> > get a 404), but I understand that it was significantly faster than >> > CPython back in the 2.6 days. >> > >> > >> https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf >> > >> > >> >> Thanks for bringing this up! >> >> IIRC wpython was about using "fat" bytecodes, i.e. using 64bits per >> bytecode instead of 8. That allows to minimize the number of bytecodes, >> thus having some performance increase. TBH, I don't think it was >> "significantly faster". >> >> If I were to do some big refactoring of the ceval loop, I'd probably >> consider implementing a register VM. While register VMs are a bit >> faster than stack VMs (up to 20-30%), they would also allow us to apply >> more optimizations, and even bolt on a simple JIT compiler. 
>> > > If you did tackle the register VM approach that would also settle a > long-standing question of whether a certain optimization works for Python. > > > Are there some resources on why register machines are considered faster > than stack machines? > A search for [stack vs register based virtual machine] will get you some information. > > > As for bolting on a JIT, the whole point of Pyjion is to see if that's > worth it for CPython, so that's already being taken care of (and is > actually easier with a stack-based VM since the JIT engine we're using is > stack-based itself). > > > Interesting. Haven't noticed these projects, yet. > You aren't really supposed to yet. :) In Pyjion's case we are still working on compatibility, let alone trying to show a speed improvement so we have not said much beyond this mailing list (we have a talk proposal in for PyCon US that we hope gets accepted). We just happened to get picked up on Reddit and HN recently and so interest has spiked in the project. > > So, it could be that we will see a jitted CPython when Pyjion appears to > be successful? > The ability to plug in a JIT, but yes, that's the hope. -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Mon Feb 1 13:53:27 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 1 Feb 2016 19:53:27 +0100 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> <56AF8DB9.30900@gmail.com> <56AFA219.9020103@mail.de> Message-ID: <56AFA9A7.9020907@mail.de> On 01.02.2016 19:28, Brett Cannon wrote: > A search for [stack vs register based virtual machine] will get you > some information. Alright. :) Will go for that. > You aren't really supposed to yet. 
:) In Pyjion's case we are still > working on compatibility, let alone trying to show a speed improvement > so we have not said much beyond this mailing list (we have a talk > proposal in for PyCon US that we hope gets accepted). We just happened > to get picked up on Reddit and HN recently and so interest has spiked > in the project. Exciting. :) > > So, it could be that we will see a jitted CPython when Pyjion > appears to be successful? > > > The ability to plug in a JIT, but yes, that's the hope. Okay. Not sure what you mean by plugin. One thing I like about Python is that it just works. So, plugin sounds like unnecessary work. -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Feb 1 14:10:40 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 14:10:40 -0500 Subject: [Python-Dev] Opcode cache in ceval loop Message-ID: <56AFADB0.8000502@gmail.com> Hi, This is the second email thread I start regarding implementing an opcode cache in ceval loop. Since my first post on this topic: - I've implemented another optimization (LOAD_ATTR); - I've added detailed statistics mode so that I can "see" how the cache performs and tune it; - some macro benchmarks are now 10-20% faster; 2to3 (a real application) is 7-8% faster; - and I have some good insights on the memory footprint. ** The purpose of this email is to get a general approval from python-dev, so that I can start polishing the patches and getting them reviewed/committed. ** Summary of optimizations ------------------------ When a code object is executed more than ~1000 times, it's considered "hot". It gets its opcodes analyzed to initialize caches for LOAD_METHOD (a new opcode I propose to add in [1]), LOAD_ATTR, and LOAD_GLOBAL. It's important to only optimize code objects that were executed "enough" times, to avoid optimizing code objects for modules, classes, and functions that were imported but never used. 
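In rough Python terms, the hotness heuristic described above is just a per-code-object counter (a toy model for illustration, not the actual C patch):

```python
HOT_THRESHOLD = 1000  # "~1000 times", per the description above

class OpcodeCacheState:
    """Toy model of the per-code-object hotness check."""
    def __init__(self):
        self.run_count = 0
        self.cache = None          # allocated lazily, only for hot code

    def on_execute(self):
        self.run_count += 1
        if self.cache is None and self.run_count > HOT_THRESHOLD:
            # Stand-in for allocating the offset table + cache structs.
            self.cache = {}

state = OpcodeCacheState()
for _ in range(HOT_THRESHOLD + 1):
    state.on_execute()
print(state.cache is not None)
```

Code that only runs at import time never crosses the threshold, so it never pays for a cache.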
The cache struct is defined in code.h [2], and is 32 bytes long. When a code object becomes hot, it gets a cache offset table allocated for it (+1 byte for each opcode) + an array of cache structs.

To measure the max/average memory impact, I tuned my code to optimize *every* code object on *first* run. Then I ran the entire Python test suite. Python test suite + standard library both contain around 72395 code objects, which required 20Mb of memory for caches. The test process consumed around 400Mb of memory. Thus, in the absolute worst-case scenario, the overhead is about 5%.

Then I ran the test suite without any modifications to the patch. This means that only code objects that are called frequently enough are optimized. In this mode, only 2072 code objects were optimized, using less than 1Mb of memory for the cache.

LOAD_ATTR
---------

Damien George mentioned that they optimize a lot of dict lookups in MicroPython by memorizing the last key/value offset in the dict object, thus eliminating lots of hash lookups. I've implemented this optimization in my patch. The results are quite good. A simple micro-benchmark [3] shows ~30% speed improvement. Here are some debug stats generated by the 2to3 benchmark:

-- Opcode cache LOAD_ATTR hits = 14778415 (83%)
-- Opcode cache LOAD_ATTR misses = 750 (0%)
-- Opcode cache LOAD_ATTR opts = 282
-- Opcode cache LOAD_ATTR deopts = 60
-- Opcode cache LOAD_ATTR total = 17777912

Each "hit" makes LOAD_ATTR about 30% faster.

LOAD_GLOBAL
-----------

This turned out to be a very stable optimization. Here is the debug output of the 2to3 test:

-- Opcode cache LOAD_GLOBAL hits = 3940647 (100%)
-- Opcode cache LOAD_GLOBAL misses = 0 (0%)
-- Opcode cache LOAD_GLOBAL opts = 252

All benchmarks (and real code) have stats like that. Globals and builtins are very rarely modified, so the cache works really well. With the LOAD_GLOBAL opcode cache, global lookup is very cheap, there is no hash lookup for it at all. 
It makes optimizations like "def foo(len=len)" obsolete.

LOAD_METHOD
-----------

This is a new opcode I propose to add in [1]. The idea is to substitute LOAD_ATTR with it, and avoid instantiation of BoundMethod objects. With the cache, we can store a reference to the method descriptor (I use type->tp_version_tag for cache invalidation, the same thing _PyType_Lookup is built around).

The cache makes LOAD_METHOD really efficient. A simple micro-benchmark like [4] shows that with the cache and LOAD_METHOD, "s.startswith('abc')" becomes as efficient as "s[:3] == 'abc'". LOAD_METHOD/CALL_FUNCTION without cache is about 20% faster than LOAD_ATTR/CALL_FUNCTION. With the cache, it's about 30% faster. Here's the debug output of the 2to3 benchmark:

-- Opcode cache LOAD_METHOD hits = 5164848 (64%)
-- Opcode cache LOAD_METHOD misses = 12 (0%)
-- Opcode cache LOAD_METHOD opts = 94
-- Opcode cache LOAD_METHOD deopts = 12
-- Opcode cache LOAD_METHOD dct-chk= 1614801
-- Opcode cache LOAD_METHOD total = 7945954

What's next?
------------

First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110 [1]. It's a very straightforward optimization, the patch is small and easy to review.

Second, I'd like to merge the new opcode cache, see issue 26219 [5]. All unittests pass. Memory usage increase is very moderate (<1Mb for the entire test suite), and the performance increase is significant. The only potential blocker for this is PEP 509 approval (which I'd be happy to assist with).

What do you think? 
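(For anyone who wants to try the LOAD_METHOD comparison locally, a throwaway timeit check in the spirit of [4] — a sketch, not the linked gist; absolute numbers will vary by machine and build:)

```python
import timeit

# Method-call spelling: exercises LOAD_ATTR (or LOAD_METHOD) + CALL_FUNCTION.
t_method = timeit.timeit("s.startswith('abc')", setup="s = 'abcdefgh'", number=100000)

# Slicing spelling used as the baseline in the comparison above.
t_slice = timeit.timeit("s[:3] == 'abc'", setup="s = 'abcdefgh'", number=100000)

print("startswith: %.4fs  slice: %.4fs" % (t_method, t_slice))
```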
Thanks, Yury [1] http://bugs.python.org/issue26110 [2] https://github.com/1st1/cpython/blob/opcache5/Include/code.h#L10 [3] https://gist.github.com/1st1/37d928f1e84813bf1c44 [4] https://gist.github.com/1st1/10588e6e11c4d7c19445 [5] http://bugs.python.org/issue26219 From brett at python.org Mon Feb 1 14:30:44 2016 From: brett at python.org (Brett Cannon) Date: Mon, 01 Feb 2016 19:30:44 +0000 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFADB0.8000502@gmail.com> References: <56AFADB0.8000502@gmail.com> Message-ID: On Mon, 1 Feb 2016 at 11:11 Yury Selivanov wrote: > Hi, > > This is the second email thread I start regarding implementing an opcode > cache in ceval loop. Since my first post on this topic: > > - I've implemented another optimization (LOAD_ATTR); > > - I've added detailed statistics mode so that I can "see" how the cache > performs and tune it; > > - some macro benchmarks are now 10-20% faster; 2to3 (a real application) > is 7-8% faster; > > - and I have some good insights on the memory footprint. > > ** The purpose of this email is to get a general approval from > python-dev, so that I can start polishing the patches and getting them > reviewed/committed. ** > > > Summary of optimizations > ------------------------ > > When a code object is executed more than ~1000 times, it's considered > "hot". It gets its opcodes analyzed to initialize caches for > LOAD_METHOD (a new opcode I propose to add in [1]), LOAD_ATTR, and > LOAD_GLOBAL. > > It's important to only optimize code objects that were executed "enough" > times, to avoid optimizing code objects for modules, classes, and > functions that were imported but never used. > > The cache struct is defined in code.h [2], and is 32 bytes long. When a > code object becomes hot, it gets an cache offset table allocated for it > (+1 byte for each opcode) + an array of cache structs. > > To measure the max/average memory impact, I tuned my code to optimize > *every* code object on *first* run. 
Then I ran the entire Python test > suite. Python test suite + standard library both contain around 72395 > code objects, which required 20Mb of memory for caches. The test > process consumed around 400Mb of memory. Thus, the absolute worst case > scenario, the overhead is about 5%. > > Then I ran the test suite without any modifications to the patch. This > means that only code objects that are called frequently enough are > optimized. In this more, only 2072 code objects were optimized, using > less than 1Mb of memory for the cache. > > > LOAD_ATTR > --------- > > Damien George mentioned that they optimize a lot of dict lookups in > MicroPython by memorizing last key/value offset in the dict object, thus > eliminating lots of hash lookups. I've implemented this optimization in > my patch. The results are quite good. A simple micro-benchmark [3] > shows ~30% speed improvement. Here are some debug stats generated by > 2to3 benchmark: > > -- Opcode cache LOAD_ATTR hits = 14778415 (83%) > -- Opcode cache LOAD_ATTR misses = 750 (0%) > -- Opcode cache LOAD_ATTR opts = 282 > -- Opcode cache LOAD_ATTR deopts = 60 > -- Opcode cache LOAD_ATTR total = 17777912 > > Each "hit" makes LOAD_ATTR about 30% faster. > > > LOAD_GLOBAL > ----------- > > This turned out to be a very stable optimization. Here is the debug > output of the 2to3 test: > > -- Opcode cache LOAD_GLOBAL hits = 3940647 (100%) > -- Opcode cache LOAD_GLOBAL misses = 0 (0%) > -- Opcode cache LOAD_GLOBAL opts = 252 > > All benchmarks (and real code) have stats like that. Globals and > builtins are very rarely modified, so the cache works really well. With > LOAD_GLOBAL opcode cache, global lookup is very cheap, there is no hash > lookup for it at all. It makes optimizations like "def foo(len=len)" > obsolete. > > > LOAD_METHOD > ----------- > > This is a new opcode I propose to add in [1]. The idea is to substitute > LOAD_ATTR with it, and avoid instantiation of BoundMethod objects. 
>
> With the cache, we can store a reference to the method descriptor (I use
> type->tp_version_tag for cache invalidation, the same thing
> _PyType_Lookup is built around).
>
> The cache makes LOAD_METHOD really efficient. A simple micro-benchmark
> like [4], shows that with the cache and LOAD_METHOD,
> "s.startswith('abc')" becomes as efficient as "s[:3] == 'abc'".
>
> LOAD_METHOD/CALL_FUNCTION without cache is about 20% faster than
> LOAD_ATTR/CALL_FUNCTION. With the cache, it's about 30% faster.
>
> Here's the debug output of the 2to3 benchmark:
>
> -- Opcode cache LOAD_METHOD hits = 5164848 (64%)
> -- Opcode cache LOAD_METHOD misses = 12 (0%)
> -- Opcode cache LOAD_METHOD opts = 94
> -- Opcode cache LOAD_METHOD deopts = 12
> -- Opcode cache LOAD_METHOD dct-chk= 1614801
> -- Opcode cache LOAD_METHOD total = 7945954
>
>
> What's next?
> ------------
>
> First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110
> [1]. It's a very straightforward optimization, the patch is small and
> easy to review.

+1 from me.

> Second, I'd like to merge the new opcode cache, see issue 26219 [5].
> All unittests pass. Memory usage increase is very moderate (<1mb for
> the entire test suite), and the performance increase is significant.
> The only potential blocker for this is PEP 509 approval (which I'd be
> happy to assist with).
>

I think the fact that it improves performance across the board, as well
as eliminating the various tricks people use to cache globals and
built-ins, earns a big +1 from me. I guess that means Victor needs to
ask for pronouncement on PEP 509.

BTW, where does LOAD_ATTR fit into all of this?

>
> What do you think?
>

It all looks great to me!
-Brett > > Thanks, > Yury > > > [1] http://bugs.python.org/issue26110 > [2] https://github.com/1st1/cpython/blob/opcache5/Include/code.h#L10 > [3] https://gist.github.com/1st1/37d928f1e84813bf1c44 > [4] https://gist.github.com/1st1/10588e6e11c4d7c19445 > [5] http://bugs.python.org/issue26219 > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Feb 1 14:51:41 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 14:51:41 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: References: <56AFADB0.8000502@gmail.com> Message-ID: <56AFB74D.8040108@gmail.com> Hi Brett, On 2016-02-01 2:30 PM, Brett Cannon wrote: > > > On Mon, 1 Feb 2016 at 11:11 Yury Selivanov > wrote: > > Hi, > [..] > > What's next? > ------------ > > First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110 > [1]. It's a very straightforward optimization, the patch is small and > easy to review. > > > +1 from me. > > > Second, I'd like to merge the new opcode cache, see issue 26219 [5]. > All unittests pass. Memory usage increase is very moderate (<1mb for > the entire test suite), and the performance increase is significant. > The only potential blocker for this is PEP 509 approval (which I'd be > happy to assist with). > > > I think the fact that it improves performance across the board as well > as eliminates the various tricks people use to cache global and > built-ins, a big +1 from me. I guess that means Victor needs to ask > for pronouncement on PEP 509. Great! AFAIK Victor still needs to update the PEP with some changes (globally unique ma_version). 
My patch includes the latest implementation of PEP 509, and it works
fine (no regressions, no broken unittests). I can also assist with
reviewing Victor's implementation if the PEP is accepted.

>
> BTW, where does LOAD_ATTR fit into all of this?

The LOAD_ATTR optimization doesn't use any of PEP 509's new stuff (if I
understand your question correctly). It's based on the following
assumptions (that really make JITs work so well):

1. Most classes don't implement __getattribute__.

2. A lot of attributes are stored in objects' __dict__s.

3. Most attributes aren't shadowed by descriptors/getters-setters; most
code just uses "self.attr".

4. An average method/function works on objects of the same type. Which
means that those objects were constructed in a very similar (if not
exact) fashion.

For instance:

    class F:
        def __init__(self, name):
            self.name = name
        def say(self):
            print(self.name)  # <- For all F instances,
                              # offset of 'name' in `F().__dict__`s
                              # will be the same

If LOAD_ATTR gets too many cache misses (20 in my current patch) it gets
deoptimized, and the default implementation is used. So if the code is
very dynamic - there's no improvement, but no performance penalty
either.

In my patch, I use the cache to store (for LOAD_ATTR specifically):

- pointer to the object's type
- type->tp_version_tag
- the last successful __dict__ offset

The first two fields are used to make sure that we have objects of the
same type. If it changes, we deoptimize the opcode immediately. Then
we try the offset. If it's successful, we have a cache hit. If not,
that's fine; we'll try another few times before deoptimizing the
opcode.

>
> What do you think?
>
> It all looks great to me!

Thanks!
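The cache-entry check described above corresponds roughly to this
pure-Python model. All names are invented for illustration, and since
Python code can't see real __dict__ offsets, the sketch approximates
the offset with the attribute's position in the instance dict:

```python
# Pure-Python model of the LOAD_ATTR inline cache. The real cache
# stores the owner's type pointer, its tp_version_tag, and the last
# successful __dict__ offset; here the "offset" is approximated by the
# attribute's insertion position in the instance dict.

DEOPT_AFTER = 20   # miss limit mentioned in the patch description

class AttrCacheEntry:
    def __init__(self):
        self.tp = None          # cached type of the receiver
        self.index = None       # cached position of the attribute
        self.misses = 0
        self.deoptimized = False

def load_attr(obj, name, entry):
    if entry.deoptimized:
        return getattr(obj, name)          # default, uncached lookup
    d = obj.__dict__
    keys = list(d)
    # Fast path: same type as last time, and the attribute still sits
    # at the cached position in the instance dict.
    if (entry.tp is type(obj) and entry.index is not None
            and entry.index < len(keys) and keys[entry.index] == name):
        return d[name]
    # Miss: count it, maybe deoptimize, then re-prime the cache.
    entry.misses += 1
    if entry.misses > DEOPT_AFTER:
        entry.deoptimized = True
    entry.tp = type(obj)
    entry.index = keys.index(name) if name in d else None
    return getattr(obj, name)

class F:
    def __init__(self, name):
        self.name = name

# Monomorphic call site: every F instance has the same "shape", so
# after one priming miss every lookup is a cache hit.
entry = AttrCacheEntry()
for i in range(100):
    load_attr(F(str(i)), 'name', entry)
print(entry.misses, entry.deoptimized)   # -> 1 False
```

The point of assumption (4) is visible here: because every F instance
is built the same way, the cached position keeps working for all of
them.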
Yury From brett at python.org Mon Feb 1 15:08:16 2016 From: brett at python.org (Brett Cannon) Date: Mon, 01 Feb 2016 20:08:16 +0000 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFB74D.8040108@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> Message-ID: On Mon, 1 Feb 2016 at 11:51 Yury Selivanov wrote: > Hi Brett, > > On 2016-02-01 2:30 PM, Brett Cannon wrote: > > > > > > On Mon, 1 Feb 2016 at 11:11 Yury Selivanov > > wrote: > > > > Hi, > > > [..] > > > > What's next? > > ------------ > > > > First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110 > > [1]. It's a very straightforward optimization, the patch is small > and > > easy to review. > > > > > > +1 from me. > > > > > > Second, I'd like to merge the new opcode cache, see issue 26219 [5]. > > All unittests pass. Memory usage increase is very moderate (<1mb for > > the entire test suite), and the performance increase is significant. > > The only potential blocker for this is PEP 509 approval (which I'd be > > happy to assist with). > > > > > > I think the fact that it improves performance across the board as well > > as eliminates the various tricks people use to cache global and > > built-ins, a big +1 from me. I guess that means Victor needs to ask > > for pronouncement on PEP 509. > > Great! AFAIK Victor still needs to update the PEP with some changes > (globally unique ma_version). My patch includes the latest > implementation of PEP 509, and it works fine (no regressions, no broken > unittests). I can also assist with reviewing Victor's implementation if > the PEP is accepted. > > > > > BTW, where does LOAD_ATTR fit into all of this? > > LOAD_ATTR optimization doesn't use any of PEP 509 new stuff (if I > understand you question correctly). It's based on the following > assumptions (that really make JITs work so well): > > 1. Most classes don't implement __getattribute__. > > 2. A lot of attributes are stored in objects' __dict__s. > > 3. 
Most attributes aren't shaded by descriptors/getters-setters; most > code just uses "self.attr". > > 4. An average method/function works on objects of the same type. Which > means that those objects were constructed in a very similar (if not > exact) fashion. > > For instance: > > class F: > def __init__(self, name): > self.name = name > def say(self): > print(self.name) # <- For all F instances, > # offset of 'name' in `F().__dict__`s > # will be the same > > If LOAD_ATTR gets too many cache misses (20 in my current patch) it gets > deoptimized, and the default implementation is used. So if the code is > very dynamic - there's no improvement, but no performance penalty either. > > In my patch, I use the cache to store (for LOAD_ATTR specifically): > > - pointer to object's type > - type->tp_version_tag > - the last successful __dict__ offset > > The first two fields are used to make sure that we have objects of the > same type. If it changes, we deoptimize the opcode immediately. Then > we try the offset. If it's successful - we have a cache hit. If not, > that's fine, we'll try another few times before deoptimizing the opcode. > So this is a third "next step" that has its own issue? -Brett > > > > > What do you think? > > > > > > It all looks great to me! > > Thanks! > > Yury > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Feb 1 15:16:55 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 15:16:55 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> Message-ID: <56AFBD37.60405@gmail.com> Brett, On 2016-02-01 3:08 PM, Brett Cannon wrote: > > > On Mon, 1 Feb 2016 at 11:51 Yury Selivanov > wrote: > > Hi Brett, > [..] > > > The first two fields are used to make sure that we have objects of the > same type. If it changes, we deoptimize the opcode immediately. Then > we try the offset. 
If it's successful - we have a cache hit. If not, > that's fine, we'll try another few times before deoptimizing the > opcode. > > > So this is a third "next step" that has its own issue? It's all in issue http://bugs.python.org/issue26219 right now. My current plan is to implement LOAD_METHOD/CALL_METHOD (just opcodes, no cache) in 26110. Then implement caching for LOAD_METHOD, LOAD_GLOBAL, and LOAD_ATTR in 26219. I'm flexible to break down 26219 in three separate issues if that helps the review process (but that would take more of my time): - implement support for opcode caching (general infrastructure) + LOAD_GLOBAL optimization - LOAD_METHOD optimization - LOAD_ATTR optimization Yury From brett at python.org Mon Feb 1 15:21:06 2016 From: brett at python.org (Brett Cannon) Date: Mon, 01 Feb 2016 20:21:06 +0000 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFBD37.60405@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBD37.60405@gmail.com> Message-ID: On Mon, 1 Feb 2016 at 12:16 Yury Selivanov wrote: > Brett, > > On 2016-02-01 3:08 PM, Brett Cannon wrote: > > > > > > On Mon, 1 Feb 2016 at 11:51 Yury Selivanov > > wrote: > > > > Hi Brett, > > > [..] > > > > > > The first two fields are used to make sure that we have objects of > the > > same type. If it changes, we deoptimize the opcode immediately. > Then > > we try the offset. If it's successful - we have a cache hit. If > not, > > that's fine, we'll try another few times before deoptimizing the > > opcode. > > > > > > So this is a third "next step" that has its own issue? > > It's all in issue http://bugs.python.org/issue26219 right now. > > My current plan is to implement LOAD_METHOD/CALL_METHOD (just opcodes, > no cache) in 26110. > > Then implement caching for LOAD_METHOD, LOAD_GLOBAL, and LOAD_ATTR in > 26219. 
I'm flexible to break down 26219 in three separate issues if > that helps the review process (but that would take more of my time): > > - implement support for opcode caching (general infrastructure) + > LOAD_GLOBAL optimization > - LOAD_METHOD optimization > - LOAD_ATTR optimization > I personally don't care how you break it down, just trying to keep all the moving pieces in my head. :) Anyway, it sounds like PEP 509 is blocking part of it, but the LOAD_METHOD stuff can go in as-is. So are you truly blocked only on getting the latest version of that patch up to http://bugs.python.org/issue26110 and getting a code review? -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Mon Feb 1 15:27:53 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 1 Feb 2016 21:27:53 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFB74D.8040108@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> Message-ID: <56AFBFC9.3090604@mail.de> On 01.02.2016 20:51, Yury Selivanov wrote: > If LOAD_ATTR gets too many cache misses (20 in my current patch) it > gets deoptimized, and the default implementation is used. So if the > code is very dynamic - there's no improvement, but no performance > penalty either. Will you re-try optimizing it? From yselivanov.ml at gmail.com Mon Feb 1 15:31:03 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 15:31:03 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBD37.60405@gmail.com> Message-ID: <56AFC087.4070206@gmail.com> On 2016-02-01 3:21 PM, Brett Cannon wrote: > > On Mon, 1 Feb 2016 at 12:16 Yury Selivanov > wrote: > > Brett, > > On 2016-02-01 3:08 PM, Brett Cannon wrote: > > > > > > On Mon, 1 Feb 2016 at 11:51 Yury Selivanov > > > >> wrote: > > > > Hi Brett, > > > [..] 
> > > > > > The first two fields are used to make sure that we have > objects of the > > same type. If it changes, we deoptimize the opcode > immediately. Then > > we try the offset. If it's successful - we have a cache > hit. If not, > > that's fine, we'll try another few times before deoptimizing the > > opcode. > > > > > > So this is a third "next step" that has its own issue? > > It's all in issue http://bugs.python.org/issue26219 right now. > > My current plan is to implement LOAD_METHOD/CALL_METHOD (just opcodes, > no cache) in 26110. > > Then implement caching for LOAD_METHOD, LOAD_GLOBAL, and LOAD_ATTR in > 26219. I'm flexible to break down 26219 in three separate issues if > that helps the review process (but that would take more of my time): > > - implement support for opcode caching (general infrastructure) + > LOAD_GLOBAL optimization > - LOAD_METHOD optimization > - LOAD_ATTR optimization > > > I personally don't care how you break it down, just trying to keep all > the moving pieces in my head. :) > > Anyway, it sounds like PEP 509 is blocking part of it, but the > LOAD_METHOD stuff can go in as-is. So are you truly blocked only on > getting the latest version of that patch up to > http://bugs.python.org/issue26110 and getting a code review? Yep. The initial implementation of LOAD_METHOD doesn't need PEP 509 / opcode caching. I'll have to focus on something else this week, but early next week I can upload a new patch for 26110. When we have 26110 committed and PEP 509 approved and committed, I can update the opcode cache patch (issue 26219) and we can start reviewing it. 
Yury From abarnert at yahoo.com Mon Feb 1 15:39:25 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Feb 2016 12:39:25 -0800 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <22191.40203.430978.404940@lrd.home.lan> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> Message-ID: On Feb 1, 2016, at 09:59, mike.romberg at comcast.net wrote: > > If the stdlib were to use implicit namespace packages > ( https://www.python.org/dev/peps/pep-0420/ ) and the various > loaders/importers as well, then python could do what I've done with an > embedded python application for years. Freeze the stdlib (or put it > in a zipfile or whatever is fast). Then arrange PYTHONPATH to first > look on the filesystem and then look in the frozen/ziped storage. This is a great solution for experienced developers, but I think it would be pretty bad for novices or transplants from other languages (maybe even including Python 2). There are already multiple duplicate questions every month on StackOverflow from people asking "how do I find the source to stdlib module X". The canonical answer starts off by explaining how to import the module and use its __file__, which everyone is able to handle. If we have to instead explain how to work out the .py name from the qualified module name, how to work out the stdlib path from sys.path, and then how to find the source from those two things, with the caveat that it may not be installed at all on some platforms, and how to make sure what they're asking about really is a stdlib module, and how to make sure they aren't shadowing it with a module elsewhere on sys.path, that's a lot more complicated. 
Especially when you consider that some people on Windows and Mac are
writing Python scripts without ever learning how to use the terminal or
find their Python packages via Explorer/Finder.

And meanwhile, other people would be asking why their app runs slower on
one machine than another, because they didn't expect that installing
python-dev on top of python would slow down startup.

Finally, on Linux and Mac, the stdlib will usually be somewhere that's
not user-writable--and we shouldn't expect users to have to mess with
stuff in /usr/lib or /System/Library even if they do have sudo access.
Of course we could put a "stdlib shadow" location on the sys.path and
configure it for /usr/local/lib and /Library and/or for somewhere in -,
but that just makes the lookup process even more complicated--not to
mention that we've just added three stat calls to remove one open, at
which point the optimization has probably become a pessimization.

From yselivanov.ml at gmail.com  Mon Feb  1 15:35:21 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 1 Feb 2016 15:35:21 -0500
Subject: [Python-Dev] Opcode cache in ceval loop
In-Reply-To: <56AFBFC9.3090604@mail.de>
References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com>
 <56AFBFC9.3090604@mail.de>
Message-ID: <56AFC189.8010407@gmail.com>

On 2016-02-01 3:27 PM, Sven R. Kunze wrote:
> On 01.02.2016 20:51, Yury Selivanov wrote:
>> If LOAD_ATTR gets too many cache misses (20 in my current patch) it
>> gets deoptimized, and the default implementation is used. So if the
>> code is very dynamic - there's no improvement, but no performance
>> penalty either.
>
> Will you re-try optimizing it?

No. It's important to understand that if we have a lot of cache misses
after the code object was executed 1000 times, it doesn't make sense to
keep trying to update that cache. It just means that the code, in that
particular point, works with different kinds of objects.
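A toy model (all names invented for illustration) shows why such a site
can never stabilize: when the receiver's type alternates, every
execution invalidates the cached type, so past the miss limit it is
cheaper to fall back to the default lookup for good:

```python
# Toy model of a per-site cache at a call site whose receiver types
# alternate: every execution invalidates the cached type, so the cache
# never stabilizes and the site is permanently deoptimized.

class Site:
    def __init__(self, miss_limit=20):
        self.cached_type = None
        self.hits = 0
        self.misses = 0
        self.miss_limit = miss_limit
        self.deoptimized = False

    def load(self, obj):
        if self.deoptimized:
            return obj.x                  # default, uncached path
        if self.cached_type is type(obj):
            self.hits += 1
        else:
            self.misses += 1
            self.cached_type = type(obj)
            if self.misses > self.miss_limit:
                self.deoptimized = True   # stop wasting cache updates
        return obj.x

class A: x = 1
class B: x = 2

site = Site()
for i in range(1000):
    site.load(A() if i % 2 else B())      # types alternate every call
print(site.deoptimized, site.misses)      # -> True 21
```

After 21 misses the site gives up; the remaining ~980 executions pay
no cache-maintenance cost at all, which is the "no performance penalty
either" behavior described above.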
FWIW, I experimented with different ideas (one is to never de-optimize),
and the current strategy works best on the vast majority of benchmarks.

Yury

From damien.p.george at gmail.com  Mon Feb  1 15:59:15 2016
From: damien.p.george at gmail.com (Damien George)
Date: Mon, 1 Feb 2016 20:59:15 +0000
Subject: [Python-Dev] Opcode cache in ceval loop
Message-ID: 

Hi Yury,

That's great news about the speed improvements with the dict offset cache!

> The cache struct is defined in code.h [2], and is 32 bytes long. When a
> code object becomes hot, it gets a cache offset table allocated for it
> (+1 byte for each opcode) + an array of cache structs.

Ok, so each opcode has a 1-byte cache that sits separately to the
actual bytecode. But a lot of opcodes don't use it so that leads to
some wasted memory, correct?

But then how do you index the cache, do you keep a count of the
current opcode number? If I remember correctly, CPython has some
opcodes taking 1 byte, and some taking 3 bytes, so the offset into the
bytecode cannot be easily mapped to a bytecode number.

Cheers,
Damien.

From srkunze at mail.de  Mon Feb  1 16:02:24 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 1 Feb 2016 22:02:24 +0100
Subject: [Python-Dev] Opcode cache in ceval loop
In-Reply-To: <56AFC189.8010407@gmail.com>
References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com>
 <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com>
Message-ID: <56AFC7E0.9080203@mail.de>

On 01.02.2016 21:35, Yury Selivanov wrote:
> It's important to understand that if we have a lot of cache misses
> after the code object was executed 1000 times, it doesn't make sense
> to keep trying to update that cache. It just means that the code, in
> that particular point, works with different kinds of objects.

So, the assumption is that the code makes the difference here, not time.
That could be true for production code.
> FWIW, I experimented with different ideas (one is to never
> de-optimize), and the current strategy works best on the vast number
> of benchmarks.

Nice.

Regarding the magic constants (1000, 20) what is the process of
updating them?

Best,
Sven

From yselivanov.ml at gmail.com  Mon Feb  1 16:21:37 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 1 Feb 2016 16:21:37 -0500
Subject: [Python-Dev] Opcode cache in ceval loop
In-Reply-To: 
References: 
Message-ID: <56AFCC61.5040302@gmail.com>

Hi Damien,

On 2016-02-01 3:59 PM, Damien George wrote:
> Hi Yury,
>
> That's great news about the speed improvements with the dict offset cache!
>
>> The cache struct is defined in code.h [2], and is 32 bytes long. When a
>> code object becomes hot, it gets a cache offset table allocated for it
>> (+1 byte for each opcode) + an array of cache structs.
> Ok, so each opcode has a 1-byte cache that sits separately to the
> actual bytecode. But a lot of opcodes don't use it so that leads to
> some wasted memory, correct?

Each code object has a list of opcodes and their arguments (bytes object
== unsigned char array). "Hot" code objects have an offset table
(unsigned chars), and a cache entries array (hope your email client will
display the following correctly):

    opcodes        offset    cache entries
                   table

    OPCODE         0         cache for 1st LOAD_ATTR
    ARG1           0         cache for 1st LOAD_GLOBAL
    ARG2           0         cache for 2nd LOAD_ATTR
    OPCODE         0         cache for 1st LOAD_METHOD
    LOAD_ATTR      1         ...
    ARG1           0
    ARG2           0
    OPCODE         0
    LOAD_GLOBAL    2
    ARG1           0
    ARG2           0
    LOAD_ATTR      3
    ARG1           0
    ARG2           0
    ...            ...
    LOAD_METHOD    4
    ...            ...

When, say, a LOAD_ATTR opcode executes, it first checks if the code
object has a non-NULL cache-entries table. If it has, that LOAD_ATTR
then uses the offset table (indexing with its `INSTR_OFFSET()`) to find
its position in cache-entries.

> But then how do you index the cache, do you keep a count of the
> current opcode number?
> If I remember correctly, CPython has some
> opcodes taking 1 byte, and some taking 3 bytes, so the offset into the
> bytecode cannot be easily mapped to a bytecode number.

First, when a code object is created, it doesn't have an offset table
and cache entries (those are set to NULL).

Each code object has a new field to count how many times it was called.
Each time a code object is called with PyEval_EvalFrameEx, that field is
incremented.

Once a code object is called more than 1024 times we:

1. allocate memory for its offset table;

2. iterate through its opcodes and count how many LOAD_ATTR, LOAD_METHOD
and LOAD_GLOBAL opcodes it has;

3. as part of (2), initialize the offset table with the correct mapping.
Some opcodes will have a non-zero entry in the offset table, some won't.
Opcode args will always have zeros in the offset tables;

4. then we allocate the cache-entries table.

Yury

From yselivanov.ml at gmail.com  Mon Feb  1 16:27:47 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 1 Feb 2016 16:27:47 -0500
Subject: [Python-Dev] Opcode cache in ceval loop
In-Reply-To: <56AFC7E0.9080203@mail.de>
References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com>
 <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com>
 <56AFC7E0.9080203@mail.de>
Message-ID: <56AFCDD3.20905@gmail.com>

On 2016-02-01 4:02 PM, Sven R. Kunze wrote:
> On 01.02.2016 21:35, Yury Selivanov wrote:
>> It's important to understand that if we have a lot of cache misses
>> after the code object was executed 1000 times, it doesn't make sense
>> to keep trying to update that cache. It just means that the code, in
>> that particular point, works with different kinds of objects.
>
> So, the assumption is that the code makes the difference here not
> time. That could be true for production code.
>
>> FWIW, I experimented with different ideas (one is to never
>> de-optimize), and the current strategy works best on the vast number
>> of benchmarks.
>
> Nice.
> > Regarding the magic constants (1000, 20) what is the process of > updating them? Right now they are private constants in ceval.c. I will (maybe) expose a private API via the _testcapi module to re-define them (set them to 1 or 0), only to write better unittests. I have no plans to make those constants public or have a public API to tackle them. IMHO, this is something that almost nobody will ever use. Yury From abarnert at yahoo.com Mon Feb 1 16:29:16 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Feb 2016 13:29:16 -0800 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFADB0.8000502@gmail.com> References: <56AFADB0.8000502@gmail.com> Message-ID: <84EC55C0-E955-4ACE-B8E2-A87770FF67E5@yahoo.com> Looking over the thread and the two issues, you've got good arguments for why the improved code will be the most common code, and good benchmarks for various kinds of real-life code, but it doesn't seem like you'd tried to stress it on anything that could be made worse. From your explanations and your code, I wouldn't expect that @classmethods, functions stored in the object dict or generated by __getattr__, non-function callables as methods, etc. would go significantly slower, or code that mixes @properties or __getattr__ proxy attributes with real attributes, or uses __slots__, or code that does frequently write to a global, etc. But it would be nice to _know_ that they don't instead of just expecting it. Sent from my iPhone > On Feb 1, 2016, at 11:10, Yury Selivanov wrote: > > Hi, > > This is the second email thread I start regarding implementing an opcode cache in ceval loop. Since my first post on this topic: > > - I've implemented another optimization (LOAD_ATTR); > > - I've added detailed statistics mode so that I can "see" how the cache performs and tune it; > > - some macro benchmarks are now 10-20% faster; 2to3 (a real application) is 7-8% faster; > > - and I have some good insights on the memory footprint. 
> > ** The purpose of this email is to get a general approval from python-dev, so that I can start polishing the patches and getting them reviewed/committed. ** > > > Summary of optimizations > ------------------------ > > When a code object is executed more than ~1000 times, it's considered "hot". It gets its opcodes analyzed to initialize caches for LOAD_METHOD (a new opcode I propose to add in [1]), LOAD_ATTR, and LOAD_GLOBAL. > > It's important to only optimize code objects that were executed "enough" times, to avoid optimizing code objects for modules, classes, and functions that were imported but never used. > > The cache struct is defined in code.h [2], and is 32 bytes long. When a code object becomes hot, it gets an cache offset table allocated for it (+1 byte for each opcode) + an array of cache structs. > > To measure the max/average memory impact, I tuned my code to optimize *every* code object on *first* run. Then I ran the entire Python test suite. Python test suite + standard library both contain around 72395 code objects, which required 20Mb of memory for caches. The test process consumed around 400Mb of memory. Thus, the absolute worst case scenario, the overhead is about 5%. > > Then I ran the test suite without any modifications to the patch. This means that only code objects that are called frequently enough are optimized. In this more, only 2072 code objects were optimized, using less than 1Mb of memory for the cache. > > > LOAD_ATTR > --------- > > Damien George mentioned that they optimize a lot of dict lookups in MicroPython by memorizing last key/value offset in the dict object, thus eliminating lots of hash lookups. I've implemented this optimization in my patch. The results are quite good. A simple micro-benchmark [3] shows ~30% speed improvement. 
Here are some debug stats generated by 2to3 benchmark: > > -- Opcode cache LOAD_ATTR hits = 14778415 (83%) > -- Opcode cache LOAD_ATTR misses = 750 (0%) > -- Opcode cache LOAD_ATTR opts = 282 > -- Opcode cache LOAD_ATTR deopts = 60 > -- Opcode cache LOAD_ATTR total = 17777912 > > Each "hit" makes LOAD_ATTR about 30% faster. > > > LOAD_GLOBAL > ----------- > > This turned out to be a very stable optimization. Here is the debug output of the 2to3 test: > > -- Opcode cache LOAD_GLOBAL hits = 3940647 (100%) > -- Opcode cache LOAD_GLOBAL misses = 0 (0%) > -- Opcode cache LOAD_GLOBAL opts = 252 > > All benchmarks (and real code) have stats like that. Globals and builtins are very rarely modified, so the cache works really well. With LOAD_GLOBAL opcode cache, global lookup is very cheap, there is no hash lookup for it at all. It makes optimizations like "def foo(len=len)" obsolete. > > > LOAD_METHOD > ----------- > > This is a new opcode I propose to add in [1]. The idea is to substitute LOAD_ATTR with it, and avoid instantiation of BoundMethod objects. > > With the cache, we can store a reference to the method descriptor (I use type->tp_version_tag for cache invalidation, the same thing _PyType_Lookup is built around). > > The cache makes LOAD_METHOD really efficient. A simple micro-benchmark like [4], shows that with the cache and LOAD_METHOD, "s.startswith('abc')" becomes as efficient as "s[:3] == 'abc'". > > LOAD_METHOD/CALL_FUNCTION without cache is about 20% faster than LOAD_ATTR/CALL_FUNCTION. With the cache, it's about 30% faster. > > Here's the debug output of the 2to3 benchmark: > > -- Opcode cache LOAD_METHOD hits = 5164848 (64%) > -- Opcode cache LOAD_METHOD misses = 12 (0%) > -- Opcode cache LOAD_METHOD opts = 94 > -- Opcode cache LOAD_METHOD deopts = 12 > -- Opcode cache LOAD_METHOD dct-chk= 1614801 > -- Opcode cache LOAD_METHOD total = 7945954 > > > What's next? > ------------ > > First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110 [1]. 
It's a very straightforward optimization; the patch is small and easy to review. > > Second, I'd like to merge the new opcode cache, see issue 26219 [5]. All unittests pass. The memory usage increase is very moderate (<1mb for the entire test suite), and the performance increase is significant. The only potential blocker for this is PEP 509 approval (which I'd be happy to assist with). > > What do you think? > > Thanks, > Yury > > > [1] http://bugs.python.org/issue26110 > [2] https://github.com/1st1/cpython/blob/opcache5/Include/code.h#L10 > [3] https://gist.github.com/1st1/37d928f1e84813bf1c44 > [4] https://gist.github.com/1st1/10588e6e11c4d7c19445 > [5] http://bugs.python.org/issue26219 > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/abarnert%40yahoo.com From srkunze at mail.de Mon Feb 1 16:32:41 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 1 Feb 2016 22:32:41 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFCDD3.20905@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> Message-ID: <56AFCEF9.2060508@mail.de> On 01.02.2016 22:27, Yury Selivanov wrote: > Right now they are private constants in ceval.c. > > I will (maybe) expose a private API via the _testcapi module to > re-define them (set them to 1 or 0), only to write better unittests. > I have no plans to make those constants public or have a public API to > tackle them. IMHO, this is something that almost nobody will ever use. Alright. I agree with you on that. What I actually meant was: how can we find the optimal values? I understand that 1000 and 20 are some hand-picked/subjective values for now. Is there a standardized/objective way to find out the best values?
What does best even mean here? Best, Sven From yselivanov.ml at gmail.com Mon Feb 1 16:43:23 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 16:43:23 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFCEF9.2060508@mail.de> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> Message-ID: <56AFD17B.5090305@gmail.com> Sven, On 2016-02-01 4:32 PM, Sven R. Kunze wrote: > On 01.02.2016 22:27, Yury Selivanov wrote: >> Right now they are private constants in ceval.c. >> >> I will (maybe) expose a private API via the _testcapi module to >> re-define them (set them to 1 or 0), only to write better unittests. >> I have no plans to make those constants public or have a public API >> to tackle them. IMHO, this is something that almost nobody will ever >> use. > > Alright. I agree with you on that. > > What I actually meant was: how can we find the optimal values? I > understand that 1000 and 20 are some hand-picked/subjective values > for now. > > Is there a standardized/objective way to find out the best values? What > does best even mean here? Running lots of benchmarks and micro-benchmarks hundreds of times ;) I've done a lot of that, and I noticed that the numbers don't matter too much. What matters is that we don't want to optimize the code that runs 0 or 1 times. To save some memory we don't want to optimize the code that runs 10 times. So 1000 seems to be about right. We also need to deoptimize the code to avoid having too many cache misses/pointless cache updates. I found that, for instance, LOAD_ATTR is either super stable (hits 100% of the time), or really unstable, so 20 misses, again, seems to be alright.
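The counter-and-threshold scheme described above can be sketched in a few lines of Python. This is a toy model only -- the real counters live in ceval.c, and the constant names and class here are invented for illustration:

```python
# Toy model of the two thresholds: a code object becomes "hot" after
# ~1000 runs, and an unstable cache entry is abandoned after ~20 misses.

HOT_THRESHOLD = 1000    # calls before a code object is considered "hot"
DEOPT_MISSES = 20       # cache misses before giving up on caching

class CodeObjectModel:
    def __init__(self):
        self.run_count = 0
        self.optimized = False
        self.deoptimized = False
        self.misses = 0

    def run(self, cache_hit=True):
        self.run_count += 1
        if not self.optimized:
            if self.run_count >= HOT_THRESHOLD:
                self.optimized = True   # allocate caches on becoming hot
            return "cold"
        if self.deoptimized:
            return "generic"
        if cache_hit:
            return "fast"
        self.misses += 1
        if self.misses >= DEOPT_MISSES:
            self.deoptimized = True     # unstable site: stop caching
        return "slow"

code = CodeObjectModel()
states = [code.run() for _ in range(1001)]
print(states[0], states[999], states[1000])   # cold cold fast

unstable = CodeObjectModel()
for _ in range(1000):
    unstable.run()                    # warm up until hot
for _ in range(25):
    unstable.run(cache_hit=False)     # an unstable call site
print(unstable.deoptimized)           # True: gave up after 20 misses
```

The point of the model is that code run once or twice never pays for a cache, and a call site that keeps missing stops paying the miss overhead.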
I'm flexible about tweaking those values; I encourage you and everyone to experiment if you have time ;) https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L100 Thanks, Yury From srkunze at mail.de Mon Feb 1 16:49:14 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 1 Feb 2016 22:49:14 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFD17B.5090305@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> <56AFD17B.5090305@gmail.com> Message-ID: <56AFD2DA.3040805@mail.de> On 01.02.2016 22:43, Yury Selivanov wrote: > Sven, > > On 2016-02-01 4:32 PM, Sven R. Kunze wrote: >> On 01.02.2016 22:27, Yury Selivanov wrote: >>> Right now they are private constants in ceval.c. >>> >>> I will (maybe) expose a private API via the _testcapi module to >>> re-define them (set them to 1 or 0), only to write better >>> unittests. I have no plans to make those constants public or have a >>> public API to tackle them. IMHO, this is something that almost >>> nobody will ever use. >> >> Alright. I agree with you on that. >> >> What I actually meant was: how can we find the optimal values? I >> understand that 1000 and 20 are some hand-picked/subjective values >> for now. >> >> Is there a standardized/objective way to find out the best values? What >> does best even mean here? > > Running lots of benchmarks and micro-benchmarks hundreds of times ;) > I've done a lot of that, and I noticed that the numbers don't matter > too much. That's actually pretty interesting. :) Would you consider writing a blog post about this at some point? > What matters is that we don't want to optimize the code that runs 0 or > 1 times. To save some memory we don't want to optimize the code that > runs 10 times. So 1000 seems to be about right.
> > We also need to deoptimize the code to avoid having too many cache > misses/pointless cache updates. I found that, for instance, LOAD_ATTR > is either super stable (hits 100% of the time), or really unstable, so 20 > misses, again, seems to be alright. > > I'm flexible about tweaking those values; I encourage you and everyone > to experiment if you have time ;) > https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L100 Right now, I am busy with the heap implementation but I think I can look into it later. Best, Sven From yselivanov.ml at gmail.com Mon Feb 1 17:12:14 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 17:12:14 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <84EC55C0-E955-4ACE-B8E2-A87770FF67E5@yahoo.com> References: <56AFADB0.8000502@gmail.com> <84EC55C0-E955-4ACE-B8E2-A87770FF67E5@yahoo.com> Message-ID: <56AFD83E.6040108@gmail.com> Andrew, On 2016-02-01 4:29 PM, Andrew Barnert wrote: > Looking over the thread and the two issues, you've got good arguments for why the improved code will be the most common code, and good benchmarks for various kinds of real-life code, but it doesn't seem like you've tried to stress it on anything that could be made worse. From your explanations and your code, I wouldn't expect that @classmethods, functions stored in the object dict or generated by __getattr__, non-function callables as methods, etc. would go significantly slower, Right. The caching, of course, has some overhead, albeit barely detectable. The only way the slowdown might become "significant" is if there is a bug in the ceval.c code -- i.e. an opcode doesn't get de-optimized etc. That should be fixable. > or code that mixes @properties or __getattr__ proxy attributes with real attributes, or uses __slots__, No performance degradation for __slots__, we have a benchmark for that. I also tried to add __slots__ to every class in the Richards test - no improvement or performance degradation there.
> or code that frequently writes to a global, etc. But it would be nice to _know_ that they don't instead of just expecting it. FWIW I've just tried to write a micro-benchmark for __getattr__: https://gist.github.com/1st1/22c1aa0a46f246a31515 The opcode cache gets quickly deoptimized with it, but, as expected, CPython with the opcode cache is <1% slower. But that's 1% in a super micro-benchmark; of course the cost of having a cache that isn't used will show up. In real code that doesn't consist only of LOAD_ATTRs, it won't even be possible to see any slowdown. Thanks, Yury From yselivanov.ml at gmail.com Mon Feb 1 17:22:46 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 1 Feb 2016 17:22:46 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFCC61.5040302@gmail.com> References: <56AFCC61.5040302@gmail.com> Message-ID: <56AFDAB6.1050807@gmail.com> On 2016-02-01 4:21 PM, Yury Selivanov wrote: > Hi Damien, > > On 2016-02-01 3:59 PM, Damien George wrote: > [..] >> >> But then how do you index the cache? Do you keep a count of the >> current opcode number? If I remember correctly, CPython has some >> opcodes taking 1 byte, and some taking 3 bytes, so the offset into the >> bytecode cannot be easily mapped to a bytecode number.
> Here are a few links that might explain the idea better: https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L1229 https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L2610 https://github.com/1st1/cpython/blob/opcache5/Objects/codeobject.c#L167 Yury From breamoreboy at yahoo.co.uk Mon Feb 1 17:50:29 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 1 Feb 2016 22:50:29 +0000 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: <56AF8DB9.30900@gmail.com> References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> <56AF8DB9.30900@gmail.com> Message-ID: On 01/02/2016 16:54, Yury Selivanov wrote: > > > On 2016-01-29 11:28 PM, Steven D'Aprano wrote: >> On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote: >>> Hi, >>> >>> tl;dr The summary is that I have a patch that improves CPython >>> performance up to 5-10% on macro benchmarks. Benchmark results on >>> Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available >>> at [1]. There are no slowdowns that I could reproduce consistently. >> Have you looked at Cesare Di Mauro's wpython? As far as I know, it's now >> unmaintained, and the project repo on Google Code appears to be dead (I >> get a 404), but I understand that it was significantly faster than >> CPython back in the 2.6 days. >> >> https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf >> > > Thanks for bringing this up! > > IIRC wpython was about using "fat" bytecodes, i.e. using 64 bits per > bytecode instead of 8. That makes it possible to minimize the number of bytecodes, > thus giving some performance increase. TBH, I don't think it was > "significantly faster". > From https://code.google.com/archive/p/wpython/ WPython is a re-implementation of (some parts of) Python, which drops support for bytecode in favour of a wordcode-based model (where a word is 16 bits wide).
It also implements a hybrid stack-register virtual machine, and adds a lot of other optimizations. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From greg.ewing at canterbury.ac.nz Mon Feb 1 18:27:26 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 02 Feb 2016 12:27:26 +1300 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: <56AFA219.9020103@mail.de> References: <56A90B97.7090001@gmail.com> <20160130042835.GJ4619@ando.pearwood.info> <56AF8DB9.30900@gmail.com> <56AFA219.9020103@mail.de> Message-ID: <56AFE9DE.9070002@canterbury.ac.nz> Sven R. Kunze wrote: > Are there some resources on why register machines are considered faster > than stack machines? If a register VM is faster, it's probably because each register instruction does the work of about 2-3 stack instructions, meaning fewer trips around the eval loop, so fewer unpredictable branches and fewer pipeline flushes. This assumes that bytecode dispatching is a substantial fraction of the time taken to execute each instruction. For something like CPython, where the operations carried out by the bytecodes involve a substantial amount of work, this may not be true.
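To make the dispatch-count argument concrete, here is a toy pair of interpreters evaluating a + b * c. Both instruction sets are invented for the example; the point is only the difference in the number of dispatches:

```python
# Evaluating a + b * c: a stack machine needs 5 dispatches
# (3 loads + 2 arithmetic ops), a register machine needs 2.

def run_stack(program, env):
    stack = []
    dispatches = 0
    for op, *args in program:
        dispatches += 1
        if op == "LOAD":
            stack.append(env[args[0]])
        elif op == "MUL":
            r = stack.pop(); l = stack.pop(); stack.append(l * r)
        elif op == "ADD":
            r = stack.pop(); l = stack.pop(); stack.append(l + r)
    return stack.pop(), dispatches

def run_register(program, regs):
    dispatches = 0
    for op, dst, l, r in program:
        dispatches += 1
        if op == "MUL":
            regs[dst] = regs[l] * regs[r]
        elif op == "ADD":
            regs[dst] = regs[l] + regs[r]
    return regs["r0"], dispatches

env = {"a": 2, "b": 3, "c": 4}
stack_prog = [("LOAD", "a"), ("LOAD", "b"), ("LOAD", "c"),
              ("MUL",), ("ADD",)]
reg_prog = [("MUL", "r1", "b", "c"),      # r1 = b * c
            ("ADD", "r0", "a", "r1")]     # r0 = a + r1
print(run_stack(stack_prog, env))                      # (14, 5)
print(run_register(reg_prog, dict(env, r0=0, r1=0)))   # (14, 2)
```

Whether the saved dispatches matter depends, as Greg says next, on how much work each instruction does besides the dispatch.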
-- Greg From tjreedy at udel.edu Mon Feb 1 22:44:34 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 1 Feb 2016 22:44:34 -0500 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> Message-ID: On 2/1/2016 3:39 PM, Andrew Barnert via Python-Dev wrote: > There are already multiple duplicate questions every month on > StackOverflow from people asking "how do I find the source to stdlib > module X". The canonical answer starts off by explaining how to > import the module and use its __file__, which everyone is able to > handle. Perhaps even easier: start IDLE, hit Alt-M, type in module name as one would import it, click OK. If Python source is available, IDLE will open in an editor window. with the path on the title bar. If we have to instead explain how to work out the .py name > from the qualified module name, how to work out the stdlib path from > sys.path, and then how to find the source from those two things, with > the caveat that it may not be installed at all on some platforms, and > how to make sure what they're asking about really is a stdlib module, > and how to make sure they aren't shadowing it with a module elsewhere > on sys.path, that's a lot more complicated. The windows has the path on the title bar, so one can tell what was loaded. IDLE currently uses imp.find_module (this could be updated), with a backup of __import__(...).__file__, so it will load non-stdlib files that can be imported. > Finally, on Linux and Mac, the stdlib will usually be somewhere > that's not user-writable On Windows, this depends on the install location. Perhaps there should be an option for edit-save or view only to avoid accidental changes. 
-- Terry Jan Reedy From abarnert at yahoo.com Tue Feb 2 00:16:39 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Feb 2016 21:16:39 -0800 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> Message-ID: <2F71ECE7-BAB7-44C5-8FFB-2CC637D40C73@yahoo.com> On Feb 1, 2016, at 19:44, Terry Reedy wrote: > >> On 2/1/2016 3:39 PM, Andrew Barnert via Python-Dev wrote: >> >> There are already multiple duplicate questions every month on >> StackOverflow from people asking "how do I find the source to stdlib >> module X". The canonical answer starts off by explaining how to >> import the module and use its __file__, which everyone is able to >> handle. > > Perhaps even easier: start IDLE, hit Alt-M, type in the module name as one would import it, click OK. If Python source is available, IDLE will open it in an editor window, with the path on the title bar. > >> If we have to instead explain how to work out the .py name >> from the qualified module name, how to work out the stdlib path from >> sys.path, and then how to find the source from those two things, with >> the caveat that it may not be installed at all on some platforms, and >> how to make sure what they're asking about really is a stdlib module, >> and how to make sure they aren't shadowing it with a module elsewhere >> on sys.path, that's a lot more complicated. > > The window has the path on the title bar, so one can tell what was loaded. The point of this thread is the suggestion that the stdlib modules be frozen or stored in a zipfile, unless a user modifies things in some way to make the source accessible.
So, if a user hasn't done that (which no novice will know how to do), there won't be a path to show in the title bar, so IDLE won't be any more help than the command line. (I suppose IDLE could grow a new feature to look up "associated source files" for a zipped stdlib or something, but that seems like a pretty big new feature.) > IDLE currently uses imp.find_module (this could be updated), with a backup of __import__(...).__file__, so it will load non-stdlib files that can be imported. > > > Finally, on Linux and Mac, the stdlib will usually be somewhere > > that's not user-writable > > On Windows, this depends on the install location. Perhaps there should be an option for edit-save or view only to avoid accidental changes. The problem is that, if the standard way for users to see stdlib sources is to copy them from somewhere else (like $install/src/Lib) into a stdlib directory (like $install/Lib), then that stdlib directory has to be writable--and on Mac and Linux, it's not. From vadmium+py at gmail.com Tue Feb 2 02:57:36 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Tue, 2 Feb 2016 07:57:36 +0000 Subject: [Python-Dev] [Python-checkins] cpython: merge In-Reply-To: <20160202052149.77411.58810@psf.io> References: <20160202052149.77411.58810@psf.io> Message-ID: On 2 February 2016 at 05:21, raymond.hettinger wrote: > https://hg.python.org/cpython/rev/0731f097157b > changeset: 100142:0731f097157b > parent: 100140:c7f1acdd8be1 > user: Raymond Hettinger > date: Mon Feb 01 21:21:19 2016 -0800 > summary: > merge > > files: > Doc/library/collections.rst | 4 ++-- > Lib/test/test_deque.py | 23 ++++++++++++----------- > Modules/_collectionsmodule.c | 7 ++----- > 3 files changed, 16 insertions(+), 18 deletions(-) This wasn't actually a merge (there is only one parent). Hopefully I fixed it up with . But it looks like the original NEWS entry didn't get merged in your earlier merge , so there was nothing for me to merge the NEWS changes into in the default branch.
From victor.stinner at gmail.com Tue Feb 2 04:28:43 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 2 Feb 2016 10:28:43 +0100 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: <56A90B97.7090001@gmail.com> References: <56A90B97.7090001@gmail.com> Message-ID: Hi, I'm back from the FOSDEM event in Bruxelles; it was really cool. I gave a talk about FAT Python and I got good feedback. But friends told me that people now have expectations for FAT Python. It looks like people care about Python performance :-) FYI the slides of my talk: https://github.com/haypo/conf/raw/master/2016-FOSDEM/fat_python.pdf (a video was recorded, I don't know when it will be online) I took a first look at your patch and, sorry, I'm skeptical about the design. I have to play with it a little bit more to check whether there is a better design. To be clear, FAT Python with your work looks more and more like a cheap JIT compiler :-) Guards, specializations, optimizing at runtime after a threshold... all these things come from JIT compilers. I like the idea of a kind-of JIT compiler without having to pay the high cost of a large dependency like LLVM. I like baby steps in CPython: they are faster, and it's possible to implement them in a single release cycle (one minor Python release, Python 3.6). Integrating a JIT compiler into CPython already failed with Unladen Swallow :-/ PyPy has a completely different design (and has serious issues with the Python C API), Pyston is restricted to Python 2.7, Pyjion looks specific to Windows (CoreCLR), Numba is specific to numeric computations (numpy). IMHO none of these projects can easily be merged into CPython "quickly" (again, in a single Python release cycle). By the way, Pyjion still looks very young (I heard that they are still working on the compatibility with CPython, not on performance yet).
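For readers unfamiliar with the guard idea, it can be sketched in a few lines of Python. This is a toy model only -- FAT Python's real guards work on bytecode at the C level, and every name below is invented for illustration:

```python
# Guard-and-specialize, modelled in pure Python: run a specialized
# function while a watched builtin is untouched, fall back otherwise.
import builtins

def make_guarded(specialized, generic, watched_name):
    original = getattr(builtins, watched_name)
    def wrapper(*args):
        if getattr(builtins, watched_name) is original:  # cheap guard check
            return specialized(*args)
        return generic(*args)                            # deoptimized path
    return wrapper

def fast_len3(seq, _len=len):   # specialized: len bound at definition time
    return _len(seq) == 3

def generic_len3(seq):          # generic: normal name lookup on every call
    return len(seq) == 3

len3 = make_guarded(fast_len3, generic_len3, "len")
print(len3("abc"))              # True, via the specialized path

real_len = builtins.len
builtins.len = lambda seq: 0    # shadow the builtin: the guard now fails
print(len3("abc"))              # False, via the generic path
builtins.len = real_len         # restore: the guard holds again
print(len3("abc"))              # True
```

The appeal is exactly what Victor describes: the guard check is a single pointer comparison, far cheaper than a JIT, yet it keeps the semantics correct when someone really does shadow a builtin.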
2016-01-27 19:25 GMT+01:00 Yury Selivanov : > tl;dr The summary is that I have a patch that improves CPython performance > up to 5-10% on macro benchmarks. Benchmark results on Macbook Pro/Mac OS > X, desktop CPU/Linux, server CPU/Linux are available at [1]. There are no > slowdowns that I could reproduce consistently. That's really impressive, great job Yury :-) Getting a non-negligible speedup on large macrobenchmarks became really hard in CPython. CPython is already well optimized in all corners. It looks like the overall Python performance still depends heavily on the performance of dictionary and attribute lookups. Even though it was well known, I didn't expect up to 10% speedup on *macro* benchmarks. > LOAD_METHOD & CALL_METHOD > ------------------------- > > We had a lot of conversations with Victor about his PEP 509, and he sent me > a link to his amazing compilation of notes about CPython performance [2]. > One optimization that he pointed out to me was LOAD/CALL_METHOD opcodes, an > idea that first originated in PyPy. > > There is a patch that implements this optimization, it's tracked here: [3]. > There are some low level details that I explained in the issue, but I'll go > over the high level design in this email as well. Your cache is stored directly in code objects. Currently, code objects are immutable. Antoine Pitrou's patch adding a LOAD_GLOBAL cache adds a cache to functions with an "alias" in each frame object: http://bugs.python.org/issue10401 Andrea Griffini's patch also adding a cache for LOAD_GLOBAL adds a cache for code objects too. https://bugs.python.org/issue1616125 I don't know what the best place to store the cache is. I vaguely recall a patch which uses a single unique global cache, but maybe I'm wrong :-p > The opcodes we want to optimize are LOAD_GLOBAL, 0 and 3. Let's look at the > first one, that loads the 'print' function from builtins. The opcode knows > the following bits of information: I tested your latest patch.
It looks like LOAD_GLOBAL never invalidates the cache on a cache miss ("deoptimize" the instruction). I suggest always invalidating the cache at each cache miss. Not only is it common to modify global variables, but there is also the issue of different namespaces used with the same code object. Examples: * late global initialization. See for example the _a85chars cache of base64.a85encode. * a code object created in a temporary namespace and then always run in a different global namespace. See for example collections.namedtuple(). I'm not sure that it's the best example because it looks like the Python code only loads builtins, not globals. But it looks like your code keeps a copy of the version of the global namespace dict. I tested with a threshold of 1: always optimize all code objects. Maybe with your default threshold of 1024 runs, the issue with different namespaces doesn't occur in practice. > A straightforward way to implement such a cache is simple, but consumes a > lot of memory, that would be just wasted, since we only need such a cache > for LOAD_GLOBAL and LOAD_METHOD opcodes. So we have to be creative about the > cache design. I'm not sure that it's worth developing complex dynamic logic to only enable optimizations after a threshold (a design very close to a JIT compiler). What is the overhead (% of RSS memory) on a concrete application when all code objects are optimized at startup? Maybe we need a global boolean flag to disable the optimization? Or even a compilation option? I mean that all these new counters have a cost, and the code may be even faster without these counters if everything is always optimized, no? I'm not sure that the storage for the cache is really efficient. It's a compact data structure, but it looks "expensive" to access it (there is one level of indirection). I understand that it's compact to reduce the memory footprint overhead. I'm not sure that the threshold of 1000 runs is OK for short scripts.
It would be nice to also optimize scripts which only call a function 900 times :-) The classic memory vs. CPU trade-off :-) I'm just thinking aloud :-) Victor From victor.stinner at gmail.com Tue Feb 2 04:33:43 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 2 Feb 2016 10:33:43 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFADB0.8000502@gmail.com> References: <56AFADB0.8000502@gmail.com> Message-ID: Hi, Maybe it's worth writing a PEP to summarize all your changes to optimize CPython? It would avoid having to follow different threads on the mailing lists, different issues on the bug tracker, with external links to GitHub gists, etc. Your code changes critical parts of Python: the code object structure and Python/ceval.c. At least, it would help to document Python internals ;-) The previous "big" change (optimization) like that was the new "type attribute cache": the addition of tp_version_tag to PyTypeObject. I "documented" it in PEP 509 and it was difficult to rebuild the context, understand the design, etc. https://www.python.org/dev/peps/pep-0509/#method-cache-and-type-version-tag Victor From yselivanov.ml at gmail.com Tue Feb 2 09:15:58 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 2 Feb 2016 09:15:58 -0500 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: References: <56A90B97.7090001@gmail.com> Message-ID: <56B0BA1E.5040505@gmail.com> On 2016-02-02 4:28 AM, Victor Stinner wrote: [..] > I took a first look at your patch and, sorry, Thanks for the initial code review! > I'm skeptical about the > design. I have to play with it a little bit more to check whether > there is a better design. So far I see two things you are worried about: 1. The cache is attached to the code object vs function/frame. I think the code object is the perfect place for such a cache. The cache must be there (and survive!) "across" the frames.
If you attach it to the function object, you'll have to re-attach it to a frame object on each PyEval call. I can't see how that would be better. 2. Two levels of indirection in my cache -- offset table + cache table. In my other email thread "Opcode cache in ceval loop" I explained that optimizing every code object in the standard library and unittests adds 5% memory overhead. Optimizing only those that are called frequently adds less than 1%. Besides, many functions that you import are never called, or only called once or twice. And code objects for modules and class bodies are called once. If we don't use an offset table and just allocate a cache entry for every opcode, then the memory usage will rise *significantly*. Right now the overhead of the offset table is *8 bits* per opcode, and the overhead of the cache table is *32 bytes* per optimized opcode. The overhead of using 1 extra indirection is minimal. [..] > > > 2016-01-27 19:25 GMT+01:00 Yury Selivanov : >> tl;dr The summary is that I have a patch that improves CPython performance >> up to 5-10% on macro benchmarks. Benchmark results on Macbook Pro/Mac OS >> X, desktop CPU/Linux, server CPU/Linux are available at [1]. There are no >> slowdowns that I could reproduce consistently. > That's really impressive, great job Yury :-) Getting a non-negligible > speedup on large macrobenchmarks became really hard in CPython. > CPython is already well optimized in all corners. It looks like the > overall Python performance still depends heavily on the performance of > dictionary and attribute lookups. Even though it was well known, I didn't > expect up to 10% speedup on *macro* benchmarks. Thanks! > > >> LOAD_METHOD & CALL_METHOD >> ------------------------- >> >> We had a lot of conversations with Victor about his PEP 509, and he sent me >> a link to his amazing compilation of notes about CPython performance [2].
>> One optimization that he pointed out to me was LOAD/CALL_METHOD opcodes, an >> idea that first originated in PyPy. >> >> There is a patch that implements this optimization, it's tracked here: [3]. >> There are some low level details that I explained in the issue, but I'll go >> over the high level design in this email as well. > Your cache is stored directly in code objects. Currently, code objects > are immutable. Code objects are immutable on the Python level. My cache doesn't make any previously immutable field mutable. Adding a few mutable cache structures visible only at the C level is acceptable, I think. > > Antoine Pitrou's patch adding a LOAD_GLOBAL cache adds a cache to > functions with an "alias" in each frame object: > http://bugs.python.org/issue10401 > > Andrea Griffini's patch also adding a cache for LOAD_GLOBAL adds a > cache for code objects too. > https://bugs.python.org/issue1616125 Those patches are nice, but optimizing just LOAD_GLOBAL won't give you a big speed-up. For instance, 2to3 became 7-8% faster once I started to optimize LOAD_ATTR. The idea of my patch is that it implements caching in such a way that we can add it to several different opcodes. >> The opcodes we want to optimize are LOAD_GLOBAL, 0 and 3. Let's look at the >> first one, that loads the 'print' function from builtins. The opcode knows >> the following bits of information: > I tested your latest patch. It looks like LOAD_GLOBAL never > invalidates the cache on cache miss ("deoptimize" the instruction). Yes, that was a deliberate decision (but we can add the deoptimization easily). So far I haven't seen a use case or benchmark where we really need to deoptimize. > > I suggest always invalidating the cache at each cache miss. Not only > is it common to modify global variables, but there is also the issue of > different namespaces used with the same code object. Examples: > > * late global initialization. See for example the _a85chars cache of > base64.a85encode.
> * a code object created in a temporary namespace and then always run in > a different global namespace. See for example > collections.namedtuple(). I'm not sure that it's the best example > because it looks like the Python code only loads builtins, not > globals. But it looks like your code keeps a copy of the version of > the global namespace dict. > > I tested with a threshold of 1: always optimize all code objects. > Maybe with your default threshold of 1024 runs, the issue with > different namespaces doesn't occur in practice. Yep. I added a constant in ceval.c that enables collection of opcode cache stats. 99.9% of all global dicts in benchmarks are stable. The test suite was a bit different, only ~99% :) One percent of cache misses was probably because of unittest.mock. > > >> A straightforward way to implement such a cache is simple, but consumes a >> lot of memory, that would be just wasted, since we only need such a cache >> for LOAD_GLOBAL and LOAD_METHOD opcodes. So we have to be creative about the >> cache design. > I'm not sure that it's worth developing complex dynamic logic to > only enable optimizations after a threshold (a design very close to a > JIT compiler). I think it's not even remotely close to what JITs do. In my design I have a simple counter -- when it reaches 1000, we create the caches in the code objects. Some opcodes start to use it. That's basically it. JIT compilers trace the code, collect information about types, think about memory, optimize, deoptimize, think about memory again, etc, etc :) > What is the overhead (% of RSS memory) on a concrete > application when all code objects are optimized at startup? I've mentioned that in my other thread. When the whole test suite is run with *every* code object being optimized (threshold = 1), about 73000 code objects were optimized, requiring >20Mb of memory (the test suite process consumed ~400Mb of memory). So 5% looks to be the worst case.
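A back-of-envelope check of those figures (the byte counts come from the threshold-1 debug output quoted in this thread; this is plain arithmetic, not a new measurement):

```python
# Sanity-check the ~5% worst-case overhead claim from the quoted numbers.
optimized_objects = 72395              # code objects optimized
cache_bytes = 20925595                 # "total extra mem" for their caches
process_bytes = 400 * 1024 * 1024      # ~400Mb test-suite process

per_object = cache_bytes / optimized_objects
overhead = cache_bytes / process_bytes

print(f"~{per_object:.0f} bytes of cache per code object")
print(f"~{overhead:.1%} worst-case memory overhead")   # consistent with ~5%
```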
When I ran the test suite with the threshold set to 1024, only 2000 objects were optimized, requiring less than 1% of the total process memory. > > Maybe we need a global boolean flag to disable the optimization? Or > even a compilation option? I'd hate to add such a thing. Why would you want to disable the cache? To save 1% of memory? TBH I think this only adds maintenance overhead to us. > > I mean that all these new counters have a cost, and the code may be > even faster without these counters if everything is always optimized, > no? Yes, but only marginally. You'll save one "inc" in the eval loop. And a couple of "if"s. Maybe on a micro-benchmark you can see a difference. But optimizing everything will require much more memory. And we shouldn't optimize code objects that are run only once -- that's code objects for modules and classes. A threshold of 1024 is big enough to say that the code object is frequently used and will probably continue to be frequently used in the future. > > I'm not sure that the storage for the cache is really efficient. It's > a compact data structure, but it looks "expensive" to access it (there > is one level of indirection). I understand that it's compact to reduce > the memory footprint overhead. > > I'm not sure that the threshold of 1000 runs is OK for short scripts. > It would be nice to also optimize scripts which only call a function > 900 times :-) The classic memory vs. CPU trade-off :-) I'd be OK with changing the threshold to 500 or something. But IMHO it won't change much. Short/small scripts won't hit it anyway. And even if they do, they typically don't run long enough to get a measurable speedup.
BTW, here's a debug output of unit tests with every code object optimized:

-- Opcode cache number of objects  = 72395
-- Opcode cache total extra mem    = 20925595
-- Opcode cache LOAD_METHOD hits   = 64569036 (63%)
-- Opcode cache LOAD_METHOD misses = 23899 (0%)
-- Opcode cache LOAD_METHOD opts   = 104872
-- Opcode cache LOAD_METHOD deopts = 19191
-- Opcode cache LOAD_METHOD dct-chk= 12805608
-- Opcode cache LOAD_METHOD total  = 101735114
-- Opcode cache LOAD_GLOBAL hits   = 123808815 (99%)
-- Opcode cache LOAD_GLOBAL misses = 310397 (0%)
-- Opcode cache LOAD_GLOBAL opts   = 125205
-- Opcode cache LOAD_ATTR hits     = 59089435 (53%)
-- Opcode cache LOAD_ATTR misses   = 33372 (0%)
-- Opcode cache LOAD_ATTR opts     = 73643
-- Opcode cache LOAD_ATTR deopts   = 20276
-- Opcode cache LOAD_ATTR total    = 111049468

Yury

From srkunze at mail.de  Tue Feb  2 12:16:34 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Tue, 2 Feb 2016 18:16:34 +0100
Subject: [Python-Dev] Speeding up CPython 5-10%
In-Reply-To: <56AFE9DE.9070002@canterbury.ac.nz>
References: <56A90B97.7090001@gmail.com>
	<20160130042835.GJ4619@ando.pearwood.info>
	<56AF8DB9.30900@gmail.com> <56AFA219.9020103@mail.de>
	<56AFE9DE.9070002@canterbury.ac.nz>
Message-ID: <56B0E472.1030600@mail.de>

On 02.02.2016 00:27, Greg Ewing wrote:
> Sven R. Kunze wrote:
>> Are there some resources on why register machines are considered
>> faster than stack machines?
>
> If a register VM is faster, it's probably because each register
> instruction does the work of about 2-3 stack instructions,
> meaning less trips around the eval loop, so less unpredictable
> branches and less pipeline flushes.

That's what I found so far as well.

> This assumes that bytecode dispatching is a substantial fraction
> of the time taken to execute each instruction. For something
> like cpython, where the operations carried out by the bytecodes
> involve a substantial amount of work, this may not be true.

Interesting point indeed.
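[Editorial aside: the per-instruction dispatch cost being discussed is easy to see with the ``dis`` module -- even a tiny function body becomes several stack operations, each a separate trip around the eval loop. Exact opcode names vary between CPython versions:]

```python
import dis
import io

def f(a, b, c):
    return a + b * c

# Capture the disassembly as text instead of printing straight to stdout.
buf = io.StringIO()
dis.dis(f, file=buf)
listing = buf.getvalue()
print(listing)

# The body compiles to several loads plus two binary operations; a
# register-style VM could plausibly express it as two three-address
# instructions, cutting the number of dispatches roughly in half.
```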
It makes sense that register machines only save us the bytecode dispatching. How much that is compared to the work each instruction requires, I cannot say. Maybe Yury has a better understanding here.

> It also assumes the VM is executing the bytecodes directly. If
> there is a JIT involved, it all gets translated into something
> else anyway, and then it's more a matter of whether you find
> it easier to design the JIT to deal with stack or register code.

It seems like Yury thinks so. He didn't tell us so far.

Best,
Sven

From pludemann at google.com  Tue Feb  2 12:33:52 2016
From: pludemann at google.com (Peter Ludemann)
Date: Tue, 2 Feb 2016 09:33:52 -0800
Subject: [Python-Dev] Speeding up CPython 5-10%
In-Reply-To: <56B0E472.1030600@mail.de>
References: <56A90B97.7090001@gmail.com>
	<20160130042835.GJ4619@ando.pearwood.info>
	<56AF8DB9.30900@gmail.com> <56AFA219.9020103@mail.de>
	<56AFE9DE.9070002@canterbury.ac.nz> <56B0E472.1030600@mail.de>
Message-ID: 

Also, modern compiler technology tends to use "infinite register" machines for the intermediate representation, then uses register coloring to assign the actual registers (and generate spill code if needed). I've seen work on inter-function optimization for avoiding some register loads and stores (combined with tail-call optimization, it can turn recursive calls into loops in the register machine).

On 2 February 2016 at 09:16, Sven R. Kunze wrote:
> On 02.02.2016 00:27, Greg Ewing wrote:
>
>> Sven R. Kunze wrote:
>>
>>> Are there some resources on why register machines are considered faster
>>> than stack machines?
>>
>> If a register VM is faster, it's probably because each register
>> instruction does the work of about 2-3 stack instructions,
>> meaning less trips around the eval loop, so less unpredictable
>> branches and less pipeline flushes.
>
> That's what I found so far as well.
>
>> This assumes that bytecode dispatching is a substantial fraction
>> of the time taken to execute each instruction. For something
>> like cpython, where the operations carried out by the bytecodes
>> involve a substantial amount of work, this may not be true.
>
> Interesting point indeed. It makes sense that register machines only save
> us the bytecode dispatching.
>
> How much that is compared to the work each instruction requires, I cannot
> say. Maybe Yury has a better understanding here.
>
>> It also assumes the VM is executing the bytecodes directly. If
>> there is a JIT involved, it all gets translated into something
>> else anyway, and then it's more a matter of whether you find
>> it easier to design the JIT to deal with stack or register code.
>
> It seems like Yury thinks so. He didn't tell us so far.
>
>
> Best,
> Sven
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/pludemann%40google.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com  Tue Feb  2 12:41:01 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 2 Feb 2016 19:41:01 +0200
Subject: [Python-Dev] Opcode cache in ceval loop
In-Reply-To: <56AFADB0.8000502@gmail.com>
References: <56AFADB0.8000502@gmail.com>
Message-ID: 

On 01.02.16 21:10, Yury Selivanov wrote:
> To measure the max/average memory impact, I tuned my code to optimize
> *every* code object on *first* run. Then I ran the entire Python test
> suite. Python test suite + standard library both contain around 72395
> code objects, which required 20Mb of memory for caches. The test
> process consumed around 400Mb of memory. Thus, the absolute worst case
> scenario, the overhead is about 5%.

The test process consumes so much memory because a few tests create huge objects.
If we exclude these tests (note that tests that require more than 1Gb are already excluded by default) and tests that create a number of threads (threads consume much memory too), the rest of the tests need less than 100Mb of memory. The absolute required minimum is about 25Mb. Thus, in the absolute worst-case scenario, the overhead is about 100%.

From yselivanov.ml at gmail.com  Tue Feb  2 12:45:55 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 2 Feb 2016 12:45:55 -0500
Subject: [Python-Dev] Opcode cache in ceval loop
In-Reply-To: 
References: <56AFADB0.8000502@gmail.com>
Message-ID: <56B0EB53.4000307@gmail.com>

On 2016-02-02 12:41 PM, Serhiy Storchaka wrote:
> On 01.02.16 21:10, Yury Selivanov wrote:
>> To measure the max/average memory impact, I tuned my code to optimize
>> *every* code object on *first* run. Then I ran the entire Python test
>> suite. Python test suite + standard library both contain around 72395
>> code objects, which required 20Mb of memory for caches. The test
>> process consumed around 400Mb of memory. Thus, the absolute worst case
>> scenario, the overhead is about 5%.
>
> The test process consumes so much memory because a few tests create
> huge objects. If we exclude these tests (note that tests that require
> more than 1Gb are already excluded by default) and tests that create a
> number of threads (threads consume much memory too), the rest of the
> tests need less than 100Mb of memory. The absolute required minimum is
> about 25Mb. Thus, in the absolute worst-case scenario, the overhead is
> about 100%.

Can you give me the exact configuration of tests (command line to run) that would only consume 25Mb?
Yury From brett at python.org Tue Feb 2 12:52:52 2016 From: brett at python.org (Brett Cannon) Date: Tue, 02 Feb 2016 17:52:52 +0000 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: References: <56A90B97.7090001@gmail.com> Message-ID: On Tue, 2 Feb 2016 at 01:29 Victor Stinner wrote: > Hi, > > I'm back for the FOSDEM event at Bruxelles, it was really cool. I gave > talk about FAT Python and I got good feedback. But friends told me > that people now have expectations on FAT Python. It looks like people > care of Python performance :-) > > FYI the slides of my talk: > https://github.com/haypo/conf/raw/master/2016-FOSDEM/fat_python.pdf > (a video was recorded, I don't know when it will be online) > > I take a first look at your patch and sorry, I'm skeptical about the > design. I have to play with it a little bit more to check if there is > no better design. > > To be clear, FAT Python with your work looks more and more like a > cheap JIT compiler :-) Guards, specializations, optimizing at runtime > after a threshold... all these things come from JIT compilers. I like > the idea of a kind-of JIT compiler without having to pay the high cost > of a large dependency like LLVM. I like baby steps in CPython, it's > faster, it's possible to implement it in a single release cycle (one > minor Python release, Python 3.6). Integrating a JIT compiler into > CPython already failed with Unladen Swallow :-/ > > PyPy has a complete different design (and has serious issues with the > Python C API), Pyston is restricted to Python 2.7, Pyjion looks > specific to Windows (CoreCLR), Numba is specific to numeric > computations (numpy). IMHO none of these projects can be easily be > merged into CPython "quickly" (again, in a single Python release > cycle). By the way, Pyjion still looks very young (I heard that they > are still working on the compatibility with CPython, not on > performance yet). 
>

We are not ready to have a serious discussion about Pyjion yet as we are still working on compatibility (we have a talk proposal in for PyCon US 2016 and so we are hoping to have something to discuss at the language summit), but Victor's email shows there are some misconceptions about it already and a misunderstanding of our fundamental goal.

First off, Pyjion is very much a work-in-progress. You can find it at https://github.com/microsoft/pyjion (where there is an FAQ), but for this audience the key thing to know is that we are still working on compatibility (see https://github.com/Microsoft/Pyjion/blob/master/Tests/python_tests.txt for the list of tests we do (not) pass from the Python test suite). Out of our roughly 400 tests, we don't pass about 18 of them.

Second, we have not really started work on performance yet. We have done some very low-hanging fruit stuff, but just barely. IOW we are not really ready to discuss performance (ATM we JIT instantly for all code objects, and even being that aggressive with the JIT overhead we are even with, or slightly slower than, an unmodified Python 3.5 VM, so we are hopeful this work will pan out).

Third, the over-arching goal of Pyjion is not to add a JIT into CPython, but to add a C API to CPython that will allow plugging in a JIT. If you simply JIT code objects then the API required to let someone plug in a JIT is basically three functions, maybe as little as two (you can see the exact patch against CPython that we are working with at https://github.com/Microsoft/Pyjion/blob/master/Patches/python.diff). We have no interest in shipping a JIT with CPython, just making it much easier to let others add one if they want to because it makes sense for their workload (and if Yury's caching stuff goes in with an execution counter, then even the one bit of true overhead we had will be part of CPython already, which makes it even more of an easy decision to consider the API we will eventually propose).

Fourth, it is not Windows-only by design. CoreCLR is cross-platform on all major OSs, so that is not a restriction (and honestly we are using CoreCLR simply because Dino used to work on the CLR team so he knows the bytecode really well; we easily could have used some other JIT to prove our point). The only reason Pyjion doesn't work on other OSs is momentum/laziness on Dino's and my part; Dino hacked together Pyjion at PyCon US 2015 and he is most comfortable on Windows, so he just did it on Windows in Visual Studio and didn't bother to start with, e.g., CMake to make it build on other OSs. Since we are still trying to work out some compatibility stuff, we would rather do that than worry about Linux or OS X support right now.

Fifth, if we manage to show that a C API can easily be added to CPython to make a JIT something that can simply be plugged in and be useful, then we will also have a basic JIT framework for people to use. As I said, our use of CoreCLR is just for ease of development. There is no reason we couldn't use ChakraCore, v8, LLVM, etc. But since all of these JIT compilers would need to know how to handle CPython bytecode, we have tried to design a framework where JIT compilers just need a wrapper to handle code emission, and the framework that we are building will handle driving the code emission (e.g., the wrapper needs to know how to emit add_integer(), but our framework handles when to do that).

Anyway, as I said, Pyjion is very much a work in progress. We hope to have something more solid to propose/discuss at the language summit at PyCon US 2016.
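[Editorial aside: Brett's "two or three functions" framing can be pictured with a toy model. This is purely hypothetical Python -- the real proposal is a C API against code objects, and none of these names come from the Pyjion patch -- but it shows the shape of a minimal pluggable-JIT surface: one function to install a compiler, one place the interpreter consults it.]

```python
# Hypothetical sketch of a pluggable-JIT hook.  The installed compiler may
# return None to decline a code object, in which case the normal
# interpreter path is used.

_jit_compile = None          # the single pluggable hook

def set_code_compiler(fn):
    """Install a callable: code -> native-callable, or None to decline."""
    global _jit_compile
    _jit_compile = fn

def interpret(code, *args):
    """Stand-in for the normal eval loop."""
    return ("interpreted", code(*args))

def eval_code(code, *args):
    """Interpreter entry point: try the JIT, fall back to interpretation."""
    if _jit_compile is not None:
        native = _jit_compile(code)      # may return None to decline
        if native is not None:
            return native(*args)
    return interpret(code, *args)

# A trivial "JIT" that only accepts code flagged as hot:
def toy_jit(code):
    if getattr(code, "hot", False):
        return lambda *a: ("jitted", code(*a))
    return None

set_code_compiler(toy_jit)

def square(x):
    return x * x

print(eval_code(square, 7))   # prints: ('interpreted', 49)
square.hot = True
print(eval_code(square, 7))   # prints: ('jitted', 49)
```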
The only reason I keep mentioning it is because what Victor is calling "JIT-like" is really "minimize doing extra work that's not needed" and that benefits everyone trying to do any computational work that takes extra time to speed up CPython (which includes Pyjion). IOW Yury's work combined with Victor's work could quite easily just spill out beyond just local caches and into allowing pluggable JITs in CPython. -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Feb 2 13:45:43 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 2 Feb 2016 20:45:43 +0200 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B0EB53.4000307@gmail.com> References: <56AFADB0.8000502@gmail.com> <56B0EB53.4000307@gmail.com> Message-ID: On 02.02.16 19:45, Yury Selivanov wrote: > On 2016-02-02 12:41 PM, Serhiy Storchaka wrote: >> On 01.02.16 21:10, Yury Selivanov wrote: >>> To measure the max/average memory impact, I tuned my code to optimize >>> *every* code object on *first* run. Then I ran the entire Python test >>> suite. Python test suite + standard library both contain around 72395 >>> code objects, which required 20Mb of memory for caches. The test >>> process consumed around 400Mb of memory. Thus, the absolute worst case >>> scenario, the overhead is about 5%. >> >> Test process consumes such much memory because few tests creates huge >> objects. If exclude these tests (note that tests that requires more >> than 1Gb are already excluded by default) and tests that creates a >> number of threads (threads consume much memory too), the rest of tests >> needs less than 100Mb of memory. Absolute required minimum is about >> 25Mb. Thus, the absolute worst case scenario, the overhead is about 100%. > Can you give me the exact configuration of tests (command line to run) > that would only consume 25mb? 
I don't remember what exact tests consume the most of memory, but following tests are failed when run with less than 30Mb of memory: test___all__ test_asynchat test_asyncio test_bz2 test_capi test_concurrent_futures test_ctypes test_decimal test_descr test_distutils test_docxmlrpc test_eintr test_email test_fork1 test_fstring test_ftplib test_functools test_gc test_gdb test_hashlib test_httplib test_httpservers test_idle test_imaplib test_import test_importlib test_io test_itertools test_json test_lib2to3 test_list test_logging test_longexp test_lzma test_mmap test_multiprocessing_fork test_multiprocessing_forkserver test_multiprocessing_main_handling test_multiprocessing_spawn test_os test_pickle test_poplib test_pydoc test_queue test_regrtest test_resource test_robotparser test_shutil test_smtplib test_socket test_sqlite test_ssl test_subprocess test_tarfile test_tcl test_thread test_threaded_import test_threadedtempfile test_threading test_threading_local test_threadsignals test_tix test_tk test_tools test_ttk_guionly test_ttk_textonly test_tuple test_unicode test_urllib2_localnet test_wait3 test_wait4 test_xmlrpc test_zipfile test_zlib From yselivanov.ml at gmail.com Tue Feb 2 14:23:10 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 2 Feb 2016 14:23:10 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: References: <56AFADB0.8000502@gmail.com> <56B0EB53.4000307@gmail.com> Message-ID: <56B1021E.3020608@gmail.com> On 2016-02-02 1:45 PM, Serhiy Storchaka wrote: > On 02.02.16 19:45, Yury Selivanov wrote: >> On 2016-02-02 12:41 PM, Serhiy Storchaka wrote: >>> On 01.02.16 21:10, Yury Selivanov wrote: >>>> To measure the max/average memory impact, I tuned my code to optimize >>>> *every* code object on *first* run. Then I ran the entire Python test >>>> suite. Python test suite + standard library both contain around 72395 >>>> code objects, which required 20Mb of memory for caches. 
The test >>>> process consumed around 400Mb of memory. Thus, the absolute worst >>>> case >>>> scenario, the overhead is about 5%. >>> >>> Test process consumes such much memory because few tests creates huge >>> objects. If exclude these tests (note that tests that requires more >>> than 1Gb are already excluded by default) and tests that creates a >>> number of threads (threads consume much memory too), the rest of tests >>> needs less than 100Mb of memory. Absolute required minimum is about >>> 25Mb. Thus, the absolute worst case scenario, the overhead is about >>> 100%. >> Can you give me the exact configuration of tests (command line to run) >> that would only consume 25mb? > > I don't remember what exact tests consume the most of memory, but > following tests are failed when run with less than 30Mb of memory: > > test___all__ test_asynchat test_asyncio test_bz2 test_capi > test_concurrent_futures test_ctypes test_decimal test_descr > test_distutils test_docxmlrpc test_eintr test_email test_fork1 > test_fstring test_ftplib test_functools test_gc test_gdb test_hashlib > test_httplib test_httpservers test_idle test_imaplib test_import > test_importlib test_io test_itertools test_json test_lib2to3 test_list > test_logging test_longexp test_lzma test_mmap > test_multiprocessing_fork test_multiprocessing_forkserver > test_multiprocessing_main_handling test_multiprocessing_spawn test_os > test_pickle test_poplib test_pydoc test_queue test_regrtest > test_resource test_robotparser test_shutil test_smtplib test_socket > test_sqlite test_ssl test_subprocess test_tarfile test_tcl test_thread > test_threaded_import test_threadedtempfile test_threading > test_threading_local test_threadsignals test_tix test_tk test_tools > test_ttk_guionly test_ttk_textonly test_tuple test_unicode > test_urllib2_localnet test_wait3 test_wait4 test_xmlrpc test_zipfile > test_zlib Alright, I modified the code to optimize ALL code objects, and ran unit tests with the above tests excluded: -- 
Max process mem (ru_maxrss)       = 131858432
-- Opcode cache number of objects = 42109
-- Opcode cache total extra mem   = 10901106

And asyncio tests:

-- Max process mem (ru_maxrss)    = 57081856
-- Opcode cache number of objects = 4656
-- Opcode cache total extra mem   = 1766681

So the absolute worst case for a small asyncio program is 3%, and for unit tests (with the above list excluded) 8%. I think it'd be very hard to find a real-life program that consists of only code objects and nothing else (no data to work with/process, no objects with dicts, no threads, basically nothing), because only for such a program would you have a 100% memory overhead for the bytecode cache (when all code objects are optimized).

FWIW, here are stats for asyncio with only hot objects being optimized:

-- Max process mem (ru_maxrss)    = 54775808
-- Opcode cache number of objects = 121
-- Opcode cache total extra mem   = 43521

Yury

From yselivanov.ml at gmail.com  Tue Feb  2 14:41:24 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 2 Feb 2016 14:41:24 -0500
Subject: [Python-Dev] Opcode cache in ceval loop
In-Reply-To: 
References: <56AFADB0.8000502@gmail.com>
Message-ID: <56B10664.1030500@gmail.com>

Hi Victor,

On 2016-02-02 4:33 AM, Victor Stinner wrote:
> Hi,
>
> Maybe it's worth to write a PEP to summarize all your changes to
> optimize CPython? It would avoid to have to follow different threads
> on the mailing lists, different issues on the bug tracker, with
> external links to GitHub gists, etc. Your code changes critical parts
> of Python: code object structure and Python/ceval.c.

Not sure about that... PEPs take a LOT of time :( Besides, all my changes are CPython-specific and can be considered an implementation detail.

>
> At least, it would help to document Python internals ;-)

I can write a ceval.txt file explaining what's going on in the ceval loop, with details on the opcode cache and other things. I think it's even better than a PEP, to be honest.
Yury From storchaka at gmail.com Tue Feb 2 15:07:11 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 2 Feb 2016 22:07:11 +0200 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B1021E.3020608@gmail.com> References: <56AFADB0.8000502@gmail.com> <56B0EB53.4000307@gmail.com> <56B1021E.3020608@gmail.com> Message-ID: On 02.02.16 21:23, Yury Selivanov wrote: > Alright, I modified the code to optimize ALL code objects, and ran unit > tests with the above tests excluded: > > -- Max process mem (ru_maxrss) = 131858432 > -- Opcode cache number of objects = 42109 > -- Opcode cache total extra mem = 10901106 Thank you for doing these tests. Now results are more convincing to me. > And asyncio tests: > > -- Max process mem (ru_maxrss) = 57081856 > -- Opcode cache number of objects = 4656 > -- Opcode cache total extra mem = 1766681 > FWIW, here are stats for asyncio with only hot objects being optimized: > > -- Max process mem (ru_maxrss) = 54775808 > -- Opcode cache number of objects = 121 > -- Opcode cache total extra mem = 43521 Interesting, 57081856 - 54775808 = 2306048, but 1766681 - 43521 = 1723160. There are additional 0.5Mb lost during fragmentation. From srkunze at mail.de Tue Feb 2 15:27:36 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 2 Feb 2016 21:27:36 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B10664.1030500@gmail.com> References: <56AFADB0.8000502@gmail.com> <56B10664.1030500@gmail.com> Message-ID: <56B11138.9080002@mail.de> On 02.02.2016 20:41, Yury Selivanov wrote: > Hi Victor, > > On 2016-02-02 4:33 AM, Victor Stinner wrote: >> Hi, >> >> Maybe it's worth to write a PEP to summarize all your changes to >> optimize CPython? It would avoid to have to follow different threads >> on the mailing lists, different issues on the bug tracker, with >> external links to GitHub gists, etc. Your code changes critical parts >> of Python: code object structure and Python/ceval.c. > > Not sure about that... 
PEPs take a LOT of time :( True. > Besides, all my changes are CPython specific and > can be considered as an implementation detail. > >> >> At least, it would help to document Python internals ;-) > > I can write a ceval.txt file explaining what's going on > in ceval loop, with details on the opcode cache and other > things. I think it's even better than a PEP, to be honest. I would love to see that. :) Best, Sven From stephen at xemacs.org Tue Feb 2 15:49:05 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 3 Feb 2016 05:49:05 +0900 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B10664.1030500@gmail.com> References: <56AFADB0.8000502@gmail.com> <56B10664.1030500@gmail.com> Message-ID: <22193.5697.54840.384679@turnbull.sk.tsukuba.ac.jp> Yury Selivanov writes: > Not sure about that... PEPs take a LOT of time :( Informational PEPs need not take so much time, no more than you would spend on ceval.txt. I'm sure a PEP would get a lot more attention from reviewers, too. Even if you PEP the whole thing, as you say it's a (big ;-) implementation detail. A PEP won't make things more controversial (or less) than they already are. I don't see why it would take that much more time than ceval.txt. > I can write a ceval.txt file explaining what's going on > in ceval loop, with details on the opcode cache and other > things. I think it's even better than a PEP, to be honest. Unlikely to be better, since that's a subset of the proposed PEP. Of course it's up to you, since you'd be doing most of the work, but for the rest of us PEPs are a lot more discoverable and easily referenced than a .txt file with a short name. 
From storchaka at gmail.com Tue Feb 2 15:57:06 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 2 Feb 2016 22:57:06 +0200 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B10664.1030500@gmail.com> References: <56AFADB0.8000502@gmail.com> <56B10664.1030500@gmail.com> Message-ID: On 02.02.16 21:41, Yury Selivanov wrote: > I can write a ceval.txt file explaining what's going on > in ceval loop, with details on the opcode cache and other > things. I think it's even better than a PEP, to be honest. I totally agree. From robertpancoast77 at gmail.com Tue Feb 2 16:19:18 2016 From: robertpancoast77 at gmail.com (=?UTF-8?Q?=C6=A6OB_COASTN?=) Date: Tue, 2 Feb 2016 16:19:18 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: References: <56AFADB0.8000502@gmail.com> <56B10664.1030500@gmail.com> Message-ID: >> I can write a ceval.txt file explaining what's going on >> in ceval loop, with details on the opcode cache and other >> things. I think it's even better than a PEP, to be honest. > > > I totally agree. > Please include the notes text file. This provides an excellent summary for those of us that haven't yet taken the deep dive into the ceval loop, but still wish to understand its implementation. From victor.stinner at gmail.com Tue Feb 2 16:32:57 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 2 Feb 2016 22:32:57 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B1021E.3020608@gmail.com> References: <56AFADB0.8000502@gmail.com> <56B0EB53.4000307@gmail.com> <56B1021E.3020608@gmail.com> Message-ID: 2016-02-02 20:23 GMT+01:00 Yury Selivanov : > Alright, I modified the code to optimize ALL code objects, and ran unit > tests with the above tests excluded: > > -- Max process mem (ru_maxrss) = 131858432 > -- Opcode cache number of objects = 42109 > -- Opcode cache total extra mem = 10901106 In my experience, RSS is a coarse measure of the memory usage. 
I wrote tracemalloc to get a reliable measure of the *Python* memory usage: https://docs.python.org/dev/library/tracemalloc.html#tracemalloc.get_traced_memory Run tests with -X tracemalloc -i, and then type in the REPL: >>> import tracemalloc; print("%.1f kB" % (tracemalloc.get_traced_memory()[1] / 1024.)) 10197.7 kB I expect this value to be (much) lower than RSS. Victor From steve.dower at python.org Wed Feb 3 00:15:03 2016 From: steve.dower at python.org (Steve Dower) Date: Tue, 2 Feb 2016 21:15:03 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry Message-ID: <56B18CD7.2010409@python.org> I was throwing around some ideas with colleagues about how we detect Python installations on Windows from within Visual Studio, and it came up that there are many Python distros that install into different locations but write the same registry entries. (I knew about this, of course, but this time I decided to do something.) Apart from not being detected properly by all IDEs/tools/installers, non-standard distros that register themselves in the official keys may also mess with the default sys.path values. For example, at one point (possibly still true) if you installed both Canopy and Anaconda, you would break the first one because they tried to load the other's stdlib. Other implementations have different structures or do not register themselves at all, which also makes it more complicated for tools to discover them. So here is a rough proposal to standardise the registry keys that can be set on Windows in a way that (a) lets other installers besides the official ones have equal footing, (b) provides consistent search and resolution semantics for tools, and (c) includes slightly more rich metadata (such as display names and URLs). Presented in PEP-like form here, but if feedback suggests just putting it in the docs I'm okay with that too. 
It is fully backwards compatible with official releases of Python (at least back to 2.5, possibly further) and does not require modifications to Python or the official installer - it is purely codifying a superset of what we already do. Any and all feedback welcomed, especially from the owners of other distros, Python implementations or tools on the list.

Cheers,
Steve

-----

PEP: ???
Title: Python environment registration in the Windows Registry
Version: $Revision$
Last-Modified: $Date$
Author: Steve Dower
Status: Draft
Type: ???
Content-Type: text/x-rst
Created: 02-Feb-2016

Abstract
========

When installed on Windows, the official Python installer creates a registry key for discovery and detection by other applications. Unofficial installers, such as those used by distributions, typically create identical keys for the same purpose. However, these may conflict with the official installer or other distributions.

This PEP defines a schema for the Python registry key to allow unofficial installers to separately register their installation, and to allow applications to detect and correctly display all Python environments on a user's machine. No implementation changes to Python are proposed with this PEP.

The schema matches the registry values that have been used by the official installer since at least Python 2.5, and the resolution behaviour matches the behaviour of the official Python releases.

Specification
=============

We consider there to be a single collection of Python environments on a machine, where the collection may be different for each user of the machine. There are three potential registry locations where the collection may be stored based on the installation options of each environment. These are::

    HKEY_CURRENT_USER\Software\Python\<Company>\<Tag>
    HKEY_LOCAL_MACHINE\Software\Python\<Company>\<Tag>
    HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\<Company>\<Tag>

On a given machine, an environment is uniquely identified by its Company-Tag pair.
Keys should be searched in the order shown, and if the same Company-Tag pair appears in more than one of the above locations, only the first occurrence is offered.

Official Python releases use ``PythonCore`` for Company, and the value of ``sys.winver`` for Tag. Other registered environments may use any values for Company and Tag. Recommendations are made in the following sections.

Backwards Compatibility
-----------------------

Python 3.4 and earlier did not distinguish between 32-bit and 64-bit builds in ``sys.winver``. As a result, it is possible to have valid side-by-side installations of both 32-bit and 64-bit interpreters. To ensure backwards compatibility, applications should treat environments listed under the following two registry keys as distinct, even if Tag matches::

    HKEY_LOCAL_MACHINE\Software\Python\PythonCore\<Tag>
    HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\PythonCore\<Tag>

Note that this does not apply to Python 3.5 and later, which uses different Tags. Environments registered under other Company names must use distinct Tags for side-by-side installations.

1. Environments in ``HKEY_CURRENT_USER`` are always preferred
2. Environments in ``HKEY_LOCAL_MACHINE\Software\Wow6432Node`` are preferred if the interpreter is known to be 32-bit

Company
-------

The Company part of the key is intended to group related environments and to ensure that Tags are namespaced appropriately. The key name should be alphanumeric without spaces and likely to be unique. For example, a trademarked name, a UUID, or a hostname would be appropriate::

    HKEY_CURRENT_USER\Software\Python\ExampleCorp
    HKEY_CURRENT_USER\Software\Python\6C465E66-5A8C-4942-9E6A-D29159480C60
    HKEY_CURRENT_USER\Software\Python\www.example.com

If a string value named ``DisplayName`` exists, it should be used to identify the environment category to users. Otherwise, the name of the key should be used.
If a string value named ``SupportUrl`` exists, it may be displayed or otherwise used to direct users to a web site related to the environment.

A complete example may look like::

    HKEY_CURRENT_USER\Software\Python\ExampleCorp
        (Default) = (value not set)
        DisplayName = "Example Corp"
        SupportUrl = "http://www.example.com"

Tag
---

The Tag part of the key is intended to uniquely identify an environment within those provided by a single company. The key name should be alphanumeric without spaces and stable across installations. For example, the Python language version, a UUID or a partial/complete hash would be appropriate; an integer counter that increases for each new environment may not::

    HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6
    HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66

If a string value named ``DisplayName`` exists, it should be used to identify the environment to users. Otherwise, the name of the key should be used.

If a string value named ``SupportUrl`` exists, it may be displayed or otherwise used to direct users to a web site related to the environment.

If a string value named ``Version`` exists, it should be used to identify the version of the environment. This is independent from the version of Python implemented by the environment.

If a string value named ``SysVersion`` exists, it must be in ``x.y`` or ``x.y.z`` format matching the version returned by ``sys.version_info`` in the interpreter. Otherwise, if the Tag matches this format it is used. If not, the Python version is unknown.

Note that each of these values is recommended, but optional. A complete example may look like this::

    HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66
        (Default) = (value not set)
        DisplayName = "Distro 3"
        SupportUrl = "http://www.example.com/distro-3"
        Version = "3.0.12345.0"
        SysVersion = "3.6.0"

InstallPath
-----------

Beneath the environment key, an ``InstallPath`` key must be created.
This key is always named ``InstallPath``, and the default value must match ``sys.prefix``::

    HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6\InstallPath
        (Default) = "C:\ExampleCorpPy36"

If a string value named ``ExecutablePath`` exists, it must be a path to the ``python.exe`` (or equivalent) executable. Otherwise, the interpreter executable is assumed to be called ``python.exe`` and exist in the directory referenced by the default value.

If a string value named ``WindowedExecutablePath`` exists, it must be a path to the ``pythonw.exe`` (or equivalent) executable. Otherwise, the windowed interpreter executable is assumed to be called ``pythonw.exe`` and exist in the directory referenced by the default value.

A complete example may look like::

    HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\InstallPath
        (Default) = "C:\ExampleDistro30"
        ExecutablePath = "C:\ExampleDistro30\ex_python.exe"
        WindowedExecutablePath = "C:\ExampleDistro30\ex_pythonw.exe"

Help
----

Beneath the environment key, a ``Help`` key may be created. This key is always named ``Help`` if present and has no default value.

Each subkey of ``Help`` specifies a documentation file, tool, or URL associated with the environment. The subkey may have any name, and the default value is a string appropriate for passing to ``os.startfile`` or equivalent.

If a string value named ``DisplayName`` exists, it should be used to identify the help file to users. Otherwise, the key name should be used.
A complete example may look like::

    HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\Help
        Python\
            (Default) = "C:\ExampleDistro30\python36.chm"
            DisplayName = "Python Documentation"
        Extras\
            (Default) = "http://www.example.com/tutorial"
            DisplayName = "Example Distro Online Tutorial"

From tritium-list at sdamon.com Wed Feb 3 03:15:56 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Wed, 03 Feb 2016 03:15:56 -0500 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B18CD7.2010409@python.org> References: <56B18CD7.2010409@python.org> Message-ID: <56B1B73C.1030204@sdamon.com> ...just when I thought I had solved the registry headaches I have been dealing with... I am not saying this proposal will make the registry situation worse, but it may break my solution to the headaches Python's registry use causes with some non-standard module installers (and even the standard distutils exe installers, but that is being mitigated). In the wild exist modules with their own EXE or MSI installers that check the registry for 'the system python'. No matter how hard you hit them, they will only install to *that one python*. You can imagine the sadist that builds such an installer would not be receptive to the concept of wheels. So in order to force those modules to install to the python YOU want, dammit, you have to edit the registry. I have (contributed to) a script that just sets whatever python it was last run with to the one true system python. Works with virtualenvs too. This is not a terribly obscure script either, by the way. It is in the first reply to the google search for "how to install exes in a virtualenv". So here I am in a situation where some pythons use another registry entry. I have no idea if this makes my life as a user harder. What does this kind of meddling do against tools trying to autodetect python, and finding an ever changing value?
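[Editor's note: the kind of registry-pinning script described above might look roughly like this. This is a sketch, not the actual script being referenced; the key layout follows the traditional ``PythonCore`` convention that legacy EXE/MSI installers read, and the helper name is invented here.]

```python
import sys
import ntpath  # Windows-style path joining, so the sketch also runs elsewhere

def pin_registry_entries(prefix, version):
    """Registry writes - (subkey, value name or None for the default
    value, data) triples - that point the traditional PythonCore keys
    at the interpreter installed in *prefix*."""
    base = r'Software\Python\PythonCore' + '\\' + version
    return [
        (base + r'\InstallPath', None, prefix),
        (base + r'\PythonPath', None,
         ntpath.join(prefix, 'Lib') + ';' + ntpath.join(prefix, 'DLLs')),
    ]

# Actually writing the keys only makes sense (and only works) on Windows.
if sys.platform == 'win32':
    import winreg
    version = '%d.%d' % sys.version_info[:2]
    for subkey, name, data in pin_registry_entries(sys.prefix, version):
        with winreg.CreateKey(winreg.HKEY_CURRENT_USER, subkey) as key:
            winreg.SetValueEx(key, name, 0, winreg.REG_SZ, data)
```

Such a script makes whichever interpreter runs it look like "the system python" to installers that only consult ``PythonCore\X.Y`` - exactly the kind of key the proposal wants to reserve for the official releases.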
Are we trying to guarantee that the keys used by the python.org installers are only ever actually used by standard CPython? I know for PTVS manually adding a python environment to visual studio is trivial - you fill in three locations, and it's done. Just today I added a python environment to my system that was not autodetected. It took under a minute and almost no effort to add it... so for that tool this adds very little benefit. I do not know about other tools. On a very personal note (like the rest of this wasn't my personal issues with possibly making my life slightly more difficult), I would much rather see python stop touching the registry altogether - but I have no strong argument for that. On 2/3/2016 00:15, Steve Dower wrote: > I was throwing around some ideas with colleagues about how we detect > Python installations on Windows from within Visual Studio, and it came > up that there are many Python distros that install into different > locations but write the same registry entries. (I knew about this, of > course, but this time I decided to do something.) > > Apart from not being detected properly by all IDEs/tools/installers, > non-standard distros that register themselves in the official keys may > also mess with the default sys.path values. For example, at one point > (possibly still true) if you installed both Canopy and Anaconda, you > would break the first one because they tried to load the other's stdlib. > > Other implementations have different structures or do not register > themselves at all, which also makes it more complicated for tools to > discover them. > > So here is a rough proposal to standardise the registry keys that can > be set on Windows in a way that (a) lets other installers besides the > official ones have equal footing, (b) provides consistent search and > resolution semantics for tools, and (c) includes slightly more rich > metadata (such as display names and URLs).
Presented in PEP-like form > here, but if feedback suggests just putting it in the docs I'm okay > with that too. It is fully backwards compatible with official releases > of Python (at least back to 2.5, possibly further) and does not > require modifications to Python or the official installer - it is > purely codifying a superset of what we already do. > > Any and all feedback welcomed, especially from the owners of other > distros, Python implementations or tools on the list. > > Cheers, > Steve > > ----- > > PEP: ??? > Title: Python environment registration in the Windows Registry > Version: $Revision$ > Last-Modified: $Date$ > Author: Steve Dower > Status: Draft > Type: ??? > Content-Type: text/x-rst > Created: 02-Feb-2016 > > > Abstract > ======== > > When installed on Windows, the official Python installer creates a > registry key for discovery and detection by other applications. > Unofficial installers, such as those used by distributions, typically > create identical keys for the same purpose. However, these may > conflict with the official installer or other distributions. > > This PEP defines a schema for the Python registry key to allow > unofficial installers to separately register their installation, and > to allow applications to detect and correctly display all Python > environments on a user's machine. No implementation changes to Python > are proposed with this PEP. > > The schema matches the registry values that have been used by the > official installer since at least Python 2.5, and the resolution > behaviour matches the behaviour of the official Python releases. > > Specification > ============= > > We consider there to be a single collection of Python environments on > a machine, where the collection may be different for each user of the > machine. There are three potential registry locations where the > collection may be stored based on the installation options of each > environment. 
These are:: > > HKEY_CURRENT_USER\Software\Python\\ > HKEY_LOCAL_MACHINE\Software\Python\\ > HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\\ > > On a given machine, an environment is uniquely identified by its > Company-Tag pair. Keys should be searched in the order shown, and if > the same Company-Tag pair appears in more than one of the above > locations, only the first occurrence is offered. > > Official Python releases use ``PythonCore`` for Company, and the value > of ``sys.winver`` for Tag. Other registered environments may use any > values for Company and Tag. Recommendations are made in the following > sections. > > > > Backwards Compatibility > ----------------------- > > Python 3.4 and earlier did not distinguish between 32-bit and 64-bit > builds in ``sys.winver``. As a result, it is possible to have valid > side-by-side installations of both 32-bit and 64-bit interpreters. > > To ensure backwards compatibility, applications should treat > environments listed under the following two registry keys as distinct, > even if Tag matches:: > > HKEY_LOCAL_MACHINE\Software\Python\PythonCore\ > HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\PythonCore\ > > Note that this does not apply to Python 3.5 and later, which uses > different Tags. Environments registered under other Company names must > use distinct Tags for side-by-side installations. > > 1. Environments in ``HKEY_CURRENT_USER`` are always preferred > 2. Environments in ``HKEY_LOCAL_MACHINE\Software\Wow6432Node`` are > preferred if the interpreter is known to be 32-bit > > > Company > ------- > > The Company part of the key is intended to group related environments > and to ensure that Tags are namespaced appropriately. The key name > should be alphanumeric without spaces and likely to be unique.
For > example, a trademarked name, a UUID, or a hostname would be appropriate:: > > HKEY_CURRENT_USER\Software\Python\ExampleCorp > HKEY_CURRENT_USER\Software\Python\6C465E66-5A8C-4942-9E6A-D29159480C60 > HKEY_CURRENT_USER\Software\Python\www.example.com > > If a string value named ``DisplayName`` exists, it should be used to > identify the environment category to users. Otherwise, the name of the > key should be used. > > If a string value named ``SupportUrl`` exists, it may be displayed or > otherwise used to direct users to a web site related to the environment. > > A complete example may look like:: > > HKEY_CURRENT_USER\Software\Python\ExampleCorp > (Default) = (value not set) > DisplayName = "Example Corp" > SupportUrl = "http://www.example.com" > > Tag > --- > > The Tag part of the key is intended to uniquely identify an > environment within those provided by a single company. The key name > should be alphanumeric without spaces and stable across installations. > For example, the Python language version, a UUID or a partial/complete > hash would be appropriate; an integer counter that increases for each > new environment may not:: > > HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6 > HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66 > > If a string value named ``DisplayName`` exists, it should be used to > identify the environment to users. Otherwise, the name of the key > should be used. > > If a string value named ``SupportUrl`` exists, it may be displayed or > otherwise used to direct users to a web site related to the environment. > > If a string value named ``Version`` exists, it should be used to > identify the version of the environment. This is independent from the > version of Python implemented by the environment. > > If a string value named ``SysVersion`` exists, it must be in ``x.y`` > or ``x.y.z`` format matching the version returned by > ``sys.version_info`` in the interpreter. Otherwise, if the Tag matches > this format it is used. 
If not, the Python version is unknown. > > Note that each of these values is recommended, but optional. A > complete example may look like this:: > > HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66 > (Default) = (value not set) > DisplayName = "Distro 3" > SupportUrl = "http://www.example.com/distro-3" > Version = "3.0.12345.0" > SysVersion = "3.6.0" > > InstallPath > ----------- > > Beneath the environment key, an ``InstallPath`` key must be created. > This key is always named ``InstallPath``, and the default value must > match ``sys.prefix``:: > > HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6\InstallPath > (Default) = "C:\ExampleCorpPy36" > > If a string value named ``ExecutablePath`` exists, it must be a path > to the ``python.exe`` (or equivalent) executable. Otherwise, the > interpreter executable is assumed to be called ``python.exe`` and > exist in the directory referenced by the default value. > > If a string value named ``WindowedExecutablePath`` exists, it must be > a path to the ``pythonw.exe`` (or equivalent) executable. Otherwise, > the windowed interpreter executable is assumed to be called > ``pythonw.exe`` and exist in the directory referenced by the default > value. > > A complete example may look like:: > > HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\InstallPath > (Default) = "C:\ExampleDistro30" > ExecutablePath = "C:\ExampleDistro30\ex_python.exe" > WindowedExecutablePath = "C:\ExampleDistro30\ex_pythonw.exe" > > Help > ---- > > Beneath the environment key, a ``Help`` key may be created. This key > is always named ``Help`` if present and has no default value. > > Each subkey of ``Help`` specifies a documentation file, tool, or URL > associated with the environment. The subkey may have any name, and the > default value is a string appropriate for passing to ``os.startfile`` > or equivalent. > > If a string value named ``DisplayName`` exists, it should be used to > identify the help file to users. 
Otherwise, the key name should be used. > > A complete example may look like:: > > HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\Help > Python\ > (Default) = "C:\ExampleDistro30\python36.chm" > DisplayName = "Python Documentation" > Extras\ > (Default) = "http://www.example.com/tutorial" > DisplayName = "Example Distro Online Tutorial" > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com From v+python at g.nevcal.com Wed Feb 3 04:50:46 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 3 Feb 2016 01:50:46 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B1B73C.1030204@sdamon.com> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> Message-ID: <56B1CD76.3000000@g.nevcal.com> On 2/3/2016 12:15 AM, Alexander Walters wrote: > On a very personal note (like the rest of this wasn't my personal > issues with possibly making my life slightly more difficult), I would > much rather see python stop touching the registry all together - but I > have no strong argument for that. Me too. My opinions follow, some might call them arguments, strong or weak, some might call them foolishness. I've been setting up a collection of tools and programs on Dropbox for far-flung project team members (users, not programmers) to share. It is nearly impossible to install a typical Windows program on Dropbox, because the installation information is partly in the installed directory structure, and partly in the registry. Exporting registry info from the machine that does the install to other machines is hard because other users have different paths to Dropbox. 
OK, for commercial software, installing to Dropbox probably violates some of the licensing, forcing each user to buy/install, but FOSS or in-house software doesn't have such restrictions: but still, most of it wants to use the registry, and Windows almost, but doesn't quite, force it to. Portable software exists, but is often a 3rd-party hack of popular FOSS rather than being directly supported by the FOSS development team. Python falls into this category. Happily, I recently found WinPython Zero, which hacks it (somehow) to work portably, and I've deployed it successfully on Dropbox. I'd rather Python were portable on its own, without hacks. Portability requires not using the registry, so I agree with Alexander there. Portability, as "Windows portable software" is generally defined, is focused on moving programs and data around on a flash drive, from one computer to another, and is focused on single-user, any (Windows) machine (with sufficient specs). That doesn't quite fit the Dropbox environment. Most portable software reintroduces the idea of storing configuration information in the program folder, which is OK for "project" configuration info, done once, by one person, but not for "personal preferences". The other thing Windows GUI lost is the concept of "current working directory", which hit me hard when I first started trying to set up project working areas on Dropbox. Many Windows programs want to run only one copy of themselves, in one Window, with one set of settings, and one "Start In" directory (which generally defaults to... the program directory, or sometimes to "My Documents"). This is why I went looking for a portable Python (and other portable things), and I finally realized that I was "fighting city hall" in trying to get an environment set up that was usable for the various teams (of users, not developers). Writeup at slashdot for more details on the lack of a "current working directory" concept in Windows GUI programs.
The path to Dropbox folders is different for everyone, even the drive letter can be different. So here are my preferences for CPython: 1) It would be best CPython itself were fully portable. That wherever you point the installer, it lives there, and if somehow you happen to execute it from that directory, it would use its invocation path as the basis for finding the rest of the installed pieces. 2) A script could be provided that sets the association for .py and the corresponding ftype to point to the python that executed the script... and which has a parameter to clear that back out, too. This would be to allow users to set a particular python for convenient script execution, since Windows does the association thing, instead of using #! lines. 3) A script could be provided (maybe the one Alexander referred to) that sets the registry so that the "apparently one true System Python" points to the python that executed the script... and which has a parameter to clear that back out, too. This would be for compatible transition to the new registry-free Python versions for packages that have weird installers (such as Alexander alluded to). But with the registry-free Python available, those packages would hopefully see the future is registry free, and avoid requiring registry data to do installs. 4) A script could be provided to add the python that executed the script to the PATH, with an option to remove it. 5) A script could be provided to add the python that executed the script to the launcher .ini file, with an option to remove it. 6) A script could be provided to add the python that executed it to the Start menu, and/or Desktop icons, with an option to remove them. Maybe scripts 2-6 are all the same one, with different options (or different invocation shortcuts for the clicky folk). Not everyone would probably need all the scripts, but none of them would be particularly large, such that they would be burdensome for others to ignore. 
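[Editor's note: script (2) above - pointing the ``.py`` association at whichever interpreter runs it - could be sketched like this. This assumes per-user associations under ``HKEY_CURRENT_USER\Software\Classes`` (so no admin rights are needed); the function name and progid are invented for the sketch.]

```python
import sys

PROGID = 'Python.File.Portable'  # hypothetical progid for this sketch

def association_entries(python_exe):
    """Per-user registry writes - (subkey, value name or None for the
    default value, data) - that associate .py files with *python_exe*."""
    command = '"%s" "%%1" %%*' % python_exe
    return [
        # Map the .py extension to our progid...
        (r'Software\Classes\.py', None, PROGID),
        # ...and give the progid an 'open' command pointing at python_exe.
        (r'Software\Classes\%s\shell\open\command' % PROGID, None, command),
    ]

# Writing the association only makes sense on Windows.
if sys.platform == 'win32':
    import winreg
    for subkey, name, data in association_entries(sys.executable):
        with winreg.CreateKey(winreg.HKEY_CURRENT_USER, subkey) as key:
            winreg.SetValueEx(key, name, 0, winreg.REG_SZ, data)
```

An "undo" option would simply delete the two keys again with ``winreg.DeleteKey``.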
Such a collection of scripts would allow folks to achieve various levels of integration with Windows "conveniences", without requiring it. The portability would allow Python to be installed on Dropbox or a network share, and used from there without requiring every team member to do all the individual installs of the packages needed for a project. From p.f.moore at gmail.com Wed Feb 3 06:21:48 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Feb 2016 11:21:48 +0000 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B1CD76.3000000@g.nevcal.com> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> Message-ID: On 3 February 2016 at 09:50, Glenn Linderman wrote: > On 2/3/2016 12:15 AM, Alexander Walters wrote: > >> On a very personal note (like the rest of this wasn't my personal issues >> with possibly making my life slightly more difficult), I would much rather >> see python stop touching the registry all together - but I have no strong >> argument for that. > > Me too. My opinions follow, some might call them arguments, strong or weak, > some might call them foolishness. I also would prefer that Python not use/need the registry. As far as I know, in practical terms Python works just fine without the registry entries being present, but I'd like it if that were formally supported as an option for users. Obviously it means that tools that want to see what Python versions are installed won't see such unregistered copies, I'm fine with that (and I don't have a problem with leaving it up to said tools as to whether they want to include ways to support using a Python installation that's not in the registry). Some issues with this proposal: 1. I don't like the way it states that Python distributions "must" set keys.
I'd rather that it were explicitly stated that a distribution which sets no registry keys is entirely valid (with the obvious implication that tools which scan the registry looking for installed Python distributions won't be able to see such an installation). 2. It's not clear to me how alternative distributions will specify their registry keys (or the fact that they don't use them). The registry keys are built into the Python interpreter - this proposal seems to imply that distributions that simply repackage the python.org build will no longer be valid or supported, and instead anyone who wants to produce a custom Python distribution (even private in-house repackagings) will need to change the Python source and rebuild. That's a pretty major change, and if that *is* the intent, then as a minimum I'd say we need to provide compiler flags to let redistributors specify their Company/Tag values (or say that they want to omit registry use). And I'm still not happy that "repackage the python.org binaries" has been removed as an option. It's possible that the reason the above two points have been missed is because the proposal focuses purely on "informational" registry data. But Python also modifies sys.path based on the registry entries - and possibly has other behavioural changes as well. The pywin32 package, in particular, makes use of this (it's a feature of pywin32 that I disagree with and I wish it didn't do that, but it does, and it's a very widely used package on Windows). So ignoring this aspect of Python's behaviour is a big problem. (Also, what changes will pywin32 need to make to correctly support being installed into non-python.org distributions when this proposal is implemented?) 
Paul From steve.dower at python.org Wed Feb 3 11:17:59 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 3 Feb 2016 08:17:59 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B1B73C.1030204@sdamon.com> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> Message-ID: <56B22837.5060102@python.org> On 03Feb2016 0015, Alexander Walters wrote: > I am not saying this proposal will make the registry situation worse, > but it may break my solution to the headaches Python's registry use > causes with some non-standard module installers (and even the standard > distutils exe installers, but that is being mitigated). In the wild > exist modules with their own EXE or MSI installers that check the > registry for 'the system python'. No matter how hard you hit them, they > will only install to *that one python*. You can imagine the sadist that > builds such an installer would not be receptive to the concept of > wheels. (I agree, but maybe if you can point them to a PEP it'll help? Right now, there's nowhere to even point them to.) > So in order to force those modules to install to the python YOU > want, dammit, you have to edit the registry. I have (contributed to) a > script that just sets whatever python it was last run with to the one > true system python. Works with virtualenvs too.This is not a terribly > obscure script either, by the way. It is in the first reply to the > google search for "how to install exes in a virtualenv". I highly doubt it will break your current solution, as everyone pre- and post-update will still look in PythonCore\X.Y for Python. > So here I am in a situation where some pythons use another registry > entry. I have no idea if this makes my life as a user harder. What does > this kind of meddling do against tools trying to autodetect python, and > finding an ever changing value? 
Are we trying to guarantee that the > keys used by the python.org installers are only ever actually used by > standard CPython? Guaranteeing that only python.org Python uses PythonCore is part of it, but the other part is officially telling everyone else that they are welcome to create their own keys. > I know for PTVS manually adding a python environment to visual studio is > trivial - you fill in three locations, and it's done. Just today I added > a python environment to my system that was not autodetected. It took > under a minute and almost no effort to add it... so for that tool this > adds very little benefit. I do not know about other tools. I'm also a PTVS maintainer, so I know how much magic is going on behind those three locations :) But I don't think people should need to do that by hand at all. For example, the path to an Anaconda installation is buried deep inside AppData (as is Python 3.5+ now), and varies based on your username. Canopy does the same, and once you've found it there are (or were?) at least three copies of python.exe to choose from (we worked with Enthought to make this Just Work for PTVS users). > On a very personal note (like the rest of this wasn't my personal issues > with possibly making my life slightly more difficult), I would much > rather see python stop touching the registry altogether - but I have > no strong argument for that. I also agree with that, but ultimately the registry is the global configuration store on Windows, and when we need global state it is the place to go. (My actual hope for Python 3.6 is to drop the few places where Python *reads* from the registry for configuration that shouldn't be global, at which point the Python key is solely for other programs. But those fixes are probably not PEP-worthy.)
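[Editor's note: tool-side discovery against the proposed schema - what IDEs like PTVS otherwise do by hand - resolves roughly as sketched below. The registry is modelled here as nested dicts so the precedence rules are visible and testable off-Windows; real code would walk the hives with ``winreg``, and the 32-bit ``PythonCore`` backwards-compatibility special case is omitted for brevity.]

```python
import re

# Search order from the draft PEP: per-user first, then the machine hives.
HIVE_ORDER = ['HKCU', 'HKLM', 'HKLM\\Wow6432Node']

def enumerate_environments(hives):
    """*hives* maps hive name -> {company: {tag: info-dict}}.
    Yields (company, tag, info); the first occurrence of a Company-Tag
    pair shadows later ones, per the draft PEP's resolution rules."""
    seen = set()
    for hive in HIVE_ORDER:
        for company, tags in hives.get(hive, {}).items():
            for tag, info in tags.items():
                if (company, tag) not in seen:
                    seen.add((company, tag))
                    yield company, tag, info

def sys_version(tag, info):
    """The SysVersion value if present, else the Tag when it looks like
    an x.y or x.y.z version, else None (version unknown) - per the PEP."""
    if 'SysVersion' in info:
        return info['SysVersion']
    if re.fullmatch(r'\d+\.\d+(\.\d+)?', tag):
        return tag
    return None
```

With this, a tool lists every registered environment in one pass instead of hard-coding vendor install paths.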
Cheers, Steve From steve.dower at python.org Wed Feb 3 11:29:39 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 3 Feb 2016 08:29:39 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B1CD76.3000000@g.nevcal.com> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> Message-ID: <56B22AF3.6050203@python.org> On 03Feb2016 0150, Glenn Linderman wrote: > Portable software exists, but often is 3rd party hacks of popular FOSS > rather than being directly supported by the FOSS development team. > Python falls into this category. Happily, I recently found WinPython > Zero, which hacks it (somehow) to work portably, and I've deployed it > successfully on Dropbox. I'd rather Python were portable on its own, > without hacks. > > Portability requires not using the registry, so I agree with Alexander > there. Python has been registry independent for a while if you set the PYTHONHOME environment variable (and independent but potentially unreliable even without this). Most of the registry settings created on install are for supporting upgrades, repairs and uninstallation, none of which matter in your case. (Also many python-dev readers' cases, but there are a lot of users who get into trouble very quickly without this level of management.) > So here are my preferences for CPython: > > 1) It would be best CPython itself were fully portable. That wherever > you point the installer, it lives there, and if somehow you happen to > execute it from that directory, it would use its invocation path as the > basis for finding the rest of the installed pieces. Agreed, and it already basically does this as mentioned above. > 2) A script could be provided that sets the association for .py and the > corresponding ftype to point to the python that executed the script... > and which has a parameter to clear that back out, too. 
This would be to > allow users to set a particular python for convenient script execution, > since Windows does the association thing, instead of using #! lines. You probably want the py.exe launcher here (though that relies on Python being registered...), as it handles shebangs - even /usr/bin/env style. A script such as what you're asking for would be possible, but not something I want to be responsible for providing and maintaining. > 3) A script could be provided (maybe the one Alexander referred to) that > sets the registry so that the "apparently one true System Python" points > to the python that executed the script... and which has a parameter to > clear that back out, too. This would be for compatible transition to > the new registry-free Python versions for packages that have weird > installers (such as Alexander alluded to). But with the registry-free > Python available, those packages would hopefully see the future is > registry free, and avoid requiring registry data to do installs. > 4) A script could be provided to add the python that executed the script > to the PATH, with an option to remove it. > > 5) A script could be provided to add the python that executed the script > to the launcher .ini file, with an option to remove it. > > 6) A script could be provided to add the python that executed it to the > Start menu, and/or Desktop icons, with an option to remove them. 4-6 are basically the definition of an installer, and other installers are also able to do this. > Maybe scripts 2-6 are all the same one, with different options (or > different invocation shortcuts for the clicky folk). Not everyone would > probably need all the scripts, but none of them would be particularly > large, such that they would be burdensome for others to ignore. > > Such a collection of scripts would allow folks to achieve various levels > of integration with Windows "conveniences", without requiring it. 
The > portability would allow Python to be installed on Dropbox or a network > share, and used from there without requiring every team member to do all > the individual installs of the packages needed for a project. Perhaps you really want a script to run the installer and then pip? Final point I want to reiterate - Python itself is essentially registry free already in that it does not need registry settings to function. The current problems are: 1. other programs need to locate all available Pythons 2. there appears to only be one space to register your Python 3. this space is *sometimes* used by Python to locate itself and installed packages I want to fix problem 2 via documentation, and then look at the much more difficult problem 3. Cheers, Steve From p.f.moore at gmail.com Wed Feb 3 11:39:16 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Feb 2016 16:39:16 +0000 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B22AF3.6050203@python.org> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> <56B22AF3.6050203@python.org> Message-ID: On 3 February 2016 at 16:29, Steve Dower wrote: > Final point I want to reiterate - Python itself is essentially registry free > already in that it does not need registry settings to function. That's something we should probably publicise better. People seem unaware of it (in much the same way that they never really noticed zip application support). Maybe we could include a section in the Python 3.6 "What's new" (even though it's not technically new - but I did a quick check of What's New back to 3.2 and couldn't see any mention)? 
Paul From steve.dower at python.org Wed Feb 3 11:46:59 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 3 Feb 2016 08:46:59 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> Message-ID: <56B22F03.5080102@python.org> On 03Feb2016 0321, Paul Moore wrote: > Some issues with this proposal: > > 1. I don't like the way it states that Python distributions "must" set > keys. I'd rather that it were explicitly stated that a distribution > which sets no registry keys is entirely valid (with the obvious > implication that tools which scan the registry looking for installed > Python distributions won't be able to see such an installation). Good point, I never meant to imply that. If you don't want your Python install/env found then you don't have to register anything. (Of course, when users come to us IDE developers and say "your tool can't find Python XYZ", we'll all just go to Python XYZ and say "your users need you to register when you install, see this PEP" :) ) > 2. It's not clear to me how alternative distributions will specify > their registry keys (or the fact that they don't use them). The > registry keys are built into the Python interpreter - this proposal > seems to imply that distributions that simply repackage the python.org > build will no longer be valid or supported, and instead anyone who > wants to produce a custom Python distribution (even private in-house > repackagings) will need to change the Python source and rebuild. > That's a pretty major change, and if that *is* the intent, then as a > minimum I'd say we need to provide compiler flags to let > redistributors specify their Company/Tag values (or say that they want > to omit registry use). And I'm still not happy that "repackage the > python.org binaries" has been removed as an option. 
There's only one place where the registry key is used within the interpreter itself, which is PC/getpathp.c. Essentially the process is this: sys.path = [] sys.path.append('') sys.path.extend(os.getenv('PYTHONPATH').split(';')) sys.path.extend(read_subkeys(fr'HKCU\Software\Python\PythonCore\{sys.winver}\PythonPath\**')) sys.path.extend(read_subkeys(fr'HKLM\Software\Python\PythonCore\{sys.winver}\PythonPath\**')) home = os.getenv('PYTHONHOME') if not home: if os.path.exists(os.path.join(sys.argv[0], '..', 'Lib', 'os.py')): home = os.path.dirname(sys.argv[0]) if not home: paths = read_value(fr'HKCU\Software\Python\PythonCore\{sys.winver}\PythonPath') if not paths: paths = read_value(fr'HKLM\Software\Python\PythonCore\{sys.winver}\PythonPath') if paths: sys.path.extend(paths.split(';')) else: sys.path.append(r'.\Lib') # more well-known subdirs else: sys.path.append(os.path.join(home, 'Lib')) # more well-known subdirs ... So a few high-level observations: * any program can install anywhere on the machine and make its libraries available to a specific version of Python by creating a subkey under 'PythonCore\x.y\PythonPath' * any environment lacking 'Lib\os.py' (e.g. venv) relies on the registry to locate enough stdlib to import site * this is too complicated, but guaranteed we will break users in production if we change it now So if repackagers follow a few rules (that I documented in https://docs.python.org/3.5/using/windows.html - I see the process above is also documented there, which I wish I remembered before writing all that out), they'll be fine. Unfortunately, following those rules means that you don't register anywhere that separate tools can find you, and so users complain and you "fix" it by doing the wrong thing. This PEP offers a right way to fix it. > It's possible that the reason the above two points have been missed is > because the proposal focuses purely on "informational" registry data. 
> But Python also modifies sys.path based on the registry entries - and > possibly has other behavioural changes as well. The pywin32 package, > in particular, makes use of this (it's a feature of pywin32 that I > disagree with and I wish it didn't do that, but it does, and it's a > very widely used package on Windows). So ignoring this aspect of > Python's behaviour is a big problem. (Also, what changes will pywin32 > need to make to correctly support being installed into non-python.org > distributions when this proposal is implemented?) I haven't looked into pywin32's use of this recently - I tend to only use Christoph Gohlke's wheels that don't register anything. But it is certainly a valid concern. Hopefully Mark Hammond is watching :) Cheers, Steve > > Paul From steve.dower at python.org Wed Feb 3 12:04:23 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 3 Feb 2016 09:04:23 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> <56B22AF3.6050203@python.org> Message-ID: <56B23317.2080507@python.org> On 03Feb2016 0839, Paul Moore wrote: > On 3 February 2016 at 16:29, Steve Dower wrote: >> Final point I want to reiterate - Python itself is essentially registry free >> already in that it does not need registry settings to function. > > That's something we should probably publicise better. People seem > unaware of it (in much the same way that they never really noticed zip > application support). Maybe we could include a section in the Python > 3.6 "What's new" (even though it's not technically new - but I did a > quick check of What's New back to 3.2 and couldn't see any mention)? Maybe, but since it is still potentially problematic I'd rather not right now. 
Basically, I don't want to have to support people whose "portable" version of Python works fine on one machine, but has syntax errors in the stdlib on another machine. Adding the applocal option for 3.5 (described at https://docs.python.org/3.5/using/windows.html) helps with this, but I'm guessing always running in isolated mode is not what most people really want. Until we genuinely never rely on the registry, I don't want to claim that we don't rely on the registry. Cheers, Steve --- Rest of the email is spelling out how to create the scenario above, since I assume people won't believe it :) 1. Take Python 3.4.1, install it (Just for Me), zip up the stdlib into python34.zip and copy the binaries and zip to a "portable" folder 2. Update to Python 3.4.2 on the main machine 3. Run "-m test test_grammar" with your portable 3.4.1. 4. Boom! SyntaxError in test_grammar.py because you picked up the 3.4.2 stdlib (the error comes from https://hg.python.org/cpython/rev/4ad33d82193d) From p.f.moore at gmail.com Wed Feb 3 12:12:54 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Feb 2016 17:12:54 +0000 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B22F03.5080102@python.org> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> <56B22F03.5080102@python.org> Message-ID: On 3 February 2016 at 16:46, Steve Dower wrote: > So a few high-level observations: > > * any program can install anywhere on the machine and make its libraries available to a specific version of Python by creating a subkey under 'PythonCore\x.y\PythonPath' Yeah, that's horrid but not really something we can change without compatibility breakage. But it's essentially opt-in - if you don't need the feature you don't have to register anything under that key. > * any environment lacking 'Lib\os.py' (e.g. 
venv) relies on the registry to locate enough stdlib to import site > * this is too complicated, but guaranteed we will break users in production if we change it now > > So if repackagers follow a few rules (that I documented in https://docs.python.org/3.5/using/windows.html - I see the process above is also documented there, which I wish I remembered before writing all that out), they'll be fine. Unfortunately, following those rules means that you don't register anywhere that separate tools can find you, and so users complain and you "fix" it by doing the wrong thing. Thanks for the explanation. And for documenting it (even if I looked for the documentation, failed to find it and then whined about it not being documented - my apologies!) More specifically for the people wanting "portable" Python systems, if I read that right then if python.exe is alongside a Lib directory containing os.py, Python needs no environment variables, and no registry entries, to run perfectly. I don't see any issue with Python builds that don't register themselves not showing up in tools that look for Python installations. And as you say, if we give people who make "official" distributions a way to properly register, then that helps them and leaves the unregistered case for "homebrew" portable copies of Python. (I've just seen your other note about it being "potentially problematic". OK, let's leave it low-key for now, but when we are comfortable with it, can we publicise it then? I get a definite impression that quite a lot of people assume that "you can't have a portable build of Python"). > This PEP offers a right way to fix it. Thanks for the explanation, and I now agree that's what the PEP is doing. So +0.5 from me for this PEP (Only 0.5, because I still have some concerns that talking about registry entries in such detail gives the impression that Python is tied to them more than it actually is. 
If you can see a way of toning down the wording, then great, but better to document the proposal accurately than to water it down because people might get a mistaken impression). >> It's possible that the reason the above two points have been missed is >> because the proposal focuses purely on "informational" registry data. >> But Python also modifies sys.path based on the registry entries - and >> possibly has other behavioural changes as well. The pywin32 package, >> in particular, makes use of this (it's a feature of pywin32 that I >> disagree with and I wish it didn't do that, but it does, and it's a >> very widely used package on Windows). So ignoring this aspect of >> Python's behaviour is a big problem. (Also, what changes will pywin32 >> need to make to correctly support being installed into non-python.org >> distributions when this proposal is implemented?) > > I haven't looked into pywin32's use of this recently - I tend to only use Christoph Gohlke's wheels that don't register anything. But it is certainly a valid concern. Hopefully Mark Hammond is watching :) Yeah, I've not checked if pywin32 still does this, it's a long time since I really used it. Like you, I go for wheels only these days. Paul From p.f.moore at gmail.com Wed Feb 3 12:15:38 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Feb 2016 17:15:38 +0000 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B23317.2080507@python.org> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> <56B22AF3.6050203@python.org> <56B23317.2080507@python.org> Message-ID: On 3 February 2016 at 17:04, Steve Dower wrote: > Rest of the email is spelling out how to create the scenario above, since I > assume people won't believe it :) > > 1. Take Python 3.4.1, install it (Just for Me), zip up the stdlib into > python34.zip and copy the binaries and zip to a "portable" folder > > 2. 
Update to Python 3.4.2 on the main machine > > 3. Run "-m test test_grammar" with your portable 3.4.1. > > 4. Boom! SyntaxError in test_grammar.py because you picked up the 3.4.2 > stdlib (the error comes from https://hg.python.org/cpython/rev/4ad33d82193d) Sigh. There's nothing so small that it isn't a compatibility break :-) But of course this process violates the rule "set PYTHONHOME or have a Lib/os.py *file* alongside python.exe". Like you say the rules are subtle enough that people will make mistakes :-( Thanks for the explanation. Paul From eryksun at gmail.com Wed Feb 3 12:23:36 2016 From: eryksun at gmail.com (eryk sun) Date: Wed, 3 Feb 2016 11:23:36 -0600 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B22F03.5080102@python.org> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> <56B22F03.5080102@python.org> Message-ID: On Wed, Feb 3, 2016 at 10:46 AM, Steve Dower wrote: > > sys.path.extend(read_subkeys(fr'HKCU\Software\Python\PythonCore\{sys.winver}\PythonPath\**')) > sys.path.extend(read_subkeys(fr'HKLM\Software\Python\PythonCore\{sys.winver}\PythonPath\**')) It seems like a bug (in spirit at least) that this step isn't skipped for -E and -I (Py_IgnoreEnvironmentFlag, Py_IsolatedFlag). > I haven't looked into pywin32's use of this recently - I tend to only use > Christoph Gohlke's wheels that don't register anything. I install the pypiwin32 wheel using pip, which uses pypiwin32.pth: # .pth file for the PyWin32 extensions win32 win32\lib Pythonwin import os;os.environ["PATH"]+=(';'+os.path.join(sitedir,"pypiwin32_system32")) This is different from a PythonPath subkey in a couple of respects. The paths listed in .pth files are appended to sys.path instead of prepended. They also don't get added when run with -S or for a venv environment that excludes site-packages. 
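eryk sun's description of .pth processing can be checked on any platform with the stdlib's site.addsitedir. A minimal sketch - the directory and file names below are invented for the demo, and the .pth contents mimic in miniature what pypiwin32.pth does:

```python
import os
import site
import sys
import tempfile

# A throwaway "site directory" containing one .pth file (all names
# here are made up for the demo; only site.addsitedir is real).
base = tempfile.mkdtemp()
extra = os.path.join(base, "extra_libs")
os.mkdir(extra)

with open(os.path.join(base, "demo.pth"), "w") as f:
    # A plain line names a subdirectory to add to sys.path;
    # a line starting with "import " is executed verbatim.
    f.write("extra_libs\n")
    f.write("import os; os.environ['DEMO_PTH_RAN'] = '1'\n")

site.addsitedir(base)  # processes every .pth file in the directory

print(extra in sys.path)               # True
print(os.environ.get("DEMO_PTH_RAN"))  # 1 -- the import line ran
```

Note that sys.path.index(extra) is nonzero here, which is eryk sun's contrast above: registry PythonPath entries land near the front of sys.path, while .pth entries are appended at the end.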
From steve.dower at python.org Wed Feb 3 12:53:41 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 3 Feb 2016 09:53:41 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B1CD76.3000000@g.nevcal.com> <56B22F03.5080102@python.org> Message-ID: <56B23EA5.2070107@python.org> On 03Feb2016 0923, eryk sun wrote: > On Wed, Feb 3, 2016 at 10:46 AM, Steve Dower wrote: >> >> sys.path.extend(read_subkeys(fr'HKCU\Software\Python\PythonCore\{sys.winver}\PythonPath\**')) >> sys.path.extend(read_subkeys(fr'HKLM\Software\Python\PythonCore\{sys.winver}\PythonPath\**')) > > It seems like a bug (in spirit at least) that this step isn't skipped > for -E and -I (Py_IgnoreEnvironmentFlag, Py_IsolatedFlag). They should be skipped. If not, I'm very much in favour of fixing that immediately in all active branches. >> I haven't looked into pywin32's use of this recently - I tend to only use >> Christoph Gohlke's wheels that don't register anything. > > I install the pypiwin32 wheel using pip, which uses pypiwin32.pth: > > # .pth file for the PyWin32 extensions > win32 > win32\lib > Pythonwin > > import os;os.environ["PATH"]+=(';'+os.path.join(sitedir,"pypiwin32_system32")) > > This is different from a PythonPath subkey in a couple of respects. > The paths listed in .pth files are appended to sys.path instead of > prepended. They also don't get added when run with -S or for a venv > environment that excludes site-packages. Yeah, there are serious problems with doing these kinds of hacks in .pth files. However, this is not directly affected by the registry, so thankfully not a concern right now. 
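The environment-variable half of the -E/-I behaviour eryk sun raised is easy to probe portably (the registry half needs Windows). A small sketch of the intended -I semantics, using a made-up directory name:

```python
import os
import subprocess
import sys

# An obviously fake PYTHONPATH entry (invented for the demo): entries
# are added to sys.path whether or not they exist on disk.
env = dict(os.environ, PYTHONPATH="/nonexistent_demo_dir")
probe = "import sys; print('/nonexistent_demo_dir' in sys.path)"

normal = subprocess.run([sys.executable, "-c", probe],
                        env=env, capture_output=True, text=True)
isolated = subprocess.run([sys.executable, "-I", "-c", probe],
                          env=env, capture_output=True, text=True)

print(normal.stdout.strip())    # True  -- PYTHONPATH is honoured
print(isolated.stdout.strip())  # False -- -I ignores it (and, per this
                                # thread, should skip the registry too)
```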
Cheers, Steve From tritium-list at sdamon.com Wed Feb 3 14:20:49 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Wed, 03 Feb 2016 14:20:49 -0500 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B22837.5060102@python.org> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B22837.5060102@python.org> Message-ID: <56B25311.8070706@sdamon.com> On 2/3/2016 11:17, Steve Dower wrote: >> I know for PTVS manually adding a python environment to visual studio is >> trivial - you fill in three locations, and it's done. Just today I added >> a python environment to my system that was not autodetected. It took >> under a minute and almost no effort to add it... so for that tool this >> adds very little benefit. I do not know about other tools. > > I'm also a PTVS maintainer, so I know how much magic is going on > behind those three locations :) But I don't think people should need > to do that by hand at all. > > For example, the path to an Anaconda installation is buried deep > inside AppData (as is Python 3.5+ now), and varies based on your > username. Canopy does the same, and once you've found it there are (or > were?) at least three copies of python.exe to choose from (we worked > with Enthought to make this Just Work for PTVS users). Uh.... it's C:\Anaconda[2]\ for anyone running the installer with the privileges to edit the registry... (It won't ask to elevate unless you install for all users, and that's where all users will install). So on that point alone, this saves nothing substantive really. (I will go off on python35 installing in insane locations some other time.)
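For concreteness, the per-user registration Steve's proposal describes would look something like the .reg fragment below. The company name, tag and paths are invented, and the exact value names were still under discussion in this thread, so treat this as a sketch of the shape rather than the final spec:

```reg
Windows Registry Editor Version 5.00

; Hypothetical per-user registration (HKCU needs no elevation).
; "ExampleCorp" and the paths are made up; value names follow the
; draft proposal and may differ from what was finally adopted.

[HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.5]
"DisplayName"="ExampleCorp Python 3.5"
"SupportUrl"="http://example.com/python"

[HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.5\InstallPath]
@="C:\\ExampleCorp\\Python35\\"
"ExecutablePath"="C:\\ExampleCorp\\Python35\\python.exe"
```

The point of the Company\Tag level is that a distro like the one above never collides with the PythonCore keys the official installer writes.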
From p.f.moore at gmail.com Wed Feb 3 14:33:01 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Feb 2016 19:33:01 +0000 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B25311.8070706@sdamon.com> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B22837.5060102@python.org> <56B25311.8070706@sdamon.com> Message-ID: On 3 February 2016 at 19:20, Alexander Walters wrote: > Uh.... it's C:\Anaconda[2]\ for anyone running the installer with the > privileges to edit the registry... (It won't ask to elevate unless you > install for all users, and that's where all users will install). So on that > point alone, this saves nothing substantive really. Per-user installs go into HKCU, which doesn't require elevation, so the proposal *does* help for per-user installs. And does the all-users install really offer no option for users to choose their own install location? Not even to switch to another drive (if, for example, C: is protected)? Paul From moiein2000 at gmail.com Wed Feb 3 14:32:18 2016 From: moiein2000 at gmail.com (Matthew Einhorn) Date: Wed, 3 Feb 2016 14:32:18 -0500 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B1B73C.1030204@sdamon.com> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> Message-ID: On Wed, Feb 3, 2016 at 3:15 AM, Alexander Walters wrote: > ...just when I thought I have solved the registry headaches I have been > dealing with... > > I am not saying this proposal will make the registry situation worse, but > it may break my solution to the headaches Python's registry use causes with > some non-standard module installers (and even the standard distutils exe > installers, but that is being mitigated). In the wild exist modules with > their own EXE or MSI installers that check the registry for 'the system > python'. No matter how hard you hit them, they will only install to *that > one python*.
If I remember correctly, you can use `wheel convert filename.exe` on those installers to create a wheel that you can install. I think that's what I used to do with pywin32 before pypiwin32 came along. I just tested it and it still works on the pywin32 exe. From francismb at email.de Wed Feb 3 15:53:34 2016 From: francismb at email.de (francismb) Date: Wed, 3 Feb 2016 21:53:34 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56AFD17B.5090305@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> <56AFD17B.5090305@gmail.com> Message-ID: <56B268CE.80004@email.de> Hi, On 02/01/2016 10:43 PM, Yury Selivanov wrote: > > We also need to deoptimize the code to avoid having too many cache > misses/pointless cache updates. I found that, for instance, LOAD_ATTR > is either super stable (hits 100% of times), or really unstable, so 20 > misses is, again, seems to be alright. > Aren't those hits/misses a way to see how dynamic the code is? I mean can't the current magic (manually tweaked on a limited set) values, be self tweaked/adapted on those numbers? Thanks in advance, francis From victor.stinner at gmail.com Wed Feb 3 16:03:41 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 3 Feb 2016 22:03:41 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance Message-ID: Hi, There is an old discussion about the performance of PyMem_Malloc() memory allocator. CPython puts a lot of stress on memory allocators. Last time I made statistics, it was for the PEP 454: "For example, the Python test suites calls malloc() , realloc() or free() 270,000 times per second in average."
https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator I proposed a simple change: modify PyMem_Malloc() to use the pymalloc allocator which is faster for allocations smaller than 512 bytes, or fall back to malloc() (which is the current internal allocator of PyMem_Malloc()). This tiny change makes Python up to 6% faster on some specific (macro) benchmarks, and it doesn't seem to make Python slower on any benchmark: http://bugs.python.org/issue26249#msg259445 Do you see any drawback of using pymalloc for PyMem_Malloc()? Does anyone recall the rationale to have two families of memory allocators? FYI Python has 3 families since 3.4: PyMem, PyObject but also PyMem_Raw! https://www.python.org/dev/peps/pep-0445/ -- Since pymalloc is only used for small memory allocations, I understand that small objects will no longer be allocated on the heap, but only in pymalloc arenas which are allocated by mmap. The advantage of arenas is that it's possible to "punch holes" in the memory when a whole arena is freed, whereas the heap memory has the famous "fragmentation" issue because the heap is a single contiguous memory block. The libc malloc() uses mmap() for allocations larger than a threshold which is now dynamic, and initialized to 128 kB or 256 kB by default (I don't recall exactly the default value). Is there a risk of *higher* memory fragmentation if we start to use pymalloc for PyMem_Malloc()? Does someone know how to test it?
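The allocation statistics Victor cites come from the PEP 454 tool, tracemalloc, which anyone can reproduce in miniature. This sketch only demonstrates the measurement idea, not the PyMem_Malloc patch itself:

```python
import tracemalloc

# Trace every allocation made through Python's allocator APIs.
tracemalloc.start()

# Lots of small allocations -- exactly the requests (under 512 bytes)
# that pymalloc serves from its mmap'ed arenas.
data = [bytes(64) for _ in range(10000)]

snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

stats = snapshot.statistics("lineno")
total_blocks = sum(s.count for s in stats)
total_bytes = sum(s.size for s in stats)
print("blocks still allocated:", total_blocks)  # at least one per bytes object
print("bytes still allocated:", total_bytes)
```

Measuring *fragmentation* is much harder than measuring allocation counts, which is presumably why Victor asks how to test it.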
Victor From yselivanov.ml at gmail.com Wed Feb 3 16:22:28 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 3 Feb 2016 16:22:28 -0500 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B268CE.80004@email.de> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> <56AFD17B.5090305@gmail.com> <56B268CE.80004@email.de> Message-ID: <56B26F94.1040806@gmail.com> On 2016-02-03 3:53 PM, francismb wrote: > Hi, > > On 02/01/2016 10:43 PM, Yury Selivanov wrote: > >> We also need to deoptimize the code to avoid having too many cache >> misses/pointless cache updates. I found that, for instance, LOAD_ATTR >> is either super stable (hits 100% of times), or really unstable, so 20 >> misses is, again, seems to be alright. >> > Aren't those hits/misses a way to see how dynamic the code is? I mean > can't the current magic (manually tweaked on a limited set) values, > be self tweaked/adapted on those numbers? Probably. One way of tackling this is to give each optimized opcode a counter for hit/misses. When we have a "hit" we increment that counter, when it's a miss, we decrement it. I kind of have something like that right now: https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L3035 But I only decrement that counter -- the idea is that LOAD_ATTR is allowed to "miss" only 20 times before getting deoptimized. I'll experiment with inc/dec on hit/miss and see how that affects the performance. An ideal way would be to calculate a hit/miss ratio over time for each cached opcode, but that would be an expensive calculation. 
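Yury's counter scheme can be modelled in a few lines. This is a toy sketch, not CPython's actual C implementation; the 20-miss budget and the 100 cap are the hand-tuned "magic" values from this thread:

```python
class OpcodeCache:
    """Toy model: an optimized opcode gets a miss budget; hits can
    earn some budget back (capped), and exhausting it deoptimizes."""

    MISS_BUDGET = 20   # misses allowed before deoptimizing
    CAP = 100          # ceiling if hits increment the counter

    def __init__(self):
        self.counter = self.MISS_BUDGET
        self.optimized = True

    def record(self, hit):
        if not self.optimized:
            return                      # already fell back to the generic opcode
        if hit:
            self.counter = min(self.counter + 1, self.CAP)
        else:
            self.counter -= 1
            if self.counter <= 0:
                self.optimized = False  # deoptimize this call site

cache = OpcodeCache()
for _ in range(30):        # a stable LOAD_ATTR site: budget grows 20 -> 50
    cache.record(hit=True)
for _ in range(60):        # then the site turns unstable: all misses
    cache.record(hit=False)
print(cache.optimized)     # False -- deoptimized after 50 misses
```

A purely decrementing counter (Yury's current code) is the special case where the `hit` branch does nothing.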
Yury From victor.stinner at gmail.com Wed Feb 3 16:33:29 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 3 Feb 2016 22:33:29 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: References: Message-ID: > There is an old discussion about the performance of PyMem_Malloc() memory allocator. Oops, I forgot to mention that my patch is a follow-up of a previous patch showing nice speedup on dict: http://bugs.python.org/issue23601 (but I said it in my issue ;-)) Well, see http://bugs.python.org/issue26249 for the longer context. 2016-02-03 22:03 GMT+01:00 Victor Stinner : > Does anyone recall the rationale to have two families of memory allocators? I asked Mercurial, and I found the change adding PyMem_Malloc(): --- branch: legacy-trunk user: Guido van Rossum date: Tue Aug 05 01:59:22 1997 +0000 files: Include/mymalloc.h description: Added Py_Malloc and friends as well as PyMem_Malloc and friends. --- As expected, it's old, as the change adding PyObject_Malloc(): --- changeset: 12576:1c7c2dd1beb1 branch: legacy-trunk user: Guido van Rossum date: Wed May 03 23:44:39 2000 +0000 files: Include/mymalloc.h Include/objimpl.h Modules/_cursesmodule.c Modules/_sre.c Modules/_tkinter.c Modules/almodule.c Modules/arraymodule.c Modules/bsddbmodule. description: Vladimir Marangozov's long-awaited malloc restructuring. For more comments, read the patches at python.org archives. For documentation read the comments in mymalloc.h and objimpl.h. (This is not exactly what Vladimir posted to the patches list; I've made a few changes, and Vladimir sent me a fix in private email for a problem that only occurs in debug mode. I'm also holding back on his change to main.c, which seems unnecessary to me.) --- Victor From srkunze at mail.de Wed Feb 3 16:37:33 2016 From: srkunze at mail.de (Sven R.
Kunze) Date: Wed, 3 Feb 2016 22:37:33 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B26F94.1040806@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> <56AFD17B.5090305@gmail.com> <56B268CE.80004@email.de> <56B26F94.1040806@gmail.com> Message-ID: <56B2731D.10709@mail.de> On 03.02.2016 22:22, Yury Selivanov wrote: > One way of tackling this is to give each optimized opcode > a counter for hit/misses. When we have a "hit" we increment > that counter, when it's a miss, we decrement it. Within a given range, I suppose. Like: c = min(c+1, 100) > > I kind of have something like that right now: > https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L3035 > > But I only decrement that counter -- the idea is that LOAD_ATTR > is allowed to "miss" only 20 times before getting deoptimized. > > I'll experiment with inc/dec on hit/miss and see how that affects > the performance. > > An ideal way would be to calculate a hit/miss ratio over time > for each cached opcode, but that would be an expensive > calculation. From Nikolaus at rath.org Wed Feb 3 18:38:15 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 03 Feb 2016 15:38:15 -0800 Subject: [Python-Dev] Git for Mercurial Users Message-ID: <87lh71jrfc.fsf@thinkpad.rath.org> Hello, With the upcoming move to Git, I thought people might be interested in some thoughts that I wrote down when learning Git for the first time as a long-time Mercurial user: http://www.rath.org/mercurial-for-git-users-and-vice-versa.html Comments are welcome (but probably more appropriate off-list). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F “Time flies like an arrow, fruit flies like a Banana.”
From rosuav at gmail.com Wed Feb 3 18:44:02 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 4 Feb 2016 10:44:02 +1100 Subject: [Python-Dev] Git for Mercurial Users In-Reply-To: <87lh71jrfc.fsf@thinkpad.rath.org> References: <87lh71jrfc.fsf@thinkpad.rath.org> Message-ID: On Thu, Feb 4, 2016 at 10:38 AM, Nikolaus Rath wrote: > Hello, > > With the upcoming move to Git, I thought people might be interested in some > thoughts that I wrote down when learning Git for the first time as a > long-time Mercurial user: > > http://www.rath.org/mercurial-for-git-users-and-vice-versa.html > > Comments are welcome (but probably more appropriate off-list). Also worth reading is this quick summary of roughly-equivalent commands: https://github.com/sympy/sympy/wiki/Git-hg-rosetta-stone ChrisA From steve.dower at python.org Wed Feb 3 18:45:28 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 3 Feb 2016 15:45:28 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B25311.8070706@sdamon.com> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B22837.5060102@python.org> <56B25311.8070706@sdamon.com> Message-ID: <56B29118.9080207@python.org> On 03Feb2016 1120, Alexander Walters wrote: > Uh.... it's C:\Anaconda[2]\ for anyone running the installer with the > privileges to edit the registry... (It won't ask to elevate unless you > install for all users, and that's where all users will install). So on > that point alone, this saves nothing substantive really. (I will go off > on python35 installing in insane locations some other time.) The install location is customisable, and users can always write to their own registry hive. The same applies to Python, so you can choose to install it as conveniently or as securely as you like. In either case, other applications need a guaranteed place to find these installations, and that place is the system registry.
Cheers, Steve From ericsnowcurrently at gmail.com Wed Feb 3 20:33:25 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 3 Feb 2016 18:33:25 -0700 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B18CD7.2010409@python.org> References: <56B18CD7.2010409@python.org> Message-ID: On Tue, Feb 2, 2016 at 10:15 PM, Steve Dower wrote: > I was throwing around some ideas with colleagues about how we detect Python > installations on Windows from within Visual Studio, and it came up that > there are many Python distros that install into different locations but > write the same registry entries. (I knew about this, of course, but this > time I decided to do something.) > > [snip] > > So here is a rough proposal to standardise the registry keys that can be set > on Windows in a way that (a) lets other installers besides the official ones > have equal footing, (b) provides consistent search and resolution semantics > for tools, and (c) includes slightly more rich metadata (such as display > names and URLs). Presented in PEP-like form here, but if feedback suggests > just putting it in the docs I'm okay with that too. It is fully backwards > compatible with official releases of Python (at least back to 2.5, possibly > further) and does not require modifications to Python or the official > installer - it is purely codifying a superset of what we already do. > > Any and all feedback welcomed, especially from the owners of other distros, > Python implementations or tools on the list. Just wanted to quickly point out another use of the Windows registry in Python: WindowsRegistryFinder [1]. This is an import "meta-path" finder that locates modules declared (*not* defined) in the registry. I'm not familiar with the Windows registry nor do I know if anyone is using this finder. That said, ISTM the finder's use of the registry does not face quite the same challenges you've described in the proposal.
I expect Martin von Löwis could explain more as he was involved with adding the finder. Just wanted to throw that out there, particularly if there's a chance of the finder's registry keys conflicting in some way. -eric [1] https://hg.python.org/cpython/file/5873cfb42ebe/Lib/importlib/_bootstrap_external.py#l570 From eryksun at gmail.com Wed Feb 3 21:51:02 2016 From: eryksun at gmail.com (eryk sun) Date: Wed, 3 Feb 2016 20:51:02 -0600 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: References: <56B18CD7.2010409@python.org> Message-ID: On Wed, Feb 3, 2016 at 7:33 PM, Eric Snow wrote: > Just wanted to quickly point out another use of the Windows registry > in Python: WindowsRegistryFinder [1]. This is an import "meta-path" > finder that locates modules declared (*not* defined) in the registry. > I'm not familiar with the Windows registry nor do I know if anyone is > using this finder. The "Modules" key (WindowsRegistryFinder in 3.3+ and previously PyWin_FindRegisteredModule) adds individual modules by subkey name, with the filepath in the default value (the filename can differ, but it can't use an arbitrary extension). The "PythonPath" and "Modules" keys both date back to Mark Hammond's Windows port in the mid 1990s. From steve.dower at python.org Wed Feb 3 21:59:24 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 3 Feb 2016 18:59:24 -0800 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: References: <56B18CD7.2010409@python.org> Message-ID: <56B2BE8C.5060909@python.org> On 03Feb2016 1851, eryk sun wrote: > On Wed, Feb 3, 2016 at 7:33 PM, Eric Snow wrote: >> Just wanted to quickly point out another use of the Windows registry >> in Python: WindowsRegistryFinder [1]. This is an import "meta-path" >> finder that locates modules declared (*not* defined) in the registry. >> I'm not familiar with the Windows registry nor do I know if anyone is >> using this finder.
> > The "Modules" key (WindowsRegistryFinder in 3.3+ and previously > PyWin_FindRegisteredModule) adds individual modules by subkey name, > with the filepath in the default value (the filename can differ, but > it can't use an arbitrary extension). The "PythonPath" and "Modules" > keys both date back to Mark Hammond's Windows port in the mid 1990s. Yep, essentially, I expect these keys that actually affect how Python works to remain under PythonCore, and continue not to be documented or recommended for general use. But I see no reason to deprecate or remove them. Specialised situations that use these keys should continue to set them under PythonCore. I hope that is sufficiently implied by saying nothing about them in the PEP - I really don't want to have to be more explicit about it and I definitely do not want to actually name or list them in any way. Cheers, Steve From zachary.ware+pydev at gmail.com Thu Feb 4 01:48:21 2016 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Thu, 4 Feb 2016 00:48:21 -0600 Subject: [Python-Dev] speed.python.org Message-ID: I'm happy to announce that speed.python.org is finally functional! There's not much there yet, as each benchmark builder has only sent one result so far (and one of those involved a bit of cheating on my part), but it's there. There are likely to be rough edges that still need smoothing out. When you find them, please report them at https://github.com/zware/codespeed/issues or on the speed at python.org mailing list. Many thanks to Intel for funding the work to get it set up and to Brett Cannon and Benjamin Peterson for their reviews. Happy benchmarking, -- Zach From victor.stinner at gmail.com Thu Feb 4 03:19:42 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 4 Feb 2016 09:19:42 +0100 Subject: [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: Great! 2016-02-04 7:48 GMT+01:00 Zachary Ware : > I'm happy to announce that speed.python.org is finally functional! 
> There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. > > Happy benchmarking, > -- > Zach > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com From mal at egenix.com Thu Feb 4 05:17:51 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 4 Feb 2016 11:17:51 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: References: Message-ID: <56B3254F.7020605@egenix.com> On 03.02.2016 22:03, Victor Stinner wrote: > Hi, > > There is an old discussion about the performance of PyMem_Malloc() > memory allocator. CPython is stressing a lot memory allocators. Last > time I made statistics, it was for the PEP 454: > "For example, the Python test suites calls malloc() , realloc() or > free() 270,000 times per second in average." > https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator > > I proposed a simple change: modify PyMem_Malloc() to use the pymalloc > allocator which is faster for allocation smaller than 512 bytes, or > fallback to malloc() (which is the current internal allocator of > PyMem_Malloc()). > > This tiny change makes Python up to 6% faster on some specific (macro) > benchmarks, and it doesn't seem to make Python slower on any > benchmark: > http://bugs.python.org/issue26249#msg259445 > > Do you see any drawback of using pymalloc for PyMem_Malloc()? 
Yes: You cannot free memory allocated using pymalloc with the standard C lib free(). It would be better to go through the list of PyMem_*() calls in Python and replace them with PyObject_*() calls, where possible. > Does anyone recall the rationale to have two families to memory allocators? The PyMem_*() APIs were needed to have a cross-platform malloc() implementation which returns standard C lib free()able memory, but also behaves well when passing 0 as size. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Feb 04 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From victor.stinner at gmail.com Thu Feb 4 07:29:42 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 4 Feb 2016 13:29:42 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: <56B3254F.7020605@egenix.com> References: <56B3254F.7020605@egenix.com> Message-ID: Hi, 2016-02-04 11:17 GMT+01:00 M.-A. Lemburg : >> Do you see any drawback of using pymalloc for PyMem_Malloc()? > > Yes: You cannot free memory allocated using pymalloc with the > standard C lib free(). That's not completely new. If Python is compiled in debug mode, you get a fatal error with a huge error message if you free the memory allocated by PyMem_Malloc() using PyObject_Free() or PyMem_RawFree(). But yes, technically it's possible to use free() when Python is *not* compiled in debug mode.
> It would be better to go through the list of PyMem_*() calls > in Python and replace them with PyObject_*() calls, where > possible. There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() and PyMem_Free(). I would prefer to modify a single place rather than having to replace 536 calls :-/ >> Does anyone recall the rationale to have two families to memory allocators? > > The PyMem_*() APIs were needed to have a cross-platform malloc() > implementation which returns standard C lib free()able memory, > but also behaves well when passing 0 as size. Yeah, PyMem_Malloc() & PyMem_Free() help to have a portable behaviour. But, why were PyObject_Malloc() & PyObject_Free() not used in the first place? An explanation can be that PyMem_Malloc() can be called without the GIL held. But it wasn't true before Python 3.4, since PyMem_Malloc() called (indirectly) PyObject_Malloc() when Python was compiled in debug mode, and PyObject_Malloc() requires the GIL to be held. When I wrote the PEP 445, there was a discussion about the GIL. It was proposed to allow to call PyMem_xxx() without the GIL: https://www.python.org/dev/peps/pep-0445/#gil-free-pymem-malloc This option was rejected. Victor From mal at egenix.com Thu Feb 4 07:54:54 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 4 Feb 2016 13:54:54 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: References: <56B3254F.7020605@egenix.com> Message-ID: <56B34A1E.4010501@egenix.com> On 04.02.2016 13:29, Victor Stinner wrote: > Hi, > > 2016-02-04 11:17 GMT+01:00 M.-A. Lemburg : >>> Do you see any drawback of using pymalloc for PyMem_Malloc()? >> >> Yes: You cannot free memory allocated using pymalloc with the >> standard C lib free(). > > That's not completely new. > > If Python is compiled in debug mode, you get a fatal error with a huge > error message if you free the memory allocated by PyMem_Malloc() using > PyObject_Free() or PyMem_RawFree().
> > But yes, technically it's possible to use free() when Python is *not* > compiled in debug mode. Debug mode is a completely different beast ;-) >> It would be better to go through the list of PyMem_*() calls >> in Python and replace them with PyObject_*() calls, where >> possible. > > There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() > and PyMem_Free(). > > I would prefer to modify a single place rather than having to replace 536 calls :-/ You have a point there, but I don't think it'll work out that easily, since we are using such calls to e.g. pass dynamically allocated buffers to code in extensions (which then have to free the buffers again). >>> Does anyone recall the rationale to have two families to memory allocators? >> >> The PyMem_*() APIs were needed to have a cross-platform malloc() >> implementation which returns standard C lib free()able memory, >> but also behaves well when passing 0 as size. > > Yeah, PyMem_Malloc() & PyMem_Free() help to have a portable behaviour. > But, why were PyObject_Malloc() & PyObject_Free() not used in the > first place? Good question. I guess developers simply thought of PyObject_Malloc() being for PyObjects, not arbitrary memory buffers, most likely because pymalloc was advertised as allocator for Python objects, not random chunks of memory. Also: PyObject_*() APIs were first introduced with pymalloc, and no one really was interested in going through all the calls to PyMem_*() APIs and convert those to use the new pymalloc at the time. All this happened between Python 1.5.2 and 2.0. One of the reasons probably also was that pymalloc originally did not return memory back to the system malloc(). This was changed only some years ago. > An explanation can be that PyMem_Malloc() can be called without the > GIL held. But it wasn't true before Python 3.4, since PyMem_Malloc() > called (indirectly) PyObject_Malloc() when Python was compiled in > debug mode, and PyObject_Malloc() requires the GIL to be held.
> > When I wrote the PEP 445, there was a discussion about the GIL. It was > proposed to allow to call PyMem_xxx() without the GIL: > https://www.python.org/dev/peps/pep-0445/#gil-free-pymem-malloc > > This option was rejected. AFAIR, the GIL was not really part of the consideration at the time. We used pymalloc for PyObject allocation, that's all. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Feb 04 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From ncoghlan at gmail.com Thu Feb 4 08:04:57 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:04:57 +1000 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <20160201164023.CC500B200A1@webabinitio.net> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> Message-ID: On 2 February 2016 at 02:40, R. David Murray wrote: > On the other hand, if the distros go the way Nick has (I think) been > advocating, and have a separate 'system python for system scripts' that > is independent of the one installed for user use, having the system-only > python be frozen and sourceless would actually make sense on a couple of > levels. 
While omitting Python source files does let us reduce base image sizes (quite significantly), the current perspective in Fedora and Project Atomic is that going bytecode-only (whether frozen or not) breaks too many things to be worthwhile. As one simple example, it means tracebacks no longer include source code lines, dramatically increasing the difficulty of debugging failures. As such, we're more likely to pursue minimisation efforts by splitting the standard library up into "stuff essential distro components use" and "the rest of the standard library that upstream defines" than by figuring out how to avoid shipping source files (I believe Debian already makes this distinction with the python-minimal vs python split). Zipping up the standard library doesn't break tracebacks though, so it's potentially worth exploring that option further. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Feb 4 08:09:29 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:09:29 +1000 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> Message-ID: On 2 February 2016 at 06:39, Andrew Barnert via Python-Dev wrote: > On Feb 1, 2016, at 09:59, mike.romberg at comcast.net wrote: >> >> If the stdlib were to use implicit namespace packages >> ( https://www.python.org/dev/peps/pep-0420/ ) and the various >> loaders/importers as well, then python could do what I've done with an >> embedded python application for years. Freeze the stdlib (or put it >> in a zipfile or whatever is fast). Then arrange PYTHONPATH to first >> look on the filesystem and then look in the frozen/ziped storage. 
> > This is a great solution for experienced developers, but I think it would be pretty bad for novices or transplants from other languages (maybe even including Python 2). > > There are already multiple duplicate questions every month on StackOverflow from people asking "how do I find the source to stdlib module X". The canonical answer starts off by explaining how to import the module and use its __file__, which everyone is able to handle. If we have to instead explain how to work out the .py name from the qualified module name, how to work out the stdlib path from sys.path, and then how to find the source from those two things, with the caveat that it may not be installed at all on some platforms, and how to make sure what they're asking about really is a stdlib module, and how to make sure they aren't shadowing it with a module elsewhere on sys.path, that's a lot more complicated. Especially when you consider that some people on Windows and Mac are writing Python scripts without ever learning how to use the terminal or find their Python packages via Explorer/Finder. For folks that *do* know how to use the terminal: $ python3 -m inspect --details inspect Target: inspect Origin: /usr/lib64/python3.4/inspect.py Cached: /usr/lib64/python3.4/__pycache__/inspect.cpython-34.pyc Loader: <_frozen_importlib.SourceFileLoader object at 0x7f0d8d23d9b0> (And if they just want to *read* the source code, then leaving out "--details" prints the full module source, and would work even if the standard library were in a zip archive) Cheers, Nick. 
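The same details that command prints are available programmatically with only stdlib calls, which also works from inside an editor or debugger session (the exact paths will of course differ per installation):

```python
import importlib.util
import inspect

# find_spec resolves a module name without importing it first.
spec = importlib.util.find_spec("inspect")
print("Target:", spec.name)    # inspect
print("Origin:", spec.origin)  # e.g. /usr/lib64/python3.4/inspect.py
print("Loader:", spec.loader)

# Reading the source goes through the loader, so this keeps working
# when the module comes from a zip archive rather than a plain file.
source = inspect.getsource(inspect)
```

`inspect.getsourcefile()` on an already-imported module gives the same origin path, so either route answers the "where is the source to stdlib module X" question.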
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Feb 4 08:18:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:18:36 +1000 Subject: [Python-Dev] Speeding up CPython 5-10% In-Reply-To: References: <56A90B97.7090001@gmail.com> Message-ID: On 3 February 2016 at 03:52, Brett Cannon wrote: > Fifth, if we manage to show that a C API can easily be added to CPython to > make a JIT something that can simply be plugged in and be useful, then we > will also have a basic JIT framework for people to use. As I said, our use > of CoreCLR is just for ease of development. There is no reason we couldn't > use ChakraCore, v8, LLVM, etc. But since all of these JIT compilers would > need to know how to handle CPython bytecode, we have tried to design a > framework where JIT compilers just need a wrapper to handle code emission > and our framework that we are building will handle driving the code emission > (e.g., the wrapper needs to know how to emit add_integer(), but our > framework handles when to have to do that). That could also be really interesting in the context of pymetabiosis [1] if it meant that PyPy could still at least partially JIT the Python code running on the CPython side of the boundary. Cheers, Nick. [1] https://github.com/rguillebert/pymetabiosis -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Thu Feb 4 08:25:05 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 4 Feb 2016 14:25:05 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: <56B34A1E.4010501@egenix.com> References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> Message-ID: Thanks for your feedback, you are asking good questions :-) 2016-02-04 13:54 GMT+01:00 M.-A. Lemburg : >> There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() >> and PyMem_Free(). 
>> >> I would prefer to modify a single place having to replace 536 calls :-/ > > You have a point there, but I don't think it'll work out > that easily, since we are using such calls to e.g. pass > dynamically allocated buffers to code in extensions (which then > have to free the buffers again). Ah, interesting. But I'm not sure that we delegate the responsability of freeing the memory to external libraries. Usually, it's more the opposite: a library gives us an allocated memory block, and we have to free it. No? I checked if we call directly malloc() to pass the buffer to a library, but I failed to find such case. Again, in debug mode, calling free() on a memory block allocated by PyMem_Malloc() will likely crash. Since we run the Python test suite with a Python compiled in debug mode, we would already have detected such bug, no? See also my old issue http://bugs.python.org/issue18203 which replaced almost all direct calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc(). > Good question. I guess developers simply thought of PyObject_Malloc() > being for PyObjects, Yeah, I also understood that, but in practice, it looks like PyMem_Malloc() is slower than so using it makes the code less efficient than it can be. Instead of teaching developers that well, in fact, PyObject_Malloc() is unrelated to object programming, I think that it's simpler to modify PyMem_Malloc() to reuse pymalloc ;-) Victor From ncoghlan at gmail.com Thu Feb 4 08:36:54 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:36:54 +1000 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <22193.5697.54840.384679@turnbull.sk.tsukuba.ac.jp> References: <56AFADB0.8000502@gmail.com> <56B10664.1030500@gmail.com> <22193.5697.54840.384679@turnbull.sk.tsukuba.ac.jp> Message-ID: On 3 February 2016 at 06:49, Stephen J. Turnbull wrote: > Yury Selivanov writes: > > > Not sure about that... 
PEPs take a LOT of time :( > > Informational PEPs need not take so much time, no more than you would > spend on ceval.txt. I'm sure a PEP would get a lot more attention > from reviewers, too. > > Even if you PEP the whole thing, as you say it's a (big ;-) > implementation detail. A PEP won't make things more controversial (or > less) than they already are. I don't see why it would take that much > more time than ceval.txt. For a typical PEP, you need to explain both the status quo *and* the state after the changes, as well as provide references to the related discussions. I think in this case the main target audience for the technical details should be future maintainers, so Yury writing a ceval.txt akin to the current dictnotes.txt, listsort.txt, etc would cover the essentials. If someone else wanted to also describe the change in a PEP for ease of future reference, using Yury's ceval.txt as input, I do think that would be a good thing, but I wouldn't want to make the enhancement conditional on someone volunteering to do that. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Feb 4 08:41:59 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:41:59 +1000 Subject: [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On 4 February 2016 at 16:48, Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. This is great to hear! Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Feb 4 08:46:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:46:04 +1000 Subject: [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On 4 February 2016 at 16:48, Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. Heh, cdecimal utterly demolishing the old pure Python decimal module on the telco benchmark means normalising against CPython 3.5 rather than 2.7 really isn't very readable :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Feb 4 08:51:52 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:51:52 +1000 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: <56B18CD7.2010409@python.org> References: <56B18CD7.2010409@python.org> Message-ID: On 3 February 2016 at 15:15, Steve Dower wrote: > Presented in PEP-like form here, but if feedback suggests > just putting it in the docs I'm okay with that too. We don't really have anywhere in the docs to track platform integration topics like this, so an Informational PEP is your best bet. Cheers, Nick. P.S. 
While I guess you *could* try to figure out a suitable home in the docs, I don't think you'd gain anything particularly useful from the additional effort -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Thu Feb 4 09:05:41 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 4 Feb 2016 15:05:41 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> Message-ID: <56B35AB5.5090308@egenix.com> On 04.02.2016 14:25, Victor Stinner wrote: > Thanks for your feedback, you are asking good questions :-) > > 2016-02-04 13:54 GMT+01:00 M.-A. Lemburg : >>> There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() >>> and PyMem_Free(). >>> >>> I would prefer to modify a single place having to replace 536 calls :-/ >> >> You have a point there, but I don't think it'll work out >> that easily, since we are using such calls to e.g. pass >> dynamically allocated buffers to code in extensions (which then >> have to free the buffers again). > > Ah, interesting. But I'm not sure that we delegate the responsability > of freeing the memory to external libraries. Usually, it's more the > opposite: a library gives us an allocated memory block, and we have to > free it. No? Sometimes, yes, but we also do allocations for e.g. parsing values in Python argument tuples (e.g. using "es" or "et"): https://docs.python.org/3.6/c-api/arg.html We do document to use PyMem_Free() on those; not sure whether everyone does this though. > I checked if we call directly malloc() to pass the buffer to a > library, but I failed to find such case. > > Again, in debug mode, calling free() on a memory block allocated by > PyMem_Malloc() will likely crash. Since we run the Python test suite > with a Python compiled in debug mode, we would already have detected > such bug, no? 
The Python test suite doesn't test Python C extensions, so it's not surprising that it passes :-) > See also my old issue http://bugs.python.org/issue18203 which replaced > almost all direct calls to malloc() with PyMem_Malloc() or > PyMem_RawMalloc(). > >> Good question. I guess developers simply thought of PyObject_Malloc() >> being for PyObjects, > > Yeah, I also understood that, but in practice, it looks like > PyMem_Malloc() is slower than so using it makes the code less > efficient than it can be. > > Instead of teaching developers that well, in fact, PyObject_Malloc() > is unrelated to object programming, I think that it's simpler to > modify PyMem_Malloc() to reuse pymalloc ;-) Perhaps if you add some guards somewhere :-) Seriously, this may work if C extensions use the APIs consistently, but in order to tell, we'd need to check few. I know that I switched over all mx Extensions to use PyObject_*() instead of PyMem_*() or native malloc() several years ago and have not run into any issues. I guess the main question then is whether pymalloc is good enough for general memory allocation needs; and the answer may well be "yes". BTW: Tuning pymalloc for commonly used object sizes is another area where Python could gain better performance, i.e. reserve more / pre-allocate space for often used block sizes. pymalloc will also only work well for small blocks (up to 512 bytes). Everything else is routed to the system malloc(). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Feb 04 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. 
CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From bussonniermatthias at gmail.com Thu Feb 4 10:57:52 2016 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Thu, 4 Feb 2016 07:57:52 -0800 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B26F94.1040806@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> <56AFD17B.5090305@gmail.com> <56B268CE.80004@email.de> <56B26F94.1040806@gmail.com> Message-ID: > On Feb 3, 2016, at 13:22, Yury Selivanov wrote: > > > An ideal way would be to calculate a hit/miss ratio over time > for each cached opcode, but that would be an expensive > calculation. Do you mean like a sliding windows ? Otherwise if you just want a let's say 20% miss threshold, you increment by 1 on hit, and decrement by 4 on miss. On Feb 3, 2016, at 13:37, Sven R. Kunze wrote: > On 03.02.2016 22:22, Yury Selivanov wrote: >> One way of tackling this is to give each optimized opcode >> a counter for hit/misses. When we have a "hit" we increment >> that counter, when it's a miss, we decrement it. > > Within a given range, I suppose. Like: > > c = min(c+1, 100) Min might be overkill, maybe you can use a or mask, to limit the windows range to 256 consecutive call ? -- M > > Yury > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/bussonniermatthias%40gmail.com From srkunze at mail.de Thu Feb 4 11:22:41 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Thu, 4 Feb 2016 17:22:41 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> <56AFD17B.5090305@gmail.com> <56B268CE.80004@email.de> <56B26F94.1040806@gmail.com> Message-ID: <56B37AD1.3090807@mail.de> On 04.02.2016 16:57, Matthias Bussonnier wrote: >> On Feb 3, 2016, at 13:22, Yury Selivanov wrote: >> >> >> An ideal way would be to calculate a hit/miss ratio over time >> for each cached opcode, but that would be an expensive >> calculation. > Do you mean like a sliding windows ? > Otherwise if you just want a let's say 20% miss threshold, you increment by 1 on hit, > and decrement by 4 on miss. Division is expensive. > > On Feb 3, 2016, at 13:37, Sven R. Kunze wrote: > >> On 03.02.2016 22:22, Yury Selivanov wrote: >>> One way of tackling this is to give each optimized opcode >>> a counter for hit/misses. When we have a "hit" we increment >>> that counter, when it's a miss, we decrement it. >> Within a given range, I suppose. Like: >> >> c = min(c+1, 100) > > Min might be overkill, maybe you can use a or mask, to limit the windows range > to 256 consecutive call ? Sure, that is how I would have written it in Python. But I would suggest an AND mask. ;-) Best, Sven From srkunze at mail.de Thu Feb 4 12:18:44 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Thu, 4 Feb 2016 18:18:44 +0100 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> Message-ID: <56B387F4.5030503@mail.de> On 04.02.2016 14:09, Nick Coghlan wrote: > On 2 February 2016 at 06:39, Andrew Barnert via Python-Dev > wrote: >> On Feb 1, 2016, at 09:59, mike.romberg at comcast.net wrote: >>> If the stdlib were to use implicit namespace packages >>> ( https://www.python.org/dev/peps/pep-0420/ ) and the various >>> loaders/importers as well, then python could do what I've done with an >>> embedded python application for years. Freeze the stdlib (or put it >>> in a zipfile or whatever is fast). Then arrange PYTHONPATH to first >>> look on the filesystem and then look in the frozen/ziped storage. >> This is a great solution for experienced developers, but I think it would be pretty bad for novices or transplants from other languages (maybe even including Python 2). >> >> There are already multiple duplicate questions every month on StackOverflow from people asking "how do I find the source to stdlib module X". The canonical answer starts off by explaining how to import the module and use its __file__, which everyone is able to handle. If we have to instead explain how to work out the .py name from the qualified module name, how to work out the stdlib path from sys.path, and then how to find the source from those two things, with the caveat that it may not be installed at all on some platforms, and how to make sure what they're asking about really is a stdlib module, and how to make sure they aren't shadowing it with a module elsewhere on sys.path, that's a lot more complicated. 
Especially when you consider that some people on Windows and Mac are writing Python scripts without ever learning how to use the terminal or find their Python packages via Explorer/Finder. > For folks that *do* know how to use the terminal: > > $ python3 -m inspect --details inspect > Target: inspect > Origin: /usr/lib64/python3.4/inspect.py > Cached: /usr/lib64/python3.4/__pycache__/inspect.cpython-34.pyc > Loader: <_frozen_importlib.SourceFileLoader object at 0x7f0d8d23d9b0> > > (And if they just want to *read* the source code, then leaving out > "--details" prints the full module source, and would work even if the > standard library were in a zip archive) I want to see and debug also core Python in PyCharm and this is not acceptable. If you want to make it opt-in, fine. But opt-out is a no-go. I have a side-by-side comparison as we use Java and Python in production. It's the *ease of access* that makes Python great compared to Java. @Andrew Even for experienced developers it just sucks and there are more important things to do. Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Thu Feb 4 13:53:26 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Thu, 04 Feb 2016 13:53:26 -0500 Subject: [Python-Dev] Python environment registration in the Windows Registry In-Reply-To: References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> Message-ID: <56B39E26.3060407@sdamon.com> I am well aware of this. In the SO question I referenced, being the first google hit related this this... that is the answer *I* gave. It only works, in my experience, 60% of the time, and not with two biggie packages (pywin32, for which you have to go to third parties to get the wheel, which do not include all of pywin32, and wxpython) and perhaps more. 
On 2/3/2016 14:32, Matthew Einhorn wrote: > On Wed, Feb 3, 2016 at 3:15 AM, Alexander Walters > > wrote: > > ...just when I thought I have solved the registry headaches I have > been dealing with... > > I am not saying this proposal will make the registry situation > worse, but it may break my solution to the headaches Python's > registry use causes with some non-standard module installers (and > even the standard distutils exe installers, but that is being > mitigated). In the wild exist modules with their own EXE or MSI > installers that check the registry for 'the system python'. No > matter how hard you hit them, they will only install to *that one > python*. > > > If I remember correctly, you can use `wheel convert filename.exe` on > those installers which create a wheel that you can install. I think > that's what I used to do with pywin32 before pypiwin32 came along. > > I just tested it and it still works on the pywin32 exe. > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Thu Feb 4 18:06:31 2016 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Thu, 4 Feb 2016 15:06:31 -0800 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <56B37AD1.3090807@mail.de> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> <56AFD17B.5090305@gmail.com> <56B268CE.80004@email.de> <56B26F94.1040806@gmail.com> <56B37AD1.3090807@mail.de> Message-ID: <6EDCB056-C454-400E-A1C2-E757CF8E1B28@gmail.com> > On Feb 4, 2016, at 08:22, Sven R. 
Kunze wrote: > > On 04.02.2016 16:57, Matthias Bussonnier wrote: >>> On Feb 3, 2016, at 13:22, Yury Selivanov wrote: >>> >>> >>> An ideal way would be to calculate a hit/miss ratio over time >>> for each cached opcode, but that would be an expensive >>> calculation. >> Do you mean like a sliding window? >> Otherwise if you just want a, let's say, 20% miss threshold, you increment by 1 on hit, >> and decrement by 4 on miss. > > Division is expensive. I'm not speaking about division here. If you +M / -N, the counter will decrease on average only if the hit ratio is below N/(M+N), but you do not need to do the division. Then you deoptimize only if you get < 0. > >> >> On Feb 3, 2016, at 13:37, Sven R. Kunze wrote: >> >>> On 03.02.2016 22:22, Yury Selivanov wrote: >>>> One way of tackling this is to give each optimized opcode >>>> a counter for hit/misses. When we have a "hit" we increment >>>> that counter, when it's a miss, we decrement it. >>> Within a given range, I suppose. Like: >>> >>> c = min(c+1, 100) >> >> Min might be overkill, maybe you can use an OR mask, to limit the window range >> to 256 consecutive calls? > > Sure, that is how I would have written it in Python. But I would suggest an AND mask. ;-) Sure, implementation detail I would say. Should not write emails before breakfast... The other problem, with the mask, is if your increment hits 256 you wrap around back to 0, where it deoptimizes (which is not what you want), so you might need to not mask the sign bit and deoptimize only on a certain negative threshold. Does it make sense? 
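[Editor's note: the +1/-4 bookkeeping discussed above is compact enough to sketch. This is an illustrative model with made-up names and constants, not CPython's actual ceval or opcode-cache code.]

```python
# Illustrative model of the hit/miss counter scheme discussed above;
# the names and constants are hypothetical, not CPython's real code.

HIT_INCREMENT = 1    # added on a cache hit
MISS_PENALTY = 4     # subtracted on a cache miss
COUNTER_CAP = 100    # cap, so a long hit streak cannot hide a later miss streak

def update(counter, hit):
    """Return the new counter value; the caller deoptimizes once it goes < 0."""
    if hit:
        return min(counter + HIT_INCREMENT, COUNTER_CAP)
    return counter - MISS_PENALTY

# With +1/-4 the counter trends downward exactly when the hit ratio drops
# below 4/(1+4) = 80% (i.e. more than 20% misses) -- no division required.
c = 0
for hit in [True] * 8 + [False] * 2:   # exactly 20% misses: break-even
    c = update(c, hit)
print(c)   # 0 -- right on the threshold
```

An AND-masked variant would need the care noted just above: letting the counter wrap from 255 back to 0, or masking away the sign bit, would turn a long hit streak into a spurious deoptimization.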
-- M > > Best, > Sven From tjreedy at udel.edu Thu Feb 4 19:58:30 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 4 Feb 2016 19:58:30 -0500 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <56B387F4.5030503@mail.de> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> <56B387F4.5030503@mail.de> Message-ID: On 2/4/2016 12:18 PM, Sven R. Kunze wrote: > On 04.02.2016 14:09, Nick Coghlan wrote: >> On 2 February 2016 at 06:39, Andrew Barnert via Python-Dev >> wrote: >>> On Feb 1, 2016, at 09:59,mike.romberg at comcast.net wrote: >>>> If the stdlib were to use implicit namespace packages >>>> (https://www.python.org/dev/peps/pep-0420/ ) and the various >>>> loaders/importers as well, then python could do what I've done with an >>>> embedded python application for years. Freeze the stdlib (or put it >>>> in a zipfile or whatever is fast). Then arrange PYTHONPATH to first >>>> look on the filesystem and then look in the frozen/ziped storage. >>> This is a great solution for experienced developers, but I think it would be pretty bad for novices or transplants from other languages (maybe even including Python 2). >>> >>> There are already multiple duplicate questions every month on StackOverflow from people asking "how do I find the source to stdlib module X". The canonical answer starts off by explaining how to import the module and use its __file__, which everyone is able to handle. 
If we have to instead explain how to work out the .py name from the qualified module name, how to work out the stdlib path from sys.path, and then how to find the source from those two things, with the caveat that it may not be installed at all on some platforms, and how to make sure what they're asking about really is a stdlib module, and how to make sure they aren't shadowing it with a module elsewhere on sys.path, that's a lot more complicated. Especially when you consider that some people on Windows and Mac are writing Python scripts without ever learning how to use the terminal or find their Python packages via Explorer/Finder. >> For folks that *do* know how to use the terminal: >> >> $ python3 -m inspect --details inspect >> Target: inspect >> Origin: /usr/lib64/python3.4/inspect.py >> Cached: /usr/lib64/python3.4/__pycache__/inspect.cpython-34.pyc >> Loader: <_frozen_importlib.SourceFileLoader object at 0x7f0d8d23d9b0> >> >> (And if they just want to *read* the source code, then leaving out >> "--details" prints the full module source, and would work even if the >> standard library were in a zip archive) This is completely inadequate as a replacement for loading source into an editor, even if just for reading. First, on Windows, the console defaults to 300 lines. Print more and only the last 300 lines remain. The max buffer size is 9999. But setting the buffer to that is obnoxious because the buffer is then padded with blank lines to make 9999 lines. The little rectangle that one grabs in the scrollbar is then scaled down to almost nothing, becoming hard to grab. Second is navigation. No Find, Find-next, or Find-all. Because of padding, moving to the unpadded 'bottom of file' is difficult. Third, for a repository version, I would have to type, without error, instead of 'python3', some version of, for instance, some suffix of 'F:/python/dev/35/PcBuild//python_d.exe'. "" depends, I believe, on the build options. 
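[Editor's note: the editor-friendly lookup that both sides of this exchange take for granted -- import the module and ask where it lives -- is only a couple of lines. Shown for a pure-Python module; C modules such as math have no .py source to report.]

```python
# The "import it and use __file__" answer referenced in this thread, plus
# inspect's equivalents, for locating stdlib source to open in an editor.
import inspect
import json

print(json.__file__)                # filesystem path of the imported module
print(inspect.getsourcefile(json))  # the .py file, when one exists
# inspect.getsource(json) returns the source text itself, which is
# essentially what "python3 -m inspect json" prints to the console.
```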
> I also want to see and debug core Python in PyCharm, and this is not > acceptable. > > If you want to make it opt-in, fine. But opt-out is a no-go. I have a > side-by-side comparison as we use Java and Python in production. It's > the *ease of access* that makes Python great compared to Java. > > @Andrew > Even for experienced developers it just sucks and there are more > important things to do. I agree that removing stdlib python source files by default is a poor idea. The disk space saved is trivial, and so, for me, would be nearly all of the time saving. Over recent versions, more and more source files have been linked to in the docs. Guido recently approved of linking the rest. Removing source contradicts this trend. Easily loading modules, including stdlib modules, into an IDLE Editor Window is a documented feature that goes back to the original commit in Aug 2000. We do not usually break stdlib features without acknowledgement, some discussion, and a positive decision to do so. Someone has already mentioned the degradation of tracebacks. So why not just leave the source files alone in /Lib? As far as I can see, they would not hurt anything. At least on Windows, zip files are treated as directories and python35.zip comes before /Lib on sys.path. The Windows installer currently has an option, selected by default I believe, to run compileall. Add to compileall an option to compile everything to python35.zip rather than __pycache__, and use that in the installer. Even if the zip is included in the installer, compileall-zip + source files would let adventurous people patch their stdlib files. Editing a stdlib file, to see if a confirmed bug disappeared (it did), was how I made my first code contribution. If I had had to download and set up svn and maybe Visual C to try a one-line change, I would not have done it. -- Terry Jan Reedy From stephen at xemacs.org Thu Feb 4 23:43:39 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Fri, 5 Feb 2016 13:43:39 +0900 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: References: <56AFADB0.8000502@gmail.com> <56B10664.1030500@gmail.com> <22193.5697.54840.384679@turnbull.sk.tsukuba.ac.jp> Message-ID: <22196.10363.610893.387870@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > If someone else wanted to also describe the change in a PEP for ease > of future reference, using Yury's ceval.txt as input, I do think that > would be a good thing, but I wouldn't want to make the enhancement > conditional on someone volunteering to do that. I wasn't suggesting making it conditional, I was encouraging Yury to do it himself as the most familiar with the situation. I may be underestimating the additional cost, but it seems to me explaining both before and after would be very useful to people who've hacked ceval in the past. (Presumably Yury would just be explaining "after" in his ceval.txt.) The important thing is to make it discoverable, though, and I don't care if it's done by PEP or not. In fact, perhaps "let Yury be Yury", plus an informational PEP listing all of the *.txt files in the tree would be more useful? Or in the devguide? 
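[Editor's note: returning to Terry's compileall-into-python35.zip suggestion above -- the stdlib can already express most of it via zipfile.PyZipFile, which byte-compiles .py files as it stores them while the sources stay on disk. A rough sketch with made-up module and path names:]

```python
# Rough sketch of compiling modules into a zip of .pyc files, leaving the
# .py sources in place; directory and module names here are invented.
import os
import sys
import tempfile
import zipfile

def zip_compile(src_dir, zip_path):
    """Byte-compile the .py files under src_dir into zip_path."""
    with zipfile.PyZipFile(zip_path, mode="w", optimize=2) as zf:
        zf.writepy(src_dir)  # stores only the compiled .pyc files

# Demo: one throwaway module, compiled into a zip that zipimport can load.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "demo_mod.py"), "w") as f:
    f.write("VALUE = 42\n")

archive = os.path.join(workdir, "pylib.zip")
zip_compile(workdir, archive)
print(zipfile.ZipFile(archive).namelist())   # only demo_mod.pyc, no source

sys.path.insert(0, archive)   # analogous to python35.zip on sys.path
import demo_mod
print(demo_mod.VALUE)
```

The piece a real build would still need -- teaching the installer to generate this archive and keep it ahead of /Lib on sys.path -- is exactly what the compileall option being proposed would add.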
From steve at pearwood.info Fri Feb 5 00:05:45 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 5 Feb 2016 16:05:45 +1100 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> <56B387F4.5030503@mail.de> Message-ID: <20160205050545.GR31806@ando.pearwood.info> On Thu, Feb 04, 2016 at 07:58:30PM -0500, Terry Reedy wrote: > >>For folks that *do* know how to use the terminal: > >> > >>$ python3 -m inspect --details inspect > >>Target: inspect > >>Origin: /usr/lib64/python3.4/inspect.py > >>Cached: /usr/lib64/python3.4/__pycache__/inspect.cpython-34.pyc > >>Loader: <_frozen_importlib.SourceFileLoader object at 0x7f0d8d23d9b0> > >> > >>(And if they just want to *read* the source code, then leaving out > >>"--details" prints the full module source, and would work even if the > >>standard library were in a zip archive) > > This is completely inadequate as a replacement for loading source into > an editor, even if just for reading. [...] I agree with Terry. The inspect trick Nick describes above is a great feature to have, but it's not a substitute for opening the source in an editor, not even on OSes where the command line tools are more powerful than Windows' default tools. [...] > I agree that removing stdlib python source files by default is an poor > idea. The disk space saved is trivial. So, for me, would be nearly all > of the time saving. I too would be very reluctant to remove the source files from Python by default, but I have an alternative. I don't know if this is a ridiculous idea or not, but now that the .pyc bytecode files are kept in a separate __pycache__ directory, could we freeze that directory and leave the source files available for reading? 
(I'm not even sure if this suggestion makes sense, since I'm not really sure what "freezing" the stdlib entails. Is it documented anywhere?) -- Steve From ncoghlan at gmail.com Fri Feb 5 07:33:26 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 5 Feb 2016 22:33:26 +1000 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <20160205050545.GR31806@ando.pearwood.info> References: <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <20160201115441.7984a500@subdivisions.wooz.org> <22191.40203.430978.404940@lrd.home.lan> <56B387F4.5030503@mail.de> <20160205050545.GR31806@ando.pearwood.info> Message-ID: On 5 February 2016 at 15:05, Steven D'Aprano wrote: > (I'm not even sure if this suggestion makes sense, since I'm not really > sure what "freezing" the stdlib entails. Is it documented anywhere?) It's not particularly well documented - most of the docs you'll find are about freeze utilities that don't explain how they work, or the FrozenImporter, which doesn't explain how to *create* a frozen module and link it into your Python executable. Your approach of thinking of a frozen module as a generated .pyc file that has been converted to a builtin module is a pretty good working model, though. (It isn't *entirely* accurate, but the discrepancies are sufficiently arcane that they aren't going to matter in any case that doesn't involve specifically poking around at the import related attributes). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From srkunze at mail.de Fri Feb 5 10:07:09 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Fri, 5 Feb 2016 16:07:09 +0100 Subject: [Python-Dev] Opcode cache in ceval loop In-Reply-To: <6EDCB056-C454-400E-A1C2-E757CF8E1B28@gmail.com> References: <56AFADB0.8000502@gmail.com> <56AFB74D.8040108@gmail.com> <56AFBFC9.3090604@mail.de> <56AFC189.8010407@gmail.com> <56AFC7E0.9080203@mail.de> <56AFCDD3.20905@gmail.com> <56AFCEF9.2060508@mail.de> <56AFD17B.5090305@gmail.com> <56B268CE.80004@email.de> <56B26F94.1040806@gmail.com> <56B37AD1.3090807@mail.de> <6EDCB056-C454-400E-A1C2-E757CF8E1B28@gmail.com> Message-ID: <56B4BA9D.7000707@mail.de> On 05.02.2016 00:06, Matthias Bussonnier wrote: >> On Feb 4, 2016, at 08:22, Sven R. Kunze wrote: >> >> On 04.02.2016 16:57, Matthias Bussonnier wrote: >>>> On Feb 3, 2016, at 13:22, Yury Selivanov wrote: >>>> >>>> >>>> An ideal way would be to calculate a hit/miss ratio over time >>>> for each cached opcode, but that would be an expensive >>>> calculation. >>> Do you mean like a sliding windows ? >>> Otherwise if you just want a let's say 20% miss threshold, you increment by 1 on hit, >>> and decrement by 4 on miss. >> Division is expensive. > I'm not speaking about division here. > if you +M / -N the counter will decrease in average only if the hit/miss ratio > is below N/(M+N), but you do not need to do the division. > > Then you deoptimize only if you get < 0. I see but it looks still more complicated. :) > >> >>> On Feb 3, 2016, at 13:37, Sven R. Kunze wrote: >>> >>>> On 03.02.2016 22:22, Yury Selivanov wrote: >>>>> One way of tackling this is to give each optimized opcode >>>>> a counter for hit/misses. When we have a "hit" we increment >>>>> that counter, when it's a miss, we decrement it. >>>> Within a given range, I suppose. Like: >>>> >>>> c = min(c+1, 100) >>> Min might be overkill, maybe you can use a or mask, to limit the windows range >>> to 256 consecutive call ? >> Sure, that is how I would have written it in Python. But I would suggest an AND mask. ;-) > > Sure, implementation detail I would say. 
Should not write emails before breakfast... ;-) > The other problem, with the mask, is if your increment hit 256 you wrap around back to 0 > where it deoptimize (which is not what you want), so you might need to not mask the > sign bit and deoptimize only on a certain negative threshold. > > > Does it make sens ? Definitely. I am curious about the actual implementation of this idea. Best, Sven From status at bugs.python.org Fri Feb 5 12:08:32 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 5 Feb 2016 18:08:32 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160205170832.6327B560D4@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-01-29 - 2016-02-05) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 5413 (+32) closed 32641 (+26) total 38054 (+58) Open issues with patches: 2367 Issues opened (50) ================== #25660: tabs don't work correctly in python repl http://bugs.python.org/issue25660 reopened by martin.panter #26239: distutils link-objects is not normalized http://bugs.python.org/issue26239 opened by jdfergason #26240: Docstring of the subprocess module should be cleaned up http://bugs.python.org/issue26240 opened by Antony.Lee #26243: zlib.compress level as keyword argument http://bugs.python.org/issue26243 opened by palaviv #26246: Code output toggle button uses removed jQuery method http://bugs.python.org/issue26246 opened by ccwang002 #26247: Document Chrome/Chromium for python2.7 http://bugs.python.org/issue26247 opened by Ismail s #26248: Improve scandir DirEntry docs, especially re symlinks and cach http://bugs.python.org/issue26248 opened by benhoyt #26249: Change PyMem_Malloc to use PyObject_Malloc allocator? 
http://bugs.python.org/issue26249 opened by haypo #26250: no document for sqlite3.Cursor.connection http://bugs.python.org/issue26250 opened by qinghao #26251: Use "Low-fragmentation Heap" memory allocator on Windows http://bugs.python.org/issue26251 opened by haypo #26252: Add an example to importlib docs on setting up an importer http://bugs.python.org/issue26252 opened by brett.cannon #26253: tarfile in stream mode always set zlib compression level to 9 http://bugs.python.org/issue26253 opened by Patrik Dufresne #26254: ssl should raise an exception when trying to load an unusable http://bugs.python.org/issue26254 opened by abacabadabacaba #26256: Fast decimalisation and conversion to other bases http://bugs.python.org/issue26256 opened by jneb #26257: Eliminate buffer_tests.py http://bugs.python.org/issue26257 opened by martin.panter #26258: readline module for python 3.x on windows http://bugs.python.org/issue26258 opened by Ali Razmjoo #26259: Memleak when repeated calls to asyncio.queue.Queue.get is perf http://bugs.python.org/issue26259 opened by Jonas Brunsgaard #26261: NamedTemporaryFile documentation is vague about the `name` att http://bugs.python.org/issue26261 opened by ztane #26262: Cannot compile with /fp:strict with MSVC http://bugs.python.org/issue26262 opened by zach.ware #26263: Serialize array.array to JSON by default http://bugs.python.org/issue26263 opened by Omer.Katz #26264: keyword module missing async and await keywords http://bugs.python.org/issue26264 opened by tuxtimo #26265: build errors on OS X 10.11 with --enable-universalsdk http://bugs.python.org/issue26265 opened by davidjamesbeck #26266: add classattribute to enum to handle non-Enum attributes http://bugs.python.org/issue26266 opened by ethan.furman #26267: UUID docs should say how to get "standard form" http://bugs.python.org/issue26267 opened by abarnert #26268: Update python.org installers to use OpenSSL 1.0.2f http://bugs.python.org/issue26268 opened by zach.ware #26269: 
zipfile should call lstat instead of stat if available http://bugs.python.org/issue26269 opened by Patrik Dufresne #26270: Support for read()/write()/select() on asyncio http://bugs.python.org/issue26270 opened by Paulo Costa #26271: freeze.py makefile uses the wrong flags variables http://bugs.python.org/issue26271 opened by Daniel Shaulov #26273: Expose TCP_CONGESTION and TCP_USER_TIMEOUT to the socket modul http://bugs.python.org/issue26273 opened by Omar Sandoval #26275: perf.py: calibrate benchmarks using time, not using a fixed nu http://bugs.python.org/issue26275 opened by haypo #26276: Inconsistent behaviour of PEP 3101 formatting between versions http://bugs.python.org/issue26276 opened by Mark Shannon #26277: Allow zipapp to target modules http://bugs.python.org/issue26277 opened by flying sheep #26278: BaseTransport.close() does not trigger connection_lost() http://bugs.python.org/issue26278 opened by S??mer.Cip #26279: time.strptime does not properly convert out-of-bounds values http://bugs.python.org/issue26279 opened by iaslan #26280: ceval: Optimize [] operation similarly to CPython 2.7 http://bugs.python.org/issue26280 opened by yselivanov #26281: Clear sys.path_importer_cache from importlib.invalidate_caches http://bugs.python.org/issue26281 opened by brett.cannon #26282: Add support for partial keyword arguments in extension functio http://bugs.python.org/issue26282 opened by serhiy.storchaka #26283: zipfile can not handle the path build by os.path.join() http://bugs.python.org/issue26283 opened by ????????? 
#26284: FIx telco benchmark http://bugs.python.org/issue26284 opened by skrah #26285: Garbage collection of unused input sections from CPython binar http://bugs.python.org/issue26285 opened by alecsandru.patrascu #26286: dis module: coroutine opcode documentation clarity http://bugs.python.org/issue26286 opened by Jim.Jewett #26287: Core dump in f-string with formatting errors due to refcount b http://bugs.python.org/issue26287 opened by encukou #26288: Optimize PyLong_AsDouble for single-digit longs http://bugs.python.org/issue26288 opened by yselivanov #26289: Optimize floor division for ints http://bugs.python.org/issue26289 opened by yselivanov #26290: fileinput and 'for line in sys.stdin' do strange mockery of in http://bugs.python.org/issue26290 opened by Don Hatch #26292: Raw I/O writelines() broken http://bugs.python.org/issue26292 opened by haypo #26293: Embedded zipfile fields dependent on absolute position http://bugs.python.org/issue26293 opened by spoo #26294: Queue().unfinished_tasks not in docs - deliberate? http://bugs.python.org/issue26294 opened by frankmillman #26295: Random failures when running test suite in parallel (-m test - http://bugs.python.org/issue26295 opened by haypo #26296: colorys rgb_to_hls algorithm error http://bugs.python.org/issue26296 opened by Mats Luspa Most recent 15 issues with no replies (15) ========================================== #26294: Queue().unfinished_tasks not in docs - deliberate? 
http://bugs.python.org/issue26294 #26289: Optimize floor division for ints http://bugs.python.org/issue26289 #26288: Optimize PyLong_AsDouble for single-digit longs http://bugs.python.org/issue26288 #26286: dis module: coroutine opcode documentation clarity http://bugs.python.org/issue26286 #26284: FIx telco benchmark http://bugs.python.org/issue26284 #26283: zipfile can not handle the path build by os.path.join() http://bugs.python.org/issue26283 #26281: Clear sys.path_importer_cache from importlib.invalidate_caches http://bugs.python.org/issue26281 #26279: time.strptime does not properly convert out-of-bounds values http://bugs.python.org/issue26279 #26278: BaseTransport.close() does not trigger connection_lost() http://bugs.python.org/issue26278 #26277: Allow zipapp to target modules http://bugs.python.org/issue26277 #26273: Expose TCP_CONGESTION and TCP_USER_TIMEOUT to the socket modul http://bugs.python.org/issue26273 #26271: freeze.py makefile uses the wrong flags variables http://bugs.python.org/issue26271 #26269: zipfile should call lstat instead of stat if available http://bugs.python.org/issue26269 #26268: Update python.org installers to use OpenSSL 1.0.2f http://bugs.python.org/issue26268 #26266: add classattribute to enum to handle non-Enum attributes http://bugs.python.org/issue26266 Most recent 15 issues waiting for review (15) ============================================= #26289: Optimize floor division for ints http://bugs.python.org/issue26289 #26288: Optimize PyLong_AsDouble for single-digit longs http://bugs.python.org/issue26288 #26285: Garbage collection of unused input sections from CPython binar http://bugs.python.org/issue26285 #26280: ceval: Optimize [] operation similarly to CPython 2.7 http://bugs.python.org/issue26280 #26275: perf.py: calibrate benchmarks using time, not using a fixed nu http://bugs.python.org/issue26275 #26273: Expose TCP_CONGESTION and TCP_USER_TIMEOUT to the socket modul http://bugs.python.org/issue26273 #26271: 
freeze.py makefile uses the wrong flags variables http://bugs.python.org/issue26271 #26257: Eliminate buffer_tests.py http://bugs.python.org/issue26257 #26249: Change PyMem_Malloc to use PyObject_Malloc allocator? http://bugs.python.org/issue26249 #26248: Improve scandir DirEntry docs, especially re symlinks and cach http://bugs.python.org/issue26248 #26246: Code output toggle button uses removed jQuery method http://bugs.python.org/issue26246 #26243: zlib.compress level as keyword argument http://bugs.python.org/issue26243 #26228: pty.spawn hangs on FreeBSD 9.3, 10.x http://bugs.python.org/issue26228 #26224: Add "version added" for documentation of asyncio.timeout for d http://bugs.python.org/issue26224 #26219: implement per-opcode cache in ceval http://bugs.python.org/issue26219 Top 10 most discussed issues (10) ================================= #21955: ceval.c: implement fast path for integers with a single digit http://bugs.python.org/issue21955 56 msgs #26275: perf.py: calibrate benchmarks using time, not using a fixed nu http://bugs.python.org/issue26275 19 msgs #26249: Change PyMem_Malloc to use PyObject_Malloc allocator? 
http://bugs.python.org/issue26249 18 msgs #26280: ceval: Optimize [] operation similarly to CPython 2.7 http://bugs.python.org/issue26280 13 msgs #25660: tabs don't work correctly in python repl http://bugs.python.org/issue25660 10 msgs #26194: Undefined behavior for deque.insert() when len(d) == maxlen http://bugs.python.org/issue26194 10 msgs #26256: Fast decimalisation and conversion to other bases http://bugs.python.org/issue26256 9 msgs #25924: investigate if getaddrinfo(3) on OSX is thread-safe http://bugs.python.org/issue25924 8 msgs #26229: Make number serialization ES6/V8 compatible http://bugs.python.org/issue26229 8 msgs #26285: Garbage collection of unused input sections from CPython binar http://bugs.python.org/issue26285 8 msgs Issues closed (26) ================== #12923: test_urllib fails in refleak mode http://bugs.python.org/issue12923 closed by martin.panter #19587: Remove empty tests in test_bytes.FixedStringTest http://bugs.python.org/issue19587 closed by martin.panter #19883: Integer overflow in zipimport.c http://bugs.python.org/issue19883 closed by serhiy.storchaka #21328: Resize doesn't change reported length on create_string_buffer( http://bugs.python.org/issue21328 closed by Dustin.Oprea #22923: No prompt for "display all X possibilities" on completion-enab http://bugs.python.org/issue22923 closed by yoha #23076: list(pathlib.Path().glob("")) fails with IndexError http://bugs.python.org/issue23076 closed by berker.peksag #23601: use small object allocator for dict key storage http://bugs.python.org/issue23601 closed by rhettinger #25774: [benchmarks] Adjust to allow uploading benchmark data to codes http://bugs.python.org/issue25774 closed by zach.ware #25798: Update python.org installers to use OpenSSL 1.0.2e http://bugs.python.org/issue25798 closed by zach.ware #25934: ICC compiler: ICC treats denormal floating point numbers as 0. 
http://bugs.python.org/issue25934 closed by zach.ware #25945: Type confusion in partial_setstate and partial_call leads to m http://bugs.python.org/issue25945 closed by serhiy.storchaka #26125: Incorrect error message in the module asyncio.selector_events. http://bugs.python.org/issue26125 closed by berker.peksag #26173: test_ssl.bad_cert_test() exception handling http://bugs.python.org/issue26173 closed by martin.panter #26218: Set PrependPath default to true http://bugs.python.org/issue26218 closed by steve.dower #26222: Missing code in linux_distribution python 2.7.11 http://bugs.python.org/issue26222 closed by berker.peksag #26233: select.epoll.poll() should avoid calling malloc() each time http://bugs.python.org/issue26233 closed by haypo #26238: httplib use wrong hostname in https request with SNI support http://bugs.python.org/issue26238 closed by lvhancy #26241: repr() and str() are identical for floats in 3.5 http://bugs.python.org/issue26241 closed by mark.dickinson #26242: reST formatting error in Doc/library/importlib.rst http://bugs.python.org/issue26242 closed by berker.peksag #26244: zlib.compressobj level default value documentation http://bugs.python.org/issue26244 closed by martin.panter #26245: AttributeError (GL_READ_WRITE) when importing OpenGL.GL http://bugs.python.org/issue26245 closed by berker.peksag #26255: symtable.Symbol.is_referenced() returns false for valid use http://bugs.python.org/issue26255 closed by benjamin.peterson #26260: utf8 decoding inconsistency between P2 and P3 http://bugs.python.org/issue26260 closed by haypo #26272: `zipfile.ZipFile` fails reading a file object in specific vers http://bugs.python.org/issue26272 closed by Pengyu Chen #26274: Add CPU affinity to perf.py http://bugs.python.org/issue26274 closed by haypo #26291: Floating-point arithmetic http://bugs.python.org/issue26291 closed by ebarry From yselivanov.ml at gmail.com Fri Feb 5 12:21:05 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 5 
Feb 2016 12:21:05 -0500 Subject: [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: <56B4DA01.5040304@gmail.com> Big thanks to you, Zachary (and everyone involved)! It's very good news. Yury On 2016-02-04 1:48 AM, Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. > > Happy benchmarking, From emile at fenx.com Fri Feb 5 12:27:04 2016 From: emile at fenx.com (Emile van Sebille) Date: Fri, 5 Feb 2016 09:27:04 -0800 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <56AF93D9.2040104@stoneleaf.us> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> Message-ID: On 2/1/2016 9:20 AM, Ethan Furman wrote: > On 02/01/2016 08:40 AM, R. David Murray wrote: >> On the other hand, if the distros go the way Nick has (I think) been >> advocating, and have a separate 'system python for system scripts' that >> is independent of the one installed for user use, having the system-only >> python be frozen and sourceless would actually make sense on a couple of >> levels. > > Agreed. Except for that nasty licensing issue requiring source code. 
Emile From tritium-list at sdamon.com Fri Feb 5 12:37:27 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Fri, 05 Feb 2016 12:37:27 -0500 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> Message-ID: <56B4DDD7.8060905@sdamon.com> On 2/5/2016 12:27, Emile van Sebille wrote: > On 2/1/2016 9:20 AM, Ethan Furman wrote: >> On 02/01/2016 08:40 AM, R. David Murray wrote: > >>> On the other hand, if the distros go the way Nick has (I think) been >>> advocating, and have a separate 'system python for system scripts' that >>> is independent of the one installed for user use, having the >>> system-only >>> python be frozen and sourceless would actually make sense on a >>> couple of >>> levels. >> >> Agreed. > > Except for that nasty licensing issue requiring source code. > > Emile Licensing requires, in the GPL at least, that the *modified* sources be made *available*, not that they be shipped with the product. Looking at the Python license, and what tools already do, there is zero need to ship the source to stay compliant. From brett at python.org Fri Feb 5 13:07:03 2016 From: brett at python.org (Brett Cannon) Date: Fri, 05 Feb 2016 18:07:03 +0000 Subject: [Python-Dev] [Speed] speed.python.org In-Reply-To: References: Message-ID: On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: > On 4 February 2016 at 16:48, Zachary Ware > wrote: > > I'm happy to announce that speed.python.org is finally functional! > > There's not much there yet, as each benchmark builder has only sent > > one result so far (and one of those involved a bit of cheating on my > > part), but it's there. > > > > There are likely to be rough edges that still need smoothing out. 
> > When you find them, please report them at > > https://github.com/zware/codespeed/issues or on the speed at python.org > > mailing list. > > > > Many thanks to Intel for funding the work to get it set up and to > > Brett Cannon and Benjamin Peterson for their reviews. > > Heh, cdecimal utterly demolishing the old pure Python decimal module > on the telco benchmark means normalising against CPython 3.5 rather > than 2.7 really isn't very readable :) > I find viewing the graphs using the horizontal layout is much easier to read (the bars are a lot thicker and everything zooms in more). -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Feb 5 13:29:18 2016 From: brett at python.org (Brett Cannon) Date: Fri, 05 Feb 2016 18:29:18 +0000 Subject: [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: To piggyback on Zach's speed.python.org announcement, we will most likely be kicking off a discussion of redoing the benchmark suite, tweaking the test runner, etc. over on the speed@ ML. Those of us who have been doing perf work lately have found some shortcoming we would like to fix in our benchmarks suite, so if you want to participate in that discussion, please join speed@ by next week. On Wed, 3 Feb 2016 at 22:49 Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. 
> > Happy benchmarking, > -- > Zach > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emile at fenx.com Fri Feb 5 13:33:46 2016 From: emile at fenx.com (Emile van Sebille) Date: Fri, 5 Feb 2016 10:33:46 -0800 Subject: [Python-Dev] More optimisation ideas In-Reply-To: <56B4DDD7.8060905@sdamon.com> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> <56B4DDD7.8060905@sdamon.com> Message-ID: On 2/5/2016 9:37 AM, Alexander Walters wrote: > > > On 2/5/2016 12:27, Emile van Sebille wrote: >> On 2/1/2016 9:20 AM, Ethan Furman wrote: >>> On 02/01/2016 08:40 AM, R. David Murray wrote: >> >>>> On the other hand, if the distros go the way Nick has (I think) been >>>> advocating, and have a separate 'system python for system scripts' that >>>> is independent of the one installed for user use, having the >>>> system-only >>>> python be frozen and sourceless would actually make sense on a >>>> couple of >>>> levels. >>> >>> Agreed. >> >> Except for that nasty licensing issue requiring source code. >> >> Emile > Licensing requires, in the GPL at least, that the *modified* sources be > made *available*, not that they be shipped with the product. Looking at > the Python license, and what tools already do, there is zero need to > ship the source to stay compliant. Hmm, the annotated Open Source Definition explicitly states "The program must include source code" -- how did I misinterpret that? 
Emile http://opensource.org/osd-annotated From brett at python.org Fri Feb 5 13:38:14 2016 From: brett at python.org (Brett Cannon) Date: Fri, 05 Feb 2016 18:38:14 +0000 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> <56B4DDD7.8060905@sdamon.com> Message-ID: On Fri, 5 Feb 2016 at 10:34 Emile van Sebille wrote: > On 2/5/2016 9:37 AM, Alexander Walters wrote: > > > > > > On 2/5/2016 12:27, Emile van Sebille wrote: > >> On 2/1/2016 9:20 AM, Ethan Furman wrote: > >>> On 02/01/2016 08:40 AM, R. David Murray wrote: > >> > >>>> On the other hand, if the distros go the way Nick has (I think) been > >>>> advocating, and have a separate 'system python for system scripts' > that > >>>> is independent of the one installed for user use, having the > >>>> system-only > >>>> python be frozen and sourceless would actually make sense on a > >>>> couple of > >>>> levels. > >>> > >>> Agreed. > >> > >> Except for that nasty licensing issue requiring source code. > >> > >> Emile > > Licensing requires, in the GPL at least, that the *modified* sources be > > made *available*, not that they be shipped with the product. Looking at > > the Python license, and what tools already do, there is zero need to > > ship the source to stay compliant. > > Hmm, the annotated Open Source Definition explicitly states "The program > must include source code" -- how did I misinterpret that? > Because you left off the part following: "... and must allow distribution in source code as well as compiled form". This is entirely a discussion of distribution in a compiled form. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From emile at fenx.com Fri Feb 5 14:56:13 2016 From: emile at fenx.com (Emile van Sebille) Date: Fri, 5 Feb 2016 11:56:13 -0800 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> <56B4DDD7.8060905@sdamon.com> Message-ID: On 2/5/2016 10:38 AM, Brett Cannon wrote: > > > On Fri, 5 Feb 2016 at 10:34 Emile van Sebille > wrote: > >> Except for that nasty licensing issue requiring source code. > >> > >> Emile > > Licensing requires, in the GPL at least, that the *modified* > sources be > > made *available*, not that they be shipped with the product. > Looking at > > the Python license, and what tools already do, there is zero need to > > ship the source to stay compliant. > > Hmm, the annotated Open Source Definition explicitly states "The program > must include source code" -- how did I misinterpret that? > > > Because you left off the part following: "... and must allow > distribution in source code as well as compiled form". This is entirely > a discussion of distribution in a compiled form. Aah, 'must' is less restrictive in this context than I expected. When you combine the two halves the first part might be more accurately phrased as 'The program must make source code available' rather than 'must include' which I understood to mean 'ship with'. Emile From abarnert at yahoo.com Fri Feb 5 15:36:40 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 5 Feb 2016 20:36:40 +0000 (UTC) Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: Message-ID: <855923374.2012934.1454704600216.JavaMail.yahoo@mail.yahoo.com> On Friday, February 5, 2016 11:57 AM, Emile van Sebille wrote: > Aah, 'must' is less restrictive in this context than I expected. 
When > you combine the two halves the first part might be more accurately > phrased as 'The program must make source code available' rather than > 'must include' which I understood to mean 'ship with'. First, step back and think of this in common sense terms: If being open source required any Python installation to have the .py source to the .pyc or .zip files in the stdlib, surely it would also require any Python installation to have the .c source to the interpreter too. But lots of people have Python without having the .c source. Also, the GPL isn't typical of all open source licenses, it's only typical of _copyleft_ licenses. Permissive licenses, like Python's, are very different. Copyleft licenses are designed to make sure that all derived works are also copylefted; permissive licenses are designed to permit derived works as widely as possible. As the Python license specifically says, "All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source." Meanwhile, the fact that someone has decided that the Python license qualifies under the Open Source Definition doesn't mean the OSD is the right way to understand it. Read the license itself, or one of the summaries at opensource.org or fsf.org. (And if you still can't figure something out, and it's important to your work, you almost certainly need to ask a lawyer.) So, if you think the first sentence of section 2 of the OSD contradicts the explanation in the rest of the paragraph--well, even if you're right, that doesn't affect Python's license at all. Finally, if you want to see what it takes to actually make all the terms unambiguous both to ordinary human beings and to legal codes, see the GPL FAQ sections on their definitions of "propagate" and "convey". It may take you lots of careful reading to understand it, but when you finally do, it's definitely unambiguous. From stephen at xemacs.org Fri Feb 5 23:31:15 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 6 Feb 2016 13:31:15 +0900 Subject: [Python-Dev] Licensing issue (?) for Frozen Python? [was: More optimisation ideas] In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> Message-ID: <22197.30483.854098.718888@turnbull.sk.tsukuba.ac.jp> Executive summary: There is no licensing issue because Python isn't copyleft. Stick to the pragmatic *technical* issue of how to reliably provide corresponding source to those who want to look at that source (just because that's how we do things in Python). Emile van Sebille writes: > Except for that nasty licensing issue requiring source code. CPython is not now and never has been copyleft. CPython is distributed by the PSF *as* open source with a license that *permits* redistribution of original source and derivatives (including executables), but legally need not *remain* open source downstream. The remaining issue is the PSF's CLA which permits the PSF to relicense/sublicense under any open source license. However it's not clear to me that the PSF is required by the CLA to distribute source! It receives the code under very permissive licenses, and the CLA merely names the contributor's chosen license. I imagine those licenses determine whether the PSF must distribute source. If so, no, not even the PSF is bound (legally) to distribute Python source. Of course if *you* want to you can GPL Python (I think that's now possible, at one time there was a issue with the CNRI license IIRC), and then licensees of *your* distribution (but not you!) are required to distribute source. Of course our trust in the PSF is based on the moral principle of reciprocity: we contribute to the PSF's distribution as open source (according to the CLA) in large part because we expect to receive open source back. 
But if the PSF ever goes so wrong as to even think of taking advantage of that loophole, we are well and truly hosed anyway. (Among other things, that means a voting majority of the current PSF Board -- many of them core developers -- fell under a bus.) So don't worry about it. From rosuav at gmail.com Fri Feb 5 23:42:46 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 6 Feb 2016 15:42:46 +1100 Subject: [Python-Dev] Licensing issue (?) for Frozen Python? [was: More optimisation ideas] In-Reply-To: <22197.30483.854098.718888@turnbull.sk.tsukuba.ac.jp> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> <22197.30483.854098.718888@turnbull.sk.tsukuba.ac.jp> Message-ID: On Sat, Feb 6, 2016 at 3:31 PM, Stephen J. Turnbull wrote: > Of course if *you* want to you can GPL Python (I think that's now > possible, at one time there was a issue with the CNRI license IIRC), > and then licensees of *your* distribution (but not you!) are required > to distribute source. And even the GPL doesn't require you to distribute the source along with every copy of the binary. As long as the source is *available*, it's acceptable to distribute just the binary for convenience. For instance, on my Debian systems, I can say "apt-get install somepackage" to get just the binary, and then "apt-get source somepackage" if I want the corresponding source. IANAL, but I suspect it would be compliant if the same way of obtaining the C source code also gets you the unfrozen stdlib. So yeah, no licensing problem. ChrisA From stephen at xemacs.org Sat Feb 6 00:31:31 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 6 Feb 2016 14:31:31 +0900 Subject: [Python-Dev] Licensing issue (?) for Frozen Python? 
[was: More optimisation ideas] In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> <22197.30483.854098.718888@turnbull.sk.tsukuba.ac.jp> Message-ID: <22197.34099.197114.379728@turnbull.sk.tsukuba.ac.jp> Chris Angelico writes: > And even the GPL doesn't require you to distribute the source along > with every copy of the binary. As long as the source is *available*, > it's acceptable to distribute just the binary for convenience. True (and it would apply to frozen Python as long as the source includes the build scripts such as setup.py used to "freeze" Python), but it can be complex (especially for commercial distribution). However, the technical problem remains. For example, you mention Debian. While Debian keeps its source and binary packages very close to "in sync" on the server, there are several gotchas. For example, Debian does not restrict itself to packaging patches, it sometimes breaks your security when it thinks it's smarter than Bruce. So ... is the corresponding source you're interested in the patched or unpatched source? Do you know which you get when you install the source package? Do you know how to get the other? Suppose for reasons of stability you've "pinned" the binary. Is the corresponding Debian source package still easily available? Did you think of that gotcha when you installed the source package, or did you just assume they were still in sync? I'm sure somebody with the "security mindset" (eg, Bruce) can think of many more.... It's not Python's responsibility to solve these gotchas, of course. Many (eg, do you want patched vs. unpatched) are use-case-dependent anyway. However, many of them do go away (and Python has fulfilled any imaginable responsibility) if we distribute source with the binaries, or arrange that binaries are built from source at installation. 
From rosuav at gmail.com Sat Feb 6 00:38:53 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 6 Feb 2016 16:38:53 +1100 Subject: [Python-Dev] Licensing issue (?) for Frozen Python? [was: More optimisation ideas] In-Reply-To: <22197.34099.197114.379728@turnbull.sk.tsukuba.ac.jp> References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> <22197.30483.854098.718888@turnbull.sk.tsukuba.ac.jp> <22197.34099.197114.379728@turnbull.sk.tsukuba.ac.jp> Message-ID: On Sat, Feb 6, 2016 at 4:31 PM, Stephen J. Turnbull wrote: > However, the technical problem remains. For example, you mention > Debian. While Debian keeps its source and binary packages very close > to "in sync" on the server, there are several gotchas. For example, > Debian does not restrict itself to packaging patches, it sometimes > breaks your security when it thinks it's smarter than Bruce. So > ... is the corresponding source you're interested in the patched or > unpatched source? Do you know which you get when you install the > source package? Do you know how to get the other? Suppose for > reasons of stability you've "pinned" the binary. Is the corresponding > Debian source package still easily available? Did you think of that > gotcha when you installed the source package, or did you just assume > they were still in sync? I'm sure somebody with the "security > mindset" (eg, Bruce) can think of many more.... Right, sure. The technical problems are still there. Although I'm fairly confident that Debian's binaries would correspond to Debian's source - but honestly, if I'm looking for sources for anything other than the kernel, I probably want to get the latest from source control, rather than using the somewhat older version shipped in the repos. 
As to availability, though, most of the big distros (including Debian) keep their sources around for a long time. ChrisA From ncoghlan at gmail.com Sat Feb 6 02:05:26 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Feb 2016 17:05:26 +1000 Subject: [Python-Dev] [Speed] speed.python.org In-Reply-To: References: Message-ID: On 6 February 2016 at 04:07, Brett Cannon wrote: > On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: >> Heh, cdecimal utterly demolishing the old pure Python decimal module >> on the telco benchmark means normalising against CPython 3.5 rather >> than 2.7 really isn't very readable :) > > I find viewing the graphs using the horizontal layout is much easier to read > (the bars are a lot thicker and everything zooms in more). That comment was based on the horizontal layout - the telco benchmark runs ~53x faster in Python 3 than it does in Python 2 (without switching to cdecimal), so you end up with all the other benchmarks being squashed into the leftmost couple of grid cells. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Sat Feb 6 09:32:19 2016 From: barry at python.org (Barry Warsaw) Date: Sat, 6 Feb 2016 09:32:19 -0500 Subject: [Python-Dev] Licensing issue (?) for Frozen Python? [was: More optimisation ideas] In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> <22197.30483.854098.718888@turnbull.sk.tsukuba.ac.jp> <22197.34099.197114.379728@turnbull.sk.tsukuba.ac.jp> Message-ID: <20160206093219.2ac5a449@subdivisions.wooz.org> On Feb 06, 2016, at 04:38 PM, Chris Angelico wrote: >Right, sure. The technical problems are still there. 
Although I'm >fairly confident that Debian's binaries would correspond to Debian's >source - but honestly, if I'm looking for sources for anything other >than the kernel, I probably want to get the latest from source >control, rather than using the somewhat older version shipped in the >repos. > >As to availability, though, most of the big distros (including Debian) >keep their sources around for a long time. Not to get too deep into what other projects do, but yes in Debian, you can always get the patched source that corresponds to the binary you've installed, usually in both version controlled form and otherwise. I'd expect this to be true of most if not all of the Linux distros. A more interesting question is how you can actually verify this equivalence, and there are folks across the ecosystem working on reproducible builds. The idea is that you should be able to take the source that *claims* to correspond to that binary, and using the established build tools, locally reproduce a bit-wise exact duplicate of the binary. I've applied and submitted several patches to various upstreams that help with this effort, such as being able to pass in "locked" datetimes instead of the package always using e.g. datetime.now(). Let's not dive down the rabbit hole too far into how you can trust your build tool chain, and every other layer down to the quantum. 
Cheers, -Barry From me at ixokai.io Sat Feb 6 02:58:20 2016 From: me at ixokai.io (Stephen Hansen) Date: Fri, 05 Feb 2016 23:58:20 -0800 Subject: [Python-Dev] More optimisation ideas In-Reply-To: References: <56AB9BCE.2080000@python.org> <56ACE564.7080107@python.org> <56AE55C8.8000807@egenix.com> <20160201031226.GF31806@ando.pearwood.info> <20160201164023.CC500B200A1@webabinitio.net> <56AF93D9.2040104@stoneleaf.us> <56B4DDD7.8060905@sdamon.com> Message-ID: <1454745500.2156470.513600826.6BD53848@webmail.messagingengine.com> On Fri, Feb 5, 2016, at 10:33 AM, Emile van Sebille wrote: > On 2/5/2016 9:37 AM, Alexander Walters wrote: > > > > On 2/5/2016 12:27, Emile van Sebille wrote: > >> On 2/1/2016 9:20 AM, Ethan Furman wrote: > >>> On 02/01/2016 08:40 AM, R. David Murray wrote: > >> > >>>> On the other hand, if the distros go the way Nick has (I think) been > >>>> advocating, and have a separate 'system python for system scripts' that > >>>> is independent of the one installed for user use, having the > >>>> system-only > >>>> python be frozen and sourceless would actually make sense on a > >>>> couple of > >>>> levels. > >>> > >>> Agreed. > >> > >> Except for that nasty licensing issue requiring source code. > >> > >> Emile > > Licensing requires, in the GPL at least, that the *modified* sources be > > made *available*, not that they be shipped with the product. Looking at > > the Python license, and what tools already do, there is zero need to > > ship the source to stay compliant. > > Hmm, the annotated Open Source Definition explicitly states "The program > must include source code" -- how did I misinterpret that? Couple things. First, the OSD is not authoritative. Python's license establishes the rules of its distribution: that Python's license is considered compatible with the OSD doesn't actually mean your reading of anything on the OSD page as having any binding meaning. 
Second, OSD's Rule 2 means that those who are distributing Python -- the PSF, originally -- must provide source code if they're distributing it under Python's license, but it doesn't actually mean it must be packaged with it in every download. In fact, it's not today. The standard library source is included in normal downloads, but the C source of Python isn't. But you can download it readily though, so that's fine. It's fully compliant with the OSD. But! If Debian (pulling them out of a hat randomly) is distributing Python, they aren't the PSF, and notably are not bound by the OSD rules, only by Python's license terms. The PSF satisfied their requirements to the licensing terms when releasing Python, but now Debian has Python, and they are distributing it-- that's an entirely separate act, and you must look at them as a separate actor in terms of the license. They don't have to distribute it under the same license. They must be ABLE to (as OSD's Rule 3 says), but they don't HAVE to. Some random person can take Python, rename it Snakey, and release it under almost any license they want and give no one the source code at all. Python has from the beginning allowed this: it's actually in quite a few closed source / proprietary products without ever advertising it and providing no source, entirely legally and ethically -- Python's gone out of its way to support this sort of use-case. As it happens, Debian usually distributes something very close to the official release (sometimes they backport patches and such), and always does so under the same license as Python (AFAICT), but they don't *have* to. GPL is copyleft and requires its derivative works to be GPL'd (or at least, no more restrictive than GPL)-- so in GPL, to distribute it you MUST distribute it under GPL-compatible terms.
Python is a permissive license and allows anyone to do basically anything, INCLUDING produce closed source releases if someone wanted to, or just release modifications or modules that are available under different licenses. The OSD encompasses both ends of the spectrum: the GPL's mandate of source access and the OSD's mandate of the receiver to be able to distribute in the same terms they received (notably, NOT the same terms it was originally released under). -- Stephen Hansen m e @ i x o k a i . i o From python at stevedower.id.au Sat Feb 6 16:01:00 2016 From: python at stevedower.id.au (Steve Dower) Date: Sat, 6 Feb 2016 13:01:00 -0800 Subject: [Python-Dev] PEP 514: Python environment registration in the Windows Registry In-Reply-To: <56B39E26.3060407@sdamon.com> References: <56B18CD7.2010409@python.org> <56B1B73C.1030204@sdamon.com> <56B39E26.3060407@sdamon.com> Message-ID: <56B65F0C.2070403@python.org> I've posted an updated version of this PEP that should soon be visible at https://www.python.org/dev/peps/pep-0514. Leaving aside the fact that the current implementation of Python relies on *other* information in the registry (that is not specified in this PEP), I'm still looking for feedback or concerns from developers who are likely to create or use the keys that are described here. ---------------- PEP: 514 Title: Python registration in the Windows registry Version: $Revision$ Last-Modified: $Date$ Author: Steve Dower Status: Draft Type: Informational Content-Type: text/x-rst Created: 02-Feb-2016 Post-History: 02-Feb-2016 Abstract ======== This PEP defines a schema for the Python registry key to allow third-party installers to register their installation, and to allow applications to detect and correctly display all Python environments on a user's machine. No implementation changes to Python are proposed with this PEP. Python environments are not required to be registered unless they want to be automatically discoverable by external tools. 
The schema matches the registry values that have been used by the official installer since at least Python 2.5, and the resolution behaviour matches the behaviour of the official Python releases. Motivation ========== When installed on Windows, the official Python installer creates a registry key for discovery and detection by other applications. This allows tools such as installers or IDEs to automatically detect and display a user's Python installations. Third-party installers, such as those used by distributions, typically create identical keys for the same purpose. Most tools that use the registry to detect Python installations only inspect the keys used by the official installer. As a result, third-party installations that wish to be discoverable will overwrite these values, resulting in users "losing" their Python installation. By describing a layout for registry keys that allows third-party installations to register themselves uniquely, as well as providing tool developers guidance for discovering all available Python installations, these collisions should be prevented. Definitions =========== A "registry key" is the equivalent of a file-system path into the registry. Each key may contain "subkeys" (keys nested within keys) and "values" (named and typed attributes attached to a key). ``HKEY_CURRENT_USER`` is the root of settings for the currently logged-in user, and this user can generally read and write all settings under this root. ``HKEY_LOCAL_MACHINE`` is the root of settings for all users. Generally, any user can read these settings but only administrators can modify them. It is typical for values under ``HKEY_CURRENT_USER`` to take precedence over those in ``HKEY_LOCAL_MACHINE``. On 64-bit Windows, ``HKEY_LOCAL_MACHINE\Software\Wow6432Node`` is a special key that 32-bit processes transparently read and write to rather than accessing the ``Software`` key directly. 
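The "user takes precedence" convention described in the Definitions above can be sketched with plain dictionaries standing in for the values of one key in each hive. This is purely illustrative (the data and function name are invented; a real tool on Windows would enumerate the actual keys with the winreg module):

```python
def effective_values(machine_values, user_values):
    """Resolve one registry key's values as the Definitions describe:
    entries under HKEY_CURRENT_USER take precedence over same-named
    entries under HKEY_LOCAL_MACHINE.
    """
    resolved = dict(machine_values)  # per-machine defaults
    resolved.update(user_values)     # per-user entries win on conflict
    return resolved

# Invented example values for a single registry key:
machine = {"DisplayName": "Example Corp",
           "SupportUrl": "http://www.example.com"}
user = {"SupportUrl": "http://www.example.com/mine"}
print(effective_values(machine, user)["SupportUrl"])
# -> http://www.example.com/mine
```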
Structure ========= We consider there to be a single collection of Python environments on a machine, where the collection may be different for each user of the machine. There are three potential registry locations where the collection may be stored based on the installation options of each environment:: HKEY_CURRENT_USER\Software\Python\<Company>\<Tag> HKEY_LOCAL_MACHINE\Software\Python\<Company>\<Tag> HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\<Company>\<Tag> Environments are uniquely identified by their Company-Tag pair, with two options for conflict resolution: include everything, or give priority to user preferences. Tools that include every installed environment, even where the Company-Tag pairs match, should ensure users can easily identify whether the registration was per-user or per-machine. Tools that give priority to user preferences must ignore values from ``HKEY_LOCAL_MACHINE`` when a matching Company-Tag pair exists in ``HKEY_CURRENT_USER``. Official Python releases use ``PythonCore`` for Company, and the value of ``sys.winver`` for Tag. Other registered environments may use any values for Company and Tag. Recommendations are made in the following sections. Python environments are not required to register themselves unless they want to be automatically discoverable by external tools. Backwards Compatibility ----------------------- Python 3.4 and earlier did not distinguish between 32-bit and 64-bit builds in ``sys.winver``. As a result, it is possible to have valid side-by-side installations of both 32-bit and 64-bit interpreters. To ensure backwards compatibility, applications should treat environments listed under the following two registry keys as distinct, even when the Tag matches:: HKEY_LOCAL_MACHINE\Software\Python\PythonCore\<Tag> HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\PythonCore\<Tag> Environments listed under ``HKEY_CURRENT_USER`` may be treated as distinct from both of the above keys, potentially resulting in three environments discovered using the same Tag.
Alternatively, a tool may determine whether the per-user environment is 64-bit or 32-bit and give it priority over the per-machine environment, resulting in a maximum of two discovered environments. It is not possible to detect side-by-side installations of both 64-bit and 32-bit versions of Python prior to 3.5 when they have been installed for the current user. Python 3.5 and later always uses different Tags for 64-bit and 32-bit versions. Environments registered under other Company names must use distinct Tags to support side-by-side installations. There is no backwards compatibility allowance. Company ------- The Company part of the key is intended to group related environments and to ensure that Tags are namespaced appropriately. The key name should be alphanumeric without spaces and likely to be unique. For example, a trademarked name, a UUID, or a hostname would be appropriate:: HKEY_CURRENT_USER\Software\Python\ExampleCorp HKEY_CURRENT_USER\Software\Python\6C465E66-5A8C-4942-9E6A-D29159480C60 HKEY_CURRENT_USER\Software\Python\www.example.com The company name ``PyLauncher`` is reserved for the PEP 397 launcher (``py.exe``). It does not follow this convention and should be ignored by tools. If a string value named ``DisplayName`` exists, it should be used to identify the environment category to users. Otherwise, the name of the key should be used. If a string value named ``SupportUrl`` exists, it may be displayed or otherwise used to direct users to a web site related to the environment. A complete example may look like:: HKEY_CURRENT_USER\Software\Python\ExampleCorp (Default) = (value not set) DisplayName = "Example Corp" SupportUrl = "http://www.example.com" Tag --- The Tag part of the key is intended to uniquely identify an environment within those provided by a single company. The key name should be alphanumeric without spaces and stable across installations. 
For example, the Python language version, a UUID or a partial/complete hash would be appropriate; an integer counter that increases for each new environment may not:: HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6 HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66 If a string value named ``DisplayName`` exists, it should be used to identify the environment to users. Otherwise, the name of the key should be used. If a string value named ``SupportUrl`` exists, it may be displayed or otherwise used to direct users to a web site related to the environment. If a string value named ``Version`` exists, it should be used to identify the version of the environment. This is independent from the version of Python implemented by the environment. If a string value named ``SysVersion`` exists, it must be in ``x.y`` or ``x.y.z`` format matching the version returned by ``sys.version_info`` in the interpreter. Otherwise, if the Tag matches this format it is used. If not, the Python version is unknown. Note that each of these values is recommended, but optional. A complete example may look like this:: HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66 (Default) = (value not set) DisplayName = "Distro 3" SupportUrl = "http://www.example.com/distro-3" Version = "3.0.12345.0" SysVersion = "3.6.0" InstallPath ----------- Beneath the environment key, an ``InstallPath`` key must be created. This key is always named ``InstallPath``, and the default value must match ``sys.prefix``:: HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6\InstallPath (Default) = "C:\ExampleCorpPy36" If a string value named ``ExecutablePath`` exists, it must be a path to the ``python.exe`` (or equivalent) executable. Otherwise, the interpreter executable is assumed to be called ``python.exe`` and exist in the directory referenced by the default value. If a string value named ``WindowedExecutablePath`` exists, it must be a path to the ``pythonw.exe`` (or equivalent) executable. 
Otherwise, the windowed interpreter executable is assumed to be called ``pythonw.exe`` and exist in the directory referenced by the default value. A complete example may look like:: HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\InstallPath (Default) = "C:\ExampleDistro30" ExecutablePath = "C:\ExampleDistro30\ex_python.exe" WindowedExecutablePath = "C:\ExampleDistro30\ex_pythonw.exe" Help ---- Beneath the environment key, a ``Help`` key may be created. This key is always named ``Help`` if present and has no default value. Each subkey of ``Help`` specifies a documentation file, tool, or URL associated with the environment. The subkey may have any name, and the default value is a string appropriate for passing to ``os.startfile`` or equivalent. If a string value named ``DisplayName`` exists, it should be used to identify the help file to users. Otherwise, the key name should be used. A complete example may look like:: HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\Help Python\ (Default) = "C:\ExampleDistro30\python36.chm" DisplayName = "Python Documentation" Extras\ (Default) = "http://www.example.com/tutorial" DisplayName = "Example Distro Online Tutorial" Other Keys ---------- Some other registry keys are used for defining or inferring search paths under certain conditions. A third-party installation is permitted to define these keys under their Company-Tag key, however, the interpreter must be modified and rebuilt in order to read these values. Copyright ========= This document has been placed in the public domain. From dalanmiller at rethinkdb.com Sat Feb 6 16:44:30 2016 From: dalanmiller at rethinkdb.com (Daniel Miller) Date: Sat, 6 Feb 2016 13:44:30 -0800 Subject: [Python-Dev] PEP 0492 __aenter__ & __aexit__ Message-ID: Hi Python-Dev Group, I am trying to implement __aenter__ and __aexit__ for the RethinkDB Python driver. 
Looking at the PEP I don't see any definition of what parameters __exit__ is supposed to take and couldn't find any other similar implementations. Is there a piece of documentation I should be looking at that I'm missing? https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with Many thanks, Daniel From brett at python.org Sat Feb 6 17:05:33 2016 From: brett at python.org (Brett Cannon) Date: Sat, 06 Feb 2016 22:05:33 +0000 Subject: [Python-Dev] PEP 0492 __aenter__ & __aexit__ In-Reply-To: References: Message-ID: On Sat, 6 Feb 2016 at 13:50 Daniel Miller wrote: > Hi Python-Dev Group, > > I am trying to implement __aenter__ and __aexit__ for the RethinkDB > Python driver. Looking at the PEP I don't see any > definition of what parameters __exit__ is supposed > to take and couldn't find any other similar implementations. Is there a > piece of documentation I should be looking at that I'm missing? > > > https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with > The arguments to __aexit__ are the same as __exit__ in a normal context manager. See https://docs.python.org/3.5/reference/datamodel.html#object.__aexit__ for the official docs for __aexit__. From greg at krypto.org Sun Feb 7 02:54:27 2016 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 07 Feb 2016 07:54:27 +0000 Subject: [Python-Dev] [Speed] speed.python.org In-Reply-To: References: Message-ID: Displaying ratios linearly rather than on a log scale axis can be misleading depending on what you are looking for. (feature request: allow a log scale?) major kudos to everyone involved in getting this set up!
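To make the linear-versus-log point concrete: a 2x slowdown and a 2x speedup sit at equal distances from the baseline only on a logarithmic axis. A small sketch of the arithmetic (no plotting library assumed):

```python
import math

ratios = [0.5, 1.0, 2.0]  # 2x slower, unchanged, 2x faster

# Distances from the baseline on a linear axis are asymmetric:
linear_offsets = [r - 1.0 for r in ratios]    # [-0.5, 0.0, 1.0]

# On a log axis, equal-and-opposite changes are symmetric:
log_offsets = [math.log2(r) for r in ratios]  # [-1.0, 0.0, 1.0]

assert abs(linear_offsets[0]) != abs(linear_offsets[2])
assert abs(log_offsets[0]) == abs(log_offsets[2])
```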
On Fri, Feb 5, 2016 at 11:06 PM Nick Coghlan wrote: > On 6 February 2016 at 04:07, Brett Cannon wrote: > > On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: > >> Heh, cdecimal utterly demolishing the old pure Python decimal module > >> on the telco benchmark means normalising against CPython 3.5 rather > >> than 2.7 really isn't very readable :) > > > > I find viewing the graphs using the horizontal layout is much easier to > read > > (the bars are a lot thicker and everything zooms in more). > > That comment was based on the horizontal layout - the telco benchmark > runs ~53x faster in Python 3 than it does in Python 2 (without > switching to cdecimal), so you end up with all the other benchmarks > being squashed into the leftmost couple of grid cells. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Sun Feb 7 03:22:37 2016 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 7 Feb 2016 09:22:37 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: <56B34A1E.4010501@egenix.com> References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> Message-ID: M.-A. Lemburg schrieb am 04.02.2016 um 13:54: > On 04.02.2016 13:29, Victor Stinner wrote: >> But, why not PyObject_Malloc() & PObject_Free() were not used in the >> first place? > > Good question. I guess developers simply thought of PyObject_Malloc() > being for PyObjects, not arbitrary memory buffers, most likely > because pymalloc was advertised as allocator for Python objects, > not random chunks of memory. Note that the PyObject_Malloc() functions have never been documented. 
(Well, there are references regarding their mere existence in the docs, but nothing more than that.) https://docs.python.org/3.6/search.html?q=pyobject_malloc&check_keywords=yes&area=default And, for example, the "what's new in 2.5" document says: """ Python's API has many different functions for allocating memory that are grouped into families. For example, PyMem_Malloc(), PyMem_Realloc(), and PyMem_Free() are one family that allocates raw memory, while PyObject_Malloc(), PyObject_Realloc(), and PyObject_Free() are another family that's supposed to be used for creating Python objects. """ I don't think there are many extensions out there in which *object* memory gets allocated manually, which implicitly puts a pretty clear "don't use" marker on these functions. Stefan From victor.stinner at gmail.com Sun Feb 7 05:13:23 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 7 Feb 2016 11:13:23 +0100 Subject: [Python-Dev] Google Summer of Code In-Reply-To: References: Message-ID: Hi, I would like to propose the FAT Python project as a subject for the Google Summer of Code: https://developers.google.com/open-source/gsoc/ I have a long list of optimization ideas for fatoptimizer: http://fatoptimizer.readthedocs.org/en/latest/todo.html The fatoptimizer project is written in pure Python and has a simple design. I implemented quite simple optimizations which are learnt at school. IMHO such a project fits well for a student. Does the PSF already plan to apply to the GSoC? Are there other projects? Victor From randyeels at gmail.com Sun Feb 7 08:58:55 2016 From: randyeels at gmail.com (Randy Eels) Date: Sun, 7 Feb 2016 14:58:55 +0100 Subject: [Python-Dev] When does `PyType_Type.tp_alloc get assigned to PyType_GenericAlloc ? Message-ID: Hi everyone, I've a question about the implementation of the `type` builtin (in Python 3.5).
In Objects/typeobject.c, the `tp_alloc` slot of PyType_Type gets set to 0. However, I can see (using gdb) that it later gets assigned to `&PyType_GenericAlloc`. I'd argue that this makes sense because, in `type_new`, there is a line where that member function gets called without previously checking whether that member points to something: ``` /* Allocate the type object */ type = (PyTypeObject *)metatype->tp_alloc(metatype, nslots); ``` Yet, I can't seem to understand where and when does the `tp_alloc` slot of PyType_Type get re-assigned to PyType_GenericAlloc. Does that even happen? Or am I missing something bigger? And, just out of further curiosity, why doesn't the aforementioned slot get initialised to `PyType_GenericAlloc` in the first place? Thanks a lot. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalanmiller at rethinkdb.com Sun Feb 7 11:16:41 2016 From: dalanmiller at rethinkdb.com (Daniel Miller) Date: Sun, 7 Feb 2016 08:16:41 -0800 Subject: [Python-Dev] PEP 0492 __aenter__ & __aexit__ In-Reply-To: References: Message-ID: Awesome, I missed that. Thank you Brett. Am I understanding correctly that if I'd like to avoid `async with await EXPR` whatever is returned from EXPR must implement `__await__` as a non-coroutine method? Which then I'd just be able to use `async with ...`? 2016-02-06 16:05 GMT-06:00 Brett Cannon : > > > On Sat, 6 Feb 2016 at 13:50 Daniel Miller > wrote: > >> Hi Python-Dev Group, >> >> I am trying to implement __aenter__ and __aexit__ for the RethinkDB >> Python driver. Looking at the PEP I don't see >> any definitions as to what the expected parameters that __exit__ are >> supposed to take and couldn't find any other similar implementations. Is >> there a piece of documentation I should be looking at that I'm missing? 
>> >> >> https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with >> > > The arguments to __aexit__ are the same as __exit__ in a normal context > manager. See > https://docs.python.org/3.5/reference/datamodel.html#object.__aexit__ for > the official docs for __aexit__. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Sun Feb 7 14:07:33 2016 From: brett at python.org (Brett Cannon) Date: Sun, 07 Feb 2016 19:07:33 +0000 Subject: [Python-Dev] PEP 0492 __aenter__ & __aexit__ In-Reply-To: References: Message-ID: On Sun, 7 Feb 2016 at 08:17 Daniel Miller wrote: > Awesome, I missed that. Thank you Brett. > Welcome! > > Am I understanding correctly that if I'd like to avoid `async with await > EXPR` whatever is returned from EXPR must implement `__await__` as a > non-coroutine method? Which then I'd just be able to use `async with ...`? > Assuming I'm following what you're asking properly, __aenter__ needs to return an awaitable: https://docs.python.org/3/reference/datamodel.html?#awaitable-objects. That is either an object that implements __await__() or a coroutine (which is basically a generator decorated with types.coroutine). > > > > 2016-02-06 16:05 GMT-06:00 Brett Cannon : > >> >> >> On Sat, 6 Feb 2016 at 13:50 Daniel Miller >> wrote: >> >>> Hi Python-Dev Group, >>> >>> I am trying to implement __aenter__ and __aexit__ for the RethinkDB >>> Python driver. Looking at the PEP I don't see >>> any definitions as to what the expected parameters that __exit__ are >>> supposed to take and couldn't find any other similar implementations. Is >>> there a piece of documentation I should be looking at that I'm missing? >>> >>> >>> https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with >>> >> >> The arguments to __aexit__ are the same as __exit__ in a normal context >> manager. 
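Concretely, that protocol can be sketched as follows (a minimal, self-contained example with placeholder names, not the actual RethinkDB API):

```python
import asyncio

class Resource:
    # Placeholder async context manager; names are illustrative.

    async def __aenter__(self):
        # Awaited on entry to 'async with'; the result is bound by 'as'.
        self.active = True
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Same three arguments as __exit__ (all None on a clean exit).
        self.active = False
        return False  # a true value would suppress the exception

async def demo():
    async with Resource() as res:
        assert res.active
    assert not res.active  # __aexit__ has run
    return res

asyncio.run(demo())
```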
See >> https://docs.python.org/3.5/reference/datamodel.html#object.__aexit__ for >> the official docs for __aexit__. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Sun Feb 7 14:42:16 2016 From: brett at python.org (Brett Cannon) Date: Sun, 07 Feb 2016 19:42:16 +0000 Subject: [Python-Dev] Google Summer of Code In-Reply-To: References: Message-ID: On Sun, 7 Feb 2016 at 02:14 Victor Stinner wrote: > Hi, > > I would like to propose the FAT Python project subject to the Google > Summer of Code: > https://developers.google.com/open-source/gsoc/ > > I have a long list of optimization ideas for fatoptimizer: > http://fatoptimizer.readthedocs.org/en/latest/todo.html > > The fatoptimizer project is written in pure Python and has a simple > design. I implemented quite simple optimizations which are learnt at > school. IMHO such project fits well for a student. > > Does the PSF already plan to apply to the GSoC? Are there other projects? > Terri Oda has already emailed the core-mentorship mailing list looking for core projects, so the PSF is definitely doing it again and is looking for core projects. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Feb 7 14:45:12 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Feb 2016 11:45:12 -0800 Subject: [Python-Dev] When does `PyType_Type.tp_alloc get assigned to PyType_GenericAlloc ? In-Reply-To: References: Message-ID: I think it's probably line 2649 in typeobject.c, in type_new(): type->tp_alloc = PyType_GenericAlloc; On Sun, Feb 7, 2016 at 5:58 AM, Randy Eels wrote: > Hi everyone, > > I've a question about the implementation of the `type` builtin (in Python > 3.5). > > In Objects/typeobject.c, the `tp_alloc` slot of PyType_Type gets set to 0. > However, I can see (using gdb) that it later gets assigned to > `&PyType_GenericAlloc`. 
I'd argue that this makes sense because, in > `type_new`, there is a line where that member function gets called without > previously checking whether that member points to something: > > ``` > /* Allocate the type object */ > type = (PyTypeObject *)metatype->tp_alloc(metatype, nslots); > ``` > > Yet, I can't seem to understand where and when does the `tp_alloc` slot of > PyType_Type get re-assigned to PyType_GenericAlloc. Does that even happen? > Or am I missing something bigger? > > And, just out of further curiosity, why doesn't the aforementioned slot get > initialised to `PyType_GenericAlloc` in the first place? > > Thanks a lot. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From eryksun at gmail.com Sun Feb 7 15:27:30 2016 From: eryksun at gmail.com (eryk sun) Date: Sun, 7 Feb 2016 14:27:30 -0600 Subject: [Python-Dev] When does `PyType_Type.tp_alloc get assigned to PyType_GenericAlloc ? In-Reply-To: References: Message-ID: On Sun, Feb 7, 2016 at 7:58 AM, Randy Eels wrote: > > Yet, I can't seem to understand where and when does the `tp_alloc` slot of > PyType_Type get re-assigned to PyType_GenericAlloc. Does that even happen? > Or am I missing something bigger? _Py_InitializeEx_Private in Python/pylifecycle.c calls _Py_ReadyTypes in Objects/object.c. This calls PyType_Ready(&PyType_Type) in Objects/typeobject.c, which assigns type->tp_base = &PyBaseObject_Type and then calls inherit_slots. This executes COPYSLOT(tp_alloc), which assigns PyType_Type.tp_alloc = PyBaseObject_Type.tp_alloc, which is statically assigned as PyType_GenericAlloc. Debug trace on Windows: 0:000> bp python35!PyType_Ready 0:000> g Breakpoint 0 hit python35!PyType_Ready: 00000000`6502d160 4053 push rbx 0:000> ?? 
((PyTypeObject *)@rcx)->tp_name char * 0x00000000`650e4044 "object" 0:000> g Breakpoint 0 hit python35!PyType_Ready: 00000000`6502d160 4053 push rbx 0:000> ?? ((PyTypeObject *)@rcx)->tp_name char * 0x00000000`651d8e5c "type" 0:000> bp python35!inherit_slots 0:000> g Breakpoint 1 hit python35!inherit_slots: 00000000`6502c440 48895c2408 mov qword ptr [rsp+8],rbx ss:00000000`0028f960={ python35!PyType_Type (00000000`6527cba0)} At entry to inherit_slots, PyType_Type.tp_alloc is NULL: 0:000> ?? python35!PyType_Type.tp_alloc * 0x00000000`00000000 0:000> pt python35!inherit_slots+0xd17: 00000000`6502d157 c3 ret At exit it's set to PyType_GenericAlloc: 0:000> ?? python35!PyType_Type.tp_alloc * 0x00000000`65025580 0:000> ln 65025580 (00000000`65025580) python35!PyType_GenericAlloc | (00000000`650256a0) python35!PyType_GenericNew Exact matches: python35!PyType_GenericAlloc (void) From stephen at xemacs.org Sun Feb 7 21:38:52 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 8 Feb 2016 11:38:52 +0900 Subject: [Python-Dev] Google Summer of Code In-Reply-To: References: Message-ID: <22199.65468.795324.917055@turnbull.sk.tsukuba.ac.jp> As long as it's been brought up here: Yes, the PSF is applying. Google has been deliberately stirring things up in the last couple of years, so no promises, but it is very likely that we will be approved and core Python will have at least two slots allocated (although I'm not sure core Python even has a suborg admin yet, Terri will recruit someone). Please see the Wiki page at https://wiki.python.org/moin/SummerOfCode/2016 for more details. To contact the org admins (for wiki permissions and to join the mentors' mailing list), use the python.org mailbox gsoc-admins. core-mentorship is a good place to recruit students as well as to get advice on mentoring (or mentor-mentoring, if you're a suborg admin with new mentors). 
@Victor: I agree with you that implementing FATPython optimizations is a good student-level project, and one that is likely to attract continuing participation. You should get in touch with the admins[1] about updating the wiki page at: https://wiki.python.org/moin/SummerOfCode/2016/python-core. It's looking kinda lonely at the moment. Note that we can still add suborgs and projects after we've been approved (in 2-3 weeks) up to March 7, but the more projects and the more attractive they are, the better our chances of being accepted, and the better the students we'll attract. Footnotes: [1] I'm an assistant admin, but just recovering from Faculty Hell Month and a nasty cold, and don't have a handle on my own tools yet. Somebody else is likely to get to it more quickly. :-( From victor.stinner at gmail.com Mon Feb 8 09:18:52 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 15:18:52 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: <56B35AB5.5090308@egenix.com> References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> <56B35AB5.5090308@egenix.com> Message-ID: 2016-02-04 15:05 GMT+01:00 M.-A. Lemburg : > Sometimes, yes, but we also do allocations for e.g. > parsing values in Python argument tuples (e.g. using > "es" or "et"): > > https://docs.python.org/3.6/c-api/arg.html > > We do document to use PyMem_Free() on those; not sure whether > everyone does this though. It's well documented. If programs start to crash, they must be fixed. I don't propose to "break the API" for free, but to get a speedup on the overall Python. And I don't think that we can say that it's an API change, since we already stated that PyMem_Free() must be used. If your program has bugs, you can use a debug build of Python 3.5 to detect misusage of the API. > The Python test suite doesn't test Python C extensions, > so it's not surprising that it passes :-) What do you mean by "C extensions"? Which modules? 
Many modules in the stdlib have "C accelerators" and the PEP 399 now *requires* testing both the C and Python implementations. >> Instead of teaching developers that well, in fact, PyObject_Malloc() >> is unrelated to object programming, I think that it's simpler to >> modify PyMem_Malloc() to reuse pymalloc ;-) > Perhaps if you add some guards somewhere :-) We have runtime checks, but they are only implemented in debug mode for efficiency. By the way, I proposed once to add an environment variable to allow enabling these checks without having to recompile Python. Since the PEP 445, it became easy to implement this. What do you think? https://www.python.org/dev/peps/pep-0445/#add-a-new-pydebugmalloc-environment-variable "This alternative was rejected because a new environment variable would make Python initialization even more complex. PEP 432 tries to simplify the CPython startup sequence." The PEP 432 looks stuck, so I don't think that we should block enhancements because of this PEP. Anyway, my idea should be easy to implement. > Seriously, this may work if C extensions use the APIs > consistently, but in order to tell, we'd need to check > few. Can you suggest names of projects that must be tested? > I guess the main question then is whether pymalloc is good enough > for general memory allocation needs; and the answer may well be > "yes". What do you mean by "good enough"? For the runtime performance, pymalloc looks to be faster than malloc(). What are your other criteria? Memory fragmentation? Victor From victor.stinner at gmail.com Mon Feb 8 09:21:25 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 15:21:25 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> Message-ID: 2016-02-07 9:22 GMT+01:00 Stefan Behnel : > Note that the PyObject_Malloc() functions have never been documented.
Yeah, there is an old bug to track this: http://bugs.python.org/issue20064 > And, for example, the "what's new in 2.5" document says: > > """ > Python's API has many different functions for allocating memory that are > grouped into families. For example, PyMem_Malloc(), PyMem_Realloc(), and > PyMem_Free() are one family that allocates raw memory, while > PyObject_Malloc(), PyObject_Realloc(), and PyObject_Free() are another > family that's supposed to be used for creating Python objects. > """ > > I don't think there are many extensions out there in which *object* memory > gets allocated manually, which implicitly puts a pretty clear "don't use" > marker on these functions. Should I understand that it's another good reason to make PyMem_Malloc() faster for everyone? Victor From victor.stinner at gmail.com Mon Feb 8 09:32:00 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 15:32:00 +0100 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? Message-ID: Hi, Since 3.3, functions of the os module started to emit DeprecationWarning when called with bytes filenames. The rationale is quite simple: Windows' native type for filenames is Unicode, and Windows has a weird behaviour when you use bytes. For example, os.listdir(b'.') gives you paths which cannot be used with open() on filenames which are not encodable in the ANSI code page. Unencodable characters are replaced with "?". The following issue was opened to document this weird behaviour (but the doc was never completed): "Document that bytes OS API can returns unusable results on Windows" http://bugs.python.org/issue16700 When the new os.scandir() API was designed, I asked to *not* support bytes filenames since they are "broken by design".
https://www.python.org/dev/peps/pep-0471/ Recently, a user complained that os.walk() doesn't work with bytes on Windows anymore: "Regression: os.walk now using os.scandir() breaks bytes filenames on windows" http://bugs.python.org/issue25911 Serhiy Storchaka just pushed a change to reintroduce bytes support on Windows in os.walk(), but I would prefer to do the *opposite*: drop support for bytes filenames on Windows. Are we brave enough to force users to use the "right" type for filenames? -- On Python 2, it wasn't possible to use Unicode for filenames, many functions fail badly with Unicode, especially when you mix bytes and Unicode. On Python 3, Unicode is the "natural" type, most Python functions prefer Unicode, and the PEP 383 (surrogateescape) allows safely using Unicode on UNIX even with undecodable filenames (invalid bytes are stored as Unicode surrogate characters). Victor From victor.stinner at gmail.com Mon Feb 8 09:40:10 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 15:40:10 +0100 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: 2016-02-08 15:32 GMT+01:00 Victor Stinner : > Since 3.3, functions of the os module started to emit > DeprecationWarning when called with bytes filenames. > (...) > Recently, a user complained that os.walk() doesn't work with bytes on > Windows anymore: > (...) It's also sad to see that deprecation warnings are completely ignored. Python 3.3 was released in 2011, 5 years ago. I would prefer to show deprecation warnings by default. But I know that it's an old debate: developers vs users :-) I like to see my users as potential developers ;-) Victor From bussonniermatthias at gmail.com Mon Feb 8 11:01:15 2016 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 8 Feb 2016 08:01:15 -0800 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module?
In-Reply-To: References: Message-ID: <6A29E782-51AB-4CEF-ACB9-18F0675F109C@gmail.com> > On Feb 8, 2016, at 06:40, Victor Stinner wrote: > > 2016-02-08 15:32 GMT+01:00 Victor Stinner : >> Since 3.3, functions of the os module started to emit >> DeprecationWarning when called with bytes filenames. >> (...) >> Recently, a user complained that os.walk() doesn't work with bytes on >> Windows anymore: >> (...) > > It's also sad to see that deprecation warnings are completely ignored. > Python 3.3 was released in 2011, 5 years ago. > > I would prefer to show deprecation warnings by default. But I know > that it's an old debate: developers vs users :-) I like to see my > users as potential developers ;-) This is tracked in this issue: http://bugs.python.org/issue24294 : DeprecationWarnings should be visible by default in the interactive REPL IPython has enabled them only if they come from __main__. From totally subjective experience, that has already pushed a few libraries to update their code to new APIs[1]. -- M [1] or sometimes to wrap code in ignore warnings... > > Victor From benhoyt at gmail.com Mon Feb 8 11:23:47 2016 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 8 Feb 2016 11:23:47 -0500 Subject: [Python-Dev] Improving docs for len() of set Message-ID: Hi folks, Just a suggestion for a documentation tweak. Currently the docs for len() on a set say this: .. describe:: len(s) Return the cardinality of set *s*. I'm a relatively seasoned programmer, but I don't really have a maths background, and I didn't know what "cardinality" meant. I could kind of grok it by context, but could we change this to something like the following? ..
describe:: len(s) Return the number of elements in set *s* (cardinality of *s*). Happy to open a bugs.python.org issue on this, but wanted to get general consensus first. -Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Feb 8 11:38:56 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 08 Feb 2016 08:38:56 -0800 Subject: [Python-Dev] Improving docs for len() of set In-Reply-To: References: Message-ID: <56B8C4A0.4010308@stoneleaf.us> On 02/08/2016 08:23 AM, Ben Hoyt wrote: > .. describe:: len(s) > > Return the number of elements in set *s* (cardinality of *s*). Return the number of elements (cardinality) of *s*. +1 -- ~Ethan~ From abarnert at yahoo.com Mon Feb 8 11:49:43 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 8 Feb 2016 16:49:43 +0000 (UTC) Subject: [Python-Dev] Improving docs for len() of set In-Reply-To: References: Message-ID: <1770987414.726640.1454950183061.JavaMail.yahoo@mail.yahoo.com> On Monday, February 8, 2016 8:23 AM, Ben Hoyt wrote: >Just a suggestion for a documentation tweak. Currently the docs for len() on a set say this: > > .. describe:: len(s)> > Return the cardinality of set *s*. > >I'm a relatively seasoned programmer, but I don't really have a maths background, and I didn't know what "cardinality" meant. I could kind of grok it by context, but could we change this to something like the following? > > .. describe:: len(s) > > Return the number of elements in set *s* (cardinality of *s*). 
+{{}} (using the normal von Neumann definitions for 0={} and Succ(n) = n U {n}) From tunedconsulting at gmail.com Mon Feb 8 11:48:28 2016 From: tunedconsulting at gmail.com (Lorenzo Moriondo) Date: Mon, 8 Feb 2016 17:48:28 +0100 Subject: [Python-Dev] Improving docs for len() of set In-Reply-To: <56B8C4A0.4010308@stoneleaf.us> References: <56B8C4A0.4010308@stoneleaf.us> Message-ID: @Ben I am not either a great fan of formal (mathematic) definitions (: Lorenzo Moriondo, from mobile ~~it.linkedin.com~in~lorenzomoriondo~~ On Feb 8, 2016 5:38 PM, "Ethan Furman" wrote: > > On 02/08/2016 08:23 AM, Ben Hoyt wrote: > > > .. describe:: len(s) > > > > Return the number of elements in set *s* (cardinality of *s*). > > Return the number of elements (cardinality) of *s*. > > +1 +1 to this for me as well > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tunedconsulting%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Feb 8 11:56:38 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 08 Feb 2016 08:56:38 -0800 Subject: [Python-Dev] Improving docs for len() of set In-Reply-To: <1770987414.726640.1454950183061.JavaMail.yahoo@mail.yahoo.com> References: <1770987414.726640.1454950183061.JavaMail.yahoo@mail.yahoo.com> Message-ID: <56B8C8C6.6030204@stoneleaf.us> On 02/08/2016 08:49 AM, Andrew Barnert via Python-Dev wrote: > +{{}} > > (using the normal von Neumann definitions for 0={} and > Succ(n) = n U {n}) I'm glad you know what you meant, 'cause I haven't got a clue! :) -- ~Ethan~ From brett at python.org Mon Feb 8 12:02:41 2016 From: brett at python.org (Brett Cannon) Date: Mon, 08 Feb 2016 17:02:41 +0000 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? 
In-Reply-To: References: Message-ID: On Mon, 8 Feb 2016 at 06:33 Victor Stinner wrote: > Hi, > > Since 3.3, functions of the os module started to emit > DeprecationWarning when called with bytes filenames. > > The rationale is quite simple: Windows native type for filenames is > Unicode, and the Windows has a weird behaviour when you use bytes. For > example, os.listdir(b'.') gives you paths which cannot be used with > open() on filenames which are not encodable the ANSI code page. > Unencodable characters are replaced with "?". The following issue was > opened to document this weird behaviour (but the doc was never > completed): > > "Document that bytes OS API can returns unusable results on Windows" > http://bugs.python.org/issue16700 > > > When the new os.scandir() API was designed, I asked to *not* support > bytes filenames since they are "broken by design". > https://www.python.org/dev/peps/pep-0471/ > > Recently, an user complained that os.walk() doesn't work with bytes on > Windows anymore: > > "Regression: os.walk now using os.scandir() breaks bytes filenames on > windows" > http://bugs.python.org/issue25911 > > > Serhiy Storchaka just pushed a change to reintroduce support bytes > support on Windows in os.walk(), but I would prefer to do the > *opposite*: drop supports for bytes filenames on Windows. > > Are we brave enough to force users to use the "right" type for filenames? > > -- > > On Python 2, it wasn't possible to use Unicode for filenames, many > functions fail badly with Unicode, especially when you mix bytes and > Unicode. > > On Python 3, Unicode is the "natural" types, most Python functions > prefer Unicode, and the PEP 383 (surrogateescape) allows to safetely > use Unicode on UNIX even with undecodable filenames (invalid bytes are > stored as Unicode surrogate characters). > If Unicode string don't work in Python 2 then what is Python 2/3 to do as a cross-platform solution if we completely remove bytes support in Python 3? 
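For readers unfamiliar with the PEP 383 mechanism quoted above: the ``surrogateescape`` error handler lets arbitrary bytes round-trip through ``str``, which is what makes Unicode filenames workable on POSIX. A quick sketch:

```python
raw = b"caf\xe9"  # b"\xe9" is not valid UTF-8 on its own

# Decoding smuggles the undecodable byte through as a lone surrogate...
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9"

# ...and encoding with the same handler restores the original bytes.
assert name.encode("utf-8", "surrogateescape") == raw

# os.fsdecode() / os.fsencode() apply this handler automatically on POSIX.
```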
Wouldn't that mean there is no common type between Python 2 & 3 that one can use which will work with the os module except native strings (which are difficult to get right)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Mon Feb 8 12:10:50 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 08 Feb 2016 12:10:50 -0500 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: <56B8CC1A.7090501@sdamon.com> On 2/8/2016 12:02, Brett Cannon wrote: > > > If Unicode strings don't work in Python 2 then what is Python 2/3 to do > as a cross-platform solution if we completely remove bytes support in > Python 3? Wouldn't that mean there is no common type between Python 2 > & 3 that one can use which will work with the os module except native > strings (which are difficult to get right)? The only solution then would be to do `if not PY3: arg = arg.encode(...); os.SOMEFUNC(arg)`, pardon my pseudocode. It's annoying, but at least it's not a language syntax change, which means it isn't intractable, just an annoying roadblock. If I had my druthers it would be put off until after 2.x is well and truly dead. From kevin.hong at hackillinois.org Mon Feb 8 12:19:15 2016 From: kevin.hong at hackillinois.org (Kevin Hong) Date: Mon, 8 Feb 2016 11:19:15 -0600 Subject: [Python-Dev] HackIllinois 2016 + Python Message-ID: Hi all! My name is Kevin and I am a staff member of HackIllinois, a 36-hour hackathon at the University of Illinois Urbana-Champaign where students from across the nation come to build some of the most innovative hardware and software projects. For highlights from last year's event, check out go.hackillinois.org/video. From February 19-21st 2016, HackIllinois returns and we are introducing a new initiative called OpenSource at HackIllinois to promote Open Source development during the event.
This program is designed to provide students with the opportunity to meet and collaborate with experienced developers, like you all, who serve as a guide and mentor into the open source world. Over the course of the event, you and your group of hackers will build features for an open source project of your choosing. Please see http://www.hackillinois.org/opensource for more details! If you or any other open source developers you work with are interested in learning more about OpenSource at HackIllinois, feel free to email me at kevin.hong at hackillinois.org. I look forward to speaking with you soon! Best Regards, Kevin Hong -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Mon Feb 8 12:24:26 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 08 Feb 2016 12:24:26 -0500 Subject: [Python-Dev] HackIllinois 2016 + Python In-Reply-To: References: Message-ID: <56B8CF4A.6010505@sdamon.com> Hello. You might want to post this in the psf-community list too. There are a lot of open source developers in the community who are not working directly on CPython (what this list is about). On 2/8/2016 12:19, Kevin Hong wrote: > Hi all! > > My name is Kevin and I am a staff member of HackIllinois, a 36-hour > hackathon at the University of Illinois Urbana-Champaign where > students from across the nation come to build some of the most > innovative hardware and software projects. For highlights from last > year's event, check out go.hackillinois.org/video. > > From February 19-21st 2016, HackIllinois returns and we are > introducing a new initiative called OpenSource at HackIllinois to promote > Open Source development during the event. This program is designed to > provide students with the opportunity to meet and collaborate with > experienced developers, like you all, who serve as a guide and mentor > into the open source world.
Over the course of the event, you and your > group of hackers will build features for an open source project of > your choosing. Please see http://www.hackillinois.org/opensource for > more details! > > If you or any other open source developers you work with are > interested in learning more about OpenSource at HackIllinois, feel free > to email me at kevin.hong at hackillinois.org > . I look forward to speaking with > you soon! > > Best Regards, > > Kevin Hong > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Feb 8 12:33:27 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 8 Feb 2016 12:33:27 -0500 Subject: [Python-Dev] PEP 0492 __aenter__ & __aexit__ In-Reply-To: References: Message-ID: <56B8D167.8090505@gmail.com> Brett, Do you think we should update PEP 492 with links to the docs? I'm thinking of adding a new section to the top. Yury On 2016-02-06 5:05 PM, Brett Cannon wrote: > > > On Sat, 6 Feb 2016 at 13:50 Daniel Miller > wrote: > > Hi Python-Dev Group, > > I am trying to implement __aenter__ and __aexit__ for the > RethinkDB Python driver. Looking at the > PEP I don't see any definitions as to what the expected parameters > that __exit__ are supposed to take and couldn't find any other > similar implementations. Is there a piece of documentation I > should be looking at that I'm missing? > > https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with > > > The arguments to __aexit__ are the same as __exit__ in a normal > context manager. See > https://docs.python.org/3.5/reference/datamodel.html#object.__aexit__ for > the official docs for __aexit__. 
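[Editor's note: since the quoted answer is what most readers land on this thread for, here is a minimal runnable sketch of it. The `Connection` class is hypothetical, not the RethinkDB driver's actual API; the point is that `__aexit__` takes the same `(exc_type, exc_value, traceback)` arguments as a synchronous `__exit__`, and that both hooks are awaited by `async with`.]

```python
import asyncio

class Connection:
    """Hypothetical async context manager (not RethinkDB's real API)."""

    async def __aenter__(self):
        # A real driver would await the actual connect here.
        self.open = True
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Same signature as __exit__; awaited on block exit, even on error.
        self.open = False
        return False  # returning a falsy value lets exceptions propagate

async def main():
    async with Connection() as conn:
        assert conn.open
    return conn.open  # False: __aexit__ ran when the block ended

print(asyncio.run(main()))
```

On the 3.5-era interpreters discussed in this thread, `asyncio.run()` (added in 3.7) would be `asyncio.get_event_loop().run_until_complete(main())` instead.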
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com From victor.stinner at gmail.com Mon Feb 8 12:44:19 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 18:44:19 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement Message-ID: Hi, I changed the Python compiler to ignore any kind of "constant expression", whereas it only ignored strings and integers before: http://bugs.python.org/issue26204 The compiler now also emits a SyntaxWarning in such cases. IMHO the warning can help to detect bugs for developers who have just learnt Python. The warning is *not* emitted for strings, since triple-quoted strings are a common syntax for multiline comments. The warning is also *not* emitted for the ellipsis (...) since "def f(): ..." is a legit syntax for an abstract function. Are you ok with the new warning? New behaviour: haypo at smithers$ ./python Python 3.6.0a0 (default:759a975e1230, Feb 8 2016, 18:21:23) >>> def f(): ... False ... :2: SyntaxWarning: ignore constant statement >>> import dis; dis.dis(f) 2 0 LOAD_CONST 0 (None) 3 RETURN_VALUE Old behaviour: haypo at smithers$ python3 Python 3.4.3 (default, Jun 29 2015, 12:16:01) >>> def f(): ... False ... >>> import dis; dis.dis(f) 2 0 LOAD_CONST 1 (False) 3 POP_TOP 4 LOAD_CONST 0 (None) 7 RETURN_VALUE Before, strings and numbers were already ignored. Example: haypo at smithers$ python3 Python 3.4.3 (default, Jun 29 2015, 12:16:01) >>> def f(): ... 123 ...
>>> import dis; dis.dis(f) 2 0 LOAD_CONST 0 (None) 3 RETURN_VALUE Victor From francismb at email.de Mon Feb 8 13:00:49 2016 From: francismb at email.de (francismb) Date: Mon, 8 Feb 2016 19:00:49 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: <56B8D7D1.2080805@email.de> Hi, On 02/08/2016 06:44 PM, Victor Stinner wrote: > Hi, > > I changed the Python compiler to ignore any kind "constant > expressions", whereas it only ignored strings and integers before: > http://bugs.python.org/issue26204 > > The compiler now also emits a SyntaxWarning on such case. IMHO the > warning can help to detect bugs for developers who just learnt Python. > [...] > New behaviour: > > haypo at smithers$ ./python > Python 3.6.0a0 (default:759a975e1230, Feb 8 2016, 18:21:23) >>>> def f(): > ... False > ... > :2: SyntaxWarning: ignore constant statement > Just for my understanding: What would happen if someone has functions where some return constant expressions and others not, and those functions are then used depending on some other context? E.g.: def behaviour2(ctx): return 1 def behaviour1(ctx): return some_calculation_with(ctx) [...] if ... : return behaviour1(ctx) else : return behaviour2() Is that going to raise a warning? Thanks in advance! francis From greg at krypto.org Mon Feb 8 13:00:59 2016 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 08 Feb 2016 18:00:59 +0000 Subject: [Python-Dev] Improving docs for len() of set In-Reply-To: References: Message-ID: On Mon, Feb 8, 2016 at 8:24 AM Ben Hoyt wrote: > Hi folks, > > Just a suggestion for a documentation tweak. Currently the docs for len() > on a set say this: > > .. describe:: len(s) > > Return the cardinality of set *s*. > > I'm a relatively seasoned programmer, but I don't really have a maths > background, and I didn't know what "cardinality" meant.
I could kind of > grok it by context, but could we change this to something like the > following? > > .. describe:: len(s) > > Return the number of elements in set *s* (cardinality of *s*). > > Agreed. Done. :) -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Mon Feb 8 13:06:17 2016 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 08 Feb 2016 18:06:17 +0000 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: On Mon, Feb 8, 2016 at 9:44 AM Victor Stinner wrote: > Hi, > > I changed the Python compiler to ignore any kind "constant > expressions", whereas it only ignored strings and integers before: > http://bugs.python.org/issue26204 > > The compiler now also emits a SyntaxWarning on such case. IMHO the > warning can help to detect bugs for developers who just learnt Python. > > The warning is *not* emited for strings, since triple quoted strings > are a common syntax for multiline comments. > > The warning is *not* emited neither for ellispis (...) since "f(): > ..." is a legit syntax for abstract function. > > Are you ok with the new warning? > I'm +1 on this. -gps > > > New behaviour: > > haypo at smithers$ ./python > Python 3.6.0a0 (default:759a975e1230, Feb 8 2016, 18:21:23) > >>> def f(): > ... False > ... > :2: SyntaxWarning: ignore constant statement > > >>> import dis; dis.dis(f) > 2 0 LOAD_CONST 0 (None) > 3 RETURN_VALUE > > > Old behaviour: > > haypo at smithers$ python3 > Python 3.4.3 (default, Jun 29 2015, 12:16:01) > >>> def f(): > ... False > ... > >>> import dis; dis.dis(f) > 2 0 LOAD_CONST 1 (False) > 3 POP_TOP > 4 LOAD_CONST 0 (None) > 7 RETURN_VALUE > > > > Before strings and numbers were already ignored. Example: > > haypo at smithers$ python3 > Python 3.4.3 (default, Jun 29 2015, 12:16:01) > > >>> def f(): > ... 123 > ... 
> >>> import dis; dis.dis(f) > 2 0 LOAD_CONST 0 (None) > 3 RETURN_VALUE > > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Feb 8 13:12:57 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 08 Feb 2016 10:12:57 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B8D7D1.2080805@email.de> References: <56B8D7D1.2080805@email.de> Message-ID: <56B8DAA9.4030105@stoneleaf.us> On 02/08/2016 10:00 AM, francismb wrote: > On 02/08/2016 06:44 PM, Victor Stinner wrote: >> I changed the Python compiler to ignore any kind "constant >> expressions", whereas it only ignored strings and integers before: >> http://bugs.python.org/issue26204 >> >> The compiler now also emits a SyntaxWarning on such case. IMHO the >> warning can help to detect bugs for developers who just learnt >> Python. >> > [...] >> New behaviour: >> >> haypo at smithers$ ./python >> Python 3.6.0a0 (default:759a975e1230, Feb 8 2016, 18:21:23) >> --> def f(): >> ... False >> ... >> :2: SyntaxWarning: ignore constant statement > > Just for my understanding: > > What would happen if someone has functions where some return > constant expressions and others not and then that functions > are used depending on some other context. E.g: > > def behaviour2(ctx): > return 1 > > def behaviour1(ctx): > return some_calculation_with(ctx) > > > [...] > > if ... : > return behaviour1(ctx) > else : > return behaviour2() > > > Is that going to raise a warning? No, because those constants are being used (returned). Only constants that aren't used at all get omitted. 
-- ~Ethan~ From ethan at stoneleaf.us Mon Feb 8 13:15:02 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 08 Feb 2016 10:15:02 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: <56B8DB26.2030006@stoneleaf.us> On 02/08/2016 09:44 AM, Victor Stinner wrote: > Are you ok with the new warning? +1 -- ~Ethan~ From xiscu at email.de Mon Feb 8 13:11:34 2016 From: xiscu at email.de (xiscu) Date: Mon, 8 Feb 2016 19:11:34 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B8D7D1.2080805@email.de> References: <56B8D7D1.2080805@email.de> Message-ID: <56B8DA56.7020403@email.de> >> New behaviour: >> >> haypo at smithers$ ./python >> Python 3.6.0a0 (default:759a975e1230, Feb 8 2016, 18:21:23) >>>>> def f(): >> ... False >> ... Ok, I see in your case there's no return :-) From abarnert at yahoo.com Mon Feb 8 13:16:10 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 8 Feb 2016 18:16:10 +0000 (UTC) Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: <56B8CC1A.7090501@sdamon.com> References: <56B8CC1A.7090501@sdamon.com> Message-ID: <1266180511.802466.1454955370986.JavaMail.yahoo@mail.yahoo.com> On Monday, February 8, 2016 9:11 AM, Alexander Walters wrote: > > On 2/8/2016 12:02, Brett Cannon wrote: >> >> >> If Unicode string don't work in Python 2 then what is Python 2/3 to do >> as a cross-platform solution if we completely remove bytes support in >> Python 3? Wouldn't that mean there is no common type between Python 2 >> & 3 that one can use which will work with the os module except native >> strings (which are difficult to get right)? > > The only solution then would be to do `if not PY3: arg = > arg.encode(...);; os.SOMEFUNC(arg)`, pardon my psudocode. That's exactly what you _don't_ want to do. More generally, the assumption here is wrong. 
It's not true that you can't use Unicode for Windows filenames on Python 2. What is true is that you have to be a lot more careful about using Unicode _consistently_. And that Python 2 gives you very little help in doing so. And some third-party modules may make it harder on you. But if you always use unicode, `os.listdir(u'.')` calls FindFirstFileW instead of FindFirstFileA and gives you back unicode filenames, os.stat or open call _wstat or _wopen with those unicode filenames, etc. The problem is that on POSIX, you're often better off using str everywhere, because Python 2.7 doesn't do surrogate escape. And once you're using str on one platform/unicode on the other for filenames, it gets very easy to mix str and unicode in other places (like strings you want to print out for the user or store in a database), and then you're in mojibake hell. The io module, the pathlib backport, and six can help a bit (at the cost of performance and/or simplicity), but there's no easy answer--if there _were_ an easy answer, we wouldn't have Python 3 in the first place, right? From brett at python.org Mon Feb 8 13:25:15 2016 From: brett at python.org (Brett Cannon) Date: Mon, 08 Feb 2016 18:25:15 +0000 Subject: [Python-Dev] PEP 0492 __aenter__ & __aexit__ In-Reply-To: <56B8D167.8090505@gmail.com> References: <56B8D167.8090505@gmail.com> Message-ID: On Mon, 8 Feb 2016 at 09:34 Yury Selivanov wrote: > Brett, > > Do you think we should update PEP 492 with links to the docs? I'm > thinking of adding a new section to the top. > Probably. Links around the Internet, search engines, etc. will point to the PEP for a while, and so knowing that the most up-to-date info is actually the docs and not the PEP would be good. I honestly just know all of this stuff because of a massive blog post I'm writing on async/await ATM.
-Brett > > Yury > > On 2016-02-06 5:05 PM, Brett Cannon wrote: > > > > > > On Sat, 6 Feb 2016 at 13:50 Daniel Miller > > wrote: > > > > Hi Python-Dev Group, > > > > I am trying to implement __aenter__ and __aexit__ for the > > RethinkDB Python driver. Looking at the > > PEP I don't see any definitions as to what the expected parameters > > that __exit__ are supposed to take and couldn't find any other > > similar implementations. Is there a piece of documentation I > > should be looking at that I'm missing? > > > > > https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with > > > > > > The arguments to __aexit__ are the same as __exit__ in a normal > > context manager. See > > https://docs.python.org/3.5/reference/datamodel.html#object.__aexit__ > for > > the official docs for __aexit__. > > > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Feb 8 13:26:32 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 8 Feb 2016 18:26:32 +0000 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: On 8 February 2016 at 14:32, Victor Stinner wrote: > Since 3.3, functions of the os module started to emit > DeprecationWarning when called with bytes filenames. Everywhere? Or just on Windows? I can't tell from your email and I don't have a Unix system to hand to check. 
> The rationale is quite simple: Windows native type for filenames is > Unicode, and Windows has weird behaviour when you use bytes. For > example, os.listdir(b'.') gives you paths which cannot be used with > open() on filenames which are not encodable to the ANSI code page. > Unencodable characters are replaced with "?". The following issue was > opened to document this weird behaviour (but the doc was never > completed): > > "Document that bytes OS API can returns unusable results on Windows" > http://bugs.python.org/issue16700 OK, that seems fine, but obviously of limited interest to Unix users who aren't worried about cross-platform portability :-) > When the new os.scandir() API was designed, I asked to *not* support > bytes filenames since they are "broken by design". > https://www.python.org/dev/peps/pep-0471/ > > Recently, a user complained that os.walk() doesn't work with bytes on > Windows anymore: > > "Regression: os.walk now using os.scandir() breaks bytes filenames on windows" > http://bugs.python.org/issue25911 > > Serhiy Storchaka just pushed a change to reintroduce bytes > support on Windows in os.walk(), but I would prefer to do the > *opposite*: drop support for bytes filenames on Windows. But leave those APIs as Unix only? That seems like a regression, too (sure, the bytes APIs are problematic on Windows, but only for certain characters AIUI). Windows users currently using programs written using the bytes API (presumably originally intended for Unix where the bytes API was a deliberate choice), who don't hit any encoding issues currently, will see those programs broken for no reason other than "users using different character sets than you may have been hitting issues before". That seems like a weird justification to me... > Are we brave enough to force users to use the "right" type for filenames? If it were *all* users I'd say it's worth considering.
But practicality beats purity here IMO, and I feel that allowing people's code to be "portable by default" is a more important goal than enforcing encoding purity on a single platform. Paul From stephane at wirtel.be Mon Feb 8 13:34:43 2016 From: stephane at wirtel.be (Stephane Wirtel) Date: Mon, 8 Feb 2016 19:34:43 +0100 Subject: [Python-Dev] Syntax Highlightning for C API of CPython in VIM Message-ID: <20160208183443.GA27740@sg1> Hi everyone, With my talk "Exploring our Python Interpreter", I think this VIM plugin can be useful for the community. It's a syntax highlighter for the C API of CPython 3.5 and 3.6. I used Clang for the parsing and automatically generated the keywords for VIM. PyObject and the other typedefs of CPython will have the defined color of your favourite editor, and it's the same for the enums, the typedefs, the functions and the macros. Where can you use this VIM plugin? If you want to write a CPython extension or if you want to hack in the CPython code. Check this screenshot: http://i.imgur.com/0k13KOU.png Here is the repository: https://github.com/matrixise/cpython-vim-syntax Please, if you see some issues, tell me via an issue on Github. Thank you so much, Stephane -- Stéphane Wirtel - http://wirtel.be - @matrixise From yselivanov.ml at gmail.com Mon Feb 8 14:09:13 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 8 Feb 2016 14:09:13 -0500 Subject: [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: <56B8E7D9.5060302@gmail.com> Zachary, Do you run the benchmarks in rigorous mode? Yury On 2016-02-04 1:48 AM, Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out.
> When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. > > Happy benchmarking, From guido at python.org Mon Feb 8 14:13:56 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Feb 2016 11:13:56 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: On Mon, Feb 8, 2016 at 9:44 AM, Victor Stinner wrote: > I changed the Python compiler to ignore any kind "constant > expressions", whereas it only ignored strings and integers before: > http://bugs.python.org/issue26204 > > The compiler now also emits a SyntaxWarning on such case. IMHO the > warning can help to detect bugs for developers who just learnt Python. Hum. I'm not excited by this idea. It is not bad syntax. Have you actually seen newbies who were confused by such things? -- --Guido van Rossum (python.org/~guido) From zachary.ware+pydev at gmail.com Mon Feb 8 14:33:48 2016 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Mon, 8 Feb 2016 13:33:48 -0600 Subject: [Python-Dev] speed.python.org In-Reply-To: <56B8E7D9.5060302@gmail.com> References: <56B8E7D9.5060302@gmail.com> Message-ID: On Mon, Feb 8, 2016 at 1:09 PM, Yury Selivanov wrote: > Zachary, > > Do you run the benchmarks in rigorous mode? Not currently. I think I need to reschedule when the benchmarks are run anyway, to avoid conflicts with PyPy's usage of that box, and will add rigorous mode when I do that. 
-- Zach From abarnert at yahoo.com Mon Feb 8 14:46:26 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 8 Feb 2016 11:46:26 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: <35855860-403B-4F19-AEB6-39881B4D699E@yahoo.com> > On Feb 8, 2016, at 11:13, Guido van Rossum wrote: > >> On Mon, Feb 8, 2016 at 9:44 AM, Victor Stinner wrote: >> I changed the Python compiler to ignore any kind "constant >> expressions", whereas it only ignored strings and integers before: >> http://bugs.python.org/issue26204 >> >> The compiler now also emits a SyntaxWarning on such case. IMHO the >> warning can help to detect bugs for developers who just learnt Python. > > Hum. I'm not excited by this idea. It is not bad syntax. Have you > actually seen newbies who were confused by such things? This does overlap to some extent with a problem that newbies *do* get confused by (and that transplants from Ruby don't find confusing, but do keep forgetting): writing an expression as the last statement in a function and then getting a TypeError or AttributeError about NoneType from the caller. Victor's example of a function that was presumably meant to return False, but instead just evaluates False and returns None, does happen. But often, that last expression isn't a constant, but something like self.y - self.x. So I'm not sure how much this warning would help that case. In fact, it might add to the confusion if sometimes you get a warning and sometimes you don't. (And you wouldn't want a warning about any function with no return whose last statement is an expression, because often that's perfectly reasonable code, where the expression is a mutating method call, like self.spam.append(arg).) 
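[Editor's note: Andrew's NoneType pitfall in miniature. The `Rect` class is a made-up illustration; note that the buggy last expression `self.y - self.x` is not a constant, so the proposed SyntaxWarning would not fire on it, which is exactly his point.]

```python
class Rect:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def height(self):
        self.y - self.x        # bug: missing "return"; computed, then discarded

    def height_fixed(self):
        return self.y - self.x

r = Rect(2, 5)
print(r.height())        # None -- the error only surfaces later, at the caller
print(r.height_fixed())  # 3
```

A caller doing e.g. `r.height() + 1` would get a TypeError about NoneType far from the actual mistake.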
Also, there are plenty of other common newbie/transplant problems that are similar to this one but can't be caught with a warning, like just referencing a function or method instead of calling it because you left the parens off. That's *usually* a bug, but not always--it could be a LBYL check for an attribute's presence, for example. From victor.stinner at gmail.com Mon Feb 8 14:51:37 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 20:51:37 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: Le 8 févr. 2016 8:14 PM, "Guido van Rossum" a écrit : > Hum. I'm not excited by this idea. It is not bad syntax. Do you see a use case for "constant statements" other than strings and ellipsis? Such a statement does nothing. Previously the compiler emitted LOAD_CONST+POP_TOP. GCC also emits a warning on such code. > Have you > actually seen newbies who were confused by such things? Well, not really. But I don't see any use case for such code except obvious mistakes. Sometimes such code appears after multiple refactorings (and mistakes). Are you suggesting to remove the warning? Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From benhoyt at gmail.com Mon Feb 8 15:00:37 2016 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 8 Feb 2016 15:00:37 -0500 Subject: [Python-Dev] Improving docs for len() of set In-Reply-To: References: Message-ID: Thanks! Commit ref: https://hg.python.org/cpython/rev/a67fda8e33b0 -Ben On Mon, Feb 8, 2016 at 1:00 PM, Gregory P. Smith wrote: > > > On Mon, Feb 8, 2016 at 8:24 AM Ben Hoyt wrote: > >> Hi folks, >> >> Just a suggestion for a documentation tweak. Currently the docs for len() >> on a set say this: >> >> .. describe:: len(s) >> >> Return the cardinality of set *s*. >> >> I'm a relatively seasoned programmer, but I don't really have a maths >> background, and I didn't know what "cardinality" meant.
I could kind of >> grok it by context, but could we change this to something like the >> following? >> >> .. describe:: len(s) >> >> Return the number of elements in set *s* (cardinality of *s*). >> >> > Agreed. Done. :) > > -gps > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Mon Feb 8 15:09:22 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 08 Feb 2016 15:09:22 -0500 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: <56B8F5F2.6030602@sdamon.com> I am not keen on a SyntaxWarning. Either something is python syntax, or it is not. This warning catches something linters have been catching for ages. I really don't see the value in adding this, and can see it causing more confusion than it solves. In the #python irc channel, we see quite a few newbie mistakes, but declaring a constant that isn't used is rarely if ever one of them. On 2/8/2016 12:44, Victor Stinner wrote: > Hi, > > I changed the Python compiler to ignore any kind "constant > expressions", whereas it only ignored strings and integers before: > http://bugs.python.org/issue26204 > > The compiler now also emits a SyntaxWarning on such case. IMHO the > warning can help to detect bugs for developers who just learnt Python. > > The warning is *not* emited for strings, since triple quoted strings > are a common syntax for multiline comments. > > The warning is *not* emited neither for ellispis (...) since "f(): > ..." is a legit syntax for abstract function. > > Are you ok with the new warning? > > > New behaviour: > > haypo at smithers$ ./python > Python 3.6.0a0 (default:759a975e1230, Feb 8 2016, 18:21:23) >>>> def f(): > ... False > ... 
> :2: SyntaxWarning: ignore constant statement > >>>> import dis; dis.dis(f) > 2 0 LOAD_CONST 0 (None) > 3 RETURN_VALUE > > > Old behaviour: > > haypo at smithers$ python3 > Python 3.4.3 (default, Jun 29 2015, 12:16:01) >>>> def f(): > ... False > ... >>>> import dis; dis.dis(f) > 2 0 LOAD_CONST 1 (False) > 3 POP_TOP > 4 LOAD_CONST 0 (None) > 7 RETURN_VALUE > > > > Before strings and numbers were already ignored. Example: > > haypo at smithers$ python3 > Python 3.4.3 (default, Jun 29 2015, 12:16:01) > >>>> def f(): > ... 123 > ... >>>> import dis; dis.dis(f) > 2 0 LOAD_CONST 0 (None) > 3 RETURN_VALUE > > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com From guido at python.org Mon Feb 8 15:34:22 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Feb 2016 12:34:22 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: On Mon, Feb 8, 2016 at 11:51 AM, Victor Stinner wrote: > Le 8 f?vr. 2016 8:14 PM, "Guido van Rossum" a ?crit : >> Hum. I'm not excited by this idea. It is not bad syntax. > > Do you see an use case for "constant statements" other than strings and > ellipsis? The same use case as for all dead code: it could be a placeholder for something better in the future. It could also be generated code where the generator expects the optimizer to remove it (or doesn't care). If you want to do linter integration that should probably be integrated with the user's editor, like it is in PyCharm, and IIUC people can do this in e.g. Emacs, Sublime or Vim as well. Leave the interpreter alone. > Such statement does nothing. Previously the compiler emited > LOAD_CONST+POP_TOP. > > GCC also emits a warning on such code. 
> >> Have you >> actually seen newbies who were confused by such things? > > Well, not really. But I don't see any use case of such code except of > obvious mistakes. Sometimes such code appears after multiple refactoring > (and mistakes). > > Are you suggesting to remove the warning? I haven't seen this warning yet. I take it this is new in the 3.6 branch? -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Feb 8 15:39:00 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 08 Feb 2016 12:39:00 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B8DB26.2030006@stoneleaf.us> References: <56B8DB26.2030006@stoneleaf.us> Message-ID: <56B8FCE4.7090001@stoneleaf.us> On 02/08/2016 10:15 AM, Ethan Furman wrote: > On 02/08/2016 09:44 AM, Victor Stinner wrote: > > > Are you ok with the new warning? > > +1 Changing my vote: -1 on the warning +0 on simply removing the unused constant -- ~Ethan~ From chris.barker at noaa.gov Mon Feb 8 15:41:28 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 8 Feb 2016 12:41:28 -0800 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: On Mon, Feb 8, 2016 at 6:32 AM, Victor Stinner wrote: > Windows native type for filenames is > Unicode, and the Windows has a weird behaviour when you use bytes. Just to clarify -- what does it currently do for bytes? IIUC, Windows uses UTF-16, so can you pass in UTF-16 bytes? Or when using bytes is it assuming some Windows ANSI-compatible encoding? (and what does it return?) Are we brave enough to force users to use the "right" type for filenames? > I think so :-) On Python 2, it wasn't possible to use Unicode for filenames, many > functions fail badly with Unicode, I've had fine success using Unicode filenames with py2 on Windows -- in fact, as soon as my users have non-ansi characters in their names I'm pretty sure I have no choice....
especially when you mix bytes and > Unicode. well yes, that sure does get ugly! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Mon Feb 8 16:20:24 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 22:20:24 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: Le 8 févr. 2016 9:34 PM, "Guido van Rossum" a écrit : > If you want to do linter integration that should probably be > integrated with the user's editor, like it is in PyCharm, and IIUC > people can do this in e.g. Emacs, Sublime or Vim as well. Leave the > interpreter alone. In GCC, warnings are welcome because it does one thing: compile code. GCC is used by developers. Users use the produced binary. In Python, it's different because it both compiles and runs code. It's used by developers and users. It's more tricky to make choices like whether or not to show deprecation warnings. It looks like most Python developers prefer to use an external linter. I don't really care about the warning, I will remove it. > I haven't seen this warning yet. I take it this is new in the 3.6 branch? Yes, it's a recent change in the default branch (a few hours ago). Victor From victor.stinner at gmail.com Mon Feb 8 16:23:21 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 22:23:21 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B8F5F2.6030602@sdamon.com> References: <56B8F5F2.6030602@sdamon.com> Message-ID: Le 8 févr.
2016 9:10 PM, "Alexander Walters" a écrit : > > I am not keen on a SyntaxWarning. Either something is Python syntax, or it is not. Oh, I forgot to mention that Python already emits SyntaxWarning, on "assert True" for example. Victor From guido at python.org Mon Feb 8 16:27:23 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Feb 2016 13:27:23 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: On Mon, Feb 8, 2016 at 1:20 PM, Victor Stinner wrote: > Le 8 févr. 2016 9:34 PM, "Guido van Rossum" a écrit : >> If you want to do linter integration that should probably be >> integrated with the user's editor, like it is in PyCharm, and IIUC >> people can do this in e.g. Emacs, Sublime or Vim as well. Leave the >> interpreter alone. > > In GCC, warnings are welcome because it does one thing: compile code. GCC is > used by developers. Users use the produced binary. > > In Python, it's different because it both compiles and runs code. It's used > by developers and users. > > It's more tricky to make choices like whether or not to show deprecation warnings. > > It looks like most Python developers prefer to use an external linter. > > I don't really care about the warning, I will remove it. Thanks! -- --Guido van Rossum (python.org/~guido) From tritium-list at sdamon.com Mon Feb 8 16:28:31 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 08 Feb 2016 16:28:31 -0500 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> Message-ID: <56B9087F.8090801@sdamon.com> What incantation do you need to do to make that behavior apparent? tritium at gesa:~$ python3.5 -W all Python 3.5.1 (default, Dec 18 2015, 02:15:10) [GCC 4.6.3] on linux Type "help", "copyright", "credits" or "license" for more information.
Jedi is not installed, falling back to readline >>> assert True >>> On 2/8/2016 16:23, Victor Stinner wrote: > > > Le 8 févr. 2016 9:10 PM, "Alexander Walters" > a écrit : > > > > I am not keen on a SyntaxWarning. Either something is Python > syntax, or it is not. > > Oh, I forgot to mention that Python already emits SyntaxWarning, on > "assert True" for example. > > Victor From jayvdb at gmail.com Mon Feb 8 16:37:01 2016 From: jayvdb at gmail.com (John Mark Vandenberg) Date: Tue, 9 Feb 2016 08:37:01 +1100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: On Tue, Feb 9, 2016 at 8:20 AM, Victor Stinner wrote: > Le 8 févr. 2016 9:34 PM, "Guido van Rossum" a écrit : >> If you want to do linter integration that should probably be >> integrated with the user's editor, like it is in PyCharm, and IIUC >> people can do this in e.g. Emacs, Sublime or Vim as well. Leave the >> interpreter alone. > > In GCC, warnings are welcome because it does one thing: compile code. GCC is > used by developers. Users use the produced binary. > > In Python, it's different because it both compiles and runs code. It's used > by developers and users. > > It's more tricky to make choices like whether or not to show deprecation warnings. > > It looks like most Python developers prefer to use an external linter. fwiw, pyflakes doesn't detect this.
I've created a bug for that https://bugs.launchpad.net/pyflakes/+bug/1543246 -- John Vandenberg From tritium-list at sdamon.com Mon Feb 8 16:41:13 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 08 Feb 2016 16:41:13 -0500 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: <56B90B79.8080706@sdamon.com> On 2/8/2016 16:37, John Mark Vandenberg wrote: > fwiw, pyflakes doesn't detect this. I've created a bug for that > https://bugs.launchpad.net/pyflakes/+bug/1543246 Flake8 does, so it might be in the ... poorly named ... pep8 checker. From jayvdb at gmail.com Mon Feb 8 16:48:00 2016 From: jayvdb at gmail.com (John Mark Vandenberg) Date: Tue, 9 Feb 2016 08:48:00 +1100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B90B79.8080706@sdamon.com> References: <56B90B79.8080706@sdamon.com> Message-ID: On Tue, Feb 9, 2016 at 8:41 AM, Alexander Walters wrote: > > > On 2/8/2016 16:37, John Mark Vandenberg wrote: >> >> fwiw, pyflakes doesn't detect this. I've created a bug for that >> https://bugs.launchpad.net/pyflakes/+bug/1543246 > > Flake8 does, so it might be in the ... poorly named ... pep8 checker. I believe the pep8 checker does not have a check for this. Could you confirm which flake8 code you are seeing? -- John Vandenberg From victor.stinner at gmail.com Mon Feb 8 16:51:50 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 8 Feb 2016 22:51:50 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B9087F.8090801@sdamon.com> References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> Message-ID: 2016-02-08 22:28 GMT+01:00 Alexander Walters : > What incantation do you need to do to make that behavior apparent? I didn't know. I just checked.
It's assert used with a non-empty tuple: >>> assert ("tuple",) <stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses? Victor From jayvdb at gmail.com Mon Feb 8 17:14:36 2016 From: jayvdb at gmail.com (John Mark Vandenberg) Date: Tue, 9 Feb 2016 09:14:36 +1100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> Message-ID: On Tue, Feb 9, 2016 at 8:51 AM, Victor Stinner wrote: > 2016-02-08 22:28 GMT+01:00 Alexander Walters : >> What incantation do you need to do to make that behavior apparent? > > I didn't know. I just checked. It's assert used with a non-empty tuple: > >>>> assert ("tuple",) > <stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses? And pyflakes also has a check for this, but it is similarly tight. https://github.com/pyflakes/pyflakes/pull/51 It seems that the pyflakes maintainers tend to only accept patches for scenarios where Python emits a SyntaxWarning. So it is a bit of a catch-22 there wrt unused constants. pylint of course reports these unused constants with its message id "pointless-statement". -- John Vandenberg From tjreedy at udel.edu Mon Feb 8 17:19:06 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 8 Feb 2016 17:19:06 -0500 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> Message-ID: On 2/8/2016 4:51 PM, Victor Stinner wrote: > 2016-02-08 22:28 GMT+01:00 Alexander Walters : >> What incantation do you need to do to make that behavior apparent? > > I didn't know. I just checked. It's assert used with a non-empty tuple: > >>>> assert ("tuple",) > <stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses? I think this should be left to linters also.
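The "leave it to linters" check is easy to prototype: the unused-constant detection the thread discusses can be sketched with the stdlib `ast` module (Python 3.8+ for `ast.Constant`). The helper name `find_constant_statements` is hypothetical, not pyflakes' or pylint's actual implementation; like Victor's patch, it lets strings and `...` pass.

```python
import ast

def find_constant_statements(source):
    """Return (lineno, value) pairs for expression statements that are
    bare constants, i.e. statements with no effect.

    Mirroring the compiler change discussed in this thread, string
    constants (docstrings and triple-quoted "comments") and the bare
    ``...`` used as a stub body are not reported.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Expr):
            continue  # only statements that are a lone expression
        value = node.value
        if not isinstance(value, ast.Constant):
            continue  # only literal constants, not folded expressions
        if isinstance(value.value, str) or value.value is Ellipsis:
            continue  # allowed: docstrings, "comments", stub bodies
        findings.append((node.lineno, value.value))
    return findings

# The example from the start of the thread: a bare ``False`` statement.
print(find_constant_statements("def f():\n    False\n"))   # [(2, False)]
print(find_constant_statements("def f():\n    ...\n"))     # []
```

This reports the statement but, unlike the compiler patch, leaves the code alone, which is exactly the division of labor Guido argues for.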
-- Terry Jan Reedy From yselivanov.ml at gmail.com Mon Feb 8 17:43:25 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 8 Feb 2016 17:43:25 -0500 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> Message-ID: <56B91A0D.5030003@gmail.com> On 2016-02-08 5:19 PM, Terry Reedy wrote: > On 2/8/2016 4:51 PM, Victor Stinner wrote: >> 2016-02-08 22:28 GMT+01:00 Alexander Walters : >>> What incantation do you need to do to make that behavior apparent? >> >> I didn't know. I just checked. It's assert used with a non-empty tuple: >> >>>>> assert ("tuple",) >> <stdin>:1: SyntaxWarning: assertion is always true, perhaps remove >> parentheses? > > I think this should be left to linters also. > I agree. I'd remove that warning. Yury From chris.barker at noaa.gov Mon Feb 8 18:21:47 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 8 Feb 2016 15:21:47 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> Message-ID: On Mon, Feb 8, 2016 at 1:51 PM, Victor Stinner wrote: > I didn't know. I just checked. It's assert used with a non-empty tuple: > > >>> assert ("tuple",) > which is more interesting with a tuple without the parentheses: In [4]: t = True, In [5]: t Out[5]: (True,) works fine, but not if you use an assert: In [7]: assert True, File "", line 1 assert True, ^ SyntaxError: invalid syntax I actually like the Warning with the note about the problem better: :1: SyntaxWarning: assertion is always true, perhaps remove parentheses? And, of course, more relevant with something Falsey in the tuple: In [14]: assert (False,) :1: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert (False,) But I am curious why you get a different error without the parens?
-CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From python at mrabarnett.plus.com Mon Feb 8 18:48:39 2016 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 8 Feb 2016 23:48:39 +0000 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> Message-ID: <56B92957.5000204@mrabarnett.plus.com> On 2016-02-08 23:21, Chris Barker wrote: > On Mon, Feb 8, 2016 at 1:51 PM, Victor Stinner > wrote: > > I didn't know. I just checked. It's assert used with a non-empty tuple: > > >>> assert ("tuple",) > > > which is more interesting with a tuple without the parentheses: > > In [4]: t = True, > > In [5]: t > > Out[5]: (True,) > > works fine, but not if you use an assert: > > In [7]: assert True, > > File "", line 1 > > assert True, > > ^ > > SyntaxError: invalid syntax > > I actually like the Warning with the note about the problem better: > > :1: SyntaxWarning: assertion is always true, perhaps remove > parentheses? > > > And, of course, more relevant with something Falsey in the tuple: > > In [14]: assert (False,) > > :1: SyntaxWarning: assertion is always > true, perhaps remove parentheses? > > assert (False,) > > But I am curious why you get a different error without the parens? > Try: help('assert') You'll see that in "assert (True,)", the tuple (an object) is the first condition (and probably a mistake), whereas in "assert True,", the True is the condition and the second expression (after the comma) is missing.
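The two spellings MRAB contrasts can also be compared programmatically: `compile()` routes this compile-time SyntaxWarning through the `warnings` machinery, so a few lines show which form warns without typing at a REPL (a sketch; the exact message text belongs to CPython).

```python
import warnings

def syntax_warnings_for(source):
    """Compile *source* and collect any SyntaxWarning the compiler emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught
            if issubclass(w.category, SyntaxWarning)]

# The parenthesized form builds a non-empty tuple: the tuple itself is the
# (always true) condition, so the compiler warns.
print(syntax_warnings_for('assert (1 == 2, "arithmetic is broken")'))

# The comma form is the intended two-part assert and compiles silently.
print(syntax_warnings_for('assert 1 == 2, "arithmetic is broken"'))
```

The same trick is handy in test suites that want to forbid the tuple form outright, by turning the warning into an error with `warnings.simplefilter("error", SyntaxWarning)` before compiling.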
From chris.barker at noaa.gov Mon Feb 8 19:08:11 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 8 Feb 2016 16:08:11 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B92957.5000204@mrabarnett.plus.com> References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> <56B92957.5000204@mrabarnett.plus.com> Message-ID: On Mon, Feb 8, 2016 at 3:48 PM, MRAB wrote: > help('assert') > > You'll see that in "assert (True,)", the tuple (an object) is the first > condition (and probably a mistake), whereas in "assert True,", the True is > the condition and the second expression (after the comma) is missing. yes, of course, that explains it. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From eryksun at gmail.com Mon Feb 8 19:37:13 2016 From: eryksun at gmail.com (eryk sun) Date: Mon, 8 Feb 2016 18:37:13 -0600 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: On Mon, Feb 8, 2016 at 2:41 PM, Chris Barker wrote: > Just to clarify -- what does it currently do for bytes? IIUC, Windows uses > UTF-16, so can you pass in UTF-16 bytes? Or when using bytes, is it assuming > some Windows ANSI-compatible encoding? (and what does it return?) UTF-16 is used in the [W]ide-character API. Bytes paths use the [A]NSI codepage. For a single-byte codepage, the ANSI API roundtrips, i.e.
a bytes path that's passed to CreateFileA matches the listing from FindFirstFileA. But for a DBCS codepage arbitrary bytes paths do not roundtrip. Invalid byte sequences map to the default character. Note that an ASCII question mark is not always the default character. It depends on the codepage. For example, in codepage 932 (Japanese), it's an error if a lead byte (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not uncommon). In this case the ANSI API substitutes the default character for Japanese, '・' (U+30FB, Katakana middle dot). >>> locale.getpreferredencoding() 'cp932' >>> open(b'\xe05', 'w').close() >>> os.listdir('.') ['・'] >>> os.listdir(b'.') [b'\x81E'] All invalid sequences get mapped to '・', which roundtrips as b'\x81\x45', so you can't reliably create and open files with arbitrary bytes paths in this locale. From guido at python.org Mon Feb 8 19:53:44 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Feb 2016 16:53:44 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> <56B92957.5000204@mrabarnett.plus.com> Message-ID: The warning for 'assert (cond, msg)' was specifically put in because this is a nasty trap. It's *always* a mistaken attempt to write 'assert cond, msg' -- usually in an attempt to break a long line without using a backslash. I'd actually consider promoting it to a syntax error rather than removing the warning. Compared to other "lint warnings" this one is much nastier -- it is also much more certain that it is a mistake. (Much more certain than e.g. an undefined variable, which could still be legitimate code due to dynamic updates to globals() or builtins.)
-- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Mon Feb 8 20:02:02 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Feb 2016 12:02:02 +1100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B91A0D.5030003@gmail.com> References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> <56B91A0D.5030003@gmail.com> Message-ID: <20160209010201.GZ31806@ando.pearwood.info> On Mon, Feb 08, 2016 at 05:43:25PM -0500, Yury Selivanov wrote: > > > On 2016-02-08 5:19 PM, Terry Reedy wrote: > >On 2/8/2016 4:51 PM, Victor Stinner wrote: > >>2016-02-08 22:28 GMT+01:00 Alexander Walters : > >>>What incantation do you need to do to make that behavior apparent? > >> > >>I didn't know. I just checked. It's assert used with a non-empty tuple: > >> > >>>>>assert ("tuple",) > >>:1: SyntaxWarning: assertion is always true, perhaps remove > >>parentheses? > > > >I think this should be left to linters also. > > > > I agree. I'd remove that warning. Please don't remove the warning, it is very useful. Compare an assertion written correctly: py> assert 1==2, "Error in arithmetic" Traceback (most recent call last): File "", line 1, in AssertionError: Error in arithmetic with the simple mistake of wrapping the "tuple" in parens: py> assert (1==2, "Error in arithmetic") :1: SyntaxWarning: assertion is always true, perhaps remove parentheses? py> This especially hurts people who think that assert is a function. In Python 2.5 and older, you get no warning, and can write wrong code: py> x = 2 py> assert(x==1, 'expected 1 but got %s' % x) py> Removing this warning would be a regression. 
-- Steve From python at mrabarnett.plus.com Mon Feb 8 20:41:12 2016 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 9 Feb 2016 01:41:12 +0000 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> <56B92957.5000204@mrabarnett.plus.com> Message-ID: <56B943B8.3090000@mrabarnett.plus.com> On 2016-02-09 00:53, Guido van Rossum wrote: > The warning for 'assert (cond, msg)' was specifically put in because > this is a nasty trap. It's *always* a mistaken attempt to write > 'assert cond, msg' -- usually in an attempt to break a long line > without using a backslash. I'd actually consider promoting it to a > syntax error rather than removing the warning. > > Compared to other "lint warnings" this one is much nastier -- it is > also much more certain that it is a mistake. (Much more certain than > e.g. an undefined variable, which could still be legitimate code due > to dynamic updates to globals() or builtins.) Would there be less chance of confusion if there were some kind of syntax such as "assert cond with msg"? From guido at python.org Mon Feb 8 20:49:04 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Feb 2016 17:49:04 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B943B8.3090000@mrabarnett.plus.com> References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> <56B92957.5000204@mrabarnett.plus.com> <56B943B8.3090000@mrabarnett.plus.com> Message-ID: On Mon, Feb 8, 2016 at 5:41 PM, MRAB wrote: > On 2016-02-09 00:53, Guido van Rossum wrote: >> >> The warning for 'assert (cond, msg)' was specifically put in because >> this is a nasty trap. It's *always* a mistaken attempt to write >> 'assert cond, msg' -- usually in an attempt to break a long line >> without using a backslash. 
I'd actually consider promoting it to a >> syntax error rather than removing the warning. >> >> Compared to other "lint warnings" this one is much nastier -- it is >> also much more certain that it is a mistake. (Much more certain than >> e.g. an undefined variable, which could still be legitimate code due >> to dynamic updates to globals() or builtins.) > > Would there be less chance of confusion if there were some kind of syntax > such as "assert cond with msg"? Perhaps, but as long as the "with msg" isn't mandatory and the "assert x, y" syntax is still valid we'd still have to warn about "assert (x, y)". Note that in general "assert constant" is not a problem (assert True and assert False have their uses :-). It's only the literal tuple form. -- --Guido van Rossum (python.org/~guido) From rosuav at gmail.com Mon Feb 8 20:49:50 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 9 Feb 2016 12:49:50 +1100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56B943B8.3090000@mrabarnett.plus.com> References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> <56B92957.5000204@mrabarnett.plus.com> <56B943B8.3090000@mrabarnett.plus.com> Message-ID: On Tue, Feb 9, 2016 at 12:41 PM, MRAB wrote: > On 2016-02-09 00:53, Guido van Rossum wrote: >> >> The warning for 'assert (cond, msg)' was specifically put in because >> this is a nasty trap. It's *always* a mistaken attempt to write >> 'assert cond, msg' -- usually in an attempt to break a long line >> without using a backslash. I'd actually consider promoting it to a >> syntax error rather than removing the warning. >> >> Compared to other "lint warnings" this one is much nastier -- it is >> also much more certain that it is a mistake. (Much more certain than >> e.g. an undefined variable, which could still be legitimate code due >> to dynamic updates to globals() or builtins.) 
> > Would there be less chance of confusion if there were some kind of syntax > such as "assert cond with msg"? Is assert the *only* statement that has a comma separating unrelated items? Every other statement that uses a comma is separating identical items (eg "import os, sys" - "os" and "sys" are equivalent), and tokens that have completely different meaning are separated by a word. The only other exception I can think of - pun intended - is the old "except BaseException, e:" syntax, which got dropped in Py3. Maybe it's time to introduce a new syntax with a view to deprecating the comma syntax ("use this old syntax if you need to support Python 2.7"). +1. ChrisA From guido at python.org Mon Feb 8 20:52:30 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Feb 2016 17:52:30 -0800 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> <56B92957.5000204@mrabarnett.plus.com> <56B943B8.3090000@mrabarnett.plus.com> Message-ID: Personally I don't think it's worth the churn. On Mon, Feb 8, 2016 at 5:49 PM, Chris Angelico wrote: > On Tue, Feb 9, 2016 at 12:41 PM, MRAB wrote: >> On 2016-02-09 00:53, Guido van Rossum wrote: >>> >>> The warning for 'assert (cond, msg)' was specifically put in because >>> this is a nasty trap. It's *always* a mistaken attempt to write >>> 'assert cond, msg' -- usually in an attempt to break a long line >>> without using a backslash. I'd actually consider promoting it to a >>> syntax error rather than removing the warning. >>> >>> Compared to other "lint warnings" this one is much nastier -- it is >>> also much more certain that it is a mistake. (Much more certain than >>> e.g. an undefined variable, which could still be legitimate code due >>> to dynamic updates to globals() or builtins.) >> >> Would there be less chance of confusion if there were some kind of syntax >> such as "assert cond with msg"? 
> > Is assert the *only* statement that has a comma separating unrelated > items? Every other statement that uses a comma is separating identical > items (eg "import os, sys" - "os" and "sys" are equivalent), and > tokens that have completely different meaning are separated by a word. > The only other exception I can think of - pun intended - is the old > "except BaseException, e:" syntax, which got dropped in Py3. Maybe > it's time to introduce a new syntax with a view to deprecating the > comma syntax ("use this old syntax if you need to support Python > 2.7"). > > +1. > > ChrisA > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From chris.barker at noaa.gov Mon Feb 8 20:57:15 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 8 Feb 2016 17:57:15 -0800 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: <5886558760997884134@unknownmsgid> All I can say is "ouch". Hard to call it a regression to no longer allow this mess... CHB > On Feb 8, 2016, at 4:37 PM, eryk sun wrote: > >> On Mon, Feb 8, 2016 at 2:41 PM, Chris Barker wrote: >> Just to clarify -- what does it currently do for bytes? IIUC, Windows uses >> UTF-16, so can you pass in UTF-16 bytes? Or when using bytes is is assuming >> some Windows ANSI-compatible encoding? (and what does it return?) > > UTF-16 is used in the [W]ide-character API. Bytes paths use the [A]NSI > codepage. For a single-byte codepage, the ANSI API rountrips, i.e. a > bytes path that's passed to CreateFileA matches the listing from > FindFirstFileA. But for a DBCS codepage arbitrary bytes paths do not > roundtrip. Invalid byte sequences map to the default character. 
Note > that an ASCII question mark is not always the default character. It > depends on the codepage. > > For example, in codepage 932 (Japanese), it's an error if a lead byte > (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a > value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not > uncommon). In this case the ANSI API substitutes the default character > for Japanese, '・' (U+30FB, Katakana middle dot). > >>>> locale.getpreferredencoding() > 'cp932' >>>> open(b'\xe05', 'w').close() >>>> os.listdir('.') > ['・'] >>>> os.listdir(b'.') > [b'\x81E'] > > All invalid sequences get mapped to '・', which roundtrips as > b'\x81\x45', so you can't reliably create and open files with > arbitrary bytes paths in this locale. From jayvdb at gmail.com Mon Feb 8 21:38:40 2016 From: jayvdb at gmail.com (John Mark Vandenberg) Date: Tue, 9 Feb 2016 13:38:40 +1100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: On Tue, Feb 9, 2016 at 7:34 AM, Guido van Rossum wrote: > On Mon, Feb 8, 2016 at 11:51 AM, Victor Stinner > wrote: >> Le 8 févr. 2016 8:14 PM, "Guido van Rossum" a écrit : >>> Hum. I'm not excited by this idea. It is not bad syntax. >> >> Do you see a use case for "constant statements" other than strings and >> ellipsis? > > The same use case as for all dead code: it could be a placeholder for > something better in the future. Allowing dead code is useful as it allows complex code to be left in place. It can be risky removing the code. Unused literals are stupefyingly simple statements. A line of merely a constant, e.g. 'True' or '1', does not present the same risks or benefits. That it is a hope for something better? It could be easily replaced with 'pass', '...', a comment, and/or a string literal explaining what needs improving. > It could also be generated code where the generator expects the > optimizer to remove it (or doesn't care).
Why shouldn't a user see that it is generating such code? There is a decent chance that it is a bug in the generated code. fwiw, this is a syntax warning in Ruby - "unused literal ignored", since 2003 (5aadcd9). -- John Vandenberg From p.f.moore at gmail.com Tue Feb 9 03:08:36 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 9 Feb 2016 08:08:36 +0000 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: <5886558760997884134@unknownmsgid> References: <5886558760997884134@unknownmsgid> Message-ID: On 9 February 2016 at 01:57, Chris Barker - NOAA Federal wrote: > All I can say is "ouch". Hard to call it a regression to no longer > allow this mess... OTOH, it's a major regression for someone using an 8-bit codepage that doesn't have these problems. Code that worked fine for them now doesn't. I dislike "works for some people" solutions as much as anyone, but breaking code that does the job that people need it to is not something we should do lightly (if at all). Paul From victor.stinner at gmail.com Tue Feb 9 04:21:11 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 9 Feb 2016 10:21:11 +0100 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: 2016-02-09 1:37 GMT+01:00 eryk sun : > For example, in codepage 932 (Japanese), it's an error if a lead byte > (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a > value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not > uncommon). In this case the ANSI API substitutes the default character > for Japanese, '・' (U+30FB, Katakana middle dot). > > >>> locale.getpreferredencoding() > 'cp932' > >>> open(b'\xe05', 'w').close() > >>> os.listdir('.') > ['・'] > >>> os.listdir(b'.') > [b'\x81E'] Hum, I'm not sure that I understand your example. Can you pass the result of os.listdir(str) to open() on Python 3? Are you able to open the file?
Same question for os.listdir(bytes). Victor From victor.stinner at gmail.com Tue Feb 9 04:22:29 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 9 Feb 2016 10:22:29 +0100 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: 2016-02-09 1:37 GMT+01:00 eryk sun : > For example, in codepage 932 (Japanese), it's an error if a lead byte > (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a > value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not > uncommon). In this case the ANSI API substitutes the default character > for Japanese, '・' (U+30FB, Katakana middle dot). > > >>> locale.getpreferredencoding() > 'cp932' > >>> open(b'\xe05', 'w').close() > >>> os.listdir('.') > ['・'] > >>> os.listdir(b'.') > [b'\x81E'] > > All invalid sequences get mapped to '・', which roundtrips as > b'\x81\x45', so you can't reliably create and open files with > arbitrary bytes paths in this locale. Oh, and I forgot to ask: what is your filesystem? Is it the same behaviour for NTFS, FAT32, network shared directories, etc.? Victor From contrebasse at gmail.com Tue Feb 9 04:57:02 2016 From: contrebasse at gmail.com (Joseph Martinot-Lagarde) Date: Tue, 9 Feb 2016 09:57:02 +0000 (UTC) Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement References: Message-ID: Victor Stinner <victor.stinner at gmail.com> writes: > > Hi, > > I changed the Python compiler to ignore any kind of "constant > expression", whereas it only ignored strings and integers before: > http://bugs.python.org/issue26204 > > The compiler now also emits a SyntaxWarning in such cases. IMHO the > warning can help to detect bugs for developers who have just learnt Python. > > The warning is *not* emitted for strings, since triple-quoted strings > are a common syntax for multiline comments. > > The warning is *not* emitted for ellipsis (...) either, since "f(): > ..." is a legit syntax for an abstract function.
I frequently use 1/0 as a quick break in a script or a program (it's even more useful with post-mortem debugging). Would it be considered as a constant and ignored instead of raising a ZeroDivisionError ? Joseph From stephen at xemacs.org Tue Feb 9 05:00:36 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 9 Feb 2016 19:00:36 +0900 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: <5886558760997884134@unknownmsgid> References: <5886558760997884134@unknownmsgid> Message-ID: <22201.47300.786077.79522@turnbull.sk.tsukuba.ac.jp> Chris Barker - NOAA Federal writes: > All I can say is "ouch". Hard to call it a regression to no longer > allow this mess... We can't "disallow" the mess, it's embedded in the lunatic computing environment (which I happen to live in). We can't even stop people from using existing Python programs abusing bytes-oriented APIs. All we can do is make it harder for people to port to Python 3, and that would be bad because it's much easier to refactor once you're in Python 3. And as Paul points out, it works fine in ASCII-compatible one-byte environments (and probably in ISO-2022-compatible 8-bit multibyte environments, too -- the big problems are the abominations known as Shift JIS and Big5). Please, let's leave it alone. From victor.stinner at gmail.com Tue Feb 9 05:06:43 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 9 Feb 2016 11:06:43 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: 2016-02-09 10:57 GMT+01:00 Joseph Martinot-Lagarde : > I frequently use 1/0 as a quick break in a script or a program (it's even > more useful with post-mortem debugging). Would it be considered as a > constant and ignored instead of raising a ZeroDivisionError ? "self.x - self.y" and "1/0" are not removed since they have side effects. Right now, "(1, 2, 3)" is not removed.
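The distinction can be checked on a current CPython (a minimal sketch; exact bytecode details vary by version):

```python
def f():
    1            # bare constant statement: discarded by the compiler
    return 42

# The discarded 1 never reaches the function's constant pool.
assert 1 not in f.__code__.co_consts
assert f() == 42

# 1/0 is left alone: constant folding gives up on the ZeroDivisionError,
# so the error is raised at run time, not at compile time.
code = compile("1/0", "<demo>", "exec")
try:
    exec(code)
except ZeroDivisionError:
    print("1/0 raised at run time")
```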
But later we may remove it, since it has no side effect, it's a constant statement. Note: We are talking about statements. 1 is not removed in "lambda: 1" which is a valid expression ;-) Victor From victor.stinner at gmail.com Tue Feb 9 05:13:58 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 9 Feb 2016 11:13:58 +0100 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: Hi, 2016-02-08 18:02 GMT+01:00 Brett Cannon : > If Unicode string don't work in Python 2 then what is Python 2/3 to do as a > cross-platform solution if we completely remove bytes support in Python 3? > Wouldn't that mean there is no common type between Python 2 & 3 that one can > use which will work with the os module except native strings (which are > difficult to get right)? IMHO we have to put a line somewhere between Python 2 and Python 3. For some specific use cases, there is no good solution which works on both Python versions. For filenames, there is no simple design on Python 2. bytes is the natural choice on UNIX, whereas Unicode is preferred on Windows. But it's difficult to handle two types in the same code base. As a consequence, most users use bytes on Python 2, which is a bad choice for Windows... On Python 3, it's much simpler: always use Unicode. Again, the PEP 383 helps on UNIX. I wrote a PoC for Mercurial to always use Unicode, but the idea was rejected since Mercurial must support undecodable filenames on UNIX. It's possible on Python 3 (str+PEP 383), not on Python 2. I tried to port Mercurial to Python 3 and use Unicode for filenames in the same change. It's probably better to do that in two steps: first port to Python 3, then use Unicode. I guess that the final change is to drop Python 2? I don't know if it's feasible for Mercurial. 
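The "always use str plus PEP 383" approach described above can be illustrated with os.fsencode/os.fsdecode, which apply the surrogateescape error handler so arbitrary filename bytes round-trip through str (a minimal demo, not Mercurial's actual code):

```python
import os

# A filename that is not valid UTF-8, as a UNIX filesystem might return it.
raw = b"caf\xe9"

# On a UTF-8 system the undecodable byte becomes a lone surrogate
# ('\udce9'); either way the bytes round-trip losslessly through str.
name = os.fsdecode(raw)
assert os.fsencode(name) == raw

print(repr(name))
```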
Victor From p.f.moore at gmail.com Tue Feb 9 06:35:40 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 9 Feb 2016 11:35:40 +0000 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: On 9 February 2016 at 10:13, Victor Stinner wrote: > IMHO we have to put a line somewhere between Python 2 and Python 3. > For some specific use cases, there is no good solution which works on > both Python versions. > > For filenames, there is no simple design on Python 2. bytes is the > natural choice on UNIX, whereas Unicode is preferred on Windows. But > it's difficult to handle two types in the same code base. As a > consequence, most users use bytes on Python 2, which is a bad choice > for Windows... > > On Python 3, it's much simpler: always use Unicode. Again, the PEP 383 > helps on UNIX. So if you were proposing "drop the bytes APIs everywhere" that might be acceptable (for Python 3). But of course it makes porting harder, so it's probably not a good idea until Python 2 is no longer relevant. Paul From alecsandru.patrascu at intel.com Tue Feb 9 06:45:18 2016 From: alecsandru.patrascu at intel.com (Patrascu, Alecsandru) Date: Tue, 9 Feb 2016 11:45:18 +0000 Subject: [Python-Dev] CPython build options for out-of-the box performance Message-ID: <3CF256F4F774BD48A1691D131AA043191424CED7@IRSMSX102.ger.corp.intel.com> Hi all, This is Alecsandru from the Dynamic Scripting Languages Optimization Team at Intel Corporation. I want to open a discussion regarding the way CPython is built, mainly the options that are available to the programmers. Analyzing the CPython ecosystem we can see that there are a lot of users that just download the sources and hit the commands "./configure", "make" and "make install" once and then continue using it with their Python scripts. 
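For reference, the out-of-the-box workflow described above can be written out as follows; the optimization switches shown are the opt-in PGO/LTO configure flags that later CPython releases expose, included here only as a sketch for comparison, not as part of this proposal:

```shell
# Default workflow: no PGO, no LTO.
./configure
make
make install

# Opt-in optimized build (configure flags from later CPython releases):
# --enable-optimizations turns on PGO; --with-lto turns on LTO.
./configure --enable-optimizations --with-lto
make
make install
```

The point of the proposal is precisely that users who stop at the first three commands never reach the second set.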
One of the problems with this workflow is that the users do not benefit from the optimization features that exist in the build system, such as PGO and LTO. Therefore, I propose a workflow like the following. Assume some work has to be done on the CPython interpreter; a developer can do the following steps:

A. Implementation and debugging phase.
1. The command "./configure PYDIST=debug" is run once. It will enable the Py_DEBUG, -O0 and -g flags.
2. The command "make" is run once or multiple times.

B. Testing the implementation from step A, in a pre-release environment.
1. The command "./configure PYDIST=devel" is run once. It will disable the Py_DEBUG flags and will enable the -O3 and -g flags; this is just like the current implementation in CPython.
2. The command "make" is run once or multiple times.

C. For any other CPython usage, for example distributing the interpreter, installing it inside an operating system, or just the majority of users who are not CPython developers and only want to compile it once and use it as-is:
1. The command "./configure" is run once. Alternatively, the command "./configure PYDIST=release" can be used. It will disable all debugging functionality, enable the -O3 flag and will enable PGO and LTO.
2. The command "make" is run once.

If you think this benefits CPython, I can create an issue and post the patches that enable all of the above. Thank you, Alecsandru From phil at riverbankcomputing.com Tue Feb 9 06:44:55 2016 From: phil at riverbankcomputing.com (Phil Thompson) Date: Tue, 9 Feb 2016 11:44:55 +0000 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files Message-ID: I've been adding support to the SIP wrapper generator for automatically generating PEP 484 compatible stub files so that future versions of PyQt can be shipped with them. By way of feedback I thought I'd share my experience, confusions and suggestions. There are a number of things I'd like to express but cannot find a way to do so...
- objects that implement the buffer protocol
- type objects
- slice objects
- capsules
- sequences of fixed size (i.e. specified in the same way as Tuple)
- distinguishing between instance and class attributes.

The documentation is incomplete - there is no mention of Set or Tuple for example. I found the documentation confusing regarding Optional. Intuitively it seems to be the way to specify arguments with default values. However it is explained in terms of (for example) Union[str, None] and I (intuitively but incorrectly) read that as meaning "a str or None" as opposed to "a str or nothing". bytes can be used as shorthand for bytes, bytearray and memoryview - but what about objects that really only support bytes? Shouldn't the shorthand be called something like AnyBytes? Is there any recommended way to test the validity and completeness of stub files? What's the recommended way to parse them? Phil From g.brandl at gmx.net Tue Feb 9 06:55:11 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 9 Feb 2016 12:55:11 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: On 02/09/2016 10:57 AM, Joseph Martinot-Lagarde wrote: > Victor Stinner <victor.stinner at gmail.com> writes: > >> >> Hi, >> >> I changed the Python compiler to ignore any kind "constant >> expressions", whereas it only ignored strings and integers before: >> http://bugs.python.org/issue26204 >> >> The compiler now also emits a SyntaxWarning on such case. IMHO the >> warning can help to detect bugs for developers who just learnt Python. >> >> The warning is *not* emitted for strings, since triple quoted strings >> are a common syntax for multiline comments. >> >> The warning is *not* emitted either for ellipsis (...) since "def f(): >> ..." is a legit syntax for abstract function. >> > > I frequently use 1/0 as a quick break in a script or a program (it's even > more useful with post-mortem debugging).
Would it be considered as a > constant and ignored instead of raising a ZeroDivisionError ? At first, expressions involving operators are not seen as constant. But 1/2 would be removed, since the peepholer will evaluate it to 0.5 (or 0) and the constant-removal pass will recognize it as a constant (assuming this ordering of the passes). In the case of 1/0 the peepholer will try to evaluate it, but get an exception and therefore not touch the expression further. cheers, Georg From victor.stinner at gmail.com Tue Feb 9 08:03:00 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 9 Feb 2016 14:03:00 +0100 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: 2016-02-08 19:26 GMT+01:00 Paul Moore : > On 8 February 2016 at 14:32, Victor Stinner wrote: >> Since 3.3, functions of the os module started to emit >> DeprecationWarning when called with bytes filenames. > > Everywhere? Or just on Windows? I can't tell from your email and I > don't have a Unix system to hand to check. I propose to only drop support for bytes filenames on Windows. Victor From eryksun at gmail.com Tue Feb 9 08:27:39 2016 From: eryksun at gmail.com (eryk sun) Date: Tue, 9 Feb 2016 07:27:39 -0600 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: On Tue, Feb 9, 2016 at 3:21 AM, Victor Stinner wrote: > 2016-02-09 1:37 GMT+01:00 eryk sun : >> For example, in codepage 932 (Japanese), it's an error if a lead byte >> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a >> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not >> uncommon). In this case the ANSI API substitutes the default character >> for Japanese, '?' (U+30FB, Katakana middle dot). 
>> >> >>> locale.getpreferredencoding() >> 'cp932' >> >>> open(b'\xe05', 'w').close() >> >>> os.listdir('.') >> ['・'] >> >>> os.listdir(b'.') >> [b'\x81E'] > > Hum, I'm not sure that I understand your example. Say I create a sequence of files with the names "file_à[N].txt" encoded in Latin-1, where N is 0-2. They all map to the same file in a Japanese system locale:

>>> open(b'file_\xe00.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> open(b'file_\xe01.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> open(b'file_\xe02.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> os.listdir(b'.')
[b'file_\x81E.txt']

This isn't a problem with a single-byte codepage such as 1251. For example, codepage 1251 doesn't map b"\x98" to any character, but harmlessly maps it to "\x98" (SOS in the C1 Controls block). Single-byte code pages still have the problem that when a filename is created using the wide-character API, listing it as bytes may use either an approximate mapping (e.g. "à" => "a" in 1251) or the codepage default character (e.g. "\xd7" => "?" in 1251). From eryksun at gmail.com Tue Feb 9 08:33:19 2016 From: eryksun at gmail.com (eryk sun) Date: Tue, 9 Feb 2016 07:33:19 -0600 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: On Tue, Feb 9, 2016 at 3:22 AM, Victor Stinner wrote: > 2016-02-09 1:37 GMT+01:00 eryk sun : >> For example, in codepage 932 (Japanese), it's an error if a lead byte >> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a >> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not >> uncommon). In this case the ANSI API substitutes the default character >> for Japanese, '・' (U+30FB, Katakana middle dot).
>> >> >>> locale.getpreferredencoding() >> 'cp932' >> >>> open(b'\xe05', 'w').close() >> >>> os.listdir('.') >> ['・'] >> >>> os.listdir(b'.') >> [b'\x81E'] >> >> All invalid sequences get mapped to '・', which roundtrips as >> b'\x81\x45', so you can't reliably create and open files with >> arbitrary bytes paths in this locale. > > Oh, and I forgot to ask: what is your filesystem? Is it the same > behaviour for NTFS, FAT32, network shared directories, etc.? That was tested using NTFS, but the same would apply to FAT32, exFAT, and UDF since they all use Unicode [1]. CreateFile[A|W] wraps the NtCreateFile system call. The NT executive is Unicode, so the system call receives the filename using a Unicode-only OBJECT_ATTRIBUTES [2] record. I can't say what an arbitrary non-Microsoft filesystem will do with the U+30FB character when it processes the IRP_MJ_CREATE. I was only concerned with ANSI<=>Unicode conversion that's implemented in the ntdll.dll runtime library. [1]: https://msdn.microsoft.com/en-us/library/ee681827 [2]: https://msdn.microsoft.com/en-us/library/ff557749 From desmoulinmichel at gmail.com Tue Feb 9 10:09:15 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Tue, 9 Feb 2016 16:09:15 +0100 Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: References: Message-ID: <56BA011B.1070105@gmail.com> Hello, Le 08/02/2016 20:13, Guido van Rossum a écrit : > On Mon, Feb 8, 2016 at 9:44 AM, Victor Stinner wrote: >> I changed the Python compiler to ignore any kind "constant >> expressions", whereas it only ignored strings and integers before: >> http://bugs.python.org/issue26204 >> >> The compiler now also emits a SyntaxWarning on such case. IMHO the >> warning can help to detect bugs for developers who just learnt Python. > Hum. I'm not excited by this idea. It is not bad syntax. Have you > actually seen newbies who were confused by such things?
I give regular Python trainings and I see similar errors regularly, such as:

- not returning something;
- using something without putting the result back in a variable.

However, these are impossible to warn about. What's more, I have yet to see somebody creating a constant and not doing anything with it. I never worked with Ruby devs though. My sample of devs is not big enough to be significant, but I haven't met this issue yet. I still like the idea; anything making Python easier for beginners is a good thing for me. One particular argument against it is the use of linters, but you must realize most beginners don't use linters. Just like they don't use virtualenv, pip, pdb, etc. They are part of a toolkit you learn to use on the way, but not something you start with. Besides, many people using Python are not devs, and will just never take the time to use linters, nor learn about them. From abarnert at yahoo.com Tue Feb 9 12:15:12 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 9 Feb 2016 17:15:12 +0000 (UTC) Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement In-Reply-To: <56BA011B.1070105@gmail.com> References: <56BA011B.1070105@gmail.com> Message-ID: <303640414.1250951.1455038113015.JavaMail.yahoo@mail.yahoo.com> On Tuesday, February 9, 2016 8:14 AM, Michel Desmoulin wrote: > I give regular Python trainings and I see similar errors regularly such as: > > - not returning something; > - using something without putting the result back in a variable. > > However, these are impossible to warn about. > > What's more, I have yet to see somebody creating a constant and not > doing anything with it. I never worked with Ruby dev though. > > My sample of dev is not big enough to be significant, but I haven't met > this issue yet. I still like the idea, anything making Python easier for > beginers is a good thing for me. What idea do you like? Somehow warning about the things that are impossible to warn about?
Or warning about something different that isn't any of the things your novices have faced? Or...? > One particular argument against it is the use of linters, but you must > realize most beginers don't use linters. That doesn't mean the compiler should do everything linters do. Rank beginners are generally writing very simple programs, where the whole thing can be visualized at once, so many warnings aren't relevant. And they haven't learned many important language features, so many warnings are relevant, but they aren't prepared to deal with them (e.g., global variables everywhere because they haven't learned to declare functions yet). As a teacher, do you want to explain all those warnings to them? Or teach them the bad habit of ignoring warnings? Or just not teach them to use linters (or static type checkers, or other such tools) until they're ready to write code that should pass without warnings? Part of learning to use linters effectively is learning to configure them. That's almost certainly not something you want to be teaching beginners when they're just starting out. But if the compiler started adding a bunch of warnings that people had to configure, a la gcc, you'd be forced to teach them right off the bat. And meanwhile, once past the initial stage, many beginners _do_ use linters, they just don't realize it. If you use PyCharm or Eclipse/PyDev or almost any IDE except IDLE, it may be linting in the background and showing you the results as inline code hints, or in some other user-friendly way, or at least catching some of the simpler things a linter would check for. Whether you want to use those tools in your teaching is up to you, but they exist. And if they need any support from the compiler to do their job better, presumably they'd ask for it. > They are part of a toolkit you learn to use > on the way, but not something you start with. 
Besides, many people using > Python are not dev, and will just never take the time to use linters, > not learn about them. If people who aren't going to go deep enough into Python to write scripts longer than a page don't need linters, then they certainly don't need a bunch of warnings from the compiler either. From abarnert at yahoo.com Tue Feb 9 12:58:17 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 9 Feb 2016 09:58:17 -0800 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files In-Reply-To: References: Message-ID: On Feb 9, 2016, at 03:44, Phil Thompson wrote: > > There are a number of things I'd like to express but cannot find a way to do so... > > - objects that implement the buffer protocol That seems like it should be filed as a bug with the typing repo. Presumably this is just an empty type that registers bytes, bytearray, and memoryview, and third-party classes have to register with it manually? > - type objects > - slice objects Can't you just use the concrete types type and slice tor these two? I don't think you need generic or abstract "any metaclass, whether inheriting from type or not" or "any class that meets the slice protocol", do you? > - capsules That one seems reasonable. But maybe there should just be a types.Capsule Type or types.PyCapsule, and then you can just check that the same as any other concrete type? But how often do you need to verify that something is a capsule, without knowing that it's the *right* capsule? At runtime, you'd usually use PyCapsule_IsValid, not PyCapsule_CheckExacf, right? So should the type checker be tracking the name too? > - sequences of fixed size (ie. specified in the same way as Tuple) How would you disambiguate between a sequence of one int and a sequence of 0 or more ints if they're both spelled "Sequence[int]"? That isn't a problem for Tuple, because it's assumed to be heterogeneous, so Tuple[int] can only be a 1-tuple. (This was actually discussed in some depth. 
I thought it would be a problem, because some types--including tuple itself--are sometimes used as homogenous arbitrary-length containers and sometimes as heterogeneous fixed-length containers, but Guido and others had some good answers for that, even if I can't remember what they were.) > - distinguishing between instance and class attributes. Where? Are you building a protocol that checks the data members of a type for conformance or something? If so, why is an object that has "spam" and "eggs" as instance attributes but "cheese" as a class attribute not usable as an object conforming to the protocol with all three attributes? (Also, does @property count as a class or instance attribute? What about an arbitrary data descriptor? Or a non-data descriptor?) From jayvdb at gmail.com Tue Feb 9 15:00:42 2016 From: jayvdb at gmail.com (John Mark Vandenberg) Date: Wed, 10 Feb 2016 07:00:42 +1100 Subject: [Python-Dev] Very old git mirror under github user "python-git" Message-ID: Does anyone know who controls this mirror, which is attracting pull requests? https://github.com/python-git/python/pulls Can it be pulled down to avoid confusion, since it is using Python's logo? https://github.com/python-git -- John Vandenberg From guido at python.org Tue Feb 9 15:54:38 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Feb 2016 12:54:38 -0800 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files In-Reply-To: References: Message-ID: [Just adding to Andrew's response] On Tue, Feb 9, 2016 at 9:58 AM, Andrew Barnert via Python-Dev wrote: > On Feb 9, 2016, at 03:44, Phil Thompson wrote: >> >> There are a number of things I'd like to express but cannot find a way to do so... >> >> - objects that implement the buffer protocol > > That seems like it should be filed as a bug with the typing repo. Presumably this is just an empty type that registers bytes, bytearray, and memoryview, and third-party classes have to register with it manually? 
Hm, there's no way to talk about these in regular Python code either, is there? I think that issue should be resolved first. Probably by adding something to collections.abc. And then we can add the corresponding name to typing.py. This will take time though (have to wait for 3.6) so I'd recommend 'Any' for now (and filing those bugs). >> - type objects You can use 'type' for this (i.e. the builtin). You can't specify any properties for types though; that's a feature request: https://github.com/python/typing/issues/107 -- but it may be a while before we address it (it's not entirely clear how it should work, and we have many other pressing issues still). >> - slice objects > Can't you just use the concrete types type and slice tor these two? I don't think you need generic or abstract "any metaclass, whether inheriting from type or not" or "any class that meets the slice protocol", do you? Can't you use 'slice' (i.e. the builtin)? Mypy supports that. >> - capsules > > That one seems reasonable. But maybe there should just be a types.Capsule Type or types.PyCapsule, and then you can just check that the same as any other concrete type? > > But how often do you need to verify that something is a capsule, without knowing that it's the *right* capsule? At runtime, you'd usually use PyCapsule_IsValid, not PyCapsule_CheckExacf, right? So should the type checker be tracking the name too? > >> - sequences of fixed size (ie. specified in the same way as Tuple) That's kind of a poor data structure. :-( Why can't you use Tuple here? > How would you disambiguate between a sequence of one int and a sequence of 0 or more ints if they're both spelled "Sequence[int]"? That isn't a problem for Tuple, because it's assumed to be heterogeneous, so Tuple[int] can only be a 1-tuple. (This was actually discussed in some depth. 
I thought it would be a problem, because some types--including tuple itself--are sometimes used as homogenous arbitrary-length containers and sometimes as heterogeneous fixed-length containers, but Guido and others had some good answers for that, even if I can't remember what they were.) We solved that by allowing Tuple[int, ...] to spell a homogeneous tuple of integers. >> - distinguishing between instance and class attributes. > > Where? Are you building a protocol that checks the data members of a type for conformance or something? If so, why is an object that has "spam" and "eggs" as instance attributes but "cheese" as a class attribute not usable as an object conforming to the protocol with all three attributes? (Also, does @property count as a class or instance attribute? What about an arbitrary data descriptor? Or a non-data descriptor?) It's a known mypy bug. :-( It's somewhat convoluted to fix. https://github.com/JukkaL/mypy/issues/1097 Some things Andrew snipped: > The documentation is incomplete - there is no mention of Set or Tuple for example. Tuple is here: https://docs.python.org/3/library/typing.html#typing.Tuple collections.Set maps to typing.AbstractSet (https://docs.python.org/3/library/typing.html#typing.AbstractSet; present twice in the docs somehow :-( ). typing.Set (corresponding to builtins.set) is indeed missing, I've a note of that: http://bugs.python.org/issue26322. > I found the documentation confusing regarding Optional. Intuitively it seems to be the way to specify arguments with default values. However it is explained in terms of (for example) Union[str, None] and I (intuitively but incorrectly) read that as meaning "a str or None" as opposed to "a str or nothing". But it *does* mean 'str or None'. The *type* of an argument doesn't have any bearing on whether it may be omitted from the argument list by the caller -- these are orthogonal concepts (though sadly the word optional might apply to both). 
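The two senses of "optional" can be sketched like this (illustrative function names; any PEP 484 checker should accept the annotations):

```python
from typing import Optional

def greet(name: Optional[str]) -> str:
    # Mandatory argument that may be None: the caller must pass something,
    # but that something is allowed to be None.
    return "hello" if name is None else "hello, " + name

def repeat(word: str, times: int = 2) -> str:
    # Optional argument (it has a default) whose type is plain int:
    # Optional[...] is not needed just because a default exists.
    return " ".join([word] * times)

print(greet(None))       # a caller may pass None...
print(greet("Guido"))    # ...or a str, but must pass one of them
print(repeat("hi"))      # times may be omitted entirely
```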
It's possible (though unusual) to have an optional argument that must be a str when given; it's also possible to have a mandatory argument that may be a str or None. Can you help improve the wording in the docs (preferably by filing an issue)? > bytes can be used as shorthand for bytes, bytearray and memoryview - but what about objects that really only support bytes? Shouldn't the shorthand be called something like AnyBytes? We debated that, but found it too annoying to have to import and write AnyBytes in so many places. The type checker may not be precise for cases that only accept bytes, but hopefully it's more useful in general this way. > Is there any recommended way to test the validity and completeness of stub files? What's the recommended way to parse them? That's also an open issue. For a quick check I tend to just point mypy at a stub file, since it is the most mature implementation of PEP 484 to date (Google's pytype is still working on PEP 484 compatibility). While this doesn't always catch all errors, it will at least find syntax errors and cases that mypy doesn't support. :-) -- --Guido van Rossum (python.org/~guido) From phil at riverbankcomputing.com Tue Feb 9 17:06:25 2016 From: phil at riverbankcomputing.com (Phil Thompson) Date: Tue, 9 Feb 2016 22:06:25 +0000 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files In-Reply-To: References: Message-ID: <896E1AC7-85F0-4F24-8B8D-CE2368FD03E1@riverbankcomputing.com> On 9 Feb 2016, at 8:54 pm, Guido van Rossum wrote: > > [Just adding to Andrew's response] > > On Tue, Feb 9, 2016 at 9:58 AM, Andrew Barnert via Python-Dev > wrote: >> On Feb 9, 2016, at 03:44, Phil Thompson wrote: >>> >>> There are a number of things I'd like to express but cannot find a way to do so... >>> >>> - objects that implement the buffer protocol >> >> That seems like it should be filed as a bug with the typing repo.
Presumably this is just an empty type that registers bytes, bytearray, and memoryview, and third-party classes have to register with it manually? > > Hm, there's no way to talk about these in regular Python code either, > is there? I think that issue should be resolved first. Probably by > adding something to collections.abc. And then we can add the > corresponding name to typing.py. This will take time though (have to > wait for 3.6) so I'd recommend 'Any' for now (and filing those bugs). Ok. >>> - type objects > > You can use 'type' for this (i.e. the builtin). You can't specify any > properties for types though; that's a feature request: > https://github.com/python/typing/issues/107 -- but it may be a while > before we address it (it's not entirely clear how it should work, and > we have many other pressing issues still). Yes, I can use type. >>> - slice objects > >> Can't you just use the concrete types type and slice for these two? I don't think you need generic or abstract "any metaclass, whether inheriting from type or not" or "any class that meets the slice protocol", do you? > > Can't you use 'slice' (i.e. the builtin)? Mypy supports that. Yes, I can use slice. >>> - capsules >> >> That one seems reasonable. But maybe there should just be a types.CapsuleType or types.PyCapsule, and then you can just check that the same as any other concrete type? >> >> But how often do you need to verify that something is a capsule, without knowing that it's the *right* capsule? At runtime, you'd usually use PyCapsule_IsValid, not PyCapsule_CheckExact, right? So should the type checker be tracking the name too? >> >>> - sequences of fixed size (ie. specified in the same way as Tuple) > > That's kind of a poor data structure. :-( Why can't you use Tuple here? Because allowing any sequence is more flexible than only allowing a tuple.
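The trade-off in this exchange, a fixed-shape Tuple versus an arbitrary-length Sequence, together with Guido's Tuple[int, ...] spelling for homogeneous tuples, can be sketched as:

```python
from typing import Sequence, Tuple

def norm2(v: Tuple[float, float]) -> float:
    # Exactly two floats: the shape is part of the type.
    return (v[0] ** 2 + v[1] ** 2) ** 0.5

def total(xs: Sequence[int]) -> int:
    # Any length, and any sequence type: list, tuple, range, ...
    return sum(xs)

Ints = Tuple[int, ...]   # homogeneous tuple of any length

print(norm2((3.0, 4.0)))   # 5.0
print(total([1, 2, 3]))    # 6
print(total(range(4)))     # 6
```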
That isn't a problem for Tuple, because it's assumed to be heterogeneous, so Tuple[int] can only be a 1-tuple. (This was actually discussed in some depth. I thought it would be a problem, because some types--including tuple itself--are sometimes used as homogenous arbitrary-length containers and sometimes as heterogeneous fixed-length containers, but Guido and others had some good answers for that, even if I can't remember what they were.) > > We solved that by allowing Tuple[int, ...] to spell a homogeneous > tuple of integers. > >>> - distinguishing between instance and class attributes. >> >> Where? Are you building a protocol that checks the data members of a type for conformance or something? If so, why is an object that has "spam" and "eggs" as instance attributes but "cheese" as a class attribute not usable as an object conforming to the protocol with all three attributes? (Also, does @property count as a class or instance attribute? What about an arbitrary data descriptor? Or a non-data descriptor?) > > It's a known mypy bug. :-( It's somewhat convoluted to fix. > https://github.com/JukkaL/mypy/issues/1097 > > Some things Andrew snipped: > >> The documentation is incomplete - there is no mention of Set or Tuple for example. > > Tuple is here: https://docs.python.org/3/library/typing.html#typing.Tuple Yes, I missed that. > collections.Set maps to typing.AbstractSet > (https://docs.python.org/3/library/typing.html#typing.AbstractSet; > present twice in the docs somehow :-( ). typing.Set (corresponding to > builtins.set) is indeed missing, I've a note of that: > http://bugs.python.org/issue26322. > >> I found the documentation confusing regarding Optional. Intuitively it seems to be the way to specify arguments with default values. However it is explained in terms of (for example) Union[str, None] and I (intuitively but incorrectly) read that as meaning "a str or None" as opposed to "a str or nothing". > > But it *does* mean 'str or None'. 
> The *type* of an argument doesn't have any bearing on whether it may be omitted from the argument list by the caller -- these are orthogonal concepts (though sadly the word optional might apply to both). It's possible (though unusual) to have an optional argument that must be a str when given; it's also possible to have a mandatory argument that may be a str or None.

In the case of Python wrappers around a C++ library then *every* optional argument will have to have a specific type when given.

So you are saying that a mandatory argument that may be a str or None would be specified as Union[str, None]? But the docs say that that is the underlying implementation of Option[str] - which (to me) means an optional argument that should be a string when given.

> Can you help improve the wording in the docs (preferably by filing an issue)?

When I eventually understand what it means...

>> bytes can be used as shorthand for bytes, bytearray and memoryview - but what about objects that really only support bytes? Shouldn't the shorthand be called something like AnyBytes?
>
> We debated that, but found it too annoying to have to import and write AnyBytes in so many places. The type checker may not be precise for cases that only accept bytes, but hopefully it's more useful in general this way.
>
>> Is there any recommended way to test the validity and completeness of stub files? What's the recommended way to parse them?
>
> That's also an open issue. For a quick check I tend to just point mypy at a stub file, since it is the most mature implementation of PEP 484 to date (Google's pytype is still working on PEP 484 compatibility). While this doesn't always catch all errors, it will at least find syntax errors and cases that mypy doesn't support. :-)

Ok I'll try that.
Phil

From yselivanov.ml at gmail.com Tue Feb 9 17:15:47 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 9 Feb 2016 17:15:47 -0500
Subject: [Python-Dev] Issue #26204: compiler now emits a SyntaxWarning on constant statement
In-Reply-To: <20160209010201.GZ31806@ando.pearwood.info>
References: <56B8F5F2.6030602@sdamon.com> <56B9087F.8090801@sdamon.com> <56B91A0D.5030003@gmail.com> <20160209010201.GZ31806@ando.pearwood.info>
Message-ID: <56BA6513.3000306@gmail.com>

On 2016-02-08 8:02 PM, Steven D'Aprano wrote:
> On Mon, Feb 08, 2016 at 05:43:25PM -0500, Yury Selivanov wrote:
>>
>> On 2016-02-08 5:19 PM, Terry Reedy wrote:
>>> On 2/8/2016 4:51 PM, Victor Stinner wrote:
>>>> 2016-02-08 22:28 GMT+01:00 Alexander Walters :
>>>>> What incantation do you need to do to make that behavior apparent?
>>>> I didn't know. I just checked. It's assert used with a non-empty tuple:
>>>>
>>>>>>> assert ("tuple",)
>>>> <stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses?
>>> I think this should be left to linters also.
>>>
>> I agree. I'd remove that warning.
>
> Please don't remove the warning, it is very useful.
>
> Compare an assertion written correctly:
>
> py> assert 1==2, "Error in arithmetic"
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError: Error in arithmetic
>
> with the simple mistake of wrapping the "tuple" in parens:
>
> py> assert (1==2, "Error in arithmetic")
> <stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses?
> py>

You're right! It's indeed a trap that we should warn about. Thanks!
Yury

From guido at python.org Tue Feb 9 18:48:41 2016
From: guido at python.org (Guido van Rossum)
Date: Tue, 9 Feb 2016 15:48:41 -0800
Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files
In-Reply-To: <896E1AC7-85F0-4F24-8B8D-CE2368FD03E1@riverbankcomputing.com>
References: <896E1AC7-85F0-4F24-8B8D-CE2368FD03E1@riverbankcomputing.com>
Message-ID: 

[Phil]
>>> I found the documentation confusing regarding Optional. Intuitively it seems to be the way to specify arguments with default values. However it is explained in terms of (for example) Union[str, None] and I (intuitively but incorrectly) read that as meaning "a str or None" as opposed to "a str or nothing".

[me]
>> But it *does* mean 'str or None'. The *type* of an argument doesn't have any bearing on whether it may be omitted from the argument list by the caller -- these are orthogonal concepts (though sadly the word optional might apply to both). It's possible (though unusual) to have an optional argument that must be a str when given; it's also possible to have a mandatory argument that may be a str or None.

[Phil]
> In the case of Python wrappers around a C++ library then *every* optional argument will have to have a specific type when given.

IIUC you're saying that every argument that may be omitted must still have a definite type other than None. Right? In that case just don't use Optional[]. If a signature has the form

def foo(a: str = 'xyz') -> str: ...

then this means that a may be omitted or it may be a str -- you cannot call foo(a=None). You can even (in a stub file) write this as:

def foo(a: str = ...) -> str: ...

(literal '...' i.e. ellipsis) if you don't want to commit to a specific default value (it makes no difference to mypy).

> So you are saying that a mandatory argument that may be a str or None would be specified as Union[str, None]?

Or as Optional[str], which means the same.
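[The two senses of "optional" that Guido distinguishes can be sketched with a pair of hypothetical functions -- the names and bodies are illustrative only, not from the thread:]

```python
from typing import Optional

# An *optional argument*: it has a default, so the caller may omit it,
# but when given it must be a str -- greet(name=None) is a type error.
def greet(name: str = "world") -> str:
    return "Hello " + name

# A *mandatory argument* typed Optional[str]: the caller must pass it,
# but may pass None.
def shout(name: Optional[str]) -> str:
    return (name or "anonymous").upper()

print(greet())        # Hello world  -- argument omitted
print(shout(None))    # ANONYMOUS    -- argument required, may be None
```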
> But the docs say that that is the underlying implementation of Option[str] - which (to me) means an optional argument that should be a string when given.

(Assuming you meant Option*al*.) There seems to be an utter confusion of the two uses of the term "optional" here. An "optional argument" (outside PEP 484) is one that has a default value. The "Optional[T]" notation in PEP 484 means "Union[T, None]". They mean different things.

>> Can you help improve the wording in the docs (preferably by filing an issue)?
>
> When I eventually understand what it means...

--
--Guido van Rossum (python.org/~guido)

From ethan at stoneleaf.us Tue Feb 9 18:56:41 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 09 Feb 2016 15:56:41 -0800
Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files
In-Reply-To: References: <896E1AC7-85F0-4F24-8B8D-CE2368FD03E1@riverbankcomputing.com>
Message-ID: <56BA7CB9.2070406@stoneleaf.us>

On 02/09/2016 03:48 PM, Guido van Rossum wrote:

> (Assuming you meant Option*al*.) There seems to be an utter confusion of the two uses of the term "optional" here. An "optional argument" (outside PEP 484) is one that has a default value. The "Optional[T]" notation in PEP 484 means "Union[T, None]". They mean different things.

In an effort to be (crystal) clear:

optional argument in Python: has a default value, so may be omitted when the function is called.

Optional[T] in MyPy: the argument has no default value, and must be supplied when the function is called, but the argument can be None.

--
~Ethan~

From python at stevedower.id.au Tue Feb 9 20:37:36 2016
From: python at stevedower.id.au (Steve Dower)
Date: Tue, 9 Feb 2016 17:37:36 -0800
Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module?
In-Reply-To: References: Message-ID: 

Could we perhaps redefine bytes paths on Windows as utf8 and use Unicode everywhere internally? I really don't like the idea of not being able to use bytes in cross platform code.
Unless it's become feasible to use Unicode for lossless filenames on Linux - last I heard it wasn't.

Top-posted from my Windows Phone

-----Original Message-----
From: "Victor Stinner"
Sent: 2/9/2016 5:05
To: "Paul Moore"
Cc: "Python Dev"
Subject: Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 19:26 GMT+01:00 Paul Moore :
> On 8 February 2016 at 14:32, Victor Stinner wrote:
>> Since 3.3, functions of the os module started to emit DeprecationWarning when called with bytes filenames.
>
> Everywhere? Or just on Windows? I can't tell from your email and I don't have a Unix system to hand to check.

I propose to only drop support for bytes filenames on Windows.

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev at python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40python.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com Tue Feb 9 20:41:08 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 10 Feb 2016 12:41:08 +1100
Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module?
In-Reply-To: References: Message-ID: 

On Wed, Feb 10, 2016 at 12:37 PM, Steve Dower wrote:
> I really don't like the idea of not being able to use bytes in cross platform code. Unless it's become feasible to use Unicode for lossless filenames on Linux - last I heard it wasn't.

It has, but only in Python 3 - anyone who needs to support 2.7 and arbitrary bytes in filenames can't use Unicode strings.

ChrisA

From abarnert at yahoo.com Tue Feb 9 21:01:21 2016
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 9 Feb 2016 18:01:21 -0800
Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module?
In-Reply-To: References: Message-ID: On Feb 9, 2016, at 17:37, Steve Dower wrote: > > Could we perhaps redefine bytes paths on Windows as utf8 and use Unicode everywhere internally? When you receive bytes from argv, stdin, a text file, a GUI, a named pipe, etc., and then use them as a path, Python treating them as UTF-8 would break everything. Plus, the problem only exists in Python 2, and Python is not going to fix Unicode support in Python 2, both because it's too late for such a major change in Python 2, and because it's probably impossible* (which is why we have Python 3 in the first place). > I really don't like the idea of not being able to use bytes in cross platform code. Unless it's become feasible to use Unicode for lossless filenames on Linux - last I heard it wasn't. It is, and has been for years. Surrogate escaping solved the linux problem. That doesn't help for Python 2, but again, it's too late for Python 2. * Well, maybe in the future, some linux distros will bite the same bullet OS X did and mandate that filesystem drivers must expose UTF-8, doing whatever transcoding or other munging is necessary under the covers, to be valid. But I'm guessing any such distros will be all-Python-3 long before then, and the people using Python 2 will also be using old versions or conservative distros. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at stevedower.id.au Tue Feb 9 21:42:45 2016 From: python at stevedower.id.au (Steve Dower) Date: Tue, 9 Feb 2016 18:42:45 -0800 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: References: Message-ID: <56BAA3A5.7070604@python.org> On 09Feb2016 1801, Andrew Barnert wrote: > On Feb 9, 2016, at 17:37, Steve Dower > wrote: > >> Could we perhaps redefine bytes paths on Windows as utf8 and use >> Unicode everywhere internally? 
> > When you receive bytes from argv, stdin, a text file, a GUI, a named > pipe, etc., and then use them as a path, Python treating them as UTF-8 > would break everything. Sure, but that's already broken today if you're communicating bytes via some protocol without manually managing the encoding, in which case you should be decoding it (and potentially re-encoding to sys.getfilesystemencoding()). The problem here is the protocol that Python uses to return bytes paths, and that protocol is inconsistent between APIs and information is lost. It really requires going through all the OS calls and either (a) making them consistently decode bytes to str using the declared FS encoding (currently 'mbcs', but I see no reason we can't make it 'utf_8'), or (b) make them consistently use the user's current system locale setting by always using the *A Win32 APIs rather than the *W ones. >> I really don't like the idea of not being able to use bytes in cross >> platform code. Unless it's become feasible to use Unicode for lossless >> filenames on Linux - last I heard it wasn't. > > It is, and has been for years. Surrogate escaping solved the linux > problem. That doesn't help for Python 2, but again, it's too late for > Python 2. Okay, guess I was operating out of old information. Thanks (and thanks Chris for the same answer). From stephen at xemacs.org Tue Feb 9 23:17:48 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 10 Feb 2016 13:17:48 +0900 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: <56BAA3A5.7070604@python.org> References: <56BAA3A5.7070604@python.org> Message-ID: <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> Steve Dower writes: > On 09Feb2016 1801, Andrew Barnert wrote: > > On Feb 9, 2016, at 17:37, Steve Dower > > wrote: > > > >> Could we perhaps redefine bytes paths on Windows as utf8 and use > >> Unicode everywhere internally? 
> > > > When you receive bytes from argv, stdin, a text file, a GUI, a named > > pipe, etc., and then use them as a path, Python treating them as UTF-8 > > would break everything. > > Sure, but that's already broken today if you're communicating bytes via > some protocol without manually managing the encoding, in which case you > should be decoding it (and potentially re-encoding to > sys.getfilesystemencoding()). The problem is that treating them as UTF-8 in Python will raise errors on any file name that isn't valid UTF-8, or corrupt the filename if you use one of the handlers available in Python 2. If you use Latin-1, that (1) handles all 256 bytes, and (2) roundtrips to Unicode. Although semantically useless to users, if it's just read from a directory, then used to manipulate file contents, no problem. Of course if you then edit a multibyte file name as Unicode it is likely that all hell will break loose. But you can take that sentence and s/Unicode/bytes/, too. :-/ > The problem here is the protocol that Python uses to return bytes paths, > and that protocol is inconsistent between APIs and information is lost. No, the problem is that the necessary information simply isn't always available. Not even today: think removable media, especially archival content. Also network file systems: I don't know if it still happens, but I've seen Shift JIS, GB2312, and KOI8-R all in the same directory, and sometimes two of those in the *same path*. (Don't ask me how non-malicious users managed to do the latter!) > It really requires going through all the OS calls and either (a) making > them consistently decode bytes to str using the declared FS encoding > (currently 'mbcs', but I see no reason we can't make it 'utf_8'), If it were that easy, it would have been done two decades ago. I'm no fan of Windows[1], but it's obvious that Microsoft has devoted enormous amounts of brainpower to the problem of encoding rationalization since the early 90s. 
I don't think they would have missed this idea. Footnotes: [1] Its complete inability to DTRT for mixed English and Japanese was what drove me to Unix-like OSes in the early 90s. From python at stevedower.id.au Tue Feb 9 23:40:17 2016 From: python at stevedower.id.au (Steve Dower) Date: Tue, 9 Feb 2016 20:40:17 -0800 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> References: <56BAA3A5.7070604@python.org> <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> Message-ID: <56BABF31.3030005@python.org> On 09Feb2016 2017, Stephen J. Turnbull wrote: > > The problem here is the protocol that Python uses to return bytes paths, > > and that protocol is inconsistent between APIs and information is lost. > > No, the problem is that the necessary information simply isn't always > available. Not even today: think removable media, especially archival > content. Also network file systems: I don't know if it still happens, > but I've seen Shift JIS, GB2312, and KOI8-R all in the same directory, > and sometimes two of those in the *same path*. (Don't ask me how > non-malicious users managed to do the latter!) But if we return bytes paths and the user passes them back in unchanged, that should be irrelevant. The earlier issue was that that doesn't work (e.g. a bytes path from os.scandir couldn't be passed back into open()). > > It really requires going through all the OS calls and either (a) making > > them consistently decode bytes to str using the declared FS encoding > > (currently 'mbcs', but I see no reason we can't make it 'utf_8'), > > If it were that easy, it would have been done two decades ago. I'm no > fan of Windows[1], but it's obvious that Microsoft has devoted > enormous amounts of brainpower to the problem of encoding > rationalization since the early 90s. I don't think they would have > missed this idea. I meant with Python's calls into the API. 
Anywhere Python does the conversion from bytes to LPCWSTR (the UTF-16 type) there's a chance it'll be wrong. Your earlier comments (regarding encoding/decoding to/from Unicode, which I didn't have anything valuable to add to) basically reflect the fact that developers need to treat bytes paths as blobs on all platforms and the core Python runtime needs to obtain and use them consistently. Which means *always* using the Win32 *A APIs and never doing a conversion ourselves.

Microsoft's solution here is the user's active code page, much like *nix's solution as I understand it, except that where *nix will convert _to_ the encoding as a normalized form, Windows will convert _from_ the encoding to its UTF-16 "normalized" form. Back-compat concerns have prevented any significant changes being made here, otherwise there wouldn't be a 'bytes' interface at all. (Or more likely, everything would be UTF-8 based, but back-compat is king in Windows-land.)

Cheers,
Steve

From victor.stinner at gmail.com Wed Feb 10 02:08:33 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 10 Feb 2016 08:08:33 +0100
Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module?
Message-ID: 

Le mercredi 10 février 2016, Steve Dower a écrit :
>
> I really don't like the idea of not being able to use bytes in cross platform code. Unless it's become feasible to use Unicode for lossless filenames on Linux - last I heard it wasn't.
>

The point of my email is that even on Python 3, users kept bad habits because of Python 2. *Yes*, you can use Unicode filenames on all platforms on Python 3 since 2009 thanks to the following PEP:
https://www.python.org/dev/peps/pep-0383/

In my first email, I mentioned a bug report of a user still using bytes filenames on Windows with Python 3. It is on the Blender project which *only* supports Python 3. Or maybe I missed something huge which really forces Blender to use bytes???
But if a few functions still require bytes, I would suggest using os.fsencode() for them instead. It's much more convenient to handle filenames as Unicode on Python 3.

Victor

From abarnert at yahoo.com Wed Feb 10 02:56:25 2016
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 9 Feb 2016 23:56:25 -0800
Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module?
In-Reply-To: <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp>
References: <56BAA3A5.7070604@python.org> <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

On Feb 9, 2016, at 20:17, Stephen J. Turnbull wrote:
>> It really requires going through all the OS calls and either (a) making them consistently decode bytes to str using the declared FS encoding (currently 'mbcs', but I see no reason we can't make it 'utf_8'),
>
> If it were that easy, it would have been done two decades ago. I'm no fan of Windows[1], but it's obvious that Microsoft has devoted enormous amounts of brainpower to the problem of encoding rationalization since the early 90s. I don't think they would have missed this idea.

Microsoft spent a lot of time and effort on the idea that UTF-16 (or, originally, UCS-2) everywhere was the answer. Never call the A functions (or the msvcrt functions that emulate the C and POSIX stdlib), and there's never a problem. What if you read filenames out of a text file? No problem; text files are UTF-16-BOM. Over a socket? All network protocols are also UTF-16. What if you have to read a file written in Unix? Come on, nobody's ever created a useful file without Windows. What about Windows 3.1? Uh... that's a problem. Also, what happens when Unicode goes over 64k characters? And so on. So their grand project failed.

That doesn't mean the problem can't be solved.
Apple solved their equivalent problem, albeit by sacrificing backward compatibility in a way Microsoft can't get away with. I haven't seen a MacRoman or Shift-JIS filename since they broke the last holdout (the low-level AppleEvent interface) in 10.7--and most of the apps I was using back then don't run on 10.10 without an update. So Python 2 works great on Macs, whether you use bytes or unicode. But that doesn't help us on Windows, where you can't use bytes, or Linux, where you can't use Unicode (without surrogate escape or some other mechanism that Python 2 doesn't have). From stephen at xemacs.org Wed Feb 10 03:00:17 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 10 Feb 2016 17:00:17 +0900 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: <56BABF31.3030005@python.org> References: <56BAA3A5.7070604@python.org> <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> <56BABF31.3030005@python.org> Message-ID: <22202.60945.71836.106523@turnbull.sk.tsukuba.ac.jp> Executive summary: Code pages and POSIX locales aren't solutions, they're the Original Sin. Steve Dower writes: > On 09Feb2016 2017, Stephen J. Turnbull wrote: > > > The problem here is the protocol that Python uses to return > > > bytes paths, and that protocol is inconsistent between APIs > > > and information is lost. > > > > No, the problem is that the necessary information simply isn't always > > available. > > But if we return bytes paths and the user passes them back in unchanged, > that should be irrelevant. Yes. That's pretty much exactly the semantics of using the latin-1 codec. UTF-8 can't do that without surrogateescape, which Python 2 lacks. > The earlier issue was that that doesn't work (e.g. a bytes path > from os.scandir couldn't be passed back into open()). 
My purely-from-the-user-side take is that that's just a bug in os.scandir that should be fixed, and that even though the complexity that occasions such bugs is an undesirable aspect of Python (v2) programming, it's not a bug because it *can't* be fixed -- you have to fix the world, not Python. Or switch to Python 3. I don't know enough to have an opinion on whether "fixing" os.scandir could cause other problems. > I meant with Python's calls into the API. Anywhere Python does the > conversion from bytes to LPCWSTR (the UTF-16 type) there's a chance > it'll be wrong. Indeed. That's why converting the bytes is often the wrong thing to do *period*. The reasons that Python might be wrong apply to every agent that might decide the conversion -- except the user; the user is never wrong about these things. > Microsoft's solution here is the user's active code page, much like > *nix's solution as I understand it, except that where *nix will convert > _to_ the encoding as a normalized form, Windows will convert _from_ the > encoding to its UTF-16 "normalized" form. Not quite accurate. Unix by original design doesn't *have* a normalized form.[1] Bytez-iz-bytez-R-Us, that's Unix. Recently everybody (except for a few nationalist lunatics and the unteachables in some legislatures) has learned that some form of Unicode is the way to go internally. But that's "best practice", not POSIX requirement, and tons of software continues to operate[2] based on the assumption that users are monolingual with a canonical one-byte encoding, so it doesn't matter as long as *no conversion is ever done*, and the input methods and fonts are consistent with each other. Code pages just try to *enforce* that constraint (and as I already mentioned, that pissed me off so much in 1990 that I'm still a Windows refusenik today). > Back-compat concerns have prevented any significant changes being > made here, otherwise there wouldn't be a 'bytes' interface at > all. 
It's not just back-compat, it's absolutely necessary in a code-page- based world because you just can't be sure what encoding your content is in until the user tells you the crap you've spewed on her screen might be Klingon, but it's not any of the 7 human languages she knows. "Toto! I don't think we're in Kansas any more...." The fact is that code-page-based content continues to be produced in significant quantities, despite the universal availability and absolute superiority (except for workstation reconfiguration costs) of Unicode. Footnotes: [1] The POSIX locale selects encodings for console input and output. File I/O is just bytes, both the content and the file name. The code page also defines the file name encoding as I understand it. [2] I would hope that nobody is *writing* software like that any more, but I live in Japan. That hope is years in the future for me. From p.f.moore at gmail.com Wed Feb 10 03:30:24 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 10 Feb 2016 08:30:24 +0000 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: <22202.60945.71836.106523@turnbull.sk.tsukuba.ac.jp> References: <56BAA3A5.7070604@python.org> <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> <56BABF31.3030005@python.org> <22202.60945.71836.106523@turnbull.sk.tsukuba.ac.jp> Message-ID: On 10 February 2016 at 08:00, Stephen J. Turnbull wrote: >> The earlier issue was that that doesn't work (e.g. a bytes path > > from os.scandir couldn't be passed back into open()). > > My purely-from-the-user-side take is that that's just a bug in > os.scandir that should be fixed, and that even though the complexity > that occasions such bugs is an undesirable aspect of Python (v2) > programming, it's not a bug because it *can't* be fixed -- you have to > fix the world, not Python. Or switch to Python 3. > > I don't know enough to have an opinion on whether "fixing" os.scandir > could cause other problems. 
The original os.scandir issue was encountered on Python 3. And I do agree with Victor that the correct answer was to point out to the user that they should be using unicode/surrogateescape. What I disagree with is mandating that (by removing the bytes interface) on anything other than all platforms at once, because that doesn't remove the problem (of coders using the wrong approach on Python 3) it just makes the code such users write non-portable. Whether removing the bytes interface is feasible, given that there's then no way that works across Python 2 and 3 of writing code that manipulates the sort of bytes-that-use-multiple-encodings data that you mention, is a separate issue. Paul From victor.stinner at gmail.com Wed Feb 10 03:45:38 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 10 Feb 2016 09:45:38 +0100 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: References: <56BAA3A5.7070604@python.org> <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> <56BABF31.3030005@python.org> <22202.60945.71836.106523@turnbull.sk.tsukuba.ac.jp> Message-ID: 2016-02-10 9:30 GMT+01:00 Paul Moore : > Whether removing the bytes interface is feasible, given that there's > then no way that works across Python 2 and 3 of writing code that > manipulates the sort of bytes-that-use-multiple-encodings data that > you mention, is a separate issue. 
It's annoying that 8 years after the release of Python 3.0, Python 3 is still stuck by Python 2 :-( Victor From phil at riverbankcomputing.com Wed Feb 10 04:11:00 2016 From: phil at riverbankcomputing.com (Phil Thompson) Date: Wed, 10 Feb 2016 09:11:00 +0000 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files In-Reply-To: References: <896E1AC7-85F0-4F24-8B8D-CE2368FD03E1@riverbankcomputing.com> Message-ID: <21AD5FE9-CA0A-4E8A-B8F5-ED7144DC81DA@riverbankcomputing.com> > On 9 Feb 2016, at 11:48 pm, Guido van Rossum wrote: > > [Phil] >>>> I found the documentation confusing regarding Optional. Intuitively it seems to be the way to specify arguments with default values. However it is explained in terms of (for example) Union[str, None] and I (intuitively but incorrectly) read that as meaning "a str or None" as opposed to "a str or nothing". > [me] >>> But it *does* mean 'str or None'. The *type* of an argument doesn't >>> have any bearing on whether it may be omitted from the argument list >>> by the caller -- these are orthogonal concepts (though sadly the word >>> optional might apply to both). It's possible (though unusual) to have >>> an optional argument that must be a str when given; it's also possible >>> to have a mandatory argument that may be a str or None. > [Phil] >> In the case of Python wrappers around a C++ library then *every* optional argument will have to have a specific type when given. > > IIUC you're saying that every argument that may be omitted must still > have a definite type other than None. Right? In that case just don't > use Optional[]. If a signature has the form > > def foo(a: str = 'xyz') -> str: ... > > then this means that str may be omitted or it may be a str -- you > cannot call foo(a=None). > > You can even (in a stub file) write this as: > > def foo(a: str = ...) -> str: ... > > (literal '...' i.e. ellipsis) if you don't want to commit to a > specific default value (it makes no difference to mypy). 
> >> So you are saying that a mandatory argument that may be a str or None would be specified as Union[str, None]? > > Or as Optional[str], which means the same. > >> But the docs say that that is the underlying implementation of Option[str] - which (to me) means an optional argument that should be a string when given. > > (Assuming you meant Option*al*.) There seems to be an utter confusion > of the two uses of the term "optional" here. An "optional argument" > (outside PEP 484) is one that has a default value. The "Optional[T]" > notation in PEP 484 means "Union[T, None]". They mean different > things. > >>> Can you help improve the wording in the docs (preferably by filing an issue)? >> >> When I eventually understand what it means... I understand now. The documentation, as it stands, is correct and consistent but (to me) the meaning of Optional is completely counter-intuitive. What you suggest with str = ... is exactly what I need. Adding a section to the docs describing that should clear up the confusion. Thanks, Phil From p.f.moore at gmail.com Wed Feb 10 04:28:07 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 10 Feb 2016 09:28:07 +0000 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: References: <56BAA3A5.7070604@python.org> <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> <56BABF31.3030005@python.org> <22202.60945.71836.106523@turnbull.sk.tsukuba.ac.jp> Message-ID: On 10 February 2016 at 08:45, Victor Stinner wrote: > 2016-02-10 9:30 GMT+01:00 Paul Moore : >> Whether removing the bytes interface is feasible, given that there's >> then no way that works across Python 2 and 3 of writing code that >> manipulates the sort of bytes-that-use-multiple-encodings data that >> you mention, is a separate issue. > > It's annoying that 8 years after the release of Python 3.0, Python 3 > is still stuck by Python 2 :-( Agreed. 
Of course personally, I'm in favour of going Python 3/Unicode everywhere, it's the Unix guys with their legacy distros and Python installations and bytes-based filesystems that get in the way of that :-) And I don't think we're brave enough to force *Unix* users to use the right type for filenames :-) Paul From abarnert at yahoo.com Wed Feb 10 04:50:16 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Feb 2016 09:50:16 +0000 (UTC) Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: References: Message-ID: <1161374091.1565135.1455097816642.JavaMail.yahoo@mail.yahoo.com> On Wednesday, February 10, 2016 12:47 AM, Victor Stinner wrote: > > 2016-02-10 9:30 GMT+01:00 Paul Moore : >> Whether removing the bytes interface is feasible, given that there's >> then no way that works across Python 2 and 3 of writing code that >> manipulates the sort of bytes-that-use-multiple-encodings data that >> you mention, is a separate issue. Well, there's a surrogate-escape backport on PyPI (I think there's a standalone one, and one in python-future), so you _could_ do everything the same as in 3.x. Depending on what you're doing, you may also need to use the io module instead of file (which may just mean "from io import open", but could mean more work), wrap the stdio streams explicitly, manually decode argv, etc. But someone could write a six-like module (or add it to six) that does all of that. It may be a little slower and more memory-intensive in 2.7 than in 3.x, but for most apps, that doesn't matter. The big problem would be third-party libraries (and stdlib modules like csv) that want to use bytes in 2.x; convincing them all to support full-on-unicode in 2.x might be more trouble than it's worth. Still, if I were feeling the pain of maintaining lots of linux-bytes-Windows-unicode-2.7 code, I'd try it and see how far I get. 
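[The round trip Andrew relies on can be sketched as follows -- Python 3 syntax, with a hypothetical filename; the point of the PyPI backports is to make the same 'surrogateescape' error handler available on 2.x:]

```python
# PEP 383: the 'surrogateescape' error handler maps each undecodable
# byte 0x80-0xFF to a lone surrogate U+DC80-U+DCFF, so arbitrary bytes
# survive a round trip through str.
raw = b"report-\xff-final.txt"  # not valid UTF-8

text = raw.decode("utf-8", "surrogateescape")
print(repr(text))  # 'report-\udcff-final.txt'

# Re-encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw

# A strict decode of the same bytes fails:
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("strict decode raises")
```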
> It's annoying that 8 years after the release of Python 3.0, Python 3 > is still stuck by Python 2 :-( I understand the frustration, but... time already goes too fast at my age; don't skip me ahead almost a whole year to December 2016. :) Also, unless you're the one guy who actually abandoned 2.6 for 3.0, it's probably more useful to count from 2.7, 3.2, or the no-2.8 declaration, which are all about 5 years ago. From steve at pearwood.info Wed Feb 10 05:18:15 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 10 Feb 2016 21:18:15 +1100 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: References: Message-ID: <20160210101813.GF31806@ando.pearwood.info> On Wed, Feb 10, 2016 at 12:41:08PM +1100, Chris Angelico wrote: > On Wed, Feb 10, 2016 at 12:37 PM, Steve Dower wrote: > > I really don't like the idea of not being able to use bytes in cross > > platform code. Unless it's become feasible to use Unicode for lossless > > filenames on Linux - last I heard it wasn't. > > It has, but only in Python 3 - anyone who needs to support 2.7 and > arbitrary bytes in filenames can't use Unicode strings. Are you sure? Unless I'm confused, which I may be, I don't think you can specify file names with arbitrary bytes in Python 3. Writing, and reading, filenames including odd bytes works in Python 2.7: [steve at ando ~]$ python -c 'open("/tmp/abc\xD8\x01", "w").write("Hello World\n")' [steve at ando ~]$ ls /tmp/abc* /tmp/abc?? 
[steve at ando ~]$ python -c 'print open("/tmp/abc\xD8\x01", "r").read()'
Hello World
[steve at ando ~]$

And I can read the file using bytes in Python 3:

[steve at ando ~]$ python3.3 -c 'print(open(b"/tmp/abc\xD8\x01", "r").read())'
Hello World
[steve at ando ~]$

But Unicode fails:

[steve at ando ~]$ python3.3 -c 'print(open("/tmp/abc\xD8\x01", "r").read())'
Traceback (most recent call last):
  File "", line 1, in
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/abc?\x01'

What Unicode string does one need to give in order to open file
b"/tmp/abc\xD8\x01"?

I think one would need to find a valid unicode string which, when
encoded to UTF-8, gives the byte sequence \xD8\x01, but since that's
half of a surrogate pair it is an illegal UTF-8 byte sequence. So I
don't think it can be done. Am I mistaken?

--
Steve

From victor.stinner at gmail.com  Wed Feb 10 05:37:58 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 10 Feb 2016 11:37:58 +0100
Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module?
In-Reply-To: <20160210101813.GF31806@ando.pearwood.info>
References: <20160210101813.GF31806@ando.pearwood.info>
Message-ID: 

2016-02-10 11:18 GMT+01:00 Steven D'Aprano :
> [steve at ando ~]$ python3.3 -c 'print(open(b"/tmp/abc\xD8\x01", "r").read())'
> Hello World
>
> [steve at ando ~]$ python3.3 -c 'print(open("/tmp/abc\xD8\x01", "r").read())'
> Traceback (most recent call last):
> File "", line 1, in
> FileNotFoundError: [Errno 2] No such file or directory: '/tmp/abc?\x01'
>
> What Unicode string does one need to give in order to open file
> b"/tmp/abc\xD8\x01"?

Use os.fsdecode(b"/tmp/abc\xD8\x01") to get the filename as a Unicode
string, it will work.

Removing 'b' in front of byte strings is not enough to convert
arbitrary byte strings to Unicode :-D Encodings are more complex than
that...
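The fsdecode round trip can be checked in a few lines (a sketch for
Python 3 on POSIX, where the surrogateescape error handler makes the
conversion lossless):

```python
# Sketch: arbitrary filename bytes survive the trip through str and
# back, which is why os.fsdecode() answers Steven's question.
import os

raw = b"/tmp/abc\xD8\x01"
name = os.fsdecode(raw)          # undecodable bytes become lone surrogates
assert isinstance(name, str)
assert os.fsencode(name) == raw  # lossless: open(name) targets the same file
```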
See http://unicodebook.readthedocs.org/ The problem on Python 2 is that the UTF-8 encoders encode surrogate characters, which is wrong. You cannot use an error handler to choose how to handle these surrogate characters. On Python 3, you have a wide choice of builtin error handlers, and you can even write your own error handlers. Example with Python 3.6 and its new "namereplace" error handler. >>> def format_filename(filename, encoding='ascii', errors='backslashreplace'): ... return filename.encode(encoding, errors).decode(encoding) ... >>> print(format_filename(os.fsdecode(b'abc\xff'))) abc\udcff >>> print(format_filename(os.fsdecode(b'abc\xff'), errors='replace')) abc? >>> print(format_filename(os.fsdecode(b'abc\xff'), errors='ignore')) abc >>> print(format_filename(os.fsdecode(b'abc\xff') + "?", errors='namereplace')) abc\udcff\N{LATIN SMALL LETTER E WITH ACUTE} My locale encoding is UTF-8. Victor From storchaka at gmail.com Wed Feb 10 07:41:30 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 10 Feb 2016 14:41:30 +0200 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: References: Message-ID: On 08.02.16 16:32, Victor Stinner wrote: > On Python 2, it wasn't possible to use Unicode for filenames, many > functions fail badly with Unicode, especially when you mix bytes and > Unicode. Even not all os functions support Unicode. See http://bugs.python.org/issue18695. From ncoghlan at gmail.com Wed Feb 10 07:47:48 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Feb 2016 22:47:48 +1000 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files In-Reply-To: References: Message-ID: On 10 February 2016 at 06:54, Guido van Rossum wrote: > [Just adding to Andrew's response] > > On Tue, Feb 9, 2016 at 9:58 AM, Andrew Barnert via Python-Dev > wrote: >> On Feb 9, 2016, at 03:44, Phil Thompson wrote: >>> >>> There are a number of things I'd like to express but cannot find a way to do so... 
>>> >>> - objects that implement the buffer protocol >> >> That seems like it should be filed as a bug with the typing repo. Presumably this is just an empty type that registers bytes, bytearray, and memoryview, and third-party classes have to register with it manually? > > Hm, there's no way to talk about these in regular Python code either, > is there? I think that issue should be resolved first. Probably by > adding something to collections.abc. And then we can add the > corresponding name to typing.py. This will take time though (have to > wait for 3.6) so I'd recommend 'Any' for now (and filing those bugs). Somewhat related, there's actually no way to export PEP 3118 buffers directly from a type implemented in Python: http://bugs.python.org/issue13797 Cython and PyPy each have their own approach to handling that, but there's no language level cross-interpreter convention A type (e.g. BytesLike, given the change we made to relevant error messages) could still be added to collections.abc without addressing that problem, it would just need to be empty and used only for explicit registration without any structural typing support. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Wed Feb 10 09:50:21 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 10 Feb 2016 23:50:21 +0900 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: References: <56BAA3A5.7070604@python.org> <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> Message-ID: <22203.20013.518404.39381@turnbull.sk.tsukuba.ac.jp> Andrew Barnert via Python-Dev writes: > That doesn't mean the problem can't be solved. Apple solved their > equivalent problem, albeit by sacrificing backward compatibility in > a way Microsoft can't get away with. 
I haven't seen a MacRoman or > Shift-JIS filename since they broke the last holdout If you lived where I do, you'd still be seeing both, because you wouldn't be able to escape archival files on CD and removable media (typically written on Windows boxen). They still work, sort of == same as always, and as far as I know, that's because Apple has *not* sacrificed backward compatibility: under the hood, Darwin is still a POSIX kernel which thinks of file names and everything else outside of memory as bytestreams. One place they *fail very badly* is Shift JIS filenames in zipfiles, which nothing provided by Apple can deal with safely, and InfoZip breaks too (at least in MacPorts). Yes, I know that is specifically disallowed. Feel free to tell 1_0000_0000 Japanese Windows users. Thank heaven for Python there! A three-line hack and I'm free! > So Python 2 works great on Macs, whether you use bytes or > unicode. But that doesn't help us on Windows, where you can't use > bytes, or Linux, where you can't use Unicode (without surrogate > escape or some other mechanism that Python 2 doesn't have). You contradict yourself! ;-) From stephen at xemacs.org Wed Feb 10 09:51:57 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 10 Feb 2016 23:51:57 +0900 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: References: <56BAA3A5.7070604@python.org> <22202.47596.877974.939792@turnbull.sk.tsukuba.ac.jp> <56BABF31.3030005@python.org> <22202.60945.71836.106523@turnbull.sk.tsukuba.ac.jp> Message-ID: <22203.20109.683214.199505@turnbull.sk.tsukuba.ac.jp> Victor Stinner writes: > It's annoying that 8 years after the release of Python 3.0, Python 3 > is still stuck by Python 2 :-( I prefer to think of it as the irritant that reminds me that I am very much alive, and so is Python, vibrantly so. 
From guido at python.org Wed Feb 10 12:52:23 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Feb 2016 09:52:23 -0800 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files In-Reply-To: <21AD5FE9-CA0A-4E8A-B8F5-ED7144DC81DA@riverbankcomputing.com> References: <896E1AC7-85F0-4F24-8B8D-CE2368FD03E1@riverbankcomputing.com> <21AD5FE9-CA0A-4E8A-B8F5-ED7144DC81DA@riverbankcomputing.com> Message-ID: On Wed, Feb 10, 2016 at 1:11 AM, Phil Thompson wrote: > I understand now. The documentation, as it stands, is correct and consistent but (to me) the meaning of Optional is completely counter-intuitive. What you suggest with str = ... is exactly what I need. Adding a section to the docs describing that should clear up the confusion. I tried to add some clarity to the docs with this paragraph: Note that this is not the same concept as an optional argument, which is one that has a default. An optional argument with a default needn't use the ``Optional`` qualifier on its type annotation (although it is inferred if the default is ``None``). A mandatory argument may still have an ``Optional`` type if an explicit value of ``None`` is allowed. Should be live on docs.python.org with the next push (I don't recall the delay, at most a day IIRC). -- --Guido van Rossum (python.org/~guido) From phil at riverbankcomputing.com Wed Feb 10 13:01:55 2016 From: phil at riverbankcomputing.com (Phil Thompson) Date: Wed, 10 Feb 2016 18:01:55 +0000 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files In-Reply-To: References: <896E1AC7-85F0-4F24-8B8D-CE2368FD03E1@riverbankcomputing.com> <21AD5FE9-CA0A-4E8A-B8F5-ED7144DC81DA@riverbankcomputing.com> Message-ID: On 10 Feb 2016, at 5:52 pm, Guido van Rossum wrote: > > On Wed, Feb 10, 2016 at 1:11 AM, Phil Thompson > wrote: >> I understand now. The documentation, as it stands, is correct and consistent but (to me) the meaning of Optional is completely counter-intuitive. What you suggest with str = ... 
is exactly what I need. Adding a section to the docs describing that should clear up the confusion. > > I tried to add some clarity to the docs with this paragraph: > > Note that this is not the same concept as an optional argument, > which is one that has a default. An optional argument with a > default needn't use the ``Optional`` qualifier on its type > annotation (although it is inferred if the default is ``None``). > A mandatory argument may still have an ``Optional`` type if an > explicit value of ``None`` is allowed. > > Should be live on docs.python.org with the next push (I don't recall > the delay, at most a day IIRC). That should do it, thanks. A followup question... Is... def foo(bar: str = Optional[str]) ...valid? In other words, bar can be omitted, but if specified must be a str or None? Thanks, Phil From guido at python.org Wed Feb 10 13:15:02 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Feb 2016 10:15:02 -0800 Subject: [Python-Dev] Experiences with Creating PEP 484 Stub Files In-Reply-To: References: <896E1AC7-85F0-4F24-8B8D-CE2368FD03E1@riverbankcomputing.com> <21AD5FE9-CA0A-4E8A-B8F5-ED7144DC81DA@riverbankcomputing.com> Message-ID: On Wed, Feb 10, 2016 at 10:01 AM, Phil Thompson wrote: > On 10 Feb 2016, at 5:52 pm, Guido van Rossum wrote: [...] > That should do it, thanks. A followup question... > > Is... > > def foo(bar: str = Optional[str]) > > ...valid? In other words, bar can be omitted, but if specified must be a str or None? The syntax you gave makes no sense (the default value shouldn't be a type) but to do what your words describe you can do def foo(bar: Optional[str] = ...): ... That's literally what you would put in the stub file (the ... are literal ellipses). In a .py file you'd have to specify a concrete default value. If your concrete default is neither str nor None you'd have to use cast(str, default_value), e.g. _NO_VALUE = object() # marker def foo(bar: Optional[str] = cast(str, _NO_VALUE)): ...implementation... 
Now the implementation can distinguish between foo(), foo(None) and foo(''). -- --Guido van Rossum (python.org/~guido) From abarnert at yahoo.com Wed Feb 10 15:30:33 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Feb 2016 20:30:33 +0000 (UTC) Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: <22203.20013.518404.39381@turnbull.sk.tsukuba.ac.jp> References: <22203.20013.518404.39381@turnbull.sk.tsukuba.ac.jp> Message-ID: <1844729669.1825953.1455136233223.JavaMail.yahoo@mail.yahoo.com> On Wednesday, February 10, 2016 6:50 AM, Stephen J. Turnbull wrote: > Andrew Barnert via Python-Dev writes: > >> That doesn't mean the problem can't be solved. Apple solved their >> equivalent problem, albeit by sacrificing backward compatibility in >> a way Microsoft can't get away with. I haven't seen a MacRoman or >> Shift-JIS filename since they broke the last holdout > > If you lived where I do, you'd still be seeing both, because you > wouldn't be able to escape archival files on CD and removable media > (typically written on Windows boxen). They still work, sort of == > same as always, and as far as I know, that's because Apple has *not* > sacrificed backward compatibility: under the hood, Darwin is still a > POSIX kernel which thinks of file names and everything else outside of > memory as bytestreams. Sure, but the Darwin kernel can't read CDs; that's up to the CD filesystem driver. Anyway, Windows CDs can't cause this problem. Windows CDs use the Joliet filesystem,[^1] which stores everything in UCS2.[^2] When you call CreateFileA or fopen or _open with bytes, Windows decodes those bytes and stores them as UCS2. The filesystem drivers on POSIX platforms have to encode that UCS2 to _something_ (POSIX APIs make it very hard for you to deal with filename strings like "A\0B\0C\0.\0T\0X\0T\0\0\0"...). The linux driver uses a mount option to decide how to encode; the OS X driver always uses UTF-8. 
And every valid UCS2 string can be encoded as UTF-8, so you can use
unicode everywhere, even in Python 2. Of course you can have mojibake
problems, but that's a different issue,[^3] and no worse with unicode
than with bytes.[^4]

The same thing is true with NTFS external drives, VFAT USB drives, etc.
Generally, it's usually not Windows media on *nix systems that break
Python 2 unicode; it's native *nix filesystems where users mix locales.

> One place they *fail very badly* is Shift JIS filenames in zipfiles,
> which nothing provided by Apple can deal with safely, and InfoZip
> breaks too (at least in MacPorts). Yes, I know that is specifically
> disallowed. Feel free to tell 1_0000_0000 Japanese Windows users.

The good news is, as far as I can tell, it's not disallowed anymore.[^5]
So we just have to tell them that they shouldn't have been doing it in
the past. :)

Anyway, zipfiles are data files as far as the OS is concerned; the fact
that they contain filenames is no more relevant to the kernel (or
filesystem driver or userland) than the fact that "List of PDFs to Read
This Weekend.txt" contains filenames.

PS, everything Apple provides is already using Info-ZIP.

>> So Python 2 works great on Macs, whether you use bytes or
>> unicode. But that doesn't help us on Windows, where you can't use
>> bytes, or Linux, where you can't use Unicode (without surrogate
>> escape or some other mechanism that Python 2 doesn't have).
>
> You contradict yourself! ;-)

Yes, as I later realized, sometimes, you _can_ (or at least ought to be
able to--I haven't actually tried) use Python 2 with unicode everywhere
to write cross-platform software that actually works on linux, by using
backports of surrogate-escape and pathlib, and the io module instead of
the file type, as long as you only need stdlib and third-party modules
that support unicode filenames. If that does work for at least some
apps, then I'm perfectly happy to have been wrong earlier.
And if catching myself before someone else did makes me a flip-flopper,
well, I'm not running for president. :P

[^1]: Except when Vista and 7 mistakenly think your CD is a DVD and use
UDF instead of ISO9660--but in that case, the encoding is stored in the
filesystem header, so it's also not a problem.

[^2]: Actually, despite Microsoft's spec, later versions of Windows
store UTF-16, even if there are surrogate pairs, or BMP-but-post-UCS2
code points. But that doesn't matter here; the linux, Mac, etc. drivers
all assume UTF-16, which works either way.

[^3]: Say you write a program that assumes it will only be run on
Shift-JIS systems, and you use CreateFileA to create a file named
"???????". The actual bytes you're sending are cp437 for
"?n???[???[???h", so the file on the CD is named, in Unicode,
"?n???[???[???h". So of course the Mac driver encodes that to UTF-8
b"?n???[???[???h". You won't have any problems opening what you
readdir, or what you copy from a UTF-8 terminal or a UTF-16 Cocoa app
like Finder, etc. But of course you will have trouble getting your user
to recognize that name as meaningful, unless you can figure out or
guess or prompt the user to guess that it needs to be passed through
s.encode('cp437').decode('shift-jis').

[^4]: Your locale is always UTF-8 on Mac. So the only significant
difference is that if you're using bytes, you need
b.decode('utf-8').encode('cp437').decode('shift-jis') to fix the
problem.

[^5]: Zipfiles using the Unicode extension can store a UTF-8
transcoding along with the local bytes, in which case the local bytes
do not have to be in the header-declared encoding, because they will be
ignored. And I think everything Microsoft ships now handles this
properly. And Info-ZIP, and therefore all of Apple's tools, also handle
it properly--so, not only is it legal, it even works.
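The kind of re-decoding hack discussed in this thread for Shift JIS zip
member names can be sketched in a few lines (an illustration, not
anyone's exact code: it assumes the names really were Shift JIS bytes
that a naive reader displayed as cp437 mojibake, cp437 being the
fallback commonly used for names without the zip UTF-8 flag):

```python
# Sketch: recover a Shift JIS zip member name that was decoded as cp437.
# cp437 maps all 256 byte values one-to-one, so the mojibake
# round-trips losslessly back to the original bytes.
original = u"\u65e5\u672c\u8a9e.txt"   # a Japanese filename
raw = original.encode("shift_jis")     # bytes a Japanese Windows tool writes
mojibake = raw.decode("cp437")         # what a naive reader displays
recovered = mojibake.encode("cp437").decode("shift_jis")
assert recovered == original
```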
From succer110 at tiscali.it  Wed Feb 10 16:59:18 2016
From: succer110 at tiscali.it (Luca Sangiacomo)
Date: Wed, 10 Feb 2016 22:59:18 +0100
Subject: [Python-Dev] why we have both re.match and re.string?
Message-ID: <56BBB2B6.7030107@tiscali.it>

Hi,
I hope the question is not too silly, but I would like to understand
the advantages of having both re.match() and re.search(). Wouldn't it
be clearer to have just one function with one additional parameter,
like this:

re.search(regexp, text, from_beginning=True|False) ?

In this way we prevent, as written in the documentation, people
writing ".*" in front of the regexp used with re.match()

Thanks.

From g.brandl at gmx.net  Wed Feb 10 17:20:38 2016
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 10 Feb 2016 23:20:38 +0100
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals
Message-ID: 

This came up in python-ideas, and has met mostly positive comments,
although the exact syntax rules are up for discussion.

cheers,
Georg

--------------------------------------------------------------------------------

PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6

Abstract and Rationale
======================

This PEP proposes to extend Python's syntax so that underscores can be
used in integral and floating-point number literals.

This is a common feature of other modern languages, and can aid
readability of long literals, or literals whose value should clearly
separate into parts, such as bytes or words in hexadecimal notation.
Examples:: # grouping decimal numbers by thousands amount = 10_000_000.0 # grouping hexadecimal addresses by words addr = 0xDEAD_BEEF # grouping bits into bytes in a binary literal flags = 0b_0011_1111_0100_1110 Specification ============= The current proposal is to allow underscores anywhere in numeric literals, with these exceptions: * Leading underscores cannot be allowed, since they already introduce identifiers. * Trailing underscores are not allowed, because they look confusing and don't contribute much to readability. * The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up, because they are fixed strings and not logically part of the number. * No underscore allowed after a sign in an exponent (``1e-_5``), because underscores can also not be used after the signs in front of the number (``-1e5``). * No underscore allowed after a decimal point, because this leads to ambiguity with attribute access (the lexer cannot know that there is no number literal in ``foo._5``). There appears to be no reason to restrict the use of underscores otherwise. The production list for integer literals would therefore look like this:: integer: decimalinteger | octinteger | hexinteger | bininteger decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"] nonzerodigit: "1"..."9" decimalrest: (digit | "_")* digit digit: "0"..."9" octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit octdigit: "0"..."7" hexdigit: digit | "a"..."f" | "A"..."F" bindigit: "0" | "1" For floating-point literals:: floatnumber: pointfloat | exponentfloat pointfloat: [intpart] fraction | intpart "." exponentfloat: (intpart | pointfloat) exponent intpart: digit (digit | "_")* fraction: "." 
intpart exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest] Alternative Syntax ================== Underscore Placement Rules -------------------------- Instead of the liberal rule specified above, the use of underscores could be limited. Common rules are (see the "other languages" section): * Only one consecutive underscore allowed, and only between digits. * Multiple consecutive underscore allowed, but only between digits. Different Separators -------------------- A proposed alternate syntax was to use whitespace for grouping. Although strings are a precedent for combining adjoining literals, the behavior can lead to unexpected effects which are not possible with underscores. Also, no other language is known to use this rule, except for languages that generally disregard any whitespace. C++14 introduces apostrophes for grouping, which is not considered due to the conflict with Python's string literals. [1]_ Behavior in Other Languages =========================== Those languages that do allow underscore grouping implement a large variety of rules for allowed placement of underscores. This is a listing placing the known rules into three major groups. In cases where the language spec contradicts the actual behavior, the actual behavior is listed. **Group 1: liberal (like this PEP)** * D [2]_ * Perl 5 (although docs say it's more restricted) [3]_ * Rust [4]_ * Swift (although textual description says "between digits") [5]_ **Group 2: only between digits, multiple consecutive underscores** * C# (open proposal for 7.0) [6]_ * Java [7]_ **Group 3: only between digits, only one underscore** * Ada [8]_ * Julia (but not in the exponent part of floats) [9]_ * Ruby (docs say "anywhere", in reality only between digits) [10]_ Implementation ============== A preliminary patch that implements the specification given above has been posted to the issue tracker. [11]_ References ========== .. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html .. 
[2] http://dlang.org/spec/lex.html#integerliteral .. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors .. [4] http://doc.rust-lang.org/reference.html#number-literals .. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html .. [6] https://github.com/dotnet/roslyn/issues/216 .. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html .. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4 .. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/ .. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers .. [11] http://bugs.python.org/issue26331 Copyright ========= This document has been placed in the public domain. From brett at python.org Wed Feb 10 17:35:00 2016 From: brett at python.org (Brett Cannon) Date: Wed, 10 Feb 2016 22:35:00 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On Wed, 10 Feb 2016 at 14:21 Georg Brandl wrote: > This came up in python-ideas, and has met mostly positive comments, > although the exact syntax rules are up for discussion. > > cheers, > Georg > > > -------------------------------------------------------------------------------- > > PEP: 515 > Title: Underscores in Numeric Literals > Version: $Revision$ > Last-Modified: $Date$ > Author: Georg Brandl > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 10-Feb-2016 > Python-Version: 3.6 > > Abstract and Rationale > ====================== > > This PEP proposes to extend Python's syntax so that underscores can be > used in > integral and floating-point number literals. > > This is a common feature of other modern languages, and can aid > readability of > long literals, or literals whose value should clearly separate into parts, > such > as bytes or words in hexadecimal notation. 
> > Examples:: > > # grouping decimal numbers by thousands > amount = 10_000_000.0 > > # grouping hexadecimal addresses by words > addr = 0xDEAD_BEEF > > # grouping bits into bytes in a binary literal > flags = 0b_0011_1111_0100_1110 > I assume all of these examples are possible in either the liberal or restrictive approaches? > > > Specification > ============= > > The current proposal is to allow underscores anywhere in numeric literals, > with > these exceptions: > > * Leading underscores cannot be allowed, since they already introduce > identifiers. > * Trailing underscores are not allowed, because they look confusing and > don't > contribute much to readability. > * The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up, > because they are fixed strings and not logically part of the number. > * No underscore allowed after a sign in an exponent (``1e-_5``), because > underscores can also not be used after the signs in front of the number > (``-1e5``). > * No underscore allowed after a decimal point, because this leads to > ambiguity > with attribute access (the lexer cannot know that there is no number > literal > in ``foo._5``). > > There appears to be no reason to restrict the use of underscores otherwise. > > The production list for integer literals would therefore look like this:: > > integer: decimalinteger | octinteger | hexinteger | bininteger > decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"] > nonzerodigit: "1"..."9" > decimalrest: (digit | "_")* digit > digit: "0"..."9" > octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit > hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit > bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit > octdigit: "0"..."7" > hexdigit: digit | "a"..."f" | "A"..."F" > bindigit: "0" | "1" > > For floating-point literals:: > > floatnumber: pointfloat | exponentfloat > pointfloat: [intpart] fraction | intpart "." 
> exponentfloat: (intpart | pointfloat) exponent > intpart: digit (digit | "_")* > fraction: "." intpart > exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest] > > > Alternative Syntax > ================== > > Underscore Placement Rules > -------------------------- > > Instead of the liberal rule specified above, the use of underscores could > be > limited. Common rules are (see the "other languages" section): > > * Only one consecutive underscore allowed, and only between digits. > * Multiple consecutive underscore allowed, but only between digits. > > Different Separators > -------------------- > > A proposed alternate syntax was to use whitespace for grouping. Although > strings are a precedent for combining adjoining literals, the behavior can > lead > to unexpected effects which are not possible with underscores. Also, no > other > language is known to use this rule, except for languages that generally > disregard any whitespace. > > C++14 introduces apostrophes for grouping, which is not considered due to > the > conflict with Python's string literals. [1]_ > > > Behavior in Other Languages > =========================== > > Those languages that do allow underscore grouping implement a large > variety of > rules for allowed placement of underscores. This is a listing placing the > known > rules into three major groups. In cases where the language spec > contradicts the > actual behavior, the actual behavior is listed. 
> > **Group 1: liberal (like this PEP)** > > * D [2]_ > * Perl 5 (although docs say it's more restricted) [3]_ > * Rust [4]_ > * Swift (although textual description says "between digits") [5]_ > > **Group 2: only between digits, multiple consecutive underscores** > > * C# (open proposal for 7.0) [6]_ > * Java [7]_ > > **Group 3: only between digits, only one underscore** > > * Ada [8]_ > * Julia (but not in the exponent part of floats) [9]_ > * Ruby (docs say "anywhere", in reality only between digits) [10]_ > > > Implementation > ============== > > A preliminary patch that implements the specification given above has been > posted to the issue tracker. [11]_ > Is the implementation made easier or harder if we went with the Group 2 or 3 approaches? Are there any reasonable examples that the Group 1 approach allows that Group 3 doesn't that people have used in other languages? I'm +1 on the idea, but which approach I prefer is going to be partially dependent on the difficulty of implementing (else I say Group 3 to make it easier to explain the rules). -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Wed Feb 10 17:42:52 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 10 Feb 2016 14:42:52 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <56BBBCEC.9020506@g.nevcal.com> On 2/10/2016 2:20 PM, Georg Brandl wrote: > This came up in python-ideas, and has met mostly positive comments, > although the exact syntax rules are up for discussion. 
> > cheers, > Georg > > -------------------------------------------------------------------------------- > > PEP: 515 > Title: Underscores in Numeric Literals > Version: $Revision$ > Last-Modified: $Date$ > Author: Georg Brandl > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 10-Feb-2016 > Python-Version: 3.6 > > Abstract and Rationale > ====================== > > This PEP proposes to extend Python's syntax so that underscores can be used in > integral and floating-point number literals. > > This is a common feature of other modern languages, and can aid readability of > long literals, or literals whose value should clearly separate into parts, such > as bytes or words in hexadecimal notation. > > Examples:: > > # grouping decimal numbers by thousands > amount = 10_000_000.0 > > # grouping hexadecimal addresses by words > addr = 0xDEAD_BEEF > > # grouping bits into bytes in a binary literal > flags = 0b_0011_1111_0100_1110 +1 You don't mention potential restrictions that decimal numbers should permit them only every three places, or hex ones only every 2 or 4, and your binary example mentions grouping into bytes, but actually groups into nybbles. But such restrictions would be annoying: if it is useful to the coder to use them, that is fine. But different situation may find other placements more useful... particularly in binary, as it might want to match widths of various bitfields. Adding that as a rejected consideration, with justifications, would be helpful. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Wed Feb 10 17:53:09 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 10 Feb 2016 22:53:09 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 10 February 2016 at 22:20, Georg Brandl wrote: > This came up in python-ideas, and has met mostly positive comments, > although the exact syntax rules are up for discussion. +1 on the PEP. Is there any value in allowing underscores in strings passed to the Decimal constructor as well? The same sorts of justifications would seem to apply. It's perfectly arguable that the change for Decimal would be so rarely used as to not be worth it, though, so I don't mind either way in practice. Paul From desmoulinmichel at gmail.com Wed Feb 10 17:52:48 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Wed, 10 Feb 2016 23:52:48 +0100 Subject: [Python-Dev] why we have both re.match and re.string? In-Reply-To: <56BBB2B6.7030107@tiscali.it> References: <56BBB2B6.7030107@tiscali.it> Message-ID: <56BBBF40.4010207@gmail.com> Hi, Le 10/02/2016 22:59, Luca Sangiacomo a écrit : > Hi, > I hope the question is not too silly, but why I would like to > understand the advantages of having both re.match() and re.search(). > Wouldn't be more clear to have just one function with one additional > parameters like this: > > re.search(regexp, text, from_beginning=True|False) ? Actually you can just do re.search("^" + regexp, text) But with match you express the intent to match the text with something, while with search, you express that you look for something in the text. Maybe that was the idea? > > In this way we prevent, as written in the documentation, people > writing ".*" in front of the regexp used with re.match() > > Thanks.
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/desmoulin.michel%40gmail.com From victor.stinner at gmail.com Wed Feb 10 18:04:49 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 00:04:49 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: It looks like the implementation https://bugs.python.org/issue26331 only changes the Python parser. What about other functions converting strings to numbers at runtime like int(str) and float(str)? Paul also asked for Decimal(str). Victor From python at mrabarnett.plus.com Wed Feb 10 18:08:37 2016 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 10 Feb 2016 23:08:37 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <56BBC2F5.7020201@mrabarnett.plus.com> On 2016-02-10 22:35, Brett Cannon wrote: [snip] > > Examples:: > > # grouping decimal numbers by thousands > amount = 10_000_000.0 > > # grouping hexadecimal addresses by words > addr = 0xDEAD_BEEF > > # grouping bits into bytes in a binary literal > flags = 0b_0011_1111_0100_1110 > > > I assume all of these examples are possible in either the liberal or > restrictive approaches? > [snip] Strictly speaking, "0b_0011_1111_0100_1110" wouldn't be valid if an underscore was allowed only between digits because the "b" isn't a digit. Similarly, "0x_FF_FF" wouldn't be valid, but "0xFF_FF" would. From eryksun at gmail.com Wed Feb 10 18:11:03 2016 From: eryksun at gmail.com (eryk sun) Date: Wed, 10 Feb 2016 17:11:03 -0600 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module?
In-Reply-To: <1844729669.1825953.1455136233223.JavaMail.yahoo@mail.yahoo.com> References: <22203.20013.518404.39381@turnbull.sk.tsukuba.ac.jp> <1844729669.1825953.1455136233223.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Wed, Feb 10, 2016 at 2:30 PM, Andrew Barnert via Python-Dev wrote: > [^3]: Say you write a program that assumes it will only be run on Shift-JIS systems, and you use > CreateFileA to create a file named "???????". The actual bytes you're sending are cp436 > for "?n???[???[???h", so the file on the CD is named, in Unicode, "?n???[???[???h". Unless the system default was changed or the program called SetFileApisToOEM, CreateFileA would decode using the ANSI codepage 1252, not the OEM codepage 437 (not 436), i.e. "?n?\x8d\x81[?\x8f\x81[???h". Otherwise the example is right. But the transcoding strategy won't work in general. For example, if the tables are turned such that the ANSI codepage is 932 and the program passes a bytes name from codepage 1252, the user on the other end won't be able to transcode without error if the original bytes contained invalid DBCS sequences that were mapped to the default character, U+30FB. This transcodes as the meaningless string "\x81E". The user can replace that string with "--" and enjoy a nice game of hang man. From steve at pearwood.info Wed Feb 10 18:05:51 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Feb 2016 10:05:51 +1100 Subject: [Python-Dev] why we have both re.match and re.string? In-Reply-To: <56BBB2B6.7030107@tiscali.it> References: <56BBB2B6.7030107@tiscali.it> Message-ID: <20160210230551.GG31806@ando.pearwood.info> On Wed, Feb 10, 2016 at 10:59:18PM +0100, Luca Sangiacomo wrote: > Hi, > I hope the question is not too silly, but why I would like to understand > the advantages of having both re.match() and re.search(). Wouldn't be > more clear to have just one function with one additional parameters like > this: > > re.search(regexp, text, from_beginning=True|False) ? 
I guess the most important reason now is backwards compatibility. The oldest Python I have installed here is version 1.5, and it has the brand new "re" module (intended as a replacement for the old "regex" module). Both have search() and match() top-level functions. So my guess is that you would have to track down the author of the original "regex" module. But a more general answer is the principle, "Functions shouldn't take constant bool arguments". It is an API design principle which (if I remember correctly) Guido has stated a number of times. Functions should not take a boolean argument which (1) exists only to select between two different modes and (2) are nearly always given as a constant. Do you ever find yourself writing code like this? if some_calculation(): result = re.match(regex, string) else: result = re.search(regex, string) If you do, that would be a hint that perhaps match() and search() should be combined so you can write: result = re.search(regex, string, some_calculation()) But I expect that you almost never do. I would expect that if we combined the two functions into one, we would nearly always call them with a constant bool: # I always forget whether True means match from the start or not, # and which is the default... result = re.search(regex, string, False) which suggests that search() is actually two different functions, and should be split into two, just as we have now. It's a general principle, not a law of nature, so you may find exceptions in the standard library. But if I were designing the re module from scratch, I would either keep the two distinct functions, or just provide search() and let users use ^ to anchor the search to the beginning. > In this way we prevent, as written in the documentation, people writing > ".*" in front of the regexp used with re.match() I only see one example that does that: https://docs.python.org/3/library/re.html#checking-for-a-pair Perhaps it should be changed. 
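The behavioral difference between the two functions is easy to check against the `re` module itself (a small illustrative sketch; the multiline caveat is why the two spellings aren't strictly interchangeable):

```python
import re

# match() only tries the pattern at position 0; search() scans forward.
assert re.match(r"\d+", "42abc").group() == "42"
assert re.match(r"\d+", "abc42") is None
assert re.search(r"\d+", "abc42").group() == "42"

# For simple patterns, match() behaves like search() with "^" prepended.
# (The equivalence is not exact under re.MULTILINE, where "^" also
# matches after each newline, while match() still only tries position 0.)
pattern = r"\d+"
for text in ("42abc", "abc42", "abc"):
    assert bool(re.match(pattern, text)) == bool(re.search("^" + pattern, text))
```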
-- Steve From steve at pearwood.info Wed Feb 10 18:14:46 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Feb 2016 10:14:46 +1100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <20160210231446.GH31806@ando.pearwood.info> On Wed, Feb 10, 2016 at 10:53:09PM +0000, Paul Moore wrote: > On 10 February 2016 at 22:20, Georg Brandl wrote: > > This came up in python-ideas, and has met mostly positive comments, > > although the exact syntax rules are up for discussion. > > +1 on the PEP. Is there any value in allowing underscores in strings > passed to the Decimal constructor as well? The same sorts of > justifications would seem to apply. It's perfectly arguable that the > change for Decimal would be so rarely used as to not be worth it, > though, so I don't mind either way in practice. Let's delay making any change to string conversions for now, and that includes Decimal. We can also do this: Decimal("123_456_789.00000_12345_67890".replace("_", "")) for those who absolutely must include underscores in their numeric strings. The big win is for numeric literals, not numeric string conversions. -- Steve From abarnert at yahoo.com Wed Feb 10 18:45:48 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Feb 2016 15:45:48 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On Feb 10, 2016, at 14:20, Georg Brandl wrote: First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly). > * Trailing underscores are not allowed, because they look confusing and don't > contribute much to readability. Why is "123_456_" so ugly that we have to catch it, when "1___2_345______6" is just fine, or "123e__+456"? 
More to the point, if we really need an extra rule, and more complicated BNF, to outlaw this case, I don't think we want a liberal design at all. Also, notice that Swift, Rust, and D all show examples with trailing underscores in their references, and they don't look particularly out of place with the other examples. > There appears to be no reason to restrict the use of underscores otherwise. What other restrictions are there? I think the only place you've left that's not between digits is between the e and the sign. A dead-simple rule like Swift's seems better than five separate rules that I have to learn and remember that make lexing more complicated and that ultimately amount to the conservative rule plus one other place I can put underscores where I'd never want to. > **Group 1: liberal (like this PEP)** > > * D [2]_ > * Perl 5 (although docs say it's more restricted) [3]_ > * Rust [4]_ > * Swift (although textual description says "between digits") [5]_ I don't think any of these are liberal like this PEP. For example, Swift's actual grammar rule allows underscores anywhere but leading in the "digits" part of int literals and all three potential digit parts of float literals. That's the whole rule. It's more conservative than this PEP in not allowing them outside of digit parts (like between E and +), more liberal in allowing them to be trailing, but I'm pretty sure the reason behind the design wasn't specifically about how liberal or conservative they wanted to be, but about being as simple as possible. Rust's rule seems to be equivalent to Swift's, except that they forgot to define exponents anywhere. I don't think either of them was trying to be more liberal or more conservative; rather, they were both trying to be as simple as possible. 
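Concretely, that digit-part rule ("underscores anywhere in the digit part except leading") can be sketched as a single pattern for decimal integer literals — an illustrative encoding, not the actual Swift or Rust grammar, and the helper name is made up:

```python
import re

# Swift/Rust-style digit part: one digit followed by any mix of digits
# and underscores, so underscores may repeat and may trail, but never lead.
DIGIT_PART = re.compile(r"[0-9][0-9_]*\Z")

def swift_style_ok(literal: str) -> bool:
    return DIGIT_PART.match(literal) is not None

assert swift_style_ok("1_000_000")
assert swift_style_ok("123_456_")         # trailing underscore allowed
assert swift_style_ok("1___2_345______6") # ugly, but legal under this rule
assert not swift_style_ok("_123")         # leading underscore is an identifier
```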
D does go out of its way to be as liberal as possible, e.g., allowing things like "0x_1_" that the others wouldn't (they'd treat the "_1_" as a digit part, which can't have leading underscores), but it's also more conservative than this spec in not allowing underscores between e and the sign. I think Perl is the only language that allows them anywhere but in the digits part. From abarnert at yahoo.com Wed Feb 10 19:03:19 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Feb 2016 16:03:19 -0800 Subject: [Python-Dev] Windows: Remove support of bytes filenames in theos module? In-Reply-To: References: <22203.20013.518404.39381@turnbull.sk.tsukuba.ac.jp> <1844729669.1825953.1455136233223.JavaMail.yahoo@mail.yahoo.com> Message-ID: <24074D19-77C8-4462-9F17-838A56A3F7BA@yahoo.com> On Feb 10, 2016, at 15:11, eryk sun wrote: > > On Wed, Feb 10, 2016 at 2:30 PM, Andrew Barnert via Python-Dev > wrote: >> [^3]: Say you write a program that assumes it will only be run on Shift-JIS systems, and you use >> CreateFileA to create a file named "???????". The actual bytes you're sending are cp436 >> for "?n???[???[???h", so the file on the CD is named, in Unicode, "?n???[???[???h". > > Unless the system default was changed or the program called > SetFileApisToOEM, CreateFileA would decode using the ANSI codepage > 1252, not the OEM codepage 437 (not 436), i.e. > "?n?\x8d\x81[?\x8f\x81[???h". Otherwise the example is right. But the > transcoding strategy won't work in general. For example, if the tables > are turned such that the ANSI codepage is 932 and the program passes a > bytes name from codepage 1252, the user on the other end won't be able > to transcode without error if the original bytes contained invalid > DBCS sequences that were mapped to the default character, U+30FB. > This > transcodes as the meaningless string "\x81E". The user can replace > that string with "--" and enjoy a nice game of hang man. 
Of course there's no way to recover the actual intended filenames if that information was thrown out instead of being stored, but that's no different from the situation where the user mashed the keyboard instead of typing what they intended. The point remains: the Mac strategy (which is also the linux strategy for filesystems that are inherently UTF-16) always generates valid UTF-8, and doesn't try to magically cure mojibake but doesn't get in the way of the user manually curing it. When the Unicode encoding is lossy, of course the user can't cure that, but UTF-8 isn't making it any harder. From steve at pearwood.info Wed Feb 10 19:04:27 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Feb 2016 11:04:27 +1100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <20160211000425.GI31806@ando.pearwood.info> On Wed, Feb 10, 2016 at 11:20:38PM +0100, Georg Brandl wrote: > This came up in python-ideas, and has met mostly positive comments, > although the exact syntax rules are up for discussion. Nicely done. But I would change the restrictions to a simpler version. Instead of five rules to learn: > The current proposal is to allow underscores anywhere in numeric literals, with > these exceptions: > > * Leading underscores cannot be allowed, since they already introduce > identifiers. > * Trailing underscores are not allowed, because they look confusing and don't > contribute much to readability. > * The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up, > because they are fixed strings and not logically part of the number. > * No underscore allowed after a sign in an exponent (``1e-_5``), because > underscores can also not be used after the signs in front of the number > (``-1e5``). > * No underscore allowed after a decimal point, because this leads to ambiguity > with attribute access (the lexer cannot know that there is no number literal > in ``foo._5``). 
change to a single rule "one or more underscores may appear between two (hex)digits, but otherwise nowhere else". That's much simpler to understand than a series of restrictions as given above. That would be your second restrictive rule: "Multiple consecutive underscore allowed, but only between digits." That forbids leading and trailing underscores, underscores inside or immediately after the leading number base (since x, o and b aren't digits), and immediately before or after the sign, decimal point or e|E exponent symbol. > There appears to be no reason to restrict the use of underscores otherwise. I don't like underscores immediately before the . or e|E in floats either: 123_.000_456 The dot is already visually distinctive enough, as is the e|E, and placing an underscore immediately before them doesn't aid in grouping the digits. > Instead of the liberal rule specified above, the use of underscores could be > limited. Common rules are (see the "other languages" section): > > * Only one consecutive underscore allowed, and only between digits. > * Multiple consecutive underscore allowed, but only between digits. I don't think there is any need to restrict it to only a single underscore. There are uses for more than one: Fraction(3__141_592_654, 1_000_000_000) hints that the 3 is somewhat special (for obvious reasons). -- Steve From greg.ewing at canterbury.ac.nz Wed Feb 10 19:08:41 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Feb 2016 13:08:41 +1300 Subject: [Python-Dev] Time for a change of random number generator? In-Reply-To: References: Message-ID: <56BBD109.2010600@canterbury.ac.nz> The Mersenne Twister is no longer regarded as quite state-of-the art because it can get into states that produce long sequences that are not very random. There is a variation on MT called WELL that has better properties in this regard. Does anyone think it would be a good idea to replace MT with WELL as Python's default rng? 
https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear -- Greg From ethan at stoneleaf.us Wed Feb 10 19:14:18 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Feb 2016 16:14:18 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160211000425.GI31806@ando.pearwood.info> References: <20160211000425.GI31806@ando.pearwood.info> Message-ID: <56BBD25A.7070300@stoneleaf.us> On 02/10/2016 04:04 PM, Steven D'Aprano wrote: > change to a single rule "one or more underscores may appear between > two (hex)digits, but otherwise nowhere else". That's much simpler to > understand than a series of restrictions as given above. I like the simpler rule, but I would also allow for an underscore between the base and the first digit: 0x_1ef9_ab22 is easier (at least, for me ;) to parse than 0x1ef9_ab22 However, since Georg is doing the work, I'm not going to argue too hard. -- ~Ethan~ From steve at pearwood.info Wed Feb 10 19:21:27 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Feb 2016 11:21:27 +1100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <20160211002127.GJ31806@ando.pearwood.info> On Wed, Feb 10, 2016 at 03:45:48PM -0800, Andrew Barnert via Python-Dev wrote: > On Feb 10, 2016, at 14:20, Georg Brandl wrote: > > First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly). > > > * Trailing underscores are not allowed, because they look confusing and don't > > contribute much to readability. > > Why is "123_456_" so ugly that we have to catch it, when > "1___2_345______6" is just fine, It's not just fine, it's ugly as sin, but it shouldn't be a matter for the parser to decide a style-issue. 
Just as we allow people to write ugly tuples: t = ( 1 , 2, 3 ,4, 5, ) so we should allow people to write ugly ints rather than try to enforce good taste in the parser. There are uses for allowing multiple underscores, and odd groupings, so rather than a blanket ban, we trust that people won't do stupid things. > or "123e__+456"? That I would prohibit. I think that the decimal point and exponent sign provide sufficient visual distinctiveness that putting underscores around them doesn't gain you anything. In some cases it looks like you might have missed a group of digits: 1.234_e-89 hints that perhaps there ought to be more digits after the 4. I'd be okay with a rule "no underscores in the exponent at all", but I don't particularly see the need for it since that's pretty much covered by the style guide saying "don't use underscores unnecessarily". For floats, exponents have a practical limitation of three digits, so there's not much need for grouping them. +1 on allowing underscores between digits +0 on prohibiting underscores in the exponent > More to the point, > if we really need an extra rule, and more complicated BNF, to outlaw > this case, I don't think we want a liberal design at all. I think "underscores can occur between any two digits" is pretty liberal, since it allows multiple underscores, and allows grouping in any size group (including mixed sizes, and stupid sizes like 1). To me, the opposite of a liberal rule is something like "underscores may only occur between groups of three digits". > Also, notice that Swift, Rust, and D all show examples with trailing > underscores in their references, and they don't look particularly out > of place with the other examples. That's a matter of opinion. 
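The "one or more underscores may appear between two digits, but otherwise nowhere else" rule can likewise be sketched as a pattern for decimal integer literals (an illustrative encoding, not proposed grammar, with a made-up helper name):

```python
import re

# Every run of underscores must be flanked by digits on both sides.
BETWEEN_DIGITS = re.compile(r"[0-9](?:_*[0-9])*\Z")

def between_digits_ok(literal: str) -> bool:
    return BETWEEN_DIGITS.match(literal) is not None

assert between_digits_ok("1_000_000")
assert between_digits_ok("3__141_592_654")  # multiple underscores are fine
assert not between_digits_ok("123_456_")    # trailing underscore rejected
assert not between_digits_ok("_123")        # leading underscore rejected
```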
-- Steve From vadmium+py at gmail.com Wed Feb 10 20:16:16 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Thu, 11 Feb 2016 01:16:16 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: I have occasionally wondered about this missing feature. On 10 February 2016 at 22:20, Georg Brandl wrote: > Abstract and Rationale > ====================== > > This PEP proposes to extend Python's syntax so that underscores can be used in > integral and floating-point number literals. This should extend complex or imaginary literals like 10_000j for consistency. > Specification > ============= > > * Trailing underscores are not allowed, because they look confusing and don't > contribute much to readability. > * No underscore allowed after a sign in an exponent (``1e-_5``), because > underscores can also not be used after the signs in front of the number > (``-1e5``). > [. . .] > > The production list for integer literals would therefore look like this:: > > integer: decimalinteger | octinteger | hexinteger | bininteger > decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"] > nonzerodigit: "1"..."9" > decimalrest: (digit | "_")* digit > digit: "0"..."9" > octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit > hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit > bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit > octdigit: "0"..."7" > hexdigit: digit | "a"..."f" | "A"..."F" > bindigit: "0" | "1" > > For floating-point literals:: > > floatnumber: pointfloat | exponentfloat > pointfloat: [intpart] fraction | intpart "." > exponentfloat: (intpart | pointfloat) exponent > intpart: digit (digit | "_")* This allows trailing underscores such as 1_.2, 1.2_, 1.2_e-5. Your bullet point above suggests at least some of these are not desired. > fraction: "." 
intpart > exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest] This allows underscores in the exponent (1e-5_0), contradicting the other bullet point. From abarnert at yahoo.com Wed Feb 10 23:41:27 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Feb 2016 20:41:27 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160211002127.GJ31806@ando.pearwood.info> References: <20160211002127.GJ31806@ando.pearwood.info> Message-ID: <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> On Feb 10, 2016, at 16:21, Steven D'Aprano wrote: > >> On Wed, Feb 10, 2016 at 03:45:48PM -0800, Andrew Barnert via Python-Dev wrote: >> On Feb 10, 2016, at 14:20, Georg Brandl wrote: >> >> First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly). >> >>> * Trailing underscores are not allowed, because they look confusing and don't >>> contribute much to readability. >> >> Why is "123_456_" so ugly that we have to catch it, when >> "1___2_345______6" is just fine, > > It's not just fine, it's ugly as sin, but it shouldn't be a matter for > the parser to decide a style-issue. Exactly. So why should it be any more of a matter for the parser to decide that "123_456_" is illegal? Leave that in the style guide, and keep the parser, and the reference documentation, as simple as possible. >> or "123e__+456"? > > That I would prohibit. The PEP allows that. The simpler rule used by Swift and Rust prohibits it. >> More to the point, >> if we really need an extra rule, and more complicated BNF, to outlaw >> this case, I don't think we want a liberal design at all. 
> > I think "underscores can occur between any two digits" is pretty > liberal, since it allows multiple underscores, and allows grouping in > any size group (including mixed sizes, and stupid sizes like 1). The PEP calls that a type-2 conservative proposal, and uses "liberal" to mean that underscores can appear in places that aren't between digits. I don't think we want that liberalism, especially if it requires 5 rules instead of 1 to get it right. Again, Swift and Rust only allow underscores in the digit part of integers, and the up to three digit parts of floats, and the only rule they impose is no leading underscore. (In some cases they lead to ambiguity, in others they don't, but it's easier to just always ban them.) I don't see anything wrong with that rule. The fact that it doesn't allow "1.2e_+3" seems fine. The fact that it doesn't prevent "123_" seems fine also. It's not about being as liberal as possible, or as restrictive as possible, because those edge cases just don't matter, so being as simple as possible seems like an obvious win. >> Also, notice that Swift, Rust, and D all show examples with trailing >> underscores in their references, and they don't look particularly out >> of place with the other examples. > > That's a matter of opinion. Sure, but it's apparently the opinion of the people who designed and/or documented this feature in three out of the four languages I looked at (aka every language but Perl), not mine. And honestly, are you really claiming that in your opinion, "123_456_" is worse than all of their other examples, like "1_23__4"? They're both presented as something the syntax allows, and neither one looks like something I'd ever want to write, much less promote in a style guide or something, but neither one screams out as something that's so heinous we need to complicate the language to ensure it raises a SyntaxError. Yes, that's my opinion, but do you really have a different opinion about any part of that?
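The edge cases mentioned above can be made concrete with a rough encoding of the Swift-style float rule — a hand-written approximation of the rule as described in this thread, not Swift's actual grammar:

```python
import re

# Digit part: one digit followed by any mix of digits and underscores
# (underscores may repeat and trail, but never lead).
digits = r"[0-9][0-9_]*"

# Float literal: digit part, optional fraction, optional exponent.
swift_float = re.compile(rf"{digits}(?:\.{digits})?(?:[eE][+-]?{digits})?\Z")

assert swift_float.match("1.5e10")
assert swift_float.match("10_000.000_1")
assert swift_float.match("123_")         # trailing underscore: allowed
assert not swift_float.match("1.2e_+3")  # underscore between "e" and sign: rejected
```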
From g.brandl at gmx.net Thu Feb 11 02:25:24 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 08:25:24 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 02/11/2016 02:16 AM, Martin Panter wrote: > I have occasionally wondered about this missing feature. > > On 10 February 2016 at 22:20, Georg Brandl wrote: >> Abstract and Rationale >> ====================== >> >> This PEP proposes to extend Python's syntax so that underscores can be used in >> integral and floating-point number literals. > > This should extend complex or imaginary literals like 10_000j for consistency. Yes, that was always the case, but I guess it should be explicit. >> Specification >> ============= >> >> * Trailing underscores are not allowed, because they look confusing and don't >> contribute much to readability. >> * No underscore allowed after a sign in an exponent (``1e-_5``), because >> underscores can also not be used after the signs in front of the number >> (``-1e5``). >> [. . .] >> >> The production list for integer literals would therefore look like this:: >> >> integer: decimalinteger | octinteger | hexinteger | bininteger >> decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"] >> nonzerodigit: "1"..."9" >> decimalrest: (digit | "_")* digit >> digit: "0"..."9" >> octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit >> hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit >> bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit >> octdigit: "0"..."7" >> hexdigit: digit | "a"..."f" | "A"..."F" >> bindigit: "0" | "1" >> >> For floating-point literals:: >> >> floatnumber: pointfloat | exponentfloat >> pointfloat: [intpart] fraction | intpart "." >> exponentfloat: (intpart | pointfloat) exponent >> intpart: digit (digit | "_")* > > This allows trailing underscores such as 1_.2, 1.2_, 1.2_e-5. Your > bullet point above suggests at least some of these are not desired. 
The middle one isn't, indeed. I updated the grammar accordingly. >> fraction: "." intpart >> exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest] > > This allows underscores in the exponent (1e-5_0), contradicting the > other bullet point. I clarified the bullet points. An "immediately" was missing. Thanks for the feedback! Georg From g.brandl at gmx.net Thu Feb 11 02:37:18 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 08:37:18 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 02/11/2016 12:45 AM, Andrew Barnert via Python-Dev wrote: > On Feb 10, 2016, at 14:20, Georg Brandl wrote: > > First, general questions: should the PEP mention the Decimal constructor? > What about int and float (I'd assume int(s) continues to work as always, > while int(s, 0) gets the new behavior, but if that isn't obviously true, it > may be worth saying explicitly). > >> * Trailing underscores are not allowed, because they look confusing and >> don't contribute much to readability. > > Why is "123_456_" so ugly that we have to catch it, when "1___2_345______6" > is just fine, or "123e__+456"? More to the point, if we really need an extra > rule, and more complicated BNF, to outlaw this case, I don't think we want a > liberal design at all. > > Also, notice that Swift, Rust, and D all show examples with trailing > underscores in their references, and they don't look particularly out of > place with the other examples. That's a point. I'll look into the implementation. >> There appears to be no reason to restrict the use of underscores >> otherwise. > > What other restrictions are there? I think the only place you've left that's > not between digits is between the e and the sign. 
There are other places left: * between 0x and the digits * between the digits and "j" * before and after the decimal point > A dead-simple rule like > Swift's seems better than five separate rules that I have to learn and > remember that make lexing more complicated and that ultimately amount to the > conservative rule plus one other place I can put underscores where I'd never > want to. Not quite, see above. >> **Group 1: liberal (like this PEP)** >> >> * D [2]_ * Perl 5 (although docs say it's more restricted) [3]_ * Rust >> [4]_ * Swift (although textual description says "between digits") [5]_ > > I don't think any of these are liberal like this PEP. > > For example, Swift's actual grammar rule allows underscores anywhere but > leading in the "digits" part of int literals and all three potential digit > parts of float literals. That's the whole rule. It's more conservative than > this PEP in not allowing them outside of digit parts (like between E and +), > more liberal in allowing them to be trailing, but I'm pretty sure the reason > behind the design wasn't specifically about how liberal or conservative they > wanted to be, but about being as simple as possible. Rust's rule seems to be > equivalent to Swift's, except that they forgot to define exponents anywhere. > I don't think either of them was trying to be more liberal or more > conservative; rather, they were both trying to be as simple as possible. I actually modelled this PEP closely on Rust. It has restrictions as in this PEP, except that trailing underscores are allowed, and that "1.0e_+5" is not allowed (allowed by the PEP), and "1.0e+_5" is (not allowed by the PEP). I don't think you can argue that it's simpler. (If the PEP and our lexical reference were as loosely worded as Rust's, one could probably say it's "simple", too.) Also, both Swift and Rust don't have the baggage of allowing ".5" style literals, which makes the grammar simpler in Swift's case. 
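The exponent difference described here can be pinned down with two toy patterns — a hand-rolled encoding of the two rules as stated in this thread (neither the PEP's nor Rust's real grammar), restricted to a "1.0e..." shape for brevity:

```python
import re

# PEP-draft style: underscores may follow the "e" itself, but not the sign.
pep_exp = re.compile(r"1\.0[eE]_*[+-]?[0-9][0-9_]*\Z")

# Rust style (as described): underscores only inside the digit part after
# the optional sign, where they may even lead.
rust_exp = re.compile(r"1\.0[eE][+-]?[0-9_]*[0-9][0-9_]*\Z")

assert pep_exp.match("1.0e_+5") and not pep_exp.match("1.0e+_5")
assert rust_exp.match("1.0e+_5") and not rust_exp.match("1.0e_+5")
```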
> D does go out of its way to be as liberal as possible, e.g., allowing things > like "0x_1_" that the others wouldn't (they'd treat the "_1_" as a digit > part, which can't have leading underscores), but it's also more conservative > than this spec in not allowing underscores between e and the sign. > > I think Perl is the only language that allows them anywhere but in the digits > part. Thanks for the feedback! Georg From g.brandl at gmx.net Thu Feb 11 02:45:30 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 08:45:30 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 02/10/2016 11:35 PM, Brett Cannon wrote: >> Examples:: >> >> # grouping decimal numbers by thousands >> amount = 10_000_000.0 >> >> # grouping hexadecimal addresses by words >> addr = 0xDEAD_BEEF >> >> # grouping bits into bytes in a binary literal >> flags = 0b_0011_1111_0100_1110 >> > > I assume all of these examples are possible in either the liberal or restrictive > approaches? The last one isn't for restrictive -- its first underscore isn't between digits. >> >> Implementation >> ============== >> >> A preliminary patch that implements the specification given above has been >> posted to the issue tracker. [11]_ >> > > Is the implementation made easier or harder if we went with the Group 2 or 3 > approaches? Are there any reasonable examples that the Group 1 approach allows > that Group 3 doesn't that people have used in other languages? Group 3 is probably a little more work than group 2, since you have to make sure only one consecutive underscore is present. I don't see a point to that. > I'm +1 on the idea, but which approach I prefer is going to be partially > dependent on the difficulty of implementing (else I say Group 3 to make it > easier to explain the rules). Based on the feedback so far, I have an easier rule in mind that I will base the next PEP revision on. 
It's basically "One or more underscores allowed anywhere after a digit or a base specifier." This preserves my preferred non-restrictive cases (0b_1111_0000, 1.5_j) and disallows more controversial versions like "1.5e_+_2". cheers, Georg From g.brandl at gmx.net Thu Feb 11 03:09:02 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 09:09:02 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <56BBBCEC.9020506@g.nevcal.com> References: <56BBBCEC.9020506@g.nevcal.com> Message-ID: On 02/10/2016 11:42 PM, Glenn Linderman wrote: > On 2/10/2016 2:20 PM, Georg Brandl wrote: >> This came up in python-ideas, and has met mostly positive comments, >> although the exact syntax rules are up for discussion. >> >> cheers, >> Georg >> >> -------------------------------------------------------------------------------- >> >> PEP: 515 >> Title: Underscores in Numeric Literals >> Version: $Revision$ >> Last-Modified: $Date$ >> Author: Georg Brandl >> Status: Draft >> Type: Standards Track >> Content-Type: text/x-rst >> Created: 10-Feb-2016 >> Python-Version: 3.6 >> >> Abstract and Rationale >> ====================== >> >> This PEP proposes to extend Python's syntax so that underscores can be used in >> integral and floating-point number literals. >> >> This is a common feature of other modern languages, and can aid readability of >> long literals, or literals whose value should clearly separate into parts, such >> as bytes or words in hexadecimal notation. >> >> Examples:: >> >> # grouping decimal numbers by thousands >> amount = 10_000_000.0 >> >> # grouping hexadecimal addresses by words >> addr = 0xDEAD_BEEF >> >> # grouping bits into bytes in a binary literal >> flags = 0b_0011_1111_0100_1110 > > +1 > > You don't mention potential restrictions that decimal numbers should permit them > only every three places, or hex ones only every 2 or 4, and your binary example > mentions grouping into bytes, but actually groups into nybbles.
> > But such restrictions would be annoying: if it is useful to the coder to use > them, that is fine. But different situation may find other placements more > useful... particularly in binary, as it might want to match widths of various > bitfields. > > Adding that as a rejected consideration, with justifications, would be helpful. I added a short paragraph. Thanks for the feedback, Georg From g.brandl at gmx.net Thu Feb 11 03:11:09 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 09:11:09 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 02/11/2016 12:04 AM, Victor Stinner wrote: > It looks like the implementation https://bugs.python.org/issue26331 > only changes the Python parser. > > What about other functions converting strings to numbers at runtime > like int(str) and float(str)? Paul also asked for Decimal(str). I added these as "Open Questions" to the PEP. For Decimal, it's probably a good idea. For int(), it should only be allowed with base argument = 0. For float() and complex(), probably. Georg From g.brandl at gmx.net Thu Feb 11 03:22:56 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 09:22:56 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals Message-ID: Hey all, based on the feedback so far, I revised the PEP. There is now a much simpler rule for allowed underscores, with no exceptions. This made the grammar simpler as well. --------------------------------------------------------------------------- PEP: 515 Title: Underscores in Numeric Literals Version: $Revision$ Last-Modified: $Date$ Author: Georg Brandl Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 10-Feb-2016 Python-Version: 3.6 Abstract and Rationale ====================== This PEP proposes to extend Python's syntax so that underscores can be used in integral, floating-point and complex number literals. 
This is a common feature of other modern languages, and can aid readability of long literals, or literals whose value should clearly separate into parts, such as bytes or words in hexadecimal notation. Examples:: # grouping decimal numbers by thousands amount = 10_000_000.0 # grouping hexadecimal addresses by words addr = 0xDEAD_BEEF # grouping bits into bytes in a binary literal flags = 0b_0011_1111_0100_1110 # making the literal suffix stand out more imag = 1.247812376e-15_j Specification ============= The current proposal is to allow one or more consecutive underscores following digits and base specifiers in numeric literals. The production list for integer literals would therefore look like this:: integer: decimalinteger | octinteger | hexinteger | bininteger decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")* nonzerodigit: "1"..."9" digit: "0"..."9" octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")* hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")* bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")* octdigit: "0"..."7" hexdigit: digit | "a"..."f" | "A"..."F" bindigit: "0" | "1" For floating-point and complex literals:: floatnumber: pointfloat | exponentfloat pointfloat: [intpart] fraction | intpart "." exponentfloat: (intpart | pointfloat) exponent intpart: digit (digit | "_")* fraction: "." intpart exponent: ("e" | "E") ["+" | "-"] intpart imagnumber: (floatnumber | intpart) ("j" | "J") Alternative Syntax ================== Underscore Placement Rules -------------------------- Instead of the liberal rule specified above, the use of underscores could be limited. Common rules are (see the "other languages" section): * Only one consecutive underscore allowed, and only between digits. * Multiple consecutive underscores allowed, but only between digits. A less common rule would be to allow underscores only every N digits (where N could be 3 for decimal literals, or 4 for hexadecimal ones).
This is unnecessarily restrictive, especially considering the separator placement is different in different cultures. Different Separators -------------------- A proposed alternate syntax was to use whitespace for grouping. Although strings are a precedent for combining adjoining literals, the behavior can lead to unexpected effects which are not possible with underscores. Also, no other language is known to use this rule, except for languages that generally disregard any whitespace. C++14 introduces apostrophes for grouping, which is not considered due to the conflict with Python's string literals. [1]_ Behavior in Other Languages =========================== Those languages that do allow underscore grouping implement a large variety of rules for allowed placement of underscores. This is a listing placing the known rules into three major groups. In cases where the language spec contradicts the actual behavior, the actual behavior is listed. **Group 1: liberal** This group is the least homogeneous: the rules vary slightly between languages. All of them allow trailing underscores. Some allow underscores after non-digits like the ``e`` or the sign in exponents. * D [2]_ * Perl 5 (underscores basically allowed anywhere, although docs say it's more restricted) [3]_ * Rust (allows between exponent sign and digits) [4]_ * Swift (although textual description says "between digits") [5]_ **Group 2: only between digits, multiple consecutive underscores** * C# (open proposal for 7.0) [6]_ * Java [7]_ **Group 3: only between digits, only one underscore** * Ada [8]_ * Julia (but not in the exponent part of floats) [9]_ * Ruby (docs say "anywhere", in reality only between digits) [10]_ Implementation ============== A preliminary patch that implements the specification given above has been posted to the issue tracker. [11]_ Open Questions ============== This PEP currently only proposes changing the literal syntax. 
The following extensions are open for discussion: * Allowing underscores in string arguments to the ``Decimal`` constructor. It could be argued that these are akin to literals, since there is no Decimal literal available (yet). * Allowing underscores in string arguments to ``int()`` with base argument 0, ``float()`` and ``complex()``. References ========== .. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html .. [2] http://dlang.org/spec/lex.html#integerliteral .. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors .. [4] http://doc.rust-lang.org/reference.html#number-literals .. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html .. [6] https://github.com/dotnet/roslyn/issues/216 .. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html .. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4 .. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/ .. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers .. [11] http://bugs.python.org/issue26331 Copyright ========= This document has been placed in the public domain. From p.f.moore at gmail.com Thu Feb 11 04:10:35 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 11 Feb 2016 09:10:35 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160210231446.GH31806@ando.pearwood.info> References: <20160210231446.GH31806@ando.pearwood.info> Message-ID: On 10 February 2016 at 23:14, Steven D'Aprano wrote: > On Wed, Feb 10, 2016 at 10:53:09PM +0000, Paul Moore wrote: >> On 10 February 2016 at 22:20, Georg Brandl wrote: >> > This came up in python-ideas, and has met mostly positive comments, >> > although the exact syntax rules are up for discussion. >> >> +1 on the PEP. Is there any value in allowing underscores in strings >> passed to the Decimal constructor as well? 
The same sorts of >> justifications would seem to apply. It's perfectly arguable that the >> change for Decimal would be so rarely used as to not be worth it, >> though, so I don't mind either way in practice. > > Let's delay making any change to string conversions for now, and that > includes Decimal. We can also do this: > > Decimal("123_456_789.00000_12345_67890".replace("_", "")) > > for those who absolutely must include underscores in their numeric > strings. The big win is for numeric literals, not numeric string > conversions. Good point. Maybe add this as an example in the PEP to explain why conversions are excluded. But I did only mean the Decimal constructor, which I think of more as a "decimal literal" - whereas int() and float() are (in my mind at least) conversion functions and as such should not be coupled to literal format (for example, 0x0001 notation isn't supported by int()) Paul From steve at pearwood.info Thu Feb 11 04:39:05 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Feb 2016 20:39:05 +1100 Subject: [Python-Dev] Time for a change of random number generator? In-Reply-To: <56BBD109.2010600@canterbury.ac.nz> References: <56BBD109.2010600@canterbury.ac.nz> Message-ID: <20160211093905.GK31806@ando.pearwood.info> On Thu, Feb 11, 2016 at 01:08:41PM +1300, Greg Ewing wrote: > The Mersenne Twister is no longer regarded as quite state-of-the art > because it can get into states that produce long sequences that are > not very random. > > There is a variation on MT called WELL that has better properties > in this regard. Does anyone think it would be a good idea to replace > MT with WELL as Python's default rng? > > https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear I'm not able to judge the claims about which PRNG is better (perhaps Tim Peters has an opinion?) 
but if we do change, I'd like to see the existing random.Random moved to random.MT_Random for backwards compatibility and compatibility with other software which uses MT. Not necessarily saying that we have to keep it around forever (after all, we did dump the Wichmann-Hill PRNG some time ago) but we ought to keep it for at least a couple of releases. -- Steve From g.brandl at gmx.net Thu Feb 11 04:39:53 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 10:39:53 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: <20160210231446.GH31806@ando.pearwood.info> Message-ID: On 02/11/2016 10:10 AM, Paul Moore wrote: > On 10 February 2016 at 23:14, Steven D'Aprano wrote: >> On Wed, Feb 10, 2016 at 10:53:09PM +0000, Paul Moore wrote: >>> On 10 February 2016 at 22:20, Georg Brandl wrote: >>> > This came up in python-ideas, and has met mostly positive comments, >>> > although the exact syntax rules are up for discussion. >>> >>> +1 on the PEP. Is there any value in allowing underscores in strings >>> passed to the Decimal constructor as well? The same sorts of >>> justifications would seem to apply. It's perfectly arguable that the >>> change for Decimal would be so rarely used as to not be worth it, >>> though, so I don't mind either way in practice. >> >> Let's delay making any change to string conversions for now, and that >> includes Decimal. We can also do this: >> >> Decimal("123_456_789.00000_12345_67890".replace("_", "")) >> >> for those who absolutely must include underscores in their numeric >> strings. The big win is for numeric literals, not numeric string >> conversions. > > Good point. Maybe add this as an example in the PEP to explain why > conversions are excluded. 
But I did only mean the Decimal constructor, > which I think of more as a "decimal literal" - whereas int() and > float() are (in my mind at least) conversion functions and as such > should not be coupled to literal format (for example, 0x0001 notation > isn't supported by int()) Actually, it is. Just not without a base argument, because the default base is 10. But both with base 0 and base 16, '0x' prefixes are allowed. That's why I'm leaning towards supporting the underscores. In any case I'm preparing the implementation. Georg From victor.stinner at gmail.com Thu Feb 11 04:59:12 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 10:59:12 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: 2016-02-11 9:11 GMT+01:00 Georg Brandl : > On 02/11/2016 12:04 AM, Victor Stinner wrote: >> It looks like the implementation https://bugs.python.org/issue26331 >> only changes the Python parser. >> >> What about other functions converting strings to numbers at runtime >> like int(str) and float(str)? Paul also asked for Decimal(str). > > I added these as "Open Questions" to the PEP. Ok nice. Now another question :-) Would it be useful to add an option to repr(int) and repr(float), or a formatter to int.__format__() and float.__format__() to add an underscore for thousands? Currently, we have the "n" format which depends on the current LC_NUMERIC locale: >>> '{:n}'.format(1234) '1234' >>> import locale; locale.setlocale(locale.LC_ALL, '') 'fr_FR.UTF-8' >>> '{:n}'.format(1234) '1 234' My idea: >>> (1234).__repr__(pep515=True) '1_234' >>> (1234.0).__repr__(pep515=True) '1_234.0' or maybe: >>> '{:pep515}'.format(1234) '1_234' >>> '{:pep515}'.format(1234.0) '1_234.0' I don't think that it would be a good idea to modify repr() default behaviour, it would likely break a lot of applications.
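Lacking any such formatter, the effect Victor describes can be approximated today by post-processing the existing "," thousands separator of str.format. This is a workaround sketch, not a proposed API:

```python
# Workaround sketch (not a proposed API): reuse the existing ","
# thousands separator of str.format and swap it for an underscore.
def underscore_repr(n):
    return "{:,d}".format(n).replace(",", "_")

print(underscore_repr(1234))     # 1_234
print(underscore_repr(1234567))  # 1_234_567
```

Since "," can never appear in the formatted digits themselves, the replace() is safe for integers of any size.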
Victor From ncoghlan at gmail.com Thu Feb 11 05:07:56 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Feb 2016 20:07:56 +1000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 11 February 2016 at 19:59, Victor Stinner wrote: > 2016-02-11 9:11 GMT+01:00 Georg Brandl : >> On 02/11/2016 12:04 AM, Victor Stinner wrote: >>> It looks like the implementation https://bugs.python.org/issue26331 >>> only changes the Python parser. >>> >>> What about other functions converting strings to numbers at runtime >>> like int(str) and float(str)? Paul also asked for Decimal(str). >> >> I added these as "Open Questions" to the PEP. > > Ok nice. Now another question :-) > > Would it be useful to add an option to repr(int) and repr(float), or a > formatter to int.__format__() and float.__float__() to add an > underscore for thousands. Given that str.format supports a thousands separator: >>> "{:,d}".format(100000000) '100,000,000' it might be reasonable to permit "_" in place of "," in the format specifier. However, I'm not sure when you'd use it aside from code generation, and you can already insert the thousands separator and then replace "," with "_". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Thu Feb 11 05:13:27 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Feb 2016 21:13:27 +1100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> Message-ID: <20160211101326.GL31806@ando.pearwood.info> On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote: > And honestly, are you really claiming that in your opinion, "123_456_" > is worse than all of their other examples, like "1_23__4"? 
Yes I am, because 123_456_ looks like you've forgotten to finish typing the last group of digits, while 1_23__4 merely looks like you have no taste. > They're both presented as something the syntax allows, and neither one > looks like something I'd ever want to write, much less promote in a > style guide or something, but neither one screams out as something > that's so heinous we need to complicate the language to ensure it > raises a SyntaxError. Yes, that's my opinion, but do you really have a > different opinion about any part of that? I don't think the rule "underscores must occur between digits" is complicating the specification. It is *less* complicated to explain this rule than to give a whole lot of special cases - can you use a leading or trailing underscore? - can an underscore follow the base prefix 0b 0o 0x? - can an underscore precede or follow the decimal place? - can an underscore precede or follow a + or - sign? - can an underscore precede or follow the e|E exponent symbol? - can an underscore precede or follow the j suffix for complex numbers? versus - underscores can only appear between (hex)digits. I'm not sure why you seem to think that "only between digits" is more complex than the alternative -- to me it is less complex, with no special cases to memorise, just one general rule. Of course, if (generic) you think that it is a feature to be able to put underscores before the decimal point, after the E exponent, etc. then you will dislike my suggested rule. That's okay, but in that case, it is not because of "simplicity|complexity" but because (generic) you want to be able to write things which my rule would prohibit.
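For plain decimal integers, the "only between digits" rule can be stated mechanically in a few lines. This is an illustrative sketch of the rule, not the actual tokenizer change under discussion:

```python
import re

# Illustrative sketch (not the proposed tokenizer change): under the
# "only between digits" rule, a decimal integer is simply digit groups
# joined by single underscores.
DECIMAL = re.compile(r"\d+(?:_\d+)*\Z")

def ok(literal):
    return DECIMAL.match(literal) is not None

print(ok("123_456"))   # True
print(ok("123_456_"))  # False: trailing underscore
print(ok("_123"))      # False: leading underscore
print(ok("1__2"))      # False: second underscore is not between two digits
```

The single general rule collapses all of the special-case questions above into one regular expression per literal kind.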
-- Steve From storchaka at gmail.com Thu Feb 11 05:17:00 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 11 Feb 2016 12:17:00 +0200 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 11.02.16 00:20, Georg Brandl wrote: > **Group 1: liberal (like this PEP)** > > * D [2]_ > * Perl 5 (although docs say it's more restricted) [3]_ > * Rust [4]_ > * Swift (although textual description says "between digits") [5]_ > > **Group 2: only between digits, multiple consecutive underscores** > > * C# (open proposal for 7.0) [6]_ > * Java [7]_ > > **Group 3: only between digits, only one underscore** > > * Ada [8]_ > * Julia (but not in the exponent part of floats) [9]_ > * Ruby (docs say "anywhere", in reality only between digits) [10]_ C++ is in this group too. The documentation of Perl explicitly says that Perl is in this group too (23__500 is not legal). Perhaps there is a bug in the Perl implementation. And maybe Swift is intended to be in this group. I think we should follow the majority of languages and use the simple rule: "only between digits". I have provided an implementation. From encukou at gmail.com Thu Feb 11 05:24:58 2016 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 11 Feb 2016 11:24:58 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <56BC617A.6080406@gmail.com> On 02/11/2016 11:07 AM, Nick Coghlan wrote: > On 11 February 2016 at 19:59, Victor Stinner wrote: >> 2016-02-11 9:11 GMT+01:00 Georg Brandl : >>> On 02/11/2016 12:04 AM, Victor Stinner wrote: >>>> It looks like the implementation https://bugs.python.org/issue26331 >>>> only changes the Python parser. >>>> >>>> What about other functions converting strings to numbers at runtime >>>> like int(str) and float(str)? Paul also asked for Decimal(str). >>> >>> I added these as "Open Questions" to the PEP. >> >> Ok nice.
Now another question :-) >> >> Would it be useful to add an option to repr(int) and repr(float), or a >> formatter to int.__format__() and float.__float__() to add an >> underscore for thousands. > > Given that str.format supports a thousands separator: > >>>> "{:,d}".format(100000000) > '100,000,000' > > it might be reasonable to permit "_" in place of "," in the format specifier. > > However, I'm not sure when you'd use it aside from code generation, > and you can already insert the thousands separator and then replace > "," with "_". It would make "SI style" [0] numbers a little bit more straightforward to generate, since the order of operations wouldn't matter. Currently it's: "{:,}".format(1234.5678).replace(',', ' ').replace('.', ',') Also it would make numbers with decimal comma and dot as separator a bit easier to generate. Currently, that's (from PEP 378): format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".") [0] https://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use From steve at pearwood.info Thu Feb 11 05:29:48 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Feb 2016 21:29:48 +1100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <20160211102948.GM31806@ando.pearwood.info> On Thu, Feb 11, 2016 at 08:07:56PM +1000, Nick Coghlan wrote: > Given that str.format supports a thousands separator: > > >>> "{:,d}".format(100000000) > '100,000,000' > > it might be reasonable to permit "_" in place of "," in the format specifier. +1 > However, I'm not sure when you'd use it aside from code generation, > and you can already insert the thousands separator and then replace > "," with "_". 
It's not always easy or convenient to call .replace(",", "_") on the output of format: "With my help, the {} caught {:,d} ants.".format("aardvark", 100000000) would need to be re-written as something like: py> "With my help, the {} caught {} ants.".format("aardvark", "{:,d}".format(100000000).replace(",", "_")) 'With my help, the aardvark caught 100_000_000 ants.' -- Steve From rosuav at gmail.com Thu Feb 11 06:12:40 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Feb 2016 22:12:40 +1100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On Thu, Feb 11, 2016 at 7:22 PM, Georg Brandl wrote: > * Allowing underscores in string arguments to the ``Decimal`` constructor. It > could be argued that these are akin to literals, since there is no Decimal > literal available (yet). > > * Allowing underscores in string arguments to ``int()`` with base argument 0, > ``float()`` and ``complex()``. I'm -0.5 on both of these, with the caveat that if either gets done, both should be. Decimal() shouldn't be different from int() just because there's currently no way to express a Decimal literal; if Python 3.7 introduces such a literal, there'd be this weird rule difference that has to be maintained for backward compatibility, and has no justification left. (As a side point, I would be fully in favour of Decimal literals. I'd also be in favour of something like "from __future__ import fraction_literals" so 1/2 would evaluate to Fraction(1,2) rather than 0.5. Hence I'm inclined *not* to support underscores in Decimal().) ChrisA From robert.kern at gmail.com Thu Feb 11 06:57:45 2016 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Feb 2016 11:57:45 +0000 Subject: [Python-Dev] Time for a change of random number generator? 
In-Reply-To: <56BBD109.2010600@canterbury.ac.nz> References: <56BBD109.2010600@canterbury.ac.nz> Message-ID: On 2016-02-11 00:08, Greg Ewing wrote: > The Mersenne Twister is no longer regarded as quite state-of-the art > because it can get into states that produce long sequences that are > not very random. > > There is a variation on MT called WELL that has better properties > in this regard. Does anyone think it would be a good idea to replace > MT with WELL as Python's default rng? > > https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear There was a side-discussion about this during the secrets module proposal discussion. WELL would not be my first choice. It escapes the excess-0 islands faster than MT, but still suffers from them. More troubling to me is that it is a linear feedback shift register, like MT, and all LFSRs quickly fail the linear complexity test in BigCrush. xorshift* shares some of these flaws, but is significantly stronger and dominates WELL in most (all?) relevant dimensions. http://xorshift.di.unimi.it/ I'm favorable to the PCG family these days, though xorshift* and Random123 are reasonable alternatives. http://www.pcg-random.org/ https://www.deshawresearch.com/resources_random123.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From g.brandl at gmx.net Thu Feb 11 07:14:48 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 13:14:48 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 02/11/2016 11:17 AM, Serhiy Storchaka wrote: >> **Group 3: only between digits, only one underscore** >> >> * Ada [8]_ >> * Julia (but not in the exponent part of floats) [9]_ >> * Ruby (docs say "anywhere", in reality only between digits) [10]_ > > C++ is in this group too. 
> > The documentation of Perl explicitly says that Perl is in this group too > (23__500 is not legal). Perhaps there is a bug in Perl implementation. > And may be Swift is intended to be in this group. > > I think we should follow the majority of languages and use simple rule: > "only between digits". > > I have provided an implementation. Thanks for the alternate patch. I used the two-function approach you took in ast.c for my latest revision. I still think that some cases (like two of the examples in the PEP, 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed rule is preferable. cheers, Georg From barry at python.org Thu Feb 11 09:51:02 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 11 Feb 2016 09:51:02 -0500 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <20160211095102.032fea58@subdivisions.wooz.org> On Feb 11, 2016, at 09:22 AM, Georg Brandl wrote: >based on the feedback so far, I revised the PEP. There is now >a much simpler rule for allowed underscores, with no exceptions. >This made the grammar simpler as well. I'd be +1, but there's something missing from the PEP: what the underscores *mean*. You describe the syntax nicely, but not the semantics. From reading the examples, I'd guess that the underscores are semantically transparent, meaning that the resulting value is the same if you just removed the underscores and interpreted the resulting literal. Right or wrong, could you please add a paragraph explaining the meaning of the underscores? Cheers, -Barry From hugo.fisher at gmail.com Thu Feb 11 04:28:41 2016 From: hugo.fisher at gmail.com (Hugh Fisher) Date: Thu, 11 Feb 2016 20:28:41 +1100 Subject: [Python-Dev] fullOfEels, assistant program for writing Python extension modules in C Message-ID: I've written a Python program named fullOfEels to speed up the first stages of writing Python extension modules in C. It is not a replacement for SWIG, SIP, or ctypes. 
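An API-first stub module of the kind such a tool consumes might look like this. The names here are invented for illustration and are not taken from the fullOfEels distribution:

```python
# Hypothetical input stub (invented for illustration): the Python API
# is fully specified, while every body is just "pass" pending the
# C implementation.

def checksum(data):
    """Return a checksum of the given bytes (to be implemented in C)."""
    pass

class Buffer:
    """Fixed-size byte buffer backed by C storage (to be implemented in C)."""

    def __init__(self, size):
        pass

    def write(self, data):
        pass
```

From a stub like this, a generator can emit the module initialization code, the type struct for ``Buffer``, and the method tables, leaving only the function bodies to be written by hand.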
It's for the case where you want to work in the opposite direction, specifying a Python API and then writing an implementation in C. (A small niche maybe, but I hope it isn't just me who sometimes works this way.) The input is a Python module specifying what it should do but not how, with all the functions, classes, and methods being just pass. The output is a pair of .h and .c files with all the boilerplate C code required: module initialization, class type structs, C method functions and method tables. Downloadable from https://bitbucket.org/hugh_fisher/fullofeels All feedback and suggestions welcome. -- cheers, Hugh Fisher From njs at pobox.com Thu Feb 11 11:30:37 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 11 Feb 2016 08:30:37 -0800 Subject: [Python-Dev] fullOfEels, assistant program for writing Python extension modules in C In-Reply-To: References: Message-ID: You're almost certainly aware of this, but just to double check since you don't mention it in the email: cython is also a great tool for handling similar situations. Not quite the same since in addition to generating all the boilerplate for you it then lets you use almost-python to actually write the C implementations as well, and I understand that with your tool you write the actual implementations in C. But probably also worth considering in cases where you'd consider this tool, so wanted to make sure it was on your radar. On Feb 11, 2016 8:21 AM, "Hugh Fisher" wrote: > I've written a Python program named fullOfEels to speed up the first > stages of writing Python extension modules in C. > > It is not a replacement for SWIG, SIP, or ctypes. It's for the case > where you want to work in the opposite direction, specifying a Python > API and then writing an implementation in C. (A small niche maybe, but > I hope it isn't just me who sometimes works this way.) 
> > The input is a Python module specifying what it should do but not how, > with all the functions, classes, and methods being just pass. The > output is a pair of .h and .c files with all the boilerplate C code > required: module initialization, class type structs, C method > functions and method tables. > > Downloadable from > https://bitbucket.org/hugh_fisher/fullofeels > > All feedback and suggestions welcome. > > -- > > cheers, > Hugh Fisher > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/njs%40pobox.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Feb 11 11:35:53 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Feb 2016 08:35:53 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160211101326.GL31806@ando.pearwood.info> References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> <20160211101326.GL31806@ando.pearwood.info> Message-ID: <4CEFA336-635D-4776-9D06-86FB9523DE54@yahoo.com> On Feb 11, 2016, at 02:13, Steven D'Aprano wrote: > >> On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote: >> They're both presented as something the syntax allows, and neither one >> looks like something I'd ever want to write, much less promote in a >> style guide or something, but neither one screams out as something >> that's so heinous we need to complicate the language to ensure it >> raises a SyntaxError. Yes, that's my opinion, but do you really have a >> different opinion about any part of that? > > I don't think the rule "underscores must occur between digits" is > complicating the specification. That rule isn't in the specification in the PEP, except as one of the alternatives rejected for being "too restrictive".
It's also not the rule you were suggesting in your previous email, where you insisted that you wanted something "more liberal". I also don't understand why you're presenting this whole thing as an argument against my response, which was suggesting that whatever rule we choose should be simpler than what's in the PEP, when that's also (apparently, now) your position. > It is *less* complicated to explain this > rule than to give a whole lot of special cases Sure. Your rule is about as complicated as the Swift rule, and both are much less complicated than the PEP. I'm fine with either one, because, as I said, the edge cases don't matter to me nearly as much as having a rule that's easy to keep in my head and easy to lex. The only reason I specifically proposed the Swift rule instead of one of the other simple rules is that it seemed the most "liberal", which the PEP was in favor of, and it has precedent in more other languages. But, in favor of your version, almost every language uses some variation of "you can put underscores between digits" as the "tutorial-level" explanation and rationale. From python at stevedower.id.au Thu Feb 11 11:52:31 2016 From: python at stevedower.id.au (Steve Dower) Date: Thu, 11 Feb 2016 08:52:31 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160211095102.032fea58@subdivisions.wooz.org> References: <20160211095102.032fea58@subdivisions.wooz.org> Message-ID: <56BCBC4F.9050703@python.org> On 11Feb2016 0651, Barry Warsaw wrote: > On Feb 11, 2016, at 09:22 AM, Georg Brandl wrote: > >> based on the feedback so far, I revised the PEP. There is now >> a much simpler rule for allowed underscores, with no exceptions. >> This made the grammar simpler as well. > > I'd be +1, but there's something missing from the PEP: what the underscores > *mean*. You describe the syntax nicely, but not the semantics.
> > From reading the examples, I'd guess that the underscores are semantically > transparent, meaning that the resulting value is the same if you just removed > the underscores and interpreted the resulting literal. > > Right or wrong, could you please add a paragraph explaining the meaning of the > underscores? Glad I kept reading the thread this far - just pretend I also wrote exactly the same thing as Barry. Cheers, Steve From abarnert at yahoo.com Thu Feb 11 11:55:39 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Feb 2016 08:55:39 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <59244F6E-25A6-492F-9E7C-354BC41D17F9@yahoo.com> On Feb 11, 2016, at 00:22, Georg Brandl wrote: > > Allowing underscores in string arguments to the ``Decimal`` constructor. It > could be argued that these are akin to literals, since there is no Decimal > literal available (yet). I'm +1 on this. Partly for consistency (see below)--but also, one of the use cases for Decimal is when you need more precision than float, meaning you'll often have even more digits to separate. > * Allowing underscores in string arguments to ``int()`` with base argument 0, > ``float()`` and ``complex()``. +1, because these are actually defined in terms of literals. For example, under int, "Base 0 means to interpret exactly as a code literal". This isn't actually quite true, because "-2" is not an integer literal but is accepted here--but see float for an example that *is* rigorously defined, and still defers to literal syntax and semantics. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.brandl at gmx.net Thu Feb 11 11:57:56 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 17:57:56 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <56BCBC4F.9050703@python.org> References: <20160211095102.032fea58@subdivisions.wooz.org> <56BCBC4F.9050703@python.org> Message-ID: On 02/11/2016 05:52 PM, Steve Dower wrote: > On 11Feb2016 0651, Barry Warsaw wrote: >> On Feb 11, 2016, at 09:22 AM, Georg Brandl wrote: >> >>> based on the feedback so far, I revised the PEP. There is now >>> a much simpler rule for allowed underscores, with no exceptions. >>> This made the grammar simpler as well. >> >> I'd be +1, but there's something missing from the PEP: what the underscores >> *mean*. You describe the syntax nicely, but not the semantics. >> >> From reading the examples, I'd guess that the underscores are semantically >> transparent, meaning that the resulting value is the same if you just removed >> the underscores and interpreted the resulting literal. >> >> Right or wrong, could you please add a paragraph explaining the meaning of the >> underscores? > > Glad I kept reading the thread this far - just pretend I also wrote > exactly the same thing as Barry. D'oh :) I added (hopefully) clarifying wording. Thanks, Georg From barry at python.org Thu Feb 11 12:02:33 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 11 Feb 2016 12:02:33 -0500 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: <20160211095102.032fea58@subdivisions.wooz.org> <56BCBC4F.9050703@python.org> Message-ID: <20160211120233.58f8322a@anarchist.wooz.org> On Feb 11, 2016, at 05:57 PM, Georg Brandl wrote: >D'oh :) I added (hopefully) clarifying wording. I saw the diff - perfect! Thanks. 
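The clarified semantics boil down to one executable property: underscores are purely visual, so deleting them never changes a literal's value, and the string-to-number conversions follow the same rules. A hedged sketch (it requires an interpreter that implements the PEP's behavior, as CPython 3.6+ does; the conversion examples assume the PEP's proposed extensions to ``int()``, ``float()`` and ``Decimal``):

```python
from decimal import Decimal

# Underscores are semantically transparent: strip them and the value
# is identical.
assert 10_000_000.0 == 10000000.0
assert 0xDEAD_BEEF == 0xDEADBEEF
assert 0b_0011_1111_0100_1110 == 0b0011111101001110

# The conversions that mirror literal syntax accept them as well.
assert int("1_000") == 1000
assert int("0xFF_FF", 0) == 0xFFFF   # base 0: interpret as a code literal
assert float("1_000.25") == 1000.25
assert Decimal("1_234.567_8") == Decimal("1234.5678")

# Misplaced underscores are rejected, not silently dropped.
try:
    int("1000_")
except ValueError:
    pass
else:
    raise AssertionError("trailing underscore should be invalid")
```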
-Barry From storchaka at gmail.com Thu Feb 11 12:19:17 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 11 Feb 2016 19:19:17 +0200 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 11.02.16 14:14, Georg Brandl wrote: > On 02/11/2016 11:17 AM, Serhiy Storchaka wrote: > >>> **Group 3: only between digits, only one underscore** >>> >>> * Ada [8]_ >>> * Julia (but not in the exponent part of floats) [9]_ >>> * Ruby (docs say "anywhere", in reality only between digits) [10]_ >> >> C++ is in this group too. >> >> The documentation of Perl explicitly says that Perl is in this group too >> (23__500 is not legal). Perhaps there is a bug in the Perl implementation. >> And maybe Swift is intended to be in this group. >> >> I think we should follow the majority of languages and use a simple rule: >> "only between digits". >> >> I have provided an implementation. > > Thanks for the alternate patch. I used the two-function approach you took > in ast.c for my latest revision. > > I still think that some cases (like two of the examples in the PEP, > 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed > rule is preferable. Should I write an alternative PEP for strong rule? From g.brandl at gmx.net Thu Feb 11 12:34:19 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 18:34:19 +0100 Subject: [Python-Dev] Python 3.2.7 and 3.3.7 Message-ID: Hi all, I'm planning to release 3.2.7 and 3.3.7 at the end of February. There will be a release candidate on Feb 20, and the final on Feb 27, if there is no holdup. These are both security (source-only) releases. 3.2.7 will be the last release from the 3.2 series. If you know of any patches that should go in, make sure to commit them in time or notify me.
Thanks, Georg From tjreedy at udel.edu Thu Feb 11 12:39:07 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 11 Feb 2016 12:39:07 -0500 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 2/11/2016 2:45 AM, Georg Brandl wrote: Thanks for grabbing this issue and moving it forward. I will like being able to write or read 200_000_000 and be sure I am right without counting 0s. > Based on the feedback so far, I have an easier rule in mind that I will base > the next PEP revision on. It's basically > > "One or more underscores allowed anywhere after a digit or a base specifier." > > This preserves my preferred non-restrictive cases (0b_1111_0000, 1.5_j) and > disallows more controversial versions like "1.5e_+_2". I like both choices above. I don't like trailing underscores for two reasons. 1. The stated purpose of adding '_'s is to visually separate. Trailing underscores do not do that. They serve no purpose. 2. Trailing _s are used to turn keywords (class) into identifiers (class_). To me, 123_ mentally clashes with this usage. If trailing _ is allowed, to simplify the implementation, I would like PEP 8, while on the subject, to say something like "While trailing _s on numbers are allowed, to simplify the implementation, they serve no purpose and are strongly discouraged". -- Terry Jan Reedy From g.brandl at gmx.net Thu Feb 11 12:40:33 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Feb 2016 18:40:33 +0100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 02/11/2016 06:19 PM, Serhiy Storchaka wrote: >> Thanks for the alternate patch. I used the two-function approach you took >> in ast.c for my latest revision. >> >> I still think that some cases (like two of the examples in the PEP, >> 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed >> rule is preferable. > > Should I write an alternative PEP for strong rule?
That seems excessive for a minor point. Let's collect feedback for a few days, and we can also collect some informal votes. In the end, I suspect that Guido will let us know about his preference for one of the possibilities, and when he does, I will update the PEP accordingly. cheers, Georg From ethan at stoneleaf.us Thu Feb 11 12:43:21 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 11 Feb 2016 09:43:21 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <56BCC839.9070505@stoneleaf.us> On 02/11/2016 09:19 AM, Serhiy Storchaka wrote: > On 11.02.16 14:14, Georg Brandl wrote: >> I still think that some cases (like two of the examples in the PEP, >> 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed >> rule is preferable. > > Should I write an alternative PEP for strong rule? Please don't. A style guide recommendation which allows for variations when necessary is much better -- consenting adults, remember? -- ~Ethan~ From brett at python.org Thu Feb 11 13:03:34 2016 From: brett at python.org (Brett Cannon) Date: Thu, 11 Feb 2016 18:03:34 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160211101326.GL31806@ando.pearwood.info> References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> <20160211101326.GL31806@ando.pearwood.info> Message-ID: On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano wrote: > On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote: > > > And honestly, are you really claiming that in your opinion, "123_456_" > > is worse than all of their other examples, like "1_23__4"? > > Yes I am, because 123_456_ looks like you've forgotten to finish typing > the last group of digits, while 1_23__4 merely looks like you have no > taste. > OK, but the keyword in your sentence is "taste". 
If we update PEP 8 for our needs to say "Numerical literals should not have multiple underscores in a row or have a trailing underscore" then this is taken care of. We get a dead-simple rule for when underscores can be used, the implementation is simple, and we get to have more tasteful usage in the stdlib w/o forcing our tastes upon everyone or complicating the rules or implementation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Feb 11 13:05:44 2016 From: brett at python.org (Brett Cannon) Date: Thu, 11 Feb 2016 18:05:44 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On Thu, 11 Feb 2016 at 00:23 Georg Brandl wrote: > Hey all, > > based on the feedback so far, I revised the PEP. There is now > a much simpler rule for allowed underscores, with no exceptions. > This made the grammar simpler as well. > > --------------------------------------------------------------------------- > > PEP: 515 > Title: Underscores in Numeric Literals > Version: $Revision$ > Last-Modified: $Date$ > Author: Georg Brandl > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 10-Feb-2016 > Python-Version: 3.6 > > Abstract and Rationale > ====================== > > This PEP proposes to extend Python's syntax so that underscores can be > used in > integral, floating-point and complex number literals. > > This is a common feature of other modern languages, and can aid > readability of > long literals, or literals whose value should clearly separate into parts, > such > as bytes or words in hexadecimal notation. 
> > Examples:: > > # grouping decimal numbers by thousands > amount = 10_000_000.0 > > # grouping hexadecimal addresses by words > addr = 0xDEAD_BEEF > > # grouping bits into bytes in a binary literal > flags = 0b_0011_1111_0100_1110 > > # making the literal suffix stand out more > imag = 1.247812376e-15_j > > > Specification > ============= > > The current proposal is to allow one or more consecutive underscores following > digits and base specifiers in numeric literals. > +1 from me. Nice and simple! And we can always update PEP 8 to disallow any usage that we deem ugly. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Feb 11 13:15:45 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Feb 2016 10:15:45 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <1581BE5B-98E6-4ABE-BEAA-027FC02C61DD@yahoo.com> On Feb 11, 2016, at 09:39, Terry Reedy wrote: > > If trailing _ is allowed, to simplify the implementation, I would like PEP 8, while on the subject, to say something like "While trailing _s on numbers are allowed, to simplify the implementation, they serve no purpose and are strongly discouraged". That's a good point: we need style rules for PEP 8. But I think everything that's just obviously pointless (like putting an underscore between every pair of digits, or sprinkling underscores all over a huge number to make ASCII art), or already handled by other guidelines (e.g., using a ton of underscores to "line up a table" is the same as using a ton of spaces, which is already discouraged) doesn't really need to be covered.
Something like this: While underscores can legally appear anywhere in the digit string, you should never use them for purposes other than visually separating meaningful digit groups like thousands, bytes, and the like.

    123456_789012: ok (millions are groups, but thousands are more common, and 6-digit groups are readable, but on the edge)
    123_456_789_012: better
    123_456_789_012_: bad (trailing)
    1_2_3_4_5_6: bad (too many)
    1234_5678: ok if code is intended to deal with east-Asian numerals (where 10000 is a standard grouping), bad otherwise
    3__141_592_654: ok if this represents a fixed-point fraction (obviously bad otherwise)
    123.456_789e123: good
    123.456_789e1_23: bad (never useful in exponent)
    0x1234_5678: good
    0o123_456: good
    0x123_456_789: bad (3 hex digits is usually not a meaningful group)

The one case that seems contentious is "123_456_j". Honestly, I don't care which way that goes, and I'd be fine if the PEP left out any mention of it, but if people feel strongly one way or the other, the PEP could just give it as a good or a bad example and that would be enough to clarify the intention. From storchaka at gmail.com Thu Feb 11 13:29:15 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 11 Feb 2016 20:29:15 +0200 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 11.02.16 19:40, Georg Brandl wrote: > On 02/11/2016 06:19 PM, Serhiy Storchaka wrote: > >>> Thanks for the alternate patch. I used the two-function approach you took >>> in ast.c for my latest revision. >>> >>> I still think that some cases (like two of the examples in the PEP, >>> 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed >>> rule is preferable. >> >> Should I write an alternative PEP for strong rule? > > That seems excessive for a minor point. Let's collect feedback for > a few days, and we can also collect some informal votes. I suspect that my arguments can be lost otherwise.
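The gap between the two candidate rules is easy to see mechanically. As a hedged sketch (plain decimal integers only; the real grammar also covers floats, exponents and base prefixes), the relaxed draft rule and the strict "only between digits" rule can be approximated with two regular expressions:

```python
import re

# Relaxed rule from the PEP draft: one or more underscores anywhere
# after a digit (so "1_23__4" and "123_456_" both pass).
relaxed = re.compile(r"[0-9][0-9_]*\Z")

# Strict rule: a single underscore, only between two digits.
strict = re.compile(r"[0-9](?:_?[0-9])*\Z")

assert relaxed.match("123_456") and strict.match("123_456")
assert relaxed.match("1_23__4") and not strict.match("1_23__4")    # doubled "_"
assert relaxed.match("123_456_") and not strict.match("123_456_")  # trailing "_"
```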
From abarnert at yahoo.com Thu Feb 11 13:29:22 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Feb 2016 10:29:22 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <1581BE5B-98E6-4ABE-BEAA-027FC02C61DD@yahoo.com> References: <1581BE5B-98E6-4ABE-BEAA-027FC02C61DD@yahoo.com> Message-ID: <9D15D138-4565-492F-8B25-CC7ABC5B9308@yahoo.com> On Feb 11, 2016, at 10:15, Andrew Barnert via Python-Dev wrote: > > That's a good point: we need style rules for PEP 8. One more point: should the tutorial mention underscores? It looks like the intro docs for a lot of the other languages do. And it would only take one short sentence in 3.1.1 Numbers to say that you can use underscores to make large numbers like 123_456.789_012 more readable. From jdhardy at gmail.com Thu Feb 11 13:35:42 2016 From: jdhardy at gmail.com (Jeff Hardy) Date: Thu, 11 Feb 2016 10:35:42 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <1581BE5B-98E6-4ABE-BEAA-027FC02C61DD@yahoo.com> References: <1581BE5B-98E6-4ABE-BEAA-027FC02C61DD@yahoo.com> Message-ID: On Thu, Feb 11, 2016 at 10:15 AM, Andrew Barnert via Python-Dev < python-dev at python.org> wrote: > On Feb 11, 2016, at 09:39, Terry Reedy wrote: > > > > If trailing _ is allowed, to simplify the implementation, I would like > PEP 8, while on the subject, to say something like "While trailing _s on > numbers are allowed, to simplify the implementation, they serve no purpose > and are strongly discouraged". > > That's a good point: we need style rules for PEP 8. > > But I think everything that's just obviously pointless (like putting an > underscore between every pair of digits, or sprinkling underscores all over > a huge number to make ASCII art), or already handled by other guidelines > (e.g., using a ton of underscores to "line up a table" is the same as using > a ton of spaces, which is already discouraged) doesn't really need to be > covered. 
And I think trailing underscores probably fall into that category. > > It might be simpler to write a "whitelist" than a "blacklist" of all the > ugly things people might come up with, and then just give a bunch of > examples instead of a bunch of rules. Something like this: > > While underscores can legally appear anywhere in the digit string, you > should never use them for purposes other than visually separating > meaningful digit groups like thousands, bytes, and the like. > > 123456_789012: ok (millions are groups, but thousands are more common, > and 6-digit groups are readable, but on the edge) > 123_456_789_012: better > 123_456_789_012_: bad (trailing) > 1_2_3_4_5_6: bad (too many) > 1234_5678: ok if code is intended to deal with east-Asian numerals > (where 10000 is a standard grouping), bad otherwise > 3__141_592_654: ok if this represents a fixed-point fraction > (obviously bad otherwise) > 123.456_789e123: good > 123.456_789e1_23: bad (never useful in exponent) > 0x1234_5678: good > 0o123_456: good > 0x123_456_789: bad (3 hex digits is usually not a meaningful group) > > The one case that seems contentious is "123_456_j". Honestly, I don't care > which way that goes, and I'd be fine if the PEP left out any mention of it, > but if people feel strongly one way or the other, the PEP could just give > it as a good or a bad example and that would be enough to clarify the > intention. > I imagine that for whatever "bad" grouping you can suggest, someone, somewhere, has a legitimate reason to use it. Any rule more complex than "Use underscores in numeric literals only when they improve clarity" is unnecessarily prescriptive. - Jeff -------------- next part -------------- An HTML attachment was scrubbed...
URL: From storchaka at gmail.com Thu Feb 11 13:50:09 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 11 Feb 2016 20:50:09 +0200 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 11.02.16 10:22, Georg Brandl wrote: > Abstract and Rationale > ====================== > > This PEP proposes to extend Python's syntax so that underscores can be used in > integral, floating-point and complex number literals. > > This is a common feature of other modern languages, and can aid readability of > long literals, or literals whose value should clearly separate into parts, such > as bytes or words in hexadecimal notation. I have a strong preference for the more strict and simpler rule used by most other languages -- "only between two digits". Main arguments: 1. A simple rule is easier to understand, remember and recognize. I care not about the complexity of the implementation (there is no large difference), but about cognitive complexity. 2. Most languages use this rule. It is better to follow a non-formal standard than to invent a rule that differs from the rules in every other language. This will help programmers that use multiple languages. I have provided an alternative patch and can provide an alternative PEP if it is needed.
> The production list for integer literals would therefore look like this::
>
>     integer: decimalinteger | octinteger | hexinteger | bininteger
>     decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")*
>     nonzerodigit: "1"..."9"
>     digit: "0"..."9"
>     octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*

    octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*

>     hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*

    hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*

>     bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*

    bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*

>     octdigit: "0"..."7"
>     hexdigit: digit | "a"..."f" | "A"..."F"
>     bindigit: "0" | "1"
>
> For floating-point and complex literals::
>
>     floatnumber: pointfloat | exponentfloat
>     pointfloat: [intpart] fraction | intpart "."
>     exponentfloat: (intpart | pointfloat) exponent
>     intpart: digit (digit | "_")*

    intpart: digit (["_"] digit)*

>     fraction: "." intpart
>     exponent: ("e" | "E") ["+" | "-"] intpart
>     imagnumber: (floatnumber | intpart) ("j" | "J")

> **Group 1: liberal**
>
> This group is the least homogeneous: the rules vary slightly between languages.
> All of them allow trailing underscores. Some allow underscores after non-digits
> like the ``e`` or the sign in exponents.
>
> * D [2]_
> * Perl 5 (underscores basically allowed anywhere, although docs say it's more
>   restricted) [3]_
> * Rust (allows between exponent sign and digits) [4]_
> * Swift (although textual description says "between digits") [5]_
>
> **Group 2: only between digits, multiple consecutive underscores**
>
> * C# (open proposal for 7.0) [6]_
> * Java [7]_
>
> **Group 3: only between digits, only one underscore**
>
> * Ada [8]_
> * Julia (but not in the exponent part of floats) [9]_
> * Ruby (docs say "anywhere", in reality only between digits) [10]_

This classification is misleading. The difference between groups 2 and 3 is less than the difference between languages within group 1.
To be fair, groups 2 and 3 should be united in one group. C++ should be included in this group. Perl 5 and Swift should be either included in both groups or excluded from any group, because they have inconsistencies between the documentation and the implementation or between different parts of the documentation. With correct classification it is obvious what variant is the most popular. From ethan at stoneleaf.us Thu Feb 11 14:01:26 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 11 Feb 2016 11:01:26 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <56BCDA86.4010009@stoneleaf.us> On 02/11/2016 10:50 AM, Serhiy Storchaka wrote: > I have strong preference for more strict and simpler rule, used by > most other languages -- "only between two digits". Main arguments: > 2. Most languages use this rule. It is better to follow non-formal > standard that invent the rule that differs from rules in every other > language. This will help programmers that use multiple languages. If Python followed other languages in everything: 1) Python would not need to exist; and 2) Python would suck ;) If our rule is more permissive that other languages then cross-language developers can still use the same style in both languages, without penalizing those who want to use the extra freedom in Python. -- ~Ethan~ From v+python at g.nevcal.com Thu Feb 11 14:08:49 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 11 Feb 2016 11:08:49 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <56BCDA86.4010009@stoneleaf.us> References: <56BCDA86.4010009@stoneleaf.us> Message-ID: <56BCDC41.7080000@g.nevcal.com> On 2/11/2016 11:01 AM, Ethan Furman wrote: > On 02/11/2016 10:50 AM, Serhiy Storchaka wrote: > > I have strong preference for more strict and simpler rule, used by > > most other languages -- "only between two digits". Main arguments: > > > 2. Most languages use this rule. 
It is better to follow a non-formal > > standard than to invent a rule that differs from the rules in every other > > language. This will help programmers that use multiple languages. > > If Python followed other languages in everything: > > 1) Python would not need to exist; and > 2) Python would suck ;) > > If our rule is more permissive than other languages then > cross-language developers can still use the same style in both > languages, without penalizing those who want to use the extra freedom > in Python. Ditto. If people need an idea to shoot down, regarding literal constants, and because I couldn't find a Python-Non-Ideas list to post this in, here is one. Note that it is unambiguous, does not conflict with existing binary literals, but otherwise sucks. Please vote this idea down with emphasis: Base 64 decoding literals:

    print( 0b64_CjMy_NTM0_Mjkw_NQ )
    325342905

-------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Thu Feb 11 14:04:20 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 11 Feb 2016 11:04:20 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <56BCDB34.6020400@g.nevcal.com> On 2/11/2016 12:22 AM, Georg Brandl wrote: > Hey all, > > based on the feedback so far, I revised the PEP. There is now > a much simpler rule for allowed underscores, with no exceptions. > This made the grammar simpler as well. +1 overall > Examples:: > > # grouping decimal numbers by thousands > amount = 10_000_000.0 > > # grouping hexadecimal addresses by words > addr = 0xDEAD_BEEF > > # grouping bits into bytes in a binary literal

nybbles, not bytes, is shown... which is more readable, and does group into bytes also.
> flags = 0b_0011_1111_0100_1110 +1 on 0b_ and 0X_ and, especially, 0O_ (but why anyone would use uppercase base designators is beyond me, as it is definitely less readable) > # making the literal suffix stand out more > imag = 1.247812376e-15_j +1 on _j -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Feb 11 18:01:03 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 12 Feb 2016 10:01:03 +1100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <20160211230102.GO31806@ando.pearwood.info> On Thu, Feb 11, 2016 at 08:50:09PM +0200, Serhiy Storchaka wrote: > I have strong preference for more strict and simpler rule, used by most > other languages -- "only between two digits". Main arguments: > > 1. Simple rule is easier to understand, remember and recognize. I care > not about the complexity of the implementation (there is no large > difference), but about cognitive complexity. > > 2. Most languages use this rule. It is better to follow non-formal > standard that invent the rule that differs from rules in every other > language. This will help programmers that use multiple languages. > > I have provided an alternative patch and can provide an alternative PEP > if it is needed. I don't think an alternative PEP is needed, but I hope that your alternative gets a fair treatment in the PEP. 
> >The production list for integer literals would therefore look like this::
> >
> >    integer: decimalinteger | octinteger | hexinteger | bininteger
> >    decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")*
> >    nonzerodigit: "1"..."9"
> >    digit: "0"..."9"
> >    octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*
>
>     octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*
>
> >    hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
>
>     hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*
>
> >    bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
>
>     bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*

To me, Serhiy's versions (starting with single > symbols) are not only simpler to learn, but have a simpler (or at least shorter) implementation too.

[...]

> >**Group 3: only between digits, only one underscore**
> >
> >* Ada [8]_
> >* Julia (but not in the exponent part of floats) [9]_
> >* Ruby (docs say "anywhere", in reality only between digits) [10]_
>
> This classification is misleading. The difference between groups 2 and 3
> is less than between different languages in group 1. To be fair, groups
> 2 and 3 should be united in one group. C++ should be included in this
> group. Perl 5 and Swift should be either included in both groups or
> excluded from any group, because they have inconsistencies between the
> documentation and the implementation or between different parts of the
> documentation.
>
> With correct classification it is obvious what variant is the most popular.

It is not obvious to me what you think the correct classification is. If you disagree with Georg's classification, would you reclassify the languages, and if there is agreement that you are correct, he can update the PEP?
-- Steve From hugo.fisher at gmail.com Thu Feb 11 18:05:48 2016 From: hugo.fisher at gmail.com (Hugh Fisher) Date: Fri, 12 Feb 2016 10:05:48 +1100 Subject: [Python-Dev] fullOfEels, assistant program for writing Python extension modules in C Message-ID: On Fri, Feb 12, 2016 at 3:30 AM, Nathaniel Smith wrote: > You're almost certainly aware of this, but just to double check since you > don't mention it in the email: cython is also a great tool for handling > similar situations. Not quite the same since in addition to generating all > the boilerplate for you it then lets you use almost-python to actually write > the C implementations as well, and I understand that with your tool you > write the actual implementations in C. But probably also worth considering > in cases where you'd consider this tool, so wanted to make sure it was on > your radar. Yes, cython is a fine tool and I wouldn't try to dissuade anyone from using it if it works for them. FullOfEels is for when the implementation should be hidden altogether. Most often this is because of cross-platform differences or coding horrors, but could also be handy for teaching when it's easier to just give students plain Python modules to look at. Thanks for replying. 
-- cheers, Hugh Fisher From steve at pearwood.info Thu Feb 11 19:16:34 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 12 Feb 2016 11:16:34 +1100 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> <20160211101326.GL31806@ando.pearwood.info> Message-ID: <20160212001633.GP31806@ando.pearwood.info> On Thu, Feb 11, 2016 at 06:03:34PM +0000, Brett Cannon wrote: > On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano wrote: > > > On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote: > > > > > And honestly, are you really claiming that in your opinion, "123_456_" > > > is worse than all of their other examples, like "1_23__4"? > > > > Yes I am, because 123_456_ looks like you've forgotten to finish typing > > the last group of digits, while 1_23__4 merely looks like you have no > > taste. > > > > OK, but the keyword in your sentence is "taste". I disagree. The key *idea* in my sentence is that the trailing underscore looks like a programming error. In my opinion, avoiding that impression is important enough to make trailing underscores a syntax error. I've seen a few people vote +1 for things like 123_j and 1.23_e99, but I haven't seen anyone in favour of trailing underscores. Does anyone think there is a good case for allowing trailing underscores? > If we update PEP 8 for our > needs to say "Numerical literals should not have multiple underscores in a > row or have a trailing underscore" then this is taken care of. We get a > dead-simple rule for when underscores can be used, the implementation is > simple, and we get to have more tasteful usage in the stdlib w/o forcing > our tastes upon everyone or complicating the rules or implementation. I think this is a misrepresentation of the alternative. 
As I see it, we have two alternatives: - one or more underscores can appear AFTER the base specifier or any digit; - one or more underscores can appear BETWEEN two digits. To describe the second alternative as "complicating the rules" is, I think, grossly unfair. And if Serhiy's proposal is correct, the implementation is also no more complicated: # underscores after digits octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")* hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")* bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")* # underscores between digits octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)* hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)* bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)* The idea that the second alternative "forc[es] our tastes on everyone" while the first does not is bogus. The first alternative also prohibits things which are a matter of taste: # prohibited in both alternatives 0_xDEADBEEF 0._1234 1.2e_99 -_1 1j_ I think that there is broad agreement that: - the basic idea is sound - leading underscores followed by digits are currently legal identifiers and this will not change - underscores should not follow the sign - + - underscores should not follow the decimal point . - underscores should not follow the exponent e|E - underscores will not be permitted inside the exponent (even if it is harmless, it's silly to write 1.2e9_9) - underscores should not follow the complex suffix j and only minor disagreement about: - whether or not underscores will be allowed after the base specifier 0x 0o 0b - whether or not underscores will be allowed before the decimal point, exponent and complex suffix. Can we have a show of hands, in favour or against the above two? And then perhaps Guido can rule on this one way or the other and we can get back to arguing about more important matters? 
:-) In case it isn't obvious, I prefer to say No to allowing underscores after the base specifier, or before the decimal point, exponent and complex suffix. -- Steve From vadmium+py at gmail.com Thu Feb 11 19:59:00 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Fri, 12 Feb 2016 00:59:00 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: On 11 February 2016 at 11:12, Chris Angelico wrote: > On Thu, Feb 11, 2016 at 7:22 PM, Georg Brandl wrote: The following extensions are open for discussion: >> * Allowing underscores in string arguments to the ``Decimal`` constructor. It >> could be argued that these are akin to literals, since there is no Decimal >> literal available (yet). >> >> * Allowing underscores in string arguments to ``int()`` with base argument 0, >> ``float()`` and ``complex()``. > > I'm -0.5 on both of these, with the caveat that if either gets done, > both should be. Decimal() shouldn't be different from int() just > because there's currently no way to express a Decimal literal; if > Python 3.7 introduces such a literal, there'd be this weird rule > difference that has to be maintained for backward compatibility, and > has no justification left. I would be weakly in favour of all relevant constructors being updated to match the new syntax. The main reason is just consistency, and that the documentation already kind of guarantees that the literal syntax is supported (definitely for int and float; for complex it is too vague). To be consistent, the following minor extensions of the syntax should be allowed, which are not legal Python literals: int("0_001"), int("J_00", 20), float("0_001"), complex("0_001"). Maybe also with non-ASCII digits. However I tried writing Arabic-Indic digits (U+0660 etc) and my web browser split the number apart when I inserted an underscore. Maybe a right-to-left thing. But using Devanagari digits U+0966, U+0967: int("१_०००") (= 1_000).
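For what it's worth, the behaviour being discussed can be checked directly in CPython (3.6 and later, where PEP 515 landed; at the time of this thread it was still a proposal). A small sketch, using escape sequences to sidestep the right-to-left display problem Martin ran into:

```python
# Sketch of int() with non-ASCII decimal digits and PEP 515 underscores,
# as observed in CPython 3.6+. int() normalizes any Unicode decimal digit
# to ASCII before parsing, and underscores between digits are accepted.

# Devanagari digits: U+0967 is one, U+0966 is zero.
assert int("\u0967\u0966\u0966\u0966") == 1000

# The same number with an underscore separator between digits:
assert int("\u0967_\u0966\u0966\u0966") == 1_000

# Arabic-Indic digits U+0661..U+0663 spell 123:
assert int("\u0661\u0662\u0663") == 123

print("non-ASCII digit checks passed")
```

Writing the digits as `\uXXXX` escapes keeps the source left-to-right regardless of the editor's bidi handling.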
Non-ASCII digits are apparently intentionally supported, but not documented: . > (As a side point, I would be fully in favour of Decimal literals. I'd > also be in favour of something like "from __future__ import > fraction_literals" so 1/2 would evaluate to Fraction(1,2) rather than > 0.5. Hence I'm inclined *not* to support underscores in Decimal().) Seems more like an argument to have the support in Decimal() consistent with float() etc, i.e. all or nothing. From vadmium+py at gmail.com Thu Feb 11 20:29:26 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Fri, 12 Feb 2016 01:29:26 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160212001633.GP31806@ando.pearwood.info> References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> <20160211101326.GL31806@ando.pearwood.info> <20160212001633.GP31806@ando.pearwood.info> Message-ID: On 12 February 2016 at 00:16, Steven D'Aprano wrote: > On Thu, Feb 11, 2016 at 06:03:34PM +0000, Brett Cannon wrote: >> On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano wrote: >> >> > On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote: >> > >> > > And honestly, are you really claiming that in your opinion, "123_456_" >> > > is worse than all of their other examples, like "1_23__4"? >> > >> > Yes I am, because 123_456_ looks like you've forgotten to finish typing >> > the last group of digits, while 1_23__4 merely looks like you have no >> > taste. >> > >> >> OK, but the keyword in your sentence is "taste". > > I disagree. The key *idea* in my sentence is that the trailing > underscore looks like a programming error. In my opinion, avoiding that > impression is important enough to make trailing underscores a syntax > error. > > I've seen a few people vote +1 for things like 123_j and 1.23_e99, but I > haven't seen anyone in favour of trailing underscores. Does anyone think > there is a good case for allowing trailing underscores? 
> > >> If we update PEP 8 for our >> needs to say "Numerical literals should not have multiple underscores in a >> row or have a trailing underscore" then this is taken care of. We get a >> dead-simple rule for when underscores can be used, the implementation is >> simple, and we get to have more tasteful usage in the stdlib w/o forcing >> our tastes upon everyone or complicating the rules or implementation. > > I think this is a misrepresentation of the alternative. As I see it, we > have two alternatives: > > - one or more underscores can appear AFTER the base specifier or any digit; +1 > - one or more underscores can appear BETWEEN two digits. -0 Having underscores between digits is the main usage, but I don't see much harm in the more liberal version, unless that makes the specification or implementation too complex. Allowing stuff like 0x_100, 4.7_e3, and 1_j seems of slightly more benefit IMO than disallowing 1_000_. > To describe the second alternative as "complicating the rules" is, I > think, grossly unfair. And if Serhiy's proposal is correct, the > implementation is also no more complicated: > > # underscores after digits > octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")* > hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")* > bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")* > > # underscores between digits > octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)* > hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)* > bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)* > > > The idea that the second alternative "forc[es] our tastes on everyone" > while the first does not is bogus. The first alternative also prohibits > things which are a matter of taste: > > # prohibited in both alternatives > 0_xDEADBEEF > 0._1234 > 1.2e_99 > -_1
> 1j_ > > > I think that there is broad agreement that: > > - the basic idea is sound > - leading underscores followed by digits are currently legal > identifiers and this will not change +1 to both > - underscores should not follow the sign - + > - underscores should not follow the decimal point . > - underscores should not follow the exponent e|E No strong opinion on these from me > - underscores will not be permitted inside the exponent (even if > it is harmless, it's silly to write 1.2e9_9) -0, it seems like a needless inconsistency, unless it somehow hurts the implementation > - underscores should not follow the complex suffix j No opinion > and only minor disagreement about: > > - whether or not underscores will be allowed after the base > specifier 0x 0o 0b +0 > - whether or not underscores will be allowed before the decimal > point, exponent and complex suffix. No opinion about directly before decimal point; +0 before exponent or imaginary (complex) suffix. > Can we have a show of hands, in favour or against the above two? And > then perhaps Guido can rule on this one way or the other and we can get > back to arguing about more important matters? :-) > > In case it isn't obvious, I prefer to say No to allowing underscores > after the base specifier, or before the decimal point, exponent and > complex suffix. From abarnert at yahoo.com Thu Feb 11 20:38:26 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Feb 2016 01:38:26 +0000 (UTC) Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: Message-ID: <263337469.131022.1455241106360.JavaMail.yahoo@mail.yahoo.com> On Thursday, February 11, 2016 10:35 AM, Jeff Hardy wrote: >On Thu, Feb 11, 2016 at 10:15 AM, Andrew Barnert via Python-Dev wrote: > >>That's a good point: we need style rules for PEP 8. ... 
>>It might be simpler to write a "whitelist" than a "blacklist" of all the ugly things people might come up with, and then just give a bunch of examples instead of a bunch of rules. Something like this: >> >>While underscores can legally appear anywhere in the digit string, you should never use them for purposes other than visually separating meaningful digit groups like thousands, bytes, and the like. >> >> 123456_789012: ok (millions are groups, but thousands are more common, and 6-digit groups are readable, but on the edge) >> 123_456_789_012: better >> 123_456_789_012_: bad (trailing) >> 1_2_3_4_5_6: bad (too many) >> 1234_5678: ok if code is intended to deal with east-Asian numerals (where 10000 is a standard grouping), bad otherwise >> 3__141_592_654: ok if this represents a fixed-point fraction (obviously bad otherwise) >> 123.456_789e123: good >> 123.456_789e1_23: bad (never useful in exponent) >> 0x1234_5678: good >> 0o123_456: good >> 0x123_456_789: bad (3 hex digits is usually not a meaningful group) > >I imagine that for whatever "bad" grouping you can suggest, someone, somewhere, has a legitimate reason to use it. That's exactly why we should just have bad examples in the style guide, rather than coming up with style rules that try to strongly discourage them (or making them syntax errors). >Any rule more complex than "Use underscores in numeric literals only when they improve clarity" is unnecessarily prescriptive. Your rule doesn't need to be stated at all. It's already a given that you shouldn't add semantically-meaningless characters anywhere unless they improve clarity.... I don't think saying that they're for "visually separating meaningful digit groups like thousands, bytes, and the like" is unnecessarily prescriptive. If someone comes up with a legitimate use for something we've never anticipated, it will almost certainly just be a way of grouping digits that's meaningful in a way we didn't anticipate.
And, if not, it's just a style guideline, so it doesn't have to apply 100% of the time. If someone really comes up with something that has nothing to do with grouping digits, all the style guideline will do is make them stop and think about whether it really is a good use of underscores--and, if it is, they'll go ahead and do it. From v+python at g.nevcal.com Thu Feb 11 21:02:27 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 11 Feb 2016 18:02:27 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160212001633.GP31806@ando.pearwood.info> References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> <20160211101326.GL31806@ando.pearwood.info> <20160212001633.GP31806@ando.pearwood.info> Message-ID: <56BD3D33.9000508@g.nevcal.com> On 2/11/2016 4:16 PM, Steven D'Aprano wrote: > On Thu, Feb 11, 2016 at 06:03:34PM +0000, Brett Cannon wrote: >> On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano wrote: >> >>> On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote: >>> >>>> And honestly, are you really claiming that in your opinion, "123_456_" >>>> is worse than all of their other examples, like "1_23__4"? >>> Yes I am, because 123_456_ looks like you've forgotten to finish typing >>> the last group of digits, while 1_23__4 merely looks like you have no >>> taste. >>> >> OK, but the keyword in your sentence is "taste". > I disagree. The key *idea* in my sentence is that the trailing > underscore looks like a programming error. In my opinion, avoiding that > impression is important enough to make trailing underscores a syntax > error. > > I've seen a few people vote +1 for things like 123_j and 1.23_e99, but I > haven't seen anyone in favour of trailing underscores. Does anyone think > there is a good case for allowing trailing underscores? 
> > >> If we update PEP 8 for our >> needs to say "Numerical literals should not have multiple underscores in a >> row or have a trailing underscore" then this is taken care of. We get a >> dead-simple rule for when underscores can be used, the implementation is >> simple, and we get to have more tasteful usage in the stdlib w/o forcing >> our tastes upon everyone or complicating the rules or implementation. > I think this is a misrepresentation of the alternative. As I see it, we > have two alternatives: > > - one or more underscores can appear AFTER the base specifier or any digit; > - one or more underscores can appear BETWEEN two digits. > > To describe the second alternative as "complicating the rules" is, I > think, grossly unfair. And if Serhiy's proposal is correct, the > implementation is also no more complicated: > > # underscores after digits > octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")* > hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")* > bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")* # underscores after digits octinteger: "0" ("o" | "O") (octdigit | "_")* hexinteger: "0" ("x" | "X") (hexdigit | "_")* bininteger: "0" ("b" | "B") (bindigit | "_")* An extra side effect is that there are more ways to write zero. 0x, 0b, 0o, 0X, 0B, 0O, 0x_, 0b_, 0o_, etc. But most people write 0 anyway, so those would be bad style, anyway, but it makes the implementation simpler. > > # underscores between digits > octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)* > hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)* > bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)* > > > The idea that the second alternative "forc[es] our tastes on everyone" > while the first does not is bogus. 
The first alternative also prohibits > things which are a matter of taste: > > # prohibited in both alternatives > 0_xDEADBEEF > 0._1234 > 1.2e_99 > -_1 > 1j_ > > > I think that there is broad agreement that: > > - the basic idea is sound > - leading underscores followed by digits are currently legal > identifiers and this will not change > - underscores should not follow the sign - + > - underscores should not follow the decimal point . > - underscores should not follow the exponent e|E > - underscores will not be permitted inside the exponent (even if > it is harmless, it's silly to write 1.2e9_9) > - underscores should not follow the complex suffix j > > and only minor disagreement about: > > - whether or not underscores will be allowed after the base > specifier 0x 0o 0b +1 to allow underscores after the base specifier. > - whether or not underscores will be allowed before the decimal > point, exponent and complex suffix. +1 to allow them. There may be cases where they are useful, and if it is not useful, it would not be used. I really liked someone's style guide proposal: use of underscore within numeric constants should only be done to aid readability. However, pre-judging what aids readability to one person's particular taste is inappropriate. > Can we have a show of hands, in favour or against the above two? And > then perhaps Guido can rule on this one way or the other and we can get > back to arguing about more important matters? :-) > > In case it isn't obvious, I prefer to say No to allowing underscores > after the base specifier, or before the decimal point, exponent and > complex suffix. I think it was obvious :) And I think we disagree. And yes, there are more important matters. But it was just a couple days ago when I wrote a big constant in some new code that I was thinking how nice it would be if I could put a delimiter in there... so I'll be glad for the feature when it is available. 
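The practical difference between the two candidate grammars is easy to prototype with regular expressions. A sketch for the hex case only — these are my own transcriptions of the two productions quoted above, not the actual tokenizer change:

```python
import re

# Liberal rule: underscores may appear after the base specifier or any digit.
#   hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
liberal = re.compile(r'0[xX]_*[0-9A-Fa-f][0-9A-Fa-f_]*\Z')

# Strict rule: a single underscore only between two digits.
#   hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*
strict = re.compile(r'0[xX][0-9A-Fa-f](?:_?[0-9A-Fa-f])*\Z')

for lit in ['0xDEAD_BEEF', '0x_DEAD', '0xDEAD_', '0xDE__AD']:
    print('{:12} liberal={!s:5} strict={!s}'.format(
        lit, bool(liberal.match(lit)), bool(strict.match(lit))))
# 0xDEAD_BEEF  liberal=True  strict=True
# 0x_DEAD      liberal=True  strict=False
# 0xDEAD_      liberal=True  strict=False
# 0xDE__AD     liberal=True  strict=False
```

As the output shows, the rules agree on the common case and differ only on leading, trailing, and doubled underscores.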
From stephen at xemacs.org Thu Feb 11 22:10:15 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 12 Feb 2016 12:10:15 +0900 Subject: [Python-Dev] Windows: Remove support of bytes filenames in the os module? In-Reply-To: <1844729669.1825953.1455136233223.JavaMail.yahoo@mail.yahoo.com> References: <22203.20013.518404.39381@turnbull.sk.tsukuba.ac.jp> <1844729669.1825953.1455136233223.JavaMail.yahoo@mail.yahoo.com> Message-ID: <22205.19735.804637.504662@turnbull.sk.tsukuba.ac.jp> Executive summary: My experience is that having bytes APIs in the os module is very useful. But perhaps higher-level functions like os.scandir can do without (I present no arguments either way on that, just acknowledge it). Andrew Barnert writes: > Anyway, Windows CDs can't cause this problem. My bad. I meant archival Mac CDs (or perhaps they were taken from a network filesystem) which is where I see MacRoman, and Windows (ie, FAT-formatted) USB drives, which is where I see Shift JIS. The point here is not what is technically possible or even standard, it's that though what I see in practice may not *require* bytes APIs, it's *very convenient* to have them (especially interactively). > The same thing is true with NTFS external drives, VFAT USB drives, > etc. Generally, it's usually not Windows media on *nix systems that break Python 2 unicode; it's native *nix filesystems where users mix locales. IMHO, Python 2 unicode is not breakable, let alone broken. ;-) Mailman 2 has managed to almost get to a state where you can't get it to raise a Unicode exception (except where deliberately used as EAFP), let alone one that is not handled (before the catch-all "except Exception" that keeps the daemon running).
And that's in an application whose original encoding support assumed standard conformance by design in a realm where spammers and junior high school hackers regularly violate the most ancient of RFCs (the restriction to ASCII in headers goes back to a 6xx RFC at the latest!) Python 2 Unicode turns out to have been an excellent compromise between the needs of backward compatibility with uniformly encoded bytestreams for Europe, and the forward-looking needs of a globalizing Internet. (But you knew that! :-) As I wrote earlier, the world is broken, or at least Japan. The world "got bettah", thus Python 3. And most of the time Python 3 is wonderful in Japan (specifically, it's trivial to get recalcitrant students to use best I18N practice). My point is that *where I live* the experience is very different. There are *no* Japanese who use *nix (other than Mac OS X) for paperwork in my neighborhood. Shift JIS filenames *are* from Windows media recently written, though probably not by Microsoft-provided software. Bytes APIs are a very useful tool in dealing with these issues, at least in the hands of someone who has become expert in dealing with them. I suspect the same is true of China, except that like their business partner Apple they are in a position to legislate uniformity, and do. (Unfortunately that's GB18030, not Unicode.) So maybe they're better off than a place that coined the phrase "politics that can't decide". I admit I've not yet used os.scandir, let alone its bytes API. Perhaps we can, and perhaps we should, restrict the bytes API in the os module to a few basic functions, and require that the environment be sane for cases where we want to use higher-level or optimized functions. > > You contradict yourself! ;-) > > I'm perfectly happy to have been wrong earlier. And if catching > myself before someone else did makes me a flip-flopper, well, I'm > not running for president.
:P I consider that the most important qualification for President, especially if your name is Trump or Sanders. That's one of the things I respect most about Python: with a few (negligible) exceptions, minds change to fit the facts. And, BTW, EAFP applies here, too. Make mistakes on the mailing lists before you commit them to code. Please! From stephen at xemacs.org Thu Feb 11 22:17:32 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 12 Feb 2016 12:17:32 +0900 Subject: [Python-Dev] Duelling PEPs not needed [was: PEP 515: Underscores in Numeric Literals] In-Reply-To: References: Message-ID: <22205.20172.25609.202018@turnbull.sk.tsukuba.ac.jp> Serhiy Storchaka writes: > I suspect that my arguments can be lost [without a competing PEP]. Send Georg a patch for his PEP, that's where they belong, since only one of the two PEPs could be approved, and they would be 95% the same otherwise. If he doesn't apply it (he's allowed to move it to the "rejected arguments" section, though), or the decision silently goes against you, speak up then -- that would be a problem IMO. Or you could offer to BD1P! (If you're selected, I hope you change your mind! :-) From stephen at xemacs.org Thu Feb 11 22:19:15 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 12 Feb 2016 12:19:15 +0900 Subject: [Python-Dev] Time for a change of random number generator? In-Reply-To: <20160211093905.GK31806@ando.pearwood.info> References: <56BBD109.2010600@canterbury.ac.nz> <20160211093905.GK31806@ando.pearwood.info> Message-ID: <22205.20275.556528.189811@turnbull.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Peters has an opinion?) but if we do change, I'd like to see the > existing random.Random moved to random.MT_Random for backwards > compatibility and compatibility with other software which uses MT. 
Not > necessarily saying that we have to keep it around forever (after all, we > did dump the Wichmann-Hill PRNG some time ago) but we ought to keep it > for at least a couple of releases. I think we should keep it around forever. Even my slowest colleagues are learning that they should record their seeds and PRNG algorithms for reproducibility's sake. :-) For that matter, restore Wichmann-Hill. Both should be clearly marked as "use only for reproducing previous bitstreams" (eg, in a package random.deprecated_generators). From mertz at gnosis.cx Thu Feb 11 22:56:45 2016 From: mertz at gnosis.cx (David Mertz) Date: Thu, 11 Feb 2016 20:56:45 -0700 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <56BD3D33.9000508@g.nevcal.com> References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> <20160211101326.GL31806@ando.pearwood.info> <20160212001633.GP31806@ando.pearwood.info> <56BD3D33.9000508@g.nevcal.com> Message-ID: Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos. https://en.m.wikipedia.org/wiki/Crore On Feb 11, 2016 7:04 PM, "Glenn Linderman" wrote: > On 2/11/2016 4:16 PM, Steven D'Aprano wrote: > > On Thu, Feb 11, 2016 at 06:03:34PM +0000, Brett Cannon wrote: > > On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano wrote: > > > On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote: > > > And honestly, are you really claiming that in your opinion, "123_456_" > is worse than all of their other examples, like "1_23__4"? > > > Yes I am, because 123_456_ looks like you've forgotten to finish typing > the last group of digits, while 1_23__4 merely looks like you have no > taste. > > > > OK, but the keyword in your sentence is "taste". > > > I disagree. The key *idea* in my sentence is that the trailing > underscore looks like a programming error. 
In my opinion, avoiding that > impression is important enough to make trailing underscores a syntax > error. > > I've seen a few people vote +1 for things like 123_j and 1.23_e99, but I > haven't seen anyone in favour of trailing underscores. Does anyone think > there is a good case for allowing trailing underscores? > > > > If we update PEP 8 for our > needs to say "Numerical literals should not have multiple underscores in a > row or have a trailing underscore" then this is taken care of. We get a > dead-simple rule for when underscores can be used, the implementation is > simple, and we get to have more tasteful usage in the stdlib w/o forcing > our tastes upon everyone or complicating the rules or implementation. > > > I think this is a misrepresentation of the alternative. As I see it, we > have two alternatives: > > - one or more underscores can appear AFTER the base specifier or any digit; > - one or more underscores can appear BETWEEN two digits. > > To describe the second alternative as "complicating the rules" is, I > think, grossly unfair. And if Serhiy's proposal is correct, the > implementation is also no more complicated: > > # underscores after digits > octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")* > hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")* > bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")* > > > # underscores after digits > octinteger: "0" ("o" | "O") (octdigit | "_")* > hexinteger: "0" ("x" | "X") (hexdigit | "_")* > bininteger: "0" ("b" | "B") (bindigit | "_")* > > > An extra side effect is that there are more ways to write zero. 0x, 0b, > 0o, 0X, 0B, 0O, 0x_, 0b_, 0o_, etc. > But most people write 0 anyway, so those would be bad style, anyway, > but it makes the implementation simpler. 
> > > > # underscores between digits > octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)* > hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)* > bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)* > > > The idea that the second alternative "forc[es] our tastes on everyone" > while the first does not is bogus. The first alternative also prohibits > things which are a matter of taste: > > # prohibited in both alternatives > 0_xDEADBEEF > 0._1234 > 1.2e_99 > -_1 > 1j_ > > > I think that there is broad agreement that: > > - the basic idea is sound > - leading underscores followed by digits are currently legal > identifiers and this will not change > - underscores should not follow the sign - + > - underscores should not follow the decimal point . > - underscores should not follow the exponent e|E > - underscores will not be permitted inside the exponent (even if > it is harmless, it's silly to write 1.2e9_9) > - underscores should not follow the complex suffix j > > and only minor disagreement about: > > - whether or not underscores will be allowed after the base > specifier 0x 0o 0b > > > +1 to allow underscores after the base specifier. > > - whether or not underscores will be allowed before the decimal > point, exponent and complex suffix. > > > +1 to allow them. There may be cases where they are useful, and if it is > not useful, it would not be used. I really liked someone's style guide > proposal: use of underscore within numeric constants should only be done to > aid readability. However, pre-judging what aids readability to one > person's particular taste is inappropriate. > > Can we have a show of hands, in favour or against the above two? And > then perhaps Guido can rule on this one way or the other and we can get > back to arguing about more important matters? :-) > > In case it isn't obvious, I prefer to say No to allowing underscores > after the base specifier, or before the decimal point, exponent and > complex suffix. 
> > I think it was obvious :) And I think we disagree. And yes, there are > more important matters. But it was just a couple days ago when I wrote a > big constant in some new code that I was thinking how nice it would be if I > could put a delimiter in there... so I'll be glad for the feature when it > is available. From v+python at g.nevcal.com Thu Feb 11 23:09:15 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 11 Feb 2016 20:09:15 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> <20160211101326.GL31806@ando.pearwood.info> <20160212001633.GP31806@ando.pearwood.info> <56BD3D33.9000508@g.nevcal.com> Message-ID: <56BD5AEB.5000500@g.nevcal.com> On 2/11/2016 7:56 PM, David Mertz wrote: > > Great PEP overall. We definitely don't want the restriction to > grouping numbers only in threes. South Asian crore use grouping in twos. > > https://en.m.wikipedia.org/wiki/Crore > Interesting... 3 digits in the least significant group, and _then_ by twos. Wouldn't have predicted that one! Never bumped into that notation before! From abarnert at yahoo.com Thu Feb 11 23:12:29 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Feb 2016 04:12:29 +0000 (UTC) Subject: [Python-Dev] Time for a change of random number generator?
In-Reply-To: <22205.20275.556528.189811@turnbull.sk.tsukuba.ac.jp> References: <22205.20275.556528.189811@turnbull.sk.tsukuba.ac.jp> Message-ID: <1792987316.2489584.1455250350012.JavaMail.yahoo@mail.yahoo.com> On Thursday, February 11, 2016 7:20 PM, Stephen J. Turnbull wrote: > I think we should keep it around forever. Even my slowest colleagues > are learning that they should record their seeds and PRNG algorithms > for reproducibility's sake. :-) +1 > For that matter, restore Wichmann-Hill. So you can write code that works on 2.3 and 3.6, but not 3.5? I agree that it shouldn't have gone away, but I think it may be too late for adding it back to help too much. > Both should be clearly marked as "use only for reproducing previous > bitstreams" (eg, in a package random.deprecated_generators). I like the random.deprecated_generators idea. From tim.peters at gmail.com Thu Feb 11 23:15:00 2016 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 11 Feb 2016 22:15:00 -0600 Subject: [Python-Dev] Time for a change of random number generator? In-Reply-To: <56BBD109.2010600@canterbury.ac.nz> References: <56BBD109.2010600@canterbury.ac.nz> Message-ID: [Greg Ewing ] > The Mersenne Twister is no longer regarded as quite state-of-the art > because it can get into states that produce long sequences that are > not very random. > > There is a variation on MT called WELL that has better properties > in this regard. Does anyone think it would be a good idea to replace > MT with WELL as Python's default rng? I don't think so, because I've seen no groundswell of discontent about the Twister among Python users. Perhaps I'm missing some? Changes are disruptive and people argue about RNGs with religious zeal, so I favor making a change in this area only when it's compelling. 
It was compelling to move away from Wichmann-Hill when the Twister was introduced: WH was waaaaaay behind the state of the art at the time, its limitations were causing real problems, and there was near-universal adoption of the Twister around the world. The Twister was a game changer. When the time comes for a change, I'd be more inclined to (as Robert Kern already said) look at PCG and Random123. Like the Twister, WELL requires massive internal state, and fails the same kinds of randomness tests (while the suggested alternatives fail none to date). WELL does escape "zeroland" faster, but still much slower than PCG or Random123 (which appear to have no systematic attractors). The alternatives require much smaller state, and at least PCG much simpler code. Note that the seeding function used by Python doesn't take the user-supplied seed as-is (only __setstate__ does): it runs rounds of pseudo-random bit dispersion, to make it highly unlikely that an initial state with lots of zeroes is produced. While the Twister escapes zeroland very slowly, the flip side is that it also transitions _to_ zeroland very slowly. It's quite possible that nobody has ever fallen into such a state (short of contriving to via __setstate__). Falling into zeroland was a very real problem in the Twister's very early days, which is why its authors added the bit-dispersal code to the seeding function. Python was wise to wait until they did.
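To make the seeding point concrete, both properties under discussion - seed-for-seed reproducibility, and the bit dispersion that keeps small seeds out of near-zero states - can be demonstrated with nothing but the documented random API (a quick sketch; the specific stream values are version-dependent and deliberately not shown):

```python
import random

# Identical seeds give identical streams: the property that makes
# recording seeds worthwhile for reproducible runs.
a = random.Random(12345)
b = random.Random(12345)
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

# Thanks to the bit-dispersal rounds in the seeding function, even a
# seed of 0 yields a well-mixed internal state, not a near-zero one.
c = random.Random(0)
draws = [c.random() for _ in range(5)]
assert all(0.0 <= x < 1.0 for x in draws)
assert len(set(draws)) == len(draws)  # no degenerate repeats
```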
It's prudent to wait for someone else to find the early surprises in PCG and Random123 too ;-) From abarnert at yahoo.com Thu Feb 11 23:22:51 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Feb 2016 04:22:51 +0000 (UTC) Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <56BD5AEB.5000500@g.nevcal.com> References: <56BD5AEB.5000500@g.nevcal.com> Message-ID: <1457615622.2456404.1455250972021.JavaMail.yahoo@mail.yahoo.com> On Thursday, February 11, 2016 8:10 PM, Glenn Linderman wrote: >On 2/11/2016 7:56 PM, David Mertz wrote: > >Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos. >>https://en.m.wikipedia.org/wiki/Crore >> >Interesting... 3 digits in the least significant group, and _then_ by twos. Wouldn't have predicted that one! Never bumped into that notation before! The first time I used underscore separators in any language, it was a test script for a server that wanted social security numbers as integers instead of strings, like 123_45_6789.[^1] Which is why I suggested the style guideline should just say "meaningful grouping of digits", rather than try to predict what counts as "meaningful" for every program. [^1] Of course in Python, it's usually trivial to stick a shim in between the database and the model thingy so I could just pass in "123-45-6789", so I don't expect to ever need this specific example. 
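For the record, the grouping styles mentioned in this thread are easy to sketch in the proposed notation. This assumes the PEP is accepted in roughly its current form (it was: the syntax below is what landed in Python 3.6, including underscores after a base specifier and in int()/float() string conversion):

```python
# Underscores between digits are ignored by the tokenizer, so each
# literal below equals its plain counterpart.
assert 1_000_000 == 1000000           # Western thousands grouping
assert 12_34_56_789 == 123456789      # South Asian crore/lakh grouping
assert 123_45_6789 == 123456789       # the SSN-style grouping from above
assert 0x_FF_FF == 0xFFFF             # after a base specifier (allowed in the final PEP)

# The int() and float() constructors accept the same notation:
assert int("1_000") == 1000
assert float("1_000.25") == 1000.25
```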
From v+python at g.nevcal.com Thu Feb 11 23:28:04 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 11 Feb 2016 20:28:04 -0800 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <1457615622.2456404.1455250972021.JavaMail.yahoo@mail.yahoo.com> References: <56BD5AEB.5000500@g.nevcal.com> <1457615622.2456404.1455250972021.JavaMail.yahoo@mail.yahoo.com> Message-ID: <56BD5F54.2090206@g.nevcal.com> On 2/11/2016 8:22 PM, Andrew Barnert wrote: > On Thursday, February 11, 2016 8:10 PM, Glenn Linderman wrote: > >> On 2/11/2016 7:56 PM, David Mertz wrote: >> >> Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos. >>> https://en.m.wikipedia.org/wiki/Crore >>> >> Interesting... 3 digits in the least significant group, and _then_ > by twos. Wouldn't have predicted that one! Never bumped into that > notation before! > > > The first time I used underscore separators in any language, it was a test script for a server that wanted social security numbers as integers instead of strings, like 123_45_6789.[^1] > > Which is why I suggested the style guideline should just say "meaningful grouping of digits", rather than try to predict what counts as "meaningful" for every program. > > > [^1] Of course in Python, it's usually trivial to stick a shim in between the database and the model thingy so I could just pass in "123-45-6789", so I don't expect to ever need this specific example. > Yes, I had thought of the Social Security Number possibility also, although having them as constants in a program seems a bit unusual. Test script, fake numbers, yeah, I guess so. From rosuav at gmail.com Thu Feb 11 23:45:36 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 12 Feb 2016 15:45:36 +1100 Subject: [Python-Dev] Time for a change of random number generator?
In-Reply-To: <1792987316.2489584.1455250350012.JavaMail.yahoo@mail.yahoo.com> References: <22205.20275.556528.189811@turnbull.sk.tsukuba.ac.jp> <1792987316.2489584.1455250350012.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Fri, Feb 12, 2016 at 3:12 PM, Andrew Barnert via Python-Dev wrote: > On Thursday, February 11, 2016 7:20 PM, Stephen J. Turnbull wrote: > > > >> I think we should keep it around forever. Even my slowest colleagues >> are learning that they should record their seeds and PRNG algorithms >> for reproducibility's sake. :-) > > +1 > >> For that matter, restore Wichmann-Hill. > > So you can write code that works on 2.3 and 3.6, but not 3.5? > > I agree that it shouldn't have gone away, but I think it may be too late for adding it back to help too much. You're probably right, but the point isn't to make the same code run, necessarily. It's to make things verifiable. Suppose I do some scientific research that involves a pseudo-random number component, and I publish my results ("Monte Carlo analysis produced these results, blah blah, using this seed, etc, etc"). If you want to come back later and say "I think there was a bug in your code", you need to be able to generate the exact same PRNG sequence. I published my algorithm and my seed, so you should in theory be able to recreate that sequence; but if you have to reimplement the same algorithm, that's a lot of unnecessary work that could have been replaced with "from random.deprecated_generators import WichmannHill as Random". (Plus there's the whole question of "was your reimplemented PRNG buggy" - or, for that matter, "was the original PRNG buggy". Using the exact same code eliminates even that.) So I'm +1 on keeping Mersenne Twister even after it's been replaced as the default PRNG, -0 on reinstating something that hasn't been used in well over a decade, and -1 on replacing MT today - I'm not seeing strong arguments in favour of changing. 
ChrisA From p.f.moore at gmail.com Fri Feb 12 04:00:19 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 12 Feb 2016 09:00:19 +0000 Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals In-Reply-To: <20160212001633.GP31806@ando.pearwood.info> References: <20160211002127.GJ31806@ando.pearwood.info> <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com> <20160211101326.GL31806@ando.pearwood.info> <20160212001633.GP31806@ando.pearwood.info> Message-ID: On 12 February 2016 at 00:16, Steven D'Aprano wrote: > I think that there is broad agreement that: > > - the basic idea is sound > - leading underscores followed by digits are currently legal > identifiers and this will not change > - underscores should not follow the sign - + > - underscores should not follow the decimal point . > - underscores should not follow the exponent e|E > - underscores will not be permitted inside the exponent (even if > it is harmless, it's silly to write 1.2e9_9) > - underscores should not follow the complex suffix j > > and only minor disagreement about: > > - whether or not underscores will be allowed after the base > specifier 0x 0o 0b > - whether or not underscores will be allowed before the decimal > point, exponent and complex suffix. > > Can we have a show of hands, in favour or against the above two? And > then perhaps Guido can rule on this one way or the other and we can get > back to arguing about more important matters? :-) > > In case it isn't obvious, I prefer to say No to allowing underscores > after the base specifier, or before the decimal point, exponent and > complex suffix. I have no opinion beyond this: whatever syntax is implemented should allow single underscores between digits, such as 1_000_000. Everything else is irrelevant to me, and if I read code that uses anything else, I'd judge it based on readability and style, and wouldn't care about arguments that "it's allowed by the grammar".
Paul From storchaka at gmail.com Fri Feb 12 04:45:27 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Feb 2016 11:45:27 +0200 Subject: [Python-Dev] Py_SETREF again Message-ID: Sorry for bringing this up again. I was hoping we were done with that. When discussing the name of the Py_SETREF macro I was supposed to add a pair of macros: for Py_DECREF and Py_XDECREF. But I got a lot of opinions in favour of limiting it to only one macro. On 28.02.14 15:58, Kristján Valur Jónsson wrote: > Also, for the equivalence to hold there is no separate Py_XSETREF, the X > behaviour is implied, which I favour. Enough of this X-proliferation > already! On 16.12.15 16:53, Random832 wrote: > I think "SET" names imply that it's safe if the original > reference is NULL. This isn't an objection to the names, but if > it is given one of those names I think it should use Py_XDECREF. It was my initial intention. But then I got a number of voices for a single macro. On 16.12.15 23:16, Victor Stinner wrote: > I would prefer a single macro to avoid bugs, I don't think that such > macro has a critical impact on performances. It's more designed for > safety, no? On 17.12.15 08:22, Nick Coghlan wrote: >> 1. Py_SETREF > > +1 if it always uses Py_XDECREF on the previous value (as I'd expect > this to work even if the previous value was NULL) There was no (besides my) clearly expressed vote for two macros. As a result I have replaced both Py_DECREF and Py_XDECREF with the macro that always uses Py_XDECREF. Now Raymond, who was not involved in the previous discussions, has expressed the view that we should rename Py_SETREF to Py_XSETREF and add a new Py_SETREF that uses Py_DECREF, for use in code that used Py_DECREF previously. [1] We should discuss the need for this, and maybe re-discuss the names for the macros.
[1] http://bugs.python.org/issue26200 From ncoghlan at gmail.com Fri Feb 12 05:34:44 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Feb 2016 20:34:44 +1000 Subject: [Python-Dev] Py_SETREF again In-Reply-To: References: Message-ID: On 12 February 2016 at 19:45, Serhiy Storchaka wrote: > Now Raymond, who was not involved in the previous discussions, expressed the > view that we should to rename Py_SETREF to Py_XSETREF and add new Py_SETREF > that uses Py_DECREF for using in the code that used Py_DECREF previously. > [1] > > We should discuss the need for this, and may be re-discuss the names for the > macros. I'm inclined to go with the resolution discussed later in the comments on that tracker issue - switch to spelling out the details when you want to avoid the Py_XDECREF inside the Py_SETREF macro. As Raymond notes, if you're wanting to manage when and how DECREF's occur, you may not want to hide them inside another macro at all. I'm also wondering if it may be worth adding some notes about reference counting to PEP 7, such as: * using Py_RETURN_NONE/FALSE/TRUE * using Py_CLEAR * using Py_SETREF (but being free to avoid it if you want to use Py_DECREF instead or are hand-optimising the code in some other way) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Fri Feb 12 06:18:11 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Feb 2016 12:18:11 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> <56B35AB5.5090308@egenix.com> Message-ID: ping? 2016-02-08 15:18 GMT+01:00 Victor Stinner : > 2016-02-04 15:05 GMT+01:00 M.-A. Lemburg : >> Sometimes, yes, but we also do allocations for e.g. >> parsing values in Python argument tuples (e.g. 
using >> "es" or "et"): >> >> https://docs.python.org/3.6/c-api/arg.html >> >> We do document to use PyMem_Free() on those; not sure whether >> everyone does this though. > > It's well documented. If programs start to crash, they must be fixed. > > I don't propose to "break the API" for free, but to get a speedup on > the overall Python. > > And I don't think that we can say that it's an API change, since we > already stated that PyMem_Free() must be used. > > If your program has bugs, you can use a debug build of Python 3.5 to > detect misusage of the API. > > >> The Python test suite doesn't test Python C extensions, >> so it's not surprising that it passes :-) > > What do you mean by "C extensions"? Which modules? > > Many modules in the stdlib have "C accelerators" and the PEP 399 now > *require* to test the C and Python implementations. > > > >>> Instead of teaching developers that well, in fact, PyObject_Malloc() >>> is unrelated to object programming, I think that it's simpler to >>> modify PyMem_Malloc() to reuse pymalloc ;-) >> >> Perhaps if you add some guards somewhere :-) > > We have runtime checks but only implemented in debug mode for efficiency. > > By the way, I proposed once to add an environment variable to allow to > enable these checks without having to recompile Python. Since the PEP > 445, it became easy to implement this. What do you think? > https://www.python.org/dev/peps/pep-0445/#add-a-new-pydebugmalloc-environment-variable > > "This alternative was rejected because a new environment variable > would make Python initialization even more complex. PEP 432 tries to > simplify the CPython startup sequence." > > The PEP 432 looks stuck, so I don't think that we should block > enhancements because of this PEP. Anyway, my idea should be easy to > implement. > > >> Seriously, this may work if C extensions use the APIs >> consistently, but in order to tell, we'd need to check >> few. > > Can you suggest me names of projects that must be tested? 
> > >> I guess the main question then is whether pymalloc is good enough >> for general memory allocation needs; and the answer may well be >> "yes". > > What do you mean by "good enough"? For the runtime performance, > pymalloc looks to be faster than malloc(). What are your other > criterias? Memory fragmentation? > > > Victor From robert.kern at gmail.com Fri Feb 12 06:28:05 2016 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 12 Feb 2016 11:28:05 +0000 Subject: [Python-Dev] Time for a change of random number generator? In-Reply-To: References: <56BBD109.2010600@canterbury.ac.nz> Message-ID: On 2016-02-12 04:15, Tim Peters wrote: > [Greg Ewing ] >> The Mersenne Twister is no longer regarded as quite state-of-the art >> because it can get into states that produce long sequences that are >> not very random. >> >> There is a variation on MT called WELL that has better properties >> in this regard. Does anyone think it would be a good idea to replace >> MT with WELL as Python's default rng? > > I don't think so, because I've seen no groundswell of discontent about > the Twister among Python users. Perhaps I'm missing some? Well me, but I'm mostly focused on numpy's PRNG, which is proceeding apace. https://github.com/bashtage/ng-numpy-randomstate While I am concerned about MT's BigCrush failures, what makes me most discontented is not having multiple guaranteed-independent streams. > It's prudent to wait for someone else to find the early surprises in > PCG and Random123 too ;-) Quite so! -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mal at egenix.com Fri Feb 12 08:31:15 2016 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 12 Feb 2016 14:31:15 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> <56B35AB5.5090308@egenix.com> Message-ID: <56BDDEA3.2060702@egenix.com> On 12.02.2016 12:18, Victor Stinner wrote: > ping? Sorry, your email must have gotten lost in my inbox. > 2016-02-08 15:18 GMT+01:00 Victor Stinner : >> 2016-02-04 15:05 GMT+01:00 M.-A. Lemburg : >>> Sometimes, yes, but we also do allocations for e.g. >>> parsing values in Python argument tuples (e.g. using >>> "es" or "et"): >>> >>> https://docs.python.org/3.6/c-api/arg.html >>> >>> We do document to use PyMem_Free() on those; not sure whether >>> everyone does this though. >> >> It's well documented. If programs start to crash, they must be fixed. >> >> I don't propose to "break the API" for free, but to get a speedup on >> the overall Python. >> >> And I don't think that we can say that it's an API change, since we >> already stated that PyMem_Free() must be used. >> >> If your program has bugs, you can use a debug build of Python 3.5 to >> detect misusage of the API. Yes, but people don't necessarily do this, e.g. I have for a very long time ignored debug builds completely and when I started to try them, I found that some of the things I had been doing with e.g. free list implementations did not work in debug builds. >>> The Python test suite doesn't test Python C extensions, >>> so it's not surprising that it passes :-) >> >> What do you mean by "C extensions"? Which modules? >> >> Many modules in the stdlib have "C accelerators" and the PEP 399 now >> *require* to test the C and Python implementations. Yes, but those are part of the stdlib. You'd need to check a few C extensions which are not tested as part of the stdlib, e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom types in C since these will often need the memory management APIs).
It may also be a good idea to check wrapper generators such as cython, swig, cffi, etc. >>>> Instead of teaching developers that well, in fact, PyObject_Malloc() >>>> is unrelated to object programming, I think that it's simpler to >>>> modify PyMem_Malloc() to reuse pymalloc ;-) >>> >>> Perhaps if you add some guards somewhere :-) >> >> We have runtime checks but only implemented in debug mode for efficiency. >> >> By the way, I proposed once to add an environment variable to allow to >> enable these checks without having to recompile Python. Since the PEP >> 445, it became easy to implement this. What do you think? >> https://www.python.org/dev/peps/pep-0445/#add-a-new-pydebugmalloc-environment-variable >> >> "This alternative was rejected because a new environment variable >> would make Python initialization even more complex. PEP 432 tries to >> simplify the CPython startup sequence." >> >> The PEP 432 looks stuck, so I don't think that we should block >> enhancements because of this PEP. Anyway, my idea should be easy to >> implement. I suppose such a flag would create a noticeable runtime performance hit, since the compiler would no longer be able to inline the PyMem_*() APIs if you redirect those APIs to other sets at runtime. I also don't see much point in carrying around such baggage in production builds of Python, since you'd most likely only want to use the tools to debug C extensions during their development. >>> Seriously, this may work if C extensions use the APIs >>> consistently, but in order to tell, we'd need to check >>> few. >> >> Can you suggest me names of projects that must be tested? See above for a list of starters :-) It would be good to add a few more that work on text or larger chunks of memory, since those will most likely utilize the memory allocators more than other extensions which mostly wrap (sets of) C variables. 
Some of them may also have benchmarks, so in addition to checking whether they work with the change, you could also test performance. >>> I guess the main question then is whether pymalloc is good enough >>> for general memory allocation needs; and the answer may well be >>> "yes". >> >> What do you mean by "good enough"? For the runtime performance, >> pymalloc looks to be faster than malloc(). What are your other >> criterias? Memory fragmentation? Runtime performance, difference in memory consumption (arenas cannot be freed if there are still small chunks allocated), memory locality. I'm no expert in this, so can't really comment much. I suspect that lib C and OS provided allocators will have advantages as well, but since pymalloc redirects to them for all larger memory chunks, it's probably an overall win for Python C extensions (and Python itself). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Feb 12 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2016-01-19: Released eGenix pyOpenSSL 0.13.13 ... http://egenix.com/go86 ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From g.brandl at gmx.net Fri Feb 12 08:43:19 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 12 Feb 2016 14:43:19 +0100 Subject: [Python-Dev] Py_SETREF again In-Reply-To: References: Message-ID: On 02/12/2016 10:45 AM, Serhiy Storchaka wrote: > Sorry to bringing this up again. I was hoping we were done with that. 
> > When discussing the name of the Py_SETREF macro I was supposed to add a > pair of macros: for Py_DECREF and Py_XDECREF. But I got a lot of > opinions to be limited to only one macro. > > There was no (besides my) clearly expressed vote for two macros. I would have voted in favor. Spelling the SETREF out, as Nick proposes, kind of defies the purpose of the macro: it's not strictly a convenience macro, it helps prevent refcounting bugs. > As a result I have replaced both Py_DECREF and Py_XDECREF with the macro > that always uses Py_XDECREF. Can you roughly say which fraction of replacements changed DECREF to an implicit XDECREF? Georg From storchaka at gmail.com Fri Feb 12 09:19:13 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Feb 2016 16:19:13 +0200 Subject: [Python-Dev] Py_SETREF again In-Reply-To: References: Message-ID: On 12.02.16 15:43, Georg Brandl wrote: > On 02/12/2016 10:45 AM, Serhiy Storchaka wrote: >> Sorry to bringing this up again. I was hoping we were done with that. >> >> When discussing the name of the Py_SETREF macro I was supposed to add a >> pair of macros: for Py_DECREF and Py_XDECREF. But I got a lot of >> opinions to be limited to only one macro. >> >> There was no (besides my) clearly expressed vote for two macros. > > I would have voted in favor. > > Spelling the SETREF out, as Nick proposes, kind of defies the purpose of > the macro: it's not strictly a convenience macro, it helps prevent > refcounting bugs. > >> As a result I have replaced both Py_DECREF and Py_XDECREF with the macro >> that always uses Py_XDECREF. > > Can you roughly say which fraction of replacements changed DECREF to an > implicit XDECREF? Changesets c4e8751ce637, bc7c56a225de, 539ba7267701, b02d256b8827, 1118dfcbcc35. 
Rough estimation: Py_DECREF - 62 Py_XDECREF - 57 Py_CLEAR - 46 Total statistic of using macros in current code: Py_SETREF 174 2.5% Py_CLEAR 781 11% Py_XDECREF 1443 20.5% Py_DECREF 4631 66% From victor.stinner at gmail.com Fri Feb 12 10:07:21 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Feb 2016 16:07:21 +0100 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: <56BDDEA3.2060702@egenix.com> References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> <56B35AB5.5090308@egenix.com> <56BDDEA3.2060702@egenix.com> Message-ID: Hi, 2016-02-12 14:31 GMT+01:00 M.-A. Lemburg : > Sorry, your email must gotten lost in my inbox. no problemo > Yes, but those are part of the stdlib. You'd need to check > a few C extensions which are not tested as part of the stdlib, > e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom > types in C since these will often need the memory management > APIs). > > It may also be a good idea to check wrapper generators such > as cython, swig, cffi, etc. Ok, I will try my patch on some of them. Thanks for the pointers. > I suppose such a flag would create a noticeable runtime > performance hit, since the compiler would no longer be > able to inline the PyMem_*() APIs if you redirect those > APIs to other sets at runtime. Hum, I think that you missed the PEP 445. The overhead of this PEP was discussed and considered as negligible enough to implement the PEP: https://www.python.org/dev/peps/pep-0445/#performances Using the PEP 445, there is no overhead to enable debug hooks at runtime (except of the overhead of the debug checks themself ;-)). 
PyMem_Malloc now calls a pointer: https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l319 Same for PyObject_Malloc: https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l380 > I also don't see much point in carrying around such > baggage in production builds of Python, since you'd most > likely only want to use the tools to debug C extensions during > their development. I propose adding an environment variable because it's rare that a debug build is installed on system. Usually, using a debug build requires to recompile all C extensions which is not really... convenient... With such env var, it would be trivial to check quickly if the Python memory allocators are used correctly. > Runtime performance, difference in memory consumption (arenas > cannot be freed if there are still small chunks allocated), > memory locality. I'm no expert in this, so can't really > comment much. "arenas cannot be freed if there are still small chunks allocated" yeah, this is called memory fragmentation. There is a big difference between libc malloc() and pymalloc for small allocations: pymalloc is able to free an arena using munmap() which releases immediatly the memory to the system, whereas most implementation of malloc() use a single contigious memory block which is only shrinked when all memory "at the top" is free. So it's the same fragmentation issue that you described, except that it uses a single arena which has an arbitrary size (between 1 MB and 10 GB, there is no limit), whereas pymalloc uses small arenas of 256 KB. In short, I expect less fragmentation with pymalloc. "memory locality": I have no idea on that. I guess that it can be seen on benchmarks. pymalloc is designed for objects with short lifetime. 
Victor From pmiscml at gmail.com Fri Feb 12 10:43:38 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 12 Feb 2016 17:43:38 +0200 Subject: [Python-Dev] [ANN] MicroPython 1.6 Message-ID: <20160212174338.651accf5@x230> Hello, MicroPython is a lean and efficient Python implementation for microcontrollers, embedded, and mobile systems (which also runs just fine on desktops, servers, and clouds). https://github.com/micropython/micropython https://github.com/micropython/micropython/releases/tag/v1.6 These are the major changes since 1.5: 1. LwIP module support for embedded TCP/IP networking. 2. IPv6 support in the Unix port. 3. Beta support for persistent bytecode (similar to CPython's .pyc) 4. 64-bit NaN boxing (improved floating-point performance if enabled). 5. Support for new official PyBoards PYBv1.1 and PYBLITEv1.0. 6. Long int constant folding during bytecode compilation (glad that CPython will catch up in that area thanks to the FAT Python project). 7. There's an ongoing crowdfunding campaign to fund a complete and well-maintained MicroPython port to the ubiquitous ESP8266 WiFi SoC, and improve networking and IoT support in MicroPython in general: https://www.kickstarter.com/projects/214379695/micropython-on-the-esp8266-beautifully-easy-iot -- Best regards, Paul mailto:pmiscml at gmail.com From status at bugs.python.org Fri Feb 12 12:08:39 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 12 Feb 2016 18:08:39 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160212170839.2C2B05667B@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-02-05 - 2016-02-12) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message.
Issues counts and deltas:
  open    5417 ( +4)
  closed 32691 (+50)
  total  38108 (+54)

Open issues with patches: 2368


Issues opened (39)
==================

#22107: tempfile module misinterprets access denied error on Windows
        http://bugs.python.org/issue22107  reopened by serhiy.storchaka

#25911: Regression: os.walk now using os.scandir() breaks bytes filena
        http://bugs.python.org/issue25911  reopened by serhiy.storchaka

#26299: wsgiref.util FileWrapper raises ValueError: I/O operation on c
        http://bugs.python.org/issue26299  opened by samwyse

#26300: "unpacked" bytecode
        http://bugs.python.org/issue26300  opened by abarnert

#26302: cookies module allows commas in keys
        http://bugs.python.org/issue26302  opened by jason.coombs

#26303: Shared execution context between doctests in a module
        http://bugs.python.org/issue26303  opened by kernc

#26305: Make Argument Clinic to generate PEP 7 conforming code
        http://bugs.python.org/issue26305  opened by serhiy.storchaka

#26306: Can't create abstract tuple
        http://bugs.python.org/issue26306  opened by Jack Hargreaves

#26307: no PGO for built-in modules with `make profile-opt`
        http://bugs.python.org/issue26307  opened by tzot

#26308: Solaris 10 build issues
        http://bugs.python.org/issue26308  opened by gms

#26309: socketserver.BaseServer._handle_request_noblock() don't shutdw
        http://bugs.python.org/issue26309  opened by palaviv

#26313: ssl.py _load_windows_store_certs fails if windows cert store i
        http://bugs.python.org/issue26313  opened by Jonathan Kamens

#26314: interned strings are stored in a dict, a set would use less me
        http://bugs.python.org/issue26314  opened by gregory.p.smith

#26316: Probable typo in Arg Clinic's linear_format()
        http://bugs.python.org/issue26316  opened by martin.panter

#26317: Build Problem with GCC + Macintosh OS X 10.11 El Capitain
        http://bugs.python.org/issue26317  opened by Robert P Fischer

#26318: `io.open(fd, ...).name` returns numeric fd instead of None
        http://bugs.python.org/issue26318  opened by mmarkk

#26319: Check recData size before unpack in zipfile
        http://bugs.python.org/issue26319  opened by j w

#26322: Missing docs for typing.Set
        http://bugs.python.org/issue26322  opened by gvanrossum

#26323: Add a assert_called() method for mock objects
        http://bugs.python.org/issue26323  opened by Amit.Saha

#26326: Named entity "vertical line" missed in 2.7 htmlentitydefs.py
        http://bugs.python.org/issue26326  opened by andreas.roehler

#26327: File > Save in IDLE shell window not working
        http://bugs.python.org/issue26327  opened by xflr6

#26328: shutil._copyxattr() function shouldn't fail if setting securit
        http://bugs.python.org/issue26328  opened by bigon

#26329: os.path.normpath("//") returns //
        http://bugs.python.org/issue26329  opened by Fred Rolland

#26330: shutil.disk_usage() on Windows can't properly handle unicode
        http://bugs.python.org/issue26330  opened by giampaolo.rodola

#26331: Tokenizer: allow underscores for grouping in numeric literals
        http://bugs.python.org/issue26331  opened by georg.brandl

#26332: OSError: exception: access violation writing <...> (Windows 10
        http://bugs.python.org/issue26332  opened by jk

#26333: Multiprocessing imap hangs when generator input errors
        http://bugs.python.org/issue26333  opened by Aaron Halfaker

#26334: bytes.translate() doesn't take keyword arguments; docs suggest
        http://bugs.python.org/issue26334  opened by Nicholas Chammas

#26335: Make mmap.write return the number of bytes written like other
        http://bugs.python.org/issue26335  opened by jstasiak

#26336: Expose regex bytecode as attribute of compiled pattern object
        http://bugs.python.org/issue26336  opened by Jonathan Goble

#26337: Bypass imghdr module determines the type of image
        http://bugs.python.org/issue26337  opened by Ramin Farajpour Cami

#26338: remove duplicate bind addresses in create_server
        http://bugs.python.org/issue26338  opened by sebastien.bourdeauducq

#26340: modal dialog with transient method; parent window fails to ico
        http://bugs.python.org/issue26340  opened by vs

#26342: Faster bit ops for single-digit positive longs
        http://bugs.python.org/issue26342  opened by yselivanov

#26346: PySequenceMethods documentation missing sq_slice and sq_ass_sl
        http://bugs.python.org/issue26346  opened by atuining

#26347: BoundArguments.apply_defaults doesn't handle empty arguments
        http://bugs.python.org/issue26347  opened by Frederick Wagner

#26348: activate.fish sets VENV prompt incorrectly
        http://bugs.python.org/issue26348  opened by Dan McCombs

#26349: Ship python35.lib with the embedded distribution, please
        http://bugs.python.org/issue26349  opened by Thomas Führinger

#26350: Windoes: signal doc should state certains signals can't be reg
        http://bugs.python.org/issue26350  opened by giampaolo.rodola


Most recent 15 issues with no replies (15)
==========================================

#26349: Ship python35.lib with the embedded distribution, please
        http://bugs.python.org/issue26349

#26348: activate.fish sets VENV prompt incorrectly
        http://bugs.python.org/issue26348

#26340: modal dialog with transient method; parent window fails to ico
        http://bugs.python.org/issue26340

#26338: remove duplicate bind addresses in create_server
        http://bugs.python.org/issue26338

#26336: Expose regex bytecode as attribute of compiled pattern object
        http://bugs.python.org/issue26336

#26335: Make mmap.write return the number of bytes written like other
        http://bugs.python.org/issue26335

#26333: Multiprocessing imap hangs when generator input errors
        http://bugs.python.org/issue26333

#26332: OSError: exception: access violation writing <...> (Windows 10
        http://bugs.python.org/issue26332

#26327: File > Save in IDLE shell window not working
        http://bugs.python.org/issue26327

#26322: Missing docs for typing.Set
        http://bugs.python.org/issue26322

#26319: Check recData size before unpack in zipfile
        http://bugs.python.org/issue26319

#26313: ssl.py _load_windows_store_certs fails if windows cert store i
        http://bugs.python.org/issue26313

#26308: Solaris 10 build issues
        http://bugs.python.org/issue26308

#26306: Can't create abstract tuple
        http://bugs.python.org/issue26306

#26299: wsgiref.util FileWrapper raises ValueError: I/O operation on c
        http://bugs.python.org/issue26299


Most recent 15 issues waiting for review (15)
=============================================

#26348: activate.fish sets VENV prompt incorrectly
        http://bugs.python.org/issue26348

#26347: BoundArguments.apply_defaults doesn't handle empty arguments
        http://bugs.python.org/issue26347

#26342: Faster bit ops for single-digit positive longs
        http://bugs.python.org/issue26342

#26338: remove duplicate bind addresses in create_server
        http://bugs.python.org/issue26338

#26335: Make mmap.write return the number of bytes written like other
        http://bugs.python.org/issue26335

#26331: Tokenizer: allow underscores for grouping in numeric literals
        http://bugs.python.org/issue26331

#26323: Add a assert_called() method for mock objects
        http://bugs.python.org/issue26323

#26314: interned strings are stored in a dict, a set would use less me
        http://bugs.python.org/issue26314

#26309: socketserver.BaseServer._handle_request_noblock() don't shutdw
        http://bugs.python.org/issue26309

#26308: Solaris 10 build issues
        http://bugs.python.org/issue26308

#26305: Make Argument Clinic to generate PEP 7 conforming code
        http://bugs.python.org/issue26305

#26302: cookies module allows commas in keys
        http://bugs.python.org/issue26302

#26285: Garbage collection of unused input sections from CPython binar
        http://bugs.python.org/issue26285

#26282: Add support for partial keyword arguments in extension functio
        http://bugs.python.org/issue26282

#26280: ceval: Optimize list[int] (subscript) operation similarly to C
        http://bugs.python.org/issue26280


Top 10 most discussed issues (10)
=================================

#21955: ceval.c: implement fast path for integers with a single digit
        http://bugs.python.org/issue21955  28 msgs

#26302: cookies module allows commas in keys
        http://bugs.python.org/issue26302  14 msgs

#26200: SETREF adds unnecessary work in some cases
        http://bugs.python.org/issue26200  11 msgs

#26331: Tokenizer: allow underscores for grouping in numeric literals
        http://bugs.python.org/issue26331  11 msgs

#24165: Free list for single-digits ints
        http://bugs.python.org/issue24165  10 msgs

#26182: Deprecation warnings for the future async and await keywords i
        http://bugs.python.org/issue26182  6 msgs

#24916: In sysconfig, don't rely on sys.version format
        http://bugs.python.org/issue24916  5 msgs

#26330: shutil.disk_usage() on Windows can't properly handle unicode
        http://bugs.python.org/issue26330  5 msgs

#25195: mock.ANY doesn't match mock.MagicMock() object
        http://bugs.python.org/issue25195  4 msgs

#25911: Regression: os.walk now using os.scandir() breaks bytes filena
        http://bugs.python.org/issue25911  4 msgs


Issues closed (47)
==================

#15731: Mechanism for inheriting docstrings and signatures
        http://bugs.python.org/issue15731  closed by ncoghlan

#16023: IDLE freezes on ^5 or ^6 (Un-)Tabify Region with OS X Cocoa Tk
        http://bugs.python.org/issue16023  closed by ned.deily

#19543: Add -3 warnings for codec convenience method changes
        http://bugs.python.org/issue19543  closed by ncoghlan

#22983: Cookie parsing should be more permissive
        http://bugs.python.org/issue22983  closed by martin.panter

#25226: "suffix" attribute not documented in logging.TimedRotatingFile
        http://bugs.python.org/issue25226  closed by vinay.sajip

#25295: functools.lru_cache raises KeyError
        http://bugs.python.org/issue25295  closed by Peter Brady

#25639: open 'PhysicalDriveN' on windows fails (since python 3.5) with
        http://bugs.python.org/issue25639  closed by eryksun

#25698: The copy_reg module becomes unexpectedly empty in test_cpickle
        http://bugs.python.org/issue25698  closed by serhiy.storchaka

#25709: Problem with string concatenation and utf-8 cache.
        http://bugs.python.org/issue25709  closed by georg.brandl

#25848: Tkinter tests failed on Windows buildbots
        http://bugs.python.org/issue25848  closed by zach.ware

#25949: Lazy creation of __dict__ in OrderedDict
        http://bugs.python.org/issue25949  closed by serhiy.storchaka

#25983: Add tests for multi-argument type()
        http://bugs.python.org/issue25983  closed by serhiy.storchaka

#25985: Use sys.version_info instead of sys.version
        http://bugs.python.org/issue25985  closed by serhiy.storchaka

#25992: test_gdb fails on OSX
        http://bugs.python.org/issue25992  closed by ned.deily

#25994: File descriptor leaks in os.scandir()
        http://bugs.python.org/issue25994  closed by serhiy.storchaka

#25995: os.walk() consumes a lot of file descriptors
        http://bugs.python.org/issue25995  closed by serhiy.storchaka

#26045: Improve error message for http.client when posting unicode str
        http://bugs.python.org/issue26045  closed by martin.panter

#26086: Bug in os module
        http://bugs.python.org/issue26086  closed by steven.daprano

#26117: Close directory descriptor in scandir iterator on error
        http://bugs.python.org/issue26117  closed by serhiy.storchaka

#26136: DeprecationWarning for PEP 479 (generator_stop)
        http://bugs.python.org/issue26136  closed by martin.panter

#26198: PyArg_ParseTuple with format "et#" and "es#" detects overflow
        http://bugs.python.org/issue26198  closed by serhiy.storchaka

#26204: compiler: ignore constants used as statements (don't emit LOAD
        http://bugs.python.org/issue26204  closed by haypo

#26223: decimal.to_eng_string() does not implement engineering notatio
        http://bugs.python.org/issue26223  closed by rhettinger

#26243: zlib.compress level as keyword argument
        http://bugs.python.org/issue26243  closed by martin.panter

#26248: Improve scandir DirEntry docs, especially re symlinks and cach
        http://bugs.python.org/issue26248  closed by gvanrossum

#26279: time.strptime does not properly convert out-of-bounds values
        http://bugs.python.org/issue26279  closed by ned.deily

#26287: Core dump in f-string with formatting errors
        http://bugs.python.org/issue26287  closed by eric.smith

#26288: Optimize PyLong_AsDouble for single-digit longs
        http://bugs.python.org/issue26288  closed by yselivanov

#26289: Optimize floor division for ints
        http://bugs.python.org/issue26289  closed by yselivanov

#26294: Queue().unfinished_tasks not in docs - deliberate?
        http://bugs.python.org/issue26294  closed by rhettinger

#26297: Move constant folding to AST level
        http://bugs.python.org/issue26297  closed by serhiy.storchaka

#26298: Split ceval.c into small files
        http://bugs.python.org/issue26298  closed by haypo

#26301: ceval.c: reintroduce fast-path for list[index] in BINARY_SUBSC
        http://bugs.python.org/issue26301  closed by haypo

#26304: Fix "allows to" in documentation
        http://bugs.python.org/issue26304  closed by martin.panter

#26310: Fix typo "variariables" in socketserver.py
        http://bugs.python.org/issue26310  closed by martin.panter

#26311: Typo in documentation for xml.parsers.expat
        http://bugs.python.org/issue26311  closed by martin.panter

#26312: Raise SystemError on programmical errors in PyArg_Parse*()
        http://bugs.python.org/issue26312  closed by serhiy.storchaka

#26315: Optimize mod division for ints
        http://bugs.python.org/issue26315  closed by yselivanov

#26320: Web documentation for 2.7 has unreadable highlights in Table o
        http://bugs.python.org/issue26320  closed by python-dev

#26321: datetime.strptime fails to parse AM/PM correctly
        http://bugs.python.org/issue26321  closed by zach.ware

#26324: sum() incorrect on negative zeros
        http://bugs.python.org/issue26324  closed by Jim.Jewett

#26325: Add helper to check that no ResourceWarning is emitted
        http://bugs.python.org/issue26325  closed by serhiy.storchaka

#26339: Python rk0.3b1 KeyError: 'config_argparse_rel_path'
        http://bugs.python.org/issue26339  closed by berker.peksag

#26341: Implement free-list for single-digit longs
        http://bugs.python.org/issue26341  closed by yselivanov

#26343: os.O_CLOEXEC not available on OS X
        http://bugs.python.org/issue26343  closed by ned.deily

#26344: `sys.meta_path` Skipped for Packages with Non-Standard Suffixe
        http://bugs.python.org/issue26344  closed by brett.cannon

#26345: Extra newline appended to UTF-8 strings on Windows
        http://bugs.python.org/issue26345  closed by eryksun

From chris.barker at noaa.gov  Fri Feb 12 15:06:04 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 12 Feb 2016 12:06:04 -0800
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals
In-Reply-To:
References: <20160211002127.GJ31806@ando.pearwood.info>
 <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com>
 <20160211101326.GL31806@ando.pearwood.info>
 <20160212001633.GP31806@ando.pearwood.info>
Message-ID:

On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore wrote:

> I have no opinion on anything other than that whatever syntax is
> implemented as long as it allows single underscores between digits,
> such as
>
> 1_000_000
>
> Everything else is irrelevant to me, and if I read code that uses
> anything else, I'd judge it based on readability and style, and
> wouldn't care about arguments that "it's allowed by the grammar".

I totally agree -- and it's clear that other cultures group digits
differently, so we should allow that, but while I'll live with it either
way, I'd rather have it be as restrictive as possible rather than as
unrestricted as possible. As in:

no double underscores
no underscore right before or after a period
no underscore at the beginning or end.
....

As Paul said, as long as I can do the above, I'll be fine, but I think
everyone's source code will be a lot cleaner in the long run if you don't
have the option of doing who knows what weird arrangement....

As for the SS# example -- it seems a bad idea to me to store a SS# number
as an integer anyway -- so all the weird IDs etc. formats aren't really
relevant...

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at mrabarnett.plus.com  Fri Feb 12 15:39:25 2016
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 12 Feb 2016 20:39:25 +0000
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals
In-Reply-To:
References: <20160211002127.GJ31806@ando.pearwood.info>
 <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com>
 <20160211101326.GL31806@ando.pearwood.info>
 <20160212001633.GP31806@ando.pearwood.info>
Message-ID: <56BE42FD.2020608@mrabarnett.plus.com>

On 2016-02-12 20:06, Chris Barker wrote:
> On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore
> wrote:
>
>     I have no opinion on anything other than that whatever syntax is
>     implemented as long as it allows single underscores between digits,
>     such as
>
>     1_000_000
>
>     Everything else is irrelevant to me, and if I read code that uses
>     anything else, I'd judge it based on readability and style, and
>     wouldn't care about arguments that "it's allowed by the grammar".
>
> I totally agree -- and it's clear that other cultures group digits
> differently, so we should allow that, but while I'll live with it either
> way, I'd rather have it be as restrictive as possible rather than as
> unrestricted as possible. As in:
>
> no double underscores
> no underscore right before or after a period
> no underscore at the beginning or end.
> ....
>
> As Paul said, as long as I can do the above, I'll be fine, but I think
> everyone's source code will be a lot cleaner in the long run if you
> don't have the option of doing who knows what weird arrangement....
>
> As for the SS# example -- it seems a bad idea to me to store a SS#
> number as an integer anyway -- so all the weird IDs etc. formats aren't
> really relevant...
>
That also applies to telephone numbers, account numbers, etc. They
aren't really numbers (you wouldn't do arithmetic on them) and might
have leading zeros.

From v+python at g.nevcal.com  Fri Feb 12 15:58:00 2016
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 12 Feb 2016 12:58:00 -0800
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals
In-Reply-To:
References: <20160211002127.GJ31806@ando.pearwood.info>
 <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com>
 <20160211101326.GL31806@ando.pearwood.info>
 <20160212001633.GP31806@ando.pearwood.info>
Message-ID: <56BE4758.8000400@g.nevcal.com>

On 2/12/2016 12:06 PM, Chris Barker wrote:
> On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore
> wrote:
>
>     I have no opinion on anything other than that whatever syntax is
>     implemented as long as it allows single underscores between digits,
>     such as
>
>     1_000_000
>
>     Everything else is irrelevant to me, and if I read code that uses
>     anything else, I'd judge it based on readability and style, and
>     wouldn't care about arguments that "it's allowed by the grammar".
>
> I totally agree -- and it's clear that other cultures group digits
> differently, so we should allow that, but while I'll live with it
> either way, I'd rather have it be as restrictive as possible rather
> than as unrestricted as possible. As in:
>
> no double underscores

Useful for really long binary constants... one _ for nybble or field
divisions, two __ for byte divisions. Of course, really long binary
constants might be a bad idea.

> no underscore right before or after a period
> no underscore at the beginning or end.

You get your wish for the beginning... it would be ambiguous with
identifiers. And your style guide can include whatever restrictions you
like, for your code.

> ....
>
> As Paul said, as long as I can do the above, I'll be fine, but I think
> everyone's source code will be a lot cleaner in the long run if you
> don't have the option of doing who knows what weird arrangement....
>
> As for the SS# example -- it seems a bad idea to me to store a SS#
> number as an integer anyway -- so all the weird IDs etc. formats
> aren't really relevant...

SS#... why not integer? Phone#... why not integer? There's a lot of nice
digit-division conventions for phone#s in different parts of the world.
The only ambiguity is if such numbers have leading zeros, you have to
"know" (or record) how many total digits are expected.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From abarnert at yahoo.com  Fri Feb 12 16:31:33 2016
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 12 Feb 2016 13:31:33 -0800
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals
In-Reply-To: <56BE4758.8000400@g.nevcal.com>
References: <20160211002127.GJ31806@ando.pearwood.info>
 <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com>
 <20160211101326.GL31806@ando.pearwood.info>
 <20160212001633.GP31806@ando.pearwood.info>
 <56BE4758.8000400@g.nevcal.com>
Message-ID: <89219717-7DCC-4BE0-B1D1-FA8E90321021@yahoo.com>

On Feb 12, 2016, at 12:58, Glenn Linderman wrote:
>
>> On 2/12/2016 12:06 PM, Chris Barker wrote:
>> As for the SS# example -- it seems a bad idea to me to store a SS#
>> number as an integer anyway -- so all the weird IDs etc. formats
>> aren't really relevant...
>
> SS#... why not integer? Phone#... why not integer? There's a lot of
> nice digit-division conventions for phone#s in different parts of the
> world.

I'm the one who brought up the SSN example--and, as I said at the time,
I almost certainly wouldn't have done that in Python. I was maintaining
tests for a service that stored SSNs as integers (which I think is a
mistake, but I couldn't change it), an automatically-generated
strongly-typed interface to that service (which is good), and no easy
way to wrap or hook that interface (which is bad). In Python, it's hard
to imagine how I'd end up with a situation where I couldn't wrap or hook
the interface and treat SSNs as strings in my test code. (In fact, for
complicated tests, I did exactly that in Python to make sure they were
correct, then ported them over to integrate with the test suite...)

And anyway, the only point was that I've actually used a grouping that
isn't "every 3 digits" and it didn't end the world. I think everyone
agrees that some such groupings will come up--even if not every specific
example is good, there are some that are. Even the people who want
something more conservative than the PEP don't seem to be taking that
position--they may not want double underscores, or "123_456_j", but
they're fine with "if yuan > 9999_9999:".

So, either we try to anticipate every possible way people might want to
group numbers and decide which ones are good or bad, or we just let the
style guide say "meaningful group of digits" and let each developer
decide what counts as "meaningful" for their application. Does anyone
really want to argue for the former? If not, why not just settle that
and go back to bikeshedding the cases that *are* contended, like
"123_456_j"?

(I'm happy either way, as long as the grammar rule is dead simple and
the PEP 8 rule is pretty simple, but I know others have strong, and
conflicting, opinions on that.)

From p.f.moore at gmail.com  Fri Feb 12 18:17:19 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 12 Feb 2016 23:17:19 +0000
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals
In-Reply-To:
References: <20160211002127.GJ31806@ando.pearwood.info>
 <3C4BBA25-0829-45D2-94A1-063026EF71AB@yahoo.com>
 <20160211101326.GL31806@ando.pearwood.info>
 <20160212001633.GP31806@ando.pearwood.info>
Message-ID:

On 12 February 2016 at 20:06, Chris Barker wrote:
> As Paul said, as long as I can do the above, I'll be fine, but I think
> everyone's source code will be a lot cleaner in the long run if you don't
> have the option of doing who knows what weird arrangement....
Just to be clear, I'm personally in favour of less restrictions rather
than more (as a general principle) - consenting adults and all that.
But I'm also in favour of less debate rather than more on this issue,
so I'll shut up at this point :-)

Paul

From g.brandl at gmx.net  Sat Feb 13 03:48:49 2016
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 13 Feb 2016 09:48:49 +0100
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
Message-ID:

Hi all,

after talking to Guido and Serhiy we present the next revision
of this PEP.  It is a compromise that we are all happy with,
and a relatively restricted rule that makes additions to PEP 8
basically unnecessary.

I think the discussion has shown that supporting underscores in
the from-string constructors is valuable, therefore this is now
added to the specification section.

The remaining open question is about the reverse direction: do
we want a string formatting modifier that adds underscores as
thousands separators?

cheers,
Georg

-----------------------------------------------------------------

PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl, Serhiy Storchaka
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6
Post-History: 10-Feb-2016, 11-Feb-2016

Abstract and Rationale
======================

This PEP proposes to extend Python's syntax and number-from-string
constructors so that underscores can be used as visual separators for
digit grouping purposes in integral, floating-point and complex number
literals.

This is a common feature of other modern languages, and can aid
readability of long literals, or literals whose value should clearly
separate into parts, such as bytes or words in hexadecimal notation.

Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into nibbles in a binary literal
    flags = 0b_0011_1111_0100_1110

    # same, for string conversions
    flags = int('0b_1111_0000', 2)


Specification
=============

The current proposal is to allow one underscore between digits, and
after base specifiers in numeric literals.  The underscores have no
semantic meaning, and literals are parsed as if the underscores were
absent.

Literal Grammar
---------------

The production list for integer literals would therefore look like
this::

    integer: decinteger | bininteger | octinteger | hexinteger
    decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
    bininteger: "0" ("b" | "B") (["_"] bindigit)+
    octinteger: "0" ("o" | "O") (["_"] octdigit)+
    hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
    nonzerodigit: "1"..."9"
    digit: "0"..."9"
    bindigit: "0" | "1"
    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"

For floating-point and complex literals::

    floatnumber: pointfloat | exponentfloat
    pointfloat: [digitpart] fraction | digitpart "."
    exponentfloat: (digitpart | pointfloat) exponent
    digitpart: digit (["_"] digit)*
    fraction: "." digitpart
    exponent: ("e" | "E") ["+" | "-"] digitpart
    imagnumber: (floatnumber | digitpart) ("j" | "J")

Constructors
------------

Following the same rules for placement, underscores will be allowed in
the following constructors:

- ``int()`` (with any base)
- ``float()``
- ``complex()``
- ``Decimal()``

Prior Art
=========

Those languages that do allow underscore grouping implement a large
variety of rules for allowed placement of underscores.  In cases where
the language spec contradicts the actual behavior, the actual behavior
is listed.  ("single" or "multiple" refer to allowing runs of
consecutive underscores.)

* Ada: single, only between digits [8]_
* C# (open proposal for 7.0): multiple, only between digits [6]_
* C++14: single, between digits (different separator chosen) [1]_
* D: multiple, anywhere, including trailing [2]_
* Java: multiple, only between digits [7]_
* Julia: single, only between digits (but not in float exponent parts) [9]_
* Perl 5: multiple, basically anywhere, although docs say it's
  restricted to one underscore between digits [3]_
* Ruby: single, only between digits (although docs say "anywhere") [10]_
* Rust: multiple, anywhere, except for between exponent "e" and digits [4]_
* Swift: multiple, between digits and trailing (although textual
  description says only "between digits") [5]_

Alternative Syntax
==================

Underscore Placement Rules
--------------------------

Instead of the relatively strict rule specified above, the use of
underscores could be limited.  As we seen from other languages, common
rules include:

* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.
* Multiple consecutive underscores allowed, in most positions except
  for the start of the literal, or special positions like after a
  decimal point.

The syntax in this PEP has ultimately been selected because it covers
the common use cases, and does not allow for syntax that would have to
be discouraged in style guides anyway.

A less common rule would be to allow underscores only every N digits
(where N could be 3 for decimal literals, or 4 for hexadecimal ones).
This is unnecessarily restrictive, especially considering the
separator placement is different in different cultures.

Different Separators
--------------------

A proposed alternate syntax was to use whitespace for grouping.
Although strings are a precedent for combining adjoining literals, the
behavior can lead to unexpected effects which are not possible with
underscores.  Also, no other language is known to use this rule,
except for languages that generally disregard any whitespace.

C++14 introduces apostrophes for grouping (because underscores
introduce ambiguity with user-defined literals), which is not
considered because of the use in Python's string literals. [1]_

Open Proposals
==============

It has been proposed [11]_ to extend the number-to-string formatting
language to allow ``_`` as a thousans separator, where currently only
``,`` is supported.  This could be used to easily generate code with
more readable literals.

Implementation
==============

A preliminary patch that implements the specification given above has
been posted to the issue tracker. [12]_

References
==========

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html
.. [2] http://dlang.org/spec/lex.html#integerliteral
.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors
.. [4] http://doc.rust-lang.org/reference.html#number-literals
.. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html
.. [6] https://github.com/dotnet/roslyn/issues/216
.. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html
.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4
.. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/
.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers
.. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html
.. [12] http://bugs.python.org/issue26331

Copyright
=========

This document has been placed in the public domain.
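The rules specified above can be exercised directly, since they were ultimately adopted for Python 3.6 essentially as written, together with the ``_`` format-spec option raised under Open Proposals. The snippet below sketches the literal placement rule, the extended from-string constructors, and the formatting direction; the expected values assume the final CPython 3.6 behavior:

```python
from decimal import Decimal

# Literals: one underscore between digits, or right after a base
# specifier; the underscores carry no semantic meaning.
amount = 10_000_000.0
addr = 0xDEAD_BEEF
flags = 0b_0011_1111_0100_1110

# The from-string constructors follow the same placement rules.
assert int('1_000') == 1000
assert int('0b_1111_0000', 2) == 240
assert float('1_000.5') == 1000.5
assert complex('1_000j') == 1000j
assert Decimal('1_000') == Decimal(1000)

# Misplaced underscores (doubled, leading, trailing, or next to the
# decimal point) are rejected by the constructors, and are syntax
# errors when written as literals.
for bad in ('1__000', '_1000', '1000_', '1_.5'):
    try:
        float(bad)
    except ValueError:
        pass
    else:
        raise AssertionError('%r should have been rejected' % bad)

# The reverse direction: '_' in a format spec groups decimal output by
# thousands, and binary/octal/hex output every four digits.
assert format(10_000_000, '_d') == '10_000_000'
assert format(0xDEAD_BEEF, '_x') == 'dead_beef'
assert format(255, '_b') == '1111_1111'
```

Note that the constructor behavior answers Steven D'Aprano's question below only for ``int``/``float``/``complex``/``Decimal``; other string parsers such as ``Fraction`` are not covered by the PEP's list.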
From storchaka at gmail.com  Sat Feb 13 06:10:59 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 13 Feb 2016 13:10:59 +0200
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
In-Reply-To:
References:
Message-ID:

On 13.02.16 10:48, Georg Brandl wrote:
> Following the same rules for placement, underscores will be allowed in
> the following constructors:
>
> - ``int()`` (with any base)
> - ``float()``
> - ``complex()``
> - ``Decimal()``

What about float.fromhex()? Should underscores be allowed in it (I think
no)?

From g.brandl at gmx.net  Sat Feb 13 06:22:42 2016
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 13 Feb 2016 12:22:42 +0100
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
In-Reply-To:
References:
Message-ID:

On 02/13/2016 12:10 PM, Serhiy Storchaka wrote:
> On 13.02.16 10:48, Georg Brandl wrote:
>> Following the same rules for placement, underscores will be allowed in
>> the following constructors:
>>
>> - ``int()`` (with any base)
>> - ``float()``
>> - ``complex()``
>> - ``Decimal()``
>
> What about float.fromhex()? Should underscores be allowed in it (I think
> no)?

Good question.  It *does* accept a "0x" prefix, as does ``int(x, 16)``,
so there is some precedent for literal-like interpretation of the input
here as well.

Georg

From steve at pearwood.info  Sat Feb 13 09:58:52 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 14 Feb 2016 01:58:52 +1100
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
In-Reply-To:
References:
Message-ID: <20160213145851.GW31806@ando.pearwood.info>

On Sat, Feb 13, 2016 at 09:48:49AM +0100, Georg Brandl wrote:
> Hi all,
>
> after talking to Guido and Serhiy we present the next revision
> of this PEP.  It is a compromise that we are all happy with,
> and a relatively restricted rule that makes additions to PEP 8
> basically unnecessary.
>
> I think the discussion has shown that supporting underscores in
> the from-string constructors is valuable, therefore this is now
> added to the specification section.

What about Fraction? Currently this is legal:

py> Fraction("1/1000000")
Fraction(1, 1000000)

I think the PEP should also support underscores in Fractions:

Fraction("1/1_000_000")

> The remaining open question is about the reverse direction: do
> we want a string formatting modifier that adds underscores as
> thousands separators?

Yes please.

> Open Proposals
> ==============
>
> It has been proposed [11]_ to extend the number-to-string formatting
> language to allow ``_`` as a thousans separator, where currently only
> ``,`` is supported.  This could be used to easily generate code with
> more readable literals.

/s/thousans/thousands/

--
Steve

From v+python at g.nevcal.com  Sat Feb 13 12:27:08 2016
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sat, 13 Feb 2016 09:27:08 -0800
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
In-Reply-To:
References:
Message-ID: <56BF676C.9060905@g.nevcal.com>

On 2/13/2016 12:48 AM, Georg Brandl wrote:
> Instead of the relatively strict rule specified above, the use of
> underscores could be limited.

This sentence doesn't really make sense.  Either s/limited/more limited/
or s/limited/further limited/ or s/limited/relaxed/

Maybe the whole section should be reworded.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ethan at stoneleaf.us  Sat Feb 13 12:40:57 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 13 Feb 2016 09:40:57 -0800
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
In-Reply-To:
References:
Message-ID: <56BF6AA9.9050803@stoneleaf.us>

On 02/13/2016 12:48 AM, Georg Brandl wrote:

> The remaining open question is about the reverse direction: do
> we want a string formatting modifier that adds underscores as
> thousands separators?

+0

Would be nice, but also wouldn't make much sense in other groupings.

> Instead of the relatively strict rule specified above, the use of
> underscores could be limited.  As we seen from other languages, common
> rules include:

s/seen/see  or  s/we//

--
~Ethan~

From brett at python.org  Sat Feb 13 21:14:58 2016
From: brett at python.org (Brett Cannon)
Date: Sun, 14 Feb 2016 02:14:58 +0000
Subject: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
In-Reply-To:
References:
Message-ID:

On Sat, Feb 13, 2016, 00:49 Georg Brandl wrote:

> Hi all,
>
> after talking to Guido and Serhiy we present the next revision
> of this PEP.  It is a compromise that we are all happy with,
> and a relatively restricted rule that makes additions to PEP 8
> basically unnecessary.

+1 from me.

> I think the discussion has shown that supporting underscores in
> the from-string constructors is valuable, therefore this is now
> added to the specification section.
>
> The remaining open question is about the reverse direction: do
> we want a string formatting modifier that adds underscores as
> thousands separators?

+0

Brett

> cheers,
> Georg
>
> -----------------------------------------------------------------
>
> PEP: 515
> Title: Underscores in Numeric Literals
> Version: $Revision$
> Last-Modified: $Date$
> Author: Georg Brandl, Serhiy Storchaka
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 10-Feb-2016
> Python-Version: 3.6
> Post-History: 10-Feb-2016, 11-Feb-2016
>
> Abstract and Rationale
> ======================
>
> This PEP proposes to extend Python's syntax and number-from-string
> constructors so that underscores can be used as visual separators for
> digit grouping purposes in integral, floating-point and complex number
> literals.
>
> This is a common feature of other modern languages, and can aid
> readability of long literals, or literals whose value should clearly
> separate into parts, such as bytes or words in hexadecimal notation.
>
> Examples::
>
>     # grouping decimal numbers by thousands
>     amount = 10_000_000.0
>
>     # grouping hexadecimal addresses by words
>     addr = 0xDEAD_BEEF
>
>     # grouping bits into nibbles in a binary literal
>     flags = 0b_0011_1111_0100_1110
>
>     # same, for string conversions
>     flags = int('0b_1111_0000', 2)
>
>
> Specification
> =============
>
> The current proposal is to allow one underscore between digits, and
> after base specifiers in numeric literals.  The underscores have no
> semantic meaning, and literals are parsed as if the underscores were
> absent.
>
> Literal Grammar
> ---------------
>
> The production list for integer literals would therefore look like
> this::
>
>     integer: decinteger | bininteger | octinteger | hexinteger
>     decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
>     bininteger: "0" ("b" | "B") (["_"] bindigit)+
>     octinteger: "0" ("o" | "O") (["_"] octdigit)+
>     hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
>     nonzerodigit: "1"..."9"
>     digit: "0"..."9"
>     bindigit: "0" | "1"
>     octdigit: "0"..."7"
>     hexdigit: digit | "a"..."f" | "A"..."F"
>
> For floating-point and complex literals::
>
>     floatnumber: pointfloat | exponentfloat
>     pointfloat: [digitpart] fraction | digitpart "."
>     exponentfloat: (digitpart | pointfloat) exponent
>     digitpart: digit (["_"] digit)*
>     fraction: "."
digitpart > exponent: ("e" | "E") ["+" | "-"] digitpart > imagnumber: (floatnumber | digitpart) ("j" | "J") > > Constructors > ------------ > > Following the same rules for placement, underscores will be allowed in > the following constructors: > > - ``int()`` (with any base) > - ``float()`` > - ``complex()`` > - ``Decimal()`` > > > Prior Art > ========= > > Those languages that do allow underscore grouping implement a large > variety of rules for allowed placement of underscores. In cases where > the language spec contradicts the actual behavior, the actual behavior > is listed. ("single" or "multiple" refer to allowing runs of > consecutive underscores.) > > * Ada: single, only between digits [8]_ > * C# (open proposal for 7.0): multiple, only between digits [6]_ > * C++14: single, between digits (different separator chosen) [1]_ > * D: multiple, anywhere, including trailing [2]_ > * Java: multiple, only between digits [7]_ > * Julia: single, only between digits (but not in float exponent parts) > [9]_ > * Perl 5: multiple, basically anywhere, although docs say it's > restricted to one underscore between digits [3]_ > * Ruby: single, only between digits (although docs say "anywhere") > [10]_ > * Rust: multiple, anywhere, except for between exponent "e" and digits > [4]_ > * Swift: multiple, between digits and trailing (although textual > description says only "between digits") [5]_ > > > Alternative Syntax > ================== > > Underscore Placement Rules > -------------------------- > > Instead of the relatively strict rule specified above, the use of > underscores could be limited. As we seen from other languages, common > rules include: > > * Only one consecutive underscore allowed, and only between digits. > * Multiple consecutive underscores allowed, but only between digits. > * Multiple consecutive underscores allowed, in most positions except > for the start of the literal, or special positions like after a > decimal point. 
> > The syntax in this PEP has ultimately been selected because it covers > the common use cases, and does not allow for syntax that would have to > be discouraged in style guides anyway. > > A less common rule would be to allow underscores only every N digits > (where N could be 3 for decimal literals, or 4 for hexadecimal ones). > This is unnecessarily restrictive, especially considering the > separator placement is different in different cultures. > > Different Separators > -------------------- > > A proposed alternate syntax was to use whitespace for grouping. > Although strings are a precedent for combining adjoining literals, the > behavior can lead to unexpected effects which are not possible with > underscores. Also, no other language is known to use this rule, > except for languages that generally disregard any whitespace. > > C++14 introduces apostrophes for grouping (because underscores > introduce ambiguity with user-defined literals), which is not > considered because of the use in Python's string literals. [1]_ > > > Open Proposals > ============== > > It has been proposed [11]_ to extend the number-to-string formatting > language to allow ``_`` as a thousans separator, where currently only > ``,`` is supported. This could be used to easily generate code with > more readable literals. > > > Implementation > ============== > > A preliminary patch that implements the specification given above has > been posted to the issue tracker. [12]_ > > > References > ========== > > .. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html > > .. [2] http://dlang.org/spec/lex.html#integerliteral > > .. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors > > .. [4] http://doc.rust-lang.org/reference.html#number-literals > > .. [5] > > https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html > > .. [6] https://github.com/dotnet/roslyn/issues/216 > > .. 
[7] > > https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html > > .. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4 > > .. [9] > > http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/ > > .. [10] > http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers > > .. [11] > https://mail.python.org/pipermail/python-dev/2016-February/143283.html > > .. [12] http://bugs.python.org/issue26331 > > > Copyright > ========= > > This document has been placed in the public domain. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alecsandru.patrascu at intel.com Sun Feb 14 05:31:46 2016 From: alecsandru.patrascu at intel.com (Patrascu, Alecsandru) Date: Sun, 14 Feb 2016 10:31:46 +0000 Subject: [Python-Dev] CPython build options for out-of-the box performance In-Reply-To: <3CF256F4F774BD48A1691D131AA043191424CED7@IRSMSX102.ger.corp.intel.com> References: <3CF256F4F774BD48A1691D131AA043191424CED7@IRSMSX102.ger.corp.intel.com> Message-ID: <3CF256F4F774BD48A1691D131AA043191424F95B@IRSMSX102.ger.corp.intel.com> I've added the patches here[1], to be more clear about the workflow and the small modifications in the CPython build system. 
[1] http://bugs.python.org/issue26359 Thank you, Alecsandru > -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+alecsandru.patrascu=intel.com at python.org] On Behalf Of Patrascu, > Alecsandru > Sent: Tuesday, February 9, 2016 1:45 PM > To: python-dev at python.org > Subject: [Python-Dev] CPython build options for out-of-the box performance > > Hi all, > > This is Alecsandru from the Dynamic Scripting Languages Optimization Team > at Intel Corporation. I want to open a discussion regarding the way > CPython is built, mainly the options that are available to the > programmers. Analyzing the CPython ecosystem we can see that there are a > lot of users that just download the sources and hit the commands > "./configure", "make" and "make install" once and then continue using it > with their Python scripts. One of the problems with this workflow it that > the users do not benefit from the entire optimization features that are > existing in the build system, such as PGO and LTO. > > Therefore, I propose a workflow, like the following. Assume some work has > to be done into the CPython interpreter, a developer can do the following > steps: > A. Implementation and debugging phase. > 1. The command "./configure PYDIST=debug" is ran once. It will enable > the Py_DEBUG, -O0 and -g flags > 2. The command "make" is ran once or multiple times > > B. Testing the implementation from step A, in a pre-release environment > 1. The command "./configure PYDIST=devel" is ran once. It will disable > the Py_DEBUG flags and will enable the -O3 and -g flags, and it is just > like the current implementation in CPython > 2. The command "make" is ran once or multiple times > > C. For any other CPython usage, for example distributing the interpreter, > installing it inside an operating system, or just the majority of users > who are not CPython developers and only want to compile it once and use it > as-is: > 1. The command "./configure" is ran once. 
Alternatively, the command > "./configure PYDIST=release" can be used. It will disable all debugging > functionality, enable the -O3 flag and will enable PGO and LTO. > 2. The command "make" is ran once > > If you think this benefits CPython, I can create an issue and post the > patches that enable all of the above. > > Thank you, > Alecsandru > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python- > dev/alecsandru.patrascu%40intel.com From jcgoble3 at gmail.com Sun Feb 14 13:49:38 2016 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Sun, 14 Feb 2016 13:49:38 -0500 Subject: [Python-Dev] Regular expression bytecode Message-ID: I'm new to Python's mailing lists, so please forgive me if I'm sending this to the wrong list. :) I filed http://bugs.python.org/issue26336 a few days ago, but now I think this list might be a better place to get discussion going. Basically, I'd like to see the bytecode of a compiled regex object exposed as a public (probably read-only) attribute of the object. Currently, although compiled in pure Python through modules sre_compile and sre_parse, the list of opcodes is then passed into C and copied into an array in a C struct, without being publicly exposed in any way. The only way for a user to get an internal representation of the regex is the re.DEBUG flag, which only produces an intermediate representation rather than the actual bytecode and only goes to stdout, which makes it useless for someone who wants to examine it programmatically. I'm sure others can think of other potential use cases for this, but one in particular would be that someone could write a debugger that can allow a user to step through a regex one opcode at a time to see exactly where it is failing. 
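To make the gap concrete: today both the intermediate form and the opcode list are only reachable through undocumented internals. A rough sketch (the sre module names are private and have moved between Python versions, hence the guarded import; `_code` is a private helper and may change):

```python
try:
    # Python 3.11+ moved the pure-Python regex compiler into the re package
    from re import _parser as sre_parse, _compiler as sre_compile
except ImportError:
    import sre_parse
    import sre_compile

pattern = "a(b|c)+d"

# The parse tree is the intermediate representation that re.DEBUG prints
# to stdout; sre_parse exposes it programmatically.
tree = sre_parse.parse(pattern)

# _code() produces the flat list of integer opcodes that is then copied
# into the C-level pattern struct -- exactly the data a public, read-only
# attribute on compiled pattern objects would expose.
code = sre_compile._code(tree, 0)
print(len(code), code[:8])
```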
It would also perhaps be nice to have a public constructor for the regex object type, which would enable users to modify the bytecode and directly create a new regex object from it, similar to what is currently possible through the types.FunctionType and types.CodeType constructors. In addition to exposing the code in a public attribute, a helper module written in Python similar to the dis module (which is for Python's own bytecode) would be very helpful, allowing the code to be easily disassembled and examined at a higher level. Is this a good idea, or am I barking up the wrong tree? I think it's a great idea, but I'm open to being told this is a horrible idea. :) I welcome any and all comments both here and on the bug tracker. Jonathan Goble From leewangzhong+python at gmail.com Sun Feb 14 14:41:27 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sun, 14 Feb 2016 14:41:27 -0500 Subject: [Python-Dev] Regular expression bytecode In-Reply-To: References: Message-ID: I think it would be nice for manipulating (e.g. optimizing, possibly with JIT-like analysis) and comparing regexes. It can also be useful as a teaching tool, e.g. exercises in optimizing and comparing regexes. I think the discussion should be on python-ideas, though. On Feb 14, 2016 2:01 PM, "Jonathan Goble" wrote: > I'm new to Python's mailing lists, so please forgive me if I'm sending > this to the wrong list. :) > > I filed http://bugs.python.org/issue26336 a few days ago, but now I > think this list might be a better place to get discussion going. > Basically, I'd like to see the bytecode of a compiled regex object > exposed as a public (probably read-only) attribute of the object. > > Currently, although compiled in pure Python through modules > sre_compile and sre_parse, the list of opcodes is then passed into C > and copied into an array in a C struct, without being publicly exposed > in any way. 
The only way for a user to get an internal representation > of the regex is the re.DEBUG flag, which only produces an intermediate > representation rather than the actual bytecode and only goes to > stdout, which makes it useless for someone who wants to examine it > programmatically. > > I'm sure others can think of other potential use cases for this, but > one in particular would be that someone could write a debugger that > can allow a user to step through a regex one opcode at a time to see > exactly where it is failing. It would also perhaps be nice to have a > public constructor for the regex object type, which would enable users > to modify the bytecode and directly create a new regex object from it, > similar to what is currently possible through the types.FunctionType > and types.CodeType constructors. > > In addition to exposing the code in a public attribute, a helper > module written in Python similar to the dis module (which is for > Python's own bytecode) would be very helpful, allowing the code to be > easily disassembled and examined at a higher level. > > Is this a good idea, or am I barking up the wrong tree? I think it's a > great idea, but I'm open to being told this is a horrible idea. :) I > welcome any and all comments both here and on the bug tracker. > > Jonathan Goble > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gunkmute at gmail.com Sun Feb 14 19:20:00 2016 From: gunkmute at gmail.com (Demur Rumed) Date: Mon, 15 Feb 2016 00:20:00 +0000 Subject: [Python-Dev] Wordcode v2 Message-ID: Saw recent discussion: https://mail.python.org/pipermail/python-dev/2016-February/143013.html I remember trying WPython; it was fast. 
Unfortunately it feels it came at the wrong time when development was invested in getting py3k out the door. It also had a lot of other ideas like *_INT instructions which allowed having oparg to be a constant int rather than needing to LOAD_CONST one. Anyways I'll stop reminiscing abarnert has started an experiment with wordcode: https://github.com/abarnert/cpython/blob/c095a32f2a68ac708466b9c64906cc4d0f5de1ee/Python/wordcode.md I've personally benchmarked this fork with positive results. This experiment seeks to be conservative-- it doesn't seek to introduce new opcodes or combine BINARY_OP's all into a single op where the currently unused-in-wordcode arg then states the kind of binary op (à la COMPARE_OP). I've submitted a pull request which is working on fixing tests & updating peephole.c Bringing this up on the list to figure out if there's interest in a basic wordcode change. It feels like there's no downsides: faster code, smaller bytecode, simpler interpretation of bytecode (The Nth instruction starts at the 2Nth byte if you count EXTENDED_ARG as an instruction). The only downside is the transitional cost What'd be necessary for this to be pulled upstream? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcgoble3 at gmail.com Sun Feb 14 20:39:24 2016 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Sun, 14 Feb 2016 20:39:24 -0500 Subject: [Python-Dev] Regular expression bytecode In-Reply-To: References: Message-ID: On Sun, Feb 14, 2016 at 2:41 PM, Franklin? Lee wrote: > I think it would be nice for manipulating (e.g. optimizing, possibly with > JIT-like analysis) and comparing regexes. It can also be useful as a > teaching tool, e.g. exercises in optimizing and comparing regexes. Both great points in favor of this. > I think the discussion should be on python-ideas, though. Thanks for being gentle with the correction. :) I'll resend it over there later tonight when I have some more time on my hands.
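For readers following the wordcode proposal: the fixed-width layout Demur describes is easy to demonstrate on an interpreter that uses the format (CPython itself adopted wordcode in 3.6); every instruction occupies exactly two bytes:

```python
import dis

def add_one(x):
    return x + 1

code = add_one.__code__.co_code

# Under wordcode the opcode for instruction N sits at byte 2*N and its
# single argument byte at 2*N + 1; EXTENDED_ARG counts as an
# instruction of its own, so the stream length is always even.
assert len(code) % 2 == 0
for offset in range(0, len(code), 2):
    op, arg = code[offset], code[offset + 1]
    print(f"{offset:3d} {dis.opname[op]:<20} {arg}")
```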
From guido at python.org Sun Feb 14 22:05:02 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Feb 2016 19:05:02 -0800 Subject: [Python-Dev] Wordcode v2 In-Reply-To: References: Message-ID: I think it's probably too soon to discuss on python-dev, but I do think that something like this could be attempted in 3.6 or (more likely) 3.7, if it really is faster. An unfortunate issue however is that many projects seem to make a hobby of hacking bytecode. All those projects would have to be totally rewritten in order to support the new wordcode format (as opposed to just having to be slightly adjusted to support the occasional new bytecode opcode). Those projects of course don't work with Pypy or Jython either, but they do work for mainstream CPython, and it's unacceptable to just leave them all behind. As an example, AFAIK coverage.py interprets bytecode. This is an important piece of infrastructure that we wouldn't want to leave behind. I think py.test's assert-rewrite code also generates or looks at bytecode. Also important. All of which means that it's more likely to make it into 3.7. See you on python-ideas! --Guido On Sun, Feb 14, 2016 at 4:20 PM, Demur Rumed wrote: > Saw recent discussion: > https://mail.python.org/pipermail/python-dev/2016-February/143013.html > > I remember trying WPython; it was fast. Unfortunately it feels it came at > the wrong time when development was invested in getting py3k out the door. > It also had a lot of other ideas like *_INT instructions which allowed > having oparg to be a constant int rather than needing to LOAD_CONST one. > Anyways I'll stop reminiscing > > abarnert has started an experiment with wordcode: > https://github.com/abarnert/cpython/blob/c095a32f2a68ac708466b9c64906cc4d0f5de1ee/Python/wordcode.md > > I've personally benchmarked this fork with positive results. 
This experiment > seeks to be conservative-- it doesn't seek to introduce new opcodes or > combine BINARY_OP's all into a single op where the currently > unused-in-wordcode arg then states the kind of binary op (à la COMPARE_OP). > I've submitted a pull request which is working on fixing tests & updating > peephole.c > > Bringing this up on the list to figure out if there's interest in a basic > wordcode change. It feels like there's no downsides: faster code, smaller > bytecode, simpler interpretation of bytecode (The Nth instruction starts at > the 2Nth byte if you count EXTENDED_ARG as an instruction). The only > downside is the transitional cost > > What'd be necessary for this to be pulled upstream? > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From fijall at gmail.com Mon Feb 15 02:13:06 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Mon, 15 Feb 2016 08:13:06 +0100 Subject: [Python-Dev] Wordcode v2 In-Reply-To: References: Message-ID: On Mon, Feb 15, 2016 at 4:05 AM, Guido van Rossum wrote: > I think it's probably too soon to discuss on python-dev, but I do > think that something like this could be attempted in 3.6 or (more > likely) 3.7, if it really is faster. > > An unfortunate issue however is that many projects seem to make a > hobby of hacking bytecode. All those projects would have to be totally > rewritten in order to support the new wordcode format (as opposed to > just having to be slightly adjusted to support the occasional new > bytecode opcode). Those projects of course don't work with Pypy or > Jython either, but they do work for mainstream CPython, and it's > unacceptable to just leave them all behind.
They mostly work with PyPy (which has 2 or 3 additional bytecodes, but nothing too dramatic) > > As an example, AFAIK coverage.py interprets bytecode. This is an > important piece of infrastructure that we wouldn't want to leave > behind. I think py.test's assert-rewrite code also generates or looks > at bytecode. Also important. > > All of which means that it's more likely to make it into 3.7. See you > on python-ideas! > > --Guido > > On Sun, Feb 14, 2016 at 4:20 PM, Demur Rumed wrote: >> Saw recent discussion: >> https://mail.python.org/pipermail/python-dev/2016-February/143013.html >> >> I remember trying WPython; it was fast. Unfortunately it feels it came at >> the wrong time when development was invested in getting py3k out the door. >> It also had a lot of other ideas like *_INT instructions which allowed >> having oparg to be a constant int rather than needing to LOAD_CONST one. >> Anyways I'll stop reminiscing >> >> abarnert has started an experiment with wordcode: >> https://github.com/abarnert/cpython/blob/c095a32f2a68ac708466b9c64906cc4d0f5de1ee/Python/wordcode.md >> >> I've personally benchmarked this fork with positive results. This experiment >> seeks to be conservative-- it doesn't seek to introduce new opcodes or >> combine BINARY_OP's all into a single op where the currently >> unused-in-wordcode arg then states the kind of binary op (à la COMPARE_OP). >> I've submitted a pull request which is working on fixing tests & updating >> peephole.c >> >> Bringing this up on the list to figure out if there's interest in a basic >> wordcode change. It feels like there's no downsides: faster code, smaller >> bytecode, simpler interpretation of bytecode (The Nth instruction starts at >> the 2Nth byte if you count EXTENDED_ARG as an instruction). The only >> downside is the transitional cost >> >> What'd be necessary for this to be pulled upstream?
>> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org >> > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com From abarnert at yahoo.com Mon Feb 15 02:14:46 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 14 Feb 2016 23:14:46 -0800 Subject: [Python-Dev] Wordcode v2 In-Reply-To: References: Message-ID: On Feb 14, 2016, at 19:05, Guido van Rossum wrote: > > I think it's probably too soon to discuss on python-dev, but I do > think that something like this could be attempted in 3.6 or (more > likely) 3.7, if it really is faster. > > An unfortunate issue however is that many projects seem to make a > hobby of hacking bytecode. > All those projects would have to be totally > rewritten in order to support the new wordcode format (as opposed to > just having to be slightly adjusted to support the occasional new > bytecode opcode). This is part of why I suggested, on -ideas, that we should add a mutating/assembling API to the dis module. People argued that such an API would make the bytecode format more fragile, but the exact opposite is true. At the dis level, everything is unchanged by wordcode. Or by Serhiy's args-packed-in-opcode. So, if the dis module could do everything for people that, say, the third-party byteplay module does (which wouldn't take much), so things like coverage.py, or the various special-case optimizer decorators on PyPI and ActiveState, etc. could all be written to deal with the dis module format rather than raw bytecode, we could make changes like this without risking nearly as much breakage. 
Anyway, this obviously wouldn't help the transition for 3.6. But improving dis in 3.6, with a warning that raw bytecode might start changing more frequently and/or radically in the future now that there's less reason to depend on it, might help if wordcode were to go into 3.7. > All of which means that it's more likely to make it into 3.7. See you > on python-ideas! > > --Guido > >> On Sun, Feb 14, 2016 at 4:20 PM, Demur Rumed wrote: >> Saw recent discussion: >> https://mail.python.org/pipermail/python-dev/2016-February/143013.html >> >> I remember trying WPython; it was fast. Unfortunately it feels it came at >> the wrong time when development was invested in getting py3k out the door. >> It also had a lot of other ideas like *_INT instructions which allowed >> having oparg to be a constant int rather than needing to LOAD_CONST one. >> Anyways I'll stop reminiscing Despite the name (and inspiration), my fork has very little to do with WPython. I'm just focused on simpler (hopefully = faster) fetch code; he started with that, but ended up going the exact opposite direction, accepting more complicated (and much slower) fetch code as a reasonable cost for drastically reducing the number of instructions. (If you double the 30% fetch-and-parse overhead per instruction, but cut the number of instructions to 40%, the net is a huge win.) >> >> abarnert has started an experiment with wordcode: >> https://github.com/abarnert/cpython/blob/c095a32f2a68ac708466b9c64906cc4d0f5de1ee/Python/wordcode.md >> >> I've personally benchmarked this fork with positive results. This experiment >> seeks to be conservative-- it doesn't seek to introduce new opcodes or >> combine BINARY_OP's all into a single op where the currently >> unused-in-wordcode arg then states the kind of binary op (à la COMPARE_OP).
>> I've submitted a pull request which is working on fixing tests & updating >> peephole.c >> >> Bringing this up on the list to figure out if there's interest in a basic >> wordcode change. It feels like there's no downsides: faster code, smaller >> bytecode, simpler interpretation of bytecode (The Nth instruction starts at >> the 2Nth byte if you count EXTENDED_ARG as an instruction). The only >> downside is the transitional cost >> >> What'd be necessary for this to be pulled upstream? >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/abarnert%40yahoo.com From greg.ewing at canterbury.ac.nz Mon Feb 15 02:20:05 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Feb 2016 20:20:05 +1300 Subject: [Python-Dev] Wordcode v2 In-Reply-To: References: Message-ID: <56C17C25.5090002@canterbury.ac.nz> Guido van Rossum wrote: > An unfortunate issue however is that many projects seem to make a > hobby of hacking bytecode. All those projects would have to be totally > rewritten in order to support the new wordcode format Maybe this argues for having an assembly-language-like intermediate form between the AST and the actual code used by the interpreter? Done properly it could make things easier for bytecode-hacking projects as well as providing some insulation from implementation details. 
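Something close to the read-only half of the intermediate form Greg describes already exists: dis.get_instructions() (added in Python 3.4) yields structured Instruction objects, so a tool written against it is insulated from an encoding change like wordcode. A sketch:

```python
import dis

def f(a, b):
    return a * b + 1

# Instruction objects abstract over how opcode and operand are packed
# into co_code, so this loop is unaffected by a switch from variable-
# width bytecode to fixed-width wordcode.
for ins in dis.get_instructions(f):
    print(ins.offset, ins.opname, ins.argrepr)
```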
-- Greg From russell at keith-magee.com Mon Feb 15 03:24:50 2016 From: russell at keith-magee.com (Russell Keith-Magee) Date: Mon, 15 Feb 2016 16:24:50 +0800 Subject: [Python-Dev] Bug in build system for cross-platform builds Message-ID: Hi all, I've been working on developing Python builds for mobile platforms, and I'm looking for some help resolving a bug in Python's build system. The problem affects cross-platform builds - builds where you are compiling python for a CPU architecture other than the one on the machine that is doing the compilation. This requirement stems from supporting mobile platforms (iOS, Android etc) where you compile on your laptop, then ship the compiled binary to the device. In the Python 3.5 dev cycle, Issue 22359 [1] was addressed, fixing parallel builds. However, as a side effect, this patch broke (as far as I can tell) *all* cross platform builds. This was reported in issue 22625 [2]. Since that time, the problem has gotten slightly worse; the addition of changeset 95566 [3] and 95854 [4] has cemented the problem. I've been able to hack together a fix that enables me to get a set of binaries, but the patch is essentially reverting 22359, and making some (very dubious) assumptions about the order in which things are built. Autoconf et al aren't my strong suit; I was hoping someone might be able to help me resolve this issue. Yours, Russ Magee %-) [1] http://bugs.python.org/issue22359 [2] http://bugs.python.org/issue22625 [3] https://hg.python.org/cpython/rev/565b96093ec8 [4] https://hg.python.org/cpython/rev/02e3bf65b2f8 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From deronnax at gmail.com Mon Feb 15 07:32:57 2016 From: deronnax at gmail.com (Mathieu Dupuy) Date: Mon, 15 Feb 2016 23:02:57 +1030 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: References: Message-ID: A python representative (like Guido himself) should contact Github to obtain coordinates of the owner, and maybe have them pulling it down if he doesn't answer. The pull requests it's attracting are old and/or of low value. 2016-02-10 6:30 GMT+10:30 John Mark Vandenberg : > Does anyone know who controls this mirror, which is attracting pull > requests? > > https://github.com/python-git/python/pulls > > Can it be pulled down to avoid confusion, since it is using Python's logo? > > https://github.com/python-git > > -- > John Vandenberg > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/deronnax%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Feb 15 15:16:56 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Feb 2016 09:16:56 +1300 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: References: Message-ID: <56C23238.3000306@canterbury.ac.nz> Mathieu Dupuy wrote: > A python representative (like Guido himself) should contact Github to > obtain coordinates of the owner... ...and then order a drone strike on him? 
-- Greg From phd at phdru.name Mon Feb 15 15:27:20 2016 From: phd at phdru.name (Oleg Broytman) Date: Mon, 15 Feb 2016 21:27:20 +0100 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: <56C23238.3000306@canterbury.ac.nz> References: <56C23238.3000306@canterbury.ac.nz> Message-ID: <20160215202720.GA27790@phdru.name> On Tue, Feb 16, 2016 at 09:16:56AM +1300, Greg Ewing wrote: > Mathieu Dupuy wrote: > >A python representative (like Guido himself) should contact Github to > >obtain coordinates of the owner... > > ...and then order a drone strike on him? Yes, and then pry the repo from his cold dead fingers. Well, I hope prying can be done without striking first. ;-) > -- > Greg Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From vadmium+py at gmail.com Mon Feb 15 16:22:10 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Mon, 15 Feb 2016 21:22:10 +0000 Subject: [Python-Dev] Bug in build system for cross-platform builds In-Reply-To: References: Message-ID: On 15 February 2016 at 08:24, Russell Keith-Magee wrote: > Hi all, > > I've been working on developing Python builds for mobile platforms, and I'm > looking for some help resolving a bug in Python's build system. > > The problem affects cross-platform builds - builds where you are compiling > python for a CPU architecture other than the one on the machine that is > doing the compilation. This requirement stems from supporting mobile > platforms (iOS, Android etc) where you compile on your laptop, then ship the > compiled binary to the device. > > In the Python 3.5 dev cycle, Issue 22359 [1] was addressed, fixing parallel > builds. However, as a side effect, this patch broke (as far as I can tell) > *all* cross platform builds. This was reported in issue 22625 [2]. > > Since that time, the problem has gotten slightly worse; the addition of > changeset 95566 [3] and 95854 [4] has cemented the problem. 
I've been able > to hack together a fix that enables me to get a set of binaries, but the > patch is essentially reverting 22359, and making some (very dubious) > assumptions about the order in which things are built. > > Autoconf et al aren't my strong suit; I was hoping someone might be able to > help me resolve this issue. Would you mind answering my question in ? In particular, how did cross-compiling previously work before these changes? AFAIK Python builds a preliminary Python executable which is executed on the host to complete the final build. So how do you differentiate between host and target compilers etc? From russell at keith-magee.com Mon Feb 15 20:41:26 2016 From: russell at keith-magee.com (Russell Keith-Magee) Date: Tue, 16 Feb 2016 09:41:26 +0800 Subject: [Python-Dev] Bug in build system for cross-platform builds In-Reply-To: References: Message-ID: On Tue, Feb 16, 2016 at 5:22 AM, Martin Panter wrote: > On 15 February 2016 at 08:24, Russell Keith-Magee > wrote: > > Hi all, > > > > I've been working on developing Python builds for mobile platforms, and > I'm > > looking for some help resolving a bug in Python's build system. > > > > The problem affects cross-platform builds - builds where you are > compiling > > python for a CPU architecture other than the one on the machine that is > > doing the compilation. This requirement stems from supporting mobile > > platforms (iOS, Android etc) where you compile on your laptop, then ship > the > > compiled binary to the device. > > > > In the Python 3.5 dev cycle, Issue 22359 [1] was addressed, fixing > parallel > > builds. However, as a side effect, this patch broke (as far as I can > tell) > > *all* cross platform builds. This was reported in issue 22625 [2]. > > > > Since that time, the problem has gotten slightly worse; the addition of > > changeset 95566 [3] and 95854 [4] has cemented the problem. 
I've been > able > > to hack together a fix that enables me to get a set of binaries, but the > > patch is essentially reverting 22359, and making some (very dubious) > > assumptions about the order in which things are built. > > > > Autoconf et al aren't my strong suit; I was hoping someone might be able > to > > help me resolve this issue. > > Would you mind answering my question in > ? In particular, how did > cross-compiling previously work before these changes? AFAIK Python > builds a preliminary Python executable which is executed on the host > to complete the final build. So how do you differentiate between host > and target compilers etc? > In order to build for a host platform, you have to compile for a local platform first - for example, to compile an iOS ARM64 binary, you have to compile for OS X x86_64 first. This gives you a local platform version of Python you can use when building the iOS version. Early in the Makefile, the variable PYTHON_FOR_BUILD is set. This points at the CPU-local version of Python that can be invoked, which is used for module builds, and for compiling the standard library source code. This is set by --host and --build flags to configure, plus the use of CC and LDFLAGS environment variables to point at the compiler and libraries for the platform you're compiling for, and a PATH variable that provides the local platform's version of Python. There are two places where special handling is required: the compilation and execution of the parser generator, and _freeze_importlib. In both cases, the tool needs to be compiled for the local platform, and then executed. Historically (i.e., Py3.4 and earlier), this has been done by spawning a child MAKE to compile the tool; this runs the compilation phase with the local CPU environment, before returning to the master makefile and executing the tool. By spawning the child MAKE, you get a "clean" environment, so the tool is built natively. 
However, as I understand it, it causes problems with parallel builds due to race conditions on build rules. The change in Python3.5 simplified the rule so that child MAKE calls weren't used, but that means that pgen and _freeze_importlib are compiled for ARM64, so they won't run on the local platform. As best as I can work out, the solution is to: (1) Include the parser generator and _freeze_importlib as part of the artefacts of local platform. That way, you could use the version of pgen and _freeze_importlib that was compiled as part of the local platform build. At present, pgen and _freeze_importlib are used during the build process, but aren't preserved at the end of the build; or (2) Include some concept of the "local compiler" in the build process, which can be used to compile pgen and _freeze_importlib; or There might be other approaches that will work; as I said, build systems aren't my strength. Yours, Russ Magee %-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph at grothesque.org Tue Feb 16 04:48:05 2016 From: christoph at grothesque.org (Christoph Groth) Date: Tue, 16 Feb 2016 10:48:05 +0100 Subject: [Python-Dev] Hash randomization for which types? Message-ID: <87k2m50ywa.fsf@grothesque.org> Hello, Recent Python versions randomize the hashes of str, bytes and datetime objects. I suppose that the choice of these three types is the result of a compromise. Has this been discussed somewhere publicly? I'm not a web programmer, but don't web applications also use dictionaries that are indexed by, say, tuples of integers? Just curious... Thanks, Christoph -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 810 bytes Desc: not available URL: From v+python at g.nevcal.com Tue Feb 16 14:56:55 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 16 Feb 2016 11:56:55 -0800 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: <87k2m50ywa.fsf@grothesque.org> References: <87k2m50ywa.fsf@grothesque.org> Message-ID: <56C37F07.8020904@g.nevcal.com> On 2/16/2016 1:48 AM, Christoph Groth wrote: > Hello, > > Recent Python versions randomize the hashes of str, bytes and datetime > objects. I suppose that the choice of these three types is the result > of a compromise. Has this been discussed somewhere publicly? Search archives of this list... it was discussed at length. > I'm not a web programmer, but don't web applications also use > dictionaries that are indexed by, say, tuples of integers? Sure, and that is the biggest part of the reason they were randomized. I think hashes of all types have been randomized, not _just_ the list you mentioned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Feb 16 20:54:45 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 17 Feb 2016 12:54:45 +1100 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: <56C37F07.8020904@g.nevcal.com> References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> Message-ID: <20160217015445.GA12028@ando.pearwood.info> On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote: > On 2/16/2016 1:48 AM, Christoph Groth wrote: > >Hello, > > > >Recent Python versions randomize the hashes of str, bytes and datetime > >objects. I suppose that the choice of these three types is the result > >of a compromise. Has this been discussed somewhere publicly? > > Search archives of this list... it was discussed at length. There's a lot of discussion on the mailing list. 
I think that this is the very start of it, in Dec 2011: https://mail.python.org/pipermail/python-dev/2011-December/115116.html and continuing into 2012, for example: https://mail.python.org/pipermail/python-dev/2012-January/115577.html https://mail.python.org/pipermail/python-dev/2012-January/115690.html and a LOT more, spread over many different threads and subject lines. You should also read the issue on the bug tracker: http://bugs.python.org/issue13703 My recollection is that it was decided that only strings and bytes need to have their hashes randomized, because only strings and bytes can be used directly from user-input without first having a conversion step with likely input range validation. In addition, changing the hash for ints would break too much code for too little benefit: unlike strings, where hash collision attacks on web apps are proven and easy, hash collision attacks based on ints are more difficult and rare. See also the comment here: http://bugs.python.org/issue13703#msg151847 > >I'm not a web programmer, but don't web applications also use > >dictionaries that are indexed by, say, tuples of integers? > > Sure, and that is the biggest part of the reason they were randomized. But they aren't, as far as I can see: [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))" 1071302475 [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))" 1071302475 Web apps can use dicts indexed by anything that they like, but unless there is an actual attack, what does it matter? Guido makes a good point about security here: https://mail.python.org/pipermail/python-dev/2013-October/129181.html > I think hashes of all types have been randomized, not _just_ the list > you mentioned. I'm pretty sure that's not actually the case. 
Using 3.6 from the repo (admittedly not fully up to date though), I can see hash randomization working for strings: [steve at ando 3.6]$ ./python -c "print(hash('abc'))" 11601873 [steve at ando 3.6]$ ./python -c "print(hash('abc'))" -2009889747 but not for ints: [steve at ando 3.6]$ ./python -c "print(hash(42))" 42 [steve at ando 3.6]$ ./python -c "print(hash(42))" 42 which agrees with my recollection that only strings and bytes would be randomized. -- Steve From stephen at xemacs.org Tue Feb 16 21:22:22 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 17 Feb 2016 11:22:22 +0900 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: <56C37F07.8020904@g.nevcal.com> References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> Message-ID: <22211.55646.297661.231923@turnbull.sk.tsukuba.ac.jp> Glenn Linderman writes: > I think hashes of all types have been randomized, not _just_ the list > you mentioned. Yes. There's only one hash function used, which operates on byte streams IIRC. That function now has a random offset. The details of hashing each type are in the serializations to byte streams. From shell909090 at gmail.com Tue Feb 16 22:45:57 2016 From: shell909090 at gmail.com (Shell Xu) Date: Wed, 17 Feb 2016 11:45:57 +0800 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: <20160217015445.GA12028@ando.pearwood.info> References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> <20160217015445.GA12028@ando.pearwood.info> Message-ID: I thought you are right. 
Here is the source code in python 2.7.11: long PyObject_Hash(PyObject *v) { PyTypeObject *tp = v->ob_type; if (tp->tp_hash != NULL) return (*tp->tp_hash)(v); /* To keep to the general practice that inheriting * solely from object in C code should work without * an explicit call to PyType_Ready, we implicitly call * PyType_Ready here and then check the tp_hash slot again */ if (tp->tp_dict == NULL) { if (PyType_Ready(tp) < 0) return -1; if (tp->tp_hash != NULL) return (*tp->tp_hash)(v); } if (tp->tp_compare == NULL && RICHCOMPARE(tp) == NULL) { return _Py_HashPointer(v); /* Use address as hash value */ } /* If there's a cmp but no hash defined, the object can't be hashed */ return PyObject_HashNotImplemented(v); } If object has hash function, it will be used. If not, _Py_HashPointer will be used. Which _Py_HashSecret are not used. And I checked reference of _Py_HashSecret. Only bufferobject, unicodeobject and stringobject use _Py_HashSecret. On Wed, Feb 17, 2016 at 9:54 AM, Steven D'Aprano wrote: > On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote: > > On 2/16/2016 1:48 AM, Christoph Groth wrote: > > >Hello, > > > > > >Recent Python versions randomize the hashes of str, bytes and datetime > > >objects. I suppose that the choice of these three types is the result > > >of a compromise. Has this been discussed somewhere publicly? > > > > Search archives of this list... it was discussed at length. > > There's a lot of discussion on the mailing list. I think that this is > the very start of it, in Dec 2011: > > https://mail.python.org/pipermail/python-dev/2011-December/115116.html > > and continuing into 2012, for example: > > https://mail.python.org/pipermail/python-dev/2012-January/115577.html > https://mail.python.org/pipermail/python-dev/2012-January/115690.html > > and a LOT more, spread over many different threads and subject lines. 
> > You should also read the issue on the bug tracker: > > http://bugs.python.org/issue13703 > > > My recollection is that it was decided that only strings and bytes need > to have their hashes randomized, because only strings and bytes can be > used directly from user-input without first having a conversion step > with likely input range validation. In addition, changing the hash for > ints would break too much code for too little benefit: unlike strings, > where hash collision attacks on web apps are proven and easy, hash > collision attacks based on ints are more difficult and rare. > > See also the comment here: > > http://bugs.python.org/issue13703#msg151847 > > > > > >I'm not a web programmer, but don't web applications also use > > >dictionaries that are indexed by, say, tuples of integers? > > > > Sure, and that is the biggest part of the reason they were randomized. > > But they aren't, as far as I can see: > > [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))" > 1071302475 > [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))" > 1071302475 > > Web apps can use dicts indexed by anything that they like, but unless > there is an actual attack, what does it matter? Guido makes a good point > about security here: > > https://mail.python.org/pipermail/python-dev/2013-October/129181.html > > > > > I think hashes of all types have been randomized, not _just_ the list > > you mentioned. > > I'm pretty sure that's not actually the case. Using 3.6 from the repo > (admittedly not fully up to date though), I can see hash randomization > working for strings: > > [steve at ando 3.6]$ ./python -c "print(hash('abc'))" > 11601873 > [steve at ando 3.6]$ ./python -c "print(hash('abc'))" > -2009889747 > > but not for ints: > > [steve at ando 3.6]$ ./python -c "print(hash(42))" > 42 > [steve at ando 3.6]$ ./python -c "print(hash(42))" > 42 > > > which agrees with my recollection that only strings and bytes would be > randomized. 
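[Editor's note: the interpreter transcript quoted above can be reproduced programmatically. Because the randomized seed is fixed for the lifetime of a process, the comparison has to spawn fresh interpreters; this sketch does so via the PYTHONHASHSEED environment variable, and assumes a POSIX-style system where sys.executable points at a runnable interpreter.]

```python
import subprocess
import sys

def hash_in_fresh_interpreter(expr, seed):
    # Each child interpreter gets its own hash seed via PYTHONHASHSEED.
    result = subprocess.run(
        [sys.executable, "-c", "print(hash({}))".format(expr)],
        env={"PYTHONHASHSEED": str(seed)},
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

# str hashes vary with the seed (randomized)...
assert hash_in_fresh_interpreter("'abc'", 1) != hash_in_fresh_interpreter("'abc'", 2)

# ...while int hashes, and tuples built from ints, do not.
assert hash_in_fresh_interpreter("42", 1) == hash_in_fresh_interpreter("42", 2) == 42
assert (hash_in_fresh_interpreter("(23, 42, 99, 100)", 1)
        == hash_in_fresh_interpreter("(23, 42, 99, 100)", 2))
```

This matches both observations in the thread: strings and bytes are seeded, ints and containers of ints are not.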
> > > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/shell909090%40gmail.com > -- ????????????????????????????????? blog: http://shell909090.org/blog/ twitter: @shell909090 about.me: http://about.me/shell909090 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike.kaplinskiy at gmail.com Tue Feb 16 23:56:20 2016 From: mike.kaplinskiy at gmail.com (Mike Kaplinskiy) Date: Tue, 16 Feb 2016 20:56:20 -0800 Subject: [Python-Dev] Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) Message-ID: Hey folks, I hope this is the right list for this sort of thing (python-ideas seemed more far-fetched). For some context: there is currently a issue with pex that causes sys.modules lookups to stop working for __main__. In turns this makes unittest.run() & pkg_resources.resource_* fail. The root cause is that pex uses runpy.run_module with alter_sys=False. The fix should be to just pass alter_sys=True, but that changes sys.argv[0] and various existing pex files depend on that being the pex file. You can read more at https://github.com/pantsbuild/pex/pull/211 . Conservatively, I'd like to propose adding an argument to disable this behavior. The current behavior breaks a somewhat reasonable invariant that you can restart your program via `os.execv([sys.executable] + sys.argv)`. Moreover it might be user-friendly to add a `argv=sys.argv[1:]` argument to set & restore the full arguments to the module, where `argv=None` disables argv[0] switching. What do you think? Mike. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Wed Feb 17 00:42:15 2016 From: greg at krypto.org (Gregory P. 
Smith) Date: Wed, 17 Feb 2016 05:42:15 +0000 Subject: [Python-Dev] Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) In-Reply-To: References: Message-ID: On Tue, Feb 16, 2016 at 9:00 PM Mike Kaplinskiy wrote: > Hey folks, > > I hope this is the right list for this sort of thing (python-ideas seemed > more far-fetched). > > For some context: there is currently a issue with pex that causes > sys.modules lookups to stop working for __main__. In turns this makes > unittest.run() & pkg_resources.resource_* fail. The root cause is that pex > uses runpy.run_module with alter_sys=False. The fix should be to just pass > alter_sys=True, but that changes sys.argv[0] and various existing pex files > depend on that being the pex file. You can read more at > https://github.com/pantsbuild/pex/pull/211 . > > Conservatively, I'd like to propose adding an argument to disable this > behavior. The current behavior breaks a somewhat reasonable invariant that > you can restart your program via `os.execv([sys.executable] + sys.argv)`. > I don't know enough about pex to really dig into what it is trying to do so this is tangential to answering your question but: sys.executable may be None. ex: If you're an embedded Python interpreter there is no Python executable. It cannot be blindly used re-execute the current process. sys.argv represents the C main() argv array. Your inclination (in the linked to bug above) to leave sys.argv[0] alone is a good one. -gps Moreover it might be user-friendly to add a `argv=sys.argv[1:]` argument to > set & restore the full arguments to the module, where `argv=None` disables > argv[0] switching. > > What do you think? > > Mike. 
> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fijall at gmail.com Wed Feb 17 02:34:29 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 17 Feb 2016 08:34:29 +0100 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> <20160217015445.GA12028@ando.pearwood.info> Message-ID: Note that hashing in python 2.7 and prior to 3.4 is simply broken and the randomization does not do nearly enough, see https://bugs.python.org/issue14621 On Wed, Feb 17, 2016 at 4:45 AM, Shell Xu wrote: > I thought you are right. Here is the source code in python 2.7.11: > > long > PyObject_Hash(PyObject *v) > { > PyTypeObject *tp = v->ob_type; > if (tp->tp_hash != NULL) > return (*tp->tp_hash)(v); > /* To keep to the general practice that inheriting > * solely from object in C code should work without > * an explicit call to PyType_Ready, we implicitly call > * PyType_Ready here and then check the tp_hash slot again > */ > if (tp->tp_dict == NULL) { > if (PyType_Ready(tp) < 0) > return -1; > if (tp->tp_hash != NULL) > return (*tp->tp_hash)(v); > } > if (tp->tp_compare == NULL && RICHCOMPARE(tp) == NULL) { > return _Py_HashPointer(v); /* Use address as hash value */ > } > /* If there's a cmp but no hash defined, the object can't be hashed */ > return PyObject_HashNotImplemented(v); > } > > If object has hash function, it will be used. If not, _Py_HashPointer will > be used. Which _Py_HashSecret are not used. > And I checked reference of _Py_HashSecret. Only bufferobject, unicodeobject > and stringobject use _Py_HashSecret. 
> > On Wed, Feb 17, 2016 at 9:54 AM, Steven D'Aprano > wrote: >> >> On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote: >> > On 2/16/2016 1:48 AM, Christoph Groth wrote: >> > >Hello, >> > > >> > >Recent Python versions randomize the hashes of str, bytes and datetime >> > >objects. I suppose that the choice of these three types is the result >> > >of a compromise. Has this been discussed somewhere publicly? >> > >> > Search archives of this list... it was discussed at length. >> >> There's a lot of discussion on the mailing list. I think that this is >> the very start of it, in Dec 2011: >> >> https://mail.python.org/pipermail/python-dev/2011-December/115116.html >> >> and continuing into 2012, for example: >> >> https://mail.python.org/pipermail/python-dev/2012-January/115577.html >> https://mail.python.org/pipermail/python-dev/2012-January/115690.html >> >> and a LOT more, spread over many different threads and subject lines. >> >> You should also read the issue on the bug tracker: >> >> http://bugs.python.org/issue13703 >> >> >> My recollection is that it was decided that only strings and bytes need >> to have their hashes randomized, because only strings and bytes can be >> used directly from user-input without first having a conversion step >> with likely input range validation. In addition, changing the hash for >> ints would break too much code for too little benefit: unlike strings, >> where hash collision attacks on web apps are proven and easy, hash >> collision attacks based on ints are more difficult and rare. >> >> See also the comment here: >> >> http://bugs.python.org/issue13703#msg151847 >> >> >> >> > >I'm not a web programmer, but don't web applications also use >> > >dictionaries that are indexed by, say, tuples of integers? >> > >> > Sure, and that is the biggest part of the reason they were randomized. 
>> >> But they aren't, as far as I can see: >> >> [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))" >> 1071302475 >> [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))" >> 1071302475 >> >> Web apps can use dicts indexed by anything that they like, but unless >> there is an actual attack, what does it matter? Guido makes a good point >> about security here: >> >> https://mail.python.org/pipermail/python-dev/2013-October/129181.html >> >> >> >> > I think hashes of all types have been randomized, not _just_ the list >> > you mentioned. >> >> I'm pretty sure that's not actually the case. Using 3.6 from the repo >> (admittedly not fully up to date though), I can see hash randomization >> working for strings: >> >> [steve at ando 3.6]$ ./python -c "print(hash('abc'))" >> 11601873 >> [steve at ando 3.6]$ ./python -c "print(hash('abc'))" >> -2009889747 >> >> but not for ints: >> >> [steve at ando 3.6]$ ./python -c "print(hash(42))" >> 42 >> [steve at ando 3.6]$ ./python -c "print(hash(42))" >> 42 >> >> >> which agrees with my recollection that only strings and bytes would be >> randomized. >> >> >> >> -- >> Steve >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/shell909090%40gmail.com > > > > > -- > ????????????????????????????????? 
> blog: http://shell909090.org/blog/ > twitter: @shell909090 > about.me: http://about.me/shell909090 > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com > From christoph at grothesque.org Wed Feb 17 04:49:15 2016 From: christoph at grothesque.org (Christoph Groth) Date: Wed, 17 Feb 2016 10:49:15 +0100 Subject: [Python-Dev] Hash randomization for which types? References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> <22211.55646.297661.231923@turnbull.sk.tsukuba.ac.jp> Message-ID: <878u2jis4k.fsf@grothesque.org> Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > I think hashes of all types have been randomized, not _just_ the list > > you mentioned. > > Yes. There's only one hash function used, which operates on byte > streams IIRC. That function now has a random offset. The details of > hashing each type are in the serializations to byte streams. Could you please elaborate? Numbers are not hashed as byte streams, at least not up to Python 3.5. I am quite familiar with the way hashing of numbers is done in Python 2 & 3. (I had to re-implement this for a project of mine: https://pypi.python.org/pypi/tinyarray/) From antoine at python.org Wed Feb 17 06:04:49 2016 From: antoine at python.org (Antoine Pitrou) Date: Wed, 17 Feb 2016 11:04:49 +0000 (UTC) Subject: [Python-Dev] Wordcode v2 References: Message-ID: Demur Rumed gmail.com> writes: > I've personally benchmarked this fork with positive results. I'm skeptical of claims like this. What did you benchmark exactly, and with which results? I don't think changing the opcode encoding per se will bring any large benefit... Regards Antoine. 
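[Editor's note: Shell Xu's reading of PyObject_Hash above is easy to confirm from pure Python. A type that fills the tp_hash slot (by defining __hash__) controls its own hash; a plain object falls back to the address-derived default, so two equal-valued live instances will, in practice, hash differently. The class names here are made up for the demonstration.]

```python
class IdentityHashed:
    """No __hash__ defined: tp_hash falls back to the address-based default."""
    def __init__(self, x):
        self.x = x

class ValueHashed:
    """Defining __hash__ fills the tp_hash slot, so the value drives the hash."""
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return isinstance(other, ValueHashed) and self.x == other.x
    def __hash__(self):
        return hash(self.x)

# With tp_hash defined, equal values hash equally, regardless of identity.
assert hash(ValueHashed(1)) == hash(ValueHashed(1)) == hash(1)

# Without it, the hash is derived from the object's address, so two
# distinct live instances almost always hash differently.
a, b = IdentityHashed(1), IdentityHashed(1)
print(hash(a), hash(b))  # two unrelated, address-derived values
```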
From larry at hastings.org Wed Feb 17 08:29:31 2016 From: larry at hastings.org (Larry Hastings) Date: Wed, 17 Feb 2016 08:29:31 -0500 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: <22211.55646.297661.231923@turnbull.sk.tsukuba.ac.jp> References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> <22211.55646.297661.231923@turnbull.sk.tsukuba.ac.jp> Message-ID: <56C475BB.8070707@hastings.org> On 02/16/2016 09:22 PM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > I think hashes of all types have been randomized, not _just_ the list > > you mentioned. > > Yes. There's only one hash function used, which operates on byte > streams IIRC. That function now has a random offset. The details of > hashing each type are in the serializations to byte streams. Both these statements are wrong. int objects have their own hash algorithm, built in to long_hash() in Objects/longobject.c. The hash of an int is the value of the int, unless it's -1 or doesn't fit into the native type. And ints don't participate in hash randomization. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Feb 17 08:34:33 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 17 Feb 2016 22:34:33 +0900 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: <878u2jis4k.fsf@grothesque.org> References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> <22211.55646.297661.231923@turnbull.sk.tsukuba.ac.jp> <878u2jis4k.fsf@grothesque.org> Message-ID: <22212.30441.653905.908389@turnbull.sk.tsukuba.ac.jp> Christoph Groth writes: > Stephen J. Turnbull wrote: > > Yes. There's only one hash function used, which operates on byte > > streams IIRC. That function now has a random offset. The details of > > hashing each type are in the serializations to byte streams. > > Could you please elaborate? 
Numbers are not hashed as byte streams, Just a stupid mistake on my part. Should have reviewed the code first. I'll shut up now, take my fly meds, get some sleep, drink coffee in the morning, and then take an axe to my keyboard. :-( Steve From rosuav at gmail.com Wed Feb 17 08:49:58 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 18 Feb 2016 00:49:58 +1100 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: <56C475BB.8070707@hastings.org> References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> <22211.55646.297661.231923@turnbull.sk.tsukuba.ac.jp> <56C475BB.8070707@hastings.org> Message-ID: On Thu, Feb 18, 2016 at 12:29 AM, Larry Hastings wrote: > int objects have their own hash algorithm, built in to long_hash() in > Objects/longobject.c. The hash of an int is the value of the int, unless > it's -1 or doesn't fit into the native type. Can someone elaborate on this special case, please? I can see the code there, but there's no comment. Is there some value in not hashing to -1? ChrisA From larry at hastings.org Wed Feb 17 09:05:24 2016 From: larry at hastings.org (Larry Hastings) Date: Wed, 17 Feb 2016 09:05:24 -0500 Subject: [Python-Dev] Hash randomization for which types? In-Reply-To: References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> <22211.55646.297661.231923@turnbull.sk.tsukuba.ac.jp> <56C475BB.8070707@hastings.org> Message-ID: <56C47E24.2090807@hastings.org> On 02/17/2016 08:49 AM, Chris Angelico wrote: > On Thu, Feb 18, 2016 at 12:29 AM, Larry Hastings wrote: >> int objects have their own hash algorithm, built in to long_hash() in >> Objects/longobject.c. The hash of an int is the value of the int, unless >> it's -1 or doesn't fit into the native type. > Can someone elaborate on this special case, please? I can see the code > there, but there's no comment. Is there some value in not hashing to > -1? Returning -1 indicates an error / exception. 
So hash functions never return -1 as a hash value. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph at grothesque.org Wed Feb 17 09:51:50 2016 From: christoph at grothesque.org (Christoph Groth) Date: Wed, 17 Feb 2016 15:51:50 +0100 Subject: [Python-Dev] Hash randomization for which types? References: <87k2m50ywa.fsf@grothesque.org> <56C37F07.8020904@g.nevcal.com> <20160217015445.GA12028@ando.pearwood.info> Message-ID: <87r3gbgzjt.fsf@grothesque.org> Steven D'Aprano wrote: > On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote: >> On 2/16/2016 1:48 AM, Christoph Groth wrote: >> >Recent Python versions randomize the hashes of str, bytes and datetime >> >objects. I suppose that the choice of these three types is the result >> >of a compromise. Has this been discussed somewhere publicly? >> >> Search archives of this list... it was discussed at length. > > There's a lot of discussion on the mailing list. I think that this is > the very start of it, in Dec 2011: > (...) I tried searching myself for an hour or so, but though I found many discussions, I didn't see any discussion about whether hashes of other types should be randomized as well. The relevant PEP also doesn't touch this issue. > My recollection is that it was decided that only strings and bytes need > to have their hashes randomized, because only strings and bytes can be > used directly from user-input without first having a conversion step > with likely input range validation. In addition, changing the hash for > ints would break too much code for too little benefit: unlike strings, > where hash collision attacks on web apps are proven and easy, hash > collision attacks based on ints are more difficult and rare. > > See also the comment here: > > http://bugs.python.org/issue13703#msg151847 Perfect, that's exactly what I was looking for. I am reassured that this has been thought through. Thanks a lot! 
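[Editor's note: the -1 special case discussed above is easy to observe. Since a C-level tp_hash return of -1 signals an exception, CPython substitutes -2 for objects that would otherwise hash to -1; this is a CPython implementation detail, not a language guarantee.]

```python
# Small ints normally hash to themselves...
assert hash(0) == 0
assert hash(42) == 42

# ...except -1, which would look like an error return at the C level,
# so long_hash() maps it to -2. As a side effect, -1 and -2 collide.
assert hash(-1) == -2
assert hash(-1) == hash(-2)
```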
Christoph From mike.kaplinskiy at gmail.com Wed Feb 17 01:20:40 2016 From: mike.kaplinskiy at gmail.com (Mike Kaplinskiy) Date: Tue, 16 Feb 2016 22:20:40 -0800 Subject: [Python-Dev] Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) In-Reply-To: References: Message-ID: On Tue, Feb 16, 2016 at 9:42 PM, Gregory P. Smith wrote: > > On Tue, Feb 16, 2016 at 9:00 PM Mike Kaplinskiy > wrote: > >> Hey folks, >> >> I hope this is the right list for this sort of thing (python-ideas seemed >> more far-fetched). >> >> For some context: there is currently an issue with pex that causes >> sys.modules lookups to stop working for __main__. In turn this makes >> unittest.run() & pkg_resources.resource_* fail. The root cause is that pex >> uses runpy.run_module with alter_sys=False. The fix should be to just pass >> alter_sys=True, but that changes sys.argv[0] and various existing pex files >> depend on that being the pex file. You can read more at >> https://github.com/pantsbuild/pex/pull/211 . >> >> Conservatively, I'd like to propose adding an argument to disable this >> behavior. The current behavior breaks a somewhat reasonable invariant that >> you can restart your program via `os.execv([sys.executable] + sys.argv)`. >> > > I don't know enough about pex to really dig into what it is trying to do > so this is tangential to answering your question but: > Sorry about that - a pex file is kind of like a relocatable virtualenv in one zip file. When it runs, it first executes some pex-specific code to extract packages (.egg, .whl) and add them to sys.path before running the actual user code. It's conceptually similar to a fat .jar file in JVM projects - all you need is `python`/`java` and all the code is in one file. > sys.executable may be None. ex: If you're an embedded Python interpreter > there is no Python executable. It cannot be blindly used to re-execute the > current process. > > sys.argv represents the C main() argv array.
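As an aside, the argv[0] swap under discussion, and its restoration on return, can be observed directly. A sketch, where the probe module and temp-dir plumbing are purely illustrative:

```python
import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    # A throwaway module that records what sys.argv[0] looked like while it ran.
    with open(os.path.join(d, "probe.py"), "w") as f:
        f.write("import sys\nseen_argv0 = sys.argv[0]\n")
    sys.path.insert(0, d)
    try:
        before = sys.argv[0]
        ns = runpy.run_module("probe", run_name="__main__", alter_sys=True)
        # While the module ran, argv[0] pointed at its source file...
        assert ns["seen_argv0"].endswith("probe.py")
        assert ns["seen_argv0"] != before
        # ...and the original value was put back afterwards.
        assert sys.argv[0] == before
    finally:
        sys.path.remove(d)
```

Both the swap and the restore are documented behavior of alter_sys=True; the complaint in this thread is that there is no way to opt out of the swap while keeping the sys.modules handling.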
Your inclination (in the > linked-to bug above) to leave sys.argv[0] alone is a good one. > I was originally going to argue for getting rid of the feature entirely, but if runpy is to live up to the promise of being exactly the same as `python -m XXX yyy zzz`, it needs to be there. IMO it's bad form to depend on sys.argv[0] for anything but presentation purposes - usage messages and the like. It's hard to justify breaking compatibility for that though - unfortunately the runpy interface isn't pliable enough to really reimplement or unimplement this feature, and doing it by hand is...painful. You can also make an argument that `python runmodule.py module a b` and `python -m module a b` should produce _exactly_ the same output, especially if `runmodule.py` is implementing something pass-through like profiling, coverage or tracing. A nicer interface might be some sort of callback to "do whatever you want before the module is executed", but that might be overkill. > > -gps > > Moreover it might be user-friendly to add an `argv=sys.argv[1:]` argument >> to set & restore the full arguments to the module, where `argv=None` >> disables argv[0] switching. >> >> What do you think? >> >> Mike. >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Feb 17 13:44:52 2016 From: brett at python.org (Brett Cannon) Date: Wed, 17 Feb 2016 18:44:52 +0000 Subject: [Python-Dev] Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) In-Reply-To: References: Message-ID: On Tue, 16 Feb 2016 at 20:59 Mike Kaplinskiy wrote: > Hey folks, > > I hope this is the right list for this sort of thing (python-ideas seemed > more far-fetched).
> > For some context: there is currently an issue with pex that causes > sys.modules lookups to stop working for __main__. In turn this makes > unittest.run() & pkg_resources.resource_* fail. The root cause is that pex > uses runpy.run_module with alter_sys=False. The fix should be to just pass > alter_sys=True, but that changes sys.argv[0] and various existing pex files > depend on that being the pex file. You can read more at > https://github.com/pantsbuild/pex/pull/211 . > > Conservatively, I'd like to propose adding an argument to disable this > behavior. The current behavior breaks a somewhat reasonable invariant that > you can restart your program via `os.execv([sys.executable] + sys.argv)`. > Moreover it might be user-friendly to add an `argv=sys.argv[1:]` argument to > set & restore the full arguments to the module, where `argv=None` disables > argv[0] switching. > > What do you think? > This probably is best served as a feature request on bugs.python.org since it's not asking for some massive change or new feature but just a minor tweak to a module. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Wed Feb 17 13:44:44 2016 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 17 Feb 2016 18:44:44 +0000 Subject: [Python-Dev] Buffer overflow bug in GNU C's getaddrinfo() Message-ID: <56C4BF9C.7060607@mrabarnett.plus.com> Is this something that we need to worry about?
Extremely severe bug leaves dizzying number of software and devices vulnerable http://arstechnica.com/security/2016/02/extremely-severe-bug-leaves-dizzying-number-of-apps-and-devices-vulnerable/ From abarnert at yahoo.com Wed Feb 17 15:09:59 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 17 Feb 2016 12:09:59 -0800 Subject: [Python-Dev] Buffer overflow bug in GNU C's getaddrinfo() In-Reply-To: <56C4BF9C.7060607@mrabarnett.plus.com> References: <56C4BF9C.7060607@mrabarnett.plus.com> Message-ID: On Feb 17, 2016, at 10:44, MRAB wrote: > > Is this something that we need to worry about? > > Extremely severe bug leaves dizzying number of software and devices vulnerable > http://arstechnica.com/security/2016/02/extremely-severe-bug-leaves-dizzying-number-of-apps-and-devices-vulnerable/ Is there a workaround that Python and/or Python apps should be doing, or is this just a matter of everyone on glibc 2.9+ needs to update their glibc? From guido at python.org Wed Feb 17 16:21:27 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Feb 2016 13:21:27 -0800 Subject: [Python-Dev] Buffer overflow bug in GNU C's getaddrinfo() In-Reply-To: References: <56C4BF9C.7060607@mrabarnett.plus.com> Message-ID: Does python.org serve any Python binaries that are statically linked with a vulnerable glibc? That seems to be the question. If not, it's up to the downstream distributions. On Wed, Feb 17, 2016 at 12:09 PM, Andrew Barnert via Python-Dev wrote: > On Feb 17, 2016, at 10:44, MRAB wrote: >> >> Is this something that we need to worry about? >> >> Extremely severe bug leaves dizzying number of software and devices vulnerable >> http://arstechnica.com/security/2016/02/extremely-severe-bug-leaves-dizzying-number-of-apps-and-devices-vulnerable/ > > Is there a workaround that Python and/or Python apps should be doing, or is this just a matter of everyone on glibc 2.9+ needs to update their glibc? 
> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From greg at krypto.org Wed Feb 17 16:46:40 2016 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 17 Feb 2016 21:46:40 +0000 Subject: [Python-Dev] Buffer overflow bug in GNU C's getaddrinfo() In-Reply-To: References: <56C4BF9C.7060607@mrabarnett.plus.com> Message-ID: On Wed, Feb 17, 2016 at 12:12 PM Andrew Barnert via Python-Dev < python-dev at python.org> wrote: > On Feb 17, 2016, at 10:44, MRAB wrote: > > > > Is this something that we need to worry about? > > > > Extremely severe bug leaves dizzying number of software and devices > vulnerable > > > http://arstechnica.com/security/2016/02/extremely-severe-bug-leaves-dizzying-number-of-apps-and-devices-vulnerable/ > > Is there a workaround that Python and/or Python apps should be doing, or > is this just a matter of everyone on glibc 2.9+ needs to update their glibc? > There are no workarounds that we could put within Python. People need to update their glibc and reboot. All *useful(*)* Linux distros have already released update packages. All of the infrastructure running Linux needs the update applied and a reboot (I'm guessing our infrastructure peeps have already done that). But this also includes Linux buildbots run by our random set of buildbot donors. -gps (*) off topic: Raspbian Wheezy is apparently not on the useful list. -------------- next part -------------- An HTML attachment was scrubbed... URL: From randyeels at gmail.com Wed Feb 17 17:24:02 2016 From: randyeels at gmail.com (Randy Eels) Date: Wed, 17 Feb 2016 23:24:02 +0100 Subject: [Python-Dev] When does `PyType_Type.tp_alloc get assigned to PyType_GenericAlloc ? 
In-Reply-To: References: Message-ID: On Sun, Feb 7, 2016 at 8:45 PM, Guido van Rossum wrote: > I think it's probably line 2649 in typeobject.c, in type_new(): > > type->tp_alloc = PyType_GenericAlloc; I pondered it but it doesn't seem to be that. Isn't `type_new` called *after* PyType_Type.tp_alloc has been called? I thought that line was only being executed for user-defined types, and maybe some built-in types, but certainly not PyType_Type. On Sun, Feb 7, 2016 at 9:27 PM, eryk sun wrote: > On Sun, Feb 7, 2016 at 7:58 AM, Randy Eels wrote: > > > > Yet, I can't seem to understand where and when does the `tp_alloc` slot > of > > PyType_Type get re-assigned to PyType_GenericAlloc. Does that even > happen? > > Or am I missing something bigger? > > _Py_InitializeEx_Private in Python/pylifecycle.c calls _Py_ReadyTypes > in Objects/object.c. This calls PyType_Ready(&PyType_Type) in > Objects/typeobject.c, which assigns type->tp_base = &PyBaseObject_Type > and then calls inherit_slots. This executes COPYSLOT(tp_alloc), which > assigns PyType_Type.tp_alloc = PyBaseObject_Type.tp_alloc, which is > statically assigned as PyType_GenericAlloc. > > Debug trace on Windows: > > 0:000> bp python35!PyType_Ready > 0:000> g > Breakpoint 0 hit > python35!PyType_Ready: > 00000000`6502d160 4053 push rbx > 0:000> ?? ((PyTypeObject *)@rcx)->tp_name > char * 0x00000000`650e4044 > "object" > 0:000> g > Breakpoint 0 hit > python35!PyType_Ready: > 00000000`6502d160 4053 push rbx > 0:000> ?? ((PyTypeObject *)@rcx)->tp_name > char * 0x00000000`651d8e5c > "type" > 0:000> bp python35!inherit_slots > 0:000> g > Breakpoint 1 hit > python35!inherit_slots: > 00000000`6502c440 48895c2408 mov qword ptr [rsp+8],rbx > ss:00000000`0028f960={ > python35!PyType_Type > (00000000`6527cba0)} > > At entry to inherit_slots, PyType_Type.tp_alloc is NULL: > > 0:000> ?? 
python35!PyType_Type.tp_alloc > * 0x00000000`00000000 > 0:000> pt > python35!inherit_slots+0xd17: > 00000000`6502d157 c3 ret > > At exit it's set to PyType_GenericAlloc: > > 0:000> ?? python35!PyType_Type.tp_alloc > * 0x00000000`65025580 > 0:000> ln 65025580 > (00000000`65025580) python35!PyType_GenericAlloc | > (00000000`650256a0) python35!PyType_GenericNew > Exact matches: > python35!PyType_GenericAlloc (void) > This makes quite a bit of sense. I completely overlooked the interpreter init routines. Thank you both Guido and eryk! -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.hirschfeld at gmail.com Wed Feb 17 18:26:29 2016 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 17 Feb 2016 23:26:29 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?RE_25939_-_=5Fssl=2Eenum=5Fcertificates_br?= =?utf-8?q?oken_on_Windows?= Message-ID: I've run into issue 25939 (https://bugs.python.org/issue25939) when trying to deploy a python webapp with IIS on Windows. This issue is preventing us from deploying the app to production as the workaround AFAICT requires running the app under an admin account. Apologies if this is an inappropriate forum for a +1 but I just wanted to let the devs know that this is an issue which affects the use of Python (on Windows) in the enterprise. I noticed that the patch hasn't been merged yet so was interested in making sure it didn't fall by the wayside... As a mere user I don't expect the devs to prioritize my own problems which no doubt only affect a very small number of python users but I would be very grateful if the patch did make it into a minor release. 
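For reference, the failing call is a one-liner against a Windows system certificate store. A guarded sketch (the API only exists on Windows, so this degrades to an empty list elsewhere; the "ROOT" store name is just the usual example):

```python
import ssl

# ssl.enum_certificates(store_name) enumerates a Windows system cert store;
# before the fix it raised PermissionError (ERROR_ACCESS_DENIED) when the
# process ran under a restricted service account such as an IIS app pool.
if hasattr(ssl, "enum_certificates"):
    certs = ssl.enum_certificates("ROOT")   # Windows-only API
else:
    certs = []                              # non-Windows fallback for this sketch

assert isinstance(certs, list)
for cert_bytes, encoding, trust in certs[:3]:
    # Each entry is (DER bytes, encoding type string, trust info).
    assert isinstance(cert_bytes, bytes)
```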
Regards, Dave From p.f.moore at gmail.com Thu Feb 18 05:49:45 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 18 Feb 2016 10:49:45 +0000 Subject: [Python-Dev] RE 25939 - _ssl.enum_certificates broken on Windows In-Reply-To: References: Message-ID: On 17 February 2016 at 23:26, Dave Hirschfeld wrote: > I've run into issue 25939 (https://bugs.python.org/issue25939) when trying > to deploy a python webapp with IIS on Windows. This issue is preventing us > from deploying the app to production as the workaround AFAICT requires > running the app under an admin account. > > Apologies if this is an inappropriate forum for a +1 but I just wanted to > let the devs know that this is an issue which affects the use of Python > (on Windows) in the enterprise. I noticed that the patch hasn't been > merged yet so was interested in making sure it didn't fall by the > wayside... > > As a mere user I don't expect the devs to prioritize my own problems which > no doubt only affect a very small number of python users but I would be > very grateful if the patch did make it into a minor release. Looks like Benjamin has committed the fix. Paul From mike.kaplinskiy at gmail.com Thu Feb 18 17:37:51 2016 From: mike.kaplinskiy at gmail.com (Mike Kaplinskiy) Date: Thu, 18 Feb 2016 14:37:51 -0800 Subject: [Python-Dev] Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) In-Reply-To: References: Message-ID: Done: http://bugs.python.org/issue26388 On Wed, Feb 17, 2016 at 10:44 AM, Brett Cannon wrote: > > > On Tue, 16 Feb 2016 at 20:59 Mike Kaplinskiy > wrote: > >> Hey folks, >> >> I hope this is the right list for this sort of thing (python-ideas seemed >> more far-fetched). >> >> For some context: there is currently a issue with pex that causes >> sys.modules lookups to stop working for __main__. In turns this makes >> unittest.run() & pkg_resources.resource_* fail. The root cause is that pex >> uses runpy.run_module with alter_sys=False. 
The fix should be to just pass >> alter_sys=True, but that changes sys.argv[0] and various existing pex files >> depend on that being the pex file. You can read more at >> https://github.com/pantsbuild/pex/pull/211 . >> >> Conservatively, I'd like to propose adding an argument to disable this >> behavior. The current behavior breaks a somewhat reasonable invariant that >> you can restart your program via `os.execv([sys.executable] + sys.argv)`. >> Moreover it might be user-friendly to add a `argv=sys.argv[1:]` argument to >> set & restore the full arguments to the module, where `argv=None` disables >> argv[0] switching. >> >> What do you think? >> > > This probably is best served as a feature request on bugs.python.org > since it's not asking for some massive change or new feature but just a > minor tweak to a module. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.hirschfeld at gmail.com Thu Feb 18 18:18:07 2016 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Thu, 18 Feb 2016 23:18:07 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?RE_25939_-_=5Fssl=2Eenum=5Fcertificates_b?= =?utf-8?q?roken_on_Windows?= References: Message-ID: Paul Moore gmail.com> writes: > > On 17 February 2016 at 23:26, Dave Hirschfeld gmail.com> wrote: > > I've run into issue 25939 (https://bugs.python.org/issue25939) when trying > > to deploy a python webapp with IIS on Windows. This issue is preventing us > > from deploying the app to production as the workaround AFAICT requires > > running the app under an admin account. > > > > As a mere user I don't expect the devs to prioritize my own problems which > > no doubt only affect a very small number of python users but I would be > > very grateful if the patch did make it into a minor release. > > Looks like Benjamin has committed the fix. > > Paul > The issue is still open because of an unresolved question about testing but the patch has been committed so thanks for that! 
Reminds me of the saying "perfection is the enemy of shipped". I'd help with getting the test committed but unfortunately it's well outside my area of expertise... -Dave From steve.dower at python.org Thu Feb 18 19:18:20 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Feb 2016 16:18:20 -0800 Subject: [Python-Dev] RE 25939 - _ssl.enum_certificates broken on Windows In-Reply-To: References: Message-ID: I think the test is blocked on my question of whether we are allowed to rely on ctypes in the test suite. If so, it's fine as I recall. Fairly sure it's a Windows-specific test anyway, so ctypes can basically be assumed for all Windows platforms we currently care about. Top-posted from my Windows Phone -----Original Message----- From: "Dave Hirschfeld" Sent: ?2/?18/?2016 15:20 To: "python-dev at python.org" Subject: Re: [Python-Dev] RE 25939 - _ssl.enum_certificates broken on Windows Paul Moore gmail.com> writes: > > On 17 February 2016 at 23:26, Dave Hirschfeld gmail.com> wrote: > > I've run into issue 25939 (https://bugs.python.org/issue25939) when trying > > to deploy a python webapp with IIS on Windows. This issue is preventing us > > from deploying the app to production as the workaround AFAICT requires > > running the app under an admin account. > > > > As a mere user I don't expect the devs to prioritize my own problems which > > no doubt only affect a very small number of python users but I would be > > very grateful if the patch did make it into a minor release. > > Looks like Benjamin has committed the fix. > > Paul > The issue is still open because of an unresolved question about testing but the patch has been committed so thanks for that! Reminds me of the saying "perfection is the enemy of shipped". I'd help with getting the test committed but unfortunately it's well outside my area of expertise... 
-Dave _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40python.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Thu Feb 18 19:24:45 2016 From: eric at trueblade.com (Eric V. Smith) Date: Thu, 18 Feb 2016 19:24:45 -0500 Subject: [Python-Dev] RE 25939 - _ssl.enum_certificates broken on Windows In-Reply-To: References: Message-ID: <9FFCE035-9F6A-46FC-B2F4-0BE0E3FE51A1@trueblade.com> There are already many tests that require ctypes. See for example test_uuid.py. -- Eric. > On Feb 18, 2016, at 7:18 PM, Steve Dower wrote: > > I think the test is blocked on my question of whether we are allowed to rely on ctypes in the test suite. > > If so, it's fine as I recall. Fairly sure it's a Windows-specific test anyway, so ctypes can basically be assumed for all Windows platforms we currently care about. > > Top-posted from my Windows Phone > From: Dave Hirschfeld > Sent: ?2/?18/?2016 15:20 > To: python-dev at python.org > Subject: Re: [Python-Dev] RE 25939 - _ssl.enum_certificates broken on Windows > > Paul Moore gmail.com> writes: > > > > > On 17 February 2016 at 23:26, Dave Hirschfeld > gmail.com> wrote: > > > I've run into issue 25939 (https://bugs.python.org/issue25939) when > trying > > > to deploy a python webapp with IIS on Windows. This issue is > preventing us > > > from deploying the app to production as the workaround AFAICT > requires > > > running the app under an admin account. > > > > > > > As a mere user I don't expect the devs to prioritize my own problems > which > > > no doubt only affect a very small number of python users but I would > be > > > very grateful if the patch did make it into a minor release. > > > > Looks like Benjamin has committed the fix. 
> > > > Paul > > > > The issue is still open because of an unresolved question about testing > but the patch has been committed so thanks for that! Reminds me of the > saying "perfection is the enemy of shipped". > > I'd help with getting the test committed but unfortunately it's well > outside my area of expertise... > > > -Dave > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40python.org > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From status at bugs.python.org Fri Feb 19 12:08:34 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 19 Feb 2016 18:08:34 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160219170834.A1CF656668@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-02-12 - 2016-02-19) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 5432 (+15) closed 32718 (+27) total 38150 (+42) Open issues with patches: 2375 Issues opened (35) ================== #25939: _ssl.enum_certificates() fails with ERROR_ACCESS_DENIED if pyt http://bugs.python.org/issue25939 reopened by benjamin.peterson #26351: Occasionally check for Ctrl-C in long-running operations like http://bugs.python.org/issue26351 opened by steven.daprano #26352: getpass incorrectly displays password prompt on stderr on fall http://bugs.python.org/issue26352 opened by Matt Hooks #26353: IDLE: Saving Shell should not add \n http://bugs.python.org/issue26353 opened by terry.reedy #26355: Emit major version based canonical URLs for docs http://bugs.python.org/issue26355 opened by ncoghlan #26357: asyncio.wait loses coroutine return value http://bugs.python.org/issue26357 opened by André Caron #26358: mmap.mmap.__iter__ is broken (yields bytes instead of ints) http://bugs.python.org/issue26358 opened by ztane #26359: CPython build options for out-of-the box performance http://bugs.python.org/issue26359 opened by alecsandru.patrascu #26360: Deadlock in thread.join on Python 2.7/Mac OS X 10.9 http://bugs.python.org/issue26360 opened by mark.dickinson #26362: Approved API for creating a temporary file path http://bugs.python.org/issue26362 opened by bignose #26363: __builtins__ propagation is misleading described in exec and e http://bugs.python.org/issue26363 opened by xcombelle #26366: Use “.. versionadded” over “.. versionchanged” where a http://bugs.python.org/issue26366 opened by Tony R.
#26367: importlib.__import__ does not fail for invalid relative import http://bugs.python.org/issue26367 opened by mjacob #26369: doc for unicode.decode and str.encode is unnecessarily confusi http://bugs.python.org/issue26369 opened by benspiller #26370: shelve filename inconsistent between platforms http://bugs.python.org/issue26370 opened by Dima.Tisnek #26372: Popen.communicate not ignoring BrokenPipeError http://bugs.python.org/issue26372 opened by memeplex #26373: asyncio: add support for async context manager on streams? http://bugs.python.org/issue26373 opened by haypo #26374: concurrent_futures Executor.map semantics better specified in http://bugs.python.org/issue26374 opened by F.D. Sacerdoti #26375: New versions of Python hangs on imaplib.IMAP4_SSL() http://bugs.python.org/issue26375 opened by mniklas #26376: Tkinter root window won't close if packed. http://bugs.python.org/issue26376 opened by Sam Yeager #26377: Tkinter dialogs will not close if root window not packed. http://bugs.python.org/issue26377 opened by Sam Yeager #26379: zlib decompress as_bytearray flag http://bugs.python.org/issue26379 opened by llllllllll #26380: Add an http method enum http://bugs.python.org/issue26380 opened by demian.brecht #26381: Add 'geo' URI scheme (RFC 5870) to urllib.parse.uses_params http://bugs.python.org/issue26381 opened by Serhiy Int #26382: List object memory allocator http://bugs.python.org/issue26382 opened by catalin.manciu #26383: benchmarks (perf.py): number of decimal places in csv output http://bugs.python.org/issue26383 opened by florin.papa #26384: UnboundLocalError in socket._sendfile_use_sendfile http://bugs.python.org/issue26384 opened by berker.peksag #26385: the call of tempfile.NamedTemporaryFile fails and leaves a fil http://bugs.python.org/issue26385 opened by Eugene Viktorov #26386: tkinter - Treeview - .selection_add and selection_toggle http://bugs.python.org/issue26386 opened by gbarnabic #26387: Race condition in sqlite module 
http://bugs.python.org/issue26387 opened by scorp #26388: Disabling changing sys.argv[0] with runpy.run_module(...alter_ http://bugs.python.org/issue26388 opened by Mike Kaplinskiy #26389: Expand traceback module API to accept just an exception as an http://bugs.python.org/issue26389 opened by brett.cannon #26390: hashlib's pbkdf2_hmac documentation "rounds" does not match so http://bugs.python.org/issue26390 opened by dbakker #26391: Specialized sub-classes of Generic never call __init__ http://bugs.python.org/issue26391 opened by Kai Wohlfahrt #26392: socketserver.BaseServer.close_server should stop serve_forever http://bugs.python.org/issue26392 opened by palaviv Most recent 15 issues with no replies (15) ========================================== #26392: socketserver.BaseServer.close_server should stop serve_forever http://bugs.python.org/issue26392 #26391: Specialized sub-classes of Generic never call __init__ http://bugs.python.org/issue26391 #26389: Expand traceback module API to accept just an exception as an http://bugs.python.org/issue26389 #26386: tkinter - Treeview - .selection_add and selection_toggle http://bugs.python.org/issue26386 #26384: UnboundLocalError in socket._sendfile_use_sendfile http://bugs.python.org/issue26384 #26383: benchmarks (perf.py): number of decimal places in csv output http://bugs.python.org/issue26383 #26377: Tkinter dialogs will not close if root window not packed. http://bugs.python.org/issue26377 #26375: New versions of Python hangs on imaplib.IMAP4_SSL() http://bugs.python.org/issue26375 #26373: asyncio: add support for async context manager on streams? 
http://bugs.python.org/issue26373 #26363: __builtins__ propagation is misleading described in exec and e http://bugs.python.org/issue26363 #26359: CPython build options for out-of-the box performance http://bugs.python.org/issue26359 #26358: mmap.mmap.__iter__ is broken (yields bytes instead of ints) http://bugs.python.org/issue26358 #26353: IDLE: Saving Shell should not add \n http://bugs.python.org/issue26353 #26338: remove duplicate bind addresses in create_server http://bugs.python.org/issue26338 #26335: Make mmap.write return the number of bytes written like other http://bugs.python.org/issue26335 Most recent 15 issues waiting for review (15) ============================================= #26392: socketserver.BaseServer.close_server should stop serve_forever http://bugs.python.org/issue26392 #26390: hashlib's pbkdf2_hmac documentation "rounds" does not match so http://bugs.python.org/issue26390 #26387: Race condition in sqlite module http://bugs.python.org/issue26387 #26385: the call of tempfile.NamedTemporaryFile fails and leaves a fil http://bugs.python.org/issue26385 #26384: UnboundLocalError in socket._sendfile_use_sendfile http://bugs.python.org/issue26384 #26382: List object memory allocator http://bugs.python.org/issue26382 #26380: Add an http method enum http://bugs.python.org/issue26380 #26379: zlib decompress as_bytearray flag http://bugs.python.org/issue26379 #26372: Popen.communicate not ignoring BrokenPipeError http://bugs.python.org/issue26372 #26367: importlib.__import__ does not fail for invalid relative import http://bugs.python.org/issue26367 #26366: Use “.. versionadded” over “.. versionchanged”
where a http://bugs.python.org/issue26366 #26359: CPython build options for out-of-the box performance http://bugs.python.org/issue26359 #26352: getpass incorrectly displays password prompt on stderr on fall http://bugs.python.org/issue26352 #26347: BoundArguments.apply_defaults doesn't handle empty arguments http://bugs.python.org/issue26347 #26342: Faster bit ops for single-digit positive longs http://bugs.python.org/issue26342 Top 10 most discussed issues (10) ================================= #15873: datetime: add ability to parse RFC 3339 dates and times http://bugs.python.org/issue15873 29 msgs #26360: Deadlock in thread.join on Python 2.7/Mac OS X 10.9 http://bugs.python.org/issue26360 17 msgs #26323: Add assert_called() and assert_called_once() methods for mock http://bugs.python.org/issue26323 10 msgs #26366: Use “.. versionadded” over “.. versionchanged” where a http://bugs.python.org/issue26366 10 msgs #26331: Tokenizer: allow underscores for grouping in numeric literals http://bugs.python.org/issue26331 9 msgs #21145: Add the @cached_property decorator http://bugs.python.org/issue21145 8 msgs #26372: Popen.communicate not ignoring BrokenPipeError http://bugs.python.org/issue26372 8 msgs #26385: the call of tempfile.NamedTemporaryFile fails and leaves a fil http://bugs.python.org/issue26385 7 msgs #26380: Add an http method enum http://bugs.python.org/issue26380 6 msgs #14597: Cannot unload dll in ctypes until script exits http://bugs.python.org/issue14597 5 msgs Issues closed (28) ================== #15608: Improve socketserver doc http://bugs.python.org/issue15608 closed by martin.panter #16915: mode of socket.makefile is more limited than documentation sug http://bugs.python.org/issue16915 closed by berker.peksag #19841: ConfigParser PEP issues http://bugs.python.org/issue19841 closed by berker.peksag #20169: random module doc page has broken links http://bugs.python.org/issue20169 closed by python-dev #23992: multiprocessing: MapResult
shouldn't fail fast upon exception http://bugs.python.org/issue23992 closed by neologix #25179: PEP 498 f-strings need to be documented http://bugs.python.org/issue25179 closed by martin.panter #25713: Setuptools included with 64-bit Windows installer is outdated http://bugs.python.org/issue25713 closed by ned.deily #25833: pyvenv: venvs cannot be moved because activate scripts hard-co http://bugs.python.org/issue25833 closed by vinay.sajip #25887: awaiting on coroutine more than once should be an error http://bugs.python.org/issue25887 closed by yselivanov #25924: investigate if getaddrinfo(3) on OSX is thread-safe http://bugs.python.org/issue25924 closed by ned.deily #26265: build errors on OS X 10.11 with --enable-universalsdk http://bugs.python.org/issue26265 closed by ned.deily #26309: socketserver.BaseServer._handle_request_noblock() doesn't shut http://bugs.python.org/issue26309 closed by martin.panter #26316: Probable typo in Arg Clinic's linear_format() http://bugs.python.org/issue26316 closed by martin.panter #26318: `io.open(fd, ...).name` returns numeric fd instead of None http://bugs.python.org/issue26318 closed by terry.reedy #26319: Check recData size before unpack in zipfile http://bugs.python.org/issue26319 closed by terry.reedy #26327: IDLE: File > Save in 2.7 Shell with non-ascii fails http://bugs.python.org/issue26327 closed by terry.reedy #26333: Multiprocessing imap hangs when generator input errors http://bugs.python.org/issue26333 closed by terry.reedy #26334: bytes.translate() doesn't take keyword arguments; docs suggest http://bugs.python.org/issue26334 closed by Nicholas Chammas #26348: activate.fish sets VENV prompt incorrectly http://bugs.python.org/issue26348 closed by vinay.sajip #26349: Ship python35.lib with the embedded distribution, please http://bugs.python.org/issue26349 closed by steve.dower #26354: re.I does not work as expected http://bugs.python.org/issue26354 closed by ezio.melotti #26356: Registration 
http://bugs.python.org/issue26356 closed by SilentGhost #26361: lambda in dict comprehension is broken http://bugs.python.org/issue26361 closed by steven.daprano #26364: pip uses colour in messages that does not work on white termin http://bugs.python.org/issue26364 closed by berker.peksag #26365: ntpath.py Error in Windows http://bugs.python.org/issue26365 closed by serhiy.storchaka #26368: grammatical error in asyncio stream documentation http://bugs.python.org/issue26368 closed by ned.deily #26371: asynchat.async_chat and asyncore.dispatcher_with_send are not http://bugs.python.org/issue26371 closed by gvanrossum #26378: Typo in regex documentation http://bugs.python.org/issue26378 closed by python-dev From hodgestar+pythondev at gmail.com Fri Feb 19 14:30:18 2016 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Fri, 19 Feb 2016 21:30:18 +0200 Subject: [Python-Dev] Regular expression bytecode In-Reply-To: References: Message-ID: This might be tricky for alternative Python implementations which might compile regular expressions into something rather different. From jcgoble3 at gmail.com Fri Feb 19 14:36:51 2016 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Fri, 19 Feb 2016 14:36:51 -0500 Subject: [Python-Dev] Regular expression bytecode In-Reply-To: References: Message-ID: On Fri, Feb 19, 2016 at 2:30 PM, Simon Cross wrote: > This might be tricky for alternative Python implementations which > might compile regular expressions into something rather different. As has been discussed on python-ideas, it would be explicitly treated as a CPython implementation detail, so that wouldn't be an issue. That said, I've decided to shelve the idea for the time being, at least, as I've had some things come up unexpectedly, and I no longer have time to pursue this. 
From g.brandl at gmx.net Sun Feb 21 02:01:32 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 21 Feb 2016 08:01:32 +0100 Subject: [Python-Dev] Python 3.2.7 and 3.3.7 In-Reply-To: References: Message-ID: On 02/11/2016 06:34 PM, Georg Brandl wrote: > Hi all, > > I'm planning to release 3.2.7 and 3.3.7 at the end of February. > There will be a release candidate on Feb 20, and the final on > Feb 27, if there is no holdup. > > These are both security (source-only) releases. 3.2.7 will be the > last release from the 3.2 series. > > If you know of any patches that should go in, make sure to commit > them in time or notify me. FYI, these releases are currently on hold waiting for patches for a pending security issue (that we can't discuss publicly yet). cheers, Georg From tritium-list at sdamon.com Sun Feb 21 03:29:55 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Sun, 21 Feb 2016 03:29:55 -0500 Subject: [Python-Dev] Thank you. Message-ID: <56C97583.9070602@sdamon.com> I don't know if it is appropriate for this list, or not. I don't exactly care. As much as I might disagree with some of you... Thank you. Your work on Python has made a notable difference in how happy my life is. From brett at python.org Sun Feb 21 12:38:23 2016 From: brett at python.org (Brett Cannon) Date: Sun, 21 Feb 2016 17:38:23 +0000 Subject: [Python-Dev] Thank you. In-Reply-To: <56C97583.9070602@sdamon.com> References: <56C97583.9070602@sdamon.com> Message-ID: This is the appropriate list to post "thanks" to, and you're welcome! Glad we have been able to make your life happier. On Sun, 21 Feb 2016 at 02:28 Alexander Walters wrote: > I don't know if it is appropriate for this list, or not. I don't > exactly care. As much as I might disagree with some of you... > > Thank you. > > Your work on Python has made a notable difference in how happy my life is. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Sun Feb 21 13:08:59 2016 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 22 Feb 2016 05:08:59 +1100 Subject: [Python-Dev] Thank you. References: <56C97583.9070602@sdamon.com> Message-ID: <854md20wck.fsf@benfinney.id.au> Alexander Walters writes: > I don't know if it is appropriate for this list, or not. I don't > exactly care. As much as I might disagree with some of you... If you intend to address the Python core developers, this is an appropriate forum in which to express thanks. > Thank you. > Your work on Python has made a notable difference in how happy my life is. Agreed. This community has robust disagreements that nevertheless remain civil and respectful. The ongoing development of Python makes my life, and the lives of many people whom I care about, significantly better. Thank you all. -- \ "Only the shallow know themselves." --Oscar Wilde, _Phrases and | `\ Philosophies for the Use of the Young_, 1894 | _o__) | Ben Finney From ncoghlan at gmail.com Mon Feb 22 03:07:45 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 22 Feb 2016 18:07:45 +1000 Subject: [Python-Dev] Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) In-Reply-To: References: Message-ID: On 17 February 2016 at 15:42, Gregory P. Smith wrote: > sys.argv represents the C main() argv array. Your inclination (in the > linked to bug above) to leave sys.argv[0] alone is a good one. > No, it doesn't - it represents the state of argv *after* CPython's main function is done processing arguments (that's why we have a longstanding RFE requesting the availability of sys.raw_argv). Cheers, Nick.
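The behaviour under discussion can be observed directly: with ``alter_sys=True``, ``runpy.run_module`` temporarily rewrites ``sys.argv[0]`` to point at the executed module's file, then restores it afterwards. A standalone sketch (the ``probe`` module and temporary directory are illustrative, not from the thread):

```python
import os
import runpy
import shutil
import sys
import tempfile

# Create a throwaway module that records what sys.argv[0] looks like
# while it is being executed via runpy.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "probe.py"), "w") as f:
    f.write("import sys\nrecorded_argv0 = sys.argv[0]\n")

sys.path.insert(0, workdir)
try:
    # With alter_sys=True, runpy temporarily replaces sys.argv[0]
    # with the location of the module being executed.
    ns = runpy.run_module("probe", run_name="__main__", alter_sys=True)
finally:
    sys.path.remove(workdir)
    shutil.rmtree(workdir)

print(ns["recorded_argv0"])  # a path ending in 'probe.py'
```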
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.rodola at gmail.com Mon Feb 22 17:24:51 2016 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Mon, 22 Feb 2016 23:24:51 +0100 Subject: [Python-Dev] Thank you. In-Reply-To: <56C97583.9070602@sdamon.com> References: <56C97583.9070602@sdamon.com> Message-ID: On Sun, Feb 21, 2016 at 9:29 AM, Alexander Walters wrote: > I don't know if it is appropriate for this list, or not. I don't exactly > care. As much as I might disagree with some of you... > > Thank you. > From time to time I also think about how deeply Python impacted my life. Places I visited and lived in, people I got in contact with, the mere every day life and afternoons spent writing code just for the heck of it.... If it weren't for Python all of that would have been profoundly different and most certainly not as good. I would like to say thanks as well. Thank you Guido and thank you all core-devs. You don't change only code: you literally change lives as well. And that is profound. -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Feb 24 05:32:14 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 24 Feb 2016 20:32:14 +1000 Subject: [Python-Dev] PEP 493: HTTPS verification migration tools for Python 2.7 Message-ID: Hi folks, Since the last discussion back in November (just after the RHEL 7.2 release), I've rewritten PEP 493 to be a standards track PEP targeting Python 2.7.12.
Barry also kindly volunteered to serve as BDFL-Delegate, so we have a clear path to pronouncement if nobody notices any new problems or concerns that didn't come up in previous discussions :) The PEP now focuses on adding two new configuration mechanisms to the PEP 476 backport in Python 2.7: * turning off default certificate verification through a Python API * turning off default certificate verification through an environment variable Both of these are defined in such a way that if backported to a version where default verification is still off by default, they can be used to turn it *on*. The original file based configuration proposal to change the default behaviour of older versions is also still covered, but moved to a clearly optional section. (The gist of that section is now "If you backport this capability, aim to stay consistent with already existing backports of it") Regards, Nick. ========================================== PEP: 493 Title: HTTPS verification migration tools for Python 2.7 Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan , Robert Kuska , Marc-André Lemburg BDFL-Delegate: Barry Warsaw Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 10-May-2015 Python-Version: 2.7.12 Post-History: 06-Jul-2015, 11-Nov-2015, 24-Nov-2015, 24-Feb-2016 Abstract ======== PEP 476 updated Python's default handling of HTTPS certificates in client modules to align with certificate handling in web browsers, by validating that the certificates received belonged to the server the client was attempting to contact. The Python 2.7 long term maintenance series was judged to be in scope for this change, with the new behaviour introduced in the Python 2.7.9 maintenance release.
This has created a non-trivial barrier to adoption for affected Python 2.7 maintenance releases, so this PEP proposes additional Python 2.7 specific features that allow system administrators and other users to more easily decouple the decision to verify server certificates in HTTPS client modules from the decision to update to newer Python 2.7 maintenance releases. Rationale ========= PEP 476 changed Python's default behaviour to align with expectations established by web browsers in regards to the semantics of HTTPS URLs: starting with Python 2.7.9 and 3.4.3, HTTPS clients in the standard library validate server certificates by default. However, it is also the case that this change *does* cause problems for infrastructure administrators operating private intranets that rely on self-signed certificates, or otherwise encounter problems with the new default certificate verification settings. To manage these kinds of situations, web browsers provide users with "click through" warnings that allow the user to add the server's certificate to the browser's certificate store. Network client tools like ``curl`` and ``wget`` offer options to switch off certificate checking entirely (by way of ``curl --insecure`` and ``wget --no-check-certificate``, respectively). At a different layer of the technology stack, Linux security modules like `SELinux` and `AppArmor`, while enabled by default by distribution vendors, offer relatively straightforward mechanisms for turning them off. At the moment, no such convenient mechanisms exist to disable Python's default certificate checking for a whole process. PEP 476 did attempt to address this question, by covering how to revert to the old settings process wide by monkeypatching the ``ssl`` module to restore the old behaviour. 
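For reference, the monkeypatching approach mentioned above is the process-wide opt-out documented in PEP 476 itself, a single assignment in the ``ssl`` module:

```python
import ssl

# PEP 476's documented process-wide opt-out: make the default HTTPS
# context factory an alias for the unverified one, restoring the
# pre-2.7.9 behaviour for standard library clients.
ssl._create_default_https_context = ssl._create_unverified_context
```

Standard library HTTPS clients created after this assignment will skip certificate verification; it is exactly this kind of snippet that the ``sitecustomize.py`` based deployments were expected to contain.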
Unfortunately, the ``sitecustomize.py`` based technique proposed to allow system administrators to disable the feature by default in their Standard Operating Environment definition has been determined to be insufficient in at least some cases. The specific case that led to the initial creation of this PEP is the one where a Linux distributor aims to provide their users with a `smoother migration path <https://bugzilla.redhat.com/show_bug.cgi?id=1173041>`__ than the standard one provided by consuming upstream CPython 2.7 releases directly, but other potential challenges have also been pointed out with updating embedded Python runtimes and other user level installations of Python. Rather than allowing a plethora of mutually incompatible migration techniques to bloom, this PEP proposes an additional feature to be added to Python 2.7.12 to make it easier to revert a process to the past behaviour of skipping certificate validation in HTTPS client modules. It also provides additional recommendations to redistributors backporting these features to versions of Python prior to Python 2.7.9. Alternatives ------------ In the absence of clear upstream guidance and recommendations, commercial redistributors will still make their own design decisions in the interests of their customers.
The main approaches available are: * Continuing to rebase on new Python 2.7.x releases, while providing no additional assistance beyond the mechanisms defined in PEP 476 in migrating from unchecked to checked hostnames in standard library HTTPS clients * Gating availability of the changes in default handling of HTTPS connections on upgrading from Python 2 to Python 3 * For Linux distribution vendors, gating availability of the changes in default handling of HTTPS connections on upgrading to a new operating system version * Implementing one or both of the backport suggestions described in this PEP, regardless of the formal status of the PEP Scope Limitations ================= These changes are being proposed purely as tools for helping to manage the transition to the new default certificate handling behaviour in the context of Python 2.7. They are not being proposed as new features for Python 3, as it is expected that the vast majority of client applications affected by this problem without the ability to update the application itself will be Python 2 applications. It would likely be desirable for a future version of Python 3 to allow the default certificate handling for secure protocols to be configurable on a per-protocol basis, but that question is beyond the scope of this PEP. Requirements for capability detection ===================================== As the proposals in this PEP aim to facilitate backports to earlier Python versions, the Python version number cannot be used as a reliable means for detecting them. Instead, they are designed to allow the presence or absence of the feature to be determined using the following technique::

    python -c "import ssl; ssl.<_relevant_attribute>"

This will fail with `AttributeError` (and hence a non-zero return code) if the relevant capability is not available.
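The same check can also be driven from within Python, for example when probing an interpreter other than the one currently running (a sketch; the helper name is made up):

```python
import os
import subprocess
import sys

def has_ssl_capability(python, attribute):
    # Runs: <python> -c "import ssl; ssl.<attribute>" and reports
    # whether the exit code was zero (i.e. no AttributeError).
    cmd = [python, "-c", "import ssl; ssl.%s" % attribute]
    with open(os.devnull, "wb") as devnull:
        return subprocess.call(cmd, stderr=devnull) == 0

# True on any interpreter that provides ssl.create_default_context
print(has_ssl_capability(sys.executable, "create_default_context"))
```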
The feature detection attributes defined by this PEP are: * ``ssl._https_verify_certificates``: runtime configuration API * ``ssl._https_verify_envvar``: environment based configuration * ``ssl._cert_verification_config``: file based configuration (PEP 476 opt-in) The marker attributes are prefixed with an underscore to indicate the implementation dependent and security sensitive nature of these capabilities. Feature: Configuration API ========================== This change is proposed for inclusion in CPython 2.7.12 and later CPython 2.7.x releases. It consists of a new ``ssl._https_verify_certificates()`` function to specify the default handling of HTTPS certificates in standard library client libraries. It is not proposed to forward port this change to Python 3, so Python 3 applications that need to support skipping certificate verification will still need to define their own suitable security context. Feature detection ----------------- The marker attribute on the ``ssl`` module related to this feature is the ``ssl._https_verify_certificates`` function itself. Specification ------------- The ``ssl._https_verify_certificates`` function will work as follows::

    def _https_verify_certificates(enable=True):
        """Verify server HTTPS certificates by default?"""
        global _create_default_https_context
        if enable:
            _create_default_https_context = create_default_context
        else:
            _create_default_https_context = _create_unverified_context

If called without arguments, or with ``enable`` set to a true value, then standard library client modules will subsequently verify HTTPS certificates by default; if called with ``enable`` set to a false value, they will subsequently skip verification.
Security Considerations ----------------------- The inclusion of this feature will allow security sensitive applications to include the following forward-compatible snippet in their code::

    if hasattr(ssl, "_https_verify_certificates"):
        ssl._https_verify_certificates()

Some developers may also choose to opt out of certificate checking using ``ssl._https_verify_certificates(enable=False)``. This doesn't introduce any major new security concerns, as monkeypatching the affected internal APIs was already possible. Feature: environment based configuration ======================================== This change is proposed for inclusion in CPython 2.7.12 and later CPython 2.7.x releases. It consists of a new ``PYTHONHTTPSVERIFY`` environment variable that can be set to ``'0'`` to disable the default verification without modifying the application source code (which may not even be available in cases of bytecode-only application distribution). It is not proposed to forward port this change to Python 3, so Python 3 applications that need to support skipping certificate verification will still need to define their own suitable security context. Feature detection ----------------- The marker attribute on the ``ssl`` module related to this feature is: * the ``ssl._https_verify_envvar`` attribute, giving the name of the environment variable affecting the default behaviour This not only makes it straightforward to detect the presence (or absence) of the capability, it also makes it possible to programmatically determine the relevant environment variable name.
Specification ------------- Rather than always defaulting to the use of ``ssl.create_default_context``, the ``ssl`` module will be modified to: * read the ``PYTHONHTTPSVERIFY`` environment variable when the module is first imported into a Python process * set the ``ssl._create_default_https_context`` function to be an alias for ``ssl._create_unverified_context`` if this environment variable is present and set to ``'0'`` * otherwise, set the ``ssl._create_default_https_context`` function to be an alias for ``ssl.create_default_context`` as usual Example implementation ---------------------- ::

    _https_verify_envvar = 'PYTHONHTTPSVERIFY'

    def _get_https_context_factory():
        if not sys.flags.ignore_environment:
            config_setting = os.environ.get(_https_verify_envvar)
            if config_setting == '0':
                return _create_unverified_context
        return create_default_context

    _create_default_https_context = _get_https_context_factory()

Security Considerations ----------------------- Relative to the behaviour in Python 3.4.3+ and Python 2.7.9->2.7.11, this approach does introduce a new downgrade attack against the default security settings that potentially allows a sufficiently determined attacker to revert Python to the default behaviour used in CPython 2.7.8 and earlier releases. However, such an attack requires the ability to modify the execution environment of a Python process prior to the import of the ``ssl`` module, and any attacker with such access would already be able to modify the behaviour of the underlying OpenSSL implementation. Interaction with Python virtual environments -------------------------------------------- The default setting is read directly from the process environment, and hence works the same way regardless of whether or not the interpreter is being run inside an activated Python virtual environment. Reference Implementation ======================== A patch for Python 2.7 implementing the above two features is attached to the `relevant tracker issue `__.
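The selection rule specified above can be restated as a small pure function for illustration (a standalone re-statement, not the actual patch; the function name is made up):

```python
def default_https_context_kind(environ, ignore_environment=False):
    """Which default context factory the ssl module would install on import.

    Mirrors the selection rule above: only the literal string '0'
    disables verification, and running under -E (ignore_environment)
    bypasses the environment variable entirely.
    """
    if not ignore_environment and environ.get("PYTHONHTTPSVERIFY") == "0":
        return "unverified"
    return "verified"

print(default_https_context_kind({}))                              # verified
print(default_https_context_kind({"PYTHONHTTPSVERIFY": "0"}))      # unverified
print(default_https_context_kind({"PYTHONHTTPSVERIFY": "false"}))  # verified: only '0' counts
print(default_https_context_kind({"PYTHONHTTPSVERIFY": "0"},
                                 ignore_environment=True))         # verified
```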
Backporting this PEP to earlier Python versions =============================================== If this PEP is accepted, then commercial Python redistributors may choose to backport the per-process configuration mechanisms defined in this PEP to base versions older than Python 2.7.9, *without* also backporting PEP 476's change to the default behaviour of the overall Python installation. Such a backport would differ from the mechanism proposed in this PEP solely in the default behaviour when ``PYTHONHTTPSVERIFY`` was not set at all: it would continue to default to skipping certificate validation. In this case, if the ``PYTHONHTTPSVERIFY`` environment variable is defined, and set to anything *other* than ``'0'``, then HTTPS certificate verification should be enabled. Feature detection ----------------- There's no specific attribute indicating that this situation applies. Rather, it is indicated by the ``ssl._https_verify_certificates`` and ``ssl._https_verify_envvar`` attributes being present in a Python version that is nominally older than Python 2.7.12. 
Specification ------------- Implementing this backport involves backporting the changes in PEP 466, 476 and this PEP, with the following change to the handling of the ``PYTHONHTTPSVERIFY`` environment variable in the ``ssl`` module: * read the ``PYTHONHTTPSVERIFY`` environment variable when the module is first imported into a Python process * set the ``ssl._create_default_https_context`` function to be an alias for ``ssl.create_default_context`` if this environment variable is present and set to any value other than ``'0'`` * otherwise, set the ``ssl._create_default_https_context`` function to be an alias for ``ssl._create_unverified_context`` Example implementation ---------------------- ::

    _https_verify_envvar = 'PYTHONHTTPSVERIFY'

    def _get_https_context_factory():
        if not sys.flags.ignore_environment:
            config_setting = os.environ.get(_https_verify_envvar)
            if config_setting != '0':
                return create_default_context
        return _create_unverified_context

    _create_default_https_context = _get_https_context_factory()

    def _disable_https_default_verification():
        """Skip verification of HTTPS certificates by default"""
        global _create_default_https_context
        _create_default_https_context = _create_unverified_context

Security Considerations ----------------------- This change would be a strict security upgrade for any Python version that currently defaults to skipping certificate validation in standard library HTTPS clients. The technical trade-offs to be taken into account relate largely to the magnitude of the PEP 466 backport also required rather than to anything security related. Interaction with Python virtual environments -------------------------------------------- The default setting is read directly from the process environment, and hence works the same way regardless of whether or not the interpreter is being run inside an activated Python virtual environment.
Backporting PEP 476 to earlier Python versions ============================================== The backporting approach described above leaves the default HTTPS certificate verification behaviour of a Python 2.7 installation unmodified: verifying certificates still needs to be opted into on a per-connection or per-process basis. To allow the default behaviour of the entire installation to be modified without breaking backwards compatibility, Red Hat designed a configuration mechanism for the system Python 2.7 installation in Red Hat Enterprise Linux 7.2+ that provides: * an opt-in model that allows the decision to enable HTTPS certificate verification to be made independently of the decision to upgrade to the operating system version where the feature was first backported * the ability for system administrators to set the default behaviour of Python applications and scripts run directly in the system Python installation * the ability for the redistributor to consider changing the default behaviour of *new* installations at some point in the future without impacting existing installations that have been explicitly configured to skip verifying HTTPS certificates by default As it only affects backports to earlier releases of Python 2.7, this change is not proposed for inclusion in upstream CPython, but rather is offered as a recommendation to other redistributors that choose to offer a similar feature to their users. This PEP doesn't take a position on whether or not this particular change is a good idea - rather, it suggests that *if* a redistributor chooses to go down the path of making the default behaviour configurable in a version of Python older than Python 2.7.9, then maintaining a consistent approach across redistributors would be beneficial for users. 
However, this approach SHOULD NOT be used for any Python installation that advertises itself as providing Python 2.7.9 or later, as most Python users will have the reasonable expectation that all such environments will verify HTTPS certificates by default. Feature detection ----------------- The marker attribute on the ``ssl`` module related to this feature is:: _cert_verification_config = '' This not only makes it straightforward to detect the presence (or absence) of the capability, it also makes it possible to programmatically determine the relevant configuration file name. Recommended modifications to the Python standard library -------------------------------------------------------- The recommended approach to backporting the PEP 476 modifications to an earlier point release is to implement the following changes relative to the default PEP 476 behaviour implemented in Python 2.7.9+: * modify the ``ssl`` module to read a system wide configuration file when the module is first imported into a Python process * define a platform default behaviour (either verifying or not verifying HTTPS certificates) to be used if this configuration file is not present * support selection between the following three modes of operation: * ensure HTTPS certificate verification is enabled * ensure HTTPS certificate verification is disabled * delegate the decision to the redistributor providing this Python version * set the ``ssl._create_default_https_context`` function to be an alias for either ``ssl.create_default_context`` or ``ssl._create_unverified_context`` based on the given configuration setting. Recommended file location ------------------------- As the PEP authors are not aware of any vendors providing long-term support releases targeting Windows, Mac OS X or \*BSD systems, this approach is currently only specifically defined for Linux system Python installations. The recommended configuration file name on Linux systems is ``/etc/python/cert-verification.cfg``. 
The ``.cfg`` filename extension is recommended for consistency with the ``pyvenv.cfg`` used by the ``venv`` module in Python 3's standard library. Recommended file format ----------------------- The configuration file should use a ConfigParser ini-style format with a single section named ``[https]`` containing one required setting ``verify``. The suggested section name is taken from the "https" URL schema passed to affected client APIs. Permitted values for ``verify`` are: * ``enable``: ensure HTTPS certificate verification is enabled by default * ``disable``: ensure HTTPS certificate verification is disabled by default * ``platform_default``: delegate the decision to the redistributor providing this particular Python version If the ``[https]`` section or the ``verify`` setting are missing, or if the ``verify`` setting is set to an unknown value, it should be treated as if the configuration file is not present. Example implementation ---------------------- ::

    _cert_verification_config = '/etc/python/cert-verification.cfg'

    def _get_https_context_factory():
        # Check for a system-wide override of the default behaviour
        context_factories = {
            'enable': create_default_context,
            'disable': _create_unverified_context,
            'platform_default': _create_unverified_context,  # For now :)
        }
        import ConfigParser
        config = ConfigParser.RawConfigParser()
        config.read(_cert_verification_config)
        try:
            verify_mode = config.get('https', 'verify')
        except (ConfigParser.NoSectionError, ConfigParser.NoOptionError):
            verify_mode = 'platform_default'
        default_factory = context_factories.get('platform_default')
        return context_factories.get(verify_mode, default_factory)

    _create_default_https_context = _get_https_context_factory()

Security Considerations ----------------------- The specific recommendations for this backporting case are designed to work for privileged, security sensitive processes, even those being run in the following locked down configuration: * run from a locked down administrator controlled directory rather than a normal user directory (preventing ``sys.path[0]`` based privilege escalation attacks) * run using the ``-E`` switch (preventing ``PYTHON*`` environment variable based privilege escalation attacks) * run using the ``-s`` switch (preventing user site directory based privilege escalation attacks) * run using the ``-S`` switch (preventing ``sitecustomize`` based privilege escalation attacks) The intent is that the *only* reason HTTPS verification should be getting turned off installation wide when using this approach is because: * an end user is running a redistributor provided version of CPython rather than running upstream CPython directly * that redistributor has decided to provide a smoother migration path to verifying HTTPS certificates by default than that being provided by the upstream project * either the redistributor or the local infrastructure administrator has determined that it is appropriate to retain the default pre-2.7.9 behaviour (at least for the time being) Using an administrator controlled configuration file rather than an environment variable has the essential feature of providing a smoother migration path, even for applications being run with the ``-E`` switch. Interaction with Python virtual environments -------------------------------------------- This setting is scoped by the interpreter installation and affects all Python processes using that interpreter, regardless of whether or not the interpreter is being run inside an activated Python virtual environment. Origins of this recommendation ------------------------------ This recommendation is based on the backporting approach adopted for Red Hat Enterprise Linux 7.2, as published in the original July 2015 draft of this PEP and described in detail in `this KnowledgeBase article `__. Red Hat's patches implementing this backport for Python 2.7.5 can be found in the `CentOS git repository `__.
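To make the recommended file format concrete, here is a hypothetical ``cert-verification.cfg`` together with a parsing sketch along the lines of the example implementation (the ``configparser`` spelling follows Python 3, with a fallback to the Python 2 spelling used in the PEP):

```python
try:
    import configparser  # Python 3 spelling
except ImportError:
    import ConfigParser as configparser  # Python 2 spelling used in the PEP
import io

# Hypothetical contents of /etc/python/cert-verification.cfg
SAMPLE = u"[https]\nverify = disable\n"

config = configparser.RawConfigParser()
if hasattr(config, "read_string"):   # Python 3
    config.read_string(SAMPLE)
else:                                # Python 2
    config.readfp(io.StringIO(SAMPLE))

try:
    verify_mode = config.get("https", "verify")
except (configparser.NoSectionError, configparser.NoOptionError):
    verify_mode = "platform_default"

# Unknown values are treated as if the file were not present
if verify_mode not in ("enable", "disable", "platform_default"):
    verify_mode = "platform_default"

print(verify_mode)  # disable
```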
Recommendation for combined feature backports ============================================= If a redistributor chooses to backport the environment variable based configuration setting from this PEP to a modified Python version that also implements the configuration file based PEP 476 backport, then the environment variable should take precedence over the system-wide configuration setting. This allows the setting to be changed for a given user or application, regardless of the installation-wide default behaviour. Example implementation ---------------------- ::

    _https_verify_envvar = 'PYTHONHTTPSVERIFY'
    _cert_verification_config = '/etc/python/cert-verification.cfg'

    def _get_https_context_factory():
        # Check for an environmental override of the default behaviour
        if not sys.flags.ignore_environment:
            config_setting = os.environ.get(_https_verify_envvar)
            if config_setting is not None:
                if config_setting == '0':
                    return _create_unverified_context
                return create_default_context
        # Check for a system-wide override of the default behaviour
        context_factories = {
            'enable': create_default_context,
            'disable': _create_unverified_context,
            'platform_default': _create_unverified_context,  # For now :)
        }
        import ConfigParser
        config = ConfigParser.RawConfigParser()
        config.read(_cert_verification_config)
        try:
            verify_mode = config.get('https', 'verify')
        except (ConfigParser.NoSectionError, ConfigParser.NoOptionError):
            verify_mode = 'platform_default'
        default_factory = context_factories.get('platform_default')
        return context_factories.get(verify_mode, default_factory)

    _create_default_https_context = _get_https_context_factory()

Copyright ========= This document has been placed into the public domain. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed...
URL: From cory at lukasa.co.uk Wed Feb 24 06:28:27 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 24 Feb 2016 11:28:27 +0000 Subject: [Python-Dev] PEP 493: HTTPS verification migration tools for Python 2.7 In-Reply-To: References: Message-ID: <734AD98A-0264-4B08-9A75-73A91303051A@lukasa.co.uk> > On 24 Feb 2016, at 10:32, Nick Coghlan wrote: > > Security Considerations > ----------------------- > > Relative to the behaviour in Python 3.4.3+ and Python 2.7.9->2.7.11, this > approach does introduce a new downgrade attack against the default security > settings that potentially allows a sufficiently determined attacker to revert > Python to the default behaviour used in CPython 2.7.8 and earlier releases. > However, such an attack requires the ability to modify the execution > environment of a Python process prior to the import of the ``ssl`` module, > and any attacker with such access would already be able to modify the > behaviour of the underlying OpenSSL implementation. > I'm not entirely sure this is accurate. Specifically, an attacker that is able to set environment variables but nothing else (no filesystem access) would be able to disable hostname validation. To my knowledge this is the only environment variable that could be set that would do that. It's just worth noting here that this potentially opens a little crack in Python's armour. Cory -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: 

From ncoghlan at gmail.com  Wed Feb 24 06:42:49 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Feb 2016 21:42:49 +1000
Subject: [Python-Dev] PEP 493: HTTPS verification migration tools for Python 2.7
In-Reply-To: <734AD98A-0264-4B08-9A75-73A91303051A@lukasa.co.uk>
References: <734AD98A-0264-4B08-9A75-73A91303051A@lukasa.co.uk>
Message-ID: 

On 24 February 2016 at 21:28, Cory Benfield wrote:
>
> > On 24 Feb 2016, at 10:32, Nick Coghlan wrote:
> >
> > Security Considerations
> > -----------------------
> >
> > Relative to the behaviour in Python 3.4.3+ and Python 2.7.9->2.7.11, this
> > approach does introduce a new downgrade attack against the default security
> > settings that potentially allows a sufficiently determined attacker to
> > revert Python to the default behaviour used in CPython 2.7.8 and earlier
> > releases. However, such an attack requires the ability to modify the
> > execution environment of a Python process prior to the import of the
> > ``ssl`` module, and any attacker with such access would already be able
> > to modify the behaviour of the underlying OpenSSL implementation.
>
> I'm not entirely sure this is accurate. Specifically, an attacker that is
> able to set environment variables but nothing else (no filesystem access)
> would be able to disable hostname validation.

... for SSL contexts that aren't explicitly enabling it.

> To my knowledge this is the only environment variable that could be set
> that would do that.
>
> It's just worth noting here that this potentially opens a little crack in
> Python's armour.

Only in Python 2.7's, and there we have a much bigger problem with folks
not upgrading past 2.7.8, and with a number of redistributors considering
the change too disruptive to backport as a security fix.
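[Editorial note: the downgrade Cory describes can be made concrete in a few
lines. This is an illustrative sketch — not the ssl module's actual code — of
the environment check from the PEP's example implementation. It shows that a
write-only environment primitive is enough to flip the default, and that
running under ``python -E`` (which sets ``sys.flags.ignore_environment``)
neutralises the override.]

```python
import sys

def https_verification_enabled(environ):
    """Sketch of the PEP's PYTHONHTTPSVERIFY check: only the exact value
    '0' disables verification, and -E (ignore_environment) blocks it."""
    if sys.flags.ignore_environment:
        return True
    setting = environ.get('PYTHONHTTPSVERIFY')
    if setting is not None and setting == '0':
        return False
    return True

# An attacker who can only *write* environment variables flips the default:
print(https_verification_enabled({}))                           # True
print(https_verification_enabled({'PYTHONHTTPSVERIFY': '0'}))   # False
# Any value other than '0' leaves verification on, per the PEP:
print(https_verification_enabled({'PYTHONHTTPSVERIFY': '1'}))   # True
```

The attacker needs no filesystem access and no ability to read the
environment back — a blind write of one variable before the ``ssl`` import
is sufficient, which is exactly the new attack surface being discussed.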
I do think you're right though, so I'll tweak the wording of that section
accordingly.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mal at egenix.com  Wed Feb 24 07:19:44 2016
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 24 Feb 2016 13:19:44 +0100
Subject: [Python-Dev] PEP 493: HTTPS verification migration tools for Python 2.7
In-Reply-To: <734AD98A-0264-4B08-9A75-73A91303051A@lukasa.co.uk>
References: <734AD98A-0264-4B08-9A75-73A91303051A@lukasa.co.uk>
Message-ID: <56CD9FE0.7020305@egenix.com>

On 24.02.2016 12:28, Cory Benfield wrote:
>
>> On 24 Feb 2016, at 10:32, Nick Coghlan wrote:
>>
>> Security Considerations
>> -----------------------
>>
>> Relative to the behaviour in Python 3.4.3+ and Python 2.7.9->2.7.11, this
>> approach does introduce a new downgrade attack against the default security
>> settings that potentially allows a sufficiently determined attacker to revert
>> Python to the default behaviour used in CPython 2.7.8 and earlier releases.
>> However, such an attack requires the ability to modify the execution
>> environment of a Python process prior to the import of the ``ssl`` module,
>> and any attacker with such access would already be able to modify the
>> behaviour of the underlying OpenSSL implementation.
>>
>
> I'm not entirely sure this is accurate. Specifically, an attacker that is
> able to set environment variables but nothing else (no filesystem access)
> would be able to disable hostname validation. To my knowledge this is the
> only environment variable that could be set that would do that.

An attacker with access to the OS environment of a process would
be able to do lots of things.
I think disabling certificate checks
is not one of the highest ranked attack vectors you'd use, given
such capabilities :-)

Think of LD_PRELOAD attacks, LD_LIBRARY_PATH manipulations, shell PATH
manipulations (think spawned processes), compiler flag manipulations
(think "pip install sourcepkg"), OpenSSL reconfiguration, etc.

Probably much easier than an active attack would be to simply extract
sensitive information from the environ and use this for more direct
attacks, e.g. accessing databases, payment services, etc.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Feb 24 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________
2016-02-19: Released eGenix PyRun 2.1.2 ...       http://egenix.com/go88

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/

From cory at lukasa.co.uk  Wed Feb 24 15:39:51 2016
From: cory at lukasa.co.uk (Cory Benfield)
Date: Wed, 24 Feb 2016 20:39:51 +0000
Subject: [Python-Dev] PEP 493: HTTPS verification migration tools for Python 2.7
In-Reply-To: <56CD9FE0.7020305@egenix.com>
References: <734AD98A-0264-4B08-9A75-73A91303051A@lukasa.co.uk>
	<56CD9FE0.7020305@egenix.com>
Message-ID: <636ED375-DC0E-430A-A7B3-6411B7FF1DED@lukasa.co.uk>

> On 24 Feb 2016, at 12:19, M.-A.
Lemburg wrote:
>
> On 24.02.2016 12:28, Cory Benfield wrote:
>>
>>> On 24 Feb 2016, at 10:32, Nick Coghlan wrote:
>>>
>>> Security Considerations
>>> -----------------------
>>>
>>> Relative to the behaviour in Python 3.4.3+ and Python 2.7.9->2.7.11, this
>>> approach does introduce a new downgrade attack against the default security
>>> settings that potentially allows a sufficiently determined attacker to revert
>>> Python to the default behaviour used in CPython 2.7.8 and earlier releases.
>>> However, such an attack requires the ability to modify the execution
>>> environment of a Python process prior to the import of the ``ssl`` module,
>>> and any attacker with such access would already be able to modify the
>>> behaviour of the underlying OpenSSL implementation.
>>>
>>
>> I'm not entirely sure this is accurate. Specifically, an attacker that is
>> able to set environment variables but nothing else (no filesystem access)
>> would be able to disable hostname validation. To my knowledge this is the
>> only environment variable that could be set that would do that.
>
> An attacker with access to the OS environment of a process would
> be able to do lots of things. I think disabling certificate checks
> is not one of the highest ranked attack vectors you'd use, given
> such capabilities :-)
>
> Think of LD_PRELOAD attacks, LD_LIBRARY_PATH manipulations, shell PATH
> manipulations (think spawned processes), compiler flag manipulations
> (think "pip install sourcepkg"), OpenSSL reconfiguration, etc.
>
> Probably much easier than an active attack would be to simply extract
> sensitive information from the environ and use this for more direct
> attacks, e.g. accessing databases, payment services, etc.

To be clear, I'm not suggesting that this represents a reason not to do
any of this, just that we should not suggest that there is no risk here:
there is, and it is a new attack vector.

Cory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: 

From mal at egenix.com  Wed Feb 24 16:14:49 2016
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 24 Feb 2016 22:14:49 +0100
Subject: [Python-Dev] PEP 493: HTTPS verification migration tools for Python 2.7
In-Reply-To: <636ED375-DC0E-430A-A7B3-6411B7FF1DED@lukasa.co.uk>
References: <734AD98A-0264-4B08-9A75-73A91303051A@lukasa.co.uk>
	<56CD9FE0.7020305@egenix.com>
	<636ED375-DC0E-430A-A7B3-6411B7FF1DED@lukasa.co.uk>
Message-ID: <56CE1D49.3000405@egenix.com>

On 24.02.2016 21:39, Cory Benfield wrote:
>
>> On 24 Feb 2016, at 12:19, M.-A. Lemburg wrote:
>>
>> On 24.02.2016 12:28, Cory Benfield wrote:
>>>
>>>> On 24 Feb 2016, at 10:32, Nick Coghlan wrote:
>>>>
>>>> Security Considerations
>>>> -----------------------
>>>>
>>>> Relative to the behaviour in Python 3.4.3+ and Python 2.7.9->2.7.11, this
>>>> approach does introduce a new downgrade attack against the default security
>>>> settings that potentially allows a sufficiently determined attacker to revert
>>>> Python to the default behaviour used in CPython 2.7.8 and earlier releases.
>>>> However, such an attack requires the ability to modify the execution
>>>> environment of a Python process prior to the import of the ``ssl`` module,
>>>> and any attacker with such access would already be able to modify the
>>>> behaviour of the underlying OpenSSL implementation.
>>>>
>>>
>>> I'm not entirely sure this is accurate. Specifically, an attacker that is
>>> able to set environment variables but nothing else (no filesystem access)
>>> would be able to disable hostname validation. To my knowledge this is the
>>> only environment variable that could be set that would do that.
>>
>> An attacker with access to the OS environment of a process would
>> be able to do lots of things.
>> I think disabling certificate checks
>> is not one of the highest ranked attack vectors you'd use, given
>> such capabilities :-)
>>
>> Think of LD_PRELOAD attacks, LD_LIBRARY_PATH manipulations, shell PATH
>> manipulations (think spawned processes), compiler flag manipulations
>> (think "pip install sourcepkg"), OpenSSL reconfiguration, etc.
>>
>> Probably much easier than an active attack would be to simply extract
>> sensitive information from the environ and use this for more direct
>> attacks, e.g. accessing databases, payment services, etc.
>
> To be clear, I'm not suggesting that this represents a reason not to do
> any of this, just that we should not suggest that there is no risk here:
> there is, and it is a new attack vector.

Fair enough :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Feb 24 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________
2016-02-19: Released eGenix PyRun 2.1.2 ...       http://egenix.com/go88

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/

From ncoghlan at gmail.com  Thu Feb 25 03:36:10 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Feb 2016 18:36:10 +1000
Subject: [Python-Dev] PEP 493: HTTPS verification migration tools for Python 2.7
In-Reply-To: <56CE1D49.3000405@egenix.com>
References: <734AD98A-0264-4B08-9A75-73A91303051A@lukasa.co.uk>
	<56CD9FE0.7020305@egenix.com>
	<636ED375-DC0E-430A-A7B3-6411B7FF1DED@lukasa.co.uk>
	<56CE1D49.3000405@egenix.com>
Message-ID: 

On 25 February 2016 at 07:14, M.-A.
Lemburg wrote:
> On 24.02.2016 21:39, Cory Benfield wrote:
>>
>>> On 24 Feb 2016, at 12:19, M.-A. Lemburg wrote:
>>>
>>> On 24.02.2016 12:28, Cory Benfield wrote:
>>>> I'm not entirely sure this is accurate. Specifically, an attacker that is
>>>> able to set environment variables but nothing else (no filesystem access)
>>>> would be able to disable hostname validation. To my knowledge this is the
>>>> only environment variable that could be set that would do that.
>>>
>>> An attacker with access to the OS environment of a process would
>>> be able to do lots of things. I think disabling certificate checks
>>> is not one of the highest ranked attack vectors you'd use, given
>>> such capabilities :-)
>>>
>>> Think of LD_PRELOAD attacks, LD_LIBRARY_PATH manipulations, shell PATH
>>> manipulations (think spawned processes), compiler flag manipulations
>>> (think "pip install sourcepkg"), OpenSSL reconfiguration, etc.
>>
>> To be clear, I'm not suggesting that this represents a reason not to do
>> any of this, just that we should not suggest that there is no risk here:
>> there is, and it is a new attack vector.
>
> Fair enough :-)

I tweaked the explanation of that security caveat:
https://hg.python.org/peps/rev/a24451715d84 (and then tweaked the tweak
to replace "the main" with "a key").

I didn't mention the prospect of reading sensitive data from the
environment, as the specific problem we're introducing is with write
access, and I believe certain flavours of vulnerability can give the
ability to do blind writes to the environment without necessarily
gaining the ability to dump arbitrary details about that environment.

Cheers,
Nick.
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From status at bugs.python.org  Fri Feb 26 12:08:34 2016
From: status at bugs.python.org (Python tracker)
Date: Fri, 26 Feb 2016 18:08:34 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20160226170834.8328156B75@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (2016-02-19 - 2016-02-26)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5437 ( +5)
  closed 32764 (+46)
  total  38201 (+51)

Open issues with patches: 2378

Issues opened (25)
==================

#26393: random.shuffled
       http://bugs.python.org/issue26393  opened by palaviv

#26394: Have argparse provide ability to require a fallback value be p
       http://bugs.python.org/issue26394  opened by quabla

#26396: Create json.JSONType
       http://bugs.python.org/issue26396  opened by brett.cannon

#26403: Catch FileNotFoundError in socketserver.DatagramRequestHandler
       http://bugs.python.org/issue26403  opened by desbma

#26404: socketserver context manager
       http://bugs.python.org/issue26404  opened by palaviv

#26407: csv.writer.writerows masks exceptions from __iter__
       http://bugs.python.org/issue26407  opened by Ilja Everilä

#26410: "incompatible pointer type" while compiling Python3.5.1
       http://bugs.python.org/issue26410  opened by Devyn Johnson

#26414: os.defpath too permissive
       http://bugs.python.org/issue26414  opened by jbeck

#26415: Out of memory, trying to parse a 35MB dict
       http://bugs.python.org/issue26415  opened by A. Skrobov

#26418: multiprocessing.pool.ThreadPool eats up memories
       http://bugs.python.org/issue26418  opened by renlifeng

#26420: IDEL for Python 3.5.1 for x64 Windows exits when pasted a stri
       http://bugs.python.org/issue26420  opened by tats.u.
#26421: string_richcompare invalid check Py_NotImplemented
       http://bugs.python.org/issue26421  opened by yuriy_levchenko

#26423: Integer overflow in wrap_lenfunc() on 64-bit build of Windows
       http://bugs.python.org/issue26423  opened by Dave Hibbitts

#26425: 'TypeError: object of type 'NoneType' has no len()' in 'splitd
       http://bugs.python.org/issue26425  opened by Konrad

#26432: Add partial.kwargs
       http://bugs.python.org/issue26432  opened by serhiy.storchaka

#26433: urllib.urlencode() does not explain how to handle unicode
       http://bugs.python.org/issue26433  opened by Thomas Güttler

#26434: multiprocessing cannot spawn grandchild from a Windows service
       http://bugs.python.org/issue26434  opened by schlamar

#26436: Add the regex-dna benchmark
       http://bugs.python.org/issue26436  opened by serhiy.storchaka

#26437: asyncio create_server() not always accepts the 'port' paramete
       http://bugs.python.org/issue26437  opened by xdegaye

#26439: ctypes.util.find_library fails when ldconfig/glibc not availab
       http://bugs.python.org/issue26439  opened by Michael.Felt

#26440: tarfile._FileInFile.seekable is broken in stream mode
       http://bugs.python.org/issue26440  opened by Bill Lee

#26441: email.charset: to_splittable and from_splittable are not there
       http://bugs.python.org/issue26441  opened by martin.panter

#26442: Doc refers to xmlrpc.client but means xmlrpc.server
       http://bugs.python.org/issue26442  opened by Valentin.Lorentz

#26443: cross building extensions picks up host headers
       http://bugs.python.org/issue26443  opened by hundeboll

#26444: Fix 2 typos on ElementTree docs
       http://bugs.python.org/issue26444  opened by Ismail s

Most recent 15 issues with no replies (15)
==========================================

#26444: Fix 2 typos on ElementTree docs
       http://bugs.python.org/issue26444

#26443: cross building extensions picks up host headers
       http://bugs.python.org/issue26443

#26442: Doc refers to xmlrpc.client but means xmlrpc.server
       http://bugs.python.org/issue26442

#26441: email.charset:
to_splittable and from_splittable are not there
       http://bugs.python.org/issue26441

#26433: urllib.urlencode() does not explain how to handle unicode
       http://bugs.python.org/issue26433

#26432: Add partial.kwargs
       http://bugs.python.org/issue26432

#26418: multiprocessing.pool.ThreadPool eats up memories
       http://bugs.python.org/issue26418

#26396: Create json.JSONType
       http://bugs.python.org/issue26396

#26393: random.shuffled
       http://bugs.python.org/issue26393

#26391: Specialized sub-classes of Generic never call __init__
       http://bugs.python.org/issue26391

#26383: benchmarks (perf.py): number of decimal places in csv output
       http://bugs.python.org/issue26383

#26373: asyncio: add support for async context manager on streams?
       http://bugs.python.org/issue26373

#26363: __builtins__ propagation is misleading described in exec and e
       http://bugs.python.org/issue26363

#26359: CPython build options for out-of-the box performance
       http://bugs.python.org/issue26359

#26358: mmap.mmap.__iter__ is broken (yields bytes instead of ints)
       http://bugs.python.org/issue26358

Most recent 15 issues waiting for review (15)
=============================================

#26444: Fix 2 typos on ElementTree docs
       http://bugs.python.org/issue26444

#26443: cross building extensions picks up host headers
       http://bugs.python.org/issue26443

#26436: Add the regex-dna benchmark
       http://bugs.python.org/issue26436

#26432: Add partial.kwargs
       http://bugs.python.org/issue26432

#26423: Integer overflow in wrap_lenfunc() on 64-bit build of Windows
       http://bugs.python.org/issue26423

#26414: os.defpath too permissive
       http://bugs.python.org/issue26414

#26404: socketserver context manager
       http://bugs.python.org/issue26404

#26403: Catch FileNotFoundError in socketserver.DatagramRequestHandler
       http://bugs.python.org/issue26403

#26393: random.shuffled
       http://bugs.python.org/issue26393

#26388: Disabling changing sys.argv[0] with runpy.run_module(...alter_
       http://bugs.python.org/issue26388

#26387: Crash calling sqlite3_close with
invalid pointer
       http://bugs.python.org/issue26387

#26386: tkinter - Treeview - .selection_add and selection_toggle
       http://bugs.python.org/issue26386

#26385: the call of tempfile.NamedTemporaryFile fails and leaves a fil
       http://bugs.python.org/issue26385

#26384: UnboundLocalError in socket._sendfile_use_sendfile
       http://bugs.python.org/issue26384

#26382: List object memory allocator
       http://bugs.python.org/issue26382

Top 10 most discussed issues (10)
=================================

#26423: Integer overflow in wrap_lenfunc() on 64-bit build of Windows
       http://bugs.python.org/issue26423  15 msgs

#26221: awaiting asyncio.Future swallows StopIteration
       http://bugs.python.org/issue26221  11 msgs

#26385: the call of tempfile.NamedTemporaryFile fails and leaves a fil
       http://bugs.python.org/issue26385  9 msgs

#26415: Out of memory, trying to parse a 35MB dict
       http://bugs.python.org/issue26415  8 msgs

#19475: Add timespec optional flag to datetime isoformat() to choose t
       http://bugs.python.org/issue19475  7 msgs

#26376: Tkinter root window won't close if packed.
       http://bugs.python.org/issue26376  7 msgs

#26039: More flexibility in zipfile interface
       http://bugs.python.org/issue26039  6 msgs

#26281: Clear sys.path_importer_cache from importlib.invalidate_caches
       http://bugs.python.org/issue26281  6 msgs

#26323: Add assert_called() and assert_called_once() methods for mock
       http://bugs.python.org/issue26323  6 msgs

#21042: ctypes.util.find_library() should return full pathname instead
       http://bugs.python.org/issue21042  5 msgs

Issues closed (44)
==================

#1429: FD leak in SocketServer when request handler throws exception
       http://bugs.python.org/issue1429  closed by martin.panter

#5824: SocketServer.DatagramRequestHandler Broken under Linux
       http://bugs.python.org/issue5824  closed by martin.panter

#21996: gettarinfo method does not handle files without text string na
       http://bugs.python.org/issue21996  closed by martin.panter

#22088: base64 module still ignores non-alphabet characters
       http://bugs.python.org/issue22088  closed by martin.panter

#22468: Tarfile using fstat on GZip file object
       http://bugs.python.org/issue22468  closed by martin.panter

#23430: socketserver.BaseServer.handle_error() should not catch exitin
       http://bugs.python.org/issue23430  closed by martin.panter

#24229: pathlib.Path should have a copy() method
       http://bugs.python.org/issue24229  closed by serhiy.storchaka

#25080: The example-code for making XML-RPC requests through proxy, fa
       http://bugs.python.org/issue25080  closed by berker.peksag

#25136: Python doesn't find Xcode 7 SDK stub libraries
       http://bugs.python.org/issue25136  closed by ned.deily

#25139: socketserver.ThreadingMixIn exception handler: Just a little r
       http://bugs.python.org/issue25139  closed by martin.panter

#25801: ResourceWarning in test_zipfile64
       http://bugs.python.org/issue25801  closed by SilentGhost

#25913: base64.a85decode adobe flag incorrectly utilizes <~ as a marke
       http://bugs.python.org/issue25913  closed by serhiy.storchaka

#26261: NamedTemporaryFile documentation is vague about the
`name` att
       http://bugs.python.org/issue26261  closed by martin.panter

#26302: cookies module allows commas in keys
       http://bugs.python.org/issue26302  closed by jason.coombs

#26366: Use ".. versionadded" over ".. versionchanged" where a
       http://bugs.python.org/issue26366  closed by rhettinger

#26367: importlib.__import__ does not fail for invalid relative import
       http://bugs.python.org/issue26367  closed by brett.cannon

#26390: hashlib's pbkdf2_hmac documentation "rounds" does not match so
       http://bugs.python.org/issue26390  closed by martin.panter

#26392: socketserver.BaseServer.close_server should stop serve_forever
       http://bugs.python.org/issue26392  closed by palaviv

#26395: asyncio does not support yielding from recvfrom (socket/udp)
       http://bugs.python.org/issue26395  closed by Simon Bernier St-Pierre

#26397: Tweak importlib Example of importlib.import_module() to use im
       http://bugs.python.org/issue26397  closed by brett.cannon

#26398: cgi.escape() Can Lead To XSS and HTML Vulnerabilities
       http://bugs.python.org/issue26398  closed by gregory.p.smith

#26399: CSV Injection Vulnerability
       http://bugs.python.org/issue26399  closed by maciej.szulik

#26400: SyntaxError when running Python 2.7 interpreter with subproces
       http://bugs.python.org/issue26400  closed by giumas

#26401: Error in documentation for "compile" built-in function
       http://bugs.python.org/issue26401  closed by berker.peksag

#26402: Regression in Python 3.5 xmlrpc.client, raises RemoteDisconnec
       http://bugs.python.org/issue26402  closed by martin.panter

#26405: tkinter askopenfilename doubleclick issue on windows
       http://bugs.python.org/issue26405  closed by serhiy.storchaka

#26406: getaddrinfo is thread-safe on NetBSD and OpenBSD
       http://bugs.python.org/issue26406  closed by ned.deily

#26408: pep-8 requires a few corrections
       http://bugs.python.org/issue26408  closed by python-dev

#26409: Support latest Tcl/Tk on future versions of Mac installer
       http://bugs.python.org/issue26409  closed by ned.deily

#26411: Suggestion
concerning compile-time warnings
       http://bugs.python.org/issue26411  closed by brett.cannon

#26412: Segmentation Fault: 11
       http://bugs.python.org/issue26412  closed by christian.heimes

#26413: python 3.5.1 uses wrong registry in system-wide installation
       http://bugs.python.org/issue26413  closed by eryksun

#26416: Deprecate the regex_v8, telco, and spectral_norm benchmarks
       http://bugs.python.org/issue26416  closed by brett.cannon

#26417: Default IDLE 2.7.11 configuration files are out-of-sync on OS
       http://bugs.python.org/issue26417  closed by ned.deily

#26422: printing 1e23 and up is incorrect
       http://bugs.python.org/issue26422  closed by eric.smith

#26424: QPyNullVariant
       http://bugs.python.org/issue26424  closed by haypo

#26426: email examples: incorrect use of email.headerregistry.Address
       http://bugs.python.org/issue26426  closed by berker.peksag

#26427: w* format in PyArg_ParseTupleAndKeywords for optional argument
       http://bugs.python.org/issue26427  closed by serhiy.storchaka

#26428: The range for xrange() is too narrow on Windows 64-bit
       http://bugs.python.org/issue26428  closed by serhiy.storchaka

#26429: os.path.dirname returns empty string instead of "." when file
       http://bugs.python.org/issue26429  closed by serhiy.storchaka

#26430: quote marks problem on loaded file
       http://bugs.python.org/issue26430  closed by SilentGhost

#26431: string template substitute tests
       http://bugs.python.org/issue26431  closed by serhiy.storchaka

#26435: Fix versionadded/versionchanged documentation directives
       http://bugs.python.org/issue26435  closed by python-dev

#26438: Complete your registration to Python tracker -- key4g5ti2VWPYC
       http://bugs.python.org/issue26438  closed by christian.heimes

From deronnax at gmail.com  Fri Feb 26 05:12:48 2016
From: deronnax at gmail.com (Mathieu Dupuy)
Date: Fri, 26 Feb 2016 18:12:48 +0800
Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW
Message-ID: 

Hi.
I am currently working on adding some functionality to a standard
library module (http://bugs.python.org/issue15873). The Python part
went fine, but now I have to do the C counterpart, and I have run into
several problems which, stacked up, are a huge obstacle to contributing
further. Currently, even though I am willing to work, I can't go further
on my patch.

I am currently working with very limited network, CPU and time
resources*, which are quite uncommon in the western world, but much less
so in the rest of the world. I have a 2GB/month mobile data plan and a
100KB/s connection. For the C part of my patch, I would have to download
Visual Studio. The Express Edition 2015 is roughly 9GB. I can't afford
that.

I downloaded VirtualBox and two Linux netinstall images (Ubuntu 15.10
and Fedora 23). In short, I couldn't get something working quickly and
simply (quickly = less than 2 hours, downloading time NOT included,
which is already way too much anyway). What went wrong and why it went
wrong could be a whole new thread and is outside the scope of this
message.

Let me be precise here: at my work I use many VirtualBox instances,
automatically fired up and run in parallel, to test new deployments and
run unit tests. I like this tool, but despite its simple look, it (most
of the time) cannot be used easily by a newcomer. The concepts it
requires you to understand are not intuitive at first sight, and there
is *always* something that goes wrong (guest additions, mostly). (For
example: Ubuntu and VirtualBox shipped for a while a broken version of
mount.vboxsf, preventing shared folders from mounting. Although it's
fixed now, the broken releases spread everywhere and you may still
encounter them in various Ubuntu and VirtualBox versions. I downloaded
the latest versions of both and I am still affected:
https://www.virtualbox.org/ticket/12879.) I could start a whole new
thread on why you can't ask newcomers to use VirtualBox (currently, at
least).
What I did run into is a whole patch set to make CPython compile with
MinGW (https://bugs.python.org/issue3871#msg199695). But there is no
denying it's very experimental, and I know I would again spend useless
hours trying to get it to work rather than joyfully improving Python,
and that's exactly what I do not want to happen.

Getting ready to contribute to CPython's pure Python modules from a
standard, average, mr-everyone Windows PC, for a beginner-to-medium
contributor, only requires a few megabytes of internet and a few minutes
of their time: getting a tarball of the CPython sources (or cloning the
GitHub CPython mirror)**, a basic text editor and msys-git. The step
further, if doing some - even basic - C code is required, implies
downloading 9GB of Visual Studio and countless hours for it to be ready
to use. I think downloading the whole Visual Studio suite is a huge
stopper to contributing further for an average medium-or-below
contributor.

I think (and I must not be the only one, since CPython is to be moved to
GitHub) that the barriers to contributing to CPython should be set as
low as possible. Of course my situation is a bit special, but I think it
represents the daily struggle of a *lot* of non-western programmers (at
least where internet is limited) (even here in Australia, capped
landline internet connections are very common). It's not a big deal if
the MinGW build is twenty times slower or if some of the most advanced
modules can't be built. But every programmer should be able to easily
make some C hacks and get them to work.

Hoping you'll be receptive to my pleas,
Cheers

* I am currently picking fruit in regional Australia. I live in a van
and have internet through a smartphone over an EDGE connection. I can
plug the laptop in at the farm but not in the van.
** No fresh programmer uses Mercurial unless he has a gun pointed at his
head.
From tritium-list at sdamon.com  Fri Feb 26 12:55:04 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Fri, 26 Feb 2016 12:55:04 -0500
Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW
In-Reply-To: 
References: 
Message-ID: <56D09178.3080000@sdamon.com>

No.

Visual Studio is a solid compiler suite; MinGW is a janky mess,
especially when you try to move to 64-bit (where I don't think there is
one true version of MinGW). I'm sorry that Visual Studio makes it very
hard for you to contribute, but changing THE compiler of the
distribution from the platform compiler, especially when we FINALLY got
a stable ABI with it, is going to be a non-starter.

Compiling on MinGW for your own edification is fine, but that's not the
build platform for Windows Python, nor should it be. Contributions are,
and should continue to be, tested against Visual Studio.

On 2/26/2016 05:12, Mathieu Dupuy wrote:
> Hi.
> I am currently working on adding some functionality on a standard
> library module (http://bugs.python.org/issue15873). The Python part
> went fine, but now I have to do the C counterpart, and I have ran into
> in several problems, which, stacked up, are a huge obstacle to easily
> contribute further. Currently, despite I could work, I can't go
> further on my patch.
>
> I am currently working in very limited network, CPU and time
> ressources* which are quite uncommon in the western world, but are
> much less in the rest of the world. I have a 2GB/month mobile data
> plan and a 100KB/s speed. For the C part of my patch, I should
> download Visual Studio. The Express Edition 2015 is roughly 9GB. I
> can't afford that.
>
> I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and
> Fedora 23). Shortly, I couldn't get something working quickly and
> simply (quickly = less than 2 hours, downloading time NOT included,
> which is anyway way too already much).
> What went wrong and why it went
> wrong could be a whole new thread and is outside of the scope of this
> message.
> Let me precise this : at my work I use many virtualbox instances
> automatically fired and run in parallel to test new deployments and
> run unittests. I like this tool,
> but despite its simple look, it (most of the time) can not be used
> simply by a profane. The concepts it requires you to understand are
> not intuitive at first sight and there is *always* a thing that go
> wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox
> shipped for a moment a broken version of mount.vboxsf, preventing
> sharing folder to mount. Despite it's fixed, the broken releases
> spread everywhere and you may encounter them a lot in various Ubuntu
> and Virtualbox version. I downloaded the last versions of both and I
> am yet infected. https://www.virtualbox.org/ticket/12879). I could do
> whole new thread on why you can't ask newcomers to use Virtualbox
> (currently, at least).
>
> I ran into is a whole patch set to make CPython compile on MinGW
> (https://bugs.python.org/issue3871#msg199695). But it is not denying
> it's very experimental, and I know I would again spent useless hours
> trying to get it work rather than joyfully improving Python, and
> that's exactly what I do not want to happen.
>
> Getting ready to contribute to CPython pure python modules from an
> standard, average mr-everyone Windows PC for a beginner-to-medium
> contributor only require few megabytes of internet and few minutes of his
> time: getting a tarball of CPython sources (or cloning the github CPython
> mirror)**, a basic text editor and msys-git. The step further, if doing
> some -even basic- C code is required, implies downloading 9GB of Visual
> Studio and countless hours for it to be ready to use.
> I think downloading the whole Visual Studio suite is a huge stopper to
> contribute further for an average medium-or-below-contributor.
> > I think (and I must not be the only one since CPython is to be moved > to github), that barriers to contribute to CPython should be set to > the lowest. > Of course my situation is a bit special but I think it represents > daily struggle of a *lot* of non-western programmer (at least for > limited internet)(even here in Australia, landline limited internet > connections are very common). > It's not a big deal if the MinGW result build is twenty time slower or > if some of the most advanced modules can't be build. But everyone > programmer should be able to easily make some C hacks and get them to > work. > > Hoping you'll be receptive to my pleas, > Cheers > > > * I am currently picking fruits in the regional Australia. I live in a van > and have internet through with smartphone through an EDGE connection. I can > plug the laptop in the farm but not in the van. > ** No fresh programmer use mercurial unless he has a gun pointed on his > head. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com From drsalists at gmail.com Fri Feb 26 13:05:19 2016 From: drsalists at gmail.com (Dan Stromberg) Date: Fri, 26 Feb 2016 10:05:19 -0800 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: <56D09178.3080000@sdamon.com> References: <56D09178.3080000@sdamon.com> Message-ID: But what do you really think? IMO, windows builds probably should do both visual studio and mingw. That is, there probably should be two builds on windows, since there's no clear consensus about which to use. I certainly prefer mingw over visual studio - and I have adequate bandwidth for either. On Fri, Feb 26, 2016 at 9:55 AM, Alexander Walters wrote: > No. 
> > Visual Studio is a solid compiler suit, mingw is a jenky mess, especially > when you try and move to 64bit (where I don't think there is one true > version of mingw). I'm sorry that Visual Studio makes it very hard for you > to contribute, but changing THE compiler of the distribution from the > platform compiler, especially when we FINALLY got a stable abi with it, is > going to be a non starter. > > Compiling on MinGW for your own edification is fine, but that's not the > build platform for windows python, nor should it be. Contributions are, and > should continue to be, tested against Visual Studio. > > > On 2/26/2016 05:12, Mathieu Dupuy wrote: >> >> Hi. >> I am currently working on adding some functionality on a standard >> library module (http://bugs.python.org/issue15873). The Python part >> went fine, but now I have to do the C counterpart, and I have ran into >> in several problems, which, stacked up, are a huge obstacle to easily >> contribute further. Currently, despite I could work, I can't go >> further >> on my patch. >> >> I am currently working in very limited network, CPU and time >> ressources* which are quite uncommon in the western world, but are >> much less in the rest of the world. I have a 2GB/month mobile data >> plan and a 100KB/s speed. For the C part of my patch, I should >> download Visual Studio. The Express Edition 2015 is roughly 9GB. I >> can't afford that. >> >> I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and >> Fedora 23). Shortly, I couldn't get something working quickly and >> simply (quickly = less than 2 hours, downloading time NOT included, >> which is anyway way too already much). What went wrong and why it went >> wrong could be a whole new thread and is outside of the scope of this >> message. >> Let me precise this : at my work I use many virtualbox instances >> automatically fired and run in parallel to test new deployments and >> run unittests. 
I like this tool, >> but despite its simple look, it (most of the time) can not be used >> simply by a profane. The concepts it requires you to understand are >> not intuitive at first sight and there is *always* a thing that go >> wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox >> shipped for a moment a broken version of mount.vboxsf, preventing >> sharing folder to mount. Despite it's fixed, the broken releases >> spread everywhere and you may encounter them a lot in various Ubuntu >> and Virtualbox version. I downloaded the last versions of both and I >> am yet infected. https://www.virtualbox.org/ticket/12879). I could do >> whole new thread on why you can't ask newcomers to use Virtualbox >> (currently, at least). >> >> I ran into is a whole patch set to make CPython compile on MinGW >> (https://bugs.python.org/issue3871#msg199695). But it is not denying >> it's very experimental, and I know I would again spent useless hours >> trying to get it work rather than joyfully improving Python, and >> that's exactly what I do not want to happen. >> >> Getting ready to contribute to CPython pure python modules from an >> standard, average mr-everyone Windows PC for a beginner-to-medium >> contributor only require few megabytes of internet and few minutes of his >> time: getting a tarball of CPython sources (or cloning the github CPython >> mirror)**, a basic text editor and msys-git. The step further, if doing >> some -even basic- C code is required, implies downloading 9GB of Visual >> Studio and countless hours for it to be ready to use. >> I think downloading the whole Visual Studio suite is a huge stopper to >> contribute further for an average medium-or-below-contributor. >> >> I think (and I must not be the only one since CPython is to be moved >> to github), that barriers to contribute to CPython should be set to >> the lowest. 
>> Of course my situation is a bit special but I think it represents >> daily struggle of a *lot* of non-western programmer (at least for >> limited internet)(even here in Australia, landline limited internet >> connections are very common). >> It's not a big deal if the MinGW result build is twenty time slower or >> if some of the most advanced modules can't be build. But everyone >> programmer should be able to easily make some C hacks and get them to >> work. >> >> Hoping you'll be receptive to my pleas, >> Cheers >> >> >> * I am currently picking fruits in the regional Australia. I live in a van >> and have internet through with smartphone through an EDGE connection. I >> can >> plug the laptop in the farm but not in the van. >> ** No fresh programmer use mercurial unless he has a gun pointed on his >> head. >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/drsalists%40gmail.com From tritium-list at sdamon.com Fri Feb 26 13:10:58 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Fri, 26 Feb 2016 13:10:58 -0500 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: References: <56D09178.3080000@sdamon.com> Message-ID: <56D09532.6090203@sdamon.com> Ok, fine. Bring a windows build bot online. And also take on the support burden of guiding people to which version of which compiler you use for each of the currently supported python versions. And go ahead and write the pep to change how wheel distributions work (which will effectively kill them, so yeah, good side benefit there.) 
Want to kill python on windows for anything that needs a c extension? go ahead, release one version of python with 2 ABIs. What do I know. On 2/26/2016 13:05, Dan Stromberg wrote: > But what do you really think? > > IMO, windows builds probably should do both visual studio and mingw. > That is, there probably should be two builds on windows, since there's > no clear consensus about which to use. > > I certainly prefer mingw over visual studio - and I have adequate > bandwidth for either. > > > On Fri, Feb 26, 2016 at 9:55 AM, Alexander Walters > wrote: >> No. >> >> Visual Studio is a solid compiler suit, mingw is a jenky mess, especially >> when you try and move to 64bit (where I don't think there is one true >> version of mingw). I'm sorry that Visual Studio makes it very hard for you >> to contribute, but changing THE compiler of the distribution from the >> platform compiler, especially when we FINALLY got a stable abi with it, is >> going to be a non starter. >> >> Compiling on MinGW for your own edification is fine, but that's not the >> build platform for windows python, nor should it be. Contributions are, and >> should continue to be, tested against Visual Studio. >> >> >> On 2/26/2016 05:12, Mathieu Dupuy wrote: >>> Hi. >>> I am currently working on adding some functionality on a standard >>> library module (http://bugs.python.org/issue15873). The Python part >>> went fine, but now I have to do the C counterpart, and I have ran into >>> in several problems, which, stacked up, are a huge obstacle to easily >>> contribute further. Currently, despite I could work, I can't go >>> further >>> on my patch. >>> >>> I am currently working in very limited network, CPU and time >>> ressources* which are quite uncommon in the western world, but are >>> much less in the rest of the world. I have a 2GB/month mobile data >>> plan and a 100KB/s speed. For the C part of my patch, I should >>> download Visual Studio. The Express Edition 2015 is roughly 9GB. 
I >>> can't afford that. >>> >>> I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and >>> Fedora 23). Shortly, I couldn't get something working quickly and >>> simply (quickly = less than 2 hours, downloading time NOT included, >>> which is anyway way too already much). What went wrong and why it went >>> wrong could be a whole new thread and is outside of the scope of this >>> message. >>> Let me precise this : at my work I use many virtualbox instances >>> automatically fired and run in parallel to test new deployments and >>> run unittests. I like this tool, >>> but despite its simple look, it (most of the time) can not be used >>> simply by a profane. The concepts it requires you to understand are >>> not intuitive at first sight and there is *always* a thing that go >>> wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox >>> shipped for a moment a broken version of mount.vboxsf, preventing >>> sharing folder to mount. Despite it's fixed, the broken releases >>> spread everywhere and you may encounter them a lot in various Ubuntu >>> and Virtualbox version. I downloaded the last versions of both and I >>> am yet infected. https://www.virtualbox.org/ticket/12879). I could do >>> whole new thread on why you can't ask newcomers to use Virtualbox >>> (currently, at least). >>> >>> I ran into is a whole patch set to make CPython compile on MinGW >>> (https://bugs.python.org/issue3871#msg199695). But it is not denying >>> it's very experimental, and I know I would again spent useless hours >>> trying to get it work rather than joyfully improving Python, and >>> that's exactly what I do not want to happen. 
>>> >>> Getting ready to contribute to CPython pure python modules from an >>> standard, average mr-everyone Windows PC for a beginner-to-medium >>> contributor only require few megabytes of internet and few minutes of his >>> time: getting a tarball of CPython sources (or cloning the github CPython >>> mirror)**, a basic text editor and msys-git. The step further, if doing >>> some -even basic- C code is required, implies downloading 9GB of Visual >>> Studio and countless hours for it to be ready to use. >>> I think downloading the whole Visual Studio suite is a huge stopper to >>> contribute further for an average medium-or-below-contributor. >>> >>> I think (and I must not be the only one since CPython is to be moved >>> to github), that barriers to contribute to CPython should be set to >>> the lowest. >>> Of course my situation is a bit special but I think it represents >>> daily struggle of a *lot* of non-western programmer (at least for >>> limited internet)(even here in Australia, landline limited internet >>> connections are very common). >>> It's not a big deal if the MinGW result build is twenty time slower or >>> if some of the most advanced modules can't be build. But everyone >>> programmer should be able to easily make some C hacks and get them to >>> work. >>> >>> Hoping you'll be receptive to my pleas, >>> Cheers >>> >>> >>> * I am currently picking fruits in the regional Australia. I live in a van >>> and have internet through with smartphone through an EDGE connection. I >>> can >>> plug the laptop in the farm but not in the van. >>> ** No fresh programmer use mercurial unless he has a gun pointed on his >>> head. 
>>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: >>> https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/drsalists%40gmail.com From steve.dower at python.org Fri Feb 26 13:13:21 2016 From: steve.dower at python.org (Steve Dower) Date: Fri, 26 Feb 2016 10:13:21 -0800 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: References: Message-ID: <56D095C1.4080507@python.org> Hi Mathieu I just want to say that we are very aware of the concerns and issues faced here and people (including myself) are actively working to resolve them. For example, I am working with the Visual C++ team to encourage and support work such as https://blogs.msdn.microsoft.com/vcblog/2016/02/16/try-out-the-latest-c-compiler-toolset-without-waiting-for-the-next-update-of-visual-studio/, which strips out most of that 9GB install. It currently is not sufficient for building Python or extension modules, and is likely to be 500MB+ by the time it is, but I am directly in contact with the team involved to achieve that. Once this is available, I will be looking into ways to make it easier to install the compiler. (I work at Microsoft, so I have a better ability than most to influence this sort of thing.) In parallel, there is plenty going on over on distutils-sig to improve the package ecosystem in this respect (which I know is not your immediate concern, but it always comes up as soon as someone mentions a compiler). Improved sdist and wheel capabilities are critical to being able to help non-Windows developers produce pre-compiled software for their users. 
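[Editor's aside on the wheel compatibility point: a wheel's filename encodes the Python version, ABI, and platform it was built for, and that is what installers use to decide whether a binary package is compatible with a given interpreter. The following is a minimal illustrative sketch of splitting out those PEP 427 filename components; the helper name is hypothetical, not part of any real library.]

```python
# Illustrative sketch (hypothetical helper, not a real library API):
# split a PEP 427 wheel filename into the tags installers use to
# judge binary compatibility.
def parse_wheel_filename(filename):
    """Return the PEP 427 components of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # Layout: distribution-version[-build]-python_tag-abi_tag-platform_tag
    if len(parts) == 5:
        dist, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return {"distribution": dist, "version": version, "build": build,
            "python": py_tag, "abi": abi_tag, "platform": plat_tag}

info = parse_wheel_filename("lxml-3.5.0-cp35-cp35m-win_amd64.whl")
print(info["abi"], info["platform"])  # cp35m win_amd64
```

An interpreter built with a different compiler, and therefore potentially a different ABI, would not genuinely match existing wheels tagged e.g. `cp35m`, which is the crash risk a second toolchain raises.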
Supporting a second compiler for CPython is a fairly significant task. We are willing to do it when someone volunteers to actively maintain it and manage a buildbot for it, but so far the best offer has been patches (not all of which are ready to be committed, though I haven't gone through them recently so that may have been improved). Releasing builds from a second compiler is a big compatibility concern, as it is highly likely (right now) that existing wheels will simply crash when used. Dealing with that - probably by treating wheels as incompatible - is essential before we want other distros thinking they can release a MinGW build. There's also plenty of distutils work required to support it properly, and I'm not sure how much of that should/could be done anyway. So basically, smaller packages for MSVC are on their way, and MinGW support is blocked until someone commits to supporting it for an extended period of time. Hopefully that helps clarify our position. Cheers, Steve On 26Feb2016 0212, Mathieu Dupuy wrote: > Hi. > I am currently working on adding some functionality on a standard > library module (http://bugs.python.org/issue15873). The Python part > went fine, but now I have to do the C counterpart, and I have ran into > in several problems, which, stacked up, are a huge obstacle to easily > contribute further. Currently, despite I could work, I can't go > further > on my patch. > > I am currently working in very limited network, CPU and time > ressources* which are quite uncommon in the western world, but are > much less in the rest of the world. I have a 2GB/month mobile data > plan and a 100KB/s speed. For the C part of my patch, I should > download Visual Studio. The Express Edition 2015 is roughly 9GB. I > can't afford that. > > I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and > Fedora 23). 
Shortly, I couldn't get something working quickly and > simply (quickly = less than 2 hours, downloading time NOT included, > which is anyway way too already much). What went wrong and why it went > wrong could be a whole new thread and is outside of the scope of this > message. > Let me precise this : at my work I use many virtualbox instances > automatically fired and run in parallel to test new deployments and > run unittests. I like this tool, > but despite its simple look, it (most of the time) can not be used > simply by a profane. The concepts it requires you to understand are > not intuitive at first sight and there is *always* a thing that go > wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox > shipped for a moment a broken version of mount.vboxsf, preventing > sharing folder to mount. Despite it's fixed, the broken releases > spread everywhere and you may encounter them a lot in various Ubuntu > and Virtualbox version. I downloaded the last versions of both and I > am yet infected. https://www.virtualbox.org/ticket/12879). I could do > whole new thread on why you can't ask newcomers to use Virtualbox > (currently, at least). > > I ran into is a whole patch set to make CPython compile on MinGW > (https://bugs.python.org/issue3871#msg199695). But it is not denying > it's very experimental, and I know I would again spent useless hours > trying to get it work rather than joyfully improving Python, and > that's exactly what I do not want to happen. > > Getting ready to contribute to CPython pure python modules from an > standard, average mr-everyone Windows PC for a beginner-to-medium > contributor only require few megabytes of internet and few minutes of his > time: getting a tarball of CPython sources (or cloning the github CPython > mirror)**, a basic text editor and msys-git. The step further, if doing > some -even basic- C code is required, implies downloading 9GB of Visual > Studio and countless hours for it to be ready to use. 
> I think downloading the whole Visual Studio suite is a huge stopper to > contribute further for an average medium-or-below-contributor. > > I think (and I must not be the only one since CPython is to be moved > to github), that barriers to contribute to CPython should be set to > the lowest. > Of course my situation is a bit special but I think it represents > daily struggle of a *lot* of non-western programmer (at least for > limited internet)(even here in Australia, landline limited internet > connections are very common). > It's not a big deal if the MinGW result build is twenty time slower or > if some of the most advanced modules can't be build. But everyone > programmer should be able to easily make some C hacks and get them to > work. > > Hoping you'll be receptive to my pleas, > Cheers > > > * I am currently picking fruits in the regional Australia. I live in a van > and have internet through with smartphone through an EDGE connection. I can > plug the laptop in the farm but not in the van. > ** No fresh programmer use mercurial unless he has a gun pointed on his > head. From tritium-list at sdamon.com Fri Feb 26 13:16:19 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Fri, 26 Feb 2016 13:16:19 -0500 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: References: <56D09178.3080000@sdamon.com> <56D09532.6090203@sdamon.com> Message-ID: <56D09673.4040906@sdamon.com> You mean honestly pointing out what would happen with a suggestion? It is a horrifically bad idea. I didn't say they were bad people. On 2/26/2016 13:14, Brian Curtin wrote: > The attitude in these responses is counter productive and not really > how it works on this list. > > On Fri, Feb 26, 2016 at 1:10 PM, Alexander Walters > wrote: >> Ok, fine. Bring a windows build bot online. And also take on the support >> burden of guiding people to which version of which compiler you use for each >> of the currently supported python versions. 
And go ahead and write the pep >> to change how wheel distributions work (which will effectively kill them, so >> yeah, good side benefit there.) >> >> Want to kill python on windows for anything that needs a c extension? go >> ahead, release one version of python with 2 ABIs. >> >> What do I know. >> >> >> On 2/26/2016 13:05, Dan Stromberg wrote: >>> But what do you really think? >>> >>> IMO, windows builds probably should do both visual studio and mingw. >>> That is, there probably should be two builds on windows, since there's >>> no clear consensus about which to use. >>> >>> I certainly prefer mingw over visual studio - and I have adequate >>> bandwidth for either. >>> >>> >>> On Fri, Feb 26, 2016 at 9:55 AM, Alexander Walters >>> wrote: >>>> No. >>>> >>>> Visual Studio is a solid compiler suit, mingw is a jenky mess, especially >>>> when you try and move to 64bit (where I don't think there is one true >>>> version of mingw). I'm sorry that Visual Studio makes it very hard for >>>> you >>>> to contribute, but changing THE compiler of the distribution from the >>>> platform compiler, especially when we FINALLY got a stable abi with it, >>>> is >>>> going to be a non starter. >>>> >>>> Compiling on MinGW for your own edification is fine, but that's not the >>>> build platform for windows python, nor should it be. Contributions are, >>>> and >>>> should continue to be, tested against Visual Studio. >>>> >>>> >>>> On 2/26/2016 05:12, Mathieu Dupuy wrote: >>>>> Hi. >>>>> I am currently working on adding some functionality on a standard >>>>> library module (http://bugs.python.org/issue15873). The Python part >>>>> went fine, but now I have to do the C counterpart, and I have ran into >>>>> in several problems, which, stacked up, are a huge obstacle to easily >>>>> contribute further. Currently, despite I could work, I can't go >>>>> further >>>>> on my patch. 
>>>>> >>>>> I am currently working in very limited network, CPU and time >>>>> ressources* which are quite uncommon in the western world, but are >>>>> much less in the rest of the world. I have a 2GB/month mobile data >>>>> plan and a 100KB/s speed. For the C part of my patch, I should >>>>> download Visual Studio. The Express Edition 2015 is roughly 9GB. I >>>>> can't afford that. >>>>> >>>>> I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and >>>>> Fedora 23). Shortly, I couldn't get something working quickly and >>>>> simply (quickly = less than 2 hours, downloading time NOT included, >>>>> which is anyway way too already much). What went wrong and why it went >>>>> wrong could be a whole new thread and is outside of the scope of this >>>>> message. >>>>> Let me precise this : at my work I use many virtualbox instances >>>>> automatically fired and run in parallel to test new deployments and >>>>> run unittests. I like this tool, >>>>> but despite its simple look, it (most of the time) can not be used >>>>> simply by a profane. The concepts it requires you to understand are >>>>> not intuitive at first sight and there is *always* a thing that go >>>>> wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox >>>>> shipped for a moment a broken version of mount.vboxsf, preventing >>>>> sharing folder to mount. Despite it's fixed, the broken releases >>>>> spread everywhere and you may encounter them a lot in various Ubuntu >>>>> and Virtualbox version. I downloaded the last versions of both and I >>>>> am yet infected. https://www.virtualbox.org/ticket/12879). I could do >>>>> whole new thread on why you can't ask newcomers to use Virtualbox >>>>> (currently, at least). >>>>> >>>>> I ran into is a whole patch set to make CPython compile on MinGW >>>>> (https://bugs.python.org/issue3871#msg199695). 
But it is not denying >>>>> it's very experimental, and I know I would again spent useless hours >>>>> trying to get it work rather than joyfully improving Python, and >>>>> that's exactly what I do not want to happen. >>>>> >>>>> Getting ready to contribute to CPython pure python modules from an >>>>> standard, average mr-everyone Windows PC for a beginner-to-medium >>>>> contributor only require few megabytes of internet and few minutes of >>>>> his >>>>> time: getting a tarball of CPython sources (or cloning the github >>>>> CPython >>>>> mirror)**, a basic text editor and msys-git. The step further, if doing >>>>> some -even basic- C code is required, implies downloading 9GB of Visual >>>>> Studio and countless hours for it to be ready to use. >>>>> I think downloading the whole Visual Studio suite is a huge stopper to >>>>> contribute further for an average medium-or-below-contributor. >>>>> >>>>> I think (and I must not be the only one since CPython is to be moved >>>>> to github), that barriers to contribute to CPython should be set to >>>>> the lowest. >>>>> Of course my situation is a bit special but I think it represents >>>>> daily struggle of a *lot* of non-western programmer (at least for >>>>> limited internet)(even here in Australia, landline limited internet >>>>> connections are very common). >>>>> It's not a big deal if the MinGW result build is twenty time slower or >>>>> if some of the most advanced modules can't be build. But everyone >>>>> programmer should be able to easily make some C hacks and get them to >>>>> work. >>>>> >>>>> Hoping you'll be receptive to my pleas, >>>>> Cheers >>>>> >>>>> >>>>> * I am currently picking fruits in the regional Australia. I live in a >>>>> van >>>>> and have internet through with smartphone through an EDGE connection. I >>>>> can >>>>> plug the laptop in the farm but not in the van. >>>>> ** No fresh programmer use mercurial unless he has a gun pointed on his >>>>> head. 
>>>>> _______________________________________________ >>>>> Python-Dev mailing list >>>>> Python-Dev at python.org >>>>> https://mail.python.org/mailman/listinfo/python-dev >>>>> Unsubscribe: >>>>> >>>>> https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com >>>> >>>> _______________________________________________ >>>> Python-Dev mailing list >>>> Python-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/python-dev >>>> Unsubscribe: >>>> https://mail.python.org/mailman/options/python-dev/drsalists%40gmail.com >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/brian%40python.org From rdmurray at bitdance.com Fri Feb 26 13:24:25 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Fri, 26 Feb 2016 13:24:25 -0500 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: References: <56D09178.3080000@sdamon.com> Message-ID: <20160226182426.C949FB14101@webabinitio.net> On Fri, 26 Feb 2016 10:05:19 -0800, Dan Stromberg wrote: > But what do you really think? > > IMO, windows builds probably should do both visual studio and mingw. > That is, there probably should be two builds on windows, since there's > no clear consensus about which to use. > > I certainly prefer mingw over visual studio - and I have adequate > bandwidth for either. I don't think there is much, if any, objection to the idea of making CPython compilable with mingw; we just need the officially supported release to be the VS one for compatibility reasons. But there has historically been a lack of a clear target in the mingw space for someone to actually produce a working generalized port (as opposed to, say, cygwin), much less generate a set of reviewable patches that could be incorporated into the repository. 
(Among other things, for the latter we would need a mingw buildbot, and no one has stepped forward on that front at all, as far as I know.) I think there has been some progress lately, but it is a hard problem and needs more volunteer time. Ideally we'd have someone who is passionate enough about it, knowledgeable enough about it, and patient enough to work with the others in the community who need to be involved. --David From brett at python.org Fri Feb 26 14:03:12 2016 From: brett at python.org (Brett Cannon) Date: Fri, 26 Feb 2016 19:03:12 +0000 Subject: [Python-Dev] Responding in a nice way (was: Python should be easily compilable on Windows with MinGW) In-Reply-To: <56D09673.4040906@sdamon.com> References: <56D09178.3080000@sdamon.com> <56D09532.6090203@sdamon.com> <56D09673.4040906@sdamon.com> Message-ID: On Fri, 26 Feb 2016 at 10:18 Alexander Walters wrote: > You mean honestly pointing out what would happen with a suggestion? It > is a horrifically bad idea. I didn't say they were bad people. > You're right, you didn't directly insult Mathieu, but the tone was unnecessary. Calling mingw a "jenky mess" in your first response was not needed. Dan's response with "But what do you really think?" was also unnecessary as it was antagonistic. I suspect he was reacting to your rather emphatic "no" response instead of simply saying "it would be rather hard to make work" and leaving it at that. 
> > > > On Fri, Feb 26, 2016 at 1:10 PM, Alexander Walters > > wrote: > >> Ok, fine. Bring a windows build bot online. And also take on the > support > >> burden of guiding people to which version of which compiler you use for > each > >> of the currently supported python versions. And go ahead and write the > pep > >> to change how wheel distributions work (which will effectively kill > them, so > >> yeah, good side benefit there.) > >> > >> Want to kill python on windows for anything that needs a c extension? > go > >> ahead, release one version of python with 2 ABIs. > >> > >> What do I know. > >> > >> > >> On 2/26/2016 13:05, Dan Stromberg wrote: > >>> But what do you really think? > >>> > >>> IMO, windows builds probably should do both visual studio and mingw. > >>> That is, there probably should be two builds on windows, since there's > >>> no clear consensus about which to use. > >>> > >>> I certainly prefer mingw over visual studio - and I have adequate > >>> bandwidth for either. > >>> > >>> > >>> On Fri, Feb 26, 2016 at 9:55 AM, Alexander Walters > >>> wrote: > >>>> No. > >>>> > >>>> Visual Studio is a solid compiler suit, mingw is a jenky mess, > especially > >>>> when you try and move to 64bit (where I don't think there is one true > >>>> version of mingw). I'm sorry that Visual Studio makes it very hard > for > >>>> you > >>>> to contribute, but changing THE compiler of the distribution from the > >>>> platform compiler, especially when we FINALLY got a stable abi with > it, > >>>> is > >>>> going to be a non starter. > >>>> > >>>> Compiling on MinGW for your own edification is fine, but that's not > the > >>>> build platform for windows python, nor should it be. Contributions > are, > >>>> and > >>>> should continue to be, tested against Visual Studio. > >>>> > >>>> > >>>> On 2/26/2016 05:12, Mathieu Dupuy wrote: > >>>>> Hi. 
> >>>>> I am currently working on adding some functionality on a standard > >>>>> library module (http://bugs.python.org/issue15873). The Python part > >>>>> went fine, but now I have to do the C counterpart, and I have ran > into > >>>>> in several problems, which, stacked up, are a huge obstacle to easily > >>>>> contribute further. Currently, despite I could work, I can't go > >>>>> further > >>>>> on my patch. > >>>>> > >>>>> I am currently working in very limited network, CPU and time > >>>>> ressources* which are quite uncommon in the western world, but are > >>>>> much less in the rest of the world. I have a 2GB/month mobile data > >>>>> plan and a 100KB/s speed. For the C part of my patch, I should > >>>>> download Visual Studio. The Express Edition 2015 is roughly 9GB. I > >>>>> can't afford that. > >>>>> > >>>>> I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and > >>>>> Fedora 23). Shortly, I couldn't get something working quickly and > >>>>> simply (quickly = less than 2 hours, downloading time NOT included, > >>>>> which is anyway way too already much). What went wrong and why it > went > >>>>> wrong could be a whole new thread and is outside of the scope of this > >>>>> message. > >>>>> Let me precise this : at my work I use many virtualbox instances > >>>>> automatically fired and run in parallel to test new deployments and > >>>>> run unittests. I like this tool, > >>>>> but despite its simple look, it (most of the time) can not be used > >>>>> simply by a profane. The concepts it requires you to understand are > >>>>> not intuitive at first sight and there is *always* a thing that go > >>>>> wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox > >>>>> shipped for a moment a broken version of mount.vboxsf, preventing > >>>>> sharing folder to mount. Despite it's fixed, the broken releases > >>>>> spread everywhere and you may encounter them a lot in various Ubuntu > >>>>> and Virtualbox version. 
I downloaded the last versions of both and I > >>>>> am yet infected. https://www.virtualbox.org/ticket/12879). I could > do > >>>>> whole new thread on why you can't ask newcomers to use Virtualbox > >>>>> (currently, at least). > >>>>> > >>>>> I ran into is a whole patch set to make CPython compile on MinGW > >>>>> (https://bugs.python.org/issue3871#msg199695). But it is not denying > >>>>> it's very experimental, and I know I would again spent useless hours > >>>>> trying to get it work rather than joyfully improving Python, and > >>>>> that's exactly what I do not want to happen. > >>>>> > >>>>> Getting ready to contribute to CPython pure python modules from an > >>>>> standard, average mr-everyone Windows PC for a beginner-to-medium > >>>>> contributor only require few megabytes of internet and few minutes of > >>>>> his > >>>>> time: getting a tarball of CPython sources (or cloning the github > >>>>> CPython > >>>>> mirror)**, a basic text editor and msys-git. The step further, if > doing > >>>>> some -even basic- C code is required, implies downloading 9GB of > Visual > >>>>> Studio and countless hours for it to be ready to use. > >>>>> I think downloading the whole Visual Studio suite is a huge stopper > to > >>>>> contribute further for an average medium-or-below-contributor. > >>>>> > >>>>> I think (and I must not be the only one since CPython is to be moved > >>>>> to github), that barriers to contribute to CPython should be set to > >>>>> the lowest. > >>>>> Of course my situation is a bit special but I think it represents > >>>>> daily struggle of a *lot* of non-western programmer (at least for > >>>>> limited internet)(even here in Australia, landline limited internet > >>>>> connections are very common). > >>>>> It's not a big deal if the MinGW result build is twenty time slower > or > >>>>> if some of the most advanced modules can't be build. But everyone > >>>>> programmer should be able to easily make some C hacks and get them to > >>>>> work. 
> >>>>> > >>>>> Hoping you'll be receptive to my pleas, > >>>>> Cheers > >>>>> > >>>>> > >>>>> * I am currently picking fruits in the regional Australia. I live in > a > >>>>> van > >>>>> and have internet through with smartphone through an EDGE > connection. I > >>>>> can > >>>>> plug the laptop in the farm but not in the van. > >>>>> ** No fresh programmer use mercurial unless he has a gun pointed on > his > >>>>> head. > >>>>> _______________________________________________ > >>>>> Python-Dev mailing list > >>>>> Python-Dev at python.org > >>>>> https://mail.python.org/mailman/listinfo/python-dev > >>>>> Unsubscribe: > >>>>> > >>>>> > https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com > >>>> > >>>> _______________________________________________ > >>>> Python-Dev mailing list > >>>> Python-Dev at python.org > >>>> https://mail.python.org/mailman/listinfo/python-dev > >>>> Unsubscribe: > >>>> > https://mail.python.org/mailman/options/python-dev/drsalists%40gmail.com > >> > >> _______________________________________________ > >> Python-Dev mailing list > >> Python-Dev at python.org > >> https://mail.python.org/mailman/listinfo/python-dev > >> Unsubscribe: > >> https://mail.python.org/mailman/options/python-dev/brian%40python.org > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andy.terrel at gmail.com Fri Feb 26 14:06:51 2016 From: andy.terrel at gmail.com (Andy Ray Terrel) Date: Fri, 26 Feb 2016 13:06:51 -0600 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: <20160226182426.C949FB14101@webabinitio.net> References: <56D09178.3080000@sdamon.com> <20160226182426.C949FB14101@webabinitio.net> Message-ID: You might be interested in a project funded by NumFOCUS + PSF to make the mingw ecosystem work with CPython. http://mingwpy.github.io/ -- Andy On Fri, Feb 26, 2016 at 12:24 PM, R. David Murray wrote: > On Fri, 26 Feb 2016 10:05:19 -0800, Dan Stromberg > wrote: > > But what do you really think? > > > > IMO, windows builds probably should do both visual studio and mingw. > > That is, there probably should be two builds on windows, since there's > > no clear consensus about which to use. > > > > I certainly prefer mingw over visual studio - and I have adequate > > bandwidth for either. > > I don't think there is much if any objection to the idea of making CPython > compilable with mingw, we just need the official supported release to > be the VS one for compatibility reasons. > > But, there has historically been a lack of a clear target in the mingw > space for someone to actually produce a working generalized port (as > opposed to, say, cygwin), much less generate a set of reviewable patches > that could be incorporated in to the repository. (Among other things > for the latter we would need a mingw buildbot, and no one has stepped > forward on that front at all, as far as I know.) > > I think there has been some progress lately, but it is a hard problem > and needs more volunteer time. Ideally we'd have someone who is all of > passionate enough about it, knowledgeable enough about it, and patient > enough to work with the others in the community who need to be involved. 
> > --David > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/andy.terrel%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Feb 26 14:12:28 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 26 Feb 2016 11:12:28 -0800 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: <56D09178.3080000@sdamon.com> References: <56D09178.3080000@sdamon.com> Message-ID: On Feb 26, 2016 9:56 AM, "Alexander Walters" wrote: > > No. > > Visual Studio is a solid compiler suit, mingw is a jenky mess, especially when you try and move to 64bit (where I don't think there is one true version of mingw). I see why you say this, but I think you're seriously underestimating mingw. There's the split between the legacy mingw and mingw-w64 fork, but that's a non-issue: legacy mingw is basically dead, everyone uses mingw-w64 for both 32- and 64-bits, no big deal. If you want a precompiled mingw-w64, then there's a ton of different distributions to choose from, but that's true of lots of F/OSS projects, including python itself. There are a few different official build configurations (e.g. "seh" vs "sjlj"), but they all have to do with the C++ abi so don't matter for python itself. The one real issue is that classically mingw-w64 has not been compatible with msvc -- mostly due to using different C runtimes, but with a few other corner case issues too. This, however, is fixable. 
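[Editorial aside, not part of the original message: the toolchain/runtime matching described above can be checked from Python itself. The snippet below is a minimal illustration; the exact strings printed vary from build to build, and the example values in the comments are assumptions, not guaranteed output.]

```python
import platform
import sysconfig

# The compiler string names the toolchain CPython itself was built with,
# e.g. something like "MSC v.1900 64 bit (AMD64)" for an MSVC build or
# "GCC 5.3.0" for a mingw-w64/gcc build; a compiled extension generally
# has to match that toolchain's C runtime.
print(platform.python_compiler())

# The extension-module filename suffix encodes part of the ABI a compiled
# module must target (older Pythons exposed it under the "SO" key instead).
print(sysconfig.get_config_var("EXT_SUFFIX") or sysconfig.get_config_var("SO"))
```

Running this under the interpreter you intend to build extensions for is a quick first sanity check before reaching for mingw-w64 or MSVC.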
The PSF- and NumFOCUS-sponsored mingwpy project is near to having builds of mingw-w64 that are abi compatible with all cpython releases from 2.6-3.4: http://mingwpy.github.io/ The bigger challenge is getting a version of mingw-w64 that is compatible with the latest MSVC used for cpython 3.5+, since they rearranged the runtimes (for the better, but...). Unfortunately this is also the case that's relevant for OP's use case of working on cpython itself. There's no technical obstacle to doing this, and it's a major priority for both the numerical python community and mingw-w64 upstream, but it's not currently clear who will do the work and if/how it will be funded. In any case though I agree that now is not a good time to start trying to support two incompatible windows ABIs in python upstream, just as we're converging on a common abi across the different compilers. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Feb 26 14:44:54 2016 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Feb 2016 11:44:54 -0800 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: References: Message-ID: <6849679B-83FD-4255-898D-76FE8A479431@yahoo.com> One alternative to consider is using Cygwin. A complete Cygwin environment, including a GCC toolchain, is pretty small. And it can build a *nix-style CPython that works inside the Cygwin environment. That may not be sufficient for a lot of uses, but for your purpose, it should be. Another alternative, as crazy as it may sound, is to get an AWS Free Tier EC2 instance and develop on that. Or, of course, buy an ancient used laptop and install Linux natively. Obviously none of these are ideal, but they may still be better for you than waiting for a complete MinGW port of Python or a smaller MSVC toolchain. > On Feb 26, 2016, at 02:12, Mathieu Dupuy wrote: > > Hi. 
> I am currently working on adding some functionality on a standard > library module (http://bugs.python.org/issue15873). The Python part > went fine, but now I have to do the C counterpart, and I have ran into > in several problems, which, stacked up, are a huge obstacle to easily > contribute further. Currently, despite I could work, I can't go > further > on my patch. > > I am currently working in very limited network, CPU and time > ressources* which are quite uncommon in the western world, but are > much less in the rest of the world. I have a 2GB/month mobile data > plan and a 100KB/s speed. For the C part of my patch, I should > download Visual Studio. The Express Edition 2015 is roughly 9GB. I > can't afford that. > > I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and > Fedora 23). Shortly, I couldn't get something working quickly and > simply (quickly = less than 2 hours, downloading time NOT included, > which is anyway way too already much). What went wrong and why it went > wrong could be a whole new thread and is outside of the scope of this > message. > Let me precise this : at my work I use many virtualbox instances > automatically fired and run in parallel to test new deployments and > run unittests. I like this tool, > but despite its simple look, it (most of the time) can not be used > simply by a profane. The concepts it requires you to understand are > not intuitive at first sight and there is *always* a thing that go > wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox > shipped for a moment a broken version of mount.vboxsf, preventing > sharing folder to mount. Despite it's fixed, the broken releases > spread everywhere and you may encounter them a lot in various Ubuntu > and Virtualbox version. I downloaded the last versions of both and I > am yet infected. https://www.virtualbox.org/ticket/12879). I could do > whole new thread on why you can't ask newcomers to use Virtualbox > (currently, at least). 
> > I ran into is a whole patch set to make CPython compile on MinGW > (https://bugs.python.org/issue3871#msg199695). But it is not denying > it's very experimental, and I know I would again spent useless hours > trying to get it work rather than joyfully improving Python, and > that's exactly what I do not want to happen. > > Getting ready to contribute to CPython pure python modules from an > standard, average mr-everyone Windows PC for a beginner-to-medium > contributor only require few megabytes of internet and few minutes of his > time: getting a tarball of CPython sources (or cloning the github CPython > mirror)**, a basic text editor and msys-git. The step further, if doing > some -even basic- C code is required, implies downloading 9GB of Visual > Studio and countless hours for it to be ready to use. > I think downloading the whole Visual Studio suite is a huge stopper to > contribute further for an average medium-or-below-contributor. > > I think (and I must not be the only one since CPython is to be moved > to github), that barriers to contribute to CPython should be set to > the lowest. > Of course my situation is a bit special but I think it represents > daily struggle of a *lot* of non-western programmer (at least for > limited internet)(even here in Australia, landline limited internet > connections are very common). > It's not a big deal if the MinGW result build is twenty time slower or > if some of the most advanced modules can't be build. But everyone > programmer should be able to easily make some C hacks and get them to > work. > > Hoping you'll be receptive to my pleas, > Cheers > > > * I am currently picking fruits in the regional Australia. I live in a van > and have internet through with smartphone through an EDGE connection. I can > plug the laptop in the farm but not in the van. > ** No fresh programmer use mercurial unless he has a gun pointed on his > head. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/abarnert%40yahoo.com From deronnax at gmail.com Sat Feb 27 06:25:39 2016 From: deronnax at gmail.com (Mathieu Dupuy) Date: Sat, 27 Feb 2016 19:25:39 +0800 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: <20160215202720.GA27790@phdru.name> References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> Message-ID: Ahah. I meant obtaining his contact details, such as an email address, to gently ask him to take it down himself (otherwise we open fire). Having Github suddenly destroy the repo, even though the man has probably forgotten it exists, might be a bit rude coming from the polite people Python developers are. 2016-02-16 4:27 UTC+08:00, Oleg Broytman : > On Tue, Feb 16, 2016 at 09:16:56AM +1300, Greg Ewing > wrote: >> Mathieu Dupuy wrote: >> >A python representative (like Guido himself) should contact Github to >> >obtain coordinates of the owner... >> >> ...and then order a drone strike on him? > > Yes, and then pry the repo from his cold dead fingers. > > Well, I hope prying can be done without striking first. ;-) > >> -- >> Greg > > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/deronnax%40gmail.com > From g.rodola at gmail.com Sat Feb 27 14:05:26 2016 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Sat, 27 Feb 2016 20:05:26 +0100 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: <56D09178.3080000@sdamon.com> References: <56D09178.3080000@sdamon.com> Message-ID: On Fri, Feb 26, 2016 at 6:55 PM, Alexander Walters wrote: > No. > > Visual Studio is a solid compiler suit, mingw is a jenky mess, especially > when you try and move to 64bit (where I don't think there is one true > version of mingw). I'm sorry that Visual Studio makes it very hard for you > to contribute, but changing THE compiler of the distribution from the > platform compiler, especially when we FINALLY got a stable abi with it, is > going to be a non starter. > > Compiling on MinGW for your own edification is fine, but that's not the > build platform for windows python, nor should it be. Contributions are, and > should continue to be, tested against Visual Studio. > > > On 2/26/2016 05:12, Mathieu Dupuy wrote: > >> Hi. >> I am currently working on adding some functionality on a standard >> library module (http://bugs.python.org/issue15873). The Python part >> went fine, but now I have to do the C counterpart, and I have ran into >> in several problems, which, stacked up, are a huge obstacle to easily >> contribute further. Currently, despite I could work, I can't go >> further >> on my patch. >> >> I am currently working in very limited network, CPU and time >> ressources* which are quite uncommon in the western world, but are >> much less in the rest of the world. I have a 2GB/month mobile data >> plan and a 100KB/s speed. For the C part of my patch, I should >> download Visual Studio. 
The Express Edition 2015 is roughly 9GB. I >> can't afford that. >> >> I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and >> Fedora 23). Shortly, I couldn't get something working quickly and >> simply (quickly = less than 2 hours, downloading time NOT included, >> which is anyway way too already much). What went wrong and why it went >> wrong could be a whole new thread and is outside of the scope of this >> message. >> Let me precise this : at my work I use many virtualbox instances >> automatically fired and run in parallel to test new deployments and >> run unittests. I like this tool, >> but despite its simple look, it (most of the time) can not be used >> simply by a profane. The concepts it requires you to understand are >> not intuitive at first sight and there is *always* a thing that go >> wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox >> shipped for a moment a broken version of mount.vboxsf, preventing >> sharing folder to mount. Despite it's fixed, the broken releases >> spread everywhere and you may encounter them a lot in various Ubuntu >> and Virtualbox version. I downloaded the last versions of both and I >> am yet infected. https://www.virtualbox.org/ticket/12879). I could do >> whole new thread on why you can't ask newcomers to use Virtualbox >> (currently, at least). >> >> I ran into is a whole patch set to make CPython compile on MinGW >> (https://bugs.python.org/issue3871#msg199695). But it is not denying >> it's very experimental, and I know I would again spent useless hours >> trying to get it work rather than joyfully improving Python, and >> that's exactly what I do not want to happen. 
>> >> Getting ready to contribute to CPython pure python modules from an >> standard, average mr-everyone Windows PC for a beginner-to-medium >> contributor only require few megabytes of internet and few minutes of his >> time: getting a tarball of CPython sources (or cloning the github CPython >> mirror)**, a basic text editor and msys-git. The step further, if doing >> some -even basic- C code is required, implies downloading 9GB of Visual >> Studio and countless hours for it to be ready to use. >> I think downloading the whole Visual Studio suite is a huge stopper to >> contribute further for an average medium-or-below-contributor. >> >> I think (and I must not be the only one since CPython is to be moved >> to github), that barriers to contribute to CPython should be set to >> the lowest. >> Of course my situation is a bit special but I think it represents >> daily struggle of a *lot* of non-western programmer (at least for >> limited internet)(even here in Australia, landline limited internet >> connections are very common). >> It's not a big deal if the MinGW result build is twenty time slower or >> if some of the most advanced modules can't be build. But everyone >> programmer should be able to easily make some C hacks and get them to >> work. >> >> Hoping you'll be receptive to my pleas, >> Cheers >> >> >> * I am currently picking fruits in the regional Australia. I live in a van >> and have internet through with smartphone through an EDGE connection. I >> can >> plug the laptop in the farm but not in the van. >> ** No fresh programmer use mercurial unless he has a gun pointed on his >> head. 
>> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/g.rodola%40gmail.com > My personal experience with psutil is that at some point I simply gave up trying to support mingw because it was too difficult and I couldn't keep up with it anymore. I had to hack through all sorts of missing stuff such as https://github.com/giampaolo/psutil/blob/08d490d0d8fa60ee1d689cca30738ceb599298d0/psutil/arch/windows/ntextapi.h. I always wondered why mingw doesn't do that for me, and on top of that I never really got it to work with 64-bit Python. I can only imagine what it would mean doing something similar in a much larger C code base such as Python's. The advantage of being able to use something other than VS (which I despise as well) would undoubtedly be enormous, but from my experience mingw is an alternative which simply doesn't work. -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Sat Feb 27 16:27:25 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 27 Feb 2016 16:27:25 -0500 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: References: Message-ID: For this particular case, is there someone generous enough (or, can someone apply for a PSF grant) to ship Mathieu a DVD/two/flash drive? On Feb 26, 2016 12:18 PM, "Mathieu Dupuy" wrote: > Hi. > I am currently working on adding some functionality on a standard > library module (http://bugs.python.org/issue15873). 
The Python part > went fine, but now I have to do the C counterpart, and I have ran into > in several problems, which, stacked up, are a huge obstacle to easily > contribute further. Currently, despite I could work, I can't go > further > on my patch. > > I am currently working in very limited network, CPU and time > ressources* which are quite uncommon in the western world, but are > much less in the rest of the world. I have a 2GB/month mobile data > plan and a 100KB/s speed. For the C part of my patch, I should > download Visual Studio. The Express Edition 2015 is roughly 9GB. I > can't afford that. > > I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and > Fedora 23). Shortly, I couldn't get something working quickly and > simply (quickly = less than 2 hours, downloading time NOT included, > which is anyway way too already much). What went wrong and why it went > wrong could be a whole new thread and is outside of the scope of this > message. > Let me precise this : at my work I use many virtualbox instances > automatically fired and run in parallel to test new deployments and > run unittests. I like this tool, > but despite its simple look, it (most of the time) can not be used > simply by a profane. The concepts it requires you to understand are > not intuitive at first sight and there is *always* a thing that go > wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox > shipped for a moment a broken version of mount.vboxsf, preventing > sharing folder to mount. Despite it's fixed, the broken releases > spread everywhere and you may encounter them a lot in various Ubuntu > and Virtualbox version. I downloaded the last versions of both and I > am yet infected. https://www.virtualbox.org/ticket/12879). I could do > whole new thread on why you can't ask newcomers to use Virtualbox > (currently, at least). > > I ran into is a whole patch set to make CPython compile on MinGW > (https://bugs.python.org/issue3871#msg199695). 
But it is not denying > it's very experimental, and I know I would again spent useless hours > trying to get it work rather than joyfully improving Python, and > that's exactly what I do not want to happen. > > Getting ready to contribute to CPython pure python modules from an > standard, average mr-everyone Windows PC for a beginner-to-medium > contributor only require few megabytes of internet and few minutes of his > time: getting a tarball of CPython sources (or cloning the github CPython > mirror)**, a basic text editor and msys-git. The step further, if doing > some -even basic- C code is required, implies downloading 9GB of Visual > Studio and countless hours for it to be ready to use. > I think downloading the whole Visual Studio suite is a huge stopper to > contribute further for an average medium-or-below-contributor. > > I think (and I must not be the only one since CPython is to be moved > to github), that barriers to contribute to CPython should be set to > the lowest. > Of course my situation is a bit special but I think it represents > daily struggle of a *lot* of non-western programmer (at least for > limited internet)(even here in Australia, landline limited internet > connections are very common). > It's not a big deal if the MinGW result build is twenty time slower or > if some of the most advanced modules can't be build. But everyone > programmer should be able to easily make some C hacks and get them to > work. > > Hoping you'll be receptive to my pleas, > Cheers > > > * I am currently picking fruits in the regional Australia. I live in a van > and have internet through with smartphone through an EDGE connection. I can > plug the laptop in the farm but not in the van. > ** No fresh programmer use mercurial unless he has a gun pointed on his > head. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Sat Feb 27 16:35:11 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Sat, 27 Feb 2016 16:35:11 -0500 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: References: Message-ID: <56D2168F.8070805@sdamon.com> The 9 gig initial download is not the only problem. Visual studio is very bandwidth hungry in day to day operations (between polling websites and vcs remotes, near constant updating, integration with the VS web service, etc.). You can of course shut all of that off, but it's a pain. It's my understanding from Steve's post that a leaner, meaner edition of VS is in the works, so waiting for that might just be an overall better solution. On 2/27/2016 16:27, Franklin? Lee wrote: > > For this particular case, is there someone generous enough (or, can > someone apply for a PSF grant) to ship Mathieu a DVD/two/flash drive? > > On Feb 26, 2016 12:18 PM, "Mathieu Dupuy" > wrote: > > Hi. > I am currently working on adding some functionality on a standard > library module (http://bugs.python.org/issue15873). The Python part > went fine, but now I have to do the C counterpart, and I have ran into > in several problems, which, stacked up, are a huge obstacle to easily > contribute further. Currently, despite I could work, I can't go > further > on my patch. > > I am currently working in very limited network, CPU and time > ressources* which are quite uncommon in the western world, but are > much less in the rest of the world. I have a 2GB/month mobile data > plan and a 100KB/s speed. For the C part of my patch, I should > download Visual Studio. 
The Express Edition 2015 is roughly 9GB. I > can't afford that. > > I downloaded Virtualbox and two Linux netinstall (Ubuntu 15.10 and > Fedora 23). Shortly, I couldn't get something working quickly and > simply (quickly = less than 2 hours, downloading time NOT included, > which is anyway way too already much). What went wrong and why it went > wrong could be a whole new thread and is outside of the scope of this > message. > Let me precise this : at my work I use many virtualbox instances > automatically fired and run in parallel to test new deployments and > run unittests. I like this tool, > but despite its simple look, it (most of the time) can not be used > simply by a profane. The concepts it requires you to understand are > not intuitive at first sight and there is *always* a thing that go > wrong (guest additions, mostly).(for example : Ubuntu and Virtualbox > shipped for a moment a broken version of mount.vboxsf, preventing > sharing folder to mount. Despite it's fixed, the broken releases > spread everywhere and you may encounter them a lot in various Ubuntu > and Virtualbox version. I downloaded the last versions of both and I > am yet infected. https://www.virtualbox.org/ticket/12879). I could do > whole new thread on why you can't ask newcomers to use Virtualbox > (currently, at least). > > I ran into is a whole patch set to make CPython compile on MinGW > (https://bugs.python.org/issue3871#msg199695). But it is not denying > it's very experimental, and I know I would again spent useless hours > trying to get it work rather than joyfully improving Python, and > that's exactly what I do not want to happen. > > Getting ready to contribute to CPython pure python modules from an > standard, average mr-everyone Windows PC for a beginner-to-medium > contributor only require few megabytes of internet and few minutes > of his > time: getting a tarball of CPython sources (or cloning the github > CPython > mirror)**, a basic text editor and msys-git. 
The step further, if > doing > some -even basic- C code is required, implies downloading 9GB of > Visual > Studio and countless hours for it to be ready to use. > I think downloading the whole Visual Studio suite is a huge stopper to > contribute further for an average medium-or-below-contributor. > > I think (and I must not be the only one since CPython is to be moved > to github), that barriers to contribute to CPython should be set to > the lowest. > Of course my situation is a bit special but I think it represents > daily struggle of a *lot* of non-western programmer (at least for > limited internet)(even here in Australia, landline limited internet > connections are very common). > It's not a big deal if the MinGW result build is twenty time slower or > if some of the most advanced modules can't be build. But everyone > programmer should be able to easily make some C hacks and get them to > work. > > Hoping you'll be receptive to my pleas, > Cheers > > > * I am currently picking fruits in the regional Australia. I live > in a van > and have internet through with smartphone through an EDGE > connection. I can > plug the laptop in the farm but not in the van. > ** No fresh programmer use mercurial unless he has a gun pointed > on his > head. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From tritium-list at sdamon.com Sat Feb 27 17:14:02 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Sat, 27 Feb 2016 17:14:02 -0500 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: References: <56D2168F.8070805@sdamon.com> Message-ID: <56D21FAA.7070400@sdamon.com>
Theoretically yes. Practically, I think, but do not know for sure, it would have the same annoying issues as other SDK builds (of which, the most annoying is just different paths for the tools). Making that a supported build would require some of the same effort as supporting any other compiler though (a build bot configured to compile python this way). I also think, but am not sure, that what you linked is in fact the leaner meaner toolchain that Steve was referring to.
On 2/27/2016 16:49, Chris Krycho wrote: > > Outsider/observer here; but is it not possible to build Python using > the VS *toolchain* (compiler, linker, etc.) outside of VS itself, i.e. > using MSBuild[1] and so on? That would remove the need for the full VS > install, and is *much* smaller (~800MB after installation, rather than > 9GB). A lean and mean VS will be a great improvement regardless, but > it seems like that would be a good intermediate solution if it's possible. > > [1]: https://www.microsoft.com/en-us/download/details.aspx?id=49983 > > -- Chris > > On February 27, 2016 at 4:36:54 PM, Alexander Walters > (tritium-list at sdamon.com ) wrote: > >> The 9 gig initial download is not the only problem. Visual studio is >> very bandwidth hungry in day to day operations (between polling >> websites and vcs remotes, near constant updating, integration with >> the VS web service, etc.). You can of course shut all of that off, >> but it's a pain. It's my understanding from Steve's post that a >> leaner, meaner edition of VS is in the works, so waiting for that >> might just be an overall better solution. >> >> On 2/27/2016 16:27, Franklin?
Lee wrote: >>> >>> For this particular case, is there someone generous enough (or, can >>> someone apply for a PSF grant) to ship Mathieu a DVD/two/flash drive? >>> >>> On Feb 26, 2016 12:18 PM, "Mathieu Dupuy" wrote: >>> [...] >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/chris%40chriskrycho.com -------------- next part -------------- An HTML attachment was scrubbed... URL:
From tritium-list at sdamon.com Sat Feb 27 17:21:01 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Sat, 27 Feb 2016 17:21:01 -0500 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> Message-ID: <56D2214D.9000304@sdamon.com>
Can we even ask github to pull it down and reasonably expect them to comply? Their entire model is built on everyone forking everyone else.
On 2/27/2016 06:25, Mathieu Dupuy wrote: > Ahah.
Obtaining his electronic coordinates, like an email address, to gently ask > him to pull it down himself (otherwise we open fire). Because > having Github suddenly destroy the repo, even though the man > probably forgot about its existence, might be a bit rude coming from the > polite people Python developers are. > > 2016-02-16 4:27 UTC+08:00, Oleg Broytman : >> On Tue, Feb 16, 2016 at 09:16:56AM +1300, Greg Ewing >> wrote: >>> Mathieu Dupuy wrote: >>>> A python representative (like Guido himself) should contact Github to >>>> obtain coordinates of the owner... >>> ...and then order a drone strike on him? >> Yes, and then pry the repo from his cold dead fingers. >> >> Well, I hope prying can be done without striking first. ;-) >> >>> -- >>> Greg >> Oleg. >> -- >> Oleg Broytman http://phdru.name/ phd at phdru.name >> Programmers don't die, they just GOSUB without RETURN. >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/deronnax%40gmail.com
From bussonniermatthias at gmail.com Sat Feb 27 17:45:21 2016 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Sat, 27 Feb 2016 14:45:21 -0800 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: <56D2214D.9000304@sdamon.com> References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> Message-ID: <163272AD-1FFB-466D-AA8E-944273DD4CBF@gmail.com>
Hi all, > On Feb 27, 2016, at 14:21, Alexander Walters wrote: > > Can we even ask github to pull it down and reasonably expect them to comply?
Their entire model is built on everyone forking everyone else. While the model is everyone forking, some of GitHub's help pages actually tell you to contact GitHub support, e.g. if you desire to "detach" a fork. Every reasonable request I made to GitHub, and the few interactions I had with support, always went well. This did include asking GitHub to contact a user because their pages were confusing and might be misleading others. So I would suggest: 1) asking GitHub to contact the author, potentially forwarding him/her a message from this list asking him/her to bring the repo down or transfer control of it to you. That should be easy to do, as it will not force GitHub to provide anyone with the email of the owner of python-git. 2) in the case of no response from the author, politely tell GitHub that the repo is confusing for users, and ask what they can do about that. 3) If still nothing can be done, make a DMCA request. You can likely argue that the logo/name are used without PSF consent. https://help.github.com/articles/dmca-takedown-policy/ This would likely have more impact if sent from someone who is part of https://github.com/python -- M -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ianlee1521 at gmail.com Sat Feb 27 17:45:47 2016 From: ianlee1521 at gmail.com (Ian Lee) Date: Sat, 27 Feb 2016 14:45:47 -0800 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: <56D2214D.9000304@sdamon.com> References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> Message-ID: <9F5B3487-FE2E-4B28-A375-DF12F519855A@gmail.com>
> On Feb 27, 2016, at 14:21, Alexander Walters wrote: > > Can we even ask github to pull it down and reasonably expect them to comply? Their entire model is built on everyone forking everyone else. As a data point -- I had a pretty good experience with GitHub helping me out when I was trying to reclaim an organization using my company name.
In that case it turned out that they just gave me the contact for the person and I worked it out from there, but it seemed like they were willing to take a more... forceful approach if it was needed. Perhaps the better / easier solution is to promote the *real* "Semi-official read-only mirror of the Python Mercurial repository" [1] -- And perhaps this goes away entirely (in time) with PEP-512 [2]? [1] https://github.com/python/cpython [2] https://www.python.org/dev/peps/pep-0512/ -------------- next part -------------- An HTML attachment was scrubbed... URL:
From steve.dower at python.org Sat Feb 27 18:32:45 2016 From: steve.dower at python.org (Steve Dower) Date: Sat, 27 Feb 2016 15:32:45 -0800 Subject: [Python-Dev] Python should be easily compilable on Windows with MinGW In-Reply-To: <56D21FAA.7070400@sdamon.com> References: <56D2168F.8070805@sdamon.com> <56D21FAA.7070400@sdamon.com> Message-ID:
Yep, that link is part of what I was talking about, though really it's one of a few experiments we're working on right now for making the build tools more accessible. I'm not sure it is currently sufficient for building CPython, but that's why I'm working with the team on these - what is eventually settled on should support all of the cases we care about here. Top-posted from my Windows Phone
-----Original Message----- From: "Alexander Walters" Sent: 2/27/2016 14:16 To: "Chris Krycho" ; "python-dev at python.org" Subject: Re: [Python-Dev] Python should be easily compilable on Windows with MinGW
Theoretically yes. Practically, I think, but do not know for sure, it would have the same annoying issues as other SDK builds (of which, the most annoying is just different paths for the tools). Making that a supported build would require some of the same effort as supporting any other compiler though (a build bot configured to compile python this way). I also think, but am not sure, that what you linked is in fact the leaner meaner toolchain that Steve was referring to.
On 2/27/2016 16:49, Chris Krycho wrote: [...]
_______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris%40chriskrycho.com -------------- next part -------------- An HTML attachment was scrubbed... URL:
From senthil at uthcode.com Sat Feb 27 18:41:28 2016 From: senthil at uthcode.com (Senthil Kumaran) Date: Sat, 27 Feb 2016 15:41:28 -0800 (PST) Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: <9F5B3487-FE2E-4B28-A375-DF12F519855A@gmail.com> References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> <9F5B3487-FE2E-4B28-A375-DF12F519855A@gmail.com> Message-ID: <7ld3q1c1lqbw7rqtnpa0wvfix-0@mailer.nylas.com>
> On Feb 27 2016, at 2:47 pm, Ian Lee <ianlee1521 at gmail.com> wrote: > > Perhaps the better / easier solution is to promote the *real* "Semi-official read-only mirror of the Python Mercurial repository" [1] -- And perhaps this goes away entirely (in time) with PEP-512 [2]? We will be working to promote the github repo once the migration and PEP-512 are complete. Promoting the semi-official repo in the interim (as opposed to the active one on hg.python.org) does not seem like a good idea. This thread is about claiming ownership of the look-alike repo, and we could concentrate our discussion on that alone.
FWIW, that old look-alike (python-git) repo has been in existence for years now and it has not caused any confusion. Once python moves to github, I think, we can ask for some logo or some kind of validation that will help users easily identify the original. Thanks, Senthil -------------- next part -------------- An HTML attachment was scrubbed... URL:
From g.brandl at gmx.net Sun Feb 28 12:46:36 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 28 Feb 2016 18:46:36 +0100 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: <163272AD-1FFB-466D-AA8E-944273DD4CBF@gmail.com> References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> <163272AD-1FFB-466D-AA8E-944273DD4CBF@gmail.com> Message-ID:
On 02/27/2016 11:45 PM, Matthias Bussonnier wrote: > Hi all, > > [...] > > So I would suggest > > 1) asking GitHub to contact author, potentially forwarding him/her a message > from this list asking him/her to bring that down or transfer the control to you. > That should be easy to do as it will not force GitHub to provide anyone with the > emails of the owner of python-git.
Although I don't see much confusion; there's bound to be hundreds of forks of CPython, if not already, then definitely once we move to GitHub. > 3) If still nothing can be done make a DMCA request. You can likely argue that > the logo/name are used without PSF content. > https://help.github.com/articles/dmca-takedown-policy/ Please no. There is absolutely no call using such a blunt instrument, just for a case of minor inconvenience. It could also be blown up into a PR disaster, probably rightly so. cheers, Georg From rosuav at gmail.com Sun Feb 28 12:58:24 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 29 Feb 2016 04:58:24 +1100 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> <163272AD-1FFB-466D-AA8E-944273DD4CBF@gmail.com> Message-ID: On Mon, Feb 29, 2016 at 4:46 AM, Georg Brandl wrote: > Although I don't see much confusion; there's bound to be > hundreds of forks of CPython, if not already, then definitely once we move to > GitHub. Forks made within the GitHub interface aren't usually confusing. Up the top of this repo, you can see where its upstream is, and therefore where you would go to find the official version of this project: https://github.com/Rosuav/appension So a fork-esque that predates the official repo is a different beast. +1 for asking GitHub to contact the owner, since it's not intrinsically obvious who's maintaining that. 
ChrisA From brett at python.org Sun Feb 28 13:07:20 2016 From: brett at python.org (Brett Cannon) Date: Sun, 28 Feb 2016 18:07:20 +0000 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> <163272AD-1FFB-466D-AA8E-944273DD4CBF@gmail.com> Message-ID: On Sun, 28 Feb 2016 at 09:58 Chris Angelico wrote: > On Mon, Feb 29, 2016 at 4:46 AM, Georg Brandl wrote: > > Although I don't see much confusion; there's bound to be > > hundreds of forks of CPython, if not already, then definitely once we > move to > > GitHub. > > Forks made within the GitHub interface aren't usually confusing. Up > the top of this repo, you can see where its upstream is, and therefore > where you would go to find the official version of this project: > > https://github.com/Rosuav/appension > > So a fork-esque that predates the official repo is a different beast. > +1 for asking GitHub to contact the owner, since it's not > intrinsically obvious who's maintaining that. > Since this isn't being pushy I'm +1 as well. But who's going to ask? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Feb 28 13:18:54 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 29 Feb 2016 05:18:54 +1100 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> <163272AD-1FFB-466D-AA8E-944273DD4CBF@gmail.com> Message-ID: On Mon, Feb 29, 2016 at 5:07 AM, Brett Cannon wrote: > > > On Sun, 28 Feb 2016 at 09:58 Chris Angelico wrote: >> >> On Mon, Feb 29, 2016 at 4:46 AM, Georg Brandl wrote: >> > Although I don't see much confusion; there's bound to be >> > hundreds of forks of CPython, if not already, then definitely once we >> > move to >> > GitHub. 
>> >> Forks made within the GitHub interface aren't usually confusing. Up >> the top of this repo, you can see where its upstream is, and therefore >> where you would go to find the official version of this project: >> >> https://github.com/Rosuav/appension >> >> So a fork-esque that predates the official repo is a different beast. >> +1 for asking GitHub to contact the owner, since it's not >> intrinsically obvious who's maintaining that. > > Since this isn't being pushy I'm +1 as well. But who's going to ask? Someone who has the authority to represent Python, I hope. A member of the PSF board? ChrisA
From pmiscml at gmail.com Sun Feb 28 13:18:54 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Sun, 28 Feb 2016 20:18:54 +0200 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> <163272AD-1FFB-466D-AA8E-944273DD4CBF@gmail.com> Message-ID: <20160228201854.0a7262c7@x230>
Hello, On Sun, 28 Feb 2016 18:46:36 +0100 Georg Brandl wrote: [] > > 3) If still nothing can be done make a DMCA request. You can likely > > argue that the logo/name are used without PSF consent. > > https://help.github.com/articles/dmca-takedown-policy/ > > Please no. There is absolutely no call for using such a blunt > instrument, just for a case of minor inconvenience. It could also be > blown up into a PR disaster, probably rightly so. I can't believe my eyes that I'm reading such a thread. The poor repo clearly states it's an unofficial mirror. Some dudes without much clue (*1) submit pull requests against it. So what - someone getting jealous? Well deserved - there could have been support for the leading version control system and very popular hosting site long, long ago.
But well, if you want those pull requests, go and add a friendly note to each along the lines of "Hi, you submitted your PR against an unattended, unofficial mirror; there's now an official Py repo at ..., and we encourage you to resubmit your patch against it". Nope, shutdown/exterminate. *1: See username of the submitter of https://github.com/python-git/python/issues/12 > > cheers, > Georg -- Best regards, Paul mailto:pmiscml at gmail.com
From mal at egenix.com Sun Feb 28 15:15:52 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 28 Feb 2016 21:15:52 +0100 Subject: [Python-Dev] Very old git mirror under github user "python-git" In-Reply-To: References: <56C23238.3000306@canterbury.ac.nz> <20160215202720.GA27790@phdru.name> <56D2214D.9000304@sdamon.com> <163272AD-1FFB-466D-AA8E-944273DD4CBF@gmail.com> Message-ID: <56D35578.6060906@egenix.com>
On 28.02.2016 18:46, Georg Brandl wrote: > On 02/27/2016 11:45 PM, Matthias Bussonnier wrote: >> [...] > > Please no. There is absolutely no call for using such a blunt instrument, just for > a case of minor inconvenience. It could also be blown up into a PR disaster, > probably rightly so. I frankly don't understand what all the fuss is about. The repo in question hasn't been touched in 7 years. It refers to Python 2.7 alpha 0. It also clearly reads "Unofficial Python SVN auto-updating mirror", so there's no confusion either. The talk about DMCA requests really doesn't apply. Python is open-source. Anyone can fork it, at any version they like, as long as the license is respected. The trademark use is also perfectly in line with our TM policy. The logo is a bit blurred, but that's really the only nit I could find. Asking the owner to take the repo down is still a good thought, but there's definitely nothing wrong with it per se. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Feb 28 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2016-02-19: Released eGenix PyRun 2.1.2 ... http://egenix.com/go88 ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From brian.cain at gmail.com Sun Feb 28 22:44:39 2016 From: brian.cain at gmail.com (Brian Cain) Date: Sun, 28 Feb 2016 21:44:39 -0600 Subject: [Python-Dev] [ANNOUNCE] fuzzpy Message-ID: ################################################################## *---------------------------------------------------* * fuzzpy: CPython fuzz tester is now available * * * * Version 0.8 * * https://bitbucket.org/ebadf/fuzzpy/ * *---------------------------------------------------* I am pleased to announce the creation of a coverage-guided fuzz tester for CPython. It's a pretty small wrapper around LLVM's libFuzzer that enables some powerful testing logic. AFL (American Fuzzy Lop) is another popular fuzzer lately -- libFuzzer is very similar in concept to AFL. From what I've read on list archives, Victor Stinner had previously done some good fuzz testing on CPython using fusil. This project should expand on that concept. I'd love to get feedback, suggestions, patches and anything else the list can offer. Q: What is fuzzpy for? A: It's primarily for testing CPython itself, but could also be used for individual python projects too. Pure-python projects will be the simplest to integrate at this point. Also, interesting test cases output by fuzzpy may end up being useful in testing others such as pypy, pyston, etc. Q: What is a fuzz tester? A: It modifies inputs to a test case in order to find unique/rare failures. Q: What does "coverage-guided" mean? A: It means that libFuzzer is able to witness the specific code executed as a result of a given test case. It feeds this information back into an engine to modify the test cases to optimize for coverage. Q: How can I help? A1: donate cycles: build the project and crank away on one of the existing tests. 
Relative to other common fuzzing, it's awfully slow, so consider throwing as many cycles as you can afford to. A2: contribute tests: write a ~10-line python script that exercises a feature that you think could benefit from fuzz testing. A3: if there's interest, I can accept cryptocoin donations to purchase cycles on a cloud server. ################################################################## -- -Brian -------------- next part -------------- An HTML attachment was scrubbed... URL:
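[Editor's illustration] The "coverage-guided" idea from the Q&A above can be shown in a few lines of pure Python: record which lines each input executes, and keep mutated inputs that reach lines no earlier input did. This is a toy sketch of the concept only -- it is not fuzzpy's or libFuzzer's actual interface, and `target` is a made-up stand-in for the kind of ~10-line test case the announcement asks contributors to write:

```python
import random
import sys

def coverage_of(func, data):
    """Run func(data), recording the set of (file, line) pairs executed."""
    covered = set()

    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer

    sys.settrace(tracer)
    try:
        func(data)
    except ValueError:
        pass  # malformed inputs are expected to raise; that's not a crash
    finally:
        sys.settrace(None)
    return covered

def target(data):
    """Made-up ~10-line test case: exercise int() parsing of bytes."""
    if data.startswith(b"0x"):
        int(data[2:], 16)
    else:
        int(data)

def fuzz(seed=b"123", rounds=300):
    rng = random.Random(0)  # deterministic, for reproducibility
    corpus = [seed]
    seen = coverage_of(target, seed)
    for _ in range(rounds):
        mutant = bytearray(rng.choice(corpus))
        mutant[rng.randrange(len(mutant))] = rng.randrange(256)
        mutant = bytes(mutant)
        new = coverage_of(target, mutant)
        if new - seen:            # keep only inputs that reach new code
            seen |= new
            corpus.append(mutant)
    return corpus

corpus = fuzz()
```

With the real tools, the coverage collection and mutation engine live in libFuzzer itself and run vastly faster; the Python side only needs to supply the small `target`-style script.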