From rndblnch at gmail.com Wed Jun 1 15:31:47 2016 From: rndblnch at gmail.com (rndblnch) Date: Wed, 1 Jun 2016 19:31:47 +0000 (UTC) Subject: [Python-Dev] Adding NewType() to PEP 484 References: <38cb015b-2d32-b9a7-d5b7-eef312eb4fa7@g.nevcal.com> Message-ID: Nick Coghlan gmail.com> writes: > On 31 May 2016 3:12 pm, "Glenn Linderman" g.nevcal.com> wrote: > > On 5/31/2016 12:55 PM, rndblnch wrote: > >> Guido van Rossum gmail.com> writes: > >> > >>> > >>> Also -- the most important thing. What to call these things? [...] > > Interesting! Prior art. And parallel type isn't a bad name... > If I heard "parallel type", I'd assume it had something to do with parallel processing. sure, it was 15 years ago, parallel processing was not so widespread. but looking at synonyms for parallel, i stumbled upon: counterpart, analog, mirror, etc. and then from here: countertype ... my 2 cents. renaud [...] > Cheers, > Nick. From guido at python.org Wed Jun 1 16:59:57 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Jun 2016 13:59:57 -0700 Subject: [Python-Dev] Adding NewType() to PEP 484 In-Reply-To: References: <38cb015b-2d32-b9a7-d5b7-eef312eb4fa7@g.nevcal.com> Message-ID: Unless Jukka objects I am going with "distinct type" when discussing the feature but NewType() in code. -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Jun 1 20:44:40 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Jun 2016 17:44:40 -0700 Subject: [Python-Dev] Adding NewType() to PEP 484 In-Reply-To: References: <38cb015b-2d32-b9a7-d5b7-eef312eb4fa7@g.nevcal.com> Message-ID: Everyone on the mypy team has a different opinion so the search is on. :-( On Wed, Jun 1, 2016 at 5:37 PM, Hai Nguyen wrote: > I am +1 for DistinctType (vs others) (no specific reason, just read out > loud). > > Hai > > On Wednesday, June 1, 2016, Guido van Rossum wrote: >> >> Unless Jukka objects I am going with "distinct type" when discussing >> the feature but NewType() in code. 
>> >> -- >> --Guido van Rossum (python.org/~guido) >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/nhai.qn%40gmail.com -- --Guido van Rossum (python.org/~guido) From nhai.qn at gmail.com Wed Jun 1 20:37:20 2016 From: nhai.qn at gmail.com (Hai Nguyen) Date: Wed, 1 Jun 2016 20:37:20 -0400 Subject: [Python-Dev] Adding NewType() to PEP 484 In-Reply-To: References: <38cb015b-2d32-b9a7-d5b7-eef312eb4fa7@g.nevcal.com> Message-ID: I am +1 for DistinctType (vs others) (no specific reason, just read out loud). Hai On Wednesday, June 1, 2016, Guido van Rossum wrote: > Unless Jukka objects I am going with "distinct type" when discussing > the feature but NewType() in code. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/nhai.qn%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mafagafogigante at gmail.com Wed Jun 1 20:50:07 2016 From: mafagafogigante at gmail.com (Bernardo Sulzbach) Date: Wed, 1 Jun 2016 21:50:07 -0300 Subject: [Python-Dev] Adding NewType() to PEP 484 In-Reply-To: References: <38cb015b-2d32-b9a7-d5b7-eef312eb4fa7@g.nevcal.com> Message-ID: <472a3418-6030-00a9-73b2-a0c9b0afee6f@gmail.com> On 06/01/2016 09:44 PM, Guido van Rossum wrote: > Everyone on the mypy team has a different opinion so the search is on. :-( > > On Wed, Jun 1, 2016 at 5:37 PM, Hai Nguyen wrote: >> I am +1 for DistinctType (vs others) (no specific reason, just read out >> loud). >> At least on this thread it seems like (I haven't counted) that distinct type [alias] is the preferred option. 
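[Editorial note: for readers following along, here is a minimal sketch of the feature being named, as specified in PEP 484. At runtime ``NewType`` is just an identity function; the distinct-type behavior exists only for static checkers such as mypy. The names ``UserId`` and ``greet`` are illustrative only.]

```python
from typing import NewType

# A "distinct type": to a static checker, UserId is a subtype of int
# that plain ints are not implicitly compatible with.
UserId = NewType('UserId', int)

def greet(user_id: UserId) -> str:
    return "user #%d" % user_id

uid = UserId(42)   # at runtime this just returns the int 42
print(greet(uid))  # prints "user #42"; greet(42) would be flagged by a checker
```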
From guido at python.org Wed Jun 1 21:04:00 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Jun 2016 18:04:00 -0700 Subject: [Python-Dev] Adding NewType() to PEP 484 In-Reply-To: <472a3418-6030-00a9-73b2-a0c9b0afee6f@gmail.com> References: <38cb015b-2d32-b9a7-d5b7-eef312eb4fa7@g.nevcal.com> <472a3418-6030-00a9-73b2-a0c9b0afee6f@gmail.com> Message-ID: I've merged this into PEP 484 now. The informal term used there is actually "unique type" which is fine. End of discussion please. On Wed, Jun 1, 2016 at 5:50 PM, Bernardo Sulzbach wrote: > On 06/01/2016 09:44 PM, Guido van Rossum wrote: >> >> Everyone on the mypy team has a different opinion so the search is on. :-( >> >> On Wed, Jun 1, 2016 at 5:37 PM, Hai Nguyen wrote: >>> >>> I am +1 for DistinctType (vs others) (no specific reason, just read out >>> loud). >>> > > At least on this thread it seems like (I haven't counted) that distinct type > [alias] is the preferred option. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From jake at lwn.net Thu Jun 2 20:39:59 2016 From: jake at lwn.net (Jake Edge) Date: Thu, 2 Jun 2016 18:39:59 -0600 Subject: [Python-Dev] Start of the Python Language Summit coverage at LWN Message-ID: <20160602183959.4728902e@chukar.edge2.net> Howdy python-dev, I was able to sit in on the Python Language Summit again this year (thanks Larry and Barry!) and have some of the coverage available for your viewing pleasure now. The starting point is here: https://lwn.net/Articles/688969/ (or here for non-subscribers: https://lwn.net/SubscriberLink/688969/91cbeeaf32807914/ ) So far, I have written up the first three sessions. 
The rest will be coming in over the next week or so and be added to the page above (and will also appear in next week's weekly edition). The future of the ssl module: https://lwn.net/Articles/688974/ https://lwn.net/SubscriberLink/688974/31cfa9f818c834e1/ Twisted and Python 3: https://lwn.net/Articles/689068/ https://lwn.net/SubscriberLink/689068/34b68a2aea6ddd2d/ Gilectomy: https://lwn.net/Articles/689548/ https://lwn.net/SubscriberLink/689548/4328423f85a47679/ The articles will be freely available (without using the SubscriberLink) to the world at large in a week (and the next batch the week after that) ... until then, feel free to share the SubscriberLinks Hopefully I have captured things reasonably well. If there are corrections or clarifications needed, though, I recommend posting them as comments on the article. enjoy! jake -- Jake Edge - LWN - jake at lwn.net - http://lwn.net From tjreedy at udel.edu Thu Jun 2 23:26:43 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 2 Jun 2016 23:26:43 -0400 Subject: [Python-Dev] Start of the Python Language Summit coverage at LWN In-Reply-To: <20160602183959.4728902e@chukar.edge2.net> References: <20160602183959.4728902e@chukar.edge2.net> Message-ID: On 6/2/2016 8:39 PM, Jake Edge wrote: > > Howdy python-dev, > > I was able to sit in on the Python Language Summit again this year > (thanks Larry and Barry!) and have some of the coverage available for > your viewing pleasure now. > > The starting point is here: https://lwn.net/Articles/688969/ > (or here for non-subscribers: > https://lwn.net/SubscriberLink/688969/91cbeeaf32807914/ ) > > So far, I have written up the first three sessions. The rest will be > coming in over the next week or so and be added to the page above Thank you. Please continue posting the individual SubscriberLinks here as the page above does not have them. > will also appear in next week's weekly edition). 
> > The future of the ssl module: https://lwn.net/Articles/688974/ > https://lwn.net/SubscriberLink/688974/31cfa9f818c834e1/ > > Twisted and Python 3: https://lwn.net/Articles/689068/ > https://lwn.net/SubscriberLink/689068/34b68a2aea6ddd2d/ > > Gilectomy: https://lwn.net/Articles/689548/ > https://lwn.net/SubscriberLink/689548/4328423f85a47679/ > > The articles will be freely available (without using the > SubscriberLink) to the world at large in a week (and the next batch the > week after that) ... until then, feel free to share the SubscriberLinks > > Hopefully I have captured things reasonably well. If there are > corrections or clarifications needed, though, I recommend posting them > as comments on the article. -- Terry Jan Reedy From status at bugs.python.org Fri Jun 3 12:08:43 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 3 Jun 2016 18:08:43 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160603160843.1677A5688D@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-05-27 - 2016-06-03) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 5537 ( +8) closed 33416 (+52) total 38953 (+60) Open issues with patches: 2416 Issues opened (41) ================== #22331: test_io.test_interrupted_write_text() hangs on the buildbot Fr http://bugs.python.org/issue22331 reopened by martin.panter #27137: Python implementation of `functools.partial` is not a class http://bugs.python.org/issue27137 opened by ebarry #27139: Increased test coverage for statistics.median_grouped http://bugs.python.org/issue27139 opened by juliojr77 #27140: Opcode for creating dict with constant keys http://bugs.python.org/issue27140 opened by serhiy.storchaka #27141: Fix collections.UserList shallow copy http://bugs.python.org/issue27141 opened by bar.harel #27142: Default int value with xmlrpclib / xmlrpc.client http://bugs.python.org/issue27142 opened by julienc #27144: concurrent.futures.as_completed() memory inefficiency http://bugs.python.org/issue27144 opened by grzgrzgrz3 #27145: long_add and long_sub might return a new int where &small_ints http://bugs.python.org/issue27145 opened by Oren Milman #27149: Implement socket.sendmsg() for Windows http://bugs.python.org/issue27149 opened by mmarkk #27150: PEP446 (CLOEXEC by default) violation with fcntl.fcntl(..., fc http://bugs.python.org/issue27150 opened by mmarkk #27151: multiprocessing.Process leaves read pipes open (Process.sentin http://bugs.python.org/issue27151 opened by Roman Bolshakov #27152: Additional assert methods for unittest http://bugs.python.org/issue27152 opened by serhiy.storchaka #27154: Regression in file.writelines behavior http://bugs.python.org/issue27154 opened by snaury #27156: IDLE: remove unused code http://bugs.python.org/issue27156 opened by terry.reedy #27157: Unhelpful error message when one calls a subclass of type with http://bugs.python.org/issue27157 opened by ppperry #27161: Confusing exception in Path().with_name http://bugs.python.org/issue27161 opened by Antony.Lee #27162: Add idlelib.interface module 
http://bugs.python.org/issue27162 opened by terry.reedy #27163: IDLE entry for What's New in Python 3.6 http://bugs.python.org/issue27163 opened by terry.reedy #27164: zlib can't decompress DEFLATE using shared dictionary http://bugs.python.org/issue27164 opened by Vladimir Mihailenco #27165: Skip callables when displaying exception fields in cgitb http://bugs.python.org/issue27165 opened by Adam.Biela??ski #27167: subprocess reports signal as negative exit status, not documen http://bugs.python.org/issue27167 opened by dmacnet #27168: Yury isn't sure comprehensions and await interact correctly http://bugs.python.org/issue27168 opened by njs #27169: __debug__ is not optimized out at compile time for anything bu http://bugs.python.org/issue27169 opened by josh.r #27170: IDLE: remove Toggle Auto Coloring or add to edit menu & doc http://bugs.python.org/issue27170 opened by terry.reedy #27172: Add skip_bound_arg argument to inspect.Signature.from_callable http://bugs.python.org/issue27172 opened by ryan.petrello #27173: Modern Unix key bindings for IDLE http://bugs.python.org/issue27173 opened by serhiy.storchaka #27175: Unpickling Path objects http://bugs.python.org/issue27175 opened by Antony.Lee #27177: re match.group should support __index__ http://bugs.python.org/issue27177 opened by jdemeyer #27179: subprocess uses wrong encoding on Windows http://bugs.python.org/issue27179 opened by davispuh #27180: Doc/pathlib: Please describe the behaviour of Path().rename() http://bugs.python.org/issue27180 opened by hashimo #27181: Add geometric mean to `statistics` module http://bugs.python.org/issue27181 opened by cool-RR #27182: PEP 519 support in the stdlib http://bugs.python.org/issue27182 opened by ethan.furman #27184: Support path objects in the ntpath module http://bugs.python.org/issue27184 opened by ethan.furman #27186: add os.fspath() http://bugs.python.org/issue27186 opened by ethan.furman #27187: Relax __all__ location requirement in PEP 8 
http://bugs.python.org/issue27187 opened by barry #27188: sqlite3 execute* methods return value not documented http://bugs.python.org/issue27188 opened by Dave Sawyer #27189: configure --with-lto with clang should find the appropriate ll http://bugs.python.org/issue27189 opened by gregory.p.smith #27190: Check sqlite3_version before allowing check_same_thread = Fals http://bugs.python.org/issue27190 opened by Dave Sawyer #27194: Tarfile superfluous truncate calls slows extraction. http://bugs.python.org/issue27194 opened by fried #27195: Crash when RawIOBase.write(b) evaluates b.format http://bugs.python.org/issue27195 opened by martin.panter #27196: Eliminate 'ThemeChanged' warning when running IDLE tests http://bugs.python.org/issue27196 opened by terry.reedy Most recent 15 issues with no replies (15) ========================================== #27195: Crash when RawIOBase.write(b) evaluates b.format http://bugs.python.org/issue27195 #27189: configure --with-lto with clang should find the appropriate ll http://bugs.python.org/issue27189 #27188: sqlite3 execute* methods return value not documented http://bugs.python.org/issue27188 #27180: Doc/pathlib: Please describe the behaviour of Path().rename() http://bugs.python.org/issue27180 #27175: Unpickling Path objects http://bugs.python.org/issue27175 #27168: Yury isn't sure comprehensions and await interact correctly http://bugs.python.org/issue27168 #27165: Skip callables when displaying exception fields in cgitb http://bugs.python.org/issue27165 #27163: IDLE entry for What's New in Python 3.6 http://bugs.python.org/issue27163 #27162: Add idlelib.interface module http://bugs.python.org/issue27162 #27151: multiprocessing.Process leaves read pipes open (Process.sentin http://bugs.python.org/issue27151 #27144: concurrent.futures.as_completed() memory inefficiency http://bugs.python.org/issue27144 #27139: Increased test coverage for statistics.median_grouped http://bugs.python.org/issue27139 #27123: Allow 
`install_headers` command to follow specific directory s http://bugs.python.org/issue27123 #27121: imghdr does not support jpg files with Lavc bytes http://bugs.python.org/issue27121 #27115: IDLE/tkinter: in simpledialog, != [OK] click http://bugs.python.org/issue27115 Most recent 15 issues waiting for review (15) ============================================= #27194: Tarfile superfluous truncate calls slows extraction. http://bugs.python.org/issue27194 #27190: Check sqlite3_version before allowing check_same_thread = Fals http://bugs.python.org/issue27190 #27186: add os.fspath() http://bugs.python.org/issue27186 #27179: subprocess uses wrong encoding on Windows http://bugs.python.org/issue27179 #27177: re match.group should support __index__ http://bugs.python.org/issue27177 #27173: Modern Unix key bindings for IDLE http://bugs.python.org/issue27173 #27172: Add skip_bound_arg argument to inspect.Signature.from_callable http://bugs.python.org/issue27172 #27165: Skip callables when displaying exception fields in cgitb http://bugs.python.org/issue27165 #27164: zlib can't decompress DEFLATE using shared dictionary http://bugs.python.org/issue27164 #27161: Confusing exception in Path().with_name http://bugs.python.org/issue27161 #27157: Unhelpful error message when one calls a subclass of type with http://bugs.python.org/issue27157 #27152: Additional assert methods for unittest http://bugs.python.org/issue27152 #27145: long_add and long_sub might return a new int where &small_ints http://bugs.python.org/issue27145 #27144: concurrent.futures.as_completed() memory inefficiency http://bugs.python.org/issue27144 #27141: Fix collections.UserList shallow copy http://bugs.python.org/issue27141 Top 10 most discussed issues (10) ================================= #27157: Unhelpful error message when one calls a subclass of type with http://bugs.python.org/issue27157 26 msgs #19611: inspect.getcallargs doesn't properly interpret set comprehensi http://bugs.python.org/issue19611 15 
msgs #27179: subprocess uses wrong encoding on Windows http://bugs.python.org/issue27179 12 msgs #20699: Document that binary IO classes work with bytes-likes objects http://bugs.python.org/issue20699 11 msgs #27137: Python implementation of `functools.partial` is not a class http://bugs.python.org/issue27137 11 msgs #27161: Confusing exception in Path().with_name http://bugs.python.org/issue27161 10 msgs #27136: sock_connect fails for bluetooth (and probably others) http://bugs.python.org/issue27136 9 msgs #22558: Missing doc links to source code for Python-coded modules. http://bugs.python.org/issue22558 8 msgs #26546: Provide translated french translation on docs.python.org http://bugs.python.org/issue26546 8 msgs #27033: Change the decode_data default in smtpd to False http://bugs.python.org/issue27033 8 msgs Issues closed (50) ================== #5252: 2to3 should detect and delete import of removed statvfs module http://bugs.python.org/issue5252 closed by r.david.murray #8519: doc: termios and ioctl reference links http://bugs.python.org/issue8519 closed by orsenthil #9327: doctest DocFileCase setUp/tearDown asymmetry http://bugs.python.org/issue9327 closed by berker.peksag #9363: data_files are not installed relative to sys.prefix http://bugs.python.org/issue9363 closed by berker.peksag #12243: getpass.getuser works on OSX http://bugs.python.org/issue12243 closed by berker.peksag #12691: tokenize.untokenize is broken http://bugs.python.org/issue12691 closed by terry.reedy #13784: Documentation of xml.sax.xmlreader: Locator.getLineNumber() a http://bugs.python.org/issue13784 closed by r.david.murray #17352: Be clear that __prepare__ must be declared as a class method http://bugs.python.org/issue17352 closed by berker.peksag #18384: Add devhelp build instructions to the documentation makefile http://bugs.python.org/issue18384 closed by berker.peksag #18478: Class bodies: when does a name become local? 
http://bugs.python.org/issue18478 closed by terry.reedy #20496: function definition tutorial encourages bad practice http://bugs.python.org/issue20496 closed by berker.peksag #20973: Implement proper comparison operations for in _TotalOrderingMi http://bugs.python.org/issue20973 closed by r.david.murray #21271: reset_mock needs parameters to also reset return_value and sid http://bugs.python.org/issue21271 closed by kushal.das #21776: distutils.upload uses the wrong order of exceptions http://bugs.python.org/issue21776 closed by berker.peksag #23116: Python Tutorial 4.7.1: Improve ask_ok() to cover more input va http://bugs.python.org/issue23116 closed by berker.peksag #24647: Document argparse.REMAINDER as being equal to "..." http://bugs.python.org/issue24647 closed by r.david.murray #24671: idlelib 2.7: finish converting print statements http://bugs.python.org/issue24671 closed by terry.reedy #25570: urllib.request > Request.add_header("abcd","efgh") fails with http://bugs.python.org/issue25570 closed by martin.panter #25926: Clarify that the itertools pure python equivalents are only ap http://bugs.python.org/issue25926 closed by rhettinger #25931: os.fork() command distributed in windows Python27 (in SocketSe http://bugs.python.org/issue25931 closed by gregory.p.smith #26526: In parsermodule.c, replace over 2KLOC of hand-crafted validati http://bugs.python.org/issue26526 closed by python-dev #26553: Write HTTP in uppercase http://bugs.python.org/issue26553 closed by martin.panter #26632: @public - an __all__ decorator http://bugs.python.org/issue26632 closed by barry #26739: idle: Errno 10035 a non-blocking socket operation could not be http://bugs.python.org/issue26739 closed by zach.ware #26829: update docs: when creating classes a new dict is created for t http://bugs.python.org/issue26829 closed by r.david.murray #27043: Describe what "inspect.cleandoc" does to synopsis line. 
http://bugs.python.org/issue27043 closed by orsenthil #27113: sqlite3 connect parameter "check_same_thread" not documented http://bugs.python.org/issue27113 closed by orsenthil #27117: turtledemo does not work with IDLE's new dark theme. http://bugs.python.org/issue27117 closed by terry.reedy #27124: binascii.a2b_hex raises binascii.Error and ValueError, not Typ http://bugs.python.org/issue27124 closed by martin.panter #27125: Typo in Python 2 multiprocessing documentation http://bugs.python.org/issue27125 closed by martin.panter #27138: FileFinder.find_spec() docstring needs to be corrected. http://bugs.python.org/issue27138 closed by eric.snow #27143: python 3.5 conflict with Mailman, ebtables and firewalld http://bugs.python.org/issue27143 closed by barry #27146: posixmodule.c needs stdio.h http://bugs.python.org/issue27146 closed by gregory.p.smith #27147: importlib docs do not mention PEP 420 http://bugs.python.org/issue27147 closed by eric.snow #27148: Make VENV_DIR relative to Script directory http://bugs.python.org/issue27148 closed by vinay.sajip #27153: Default value shown by argparse.ArgumentDefaultsHelpFormatter http://bugs.python.org/issue27153 closed by r.david.murray #27155: '-' sign typo in example http://bugs.python.org/issue27155 closed by r.david.murray #27158: `isinstance` function does not handle types that are their own http://bugs.python.org/issue27158 closed by ebarry #27159: Python 3.5.1's websocket's lib crashes in event that internet http://bugs.python.org/issue27159 closed by r.david.murray #27160: str.format: Silent truncation of kwargs when passing keywords http://bugs.python.org/issue27160 closed by ebarry #27166: Spam http://bugs.python.org/issue27166 closed by ebarry #27171: Fix various typos http://bugs.python.org/issue27171 closed by martin.panter #27174: Update URL to IPython in interactive.rst http://bugs.python.org/issue27174 closed by berker.peksag #27176: Addition of assertNotRaises http://bugs.python.org/issue27176 closed by 
rhettinger #27178: Unconverted RST marking in interpreter tutorial http://bugs.python.org/issue27178 closed by berker.peksag #27183: Clarify that Py_VISIT(NULL) does nothing http://bugs.python.org/issue27183 closed by python-dev #27185: Clarify Test Coverage for the String Module (test_pep292 is no http://bugs.python.org/issue27185 closed by erinspace #27191: Add formatting around methods in PEP 8 http://bugs.python.org/issue27191 closed by berker.peksag #27192: Keyboard Shortcuts Consistently Cause Crashes http://bugs.python.org/issue27192 closed by ebarry #27193: Tkinter Unresponsive With Special Keys http://bugs.python.org/issue27193 closed by ned.deily From brett at python.org Fri Jun 3 17:37:03 2016 From: brett at python.org (Brett Cannon) Date: Fri, 03 Jun 2016 21:37:03 +0000 Subject: [Python-Dev] frame evaluation API PEP Message-ID: For those of you who follow python-ideas or were at the PyCon US 2016 language summit, you have already seen/heard about this PEP. For those of you who don't fall into either of those categories, this PEP proposes a frame evaluation API for CPython. The motivating example of this work has been Pyjion, the experimental CPython JIT Dino Viehland and I have been working on in our spare time at Microsoft. The API also works for debugging, though, as already demonstrated by Google having added a very similar API internally for debugging purposes. The PEP is pasted in below and also available in rendered form at https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I will assign myself a PEP # once discussion is finished as it's easier to work in git for this for the rich rendering of the in-progress PEP). I should mention that the differences from the python-ideas and language summit versions of the PEP are the listed support from Google's use of a very similar API as well as clarifying that the co_extra field on code objects doesn't change their immutability (at least from the view of the PEP). 
---------- PEP: NNN Title: Adding a frame evaluation API to CPython Version: $Revision$ Last-Modified: $Date$ Author: Brett Cannon , Dino Viehland Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 16-May-2016 Post-History: 16-May-2016 03-Jun-2016 Abstract ======== This PEP proposes to expand CPython's C API [#c-api]_ to allow for the specification of a per-interpreter function pointer to handle the evaluation of frames [#pyeval_evalframeex]_. This proposal also suggests adding a new field to code objects [#pycodeobject]_ to store arbitrary data for use by the frame evaluation function. Rationale ========= One place where flexibility has been lacking in Python is in the direct execution of Python code. While CPython's C API [#c-api]_ allows for constructing the data going into a frame object and then evaluating it via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control over the execution of Python code comes down to individual objects instead of a holistic control of execution at the frame level. While wanting to have influence over frame evaluation may seem a bit too low-level, it does open the possibility for things such as a method-level JIT to be introduced into CPython without CPython itself having to provide one. By allowing external C code to control frame evaluation, a JIT can participate in the execution of Python code at the key point where evaluation occurs. This then allows for a JIT to conditionally recompile Python bytecode to machine code as desired while still allowing for executing regular CPython bytecode when running the JIT is not desired. This can be accomplished by allowing interpreters to specify what function to call to evaluate a frame. And by placing the API at the frame evaluation level it allows for a complete view of the execution environment of the code for the JIT. This ability to specify a frame evaluation function also allows for other use-cases beyond just opening CPython up to a JIT. 
For instance, it would not be difficult to implement a tracing or profiling function at the call level with this API. While CPython does provide the ability to set a tracing or profiling function at the Python level, this would be able to match the data collection of the profiler and quite possibly be faster for tracing by simply skipping per-line tracing support. It also opens up the possibility of debugging where the frame evaluation function only performs special debugging work when it detects it is about to execute a specific code object. In that instance the bytecode could be theoretically rewritten in-place to inject a breakpoint function call at the proper point for help in debugging while not having to do a heavy-handed approach as required by ``sys.settrace()``. To help facilitate these use-cases, we are also proposing the adding of a "scratch space" on code objects via a new field. This will allow per-code object data to be stored with the code object itself for easy retrieval by the frame evaluation function as necessary. The field itself will simply be a ``PyObject *`` type so that any data stored in the field will participate in normal object memory management. Proposal ======== All proposed C API changes below will not be part of the stable ABI. Expanding ``PyCodeObject`` -------------------------- One field is to be added to the ``PyCodeObject`` struct [#pycodeobject]_:: typedef struct { ... PyObject *co_extra; /* "Scratch space" for the code object. */ } PyCodeObject; The ``co_extra`` will be ``NULL`` by default and will not be used by CPython itself. Third-party code is free to use the field as desired. Values stored in the field are expected to not be required in order for the code object to function, allowing the loss of the data of the field to be acceptable (this keeps the code object as immutable from a functionality point-of-view; this is slightly contentious and so is listed as an open issue in `Is co_extra needed?`_). 
The field will be freed like all other fields on ``PyCodeObject`` during deallocation using ``Py_XDECREF()``. It is not recommended that multiple users attempt to use the ``co_extra`` field simultaneously. While a dictionary could theoretically be set to the field and various users could use a key specific to the project, there is still the issue of key collisions as well as performance degradation from using a dictionary lookup on every frame evaluation. Users are expected to do a type check to make sure that the field has not been previously set by someone else. Expanding ``PyInterpreterState`` -------------------------------- The entry point for the frame evaluation function is per-interpreter:: // Same type signature as PyEval_EvalFrameEx(). typedef PyObject* (__stdcall *PyFrameEvalFunction)(PyFrameObject*, int); typedef struct { ... PyFrameEvalFunction eval_frame; } PyInterpreterState; By default, the ``eval_frame`` field will be initialized to a function pointer that represents what ``PyEval_EvalFrameEx()`` currently is (called ``PyEval_EvalFrameDefault()``, discussed later in this PEP). Third-party code may then set their own frame evaluation function instead to control the execution of Python code. A pointer comparison can be used to detect if the field is set to ``PyEval_EvalFrameDefault()`` and thus has not been mutated yet. Changes to ``Python/ceval.c`` ----------------------------- ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it currently stands will be renamed to ``PyEval_EvalFrameDefault()``. The new ``PyEval_EvalFrameEx()`` will then become:: PyObject * PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag) { PyThreadState *tstate = PyThreadState_GET(); return tstate->interp->eval_frame(frame, throwflag); } This allows third-party code to place themselves directly in the path of Python code execution while being backwards-compatible with code already using the pre-existing C API. 
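[Editorial note: as a hedged illustration of the type-check guidance for ``co_extra`` above, here is a Python-level sketch of how a tool might claim the field. ``Code`` and ``MyToolData`` are hypothetical stand-ins; the real field lives on the C-level ``PyCodeObject`` struct and would be accessed from C.]

```python
class Code:
    """Stand-in for PyCodeObject; co_extra starts out unset (NULL)."""
    def __init__(self):
        self.co_extra = None

class MyToolData:
    """Hypothetical per-code-object scratch data for one tool."""
    def __init__(self):
        self.exec_count = 0

def get_tool_data(code):
    # Type check first: the field may be unset, or already claimed by
    # some other tool; in either case we (re)claim it.  The PEP permits
    # this because data in co_extra must be safe to lose.
    if not isinstance(code.co_extra, MyToolData):
        code.co_extra = MyToolData()
    return code.co_extra

code = Code()
data = get_tool_data(code)
data.exec_count += 1
assert get_tool_data(code) is data  # later lookups reuse the same data
```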
Updating ``python-gdb.py`` -------------------------- The generated ``python-gdb.py`` file used for Python support in GDB makes some hard-coded assumptions about ``PyEval_EvalFrameEx()``, e.g. the names of local variables. It will need to be updated to work with the proposed changes. Performance impact ================== As this PEP is proposing an API to add pluggability, performance impact is considered only in the case where no third-party code has made any changes. Several runs of pybench [#pybench]_ consistently showed no performance cost from the API change alone. A run of the Python benchmark suite [#py-benchmarks]_ showed no measurable cost in performance. In terms of memory impact, since there are typically not many CPython interpreters executing in a single process that means the impact of ``co_extra`` being added to ``PyCodeObject`` is the only worry. According to [#code-object-count]_, a run of the Python test suite results in about 72,395 code objects being created. On a 64-bit CPU that would result in 579,160 bytes of extra memory being used if all code objects were alive at once and had nothing set in their ``co_extra`` fields. Example Usage ============= A JIT for CPython ----------------- Pyjion '''''' The Pyjion project [#pyjion]_ has used this proposed API to implement a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each code object has its ``co_extra`` field set to a ``PyjionJittedCode`` object which stores four pieces of information: 1. Execution count 2. A boolean representing whether a previous attempt to JIT failed 3. A function pointer to a trampoline (which can be type tracing or not) 4. 
A void pointer to any JIT-compiled machine code

The frame evaluation function has (roughly) the following algorithm::

    def eval_frame(frame, throw_flag):
        pyjion_code = frame.code.co_extra
        if not pyjion_code:
            frame.code.co_extra = PyjionJittedCode()
        elif not pyjion_code.jit_failed:
            if pyjion_code.jit_code:
                return pyjion_code.eval(pyjion_code.jit_code, frame)
            elif pyjion_code.exec_count > 20_000:
                if jit_compile(frame):
                    return pyjion_code.eval(pyjion_code.jit_code, frame)
                else:
                    pyjion_code.jit_failed = True
        pyjion_code.exec_count += 1
        return PyEval_EvalFrameDefault(frame, throw_flag)

The key point, though, is that all of this work and logic is separate from CPython and yet, with the proposed API changes, it is able to provide a JIT that is compliant with Python semantics (as of this writing, performance is almost equivalent to CPython without the new API). This means there's nothing technically preventing others from implementing their own JITs for CPython by utilizing the proposed API.

Other JITs
''''''''''

It should be mentioned that the Pyston team was consulted on an earlier version of this PEP that was more JIT-specific, and they were not interested in utilizing the changes proposed: because they want control over memory layout, they had no interest in directly supporting CPython itself. An informal discussion with a developer on the PyPy team led to a similar comment.

Numba [#numba]_, on the other hand, suggested that they would be interested in the proposed change in a post-1.0 future for themselves [#numba-interest]_.

The experimental Coconut JIT [#coconut]_ could have benefitted from this PEP. In private conversations with Coconut's creator we were told that our API was probably superior to the one they developed for Coconut to add JIT support to CPython.

Debugging
---------

In conversations with the Python Tools for Visual Studio team (PTVS) [#ptvs]_, they thought they would find these API changes useful for implementing more performant debugging.
As mentioned in the Rationale_ section, this API would allow for switching on debugging functionality only in frames where it is needed. This could range from skipping information that ``sys.settrace()`` normally provides to going as far as dynamically rewriting bytecode prior to execution to inject, e.g., breakpoints in the bytecode.

It also turns out that Google has provided a very similar API internally for years. It has been used for performant debugging purposes.

Implementation
==============

A set of patches implementing the proposed API is available through the Pyjion project [#pyjion]_. In its current form it has more changes to CPython than just this proposed API, but that is for ease of development rather than a strict requirement to accomplish its goals.

Open Issues
===========

Allow ``eval_frame`` to be ``NULL``
-----------------------------------

Currently the frame evaluation function is expected to always be set. It could very easily default to ``NULL`` instead, which would signal to use ``PyEval_EvalFrameDefault()``. The current proposal of not special-casing the field seemed the most straightforward, but it does require that the field not accidentally be cleared, else a crash may occur.

Is co_extra needed?
-------------------

While discussing this PEP at PyCon US 2016, some core developers expressed their worry about the ``co_extra`` field making code objects mutable. The thinking seemed to be that having a field that was mutated after the creation of the code object made the object seem mutable, even though no other aspect of code objects changed.

The view of this PEP is that the ``co_extra`` field doesn't change the fact that code objects are immutable. The field is specified in this PEP so as not to contain information required to make the code object usable, making it more of a caching field.
It could be viewed as similar to the UTF-8 cache that string objects have internally; strings are still considered immutable even though they have a field that is conditionally set.

The field is also not strictly necessary. While the field greatly simplifies attaching extra information to code objects, other options, such as keeping a mapping of code object memory addresses to what would have been kept in ``co_extra``, or perhaps attaching the data to the code object via a weak reference and iterating through the weak references until the attached data is found, are possible. But obviously none of these solutions is as simple or performant as adding the ``co_extra`` field.

Rejected Ideas
==============

A JIT-specific C API
--------------------

Originally this PEP was going to propose a much larger API change which was more JIT-specific. After soliciting feedback from the Numba team [#numba]_, though, it became clear that the API was unnecessarily large. The realization was made that all that was truly needed was the opportunity to provide a trampoline function to handle execution of Python code that had been JIT-compiled and a way to attach that compiled machine code along with other critical data to the corresponding Python code object. Once it was shown that there was no loss in functionality or in performance while minimizing the API changes required, the proposal was changed to its current form.

References
==========

.. [#pyjion] Pyjion project (https://github.com/microsoft/pyjion)

.. [#c-api] CPython's C API (https://docs.python.org/3/c-api/index.html)

.. [#pycodeobject] ``PyCodeObject`` (https://docs.python.org/3/c-api/code.html#c.PyCodeObject)

.. [#coreclr] .NET Core Runtime (CoreCLR) (https://github.com/dotnet/coreclr)

.. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()`` (https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx)

..
[#numba] Numba (http://numba.pydata.org/)

.. [#numba-interest] numba-users mailing list: "Would the C API for a JIT entrypoint being proposed by Pyjion help out Numba?" (https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g)

.. [#code-object-count] [Python-Dev] Opcode cache in ceval loop (https://mail.python.org/pipermail/python-dev/2016-February/143025.html)

.. [#py-benchmarks] Python benchmark suite (https://hg.python.org/benchmarks)

.. [#pyston] Pyston (http://pyston.org)

.. [#pypy] PyPy (http://pypy.org/)

.. [#ptvs] Python Tools for Visual Studio (http://microsoft.github.io/PTVS/)

.. [#coconut] Coconut (https://github.com/davidmalcolm/coconut)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From rdmurray at bitdance.com Fri Jun 3 17:50:29 2016
From: rdmurray at bitdance.com (R. David Murray)
Date: Fri, 03 Jun 2016 17:50:29 -0400
Subject: [Python-Dev] I broke the 3.5 branch, apparently
Message-ID: <20160603215031.E2C37B14024@webabinitio.net>

I don't understand how it happened, but apparently I got a merge commit backward and merged 3.6 into 3.5 and pushed it without realizing what had happened. If anyone has any clue how to reverse this cleanly, please let me know. (There are a couple of people at the sprints looking into it, but the Mercurial guys aren't here so we are short on experts).
My apologies for the mess :( --David From python at mrabarnett.plus.com Fri Jun 3 18:21:25 2016 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 3 Jun 2016 23:21:25 +0100 Subject: [Python-Dev] I broke the 3.5 branch, apparently In-Reply-To: <20160603215031.E2C37B14024@webabinitio.net> References: <20160603215031.E2C37B14024@webabinitio.net> Message-ID: <5e613f77-0f91-1c14-e4dc-39dbfb1cdd4f@mrabarnett.plus.com> On 2016-06-03 22:50, R. David Murray wrote: > I don't understand how it happened, but apparently I got a merge commit > backward and merged 3.6 into 3.5 and pushed it without realizing what > had happened. If anyone has any clue how to reverse this cleanly, > please let me know. (There are a couple people at the sprints looking > in to it, but the mercurial guys aren't here so we are short on experts). > > My apologies for the mess :( > There's a lot about undoing changes here: http://hgbook.red-bean.com/read/finding-and-fixing-mistakes.html From rdmurray at bitdance.com Fri Jun 3 18:29:03 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Fri, 03 Jun 2016 18:29:03 -0400 Subject: [Python-Dev] FIXED: I broke the 3.5 branch, apparently In-Reply-To: <5e613f77-0f91-1c14-e4dc-39dbfb1cdd4f@mrabarnett.plus.com> References: <20160603215031.E2C37B14024@webabinitio.net> <5e613f77-0f91-1c14-e4dc-39dbfb1cdd4f@mrabarnett.plus.com> Message-ID: <20160603222904.960CFB14024@webabinitio.net> On Fri, 03 Jun 2016 23:21:25 +0100, MRAB wrote: > On 2016-06-03 22:50, R. David Murray wrote: > > I don't understand how it happened, but apparently I got a merge commit > > backward and merged 3.6 into 3.5 and pushed it without realizing what > > had happened. If anyone has any clue how to reverse this cleanly, > > please let me know. (There are a couple people at the sprints looking > > in to it, but the mercurial guys aren't here so we are short on experts). 
> > My apologies for the mess :(
> >
> There's a lot about undoing changes here:
>
> http://hgbook.red-bean.com/read/finding-and-fixing-mistakes.html

Ned Deily has fixed the problem.

--David

From benjamin at python.org Sat Jun 4 02:11:31 2016
From: benjamin at python.org (Benjamin Peterson)
Date: Fri, 03 Jun 2016 23:11:31 -0700
Subject: [Python-Dev] C99
Message-ID: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com>

PEP 7 requires CPython to use C code conforming to the venerable C89 standard. Traditionally, we've been stuck with C89 due to poor C support in MSVC. However, MSVC 2013 and 2015 implement the key features of C99. C99 does not offer anything earth-shattering; here are the features I think we'd find most interesting:

- Variable declarations can be on any line: removes possibly the most annoying limitation of C89.
- Inline functions: We can make Py_DECREF and Py_INCREF inline functions rather than unpleasant macros.
- C++-style line comments: Not a killer feature but commonly used.
- Booleans

In summary, some niceties that would make CPython hacking a little more fun.

So, what say you to updating PEP 7 to allow C99 features for Python 3.6 (in so much as GCC and MSVC support them)?

Regards,
Benjamin

From vadmium+py at gmail.com Sat Jun 4 03:53:20 2016
From: vadmium+py at gmail.com (Martin Panter)
Date: Sat, 4 Jun 2016 07:53:20 +0000
Subject: [Python-Dev] C99
In-Reply-To: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com>
References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com>
Message-ID:

On 4 June 2016 at 06:11, Benjamin Peterson wrote:
> PEP 7 requires CPython to use C code conforming to the venerable C89
> standard. Traditionally, we've been stuck with C89 due to poor C support
> in MSVC. However, MSVC 2013 and 2015 implement the key features of C99.
> C99 does not offer anything earth-shattering; here are the features I
> think we'd find most interesting:
> - Variable declarations can be on any line: removes possibly the most
> annoying limitation of C89.
> - Inline functions: We can make Py_DECREF and Py_INCREF inline functions
> rather than unpleasant macros.
> - C++-style line comments: Not an killer feature but commonly used.
> - Booleans

My most-missed C99 feature would be designated initializers. Does MSVC support them? It might allow you to do away with those giant pasted slot tables, and just write the slots you need:

    PyTypeObject PyUnicodeIter_Type = {
        PyVarObject_HEAD_INIT(&PyType_Type, 0)
        .tp_name = "str_iterator",
        .tp_basicsize = sizeof(unicodeiterobject),
        .tp_dealloc = unicodeiter_dealloc,
        .tp_getattro = PyObject_GenericGetAttr,
        .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,
        .tp_traverse = unicodeiter_traverse,
        .tp_iter = PyObject_SelfIter,
        .tp_iternext = unicodeiter_next,
        .tp_methods = unicodeiter_methods,
    };

> So, what say you to updating PEP 7 to allow C99 features for Python 3.6
> (in so much as GCC and MSVC support them)?

Sounds good for features that are well-supported by compilers that people use. (Are there other compilers used than just GCC and MSVC?)

From storchaka at gmail.com Sat Jun 4 04:08:39 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 4 Jun 2016 11:08:39 +0300
Subject: [Python-Dev] Improving the bytecode
Message-ID:

Following the conversion of 8-bit bytecode to 16-bit bytecode (wordcode), there are other issues for improving the bytecode.

1. http://bugs.python.org/issue27129
Make the bytecode more 16-bit oriented.

2. http://bugs.python.org/issue27140
Add new opcode BUILD_CONST_KEY_MAP for building a dict with constant keys. This optimizes the common case and is especially helpful for the two following issues (creating and calling functions).

3. http://bugs.python.org/issue27095
Simplify MAKE_FUNCTION/MAKE_CLOSURE.
Instead of packing three numbers in oparg, the new MAKE_FUNCTION takes built tuples and dicts from the stack. MAKE_FUNCTION and MAKE_CLOSURE are merged into a single opcode.

4. http://bugs.python.org/issue27213
Rework CALL_FUNCTION* opcodes. Replace four existing opcodes with three simpler and more efficient opcodes.

5. http://bugs.python.org/issue27127
Rework the for loop implementation.

6. http://bugs.python.org/issue17611
Move unwinding of stack for "pseudo exceptions" from interpreter to compiler.

From sebastian at realpath.org Sat Jun 4 04:12:57 2016
From: sebastian at realpath.org (Sebastian Krause)
Date: Sat, 04 Jun 2016 10:12:57 +0200
Subject: [Python-Dev] C99
In-Reply-To: (Martin Panter's message of "Sat, 4 Jun 2016 07:53:20 +0000")
References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com>
Message-ID:

Martin Panter wrote:
>> So, what say you to updating PEP 7 to allow C99 features for Python 3.6
>> (in so much as GCC and MSVC support them)?
>
> Sounds good for features that are well-supported by compilers that
> people use. (Are there other compilers used than just GCC and MSVC?)

clang on OS X, but it supports pretty much everything that GCC supports as well.

From brett at python.org Sat Jun 4 12:07:22 2016
From: brett at python.org (Brett Cannon)
Date: Sat, 04 Jun 2016 16:07:22 +0000
Subject: [Python-Dev] Improving the bytecode
In-Reply-To: References:
Message-ID:

It's not on the list but I'm hoping to convince Dino to work on END_FINALLY to be a bit more sane.

On Sat, Jun 4, 2016, 01:17 Serhiy Storchaka wrote:
> Following the converting 8-bit bytecode to 16-bit bytecode (wordcode),
> there are other issues for improving the bytecode.
>
> 1. http://bugs.python.org/issue27129
> Make the bytecode more 16-bit oriented.
>
> 2. http://bugs.python.org/issue27140
> Add new opcode BUILD_CONST_KEY_MAP for building a dict with constant
> keys.
This optimize the common case and especially helpful for two > following issues (creating and calling functions). > > 3. http://bugs.python.org/issue27095 > Simplify MAKE_FUNCTION/MAKE_CLOSURE. Instead packing three numbers in > oparg the new MAKE_FUNCTION takes built tuples and dicts from the stack. > MAKE_FUNCTION and MAKE_CLOSURE are merged in the single opcode. > > 4. http://bugs.python.org/issue27213 > Rework CALL_FUNCTION* opcodes. Replace four existing opcodes with three > simpler and more efficient opcodes. > > 5. http://bugs.python.org/issue27127 > Rework the for loop implementation. > > 6. http://bugs.python.org/issue17611 > Move unwinding of stack for "pseudo exceptions" from interpreter to > compiler. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sat Jun 4 13:02:27 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 4 Jun 2016 11:02:27 -0600 Subject: [Python-Dev] Improving the bytecode In-Reply-To: References: Message-ID: You should get in touch with Mark Shannon, while you're working on ceval. He has some definite improvements that can be made to the eval loop. -eric On Sat, Jun 4, 2016 at 2:08 AM, Serhiy Storchaka wrote: > Following the converting 8-bit bytecode to 16-bit bytecode (wordcode), there > are other issues for improving the bytecode. > > 1. http://bugs.python.org/issue27129 > Make the bytecode more 16-bit oriented. > > 2. http://bugs.python.org/issue27140 > Add new opcode BUILD_CONST_KEY_MAP for building a dict with constant keys. > This optimize the common case and especially helpful for two following > issues (creating and calling functions). > > 3. 
http://bugs.python.org/issue27095 > Simplify MAKE_FUNCTION/MAKE_CLOSURE. Instead packing three numbers in oparg > the new MAKE_FUNCTION takes built tuples and dicts from the stack. > MAKE_FUNCTION and MAKE_CLOSURE are merged in the single opcode. > > 4. http://bugs.python.org/issue27213 > Rework CALL_FUNCTION* opcodes. Replace four existing opcodes with three > simpler and more efficient opcodes. > > 5. http://bugs.python.org/issue27127 > Rework the for loop implementation. > > 6. http://bugs.python.org/issue17611 > Move unwinding of stack for "pseudo exceptions" from interpreter to > compiler. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com From christian at python.org Sat Jun 4 13:27:39 2016 From: christian at python.org (Christian Heimes) Date: Sat, 4 Jun 2016 10:27:39 -0700 Subject: [Python-Dev] C99 In-Reply-To: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: On 2016-06-03 23:11, Benjamin Peterson wrote: > PEP 7 requires CPython to use C code conforming to the venerable C89 > standard. Traditionally, we've been stuck with C89 due to poor C support > in MSVC. However, MSVC 2013 and 2015 implement the key features of C99. > C99 does not offer anything earth-shattering; here are the features I > think we'd find most interesting: > - Variable declarations can be on any line: removes possibly the most > annoying limitation of C89. > - Inline functions: We can make Py_DECREF and Py_INCREF inline functions > rather than unpleasant macros. > - C++-style line comments: Not an killer feature but commonly used. > - Booleans > In summary, some niceties that would make CPython hacking a little more > fun. 
> > So, what say you to updating PEP 7 to allow C99 features for Python 3.6 > (in so much as GCC and MSVC support them)? +1 - We never officially deprecated C89 platforms withou 64 bit integers in PEP 7. Victor's changes to pytime.h implies support for uint64_t and int64_t. C99 has mandatory long long int support. - If we also drop Solaris Studio C compiler support, we can replace header guards (e.g. #ifndef Py_PYTHON_H) with #pragma once Christian From guido at python.org Sat Jun 4 13:47:38 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 4 Jun 2016 10:47:38 -0700 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: Funny. Just two weeks ago I was helping someone who discovered a compiler that doesn't support the new relaxed variable declaration rules. I think it was on Windows. Maybe this move is a little too aggressively deprecating older Windows compilers? On Sat, Jun 4, 2016 at 10:27 AM, Christian Heimes wrote: > On 2016-06-03 23:11, Benjamin Peterson wrote: >> PEP 7 requires CPython to use C code conforming to the venerable C89 >> standard. Traditionally, we've been stuck with C89 due to poor C support >> in MSVC. However, MSVC 2013 and 2015 implement the key features of C99. >> C99 does not offer anything earth-shattering; here are the features I >> think we'd find most interesting: >> - Variable declarations can be on any line: removes possibly the most >> annoying limitation of C89. >> - Inline functions: We can make Py_DECREF and Py_INCREF inline functions >> rather than unpleasant macros. >> - C++-style line comments: Not an killer feature but commonly used. >> - Booleans >> In summary, some niceties that would make CPython hacking a little more >> fun. >> >> So, what say you to updating PEP 7 to allow C99 features for Python 3.6 >> (in so much as GCC and MSVC support them)? > > +1 > > - We never officially deprecated C89 platforms withou 64 bit integers in > PEP 7. 
Victor's changes to pytime.h implies support for uint64_t and > int64_t. C99 has mandatory long long int support. > > - If we also drop Solaris Studio C compiler support, we can replace > header guards (e.g. #ifndef Py_PYTHON_H) with #pragma once > > Christian > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From dinov at microsoft.com Sat Jun 4 14:32:30 2016 From: dinov at microsoft.com (Dino Viehland) Date: Sat, 4 Jun 2016 18:32:30 +0000 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: Martin wrote: > On 4 June 2016 at 06:11, Benjamin Peterson wrote: > > PEP 7 requires CPython to use C code conforming to the venerable C89 > > standard. Traditionally, we've been stuck with C89 due to poor C > > support in MSVC. However, MSVC 2013 and 2015 implement the key > features of C99. > > C99 does not offer anything earth-shattering; here are the features I > > think we'd find most interesting: > > - Variable declarations can be on any line: removes possibly the most > > annoying limitation of C89. > > - Inline functions: We can make Py_DECREF and Py_INCREF inline > > functions rather than unpleasant macros. > > - C++-style line comments: Not an killer feature but commonly used. > > - Booleans > > My most-missed C99 feature would be designated initializers. Does MSVC > support them? 
It might allow you to do away with those giant pasted slot > tables, and just write the slots you need: > > PyTypeObject PyUnicodeIter_Type = { > PyVarObject_HEAD_INIT(&PyType_Type, 0) > .tp_name = "str_iterator", > .tp_basicsize = sizeof(unicodeiterobject), > .tp_dealloc = unicodeiter_dealloc, > .tp_getattro = PyObject_GenericGetAttr, > .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC, > .tp_traverse = unicodeiter_traverse, > .tp_iter = PyObject_SelfIter, > .tp_iternext = unicodeiter_next, > .tp_methods = unicodeiter_methods, > }; I checked and VC++ does actually support this, and it looks like they support // comments as well. I don't think it fully supports all of the C99 features - it appears They just cherry picked some stuff. The C99 standard library does appear to be fully supported with the exception of tgmath.h. From benjamin at python.org Sat Jun 4 14:47:43 2016 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 04 Jun 2016 11:47:43 -0700 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: <1465066063.2971497.627971209.737F379F@webmail.messagingengine.com> On Sat, Jun 4, 2016, at 11:32, Dino Viehland wrote: > > > Martin wrote: > > On 4 June 2016 at 06:11, Benjamin Peterson wrote: > > > PEP 7 requires CPython to use C code conforming to the venerable C89 > > > standard. Traditionally, we've been stuck with C89 due to poor C > > > support in MSVC. However, MSVC 2013 and 2015 implement the key > > features of C99. > > > C99 does not offer anything earth-shattering; here are the features I > > > think we'd find most interesting: > > > - Variable declarations can be on any line: removes possibly the most > > > annoying limitation of C89. > > > - Inline functions: We can make Py_DECREF and Py_INCREF inline > > > functions rather than unpleasant macros. > > > - C++-style line comments: Not an killer feature but commonly used. 
> > > - Booleans > > > > My most-missed C99 feature would be designated initializers. Does MSVC > > support them? It might allow you to do away with those giant pasted slot > > tables, and just write the slots you need: > > > > PyTypeObject PyUnicodeIter_Type = { > > PyVarObject_HEAD_INIT(&PyType_Type, 0) > > .tp_name = "str_iterator", > > .tp_basicsize = sizeof(unicodeiterobject), > > .tp_dealloc = unicodeiter_dealloc, > > .tp_getattro = PyObject_GenericGetAttr, > > .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC, > > .tp_traverse = unicodeiter_traverse, > > .tp_iter = PyObject_SelfIter, > > .tp_iternext = unicodeiter_next, > > .tp_methods = unicodeiter_methods, > > }; > > I checked and VC++ does actually support this, and it looks like they > support > // comments as well. I don't think it fully supports all of the C99 > features - it appears > They just cherry picked some stuff. The C99 standard library does appear > to be fully > supported with the exception of tgmath.h. Are the C99 features VS++ supports documented anywhere? I couldn't find any list. From christian at python.org Sat Jun 4 14:50:52 2016 From: christian at python.org (Christian Heimes) Date: Sat, 4 Jun 2016 11:50:52 -0700 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> On 2016-06-04 10:47, Guido van Rossum wrote: > Funny. Just two weeks ago I was helping someone who discovered a > compiler that doesn't support the new relaxed variable declaration > rules. I think it was on Windows. Maybe this move is a little too > aggressively deprecating older Windows compilers? Yes, it's not support in VS 2012 and 2008 for Python 3.4 and older. New C99 features are available in VS 2013, https://blogs.msdn.microsoft.com/vcblog/2013/06/28/c1114-stl-features-fixes-and-breaking-changes-in-vs-2013/ Python 3.5+ requires VS 2015 anyway. 
Traditionally we tried to keep backwards compatibility with older compiler versions. The new features are tempting enough to deprecate compiler versions that have been released more than five years ago. Christian From guido at python.org Sat Jun 4 14:59:07 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 4 Jun 2016 11:59:07 -0700 Subject: [Python-Dev] C99 In-Reply-To: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> Message-ID: As long as we don't require extension module authors to use them -- they may have their own compatibility requirements. On Sat, Jun 4, 2016 at 11:50 AM, Christian Heimes wrote: > On 2016-06-04 10:47, Guido van Rossum wrote: >> Funny. Just two weeks ago I was helping someone who discovered a >> compiler that doesn't support the new relaxed variable declaration >> rules. I think it was on Windows. Maybe this move is a little too >> aggressively deprecating older Windows compilers? > > Yes, it's not support in VS 2012 and 2008 for Python 3.4 and older. New > C99 features are available in VS 2013, > https://blogs.msdn.microsoft.com/vcblog/2013/06/28/c1114-stl-features-fixes-and-breaking-changes-in-vs-2013/ > > > Python 3.5+ requires VS 2015 anyway. Traditionally we tried to keep > backwards compatibility with older compiler versions. The new features > are tempting enough to deprecate compiler versions that have been > released more than five years ago. 
> > Christian -- --Guido van Rossum (python.org/~guido) From christian at python.org Sat Jun 4 15:05:09 2016 From: christian at python.org (Christian Heimes) Date: Sat, 4 Jun 2016 12:05:09 -0700 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> Message-ID: <0764173d-e69b-4f75-8797-30828ca6b471@python.org> On 2016-06-04 11:59, Guido van Rossum wrote: > As long as we don't require extension module authors to use them -- > they may have their own compatibility requirements. On Windows extension modules must be compiled with a specific version of MSVC any way. For Python 3.6 VS 2015 or newer is a hard requirement. We kept the old compiler directories around for embedders. From guido at python.org Sat Jun 4 15:07:07 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 4 Jun 2016 12:07:07 -0700 Subject: [Python-Dev] C99 In-Reply-To: <0764173d-e69b-4f75-8797-30828ca6b471@python.org> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> Message-ID: I'm talking about 3rd party extensions. Those may require source compatibility with older Python versions. All I'm asking for is to not require source-level use of C99 features. Of course requiring a specific compiler to work with specific CPython versions is fine. On Sat, Jun 4, 2016 at 12:05 PM, Christian Heimes wrote: > On 2016-06-04 11:59, Guido van Rossum wrote: >> As long as we don't require extension module authors to use them -- >> they may have their own compatibility requirements. > > On Windows extension modules must be compiled with a specific version of > MSVC any way. For Python 3.6 VS 2015 or newer is a hard requirement. > > We kept the old compiler directories around for embedders. 
> -- --Guido van Rossum (python.org/~guido) From christian at python.org Sat Jun 4 15:10:26 2016 From: christian at python.org (Christian Heimes) Date: Sat, 4 Jun 2016 12:10:26 -0700 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> Message-ID: <7104eb32-9e06-7e33-7e5a-542b3bc94a35@python.org> On 2016-06-04 12:07, Guido van Rossum wrote: > I'm talking about 3rd party extensions. Those may require source > compatibility with older Python versions. All I'm asking for is to not > require source-level use of C99 features. Of course requiring a > specific compiler to work with specific CPython versions is fine. Ah, the other way around. Yes, that makes a lot of sense. From larry at hastings.org Sat Jun 4 17:12:13 2016 From: larry at hastings.org (Larry Hastings) Date: Sat, 4 Jun 2016 14:12:13 -0700 Subject: [Python-Dev] C99 In-Reply-To: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: <5753442D.2000008@hastings.org> On 06/03/2016 11:11 PM, Benjamin Peterson wrote: > So, what say you to updating PEP 7 to allow C99 features for Python 3.6 > (in so much as GCC and MSVC support them)? +1 Clearly it'll be 3.5+ only, and clearly it'll be a specific list of features ("C89 but also permitting //-comments, variadic macros, variable declarations on any line, inline functions, and designated initializers"). But I'm looking forward to it! We already had macros for inline (e.g. Py_LOCAL_INLINE), maybe we can remove those. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From meadori at gmail.com Sat Jun 4 17:22:20 2016 From: meadori at gmail.com (Meador Inge) Date: Sat, 4 Jun 2016 16:22:20 -0500 Subject: [Python-Dev] C99 In-Reply-To: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: On Sat, Jun 4, 2016 at 1:11 AM, Benjamin Peterson wrote: > So, what say you to updating PEP 7 to allow C99 features for Python 3.6 > (in so much as GCC and MSVC support them)? > +1 # Meador -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian at python.org Sat Jun 4 20:26:01 2016 From: christian at python.org (Christian Heimes) Date: Sat, 4 Jun 2016 17:26:01 -0700 Subject: [Python-Dev] cpython: replace custom validation logic in the parse module with a simple DFA validator In-Reply-To: <20160602183248.72964.43203.21CCDB55@psf.io> References: <20160602183248.72964.43203.21CCDB55@psf.io> Message-ID: <4005d323-bb04-9dda-00c8-a5fb271061f9@python.org> On 2016-06-02 11:32, benjamin.peterson wrote: > https://hg.python.org/cpython/rev/4a9159ea2536 > changeset: 101601:4a9159ea2536 > user: Benjamin Peterson > date: Thu Jun 02 11:30:18 2016 -0700 > summary: > replace custom validation logic in the parse module with a simple DFA validator (closes #26526) > > Patch from A. Skrobov. > > files: > Misc/NEWS | 3 + > Modules/parsermodule.c | 2545 +-------------------------- > 2 files changed, 96 insertions(+), 2452 deletions(-) > > > diff --git a/Misc/NEWS b/Misc/NEWS > --- a/Misc/NEWS > +++ b/Misc/NEWS > @@ -22,6 +22,9 @@ > Library > ------- > > +- Issue #26526: Replace custom parse tree validation in the parser > + module with a simple DFA validator. 
> + > - Issue #27114: Fix SSLContext._load_windows_store_certs fails with > PermissionError > > diff --git a/Modules/parsermodule.c b/Modules/parsermodule.c > --- a/Modules/parsermodule.c > +++ b/Modules/parsermodule.c > @@ -670,9 +670,75 @@ > > > static node* build_node_tree(PyObject *tuple); > -static int validate_expr_tree(node *tree); > -static int validate_file_input(node *tree); > -static int validate_encoding_decl(node *tree); > + > +static int > +validate_node(node *tree) > +{ > + int type = TYPE(tree); > + int nch = NCH(tree); > + dfa *nt_dfa; > + state *dfa_state; > + int pos, arc; > + > + assert(ISNONTERMINAL(type)); > + type -= NT_OFFSET; > + if (type >= _PyParser_Grammar.g_ndfas) { > + PyErr_Format(parser_error, "Unrecognized node type %d.", TYPE(tree)); > + return 0; > + } > + nt_dfa = &_PyParser_Grammar.g_dfa[type]; > + REQ(tree, nt_dfa->d_type); > + > + /* Run the DFA for this nonterminal. */ > + dfa_state = &nt_dfa->d_state[nt_dfa->d_initial]; > + for (pos = 0; pos < nch; ++pos) { > + node *ch = CHILD(tree, pos); > + int ch_type = TYPE(ch); > + for (arc = 0; arc < dfa_state->s_narcs; ++arc) { > + short a_label = dfa_state->s_arc[arc].a_lbl; > + assert(a_label < _PyParser_Grammar.g_ll.ll_nlabels); > + if (_PyParser_Grammar.g_ll.ll_label[a_label].lb_type == ch_type) { > + /* The child is acceptable; if non-terminal, validate it recursively. */ > + if (ISNONTERMINAL(ch_type) && !validate_node(ch)) > + return 0; > + > + /* Update the state, and move on to the next child. */ > + dfa_state = &nt_dfa->d_state[dfa_state->s_arc[arc].a_arrow]; > + goto arc_found; > + } > + } > + /* What would this state have accepted? 
*/ > + { > + short a_label = dfa_state->s_arc->a_lbl; > + int next_type; > + if (!a_label) /* Wouldn't accept any more children */ > + goto illegal_num_children; > + > + next_type = _PyParser_Grammar.g_ll.ll_label[a_label].lb_type; > + if (ISNONTERMINAL(next_type)) > + PyErr_Format(parser_error, "Expected node type %d, got %d.", > + next_type, ch_type); > + else > + PyErr_Format(parser_error, "Illegal terminal: expected %s.", > + _PyParser_TokenNames[next_type]); Coverity doesn't like that line: CID 1362505 (#1 of 1): Out-of-bounds read (OVERRUN) 20. overrun-local: Overrunning array _PyParser_TokenNames of 58 8-byte elements at element index 255 (byte offset 2040) using index next_type (which evaluates to 255). Can you add a check to verify that next_type is not out-of-bounds, e.g. + else if (next_type > N_TOKENS) + PyErr_Format(parser_error, "Illegal node type %d", next_type); > + return 0; > + } > + > +arc_found: > + continue; > + } > + /* Are we in a final state? If so, return 1 for successful validation. */ > + for (arc = 0; arc < dfa_state->s_narcs; ++arc) { > + if (!dfa_state->s_arc[arc].a_lbl) { > + return 1; > + } > + } > + > +illegal_num_children: > + PyErr_Format(parser_error, > + "Illegal number of children for %s node.", nt_dfa->d_name); > + return 0; > +} From mark at hotpy.org Sat Jun 4 20:53:57 2016 From: mark at hotpy.org (Mark Shannon) Date: Sat, 4 Jun 2016 17:53:57 -0700 Subject: [Python-Dev] Improving the bytecode In-Reply-To: References: Message-ID: <57537825.7000706@hotpy.org> On 04/06/16 10:02, Eric Snow wrote: > You should get in touch with Mark Shannon, while you're working on > ceval. He has some definite improvements that can be made to the eval > loop. See http://bugs.python.org/issue17611 for my suggested improvements. I've made a new comment there. Cheers, Mark.
> > -eric > > On Sat, Jun 4, 2016 at 2:08 AM, Serhiy Storchaka wrote: >> Following the converting 8-bit bytecode to 16-bit bytecode (wordcode), there >> are other issues for improving the bytecode. >> >> 1. http://bugs.python.org/issue27129 >> Make the bytecode more 16-bit oriented. >> >> 2. http://bugs.python.org/issue27140 >> Add new opcode BUILD_CONST_KEY_MAP for building a dict with constant keys. >> This optimize the common case and especially helpful for two following >> issues (creating and calling functions). >> >> 3. http://bugs.python.org/issue27095 >> Simplify MAKE_FUNCTION/MAKE_CLOSURE. Instead packing three numbers in oparg >> the new MAKE_FUNCTION takes built tuples and dicts from the stack. >> MAKE_FUNCTION and MAKE_CLOSURE are merged in the single opcode. >> >> 4. http://bugs.python.org/issue27213 >> Rework CALL_FUNCTION* opcodes. Replace four existing opcodes with three >> simpler and more efficient opcodes. >> >> 5. http://bugs.python.org/issue27127 >> Rework the for loop implementation. >> >> 6. http://bugs.python.org/issue17611 >> Move unwinding of stack for "pseudo exceptions" from interpreter to >> compiler. >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com From raymond.hettinger at gmail.com Sun Jun 5 14:24:50 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 5 Jun 2016 11:24:50 -0700 Subject: [Python-Dev] Improving the bytecode In-Reply-To: References: Message-ID: <535A1D81-B1BD-439C-8167-A93B2174507E@gmail.com> > On Jun 4, 2016, at 1:08 AM, Serhiy Storchaka wrote: > > Following the converting 8-bit bytecode to 16-bit bytecode (wordcode), there are other issues for improving the bytecode. > > 1. http://bugs.python.org/issue27129 > Make the bytecode more 16-bit oriented. I don't think this should be done.
Adding the /2 and *2 just complicates the code and messes with my ability to reason about jumps. With VM opcodes, there is always a tension between being close to implementation (what byte address are we jumping to) and being high level (what is the word offset). In this case, I think we should stay with the former because they are primarily used in ceval.c and peephole.c which are close to the implementation. At the higher level, there isn't any real benefit either (because dis.py already does a nice job of translating the jump targets). Here is one example of the parts of the diff that cause concern that future maintenance will be made more difficult by the change: - j = blocks[j + i + 2] - blocks[i] - 2; + j = (blocks[j * 2 + i + 2] - blocks[i] - 2) / 2; Reviewing the original line only gives me a mild headache while the second one really makes me want to avert my eyes ;-) > 2. http://bugs.python.org/issue27140 > Add new opcode BUILD_CONST_KEY_MAP for building a dict with constant keys. This optimize the common case and especially helpful for two following issues (creating and calling functions). This shows promise. The proposed name BUILD_CONST_KEY_MAP is much more clear than BUILD_MAP_EX. > 3. http://bugs.python.org/issue27095 > Simplify MAKE_FUNCTION/MAKE_CLOSURE. Instead packing three numbers in oparg the new MAKE_FUNCTION takes built tuples and dicts from the stack. MAKE_FUNCTION and MAKE_CLOSURE are merged in the single opcode. > > 4. http://bugs.python.org/issue27213 > Rework CALL_FUNCTION* opcodes. Replace four existing opcodes with three simpler and more efficient opcodes. +1 > 5. http://bugs.python.org/issue27127 > Rework the for loop implementation. I'm unclear what problem is being solved by requiring that GET_ITER always be followed immediately by FOR_ITER. > 6. http://bugs.python.org/issue17611 > Move unwinding of stack for "pseudo exceptions" from interpreter to compiler.
I have mixed feelings on this one, at once applauding efforts to simplify an eternally messy part of the eval loop and at the same time worried that it throws away years of tweaks and improvements that came beforehand. This is more of a major surgery than the other patches. Raymond Hettinger From storchaka at gmail.com Sun Jun 5 15:16:57 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 5 Jun 2016 22:16:57 +0300 Subject: [Python-Dev] Improving the bytecode In-Reply-To: <535A1D81-B1BD-439C-8167-A93B2174507E@gmail.com> References: <535A1D81-B1BD-439C-8167-A93B2174507E@gmail.com> Message-ID: On 05.06.16 21:24, Raymond Hettinger wrote: >> On Jun 4, 2016, at 1:08 AM, Serhiy Storchaka wrote: >> 1. http://bugs.python.org/issue27129 >> Make the bytecode more 16-bit oriented. > > I don't think this should be done. Adding the /2 and *2 just complicates the code and messes with my ability to reason about jumps. > > With VM opcodes, there is always a tension between being close to implementation (what byte address are we jumping to) and being high level (what is the word offset). In this case, I think we should stay with the former because they are primarily used in ceval.c and peephole.c which are close to the implementation. At the higher level, there isn't any real benefit either (because dis.py already does a nice job of translating the jump targets). > > Here is one example of the parts of the diff that cause concern that future maintenance will be made more difficult by the change: > > - j = blocks[j + i + 2] - blocks[i] - 2; > + j = (blocks[j * 2 + i + 2] - blocks[i] - 2) / 2; > > Reviewing the original line only gives me a mild headache while the second one really makes me want to avert my eyes ;-) The /2 and *2 are added just because Victor wants to keep f_lineno counting bytes. Please look at my first patch. It doesn't contain /2 and *2. It even contains far fewer +2 and -2 adjustments.
For example, the above change looks like: - j = blocks[j + i + 2] - blocks[i] - 2; + j = blocks[j + i + 1] - blocks[i] - 1; Doesn't this give you less of a headache? >> 2. http://bugs.python.org/issue27140 >> Add new opcode BUILD_CONST_KEY_MAP for building a dict with constant keys. This optimize the common case and especially helpful for two following issues (creating and calling functions). > > This shows promise. > > The proposed name BUILD_CONST_KEY_MAP is much more clear than BUILD_MAP_EX. If you accept this patch, I'll commit it. At least two other issues are waiting on it. >> 5. http://bugs.python.org/issue27127 >> Rework the for loop implementation. > > I'm unclear what problem is being solved by requiring that GET_ITER always be followed immediately by FOR_ITER. As I understand, the purpose was to decrease the number of executed opcodes. It looks to me that the existing patch is not acceptable, because there is a reason for using two opcodes in the for loop start. But I think that we can use another optimization here. I'll try to write a patch. From sturla.molden at gmail.com Sun Jun 5 22:28:44 2016 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 6 Jun 2016 02:28:44 +0000 (UTC) Subject: [Python-Dev] C99 References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> Message-ID: <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> Guido van Rossum wrote: > I'm talking about 3rd party extensions. Those may require source > compatibility with older Python versions. All I'm asking for is to not > require source-level use of C99 features. This of course removes a lot of its usefulness. E.g. macros cannot be replaced by inline functions, as header files must still be plain C89.
Sturla Molden From tritium-list at sdamon.com Sun Jun 5 22:35:28 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Sun, 5 Jun 2016 22:35:28 -0400 Subject: [Python-Dev] C99 In-Reply-To: <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> Message-ID: <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> > -----Original Message----- > From: Python-Dev [mailto:python-dev-bounces+tritium- > list=sdamon.com at python.org] On Behalf Of Sturla Molden > Sent: Sunday, June 5, 2016 10:29 PM > To: python-dev at python.org > Subject: Re: [Python-Dev] C99 > > Guido van Rossum wrote: > > > I'm talking about 3rd party extensions. Those may require source > > compatibility with older Python versions. All I'm asking for is to not > > require source-level use of C99 features. > > This of course removes a lot of its usefulness. E.g. macros cannot be > replaced by inline functions, as header files must still be plain C89. > > > Sturla Molden > I share Guido's priority there - source compatibility is more important than smoothing a few of C's rough edges. Maybe this should be considered for the next breaking change release (python 4000... python 5000?)
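[For reference, the handful of C99 features Larry listed earlier in the thread (//-comments, variadic macros, variable declarations on any line, inline functions, and designated initializers) can all be shown in a few lines. This is an illustrative sketch only, not CPython code; every name in it is invented.]

```c
#include <assert.h>
#include <stdio.h>

typedef struct { int x; int y; } point;   /* plain C89 struct */

/* C99: variadic macro */
#define LOG(fmt, ...) fprintf(stderr, fmt "\n", __VA_ARGS__)

/* C99: inline function (a macro would be the C89 spelling) */
static inline int point_sum(point p) { return p.x + p.y; }

int demo(void)
{
    // C99: line comment
    point p = { .y = 4, .x = 3 };  /* C99: designated initializers */
    int before = point_sum(p);
    p.y = 6;
    int after = point_sum(p);      /* C99: declaration after a statement */
    LOG("sum went from %d to %d", before, after);
    return after;
}
```

[None of this compiles under a strict C89 compiler, which is exactly the compatibility question being debated in the thread.]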
From vgr255 at live.ca Sun Jun 5 22:42:12 2016 From: vgr255 at live.ca (=?iso-8859-1?Q?=C9manuel_Barry?=) Date: Sun, 5 Jun 2016 22:42:12 -0400 Subject: [Python-Dev] C99 In-Reply-To: <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> Message-ID: > From: Python-Dev [mailto:python-dev- > bounces+vgr255=live.ca at python.org] On Behalf Of tritium- > list at sdamon.com > Sent: Sunday, June 05, 2016 10:35 PM > To: 'Sturla Molden'; python-dev at python.org > Subject: Re: [Python-Dev] C99 > > > -----Original Message----- > > From: Python-Dev [mailto:python-dev-bounces+tritium- > > list=sdamon.com at python.org] On Behalf Of Sturla Molden > > Sent: Sunday, June 5, 2016 10:29 PM > > To: python-dev at python.org > > Subject: Re: [Python-Dev] C99 > > > > Guido van Rossum wrote: > > > > > I'm talking about 3rd party extensions. Those may require source > > > compatibility with older Python versions. All I'm asking for is to not > > > require source-level use of C99 features. > > > > This of course removes a lot of its usefulness. E.g. macros cannot be > > replaced by inline functions, as header files must still be plain C89. > > > > > > Sturla Molden > > > > I share Guido's priority there - source compatibility is more important than > smoothing a few of C's rough edges. Correct me if I'm wrong, but I think that Guido meant that the third-party extensions might require their own code (not CPython's) to be compatible with versions of CPython < 3.6, and so PEP 7 shouldn't force them to break their own backwards compatibility. Either way I'm +1 for allowing (but not enforcing) C99 syntax. > Maybe the next breaking change release > this should be considered (python 4000... python 5000?) Let's not! 
-Emanuel From sturla.molden at gmail.com Sun Jun 5 22:42:15 2016 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 6 Jun 2016 02:42:15 +0000 (UTC) Subject: [Python-Dev] C99 References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> Message-ID: <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> wrote: > I share Guido's priority there - source compatibility is more important than > smoothing a few of C's rough edges. Maybe the next breaking change release > this should be considered (python 4000... python 5000?) I was simply pointing out that Guido's priority removes a lot of the usefulness of C99 at source level. I was not saying I disagreed. If we have to keep header files clean of C99 I think this proposal just adds clutter. From guido at python.org Sun Jun 5 22:52:52 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 5 Jun 2016 19:52:52 -0700 Subject: [Python-Dev] C99 In-Reply-To: <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> Message-ID: I'm not sure I meant that. But if I have a 3rd party extension that compiles with 3.5 headers using C89, then it should still compile with 3.6 headers using C99. Also if I compile it for 3.5 and it only uses the ABI it should still be linkable with 3.6. 
On Sun, Jun 5, 2016 at 7:42 PM, Sturla Molden wrote: > wrote: > >> I share Guido's priority there - source compatibility is more important than >> smoothing a few of C's rough edges. Maybe the next breaking change release >> this should be considered (python 4000... python 5000?) > > I was simply pointing out that Guido's priority removes a lot of the > usefulness of C99 at source level. I was not saying I disagreed. If we have > to keep header files clean of C99 I think this proposal just adds clutter. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From benjamin at python.org Mon Jun 6 02:51:54 2016 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 05 Jun 2016 23:51:54 -0700 Subject: [Python-Dev] cpython: replace custom validation logic in the parse module with a simple DFA validator In-Reply-To: <4005d323-bb04-9dda-00c8-a5fb271061f9@python.org> References: <20160602183248.72964.43203.21CCDB55@psf.io> <4005d323-bb04-9dda-00c8-a5fb271061f9@python.org> Message-ID: <1465195914.3950157.628917729.492FEB13@webmail.messagingengine.com> On Sat, Jun 4, 2016, at 17:26, Christian Heimes wrote: > On 2016-06-02 11:32, benjamin.peterson wrote: > > https://hg.python.org/cpython/rev/4a9159ea2536 > > changeset: 101601:4a9159ea2536 > > user: Benjamin Peterson > > date: Thu Jun 02 11:30:18 2016 -0700 > > summary: > > replace custom validation logic in the parse module with a simple DFA validator (closes #26526) > > > > Patch from A. Skrobov. 
> > > > files: > > Misc/NEWS | 3 + > > Modules/parsermodule.c | 2545 +-------------------------- > > 2 files changed, 96 insertions(+), 2452 deletions(-) > > > > > > diff --git a/Misc/NEWS b/Misc/NEWS > > --- a/Misc/NEWS > > +++ b/Misc/NEWS > > @@ -22,6 +22,9 @@ > > Library > > ------- > > > > +- Issue #26526: Replace custom parse tree validation in the parser > > + module with a simple DFA validator. > > + > > - Issue #27114: Fix SSLContext._load_windows_store_certs fails with > > PermissionError > > > > diff --git a/Modules/parsermodule.c b/Modules/parsermodule.c > > --- a/Modules/parsermodule.c > > +++ b/Modules/parsermodule.c > > @@ -670,9 +670,75 @@ > > > > > > static node* build_node_tree(PyObject *tuple); > > -static int validate_expr_tree(node *tree); > > -static int validate_file_input(node *tree); > > -static int validate_encoding_decl(node *tree); > > + > > +static int > > +validate_node(node *tree) > > +{ > > + int type = TYPE(tree); > > + int nch = NCH(tree); > > + dfa *nt_dfa; > > + state *dfa_state; > > + int pos, arc; > > + > > + assert(ISNONTERMINAL(type)); > > + type -= NT_OFFSET; > > + if (type >= _PyParser_Grammar.g_ndfas) { > > + PyErr_Format(parser_error, "Unrecognized node type %d.", TYPE(tree)); > > + return 0; > > + } > > + nt_dfa = &_PyParser_Grammar.g_dfa[type]; > > + REQ(tree, nt_dfa->d_type); > > + > > + /* Run the DFA for this nonterminal. */ > > + dfa_state = &nt_dfa->d_state[nt_dfa->d_initial]; > > + for (pos = 0; pos < nch; ++pos) { > > + node *ch = CHILD(tree, pos); > > + int ch_type = TYPE(ch); > > + for (arc = 0; arc < dfa_state->s_narcs; ++arc) { > > + short a_label = dfa_state->s_arc[arc].a_lbl; > > + assert(a_label < _PyParser_Grammar.g_ll.ll_nlabels); > > + if (_PyParser_Grammar.g_ll.ll_label[a_label].lb_type == ch_type) { > > + /* The child is acceptable; if non-terminal, validate it recursively. 
*/ > > + if (ISNONTERMINAL(ch_type) && !validate_node(ch)) > > + return 0; > > + > > + /* Update the state, and move on to the next child. */ > > + dfa_state = &nt_dfa->d_state[dfa_state->s_arc[arc].a_arrow]; > > + goto arc_found; > > + } > > + } > > + /* What would this state have accepted? */ > > + { > > + short a_label = dfa_state->s_arc->a_lbl; > > + int next_type; > > + if (!a_label) /* Wouldn't accept any more children */ > > + goto illegal_num_children; > > + > > + next_type = _PyParser_Grammar.g_ll.ll_label[a_label].lb_type; > > + if (ISNONTERMINAL(next_type)) > > + PyErr_Format(parser_error, "Expected node type %d, got %d.", > > + next_type, ch_type); > > + else > > + PyErr_Format(parser_error, "Illegal terminal: expected %s.", > > + _PyParser_TokenNames[next_type]); > > Coverity doesn't that line: > > CID 1362505 (#1 of 1): Out-of-bounds read (OVERRUN) > 20. overrun-local: Overrunning array _PyParser_TokenNames of 58 8-byte > elements at element index 255 (byte offset 2040) using index next_type > (which evaluates to 255). I don't think this can cause a problem because it doesn't ever come from user-provided input. From sturla.molden at gmail.com Mon Jun 6 07:23:31 2016 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 6 Jun 2016 11:23:31 +0000 (UTC) Subject: [Python-Dev] C99 References: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> Message-ID: <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> Guido van Rossum wrote: > I'm not sure I meant that. But if I have a 3rd party extension that > compiles with 3.5 headers using C89, then it should still compile with > 3.6 headers using C99. Also if I compile it for 3.5 and it only uses > the ABI it should still be linkable with 3.6. 
Ok, but if third-party developers shall be free to use a C89 compiler for their own code, we cannot have C99 in the include files. Otherwise the include files will taint the C89 purity of their source code. Personally I don't think we need to worry about compilers that don't implement C99 features like inline functions in C. How long have the Linux kernel used inline functions instead of macros? 20 years or more? Sturla From random832 at fastmail.com Mon Jun 6 09:25:24 2016 From: random832 at fastmail.com (Random832) Date: Mon, 06 Jun 2016 09:25:24 -0400 Subject: [Python-Dev] C99 In-Reply-To: <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> References: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> Message-ID: <1465219524.1062244.629232969.0CBE8C17@webmail.messagingengine.com> On Mon, Jun 6, 2016, at 07:23, Sturla Molden wrote: > Ok, but if third-party developers shall be free to use a C89 compiler for > their own code, we cannot have C99 in the include files. Otherwise the > include files will taint the C89 purity of their source code. > > Personally I don't think we need to worry about compilers that don't > implement C99 features like inline functions in C. How long have the > Linux > kernel used inline functions instead of macros? 20 years or more? Using inline functions instead of macros doesn't have to mean anything but a performance hit on platforms that don't support them, since the inline keyword, or some other identifier, could be defined to expand to an empty token sequence on platforms that do not support it. It's much lower impact on the source code than some other C99 features. 
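[A minimal sketch of the fallback Random832 describes, which is also the approach CPython's existing Py_LOCAL_INLINE macro takes: on a C99 compiler the real keyword is used, while on a C89-only compiler the macro degrades to a plain static function, costing only performance. The MY_INLINE and add_ref names here are invented for illustration.]

```c
#include <assert.h>

/* On a C99 compiler, use the real keyword; on a C89-only compiler,
 * expand to a plain static function -- a possible performance hit,
 * but still valid C89. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define MY_INLINE static inline
#else
#  define MY_INLINE static
#endif

MY_INLINE long add_ref(long refcnt)
{
    return refcnt + 1;
}
```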
From guido at python.org Mon Jun 6 10:11:39 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Jun 2016 07:11:39 -0700 Subject: [Python-Dev] C99 In-Reply-To: <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> References: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Jun 6, 2016 at 4:23 AM, Sturla Molden wrote: > Guido van Rossum wrote: > >> I'm not sure I meant that. But if I have a 3rd party extension that >> compiles with 3.5 headers using C89, then it should still compile with >> 3.6 headers using C99. Also if I compile it for 3.5 and it only uses >> the ABI it should still be linkable with 3.6. > > Ok, but if third-party developers shall be free to use a C89 compiler for > their own code, we cannot have C99 in the include files. Otherwise the > include files will taint the C89 purity of their source code. Well, they should use the right compiler for the Python version they are targeting. I'm just saying that they can't afford C99 features in their own code. Not even to call C/Python APIs. I think it would be okay if e.g. Py_INCREF was an inline function in Python 3.6, as long as the way you use it remains the same. 
-- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jun 6 10:12:10 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Jun 2016 07:12:10 -0700 Subject: [Python-Dev] C99 In-Reply-To: <1465219524.1062244.629232969.0CBE8C17@webmail.messagingengine.com> References: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> <1465219524.1062244.629232969.0CBE8C17@webmail.messagingengine.com> Message-ID: On Mon, Jun 6, 2016 at 6:25 AM, Random832 wrote: > On Mon, Jun 6, 2016, at 07:23, Sturla Molden wrote: >> Ok, but if third-party developers shall be free to use a C89 compiler for >> their own code, we cannot have C99 in the include files. Otherwise the >> include files will taint the C89 purity of their source code. >> >> Personally I don't think we need to worry about compilers that don't >> implement C99 features like inline functions in C. How long have the >> Linux >> kernel used inline functions instead of macros? 20 years or more? > > Using inline functions instead of macros doesn't have to mean anything > but a performance hit on platforms that don't support them, since the > inline keyword, or some other identifier, could be defined to expand to > an empty token sequence on platforms that do not support it. It's much > lower impact on the source code than some other C99 features. That could be a major performance impact. -- --Guido van Rossum (python.org/~guido) From eric at trueblade.com Mon Jun 6 10:31:12 2016 From: eric at trueblade.com (Eric V. 
Smith) Date: Mon, 6 Jun 2016 10:31:12 -0400 Subject: [Python-Dev] C99 In-Reply-To: References: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> Message-ID: <57558930.2040700@trueblade.com> On 06/06/2016 10:11 AM, Guido van Rossum wrote: > On Mon, Jun 6, 2016 at 4:23 AM, Sturla Molden wrote: >> Guido van Rossum wrote: >> >>> I'm not sure I meant that. But if I have a 3rd party extension that >>> compiles with 3.5 headers using C89, then it should still compile with >>> 3.6 headers using C99. Also if I compile it for 3.5 and it only uses >>> the ABI it should still be linkable with 3.6. >> >> Ok, but if third-party developers shall be free to use a C89 compiler for >> their own code, we cannot have C99 in the include files. Otherwise the >> include files will taint the C89 purity of their source code. > > Well, they should use the right compiler for the Python version they > are targeting. I'm just saying that they can't afford C99 features in > their own code. Not even to call C/Python APIs. I think it would be > okay if e.g. Py_INCREF was an inline function in Python 3.6, as long > as the way you use it remains the same. Right. So we could use C99 features in 3.6 .h files, as long as the same extension module, unmodified, could be compiled with 3.5 .h files with a 3.5 approved (C89) compiler, and also with a 3.6 approved (C99) compiler. The headers would be different, but so would the compilers. It's the extension module source code that must be the same in the two scenarios. We're not saying that an extension module must compile with a C89 compiler under 3.6. Eric. 
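[Guido's Py_INCREF example above can be made concrete with a toy sketch -- these are not CPython's real definitions. Whether the name is bound to a C89 macro or a C99 static inline function, the call site is spelled exactly the same, which is what keeps 3.5-era extension source compiling unmodified under 3.6 headers.]

```c
#include <assert.h>

typedef struct { long ob_refcnt; } toyobject;

#ifdef TOY_USE_C89_MACRO
/* C89-era spelling: a macro */
#define Toy_INCREF(op) ((op)->ob_refcnt++)
#else
/* C99-era spelling: a static inline function with real type checking;
 * the call Toy_INCREF(&obj) is unchanged */
static inline void Toy_INCREF(toyobject *op) { op->ob_refcnt++; }
#endif

static long demo_incref(void)
{
    toyobject obj = { 1 };
    Toy_INCREF(&obj);       /* identical source under either definition */
    return obj.ob_refcnt;
}
```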
From guido at python.org Mon Jun 6 10:39:55 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Jun 2016 07:39:55 -0700 Subject: [Python-Dev] C99 In-Reply-To: <57558930.2040700@trueblade.com> References: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> <57558930.2040700@trueblade.com> Message-ID: Right. On Mon, Jun 6, 2016 at 7:31 AM, Eric V. Smith wrote: > On 06/06/2016 10:11 AM, Guido van Rossum wrote: >> On Mon, Jun 6, 2016 at 4:23 AM, Sturla Molden wrote: >>> Guido van Rossum wrote: >>> >>>> I'm not sure I meant that. But if I have a 3rd party extension that >>>> compiles with 3.5 headers using C89, then it should still compile with >>>> 3.6 headers using C99. Also if I compile it for 3.5 and it only uses >>>> the ABI it should still be linkable with 3.6. >>> >>> Ok, but if third-party developers shall be free to use a C89 compiler for >>> their own code, we cannot have C99 in the include files. Otherwise the >>> include files will taint the C89 purity of their source code. >> >> Well, they should use the right compiler for the Python version they >> are targeting. I'm just saying that they can't afford C99 features in >> their own code. Not even to call C/Python APIs. I think it would be >> okay if e.g. Py_INCREF was an inline function in Python 3.6, as long >> as the way you use it remains the same. > > Right. So we could use C99 features in 3.6 .h files, as long as the same > extension module, unmodified, could be compiled with 3.5 .h files with a > 3.5 approved (C89) compiler, and also with a 3.6 approved (C99) compiler. > > The headers would be different, but so would the compilers. 
It's the > extension module source code that must be the same in the two scenarios. > > We're not saying that an extension module must compile with a C89 > compiler under 3.6. > > Eric. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Mon Jun 6 15:23:43 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 6 Jun 2016 15:23:43 -0400 Subject: [Python-Dev] PEP 492: __aiter__ should return async iterator directly instead of awaitable Message-ID: <5755CDBF.40905@gmail.com> There is a small flaw in PEP 492 design -- __aiter__ should not return an awaitable object that resolves to an asynchronous iterator. It should return an asynchronous iterator directly. Let me explain this by showing some examples. I've discovered this while working on a new asynchronous generators PEP. Let's pretend that we have them already: if we have a 'yield' expression in an 'async def' function, the function becomes an "asynchronous generator function": async def foo(): await bar() yield 1 await baz() yield 2 # foo -- is an `asynchronous generator function` # foo() -- is an `asynchronous generator` If we iterate through "foo()", it will await on "bar()", yield "1", await on "baz()", and yield "2": >>> async for el in foo(): ... print(el) 1 2 If we decide to have a class with an __aiter__ that is an async generator, we'd write something like this: class Foo: async def __aiter__(self): await bar() yield 1 await baz() yield 2 However, with the current PEP 492 design, the above code would be invalid! The interpreter expects __aiter__ to return a coroutine, not an async generator. I'm still working on the PEP for async generators, targeting CPython 3.6. And once it is ready, it might still be rejected or deferred. 
But in any case, this PEP 492 flaw has to be fixed now, in 3.5.2 (since PEP 492 is provisional). I've created an issue on the bug tracker: http://bugs.python.org/issue27243 The proposed patch fixes the __aiter__ in a backwards compatible way: 1. ceval/GET_AITER opcode calls the __aiter__ method. 2. If the returned object has an '__anext__' method, GET_AITER silently wraps it in an awaitable, which is equivalent to the following coroutine: async def wrapper(aiter_result): return aiter_result 3. If the returned object does not have an '__anext__' method, a DeprecationWarning is raised. From lukasz at langa.pl Mon Jun 6 16:02:11 2016 From: lukasz at langa.pl (=?utf-8?Q?=C5=81ukasz_Langa?=) Date: Mon, 6 Jun 2016 13:02:11 -0700 Subject: [Python-Dev] PEP 492: __aiter__ should return async iterator directly instead of awaitable In-Reply-To: <5755CDBF.40905@gmail.com> References: <5755CDBF.40905@gmail.com> Message-ID: <3E4C21A8-4AB0-4ABC-A090-2BE8D5CFBD95@langa.pl> On Jun 6, 2016, at 12:23 PM, Yury Selivanov wrote: > > However, with the current PEP 492 design, the above code would be invalid! The interpreter expects __aiter__ to return a coroutine, not an async generator. > Yes, I remember asking about the reason behind __aiter__ being an awaitable during the original PEP 492 review process. You added an explanation to the PEP but I don't think we ever had an example where this was needed. I'm +1 to resolve this now. > The proposed patch fixes the __aiter__ in a backwards compatible way: > > 1. ceval/GET_AITER opcode calls the __aiter__ method. > > 2. If the returned object has an '__anext__' method, GET_AITER silently wraps it in an awaitable, which is equivalent to the following coroutine: > > async def wrapper(aiter_result): > return aiter_result > > 3. If the returned object does not have an '__anext__' method, a DeprecationWarning is raised. There's a problem with this approach.
It will force people to write deprecated code because you never know if your library is going to run on 3.5.0 or 3.5.1. Barry, Ubuntu wily, xenial and yakkety currently package 3.5.0 or 3.5.1. When 3.5.2 is going to get released, are they going to get it? I'm pretty sure wily isn't and yakkety is but just wanted to confirm; especially with xenial being an LTS release. -- Not-that-i-see-a-different-way-out'sly yours, Ł -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From yselivanov.ml at gmail.com Mon Jun 6 16:05:53 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 6 Jun 2016 16:05:53 -0400 Subject: [Python-Dev] PEP 492: __aiter__ should return async iterator directly instead of awaitable In-Reply-To: <3E4C21A8-4AB0-4ABC-A090-2BE8D5CFBD95@langa.pl> References: <5755CDBF.40905@gmail.com> <3E4C21A8-4AB0-4ABC-A090-2BE8D5CFBD95@langa.pl> Message-ID: <5755D7A1.40402@gmail.com> On 2016-06-06 4:02 PM, Łukasz Langa wrote: >> The proposed patch fixes the __aiter__ in a backwards compatible way: >> >> 1. ceval/GET_AITER opcode calls the __aiter__ method. >> >> 2. If the returned object has an '__anext__' method, GET_AITER >> silently wraps it in an awaitable, which is equivalent to the >> following coroutine: >> >> async def wrapper(aiter_result): >> return aiter_result >> >> 3. If the returned object does not have an '__anext__' method, a >> DeprecationWarning is raised. > > There's a problem with this approach. It will force people to write > deprecated code because you never know if your library is going to run > on 3.5.0 or 3.5.1. Barry, Ubuntu wily, xenial and yakkety currently > package 3.5.0 or 3.5.1. When 3.5.2 is going to get released, are they > going to get it?
I'm pretty sure wily *isn't* and yakkety *is* but > just wanted to confirm; especially with xenial being an LTS release. > Yes, I agree. OTOH, I don't see any other way of resolving this. Another option would be to start raising the DeprecationWarning only in 3.6. Yury From guido at python.org Mon Jun 6 16:21:25 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Jun 2016 13:21:25 -0700 Subject: [Python-Dev] PEP 492: __aiter__ should return async iterator directly instead of awaitable In-Reply-To: <5755D7A1.40402@gmail.com> References: <5755CDBF.40905@gmail.com> <3E4C21A8-4AB0-4ABC-A090-2BE8D5CFBD95@langa.pl> <5755D7A1.40402@gmail.com> Message-ID: The RC for 3.5.2 is going out this coming weekend (see PEP 478 ). We should get this out now, or make it the first incompatibility in 3.6 (that's also an option; 3.6 feature freeze starts September, see PEP 494 ). On Mon, Jun 6, 2016 at 1:05 PM, Yury Selivanov wrote: > > > On 2016-06-06 4:02 PM, Łukasz Langa wrote: > >> The proposed patch fixes the __aiter__ in a backwards compatible way: >>> >>> 1. ceval/GET_AITER opcode calls the __aiter__ method. >>> >>> 2. If the returned object has an '__anext__' method, GET_AITER silently >>> wraps it in an awaitable, which is equivalent to the following coroutine: >>> >>> async def wrapper(aiter_result): >>> return aiter_result >>> >>> 3. If the returned object does not have an '__anext__' method, a >>> DeprecationWarning is raised. >>> >> >> There's a problem with this approach. It will force people to write >> deprecated code because you never know if your library is going to run on >> 3.5.0 or 3.5.1. Barry, Ubuntu wily, xenial and yakkety currently package >> 3.5.0 or 3.5.1. When 3.5.2 is going to get released, are they going to get >> it? I'm pretty sure wily *isn't* and yakkety *is* but just wanted to >> confirm; especially with xenial being an LTS release. >> >> > Yes, I agree. OTOH, I don't see any other way of resolving this.
> > Another option would be to start raising the DeprecationWarning only in > 3.6. > > Yury > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Mon Jun 6 16:58:54 2016 From: barry at python.org (Barry Warsaw) Date: Mon, 6 Jun 2016 16:58:54 -0400 Subject: [Python-Dev] PEP 492: __aiter__ should return async iterator directly instead of awaitable In-Reply-To: <3E4C21A8-4AB0-4ABC-A090-2BE8D5CFBD95@langa.pl> References: <5755CDBF.40905@gmail.com> <3E4C21A8-4AB0-4ABC-A090-2BE8D5CFBD95@langa.pl> Message-ID: <20160606165854.7af952c3@subdivisions.wooz.org> On Jun 06, 2016, at 01:02 PM, Łukasz Langa wrote: >There's a problem with this approach. It will force people to write >deprecated code because you never know if your library is going to run on >3.5.0 or 3.5.1. Barry, Ubuntu wily, xenial and yakkety currently package >3.5.0 or 3.5.1. When 3.5.2 is going to get released, are they going to get >it? I'm pretty sure wily isn't and yakkety is but just wanted to confirm; >especially with xenial being an LTS release. Matthias and I talked briefly about this at Pycon. We want to get 3.5.2 into Ubuntu 16.04.1 if it's released in time. 16.04.1 is currently scheduled for July 21st [1] so if Larry keeps with his announced schedule that should work out[2]. Obviously it would make it into Yakkety too. It's not worth it for Wily (15.10) since that EOLs next month. Cheers, -Barry [1] https://wiki.ubuntu.com/XenialXerus/ReleaseSchedule [2] https://mail.python.org/pipermail/python-dev/2016-April/144383.html -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From tjreedy at udel.edu Mon Jun 6 22:33:31 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 6 Jun 2016 22:33:31 -0400 Subject: [Python-Dev] C99 In-Reply-To: <57558930.2040700@trueblade.com> References: <1b9e1e41-b401-c27d-894e-913ecc485d8d@python.org> <0764173d-e69b-4f75-8797-30828ca6b471@python.org> <1031128263486872749.403655sturla.molden-gmail.com@news.gmane.org> <0a9001d1bf9c$1710b7b0$45322710$@hotmail.com> <287791199486873532.937937sturla.molden-gmail.com@news.gmane.org> <70851808486904675.240857sturla.molden-gmail.com@news.gmane.org> <57558930.2040700@trueblade.com> Message-ID: On 6/6/2016 10:31 AM, Eric V. Smith wrote: > Right. So we could use C99 features in 3.6 .h files, as long as the same > extension module, unmodified, could be compiled with 3.5 .h files with a > 3.5 approved (C89) compiler, and also with a 3.6 approved (C99) compiler. > The headers would be different, but so would the compilers. On Windows, the compiler would be the 2015 MS compiler in both cases. Steve Dower would know if compiler flags need to be changed to enable or stop disabling C99 features. > It's the > extension module source code that must be the same in the two scenarios. We could run the experiment ourselves by changing one or more .h files to include one or more of the C99 features we want while leaving our .c files alone in the sense of remaining C89 compatible. Compile and run the test suite. If successful, add more. We would soon find out whether any of the features we want in header files require use of C99 features in .c files that include them. With a .h standard established, we could then revise *our* .c files without imposing the same on extensions. 
-- Terry Jan Reedy From victor.stinner at gmail.com Tue Jun 7 06:24:26 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 7 Jun 2016 12:24:26 +0200 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: Hi, 2016-06-04 19:47 GMT+02:00 Guido van Rossum : > Funny. Just two weeks ago I was helping someone who discovered a > compiler that doesn't support the new relaxed variable declaration > rules. I think it was on Windows. Maybe this move is a little too > aggressively deprecating older Windows compilers? I understood that Python only has a tiny list of officially supported compilers. For example, MinGW is somehow explicitly not supported and I see this as a deliberate choice. I'm quite sure that all supported compilers support C99. Is it worth supporting a compiler that in 2016 doesn't support the C standard released in 1999, 17 years ago? Victor From guido at python.org Tue Jun 7 11:18:39 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Jun 2016 08:18:39 -0700 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: I'll ask my colleague what his compiler setup was. On Tue, Jun 7, 2016 at 3:24 AM, Victor Stinner wrote: > Hi, > > 2016-06-04 19:47 GMT+02:00 Guido van Rossum : > > Funny. Just two weeks ago I was helping someone who discovered a > > compiler that doesn't support the new relaxed variable declaration > > rules. I think it was on Windows. Maybe this move is a little too > > aggressively deprecating older Windows compilers? > > I understood that Python only has a tiny list of officially supported > compilers. For example, MinGW is somehow explicitly not supported and > I see this as a deliberate choice. > > I'm quite sure that all supported compilers support C99. > > Is it worth supporting a compiler that in 2016 doesn't support the C > standard released in 1999, 17 years ago?
> > Victor > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jun 7 11:21:45 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Jun 2016 08:21:45 -0700 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: So here's the diffs that seem to indicate we were working with a compiler that wasn't full C99 (or maybe previously we were working with a compiler that had extensions?) https://github.com/dropbox/typed_ast/commit/f7497e25abc3bcceced3ca6c3be3786d8805df41 On Tue, Jun 7, 2016 at 8:18 AM, Guido van Rossum wrote: > I'll ask my colleague what his compiler setup was. > > On Tue, Jun 7, 2016 at 3:24 AM, Victor Stinner > wrote: > >> Hi, >> >> 2016-06-04 19:47 GMT+02:00 Guido van Rossum : >> > Funny. Just two weeks ago I was helping someone who discovered a >> > compiler that doesn't support the new relaxed variable declaration >> > rules. I think it was on Windows. Maybe this move is a little too >> > aggressively deprecating older Windows compilers? >> >> I understood that Python only has a tiny list of officially supported >> compilers. For example, MinGW is somehow explicitly not supported and >> I see this as a deliberate choice. >> >> I'm quite sure that all supported compilers support C99. >> >> Is it worth to support a compiler that in 2016 doesn't support the C >> standard released in 1999, 17 years ago? >> >> Victor >> > > > > -- > --Guido van Rossum (python.org/~guido) > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Tue Jun 7 13:37:58 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jun 2016 10:37:58 -0700 Subject: [Python-Dev] Proper way to specify that a method is not defined for a type Message-ID: <57570676.4070700@stoneleaf.us> For binary methods, such as __add__, either do not implement or return NotImplemented if the other operand/class is not supported. For non-binary methods, simply do not define. Except for subclasses when the super-class defines __hash__ and the subclass is not hashable -- then set __hash__ to None. Question: Are there any other methods that should be set to None to tell the run-time that the method is not supported? Or is this a general mechanism for subclasses to declare any method is unsupported? -- ~Ethan~ From ericsnowcurrently at gmail.com Tue Jun 7 13:51:52 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 10:51:52 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace Message-ID: Hi all, Following discussion a few years back (and rough approval from Guido [1]), I started work on using OrderedDict for the class definition namespace by default. The bulk of the effort lay in implementing OrderedDict in C, which I got landed just in time for 3.5. The remaining work was quite minimal and the actual change is quite small. My intention was to land the patch soon, having gone through code review during PyCon. However, Nick pointed out to me the benefit of having a concrete point of reference for the change, as well as making sure it isn't a problem for other implementations. So in that spirit, here's a PEP for the change. Feedback is welcome, particularly from other implementors.
-eric [1] https://mail.python.org/pipermail/python-ideas/2013-February/019704.html ================================================== PEP: XXX Title: Ordered Class Definition Namespace Version: $Revision$ Last-Modified: $Date$ Author: Eric Snow Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 4-Jun-2016 Python-Version: 3.6 Post-History: 7-Jun-2016 Abstract ======== This PEP changes the default class definition namespace to ``OrderedDict``. Furthermore, the order in which the attributes are defined in each class body will now be preserved in ``type.__definition_order__``. This allows introspection of the original definition order, e.g. by class decorators. Note: just to be clear, this PEP is *not* about changing ``type.__dict__`` to ``OrderedDict``. Motivation ========== Currently the namespace used during execution of a class body defaults to dict. If the metaclass defines ``__prepare__()`` then the result of calling it is used. Thus, before this PEP, if you needed your class definition namespace to be ``OrderedDict`` you had to use a metaclass. Metaclasses introduce an extra level of complexity to code and in some cases (e.g. conflicts) are a problem. So reducing the need for them is worth doing when the opportunity presents itself. Given that we now have a C implementation of ``OrderedDict`` and that ``OrderedDict`` is the common use case for ``__prepare__()``, we have such an opportunity by defaulting to ``OrderedDict``. The usefulness of ``OrderedDict``-by-default is greatly increased if the definition order is directly introspectable on classes afterward, particularly by code that is independent of the original class definition. One of the original motivating use cases for this PEP is generic class decorators that make use of the definition order. Changing the default class definition namespace has been discussed a number of times, including on the mailing lists and in PEP 422 and PEP 487 (see the References section below). 
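For illustration, the metaclass boilerplate that this PEP makes unnecessary looks roughly like the following (a sketch only; the ``_field_order`` attribute and the example class are invented here):

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # Make the class body execute in an ordered namespace.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Record the non-dunder attribute definition order.
        cls._field_order = tuple(
            k for k in namespace
            if not (k.startswith('__') and k.endswith('__')))
        return cls

class Config(metaclass=OrderedMeta):
    host = 'localhost'
    port = 8080
    debug = False

print(Config._field_order)  # ('host', 'port', 'debug')
```

Under this PEP the metaclass goes away and the same information would simply be available as ``Config.__definition_order__``.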
Specification ============= * the default class *definition* namespace is now ``OrderedDict`` * the order in which class attributes are defined is preserved in the new ``__definition_order__`` attribute on each class * "dunder" attributes (e.g. ``__init__``, ``__module__``) are ignored * ``__definition_order__`` is a tuple * ``__definition_order__`` is a read-only attribute * ``__definition_order__`` is always set: * if ``__definition_order__`` is defined in the class body then it is used * types that do not have a class definition (e.g. builtins) have their ``__definition_order__`` set to ``None`` * types for which ``__prepare__()`` returned something other than ``OrderedDict`` (or a subclass) have their ``__definition_order__`` set to ``None`` The following code demonstrates roughly equivalent semantics:: class Meta(type): def __prepare__(cls, *args, **kwargs): return OrderedDict() class Spam(metaclass=Meta): ham = None eggs = 5 __definition_order__ = tuple(k for k in locals() if (!k.startswith('__') or !k.endswith('__'))) Note that [pep487_] proposes a similar solution, albeit as part of a broader proposal. Compatibility ============= This PEP does not break backward compatibility, except in the case that someone relies *strictly* on dicts as the class definition namespace. This shouldn't be a problem. Changes ============= In addition to the class syntax, the following expose the new behavior: * builtins.__build_class__ * types.prepare_class * types.new_class Other Python Implementations ============================ Pending feedback, the impact on Python implementations is expected to be minimal. If a Python implementation cannot support switching to ``OrderedDict``-by-default then it can always set ``__definition_order__`` to ``None``. Implementation ============== The implementation is found in the tracker.
[impl_] Alternatives ============ type.__dict__ as OrderedDict ---------------------------- Instead of storing the definition order in ``__definition_order__``, the now-ordered definition namespace could be copied into a new ``OrderedDict``. This would mostly provide the same semantics. However, using ``OrderedDict`` for ``type.__dict__`` would obscure the relationship with the definition namespace, making it less useful. Additionally, doing this would require significant changes to the semantics of the concrete dict C-API. A "namespace" Keyword Arg for Class Definition ---------------------------------------------- PEP 422 introduced a new "namespace" keyword arg to class definitions that effectively replaces the need for ``__prepare__()``. [pep422_] However, the proposal was withdrawn in favor of the simpler PEP 487. References ========== .. [impl] issue #24254 (https://bugs.python.org/issue24254) .. [pep422] PEP 422 (https://www.python.org/dev/peps/pep-0422/#order-preserving-classes) .. [pep487] PEP 487 (https://www.python.org/dev/peps/pep-0487/#defining-arbitrary-namespaces) .. [orig] original discussion (https://mail.python.org/pipermail/python-ideas/2013-February/019690.html) .. [followup1] follow-up 1 (https://mail.python.org/pipermail/python-dev/2013-June/127103.html) .. [followup2] follow-up 2 (https://mail.python.org/pipermail/python-dev/2015-May/140137.html) Copyright =========== This document has been placed in the public domain.
From vgr255 at live.ca Tue Jun 7 13:55:01 2016 From: vgr255 at live.ca (=?iso-8859-1?Q?=C9manuel_Barry?=) Date: Tue, 7 Jun 2016 13:55:01 -0400 Subject: [Python-Dev] Proper way to specify that a method is not defined for a type In-Reply-To: <57570676.4070700@stoneleaf.us> References: <57570676.4070700@stoneleaf.us> Message-ID: > From: Ethan Furman > Sent: Tuesday, June 07, 2016 1:38 PM > To: Python Dev > Subject: [Python-Dev] Proper way to specify that a method is not defined for > a type (Just so everyone follows, this is a followup of http://bugs.python.org/issue27242 ) > For binary methods, such as __add__, either do not implement or return > NotImplemented if the other operand/class is not supported. > > For non-binary methods, simply do not define. > > Except for subclasses when the super-class defines __hash__ and the > subclass is not hashable -- then set __hash__ to None. Should I mention the __hash__ special case in the NotImplemented/NotImplementedError docs? If people are looking for a way to declare this specific operation undefined, they'd find it there as well as the hash() documentation. > Question: > > Are there any other methods that should be set to None to tell the > run-time that the method is not supported? Or is this a general > mechanism for subclasses to declare any method is unsupported? There was a discussion on one of Python-ideas or Python-dev some time ago about exactly that, but I don't think any consensus was reached. However, I think it would make sense for e.g. __iter__ and __reversed__ to tell the interpreter to *not* fall back to the default sequence protocol (well, in practice that already works, but it gives an unhelpful error message). I'm not sure how useful it would be for arbitrary methods, though. __bytes__ (which originally sparked the issue) may or may not be a good candidate, I'm not sure. 
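For context, the fallback in question is the legacy ``__getitem__``-based sequence protocol (a quick sketch):

```python
class Legacy:
    # No __iter__ defined: iter() falls back to calling
    # __getitem__ with 0, 1, 2, ... until IndexError.
    def __getitem__(self, index):
        if index < 3:
            return index * 10
        raise IndexError(index)

print(list(Legacy()))  # [0, 10, 20]
```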
While I like the `__magic_method__ = None` approach, I think the main reason __hash__ supports that is because there are legitimate use cases of disallowing hashing (i.e. mutable objects which may or may not change hash during their lifetime), but I don't think the same rationale applies to everything ("accidentally" iterating over a not-meant-to-be-iterable object will result in nonsensical data, but it won't bite the user later, unlike changing hashes which definitely will). > -- > ~Ethan~ Special-cases-aren't-special-enough-but-they're-still-there'ly yrs, -Emanuel From guido at python.org Tue Jun 7 13:56:37 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Jun 2016 10:56:37 -0700 Subject: [Python-Dev] Proper way to specify that a method is not defined for a type In-Reply-To: <57570676.4070700@stoneleaf.us> References: <57570676.4070700@stoneleaf.us> Message-ID: Setting it to None in the subclass is the intended pattern. But CPython must explicitly handle that somewhere so I don't know how general it is supported. Try defining a list subclass with __len__ set to None and see what happens. Then try the same with MutableSequence. On Tue, Jun 7, 2016 at 10:37 AM, Ethan Furman wrote: > For binary methods, such as __add__, either do not implement or return > NotImplemented if the other operand/class is not supported. > > For non-binary methods, simply do not define. > > Except for subclasses when the super-class defines __hash__ and the > subclass is not hashable -- then set __hash__ to None. > > Question: > > Are there any other methods that should be set to None to tell the > run-time that the method is not supported? Or is this a general mechanism > for subclasses to declare any method is unsupported? 
> > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Jun 7 14:01:43 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jun 2016 11:01:43 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: <57570C07.9000703@stoneleaf.us> On 06/07/2016 10:51 AM, Eric Snow wrote: > My intention was to land the patch soon, having gone through code > review during PyCon. However, Nick pointed out to me the benefit of > having a concrete point of reference for the change, as well as making > sure it isn't a problem for other implementations. So in that spirit, > here's a PEP for the change. Feedback is welcome, particularly from > from other implementors. +1 > Specification > ============= > * types for which `__prepare__()`` returned something other than > ``OrderedDict`` (or a subclass) have their ``__definition_order__`` > set to ``None`` I assume this check happens in type.__new__? If a non-OrderedDict is used as the namespace, but a __definition_order__ key and value are supplied, is it used or still set to None? 
-- ~Ethan~ From ericsnowcurrently at gmail.com Tue Jun 7 14:13:45 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 11:13:45 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: <57570C07.9000703@stoneleaf.us> References: <57570C07.9000703@stoneleaf.us> Message-ID: On Tue, Jun 7, 2016 at 11:01 AM, Ethan Furman wrote: > On 06/07/2016 10:51 AM, Eric Snow wrote: >> Specification >> ============= > > >> * types for which `__prepare__()`` returned something other than >> ``OrderedDict`` (or a subclass) have their ``__definition_order__`` >> set to ``None`` > > > I assume this check happens in type.__new__? If a non-OrderedDict is used > as the namespace, but a __definition_order__ key and value are supplied, is > it used or still set to None? A __definition_order__ in the class body always takes precedence. So a supplied value will be honored (and not replaced with None). -eric From tjreedy at udel.edu Tue Jun 7 14:32:14 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 7 Jun 2016 14:32:14 -0400 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On 6/7/2016 1:51 PM, Eric Snow wrote: > Note: just to be clear, this PEP is *not* about changing > ``type.__dict__`` to ``OrderedDict``. By 'type', do you mean the one and only object named 'type' or the class being defined? To be really clear, will the following change? >>> class C: pass >>> type(C.__dict__) If the proposal only affects (slows) the class definition process, and then only minimally, and has no effect on class use, then +1 on being able to avoid metaclass and prepare for its most common current usage.
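For reference, a quick sketch of the current behavior being asked about (which, per the PEP's note, is not changing):

```python
class C:
    pass

# The class __dict__ is exposed as a read-only mappingproxy view,
# and the mapping underneath stays a plain dict under the PEP.
print(type(C.__dict__).__name__)     # mappingproxy
print(isinstance(C.__dict__, dict))  # False
```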
-- Terry Jan Reedy From vgr255 at live.ca Tue Jun 7 14:36:20 2016 From: vgr255 at live.ca (=?iso-8859-1?Q?=C9manuel_Barry?=) Date: Tue, 7 Jun 2016 14:36:20 -0400 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: > From: Eric Snow > Sent: Tuesday, June 07, 2016 1:52 PM > To: Python-Dev > Subject: [Python-Dev] PEP: Ordered Class Definition Namespace > > > Currently the namespace used during execution of a class body defaults > to dict. If the metaclass defines ``__prepare__()`` then the result of > calling it is used. Thus, before this PEP, if you needed your class > definition namespace to be ``OrderedDict`` you had to use a metaclass. Formatting nit: ``dict`` > Specification > ============= > > * the default class *definition* namespace is now ``OrderdDict`` > * the order in which class attributes are defined is preserved in the > new ``__definition_order__`` attribute on each class > * "dunder" attributes (e.g. ``__init__``, ``__module__``) are ignored What does this imply? If I define some __dunder__ methods, will they simply not be present in __definition_order__? What if I want to keep the order of those? While keeping the order of these might be meaningless in most cases, I don't think there's really a huge problem in doing so. Maybe I'm overthinking it. > * ``__definition_order__`` is a tuple > * ``__definition_order__`` is a read-only attribute > * ``__definition_order__`` is always set: > > * if ``__definition_order__`` is defined in the class body then it > is used > * types that do not have a class definition (e.g. 
builtins) have > their ``__definition_order__`` set to ``None`` > * types for which `__prepare__()`` returned something other than > ``OrderedDict`` (or a subclass) have their ``__definition_order__`` > set to ``None`` I would probably like a ``type.definition_order`` method, for which the return value is bound to __definition_order__ when the class is created (much like the link between ``type.mro`` and ``cls.__mro__``). Additionally I'm not sure if setting the attribute to None is a good idea; I'd have it as an empty tuple. Then again I tend to overthink a lot. > The following code demonstrates roughly equivalent semantics:: > > class Meta(type): > def __prepare__(cls, *args, **kwargs): > return OrderedDict() > > class Spam(metaclass=Meta): > ham = None > eggs = 5 > __definition_order__ = tuple(k for k in locals() > if (!k.startswith('__') or > !k.endswith('__'))) Mixing up C and Python syntax here. > However, using ``OrderedDict`` for ``type,__dict__`` would obscure the > relationship with the definition namespace, making it less useful. > Additionally, doing this would require significant changes to the > semantics of the concrete dict C-API. Formatting nit: ``dict`` I'm +1 on the whole idea (one of my common uses of metaclasses was to keep the definition order *somewhere*). Thank you for doing that! -Emanuel From ericsnowcurrently at gmail.com Tue Jun 7 14:39:06 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 11:39:06 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Tue, Jun 7, 2016 at 11:32 AM, Terry Reedy wrote: > On 6/7/2016 1:51 PM, Eric Snow wrote: > >> Note: just to be clear, this PEP is *not* about changing > >> ``type.__dict__`` to ``OrderedDict``. > > By 'type', do you mean the one and only object named 'type' or the class > being defined? To be really clear, will the following change?
> >>>> class C: pass > >>>> type(C.__dict__) > I mean the latter, "type" -> the class being defined. > > If the proposal only affects (slows) the class definition process, and then > only minimally, and has no effect on class use, then +1 on being able to > avoid metaclass and prepare for its most common current usage. That is all correct. -eric From ethan at stoneleaf.us Tue Jun 7 14:45:52 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jun 2016 11:45:52 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: <57570C07.9000703@stoneleaf.us> Message-ID: <57571660.2090709@stoneleaf.us> On 06/07/2016 11:13 AM, Eric Snow wrote: > On Tue, Jun 7, 2016 at 11:01 AM, Ethan Furman wrote: >> On 06/07/2016 10:51 AM, Eric Snow wrote: >>> Specification >>> ============= >> >> >>> * types for which `__prepare__()`` returned something other than >>> ``OrderedDict`` (or a subclass) have their ``__definition_order__`` >>> set to ``None`` >> >> >> I assume this check happens in type.__new__? If a non-OrderedDict is used >> as the namespace, but a __definition_order__ key and value are supplied, is >> it used or still set to None? > > A __definition_order__ in the class body always takes precedence. So > a supplied value will be honored (and not replaced with None). Will the supplied __definition_order__ be made a tuple, and still be read-only? -- ~Ethan~ From rdmurray at bitdance.com Tue Jun 7 14:45:03 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 07 Jun 2016 14:45:03 -0400 Subject: [Python-Dev] Proper way to specify that a method is not defined for a type In-Reply-To: References: <57570676.4070700@stoneleaf.us> Message-ID: <20160607184504.82143B14027@webabinitio.net> For those interested in this topic, if you are not already aware of it, see also http://bugs.python.org/issue25958, which among other things has a relevant proposed patch for datamodel.rst.
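For the record, Guido's suggested list-subclass experiment goes like this (a sketch; the class name is invented, and the exact error message differs between versions):

```python
class NoLen(list):
    __len__ = None

try:
    len(NoLen([1, 2, 3]))
    print('len() worked')
except TypeError:
    # The slot lookup finds None instead of a callable method, so
    # len() fails with a TypeError rather than using list.__len__.
    print('len() raised TypeError')
```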
On Tue, 07 Jun 2016 10:56:37 -0700, Guido van Rossum wrote: > Setting it to None in the subclass is the intended pattern. But CPython > must explicitly handle that somewhere so I don't know how general it is > supported. Try defining a list subclass with __len__ set to None and see > what happens. Then try the same with MutableSequence. > > On Tue, Jun 7, 2016 at 10:37 AM, Ethan Furman wrote: > > > For binary methods, such as __add__, either do not implement or return > > NotImplemented if the other operand/class is not supported. > > > > For non-binary methods, simply do not define. > > > > Except for subclasses when the super-class defines __hash__ and the > > subclass is not hashable -- then set __hash__ to None. > > > > Question: > > > > Are there any other methods that should be set to None to tell the > > run-time that the method is not supported? Or is this a general mechanism > > for subclasses to declare any method is unsupported? From ericsnowcurrently at gmail.com Tue Jun 7 14:51:34 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 11:51:34 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Tue, Jun 7, 2016 at 11:36 AM, Émanuel Barry wrote: >> From: Eric Snow >> * "dunder" attributes (e.g. ``__init__``, ``__module__``) are ignored > > What does this imply? If I define some __dunder__ methods, will they simply > not be present in __definition_order__? What if I want to keep the order of > those? While keeping the order of these might be meaningless in most cases, > I don't think there's really a huge problem in doing so. Maybe I'm > overthinking it. "dunder" names (not just methods) will not be present in __definition_order__. I'll add an explanation to the PEP. The gist of it is that they are reserved for use by the interpreter and will always clutter up __definition_order__.
Since needing dunder names included in __definition_order__ would be rather exceptional, and there are other options available, leaving them out by default is a matter of practicality. > >> * ``__definition_order__`` is a tuple >> * ``__definition_order__`` is a read-only attribute >> * ``__definition_order__`` is always set: >> >> * if ``__definition_order__`` is defined in the class body then it >> is used >> * types that do not have a class definition (e.g. builtins) have >> their ``__definition_order__`` set to ``None`` >> * types for which ``__prepare__()`` returned something other than >> ``OrderedDict`` (or a subclass) have their ``__definition_order__`` >> set to ``None`` > > I would probably like a ``type.definition_order`` method, for which the > return value is bound to __definition_order__ when the class is created > (much like the link between ``type.mro`` and ``cls.__mro__``). What is the value of type.definition_order()? If you need a mutable copy then pass __definition_order__ to list(). > Additionally > I'm not sure if setting the attribute to None is a good idea; I'd have it as > an empty tuple. Then again I tend to overthink a lot. None indicates that there is no order. An empty tuple indicates that there were no attributes. >> __definition_order__ = tuple(k for k in locals() >> if (!k.startswith('__') or >> !k.endswith('__'))) > > Mixing up C and Python syntax here. nice catch :) > I'm +1 on the whole idea (one of my common uses of metaclasses was to keep > the definition order *somewhere*). Thank you for doing that!
:) -eric From ericsnowcurrently at gmail.com Tue Jun 7 14:53:53 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 11:53:53 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: <57571660.2090709@stoneleaf.us> References: <57570C07.9000703@stoneleaf.us> <57571660.2090709@stoneleaf.us> Message-ID: On Tue, Jun 7, 2016 at 11:45 AM, Ethan Furman wrote: > On 06/07/2016 11:13 AM, Eric Snow wrote: >> A __definition_order__ in the class body always takes precedence. So >> a supplied value will be honored (and not replaced with None). > > Will the supplied __definition_order__ be made a tuple, and still be > read-only? I had planned on leaving a supplied one alone. So no change to tuple. It remains a read-only attribute though, since that is handled via a descriptor (a la type.__dict__). -eric From vgr255 at live.ca Tue Jun 7 14:57:16 2016 From: vgr255 at live.ca (=?utf-8?Q?=C3=89manuel_Barry?=) Date: Tue, 7 Jun 2016 14:57:16 -0400 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: > From: Eric Snow > Sent: Tuesday, June 07, 2016 2:52 PM > To: Émanuel Barry > Cc: Python-Dev > Subject: Re: [Python-Dev] PEP: Ordered Class Definition Namespace > > "dunder" names (not just methods) will not be present in > __definition_order__. I'll add an explanation to the PEP. The gist > of it is that they are reserved for use by the interpreter and will > always clutter up __definition_order__. Since needing dunder names > included in __definition_order__ would be rather exceptional, and > there are other options available, leaving them out by default is a > matter of practicality. Good point. I'll assume that if we need that we'll do something in the metaclass. > What is the value of type.definition_order()? If you need a mutable > copy then pass __definition_order__ to list(). I think I explained it backwards.
I meant to have a method on ``type`` (which metaclasses can override at will) which will set what is passed to the resulting __definition_order__ attribute. But it might not be needed, as we can probably sneak that inside the namespace in the metaclass' __new__. > > Additionally > > I'm not sure if setting the attribute to None is a good idea; I'd have it as > > an empty tuple. Then again I tend to overthink a lot. > > None indicates that there is no order. An empty tuple indicates that > there were no attributes. Fair enough. > > -eric -Emanuel From ncoghlan at gmail.com Tue Jun 7 15:30:31 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jun 2016 12:30:31 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On 7 June 2016 at 10:51, Eric Snow wrote: > Specification > ============= > > * the default class *definition* namespace is now ``OrderedDict`` > * the order in which class attributes are defined is preserved in the > new ``__definition_order__`` attribute on each class > * "dunder" attributes (e.g. ``__init__``, ``__module__``) are ignored > * ``__definition_order__`` is a tuple > * ``__definition_order__`` is a read-only attribute Thinking about the class decorator use case, I think this may need to be reconsidered, as class decorators may: 1. Remove class attributes 2. Add class attributes This will then lead to __definition_order__ getting out of sync with the current state of the class namespace. One option for dealing with that would be to make type.__setattr__ and type.__delattr__ aware of __definition_order__, and have them replace the tuple with a new one as needed. If we did that, then the main question would be whether updating an existing attribute changed the definition order, and I'd be inclined to say "No" (to minimise the side effects of monkey-patching).
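A rough sketch of what that first option might look like. Since ``__definition_order__`` is not implemented anywhere yet, this simulates it with an ordinary ``_definition_order`` attribute managed by an illustrative metaclass (all names here are hypothetical):

```python
from collections import OrderedDict

class DefOrderMeta(type):
    """Illustrative metaclass: records definition order, then keeps it
    in sync with later attribute additions and removals. Updating an
    existing attribute deliberately does *not* change the order."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Record non-dunder names in definition order.
        cls._definition_order = tuple(
            k for k in namespace
            if not (k.startswith('__') and k.endswith('__')))
        return cls

    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        # Append genuinely new names; leave updates alone.
        if name != '_definition_order' and name not in cls._definition_order:
            cls._definition_order += (name,)

    def __delattr__(cls, name):
        super().__delattr__(name)
        cls._definition_order = tuple(
            k for k in cls._definition_order if k != name)

class Spam(metaclass=DefOrderMeta):
    ham = None
    eggs = 5

print(Spam._definition_order)  # ('ham', 'eggs')
Spam.beans = 3   # decorator-style injection shows up at the end
del Spam.ham     # removal drops the name
print(Spam._definition_order)  # ('eggs', 'beans')
```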
The main alternative would be to make __definition_order__ writable, so the default behaviour would be for it to reflect the original class body, but decorators would be free to update it to reflect their changes, as well as to make other modifications (e.g. stripping out all callables from the list). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From sturla.molden at gmail.com Tue Jun 7 15:37:21 2016 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 7 Jun 2016 19:37:21 +0000 (UTC) Subject: [Python-Dev] C99 References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org> Victor Stinner wrote: > Is it worth to support a compiler that in 2016 doesn't support the C > standard released in 1999, 17 years ago? MSVC only supports C99 when it's needed for C++11 or some MS extension to C. Is it worth supporting MSVC? If not, Intel C, Clang and Cygwin GCC are the viable options we have on Windows (and perhaps Embarcadero, but I haven't used C++ Builder for a very long time). Even MinGW does not fully support C99, because it depends on Microsoft's CRT. If we think MSVC and MinGW are worth supporting, we cannot just use C99 indiscriminately. From tritium-list at sdamon.com Tue Jun 7 16:10:28 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Tue, 7 Jun 2016 16:10:28 -0400 Subject: [Python-Dev] C99 In-Reply-To: <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org> Message-ID: <0cb001d1c0f8$a3322550$e9966ff0$@hotmail.com> Doesn't Cygwin build against the POSIX abstraction layer? Wouldn't a Python built as such operate as though it was on a Unix of some sort?
It has been quite a while since I messed with Cygwin - if it hasn't changed, it's not really an option, especially when we have native windows builds now. It would be too much of a downgrade in experience and performance. > -----Original Message----- > From: Python-Dev [mailto:python-dev-bounces+tritium- > list=sdamon.com at python.org] On Behalf Of Sturla Molden > Sent: Tuesday, June 7, 2016 3:37 PM > To: python-dev at python.org > Subject: Re: [Python-Dev] C99 > > Victor Stinner wrote: > > > Is it worth to support a compiler that in 2016 doesn't support the C > > standard released in 1999, 17 years ago? > > MSVC only supports C99 when its needed for C++11 or some MS extension > to C. > > Is it worth supporting MSVC? If not, we have Intel C, Clang and Cygwin GCC > are the viable options we have on Windows (and perhaps Embarcadero, but I > haven't used C++ builder for a very long time). Even MinGW does not fully > support C99, because it depends on Microsoft's CRT. If we think MSVC and > MinGW are worth supporting, we cannot just use C99 indiscriminantly. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium- > list%40sdamon.com From ethan at stoneleaf.us Tue Jun 7 16:28:13 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jun 2016 13:28:13 -0700 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview Message-ID: <57572E5D.4020101@stoneleaf.us> Minor changes: updated version numbers, add punctuation. The current text seems to take into account Guido's last comments. Thoughts before asking for acceptance? 
PEP: 467 Title: Minor API improvements for binary sequences Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2014-03-30 Python-Version: 3.5 Post-History: 2014-03-30 2014-08-15 2014-08-16 Abstract ======== During the initial development of the Python 3 language specification, the core ``bytes`` type for arbitrary binary data started as the mutable type that is now referred to as ``bytearray``. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series. This PEP proposes four small adjustments to the APIs of the ``bytes``, ``bytearray`` and ``memoryview`` types to make it easier to operate entirely in the binary domain: * Deprecate passing single integer values to ``bytes`` and ``bytearray`` * Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors * Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors * Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and ``memoryview.iterbytes`` alternative iterators Proposals ========= Deprecation of current "zero-initialised sequence" behaviour ------------------------------------------------------------ Currently, the ``bytes`` and ``bytearray`` constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size:: >>> bytes(3) b'\x00\x00\x00' >>> bytearray(3) bytearray(b'\x00\x00\x00') This PEP proposes to deprecate that behaviour in Python 3.6, and remove it entirely in Python 3.7. No other changes are proposed to the existing constructors. 
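(Illustrative aside: ``bytes.zeros`` and ``bytearray.zeros`` are only proposed below and do not exist yet, so this sketch uses a plain stand-in function to show how the deprecated spelling lines up with the sequence-repetition idiom that already works today.)

```python
# Stand-in for the proposed zeros() constructors; today's equivalent of
# the integer-argument behaviour slated for deprecation.

def zeros(cls, n):
    """Hypothetical bytes.zeros / bytearray.zeros, built on repetition."""
    return cls(b'\x00') * n

# Current (to-be-deprecated) constructor behaviour:
assert bytes(3) == b'\x00\x00\x00'
assert bytearray(3) == bytearray(b'\x00\x00\x00')

# The stand-in produces identical results:
assert zeros(bytes, 3) == bytes(3)
assert zeros(bytearray, 3) == bytearray(3)
print(zeros(bytes, 3))  # b'\x00\x00\x00'
```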
Addition of explicit "zero-initialised sequence" constructors ------------------------------------------------------------- To replace the deprecated behaviour, this PEP proposes the addition of an explicit ``zeros`` alternative constructor as a class method on both ``bytes`` and ``bytearray``:: >>> bytes.zeros(3) b'\x00\x00\x00' >>> bytearray.zeros(3) bytearray(b'\x00\x00\x00') It will behave just as the current constructors behave when passed a single integer. The specific choice of ``zeros`` as the alternative constructor name is taken from the corresponding initialisation function in NumPy (although, as these are 1-dimensional sequence types rather than N-dimensional matrices, the constructors take a length as input rather than a shape tuple). Addition of explicit "single byte" constructors ----------------------------------------------- As binary counterparts to the text ``chr`` function, this PEP proposes the addition of an explicit ``byte`` alternative constructor as a class method on both ``bytes`` and ``bytearray``:: >>> bytes.byte(3) b'\x03' >>> bytearray.byte(3) bytearray(b'\x03') These methods will only accept integers in the range 0 to 255 (inclusive):: >>> bytes.byte(512) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: bytes must be in range(0, 256) >>> bytes.byte(1.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'float' object cannot be interpreted as an integer The documentation of the ``ord`` builtin will be updated to explicitly note that ``bytes.byte`` is the inverse operation for binary data, while ``chr`` is the inverse operation for text data. Behaviourally, ``bytes.byte(x)`` will be equivalent to the current ``bytes([x])`` (and similarly for ``bytearray``). The new spelling is expected to be easier to discover and easier to read (especially when used in conjunction with indexing operations on binary sequence types).
As a separate method, the new spelling will also work better with higher order functions like ``map``. Addition of optimised iterator methods that produce ``bytes`` objects --------------------------------------------------------------------- This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an optimised ``iterbytes`` method that produces length 1 ``bytes`` objects rather than integers:: for x in data.iterbytes(): # x is a length 1 ``bytes`` object, rather than an integer The method can be used with arbitrary buffer exporting objects by wrapping them in a ``memoryview`` instance first:: for x in memoryview(data).iterbytes(): # x is a length 1 ``bytes`` object, rather than an integer For ``memoryview``, the semantics of ``iterbytes()`` are defined such that:: memview.tobytes() == b''.join(memview.iterbytes()) This allows the raw bytes of the memory view to be iterated over without needing to make a copy, regardless of the defined shape and format. The main advantage this method offers over the ``map(bytes.byte, data)`` approach is that it is guaranteed *not* to fail midstream with a ``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based approach, the type and value of the individual items in the iterable are only checked as they are retrieved and passed through the ``bytes.byte`` constructor. Design discussion ================= Why not rely on sequence repetition to create zero-initialised sequences? ------------------------------------------------------------------------- Zero-initialised sequences can be created via sequence repetition:: >>> b'\x00' * 3 b'\x00\x00\x00' >>> bytearray(b'\x00') * 3 bytearray(b'\x00\x00\x00') However, this was also the case when the ``bytearray`` type was originally designed, and the decision was made to add explicit support for it in the type constructor. The immutable ``bytes`` type then inherited that feature when it was introduced in PEP 3137. 
This PEP isn't revisiting that original design decision, just changing the spelling as users sometimes find the current behaviour of the binary sequence constructors surprising. In particular, there's a reasonable case to be made that ``bytes(x)`` (where ``x`` is an integer) should behave like the ``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate class methods avoids that ambiguity. References ========== .. [1] Initial March 2014 discussion thread on python-ideas (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html) .. [2] Guido's initial feedback in that thread (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html) .. [3] Issue proposing moving zero-initialised sequences to a dedicated API (http://bugs.python.org/issue20895) .. [4] Issue proposing to use calloc() for zero-initialised binary sequences (http://bugs.python.org/issue21644) .. [5] August 2014 discussion thread on python-dev (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html) Copyright ========= This document has been placed in the public domain. From gvanrossum at gmail.com Tue Jun 7 15:45:35 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue, 7 Jun 2016 12:45:35 -0700 Subject: [Python-Dev] C99 In-Reply-To: <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org> Message-ID: We should definitely keep supporting MSVC. --Guido (mobile) On Jun 7, 2016 12:39 PM, "Sturla Molden" wrote: > Victor Stinner wrote: > > > Is it worth to support a compiler that in 2016 doesn't support the C > > standard released in 1999, 17 years ago? > > MSVC only supports C99 when its needed for C++11 or some MS extension to C. > > Is it worth supporting MSVC? 
If not, we have Intel C, Clang and Cygwin GCC > are the viable options we have on Windows (and perhaps Embarcadero, but I > haven't used C++ builder for a very long time). Even MinGW does not fully > support C99, because it depends on Microsoft's CRT. If we think MSVC and > MinGW are worth supporting, we cannot just use C99 indiscriminantly. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jun 7 16:54:04 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 7 Jun 2016 13:54:04 -0700 Subject: [Python-Dev] C99 In-Reply-To: <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org> References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Jun 7, 2016 at 12:37 PM, Sturla Molden wrote: > Victor Stinner wrote: > >> Is it worth to support a compiler that in 2016 doesn't support the C >> standard released in 1999, 17 years ago? > > MSVC only supports C99 when its needed for C++11 or some MS extension to C. > > Is it worth supporting MSVC? If not, we have Intel C, Clang and Cygwin GCC > are the viable options we have on Windows (and perhaps Embarcadero, but I > haven't used C++ builder for a very long time). Even MinGW does not fully > support C99, because it depends on Microsoft's CRT. If we think MSVC and > MinGW are worth supporting, we cannot just use C99 indiscriminantly. 
No-one's proposing to use C99 indiscriminately; AFAICT the proposal was: it would make a big difference if the CPython core could start using some of C99's basic features like long long, inline functions, and mid-block declarations, and all interesting compilers support these, so let's officially switch from C89-only to C89-plus-the-bits-of-C99-that-MSVC-supports. This would be a big improvement and is just a matter of recognizing the status quo; no need to drag in anything controversial. There's no chance that CPython is going to drop MSVC support in 3.6. Intel C is hardly a viable option given that the license requires the people running the compiler to accept unbounded liability for Intel lawyer bills and imposes non-DFSG-free conditions on the compiled output. And Cygwin GCC isn't even real Windows. Maybe switching to Clang will make sense in 3.7 but that's a long ways off... -n -- Nathaniel J. Smith -- https://vorpus.org From sturla.molden at gmail.com Tue Jun 7 17:03:57 2016 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 7 Jun 2016 21:03:57 +0000 (UTC) Subject: [Python-Dev] C99 References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org> Message-ID: <1399814768487025983.893766sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > No-one's proposing to use C99 indiscriminately; > There's no chance that CPython is going to drop MSVC support in 3.6. Stinner was proposing that by saying "Is it worth to support a compiler that in 2016 doesn't support the C standard released in 1999, 17 years ago?" This is basically a suggestion to drop MSVC support, as I read it. That is never going to happen. 
Sturla From ericsnowcurrently at gmail.com Tue Jun 7 17:20:02 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 14:20:02 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Tue, Jun 7, 2016 at 12:30 PM, Nick Coghlan wrote: > On 7 June 2016 at 10:51, Eric Snow wrote: >> * ``__definition_order__`` is a tuple >> * ``__definition_order__`` is a read-only attribute > > Thinking about the class decorator use case, I think this may need to > be reconsidered, as class decorators may: > > 1. Remove class attributes > 2. Add class attributes > > This will then lead to __definition_order__ getting out of sync with > the current state of the class namespace. I'm not clear on your point. Decorators are applied after the class has been created. Hence they have no impact on the class's definition order. I'd expect __definition_order__ to strictly represent what happened in the class body during definition, and not anything afterward. Certainly __definition_order__ might not align with __dict__ (or dir()); we don't have any way to guarantee that it would, do we? If anything, the ability to diff __definition_order__ and __dict__ is a positive, since it allows you to see changes on the class since it was defined. > > One option for dealing with that would be to make type.__setattr__ and > type.__delattr__ aware of __definition_order__, and have them replace > the tuple with a new one as needed. If we did that, then the main > question would be whether updating an existing attribute changed the > definition order, and I'd be inclined to say "No" (to minimise the > side effects of monkey-patching). > > The main alternative would be to make __definition_order__ writable, > so the default behaviour would be for it to reflect the original class > body, but decorators would be free to update it to reflect their > changes, as well as to make other modifications (e.g. 
stripping out > all callables from the list). I think both of those make __definition_order__ more complicated and less useful. As the PEP stands, folks can be confident in what __definition_order__ represents. What would you consider to be the benefit of a mutable (or replaceable) __definition_order__ that outweighs the benefit of a simpler definition of what's in it. BTW, thanks for bringing this up. :) -eric From barry at python.org Tue Jun 7 17:31:19 2016 From: barry at python.org (Barry Warsaw) Date: Tue, 7 Jun 2016 17:31:19 -0400 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <57572E5D.4020101@stoneleaf.us> References: <57572E5D.4020101@stoneleaf.us> Message-ID: <20160607173119.36961fcf.barry@wooz.org> On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote: >* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and > ``memoryview.iterbytes`` alternative iterators +1 but I want to go just a little farther. We can't change bytes.__getitem__ but we can add another method that returns single byte objects? I think it's still a bit of a pain to extract single bytes even with .iterbytes(). Maybe .iterbytes can take a single index argument (blech) or add a method like .byte_at(i). I'll let you bikeshed on the name. Cheers, -Barry From ethan at stoneleaf.us Tue Jun 7 17:34:58 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jun 2016 14:34:58 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: <57573E02.9020206@stoneleaf.us> On 06/07/2016 02:20 PM, Eric Snow wrote: > On Tue, Jun 7, 2016 at 12:30 PM, Nick Coghlan wrote: >> On 7 June 2016 at 10:51, Eric Snow wrote: >>> * ``__definition_order__`` is a tuple >>> * ``__definition_order__`` is a read-only attribute >> >> Thinking about the class decorator use case, I think this may need to >> be reconsidered, as class decorators may: >> >> 1. Remove class attributes >> 2. 
Add class attributes >> >> This will then lead to __definition_order__ getting out of sync with >> the current state of the class namespace. > > I'm not clear on your point. Decorators are applied after the class > has been created. Hence they have no impact on the class's definition > order. I'd expect __definition_order__ to strictly represent what > happened in the class body during definition, and not anything > afterward. > > Certainly __definition_order__ might not align with __dict__ (or > dir()); we don't have any way to guarantee that it would, do we? If > anything, the ability to diff __definition_order__ and __dict__ is a > positive, since it allows you to see changes on the class since it was > defined. > >> >> One option for dealing with that would be to make type.__setattr__ and >> type.__delattr__ aware of __definition_order__, and have them replace >> the tuple with a new one as needed. If we did that, then the main >> question would be whether updating an existing attribute changed the >> definition order, and I'd be inclined to say "No" (to minimise the >> side effects of monkey-patching). >> >> The main alternative would be to make __definition_order__ writable, >> so the default behaviour would be for it to reflect the original class >> body, but decorators would be free to update it to reflect their >> changes, as well as to make other modifications (e.g. stripping out >> all callables from the list). > > I think both of those make __definition_order__ more complicated and > less useful. As the PEP stands, folks can be confident in what > __definition_order__ represents. What would you consider to be the > benefit of a mutable (or replaceable) __definition_order__ that > outweighs the benefit of a simpler definition of what's in it. I think the question is which is more useful? 
- a definition order that lists items that are not in the class, as well as not having items that are in the class (set by the decorator) or - a definition order that is representative of the class state after all decorators have been applied One argument for the latter is that, even though the class has been technically "defined" (class body executed, type.__new__ called, etc.), applying decorators feels like continued class definition. One argument for the former is simplified implementation, and is definition order really important after the class body has been executed? (okay, two arguments ;) Perhaps the best thing is just to make it writeable -- after all, if __class__, __name__, etc., can all be changed, why should __definition_order__ be special? -- ~Ethan~ From k7hoven at gmail.com Tue Jun 7 17:34:02 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 8 Jun 2016 00:34:02 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <57572E5D.4020101@stoneleaf.us> References: <57572E5D.4020101@stoneleaf.us> Message-ID: On Tue, Jun 7, 2016 at 11:28 PM, Ethan Furman wrote: > > Minor changes: updated version numbers, add punctuation. > > The current text seems to take into account Guido's last comments. > > Thoughts before asking for acceptance? > > PEP: 467 > Title: Minor API improvements for binary sequences > Version: $Revision$ > Last-Modified: $Date$ > Author: Nick Coghlan > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 2014-03-30 > Python-Version: 3.5 > Post-History: 2014-03-30 2014-08-15 2014-08-16 > > > Abstract > ======== > > During the initial development of the Python 3 language specification, the core ``bytes`` type for arbitrary binary data started as the mutable type that is now referred to as ``bytearray``. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series. 
> > This PEP proposes four small adjustments to the APIs of the ``bytes``, ``bytearray`` and ``memoryview`` types to make it easier to operate entirely in the binary domain: > > * Deprecate passing single integer values to ``bytes`` and ``bytearray`` > * Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors > * Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors > * Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and > ``memoryview.iterbytes`` alternative iterators > Why not bytes.viewbytes (or whatever name) so that one could also subscript it? And if it were a property, one could perhaps conveniently get the n'th byte: b'abcde'.viewbytes[n] # compared to b'abcde'[n:n+1] Also, would it not be more clear to call the int -> bytes method something like bytes.fromint or bytes.fromord and introduce the same thing on str? And perhaps allow multiple arguments to create a str/bytes of length > 1. I guess this may violate TOOWTDI, but anyway, just a thought. -- Koos From ncoghlan at gmail.com Tue Jun 7 17:34:37 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jun 2016 14:34:37 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On 7 June 2016 at 14:20, Eric Snow wrote: > On Tue, Jun 7, 2016 at 12:30 PM, Nick Coghlan wrote: >> The main alternative would be to make __definition_order__ writable, >> so the default behaviour would be for it to reflect the original class >> body, but decorators would be free to update it to reflect their >> changes, as well as to make other modifications (e.g. stripping out >> all callables from the list). > > I think both of those make __definition_order__ more complicated and > less useful. As the PEP stands, folks can be confident in what > __definition_order__ represents. What would you consider to be the > benefit of a mutable (or replaceable) __definition_order__ that > outweighs the benefit of a simpler definition of what's in it. 
Mainly the fact that class decorators and metaclasses can't hide the difference between "attributes defined in the class body" and "attributes injected by a decorator or metaclass". I don't have a concrete use case for that, it just bothers me on general principles when we have things the interpreter can do that can't readily be emulated in Python code. However, if it proves to be a hassle in practice, making it writable can be done later based on specific use cases, so I don't mind if the PEP stays as it is on that front. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pmiscml at gmail.com Tue Jun 7 17:33:50 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 8 Jun 2016 00:33:50 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <57572E5D.4020101@stoneleaf.us> References: <57572E5D.4020101@stoneleaf.us> Message-ID: <20160608003350.7a7c6641@x230> Hello, On Tue, 07 Jun 2016 13:28:13 -0700 Ethan Furman wrote: > Minor changes: updated version numbers, add punctuation. > > The current text seems to take into account Guido's last comments. > > Thoughts before asking for acceptance? > > [] > Deprecation of current "zero-initialised sequence" behaviour > ------------------------------------------------------------ > > Currently, the ``bytes`` and ``bytearray`` constructors accept an > integer argument and interpret it as meaning to create a > zero-initialised sequence of the given size:: > > >>> bytes(3) > b'\x00\x00\x00' > >>> bytearray(3) > bytearray(b'\x00\x00\x00') > > This PEP proposes to deprecate that behaviour in Python 3.6, and > remove it entirely in Python 3.7. Why the desire to break applications of thousands and thousands of people? Besides, bytes(3) behavior is very logical. Everyone who knows what malloc(3) does also knows what bytes(3) does. 
Who doesn't, can learn, and eventually be grateful that learning Python actually helped them to learn other language as well. [] > Addition of explicit "single byte" constructors > ----------------------------------------------- > > As binary counterparts to the text ``chr`` function, this PEP > proposes the addition of an explicit ``byte`` alternative constructor > as a class method on both ``bytes`` and ``bytearray``:: > > >>> bytes.byte(3) > b'\x03' > >>> bytearray.byte(3) > bytearray(b'\x03') > > These methods will only accept integers in the range 0 to 255 > (inclusive):: > > >>> bytes.byte(512) > Traceback (most recent call last): > File "", line 1, in > ValueError: bytes must be in range(0, 256) > > >>> bytes.byte(1.0) > Traceback (most recent call last): > File "", line 1, in > TypeError: 'float' object cannot be interpreted as an integer > > The documentation of the ``ord`` builtin will be updated to > explicitly note that ``bytes.byte`` is the inverse operation for > binary data, while ``chr`` is the inverse operation for text data. The documentation should probably also mention that bytes.byte(x) is equivalent to x.to_bytes(1, "little"). [] -- Best regards, Paul mailto:pmiscml at gmail.com From pmiscml at gmail.com Tue Jun 7 17:37:11 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 8 Jun 2016 00:37:11 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <20160607173119.36961fcf.barry@wooz.org> References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org> Message-ID: <20160608003711.6149bc96@x230> Hello, On Tue, 7 Jun 2016 17:31:19 -0400 Barry Warsaw wrote: > On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote: > > >* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and > > ``memoryview.iterbytes`` alternative iterators > > +1 but I want to go just a little farther. > > We can't change bytes.__getitem__ but we can add another method that > returns single byte objects? 
I think it's still a bit of a pain to > extract single bytes even with .iterbytes(). > > Maybe .iterbytes can take a single index argument (blech) or add a > method like .byte_at(i). I'll let you bikeshed on the name. What's wrong with b[i:i+1] ? -- Best regards, Paul mailto:pmiscml at gmail.com From ncoghlan at gmail.com Tue Jun 7 17:39:30 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jun 2016 14:39:30 -0700 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <20160607173119.36961fcf.barry@wooz.org> References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org> Message-ID: On 7 June 2016 at 14:31, Barry Warsaw wrote: > On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote: > >>* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and >> ``memoryview.iterbytes`` alternative iterators > > +1 but I want to go just a little farther. > > We can't change bytes.__getitem__ but we can add another method that returns > single byte objects? I think it's still a bit of a pain to extract single > bytes even with .iterbytes(). > > Maybe .iterbytes can take a single index argument (blech) or add a method like > .byte_at(i). I'll let you bikeshed on the name. Perhaps: data.getbyte(i) data.iterbytes() The rationale for "Why not a live view?" is that an iterator is simple to define and implement, while we know from experience with memoryview and the various dict views that live views are a minefield for folks defining new container types. Since this PEP would in some sense change what it means to implement a full "bytes-like object", it's worth keeping implementation complexity in mind. Cheers, Nick. 
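For concreteness, the proposed ``getbyte``/``iterbytes`` pair can be approximated today with slicing. This is only a sketch; the function names follow Nick's suggestion above and are not an existing API:

```python
def getbyte(data, i):
    # Like indexing, but returns a length-1 bytes object; unlike plain
    # slicing, out-of-range access still raises IndexError.
    if not -len(data) <= i < len(data):
        raise IndexError("index out of range")
    if i < 0:
        i += len(data)
    return data[i:i+1]

def iterbytes(data):
    # Like iteration, but yields length-1 bytes objects instead of ints.
    return (data[i:i+1] for i in range(len(data)))

print(getbyte(b"spam", 0))        # b's'
print(list(iterbytes(b"spam")))   # [b's', b'p', b'a', b'm']
```

A full view type would layer ``__len__`` and slice support on the same slicing trick; the helpers above capture just the two operations the PEP names.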
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Tue Jun 7 17:40:35 2016 From: brett at python.org (Brett Cannon) Date: Tue, 07 Jun 2016 21:40:35 +0000 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <20160608003711.6149bc96@x230> References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org> <20160608003711.6149bc96@x230> Message-ID: On Tue, 7 Jun 2016 at 14:38 Paul Sokolovsky wrote: > Hello, > > On Tue, 7 Jun 2016 17:31:19 -0400 > Barry Warsaw wrote: > > > On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote: > > > > >* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and > > > ``memoryview.iterbytes`` alternative iterators > > > > +1 but I want to go just a little farther. > > > > We can't change bytes.__getitem__ but we can add another method that > > returns single byte objects? I think it's still a bit of a pain to > > extract single bytes even with .iterbytes(). > > > > Maybe .iterbytes can take a single index argument (blech) or add a > > method like .byte_at(i). I'll let you bikeshed on the name. > > What's wrong with b[i:i+1] ? > It always succeeds while indexing can trigger an IndexError. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tritium-list at sdamon.com Tue Jun 7 17:50:49 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Tue, 7 Jun 2016 17:50:49 -0400 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org> Message-ID: <0cea01d1c106$a88a3750$f99ea5f0$@hotmail.com> > -----Original Message----- > From: Python-Dev [mailto:python-dev-bounces+tritium- > list=sdamon.com at python.org] On Behalf Of Nick Coghlan > Sent: Tuesday, June 7, 2016 5:40 PM > To: Barry Warsaw > Cc: python-dev at python.org > Subject: Re: [Python-Dev] PEP 467: Minor API improvements to bytes, > bytearray, and memoryview > > On 7 June 2016 at 14:31, Barry Warsaw wrote: > > On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote: > > > >>* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and > >> ``memoryview.iterbytes`` alternative iterators > > > > +1 but I want to go just a little farther. > > > > We can't change bytes.__getitem__ but we can add another method that > returns > > single byte objects? I think it's still a bit of a pain to extract single > > bytes even with .iterbytes(). > > > > Maybe .iterbytes can take a single index argument (blech) or add a method > like > > .byte_at(i). I'll let you bikeshed on the name. > > Perhaps: > > data.getbyte(i) > data.iterbytes() data.getbyte(index_or_slice_object) ? while it might not be... ideal... to create a sliceable live view object, we can have a method that accepts a slice, even if we have to create it manually (or at least make it convenient for those who wish to wrap a bytes object in their own type and blindly pass the first-non-self arg of a custom __getitem__ to the method). > The rationale for "Why not a live view?" 
is that an iterator is simple > to define and implement, while we know from experience with memoryview > and the various dict views that live views are a minefield for folks > defining new container types. Since this PEP would in some sense > change what it means to implement a full "bytes-like object", it's > worth keeping implementation complexity in mind. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium- > list%40sdamon.com From ericsnowcurrently at gmail.com Tue Jun 7 17:51:46 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 14:51:46 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Tue, Jun 7, 2016 at 2:34 PM, Nick Coghlan wrote: > On 7 June 2016 at 14:20, Eric Snow wrote: >> What would you consider to be the >> benefit of a mutable (or replaceable) __definition_order__ that >> outweighs the benefit of a simpler definition of what's in it. > > Mainly the fact that class decorators and metaclasses can't hide the > difference between "attributes defined in the class body" and > "attributes injected by a decorator or metaclass". I don't have a > concrete use case for that, it just bothers me on general principles > when we have things the interpreter can do that can't readily be > emulated in Python code. Yeah, I see what you mean. > > However, if it proves to be a hassle in practice, making it writable > can be done later based on specific use cases, so I don't mind if the > PEP stays as it is on that front. Agreed. 
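The distinction under discussion can be sketched today with a metaclass: ``__prepare__`` captures the definition order by hand (roughly what the PEP automates), and a decorator-injected attribute shows up in the class namespace but not in the captured order. The ``_definition_order`` name here is illustrative, not the PEP's actual attribute:

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # Use an ordered mapping while the class body executes.
        return OrderedDict()

    def __new__(mcls, name, bases, ns, **kwds):
        cls = super().__new__(mcls, name, bases, dict(ns))
        # Record the non-dunder names in definition order.
        cls._definition_order = tuple(
            k for k in ns
            if not (k.startswith('__') and k.endswith('__')))
        return cls

def inject(cls):
    cls.injected = True   # added after the class body has executed
    return cls

@inject
class Spam(metaclass=OrderedMeta):
    ham = None
    eggs = 5

print(Spam._definition_order)   # ('ham', 'eggs') -- no 'injected'
print(Spam.injected)            # True
```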
-eric From tritium-list at sdamon.com Tue Jun 7 17:52:51 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Tue, 7 Jun 2016 17:52:51 -0400 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <0cea01d1c106$a88a3750$f99ea5f0$@hotmail.com> References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org> <0cea01d1c106$a88a3750$f99ea5f0$@hotmail.com> Message-ID: <0ceb01d1c106$f0eb6be0$d2c243a0$@hotmail.com> Ignore that message. I hit send before brain and hands were fully in sync. > -----Original Message----- > From: tritium-list at sdamon.com [mailto:tritium-list at sdamon.com] > Sent: Tuesday, June 7, 2016 5:51 PM > To: 'Nick Coghlan' ; 'Barry Warsaw' > > Cc: python-dev at python.org > Subject: RE: [Python-Dev] PEP 467: Minor API improvements to bytes, > bytearray, and memoryview > > > > > -----Original Message----- > > From: Python-Dev [mailto:python-dev-bounces+tritium- > > list=sdamon.com at python.org] On Behalf Of Nick Coghlan > > Sent: Tuesday, June 7, 2016 5:40 PM > > To: Barry Warsaw > > Cc: python-dev at python.org > > Subject: Re: [Python-Dev] PEP 467: Minor API improvements to bytes, > > bytearray, and memoryview > > > > On 7 June 2016 at 14:31, Barry Warsaw wrote: > > > On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote: > > > > > >>* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and > > >> ``memoryview.iterbytes`` alternative iterators > > > > > > +1 but I want to go just a little farther. > > > > > > We can't change bytes.__getitem__ but we can add another method > that > > returns > > > single byte objects? I think it's still a bit of a pain to extract > single > > > bytes even with .iterbytes(). > > > > > > Maybe .iterbytes can take a single index argument (blech) or add a > method > > like > > > .byte_at(i). I'll let you bikeshed on the name. > > > > Perhaps: > > > > data.getbyte(i) > > data.iterbytes() > > data.getbyte(index_or_slice_object) ? 
> > while it might not be... ideal... to create a sliceable live view object, we > can have a method that accepts a slice, even if we have to create it > manually (or at least make it convenient for those who wish to wrap a bytes > object in their own type and blindly pass the first-non-self arg of a custom > __getitem__ to the method). > > > The rationale for "Why not a live view?" is that an iterator is simple > > to define and implement, while we know from experience with > memoryview > > and the various dict views that live views are a minefield for folks > > defining new container types. Since this PEP would in some sense > > change what it means to implement a full "bytes-like object", it's > > worth keeping implementation complexity in mind. > > > > Cheers, > > Nick. > > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: https://mail.python.org/mailman/options/python- > dev/tritium- > > list%40sdamon.com From ncoghlan at gmail.com Tue Jun 7 17:56:38 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jun 2016 14:56:38 -0700 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <20160608003350.7a7c6641@x230> References: <57572E5D.4020101@stoneleaf.us> <20160608003350.7a7c6641@x230> Message-ID: On 7 June 2016 at 14:33, Paul Sokolovsky wrote: > Hello, > > On Tue, 07 Jun 2016 13:28:13 -0700 > Ethan Furman wrote: > >> Minor changes: updated version numbers, add punctuation. >> >> The current text seems to take into account Guido's last comments. >> >> Thoughts before asking for acceptance? 
>> >> > [] > >> Deprecation of current "zero-initialised sequence" behaviour >> ------------------------------------------------------------ >> >> Currently, the ``bytes`` and ``bytearray`` constructors accept an >> integer argument and interpret it as meaning to create a >> zero-initialised sequence of the given size:: >> >> >>> bytes(3) >> b'\x00\x00\x00' >> >>> bytearray(3) >> bytearray(b'\x00\x00\x00') >> >> This PEP proposes to deprecate that behaviour in Python 3.6, and >> remove it entirely in Python 3.7. > > Why the desire to break applications of thousands and thousands of > people? Same argument as any deprecation: to make existing and future defects easier to find or easier to debug. That said, this is the main part I was referring to in the other thread when I mentioned some of the constructor changes were potentially controversial and probably not worth the hassle - it's the only one with the potential to break currently working code, while the others are just a matter of choosing suitable names. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Tue Jun 7 17:57:45 2016 From: barry at python.org (Barry Warsaw) Date: Tue, 7 Jun 2016 17:57:45 -0400 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org> <20160608003711.6149bc96@x230> Message-ID: <20160607175745.291e595a@subdivisions.wooz.org> On Jun 07, 2016, at 09:40 PM, Brett Cannon wrote: >On Tue, 7 Jun 2016 at 14:38 Paul Sokolovsky wrote: >> What's wrong with b[i:i+1] ? >It always succeeds while indexing can trigger an IndexError. Right. You want a method with the semantics of __getitem__() but that returns the desired type. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From k7hoven at gmail.com Tue Jun 7 18:22:21 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 8 Jun 2016 01:22:21 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <20160607175745.291e595a@subdivisions.wooz.org> References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org> <20160608003711.6149bc96@x230> <20160607175745.291e595a@subdivisions.wooz.org> Message-ID: On Wed, Jun 8, 2016 at 12:57 AM, Barry Warsaw wrote: > On Jun 07, 2016, at 09:40 PM, Brett Cannon wrote: > >>On Tue, 7 Jun 2016 at 14:38 Paul Sokolovsky wrote: >>> What's wrong with b[i:i+1] ? >>It always succeeds while indexing can trigger an IndexError. > > Right. You want a method with the semantics of __getitem__() but that returns > the desired type. > And if this is called __getitem__ (with slices delegated to bytes.__getitem__) and implemented in a class, one has a view. Maybe I'm missing something, but I fail to understand what makes this significantly more problematic than an iterator. Ok, I guess we might also need __len__. -- Koos > -Barry > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com > From ericsnowcurrently at gmail.com Tue Jun 7 18:27:16 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 15:27:16 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: <57573E02.9020206@stoneleaf.us> References: <57573E02.9020206@stoneleaf.us> Message-ID: On Tue, Jun 7, 2016 at 2:34 PM, Ethan Furman wrote: > On 06/07/2016 02:20 PM, Eric Snow wrote: >> I think both of those make __definition_order__ more complicated and >> less useful. 
As the PEP stands, folks can be confident in what >> __definition_order__ represents. What would you consider to be the >> benefit of a mutable (or replaceable) __definition_order__ that >> outweighs the benefit of a simpler definition of what's in it. > > I think the question is which is more useful? > > - a definition order that lists items that are not in the class, as > well as not having items that are in the class (set by the decorator) > > or > > - a definition order that is representative of the class state after > all decorators have been applied "definition" refers explicitly to the execution of the class body in a class statement. So what you've described is a bit confusing to me. If we're talking about some other semantics then the name "__definition_order__" is misleading. Also, consider that __definition_order__ is, IMHO, most useful when interpreted as the actual order of attributes in the class definition body. The point is that code outside the class body can leverage the order of assigned names within that block. So, relative to the class definition, I'm not clear on valid use cases that divorce themselves from the class definition, such that either of your scenarios is relevant. Semantics that relate more to the class namespace (__dict__) are a separate matter from this PEP. I'd welcome a broader solution that still met the needs at which __definition_order__ is aiming. For example, consider if the class's __dict__ (or rather the proxied value) were OrderedDict. In fact, Guido was originally (in 2013) amenable to that idea. However, I tried it and making it work was a challenge due to use of the concrete dict C-API. I'd be glad if it was worked out. In the meantime, this PEP is more focused on a practical representation of the ordering information inside just the class definition body. 
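For illustration, here is the kind of generic decorator the PEP has in mind, written against a hand-recorded order so that it runs today. All names below are hypothetical; with the PEP's ``__definition_order__`` the recording line at the end of the class body would be unnecessary:

```python
def auto_repr(cls):
    # Generic decorator: build a __repr__ from the recorded attribute
    # order. Today the class must record the order itself; the PEP
    # would make that recording automatic.
    fields = cls._field_order
    def __repr__(self):
        args = ", ".join(
            "%s=%r" % (name, getattr(self, name)) for name in fields)
        return "%s(%s)" % (type(self).__name__, args)
    cls.__repr__ = __repr__
    return cls

@auto_repr
class Point:
    x = 0
    y = 0
    _field_order = ('x', 'y')   # recorded by hand, last in the body

print(repr(Point()))   # Point(x=0, y=0)
```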
> > One argument for the latter is that, even though the class has been > technically "defined" (class body executed, type.__new__ called, etc.), > applying decorators feels like continued class definition. Perhaps. That doesn't align with my intuition on decorators, but I'll readily concede that my views aren't always particularly representative of everyone else. :) That said, there are many different uses for decorators and modifying the class namespace (__dict__) isn't the only one (and in my experience not necessarily the most common). > > One argument for the former is simplified implementation, I'm not sure what you're implying about the implementation. Do you mean that it's easier than just letting __definition_order__ be writable (or mutable)? It's actually slightly more work to make it a read-only attr. Perhaps you mean that the semantics in the PEP are easier to implement than something that tracks changes to the class namespace (__dict__) after definition is over? Probably, though I don't see anything like that happening (other than if OrderedDict were used for __dict__). > and is definition > order really important after the class body has been executed? (okay, two > arguments ;) Given that the focus is on class definition, I'd argue no. :) > > Perhaps the best thing is just to make it writeable -- after all, if > __class__, __name__, etc., can all be changed, why should > __definition_order__ be special? Not all attrs are writable and it's a case-by-case situation: some of the ones that are writable started out read-only and changed once there was a valid reason. If anything, it's arguably safer in general to take an immutable-by-default approach. 
-eric From ethan at stoneleaf.us Tue Jun 7 18:39:28 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jun 2016 15:39:28 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: <57573E02.9020206@stoneleaf.us> Message-ID: <57574D20.3010700@stoneleaf.us> On 06/07/2016 03:27 PM, Eric Snow wrote: > Not all attrs are writable and it's a case-by-case situation: some of > the ones that are writable started out read-only and changed once > there was a valid reason. If anything, it's arguably safer in general > to take an immutable-by-default approach. I'm sold. Leave it read-only. :) -- ~Ethan~ From ethan at stoneleaf.us Tue Jun 7 18:46:00 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jun 2016 15:46:00 -0700 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <20160608003350.7a7c6641@x230> References: <57572E5D.4020101@stoneleaf.us> <20160608003350.7a7c6641@x230> Message-ID: <57574EA8.9090805@stoneleaf.us> On 06/07/2016 02:33 PM, Paul Sokolovsky wrote: >> This PEP proposes to deprecate that behaviour in Python 3.6, and >> remove it entirely in Python 3.7. > > Why the desire to break applications of thousands and thousands of > people? Besides, bytes(3) behavior is very logical. Everyone who knows > what malloc(3) does also knows what bytes(3) does. Who doesn't, can > learn, and eventually be grateful that learning Python actually helped > them to learn other language as well. Two reasons: 1) bytes are immutable, so creating a 3-byte 0x00 string seems ridiculous; 2) Python is not C, and the vagaries of malloc are not relevant to Python. However, there is little point in breaking working code, so a deprecation without removal is fine by me. 
-- ~Ethan~ From ncoghlan at gmail.com Tue Jun 7 19:03:13 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jun 2016 16:03:13 -0700 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org> <20160608003711.6149bc96@x230> <20160607175745.291e595a@subdivisions.wooz.org> Message-ID: On 7 June 2016 at 15:22, Koos Zevenhoven wrote: > On Wed, Jun 8, 2016 at 12:57 AM, Barry Warsaw wrote: >> On Jun 07, 2016, at 09:40 PM, Brett Cannon wrote: >> >>>On Tue, 7 Jun 2016 at 14:38 Paul Sokolovsky wrote: >>>> What's wrong with b[i:i+1] ? >>>It always succeeds while indexing can trigger an IndexError. >> >> Right. You want a method with the semantics of __getitem__() but that returns >> the desired type. >> > > And if this is called __getitem__ (with slices delegated to > bytes.__getitem__) and implemented in a class, one has a view. Maybe > I'm missing something, but I fail to understand what makes this > significantly more problematic than an iterator. Ok, I guess we might > also need __len__. Right, it's the fact that a view is a much broader API than we need, since most of the operations on the base type are already fine. The two alternate operations that people are interested in are: - like indexing, but producing bytes instead of ints - like iteration, but producing bytes instead of ints That said, it occurs to me that there's a reasonably strong composability argument in favour of a view-based approach: a view will work with operator.itemgetter() and other sequence consuming APIs, while special methods won't. The "like-memoryview-but-not" view type could also take any bytes-like object as input, similar to memoryview itself. Cheers, Nick. P.S. 
I'm starting to remember why I stopped working on this - I'm genuinely unsure of the right way forward, so I wasn't prepared to advocate strongly for the particular approach in the PEP :) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From raymond.hettinger at gmail.com Tue Jun 7 19:03:57 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 7 Jun 2016 16:03:57 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: <00C7A5D7-686C-45F7-9C8E-930CAB96FDFD@gmail.com> > On Jun 7, 2016, at 10:51 AM, Eric Snow wrote: > > This PEP changes the default class definition namespace to ``OrderedDict``. I think this would be a nice improvement. > Furthermore, the order in which the attributes are defined in each class > body will now be preserved in ``type.__definition_order__``. This allows > introspection of the original definition order, e.g. by class decorators. I'm unclear on why this would be needed. Wouldn't the OrderedDict be sufficient for preserving definition order? Raymond From ncoghlan at gmail.com Tue Jun 7 19:12:15 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jun 2016 16:12:15 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: <00C7A5D7-686C-45F7-9C8E-930CAB96FDFD@gmail.com> References: <00C7A5D7-686C-45F7-9C8E-930CAB96FDFD@gmail.com> Message-ID: On 7 June 2016 at 16:03, Raymond Hettinger wrote: > >> On Jun 7, 2016, at 10:51 AM, Eric Snow wrote: >> >> This PEP changes the default class definition namespace to ``OrderedDict``. > > I think this would be a nice improvement. > >> Furthermore, the order in which the attributes are defined in each class >> body will now be preserved in ``type.__definition_order__``. This allows >> introspection of the original definition order, e.g. by class decorators. > > I'm unclear on why this would be needed. Wouldn't the OrderedDict be sufficient for preserving definition order? 
By the time decorators run, the original execution namespace is no longer available - the contents have been copied into the class dict, which will still be a plain dict (and there's a lot of code that calls PyDict_* APIs on tp_dict, so replacing the latter with a subclass is neither trivial nor particularly safe in the presence of extension modules). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pmiscml at gmail.com Tue Jun 7 19:17:12 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 8 Jun 2016 02:17:12 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <57574EA8.9090805@stoneleaf.us> References: <57572E5D.4020101@stoneleaf.us> <20160608003350.7a7c6641@x230> <57574EA8.9090805@stoneleaf.us> Message-ID: <20160608021712.0e2e02d7@x230> Hello, On Tue, 07 Jun 2016 15:46:00 -0700 Ethan Furman wrote: > On 06/07/2016 02:33 PM, Paul Sokolovsky wrote: > > >> This PEP proposes to deprecate that behaviour in Python 3.6, and > >> remove it entirely in Python 3.7. > > > > Why the desire to break applications of thousands and thousands of > > people? Besides, bytes(3) behavior is very logical. Everyone who > > knows what malloc(3) does also knows what bytes(3) does. Who > > doesn't, can learn, and eventually be grateful that learning Python > > actually helped them to learn other language as well. > > Two reasons: > > 1) bytes are immutable, so creating a 3-byte 0x00 string seems > ridiculous; There's nothing ridiculous in sending N zero bytes over network, writing to a file, transferring to a hardware device. That however raises questions e.g. how to (efficiently) fill a (subsection) of bytearray with something but a 0, and how to apply all that consistently to array.array, but I don't even want to bring it, because the answer will be "we need first to deal with subjects of this PEP". > > 2) Python is not C, and the vagaries of malloc are not relevant to > Python. 
Yes, but Python has always had some traits nicely similar to C (% formatting, os.read/write at the fingertips, this bytes/bytearray constructor, etc.), and that certainly catered for a sizable share of its audience. It's nice that nowadays Python is truly multi-paradigm, taught in pre-schools, and used by folks who know how to analyze data much better than how to allocate memory to hold that data in the first place. But hopefully people who have used Python since 1.x as a nice system-level integration language, concise without much ambiguity (definitely less so than other languages, maybe COBOL excluded), shouldn't suffer and have their stuff broken.

>
> However, there is little point in breaking working code, so a
> deprecation without removal is fine by me.

Thanks.

>
> --
> ~Ethan~

-- Best regards, Paul mailto:pmiscml at gmail.com From ericsnowcurrently at gmail.com Tue Jun 7 20:50:50 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 7 Jun 2016 17:50:50 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace Message-ID:

I've grabbed a PEP # (520) and updated the PEP to clarify points that were brought up earlier today. Given positive feedback I got at PyCon and the reaction today, I'm hopeful the PEP isn't far off from pronouncement. :)

-eric

==========================================

PEP: 520
Title: Ordered Class Definition Namespace
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 7-Jun-2016
Python-Version: 3.6
Post-History: 7-Jun-2016

Abstract
========

This PEP changes the default class definition namespace to ``OrderedDict``. Furthermore, the order in which the attributes are defined in each class body will now be preserved in ``type.__definition_order__``. This allows introspection of the original definition order, e.g. by class decorators.

Note: just to be clear, this PEP is *not* about changing ``__dict__`` for classes to ``OrderedDict``.
Motivation ========== Currently the namespace used during execution of a class body defaults to ``dict``. If the metaclass defines ``__prepare__()`` then the result of calling it is used. Thus, before this PEP, if you needed your class definition namespace to be ``OrderedDict`` you had to use a metaclass. Metaclasses introduce an extra level of complexity to code and in some cases (e.g. conflicts) are a problem. So reducing the need for them is worth doing when the opportunity presents itself. Given that we now have a C implementation of ``OrderedDict`` and that ``OrderedDict`` is the common use case for ``__prepare__()``, we have such an opportunity by defaulting to ``OrderedDict``. The usefulness of ``OrderedDict``-by-default is greatly increased if the definition order is directly introspectable on classes afterward, particularly by code that is independent of the original class definition. One of the original motivating use cases for this PEP is generic class decorators that make use of the definition order. Changing the default class definition namespace has been discussed a number of times, including on the mailing lists and in PEP 422 and PEP 487 (see the References section below). Specification ============= * the default class *definition* namespace is now ``OrderdDict`` * the order in which class attributes are defined is preserved in the new ``__definition_order__`` attribute on each class * "dunder" attributes (e.g. ``__init__``, ``__module__``) are ignored * ``__definition_order__`` is a tuple * ``__definition_order__`` is a read-only attribute * ``__definition_order__`` is always set: 1. if ``__definition_order__`` is defined in the class body then the value is used as-is, though the attribute will still be read-only 2. types that do not have a class definition (e.g. builtins) have their ``__definition_order__`` set to ``None`` 3. 
types for which `__prepare__()`` returned something other than ``OrderedDict`` (or a subclass) have their ``__definition_order__`` set to ``None`` (except where #1 applies) The following code demonstrates roughly equivalent semantics:: class Meta(type): def __prepare__(cls, *args, **kwargs): return OrderedDict() class Spam(metaclass=Meta): ham = None eggs = 5 __definition_order__ = tuple(k for k in locals() if (!k.startswith('__') or !k.endswith('__'))) Note that [pep487_] proposes a similar solution, albeit as part of a broader proposal. Why a tuple? ------------ Use of a tuple reflects the fact that we are exposing the order in which attributes on the class were *defined*. Since the definition is already complete by the time ``definition_order__`` is set, the content and order of the value won't be changing. Thus we use a type that communicates that state of immutability. Why a read-only attribute? -------------------------- As with the use of tuple, making ``__definition_order__`` a read-only attribute communicates the fact that the information it represents is complete. Since it represents the state of a particular one-time event (execution of the class definition body), allowing the value to be replaced would reduce confidence that the attribute corresponds to the original class body. If a use case for a writable (or mutable) ``__definition_order__`` arises, the restriction may be loosened later. Presently this seems unlikely and furthermore it is usually best to go immutable-by-default. Note that ``__definition_order__`` is centered on the class definition body. The use cases for dealing with the class namespace (``__dict__``) post-definition are a separate matter. ``__definition_order__`` would be a significantly misleading name for a supporting feature. See [nick_concern_] for more discussion. Why ignore "dunder" names? -------------------------- Names starting and ending with "__" are reserved for use by the interpreter. 
In practice they should not be relevant to the users of
``__definition_order__``.  Instead, for early everyone they would only
be clutter, causing the same extra work for everyone.

Why is __definition_order__ even necessary?
-------------------------------------------

Since the definition order is not preserved in ``__dict__``, it would be
lost once class definition execution completes.  Classes *could*
explicitly set the attribute as the last thing in the body.  However,
then independent decorators could only make use of classes that had done
so.  Instead, ``__definition_order__`` preserves this one bit of info
from the class body so that it is universally available.

Compatibility
=============

This PEP does not break backward compatibility, except in the case that
someone relies *strictly* on ``dict`` as the class definition namespace.
This shouldn't be a problem.

Changes
=============

In addition to the class syntax, the following expose the new behavior:

* builtins.__build_class__
* types.prepare_class
* types.new_class

Other Python Implementations
============================

Pending feedback, the impact on Python implementations is expected to be
minimal.  If a Python implementation cannot support switching to
``OrderedDict``-by-default then it can always set
``__definition_order__`` to ``None``.

Implementation
==============

The implementation is found in the tracker.  [impl_]

Alternatives
============

type.__dict__ as OrderedDict
----------------------------

Instead of storing the definition order in ``__definition_order__``, the
now-ordered definition namespace could be copied into a new
``OrderedDict``.  This would mostly provide the same semantics.

However, using ``OrderedDict`` for ``type.__dict__`` would obscure the
relationship with the definition namespace, making it less useful.
Additionally, doing this would require significant changes to the
semantics of the concrete ``dict`` C-API.
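Ahead of the change itself, the proposed semantics can be approximated
today with a metaclass along the lines of the "roughly equivalent
semantics" snippet earlier in this PEP.  This is an illustrative sketch
only: unlike the proposal, the attribute it sets is writable, and
``OrderedMeta`` is a made-up name, not part of the PEP.

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # An ordered mapping for the class body, as the PEP proposes by default.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace), **kwds)
        if '__definition_order__' not in namespace:
            # Record the body's definition order, skipping dunder names.
            cls.__definition_order__ = tuple(
                k for k in namespace
                if not (k.startswith('__') and k.endswith('__')))
        return cls

class Spam(metaclass=OrderedMeta):
    ham = None
    eggs = 5
    def stir(self):
        return 'stirred'

print(Spam.__definition_order__)   # ('ham', 'eggs', 'stir')
```

A class decorator can then consume ``__definition_order__`` without any
knowledge of how the class was defined, which is the use case the
Motivation section describes.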
A "namespace" Keyword Arg for Class Definition
----------------------------------------------

PEP 422 introduced a new "namespace" keyword arg to class definitions
that effectively replaces the need for ``__prepare__()``.  [pep422_]
However, the proposal was withdrawn in favor of the simpler PEP 487.

References
==========

.. [impl] issue #24254
   (https://bugs.python.org/issue24254)

.. [nick_concern] Nick's concerns about mutability
   (https://mail.python.org/pipermail/python-dev/2016-June/144883.html)

.. [pep422] PEP 422
   (https://www.python.org/dev/peps/pep-0422/#order-preserving-classes)

.. [pep487] PEP 487
   (https://www.python.org/dev/peps/pep-0487/#defining-arbitrary-namespaces)

.. [orig] original discussion
   (https://mail.python.org/pipermail/python-ideas/2013-February/019690.html)

.. [followup1] follow-up 1
   (https://mail.python.org/pipermail/python-dev/2013-June/127103.html)

.. [followup2] follow-up 2
   (https://mail.python.org/pipermail/python-dev/2015-May/140137.html)

Copyright
===========

This document has been placed in the public domain.

From steve at pearwood.info Tue Jun 7 21:09:57 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 8 Jun 2016 11:09:57 +1000
Subject: [Python-Dev] PEP: Ordered Class Definition Namespace
In-Reply-To: 
References: 
Message-ID: <20160608010957.GG12028@ando.pearwood.info>

On Tue, Jun 07, 2016 at 11:39:06AM -0700, Eric Snow wrote:
> On Tue, Jun 7, 2016 at 11:32 AM, Terry Reedy wrote:
> > On 6/7/2016 1:51 PM, Eric Snow wrote:
> >
> >> Note: just to be clear, this PEP is *not* about changing
> >> ``type.__dict__`` to ``OrderedDict``.
> >
> > By 'type', do you mean the one and only object named 'type' or the
> > class being defined?  To be really clear, will the following change?
> >
> >>>> class C: pass
> >>>> type(C.__dict__)
>
> I mean the latter, "type" -> the class being defined.

Could you clarify that in the PEP please? Like Terry, I too found it
unclear.
I think there are a couple of places where you refer to `type` and it
isn't clear whether you mean builtins.type or something else.

-- 
Steve

From ethan at stoneleaf.us Tue Jun 7 21:20:38 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 07 Jun 2016 18:20:38 -0700
Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace
In-Reply-To: 
References: 
Message-ID: <575772E6.7040906@stoneleaf.us>

On 06/07/2016 05:50 PM, Eric Snow wrote:

Overall +1.  Some nits below.

> Specification
> =============

> 3. types for which ``__prepare__()`` returned something other than
>    ``OrderedDict`` (or a subclass) have their ``__definition_order__``
>    set to ``None``

(unless ``__definition_order__`` is present in the class dict either by
virtue of being in the class body or because the metaclass inserted it
before calling ``type.__new__``)

>        __definition_order__ = tuple(k for k in locals()
>                                     if (!k.startswith('__') or
>                                         !k.endswith('__')))

Still mixing C and Python!  ;)

> Why a tuple?
> ------------
>
> Use of a tuple reflects the fact that we are exposing the order in
> which attributes on the class were *defined*.  Since the definition
> is already complete by the time ``__definition_order__`` is set, the
> content and order of the value won't be changing.  Thus we use a type
> that communicates that state of immutability.

> Why a read-only attribute?
> --------------------------
>
> As with the use of tuple, making ``__definition_order__`` a read-only
> attribute communicates the fact that the information it represents is
> complete.  Since it represents the state of a particular one-time event
> (execution of the class definition body), allowing the value to be
> replaced would reduce confidence that the attribute corresponds to the
> original class body.
>
> If a use case for a writable (or mutable) ``__definition_order__``
> arises, the restriction may be loosened later.  Presently this seems
> unlikely and furthermore it is usually best to go immutable-by-default.
If __definition_order__ is supposed to be immutable as well as read-only
then we should convert non-tuples to tuples.  No point in letting that
user bug slip through.

> Why ignore "dunder" names?
> --------------------------
>
> Names starting and ending with "__" are reserved for use by the
> interpreter.  In practice they should not be relevant to the users of
> ``__definition_order__``.  Instead, for early everyone they would only

s/early/nearly

> Why is __definition_order__ even necessary?
> -------------------------------------------
>
> Since the definition order is not preserved in ``__dict__``, it would be
> lost once class definition execution completes.  Classes *could*
> explicitly set the attribute as the last thing in the body.  However,
> then independent decorators could only make use of classes that had done
> so.  Instead, ``__definition_order__`` preserves this one bit of info
> from the class body so that it is universally available.

s/would be/is

-- 
~Ethan~

From vadmium+py at gmail.com Tue Jun 7 22:01:14 2016
From: vadmium+py at gmail.com (Martin Panter)
Date: Wed, 8 Jun 2016 02:01:14 +0000
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray,
 and memoryview
In-Reply-To: <57572E5D.4020101@stoneleaf.us>
References: <57572E5D.4020101@stoneleaf.us>
Message-ID: 

On 7 June 2016 at 20:28, Ethan Furman wrote:
> Addition of explicit "single byte" constructors
> -----------------------------------------------
>
> As binary counterparts to the text ``chr`` function, this PEP proposes
> the addition of an explicit ``byte`` alternative constructor as a class
> method on both ``bytes`` and ``bytearray``::
>
>     >>> bytes.byte(3)
>     b'\x03'
>     >>> bytearray.byte(3)
>     bytearray(b'\x03')

Bytes.byte() is a great idea. But what's the point or use case of
bytearray.byte(), a mutable array of one pre-defined byte?
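For context: the ``byte`` constructors discussed above were only
proposed and do not exist at the time of writing; the closest spellings
available today pass a length-1 iterable of ints.  A quick sketch:

```python
# bytes.byte(3) / bytearray.byte(3) as proposed do not exist yet;
# today a single byte is spelled with a length-1 iterable of ints:
print(bytes([3]))       # b'\x03'
print(bytearray([3]))   # bytearray(b'\x03')

# Unlike the proposed alternative constructors, these accept any
# iterable of ints, so nothing rejects multi-byte input by construction:
print(bytes([3, 4]))    # b'\x03\x04'
```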
> Addition of optimised iterator methods that produce ``bytes`` objects
> ---------------------------------------------------------------------
>
> This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain
> an optimised ``iterbytes`` method that produces length 1 ``bytes``
> objects rather than integers::
>
>     for x in data.iterbytes():
>         # x is a length 1 ``bytes`` object, rather than an integer

Might be good to have an example with concrete output, so you see the
one-byte strings coming out of it.

>>> tuple(b"ABC".iterbytes())
(b'A', b'B', b'C')

From steve at pearwood.info Tue Jun 7 23:09:15 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 8 Jun 2016 13:09:15 +1000
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray,
 and memoryview
In-Reply-To: <20160608021712.0e2e02d7@x230>
References: <57572E5D.4020101@stoneleaf.us> <20160608003350.7a7c6641@x230>
 <57574EA8.9090805@stoneleaf.us> <20160608021712.0e2e02d7@x230>
Message-ID: <20160608030915.GI12028@ando.pearwood.info>

On Wed, Jun 08, 2016 at 02:17:12AM +0300, Paul Sokolovsky wrote:
> Hello,
>
> On Tue, 07 Jun 2016 15:46:00 -0700
> Ethan Furman wrote:
>
> > On 06/07/2016 02:33 PM, Paul Sokolovsky wrote:
> >
> > >> This PEP proposes to deprecate that behaviour in Python 3.6, and
> > >> remove it entirely in Python 3.7.
> > >
> > > Why the desire to break applications of thousands and thousands of
> > > people?

I'm not so sure that *thousands* of people are relying on this
behaviour, but your point is taken that it is a backwards-incompatible
change.

> > > Besides, bytes(3) behavior is very logical. Everyone who
> > > knows what malloc(3) does also knows what bytes(3) does.

Most Python coders are not C coders. Knowing C is not and should not be
a pre-requisite for using Python.

> > > Who
> > > doesn't, can learn, and eventually be grateful that learning Python
> > > actually helped them to learn other language as well.
I really don't think that learning Python will help with C.

> > Two reasons:
> >
> > 1) bytes are immutable, so creating a 3-byte 0x00 string seems
> > ridiculous;
>
> There's nothing ridiculous in sending N zero bytes over network,
> writing to a file, transferring to a hardware device.

True, but there is a good way of writing N identical bytes, not limited
to nulls, using the replication operator:

py> b'\xff'*10
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

which is more useful than `bytes(10)` since that can only produce
zeroes.

> That however
> raises questions e.g. how to (efficiently) fill a (subsection) of
> bytearray with something but a 0

Slicing.

py> b = bytearray(10)
py> b[4:4] = b'\xff'*4
py> b
bytearray(b'\x00\x00\x00\x00\xff\xff\xff\xff\x00\x00\x00\x00\x00\x00')

-- 
Steve

From ericsnowcurrently at gmail.com Tue Jun 7 23:17:16 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 7 Jun 2016 20:17:16 -0700
Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace
In-Reply-To: <575772E6.7040906@stoneleaf.us>
References: <575772E6.7040906@stoneleaf.us>
Message-ID: 

On Tue, Jun 7, 2016 at 6:20 PM, Ethan Furman wrote:
> On 06/07/2016 05:50 PM, Eric Snow wrote:
>>        __definition_order__ = tuple(k for k in locals()
>>                                     if (!k.startswith('__') or
>>                                         !k.endswith('__')))
>
> Still mixing C and Python!  ;)

I knew I was missing something!

>
>> Why a tuple?
>> ------------
>>
>> Use of a tuple reflects the fact that we are exposing the order in
>> which attributes on the class were *defined*.  Since the definition
>> is already complete by the time ``__definition_order__`` is set, the
>> content and order of the value won't be changing.  Thus we use a type
>> that communicates that state of immutability.

>> Why a read-only attribute?
>> --------------------------
>>
>> As with the use of tuple, making ``__definition_order__`` a read-only
>> attribute communicates the fact that the information it represents is
>> complete.
>> Since it represents the state of a particular one-time event
>> (execution of the class definition body), allowing the value to be
>> replaced would reduce confidence that the attribute corresponds to the
>> original class body.
>>
>> If a use case for a writable (or mutable) ``__definition_order__``
>> arises, the restriction may be loosened later.  Presently this seems
>> unlikely and furthermore it is usually best to go immutable-by-default.
>
> If __definition_order__ is supposed to be immutable as well as read-only
> then we should convert non-tuples to tuples.  No point in letting that
> user bug slip through.

Do you mean if a class explicitly defines __definition_order__?  If so,
I'm not clear on how that would work.  It could be set to anything,
including None or a value that does not iterate into a definition order.
If someone explicitly set __definition_order__ then I think it should be
used as-is.

>
>> Why ignore "dunder" names?
>> --------------------------
>>
>> Names starting and ending with "__" are reserved for use by the
>> interpreter.  In practice they should not be relevant to the users of
>> ``__definition_order__``.  Instead, for early everyone they would only
>
> s/early/nearly

fixed

>
>> Why is __definition_order__ even necessary?
>> -------------------------------------------
>>
>> Since the definition order is not preserved in ``__dict__``, it would
>> be lost once class definition execution completes.  Classes *could*
>> explicitly set the attribute as the last thing in the body.  However,
>> then independent decorators could only make use of classes that had
>> done so.  Instead, ``__definition_order__`` preserves this one bit of
>> info from the class body so that it is universally available.
>
> s/would be/is

fixed

Thanks!
-eric

From ericsnowcurrently at gmail.com Tue Jun 7 23:17:54 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 7 Jun 2016 20:17:54 -0700
Subject: [Python-Dev] PEP: Ordered Class Definition Namespace
In-Reply-To: <20160608010957.GG12028@ando.pearwood.info>
References: <20160608010957.GG12028@ando.pearwood.info>
Message-ID: 

On Tue, Jun 7, 2016 at 6:09 PM, Steven D'Aprano wrote:
> On Tue, Jun 07, 2016 at 11:39:06AM -0700, Eric Snow wrote:
>> I mean the latter, "type" -> the class being defined.
>
> Could you clarify that in the PEP please? Like Terry, I too found it
> unclear.

Yep.  Done.

-eric

From vadmium+py at gmail.com Wed Jun 8 00:00:50 2016
From: vadmium+py at gmail.com (Martin Panter)
Date: Wed, 8 Jun 2016 04:00:50 +0000
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray,
 and memoryview
In-Reply-To: 
References: <57572E5D.4020101@stoneleaf.us> <20160608003350.7a7c6641@x230>
Message-ID: 

On 7 June 2016 at 21:56, Nick Coghlan wrote:
> On 7 June 2016 at 14:33, Paul Sokolovsky wrote:
>> Ethan Furman wrote:
>>> Deprecation of current "zero-initialised sequence" behaviour
>>> ------------------------------------------------------------
>>>
>>> Currently, the ``bytes`` and ``bytearray`` constructors accept an
>>> integer argument and interpret it as meaning to create a
>>> zero-initialised sequence of the given size::
>>>
>>>     >>> bytes(3)
>>>     b'\x00\x00\x00'
>>>     >>> bytearray(3)
>>>     bytearray(b'\x00\x00\x00')
>>>
>>> This PEP proposes to deprecate that behaviour in Python 3.6, and
>>> remove it entirely in Python 3.7.
>>
>> Why the desire to break applications of thousands and thousands of
>> people?
>
> Same argument as any deprecation: to make existing and future defects
> easier to find or easier to debug.
>
> That said, this is the main part I was referring to in the other
> thread when I mentioned some of the constructor changes were
> potentially controversial and probably not worth the hassle - it's the
> only one with the potential to break currently working code, while the
> others are just a matter of choosing suitable names.

An argument against deprecating bytearray(n) in particular is that this
is supported in Python 2. I think I have (ab)used this fact to work
around the problem with bytes(n) in Python 2 & 3 compatible code.

From raymond.hettinger at gmail.com Wed Jun 8 00:27:35 2016
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Tue, 7 Jun 2016 21:27:35 -0700
Subject: [Python-Dev] PEP: Ordered Class Definition Namespace
In-Reply-To: 
References: <00C7A5D7-686C-45F7-9C8E-930CAB96FDFD@gmail.com>
Message-ID: 

> On Jun 7, 2016, at 4:12 PM, Nick Coghlan wrote:
>
> By the time decorators run, the original execution namespace is no
> longer available - the contents have been copied into the class dict,
> which will still be a plain dict (and there's a lot of code that calls
> PyDict_* APIs on tp_dict, so replacing the latter with a subclass is
> neither trivial nor particularly safe in the presence of extension
> modules).

That makes sense.  +1 all around.

Raymond

From storchaka at gmail.com Wed Jun 8 01:28:03 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 8 Jun 2016 08:28:03 +0300
Subject: [Python-Dev] PEP: Ordered Class Definition Namespace
In-Reply-To: 
References: 
Message-ID: 

On 07.06.16 20:51, Eric Snow wrote:
> Hi all,
>
> Following discussion a few years back (and rough approval from Guido
> [1]), I started work on using OrderedDict for the class definition
> namespace by default.  The bulk of the effort lay in implementing
> OrderedDict in C, which I got landed just in time for 3.5.  The
> remaining work was quite minimal and the actual change is quite small.
>
> My intention was to land the patch soon, having gone through code
> review during PyCon.  However, Nick pointed out to me the benefit of
> having a concrete point of reference for the change, as well as making
> sure it isn't a problem for other implementations.  So in that spirit,
> here's a PEP for the change.  Feedback is welcome, particularly from
> other implementors.

Be aware that C implementation of OrderedDict still is not free from
problems.

From storchaka at gmail.com Wed Jun 8 01:42:23 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 8 Jun 2016 08:42:23 +0300
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray,
 and memoryview
In-Reply-To: <57572E5D.4020101@stoneleaf.us>
References: <57572E5D.4020101@stoneleaf.us>
Message-ID: 
> > This PEP proposes four small adjustments to the APIs of the ``bytes``, > ``bytearray`` and ``memoryview`` types to make it easier to operate > entirely in the binary domain: > > * Deprecate passing single integer values to ``bytes`` and ``bytearray`` > * Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors > * Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors > * Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and > ``memoryview.iterbytes`` alternative iterators "Byte" is an alias to "octet" (8-bit integer) in modern terminology. Iterating bytes and bytearray already produce bytes. Wouldn't this be confused? May be name these methods "iterbytestrings", since they adds str-like behavior? From stephen at xemacs.org Wed Jun 8 02:48:58 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 8 Jun 2016 15:48:58 +0900 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <57572E5D.4020101@stoneleaf.us> References: <57572E5D.4020101@stoneleaf.us> Message-ID: <22359.49114.638710.974863@turnbull.sk.tsukuba.ac.jp> Ethan Furman writes: > * Deprecate passing single integer values to ``bytes`` and > ``bytearray`` Why? This is a slightly awkward idiom compared to .zeros (EITBI etc), but your 32-bit clock will roll over before we can actually remove it. There are a lot of languages that do this kind of initialization of arrays based on ``count``. If you want to do something useful here, add an optional argument (here in ridiculous :-) generality: bytes(count, tile=[0]) -> bytes(tile * count) where ``tile`` is a Sequence of a type that is acceptable to bytes anyway, or Sequence[int], which is treated as b"".join([bytes(chr(i)) for i in tile] * count]) Interpretation of ``count`` of course i bikesheddable, with at least one alternative interpretation (length of result bytes, with last tile truncated if necessary). 
 > * Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors

this is an API break if you take the deprecation as a mandate (which
eventual removal does indicate).  And backward compatibility for clients
of the bytes API means that we violate TOOWTDI indefinitely, on a
constructor of quite specialized utility.  Yuck.  -1 on both.

Barry Warsaw writes later in thread:

 > We can't change bytes.__getitem__ but we can add another method
 > that returns single byte objects?  I think it's still a bit of a
 > pain to extract single bytes even with .iterbytes().

+1  ISTM that more than the other changes, this is the most important
one.

Steve

From leewangzhong+python at gmail.com Wed Jun 8 03:07:26 2016
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Wed, 8 Jun 2016 03:07:26 -0400
Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace
In-Reply-To: 
References: 
Message-ID: 

On Jun 7, 2016 8:52 PM, "Eric Snow" wrote:
> * the default class *definition* namespace is now ``OrderedDict``
> * the order in which class attributes are defined is preserved in the

By using an OrderedDict, names are ordered by first definition point,
rather than location of the used definition. For example, the definition
order of the following will be "x, y", even though the definitions
actually bound to the name are in order "y, x".

    class C:
        x = 0
        def y(self): return 'y'
        def x(self): return 'x'

Is that okay?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Wed Jun 8 04:07:27 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 8 Jun 2016 10:07:27 +0200
Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace
In-Reply-To: 
References: 
Message-ID: 

> Abstract
> ========
>
> This PEP changes the default class definition namespace to ``OrderedDict``.
> Furthermore, the order in which the attributes are defined in each class
> body will now be preserved in ``type.__definition_order__``.
> This allows
> introspection of the original definition order, e.g. by class decorators.
>
> Note: just to be clear, this PEP is *not* about changing ``__dict__`` for
> classes to ``OrderedDict``.

What is the cost in terms of performance? What can be slower: defining a
new class and/or instantiating a class?

Victor

From victor.stinner at gmail.com Wed Jun 8 04:04:08 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 8 Jun 2016 10:04:08 +0200
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray,
 and memoryview
In-Reply-To: <57572E5D.4020101@stoneleaf.us>
References: <57572E5D.4020101@stoneleaf.us>
Message-ID: 

Hi,

> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
> argument and interpret it as meaning to create a zero-initialised sequence
> of the given size::
> (...)
> This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
> entirely in Python 3.7.

I'm opposed to this change (presented like that). Please stop breaking
the backward compatibility in minor versions.

I have been porting Python 2 code to Python 3 for more than 2 years.
First, Python 3 only proposed to immediately drop Python 2 support using
the 2to3 tool. It simply doesn't work because you must port all
dependencies incrementally, so you must write code working with Python 2
and Python 3 using the same code base.

A few people tried to duplicate repositories, projects, project names,
etc. to have one version for Python 2 and one version for Python 3, but
IMHO it's even worse. It's very difficult to handle dependencies using
that. It took a few years until six was widely used and until pip was
popular enough to be able to add six as a *dependency* (and not put an
old copy in the project).

Basically, you propose to introduce a backward incompatible change for
free (I fail to see the benefit of replacing bytes(n) with
bytes.zeros(n)) and without an obvious way to write code compatible with
Python <= 3.6 and Python >= 3.7.
Moreover, a single cycle is way too short to port all code in the wild.
It's common that users complain that Python core developers like
breaking the compatibility at each release. Recently, I saw a list of
applications which need to be ported to Python 3.5, while they work
perfectly on Python 3.4.

*If* you still want to deprecate bytes(n), you must introduce a helper
working on *all* Python versions. Obviously, the helper must be
available and work for Python 2.7. Maybe it can be the six module.
Maybe something else.

In Perl 5, there is a nice "use 5.12;" pragma to explicitly ask to keep
the compatibility with Perl 5.12. This pragma allows to change the
language more easily, since you can port code file by file. I don't know
if it's technically possible in Python, maybe not for all kinds of
backward incompatible changes.

Victor

From victor.stinner at gmail.com Wed Jun 8 04:26:34 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 8 Jun 2016 10:26:34 +0200
Subject: [Python-Dev] C99
In-Reply-To: 
References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com>
 <837505337487020527.659630sturla.molden-gmail.com@news.gmane.org>
Message-ID: 
Victor 2016-06-07 21:45 GMT+02:00 Guido van Rossum : > We should definitely keep supporting MSVC. > > --Guido (mobile) > > On Jun 7, 2016 12:39 PM, "Sturla Molden" wrote: >> >> Victor Stinner wrote: >> >> > Is it worth to support a compiler that in 2016 doesn't support the C >> > standard released in 1999, 17 years ago? >> >> MSVC only supports C99 when its needed for C++11 or some MS extension to >> C. >> >> Is it worth supporting MSVC? If not, we have Intel C, Clang and Cygwin GCC >> are the viable options we have on Windows (and perhaps Embarcadero, but I >> haven't used C++ builder for a very long time). Even MinGW does not fully >> support C99, because it depends on Microsoft's CRT. If we think MSVC and >> MinGW are worth supporting, we cannot just use C99 indiscriminantly. >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > From storchaka at gmail.com Wed Jun 8 04:53:06 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 8 Jun 2016 11:53:06 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: References: <57572E5D.4020101@stoneleaf.us> Message-ID: On 08.06.16 11:04, Victor Stinner wrote: >> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer >> argument and interpret it as meaning to create a zero-initialised sequence >> of the given size:: >> (...) >> This PEP proposes to deprecate that behaviour in Python 3.6, and remove it >> entirely in Python 3.7. > > I'm opposed to this change (presented like that). 
> Please stop breaking
> the backward compatibility in minor versions.

The argument for deprecating bytes(n) is that this has different meaning
in Python 2, and when backporting code to Python 2 or writing 2+3
compatible code there is a risk of making a mistake. This argument is
not applicable to bytearray(n).

> *If* you still want to deprecate bytes(n), you must introduce a
> helper working on *all* Python versions. Obviously, the helper must be
> available and work for Python 2.7. Maybe it can be the six module.
> Maybe something else.

The obvious way to create the bytes object of length n is b'\0' * n. It
works in all Python versions starting from 2.6. I don't see the need in
bytes(n) and bytes.zeros(n). There are no special methods for creating a
list or a string of size n.

From storchaka at gmail.com Wed Jun 8 05:11:40 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 8 Jun 2016 12:11:40 +0300
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray,
 and memoryview
In-Reply-To: 
References: <57572E5D.4020101@stoneleaf.us> <20160607173119.36961fcf.barry@wooz.org>
 <20160608003711.6149bc96@x230> <20160607175745.291e595a@subdivisions.wooz.org>
Message-ID: 

On 08.06.16 02:03, Nick Coghlan wrote:
> That said, it occurs to me that there's a reasonably strong
> composability argument in favour of a view-based approach: a view will
> work with operator.itemgetter() and other sequence consuming APIs,
> while special methods won't. The "like-memoryview-but-not" view type
> could also take any bytes-like object as input, similar to memoryview
> itself.

Something like:

    class chunks:
        def __init__(self, seq, size):
            self._seq = seq
            self._size = size
        def __len__(self):
            return len(self._seq) // self._size
        def __getitem__(self, i):
            start = i * self._size
            chunk = self._seq[start: start + self._size]
            if len(chunk) != self._size:
                raise IndexError
            return chunk

(but needs more checks and slices support).  It would be useful for
general sequences too.
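A quick check of the view idea, with a standalone variant of the sketch
above (``chunked`` is an arbitrary name; note the index must be scaled
by the chunk size, and negative indexing is handled explicitly):

```python
class chunked:
    """Minimal read-only view presenting a sequence in fixed-size chunks."""
    def __init__(self, seq, size):
        self._seq = seq
        self._size = size

    def __len__(self):
        return len(self._seq) // self._size

    def __getitem__(self, i):
        if i < 0:                     # support negative indexing
            i += len(self)
        if i < 0:
            raise IndexError(i)
        start = i * self._size
        chunk = self._seq[start:start + self._size]
        if len(chunk) != self._size:  # past the end, or a trailing partial chunk
            raise IndexError(i)
        return chunk

view = chunked(b"abcdef", 2)
print(list(view))   # [b'ab', b'cd', b'ef'] (iteration falls back to __getitem__)
print(view[-1])     # b'ef'
```

Because it only needs ``__len__`` and ``__getitem__`` on the underlying
object, the same view works for ``bytes``, ``bytearray``, ``memoryview``,
lists, and other sequences alike.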
From pmiscml at gmail.com Wed Jun 8 06:37:37 2016
From: pmiscml at gmail.com (Paul Sokolovsky)
Date: Wed, 8 Jun 2016 13:37:37 +0300
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray,
 and memoryview
In-Reply-To: 
References: <57572E5D.4020101@stoneleaf.us>
Message-ID: <20160608133737.63e6c666@x230>

Hello,

On Wed, 8 Jun 2016 11:53:06 +0300
Serhiy Storchaka wrote:

> On 08.06.16 11:04, Victor Stinner wrote:
> >> Currently, the ``bytes`` and ``bytearray`` constructors accept an
> >> integer argument and interpret it as meaning to create a
> >> zero-initialised sequence of the given size::
> >> (...)
> >> This PEP proposes to deprecate that behaviour in Python 3.6, and
> >> remove it entirely in Python 3.7.
> >
> > I'm opposed to this change (presented like that). Please stop
> > breaking the backward compatibility in minor versions.
>
> The argument for deprecating bytes(n) is that this has different
> meaning in Python 2,

That's an artifact (as in: defect) of "bytes" (apparently) being a flat
alias of "str" in Python2, without trying to validate its arguments. It
would be sad if thinkos in the Python2 implementation dictate how
Python3 should work.

It's not too late to fix it in Python2 by issuing a CVE along the lines
of "Lack of argument validation in Python2 bytes() constructor may lead
to insecure code."

> and when backport a code to Python 2 or write
> 2+3 compatible code there is a risk to make a mistake. This argument
> is not applicable to bytearray(n).
>
> > *If* you still want to deprecate bytes(n), you must introduce a
> > helper working on *all* Python versions. Obviously, the helper must
> > be available and work for Python 2.7. Maybe it can be the six
> > module. Maybe something else.
>
> The obvious way to create the bytes object of length n is b'\0' * n.

That's very inefficient: it requires allocating a useless b'\0', then a
generic function to repeat arbitrary memory block N times.
If there's a talk of Python to not be laughed at for being SLOW, there would rather be efficient ways to deal with blocks of binary data. > It works in all Python versions starting from 2.6. I don't see the > need in bytes(n) and bytes.zeros(n). There are no special methods for > creating a list or a string of size n. So, above, unless you specifically mean having bytearray.zero() and not having bytes.zero(). But then the whole purpose of the presented PEP is make API more, not less consistent. Having random gaps in bytes vs bytearray API isn't going to help anyone. -- Best regards, Paul mailto:pmiscml at gmail.com From storchaka at gmail.com Wed Jun 8 07:05:19 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 8 Jun 2016 14:05:19 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <20160608133737.63e6c666@x230> References: <57572E5D.4020101@stoneleaf.us> <20160608133737.63e6c666@x230> Message-ID: On 08.06.16 13:37, Paul Sokolovsky wrote: >> The obvious way to create the bytes object of length n is b'\0' * n. > > That's very inefficient: it requires allocating useless b'\0', then a > generic function to repeat arbitrary memory block N times. If there's a > talk of Python to not be laughed at for being SLOW, there would rather > be efficient ways to deal with blocks of binary data. Do you have any evidences for this claim? 
$ ./python -m timeit -s 'n = 10000' -- 'bytes(n)' 1000000 loops, best of 3: 1.32 usec per loop $ ./python -m timeit -s 'n = 10000' -- 'b"\0" * n' 1000000 loops, best of 3: 0.858 usec per loop From pmiscml at gmail.com Wed Jun 8 07:26:45 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 8 Jun 2016 14:26:45 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: References: <57572E5D.4020101@stoneleaf.us> <20160608133737.63e6c666@x230> Message-ID: <20160608142645.71884fc1@x230> Hello, On Wed, 8 Jun 2016 14:05:19 +0300 Serhiy Storchaka wrote: > On 08.06.16 13:37, Paul Sokolovsky wrote: > >> The obvious way to create the bytes object of length n is b'\0' * > >> n. > > > > That's very inefficient: it requires allocating useless b'\0', then > > a generic function to repeat arbitrary memory block N times. If > > there's a talk of Python to not be laughed at for being SLOW, there > > would rather be efficient ways to deal with blocks of binary data. > > Do you have any evidences for this claim? Yes, it's written above, let me repeat it: bytes(n) is (can be) calloc(1, n) underlyingly, while b"\0" * n is a more complex algorithm. 
> > $ ./python -m timeit -s 'n = 10000' -- 'bytes(n)' > 1000000 loops, best of 3: 1.32 usec per loop > $ ./python -m timeit -s 'n = 10000' -- 'b"\0" * n' > 1000000 loops, best of 3: 0.858 usec per loop I don't know how inefficient CPython's bytes(n) or how efficient repetition (maybe 1-byte repetitions are optimized into memset()?), but MicroPython (where bytes(n) is truly calloc(n)) gives expected results: $ ./run-bench-tests bench/bytealloc* bench/bytealloc: 3.333s (+00.00%) bench/bytealloc-1-bytes_n.py 11.244s (+237.35%) bench/bytealloc-2-repeat.py -- Best regards, Paul mailto:pmiscml at gmail.com From storchaka at gmail.com Wed Jun 8 07:45:22 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 8 Jun 2016 14:45:22 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: <20160608142645.71884fc1@x230> References: <57572E5D.4020101@stoneleaf.us> <20160608133737.63e6c666@x230> <20160608142645.71884fc1@x230> Message-ID: On 08.06.16 14:26, Paul Sokolovsky wrote: > On Wed, 8 Jun 2016 14:05:19 +0300 > Serhiy Storchaka wrote: > >> On 08.06.16 13:37, Paul Sokolovsky wrote: >>>> The obvious way to create the bytes object of length n is b'\0' * >>>> n. >>> >>> That's very inefficient: it requires allocating useless b'\0', then >>> a generic function to repeat arbitrary memory block N times. If >>> there's a talk of Python to not be laughed at for being SLOW, there >>> would rather be efficient ways to deal with blocks of binary data. >> >> Do you have any evidences for this claim? > > Yes, it's written above, let me repeat it: bytes(n) is (can be) > calloc(1, n) underlyingly, while b"\0" * n is a more complex algorithm. 
> >> >> $ ./python -m timeit -s 'n = 10000' -- 'bytes(n)' >> 1000000 loops, best of 3: 1.32 usec per loop >> $ ./python -m timeit -s 'n = 10000' -- 'b"\0" * n' >> 1000000 loops, best of 3: 0.858 usec per loop > > I don't know how inefficient CPython's bytes(n) or how efficient > repetition (maybe 1-byte repetitions are optimized into memset()?), but > MicroPython (where bytes(n) is truly calloc(n)) gives expected results: > > $ ./run-bench-tests bench/bytealloc* > bench/bytealloc: > 3.333s (+00.00%) bench/bytealloc-1-bytes_n.py > 11.244s (+237.35%) bench/bytealloc-2-repeat.py If the performance of creating an immutable array of n zero bytes is important in MicroPython, it is worth to optimize b"\0" * n. For now CPython is the main implementation of Python 3 and bytes(n) is slower than b"\0" * n in CPython. From pmiscml at gmail.com Wed Jun 8 08:11:47 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 8 Jun 2016 15:11:47 +0300 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: References: <57572E5D.4020101@stoneleaf.us> <20160608133737.63e6c666@x230> <20160608142645.71884fc1@x230> Message-ID: <20160608151147.54531ccc@x230> Hello, On Wed, 8 Jun 2016 14:45:22 +0300 Serhiy Storchaka wrote: [] > > $ ./run-bench-tests bench/bytealloc* > > bench/bytealloc: > > 3.333s (+00.00%) bench/bytealloc-1-bytes_n.py > > 11.244s (+237.35%) bench/bytealloc-2-repeat.py > > If the performance of creating an immutable array of n zero bytes is > important in MicroPython, it is worth to optimize b"\0" * n. No matter how you optimize calloc + something, it's always slower than just calloc. > For now CPython is the main implementation of Python 3 Indeed, and it already has bytes(N). So, perhaps nothing should be done about it except leaving it alone. Perhaps, more discussion should go into whether there's need for .iterbytes() if there's [i:i+1] already. 
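The distinction behind this point: indexing a bytes object yields an int, while slicing yields a length-1 bytes object, so [i:i+1] is today's spelling of what an iterbytes() would provide. (iterbytes() is only a PEP 467 proposal here, not an existing API, so the sketch below defines a stand-in function of that name.)

```python
data = b"abc"

print(data[0])      # indexing yields an int: 97
print(data[0:1])    # slicing yields a length-1 bytes object: b'a'


def iterbytes(b):
    """Stand-in for a PEP 467-style iterbytes(), via the [i:i+1] idiom."""
    for i in range(len(b)):
        yield b[i:i+1]


print(list(iterbytes(data)))   # -> [b'a', b'b', b'c']
```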
(I personally skip that, as I find [i:i+1] perfectly ok, and while I can't understand how people may be not ok with it up to wanting something more, I leave such possibility). > and bytes(n) > is slower than b"\0" * n in CPython. -- Best regards, Paul mailto:pmiscml at gmail.com From ericsnowcurrently at gmail.com Wed Jun 8 10:17:28 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 8 Jun 2016 07:17:28 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Wed, Jun 8, 2016 at 12:07 AM, Franklin? Lee wrote: > On Jun 7, 2016 8:52 PM, "Eric Snow" wrote: >> * the default class *definition* namespace is now ``OrderdDict`` >> * the order in which class attributes are defined is preserved in the > > By using an OrderedDict, names are ordered by first definition point, rather > than location of the used definition. > > For example, the definition order of the following will be "x, y", even > though the definitions actually bound to the name are in order "y, x". > class C: > x = 0 > def y(self): return 'y' > def x(self): return 'x' > > Is that okay? In practice that will seldom be an issue. In the few cases where it could possibly be a problem, the class may explicitly set __definition_order__. -eric From barry at python.org Wed Jun 8 10:25:54 2016 From: barry at python.org (Barry Warsaw) Date: Wed, 8 Jun 2016 10:25:54 -0400 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: References: <57572E5D.4020101@stoneleaf.us> Message-ID: <20160608102554.120a5b2b.barry@wooz.org> On Jun 08, 2016, at 02:01 AM, Martin Panter wrote: >Bytes.byte() is a great idea. But what?s the point or use case of >bytearray.byte(), a mutable array of one pre-defined byte? I like Bytes.byte() too. I would guess you'd want the same method on bytearray for duck typing APIs. 
-Barry From ericsnowcurrently at gmail.com Wed Jun 8 10:26:29 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 8 Jun 2016 07:26:29 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Wed, Jun 8, 2016 at 1:07 AM, Victor Stinner wrote: >> Abstract >> ======== >> >> This PEP changes the default class definition namespace to ``OrderedDict``. >> Furthermore, the order in which the attributes are defined in each class >> body will now be preserved in ``type.__definition_order__``. This allows >> introspection of the original definition order, e.g. by class decorators. >> >> Note: just to be clear, this PEP is *not* about changing ``__dict__`` for >> classes to ``OrderedDict``. > > What is the cost in term of performance? Do you mean the cost of the PEP? The extra cost is negligible: creating an OrderedDict + mutation operations on it. Note that it is only used during class definition (execution of the class body). > > What can be slower: define a new class and/or instanciate a class? By "instantiate" do you mean the equivalent of "type(...)" or do you mean creating a new instance of a class? As noted above, the impact of using OrderedDict during class definition is negligible. During definition the cost of other operations will usually dwarf any extra overhead from using an OrderedDict. -eric From steve at pearwood.info Wed Jun 8 10:49:47 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 9 Jun 2016 00:49:47 +1000 Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview In-Reply-To: References: <57572E5D.4020101@stoneleaf.us> Message-ID: <20160608144947.GK12028@ando.pearwood.info> On Wed, Jun 08, 2016 at 10:04:08AM +0200, Victor Stinner wrote: > It's common that users complain that Python core developers like > breaking the compatibility at each release. 
No more common than users complaining that Python features are badly
designed and crufty and should be fixed.

Whatever we do, we can't win. If we fix misfeatures, people complain. If
we don't fix them, people complain. Sometimes the same people, depending
on their specific needs. "Fix this, because it annoys me, but don't fix
that, because I'm used to it and it doesn't annoy me any more."

*shrug*

Ultimately it comes down to a subjective feeling as to which is worse. My
own subjective feeling is that, in the long run, we'll be better off
fixing bytes than keeping it, and the longer we wait to fix it, the
harder it will be.

--
Steve

From leewangzhong+python at gmail.com  Wed Jun  8 16:42:25 2016
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Wed, 8 Jun 2016 16:42:25 -0400
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray,
 and memoryview
In-Reply-To: <20160608151147.54531ccc@x230>
References: <57572E5D.4020101@stoneleaf.us>
 <20160608133737.63e6c666@x230> <20160608142645.71884fc1@x230>
 <20160608151147.54531ccc@x230>
Message-ID:

On Jun 8, 2016 8:13 AM, "Paul Sokolovsky" wrote:
>
> Hello,
>
> On Wed, 8 Jun 2016 14:45:22 +0300
> Serhiy Storchaka wrote:
>
> []
>
> > > $ ./run-bench-tests bench/bytealloc*
> > > bench/bytealloc:
> > > 3.333s (+00.00%) bench/bytealloc-1-bytes_n.py
> > > 11.244s (+237.35%) bench/bytealloc-2-repeat.py
> >
> > If the performance of creating an immutable array of n zero bytes is
> > important in MicroPython, it is worth to optimize b"\0" * n.
>
> No matter how you optimize calloc + something, it's always slower than
> just calloc.

`bytes(n)` *is* calloc + something. It's a lookup of, and call to, a
global function. (Unless MicroPython optimizes away lookups for builtins,
in which case it can theoretically optimize b"\0".__mul__.)

On the other hand, b"\0" is a constant, and * is an operator lookup that
succeeds on the first argument (meaning, perhaps, a successful branch
prediction).
As a constant, it is only created once, so there's no intermediate object created. AFAICT, the first requires optimizing global function lookups + calls, and the second requires optimizing lookup and *successful* application of __mul__ (versus failure + fallback to some __rmul__), and repetitions of a particular `bytes` object (which can be interned and checked against). That means there is room for either to win, depending on the efforts of the implementers. (However, `bytearray` has no syntax for literals (and therefore easy constants), and is a more valid and, AFAIK, more practical concern.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil at python.ca Wed Jun 8 17:01:33 2016 From: neil at python.ca (Neil Schemenauer) Date: Wed, 8 Jun 2016 14:01:33 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 Message-ID: <20160608210133.GA4318@python.ca> [I've posted something about this on python-ideas but since I now have some basic working code, I think it is more than an idea.] I think the uptake of Python 3 is starting to accelerate. That's good. However, there are still millions or maybe billions of lines of Python code that still needs to be ported. It is beneficial to the Python ecosystem if this code can get ported. My idea is to make a stepping stone version of Python, between 2.7.x and 3.x that eases the porting job. The high level goals are: - code coming out of 2to3 runs correctly on this modified Python - code that runs without warnings on this modified Python will run correctly on Python 3.x. Achieving these goals is not technically possible. Still, I want to reduce as much as possible the manual work involved in porting. Incrementally fixing code that generates warnings is a lot easier than trying to fix an entire application or library at once. 
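For a flavor of the kind of incompatibility such warnings could point at, here is one classic 2-vs-3 behavior difference. (The idea that a stepping-stone interpreter would warn at exactly this spot is only an illustration, not a description of the tool above.)

```python
# Python 2: d.keys() returns a list; Python 3: a dynamic view object.
# Code relying on list-only behavior is the kind of thing a runtime
# warning could flag during incremental porting.
d = {"b": 2, "a": 1}

ks = d.keys()
try:
    ks.sort()          # works on Python 2, AttributeError on Python 3
except AttributeError:
    ks = sorted(d)     # the spelling that works on both versions

print(ks)              # -> ['a', 'b']
```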
I have a very early version on github:

  https://github.com/nascheme/ppython

I'm hoping if people find it useful then they would contribute backwards
compatibility fixes that help their applications or libraries run. I am
currently running a newly 2to3-ported application on it. At this time
there is no warning generated, but I would rather get a warning than have
one of my customers run into a porting bug.

To be clear, I'm not proposing that these backwards compatibility
features go into Python 3.x or that this modified Python becomes the
standard version. It is purely an intermediate step in getting code
ported to Python 3.

I've temporarily named it "Pragmatic Python". I'd like a better name if
someone can suggest one. Maybe something like Perverted, Debauched or
Impure Python.

Regards,

  Neil

From rymg19 at gmail.com  Wed Jun  8 17:33:22 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 8 Jun 2016 16:33:22 -0500
Subject: [Python-Dev] Smoothing the transition from Python 2 to 3
In-Reply-To: <20160608210133.GA4318@python.ca>
References: <20160608210133.GA4318@python.ca>
Message-ID:

On Jun 8, 2016 4:04 PM, "Neil Schemenauer" wrote:
>
> [I've posted something about this on python-ideas but since I now
> have some basic working code, I think it is more than an idea.]
>
> I think the uptake of Python 3 is starting to accelerate. That's
> good. However, there are still millions or maybe billions of lines
> of Python code that still needs to be ported. It is beneficial to
> the Python ecosystem if this code can get ported.
>
> My idea is to make a stepping stone version of Python, between 2.7.x
> and 3.x that eases the porting job. The high level goals are:
>
> - code coming out of 2to3 runs correctly on this modified Python
>
> - code that runs without warnings on this modified Python will run
> correctly on Python 3.x.
>
> Achieving these goals is not technically possible. Still, I want to
> reduce as much as possible the manual work involved in porting.
> Incrementally fixing code that generates warnings is a lot easier > than trying to fix an entire application or library at once. > > I have a very early version on github: > > https://github.com/nascheme/ppython > > I'm hoping if people find it useful then they would contribute > backwards compatibility fixes that help their applications or > librarys run. I am currently running a newly 2to3 ported > application on it. At this time there is no warning generated but I > would rather get a warning then have one of my customers run into a > porting bug. > > To be clear, I'm not proposing that these backwards compatiblity > features go into Python 3.x or that this modified Python becomes the > standard version. It is purely an intermediate step in getting code > ported to Python 3. > > I've temporarily named it "Pragmatic Python". I'd like a better > name if someone can suggest one. Maybe something like Perverted, > Debauched or Impure Python. > ...Perverted Python? Ouch. What about something like "unpythonic" or similar? > Regards, > > Neil > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fred at fdrake.net Wed Jun 8 17:40:39 2016 From: fred at fdrake.net (Fred Drake) Date: Wed, 8 Jun 2016 17:40:39 -0400 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> Message-ID: On Wed, Jun 8, 2016 at 5:33 PM, Ryan Gonzalez wrote: > What about something like "unpythonic" or similar? Or perhaps... antipythy? -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." 
--Albert Einstein From greg.ewing at canterbury.ac.nz Wed Jun 8 18:08:50 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 09 Jun 2016 10:08:50 +1200 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> Message-ID: <57589772.2010707@canterbury.ac.nz> > On Jun 8, 2016 4:04 PM, "Neil Schemenauer" > wrote: > > > > I've temporarily named it "Pragmatic Python". I'd like a better > > name if someone can suggest one. Maybe something like Perverted, > > Debauched or Impure Python. Python Two and Three Quarters. -- Greg From phd at phdru.name Wed Jun 8 18:13:47 2016 From: phd at phdru.name (Oleg Broytman) Date: Thu, 9 Jun 2016 00:13:47 +0200 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: <57589772.2010707@canterbury.ac.nz> References: <20160608210133.GA4318@python.ca> <57589772.2010707@canterbury.ac.nz> Message-ID: <20160608221347.GA5854@phdru.name> On Thu, Jun 09, 2016 at 10:08:50AM +1200, Greg Ewing wrote: > >On Jun 8, 2016 4:04 PM, "Neil Schemenauer" >> wrote: > > > > > > I've temporarily named it "Pragmatic Python". I'd like a better > > > name if someone can suggest one. Maybe something like Perverted, > > > Debauched or Impure Python. > > Python Two and Three Quarters. QOTW! :-D > -- > Greg Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From tjreedy at udel.edu Wed Jun 8 18:22:22 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 8 Jun 2016 18:22:22 -0400 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On 6/8/2016 4:07 AM, Victor Stinner wrote: >> Abstract >> ======== >> >> This PEP changes the default class definition namespace to ``OrderedDict``. >> Furthermore, the order in which the attributes are defined in each class >> body will now be preserved in ``type.__definition_order__``. 
This allows >> introspection of the original definition order, e.g. by class decorators. >> >> Note: just to be clear, this PEP is *not* about changing ``__dict__`` for >> classes to ``OrderedDict``. > > What is the cost in term of performance? > > What can be slower: define a new class and/or instanciate a class? A class is defined once, used many times to instantiate instances. Each instance is typically used many times, with many lookups. So it is self.class_attribute lookups, like method lookups, that likely matter the most, and which are not changed by the PEP. -- Terry Jan Reedy From jake at lwn.net Wed Jun 8 19:23:58 2016 From: jake at lwn.net (Jake Edge) Date: Wed, 8 Jun 2016 17:23:58 -0600 Subject: [Python-Dev] Round 2 of the Python Language Summit coverage at LWN Message-ID: <20160608172358.146ae6c6@redtail.lan> Howdy python-dev, The second batch of articles from the Python Language Summit is now available. The starting point is here: https://lwn.net/Articles/688969/ (or here for non-subscribers: https://lwn.net/SubscriberLink/688969/91cbeeaf32807914/ for the next few hours anyway, it will be open to all after that using either link.) I have added five more sessions since last week's three, still six more to go, which should all be done by next week (and I'll post here again). 
Python's GitHub migration and workflow changes: https://lwn.net/Articles/689937/ https://lwn.net/SubscriberLink/689937/1fd56367a74206bf/ The state of mypy: https://lwn.net/Articles/690081/ https://lwn.net/SubscriberLink/690081/5c35679cafe42d1b/ An introduction to pytype: https://lwn.net/Articles/690150/ https://lwn.net/SubscriberLink/690150/660acde532afb8a3/ PyCharm and type hints: https://lwn.net/Articles/690186/ https://lwn.net/SubscriberLink/690186/848c447551204ffe/ Python 3.6 and 3.7 release cycles: https://lwn.net/Articles/690404/ https://lwn.net/SubscriberLink/690404/73cfb918fa21d27c/ The articles will be freely available (without using the SubscriberLink) to the world at large in a week (and the next batch the week after that) ... until then, feel free to share the SubscriberLinks. Hopefully I have captured things reasonably well. If there are corrections or clarifications needed, though, I recommend posting them as comments on the article. enjoy! jake -- Jake Edge - LWN - jake at lwn.net - http://lwn.net From ben+python at benfinney.id.au Wed Jun 8 19:55:50 2016 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 09 Jun 2016 09:55:50 +1000 Subject: [Python-Dev] Round 2 of the Python Language Summit coverage at LWN References: <20160608172358.146ae6c6@redtail.lan> Message-ID: <8560tj1bi1.fsf@benfinney.id.au> Jake Edge writes: > The second batch of articles from the Python Language Summit is now > available. Thank you for writing these (and many other good articles) for Linux Weekly News! High-quality, dependable reporting is very valuable for our community. -- \ ?To punish me for my contempt of authority, Fate has made me an | `\ authority myself.? 
?Albert Einstein, 1930-09-18 | _o__) | Ben Finney From victor.stinner at gmail.com Wed Jun 8 21:11:10 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 9 Jun 2016 03:11:10 +0200 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: <20160608210133.GA4318@python.ca> References: <20160608210133.GA4318@python.ca> Message-ID: 2016-06-08 23:01 GMT+02:00 Neil Schemenauer : > - code coming out of 2to3 runs correctly on this modified Python Stop using 2to3. This tool adds many useless changes when you only care of Python 2.7 and Python 3.4+. I suggest to use better tools like 2to6, modernize or my own tool: https://pypi.python.org/pypi/sixer "Add Python 3 support to Python 2 applications using the six module." Victor From guido at python.org Wed Jun 8 22:35:04 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Jun 2016 19:35:04 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> Message-ID: Or write your own set of 2to3 fixers that *are* necessary. On Wed, Jun 8, 2016 at 6:11 PM, Victor Stinner wrote: > 2016-06-08 23:01 GMT+02:00 Neil Schemenauer : > > - code coming out of 2to3 runs correctly on this modified Python > > Stop using 2to3. This tool adds many useless changes when you only > care of Python 2.7 and Python 3.4+. I suggest to use better tools like > 2to6, modernize or my own tool: > https://pypi.python.org/pypi/sixer > > "Add Python 3 support to Python 2 applications using the six module." > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From larry at hastings.org Thu Jun 9 07:25:04 2016 From: larry at hastings.org (Larry Hastings) Date: Thu, 9 Jun 2016 04:25:04 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? Message-ID: <57595210.4000508@hastings.org> A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix". As 3.5 Release Manager, I can put my foot down and make rulings, and AFAIK the only way to overrule me is with the BDFL. In two of three cases I've put my foot down. In the third I'm pretty sure I'm right, but IIUC literally everyone with a stated opinion else disagrees with me. So I thought it best I escalate it. Note that 3.5.2 is going to wait until the issue is settled and any changes to behavior are written and checked in. (Blanket disclaimer for the below: in some places I'm trying to communicate other's people positions. I apologize if I misrepresented yours; please reply and correct my mistake. Also, sorry for the length of this email. But feel even sorrier for me: this debate has already eaten two days this week.) BACKGROUND For 3.5 os.urandom() was changed: instead of reading from /dev/urandom, it uses the new system call getrandom() where available. This is a new system call on Linux (which has already been cloned by Solaris). getrandom(), as CPython uses it, reads from the same PRNG that /dev/urandom gets its bits from. But because it's a system call you don't have to mess around with file handles. Also it always works in chrooted environments. Sounds like a fine idea. Also for 3.5, several other places where CPython internally needs random bits were switched from reading from /dev/urandom to calling getrandom(). The two that I know of: choosing the seed for hash randomization, and initializing the default Mersenne Twister for the random module. 
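For concreteness, here are the two ways of asking the OS for random bytes that this discussion contrasts. (A sketch: the blocking difference only appears at early boot on a 3.5.0/3.5.1 Linux build, so on a warmed-up machine both calls below return immediately; the device-file path is POSIX-only.)

```python
import os

# Path 1: os.urandom() -- on Python 3.5.0/3.5.1 under Linux >= 3.17 this is
# implemented with the getrandom() syscall, which can block until the
# kernel entropy pool is initialized.
key = os.urandom(16)

# Path 2: reading the device file directly -- /dev/urandom is documented to
# never block, at the price of possibly predictable bytes right after boot.
if os.path.exists("/dev/urandom"):       # not present on Windows
    with open("/dev/urandom", "rb") as f:
        key2 = f.read(16)
        assert len(key2) == 16

assert len(key) == 16
```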
There's one subtle but important difference between /dev/urandom and getrandom(). At startup, Linux seeds the urandom PRNG from the entropy pool. If the entropy pool is uninitialized, what happens? CPython's calls to getrandom() will block until the entropy pool is initialized, which is usually just a few seconds (or less) after startup. But /dev/urandom *guarantees* that reads will *always* work. If the entropy pool hasn't been initialized, it pulls numbers from the PRNG before it's been properly seeded. What this results in depends on various aspects of the configuration (do you have ECC RAM? how long was the machine powered down? does the system have a correct realtime clock?). In extreme circumstances this may mean the "random" numbers are shockingly predictable! Under normal circumstances this minor difference is irrelevant. After all, when would the entropy pool ever be uninitialized? THE PROBLEM Issue #26839: http://bugs.python.org/issue26839 (warning, the issue is now astonishingly long, and exhausting to read, and various bits of it are factually wrong) A user reports that when starting CPython soon after startup on a fresh virtual machine, the process would hang for a long time. Someone on the issue reported observed delays of over 90 seconds. Later we found out: it wasn't 90 seconds before CPython became usable, these 90 seconds delays were before systemd timed out and simply killed the process. It's not clear what the upper bound on the delay might be. The issue author had already identified the cause: CPython was blocking on getrandom() in order to initialize hash randomization. On this fresh virtual machine the entropy pool started out uninitialized. And since the only thing running on the machine was CPython, and since CPython was blocked on initialization, the entropy pool was initializing very, very slowly. Other posters to the thread pointed out that the same thing would happen in "import random", if your code could get that far. 
The constructor for the Random() object would seed the Mersenne Twister, which would call getrandom() and block. Naturally, callers to os.urandom() could also block for an unbounded period for the same reason. MY RULINGS SO FAR 1) The change in 3.5 that means "import random" may block for an unbounded period of time on Linux due to the switch to getrandom() must be backed out or amended so that it never blocks. I *think* everyone agrees with this. The Mersenne Twister is not a CPRNG, so seeding it with crypto-quality bits isn't necessary. And unbounded delays are bad. 2) The change in 3.5 that means hash randomization initialization may block for an unbounded period of time on Linux due to the switch to getrandom() must be backed out or amended so that it never blocks. I believe most people agree with me. The cryptography experts disagree. IIUC both Alex Gaynor and Christian Heimes feel the blocking is preferable to non-random hash "randomization". Yes, the bad random data means the hashing will be predictable. Neither choice is exactly what you want. But most people feel it's simply unreasonable that in extreme corner cases CPython can block for an unbounded amount of time before running user code. OS.URANDOM() Here's where it gets complicated--and where everyone else thinks I'm wrong. os.urandom() is currently the best place for a Python programmer to get high-quality random bits. The one-line summary for os.urandom() reads: "Return a string of n random bytes suitable for cryptographic use." On 3.4 and before, on Linux, os.urandom() would never block, but if the entropy pool was uninitialized it could return very-very-poor-quality random bits. On 3.5.0 and 3.5.1, on Linux, when using the getrandom() call, it will instead block for an apparently unbounded period before returning high-quality random bits. The question: is this new behavior preferable, or should we return to the old behavior? 
Since I'm the one writing this email, let me make the case for my position: I think that os.urandom() should never block on Linux. Why? 1) Functions in the os module that look like OS functions should behave predictably like thin wrappers over those OS functions. Most of the time this is exactly what they are. In some cases they're more sophisticated; examples include os.popen(), os.scandir(), and the byzantine os.utime(). There are also some functions provided by the os module that don't resemble any native functionality, but these have unique names that don't look like anything provided by the OS. This makes the behavior of the Python function easy to reason about: it always behaves like your local OS function. Python provides os.stat() and it behaves like the local stat(). So if you want to know how any os module function behaves, just read your local man page. Therefore, os.urandom() should behave exactly like a thin shell around reading the local /dev/urandom. On Linux, /dev/urandom guarantees that it will never block. This means it has undesirable behavior if read immediately after a fresh boot. But this guarantee is so strong that Theodore Ts'o couldn't break it to fix the undesirable behavior. Instead he added the getrandom() system call. But he left /dev/urandom alone. Therefore, on Linux, os.urandom() should behave the same way, and also never block. 2) It's unfair to change the semantics of a well-established function to such a radical degree. os.urandom() has been in Python since at least 2.6--I was too lazy to go back any further. From 2.6 to 3.4, it behaved exactly like /dev/urandom, which meant that on Linux it would never block. As of 3.5, on Linux, it might now block for an unbounded period of time. Any code that calls os.urandom() has had its behavior radically changed in this extreme corner case. 3) os.urandom() doesn't actually guarantee it's suitable for cryptography. 
The documentation for os.urandom() has contained this sentence, untouched, since 2.6: The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom(). Of course, version 3.5 added this: On Linux 3.17 and newer, the getrandom() syscall is now used when available. But the waffling about its suitability for cryptography remains unchanged. So, while it's undesirable that os.urandom() might return shockingly poor quality random bits, it is *permissible* according to the documentation. 4) This really is a rare corner-case we're talking about. I just want to re-state: this case on Linux where /dev/urandom returns totally predictable bytes, and getrandom() will block, only happens when the entropy pool for urandom is uninitialized. Although it has been seen in the field, it's extremely rare. 99.99999%+ of the time, reading /dev/urandom and calling getrandom() will both return the exact same high-quality random bits without blocking. 5) This corner-case behavior is fixable externally to CPython. I don't really understand the subject, but apparently it's entirely reasonable to expect sysadmins to directly manage the entropy pools of virtual machines. They should be able to spin up their VMs with a pre-filled entropy pool. So it should be possible to ensure that os.urandom() always returns the high-quality random bits we wanted, even on freshly-booted VMs. 6) Guido and Tim Peters already decided once that os.urandom() should behave like /dev/urandom. Issue #25003: http://bugs.python.org/issue25003 In 2.7.10, os.urandom() was changed to call getentropy() instead of reading /dev/urandom when getentropy() was available. getentropy() was "stunningly slow" on Solaris, on the order of 300x slower than reading /dev/urandom. 
Guido and Tim both participated in the discussion on the issue; Guido also apparently discussed it via email with Theo de Raadt. While it's not quite apples-to-apples, I think this establishes some precedent that os.urandom() should * behave like /dev/urandom, and * be fast. -- On the other side is... everybody else. I've already spent an enormous amount of time researching and writing and re-writing this email. Rather than try (and fail) to accurately present the other sides of this debate, I'm just going to end the email here and let the other participants reply and voice their views. Bottom line: Guido, in this extreme corner case on Linux, should os.urandom() return bad random data like it used to, or should it block forever like it does in 3.5.0 and 3.5.1? //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Jun 9 07:35:38 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 9 Jun 2016 13:35:38 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <57595210.4000508@hastings.org> References: <57595210.4000508@hastings.org> Message-ID: I understood that Christian Heimes and/or Donald Stufft are interested in working on a PEP. 2016-06-09 13:25 GMT+02:00 Larry Hastings : > A problem has surfaced just this week in 3.5.1. Obviously this is a good > time to fix it for 3.5.2. But there's a big argument over what is "broken" > and what is an appropriate "fix". IMHO the bug is now fixed in 3.5.2 as I explained at: http://haypo-notes.readthedocs.io/pep_random.html#status-of-python-3-5-2 > THE PROBLEM > > Issue #26839: > > http://bugs.python.org/issue26839 > > (warning, the issue is now astonishingly long, and exhausting to read, and > various bits of it are factually wrong) You may want to read my summary: http://haypo-notes.readthedocs.io/pep_random.html I'm not interested in replying to Larry's email point by point.
IMHO a formal PEP is now required for Python 3.6 (to enhance os.urandom and clarify Python behaviour before urandom is initialized). Python 3.5.2 is fixed, there is no more urgency ;-) Victor From christian at python.org Thu Jun 9 07:54:38 2016 From: christian at python.org (Christian Heimes) Date: Thu, 9 Jun 2016 13:54:38 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <57595210.4000508@hastings.org> References: <57595210.4000508@hastings.org> Message-ID: On 2016-06-09 13:25, Larry Hastings wrote: > > A problem has surfaced just this week in 3.5.1. Obviously this is a > good time to fix it for 3.5.2. But there's a big argument over what is > "broken" and what is an appropriate "fix". > > As 3.5 Release Manager, I can put my foot down and make rulings, and > AFAIK the only way to overrule me is with the BDFL. In two of three > cases I've put my foot down. In the third I'm pretty sure I'm right, > but IIUC literally everyone with a stated opinion else disagrees with > me. So I thought it best I escalate it. Note that 3.5.2 is going to > wait until the issue is settled and any changes to behavior are written > and checked in. > > (Blanket disclaimer for the below: in some places I'm trying to > communicate other's people positions. I apologize if I misrepresented > yours; please reply and correct my mistake. Also, sorry for the length > of this email. But feel even sorrier for me: this debate has already > eaten two days this week.) Thanks for the digest, Larry. I would appreciate it if we could split the issue into three separate problems: 1) behavior of os.urandom() 2) initialization of _Py_HashSecret for byte, str and XML hash randomization. 3) initialization of the default Mersenne-Twister instance in the random module As of now, 2 and 3 are the culprits for the blocking at startup. Both happen to use _PyOS_URandom() either directly or indirectly through os.urandom().
We chose to use the OS random source because it was convenient. It is not a necessity. The seed for Mersenne-Twister and the keys for hash randomization don't have to be strong cryptographic values in all cases. They just have to be hard-to-guess by an attacker. In case of scripts in early boot, there are no viable attack scenarios. Therefore I propose to fix problem 2 and 3: - add a new random_seed member to _Py_HashSecret and use it to derive an initial Mersenne-Twister state for the default random instance of the random module. - try CPRNG for _Py_HashSecret first, fall back to a user space RNG when the Kernel's CPRNG would block. For some operating systems like Windows and OSX, we can assume that Kernel CPRNG is always available. For Linux we can use getrandom() in non-blocking mode and handle EWOULDBLOCK. On BSD the seed state can be queried from /proc. Christian From cory at lukasa.co.uk Thu Jun 9 08:12:22 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 9 Jun 2016 13:12:22 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> Message-ID: <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> > On 9 Jun 2016, at 12:54, Christian Heimes wrote: > > Therefore I propose to fix problem 2 and 3: > > - add a new random_seed member to _Py_HashSecret and use it to derive an > initial Mersenne-Twister state for the default random instance of the > random module. > > - try CPRNG for _Py_HashSecret first, fall back to a user space RNG when > the Kernel's CPRNG would block. > > For some operating systems like Windows and OSX, we can assume that > Kernel CPRNG is always available. For Linux we can use getrandom() in > non-blocking mode and handle EWOULDBLOCK. On BSD the seed state can be > queried from /proc. > I am in agreement with Christian here. 
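Christian's fallback for _Py_HashSecret can be sketched roughly as below. This is an illustration, not CPython's implementation: it uses the os.getrandom() wrapper that only landed later (Python 3.6, Linux 3.17+), and the pid/clock fallback mix is an assumed placeholder for "hard to guess but not cryptographic" seed material:

```python
import os
import time

def hash_secret_seed(nbytes=24):
    # Ask the kernel CSPRNG without blocking. On Linux, GRND_NONBLOCK
    # makes getrandom() raise BlockingIOError instead of waiting when
    # the entropy pool is not yet initialized (early boot).
    if hasattr(os, "getrandom"):
        try:
            return os.getrandom(nbytes, os.GRND_NONBLOCK)
        except BlockingIOError:
            pass  # pool not ready; fall back to a weak user-space seed
    # Weak fallback: merely hard to guess, NOT cryptographically strong.
    weak = (os.getpid().to_bytes(8, "little")
            + time.monotonic_ns().to_bytes(8, "little"))
    return (weak * (nbytes // len(weak) + 1))[:nbytes]
```

On Windows and OS X the hasattr() branch is skipped in this sketch; a real implementation would call the platform CSPRNG directly there, since, as Christian notes, it can be assumed to always be available on those systems.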
Let me add: Larry has suggested that it's ok that os.urandom() can degrade to weak random numbers in part because "os.urandom() doesn't actually guarantee it's suitable for cryptography." That's true, that is what the documentation says. However, that documentation has been emphatically disagreed with by the entire Python ecosystem *including* the Python standard library. Both random.SystemRandom and the secrets module use os.urandom() to generate their random numbers. The secrets module says this right at the top: "The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets." Regressing the behaviour in os.urandom() would mean that this statement is not unequivocally true but only situationally true. It would be more accurate to say "The secrets module should generate cryptographically strong random numbers most of the time". So I'd argue that while os.urandom() does not make these promises, the rest of the standard library behaves like it does. While we're here I should note that the cryptography project unequivocally recommends os.urandom[0], and that this aspect of Linux's /dev/urandom behaviour is considered to be a dangerous misfeature by almost everyone in the crypto community. The Linux kernel can't change this stuff easily because they mustn't break userspace. Python *is* userspace, we can do what we like, and we should be aiming to make sure that doing the obvious thing in Python amounts to doing the *right* thing. *Obviously* this shouldn't block startup, and obviously we should fix that, but I disagree that we should be reverting the change to os.urandom(). Cory [0]: https://cryptography.io/en/latest/random-numbers/
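Cory's point that the standard library treats os.urandom() as the cryptographic source can be made concrete. The sketch below is a simplified model of that layering, not CPython's actual source; the class and function names are made up for illustration, and SystemRandom's 53-bit float construction follows its documented behavior:

```python
import os
import random

class SystemRandomModel(random.Random):
    """Simplified model of random.SystemRandom: every float is derived
    directly from os.urandom(), so its quality is exactly urandom's."""
    def random(self):
        # 56 random bits from urandom, truncated to the 53 bits that
        # fit in a float mantissa, scaled into [0.0, 1.0).
        return (int.from_bytes(os.urandom(7), "big") >> 3) * 2 ** -53

def token_bytes_model(nbytes=32):
    """Simplified model of secrets.token_bytes: a direct passthrough.
    If os.urandom() returns weak bytes, so does the 'secrets' layer."""
    return os.urandom(nbytes)
```

Nothing in either layer adds entropy on top of os.urandom(), which is why regressing os.urandom() regresses everything built on it.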
From donald at stufft.io Thu Jun 9 08:26:20 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 9 Jun 2016 08:26:20 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <57595210.4000508@hastings.org> References: <57595210.4000508@hastings.org> Message-ID: <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> > On Jun 9, 2016, at 7:25 AM, Larry Hastings wrote: > > A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix". Couple clarifications: random.py --------- In the abstract it doesn't hurt to seed MT with a CSPRNG, it just doesn't provide much (if any) benefit, and in this case it is hurting us because of the cost on import (which will exist on other platforms as well no matter what we do here for Linux). There are a couple solutions to this problem: * Use getrandom(GRND_NONBLOCK) for random.Random since it doesn't matter if we get cryptographically secure random numbers or not. * Switch it to use something other than a CSPRNG by default since it doesn't need that. * Instead of seeding itself from os.urandom on import, have it lazily do that the first time one of the random.rand* functions is called. * Do nothing, and say that ``import random`` relies on having the kernel's urandom pool initialized. Between these options, I have a slight preference for switching it to use a non-CSPRNG, but I really don't care that much which of these options we pick. Using random.Random is not secure and none of the above options meaningfully change the security posture of something that accidentally uses it. SipHash and the Interpreter Startup ----------------------------------- I have complicated thoughts on what SipHash should do.
For something like a Django process, we never want it to be initialized with "bad" entropy; however, reading straight from /dev/urandom or getrandom(GRND_NONBLOCK) means that we might get that if we start the process early enough in the boot process. The rub here is that I cannot think of a situation where, by the time you're at the point of starting up something like Django, you're even remotely likely to not have an initialized random pool. The other side of this issue is that we have Python scripts which do not need a secure random being passed to SipHash running early enough in the boot process with systemd that we need to be able to have SipHash initialization not block on waiting for /dev/urandom. So I'm torn between the "Practicality beats Purity" mindset, which says we should just let SipHash seed itself with whatever quality of random from the urandom pool is currently available, and the "Special cases aren't special enough to break the rules" mindset, which says that we should just make it easier for scripts in this edge case to declare they don't care about hash randomization to remove the need for it (in other words, a CLI flag that matches PYTHONHASHSEED in functionality). An additional wrinkle in the mix is that we cannot get non-blocking random on many (any?) modern OS besides Linux, so we're going to run into this same problem if, say, FreeBSD decides to put a Python script early enough in the boot sequence. In the end, both of these choices make me happy and unhappy in different ways, but I would lean towards adding a CLI flag for the special case and letting the systemd script that caused this problem invoke their Python with that flag. I think this because: * It leaves the interpreter so that it is secure by default, but provides the relevant knobs to turn off this default in cases where a user doesn't need or want it.
* It solves the problem in a cross-platform way that doesn't rely on the nuances of the CSPRNG interface on one particular supported platform. os.urandom ---------- There have been a lot of proposals thrown around, and people pointing to different sections of the documentation to justify different opinions. This is easily the most contentious question we have here. It is my belief that reading from urandom is the right thing to do for generating cryptographically secure random numbers. This is a viewpoint held by every major security expert and cryptographer that I'm aware of. Most (all?) major platforms besides Linux do not allow reading from their equivalent of /dev/urandom until it has been successfully initialized, and it is widely held by all security experts and cryptographers that I'm aware of that this property is a good one, and the Linux behavior of /dev/urandom is a wart/footgun but that prior to getrandom() there simply wasn't a better option on Linux. With that in mind, I think that we should, to the best of our ability given the platform we're on, ensure that os.urandom does not return bytes that the OS does not think are cryptographically secure. In practice this means that os.urandom should do one of two things in the very early boot process on Linux: * Block waiting on the kernel to initialize the urandom pool, and then return the now-secure random bytes given to us. * Raise an exception saying that the pool has not been initialized and thus os.urandom is not ready yet. The key point in both of these options is that os.urandom never [1] returns bytes before the OS believes that it can give us cryptographically secure random bytes. I believe I have a preference for blocking while waiting for the kernel to initialize the urandom pool, because that makes Linux behave similarly to the other platforms that I'm aware of. I do not believe that adding additional public functions, as some other people have suggested, is a good option.
I think they muddy the waters and I think that it forces us to try and convince people that "no really, yes everyone says you should use urandom, but you actually want getrandom". Particularly since the outcome of these two functions would be exactly the same in all but a very narrow edge case on Linux. Larry has suggested that functions in os.py should only ever be thin shells around OS-provided functionality, and thus os.urandom should simply mimic whatever the behavior of /dev/urandom is on that OS. For os.urandom in particular this is already not the case, since it calls CryptGenRandom on Windows, but putting that aside since that's a Windows vs POSIX difference, we're not talking about adding a great amount of functionality around something provided by the OS. We're only talking about using a different interface to access the same underlying functionality. In this case, an interface that better suits the actual use of os.urandom in the wild and provides better properties all around. He's also pointed out that the documentation does not guarantee that the result of os.urandom will be cryptographically strong, in the following quote: This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. My read of this quote is that it's a hedge against operating systems that have implemented their urandom pool in such a way that it does not return cryptographically secure random numbers, so that you don't come back and yell at Python for it. In other words, it's a hedge against /dev/urandom being https://xkcd.com/221/. I do not think this documentation excuses us from using a weaker interface to the OS-specific randomness source simply because its name happens to match the name of the function. Particularly since earlier on in that documentation it states: Return a string of n random bytes suitable for cryptographic use.
and the Python standard library, and the entire ecosystem as I know it, as well as all security experts and crypto experts, believe you should treat it as such. This is largely because if your urandom pool is implemented in a way that, in the general case, provides insecure random values, then you're beyond the pale and there's nothing that Python, or anyone but your OS vendor, can do to help you. Furthermore, I think that the behavior I want (that os.urandom is secure by default to the best of our abilities) is trickier to get right, and requires interfacing with C code. However, getting the exact semantics of /dev/urandom on Linux is trivial to do with a single line of Python code: def urandom(amt): return open("/dev/urandom", "rb").read(amt) So if you're someone who is depending on the Linux urandom behavior in an edge case that almost nobody is going to hit, you can trivially get the old behavior back. Even better, if you're someone depending on this, you're going to get an *obvious* failure rather than silently getting insecure bytes. On top of all of that, this only matters in a small edge case, most likely to only ever be hit by OS vendors themselves, who are in the best position to make informed decisions about how to work around the fact that the urandom entropy pool hasn't already been initialized, rather than expecting every other user to have to try and ensure that they don't start their Python script too early. [1] To the best of our ability, given the interfaces and implementation provided to us by the OS. -- Donald Stufft From donald at stufft.io Thu Jun 9 08:32:02 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 9 Jun 2016 08:32:02 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <57595210.4000508@hastings.org> References: <57595210.4000508@hastings.org> Message-ID: <2D2A5F9D-05BD-4E9D-907D-0E862E11821D@stufft.io> > On Jun 9, 2016, at 7:25 AM, Larry Hastings wrote: > > 6) Guido and Tim Peters already decided once that os.urandom() should behave like /dev/urandom. > > Issue #25003: > http://bugs.python.org/issue25003 To be exceedingly clear, in this issue the problem wasn't that os.urandom was blocking once, early on in the boot process, before the kernel had initialized its urandom pool. The problem was that the getentropy() function on Solaris behaves more like /dev/random does on Linux. This behavior is something that I, and most security experts/cryptographers that I know of, think is bad behavior (and indeed, most OSs have gotten rid of this behavior of /dev/random and made /dev/random and /dev/urandom behave the same... except again for Linux). The ask here isn't to make Linux behave like Solaris did in that issue; it's to use the newer, better interface to make Linux use the more secure behavior that most (all?) of the other modern OSs have already adopted. -- Donald Stufft From rdmurray at bitdance.com Thu Jun 9 08:41:01 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Thu, 09 Jun 2016 08:41:01 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> Message-ID: <20160609124102.5EE4EB14024@webabinitio.net> On Thu, 09 Jun 2016 13:12:22 +0100, Cory Benfield wrote: > The Linux kernel can't change this stuff easily because they mustn't > break userspace. Python *is* userspace, we can do what we like, and we I don't have specific input on the rest of this discussion, but I disagree strongly with this statement.
The environment in which python programs run, ie: the python runtime and standard library, are *our* "userspace", and the same constraints apply to our making changes there as apply to the linux kernel and its userspace...even though we knowingly break those constraints from time to time[*]. --David [*] Which I think the twisted folks at least would argue we shouldn't be doing :) From doug at doughellmann.com Thu Jun 9 08:53:51 2016 From: doug at doughellmann.com (Doug Hellmann) Date: Thu, 09 Jun 2016 08:53:51 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160609124102.5EE4EB14024@webabinitio.net> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> Message-ID: <1465476616-sup-8510@lrrr.local> Excerpts from R. David Murray's message of 2016-06-09 08:41:01 -0400: > On Thu, 09 Jun 2016 13:12:22 +0100, Cory Benfield wrote: > > The Linux kernel can't change this stuff easily because they mustn't > > break userspace. Python *is* userspace, we can do what we like, and we > > I don't have specific input on the rest of this discussion, but I disagree > strongly with this statement. The environment in which python programs > run, ie: the python runtime and standard library, are *our* "userspace", > and the same constraints apply to our making changes there as apply > to the linux kernel and its userspace...even though we knowingly break > those constraints from time to time[*]. > > --David > > [*] Which I think the twisted folks at least would argue we shouldn't > be doing :) I agree with David. We shouldn't break existing behavior in a way that might lead to someone else's software being unusable. Adding a new API that does block allows anyone to call that when they want guaranteed random values, and the decision about whether to block or not can be placed in the application developer's hands.
Christian's points about separating the various cases and solutions also make sense. Doug From donald at stufft.io Thu Jun 9 08:59:57 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 9 Jun 2016 08:59:57 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <1465476616-sup-8510@lrrr.local> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> Message-ID: <6E696C6F-5ADB-4F82-AF8D-E1C13CC71BA8@stufft.io> > On Jun 9, 2016, at 8:53 AM, Doug Hellmann wrote: > > Excerpts from R. David Murray's message of 2016-06-09 08:41:01 -0400: >> On Thu, 09 Jun 2016 13:12:22 +0100, Cory Benfield wrote: >>> The Linux kernel can't change this stuff easily because they mustn't >>> break userspace. Python *is* userspace, we can do what we like, and we >> >> I don't have specific input on the rest of this discussion, but I disagree >> strongly with this statement. The environment in which python programs >> run, ie: the python runtime and standard library, are *our* "userspace", >> and the same constraints apply to our making changes there as apply >> to the linux kernel and its userspace...even though we knowingly break >> those constraints from time to time[*]. >> >> --David >> >> [*] Which I think the twisted folks at least would argue we shouldn't >> be doing :) > > I agree with David. We shouldn't break existing behavior in a way > that might lead to someone else's software being unusable. > > Adding a new API that does block allows anyone to call that when > they want guaranteed random values, and the decision about whether > to block or not can be placed in the application developer's hands. > I think this is a terrible compromise.
The new API is going to be exactly the same as the old API in 99.9999% of cases, and it's fighting against the entire software ecosystem's suggestion of what to use ("use urandom" is basically a meme at this point). This is like saying that we can't switch to verifying HTTPS by default because a one-in-a-million connection might have different behavior instead of being silently insecure. -- Donald Stufft From cory at lukasa.co.uk Thu Jun 9 09:27:36 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 9 Jun 2016 14:27:36 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <1465476616-sup-8510@lrrr.local> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> Message-ID: > On 9 Jun 2016, at 13:53, Doug Hellmann wrote: > > I agree with David. We shouldn't break existing behavior in a way > that might lead to someone else's software being unusable. What does "usable" mean? Does it mean "the code must execute from beginning to end"? Or does it mean "the code must maintain the expected invariants"? If it's the second, what reasonably counts as "the expected invariants"? The problem here is that both definitions of "broken" are unclear. If we leave os.urandom() as it is, there is a small-but-nonzero chance that your program will hang, potentially indefinitely. If we change it back, there is a small-but-nonzero chance your program will generate bad random numbers. If we assume, for a moment, that os.urandom() doesn't get called during Python startup (that is, that we adopt Christian's approach to deal with random and SipHash as separate concerns), what we've boiled down to is: your application called os.urandom() so early that you've got weak random numbers, does it hang or proceed? Those are literally our two options. These two options can be described a different way.
If you didn't actually need strong random numbers but were affected by the hang, that program failed obviously, and it failed closed. You *will* notice that your program didn't start up, you'll investigate, and you'll take action. On the other hand, if you need strong random numbers but were affected by os.urandom() returning bad random numbers, you almost certainly will *not* notice, and your program will have failed *open*: that is, you are exposed to a security risk, and you have no way to be alerted to that fact. For my part, I think the first failure mode is *vastly* better than the second, even if the first failure mode affects vastly more people than the second one does. Failing early, obviously, and safely is IMO much, much better than failing late, silently, and dangerously. I'd argue that all the security disagreements that happen in this list boil down to weighting that differently. For my part, I want code that expects to be used in a secure context to fail *as loudly as possible* if it is unable to operate securely. And for that reason: > Adding a new API that does block allows anyone to call that when > they want guaranteed random values, and the decision about whether > to block or not can be placed in the application developer's hands. I'd rather flip this around. Add a new API that *does not* block. Right now, os.urandom() is trying to fill two niches, one of which is security focused. I'd much rather decide that os.urandom() is the secure API and fail as loudly as possible when people are using it insecurely than to decide that os.urandom() is the *insecure* API and require changes. This is because, again, people very rarely notice this kind of new API introduction unless their code explodes when they migrate.
If you think you can find a way to blow up the secure crypto code only, I'm willing to have that patch too, but otherwise I really think that those who expect this code to be safe should be prioritised over those who expect it to be 100% available. My ideal solution: change os.urandom() to throw an exception if the kernel CSPRNG is not seeded, and add a new function for saying you don't care if the CSPRNG isn't seeded, with all the appropriate "don't use this unless you're sure" warnings on it. Cory From doug at doughellmann.com Thu Jun 9 09:48:08 2016 From: doug at doughellmann.com (Doug Hellmann) Date: Thu, 9 Jun 2016 09:48:08 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> Message-ID: > On Jun 9, 2016, at 9:27 AM, Cory Benfield wrote: > > >> On 9 Jun 2016, at 13:53, Doug Hellmann wrote: >> >> I agree with David. We shouldn't break existing behavior in a way >> that might lead to someone else's software being unusable. > > What does "usable" mean? Does it mean "the code must execute from beginning to end"? Or does it mean "the code must maintain the expected invariants"? If it's the second, what reasonably counts as "the expected invariants"? The code must not cause the user's computer to completely freeze in a way that makes their VM appear to be failing to boot? > > The problem here is that both definitions of "broken" are unclear. If we leave os.urandom() as it is, there is a small-but-nonzero chance that your program will hang, potentially indefinitely.
If we change it back, there is a small-but-nonzero chance your program will generate bad random numbers. > > If we assume, for a moment, that os.urandom() doesn't get called during Python startup (that is, that we adopt Christian's approach to deal with random and SipHash as separate concerns), what we've boiled down to is: your application called os.urandom() so early that you've got weak random numbers, does it hang or proceed? Those are literally our two options. I agree those are the two options. I want the application developer to make the choice, not us. > > These two options can be described a different way. If you didn't actually need strong random numbers but were affected by the hang, that program failed obviously, and it failed closed. You *will* notice that your program didn't start up, you'll investigate, and you'll take action. On the other hand, if you need strong random numbers but were affected by os.urandom() returning bad random numbers, you almost certainly will *not* notice, and your program will have failed *open*: that is, you are exposed to a security risk, and you have no way to be alerted to that fact. > > For my part, I think the first failure mode is *vastly* better than the second, even if the first failure mode affects vastly more people than the second one does. Failing early, obviously, and safely is IMO much, much better than failing late, silently, and dangerously. > > I'd argue that all the security disagreements that happen in this list boil down to weighting that differently. For my part, I want code that expects to be used in a secure context to fail *as loudly as possible* if it is unable to operate securely. And for that reason: > >> Adding a new API that does block allows anyone to call that when >> they want guaranteed random values, and the decision about whether >> to block or not can be placed in the application developer's hands. > > I'd rather flip this around. Add a new API that *does not* block.
Right now, os.urandom() is trying to fill two niches, one of which is security focused. I'd much rather decide that os.urandom() is the secure API and fail as loudly as possible when people are using it insecurely than to decide that os.urandom() is the *insecure* API and require changes. > > This is because, again, people very rarely notice this kind of new API introduction unless their code explodes when they migrate. If you think you can find a way to blow up the secure crypto code only, I'm willing to have that patch too, but otherwise I really think that those who expect this code to be safe should be prioritised over those who expect it to be 100% available. > > My ideal solution: change os.urandom() to throw an exception if the kernel CSPRNG is not seeded, and add a new function for saying you don't care if the CSPRNG isn't seeded, with all the appropriate "don't use this unless you're sure" warnings on it. All of which fails to be backwards compatible (new exceptions and hanging behavior), which means you're breaking apps. Introducing a new API lets the developers who care about strong random values use them without breaking anyone else. Doug From donald at stufft.io Thu Jun 9 09:57:22 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 9 Jun 2016 09:57:22 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> Message-ID: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> > On Jun 9, 2016, at 9:48 AM, Doug Hellmann wrote: > > All of which fails to be backwards compatible (new exceptions and hanging behavior), which means you're breaking apps. Introducing a new API lets the developers who care about strong random values use them without breaking anyone else.
I assert that the vast bulk of users of os.urandom are using it because they care about strong random values, not because they care about the nuances of its behavior on Linux. You're suggesting that almost every [1] single use of os.urandom in the wild should switch to this new API. Forcing the multitudes to adapt for the minority is just pointless churn and pain. Besides, Python has never held backwards compatibility sacred above all else and regularly breaks it in X.Y+1 releases when there is good reason to do so. Just yesterday there was discussion on removing bytes(n) from Python 3.x not because it's dangerous in any way, but because its behavior makes it slightly confusing, in a PEP that appears to have a reasonably good chance of being accepted. [1] I would almost go as far as to call it every single use, but I'm sure someone can dig up one person somewhere who purposely used this behavior. -- Donald Stufft From cory at lukasa.co.uk Thu Jun 9 10:32:35 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 9 Jun 2016 15:32:35 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> Message-ID: <3AB96F39-8682-49AF-B57C-E5B9E6E68FF4@lukasa.co.uk> > On 9 Jun 2016, at 14:48, Doug Hellmann wrote: > > I agree those are the two options. I want the application developer to make the choice, not us. Right, but right now those two options aren't available: only one of them is. And one way or another we're taking an action here: either we're leaving os.urandom() as it stands now, or reverting it back to the way it was in 3.4.0.
This means that you *do* want python-dev to make a choice: specifically, you want python-dev to make the choice that was made in 3.4.0, rather than the one that was made in 3.5.0. That's fine, but we shouldn't be pretending that either side is arguing for inaction or the status quo. For Python 3.5 a choice was made with insufficient knowledge of the outcomes, and now we're arguing about whether we can revert that choice. The difference is, now we *do* know about both outcomes, which means we are consciously choosing between them. > All of which fails to be backwards compatible (new exceptions and hanging behavior), which means you're breaking apps. Backwards compatible with what? Python 3.5.0 and 3.5.1 both have this behaviour, so I assume you mean "backward compatible with 3.4". However, part of the point of a major release is that it doesn't have to be backward compatible in this manner: Python breaks backward compatibility all the time in major releases. I should point out that as far as I'm aware there are exactly two applications that suffer from this problem. One of them is Debian's autopkgtest, which has resolved this problem by invoking Python with PYTHONHASHSEED=0. The other is systemd-cron, and frankly it does not seem at all unreasonable to suggest that perhaps systemd-cron should *maybe* hold off until the system's CSPRNG gets seeded before it starts executing. Cory From colm at tuatha.org Thu Jun 9 10:02:53 2016 From: colm at tuatha.org (Colm Buckley) Date: Thu, 9 Jun 2016 15:02:53 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
Message-ID: Larry Hastings wrote: On 3.4 and before, on Linux, os.urandom() would never block, but if the > entropy pool was uninitialized it could return very-very-poor-quality > random bits. On 3.5.0 and 3.5.1, on Linux, when using the getrandom() > call, it will instead block for an apparently unbounded period before > returning high-quality random bits. Just a point of information here. Ted Ts'o commented on the quality of the pre-initialization bits; it's not a given that they're "very very poor quality". Even before the per-boot entropy pool is initialized, the kernel has a few sources of randomness available to it - viz: interrupt timings, RDRAND (on x86) and a little per-machine data (uname -a). If RDRAND is trusted, this is enough to provide quite significant entropy, however that's not much help to all the ARM devices out there. The most pressing issue from my perspective is the hash randomization initialization; as there is currently nothing a script author can do to influence its behavior (except setting PYTHONHASHSEED before invocation, which might not be an option). It should be possible, at least conceptually, for Python to be used to implement /sbin/init. This isn't currently the case on Linux with Python 3.5.1 and Linux 3.17+ For what it's worth, I do agree with Larry that os.urandom() should hew as closely as possible to the OS-specific urandom implementation. Adding an optional "blocking" boolean flag might be a useful addition for 3.6. Colm -- Colm Buckley / colm at tuatha.org / +353 87 2469146 -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jun 9 11:52:50 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Jun 2016 08:52:50 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> Message-ID: Wow. I have to decide an issue on which lots of people I respect disagree strongly. So no matter how I decide some of you are going to hate me. Oh well. :-( So let's summarize the easy part first. It seems that there is actually agreement that for the initialization of hash randomization and for the random module's Mersenne Twister initialization it is not worth waiting. That leaves direct calls to os.urandom(). I don't think this should block either. I'm not a security expert. I'm not really an expert in anything. But I often have a good sense for what users need or want. In this case it's clear what users want: they don't want Python to hang waiting for random numbers. Take an example from asyncio. If os.urandom() could block, then an asyncio coroutine that wants to call it would have to move that call to a separate thread using loop.run_in_executor() and await the resulting Future, just to avoid blocking all I/O. But you can't test such code, because in practice when you're there to test it, it will never block anyway. So nobody will write it that way, and everybody's code will have a subtle bug (i.e. a coroutine may block without letting other coroutines run). And it's not just bare calls to os.urandom() -- it's any call to library code that might call os.urandom(). Who documents whether their library call uses os.urandom()? It's unknowable. And therein lies madness. The problem with security experts is that they're always right when they say you shouldn't do something. The only truly secure computer is one that's disconnected and buried 6 feet under the ground. There's always a scenario through which an attacker could exploit a certain behavior.
And there's always the possibility that the computer that's thus compromised is guarding a list of Chinese dissidents, or a million credit card numbers, or the key Apple uses to sign iPhone apps. But much more likely it just has my family photos and 100 cloned GitHub projects. And the only time when os.urandom() is going to block on me is probably when I'm rebooting a development VM and wondering why it's so slow. Maybe we can put in a warning when getrandom(..., GRND_NONBLOCK) returns EAGAIN? And then award a prize to people who can make it print that warning. Maybe we'll find a way to actually test this code. -- --Guido van Rossum (python.org/~guido) From donald at stufft.io Thu Jun 9 11:58:14 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 9 Jun 2016 11:58:14 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
To be clear, it's going to block until urandom has been initialized on most non-Linux OSs, so either way if the requirement of someone calling os.urandom is "must never block", then they can't use os.urandom on most non-Linux systems. > > I'm not a security expert. I'm not really an expert in anything. But I often have a good sense for what users need or want. In this case it's clear what users want: they don't want Python to hang waiting for random numbers. > > Take an example from asyncio. If os.urandom() could block, then an asyncio coroutine that wants to call it would have to move that call to a separate thread using loop.run_in_executor() and await the resulting Future, just to avoid blocking all I/O. But you can't test such code, because in practice when you're there to test it, it will never block anyway. So nobody will write it that way, and everybody's code will have a subtle bug (i.e. a coroutine may block without letting other coroutines run). And it's not just bare calls to os.urandom() -- it's any call to library code that might call os.urandom(). Who documents whether their library call uses os.urandom()? It's unknowable. And therein lies madness. > > The problem with security experts is that they're always right when they say you shouldn't do something. The only truly secure computer is one that's disconnected and buried 6 feet under the ground. There's always a scenario through which an attacker could exploit a certain behavior. And there's always the possibility that the computer that's thus compromised is guarding a list of Chinese dissidents, or a million credit card numbers, or the key Apple uses to sign iPhone apps. But much more likely it just has my family photos and 100 cloned GitHub projects. > > And the only time when os.urandom() is going to block on me is probably when I'm rebooting a development VM and wondering why it's so slow. > > Maybe we can put in a warning when getrandom(..., GRND_NONBLOCK) returns EAGAIN?
And then award a prize to people who can make it print that warning. Maybe we'll find a way to actually test this code. > > -- > --Guido van Rossum (python.org/~guido) -- Donald Stufft From ethan at stoneleaf.us Thu Jun 9 12:03:46 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 09 Jun 2016 09:03:46 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <57595210.4000508@hastings.org> References: <57595210.4000508@hastings.org> Message-ID: <57599362.7000304@stoneleaf.us> On 06/09/2016 04:25 AM, Larry Hastings wrote: > > A problem has surfaced just this week in 3.5.1. Obviously this is a > good time to fix it for 3.5.2. But there's a big argument over what is > "broken" and what is an appropriate "fix". Having read the thread thus far, here is my take on fixing it: - Modify os.urandom() to raise an exception instead of blocking. Everyone seems to agree that this is a rare corner case, and being rare it would be easier (at least for me) to troubleshoot an exception instead of a VM (or whatever) hanging and then being killed. - Add a CLI knob to not raise, but instead wait for initialization. I think this should be under the control of the user, who knows (or should) the environment that Python is running under, and not the developer who may have never dreamed his/her little script would be called first thing during bootup. Maybe we just continue to use the hash seed parameter for this. - Modify the functions that don't need cryptographically strong random bits to use the old style (reading directly from /dev/urandom?). This seems like it should appease the security folks, yet still allow those in the trenches to (more) easily diagnose and work around the problem.
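The first of the three points above (raise instead of block) can be sketched against the getrandom() interface discussed in this thread. This is a minimal illustration rather than a proposed patch: the names `CSPRNGNotSeededError` and `urandom_or_raise` are invented here, and `os.getrandom()`/`os.GRND_NONBLOCK` assume Python 3.6+ on Linux 3.17+, with a fallback to plain `os.urandom()` everywhere else.

```python
import os


class CSPRNGNotSeededError(OSError):
    """Hypothetical error: the kernel CSPRNG has not been seeded yet."""


def urandom_or_raise(n):
    # Where os.getrandom() exists (Python 3.6+ on Linux 3.17+), the
    # GRND_NONBLOCK flag makes it raise BlockingIOError (EAGAIN) instead
    # of blocking while the entropy pool is still uninitialized.
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        # Non-Linux platforms (or older Pythons): os.urandom() there
        # already behaves as described elsewhere in this thread.
        return os.urandom(n)
    try:
        return getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError as exc:
        raise CSPRNGNotSeededError(
            "kernel CSPRNG not seeded; refusing to return weak bytes"
        ) from exc
```

On any machine that has finished booting, the pool is initialized and this behaves exactly like os.urandom(); the exception can only occur very early in boot, which is the failure mode being debated.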
-- ~Ethan~ From guido at python.org Thu Jun 9 12:16:54 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Jun 2016 09:16:54 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <57599362.7000304@stoneleaf.us> References: <57595210.4000508@hastings.org> <57599362.7000304@stoneleaf.us> Message-ID: To expand on my idea of printing a warning, in 3.6 we could add a new Warning exception for this purpose, so you'd have command-line control over the behavior of os.urandom() by specifying -WXXX on your Python command line. For 3.5.2 that's too fancy though -- we can't add a new exception. -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Thu Jun 9 12:27:38 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 9 Jun 2016 12:27:38 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> Message-ID: On 6/9/2016 9:48 AM, Doug Hellmann wrote: > >> On Jun 9, 2016, at 9:27 AM, Cory Benfield >> wrote: >> The problem here is that both definitions of "broken" are unclear. >> If we leave os.urandom() as it is, there is a small-but-nonzero >> chance that your program will hang, potentially indefinitely. If we >> change it back, there is a small-but-nonzero chance your program >> will generate you bad random numbers. >> >> If we assume, for a moment, that os.urandom() doesn't get called >> during Python startup (that is that we adopt Christian's approach >> to deal with random and SipHash as separate concerns), what we've >> boiled down to is: your application called os.urandom() so early >> that you've got weak random numbers, does it hang or proceed?
Those >> are literally our two options. > > I agree those are the two options. I want the application developer > to make the choice, not us. I think the 'new API' should be a parameter, not a new function. With just two choices, 'wait' = True/False could work. If 'raise an exception' were added, then 'action' (when good bits are not immediately available) = 'return (best possible)' or 'wait (until have good bits)' or 'raise (CryptBitsNotAvailable)'. In either case, there would then be the question of whether the default should match 3.5.0/1 or 3.4 and before. -- Terry Jan Reedy From steve at pearwood.info Thu Jun 9 12:30:02 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Jun 2016 02:30:02 +1000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> Message-ID: <20160609163000.GB27919@ando.pearwood.info> On Thu, Jun 09, 2016 at 08:26:20AM -0400, Donald Stufft wrote: > random.py > --------- > > In the abstract it doesn't hurt to seed MT with a CSPRNG, it just doesn't > provide much (if any) benefit and in this case it is hurting us because of the > cost on import (which will exist on other platforms as well no matter what we > do here for Linux). There are a couple solutions to this problem: > > * Use getrandom(GRND_NONBLOCK) for random.Random since it doesn't matter if we > get cryptographically secure random numbers or not. +1 on this option (see below for rationale). > * Switch it to use something other than a CSPRNG by default since it doesn't > need that. [...] > Between these options, I have a slight preference for switching it to use a non CSPRNG, but I really don't care that much which of these options we pick.
Using > random.Random is not secure and none of the above options meaningfully change > the security posture of something that accidentally uses it. I don't think that is quite right, although it will depend on your definition of "meaningful". PEP 506 says: Demonstrated attacks against MT are typically against PHP applications. It is believed that PHP's version of MT is a significantly softer target than Python's version, due to a poor seeding technique [17]. https://www.python.org/dev/peps/pep-0506/#id17 specifically that PHP seeds the MT with the time, while we use the output of a CSPRNG. Now, we all agree that MT is completely the wrong thing to use for secrets, good seeding or not, but *bad* seeding could make it a PHP-level soft target. The point of PEP 506 is to move people away from using random.Random for their secrets, but we should expect that whatever we do, there will be some late adopters who are slow to get the message and continue to use it. I would not like us to weaken the seeding technique to the point that those folks become an attractive target. I think that using getrandom(GRND_NONBLOCK) will be okay, provided that when the entropy pool is too low and getrandom falls back to something cryptographically weak, it's still better (hopefully significantly better) than seeding with the time. My reasoning is that the sort of applications that could be targets of attacks against MT are unlikely to be started up early in the boot process, so they're almost always going to get good crypto seeds. On the rare occasion that they don't, well, there's only so far that I'm prepared to stand up for developer's right to be ignorant of security concerns in 2016, and that's where I draw the line. > SipHash and the Interpreter Startup > ----------------------------------- [...]
> In the end, both of these choices make me happy and unhappy in different ways > but I would lean towards adding a CLI flag for the special case and letting the > systemd script that caused this problem invoke their Python with that flag. I > think this because: > > * It leaves the interpreter so that it is secure by default, but provides the > relevant knobs to turn off this default in cases where a user doesn't need > or want it. > * It solves the problem in a cross platform way, that doesn't rely on the > nuances of the CSPRNG interface on one particular supported platform. Makes sense to me. +1 > os.urandom > ---------- [...] > With that in mind, I think that we should, to the best of our ability given the > platform we're on, ensure that os.urandom does not return bytes that the OS > does not think is cryptographically secure. Just to be clear, you're talking about having it block rather than raise an exception, right? If so, that makes sense to me. That's already the behaviour on all major platforms except Linux, so you're just bringing Linux into line with the others. Those who want the non-blocking behaviour on Linux can just read from /dev/urandom. +1 -- Steve From donald at stufft.io Thu Jun 9 12:39:00 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 9 Jun 2016 12:39:00 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160609163000.GB27919@ando.pearwood.info> References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> Message-ID: > On Jun 9, 2016, at 12:30 PM, Steven D'Aprano wrote: > >> >> os.urandom >> ---------- > [...] >> With that in mind, I think that we should, to the best of our ability given the >> platform we're on, ensure that os.urandom does not return bytes that the OS >> does not think is cryptographically secure. 
> > Just to be clear, you're talking about having it block rather than raise > an exception, right? > > If so, that makes sense to me. That's already the behaviour on all major > platforms except Linux, so you're just bringing Linux into line with the > others. Those who want the non-blocking behaviour on Linux can just read > from /dev/urandom. There are three options for what to do with os.urandom by default: * Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like. * Raise an exception if we determine that the pool isn't initialized enough to get secure random from it. * Block until the pool is initialized. Historically Python has done the first option on Linux (but not on other OSs) because that was simply the only interface that Linux offered at all. In 3.5.0 Victor changed the way os.urandom worked in a way that made it use the third option (he wasn't attempting to change the security properties, just avoid using an FD, but it improved the security properties as well). My opinion is that blocking is slightly better than raising an exception because it matches what other OSs do, but that both blocking and raising an exception is better than silently giving data that may or may not be cryptographically secure. -- Donald Stufft From benno at benno.id.au Thu Jun 9 12:54:31 2016 From: benno at benno.id.au (Ben Leslie) Date: Thu, 9 Jun 2016 12:54:31 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> Message-ID: On 9 June 2016 at 12:39, Donald Stufft wrote: > >> On Jun 9, 2016, at 12:30 PM, Steven D'Aprano wrote: >> >>> >>> os.urandom >>> ---------- >> [...]
>>> With that in mind, I think that we should, to the best of our ability given the >>> platform we're on, ensure that os.urandom does not return bytes that the OS >>> does not think is cryptographically secure. >> >> Just to be clear, you're talking about having it block rather than raise >> an exception, right? >> >> If so, that makes sense to me. That's already the behaviour on all major >> platforms except Linux, so you're just bringing Linux into line with the >> others. Those who want the non-blocking behaviour on Linux can just read >> from /dev/urandom. > > There are three options for what to do with os.urandom by default: > > * Allow it to silently return data that may or may not be cryptographically secure based on what the state of the urandom pool initialization looks like. > * Raise an exception if we determine that the pool isn't initialized enough to get secure random from it. > * Block until the pool is initialized. > > Historically Python has done the first option on Linux (but not on other OSs) because that was simply the only interface that Linux offered at all. In 3.5.0 Victor changed the way os.urandom worked in a way that made it use the third option (he wasn't attempting to change the security properties, just avoid using an FD, but it improved the security properties as well). > > My opinion is that blocking is slightly better than raising an exception because it matches what other OSs do, but that both blocking and raising an exception is better than silently giving data that may or may not be cryptographically secure. I think an exception is much easier for a user to deal with from a practical point of view. Trying to work out why a process has hung is obviously possible, but not necessarily easy. Having a process crash due to an exception is very easy to diagnose by comparison.
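The three options quoted above line up with Terry Reedy's earlier suggestion of a parameter rather than separate functions. A rough sketch of what that could look like (the `urandom2` name and `when_unseeded` parameter are invented for illustration; `os.getrandom()` assumes Python 3.6+ on Linux 3.17+, and other platforms fall through to `os.urandom()`, which there already blocks until seeded):

```python
import os


def urandom2(n, when_unseeded="wait"):
    """Hypothetical os.urandom() variant exposing all three options:

    'wait'  - block until the kernel pool is initialized (3.5.0/1 behaviour)
    'raise' - fail loudly instead of blocking
    'best'  - return possibly-predictable bytes (3.4-and-earlier behaviour)
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        # No getrandom() here (non-Linux or Python < 3.6).
        return os.urandom(n)
    if when_unseeded == "wait":
        return getrandom(n)                    # blocks only before seeding
    if when_unseeded == "raise":
        return getrandom(n, os.GRND_NONBLOCK)  # BlockingIOError if unseeded
    if when_unseeded == "best":
        try:
            return getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Old-style non-blocking read, weak before the pool is seeded.
            with open("/dev/urandom", "rb") as f:
                return f.read(n)
    raise ValueError("when_unseeded must be 'wait', 'raise' or 'best'")
```

The default would then be the remaining point of contention: 'wait' matches 3.5.0/1, 'best' matches 3.4 and earlier.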
Cheers, Ben From steve at pearwood.info Thu Jun 9 13:14:50 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Jun 2016 03:14:50 +1000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> Message-ID: <20160609171450.GC27919@ando.pearwood.info> On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote: > There are three options for what to do with os.urandom by default: > > * Allow it to silently return data that may or may not be > cryptographically secure based on what the state of the urandom pool > initialization looks like. Just to be clear, this is only an option on Linux, right? All the other major platforms block, whatever we decide to do on Linux. Including Windows? -- Steve From p.f.moore at gmail.com Thu Jun 9 13:21:32 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 9 Jun 2016 18:21:32 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> Message-ID: On 9 June 2016 at 17:54, Ben Leslie wrote: >> My opinion is that blocking is slightly better than raising an exception because it matches what other OSs do, but that both blocking and raising an exception is better than silently giving data that may or may not be cryptographically secure. > > I think an exception is much easier for a user to deal with from a > practical point of view. Trying to work out why a process has hung is > obviously possible, but not necessarily easy. If we put the specific issue of applications that run very early in system startup to one side, is there a possibility of running out of entropy during normal system use? Even for a tiny duration?
An exception may be better than a hanging process, but a random process crash in place of a wait of a few microseconds for the entropy buffer to fill up again not so much. If we could predict whether the call was going to block for a microsecond, or for 20 minutes, I'd be OK with an exception for the latter case. But we can't predict the future, so unless the system call is guaranteed not to block except at system startup, then I prefer blocking over an exception. As for blocking vs returning less random results, I defer to others on that. On 9 June 2016 at 18:14, Steven D'Aprano wrote: > On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote: > >> There are three options for what do with os.urandom by default: >> >> * Allow it to silently return data that may or may not be >> cryptographically secure based on what the state of the urandom pool >> initialization looks like. > > Just to be clear, this is only an option on Linux, right? All the other > major platforms block, whatever we decide to do on Linux. Including > Windows? That's what I understood, certainly. But the place where this was an issue in real life was a Python program being run during the startup sequence of the OS. That's never going to be possible on Windows, so I'd be cautious about drawing parallels with Windows in this situation (blocking on Windows may be fine because Python can never run when Windows could possibly have low entropy available). Paul From donald at stufft.io Thu Jun 9 13:22:00 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 9 Jun 2016 13:22:00 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: <20160609171450.GC27919@ando.pearwood.info> References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> <20160609171450.GC27919@ando.pearwood.info> Message-ID: > On Jun 9, 2016, at 1:14 PM, Steven D'Aprano wrote: > > On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote: > >> There are three options for what to do with os.urandom by default: >> >> * Allow it to silently return data that may or may not be >> cryptographically secure based on what the state of the urandom pool >> initialization looks like. > > Just to be clear, this is only an option on Linux, right? All the other > major platforms block, whatever we decide to do on Linux. Including > Windows?
In-Reply-To: References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> Message-ID: > On Jun 9, 2016, at 1:21 PM, Paul Moore wrote: > > On 9 June 2016 at 17:54, Ben Leslie wrote: >>> My opinion is that blocking is slightly better than raising an exception because it matches what other OSs do, but that both blocking and raising an exception is better than silently giving data that may or may not be cryptographically secure. >> >> I think an exception is much easier for a user to deal with from a >> practical point of view. Trying to work out why a process has hung is >> obviously possible, but not necessarily easy. > > If we put the specific issue of applications that run very early in > system startup to one side, is there a possibility of running out of > entropy during normal system use? Even for a tiny duration? An > exception may be better than a hanging process, but a random process > crash in place of a wait of a few microseconds for the entropy buffer > to fill up again not so much. > > If we could predict whether the call was going to block for a > microsecond, or for 20 minutes, I'd be OK with an exception for the > latter case. But we can't predict the future, so unless the system > call is guaranteed not to block except at system startup, then I > prefer blocking over an exception. /dev/urandom (and getrandom() on Linux) will never block once the pool has been initialized. The concept of ?running out of entropy? doesn?t apply to it. Once it has entropy it?s good to go. > > As for blocking vs returning less random results, I defer to others on that. 
> > On 9 June 2016 at 18:14, Steven D'Aprano wrote: >> On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote: >> >>> There are three options for what do with os.urandom by default: >>> >>> * Allow it to silently return data that may or may not be >>> cryptographically secure based on what the state of the urandom pool >>> initialization looks like. >> >> Just to be clear, this is only an option on Linux, right? All the other >> major platforms block, whatever we decide to do on Linux. Including >> Windows? > That's what I understood, certainly. But the place where this was an > issue in real life was a Python program being run during the startup > sequence of the OS. That's never going to be possible on Windows, so > I'd be cautious about drawing parallels with Windows in this situation > (blocking on Windows may be fine because Python can never run when > Windows could possibly have low entropy available). > > Paul - Donald Stufft From steve at pearwood.info Thu Jun 9 13:29:12 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Jun 2016 03:29:12 +1000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> Message-ID: <20160609172911.GD27919@ando.pearwood.info> On Thu, Jun 09, 2016 at 12:54:31PM -0400, Ben Leslie wrote: > I think an exception is much easier for a user to deal with from a > practical point of view. Trying to work out why a process has hung is > obviously possible, but not necessarily easy. > > Having a process crash due to an exception is very easy to diagnose by > comparison. That only makes sense if the application is going to block for (say) five or ten minutes. If it's going to block for three seconds, you might not even notice. At least not on a server. But what are you going to do when you catch that exception?
- Sleep for a few seconds, and try again? That's just blocking. - Stop waiting on secure randomness, and use something low quality and insecure? That's how you get exploits. - Crash? -- Steve From steve at pearwood.info Thu Jun 9 13:49:27 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Jun 2016 03:49:27 +1000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> Message-ID: <20160609174927.GE27919@ando.pearwood.info> On Thu, Jun 09, 2016 at 06:21:32PM +0100, Paul Moore wrote: > If we put the specific issue of applications that run very early in > system startup to one side, is there a possibility of running out of > entropy during normal system use? Even for a tiny duration? With /dev/urandom, I believe the answer to that is no. On most platforms other than Linux, /dev/urandom is exactly the same as /dev/random, and both can only block straight after the machine has booted up before enough entropy has been collected. Then they will run forever without blocking. (Or at least until you reboot.) On Linux, /dev/random *will* block, at unpredictable times, but fortunately we're not using /dev/random. We're using Urandom. Apart from just after boot up, /dev/urandom on Linux will also run forever without blocking, just like the other platforms. The critical difference is just after booting up: - Linux /dev/urandom doesn't block, but it might return predictable, poor-quality pseudo-random bytes (i.e. a potential exploit); - Other OSes may block for potentially many minutes (i.e. a potential DOS). 
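The point that catch-and-retry is just blocking in disguise can be made concrete with a small simulation (the pool object below is invented purely for illustration; os.urandom itself raises no such exception today):

```python
import time

class FakeEntropyPool:
    """Simulated kernel pool that is uninitialised for the first few
    reads.  Purely illustrative -- not a real kernel interface."""
    def __init__(self, ready_after=3):
        self.reads = 0
        self.ready_after = ready_after

    def read(self, n):
        self.reads += 1
        if self.reads <= self.ready_after:
            raise BlockingIOError("entropy pool not initialised")
        return b"\x00" * n  # stand-in for n random bytes

def read_with_retry(pool, n, delay=0.0):
    # Catching the exception and sleeping reproduces blocking
    # behaviour -- a userspace polling loop instead of a kernel-side
    # wait, but the caller is stalled for just as long.
    while True:
        try:
            return pool.read(n)
        except BlockingIOError:
            time.sleep(delay)

data = read_with_retry(FakeEntropyPool(), 16)
```

The loop only returns once the simulated pool reports itself initialised, i.e. after exactly the wait an in-kernel block would have imposed.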
Two links which may help explain what's happening: http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/ http://security.stackexchange.com/a/42955 -- Steve From ncoghlan at gmail.com Thu Jun 9 13:53:59 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jun 2016 10:53:59 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <57595210.4000508@hastings.org> References: <57595210.4000508@hastings.org> Message-ID: On 9 June 2016 at 04:25, Larry Hastings wrote: > A user reports that when starting CPython soon after startup on a fresh > virtual machine, the process would hang for a long time. Someone on the > issue reported observed delays of over 90 seconds. Later we found out: it > wasn't 90 seconds before CPython became usable, these 90 seconds delays were > before systemd timed out and simply killed the process. It's not clear what > the upper bound on the delay might be. > > The issue author had already identified the cause: CPython was blocking on > getrandom() in order to initialize hash randomization. On this fresh > virtual machine the entropy pool started out uninitialized. And since the > only thing running on the machine was CPython, and since CPython was blocked > on initialization, the entropy pool was initializing very, very slowly. Further analysis (mentioned later in the original Python-3.5-on-Linux bug report) suggested that this wasn't actually a generic "waiting for the entropy pool to initialise" problem. Instead, the problem appeared to be specifically that the Python script was being invoked *before the Linux kernel had initialised the entropy pool* and the boot process was waiting for that script to run before continuing on with other tasks (like initialising the entropy pool). 
That meant os.urandom() had nothing to do with it (since the affected script wasn't generating random numbers), and the entire problem was that we were blocking trying to initialise CPython's internal hashing. Born from Victor's proposal to add a "wait for entropy?" flag to os.urandom [1], the simplest proposal for a long term fix [2] posted so far has been to: 1. make os.urandom raise BlockingIOError if kernel entropy is not available 2. don't rely on os.urandom for internal hash initialisation 3. don't rely on os.urandom for MT seeding in the random module Linux is currently the only OS we know of where the BlockingIOError would be a possible result, and the only known scenarios where it could be raised are Linux init system scripts and some embedded systems where the kernel doesn't have any good sources of entropy. In both those cases, the lack of entropy is potentially a real problem, and an exception lets the software author make an informed decision to either wait for entropy (e.g. by polling os.urandom() until it succeeds, or selecting on /dev/random) or else read directly from /dev/urandom (potentially getting non-cryptographically secure bits) The virtue of this approach is that it's entirely invisible for almost all users, and the users that it does affect will start getting an exception in Python 3.6+ rather than silently being handed cryptographically non-secure random data. Cheers, Nick. [1] http://bugs.python.org/issue27266 [2] http://bugs.python.org/issue27282 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From benno at benno.id.au Thu Jun 9 13:57:58 2016 From: benno at benno.id.au (Ben Leslie) Date: Thu, 9 Jun 2016 13:57:58 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: <20160609172911.GD27919@ando.pearwood.info> References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> <20160609172911.GD27919@ando.pearwood.info> Message-ID: On 9 June 2016 at 13:29, Steven D'Aprano wrote: > On Thu, Jun 09, 2016 at 12:54:31PM -0400, Ben Leslie wrote: > >> I think an exception is much easier for a user to deal with from a >> practical point of view. Trying to work out why a process has hung is >> obviously possible, but not necessarily easy. >> >> Having a process crash due to an exception is very easy to diagnose by >> comparison. > > That only makes sense if the application is going to block for (say) > five or ten minutes. If it's going to block for three seconds, you might > not even notice. At least not on a server. > > But what are you going to do when you catch that exception? > > - Sleep for a few seconds, and try again? That's just blocking. > > - Stop waiting on secure randomness, and use something low quality > and insecure? That's how you get exploits. > > - Crash? What does a program do when on any exception? It really depends on the program and the circumstances in which it is running. But I would think that in most circumstances 'crash' is the answer. In the circumstances where this is most likely going to occur (server startup) you are almost certainly going to have some type of supervisory program restarting the failed process. It will almost certainly be logging the failure. Having logs filled with process restarts due to this error until there is finally entropy is better than it just hanging. At least that is what I'd prefer to diagnose. I think the real solution here would be outside of Python; starting a process that needs entropy when the system isn't ready yet is just as silly as running a 'mount' on a disk where the driver is still loading, or 'ifconfig' on a network interface where the network driver isn't yet loaded. 
But that isn't really a problem that can be solved in the context of Python. Cheers, Ben From christian at python.org Thu Jun 9 13:57:37 2016 From: christian at python.org (Christian Heimes) Date: Thu, 9 Jun 2016 19:57:37 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160609171450.GC27919@ando.pearwood.info> References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> <20160609171450.GC27919@ando.pearwood.info> Message-ID: On 2016-06-09 19:14, Steven D'Aprano wrote: > On Thu, Jun 09, 2016 at 12:39:00PM -0400, Donald Stufft wrote: > >> There are three options for what do with os.urandom by default: >> >> * Allow it to silently return data that may or may not be >> cryptographically secure based on what the state of the urandom pool >> initialization looks like. > > Just to be clear, this is only an option on Linux, right? All the other > major platforms block, whatever we decide to do on Linux. Including > Windows? To best of my knowledge, Windows and OSX are already initialized when Python is started. On other BSD platforms it is possible to get the seeding state through the proc file system. From zreed at fastmail.com Thu Jun 9 15:41:02 2016 From: zreed at fastmail.com (zreed at fastmail.com) Date: Thu, 09 Jun 2016 14:41:02 -0500 Subject: [Python-Dev] PEP 468 Message-ID: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Is there any further thoughts on including this in 3.6? Similar to the recent discussion on OrderedDict namespaces for metaclasses, this would simplify / enable a number of type factory use cases where proper metaclasses are overkill. This feature would also be quite nice in say pandas where the (currently unspecified) field order used in the definition of frames is preserved in user-visible displays. 
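The type-factory use case can be sketched as follows; since ordering of **kwds is exactly what is still open, the workaround available today is to pass an explicitly ordered mapping (the function names here are made up for illustration):

```python
from collections import OrderedDict

def make_columns(**kwds):
    # Under PEP 468 the order of kwds would match the call site;
    # without that guarantee, a plain dict may report the fields in
    # arbitrary order.
    return list(kwds)

def make_columns_ordered(fields):
    # Portable workaround: accept an explicitly ordered mapping
    # rather than relying on **kwds order.
    return list(fields)

cols = make_columns_ordered(
    OrderedDict([("name", str), ("age", int), ("city", str)]))
# cols is guaranteed to be ["name", "age", "city"]
```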
From vgr255 at live.ca Thu Jun 9 16:10:00 2016 From: vgr255 at live.ca (Émanuel Barry) Date: Thu, 9 Jun 2016 16:10:00 -0400 Subject: [Python-Dev] PEP 468 In-Reply-To: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Message-ID: > From: zreed at fastmail.com > Subject: [Python-Dev] PEP 468 > > Is there any further thoughts on including this in 3.6? Similar to the > recent discussion on OrderedDict namespaces for metaclasses, this would > simplify / enable a number of type factory use cases where proper > metaclasses are overkill. This feature would also be quite nice in say > pandas where the (currently unspecified) field order used in the > definition of frames is preserved in user-visible displays. As stated by Guido (and pointed out in the PEP): Making **kwds ordered is still open, but requires careful design and implementation to avoid slowing down function calls that don't benefit. The PEP has not been updated in a while, though. Python 3.5 has been released, and with it a C implementation of OrderedDict. Eric, are you still interested in this? IIRC that PEP was one of the motivating use cases for implementing OrderedDict in C. Maybe it's time for a second round of discussion on Python-ideas? -Emanuel
It could be set to > anything, including None or a value that does not iterate into a > definition order. If someone explicitly set __definition_order__ then > I think it should be used as-is. I'm guessing Ethan is suggesting defining it as: __definition_order__ = tuple(ns["__definition_order__"]) When the attribute is present in the method body. That restriction would be comparable to what we do with __slots__ today: >>> class C: ... __slots__ = 1 ... Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'int' object is not iterable Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 9 17:55:13 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jun 2016 14:55:13 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: <20160608210133.GA4318@python.ca> References: <20160608210133.GA4318@python.ca> Message-ID: On 8 June 2016 at 14:01, Neil Schemenauer wrote: > [I've posted something about this on python-ideas but since I now > have some basic working code, I think it is more than an idea.] > > I think the uptake of Python 3 is starting to accelerate. That's > good. However, there are still millions or maybe billions of lines > of Python code that still needs to be ported. It is beneficial to > the Python ecosystem if this code can get ported. > > My idea is to make a stepping stone version of Python, between 2.7.x > and 3.x that eases the porting job. The high level goals are: > > - code coming out of 2to3 runs correctly on this modified Python > > - code that runs without warnings on this modified Python will run > correctly on Python 3.x. As Victor noted, and as the porting guide describes in https://docs.python.org/3/howto/pyporting.html#update-your-code, we've determined that 2to3 isn't the best choice of tool for folks that can't afford to immediately drop Python 2 support.
Once you switch to those now recommended more conservative migration tools, the tool suite you request already exists: - update your code with modernize or futurize - check it still runs on Python 2.7 - check it doesn't generate warnings under 2.7's "-3" switch - check it passes "pylint --py3k" - check if it runs on Python 3.5 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Thu Jun 9 18:02:45 2016 From: brett at python.org (Brett Cannon) Date: Thu, 09 Jun 2016 22:02:45 +0000 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> Message-ID: On Thu, 9 Jun 2016 at 14:56 Nick Coghlan wrote: > On 8 June 2016 at 14:01, Neil Schemenauer wrote: > > [I've posted something about this on python-ideas but since I now > > have some basic working code, I think it is more than an idea.] > > > > I think the uptake of Python 3 is starting to accelerate. That's > > good. However, there are still millions or maybe billions of lines > > of Python code that still needs to be ported. It is beneficial to > > the Python ecosystem if this code can get ported. > > > > My idea is to make a stepping stone version of Python, between 2.7.x > > and 3.x that eases the porting job. The high level goals are: > > > > - code coming out of 2to3 runs correctly on this modified Python > > > > - code that runs without warnings on this modified Python will run > > correctly on Python 3.x. > > As Victor noted, and as the porting guide describes in > https://docs.python.org/3/howto/pyporting.html#update-your-code, we've > determined that 2to3 isn't the best choice of tool for folks that > can't afford to immediately drop Python 2 support. 
> > Once you switch to those now recommended more conservative migration > tools, the tool suite you request already exists: > > - update your code with modernize or futurize > - check it still runs on Python 2.7 > - check it doesn't generate warnings under 2.7's "-3" switch > - check it passes "pylint --py3k" > - check if it runs on Python 3.5 > `python3.5 -bb` is best to help keep Python 2.7 compatibility, otherwise what Nick said. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jun 9 18:11:50 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 09 Jun 2016 15:11:50 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: <575772E6.7040906@stoneleaf.us> Message-ID: <5759E9A6.1080706@stoneleaf.us> On 06/09/2016 02:39 PM, Nick Coghlan wrote: > On 7 June 2016 at 20:17, Eric Snow wrote: >> On Tue, Jun 7, 2016 at 6:20 PM, Ethan Furman wrote: >>> If __definition_order__ is supposed to be immutable as well as read-only >>> then we should convert non-tuples to tuples. No point in letting that >>> user bug slip through. >> >> Do you mean if a class explicitly defines __definition_order__? If >> so, I'm not clear on how that would work. It could be set to >> anything, including None or a value that does not iterate into a >> definition order. If someone explicitly set __definition_order__ then >> I think it should be used as-is. > > I'm guessing Ethan is suggesting defining it as: > > __definition_order__ = tuple(ns["__definition_order__"]) > > When the attribute is present in the method body. Yup, that's it exactly. Thanks, Nick!
-- ~Ethan~ From ethan at stoneleaf.us Thu Jun 9 18:16:55 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 09 Jun 2016 15:16:55 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> Message-ID: <5759EAD7.4030506@stoneleaf.us> On 06/08/2016 02:40 PM, Fred Drake wrote: > On Wed, Jun 8, 2016 at 5:33 PM, Ryan Gonzalez wrote: >> What about something like "unpythonic" or similar? > > Or perhaps... antipythy? That's awfully close to antipathy [1], my path module on PyPI. Besides, I liked the suggestion from the -ideas list: Python 2therescue. ;) -- ~Ethan~ [1] https://pypi.python.org/pypi/antipathy From fred at fdrake.net Thu Jun 9 18:19:14 2016 From: fred at fdrake.net (Fred Drake) Date: Thu, 9 Jun 2016 18:19:14 -0400 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: <5759EAD7.4030506@stoneleaf.us> References: <20160608210133.GA4318@python.ca> <5759EAD7.4030506@stoneleaf.us> Message-ID: On Thu, Jun 9, 2016 at 6:16 PM, Ethan Furman wrote: > That's awfully close to antipathy [1], my path module on PyPI. Good point. Increasing confusion would not help. > Besides, I liked the suggestion from the -ideas list: Python 2therescue. ;) Nice; I like that too. :-) -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein From larry at hastings.org Thu Jun 9 18:22:35 2016 From: larry at hastings.org (Larry Hastings) Date: Thu, 9 Jun 2016 15:22:35 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> Message-ID: <5759EC2B.8040208@hastings.org> On 06/09/2016 08:52 AM, Guido van Rossum wrote: > That leaves direct calls to os.urandom(). 
I don't think this should > block either. Then it's you and me against the rest of the world ;-) Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on a getrandom() call. It's permissible to take advantage of getrandom(GRND_NONBLOCK), but if it returns EAGAIN we must read from /dev/urandom. It's already well established that this will upset the cryptography experts. As a concession to them, I propose adding a simple! predictable! function to Python 3.5.2: os.getrandom(). This would be a simple wrapper over getrandom, only available on platforms that expose it. It would provide a way to use both extant flags, GRND_RANDOM and GRND_NONBLOCK, though possibly not exactly mirroring the native API. This would enable cryptography libraries to easily do what (IIUC) they regard as the "correct" thing on Linux for all supported versions of Python: if hasattr(os, "getrandom"): bits = os.getrandom(n) else: bits = os.urandom(n) I'm not excited about adding a new function in 3.5.2, but on the other hand we are taking away this functionality they had in 3.5.0 and 3.5.1 so only seems fair. And the implementation of os.getrandom() should be very straightforward, and its semantics will mirror the native call, so I'm pretty confident we can get it solid in a couple of days, though we might slip 3.5.2rc1 by a day or two. Guido: do you see this as an acceptable compromise? Cryptographers: given that os.urandom() will no longer block in 3.5.2, do you want this? Pointing out an alternate approach: Marc-Andre Lemburg proposes in issue #27279 ( http://bugs.python.org/issue27279 ) that we should add two "known best-practices" functions to get pseudo-random bits; one merely for pseudo random bits, the other for crypto-strength pseudo random bits. While I think this is a fine idea, the exact spelling, semantics, and per-platform implementation of these functions is far from settled, and nobody is proposing that we do something like that for 3.5. 
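Fleshing that two-line check out into a self-contained helper (treating os.getrandom's availability and exact signature as assumptions, since the function is only proposed above):

```python
import os

def strong_random(n):
    """Return n random bytes, preferring the proposed os.getrandom()
    where the platform exposes it, and otherwise falling back to the
    never-blocking os.urandom()."""
    if hasattr(os, "getrandom"):
        # Assumed to mirror the native getrandom(2) call, taking a
        # byte count and returning that many bytes.
        return os.getrandom(n)
    return os.urandom(n)

token = strong_random(16)
```

On platforms without the new call this degrades gracefully to today's behavior, which is the whole point of the hasattr dance.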
//arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Jun 9 18:33:03 2016 From: larry at hastings.org (Larry Hastings) Date: Thu, 9 Jun 2016 15:33:03 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> <20160609171450.GC27919@ando.pearwood.info> Message-ID: <5759EE9F.9010203@hastings.org> On 06/09/2016 10:22 AM, Donald Stufft wrote: >> On Jun 9, 2016, at 1:14 PM, Steven D'Aprano wrote: >> >> Just to be clear, this is only an option on Linux, right? All the other >> major platforms block, whatever we decide to do on Linux. Including >> Windows? > To my knowledge, all other major platforms block or otherwise ensure that /dev/urandom can never return anything but cryptographically secure random. [1] I've done some research into this over the past couple of days. To the best of my knowledge: * Linux: /dev/urandom will never block. If the entropy pool isn't initialized yet, it will return poor-quality random bits from what is effectively an unseeded PRNG. (Yes: it uses a custom algorithm which isn't considered CPRNG-strength, it is merely a PRNG seeded with entropy.) * OS X: AFAICT, /dev/urandom guarantees it will never block. It uses an old CSPRNG, 160-bit Yarrow. The documentation states that if the entropy pool is "drained", it won't block; instead it'll degrade ("output quality will suffer over time without any explicit indication from the random device itself"). It isn't clear how initialization of the entropy pool during early startup might affect this. http://www.manpages.info/macosx/random.4.html * FreeBSD: /dev/urandom may block. It is also using Yarrow (but maybe with more bits? and possibly switching soon to Yarrow's successor, Fortuna?).
Both devices guarantee high-quality random bits, and will block if they feel like they're running low on entropy. * OpenBSD 5.1 is like FreeBSD, except the algorithm used is ARC4. In OpenBSD 5.5 they changed to using ChaCha20. On all of those platforms *except* Linux, /dev/random and /dev/urandom are exactly the same. Also, regarding Windows: Victor Stinner did some experiments with a VM, and even in early startup he was able to get random bits from os.urandom(). But it's hard to have a "fresh" Windows VM, so it's possible it had residual entropy from a previous boot, so this isn't conclusive. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jun 9 18:44:11 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 09 Jun 2016 15:44:11 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <5759EC2B.8040208@hastings.org> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> Message-ID: <5759F13B.2000909@stoneleaf.us> On 06/09/2016 03:22 PM, Larry Hastings wrote: > > On 06/09/2016 08:52 AM, Guido van Rossum wrote: >> That leaves direct calls to os.urandom(). I don't think this should >> block either. > > Then it's you and me against the rest of the world ;-) > > > Okay, it's decided: os.urandom() must be changed for 3.5.2 to never > block on a getrandom() call. One way to not block is to raise an exception. 
Since this is such a rare occurrence anyway I don't see this being a problem, plus it keeps everybody mostly happy: normal users won't see it hang, crypto-folk won't see vulnerable-from-this-cause-by-default machines, and those running Python early in the boot sequence will have something they can figure out, plus an existing knob to work around it [hashseed, I think?]. > As a concession to [the crypto experts], I propose adding a simple! > predictable! function to Python 3.5.2: os.getrandom(). This would be unnecessary if we go the exception route. > And the implementation of os.getrandom() should be > very straightforward, and its semantics will mirror the native call, so > I'm pretty confident we can get it solid in a couple of days, though we > might slip 3.5.2rc1 by a day or two. I would think the exception route would also not take very long to make solid. Okay, I'll shut up now. ;) -- ~Ethan~ From larry at hastings.org Thu Jun 9 18:47:54 2016 From: larry at hastings.org (Larry Hastings) Date: Thu, 9 Jun 2016 15:47:54 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <5759F13B.2000909@stoneleaf.us> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> Message-ID: <5759F21A.5080905@hastings.org> On 06/09/2016 03:44 PM, Ethan Furman wrote: > On 06/09/2016 03:22 PM, Larry Hastings wrote: >> Okay, it's decided: os.urandom() must be changed for 3.5.2 to never >> block on a getrandom() call. > > One way to not block is to raise an exception. 
Since this is such a > rare occurrence anyway I don't see this being a problem, plus it keeps > everybody mostly happy: normal users won't see it hang, crypto-folk > won't see vulnerable-from-this-cause-by-default machines, and those > running Python early in the boot sequence will have something they can > figure out, plus an existing knob to work around it [hashseed, I think?]. Nope, I want the old behavior back. os.urandom() should read /dev/urandom if getrandom() would block. As the British say, "it should do what it says on the tin". //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil at python.ca Thu Jun 9 19:08:07 2016 From: neil at python.ca (Neil Schemenauer) Date: Thu, 9 Jun 2016 16:08:07 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> Message-ID: <20160609230807.GA8118@python.ca> On 2016-06-09, Brett Cannon wrote: > On Thu, 9 Jun 2016 at 14:56 Nick Coghlan wrote: > > Once you switch to those now recommended more conservative migration > > tools, the tool suite you request already exists: > > > > - update your code with modernize or futurize > > - check it still runs on Python 2.7 > > - check it doesn't generate warnings under 2.7's "-3" switch > > - check it passes "pylint --py3k" > > - check if it runs on Python 3.5 > > > > `python3.5 -bb` is best to help keep Python 2.7 compatibility, otherwise > > what Nick said. :) I have to wonder if you guys actually ported at lot of Python 2 code. Maybe you somehow avoided the problematic behavior. Below is a pretty trival set of functions. The tools you recommend do not help at all. One problem is that the str literals should be bytes literals. Comparison with None needs to be avoided. With Python 2 code runs successfully. With Python 3 the code crashes with a traceback.
With my modified Python 3.6, the code runs successfully but generates the following warnings: test.py:13: DeprecationWarning: encoding bytes to str output.write('%d:' % len(s)) test.py:14: DeprecationWarning: encoding bytes to str output.write(s) test.py:15: DeprecationWarning: encoding bytes to str output.write(',') test.py:5: DeprecationWarning: encoding bytes to str if c == ':': test.py:9: DeprecationWarning: encoding bytes to str size += c test.py:24: DeprecationWarning: encoding bytes to str data = data + s test.py:26: DeprecationWarning: encoding bytes to str if input.read(1) != ',': test.py:31: DeprecationWarning: default compare is depreciated if a > 0: It is very easy for me to find code written for Python 2 that will fail in the same way. According to you guys, there is no problem and we already have good enough tooling. ;-( -------------- next part -------------- A non-text attachment was scrubbed... Name: test.py Type: text/x-python Size: 1133 bytes Desc: not available URL: From brett at python.org Thu Jun 9 19:43:24 2016 From: brett at python.org (Brett Cannon) Date: Thu, 09 Jun 2016 23:43:24 +0000 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: <20160609230807.GA8118@python.ca> References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> Message-ID: On Thu, 9 Jun 2016 at 16:08 Neil Schemenauer wrote: > On 2016-06-09, Brett Cannon wrote: > > On Thu, 9 Jun 2016 at 14:56 Nick Coghlan wrote: > > > Once you switch to those now recommended more conservative migration > > > tools, the tool suite you request already exists: > > > > > > - update your code with modernize or futurize > > > - check it still runs on Python 2.7 > > > - check it doesn't generate warnings under 2.7's "-3" switch > > > - check it passes "pylint --py3k" > > > - check if it runs on Python 3.5 > > > > > > > `python3.5 -bb` is best to help keep Python 2.7 compatibility, otherwise > > what Nick said. 
:) > > I have to wonder if you guys actually ported at lot of Python 2 > code. Yes I have, including code that needed to be 2.4-3.4 compatible of all things. Plus I'm the author of the porting HOWTO so I know the edge cases pretty well. I don't think you meant for what you said to sound insulting, Neil, but it did feel like it upon first reading. > Maybe you somehow avoided the problematic behavior. Below is > a pretty trival set of functions. The tools you recommend do not > help at all. One problem is that the str literals should be bytes > literals. At least for Modernize that's on purpose as it can't tell semantically what is meant to be binary data vs. textual ASCII data (which you obviously know, else you wouldn't be trying to add runtime warnings for this sort of stuff). > Comparison with None needs to be avoided. > > With Python 2 code runs successfully. With Python 3 the code > crashes with a traceback. With my modified Python 3.6, the code > runs successfully but generates the following warnings: > > test.py:13: DeprecationWarning: encoding bytes to str > output.write('%d:' % len(s)) > test.py:14: DeprecationWarning: encoding bytes to str > output.write(s) > test.py:15: DeprecationWarning: encoding bytes to str > output.write(',') > test.py:5: DeprecationWarning: encoding bytes to str > if c == ':': > test.py:9: DeprecationWarning: encoding bytes to str > size += c > test.py:24: DeprecationWarning: encoding bytes to str > data = data + s > test.py:26: DeprecationWarning: encoding bytes to str > if input.read(1) != ',': > test.py:31: DeprecationWarning: default compare is depreciated > if a > 0: > > It is very easy for me to find code written for Python 2 that will > fail in the same way. According to you guys, there is no problem > and we already have good enough tooling. ;-( > That's not what I'm saying at all (nor what I think Nick is saying); more tooling to ease the transition is always welcomed. 
The point we are trying to make is that 2to3 is no longer considered
best practice, and so targeting its specific output might not be the
best use of your time. I'm totally happy to have your fork work out
and help give warnings for situations where runtime semantics are the
only way to know there will be a problem that static analysis tools
can't handle, and to have the porting HOWTO updated so that people
can run their test suite with your interpreter to help with that
final bit of porting. I personally just don't want to see you waste
time on warnings that are handled by the tools already, or ignore the
fact that six, modernize, and futurize can help more than 2to3
typically can with the easy stuff when trying to keep 2/3
compatibility. IOW some of us have become allergic to the word "2to3"
in regards to porting. :) But if you want to target 2to3 output then
by all means please do, and your work will still be appreciated.

And I should also mention in case you don't know -- and assuming I'm
remembering correctly -- that adding new Py3kWarning cases to Python
2.7 is still allowed, so if there is a warning you want to add that
makes sense to be upstream then we can consider adding it in Python
2.7.12 (or later).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve.dower at python.org Thu Jun 9 20:00:39 2016
From: steve.dower at python.org (Steve Dower)
Date: Fri, 10 Jun 2016 10:00:39 +1000
Subject: [Python-Dev] BDFL ruling request: should we block foreverwaiting for high-quality random bits?
In-Reply-To: <5759F21A.5080905@hastings.org>
References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> <5759F21A.5080905@hastings.org>
Message-ID: 

(fat fingered the send button, picking up where I left off)

If the pattern is really going to be the hasattr check you posted
earlier, can we just do it for people and save them writing code that
won't work on different OSs?

Cheers,
Steve

Top-posted from my Windows Phone

-----Original Message-----
From: "Larry Hastings"
Sent: 6/10/2016 8:50
To: "python-dev at python.org"
Subject: Re: [Python-Dev] BDFL ruling request: should we block foreverwaiting for high-quality random bits?

On 06/09/2016 03:44 PM, Ethan Furman wrote:
On 06/09/2016 03:22 PM, Larry Hastings wrote:
Okay, it's decided: os.urandom() must be changed for 3.5.2 to never
block on a getrandom() call.

One way to not block is to raise an exception.  Since this is such a
rare occurrence anyway I don't see this being a problem, plus it keeps
everybody mostly happy: normal users won't see it hang, crypto-folk
won't see vulnerable-from-this-cause-by-default machines, and those
running Python early in the boot sequence will have something they can
figure out, plus an existing knob to work around it [hashseed, I
think?].

Nope, I want the old behavior back.  os.urandom() should read
/dev/random if getrandom() would block.  As the British say, "it
should do what it says on the tin".

/arry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From neil at python.ca Thu Jun 9 20:35:57 2016
From: neil at python.ca (Neil Schemenauer)
Date: Thu, 9 Jun 2016 17:35:57 -0700
Subject: [Python-Dev] Smoothing the transition from Python 2 to 3
In-Reply-To: References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca>
Message-ID: <20160610003557.GA9353@python.ca>

On 2016-06-09, Brett Cannon wrote:
> I don't think you meant for what you said to sound insulting,
> Neil, but it did feel like it upon first reading.

Sorry, I think I misunderstood what you and Nick were saying. I've
experienced a fair amount of negative feedback on my idea, so I'm
pretty cranky at this point.

Amber Brown claimed that she spent $60k of her time porting Twisted
to Python 3. I think there is lots of room to make our porting tools
better.

Using something like modernize, 2to6, or sixer seems like a better
idea than trying to improve on 2to3. I agree on that point. However,
those tools combined with my modified Python 3.6 make for a much
easier migration path than going directly to Python 3.x. My runtime
warnings catch many common problems and make it easy to see what
needs fixing.

We have a lot more freedom to put ugly backwards-compatibility hacks
into this stepping-stone version, rather than changing either Python
2.7.x or the main 3.x line. I'm hoping to get community contributions
to add more backwards compatibility and runtime warnings.

From steve.dower at python.org Thu Jun 9 19:58:57 2016
From: steve.dower at python.org (Steve Dower)
Date: Fri, 10 Jun 2016 09:58:57 +1000
Subject: [Python-Dev] BDFL ruling request: should we block foreverwaiting for high-quality random bits?
In-Reply-To: <5759F21A.5080905@hastings.org>
References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> <5759F21A.5080905@hastings.org>
Message-ID: 

Can we get any new function on all platforms, deferring to urandom()
if getrandom() isn't there?

If the pattern is really going to be the hasattr check you posted
earlier

Top-posted from my Windows Phone

-----Original Message-----
From: "Larry Hastings"
Sent: 6/10/2016 8:50
To: "python-dev at python.org"
Subject: Re: [Python-Dev] BDFL ruling request: should we block foreverwaiting for high-quality random bits?

On 06/09/2016 03:44 PM, Ethan Furman wrote:
On 06/09/2016 03:22 PM, Larry Hastings wrote:
Okay, it's decided: os.urandom() must be changed for 3.5.2 to never
block on a getrandom() call.

One way to not block is to raise an exception.  Since this is such a
rare occurrence anyway I don't see this being a problem, plus it keeps
everybody mostly happy: normal users won't see it hang, crypto-folk
won't see vulnerable-from-this-cause-by-default machines, and those
running Python early in the boot sequence will have something they can
figure out, plus an existing knob to work around it [hashseed, I
think?].

Nope, I want the old behavior back.  os.urandom() should read
/dev/random if getrandom() would block.  As the British say, "it
should do what it says on the tin".

/arry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From greg.ewing at canterbury.ac.nz Thu Jun 9 20:33:16 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Jun 2016 12:33:16 +1200
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <20160609174927.GE27919@ando.pearwood.info>
References: <57595210.4000508@hastings.org> <6728E567-C894-47EF-88E0-0E0A2A678E6B@stufft.io> <20160609163000.GB27919@ando.pearwood.info> <20160609174927.GE27919@ando.pearwood.info>
Message-ID: <575A0ACC.5040809@canterbury.ac.nz>

Steven D'Aprano wrote:
> - Linux /dev/urandom doesn't block, but it might return predictable,
>   poor-quality pseudo-random bytes (i.e. a potential exploit);
>
> - Other OSes may block for potentially many minutes (i.e. a
>   potential DOS).

It's even possible that it could block *forever*. There was a case
here recently in the cosc dept where students were running Clojure
programs in a virtual machine environment. When they updated to a
newer version of Clojure, everyone's programs started hanging on
startup. It turned out the Clojure library was initialising its RNG
from /dev/random, and the VM didn't have any real spinning disks or
other devices to provide entropy.

-- 
Greg

From njs at pobox.com Thu Jun 9 21:03:35 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Jun 2016 18:03:35 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <5759EC2B.8040208@hastings.org>
References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org>
Message-ID: 

On Thu, Jun 9, 2016 at 3:22 PM, Larry Hastings wrote:
>
> On 06/09/2016 08:52 AM, Guido van Rossum wrote:
>
>> That leaves direct calls to os.urandom(). I don't think this should block
>> either.
>
> Then it's you and me against the rest of the world ;-)
>
> Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on
> a getrandom() call.  It's permissible to take advantage of
> getrandom(GRND_NONBLOCK), but if it returns EAGAIN we must read from
> /dev/urandom.
>
> It's already well established that this will upset the cryptography experts.
> As a concession to them, I propose adding a simple! predictable! function to
> Python 3.5.2: os.getrandom().  This would be a simple wrapper over
> getrandom, only available on platforms that expose it.  It would provide a
> way to use both extant flags, GRND_RANDOM and GRND_NONBLOCK, though
> possibly not exactly mirroring the native API.
>
> This would enable cryptography libraries to easily do what (IIUC) they
> regard as the "correct" thing on Linux for all supported versions of Python:
>
>     if hasattr(os, "getrandom"):
>         bits = os.getrandom(n)
>     else:
>         bits = os.urandom(n)

So I understand that the trade-offs between crypto users and regular
users are tricky, but this resolution concerns me quite a bit :-(

Specifically, it seems to me that:

1) we now have these two functions that need to be supported forever,
and AFAICT in every case where someone is currently explicitly calling
os.urandom and the behavior differs, they want os.getrandom instead.
(This is based on the assumption that the only time that explicitly
calling os.urandom is the best option is when one cares about the
cryptographic strength of the result -- I'm explicitly distinguishing
here between the hash seeding issue that triggered the original bug
report and explicit calls to os.urandom.) So in practice this change
makes it so that the only correct way of calling either of these
functions is the if/else stanza above.
2) every piece of security-sensitive software is going to spend
resources churning their code to implement the above,

3) every future security audit of Python software is going to spend
resources making sure this is on their checklist of incredibly subtle
gotchas that have to be audited for,

4) the crypto folks are going to have to spin up a whole evangelism
effort to re-educate everyone that (contrary to what we've been
telling everyone for years), os.urandom is no longer the right way to
get cryptographic randomness.

OTOH if we allow explicit calls to os.urandom to block or raise an
exception, then AFAICT from this thread this will break exactly zero
projects.

Maybe this is just rehashing the same things that have already been
discussed ad nauseam, in which case I apologize. But I really feel
like this is one of those cases where the crypto folks aren't so much
saying "oh BUT what if "; they're more saying "oh $#@
you're going to cause me a *massive* amount of real work and churn and
ongoing costs for no perceivable gain and I'm exhausted even thinking
about it".

> I'm not excited about adding a new function in 3.5.2, but on the other hand
> we are taking away this functionality they had in 3.5.0 and 3.5.1 so only
> seems fair.  And the implementation of os.getrandom() should be very
> straightforward, and its semantics will mirror the native call, so I'm
> pretty confident we can get it solid in a couple of days, though we might
> slip 3.5.2rc1 by a day or two.
>
> Guido: do you see this as an acceptable compromise?
>
> Cryptographers: given that os.urandom() will no longer block in 3.5.2, do
> you want this?
>
> Pointing out an alternate approach: Marc-Andre Lemburg proposes in issue
> #27279 ( http://bugs.python.org/issue27279 ) that we should add two "known
> best-practices" functions to get pseudo-random bits; one merely for pseudo
> random bits, the other for crypto-strength pseudo random bits.
> While I
> think this is a fine idea, the exact spelling, semantics, and per-platform
> implementation of these functions is far from settled, and nobody is
> proposing that we do something like that for 3.5.

We already have a function for non-crypto-strength pseudo-random bits:
random.getrandbits.  os.urandom is the one for the cryptographers (I
thought).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From guido at python.org Thu Jun 9 21:18:33 2016
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Jun 2016 18:18:33 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org>
Message-ID: 

I don't think we should add a new function. I think we should convince
ourselves that there is not enough of a risk of an exploit even if
os.urandom() falls back.

On Thu, Jun 9, 2016 at 6:03 PM, Nathaniel Smith wrote:
> On Thu, Jun 9, 2016 at 3:22 PM, Larry Hastings wrote:
> > On 06/09/2016 08:52 AM, Guido van Rossum wrote:
> >> That leaves direct calls to os.urandom(). I don't think this should block
> >> either.
> >
> > Then it's you and me against the rest of the world ;-)
> >
> > Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on
> > a getrandom() call.  It's permissible to take advantage of
> > getrandom(GRND_NONBLOCK), but if it returns EAGAIN we must read from
> > /dev/urandom.
> >
> > It's already well established that this will upset the cryptography experts.
> > As a concession to them, I propose adding a simple! predictable! function to
> > Python 3.5.2: os.getrandom().  This would be a simple wrapper over
> > getrandom, only available on platforms that expose it.
> > It would provide a
> > way to use both extant flags, GRND_RANDOM and GRND_NONBLOCK, though
> > possibly not exactly mirroring the native API.
> >
> > This would enable cryptography libraries to easily do what (IIUC) they
> > regard as the "correct" thing on Linux for all supported versions of Python:
> >
> >     if hasattr(os, "getrandom"):
> >         bits = os.getrandom(n)
> >     else:
> >         bits = os.urandom(n)
>
> So I understand that the trade-offs between crypto users and regular
> users are tricky, but this resolution concerns me quite a bit :-(
>
> Specifically, it seems to me that:
> 1) we now have these two functions that need to be supported forever,
> and AFAICT in every case where someone is currently explicitly calling
> os.urandom and the behavior differs, they want os.getrandom instead.
> (This is based on the assumption that the only time that explicitly
> calling os.urandom is the best option is when one cares about the
> cryptographic strength of the result -- I'm explicitly distinguishing
> here between the hash seeding issue that triggered the original bug
> report and explicit calls to os.urandom.) So in practice this change
> makes it so that the only correct way of calling either of these
> functions is the if/else stanza above.
> 2) every piece of security-sensitive
> software is going to spend
> resources churning their code to implement the above,
> 3) every future security audit of Python software is going to spend
> resources making sure this is on their checklist of incredibly subtle
> gotchas that have to be audited for,
> 4) the crypto folks are going to have to spin up a whole evangelism
> effort to re-educate everyone that (contrary to what we've been
> telling everyone for years), os.urandom is no longer the right way to
> get cryptographic randomness.
>
> OTOH if we allow explicit calls to os.urandom to block or raise an
> exception, then AFAICT from this thread this will break exactly zero
> projects.
>
> Maybe this is just rehashing the same things that have already been
> discussed ad nauseam, in which case I apologize. But I really feel
> like this is one of those cases where the crypto folks aren't so much
> saying "oh BUT what if oppressive regimes and ticking bombs>"; they're more saying "oh $#@
> you're going to cause me a *massive* amount of real work and churn and
> ongoing costs for no perceivable gain and I'm exhausted even thinking
> about it".
>
> > I'm not excited about adding a new function in 3.5.2, but on the other hand
> > we are taking away this functionality they had in 3.5.0 and 3.5.1 so only
> > seems fair.  And the implementation of os.getrandom() should be very
> > straightforward, and its semantics will mirror the native call, so I'm
> > pretty confident we can get it solid in a couple of days, though we might
> > slip 3.5.2rc1 by a day or two.
> >
> > Guido: do you see this as an acceptable compromise?
> >
> > Cryptographers: given that os.urandom() will no longer block in 3.5.2, do
> > you want this?
> >
> > Pointing out an alternate approach: Marc-Andre Lemburg proposes in issue
> > #27279 ( http://bugs.python.org/issue27279 ) that we should add two "known
> > best-practices" functions to get pseudo-random bits; one merely for pseudo
> > random bits, the other for crypto-strength pseudo random bits.
>
> We already have a function for non-crypto-strength pseudo-random bits:
> random.getrandbits.  os.urandom is the one for the cryptographers (I
> thought).
>
> -n
>
> --
> Nathaniel J.
Smith -- https://vorpus.org
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From barry at python.org Thu Jun 9 21:53:43 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Jun 2016 21:53:43 -0400
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <5759EC2B.8040208@hastings.org>
References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org>
Message-ID: <20160609215343.00b0190e.barry@wooz.org>

On Jun 09, 2016, at 03:22 PM, Larry Hastings wrote:
>On 06/09/2016 08:52 AM, Guido van Rossum wrote:
>> That leaves direct calls to os.urandom(). I don't think this should
>> block either.
>
>Then it's you and me against the rest of the world ;-)

FWIW, I agree with you and Guido.  I'm also not opposed to adding a more
direct exposure of getrandom(), but in Python 3.6 only.  Like it or not,
that's the right approach for our backward compatibility policies.

Cheers,
-Barry

From barry at python.org Thu Jun 9 22:13:56 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Jun 2016 22:13:56 -0400
Subject: [Python-Dev] Smoothing the transition from Python 2 to 3
In-Reply-To: <20160610003557.GA9353@python.ca>
References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> <20160610003557.GA9353@python.ca>
Message-ID: <20160609221356.3b18e447.barry@wooz.org>

On Jun 09, 2016, at 05:35 PM, Neil Schemenauer wrote:
>Amber Brown claimed that she spent $60k of her time porting Twisted to Python
>3.
>I think there is lots of room to make our porting tools better.

Amber gave a presentation at the language summit and a Pycon talk. The
latter video is up on YouTube but the former wasn't recorded. I'm
hoping Jake will post a summary of it though. She's done a truly
impressive amount of work in porting Twisted and has a lot of good
insight.

I've ported a fair bit, but nothing of the size and complexity of
Twisted. FWIW, I did port the Mailman 3 core, which is now Python 3.4
and 3.5 compatible.

In my own experience, and IIRC Amber had a similar experience, the
ease of porting to Python 3 really comes down to how bytes/unicode
clean your code base is. Almost all the other pieces are either pretty
manageable or fairly easily automated. But if your code isn't
bytes-clean you're in for a world of hurt, because you first have to
decide how to represent those things. Twisted's job is especially fun
because it's all about wire protocols, which I think Amber described
as (paraphrasing) bytes that happen to have contents that look like
strings.

I've ported some libraries that weren't bytes-clean. With one of them,
I actually failed twice before I hit on the correct representation.
Once I got that right the rest went much more quickly.

There does seem to be a wide variety of experiences in porting to
Python 3. I think it's worth accepting, acknowledging, and promoting
that for a lot of code it's really not that hard, but that for some
code it's really painful. It's part of our job to help understand the
remaining pain and address it in some way. But let's also not scare
people away from Python 3, because it *can* be very easy to port, and
I think there's fairly widespread agreement that once you're in the
Python 3 world, you don't want to look back.
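To make the bytes-cleanliness point concrete, here is a small illustration (not from the thread) of how Python 3 changes the semantics of wire data like the "bytes that look like strings" Amber describes. The comparisons that silently stop matching are exactly what no static tool can flag.

```python
data = b'GET / HTTP/1.1'

# Python 2: data[0] == 'G' is True.  Python 3: indexing bytes yields
# an int, so that same comparison is silently False -- no exception.
assert data[0] == ord('G')
assert data[0:1] == b'G'        # slicing preserves the bytes type

# Python 2: 'GET' in data works because str *is* bytes.  Python 3
# raises TypeError when str and bytes are mixed in a containment
# test, so the bytes literal is mandatory.
assert b'GET' in data
assert data.split(b' ')[0] == b'GET'
```

Running a test suite under `python3 -bb` (as Brett suggests earlier in the thread) turns some of these silent mismatches into errors, which is why it's part of the recommended porting checklist.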
Cheers,
-Barry

From barry at python.org Thu Jun 9 22:21:57 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Jun 2016 22:21:57 -0400
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview
References: <57572E5D.4020101@stoneleaf.us>
Message-ID: <20160609222157.2063ca00@anarchist.wooz.org>

On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:

>Deprecation of current "zero-initialised sequence" behaviour
>------------------------------------------------------------
>
>Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
>argument and interpret it as meaning to create a zero-initialised sequence of
>the given size::
>
>    >>> bytes(3)
>    b'\x00\x00\x00'
>    >>> bytearray(3)
>    bytearray(b'\x00\x00\x00')
>
>This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
>entirely in Python 3.7.
>
>No other changes are proposed to the existing constructors.

Does it need to be *actually* removed? That does break existing code
for not a lot of benefit. Yes, the default constructor is a little
wonky, but with the addition of the new constructors, and the fact
that you're not proposing to eventually change the default
constructor, removal seems unnecessary. Besides, once it's removed,
what would `bytes(3)` actually do? The PEP doesn't say.

Also, since you're proposing to add `bytes.byte(3)` have you
considered also adding an optional count argument? E.g.
`bytes.byte(3, count=7)` would yield b'\x03\x03\x03\x03\x03\x03\x03'.
That seems like it could be useful.

Cheers,
-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: 

From Nikolaus at rath.org Thu Jun 9 22:38:37 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Thu, 09 Jun 2016 19:38:37 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <5759F21A.5080905@hastings.org> (Larry Hastings's message of "Thu, 9 Jun 2016 15:47:54 -0700")
References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> <5759F21A.5080905@hastings.org>
Message-ID: <87oa79ydhu.fsf@vostro.rath.org>

On Jun 09 2016, Larry Hastings wrote:
> On 06/09/2016 03:44 PM, Ethan Furman wrote:
>> On 06/09/2016 03:22 PM, Larry Hastings wrote:
>>> Okay, it's decided: os.urandom() must be changed for 3.5.2 to never
>>> block on a getrandom() call.
>>
>> One way to not block is to raise an exception.  Since this is such a
>> rare occurrence anyway I don't see this being a problem, plus it
>> keeps everybody mostly happy: normal users won't see it hang,
>> crypto-folk won't see vulnerable-from-this-cause-by-default
>> machines, and those running Python early in the boot sequence will
>> have something they can figure out, plus an existing knob to work
>> around it [hashseed, I think?].
>
> Nope, I want the old behavior back.  os.urandom() should read
> /dev/random if getrandom() would block.  As the British say, "it
> should do what it says on the tin".

Aeh, what the tin says is "return random bytes". What everyone uses it
for (including the standard library) is to provide randomness for
cryptographic purposes. What it does (in the problematic case) is
return something that's not random.

To me this sounds about as sensible as having open('/dev/zero') return
non-zero values in some rare situations. And yes, for most people "the
kernel running out of zeros" makes exactly as much sense as "the
kernel runs out of random data".

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."
From Nikolaus at rath.org Thu Jun 9 22:52:31 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Thu, 09 Jun 2016 19:52:31 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: (Guido van Rossum's message of "Thu, 9 Jun 2016 18:18:33 -0700")
References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org>
Message-ID: <87lh2dycuo.fsf@vostro.rath.org>

On Jun 09 2016, Guido van Rossum wrote:
> I don't think we should add a new function. I think we should convince
> ourselves that there is not enough of a risk of an exploit even if
> os.urandom() falls back.

That will be hard, because you have to consider an active, clever
adversary. On the other hand, convincing yourself that in practice
os.urandom would never block unless the setup is super exotic or there
is active maliciousness seems much easier.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."

From breamoreboy at yahoo.co.uk Thu Jun 9 22:52:23 2016
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Fri, 10 Jun 2016 03:52:23 +0100
Subject: [Python-Dev] Smoothing the transition from Python 2 to 3
In-Reply-To: References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca>
Message-ID: 

On 10/06/2016 00:43, Brett Cannon wrote:
>
> That's not what I'm saying at all (nor what I think Nick is saying);
> more tooling to ease the transition is always welcomed. The point we are
> trying to make is 2to3 is not considered best practice anymore, and so
> targeting its specific output might not be the best use of your time.
> I'm totally happy to have your fork work out and help give warnings for
> situations where runtime semantics are the only way to know there will
> be a problem that static analyzing tools can't handle and have the
> porting HOWTO updated so that people can run their test suite with your
> interpreter to help with that final bit of porting. I personally just
> don't want to see you waste time on warnings that are handled by the
> tools already or ignore the fact that six, modernize, and futurize can
> help more than 2to3 typically can with the easy stuff when trying to
> keep 2/3 compatibility. IOW some of us have become allergic to the word
> "2to3" in regards to porting. :) But if you want to target 2to3 output
> then by all means please do and your work will still be appreciated.

Given the above, and that 2to3 appears to be unsupported*, is there a
case for deprecating it?

* There are 46 outstanding issues on the bug tracker. Is the above the
reason for this?  I don't know.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From larry at hastings.org Thu Jun 9 22:54:51 2016
From: larry at hastings.org (Larry Hastings)
Date: Thu, 9 Jun 2016 19:54:51 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <87oa79ydhu.fsf@vostro.rath.org>
References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> <5759F21A.5080905@hastings.org> <87oa79ydhu.fsf@vostro.rath.org>
Message-ID: <575A2BFB.8040907@hastings.org>

On 06/09/2016 07:38 PM, Nikolaus Rath wrote:
> On Jun 09 2016, Larry Hastings wrote:
>> Nope, I want the old behavior back.  os.urandom() should read
>> /dev/random if getrandom() would block.
>> As the British say, "it
>> should do what it says on the tin".
>
> Aeh, what the tin says is "return random bytes".

What the tin says is "urandom", which has local man pages that dictate
exactly how it behaves.  On Linux the "urandom" man page says:

    A read from the /dev/urandom device will not block waiting for
    more entropy.  If there is not sufficient entropy, a pseudorandom
    number generator is used to create the requested bytes.

os.urandom() needs to behave like that on Linux, which is how it
behaved in Python 2.4 through 3.4.

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com Thu Jun 9 22:58:14 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Jun 2016 19:58:14 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <20160609215343.00b0190e.barry@wooz.org>
References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org>
Message-ID: 

On Thu, Jun 9, 2016 at 6:53 PM, Barry Warsaw wrote:
> On Jun 09, 2016, at 03:22 PM, Larry Hastings wrote:
>
>> On 06/09/2016 08:52 AM, Guido van Rossum wrote:
>>> That leaves direct calls to os.urandom(). I don't think this should
>>> block either.
>>
>> Then it's you and me against the rest of the world ;-)
>
> FWIW, I agree with you and Guido.  I'm also not opposed to adding a more
> direct exposure of getrandom(), but in Python 3.6 only.  Like it or not,
> that's the right approach for our backward compatibility policies.
I suspect the crypto folks would be okay with pushing this back to
3.6, so long as the final resolution is that os.urandom remains the
standard interface for, as the docstring says, "Return[ing] a string
of n random bytes suitable for cryptographic use" using the
OS-recommended method, and they don't have to go change all their
code. After all, 3.4 and 2.7 will still have this subtle brokenness
for some time.

But I'm a little uncertain what you think would need to happen to
satisfy the backwards compatibility policies. If we can change it in
3.6 without having a warning in 3.5, then presumably we can also
change it in 3.5 without a warning in 3.4, which is what already
happened accidentally :-).

Would it be acceptable for 3.5.2 to start raising a warning "urandom
returning non-random bytes -- in 3.6 this will be an error", and then
make it an error in 3.6? (And it would probably be good even in the
long run to issue a prominent warning if hash seeding fails.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From larry at hastings.org Thu Jun 9 23:01:14 2016
From: larry at hastings.org (Larry Hastings)
Date: Thu, 9 Jun 2016 20:01:14 -0700
Subject: [Python-Dev] BDFL ruling request: should we block foreverwaiting for high-quality random bits?
In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> <5759F21A.5080905@hastings.org>
Message-ID: <575A2D7A.6080808@hastings.org>

On 06/09/2016 05:00 PM, Steve Dower wrote:
> If the pattern is really going to be the hasattr check you posted
> earlier, can we just do it for people and save them writing code that
> won't work on different OSs?

No.  That's what got us into this mess in the first place.
3.5.0 and 3.5.1 *already* changed to the new behavior, and it resulted in the situation where CPython blocked forever at startup in these certain edge cases. os.urandom() has been around for more than a decade, we can't unilaterally change its semantics now. os.urandom() in 3.5 has to go back to how it behaved on Linux in 3.4. And if I were release manager for 3.6, I'd say "it has to stay that way in 3.6 too". However, Guido's already said "don't add os.getrandom() in 3.5", so the debate is somewhat irrelevant. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Jun 9 23:11:08 2016 From: larry at hastings.org (Larry Hastings) Date: Thu, 9 Jun 2016 20:11:08 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> Message-ID: <575A2FCC.5070101@hastings.org> On 06/09/2016 07:58 PM, Nathaniel Smith wrote: > I suspect the crypto folks would be okay with pushing this back to > 3.6, so long as the final resolution is that os.urandom remains the > standard interface for, as the docstring says, "Return[ing] a string > of n random bytes suitable for cryptographic use" using the > OS-recommended method, and they don't have to go change all their > code. The Linux core devs didn't like the behavior of /dev/urandom. But they couldn't change its behavior without breaking userspace code. Linux takes backwards-compatibility very seriously, so they left /dev/urandom exactly the way it was and added new functionality (the getrandom() system call) that had the semantics they felt were best. 
I don't understand why so many people seem to think it's okay to break old code in new versions of Python, when Python's history has shown a similarly strong commitment to backwards-compatibility. os.urandom() was added in Python 2.4, in 2004, and remained unchanged for about thirteen years. That's thirteen years of people calling it and assuming its semantics were identical to the local "urandom" man page, which was correct. I don't think we should change os.urandom() to block on Linux even in 3.6. Happily, that's no longer my fight, as I'm not 3.6 RM. > Would it be acceptable for 3.5.2 to start raising a warning "urandom > returning non-random bytes -- in 3.6 this will be an error", and then > make it an error in 3.6? No. In 3.5.2 and the remaining 3.5 releases, os.urandom() must behave identically to how it behaved in 3.4 and the previous releases. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Thu Jun 9 23:48:33 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 9 Jun 2016 23:48:33 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <575A2FCC.5070101@hastings.org> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> Message-ID: <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> > On Jun 9, 2016, at 11:11 PM, Larry Hastings wrote: > > I don't understand why so many people seem to think it's okay to break old code in new versions of Python, when Python's history has shown a similarly strong commitment to backwards-compatibility. Python *regularly* breaks compatibility in X.Y+1 releases, and does it on purpose. 
An example from Python 3.5 would be PEP 479. I think breaking compatibility is a good thing from time to time, as long as it's not done so with wanton disregard and as long as the cost is carefully weighed against the benefits. One of the more frustrating aspects of trying to discuss security sensitive topics on python-dev is a feeling (at least from my end) that whenever someone wants to make something more secure [1] folks come in and try to anchor the discussion by treating backwards compatibility as some sort of sacred duty that can never be broken, and the discussion ends up feeling (from the security side that I'm typically on) like trying to justify the idea of ever breaking backwards compatibility, instead of weighing the cost/benefit of a particular change. On the flip side, when a different kind of change that breaks compatibility, say to make some behavior less confusing, gets brought up it feels like the discussion instead focuses on whether or not breaking compatibility is worth it in that particular instance. I'm perfectly happy to accept that Python has decided to make a trade off differently than what I would prefer, but the rhetoric that is employed makes trying to improve Python's security an extremely frustrating experience for myself and others [2]. Feeling like you have to litigate that it's *ever* OK to break compatibility before you can even get to the point of discussing if it makes sense in any particular instance, while watching other kinds of proposals not have to do that, is a pretty disheartening experience. [1] Making code more secure pretty much by definition means taking some code that previously executed and making it so it no longer executes, ideally only in degenerate and dangerous conditions, but in general, that's always the case.
[2] I don't want to name names, as they didn't give me permission to do so, but these discussions have caused more than one person who tends to fall on the security side of things to consider avoiding contributing to Python at all, because of this kind of rhetoric. -- Donald Stufft -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nikolaus at rath.org Thu Jun 9 23:50:45 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Thu, 09 Jun 2016 20:50:45 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <575A2BFB.8040907@hastings.org> (Larry Hastings's message of "Thu, 9 Jun 2016 19:54:51 -0700") References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> <5759F21A.5080905@hastings.org> <87oa79ydhu.fsf@vostro.rath.org> <575A2BFB.8040907@hastings.org> Message-ID: <878tydg0ru.fsf@vostro.rath.org> On Jun 09 2016, Larry Hastings wrote: > On 06/09/2016 07:38 PM, Nikolaus Rath wrote: >> On Jun 09 2016, Larry Hastings wrote: >>> Nope, I want the old behavior back. os.urandom() should read >>> /dev/urandom if getrandom() would block. As the British say, "it >>> should do what it says on the tin". >> Aeh, what the tin says is "return random bytes". > > What the tin says is "urandom", which has local man pages that dictate > exactly how it behaves. [...] I disagree. The authoritative source for the behavior of the Python 'urandom' function is the Python documentation, not the Linux manpage for the "urandom" device. And https://docs.python.org/3.4/library/os.html says first and foremost: os.urandom(n) Return a string of n random bytes suitable for cryptographic use. Best, -Nikolaus -- GPG encrypted emails preferred.
Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." From tim.peters at gmail.com Thu Jun 9 23:54:15 2016 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 9 Jun 2016 22:54:15 -0500 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <575A2BFB.8040907@hastings.org> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> <5759F21A.5080905@hastings.org> <87oa79ydhu.fsf@vostro.rath.org> <575A2BFB.8040907@hastings.org> Message-ID: [Nikolaus Rath] >> Aeh, what the tin says is "return random bytes". [Larry Hastings] > What the tin says is "urandom", which has local man pages that dictate > exactly how it behaves. On Linux the "urandom" man page says: > > A read from the /dev/urandom device will not block waiting for more entropy. > If there is not sufficient entropy, a pseudorandom number generator is used > to create the requested bytes. > > os.urandom() needs to behave like that on Linux, which is how it behaved in > Python 2.4 through 3.4. I agree (with Larry). If the change hadn't already been made, nobody would get anywhere trying to make it now. So best to pretend it was never made to begin with ;-) The tin that _will_ say "return random bytes" in Python will be `secrets.token_bytes()`. That's self-evidently (to me) where the "possibly block forever" implementation belongs. From guido at python.org Fri Jun 10 00:28:18 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Jun 2016 21:28:18 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <5759F13B.2000909@stoneleaf.us> <5759F21A.5080905@hastings.org> <87oa79ydhu.fsf@vostro.rath.org> <575A2BFB.8040907@hastings.org> Message-ID: So secrets.py needs an upgrade; it currently uses random.SystemRandom. On Thursday, June 9, 2016, Tim Peters wrote: > [Nikolaus Rath] > >> Aeh, what the tin says is "return random bytes". > > [Larry Hastings] > > What the tin says is "urandom", which has local man pages that dictate > > exactly how it behaves. On Linux the "urandom" man page says: > > > > A read from the /dev/urandom device will not block waiting for more > entropy. > > If there is not sufficient entropy, a pseudorandom number generator > is used > > to create the requested bytes. > > > > os.urandom() needs to behave like that on Linux, which is how it behaved > in > > Python 2.4 through 3.4. > > I agree (with Larry). If the change hadn't already been made, nobody > would get anywhere trying to make it now. So best to pretend it was > never made to begin with ;-) > > The tin that _will_ say "return random bytes" in Python will > be `secrets.token_bytes()`. That's self-evidently (to me) where the > "possibly block forever" implementation belongs. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From njs at pobox.com Fri Jun 10 00:32:53 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 9 Jun 2016 21:32:53 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <575A2FCC.5070101@hastings.org> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> Message-ID: On Thu, Jun 9, 2016 at 8:11 PM, Larry Hastings wrote: > > On 06/09/2016 07:58 PM, Nathaniel Smith wrote: > > I suspect the crypto folks would be okay with pushing this back to > 3.6, so long as the final resolution is that os.urandom remains the > standard interface for, as the docstring says, "Return[ing] a string > of n random bytes suitable for cryptographic use" using the > OS-recommended method, and they don't have to go change all their > code. > > > The Linux core devs didn't like the behavior of /dev/urandom. But they > couldn't change its behavior without breaking userspace code. Linux takes > backwards-compatibility very seriously, so they left /dev/urandom exactly > the way it was and added new functionality (the getrandom() system call) > that had the semantics they felt were best. > > I don't understand why so many people seem to think it's okay to break old > code in new versions of Python, when Python's history has shown a similarly > strong commitment to backwards-compatibility. os.urandom() was added in > Python 2.4, in 2004, and remained unchanged for about thirteen years. > That's thirteen years of people calling it and assuming its semantics were > identical to the local "urandom" man page, which was correct. 
> I can only speak for myself, but the reason it doesn't bother me is that the documentation for os.urandom has always been very clear that it is an abstraction over multiple OS-specific sources of cryptographic randomness -- even in the 2.4 docs [1] we read that its output "depends on the OS implementation", and that it might be /dev/urandom, it might be CryptGenRandom, and it might even raise an exception if "a randomness source is not found". So as a user I've always expected that it will make a best-effort attempt to use whatever the best source of cryptographic randomness is in a given environment, or else make a best-effort attempt to raise an error if it's determined that it can't give me cryptographic randomness, and it's been doing that unchanged for thirteen years too. But now Linux has moved forward and provided an improved OS-specific source of cryptographic randomness, and in particular one that actually signals to userspace when it doesn't have randomness available. So we have a choice: either we have to break the guarantee that os.urandom is identical to /dev/urandom, or we have to break the guarantee that os.urandom uses the best OS-specific source of cryptographic randomness. Either way we're breaking some guarantee we used to make. And AFAICT so far 100% of the people who actually maintain libraries that call os.urandom are asking python-dev to break the identical-to-/dev/urandom guarantee and preserve the uses-the-best-OS-specific-cryptographic-randomness guarantee. Disrupting working code is a bad thing, but in the long run, no-one is actually asking for an os.urandom that silently falls back on the xkcd #221 PRNG [2]. All that said, the eve of the 3.5.2 release is a terrible time to be trying to decide this, and it makes perfect sense to me that maybe 3.5 should kick this can down the road. Your efforts as RM are appreciated and I'm glad I'm not in your spot :-).
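(For reference, the no-compromise interface the thread keeps circling back to is the `secrets` module that landed for 3.6; a minimal sketch of its use, assuming Python 3.6+:)

```python
import secrets

# token_bytes() is the tin that really does say "return random bytes":
# it always draws from the strongest source the OS provides, and may
# block at startup until that source is initialized.
token = secrets.token_bytes(16)

# Convenience variant for the common "random hex string" case:
hexstr = secrets.token_hex(16)   # 2 hex digits per byte -> 32 chars
```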
-n [1] https://docs.python.org/2.4/lib/os-miscfunc.html [2] https://xkcd.com/221/ -- Nathaniel J. Smith -- https://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Jun 10 00:57:45 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 09 Jun 2016 21:57:45 -0700 Subject: [Python-Dev] PEP: Ordered Class Definition Namespace In-Reply-To: References: <57570C07.9000703@stoneleaf.us> Message-ID: <575A48C9.5080100@stoneleaf.us> On 06/07/2016 11:13 AM, Eric Snow wrote: > On Tue, Jun 7, 2016 at 11:01 AM, Ethan Furman wrote: >> On 06/07/2016 10:51 AM, Eric Snow wrote: >>> Specification >>> ============= >> >> >>> * types for which `__prepare__()`` returned something other than >>> ``OrderedDict`` (or a subclass) have their ``__definition_order__`` >>> set to ``None`` >> >> >> I assume this check happens in type.__new__? If a non-OrderedDict is used >> as the namespace, but a __definition_order__ key and value are supplied, is >> it used or still set to None? > > A __definition_order__ in the class body always takes precedence. So > a supplied value will be honored (and not replaced with None). Nice. I'll add it to the Enum, enum34, and aenum as soon as it lands (give or take a couple months ;) -- ~Ethan~ From vadmium+py at gmail.com Fri Jun 10 01:11:02 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Fri, 10 Jun 2016 05:11:02 +0000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <5759EC2B.8040208@hastings.org> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> Message-ID: > On 06/09/2016 08:52 AM, Guido van Rossum wrote: > That leaves direct calls to os.urandom(). I don't think this should block > either. 
On 9 June 2016 at 22:22, Larry Hastings wrote: > Then it's you and me against the rest of the world ;-) > > > Okay, it's decided: os.urandom() must be changed for 3.5.2 to never block on > a getrandom() call. It's permissible to take advantage of > getrandom(GRND_NONBLOCK), but if it returns EAGAIN we must read from > /dev/urandom. So assuming this is the "final" decision, where to from here? I think the latest change by Colm and committed by Victor already implements this decision: https://hg.python.org/cpython/rev/9de508dc4837 Getrandom() is still called, but if it would block, we fall back to trying the less-secure Linux /dev/urandom, or fail if /dev/urandom is missing. The Python hash seed is still set using this code. And os.urandom() calls this code. Random.seed() and SystemRandom still use os.urandom(), as documented. So I suggest we close the original mega bug thread as fixed. Unless people think they can change Larry or Guido's mind, we should focus further discussion on any changes for 3.6. From christian at python.org Fri Jun 10 02:06:39 2016 From: christian at python.org (Christian Heimes) Date: Fri, 10 Jun 2016 08:06:39 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> Message-ID: On 2016-06-10 05:48, Donald Stufft wrote: > >> On Jun 9, 2016, at 11:11 PM, Larry Hastings > > wrote: >> >> I don't understand why so many people seem to think it's okay to break >> old code in new versions of Python, when Python's history has shown a >> similarly strong commitment to backwards-compatibility. > > Python *regularly* breaks compatibility in X.Y+1 releases, and does it > on purpose. An example from Python 3.5 would be PEP 479. I think > breaking compatibility is a good thing from time to time, as long as > it's not done so with wanton disregard and as long as the cost is > carefully weighed against the benefits. > > One of the more frustrating aspects of trying to discuss security > sensitive topics on python-dev is a feeling (at least from my end) that > whenever someone wants to make something more secure [1] folks come in > and try to anchor the discussion by treating backwards compatibility as > some sort of sacred duty that can never be broken and the discussion > ends up feeling (from the security side that I'm typically on) like trying > to justify the idea of ever breaking backwards compatibility, instead of > weighing the cost/benefit of a particular change. On the flip side, when > a different kind of change that breaks compatibility, say to make some > behavior less confusing, gets brought up it feels like the discussion > instead focuses on whether or not breaking compatibility is worth it in > that particular instance.
> > I'm perfectly happy to accept that Python has decided to make a trade > off differently than what I would prefer it, but the rhetoric that is > employed makes trying to improve Python's security an extremely > frustrating experience for myself and others [2]. Feeling like you have > to litigate that it's *ever* OK to break compatibility before you can > even get to the point of discussing if it makes sense in any particular > instance, while watching other kinds proposals not have to do that is a > pretty disheartening experience. > > > [1] Making code more secure pretty much by definition means taking some > code that previously executed and making it so it no longer executes, > ideally only in degenerate and dangerous conditions, but in general, > that's always the case. > > [2] I don't want to name names, as they didn't give me permission to do > so, but these discussions have caused more than one person who tends to > fall on the security side of things to consider avoiding contributing to > Python at all, because of this kind of rhetoric. Donald, feel free to name me. I'm mentally exhausted and frustrated by the discussions over the last days and weeks. As of now I'm considering stepping down from PSRT and taking a long break from Python core development. My frustration is mostly rooted in Dunning-Kruger effects. If you still think that a CSPRNG can run out of entropy or that it is a good idea to implement a crypto hash function in pure Python, then please go back to the children's table and let the grown-ups talk. You are still struggling with basic addition and multiplication, while we discuss Laplace transformation for linear ODEs and consult experts, who do quantum Fourier transformation to solve a hidden subgroup problem by converting it from finite Abelian groups to Shor's quantum algorithm [1]. Quoting Larry: "You must be this tall to ride the security train." I'm well aware that I'm not a trained and studied cryptographer.
Cory and Donald repeatedly stated the same. However we are aware of our shortcomings, know our limits and constantly follow the advice of trusted experts. At least we combine enough experience to recognize bad ideas. Please, please don't add unnecessary noise to security discussions. os.urandom() is not about the concrete foundation of a bike shed. It's the f...reaking core catcher [2] of a nuclear power plant. You want to have a secure core catcher when the nuclear reactor goes BOOOM and spills hot molten, extremely radioactive Corium. Christian [1] Yes, that is a real thing. It will break all current asymmetric ciphers like RSA and EC. [2] https://en.wikipedia.org/wiki/Core_catcher From stephen at xemacs.org Fri Jun 10 02:23:43 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 10 Jun 2016 15:23:43 +0900 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: <20160609230807.GA8118@python.ca> References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> Message-ID: <22362.23791.365192.927455@turnbull.sk.tsukuba.ac.jp> Neil Schemenauer writes: > I have to wonder if you guys actually ported a lot of Python 2 > code. Python 3 (including stdlib) itself is quite a bit of code. > According to you guys, there is no problem No, according to us, there are problems, but in the code, not in the language or its implementation. This is a Brooksian "no silver bullet" problem: it's very hard to write reliable code that handles multiple text representations (as pretty much everything does nowadays), except by converting to internal text on input and back to encoded text on output. The warnings you quote (and presumably the code that generates them) make assumptions (cf Barry's post) that are frequently invalid.
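The convert-at-the-boundaries discipline described above can be sketched in a few lines (function names invented for illustration):

```python
def read_message(raw: bytes) -> str:
    # Input boundary: decode encoded text into internal str exactly once.
    return raw.decode("utf-8")

def process(text: str) -> str:
    # All internal logic works purely on str; bytes never leak in.
    return text.upper()

def write_message(text: str) -> bytes:
    # Output boundary: encode back to bytes exactly once, on the way out.
    return text.encode("utf-8")

reply = write_message(process(read_message(b"hello")))
```

Everything between the two boundary calls can then be audited as pure-text code, which is what makes the "ten years of whack-a-mole" avoidable.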
I don't know about cross-type comparisons, but as Barry and Brett both pointed out, mixtures of bytes and text are *rarely* easy to fix, because it's often extremely difficult to know which is the appropriate representation for a given variable unless you do a complete refactoring as described above. When I've tried to fix such warnings one at a time, it's always been whack-a-mole. The experience in GNU Emacs and Mailman 2 has been that it took about ten years to get to the point where they went a whole year without an encoding bug once non-Latin-1 encodings were being handled. XEmacs OTOH took only about 3 years from the proof-of-concept introduction of multibyte characters to essentially no bugs (except in C code, of course!) because we had the same policy as Python 3: bytes and text don't mix, and in development we also would abort on mixing integers and characters (in GNU Emacs, the character type was the same as the integer type until very recently). We affectionately referred to those bugs as "Ebola" (not very polite, but it gets the point across about how seriously we took the idea of making the internal text representation completely opaque). In Mailman 2, we still can't say confidently that there are no Unicode bugs left even today. We still need an outer "except UnicodeError: quarantine_and_call_for_help(msg)" handler, although AFAIK it hasn't been reported for a couple years. It's not that you can't continue to run the potentially buggy code in Python 2. Mailman 2 does; you can, too. What we don't support (and I personally hope we never support) is running that code in Python 3 (warnings or no). If you want to support that yourself, more power to you, but I advise you that my experience suggests that it's not going to be a panacea, and I do believe it's going to be more trouble than biting the bullet and just thoroughly porting your code. Even if that takes as much time as it took Amber to port Twisted. > and we already have good enough tooling. 
;-( Nobody said that, just that the existing tooling is pretty good for the problems that tools can help with, while no tool is likely to be much help with some of the code your tool allows to run. You're welcome to try to prove that claim wrong -- if you do, it would indeed be very valuable! But I personally, based on my own experience, think that the chance of success is too low to justify the cost. (Granted, I don't have to port Twisted, so in that sense I'm biased. :-/ ) BTW tools continue to be added, as well as language changes (PEP 461!) There is no resistance to that. What you're running into here is that several of us have substantial experience with various of the issues raised, and that experience convinces us that there's no silver bullet, just hard work, if you face them. Steve From p.f.moore at gmail.com Fri Jun 10 04:35:45 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 10 Jun 2016 09:35:45 +0100 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: <20160609221356.3b18e447.barry@wooz.org> References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> <20160610003557.GA9353@python.ca> <20160609221356.3b18e447.barry@wooz.org> Message-ID: On 10 June 2016 at 03:13, Barry Warsaw wrote: > In my own experience, and IIRC Amber had a similar experience, the ease of > porting to Python 3 really comes down to how bytes/unicode clean your code > base is. Almost all the other pieces are either pretty manageable or fairly > easily automated. But if your code isn't bytes-clean you're in for a world > of hurt because you first have to decide how to represent those things. > Twisted's job is especially fun because it's all about wire protocols, which I > think Amber described as (paraphrasing) bytes that happen to have contents > that look like strings. Although I have much less experience with porting than many others in this thread, that's my experience as well.
Get a clear and well-understood separation of bytes and strings, and the rest of the porting exercise is (relatively!) straightforward. But if you just once think "I'm not quite sure, but I think I just need to decode here to be safe", you'll be fighting Unicode errors for ever. My hope is that static typing tools like MyPy could help here. I typically review Python 2 code by mentally categorising which functions (theoretically) take bytes, which take strings, and which are confused. And sort things out from there. Type annotations seem like they'd help that process. But I've yet to use typing in practice, so it may not be that simple. Paul From victor.stinner at gmail.com Fri Jun 10 07:13:10 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 10 Jun 2016 13:13:10 +0200 Subject: [Python-Dev] Stop using timeit, use perf.timeit! Message-ID: Hi, In recent weeks, I did research on how to get stable and reliable benchmarks, especially for the corner case of microbenchmarks. The first result is a series of articles; here are the first three: https://haypo.github.io/journey-to-stable-benchmark-system.html https://haypo.github.io/journey-to-stable-benchmark-deadcode.html https://haypo.github.io/journey-to-stable-benchmark-average.html The second result is a new perf module which includes all "tricks" discovered in my research: compute average and standard deviation, spawn multiple worker child processes, automatically calibrate the number of outer-loop iterations, automatically pin worker processes to isolated CPUs, and more. The perf module allows storing benchmark results as JSON to analyze them in depth later. It helps to configure a benchmark correctly and check manually if it is reliable or not. The perf documentation also explains how to get stable and reliable benchmarks (ex: how to tune Linux to isolate CPUs).
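The average-plus-deviation idea can be sketched with just the standard library (an illustrative toy only; the real perf module additionally spawns worker processes, warms up, and pins CPUs):

```python
import statistics
import timeit

# Repeat a microbenchmark and report mean +- standard deviation,
# instead of a single opaque number.
raw = timeit.repeat("sorted(range(1000))", repeat=20, number=1000)
per_loop = [t / 1000 for t in raw]   # seconds per single execution

mean = statistics.mean(per_loop)
stdev = statistics.stdev(per_loop)
print("%.1f us +- %.1f us" % (mean * 1e6, stdev * 1e6))
```

A large stdev relative to the mean is a quick signal that the benchmark environment is not stable.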
perf has 3 builtin CLI commands:

* python -m perf: show and compare JSON results
* python -m perf.timeit: new better and more reliable implementation of timeit
* python -m perf.metadata: display collected metadata

Python 3 is recommended to get time.perf_counter(), use the new accurate statistics module, automatic CPU pinning (I will implement it on Python 2 later), etc. But Python 2.7 is also supported, fallbacks are implemented when needed.

Example with the patched telco benchmark (benchmark for the decimal module) on a Linux with two isolated CPUs.

First run the benchmark:
---
$ python3 telco.py --json-file=telco.json
.........................
Average: 26.7 ms +- 0.2 ms
---

Then show the JSON content to see all details:
---
$ python3 -m perf -v show telco.json
Metadata:
- aslr: enabled
- cpu_affinity: 2, 3
- cpu_count: 4
- cpu_model_name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
- hostname: smithers
- loops: 10
- platform: Linux-4.4.9-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
- python_executable: /usr/bin/python3
- python_implementation: cpython
- python_version: 3.4.3
Run 1/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.8 ms, 26.7 ms
Run 2/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms
Run 3/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.9 ms, 26.8 ms
(...)
Run 25/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms
Average: 26.7 ms +- 0.2 ms (25 runs x 3 samples; 1 warmup)
---

Note: benchmarks can be analyzed with Python 2. I'm posting my email to python-dev because providing timeit results is commonly requested in review of optimization patches. The next step is to patch the CPython benchmark suite to use the perf module. I already forked the repository and started to patch some benchmarks. If you are interested by Python performance in general, please join us on the speed mailing list!
https://mail.python.org/mailman/listinfo/speed Victor From steve at pearwood.info Fri Jun 10 09:20:51 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Jun 2016 23:20:51 +1000 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: Message-ID: <20160610132051.GH27919@ando.pearwood.info> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote: > Hi, > > In recent weeks, I did research on how to get stable and reliable > benchmarks, especially for the corner case of microbenchmarks. The > first result is a series of articles; here are the first three: Thank you for this! I am very interested in benchmarking. > https://haypo.github.io/journey-to-stable-benchmark-system.html > https://haypo.github.io/journey-to-stable-benchmark-deadcode.html > https://haypo.github.io/journey-to-stable-benchmark-average.html I strongly question your statement in the third: [quote] But how can we compare performances if results are random? Take the minimum? No! You must never (ever again) use the minimum for benchmarking! Compute the average and some statistics like the standard deviation: [end quote] While I'm happy to see a real-world use for the statistics module, I disagree with your logic. The problem is that random noise can only ever slow the code down, it cannot speed it up. To put it another way, the random errors in the timings are always positive. Suppose you micro-benchmark some code snippet and get a series of timings. We can model the measured times as: measured time t = T + ε where T is the unknown "true" timing we wish to estimate, and ε is some variable error due to noise in the system. But ε is always positive, never negative, and we always measure something larger than T. Let's suppose we somehow (magically) know what the epsilons are: measurements = [T + 0.01, T + 0.02, T + 0.04, T + 0.01] The average is (4*T + 0.08)/4 = T + 0.02 But the minimum is T + 0.01, which is a better estimate than the average.
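That claim is easy to check numerically (a toy simulation; the value of T and the exponential noise distribution are illustrative assumptions, not measurements):

```python
import random
import statistics

random.seed(2016)              # deterministic for the example
T = 1.0                        # hypothetical "true" runtime, seconds
noise = [random.expovariate(100) for _ in range(1000)]  # one-sided, >= 0
measurements = [T + e for e in noise]

mean = statistics.mean(measurements)
best = min(measurements)

# With strictly one-sided noise, the minimum sits closer to T.
assert T <= best < mean
assert abs(best - T) < abs(mean - T)
```

The argument only holds under the one-sided-noise assumption, which is exactly the point contested later in the thread.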
Taking the average means that *worse* epsilons will affect your
estimate, while the minimum means that only the smallest epsilon
affects your estimate.

Taking the average is appropriate if the error terms can be positive or
negative, e.g. if they are *measurement error* rather than noise:

measurements = [T + 0.01, T - 0.02, T + 0.04, T - 0.01]

The average is (4*T + 0.02)/4 = T + 0.005. The minimum is T - 0.02,
which is worse than the average.

Unless you have good reason to think that the timing variation is
mostly caused by some error which can be both positive and negative,
the minimum is the right statistic to use, not the average. But ask
yourself: what sort of error, noise or external influence will cause
the code snippet to run FASTER than the fastest the CPU can execute it?

-- 
Steve

From dmalcolm at redhat.com  Fri Jun 10 10:34:26 2016
From: dmalcolm at redhat.com (David Malcolm)
Date: Fri, 10 Jun 2016 10:34:26 -0400
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: <20160610132051.GH27919@ando.pearwood.info>
References: <20160610132051.GH27919@ando.pearwood.info>
Message-ID: <1465569266.4029.43.camel@redhat.com>

On Fri, 2016-06-10 at 23:20 +1000, Steven D'Aprano wrote:
> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
> > Hi,
> >
> > In recent weeks, I did research on how to get stable and reliable
> > benchmarks, especially for the corner case of microbenchmarks. The
> > first result is a series of articles, here are the first three:
>
> Thank you for this! I am very interested in benchmarking.
>
> > https://haypo.github.io/journey-to-stable-benchmark-system.html
> > https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
> > https://haypo.github.io/journey-to-stable-benchmark-average.html
>
> I strongly question your statement in the third:
>
> [quote]
> But how can we compare performances if results are random?
> Take the minimum?
>
> No! You must never (ever again) use the minimum for
> benchmarking! Compute the average and some statistics like
> the standard deviation:
> [end quote]
>
> While I'm happy to see a real-world use for the statistics module, I
> disagree with your logic.
>
> The problem is that random noise can only ever slow the code down; it
> cannot speed it up.

Consider a workload being benchmarked running on one core, which has a
particular pattern of cache hits and misses. Now consider another
process running on a sibling core, sharing the same cache. Isn't it
possible that under some circumstances the 2nd process could prefetch
memory into the cache in such a way that the workload under test
actually gets faster than if the 2nd process wasn't running?

[...snip...]

Hope this is constructive
Dave

From sebastian at realpath.org  Fri Jun 10 02:39:02 2016
From: sebastian at realpath.org (Sebastian Krause)
Date: Fri, 10 Jun 2016 08:39:02 +0200
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting
 for high-quality random bits?
In-Reply-To: (Nathaniel Smith's message of "Thu, 9 Jun 2016 18:03:35 -0700")
References: <57595210.4000508@hastings.org>
 <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk>
 <20160609124102.5EE4EB14024@webabinitio.net>
 <1465476616-sup-8510@lrrr.local>
 <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io>
 <5759EC2B.8040208@hastings.org>
Message-ID: 

Nathaniel Smith wrote:
> (This is based on the assumption that the only time that explicitly
> calling os.urandom is the best option is when one cares about the
> cryptographic strength of the result -- I'm explicitly distinguishing
> here between the hash seeding issue that triggered the original bug
> report and explicit calls to os.urandom.)

I disagree with that assumption. I've often found myself using
os.urandom for non-secure random data, seeing it as the best option
simply because it directly returns the type I wanted: bytes.
The last time I looked, the random module didn't have a function to
directly give me bytes, so I would have to wrap it in something like:

bytearray(random.getrandbits(8) for _ in range(size))

Or maybe the function exists, but then it doesn't seem very
discoverable. Ideally I would only want to use the random module for
non-secure random data and (in 3.6) the secrets module (which could
block) for secure random data, and never bother with os.urandom (and
knowing how it behaves). But then those modules should probably get
new functions to directly return bytes.

Sebastian

From cody.piersall at gmail.com  Fri Jun 10 10:09:27 2016
From: cody.piersall at gmail.com (Cody Piersall)
Date: Fri, 10 Jun 2016 09:09:27 -0500
Subject: [Python-Dev] Smoothing the transition from Python 2 to 3
In-Reply-To: <20160609230807.GA8118@python.ca>
References: <20160608210133.GA4318@python.ca>
 <20160609230807.GA8118@python.ca>
Message-ID: 

> One problem is that the str literals should be bytes
> literals. Comparison with None needs to be avoided.
>
> With Python 2 the code runs successfully. With Python 3 the code
> crashes with a traceback. With my modified Python 3.6, the code
> runs successfully but generates the following warnings:
>
> test.py:13: DeprecationWarning: encoding bytes to str
>   output.write('%d:' % len(s))
> test.py:14: DeprecationWarning: encoding bytes to str
>   output.write(s)
> test.py:15: DeprecationWarning: encoding bytes to str
>   output.write(',')
> test.py:5: DeprecationWarning: encoding bytes to str
>   if c == ':':
> test.py:9: DeprecationWarning: encoding bytes to str
>   size += c
> test.py:24: DeprecationWarning: encoding bytes to str
>   data = data + s
> test.py:26: DeprecationWarning: encoding bytes to str
>   if input.read(1) != ',':
> test.py:31: DeprecationWarning: default compare is depreciated
>   if a > 0:

This seems _very_ useful; I'm surprised that other people don't think
so too.
Currently, the easiest way to find bytes/str errors in a big
application is by running the program, finding where it crashes,
fixing that one line (or hopefully wherever the data entered the
system, if you can find it), and repeating the process.

This is nice because you can get into "fix my encoding errors" mode
for more than just one traceback at a time; the new method would be to
run the program, look at the millions of bytes/str errors, and fix
everything that showed up in this round at once. That seems like a big
win for productivity to me.

Cody

From victor.stinner at gmail.com  Fri Jun 10 11:07:18 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 10 Jun 2016 17:07:18 +0200
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: <20160610132051.GH27919@ando.pearwood.info>
References: <20160610132051.GH27919@ando.pearwood.info>
Message-ID: 

I started to work on visualisation. IMHO it helps to understand the
problem.

Let's create a large dataset: 500 samples (100 processes x 5 samples):
---
$ python3 telco.py --json-file=telco.json -p 100 -n 5
---

The attached plot.py script creates a histogram:
---
avg: 26.7 ms +- 0.2 ms; min = 26.2 ms

26.1 ms: 1 #
26.2 ms: 12 #####
26.3 ms: 34 ############
26.4 ms: 44 ################
26.5 ms: 109 ######################################
26.6 ms: 117 ########################################
26.7 ms: 86 ##############################
26.8 ms: 50 ##################
26.9 ms: 32 ###########
27.0 ms: 10 ####
27.1 ms: 3 ##
27.2 ms: 1 #
27.3 ms: 1 #

minimum 26.1 ms: 0.2% (1) of 500 samples
---

Replace "if 1" with "if 0" to produce a graphical view, or just view
the attached distribution.png, the numpy+scipy histogram.

The distribution looks like a Gaussian curve:
https://en.wikipedia.org/wiki/Gaussian_function

The interesting thing is that only 1 sample out of 500 is in the
minimum bucket (26.1 ms). If you say that the performance is 26.1 ms,
only 0.2% of your users will be able to reproduce this timing.
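A minimal text histogram in the spirit of the output above can be built with nothing but the standard library. This is a hypothetical sketch, not Victor's attached plot.py, and the sample values are made up:

```python
import collections

def histogram_buckets(samples, bucket_width):
    """Group timing samples (in seconds) into buckets of the given width."""
    return collections.Counter(
        round(sample / bucket_width) * bucket_width for sample in samples
    )

def print_histogram(buckets):
    # One line per bucket: timing in ms, the count, and a bar of '#'.
    for value in sorted(buckets):
        count = buckets[value]
        print('%.1f ms: %d %s' % (value * 1e3, count, '#' * count))

# Hypothetical timings (seconds) clustered around 26.7 ms:
samples = [0.0265, 0.0266, 0.0266, 0.0267, 0.0267, 0.0267, 0.0268, 0.0270]
buckets = histogram_buckets(samples, bucket_width=0.0001)
print_histogram(buckets)
```

With real data the bar lengths make the shape of the distribution, and how lonely the minimum bucket is, visible at a glance.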
The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms ..
26.9 ms: we got 109+117+86+50+32 samples in this range, which gives us
394/500 = 79%.

IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less of a lie than
26.1 ms (0.2%).

Victor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: distribution.png
Type: image/png
Size: 31967 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: telco.json
Type: application/json
Size: 58847 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plot.py
Type: text/x-python
Size: 1109 bytes
Desc: not available
URL: 

From p.f.moore at gmail.com  Fri Jun 10 11:09:44 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 10 Jun 2016 16:09:44 +0100
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: <1465569266.4029.43.camel@redhat.com>
References: <20160610132051.GH27919@ando.pearwood.info>
 <1465569266.4029.43.camel@redhat.com>
Message-ID: 

On 10 June 2016 at 15:34, David Malcolm wrote:
>> The problem is that random noise can only ever slow the code down; it
>> cannot speed it up.
[...]
> Isn't it possible that under some circumstances the 2nd process could
> prefetch memory into the cache in such a way that the workload under
> test actually gets faster than if the 2nd process wasn't running?

My feeling is that it would be much rarer for random effects to speed
up the benchmark under test - possible in the sort of circumstance you
describe, but not common.

The conclusion I draw is "be careful how you interpret summary
statistics if you don't know the distribution of the underlying data
as an estimator of the value you are interested in".

In the case of Victor's article, he's specifically trying to
compensate for variations introduced by Python's hash randomisation
algorithm.
And for that, you would get both positive and negative effects on code
speed, so the average makes sense. But only if you've already
eliminated the other common noise (such as other processes, etc.). In
Victor's articles, he sounds like he's done this, but he's using very
Linux-specific mechanisms, and I don't know if he's done the same for
other platforms.

Also, the way people commonly use micro-benchmarks ("hey, look, this
way of writing the expression goes faster than that way") doesn't
really address questions like "is the difference statistically
significant".

Summary: micro-benchmarking is hard. Victor looks like he's done some
really interesting work on it, but any "easy to use" timeit tool will
typically get used in an over-simplistic way in practice, so you
probably shouldn't read too much into timing figures quoted in
isolation, no matter what tool was used to generate them.

Paul

From p.f.moore at gmail.com  Fri Jun 10 11:14:45 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 10 Jun 2016 16:14:45 +0100
Subject: [Python-Dev] Smoothing the transition from Python 2 to 3
In-Reply-To: 
References: <20160608210133.GA4318@python.ca>
 <20160609230807.GA8118@python.ca>
Message-ID: 

On 10 June 2016 at 15:09, Cody Piersall wrote:
>> One problem is that the str literals should be bytes
>> literals. Comparison with None needs to be avoided.
>>
>> With Python 2 the code runs successfully. With Python 3 the code
>> crashes with a traceback. With my modified Python 3.6, the code
>> runs successfully but generates the following warnings:
>>
>> test.py:13: DeprecationWarning: encoding bytes to str
>>   output.write('%d:' % len(s))
>> test.py:14: DeprecationWarning: encoding bytes to str
>>   output.write(s)
>> test.py:15: DeprecationWarning: encoding bytes to str
>>   output.write(',')
>> test.py:5: DeprecationWarning: encoding bytes to str
>>   if c == ':':
>> test.py:9: DeprecationWarning: encoding bytes to str
>>   size += c
>> test.py:24: DeprecationWarning: encoding bytes to str
>>   data = data + s
>> test.py:26: DeprecationWarning: encoding bytes to str
>>   if input.read(1) != ',':
>> test.py:31: DeprecationWarning: default compare is depreciated
>>   if a > 0:
>
> This seems _very_ useful; I'm surprised that other people don't think
> so too. Currently, the easiest way to find bytes/str errors in a big
> application is by running the program, finding where it crashes,
> fixing that one line (or hopefully wherever the data entered the
> system if you can find it), and repeating the process.

It *is* very nice. But...

> This is nice because you can get into "fix my encoding errors" mode
> for more than just one traceback at a time; the new method would be
> to run the program, look at the millions of bytes/str errors, and fix
> everything that showed up in this round at once. That seems like a
> big win for productivity to me.

If you're fixing encoding errors at the point they occur, rather than
looking at the high-level design of the program's handling of textual
and bytestring data, you're likely to end up in a bit of a mess no
matter how you locate the issues. Most likely because, at the point in
the code where the warning occurs, you no longer know what the correct
encoding to use should be.

But absolutely, anything that gives extra information about where the
encoding hotspots are in your code is of value.
Paul From status at bugs.python.org Fri Jun 10 12:08:43 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 10 Jun 2016 18:08:43 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160610160843.612C656AAD@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-06-03 - 2016-06-10) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 5553 (+16) closed 33491 (+75) total 39044 (+91) Open issues with patches: 2424 Issues opened (69) ================== #16484: pydoc generates invalid docs.python.org link for xml.etree.Ele http://bugs.python.org/issue16484 reopened by martin.panter #26243: zlib.compress level as keyword argument http://bugs.python.org/issue26243 reopened by serhiy.storchaka #26839: Python 3.5 running on Linux kernel 3.17+ can block at startup http://bugs.python.org/issue26839 reopened by haypo #27186: add os.fspath() http://bugs.python.org/issue27186 reopened by brett.cannon #27197: mock.patch interactions with "from" imports http://bugs.python.org/issue27197 opened by clarkbreyman #27198: Adding an assertClose() method to unittest.TestCase http://bugs.python.org/issue27198 opened by ChrisBarker #27199: TarFile expose copyfileobj bufsize to improve throughput http://bugs.python.org/issue27199 opened by fried #27200: make doctest in CPython has failures http://bugs.python.org/issue27200 opened by Jelle Zijlstra #27201: expose the ABI name as a config variable http://bugs.python.org/issue27201 opened by doko #27204: Failing doctests in Doc/howto/ http://bugs.python.org/issue27204 opened by Jelle Zijlstra #27205: Failing doctests in Library/collections.rst http://bugs.python.org/issue27205 opened by Jelle Zijlstra #27206: Failing doctests in Doc/tutorial/ http://bugs.python.org/issue27206 opened by Jelle Zijlstra #27207: Failing doctests in Doc/whatsnew/3.2.rst http://bugs.python.org/issue27207 
opened by Jelle Zijlstra #27208: Failing doctests in Library/traceback.rst http://bugs.python.org/issue27208 opened by Jelle Zijlstra #27209: Failing doctests in Library/email.*.rst http://bugs.python.org/issue27209 opened by Jelle Zijlstra #27210: Failing doctests due to environmental dependencies in Lib/*lib http://bugs.python.org/issue27210 opened by Jelle Zijlstra #27212: Doc for itertools, 'islice()' implementation have unwanted beh http://bugs.python.org/issue27212 opened by alex0307 #27213: Rework CALL_FUNCTION* opcodes http://bugs.python.org/issue27213 opened by serhiy.storchaka #27214: a potential future bug and an optimization that mostly undermi http://bugs.python.org/issue27214 opened by Oren Milman #27218: improve tracing performance with f_trace set to Py_None http://bugs.python.org/issue27218 opened by xdegaye #27219: turtle.fillcolor doesn't accept a tuple of floats http://bugs.python.org/issue27219 opened by Jelle Zijlstra #27220: Add a pure Python version of 'collections.defaultdict' http://bugs.python.org/issue27220 opened by ebarry #27221: multiprocessing documentation is outdated regarding method pic http://bugs.python.org/issue27221 opened by memeplex #27222: redundant checks and a weird use of goto statements in long_rs http://bugs.python.org/issue27222 opened by Oren Milman #27223: _ready_ready and _write_ready should respect _conn_lost http://bugs.python.org/issue27223 opened by lukasz.langa #27226: distutils: unable to compile both .opt-1.pyc and .opt2.pyc sim http://bugs.python.org/issue27226 opened by mgorny #27227: argparse fails to parse [] when using choices and nargs='*' http://bugs.python.org/issue27227 opened by evan_ #27231: Support the fspath protocol in the posixpath module http://bugs.python.org/issue27231 opened by Jelle Zijlstra #27232: os.fspath() should not use repr() on error http://bugs.python.org/issue27232 opened by Jelle Zijlstra #27233: Missing documentation for PyOS_FSPath http://bugs.python.org/issue27233 opened by 
Jelle Zijlstra #27235: Heap overflow occurred due to the int overflow (Python-2.7.11/ http://bugs.python.org/issue27235 opened by madness #27238: Bare except: usages in turtle.py http://bugs.python.org/issue27238 opened by Jelle Zijlstra #27240: 'UnstructuredTokenList' object has no attribute '_fold_as_ew' http://bugs.python.org/issue27240 opened by touilleMan #27241: Catch exceptions raised in pstats add (repl) http://bugs.python.org/issue27241 opened by llllllllll #27242: Make the docs for NotImplemented & NotImplementedError unambig http://bugs.python.org/issue27242 opened by ebarry #27243: __aiter__ should return async iterator instead of awaitable http://bugs.python.org/issue27243 opened by yselivanov #27244: print(';;') fails in pdb with SyntaxError http://bugs.python.org/issue27244 opened by cjw296 #27245: IDLE: Fix deletion of custom themes and key bindings http://bugs.python.org/issue27245 opened by terry.reedy #27248: Possible refleaks in PyType_Ready in error condition http://bugs.python.org/issue27248 opened by xiang.zhang #27250: Add os.urandom_block() http://bugs.python.org/issue27250 opened by haypo #27252: Make dict views copyable http://bugs.python.org/issue27252 opened by serhiy.storchaka #27253: More efficient deepcopying of Mapping http://bugs.python.org/issue27253 opened by serhiy.storchaka #27254: heap overflow in Tkinter module http://bugs.python.org/issue27254 opened by Emin Ghuliev #27255: More opcode predictions http://bugs.python.org/issue27255 opened by serhiy.storchaka #27256: header indentation destroyed http://bugs.python.org/issue27256 opened by frispete #27257: get_addresses results in traceback with an addrspec with an em http://bugs.python.org/issue27257 opened by frispete #27258: Exception in BytesGenerator.flatten http://bugs.python.org/issue27258 opened by frispete #27259: Possible missing deprecation warnings? 
http://bugs.python.org/issue27259 opened by mark #27260: Missing equality check for super objects http://bugs.python.org/issue27260 opened by Jelle Zijlstra #27261: io.BytesIO.truncate does not work as advertised http://bugs.python.org/issue27261 opened by justus.winter #27262: IDLE: move Aqua context menu code to maxosx http://bugs.python.org/issue27262 opened by terry.reedy #27263: IDLE sets the HOME environment variable breaking scripts http://bugs.python.org/issue27263 opened by Jarrod Petz #27266: Always use getrandom() in os.random() on Linux and add block=F http://bugs.python.org/issue27266 opened by haypo #27268: Incorrect error message on float('') http://bugs.python.org/issue27268 opened by Drekin #27269: ipaddress: Wrong behavior with ::ffff:1.2.3.4 style IPs http://bugs.python.org/issue27269 opened by ThiefMaster #27270: 'parentheses-equality' warnings when building with clang and c http://bugs.python.org/issue27270 opened by xdegaye #27272: random.Random should not read 2500 bytes from urandom http://bugs.python.org/issue27272 opened by haypo #27273: subprocess.run(cmd, input='text') should pass universal_newlin http://bugs.python.org/issue27273 opened by akira #27274: [ctypes] Allow from_pointer creation http://bugs.python.org/issue27274 opened by memeplex #27275: KeyError thrown by optimised collections.OrderedDict.popitem() http://bugs.python.org/issue27275 opened by kaniini #27277: Fatal Python error: Segmentation fault in test_exceptions http://bugs.python.org/issue27277 opened by Rohit Mediratta #27278: py_getrandom() uses an int for syscall() result http://bugs.python.org/issue27278 opened by haypo #27279: Add random.cryptorandom() and random.pseudorandom, deprecate o http://bugs.python.org/issue27279 opened by lemburg #27281: unpickling an xmlrpc.client.Fault raises TypeError http://bugs.python.org/issue27281 opened by Uri Okrent #27282: Raise BlockingIOError in os.urandom if kernel is not ready http://bugs.python.org/issue27282 opened by 
ncoghlan #27283: Add a "What's New" entry for PEP 519 http://bugs.python.org/issue27283 opened by brett.cannon #27285: Deprecate pyvenv in favor of python3 -m venv http://bugs.python.org/issue27285 opened by stevepiercy #27286: str object got multiple values for keyword argument http://bugs.python.org/issue27286 opened by martin.panter #27287: SIGSEGV when calling os.forkpty() http://bugs.python.org/issue27287 opened by Alexander Haensch Most recent 15 issues with no replies (15) ========================================== #27287: SIGSEGV when calling os.forkpty() http://bugs.python.org/issue27287 #27283: Add a "What's New" entry for PEP 519 http://bugs.python.org/issue27283 #27273: subprocess.run(cmd, input='text') should pass universal_newlin http://bugs.python.org/issue27273 #27269: ipaddress: Wrong behavior with ::ffff:1.2.3.4 style IPs http://bugs.python.org/issue27269 #27259: Possible missing deprecation warnings? http://bugs.python.org/issue27259 #27258: Exception in BytesGenerator.flatten http://bugs.python.org/issue27258 #27248: Possible refleaks in PyType_Ready in error condition http://bugs.python.org/issue27248 #27241: Catch exceptions raised in pstats add (repl) http://bugs.python.org/issue27241 #27240: 'UnstructuredTokenList' object has no attribute '_fold_as_ew' http://bugs.python.org/issue27240 #27227: argparse fails to parse [] when using choices and nargs='*' http://bugs.python.org/issue27227 #27223: _ready_ready and _write_ready should respect _conn_lost http://bugs.python.org/issue27223 #27222: redundant checks and a weird use of goto statements in long_rs http://bugs.python.org/issue27222 #27220: Add a pure Python version of 'collections.defaultdict' http://bugs.python.org/issue27220 #27218: improve tracing performance with f_trace set to Py_None http://bugs.python.org/issue27218 #27214: a potential future bug and an optimization that mostly undermi http://bugs.python.org/issue27214 Most recent 15 issues waiting for review (15) 
============================================= #27286: str object got multiple values for keyword argument http://bugs.python.org/issue27286 #27281: unpickling an xmlrpc.client.Fault raises TypeError http://bugs.python.org/issue27281 #27273: subprocess.run(cmd, input='text') should pass universal_newlin http://bugs.python.org/issue27273 #27270: 'parentheses-equality' warnings when building with clang and c http://bugs.python.org/issue27270 #27266: Always use getrandom() in os.random() on Linux and add block=F http://bugs.python.org/issue27266 #27262: IDLE: move Aqua context menu code to maxosx http://bugs.python.org/issue27262 #27255: More opcode predictions http://bugs.python.org/issue27255 #27253: More efficient deepcopying of Mapping http://bugs.python.org/issue27253 #27252: Make dict views copyable http://bugs.python.org/issue27252 #27248: Possible refleaks in PyType_Ready in error condition http://bugs.python.org/issue27248 #27245: IDLE: Fix deletion of custom themes and key bindings http://bugs.python.org/issue27245 #27243: __aiter__ should return async iterator instead of awaitable http://bugs.python.org/issue27243 #27242: Make the docs for NotImplemented & NotImplementedError unambig http://bugs.python.org/issue27242 #27241: Catch exceptions raised in pstats add (repl) http://bugs.python.org/issue27241 #27238: Bare except: usages in turtle.py http://bugs.python.org/issue27238 Top 10 most discussed issues (10) ================================= #26839: Python 3.5 running on Linux kernel 3.17+ can block at startup http://bugs.python.org/issue26839 144 msgs #27266: Always use getrandom() in os.random() on Linux and add block=F http://bugs.python.org/issue27266 61 msgs #27186: add os.fspath() http://bugs.python.org/issue27186 18 msgs #27243: __aiter__ should return async iterator instead of awaitable http://bugs.python.org/issue27243 18 msgs #27272: random.Random should not read 2500 bytes from urandom http://bugs.python.org/issue27272 18 msgs #5124: IDLE - 
pasting text doesn't delete selection http://bugs.python.org/issue5124 16 msgs #27198: Adding an assertClose() method to unittest.TestCase http://bugs.python.org/issue27198 13 msgs #27250: Add os.urandom_block() http://bugs.python.org/issue27250 13 msgs #23401: Add pickle support of Mapping views http://bugs.python.org/issue23401 12 msgs #25548: Show the address in the repr for class objects http://bugs.python.org/issue25548 11 msgs Issues closed (71) ================== #8491: Need readline command and keybinding information http://bugs.python.org/issue8491 closed by martin.panter #12962: TitledHelpFormatter and IndentedHelpFormatter are not document http://bugs.python.org/issue12962 closed by berker.peksag #13771: HTTPSConnection __init__ super implementation causes recursion http://bugs.python.org/issue13771 closed by berker.peksag #15476: Index "code object" and link to code object definition http://bugs.python.org/issue15476 closed by martin.panter #17888: docs: more information on documentation team http://bugs.python.org/issue17888 closed by berker.peksag #18027: distutils should access stat_result timestamps via .st_*time a http://bugs.python.org/issue18027 closed by berker.peksag #18117: Missing symlink:Current after Mac OS X 3.3.2 package installat http://bugs.python.org/issue18117 closed by ned.deily #19234: socket.fileno() documentation http://bugs.python.org/issue19234 closed by Jelle Zijlstra #19611: inspect.getcallargs doesn't properly interpret set comprehensi http://bugs.python.org/issue19611 closed by ncoghlan #20041: TypeError when f_trace is None and tracing. 
http://bugs.python.org/issue20041 closed by serhiy.storchaka #20567: test_idle causes test_ttk_guionly 'can't invoke "event" comman http://bugs.python.org/issue20567 closed by terry.reedy #21272: use _sysconfigdata to itinialize distutils.sysconfig http://bugs.python.org/issue21272 closed by doko #21277: don't try to link _ctypes with a ffi_convenience library http://bugs.python.org/issue21277 closed by doko #21313: Py_GetVersion() is broken when using mqueue and a long patch n http://bugs.python.org/issue21313 closed by martin.panter #21916: Create unit tests for turtle textonly http://bugs.python.org/issue21916 closed by serhiy.storchaka #22797: urllib.request.urlopen documentation falsely guarantees that a http://bugs.python.org/issue22797 closed by r.david.murray #23264: Add pickle support of dict views http://bugs.python.org/issue23264 closed by serhiy.storchaka #24617: os.makedirs()'s [mode] not correct http://bugs.python.org/issue24617 closed by martin.panter #24810: UX mode for IDLE targeted to 'new learners' http://bugs.python.org/issue24810 closed by terry.reedy #25738: http.server doesn't handle RESET CONTENT status correctly http://bugs.python.org/issue25738 closed by martin.panter #25941: Add 'How to Review a Patch' section to devguide http://bugs.python.org/issue25941 closed by ned.deily #26014: Guide users to the newer package install instructions http://bugs.python.org/issue26014 closed by ned.deily #26305: Make Argument Clinic to generate PEP 7 conforming code http://bugs.python.org/issue26305 closed by serhiy.storchaka #26372: Popen.communicate not ignoring BrokenPipeError http://bugs.python.org/issue26372 closed by gregory.p.smith #26437: asyncio create_server() not always accepts the 'port' paramete http://bugs.python.org/issue26437 closed by berker.peksag #26448: dis.findlabels ignores EXTENDED_ARG http://bugs.python.org/issue26448 closed by serhiy.storchaka #26809: Add __all__ list to the string module http://bugs.python.org/issue26809 closed 
by python-dev #26884: android: cross-compilation of extension module links to the wr http://bugs.python.org/issue26884 closed by doko #26983: float() can return not exact float instance http://bugs.python.org/issue26983 closed by serhiy.storchaka #27052: Python2.7.11+ as in Debian testing and Ubuntu 16.04 LTS crashe http://bugs.python.org/issue27052 closed by doko #27066: SystemError if custom opener returns -1 http://bugs.python.org/issue27066 closed by barry #27072: random.getrandbits is limited to 2**31-1 bits on 64-bit Window http://bugs.python.org/issue27072 closed by rhettinger #27073: redundant checks in long_add and long_sub http://bugs.python.org/issue27073 closed by serhiy.storchaka #27105: cgi.__all__ is incomplete http://bugs.python.org/issue27105 closed by martin.panter #27107: mailbox.__all__ list is incomplete http://bugs.python.org/issue27107 closed by martin.panter #27108: mimetypes.__all__ list is incomplete http://bugs.python.org/issue27108 closed by martin.panter #27109: plistlib.__all__ list is incomplete http://bugs.python.org/issue27109 closed by martin.panter #27110: smtpd.__all__ list is incomplete http://bugs.python.org/issue27110 closed by martin.panter #27127: Never have GET_ITER not followed by FOR_ITER http://bugs.python.org/issue27127 closed by Demur Rumed #27136: sock_connect fails for bluetooth (and probably others) http://bugs.python.org/issue27136 closed by yselivanov #27156: IDLE: remove unused code http://bugs.python.org/issue27156 closed by terry.reedy #27164: zlib can't decompress DEFLATE using shared dictionary http://bugs.python.org/issue27164 closed by martin.panter #27167: subprocess reports signal as negative exit status, not documen http://bugs.python.org/issue27167 closed by gregory.p.smith #27187: Relax __all__ location requirement in PEP 8 http://bugs.python.org/issue27187 closed by python-dev #27196: Eliminate 'ThemeChanged' warning when running IDLE tests http://bugs.python.org/issue27196 closed by terry.reedy 
#27202: make doctest fails on 2.7 release notes http://bugs.python.org/issue27202 closed by orsenthil #27203: Failing doctests in Doc/faq/programming.rst http://bugs.python.org/issue27203 closed by orsenthil #27211: Heap corruption via Python 2.7.11 IOBase readline() http://bugs.python.org/issue27211 closed by python-dev #27215: Docstrings of Sequence and MutableSequence seems not right http://bugs.python.org/issue27215 closed by rhettinger #27216: Fix capitalisation of "Python runtime" in os.path.islink descr http://bugs.python.org/issue27216 closed by ned.deily #27217: IDLE 3.5.1 not using Tk 8.6 http://bugs.python.org/issue27217 closed by ned.deily #27224: IDLE: editor versus grep line number differ http://bugs.python.org/issue27224 closed by terry.reedy #27225: Potential refleak in type_new when setting __new__ fails http://bugs.python.org/issue27225 closed by serhiy.storchaka #27228: just for clearing: os.path.normpath("file://a") returns "file: http://bugs.python.org/issue27228 closed by georg.brandl #27229: In tree cross-build fails copying Include/graminit.h to itsel http://bugs.python.org/issue27229 closed by martin.panter #27230: Calculation involving mpmath gives wrong result with Python 3. http://bugs.python.org/issue27230 closed by ned.deily #27234: tuple - single value with comma is assigned as type tuple http://bugs.python.org/issue27234 closed by steven.daprano #27236: Add CHAINED_COMPARE_OP opcode http://bugs.python.org/issue27236 closed by serhiy.storchaka #27237: Kafka Python Consumer Messages gets truncated http://bugs.python.org/issue27237 closed by ned.deily #27239: Make idlelib.macosx self-contained. 
     http://bugs.python.org/issue27239  closed by terry.reedy
#27246: Keyboard Shortcuts Crash Idle
     http://bugs.python.org/issue27246  closed by ebarry
#27247: telnetlib AttributeError: 'error' object has no attribute 'err
     http://bugs.python.org/issue27247  closed by berker.peksag
#27249: Add os.urandom_info
     http://bugs.python.org/issue27249  closed by haypo
#27251: TypeError in logging.HTTPHandler.emit; possible python 2 to 3
     http://bugs.python.org/issue27251  closed by vinay.sajip
#27264: python 3.4 vs. 3.5 strftime same locale different output on Wi
     http://bugs.python.org/issue27264  closed by eryksun
#27265: Hash of different, specific Decimals created from str is the s
     http://bugs.python.org/issue27265  closed by mark.dickinson
#27267: memory leak in _ssl.c, function load_cert_chain
     http://bugs.python.org/issue27267  closed by python-dev
#27271: asyncio lost udp packets
     http://bugs.python.org/issue27271  closed by gvanrossum
#27276: FileFinder.find_spec() incompatible with finder specification
     http://bugs.python.org/issue27276  closed by paulmar
#27280: Paste fail in ipaddress documentation
     http://bugs.python.org/issue27280  closed by berker.peksag
#27284: Spam
     http://bugs.python.org/issue27284  closed by eryksun

From victor.stinner at gmail.com  Fri Jun 10 12:09:02 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 10 Jun 2016 18:09:02 +0200
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: 
References: <20160610132051.GH27919@ando.pearwood.info>
 <1465569266.4029.43.camel@redhat.com>
Message-ID: 

2016-06-10 17:09 GMT+02:00 Paul Moore :
> Also, the way people commonly use
> micro-benchmarks ("hey, look, this way of writing the expression goes
> faster than that way") doesn't really address questions like "is the
> difference statistically significant".
If you use "python3 -m perf compare method1.json method2.json", perf
checks that the difference is significant using the is_significant()
function:
http://perf.readthedocs.io/en/latest/api.html#perf.is_significant
"This uses a Student's two-sample, two-tailed t-test with alpha=0.95."

FYI, at the beginning this function came from the Unladen Swallow
benchmark suite ;-)

We should design a CLI command to do timeit+compare at once.

Victor

From guido at python.org  Fri Jun 10 12:23:17 2016
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Jun 2016 09:23:17 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: 
References: <57595210.4000508@hastings.org>
 <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk>
 <20160609124102.5EE4EB14024@webabinitio.net>
 <1465476616-sup-8510@lrrr.local>
 <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io>
 <5759EC2B.8040208@hastings.org>
 <20160609215343.00b0190e.barry@wooz.org>
 <575A2FCC.5070101@hastings.org>
 <981CD440-71B6-46AD-A057-585A812E083B@stufft.io>
Message-ID: 

I somehow feel compelled to clarify that (perhaps unlike Larry) my
concern is not the strict rules of backwards compatibility (if that was
the case I would have objected to changing this in 3.5.2). I just don't
like the potentially blocking behavior, and experts' opinions seem to
vary widely on how insecure the fallback bits really are, how likely
you are to find yourself in that situation, and how probable an exploit
would be.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brett at python.org  Fri Jun 10 12:30:26 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 10 Jun 2016 16:30:26 +0000
Subject: [Python-Dev] Smoothing the transition from Python 2 to 3
In-Reply-To: 
References: <20160608210133.GA4318@python.ca>
 <20160609230807.GA8118@python.ca>
Message-ID: 

On Thu, 9 Jun 2016 at 19:53 Mark Lawrence via Python-Dev <
python-dev at python.org> wrote:

> On 10/06/2016 00:43, Brett Cannon wrote:
> >
> > That's not what I'm saying at all (nor what I think Nick is saying);
> > more tooling to ease the transition is always welcomed. The point we are
> > trying to make is 2to3 is not considered best practice anymore, and so
> > targeting its specific output might not be the best use of your time.
> > I'm totally happy to have your fork work out and help give warnings for
> > situations where runtime semantics are the only way to know there will
> > be a problem that static analyzing tools can't handle and have the
> > porting HOWTO updated so that people can run their test suite with your
> > interpreter to help with that final bit of porting. I personally just
> > don't want to see you waste time on warnings that are handled by the
> > tools already or ignore the fact that six, modernize, and futurize can
> > help more than 2to3 typically can with the easy stuff when trying to
> > keep 2/3 compatibility. IOW some of us have become allergic to the word
> > "2to3" in regards to porting. :) But if you want to target 2to3 output
> > then by all means please do and your work will still be appreciated.
>
> Given the above and that 2to3 appears to be unsupported* is there a case
> for deprecating it?

I don't think so because it's still a useful transpiler tool. Basically
the community has decided the standard rewriters included with 2to3
aren't how people prefer to port, but 2to3 as a tool is the basis of
both modernize and futurize (as are some of those rewriters, but
tweaked to do something different).
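The "straddling" style that six, modernize, and futurize aim for, one source tree running unchanged on both 2.7 and 3.x, can be sketched in a few lines. The helper below is illustrative only and comes from none of those tools; it just shows the common idioms (future imports, explicit bytes handling):

```python
from __future__ import print_function, unicode_literals

import sys

# Straddling idiom: on 2.7, io.open is the 3.x open(); on 3.x this is a no-op.
if sys.version_info[0] < 3:
    from io import open  # noqa: F401

def netstring(payload):
    """Encode *payload* (bytes) as a netstring, identically on 2 and 3."""
    if not isinstance(payload, bytes):
        raise TypeError("payload must be bytes")
    # Format the length as text, then encode it: no implicit str/bytes mixing.
    return str(len(payload)).encode("ascii") + b":" + payload + b","

print(netstring(b"hello"))  # b'5:hello,' on 3.x
```

Code in this style needs no conversion step at all, which is part of why targeting 2to3's specific output matters less than it used to.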
>
> * There are 46 outstanding issues on the bug tracker. Is the above the
> reason for this, I don't know?
>

Typically the bugs are for the rewrite rules, and they are for edge
cases that no one wants to try to tackle as they are tough to cover
(although this is based on what comes through my inbox, so my
generalization could be wrong).

-Brett

>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ericsnowcurrently at gmail.com  Fri Jun 10 12:42:32 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 10 Jun 2016 09:42:32 -0700
Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace
In-Reply-To: 
References: <575772E6.7040906@stoneleaf.us>
Message-ID: 

On Thu, Jun 9, 2016 at 2:39 PM, Nick Coghlan wrote:
> I'm guessing Ethan is suggesting defining it as:
>
>     __definition_order__ = tuple(ns["__definition_order__"])
>
> when the attribute is present in the method body.

Ah. I'd rather stick to "consenting adults" in the case that
__definition_order__ is explicitly set. We'll strongly recommend
setting it to None or a tuple of identifier strings.

> That restriction would be comparable to what we do with __slots__ today:
>
>     >>> class C:
>     ...     __slots__ = 1
>     ...
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: 'int' object is not iterable

Are you suggesting that we require it be a tuple of identifiers (or
None) and raise TypeError otherwise, similar to __slots__? The
difference is that __slots__ has specific type requirements that do not
apply to __definition_order__, as well as a different purpose.
__definition_order__ is about preserving definition-type info that we
are currently throwing away.

-eric

From ericsnowcurrently at gmail.com  Fri Jun 10 12:49:10 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 10 Jun 2016 09:49:10 -0700
Subject: [Python-Dev] PEP 468
In-Reply-To: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com>
References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com>
Message-ID: 

On Thu, Jun 9, 2016 at 12:41 PM, wrote:
> Are there any further thoughts on including this in 3.6?

I don't have any plans and I don't know of anyone willing to champion
the PEP for 3.6. Note that the implementation itself shouldn't take
very long.

> Similar to the
> recent discussion on OrderedDict namespaces for metaclasses, this would
> simplify / enable a number of type factory use cases where proper
> metaclasses are overkill. This feature would also be quite nice in, say,
> pandas, where the (currently unspecified) field order used in the
> definition of frames is preserved in user-visible displays.

Good point. One weakness of the PEP has been insufficient
justification; the more compelling use cases, the better. So thanks! :)

-eric

From ericsnowcurrently at gmail.com  Fri Jun 10 12:54:32 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 10 Jun 2016 09:54:32 -0700
Subject: [Python-Dev] PEP 468
In-Reply-To: 
References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com>
Message-ID: 

On Thu, Jun 9, 2016 at 1:10 PM, Émanuel Barry wrote:
> As stated by Guido (and pointed out in the PEP):
>
> Making **kwds ordered is still open, but requires careful design and
> implementation to avoid slowing down function calls that don't benefit.
>
> The PEP has not been updated in a while, though. Python 3.5 has been
> released, and with it a C implementation of OrderedDict.
>
> Eric, are you still interested in this?

Yes, but I wasn't planning on dusting it off yet (i.e. in time for 3.6).
I'm certainly not opposed to someone picking up the banner.

> IIRC that PEP was one of the
> motivating use cases for implementing OrderedDict in C.

Correct, though I'm not sure OrderedDict needs to be involved any more.

> Maybe it's time for
> a second round of discussion on Python-ideas?

Fine with me, though I won't have a lot of time in the 3.6 timeframe
to handle a high-volume discussion or push through an implementation.

-eric

From tjreedy at udel.edu  Fri Jun 10 12:55:22 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 10 Jun 2016 12:55:22 -0400
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: <20160610132051.GH27919@ando.pearwood.info>
References: <20160610132051.GH27919@ando.pearwood.info>
Message-ID: 

On 6/10/2016 9:20 AM, Steven D'Aprano wrote:
> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
>> Hi,
>>
>> In recent weeks, I did research on how to get stable and reliable
>> benchmarks, especially for the corner case of microbenchmarks. The
>> first result is a series of articles; here are the first three:
>
> Thank you for this! I am very interested in benchmarking.
>
>> https://haypo.github.io/journey-to-stable-benchmark-system.html
>> https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
>> https://haypo.github.io/journey-to-stable-benchmark-average.html
>
> I strongly question your statement in the third:
>
> [quote]
> But how can we compare performances if results are random?
> Take the minimum?
>
> No! You must never (ever again) use the minimum for
> benchmarking! Compute the average and some statistics like
> the standard deviation:
> [end quote]
>
> While I'm happy to see a real-world use for the statistics module, I
> disagree with your logic.
>
> The problem is that random noise can only ever slow the code down, it
> cannot speed it up. To put it another way, the random errors in the
> timings are always positive.
>
> Suppose you micro-benchmark some code snippet and get a series of
> timings.
> We can model the measured times as:
>
> measured time t = T + ε
>
> where T is the unknown "true" timing we wish to estimate,

For comparative timings, we do not care about T. So arguments about the
best estimate of T miss the point. What we do wish to estimate is the
relationship between two Ts, T0 for 'control' and T1 for 'treatment',
in particular T1/T0. I suspect Victor is correct that mean(t1)/mean(t0)
is better than min(t1)/min(t0) as an estimate of the true ratio T1/T0
(for a particular machine).

But given that we have matched pairs of measurements with the same hash
seed and address, it may be better yet to estimate T1/T0 from the
ratios t1i/t0i, where i indexes experimental conditions. But it has
been a long time since I have read about estimation of ratios. What I
remember is that this is a nasty subject.

It is also the case that while an individual with one machine wants the
best ratio for that machine, we need to make CPython patch decisions
for the universe of machines that run Python.

> and ε is some variable error due to noise in the system.
> But ε is always positive, never negative,

Lognormal might be a first guess. But what we really have is
contributions from multiple factors,

-- 
Terry Jan Reedy

From sebastian at realpath.org  Fri Jun 10 13:01:23 2016
From: sebastian at realpath.org (Sebastian Krause)
Date: Fri, 10 Jun 2016 19:01:23 +0200
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: (Guido van Rossum's message of "Fri, 10 Jun 2016 09:23:17 -0700")
References: <57595210.4000508@hastings.org>
 <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk>
 <20160609124102.5EE4EB14024@webabinitio.net>
 <1465476616-sup-8510@lrrr.local>
 <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io>
 <5759EC2B.8040208@hastings.org>
 <20160609215343.00b0190e.barry@wooz.org>
 <575A2FCC.5070101@hastings.org>
 <981CD440-71B6-46AD-A057-585A812E083B@stufft.io>
Message-ID: 

Guido van Rossum wrote:
> I just don't like the potentially blocking behavior, and experts' opinions
> seem to widely vary on how insecure the fallback bits really are, how
> likely you are to find yourself in that situation, and how probable an
> exploit would be.

This is not just a theoretical problem being discussed by security
experts that *could* be exploited; there have already been multiple
real-life cases of devices (mostly embedded Linux machines) generating
predictable SSH keys because they read from an uninitialized
/dev/urandom at first boot. Most recently in the Raspbian distribution
for the Raspberry Pi:
https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892

At least in 3.6 there should be an obvious way to get random data that
is *always* guaranteed to be secure and that either fails or blocks if
it can't guarantee that.

Sebastian

From zreed at fastmail.com  Fri Jun 10 13:04:31 2016
From: zreed at fastmail.com (zreed at fastmail.com)
Date: Fri, 10 Jun 2016 12:04:31 -0500
Subject: [Python-Dev] PEP 468
In-Reply-To: 
References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com>
Message-ID: <1465578271.1903265.634024905.666E8FA2@webmail.messagingengine.com>

I would be super excited for this feature, so if there's a reasonable
chance of it being picked up I don't mind doing the implementation work.
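PEP 468 did eventually land in Python 3.6, so the ordering being discussed here is now a language guarantee: **kwargs preserves the order in which keywords were passed at the call site. A quick sketch (the function name is invented for illustration):

```python
def field_order(**kwargs):
    # PEP 468 (Python 3.6+): kwargs preserves call-site keyword order.
    return list(kwargs)

# The pandas-style use case from the thread: field order in a frame
# definition survives into user-visible output.
order = field_order(name="x", dtype="f8", unit="ms")
print(order)  # ['name', 'dtype', 'unit']
```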
On Fri, Jun 10, 2016, at 11:54 AM, Eric Snow wrote:
> On Thu, Jun 9, 2016 at 1:10 PM, Émanuel Barry wrote:
> > As stated by Guido (and pointed out in the PEP):
> >
> > Making **kwds ordered is still open, but requires careful design and
> > implementation to avoid slowing down function calls that don't benefit.
> >
> > The PEP has not been updated in a while, though. Python 3.5 has been
> > released, and with it a C implementation of OrderedDict.
> >
> > Eric, are you still interested in this?
>
> Yes, but wasn't planning on dusting it off yet (i.e. in time for 3.6).
> I'm certainly not opposed to someone picking up the banner.
>
> > IIRC that PEP was one of the
> > motivating use cases for implementing OrderedDict in C.
>
> Correct, though I'm not sure OrderedDict needs to be involved any more.
>
> > Maybe it's time for
> > a second round of discussion on Python-ideas?
>
> Fine with me, though I won't have a lot of time in the 3.6 timeframe
> to handle a high-volume discussion or push through an implementation.
>
> -eric

From tjreedy at udel.edu  Fri Jun 10 13:04:51 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 10 Jun 2016 13:04:51 -0400
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: 
References: <20160610132051.GH27919@ando.pearwood.info>
Message-ID: 

On 6/10/2016 11:07 AM, Victor Stinner wrote:
> I started to work on visualisation. IMHO it helps to understand the
> problem.
>
> Let's create a large dataset: 500 samples (100 processes x 5 samples):

As I finished my response to Steven, I was thinking you should do
something like this to get real data.
> ---
> $ python3 telco.py --json-file=telco.json -p 100 -n 5
> ---
>
> Attached plot.py script creates a histogram:
> ---
> avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
>
> 26.1 ms: 1 #
> 26.2 ms: 12 #####
> 26.3 ms: 34 ############
> 26.4 ms: 44 ################
> 26.5 ms: 109 ######################################
> 26.6 ms: 117 ########################################
> 26.7 ms: 86 ##############################
> 26.8 ms: 50 ##################
> 26.9 ms: 32 ###########
> 27.0 ms: 10 ####
> 27.1 ms: 3 ##
> 27.2 ms: 1 #
> 27.3 ms: 1 #
>
> minimum 26.1 ms: 0.2% (1) of 500 samples
> ---
>
> Replace "if 1" with "if 0" to produce a graphical view, or just view
> the attached distribution.png, the numpy+scipy histogram.
>
> The distribution looks like a Gaussian curve:
> https://en.wikipedia.org/wiki/Gaussian_function

I am not too surprised. If there are several somewhat independent
sources of slowdown, their sum would tend to be normal. I am also not
surprised that there is a bit of skewness, but probably not enough to
worry about.

> The interesting thing is that only 1 sample out of 500 is in the minimum
> bucket (26.1 ms). If you say that the performance is 26.1 ms, only
> 0.2% of your users will be able to reproduce this timing.
>
> The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms ..
> 26.9 ms: we got 109+117+86+50+32 samples in this range, which gives us
> 394/500 = 79%.
>
> IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less of a lie than
> 26.1 ms (0.2%).

-- 
Terry Jan Reedy

From tritium-list at sdamon.com  Fri Jun 10 13:05:58 2016
From: tritium-list at sdamon.com (Alex Walters)
Date: Fri, 10 Jun 2016 13:05:58 -0400
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: 
References: <57595210.4000508@hastings.org>
 <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk>
 <20160609124102.5EE4EB14024@webabinitio.net>
 <1465476616-sup-8510@lrrr.local>
 <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io>
 <5759EC2B.8040208@hastings.org>
 <20160609215343.00b0190e.barry@wooz.org>
 <575A2FCC.5070101@hastings.org>
 <981CD440-71B6-46AD-A057-585A812E083B@stufft.io>
Message-ID: <048901d1c33a$5bf13930$13d3ab90$@sdamon.com>

> -----Original Message-----
> From: Python-Dev [mailto:python-dev-bounces+tritium-
> list=sdamon.com at python.org] On Behalf Of Sebastian Krause
> Sent: Friday, June 10, 2016 1:01 PM
> To: python-dev at python.org
> Subject: Re: [Python-Dev] BDFL ruling request: should we block forever
> waiting for high-quality random bits?
>
> Guido van Rossum wrote:
> > I just don't like the potentially blocking behavior, and experts' opinions
> > seem to widely vary on how insecure the fallback bits really are, how
> > likely you are to find yourself in that situation, and how probable an
> > exploit would be.
>
> This is not just a theoretical problem being discussed by security
> experts that *could* be exploited, there have already been multiple
> real-life cases of devices (mostly embedded Linux machines)
> generating predictable SSH keys because they read from an
> uninitialized /dev/urandom at first boot. Most recently in the
> Raspbian distribution for the Raspberry Pi:
> https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892
>
> At least in 3.6 there should be an obvious way to get random data that
> *always* guarantees to be secure and either fails or blocks if it
> can't guarantee that.
>
> Sebastian

And that should live in the secrets module.

From steve at pearwood.info  Fri Jun 10 13:04:54 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 11 Jun 2016 03:04:54 +1000
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: 
References: <20160610132051.GH27919@ando.pearwood.info>
Message-ID: <20160610170453.GI27919@ando.pearwood.info>

On Fri, Jun 10, 2016 at 05:07:18PM +0200, Victor Stinner wrote:
> I started to work on visualisation. IMHO it helps to understand the
> problem.
>
> Let's create a large dataset: 500 samples (100 processes x 5 samples):
> ---
> $ python3 telco.py --json-file=telco.json -p 100 -n 5
> ---
>
> Attached plot.py script creates a histogram:
> ---
> avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
>
> 26.1 ms: 1 #
> 26.2 ms: 12 #####
> 26.3 ms: 34 ############
> 26.4 ms: 44 ################
> 26.5 ms: 109 ######################################
> 26.6 ms: 117 ########################################
> 26.7 ms: 86 ##############################
> 26.8 ms: 50 ##################
> 26.9 ms: 32 ###########
> 27.0 ms: 10 ####
> 27.1 ms: 3 ##
> 27.2 ms: 1 #
> 27.3 ms: 1 #
>
> minimum 26.1 ms: 0.2% (1) of 500 samples
> ---
[...]
> The distribution looks like a Gaussian curve:
> https://en.wikipedia.org/wiki/Gaussian_function

Lots of distributions look a bit Gaussian, but they can be skewed, or
truncated, or both. E.g. the average life-span of a lightbulb is
approximately Gaussian with a central peak at some value (let's say
5000 hours), but while it is conceivable that you might be really lucky
and find a bulb that lasts 15000 hours, it isn't possible to find one
that lasts -10000 hours. The distribution is truncated on the left.

To me, your graph looks like the distribution is skewed: the right-hand
tail (shown at the bottom) is longer than the left-hand tail, six
buckets compared to five buckets. There are actual statistical tests
for detecting deviation from Gaussian curves, but I'd have to look them
up.
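One quick check that needs nothing beyond the statistics module is the sample skewness (the third standardized moment): roughly zero for symmetric data and positive when the right-hand tail is longer. This is only a sketch, not a substitute for a proper normality test, and the timing data below is invented:

```python
import statistics

def skewness(samples):
    """Third standardized moment: ~0 for symmetric data, >0 for a right tail."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum((x - mean) ** 3 for x in samples) / (n * sd ** 3)

# Fake benchmark timings with a long right tail, like the histogram above.
timings = [26.2] * 5 + [26.5] * 10 + [26.7] * 5 + [27.3, 28.0]
print(round(skewness(timings), 2))  # positive: right-skewed
```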
But as a really quick and dirty test, we can count the number of
samples on either side of the central peak (the mode):

left: 109+44+34+12+1 = 200
centre: 117
right: 500 - 200 - 117 = 183

It certainly looks *close* to Gaussian, but with the crude tests we are
using, we can't be sure. If you took more and more samples, I would
expect that the right-hand tail would get longer and longer, but the
left-hand tail would not.

> The interesting thing is that only 1 sample out of 500 is in the minimum
> bucket (26.1 ms). If you say that the performance is 26.1 ms, only
> 0.2% of your users will be able to reproduce this timing.

Hmmm. Okay, that is a good point. In this case, you're not so much
reporting your estimate of what the "true speed" of the code snippet
would be in the absence of all noise, but your estimate of what your
users should expect to experience "most of the time". Assuming they
have exactly the same hardware, operating system, and load on their
system as you have.

> The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms ..
> 26.9 ms: we got 109+117+86+50+32 samples in this range, which gives us
> 394/500 = 79%.
>
> IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less of a lie than
> 26.1 ms (0.2%).

I think I understand the point you are making. I'll have to think about
it some more to decide if I agree with you.

But either way, I think the work you have done on perf is fantastic and
I think this will be a great tool. I really love the histogram. Can you
draw a histogram of two functions side-by-side, for comparisons?
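A side-by-side text histogram of the kind Steven asks about is easy to prototype in pure Python; this toy renderer is illustrative only and is not part of perf:

```python
from collections import Counter

def bucketize(samples, bucket_ms=0.1):
    """Count samples per bucket_ms-wide bucket, keyed by the rounded value."""
    return Counter(round(round(s / bucket_ms) * bucket_ms, 1) for s in samples)

def side_by_side(a, b, width=20):
    """Render two text histograms next to each other, one bucket per line."""
    ca, cb = bucketize(a), bucketize(b)
    scale_a, scale_b = max(ca.values()), max(cb.values())
    out = []
    for key in sorted(set(ca) | set(cb)):
        bar_a = "#" * (ca[key] * width // scale_a)
        bar_b = "#" * (cb[key] * width // scale_b)
        out.append("%5.1f ms: %-*s | %-*s" % (key, width, bar_a, width, bar_b))
    return out

for row in side_by_side([26.5, 26.5, 26.6, 26.7], [26.8, 26.9, 26.9, 27.0]):
    print(row)
```

Each function's bars are scaled to its own peak bucket, so the shapes of the two distributions line up even when sample counts differ.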
-- 
Steve

From ncoghlan at gmail.com  Fri Jun 10 13:40:53 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 10 Jun 2016 10:40:53 -0700
Subject: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview
In-Reply-To: <20160609222157.2063ca00@anarchist.wooz.org>
References: <57572E5D.4020101@stoneleaf.us>
 <20160609222157.2063ca00@anarchist.wooz.org>
Message-ID: 

On 9 June 2016 at 19:21, Barry Warsaw wrote:
> On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
>
>>Deprecation of current "zero-initialised sequence" behaviour
>>------------------------------------------------------------
>>
>>Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
>>argument and interpret it as meaning to create a zero-initialised sequence of
>>the given size::
>>
>>    >>> bytes(3)
>>    b'\x00\x00\x00'
>>    >>> bytearray(3)
>>    bytearray(b'\x00\x00\x00')
>>
>>This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
>>entirely in Python 3.7.
>>
>>No other changes are proposed to the existing constructors.
>
> Does it need to be *actually* removed? That does break existing code for not
> a lot of benefit. Yes, the default constructor is a little wonky, but with
> the addition of the new constructors, and the fact that you're not proposing
> to eventually change the default constructor, removal seems unnecessary.
> Besides, once it's removed, what would `bytes(3)` actually do? The PEP
> doesn't say.

Raise TypeError, presumably. However, I agree this isn't worth the
hassle of breaking working code, especially since truly ludicrous
values will fail promptly with MemoryError - it's only a particular
range of values, ones that fit within the limits of the machine but
push it into heavy swapping, that are a potential problem.

> Also, since you're proposing to add `bytes.byte(3)` have you considered also
> adding an optional count argument? E.g. `bytes.byte(3, count=7)` would yield
> b'\x03\x03\x03\x03\x03\x03\x03'.
That seems like it could be useful. The purpose of bytes.byte() in the
PEP is to provide a way to roundtrip ord() calls with binary inputs,
since the current spelling is pretty unintuitive:

    >>> ord("A")
    65
    >>> chr(ord("A"))
    'A'
    >>> ord(b"A")
    65
    >>> bytes([ord(b"A")])
    b'A'

That said, perhaps it would make more sense for the corresponding
round-trip to be:

    >>> bchr(ord("A"))
    b'A'

with the "b" prefix on "chr" reflecting the "b" prefix on the output.
This also inverts the chr/unichr pairing that existed in Python 2
(replacing it with bchr/chr), and is hence very friendly to
compatibility modules like six and future (future.builtins already
provides a chr that behaves like the Python 3 one, and bchr would be
much easier to add to that than a new bytes object method).

In terms of an efficient memory-preallocation interface, the equivalent
NumPy operation to request a pre-filled array is "ndarray.full":
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.full.html
(there's also an in-place mutation operation, "fill")

For bytes and bytearray, though, that has an unfortunate name collision
with "zfill", which refers to zero-padding numeric values for fixed
width display.

If the PEP just added bchr() to complement chr(), and
[bytes, bytearray].zeros() as a more discoverable alternative to
passing integers to the default constructor, I think that would be a
decent step forward, and the question of pre-initialising with
arbitrary values can be deferred for now (and perhaps left to NumPy
indefinitely).

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From brett at python.org  Fri Jun 10 13:41:48 2016
From: brett at python.org (Brett Cannon)
Date: Fri, 10 Jun 2016 17:41:48 +0000
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: <20160610170453.GI27919@ando.pearwood.info>
References: <20160610132051.GH27919@ando.pearwood.info>
 <20160610170453.GI27919@ando.pearwood.info>
Message-ID: 

On Fri, 10 Jun 2016 at 10:11 Steven D'Aprano wrote:
> On Fri, Jun 10, 2016 at 05:07:18PM +0200, Victor Stinner wrote:
> > I started to work on visualisation. IMHO it helps to understand the
> problem.
> >
> > Let's create a large dataset: 500 samples (100 processes x 5 samples):
> > ---
> > $ python3 telco.py --json-file=telco.json -p 100 -n 5
> > ---
> >
> > Attached plot.py script creates a histogram:
> > ---
> > avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
> >
> > 26.1 ms: 1 #
> > 26.2 ms: 12 #####
> > 26.3 ms: 34 ############
> > 26.4 ms: 44 ################
> > 26.5 ms: 109 ######################################
> > 26.6 ms: 117 ########################################
> > 26.7 ms: 86 ##############################
> > 26.8 ms: 50 ##################
> > 26.9 ms: 32 ###########
> > 27.0 ms: 10 ####
> > 27.1 ms: 3 ##
> > 27.2 ms: 1 #
> > 27.3 ms: 1 #
> >
> > minimum 26.1 ms: 0.2% (1) of 500 samples
> > ---
> [...]
> > The distribution looks like a Gaussian curve:
> > https://en.wikipedia.org/wiki/Gaussian_function
>
> Lots of distributions look a bit Gaussian, but they can be skewed, or
> truncated, or both. E.g. the average life-span of a lightbulb is
> approximately Gaussian with a central peak at some value (let's say 5000
> hours), but while it is conceivable that you might be really lucky and
> find a bulb that lasts 15000 hours, it isn't possible to find one that
> lasts -10000 hours. The distribution is truncated on the left.
>
> To me, your graph looks like the distribution is skewed: the right-hand
> tail (shown at the bottom) is longer than the left-hand tail, six
> buckets compared to five buckets. There are actual statistical tests for
> detecting deviation from Gaussian curves, but I'd have to look them up.
> But as a really quick and dirty test, we can count the number of samples
> on either side of the central peak (the mode):
>
> left: 109+44+34+12+1 = 200
> centre: 117
> right: 500 - 200 - 117 = 183
>
> It certainly looks *close* to Gaussian, but with the crude tests we are
> using, we can't be sure. If you took more and more samples, I would
> expect that the right-hand tail would get longer and longer, but the
> left-hand tail would not.
>
> > The interesting thing is that only 1 sample out of 500 is in the minimum
> > bucket (26.1 ms). If you say that the performance is 26.1 ms, only
> > 0.2% of your users will be able to reproduce this timing.
>
> Hmmm. Okay, that is a good point. In this case, you're not so much
> reporting your estimate of what the "true speed" of the code snippet
> would be in the absence of all noise, but your estimate of what your
> users should expect to experience "most of the time".
>

I think the other way to see why you don't want to use the minimum is
this: what if one run just happened to get lucky and ran during a
random lull on the system, while a second run didn't magically hit an
equivalent lull? Using the average helps remove that "luck of the draw"
element of taking the minimum. This is why the PyPy folks suggested to
Victor not to consider the minimum but the average instead; the minimum
doesn't measure typical system behaviour.
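Brett's "luck of the draw" point can be made concrete with a small simulation. Under Steven's model t = T + eps, with eps always positive, the minimum of a small run converges on the rare lucky case, while the mean tracks what users typically experience. This is a toy model with invented numbers, not perf itself:

```python
import random
import statistics

random.seed(42)

TRUE_TIME = 26.0  # hypothetical noiseless cost, in ms

def run(samples=5):
    # Noise can only ever add time: t = T + eps, eps >= 0.
    return [TRUE_TIME + random.expovariate(1 / 0.5) for _ in range(samples)]

runs = [run() for _ in range(100)]
reported_min = statistics.fmean(min(r) for r in runs)
reported_mean = statistics.fmean(statistics.fmean(r) for r in runs)

# The min-based figure sits near the lucky best case; the mean-based
# figure sits near the typical experienced time.
print("min-based estimate:  %.2f ms" % reported_min)
print("mean-based estimate: %.2f ms" % reported_mean)
```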
> > I think I understand the point you are making. I'll have to think about > it some more to decide if I agree with you. > > But either way, I think the work you have done on perf is fantastic and > I think this will be a great tool. I really love the histogram. Can you > draw a histogram of two functions side-by-side, for comparisons? > > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jun 10 13:49:37 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jun 2016 10:49:37 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> Message-ID: On 9 June 2016 at 16:43, Brett Cannon wrote: > That's not what I'm saying at all (nor what I think Nick is saying); more > tooling to ease the transition is always welcomed. What Brett said is mostly accurate for me, except with one slight caveat: I've been explicitly trying to nudge you towards making the *existing tools better*, rather than introducing new tools. With modernize and futurize we have a fairly clear trade-off ("Do you want your code to look more like Python 2 or more like Python 3?"), and things like "pylint --py3k" and the static analyzers are purely additive to the migration process (so folks can take them or leave them), but alternate interpreter builds and new converters have really high barriers to adoption. 
More -3 warnings in Python 2.7 are definitely welcome (since those can pick up runtime behaviors that the static analysers miss), and if there are things the existing code converters and static analysers *could* detect but don't, that's a fruitful avenue for improvement as well. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Jun 10 14:00:00 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jun 2016 11:00:00 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> Message-ID: On 10 June 2016 at 07:09, Cody Piersall wrote: >> One problem is that the str literals should be bytes >> literals. Comparison with None needs to be avoided. >> >> With Python 2 code runs successfully. With Python 3 the code >> crashes with a traceback. With my modified Python 3.6, the code >> runs successfully but generates the following warnings: >> >> test.py:13: DeprecationWarning: encoding bytes to str >> output.write('%d:' % len(s)) >> test.py:14: DeprecationWarning: encoding bytes to str >> output.write(s) >> test.py:15: DeprecationWarning: encoding bytes to str >> output.write(',') >> test.py:5: DeprecationWarning: encoding bytes to str >> if c == ':': >> test.py:9: DeprecationWarning: encoding bytes to str >> size += c >> test.py:24: DeprecationWarning: encoding bytes to str >> data = data + s >> test.py:26: DeprecationWarning: encoding bytes to str >> if input.read(1) != ',': >> test.py:31: DeprecationWarning: default compare is depreciated >> if a > 0: >> > > This seems _very_ useful; I'm surprised that other people don't think > so too. Currently, the easiest way to find bytes/str errors in a big > application is by running the program, finding where it crashes, > fixing that one line (or hopefully wherever the data entered the > system if you can find it), and repeating the process. 
It could be very interesting to add an "ascii-warn" codec to Python 2.7, and then set that as the default encoding when the -3 flag is set. The expressed lack of interest has been in the idea of recommending people use an alternate interpreter build (which has nothing to do with the usefulness of the added warnings, and everything to do with the logistics of distributing and adopting alternate runtimes), rather than in the concept of improving the available runtime compatibility warnings. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From neil at python.ca Fri Jun 10 14:00:45 2016 From: neil at python.ca (Neil Schemenauer) Date: Fri, 10 Jun 2016 11:00:45 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> Message-ID: <04c726d9-e53b-7242-a3e4-c0a2435efb67@python.ca> On 6/10/2016 10:49 AM, Nick Coghlan wrote: > What Brett said is mostly accurate for me, except with one slight > caveat: I've been explicitly trying to nudge you towards making the > *existing tools better*, rather than introducing new tools. With > modernize and futurize we have a fairly clear trade-off ("Do you want > your code to look more like Python 2 or more like Python 3?"), and > things like "pylint --py3k" and the static analyzers are purely > additive to the migration process (so folks can take them or leave > them), but alternate interpreter builds and new converters have really > high barriers to adoption. I agree with that idea. If there is anything that is "clean" enough, it should be merged with either 2.7.x or 3.x. There is nothing in my tree that can be usefully merged though. 
> More -3 warnings in Python 2.7 are definitely welcome (since those can > pick up runtime behaviors that the static analysers miss), and if > there are things the existing code converters and static analysers > *could* detect but don't, that's a fruitful avenue for improvement as > well. We are really limited on what can be done with the bytes/string issue because in Python 2 there is no distinct type for bytes. Also, the standard library does all sorts of unclean mixing of str and unicode so a warning would spew a lot of noise. Likewise, a warning about comparison behavior (None, default ordering of types) would also not be useful because there is so much standard library code that would spew warnings. From ncoghlan at gmail.com Fri Jun 10 14:16:43 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jun 2016 11:16:43 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: <04c726d9-e53b-7242-a3e4-c0a2435efb67@python.ca> References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> <04c726d9-e53b-7242-a3e4-c0a2435efb67@python.ca> Message-ID: On 10 June 2016 at 11:00, Neil Schemenauer wrote: > On 6/10/2016 10:49 AM, Nick Coghlan wrote: >> More -3 warnings in Python 2.7 are definitely welcome (since those can >> pick up runtime behaviors that the static analysers miss), and if >> there are things the existing code converters and static analysers >> *could* detect but don't, that's a fruitful avenue for improvement as >> well. > > We are really limited on what can be done with the bytes/string issue > because in Python 2 there is no distinct type for bytes. Also, the standard > library does all sorts of unclean mixing of str and unicode so a warning > would spew a lot of noise. > > Likewise, a warning about comparison behavior (None, default ordering of > types) would also not be useful because there is so much standard library > code that would spew warnings. 
Implicitly enabling those warnings universally with -3 might not be an option then, but it may be feasible to have those warnings ignored by default, and allow people to enable them selectively for their own code via the warnings module. Failing that, you may be right that there's value in a permissive Python 3.x variant as an optional compatibility testing tool (I admit I originally thought you were proposing such an environment as a production deployment target for partially migrated code, which I'd be thoroughly against, but as a tool for running a test suite or experimentally migrated instance it would be closer in spirit to the -3 switch and the static analysers - folks can use it if they think it will help them, but they don't need to worry about it if they don't need it themselves) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mertz at gnosis.cx Fri Jun 10 14:29:01 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 10 Jun 2016 11:29:01 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> Message-ID: This is fairly academic, since I do not anticipate needing to do this myself, but I have a specific question. I'll assume that Python 3.5.2 will go back to the 2.6-3.4 behavior in which os.urandom() never blocks on Linux. Moreover, I understand that the case where the insecure bits might be returned are limited to Python scripts that run on system initialization on Linux. 
If I *were* someone who needed to write a Linux system initialization script using Python 3.5.2, what would the code look like? I think for this use case, requiring something with a little bit of "code smell" is fine, but I kinda hope it exists at all. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From ncoghlan at gmail.com Fri Jun 10 14:29:09 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jun 2016 11:29:09 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: <575772E6.7040906@stoneleaf.us> Message-ID: On 10 June 2016 at 09:42, Eric Snow wrote: > On Thu, Jun 9, 2016 at 2:39 PM, Nick Coghlan wrote: >> That restriction would be comparable to what we do with __slots__ today: >> >> >>> class C: >> ... __slots__ = 1 >> ... >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: 'int' object is not iterable > > Are you suggesting that we require it be a tuple of identifiers (or > None) and raise TypeError otherwise, similar to __slots__? The > difference is that __slots__ has specific type requirements that do > not apply to __definition_order__, as well as a different purpose. > __definition_order__ is about preserving definition-type info that we > are currently throwing away. If we don't enforce the tuple-of-identifiers restriction at type creation time, everyone that *doesn't* make it a tuple-of-identifiers is likely to have a subtle compatibility bug with class decorators and other code that assume the default tuple-of-identifiers format is the only possible format (aside from None).
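The hazard is easy to demonstrate. Since __definition_order__ is still only a proposal at this point, the sketch below fakes it with an ordinary class attribute, and the decorator-style consumer is purely hypothetical:

```python
def read_order(cls):
    # A decorator-style consumer that assumes Optional[Tuple[str, ...]],
    # the format almost every class definition would have by default.
    order = getattr(cls, '__definition_order__', None)
    if order is None:
        return []
    return [name for name in order]

class Good:
    __definition_order__ = ('x', 'y')
    x = 1
    y = 2

class Bad:
    # Without enforcement at type creation time, nothing stops a class
    # from assigning an arbitrary object here, so the failure only
    # surfaces later, inside the consumer:
    __definition_order__ = 42

print(read_order(Good))
try:
    read_order(Bad)
except TypeError as exc:
    print(exc)  # 'int' object is not iterable
```

With enforcement in the class machinery, Bad would instead fail immediately at the class statement, the way __slots__ does above.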
To put it in PEP 484 terms: regardless of what the PEP says, people are going to assume the type of __definition_order__ is Optional[Tuple[str, ...]], as that's going to cover almost all class definitions they encounter. It makes sense to me to give class definitions and metaclasses the opportunity to change the *content* of the definition order: "Use these names in this order, not the names and order you would have calculated by default". It doesn't make sense to me to give them an opportunity to change the *form* of the definition order, since that makes it incredibly difficult to consume correctly: "Sure, it's *normally* a tuple-of-identifiers, but it *might* be a dictionary, or a complex number, or a set, or whatever the class author decided to make it". By contrast, if the class machinery enforces Optional[Tuple[str, ...]], then it becomes a lot easier to consume reliably, and anyone violating the constraint gets an immediate exception when defining the offending class, rather than a potentially obscure exception from a class decorator or other piece of code that assumes __definition_order__ could only be None or a tuple of strings. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Fri Jun 10 14:33:17 2016 From: donald at stufft.io (Donald Stufft) Date: Fri, 10 Jun 2016 14:33:17 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> Message-ID: <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> > On Jun 10, 2016, at 2:29 PM, David Mertz wrote: > > If I *were* someone who needed to write a Linux system initialization script using Python 3.5.2, what would the code look like? I think for this use case, requiring something with a little bit of "code smell" is fine, but I kinda hope it exists at all. Do you mean if os.urandom blocked and you wanted to call os.urandom from your boot script? Or if os.urandom doesn't block and you wanted to ensure you got good random numbers on boot? -- Donald Stufft From kmod at dropbox.com Fri Jun 10 14:37:17 2016 From: kmod at dropbox.com (Kevin Modzelewski) Date: Fri, 10 Jun 2016 11:37:17 -0700 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: <20160610170453.GI27919@ando.pearwood.info> References: <20160610132051.GH27919@ando.pearwood.info> <20160610170453.GI27919@ando.pearwood.info> Message-ID: Hi all, I wrote a blog post about this. http://blog.kevmod.com/2016/06/benchmarking-minimum-vs-average/ We can rule out any argument that one (minimum or average) is strictly better than the other, since there are cases that make either one better. It comes down to our expectation of the underlying distribution. Victor, if you could calculate the sample skewness of your results, I think that would be very interesting!
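For anyone following along, sample skewness is cheap to compute from a list of raw timings with nothing beyond plain Python; the helper name and the toy data below are purely illustrative:

```python
def sample_skewness(samples):
    """Fisher-Pearson skewness coefficient (g1) of a list of timings.

    Positive values indicate a longer right-hand tail, which is what
    benchmark noise (which can only slow a run down) should produce.
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # variance
    m3 = sum((x - mean) ** 3 for x in samples) / n  # third central moment
    return m3 / m2 ** 1.5

# A toy distribution shaped like the telco histogram: a central peak
# with a longer right-hand tail.
timings = [26.2] * 5 + [26.5] * 10 + [26.7] * 8 + [27.0] * 3 + [28.5]
print(sample_skewness(timings))  # positive, i.e. right-skewed
```

A perfectly symmetric sample gives exactly zero, so the sign alone already answers the "is the tail on the right?" question.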
kmod On Fri, Jun 10, 2016 at 10:04 AM, Steven D'Aprano wrote: > On Fri, Jun 10, 2016 at 05:07:18PM +0200, Victor Stinner wrote: > > I started to work on visualisation. IMHO it helps to understand the > problem. > > > > Let's create a large dataset: 500 samples (100 processes x 5 samples): > > --- > > $ python3 telco.py --json-file=telco.json -p 100 -n 5 > > --- > > > > Attached plot.py script creates an histogram: > > --- > > avg: 26.7 ms +- 0.2 ms; min = 26.2 ms > > > > 26.1 ms: 1 # > > 26.2 ms: 12 ##### > > 26.3 ms: 34 ############ > > 26.4 ms: 44 ################ > > 26.5 ms: 109 ###################################### > > 26.6 ms: 117 ######################################## > > 26.7 ms: 86 ############################## > > 26.8 ms: 50 ################## > > 26.9 ms: 32 ########### > > 27.0 ms: 10 #### > > 27.1 ms: 3 ## > > 27.2 ms: 1 # > > 27.3 ms: 1 # > > > > minimum 26.1 ms: 0.2% (1) of 500 samples > > --- > [...] > > The distribution looks a gaussian curve: > > https://en.wikipedia.org/wiki/Gaussian_function > > Lots of distributions look a bit Gaussian, but they can be skewed, or > truncated, or both. E.g. the average life-span of a lightbulb is > approximately Gaussian with a central peak at some value (let's say 5000 > hours), but while it is conceivable that you might be really lucky and > find a bulb that lasts 15000 hours, it isn't possible to find one that > lasts -10000 hours. The distribution is truncated on the left. > > To me, your graph looks like the distribution is skewed: the right-hand > tail (shown at the bottom) is longer than the left-hand tail, six > buckets compared to five buckets. There are actual statistical tests for > detecting deviation from Gaussian curves, but I'd have to look them up. 
> But as a really quick and dirty test, we can count the number of samples > on either side of the central peak (the mode): > > left: 109+44+34+12+1 = 200 > centre: 117 > right: 500 - 200 - 117 = 183 > > It certainly looks *close* to Gaussian, but with the crude tests we are > using, we can't be sure. If you took more and more samples, I would > expect that the right-hand tail would get longer and longer, but the > left-hand tail would not. > > > > The interesting thing is that only 1 sample on 500 are in the minimum > > bucket (26.1 ms). If you say that the performance is 26.1 ms, only > > 0.2% of your users will be able to reproduce this timing. > > Hmmm. Okay, that is a good point. In this case, you're not so much > reporting your estimate of what the "true speed" of the code snippet > would be in the absence of all noise, but your estimate of what your > users should expect to experience "most of the time". > > Assuming they have exactly the same hardware, operating system, and load > on their system as you have. > > > > The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms .. > > 26.9 ms: we got 109+117+86+50+32 samples in this range which gives us > > 394/500 = 79%. > > > > IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less a lie than > > 26.1 ms (0.2%). > > I think I understand the point you are making. I'll have to think about > it some more to decide if I agree with you. > > But either way, I think the work you have done on perf is fantastic and > I think this will be a great tool. I really love the histogram. Can you > draw a histogram of two functions side-by-side, for comparisons? > > > -- > Steve
From chris.jerdonek at gmail.com Fri Jun 10 14:42:40 2016 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Fri, 10 Jun 2016 11:42:40 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> Message-ID: On Fri, Jun 10, 2016 at 11:29 AM, David Mertz wrote: > This is fairly academic, since I do not anticipate needing to do this > myself, but I have a specific question. I'll assume that Python 3.5.2 will > go back to the 2.6-3.4 behavior in which os.urandom() never blocks on Linux. > Moreover, I understand that the case where the insecure bits might be > returned are limited to Python scripts that run on system initialization on > Linux. > > If I *were* someone who needed to write a Linux system initialization script > using Python 3.5.2, what would the code look like. I think for this use > case, requiring something with a little bit of "code smell" is fine, but I > kinda hope it exists at all. Good question. And going back to Larry's original e-mail, where he said-- On Thu, Jun 9, 2016 at 4:25 AM, Larry Hastings wrote: > THE PROBLEM > ... > The issue author had already identified the cause: CPython was blocking on > getrandom() in order to initialize hash randomization. On this fresh > virtual machine the entropy pool started out uninitialized. And since the > only thing running on the machine was CPython, and since CPython was blocked > on initialization, the entropy pool was initializing very, very slowly.
it seems to me that you'd want such a solution to have code that causes the initialization of the entropy pool to be sped up so that it happens as quickly as possible (if that is even possible). Is it possible? (E.g. by causing the machine to start doing things other than just CPython?) --Chris From mertz at gnosis.cx Fri Jun 10 14:43:51 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 10 Jun 2016 11:43:51 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> Message-ID: My hypothetical is "Ensure good random bits (on Python 3.5.2 and Linux), and block rather than allow bad bits." I'm not quite sure I understand all of your question, Donald. On Python 3.4 -- and by BDFL declaration on 3.5.2 -- os.urandom() *will not* block, although it might on 3.5.1.
> Donald Stufft > > > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From meadori at gmail.com Fri Jun 10 14:47:19 2016 From: meadori at gmail.com (Meador Inge) Date: Fri, 10 Jun 2016 13:47:19 -0500 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: Message-ID: On Fri, Jun 10, 2016 at 6:13 AM, Victor Stinner wrote: The second result is a new perf module which includes all "tricks" > discovered in my research: compute average and standard deviation, > spawn multiple worker child processes, automatically calibrate the > number of outter-loop iterations, automatically pin worker processes > to isolated CPUs, and more. > Apologies in advance if this is answered in one of the links you posted, but out of curiosity was geometric mean considered? In the compiler world this is a very common way of aggregating performance results. -- Meador -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Fri Jun 10 14:54:05 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 10 Jun 2016 14:54:05 -0400 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Message-ID: Eric, have you any work in progress on compact dicts? On Fri, Jun 10, 2016 at 12:54 PM, Eric Snow wrote: > On Thu, Jun 9, 2016 at 1:10 PM, ?manuel Barry wrote: >> As stated by Guido (and pointed out in the PEP): >> >> Making **kwds ordered is still open, but requires careful design and >> implementation to avoid slowing down function calls that don't benefit. >> >> The PEP has not been updated in a while, though. 
Python 3.5 has been >> released, and with it a C implementation of OrderedDict. >> >> Eric, are you still interested in this? > > Yes, but wasn't planning on dusting it off yet (i.e. in time for 3.6). > I'm certainly not opposed to someone picking up the banner. > > >> IIRC that PEP was one of the >> motivating use cases for implementing OrderedDict in C. > > Correct, though I'm not sure OrderedDict needs to be involved any more. > >> Maybe it's time for >> a second round of discussion on Python-ideas? > > Fine with me, though I won't have a lot of time in the 3.6 timeframe > to handle a high-volume discussion or push through an implementation. > > -eric From donald at stufft.io Fri Jun 10 14:55:39 2016 From: donald at stufft.io (Donald Stufft) Date: Fri, 10 Jun 2016 14:55:39 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> Message-ID: <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Ok, so you're looking for how you would replicate the blocking behavior of os.urandom that exists in 3.5.0 and 3.5.1? In that case, it's hard. I don't think Linux provides any way to externally determine if /dev/urandom has been initialized or not.
Probably the easiest thing to do would be to interface with the getrandom() function using a c-ext, CFFI, or ctypes. If you're looking for a way of doing this without calling the getrandom() function, I believe the answer is you can't. The closest thing you can get is checking the /proc/sys/kernel/random/entropy_avail file, but that tells you how much entropy the system currently thinks it has (which will go up and down over time) and corresponds to /dev/random on Linux not /dev/urandom. You could read from /dev/random, but that's going to randomly block outside of the pool initialization whenever the kernel thinks it doesn't have enough entropy. Cryptographers and security experts alike consider this to be pretty stupid behavior and don't recommend using it because of this "randomly block throughout the use of your application" behavior. So really, out of the recommended solutions you really only have to find a way to interface with the getrandom() function, or just consume /dev/urandom and hope it's been initialized. > On Jun 10, 2016, at 2:43 PM, David Mertz wrote: > > My hypothetical is "Ensure good random bits (on Python 3.5.2 and Linux), and block rather than allow bad bits." > > I'm not quite sure I understand all of your question, Donald. On Python > 3.4 -- and by BDFL declaration on 3.5.2 -- os.urandom() *will not* block, > although it might on 3.5.1. > > On Fri, Jun 10, 2016 at 11:33 AM, Donald Stufft wrote: > >> On Jun 10, 2016, at 2:29 PM, David Mertz wrote: >> >> If I *were* someone who needed to write a Linux system initialization script using Python 3.5.2, what would the code look like? I think for this use case, requiring something with a little bit of "code smell" is fine, but I kinda hope it exists at all. > > > Do you mean if os.urandom blocked and you wanted to call os.urandom from your boot script? Or if os.urandom doesn't block and you wanted to ensure you got good random numbers on boot? > > --
> Donald Stufft > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. -- Donald Stufft From donald at stufft.io Fri Jun 10 15:01:48 2016 From: donald at stufft.io (Donald Stufft) Date: Fri, 10 Jun 2016 15:01:48 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: > On Jun 10, 2016, at 2:55 PM, Donald Stufft wrote: > > So really, out of the recommended solutions you really only have to find a way to interface with the getrandom() function, or just consume /dev/urandom and hope it's been initialized. I'd note, this is one of the reasons why I felt like blocking (or raising an exception) on os.urandom was the right solution -- because it's hard to get that behavior on Linux otherwise. However, if we instead kept the blocking (or exception) behavior, getting the old behavior back on Linux is trivial, since it would only require open("/dev/urandom").read(...). -- Donald Stufft
From mertz at gnosis.cx Fri Jun 10 15:05:38 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 10 Jun 2016 12:05:38 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6? It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that. On Fri, Jun 10, 2016 at 11:55 AM, Donald Stufft wrote: > Ok, so you're looking for how you would replicate the blocking behavior of > os.urandom that exists in 3.5.0 and 3.5.1? > > In that case, it's hard. I don't think Linux provides any way to > externally determine if /dev/urandom has been initialized or not. Probably > the easiest thing to do would be to interface with the getrandom() function > using a c-ext, CFFI, or ctypes. If you're looking for a way of doing this > without calling the getrandom() function, I believe the answer is you > can't. > > The closest thing you can get is checking > the /proc/sys/kernel/random/entropy_avail file, but that tells you how much > entropy the system currently thinks it has (which will go up and down over > time) and corresponds to /dev/random on Linux not /dev/urandom.
> > You could read from /dev/random, but that's going to randomly block > outside of the pool initialization whenever the kernel thinks it doesn't > have enough entropy. Cryptographers and security experts alike consider > this to be pretty stupid behavior and don't recommend using it because of > this "randomly block throughout the use of your application" behavior. > > So really, out of the recommended solutions you really only have to find a > way to interface with the getrandom() function, or just consume > /dev/urandom and hope it's been initialized. > > > On Jun 10, 2016, at 2:43 PM, David Mertz wrote: > > My hypothetical is "Ensure good random bits (on Python 3.5.2 and Linux), > and block rather than allow bad bits." > > I'm not quite sure I understand all of your question, Donald. On Python > 3.4 -- and by BDFL declaration on 3.5.2 -- os.urandom() *will not* block, > although it might on 3.5.1. > > On Fri, Jun 10, 2016 at 11:33 AM, Donald Stufft wrote: > >> >> On Jun 10, 2016, at 2:29 PM, David Mertz wrote: >> >> If I *were* someone who needed to write a Linux system initialization >> script using Python 3.5.2, what would the code look like? I think for this >> use case, requiring something with a little bit of "code smell" is fine, >> but I kinda hope it exists at all. >> >> >> Do you mean if os.urandom blocked and you wanted to call os.urandom from >> your boot script? Or if os.urandom doesn't block and you wanted to ensure >> you got good random numbers on boot? >> >> --
> Donald Stufft > > > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Jun 10 15:16:30 2016 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 10 Jun 2016 14:16:30 -0500 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: [David Mertz] > OK. My understanding is that Guido ruled out introducing an os.getrandom() > API in 3.5.2. But would you be happy if that interface is added to 3.6? > > It feels to me like the correct spelling in 3.6 should probably be > secrets.getrandom() or something related to that. secrets.token_bytes() is already the way to spell "get a string of messed-up bytes", and that's the dead obvious (according to me) place to add the potentially blocking implementation. Indeed, everything in the `secrets` module should block when the OS thinks that's needed. From donald at stufft.io Fri Jun 10 15:17:47 2016 From: donald at stufft.io (Donald Stufft) Date: Fri, 10 Jun 2016 15:17:47 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: > On Jun 10, 2016, at 3:05 PM, David Mertz wrote: > > OK. My understanding is that Guido ruled out introducing an os.getrandom() API in 3.5.2. But would you be happy if that interface is added to 3.6? > > It feels to me like the correct spelling in 3.6 should probably be secrets.getrandom() or something related to that. Well, we have https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so adding a getrandom() function to secrets would largely be the same as that function. The problem of course is that the secrets library in 3.6 uses os.urandom under the covers, so its security rests on the security of os.urandom. To ensure that the secrets library is actually safe even in early boot it'll need to stop using os.urandom on Linux and use the getrandom() function. That same library exposes random.SystemRandom as secrets.SystemRandom [1], and of course SystemRandom uses os.urandom too. So if we want people to treat secrets.SystemRandom as "always secure" then it would need to stop using os.urandom and start using the getrandom() function on Linux as well. [1] This is actually documented as "using the highest-quality sources provided by the operating system" in the secrets documentation, and I'd argue that it is not using the highest-quality source if it's reading from /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems where getrandom() is available.
Of course, it's just an alias for random.SystemRandom, and that is documented as using os.urandom. -- Donald Stufft -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Jun 10 15:29:10 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 10 Jun 2016 12:29:10 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: I believe that secrets.token_bytes() and secrets.SystemRandom() should be changed even for 3.5.1 to use getrandom() on Linux. Thanks for fixing my spelling of the secrets API, Donald. :-) On Fri, Jun 10, 2016 at 12:17 PM, Donald Stufft wrote: > > On Jun 10, 2016, at 3:05 PM, David Mertz wrote: > > OK. My understanding is that Guido ruled out introducing an > os.getrandom() API in 3.5.2. But would you be happy if that interface is > added to 3.6? > > It feels to me like the correct spelling in 3.6 should probably be > secrets.getrandom() or something related to that. > > > > Well, we have > https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so > adding a getrandom() function to secrets would largely be the same as that > function. > > The problem of course is that the secrets library in 3.6 uses os.urandom > under the covers, so its security rests on the security of os.urandom.
To > ensure that the secrets library is actually safe even in early boot it?ll > need to stop using os.urandom on Linux and use the getrandom() function. > > That same library exposes random.SystemRandom as secrets.SystemRandom [1], > and of course SystemRandom uses os.urandom too. So if we want people to > treat secrets.SystemRandom as ?always secure? then it would need to stop > using os.urandom and start using the get random() function on Linux as well. > > > [1] This is actually documented as "using the highest-quality sources > provided by the operating system? in the secrets documentation, and I?d > argue that it is not using the highest-quality source if it?s reading from > /dev/urandom or getrandom(GRD_NONBLOCK) on Linux systems where getrandom() > is available. Of course, it?s just an alias for random.SystemRandom, and > that is documented as using os.urandom. > > ? > Donald Stufft > > > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Jun 10 15:29:55 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 10 Jun 2016 12:29:55 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: Ooops.... thinko there! Of course `secrets` won't exist in 3.5.1, so that's a 3.6 matter instead. On Fri, Jun 10, 2016 at 12:29 PM, David Mertz wrote: > I believe that secrets.token_bytes() and secrets.SystemRandom() should be > changed even for 3.5.1 to use getrandom() on Linux. > > Thanks for fixing my spelling of the secrets API, Donald. :-) > > On Fri, Jun 10, 2016 at 12:17 PM, Donald Stufft wrote: > >> >> On Jun 10, 2016, at 3:05 PM, David Mertz wrote: >> >> OK. My understanding is that Guido ruled out introducing an >> os.getrandom() API in 3.5.2. But would you be happy if that interface is >> added to 3.6? >> >> It feels to me like the correct spelling in 3.6 should probably be >> secrets.getrandom() or something related to that. >> >> >> >> Well we have >> https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so >> adding a getrandom() function to secrets would largely be the same as that >> function. >> >> The problem of course is that the secrets library in 3.6 uses os.urandom >> under the covers, so it?s security rests on the security of os.urandom. To >> ensure that the secrets library is actually safe even in early boot it?ll >> need to stop using os.urandom on Linux and use the getrandom() function. >> >> That same library exposes random.SystemRandom as secrets.SystemRandom >> [1], and of course SystemRandom uses os.urandom too. So if we want people >> to treat secrets.SystemRandom as ?always secure? 
then it would need to stop >> using os.urandom and start using the get random() function on Linux as well. >> >> >> [1] This is actually documented as "using the highest-quality sources >> provided by the operating system? in the secrets documentation, and I?d >> argue that it is not using the highest-quality source if it?s reading from >> /dev/urandom or getrandom(GRD_NONBLOCK) on Linux systems where getrandom() >> is available. Of course, it?s just an alias for random.SystemRandom, and >> that is documented as using os.urandom. >> >> ? >> Donald Stufft >> >> >> >> > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Jun 10 15:33:45 2016 From: brett at python.org (Brett Cannon) Date: Fri, 10 Jun 2016 19:33:45 +0000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: On Fri, 10 Jun 2016 at 12:20 Donald Stufft wrote: > > On Jun 10, 2016, at 3:05 PM, David Mertz wrote: > > OK. My understanding is that Guido ruled out introducing an > os.getrandom() API in 3.5.2. But would you be happy if that interface is > added to 3.6? > > It feels to me like the correct spelling in 3.6 should probably be > secrets.getrandom() or something related to that. > > > > Well we have > https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so > adding a getrandom() function to secrets would largely be the same as that > function. > > The problem of course is that the secrets library in 3.6 uses os.urandom > under the covers, so it?s security rests on the security of os.urandom. To > ensure that the secrets library is actually safe even in early boot it?ll > need to stop using os.urandom on Linux and use the getrandom() function. > > That same library exposes random.SystemRandom as secrets.SystemRandom [1], > and of course SystemRandom uses os.urandom too. So if we want people to > treat secrets.SystemRandom as ?always secure? then it would need to stop > using os.urandom and start using the get random() function on Linux as well. > > > [1] This is actually documented as "using the highest-quality sources > provided by the operating system? 
in the secrets documentation, and I'd > argue that it is not using the highest-quality source if it's reading from > /dev/urandom or getrandom(GRND_NONBLOCK) on Linux systems where getrandom() > is available. Of course, it's just an alias for random.SystemRandom, and > that is documented as using os.urandom. > If that's the case then we should file a bug so we are sure this is the case, and we need to decouple the secrets documentation from random so that they can operate independently, with secrets always doing whatever is required to be as secure as possible. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at realpath.org Fri Jun 10 15:48:02 2016 From: sebastian at realpath.org (Sebastian Krause) Date: Fri, 10 Jun 2016 21:48:02 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: (David Mertz's message of "Fri, 10 Jun 2016 12:05:38 -0700") References: <57595210.4000508@hastings.org> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: David Mertz wrote: > It feels to me like the correct spelling in 3.6 should probably be > secrets.getrandom() or something related to that. Since there already is a secrets.randbits(k), I would keep the naming similar and suggest something like: secrets.randbytes(k, *, nonblock=False) With the argument "nonblock" you can control what happens when not enough entropy is available: it either blocks or (if nonblock=True) raises an exception. The third option, getting insecure random data, is simply not available in this function.
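[Sebastian's proposed signature, sketched. This is a hypothetical API -- no such function exists in the secrets module -- and os.urandom() stands in here for a real getrandom()-backed source.]

```python
import os

def randbytes(k, *, nonblock=False):
    """Hypothetical sketch of the proposed secrets.randbytes() API.

    Returns k cryptographically strong random bytes.  If the OS entropy
    pool were not yet initialized, a real implementation would either
    block (nonblock=False) or raise BlockingIOError (nonblock=True);
    returning insecure bytes is never an option.
    """
    if k < 0:
        raise ValueError("number of bytes must be non-negative")
    # A real implementation would call getrandom(k) here, passing
    # GRND_NONBLOCK when nonblock=True.  os.urandom() is a stand-in.
    return os.urandom(k)
```

[The point of the keyword-only flag is that the caller chooses only between blocking and an exception; there is no spelling that yields weak bytes.]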
Then you can keep os.urandom() as it was in Python 3.4 and earlier, but update the documentation to better warn about its behavior and point developers to the secrets module. Sebastian From srkunze at mail.de Fri Jun 10 15:45:13 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 10 Jun 2016 21:45:13 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: <575B18C9.9090002@mail.de> On 10.06.2016 21:17, Donald Stufft wrote: > >> On Jun 10, 2016, at 3:05 PM, David Mertz > > wrote: >> >> OK. My understanding is that Guido ruled out introducing an >> os.getrandom() API in 3.5.2. But would you be happy if that >> interface is added to 3.6? >> >> It feels to me like the correct spelling in 3.6 should probably be >> secrets.getrandom() or something related to that. > I am not a security expert, but your reply makes it clear to me. So, for me this makes: os -> os-dependent, and because of this varies from os to os (also quality-wise) random -> pseudo-random, but it works for most non-critical use-cases secrets -> that's for crypto If you don't need crypto, secrets would be a waste of resources, but if you need crypto, then os and random are unsafe. I think that's simple enough. At least, I would understand it. Just my 2 cents: if I need crypto, I would pay the price of blocking rather than get an exception (what are my alternatives? I need those bits!) or get insecure bits.
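[Sven's three-way split, illustrated with the modules as they stand in 3.6; the variable names are invented for the example.]

```python
import os
import random
import secrets

# random: seedable pseudo-randomness (Mersenne Twister) -- reproducible,
# fine for simulations and games, never for security.
random.seed(42)
first = random.random()
random.seed(42)
assert random.random() == first  # same seed, same stream

# os: thin, OS-dependent access to the kernel's random bytes -- quality
# and blocking behavior vary by platform, which is what this thread is about.
os_bytes = os.urandom(16)

# secrets: the security-facing API (in 3.6 it is itself built on os.urandom).
token = secrets.token_bytes(16)
```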
Sven > Well we have > https://docs.python.org/dev/library/secrets.html#secrets.token_bytes so adding > a getrandom() function to secrets would largely be the same as that > function. > > The problem of course is that the secrets library in 3.6 uses > os.urandom under the covers, so it?s security rests on the security of > os.urandom. To ensure that the secrets library is actually safe even > in early boot it?ll need to stop using os.urandom on Linux and use the > getrandom() function. > > That same library exposes random.SystemRandom as secrets.SystemRandom > [1], and of course SystemRandom uses os.urandom too. So if we want > people to treat secrets.SystemRandom as ?always secure? then it would > need to stop using os.urandom and start using the get random() > function on Linux as well. > > > [1] This is actually documented as "using the highest-quality sources > provided by the operating system? in the secrets documentation, and > I?d argue that it is not using the highest-quality source if it?s > reading from /dev/urandom or getrandom(GRD_NONBLOCK) on Linux systems > where getrandom() is available. Of course, it?s just an alias for > random.SystemRandom, and that is documented as using os.urandom. > > ? > Donald Stufft > > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/srkunze%40mail.de -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Fri Jun 10 15:55:04 2016 From: larry at hastings.org (Larry Hastings) Date: Fri, 10 Jun 2016 12:55:04 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <57595210.4000508@hastings.org> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: <575B1B18.9020502@hastings.org> On 06/10/2016 12:29 PM, David Mertz wrote: > I believe that secrets.token_bytes() and secrets.SystemRandom() should > be changed even for 3.5.1 to use getrandom() on Linux. Surely you meant 3.5.2? 3.5.1 shipped last December. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Fri Jun 10 15:57:01 2016 From: donald at stufft.io (Donald Stufft) Date: Fri, 10 Jun 2016 15:57:01 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: > On Jun 10, 2016, at 3:33 PM, Brett Cannon wrote: > > If that's the case then we should file a bug so we are sure this is the case and we need to decouple the secrets documentation from random so that they can operate independently with secrets always doing whatever is required to be as secure as possible. https://bugs.python.org/issue27288 ? Donald Stufft -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at realpath.org Fri Jun 10 15:57:31 2016 From: sebastian at realpath.org (Sebastian Krause) Date: Fri, 10 Jun 2016 21:57:31 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: (Tim Peters's message of "Fri, 10 Jun 2016 14:16:30 -0500") References: <57595210.4000508@hastings.org> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: Tim Peters wrote: > secrets.token_bytes() is already the way to spell "get a string of > messed-up bytes", and that's the dead obvious (according to me) place > to add the potentially blocking implementation. I honestly didn't think that this was the dead obvious function to use. To me the naming kind of suggested that it would do some special magic that tokens needed, instead of just returning random bytes (even though the best token is probably just perfectly random data). If you want to provide a general function for secure random bytes I would suggest at least a better naming. Sebastian From mertz at gnosis.cx Fri Jun 10 16:01:12 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 10 Jun 2016 13:01:12 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: <575B1B18.9020502@hastings.org> References: <57595210.4000508@hastings.org> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> <575B1B18.9020502@hastings.org> Message-ID: On Fri, Jun 10, 2016 at 12:55 PM, Larry Hastings wrote: > On 06/10/2016 12:29 PM, David Mertz wrote: > > I believe that secrets.token_bytes() and secrets.SystemRandom() should be > changed even for 3.5.1 to use getrandom() on Linux. > > Surely you meant 3.5.2? 3.5.1 shipped last December. > Yeah, that combines a couple thinkos even. I had intended to write "for 3.5.2" ... but that is also an error, since the secrets module doesn't exist until 3.6. So yes, I think 3.5.2 should restore the 2.6-3.4 behavior of os.urandom(), and the NEW APIs in secrets should use the "best available randomness (even if it blocks)" Donald is correct that we have the spelling secrets.token_bytes() available in 3.6a1, so the spellings secrets.getrandom() or secrets.randbytes() are not needed. However, Sebastian's (adapted) suggestion to allow secrets.token_bytes(k, *, nonblock=False) as the signature makes sense to me (i.e. it's a choice of "block or raise exception", not an option to get non-crypto bytes). -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From larry at hastings.org Fri Jun 10 16:02:44 2016 From: larry at hastings.org (Larry Hastings) Date: Fri, 10 Jun 2016 13:02:44 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> References: <57595210.4000508@hastings.org> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: <575B1CE4.7030902@hastings.org> On 06/10/2016 11:55 AM, Donald Stufft wrote: > Ok, so you?re looking for how would you replicate the blocking > behavior of os.urandom that exists in 3.5.0 and 3.5.1? > > In that case, it?s hard. I don?t think linux provides any way to > externally determine if /dev/urandom has been initialized or not. > Probably the easiest thing to do would be to interface with the > getrandom() function using a c-ext, CFFI, or ctypes. If you?re looking > for a way of doing this without calling the getrandom() function.. I > believe the answer is you can?t. I'm certain you're correct: you can't perform any operation on /dev/urandom to determine whether or not the urandom device has been initialized. That's one of the reasons why Mr. Ts'o added getrandom()--you can use it to test exactly that (getrandom(GRND_NONBLOCK)). That's also why I proposed adding os.getrandom() in 3.5.2, to make it possible to block until urandom was initialized (without using ctypes etc as you suggest). However, none of the cryptography guys jumped up and said they wanted it, and in any case it was overruled by Guido, so we're not adding it to 3.5.2. 
//arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Jun 10 16:04:48 2016 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 10 Jun 2016 15:04:48 -0500 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: [Tim] >> secrets.token_bytes() is already the way to spell "get a string of >> messed-up bytes", and that's the dead obvious (according to me) place >> to add the potentially blocking implementation. [Sebastian Krause] > I honestly didn't think that this was the dead obvious function to > use. To me the naming kind of suggested that it would do some > special magic that tokens needed, instead of just returning random > bytes (even though the best token is probably just perfectly random > data). If you want to provide a general function for secure random > bytes I would suggest at least a better naming. There was ample bikeshedding over the names of `secrets` functions at the time. If token_bytes wasn't the obvious function to you, I suspect you have scant idea what _is_ in the `secrets` module. The naming is logical in context, where various "token_xxx" functions supply random-ish bytes in different formats. In that context, xxx=bytes is the obvious way to get raw bytes. 
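[For reference, the token_xxx family Tim is referring to, as shipped in the 3.6 secrets module: the suffix names the output format, so xxx=bytes is the raw-bytes spelling.]

```python
import secrets

raw = secrets.token_bytes(16)        # 16 raw bytes
hexed = secrets.token_hex(16)        # 16 random bytes as 32 hex digits
urlsafe = secrets.token_urlsafe(16)  # 16 random bytes as URL-safe base64 text
```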
From larry at hastings.org Fri Jun 10 16:06:45 2016 From: larry at hastings.org (Larry Hastings) Date: Fri, 10 Jun 2016 13:06:45 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> <575B1B18.9020502@hastings.org> Message-ID: <575B1DD5.4040305@hastings.org> On 06/10/2016 01:01 PM, David Mertz wrote: > So yes, I think 3.5.2 should restore the 2.6-3.4 behavior of os.urandom(), That makes... five of us I think ;-) (Larry Guido Barry Tim David) > and the NEW APIs in secrets should use the "best available randomness > (even if it blocks)" I'm not particular about how the new API is spelled. However, I do think os.getrandom() should be exposed as a thin wrapper over getrandom() in 3.6. That would permit Python programmers to take maximal advantage of the features offered by their platform. It would also permit the secrets module to continue to be written in pure Python. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Fri Jun 10 16:11:59 2016 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Jun 2016 16:11:59 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <57595210.4000508@hastings.org> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: <20160610161159.7e4f1cce.barry@wooz.org> On Jun 10, 2016, at 12:05 PM, David Mertz wrote: >OK. My understanding is that Guido ruled out introducing an os.getrandom() >API in 3.5.2. But would you be happy if that interface is added to 3.6? I would. >It feels to me like the correct spelling in 3.6 should probably be >secrets.getrandom() or something related to that. ISTM that secrets is a somewhat higher level API while it makes sense that a fairly simple plumbing of the underlying C call should go in os. But I wouldn't argue much if folks had strong opinions to the contrary. Cheers, -Barry From ericsnowcurrently at gmail.com Fri Jun 10 16:19:37 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 10 Jun 2016 13:19:37 -0700 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Message-ID: On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee wrote: > Eric, have you any work in progress on compact dicts? Nope. I presume you are talking the proposal Raymond made a while back. 
-eric From ericsnowcurrently at gmail.com Fri Jun 10 16:25:10 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 10 Jun 2016 13:25:10 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: <575772E6.7040906@stoneleaf.us> Message-ID: On Fri, Jun 10, 2016 at 11:29 AM, Nick Coghlan wrote: > On 10 June 2016 at 09:42, Eric Snow wrote: >> On Thu, Jun 9, 2016 at 2:39 PM, Nick Coghlan wrote: >>> That restriction would be comparable to what we do with __slots__ today: >>> >>> >>> class C: >>> ... __slots__ = 1 >>> ... >>> Traceback (most recent call last): >>> File "", line 1, in >>> TypeError: 'int' object is not iterable >> >> Are you suggesting that we require it be a tuple of identifiers (or >> None) and raise TypeError otherwise, similar to __slots__? The >> difference is that __slots__ has specific type requirements that do >> not apply to __definition_order__, as well as a different purpose. >> __definition_order__ is about preserving definition-type info that we >> are currently throwing away. > > If we don't enforce the tuple-of-identifiers restriction at type > creation time, everyone that *doesn't* make it a tuple-of-identifiers > is likely to have a subtle compatibility bug with class decorators and > other code that assume the default tuple-of-identifiers format is the > only possible format (aside from None). To put it in PEP 484 terms: > regardless of what the PEP says, people are going to assume the type > of __definition_order__ is Optional[Tuple[str]], as that's going to > cover almost all class definitions they encounter. > > It makes sense to me to give class definitions and metaclasses the > opportunity to change the *content* of the definition order: "Use > these names in this order, not the names and order you would have > calculated by default". 
> > It doesn't make sense to me to give them an opportunity to change the > *form* of the definition order, since that makes it incredibly > difficult to consume correctly: "Sure, it's *normally* a > tuple-of-identifiers, but it *might* be a dictionary, or a complex > number, or a set, or whatever the class author decided to make it". > > By contrast, if the class machinery enforces Optional[Tuple[str]], > then it becomes a lot easier to consume reliably, and anyone violating > the constraint gets an immediate exception when defining the offending > class, rather than a potentially obscure exception from a class > decorator or other piece of code that assumes __definition_order__ > could only be None or a tuple of strings. That makes sense. I'll adjust the PEP (and the implementation). -eric From tytso at mit.edu Fri Jun 10 15:54:11 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Fri, 10 Jun 2016 15:54:11 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> Message-ID: <20160610195411.GA3932@thunk.org> I will observe that feelings have gotten a little heated, so without making any suggestions as to how the python-dev community should decide things, let me offer some observations that might perhaps shed a little light, and perhaps dispel a little bit of the heat. As someone who has been working in security for a long time --- before I started getting paid to hack Linux full-time, I worked on Kerberos, and was on the Security Area Directorate of the IETF, where among other things I was one of the working group chairs for the IP Security (ipsec) working group --- I tend to cringe a bit when people talk about security in terms of absolutes. For example, the phrase "improving Python's security".
Security is something that is best talked about given a specific threat environment, where the value of what you are trying to protect, the capabilities and resources of the attackers, etc., are all well known. This gets hard for those of us who work on infrastructure which can get used in many different arenas, and so that's something that applies both to the Linux Kernel and to C-Python, because how people will use the tools that we spend so much of our passion crafting is largely out of our control, and we may not even know how they are using it. As far as /dev/urandom is concerned, it's true that it doesn't block before it has been initialized. If you are a security academic who likes to write papers about how great you are at finding defects in other people's work, this is definitely a weakness. Is it a fatal weakness? Well, first of all, on most server and desktop deployments, we save 1 kilobyte or so of /dev/urandom output during the shutdown sequence, and immediately after the init scripts are completed. This saved entropy is then piped back into the /dev/random infrastructure and used to initialize /dev/random and /dev/urandom very early in the init scripts. On a freshly installed machine, this won't help, true, but in practice, on most systems, /dev/urandom will get initialized from interrupt timing sampling within a few seconds after boot. For example, on a sample Google Compute Engine VM which is booted into Debian and then left idle, /dev/urandom was initialized within 2.8 seconds after boot, while the root file system was remounted read/write 1.6 seconds after boot. So even on Python pre-3.5.0, realistically speaking, the "weakness" of os.random would only be an issue (a) if it is run within the first few seconds of boot, and (b) os.random is used to directly generate a long-term cryptographic secret. If you fork openssl or ssh-keygen to generate a public/private keypair, then you aren't using os.random.
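The initialization state described above can be probed from Python on Linux by calling getrandom(2) in non-blocking mode via ctypes. This is an illustrative sketch only: glibc did not yet wrap getrandom() at the time, so it goes through the raw syscall interface, and the syscall number below is x86-64 specific.

```python
import ctypes
import errno

SYS_getrandom = 318      # x86-64 only; other architectures use other numbers
GRND_NONBLOCK = 0x0001   # return EAGAIN instead of blocking

def urandom_pool_ready():
    """True/False if the kernel pool is/isn't initialized, None if unknown."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        buf = ctypes.create_string_buffer(1)
        res = libc.syscall(SYS_getrandom, buf, 1, GRND_NONBLOCK)
    except (OSError, AttributeError, TypeError):
        return None                 # no libc/syscall() available
    if res == 1:
        return True                 # pool initialized; one byte returned
    if ctypes.get_errno() == errno.EAGAIN:
        return False                # pool not yet initialized
    return None                     # e.g. ENOSYS: kernel < 3.17 or non-Linux
```

On a machine that has been up for more than a few seconds this returns True; very early in boot it would return False, which is exactly the window Ts'o is describing.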
Furthermore, if you are running on a modern x86 system with RDRAND, you'll also be fine, because we mix in randomness from the CPU chip via the RDRAND instruction. So this whole question of whether os.random should block *is* important in certain very specific cases, and if you are generating long-term cryptographic secrets in Python, maybe you should be worrying about that. But to be honest, there are lots of other things you should be worrying about as well, and I would hope that people writing cryptographic code would be asking questions of how the random number stack is working, not just at the C-Python interpreter level, but also at the OS level. My preference would be that os.random should block, because the odds that people would be trying to generate long-term cryptographic secrets within seconds after boot are very small, and if you *do* block for a second or two, it's not the end of the world. The problem that triggered this was specifically because systemd was trying to use C-Python very early in the boot process to initialize the SIPHASH used for the dictionary, and it's not clear that really needed to be extremely strong because it wasn't a long-term cryptographic secret --- certainly not how systemd was using that specific script! The reason why I think blocking is better is that once you've solved the "don't hang the VM for 90 seconds until Python has started up" problem, someone who is using os.random will almost certainly not be on the blocking path of the system boot sequence, and so blocking for 2 seconds before generating a long-term cryptographic secret is not the end of the world. And if it does block by accident, in a security-critical scenario it will hopefully force the programmer to think, and in a non-security-critical scenario, it should be easy to switch to either a totally non-blocking interface, or switch to a pseudo-random interface which is more efficient.
*HOWEVER*, on the flip side, if os.random *doesn't* block, in 99.999% of cases, the Python script that is directly generating a long-term secret will not be started 1.2 seconds after the root file system is remounted read/write, so it is *also* not the end of the world. Realistically speaking, we do know which processes are likely to be generating long-term cryptographic secrets immediately after boot, and they'll most likely be using programs like openssl or openssh-keygen, to actually generate the cryptographic key, and in both of those places, (a) it's their problem to get it right, and (b) blocking for two seconds is a completely reasonable thing to do, and they will probably do it, so we're fine. So either way, I think it will be fine. I may have a preference, but if Python chooses another path, all will be well. There is an old saying that Academic politics are often so passionate because the stakes are so small. It may be that one of the reasons why this topic has been so passionate is precisely because of Sayre's Law. Peace, - Ted From mal at egenix.com Fri Jun 10 16:30:29 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 10 Jun 2016 22:30:29 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> References: <57595210.4000508@hastings.org> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> Message-ID: <575B2365.7050200@egenix.com> On 10.06.2016 20:55, Donald Stufft wrote: > Ok, so you're looking for how would you replicate the blocking behavior of os.urandom that exists in 3.5.0 and 3.5.1? > > In that case, it's hard. I don't think Linux provides any way to externally determine if /dev/urandom has been initialized or not. Probably the easiest thing to do would be to interface with the getrandom() function using a c-ext, CFFI, or ctypes. If you're looking for a way of doing this without calling the getrandom() function... I believe the answer is you can't. Well, you can see the effect by running Python early in the boot process. See e.g. http://bugs.python.org/issue26839#msg267749 and if you look at the system log file, you'll find a notice entry "random: %s pool is initialized" which gets written once the pool is initialized: http://lxr.free-electrons.com/source/drivers/char/random.c#L684 -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jun 10 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany.
CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From tjreedy at udel.edu Fri Jun 10 16:40:16 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 10 Jun 2016 16:40:16 -0400 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: <20160610132051.GH27919@ando.pearwood.info> <1465569266.4029.43.camel@redhat.com> Message-ID: On 6/10/2016 12:09 PM, Victor Stinner wrote: > 2016-06-10 17:09 GMT+02:00 Paul Moore : >> Also, the way people commonly use >> micro-benchmarks ("hey, look, this way of writing the expression goes >> faster than that way") doesn't really address questions like "is the >> difference statistically significant". > > If you use the "python3 -m perf compare method1.json method2.json", > perf will check that the difference is significant using the > is_significant() method: > http://perf.readthedocs.io/en/latest/api.html#perf.is_significant > "This uses a Student's two-sample, two-tailed t-test with alpha=0.95." Depending on the sampling design, a matched-pairs t-test may be more appropriate. -- Terry Jan Reedy From robertc at robertcollins.net Fri Jun 10 16:51:06 2016 From: robertc at robertcollins.net (Robert Collins) Date: Sat, 11 Jun 2016 08:51:06 +1200 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: <20160610132051.GH27919@ando.pearwood.info> <1465569266.4029.43.camel@redhat.com> Message-ID: On 11 June 2016 at 04:09, Victor Stinner wrote: ..> We should design a CLI command to do timeit+compare at once.
http://judge.readthedocs.io/en/latest/ might offer some inspiration There's also ministat - https://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+8-current&format=html From larry at hastings.org Fri Jun 10 17:06:29 2016 From: larry at hastings.org (Larry Hastings) Date: Fri, 10 Jun 2016 14:06:29 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160610195411.GA3932@thunk.org> References: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <20160610195411.GA3932@thunk.org> Message-ID: <575B2BD5.4050209@hastings.org> On 06/10/2016 12:54 PM, Theodore Ts'o wrote: > So even on Python pre-3.5.0, realistically speaking, the "weakness" of > os.random would only be an issue (a) if it is run within the first few > seconds of boot, and (b) os.random is used to directly generate a > long-term cryptographic secret. If you are fork openssl or ssh-keygen > to generate a public/private keypair, then you aren't using os.random. Just a gentle correction: wherever Mr. Ts'o says "os.random", he means "os.urandom()". We don't have an "os.random" in Python. My thanks to today's celebrity guest correspondent, Mr. Theodore Ts'o! //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Fri Jun 10 17:13:13 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Fri, 10 Jun 2016 17:13:13 -0400 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Message-ID: I am. I was just wondering if there was an in-progress effort I should be looking at, because I am interested in extensions to it. 
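For readers who have not seen the proposal, the core of the compact-dict idea is to split the table into a sparse index and a dense, insertion-ordered entry array. A toy sketch follows (it cheats by using a plain dict for the sparse part and bears no resemblance to the real C design; it only illustrates why such dicts stay ordered until deletions disturb the entry array):

```python
class CompactDict:
    """Toy illustration of the compact-dict layout: a sparse index maps
    keys to slots in a dense, insertion-ordered entry list."""

    def __init__(self):
        self._index = {}      # sparse part: key -> slot in _entries
        self._entries = []    # dense part: [key, value], in insertion order

    def __setitem__(self, key, value):
        slot = self._index.get(key)
        if slot is None:
            self._index[key] = len(self._entries)
            self._entries.append([key, value])
        else:
            self._entries[slot][1] = value   # overwrite keeps original slot

    def __getitem__(self, key):
        return self._entries[self._index[key]][1]

    def keys(self):
        # iteration order falls out of the dense entry array
        return [key for key, _ in self._entries]
```

Overwriting an existing key updates its slot in place, so `keys()` always reflects first-insertion order - the property the P.S. below relies on.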
P.S.: If anyone is missing the relevance, Raymond Hettinger's compact dicts are inherently ordered until a delitem happens.[1] That could be "good enough" for many purposes, including kwargs and class definition. If CPython implements efficient compact dicts, it would be easier to propose order-preserving (or initially-order-preserving) dicts in some places in the standard. [1] Whether delitem preserves order depends on whether you want to allow gaps in your compact entry table. PyPy implemented compact dicts and chose(?) to make dicts ordered. On Saturday, June 11, 2016, Eric Snow wrote: > On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee > > wrote: > > Eric, have you any work in progress on compact dicts? > > Nope. I presume you are talking the proposal Raymond made a while back. > > -eric > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Fri Jun 10 17:14:50 2016 From: random832 at fastmail.com (Random832) Date: Fri, 10 Jun 2016 17:14:50 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160610195411.GA3932@thunk.org> References: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <20160610195411.GA3932@thunk.org> Message-ID: <1465593290.2349072.634239529.67EEE9C8@webmail.messagingengine.com> On Fri, Jun 10, 2016, at 15:54, Theodore Ts'o wrote: > So even on Python pre-3.5.0, realistically speaking, the "weakness" of > os.random would only be an issue (a) if it is run within the first few > seconds of boot, and (b) os.random is used to directly generate a > long-term cryptographic secret. If you are fork openssl or ssh-keygen > to generate a public/private keypair, then you aren't using os.random. So, I have a question. 
If this "weakness" in /dev/urandom is so unimportant to 99% of situations... why isn't there a flag that can be passed to getrandom() to allow the same behavior? From tim.peters at gmail.com Fri Jun 10 17:21:38 2016 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 10 Jun 2016 16:21:38 -0500 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <1465593290.2349072.634239529.67EEE9C8@webmail.messagingengine.com> References: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <20160610195411.GA3932@thunk.org> <1465593290.2349072.634239529.67EEE9C8@webmail.messagingengine.com> Message-ID: [Random832] > So, I have a question. If this "weakness" in /dev/urandom is so > unimportant to 99% of situations... why isn't there a flag that can be > passed to getrandom() to allow the same behavior? Isn't that precisely the purpose of the GRND_NONBLOCK flag? http://man7.org/linux/man-pages/man2/getrandom.2.html From victor.stinner at gmail.com Fri Jun 10 17:22:42 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 10 Jun 2016 23:22:42 +0200 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: Message-ID: 2016-06-10 20:47 GMT+02:00 Meador Inge : > Apologies in advance if this is answered in one of the links you posted, but > out of curiosity was geometric mean considered? > > In the compiler world this is a very common way of aggregating performance > results. FYI I chose to store all timings in the JSON file. So later, you are free to recompute the average differently, compute other statistics, etc. I saw that the CPython benchmark suite has an *option* to compute the geometric mean. I don't understand well the difference with the arithmeric mean. 
Is the geometric mean recommended to aggregate results of different (unrelated) benchmarks, or also even for multiple runs of a single benchmark? Victor From donald at stufft.io Fri Jun 10 17:28:28 2016 From: donald at stufft.io (Donald Stufft) Date: Fri, 10 Jun 2016 17:28:28 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <20160610195411.GA3932@thunk.org> <1465593290.2349072.634239529.67EEE9C8@webmail.messagingengine.com> Message-ID: <6523337E-0764-42C2-B637-575DBC7B8561@stufft.io> > On Jun 10, 2016, at 5:21 PM, Tim Peters wrote: > > Isn't that precisely the purpose of the GRND_NONBLOCK flag? It doesn't behave exactly the same as /dev/urandom. If the pool hasn't been initialized yet, /dev/urandom will return possibly predictable data whereas getrandom(GRND_NONBLOCK) will EAGAIN. -- Donald Stufft From tjreedy at udel.edu Fri Jun 10 17:37:31 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 10 Jun 2016 17:37:31 -0400 Subject: [Python-Dev] Cutoff time for patches for upcoming releases Message-ID: A question for each of the three release managers: when is the earliest that you might tag your release and cut off submission of further patches for the release? 2.7.12 ('6-12')? 3.5.2 ('6-12')? 3.6.0a2 ('6-13')? -- Terry Jan Reedy From victor.stinner at gmail.com Fri Jun 10 18:06:31 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 11 Jun 2016 00:06:31 +0200 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: <20160610132051.GH27919@ando.pearwood.info> <20160610170453.GI27919@ando.pearwood.info> Message-ID: Hi,
> http://blog.kevmod.com/2016/06/benchmarking-minimum-vs-average/ Oh nice, it's even better to have different articles to explain the problem of using the minimum ;-) It added it to my doc. > We can rule out any argument that one (minimum or average) is strictly > better than the other, since there are cases that make either one better. > It comes down to our expectation of the underlying distribution. Ah? In which cases do you prefer to use the minimum? Are you able to get reliable benchmark results when using the minimum? > Victor if you could calculate the sample skewness of your results I think > that would be very interesting! I'm good to copy/paste code, but less to compute statistics :-) Would be interesed to write a pull request, or at least to send me a function computing the expected value? https://github.com/haypo/perf Victor From nad at python.org Fri Jun 10 18:23:36 2016 From: nad at python.org (Ned Deily) Date: Fri, 10 Jun 2016 18:23:36 -0400 Subject: [Python-Dev] Reminder: 3.6.0a2 snapshot 2016-06-13 12:00 UTC Message-ID: <23B6CAA5-6E07-4F2B-898F-B9EABF8E9BD0@python.org> Just a quick reminder that the next alpha snapshot for the 3.6 release cycle is coming up in a couple of days. This is the second of four alphas we have planned. Alpha 2 follows the development sprints at the PyCon US 2016 in Portland. Thanks to all of you who were able to be there and contribute! And to all of you who continue to contribue from afar. While there are still plenty of proposed patches awaiting review, nearly 300 commits have been pushed to the default branch (for 3.6.0) in the four weeks since alpha 1. As a reminder, alpha releases are intended to make it easier for the wider community to test the current state of new features and bug fixes for an upcoming Python release as a whole and for us to test the release process. During the alpha phase, features may be added, modified, or deleted up until the start of the beta phase. Alpha users beware! 
Also note that Larry has announced plans to do a 3.5.2 release candidate sometime this weekend and Benjamin plans to do a 2.7.12 release candidate. So get important maintenance release fixes in ASAP. Looking ahead, the next alpha release, 3.6.0a3, will follow in about a month on 2016-07-11.
2016-06-13 ~12:00 UTC: code snapshot for 3.6.0 alpha 2
now to 2016-09-07: Alpha phase (unrestricted feature development)
2016-09-07: 3.6.0 feature code freeze, 3.7.0 feature development begins
2016-09-07 to 2016-12-04: 3.6.0 beta phase (bug and regression fixes, no new features)
2016-12-04 3.6.0 release candidate 1 (3.6.0 code freeze)
2016-12-16 3.6.0 release (3.6.0rc1 plus, if necessary, any dire emergency fixes)
--Ned P.S. Just to be clear, this upcoming alpha snapshot will *not* contain a resolution for 3.6.0 of the current on-going discussions about the behavior of os.urandom(), the secrets module, and friends (Issue26839, Issue27288, et al). I think the focus should be on getting 3.5.2 settled and then we can decide on and implement any changes for 3.6.0 in an upcoming alpha prior to beta 1. https://www.python.org/dev/peps/pep-0494/ -- Ned Deily nad at python.org -- [] From neil at python.ca Fri Jun 10 19:36:24 2016 From: neil at python.ca (Neil Schemenauer) Date: Fri, 10 Jun 2016 23:36:24 +0000 (UTC) Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> Message-ID: Nick Coghlan wrote: > It could be very interesting to add an "ascii-warn" codec to Python > 2.7, and then set that as the default encoding when the -3 flag is > set. I don't think that can work. The library code in Python would spew out warnings even in the cases when nothing is wrong with the application code. I think warnings have to be added to a Python where str and bytes have been properly separated. Without extreme backporting efforts, that means 3.x.
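As an illustrative aside, the "ascii-warn" codec idea quoted above can be prototyped with the codecs machinery along these lines. The codec name and warning text are inventions for this sketch, and a Python 3 prototype does nothing about the Python 2 implicit-decode paths that are the actual concern here:

```python
import codecs
import warnings

def _ascii_warn_decode(data, errors='strict'):
    # Behave exactly like the ASCII codec, but emit a warning first.
    warnings.warn("implicit ASCII decoding of bytes", stacklevel=2)
    return codecs.ascii_decode(data, errors)

def _search(name):
    # Codec search functions receive the normalized codec name.
    if name in ('ascii-warn', 'ascii_warn'):
        return codecs.CodecInfo(
            encode=codecs.ascii_encode,
            decode=_ascii_warn_decode,
            name='ascii-warn',
        )
    return None

codecs.register(_search)
```

After registration, b"spam".decode("ascii-warn") returns "spam" like the plain ASCII codec, but with a warning attached - roughly the signal the -3 flag would want to surface.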
We don't want to saddle 3.x with a bunch of backwards compatibility cruft. Maybe some of my runtime warning changes could be merged using a command line flag to enable them. It would be nice to have the stepping stone version just be normal 3.x with a command line option. However, for the sanity of people maintaining 3.x, I think perhaps we don't want to do it. From steve at pearwood.info Fri Jun 10 21:35:56 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 11 Jun 2016 11:35:56 +1000 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: <20160610132051.GH27919@ando.pearwood.info> <20160610170453.GI27919@ando.pearwood.info> Message-ID: <20160611013555.GJ27919@ando.pearwood.info> On Sat, Jun 11, 2016 at 12:06:31AM +0200, Victor Stinner wrote: > > Victor if you could calculate the sample skewness of your results I think > > that would be very interesting! > > I'm good to copy/paste code, but less to compute statistics :-) Would > be interested to write a pull request, or at least to send me a > function computing the expected value? > https://github.com/haypo/perf I have some code and tests for calculating (population and sample) skewness and kurtosis. Do you think it will be useful to add it to the statistics module? I can polish it up and aim to have it ready by 3.6.0 alpha 4. -- Steve From steve at pearwood.info Fri Jun 10 21:45:49 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 11 Jun 2016 11:45:49 +1000 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: Message-ID: <20160611014549.GK27919@ando.pearwood.info> On Fri, Jun 10, 2016 at 11:22:42PM +0200, Victor Stinner wrote: > 2016-06-10 20:47 GMT+02:00 Meador Inge : > > Apologies in advance if this is answered in one of the links you posted, but > > out of curiosity was geometric mean considered? > > > > In the compiler world this is a very common way of aggregating performance > > results.
> > FYI I chose to store all timings in the JSON file. So later, you are > free to recompute the average differently, compute other statistics, > etc. > > I saw that the CPython benchmark suite has an *option* to compute the > geometric mean. I don't understand well the difference with the > arithmetic mean. > > Is the geometric mean recommended to aggregate results of different > (unrelated) benchmarks, or also even for multiple runs of a single > benchmark? The Wikipedia article discusses this, but sits on the fence and can't decide whether using the gmean for performance results is a good or bad idea: https://en.wikipedia.org/wiki/Geometric_mean#Properties Geometric mean is usually used in finance for averaging rates of growth: https://www.math.toronto.edu/mathnet/questionCorner/geomean.html If you express your performances as speeds (as "calculations per second") then the harmonic mean is the right way to average them. -- Steve From benjamin at python.org Fri Jun 10 23:45:41 2016 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 10 Jun 2016 20:45:41 -0700 Subject: [Python-Dev] Cutoff time for patches for upcoming releases In-Reply-To: References: Message-ID: <1465616741.1960720.634423329.39FF561D@webmail.messagingengine.com> 2016-06-11 18:00 UTC On Fri, Jun 10, 2016, at 14:37, Terry Reedy wrote: > A question for each of the three release managers: > when is the earliest that you might tag your release and > cut off submission of further patches for the release? > > 2.7.12 ('6-12')? > > 3.5.2 ('6-12')? > > 3.6.0a2 ('6-13')? > > -- > Terry Jan Reedy -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at pearwood.info Sat Jun 11 03:40:14 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 11 Jun 2016 17:40:14 +1000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <87lh2dycuo.fsf@vostro.rath.org> References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> Message-ID: <20160611074013.GL27919@ando.pearwood.info> On Thu, Jun 09, 2016 at 07:52:31PM -0700, Nikolaus Rath wrote: > On Jun 09 2016, Guido van Rossum wrote: > > I don't think we should add a new function. I think we should convince > > ourselves that there is not enough of a risk of an exploit even if > > os.urandom() falls back. > > That will be hard, because you have to consider an active, clever > adversary. We know that there are exploitable bugs on Linux systems due to urandom, e.g. the Raspberry Pi bug referenced elsewhere in this thread. https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892 > On the other hand, convincing yourself that in practice os.urandom would > never block unless the setup is super exotic or there is active > maliciousness seems much easier. Not that super exotic. In my day job, I've seen processes hang for five or ten minutes during boot up, waiting for the OS to collect enough entropy, although this was a while ago and didn't involve Python. But VMs or embedded devices may take a long time to generate entropy. If the device doesn't have a hardware source of randomness, and isn't connected to an external source of noise like networking or a user who habitually fiddles with the mouse, it might take a very long time indeed to gather entropy...
If I have understood the consensus, I think we're on the right track:

(1) os.urandom should do whatever the OS says it should do, which on Linux is fall back on pseudo-random bytes when the entropy pool hasn't been initialised yet. It won't block and won't raise.

(2) os.getrandom will be added to 3.6, and it will block, or possibly raise, whichever the caller specifies.

(3) The secrets module in 3.6 will stop relying on os.urandom, and use os.getrandom. It may provide a switch to choose between blocking and non-blocking (raise an exception) behaviour. It WON'T fall back to predictable non-crypto bytes (unless the OS itself is completely broken).

(4) random will continue to seed itself from os.urandom, because it doesn't care if urandom provides degraded randomness. It just needs to be better than using the time as seed.

(5) What about random.SystemRandom? I think it should use os.getrandom.

(6) A bunch of stuff will happen to make the hash randomisation not break when systemd runs Python scripts early in the boot process, but I haven't been paying attention to that part :-)

Is this a good summary of where we are at? -- Steve From steve at pearwood.info Sat Jun 11 03:49:43 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 11 Jun 2016 17:49:43 +1000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <575B1DD5.4040305@hastings.org> References: <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> <575B1B18.9020502@hastings.org> <575B1DD5.4040305@hastings.org> Message-ID: <20160611074943.GM27919@ando.pearwood.info> On Fri, Jun 10, 2016 at 01:06:45PM -0700, Larry Hastings wrote: > > On 06/10/2016 01:01 PM, David Mertz wrote: > >So yes, I think 3.5.2 should restore the 2.6-3.4 behavior of os.urandom(), > > That makes...
five of us I think ;-) (Larry Guido Barry Tim David) > > > >and the NEW APIs in secrets should use the "best available randomness > >(even if it blocks)" > > I'm not particular about how the new API is spelled. However, I do > think os.getrandom() should be exposed as a thin wrapper over > getrandom() in 3.6. That would permit Python programmers to take > maximal advantage of the features offered by their platform. It would > also permit the secrets module to continue to be written in pure Python. A big +1 for that. Will there be platforms where os.getrandom doesn't exist? If not, then secrets can just rely on it, otherwise what should it do?

if hasattr(os, 'getrandom'):
    return os.getrandom(n)
else:
    # Fail? Fall back on os.urandom?

-- Steve From larry at hastings.org Sat Jun 11 04:24:15 2016 From: larry at hastings.org (Larry Hastings) Date: Sat, 11 Jun 2016 01:24:15 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160611074943.GM27919@ando.pearwood.info> References: <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> <575B1B18.9020502@hastings.org> <575B1DD5.4040305@hastings.org> <20160611074943.GM27919@ando.pearwood.info> Message-ID: <575BCAAF.5000009@hastings.org> On 06/11/2016 12:49 AM, Steven D'Aprano wrote: > Will there be platforms where os.getrandom doesn't exist? If not, then > secrets can just rely on it, otherwise what should it do? >
> if hasattr(os, 'getrandom'):
>     return os.getrandom(n)
> else:
>     # Fail? Fall back on os.urandom?

AFAIK:

* Only Linux and Solaris have getrandom() right now. IIUC Solaris duplicated Linux's API, but I don't know that for certain, and I don't know in particular what GRND_RANDOM does on Solaris. (Of course, you don't need GRND_RANDOM for secrets.token_bytes().)

* Only Linux and OS X have never-blocking /dev/urandom. On Linux, you can choose to block by calling getrandom().
On OS X you have no choice, you can only use the never-blocking /dev/urandom. (OS X also has a /dev/random but it behaves identically to /dev/urandom.) OS X's man page reassuringly claims blocking is never necessary; the blogosphere disagrees. If I were writing the function for the secrets module, I'd write it like you have above: call os.getrandom() if it's present, and os.urandom() if it isn't. I believe that achieves current-best-practice everywhere: it does the right thing on Linux, it does the right thing on Solaris, it does the right thing on all the other OSes where reading from /dev/urandom can block, and it uses the only facility available to us on OS X. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Jun 11 04:24:42 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 11 Jun 2016 18:24:42 +1000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> Message-ID: <20160611082437.GN27919@ando.pearwood.info> On Fri, Jun 10, 2016 at 11:42:40AM -0700, Chris Jerdonek wrote: > And going back to Larry's original e-mail, where he said-- > > On Thu, Jun 9, 2016 at 4:25 AM, Larry Hastings wrote: > > THE PROBLEM > > ... > > The issue author had already identified the cause: CPython was blocking on > > getrandom() in order to initialize hash randomization. On this fresh > > virtual machine the entropy pool started out uninitialized. And since the > > only thing running on the machine was CPython, and since CPython was blocked > > on initialization, the entropy pool was initializing very, very slowly. 
> > it seems to me that you'd want such a solution to have code that > causes the initialization of the entropy pool to be sped up so that it > happens as quickly as possible (if that is even possible). Is it > possible? (E.g. by causing the machine to start doing things other > than just CPython?) I don't think that's something which the Python interpreter ought to do for you, but you can write to /dev/urandom or /dev/random (both keep their own, separate, entropy pools): open("/dev/urandom", "w").write("hello world") But of course there's the question of where you're going to get a source of noise to write to the file. While it's (probably?) harmless to write a hard-coded string to it, I don't think it's going to give you much entropy. -- Steve From sebastian at realpath.org Sat Jun 11 07:00:48 2016 From: sebastian at realpath.org (Sebastian Krause) Date: Sat, 11 Jun 2016 13:00:48 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160611082437.GN27919@ando.pearwood.info> (Steven D'Aprano's message of "Sat, 11 Jun 2016 18:24:42 +1000") References: <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <20160611082437.GN27919@ando.pearwood.info> Message-ID: Steven D'Aprano wrote: >> it seems to me that you'd want such a solution to have code that >> causes the initialization of the entropy pool to be sped up so that it >> happens as quickly as possible (if that is even possible). Is it >> possible? (E.g. by causing the machine to start doing things other >> than just CPython?)
> > I don't think that's something which the Python interpreter ought to do > for you, but you can write to /dev/urandom or /dev/random (both keep > their own, separate, entropy pools): There are projects like http://www.issihosts.com/haveged/ that use some tiny timing fluctuations in CPUs to feed the entropy pool and which are available in most Linux distributions. But as you said, that is something completely outside of Python's scope. From g.rodola at gmail.com Sat Jun 11 07:53:29 2016 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Sat, 11 Jun 2016 13:53:29 +0200 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: References: Message-ID: On Fri, Jun 10, 2016 at 1:13 PM, Victor Stinner wrote: > Hi, > > Last weeks, I made researchs on how to get stable and reliable > benchmarks, especially for the corner case of microbenchmarks. The > first result is a serie of article, here are the first three: > > https://haypo.github.io/journey-to-stable-benchmark-system.html > https://haypo.github.io/journey-to-stable-benchmark-deadcode.html > https://haypo.github.io/journey-to-stable-benchmark-average.html > > The second result is a new perf module which includes all "tricks" > discovered in my research: compute average and standard deviation, > spawn multiple worker child processes, automatically calibrate the > number of outter-loop iterations, automatically pin worker processes > to isolated CPUs, and more. > > The perf module allows to store benchmark results as JSON to analyze > them in depth later. It helps to configure correctly a benchmark and > check manually if it is reliable or not. > > The perf documentation also explains how to get stable and reliable > benchmarks (ex: how to tune Linux to isolate CPUs). 
> > perf has 3 builtin CLI commands: > > * python -m perf: show and compare JSON results > * python -m perf.timeit: new better and more reliable implementation of > timeit > * python -m metadata: display collected metadata > > Python 3 is recommended to get time.perf_counter(), use the new > accurate statistics module, automatic CPU pinning (I will implement it > on Python 2 later), etc. But Python 2.7 is also supported, fallbacks > are implemented when needed. > > Example with the patched telco benchmark (benchmark for the decimal > module) on a Linux with two isolated CPUs. > > First run the benchmark: > --- > $ python3 telco.py --json-file=telco.json > ......................... > Average: 26.7 ms +- 0.2 ms > --- > > > Then show the JSON content to see all details: > --- > $ python3 -m perf -v show telco.json > Metadata: > - aslr: enabled > - cpu_affinity: 2, 3 > - cpu_count: 4 > - cpu_model_name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz > - hostname: smithers > - loops: 10 > - platform: Linux-4.4.9-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three > - python_executable: /usr/bin/python3 > - python_implementation: cpython > - python_version: 3.4.3 > > Run 1/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.8 ms, 26.7 ms > Run 2/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms > Run 3/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.9 ms, 26.8 ms > (...) > Run 25/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms > > Average: 26.7 ms +- 0.2 ms (25 runs x 3 samples; 1 warmup) > --- > > Note: benchmarks can be analyzed with Python 2. > > I'm posting my email to python-dev because providing timeit results is > commonly requested in review of optimization patches. > > The next step is to patch the CPython benchmark suite to use the perf > module. I already forked the repository and started to patch some > benchmarks. > > If you are interested by Python performance in general, please join us > on the speed mailing list! 
> https://mail.python.org/mailman/listinfo/speed > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/g.rodola%40gmail.com > This is very interesting and also somewhat related to psutil. I wonder... would increasing process priority help isolating benchmarks even more? By this I mean "os.nice(-20)". Extra: perhaps even IO priority: https://pythonhosted.org/psutil/#psutil.Process.ionice ? -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian at python.org Sat Jun 11 08:40:45 2016 From: christian at python.org (Christian Heimes) Date: Sat, 11 Jun 2016 14:40:45 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> Message-ID: On 2016-06-10 20:42, Chris Jerdonek wrote: > On Fri, Jun 10, 2016 at 11:29 AM, David Mertz wrote: >> This is fairly academic, since I do not anticipate needing to do this >> myself, but I have a specific question. I'll assume that Python 3.5.2 will >> go back to the 2.6-3.4 behavior in which os.urandom() never blocks on Linux. >> Moreover, I understand that the case where the insecure bits might be >> returned are limited to Python scripts that run on system initialization on >> Linux. 
>> >> If I *were* someone who needed to write a Linux system initialization script >> using Python 3.5.2, what would the code look like. I think for this use >> case, requiring something with a little bit of "code smell" is fine, but I >> kinda hope it exists at all. > > Good question. And going back to Larry's original e-mail, where he said-- > > On Thu, Jun 9, 2016 at 4:25 AM, Larry Hastings wrote: >> THE PROBLEM >> ... >> The issue author had already identified the cause: CPython was blocking on >> getrandom() in order to initialize hash randomization. On this fresh >> virtual machine the entropy pool started out uninitialized. And since the >> only thing running on the machine was CPython, and since CPython was blocked >> on initialization, the entropy pool was initializing very, very slowly. I repeat for like the fifth time: os.urandom() and Python startup are totally unrelated. They just happen to use the same internal function to set the hash randomization state. The startup problem can be solved without f... up the security properties of os.urandom(). The correct questions to ask are: 1) Does hash randomization for bytes, text and XML always require cryptographically strong random values from a potentially blocking CPRNG? 2) Does the initial state of the Mersenne-Twister of the default random.Random instance really need cryptographically strong values? 3) Should os.urandom() always use the best CSPRNG source available and make sure it never returns weak, predictable values (when possible)? The answers are: 1) No 2) No 3) HELL YES! If you think that the answer to 3 is "No" and that a CSPRNG is permitted to return predictable values, then you are *by definition* ineligible to vote on security issues. Christian From victor.stinner at gmail.com Sat Jun 11 10:37:44 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 11 Jun 2016 16:37:44 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> Message-ID: > I repeat for like the fifth time: So, is there a candidate to write a PEP? I didn't read the thread. As expected, the discussion restarted for the 3rd time, there are almost 100 emails in this thread. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From cory at lukasa.co.uk Sat Jun 11 10:56:16 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Sat, 11 Jun 2016 15:56:16 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <575BCAAF.5000009@hastings.org> References: <58F60D6A-4840-4A7F-8BA5-065356770036@stufft.io> <0D09AC01-10B1-4577-AAEF-F1582ABAD8F7@stufft.io> <575B1B18.9020502@hastings.org> <575B1DD5.4040305@hastings.org> <20160611074943.GM27919@ando.pearwood.info> <575BCAAF.5000009@hastings.org> Message-ID: <7C877B5C-0410-413F-8589-DFFF48792BBD@lukasa.co.uk> > On 11 Jun 2016, at 09:24, Larry Hastings wrote: > Only Linux and OS X have never-blocking /dev/urandom. On Linux, you can choose to block by calling getrandom(). On OS X you have no choice, you can only use the never-blocking /dev/urandom. (OS X also has a /dev/random but it behaves identically to /dev/urandom.) OS X's man page reassuringly claims blocking is never necessary; the blogosphere disagrees. > If I were writing the function for the secrets module, I'd write it like you have above: call os.getrandom() if it's present, and os.urandom() if it isn't. 
I believe that achieves current-best-practice everywhere: it does the right thing on Linux, it does the right thing on Solaris, it does the right thing on all the other OSes where reading from /dev/urandom can block, and it uses the only facility available to us on OS X. Sorry Larry, but as far as I know this is misleading (it's not *wrong*, but it suggests that OS X's /dev/urandom is the same as Linux's, which is emphatically not true). I've found the discussion around OS X's random devices to be weirdly abstract, given that the source code for it is public, so I went and took a look. My initial reading of it (and, to be clear, this is a high-level read of a codebase I don't know well, so please take this with the grain of salt that is intended) is that the operating system literally will not boot without at least 128 bits of entropy to read from the EFI boot loader. In the absence of 128 bits of entropy the kernel will panic, rather than continue to boot. Generally speaking that entropy will come from RDRAND, given the restrictions on where OS X can be run (Intel CPUs for real OS X, virtualised on top of OS X, and so on top of Intel CPUs, for VMs), which imposes a baseline on the quality of the entropy you can get. Assuming that OS X is being run in a manner that is acceptable from the perspective of its license agreement (and we can all agree that no-one would violate the terms of OS X's license agreement, right?), I think it's reasonable to assume that OS X, either virtualised or not, is getting 128 bits of somewhat sensible entropy from the boot loader/CPU before it boots. That means we can say this about OS X's /dev/urandom: the reason it never blocks is because the situation of "not enough entropy to generate good random numbers" is synonymous with "not enough entropy to boot the OS". So maybe we can stop casting aspersions on OS X's RNG now. Cory -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From yan12125 at gmail.com Sat Jun 11 01:59:49 2016 From: yan12125 at gmail.com (Chi Hsuan Yen) Date: Sat, 11 Jun 2016 13:59:49 +0800 Subject: [Python-Dev] Current Python 3.2 status? Message-ID: Hello all, Georg said in February that 3.2.7 is going to be released, and now it's June. Will it ever be released? pip [2], virtualenv [3] and setuptools [4] have all dropped Python 3.2 support, and there's no new commits since 2016/01/15 on CPython's 3.2 branch. I'd like to know CPython's attitude against Python 3.2. Is it still maintained? Or is it dead? [1] https://mail.python.org/pipermail/python-dev/2016-February/143300.html [2] https://github.com/pypa/pip/commit/b11cb019a47ff0cf3d8a37a0c89d8ae4cf25282f [3] https://github.com/pypa/virtualenv/commit/8132fa3a826ff1ba0c0c065563b9733c2e5a5b6c [4] https://github.com/pypa/setuptools/commit/ae6c73f07680da77345f5ccfac4facde30ad4d7e -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian at python.org Sat Jun 11 11:08:54 2016 From: christian at python.org (Christian Heimes) Date: Sat, 11 Jun 2016 17:08:54 +0200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> Message-ID: <162d6d54-c6ac-bf60-5912-7076e8a07261@python.org> On 2016-06-11 16:37, Victor Stinner wrote: >> I repeat for like the fifth time: > > So, is there a candidate to write a PEP? 
> > I didn't read the thread. As expected, the discussion restarted for the > 3rd time, there are almost 100 emails in this thread. Sorry, I'm out. I simply lack the necessary strength and mental energy to persuade the issue any further. Donald Stufft just forwarded a quote that resonates with my current state of mind (replace 'lists' with 'current topic'): "I feel I no longer possess either the necessary strength or perhaps the necessary faith to continue rolling the stone of Sisyphus against the forces of reaction which are triumphing everywhere. I am therefore retiring from the lists, and ask if my dear contemporaries only one thing ? oblivion." Christian From guido at python.org Sat Jun 11 11:34:20 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Jun 2016 08:34:20 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160611074013.GL27919@ando.pearwood.info> References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> Message-ID: In terms of API design, I'd prefer a flag to os.urandom() indicating a preference for - blocking - raising an exception - weaker random bits To those still upset by the decision, please read Ted Ts'o's message. On Saturday, June 11, 2016, Steven D'Aprano wrote: > On Thu, Jun 09, 2016 at 07:52:31PM -0700, Nikolaus Rath wrote: > > On Jun 09 2016, Guido van Rossum > > wrote: > > > I don't think we should add a new function. I think we should convince > > > ourselves that there is not enough of a risk of an exploit even if > > > os.urandom() falls back. > > > > That will be hard, because you have to consider an active, clever > > adversary. > > We know that there are exploitable bugs from Linux systems due to > urandom, e.g. 
the Raspberry Pi bug referenced elsewhere in this thread. > > https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=126892 > > > > On the other hand, convincing yourself that in practice os.urandom would > > never block unless the setup is super exotic or there is active > > maliciousness seems much easier. > > Not that super exotic. In my day job, I've seen processes hang for five > or ten minutes during boot up, waiting for the OS to collect enough > entropy, although this was not recently and it wasn't involving Python. > But VMs or embedded devices may take a long time to generate entropy. If > the device doesn't have a hardware source of randomness, and isn't > connected to an external source of noise like networking or a user who > habitually fiddles with the mouse, it might take a very long time indeed > to gather entropy... > > If I have understood the consensus, I think we're on the right track: > > (1) os.urandom should do whatever the OS says it should do, which on > Linux is fall back on pseudo-random bytes when the entropy pool hasn't > been initialised yet. It won't block and won't raise. > > (2) os.getrandom will be added to 3.6, and it will block, or possibly > raise, whichever the caller specifies. > > (3) The secrets module in 3.6 will stop relying on os.urandom, and use > os.getrandom. It may provide a switch to choose between blocking and > non-blocking (raise an exception) behaviour. It WON'T fall back to > predictable non-crypto bytes (unless the OS itself is completely > broken). > > (4) random will continue to seed itself from os.urandom, because it > doesn't care if urandom provides degraded randomness. It just needs to > be better than using the time as seed. > > (5) What about random.SysRandom? I think it should use os.getrandom.
> > (6) A bunch of stuff will happen to make the hash randomisation not > break when systemd runs Python scripts early in the boot process, but I > haven't been paying attention to that part :-) > > Is this a good summary of where we are at? > > > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Sat Jun 11 12:35:26 2016 From: brett at python.org (Brett Cannon) Date: Sat, 11 Jun 2016 16:35:26 +0000 Subject: [Python-Dev] Current Python 3.2 status? In-Reply-To: References: Message-ID: On Sat, 11 Jun 2016 at 08:05 Chi Hsuan Yen wrote: > Hello all, > > Georg said in February that 3.2.7 is going to be released, and now it's > June. Will it ever be released? > > pip [2], virtualenv [3] and setuptools [4] have all dropped Python 3.2 > support, and there's no new commits since 2016/01/15 on CPython's 3.2 > branch. I'd like to know CPython's attitude against Python 3.2. Is it still > maintained? Or is it dead? > It's up to Georg to decide to do one final source-only release. But to the rest of us it's reached its end-of-life. Basically checking out the source code will more-or-less be the same as whatever Georg releases (sans tweaking some version numbers).
-Brett > > [1] https://mail.python.org/pipermail/python-dev/2016-February/143300.html > [2] > https://github.com/pypa/pip/commit/b11cb019a47ff0cf3d8a37a0c89d8ae4cf25282f > [3] > https://github.com/pypa/virtualenv/commit/8132fa3a826ff1ba0c0c065563b9733c2e5a5b6c > [4] > https://github.com/pypa/setuptools/commit/ae6c73f07680da77345f5ccfac4facde30ad4d7e > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From berker.peksag at gmail.com Sat Jun 11 13:02:21 2016 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Sat, 11 Jun 2016 20:02:21 +0300 Subject: [Python-Dev] Current Python 3.2 status? In-Reply-To: References: Message-ID: On Sat, Jun 11, 2016 at 8:59 AM, Chi Hsuan Yen wrote: > Hello all, > > Georg said in February that 3.2.7 is going to be released, and now it's > June. Will it ever be released? Hi, It was delayed because of a security issue. See Georg's email at https://mail.python.org/pipermail/python-dev/2016-February/143400.html --Berker From donald at stufft.io Sat Jun 11 13:15:51 2016 From: donald at stufft.io (Donald Stufft) Date: Sat, 11 Jun 2016 13:15:51 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> Message-ID: <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> > On Jun 11, 2016, at 11:34 AM, Guido van Rossum wrote: > > In terms of API design, I'd prefer a flag to os.urandom() indicating a preference for > - blocking > - raising an exception > - weaker random bits If os.urandom can't block on Linux, then I feel like it'd be saner to add os.getrandom(). I feel like these flags are going to confuse people, particularly when you take into account that all 3 of them are only going to really matter on Linux (and particularly on newer Linux) and for things like "blocking" it's going to get confused with the blocking that /dev/random does on Linux.
* With flags, unless we add even more flags we can?t dictate what should happen if we?re on a system where the person?s desired preference can?t be satisfied. We either have to just silently do something that may be wrong, or add more flags. By adding two functions people can pick which of the following they want with some programming (see example): * Just try to get the strongest random, but fall back to maybe not random if it?s early enough in boot process. * Fail on old Linux rather than possibly get insecure random. * Actually write cross platform code to prevent blocking (since only Linux allows you to not block) * Fail hard rather than block if we can?t get secure random bytes without blocking. * Soft fail and get ?probably good enough? random from os.urandom on Linux. * Hard fail on non Linux if we would block since there?s no non-blocking and ?probably good enough? interface. * Soft fail and get ?probably good enough? random from os.urandom on Linux, and use time/pid/memory offsets on non Linux. * Just use the best source of random available to use on the system, and block rather than fail. I don?t see any way to get the same wide set of options by just adding flags to os.urandom unless we add flags that work for every possible combination of what people may or may not want to. ? Donald Stufft -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Jun 11 13:28:33 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 11 Jun 2016 13:28:33 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> Message-ID: On 6/11/2016 11:34 AM, Guido van Rossum wrote: > In terms of API design, I'd prefer a flag to os.urandom() indicating a > preference for > - blocking > - raising an exception > - weaker random bits +100 ;-) I proposed exactly this 2 days ago, 5 hours after Larry's initial post. ''' I think the 'new API' should be a parameter, not a new function. With just two choices, 'wait' = True/False could work. If 'raise an exception' were added, then 'action (when good bits are not immediately available' = 'return (best possible)' or 'wait (until have good bits)' or 'raise (CryptBitsNotAvailable)' In either case, there would then be the question of whether the default should match 3.5.0/1 or 3.4 and before. ''' Deciding on this then might have saved some hurt feelings, to the point where two contributors feel like disappearing, and a release manager must feel the same. In any case, Guido already picked 3.4 behavior as the default. Can we agree and move on? -- Terry Jan Reedy From guido at python.org Sat Jun 11 13:39:11 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Jun 2016 10:39:11 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> Message-ID: Is the feature detection desire about being able to write code that runs on older Python versions or for platforms that just don't have getrandom()? My assumption was that nobody would actually use these flags except the secrets module and people writing code that generates long-lived secrets -- and the latter category should be checking platform and versions anyway since they need the whole stack to be secure (if I understand Ted Ts'o's email right). My assumption is also that the flags should be hints (perhaps only relevant on Linux) -- platforms that can't perform the action desired (because their system's API doesn't support it) would just do their default action, assuming the system API does the best it can. I think the problem with making os.urandom() go back to always reading /dev/urandom is that we've come to rely on it on all platforms, so we've passed that station. On Sat, Jun 11, 2016 at 10:15 AM, Donald Stufft wrote: > > On Jun 11, 2016, at 11:34 AM, Guido van Rossum wrote: > > In terms of API design, I'd prefer a flag to os.urandom() indicating a > preference for > - blocking > - raising an exception > - weaker random bits > > > If os.urandom can?t block on Linux, then I feel like it?d be saner to add > os.getrandom(). I feel like these flags are going to confuse people, > particularly when you take into account that all 3 of them are only going > to really matter on Linux (and particularly on newer Linux) and for things > like ?blocking? it?s going to get confused with the blocking that > /dev/random does on Linux. 
> > Right now there are two ways to access the system CSPRNG on *nix, there is > /dev/urandom pretty much always, and then there is getrandom() (or > arc4random, etc, depending on the specific OS you?re on). > > Perhaps the right answer is to go back to making os.urandom always > open(?/dev/urandom?).read() instead of trying to save a FD by using > getrandom() and just add os.getrandom() which will interface with > getrandom()/arc4random()/etc and always in blocking mode. Why always in > blocking mode? Because it?s the only way to get consistent behavior across > different platforms, all non Linux OSs either block or they otherwise > ensure that it is initialized prior to it even being possible to access the > CSPRNG. > > Using this, code can be smarter about what to do in edge cases than we can > reasonably be in os.urandom, for example see > https://bpaste.net/show/41d89e520913. > > The reasons I think this is preferable to adding parameters to os.urandom > are: > > * If we add parameters to os.urandom, you can?t feature detect their > existence easily, you have to use version checks. > * With flags, unless we add even more flags we can?t dictate what should > happen if we?re on a system where the person?s desired preference can?t be > satisfied. We either have to just silently do something that may be wrong, > or add more flags. By adding two functions people can pick which of the > following they want with some programming (see example): > > * Just try to get the strongest random, but fall back to maybe not > random if it?s early enough in boot process. > * Fail on old Linux rather than possibly get insecure random. > * Actually write cross platform code to prevent blocking (since only > Linux allows you to not block) > * Fail hard rather than block if we can?t get secure random bytes > without blocking. > * Soft fail and get ?probably good enough? random from os.urandom > on Linux. 
> * Hard fail on non Linux if we would block since there?s no > non-blocking and ?probably good enough? interface. > * Soft fail and get ?probably good enough? random from os.urandom > on Linux, and use time/pid/memory offsets on non Linux. > * Just use the best source of random available to use on the system, > and block rather than fail. > > I don?t see any way to get the same wide set of options by just adding > flags to os.urandom unless we add flags that work for every possible > combination of what people may or may not want to. > > ? > Donald Stufft > > > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Jun 11 13:41:19 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Jun 2016 10:41:19 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> Message-ID: You can add me to the list of people who feel like disappearing. On Sat, Jun 11, 2016 at 10:28 AM, Terry Reedy wrote: > On 6/11/2016 11:34 AM, Guido van Rossum wrote: > >> In terms of API design, I'd prefer a flag to os.urandom() indicating a >> preference for >> - blocking >> - raising an exception >> - weaker random bits >> > > +100 ;-) > > I proposed exactly this 2 days ago, 5 hours after Larry's initial post. > > ''' > I think the 'new API' should be a parameter, not a new function. With just > two choices, 'wait' = True/False could work. 
If 'raise an exception' were > added, then > 'action (when good bits are not immediately available' = > 'return (best possible)' or > 'wait (until have good bits)' or > 'raise (CryptBitsNotAvailable)' > > In either case, there would then be the question of whether the default > should match 3.5.0/1 or 3.4 and before. > ''' > > Deciding on this then might have saved some hurt feelings, to the point > where two contributors feel like disappearing, and a release manager must > feel the same. In any case, Guido already picked 3.4 behavior as the > default. Can we agree and move on? > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yan12125 at gmail.com Sat Jun 11 13:41:59 2016 From: yan12125 at gmail.com (Chi Hsuan Yen) Date: Sun, 12 Jun 2016 01:41:59 +0800 Subject: [Python-Dev] Current Python 3.2 status? In-Reply-To: References: Message-ID: On Sun, Jun 12, 2016 at 1:02 AM, Berker Peksağ wrote: > On Sat, Jun 11, 2016 at 8:59 AM, Chi Hsuan Yen wrote: > > Hello all, > > > > Georg said in February that 3.2.7 is going to be released, and now it's > > June. Will it ever be released? > > Hi, > > It was delayed because of a security issue. See Georg's email at > https://mail.python.org/pipermail/python-dev/2016-February/143400.html > > --Berker > Thanks for that. I'm just curious what's happening on the 3.2 branch. -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Sat Jun 11 14:30:19 2016 From: donald at stufft.io (Donald Stufft) Date: Sat, 11 Jun 2016 14:30:19 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> Message-ID: > On Jun 11, 2016, at 1:39 PM, Guido van Rossum wrote: > > Is the feature detection desire about being able to write code that runs on older Python versions or for platforms that just don't have getrandom()? > > My assumption was that nobody would actually use these flags except the secrets module and people writing code that generates long-lived secrets -- and the latter category should be checking platform and versions anyway since they need the whole stack to be secure (if I understand Ted Ts'o's email right). > > My assumption is also that the flags should be hints (perhaps only relevant on Linux) -- platforms that can't perform the action desired (because their system's API doesn't support it) would just do their default action, assuming the system API does the best it can. The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) which gets some bytes doesn't know if it got back cryptographically secure random because Python called getrandom() or if it got back cryptographically secure random because it called /dev/urandom and that gave it secure random because it's on a platform that defines that as always returning secure or because it's on Linux and the urandom pool is initialized or if it got back some random bytes that are not cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized. The "silently does the wrong thing, even though I explicitly asked for it to do something different" is something that I would consider to be a footgun and footguns in security sensitive code make me really worried.
Outside of the security side of things, if someone goes "Ok I need some random bytes and I need to make sure it doesn't block", then doing ``os.urandom(block=False, exception=False)`` isn't going to make sure that it doesn't block except on Linux. In other words, it's basically impossible to ensure you get the behavior you want with these flags which I feel like will make everyone unhappy (both the people who want to ensure non-blocking, and the people who want to ensure cryptographically secure). These flags are an attractive nuisance that look like they do the right thing, but silently don't. Meanwhile if we have os.urandom that reads from /dev/urandom and os.getrandom() which reads from blocking random, then we make it both easier to ensure you get the behavior you want, either by using the function that best suits your needs: * If you just want the best the OS has to offer, os.getrandom falling back to os.urandom. * If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non Linux platforms and erroring on Linux. * If you want to *ensure* that there's no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that's the only way to ensure not blocking cross platform). * If you just don't care, YOLO it up with either os.urandom or os.getrandom or random.random. > > I think the problem with making os.urandom() go back to always reading /dev/urandom is that we've come to rely on it on all platforms, so we've passed that station. > Sorry, to be more specific I meant the 3.4 behavior, which was open("/dev/urandom").read() on *nix and CryptGenRandom on Windows. -- Donald Stufft -------------- next part -------------- An HTML attachment was scrubbed...
URL: From brett at python.org Sat Jun 11 14:55:51 2016 From: brett at python.org (Brett Cannon) Date: Sat, 11 Jun 2016 18:55:51 +0000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> Message-ID: On Sat, 11 Jun 2016 at 11:31 Donald Stufft wrote: > On Jun 11, 2016, at 1:39 PM, Guido van Rossum wrote: > > Is the feature detection desire about being able to write code that runs > on older Python versions or for platforms that just don't have getrandom()? > > My assumption was that nobody would actually use these flags except the > secrets module and people writing code that generates long-lived secrets -- > and the latter category should be checking platform and versions anyway > since they need the whole stack to be secure (if I understand Ted Ts'o's > email right). > > My assumption is also that the flags should be hints (perhaps only > relevant on Linux) -- platforms that can't perform the action desired > (because their system's API doesn't support it) would just do their default > action, assuming the system API does the best it can. 
> > The problem is that someone writing software that does > os.urandom(block=True) or os.urandom(exception=True) which gets some bytes > doesn't know if it got back cryptographically secure random because Python > called getrandom() or if it got back cryptographically secure random > because it called /dev/urandom and that gave it secure random because it's > on a platform that defines that as always returning secure or because it's > on Linux and the urandom pool is initialized or if it got back some random > bytes that are not cryptographically secure because it fell back to reading > /dev/urandom on Linux prior to the pool being initialized. > > The "silently does the wrong thing, even though I explicitly asked for it > to do something different" is something that I would consider to be a footgun > and footguns in security sensitive code make me really worried. > > Outside of the security side of things, if someone goes "Ok I need some > random bytes and I need to make sure it doesn't block", then doing > ``os.urandom(block=False, exception=False)`` isn't going to make sure that > it doesn't block except on Linux. > > In other words, it's basically impossible to ensure you get the behavior > you want with these flags which I feel like will make everyone unhappy > (both the people who want to ensure non-blocking, and the people who want > to ensure cryptographically secure). These flags are an attractive nuisance > that look like they do the right thing, but silently don't. > > Meanwhile if we have os.urandom that reads from /dev/urandom and > os.getrandom() which reads from blocking random, then we make it both > easier to ensure you get the behavior you want, either by using the > function that best suits your needs: > > * If you just want the best the OS has to offer, os.getrandom falling back > to os.urandom.
> * If you want to ensure you get cryptographically secure bytes, > os.getrandom, falling back to os.urandom on non Linux platforms and > erroring on Linux. > * If you want to *ensure* that there's no blocking, then os.urandom on > Linux (or os.urandom wrapped with timeout code anywhere else, as that's the > only way to ensure not blocking cross platform). > * If you just don't care, YOLO it up with either os.urandom or > os.getrandom or random.random. > I'm +1 w/ what Donald is suggesting here and below w/ proper documentation in both the secrets and random modules to explain when to use what (i.e. secrets for crypto-no-matter-what randomness, random for quick-and-dirty randomness). This also includes any appropriate decoupling of the secrets module from the random module so there's no reliance on the random module in the docs of the secrets module beyond "this class has the same interface", and letting the secrets module be the way people generally get crypto randomness. -Brett > > > I think the problem with making os.urandom() go back to always reading > /dev/urandom is that we've come to rely on it on all platforms, so we've > passed that station. > > > Sorry, to be more specific I meant the 3.4 behavior, which was > open("/dev/urandom").read() on *nix and CryptGenRandom on Windows. > > > -- > > Donald Stufft > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Jun 11 15:40:06 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Jun 2016 12:40:06 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> Message-ID: On Sat, Jun 11, 2016 at 11:30 AM, Donald Stufft wrote: > > On Jun 11, 2016, at 1:39 PM, Guido van Rossum wrote: > > Is the feature detection desire about being able to write code that runs > on older Python versions or for platforms that just don't have getrandom()? > > My assumption was that nobody would actually use these flags except the > secrets module and people writing code that generates long-lived secrets -- > and the latter category should be checking platform and versions anyway > since they need the whole stack to be secure (if I understand Ted Ts'o's > email right). > > My assumption is also that the flags should be hints (perhaps only > relevant on Linux) -- platforms that can't perform the action desired > (because their system's API doesn't support it) would just do their default > action, assuming the system API does the best it can. > > > The problem is that someone writing software that does > os.urandom(block=True) or os.urandom(exception=True) which gets some bytes > doesn't know if it got back cryptographically secure random because Python > called getrandom() or if it got back cryptographically secure random > because it called /dev/urandom and that gave it secure random because it's > on a platform that defines that as always returning secure or because it's > on Linux and the urandom pool is initialized or if it got back some random > bytes that are not cryptographically secure because it fell back to reading > /dev/urandom on Linux prior to the pool being initialized. > > The "silently does the wrong thing, even though I explicitly asked for it > to do something different"
is something that I would consider to be a footgun > and footguns in security sensitive code make me really worried. > Yeah, but we've already established that there's a lot more upset, rhetoric and worry than warranted by the situation. > Outside of the security side of things, if someone goes "Ok I need some > random bytes and I need to make sure it doesn't block", then doing > ``os.urandom(block=False, exception=False)`` isn't going to make sure that > it doesn't block except on Linux. > To people who "just want some random bytes" we should recommend the random module. > In other words, it's basically impossible to ensure you get the behavior > you want with these flags which I feel like will make everyone unhappy > (both the people who want to ensure non-blocking, and the people who want > to ensure cryptographically secure). These flags are an attractive nuisance > that look like they do the right thing, but silently don't. > OK, it looks like the flags just won't make you happy, and I'm happy to give up on them. By default the status quo will win, and that means neither these flags nor os.getrandom(). (But of course you can roll your own using ctypes. :-) > Meanwhile if we have os.urandom that reads from /dev/urandom and > os.getrandom() which reads from blocking random, then we make it both > easier to ensure you get the behavior you want, either by using the > function that best suits your needs: > > * If you just want the best the OS has to offer, os.getrandom falling back > to os.urandom. > Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True). > * If you want to ensure you get cryptographically secure bytes, > os.getrandom, falling back to os.urandom on non Linux platforms and > erroring on Linux. > "Erroring" doesn't sound like it satisfies the "ensure" part of the requirement. And I don't see the advantage of os.getrandom() over the secrets module.
(Either way you have to fall back on os.urandom() to support Python 3.5 and before.) > * If you want to *ensure* that there's no blocking, then os.urandom on > Linux (or os.urandom wrapped with timeout code anywhere else, as that's the > only way to ensure not blocking cross platform). > That's fine with me. > * If you just don't care, YOLO it up with either os.urandom or > os.getrandom or random.random. > Now you're just taking the mickey. > > I think the problem with making os.urandom() go back to always reading > /dev/urandom is that we've come to rely on it on all platforms, so we've > passed that station. > > Sorry, to be more specific I meant the 3.4 behavior, which was > open("/dev/urandom").read() on *nix and CryptGenRandom on Windows. > I am all for keeping it that way. The secrets module doesn't have to use any of these, it can use an undocumented extension module for all I care. Or it can use os.urandom() and trust Ted Ts'o. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Sat Jun 11 15:53:36 2016 From: larry at hastings.org (Larry Hastings) Date: Sat, 11 Jun 2016 12:53:36 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> Message-ID: <575C6C40.7020403@hastings.org> On 06/11/2016 11:30 AM, Donald Stufft wrote: > The problem is that someone writing software that does > os.urandom(block=True) or os.urandom(exception=True) which gets some > bytes doesn't know if it got back cryptographically secure random > because Python called getrandom() or if it got back cryptographically > secure random because it called /dev/urandom and that gave it secure > random because it's on a platform that defines that as always > returning secure or because it's on Linux and the urandom pool is > initialized or if it got back some random bytes that are not > cryptographically secure because it fell back to reading /dev/urandom > on Linux prior to the pool being initialized. Let me jump in tangentially to say: I think os.urandom(block=True) is simply a bad API. On FreeBSD and OpenBSD, /dev/urandom may block, and you don't have a choice. On OS X, /dev/urandom will never block, and you don't have a choice. In Victor's initial patch where he proposed it, the flag was accepted on all platforms but only affected its behavior on Linux and possibly Solaris. I think it's bad API design to have a flag that seems like it would be meaningful on multiple platforms, but in practice is useful only in very limited circumstances. If this were old code, or behavior we inherited from the platform and we were making the best of a bad situation, that'd be one thing. But this is a proposed new API and I definitely think we can do better. As I understand the proposed semantics for os.urandom(exception=True), I feel it falls into the same trap though not to the same degree.
Of course, both flags break backwards-compatibility if they default to True, and I strongly disagree with . It's far better in my opinion to keep the os module as a thin shell over platform functionality. That makes Python's behavior more predictable on a platform-by-platform basis. So I think the best approach here is to add os.getrandom() as a thin shell over the local getrandom() (if any). //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Sat Jun 11 16:26:39 2016 From: brett at python.org (Brett Cannon) Date: Sat, 11 Jun 2016 20:26:39 +0000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> Message-ID: http://bugs.python.org/issue27288 covers updating the secrets module to use getrandom(). http://bugs.python.org/issue27292 covers documenting the drawbacks of os.urandom() http://bugs.python.org/issue27293 covers documenting all of the issues pointed out in this discussion. Only issue I can think of that we're missing is one to track reverting os.urandom() to 3.4 semantics (any doc updates required for the random module?). Am I missing anything? On Sat, Jun 11, 2016, 12:41 Guido van Rossum wrote: > On Sat, Jun 11, 2016 at 11:30 AM, Donald Stufft wrote: > >> >> On Jun 11, 2016, at 1:39 PM, Guido van Rossum wrote: >> >> Is the feature detection desire about being able to write code that runs >> on older Python versions or for platforms that just don't have getrandom()? 
>> My assumption was that nobody would actually use these flags except the >> secrets module and people writing code that generates long-lived secrets -- >> and the latter category should be checking platform and versions anyway >> since they need the whole stack to be secure (if I understand Ted Ts'o's >> email right). >> >> My assumption is also that the flags should be hints (perhaps only >> relevant on Linux) -- platforms that can't perform the action desired >> (because their system's API doesn't support it) would just do their default >> action, assuming the system API does the best it can. >> >> >> The problem is that someone writing software that does >> os.urandom(block=True) or os.urandom(exception=True) which gets some bytes >> doesn't know if it got back cryptographically secure random because Python >> called getrandom() or if it got back cryptographically secure random >> because it called /dev/urandom and that gave it secure random because it's >> on a platform that defines that as always returning secure or because it's >> on Linux and the urandom pool is initialized or if it got back some random >> bytes that are not cryptographically secure because it fell back to reading >> /dev/urandom on Linux prior to the pool being initialized. >> >> The "silently does the wrong thing, even though I explicitly asked for it >> to do something different" is something that I would consider to be a footgun >> and footguns in security sensitive code make me really worried. >> > > Yeah, but we've already established that there's a lot more upset, > rhetoric and worry than warranted by the situation. > > >> Outside of the security side of things, if someone goes "Ok I need some >> random bytes and I need to make sure it doesn't block", then doing >> ``os.urandom(block=False, exception=False)`` isn't going to make sure that >> it doesn't block except on Linux. >> > > To people who "just want some random bytes" we should recommend the random > module.
> > >> In other words, it's basically impossible to ensure you get the behavior >> you want with these flags which I feel like will make everyone unhappy >> (both the people who want to ensure non-blocking, and the people who want >> to ensure cryptographically secure). These flags are an attractive nuisance >> that look like they do the right thing, but silently don't. >> > > OK, it looks like the flags just won't make you happy, and I'm happy to > give up on them. By default the status quo will win, and that means neither > these flags nor os.getrandom(). (But of course you can roll your own using > ctypes. :-) > > >> Meanwhile if we have os.urandom that reads from /dev/urandom and >> os.getrandom() which reads from blocking random, then we make it both >> easier to ensure you get the behavior you want, either by using the >> function that best suits your needs: >> >> * If you just want the best the OS has to offer, os.getrandom falling >> back to os.urandom. >> > > Actually the proposal for that was the secrets module. And the secrets > module would be the only user of os.urandom(blocking=True). > > >> * If you want to ensure you get cryptographically secure bytes, >> os.getrandom, falling back to os.urandom on non Linux platforms and >> erroring on Linux. >> > > "Erroring" doesn't sound like it satisfies the "ensure" part of the > requirement. And I don't see the advantage of os.getrandom() over the > secrets module. (Either way you have to fall back on os.urandom() to > support Python 3.5 and before.) > > >> * If you want to *ensure* that there's no blocking, then os.urandom on >> Linux (or os.urandom wrapped with timeout code anywhere else, as that's the >> only way to ensure not blocking cross platform). >> > > That's fine with me. > > >> * If you just don't care, YOLO it up with either os.urandom or >> os.getrandom or random.random. >> > > Now you're just taking the mickey.
> > >> >> I think the problem with making os.urandom() go back to always reading >> /dev/urandom is that we've come to rely on it on all platforms, so we've >> passed that station. >> >> >> Sorry, to be more specific I meant the 3.4 behavior, which was >> open("/dev/urandom").read() on *nix and CryptGenRandom on Windows. >> > > I am all for keeping it that way. The secrets module doesn't have to use > any of these, it can use an undocumented extension module for all I care. > Or it can use os.urandom() and trust Ted Ts'o. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Sat Jun 11 16:48:05 2016 From: donald at stufft.io (Donald Stufft) Date: Sat, 11 Jun 2016 16:48:05 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> Message-ID: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> > On Jun 11, 2016, at 3:40 PM, Guido van Rossum wrote: > > On Sat, Jun 11, 2016 at 11:30 AM, Donald Stufft > wrote: > >> On Jun 11, 2016, at 1:39 PM, Guido van Rossum > wrote: >> >> Is the feature detection desire about being able to write code that runs on older Python versions or for platforms that just don't have getrandom()?
>> My assumption was that nobody would actually use these flags except the secrets module and people writing code that generates long-lived secrets -- and the latter category should be checking platform and versions anyway since they need the whole stack to be secure (if I understand Ted Ts'o's email right). >> >> My assumption is also that the flags should be hints (perhaps only relevant on Linux) -- platforms that can't perform the action desired (because their system's API doesn't support it) would just do their default action, assuming the system API does the best it can. > > The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) which gets some bytes doesn't know if it got back cryptographically secure random because Python called getrandom() or if it got back cryptographically secure random because it called /dev/urandom and that gave it secure random because it's on a platform that defines that as always returning secure or because it's on Linux and the urandom pool is initialized or if it got back some random bytes that are not cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized. > > The "silently does the wrong thing, even though I explicitly asked for it to do something different" is something that I would consider to be a footgun and footguns in security sensitive code make me really worried. > > Yeah, but we've already established that there's a lot more upset, rhetoric and worry than warranted by the situation. Have we? There are real, documented security failures in the wild because of /dev/urandom's behavior.
This isn't just a theoretical problem, it actually has had consequences in real life, and those same consequences could just as easily have happened to Python (in one of the cases that most recently comes to mind it was a C program, but that's not really relevant because the same problem would have happened if they had written in Python using os.urandom in 3.4 but not in 3.5.0 or 3.5.1). > > Outside of the security side of things, if someone goes "Ok I need some random bytes and I need to make sure it doesn't block", then doing ``os.urandom(block=False, exception=False)`` isn't going to make sure that it doesn't block except on Linux. > > To people who "just want some random bytes" we should recommend the random module. > > In other words, it's basically impossible to ensure you get the behavior you want with these flags which I feel like will make everyone unhappy (both the people who want to ensure non-blocking, and the people who want to ensure cryptographically secure). These flags are an attractive nuisance that look like they do the right thing, but silently don't. > > OK, it looks like the flags just won't make you happy, and I'm happy to give up on them. By default the status quo will win, and that means neither these flags nor os.getrandom(). (But of course you can roll your own using ctypes. :-) > > Meanwhile if we have os.urandom that reads from /dev/urandom and os.getrandom() which reads from blocking random, then we make it both easier to ensure you get the behavior you want, either by using the function that best suits your needs: > > * If you just want the best the OS has to offer, os.getrandom falling back to os.urandom. > > Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True). I'm fine if this lives in the secrets module... Steven asked for it to be an os function so that secrets.py could continue to be pure python.
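(For illustration, a minimal pure-Python sketch of the kind of wrapper being discussed; the helper name is hypothetical, and it assumes an os.getrandom()-style function of the sort proposed earlier in the thread, feature-detected with hasattr rather than version checks:)

```python
import os

def strong_random_bytes(n):
    """Best-effort cryptographically strong bytes (hypothetical helper).

    Prefer a blocking getrandom()-style interface when the platform
    exposes one; otherwise fall back to os.urandom().
    """
    getrandom = getattr(os, "getrandom", None)  # feature detection, not version checks
    if getrandom is not None:
        try:
            return getrandom(n)  # blocks until the kernel entropy pool is initialized
        except OSError:
            pass  # e.g. the syscall is unavailable on this kernel
    return os.urandom(n)
```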
> > * If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non Linux platforms and erroring on Linux. > > "Erroring" doesn't sound like it satisfies the "ensure" part of the requirement. And I don't see the advantage of os.getrandom() over the secrets module. (Either way you have to fall back on os.urandom() to support Python 3.5 and before.) Erroring does satisfy the ensure part, because if it's not possible to get cryptographically secure bytes then the only option is to error if you want to be ensured of cryptographically secure bytes. It's a bit like if you did open("somefile.txt"), it's reasonable to say that we should ensure that open("somefile.txt") actually opens ./somefile.txt, and doesn't randomly open a different file if ./somefile.txt doesn't exist... if it can't open ./somefile.txt it should error. If I *need* cryptographically secure random bytes, and I'm on a platform that doesn't provide those, then erroring is often times the correct behavior. This is such an important thing that OS X will flat out kernel panic and refuse to boot if it can't ensure that it can give people cryptographically secure random bytes. It's a fairly simple decision tree, I go "hey, give me cryptographically secure random bytes, and only cryptographically secure random bytes". If it cannot give them to me because the APIs of the system cannot guarantee they are cryptographically secure then there are only two options, either A) it is explicit about its inability to do this and raises an error or B) it does something completely different than what I asked it to do and pretends that it's what I wanted. > > * If you want to *ensure* that there's no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that's the only way to ensure not blocking cross platform). > > That's fine with me. > > * If you just don't care, YOLO it up with either os.urandom or os.getrandom or random.random.
> > Now you're just taking the mickey. No I'm not... random.Random is such a use case where it wants to seed with as secure of bytes as it can get its hands on, but it doesn't care if it falls back to insecure bytes if it's not possible to get secure bytes. This code even falls back to using time as a seed if all else fails.
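(The fallback chain described here can be sketched as follows; this is a simplified illustration of the behavior of random.Random's default seeding, not CPython's actual implementation:)

```python
import os
import random
import time

def make_seed():
    """Prefer OS entropy; degrade to a time-based seed if no OS source exists."""
    try:
        return int.from_bytes(os.urandom(32), "big")
    except NotImplementedError:
        # No OS-level randomness source at all: fall back to the clock.
        return int(time.time() * 256)

rng = random.Random(make_seed())
value = rng.random()  # uniform float in [0.0, 1.0)
```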
Donald Stufft

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Sat Jun 11 16:48:49 2016
From: guido at python.org (Guido van Rossum)
Date: Sat, 11 Jun 2016 13:48:49 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <575C6C40.7020403@hastings.org>
References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <575C6C40.7020403@hastings.org>
Message-ID:

On Sat, Jun 11, 2016 at 12:53 PM, Larry Hastings wrote:
>
> On 06/11/2016 11:30 AM, Donald Stufft wrote:
>
> > The problem is that someone writing software that does os.urandom(block=True) or os.urandom(exception=True) which gets some bytes doesn't know if it got back cryptographically secure random because Python called getrandom() or if it got back cryptographically secure random because it called /dev/urandom and that gave it secure random because it's on a platform that defines that as always returning secure or because it's on Linux and the urandom pool is initialized or if it got back some random bytes that are not cryptographically secure because it fell back to reading /dev/urandom on Linux prior to the pool being initialized.
>
> Let me jump in tangentially to say: I think os.urandom(block=True) is simply a bad API. On FreeBSD and OpenBSD, /dev/urandom may block, and you don't have a choice. On OS X, /dev/urandom will never block, and you don't have a choice. In Victor's initial patch where he proposed it, the flag was accepted on all platforms but only affected its behavior on Linux and possibly Solaris.
> I think it's bad API design to have a flag that seems like it would be meaningful on multiple platforms, but in practice is useful only in very limited circumstances. If this were old code, or behavior we inherited from the platform and we were making the best of a bad situation, that'd be one thing. But this is a proposed new API and I definitely think we can do better.
>
> As I understand the proposed semantics for os.urandom(exception=True), I feel it falls into the same trap though not to the same degree.
>
> Of course, both flags break backwards-compatibility if they default to True, and I strongly disagree with .
>
> It's far better in my opinion to keep the os module as a thin shell over platform functionality. That makes Python's behavior more predictable on a platform-by-platform basis. So I think the best approach here is to add os.getrandom() as a thin shell over the local getrandom() (if any).

OK, the flags are unpopular, so let's forget about them. But I find an os.getrandom() that only exists on those (few?) platforms that support it a nuisance too -- this just encourages cargo cult code that's unnecessarily complicated and believed to be secure without anybody ever verifying.

I'd like to consider what people freak out about.

- You could freak out about blocking
- You could freak out about getting slightly less random bits
- You could freak out about supporting Python 3.5 and earlier
- You could freak out about supporting all platforms

You could also freak out about combinations of the above, but that gets complicated and you should probably consider that you're over-constraining matters. If you freak out about all of these at once (or both the first and the second bullet) you should consider a career change. If you don't freak out about any of these (meaning you're happy with Python 3.6+) you should use the secrets module.
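In code, that last recommendation is nearly a one-liner. Here is a sketch against the secrets API proposed for 3.6 in PEP 506 (treat the names as provisional until it lands), with os.urandom() as the only fallback on older interpreters:

```python
import os

try:
    import secrets                     # proposed for Python 3.6 (PEP 506)
    token = secrets.token_bytes(16)    # 16 cryptographically strong bytes
except ImportError:
    token = os.urandom(16)             # the only cross-version fallback
```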
If you freak out about support for older Python versions, try the secrets module first and fall back to os.urandom() -- there really isn't any other choice.

If you freak out about getting slightly less random bits you should probably do a complete security assessment of your entire stack and fix the OS and Python version, and use the best you can get for that combination. You may not want to rely on the standard library at all.

If you freak out about blocking you're probably on a specific platform, and if that platform is Linux, you're in luck: use os.urandom() and avoid Python 3.5.0 and 3.5.1. On other platforms you're out of luck.

So I still don't see why we need os.getrandom() -- it has nothing to recommend it over the secrets module (since both won't happen before 3.6).

So what should the secrets module use? Let's make that part an extension module.

--
--Guido van Rossum (python.org/~guido)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From donald at stufft.io Sat Jun 11 17:16:04 2016
From: donald at stufft.io (Donald Stufft)
Date: Sat, 11 Jun 2016 17:16:04 -0400
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To:
References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <575C6C40.7020403@hastings.org>
Message-ID: <5AF510E5-5DE4-4FF0-9D3E-916987627463@stufft.io>

> On Jun 11, 2016, at 4:48 PM, Guido van Rossum wrote:
>
> But I find an os.getrandom() that only exists on those (few?) platforms that support it a nuisance too -- this just encourages cargo cult code that's unnecessarily complicated and believed to be secure without anybody ever verifying.
Well, new enough Linux has getrandom(0), OpenBSD has getentropy(), Solaris has getrandom(), and Windows has CryptGenRandom, all of which make it possible (or it's the only way to invoke them) to get cryptographically secure random bytes or block, with no in-between. So it'd likely be possible to have os.getrandom() with blocking semantics and no FD on all of the most popular platforms we support.

If we relax the no-FD requirement then FreeBSD and OS X also have /dev/random (or /dev/urandom, it's the same thing) which will ensure that you get cryptographically secure random bytes.

--
Donald Stufft

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Sat Jun 11 17:16:21 2016
From: guido at python.org (Guido van Rossum)
Date: Sat, 11 Jun 2016 14:16:21 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io>
References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io>
Message-ID:

On Sat, Jun 11, 2016 at 1:48 PM, Donald Stufft wrote:
>
> On Jun 11, 2016, at 3:40 PM, Guido van Rossum wrote:
>
> Yeah, but we've already established that there's a lot more upset, rhetoric and worry than warranted by the situation.
>
> Have we? There are real, documented security failures in the wild because of /dev/urandom's behavior.
> This isn't just a theoretical problem, it actually has had consequences in real life, and those same consequences could just as easily have happened to Python (in one of the cases that most recently comes to mind it was a C program, but that's not really relevant because the same problem would have happened if they had written it in Python using os.urandom in 3.4 but not in 3.5.0 or 3.5.1).

Actually it's not clear to me at all that it could have happened to Python. (Wasn't it an embedded system?)

> Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True).
>
> I'm fine if this lives in the secrets module -- Steven asked for it to be an os function so that secrets.py could continue to be pure python.

The main thing that I want to avoid is that people start cargo-culting whatever the secrets module uses rather than just using the secrets module. Having it redundantly available as os.getrandom() is just begging for people to show off how much they know about writing secure code.

>> * If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non-Linux platforms and erroring on Linux.
>
> "Erroring" doesn't sound like it satisfies the "ensure" part of the requirement. And I don't see the advantage of os.getrandom() over the secrets module. (Either way you have to fall back on os.urandom() to support Python 3.5 and before.)

> Erroring does satisfy the ensure part, because if it's not possible to get cryptographically secure bytes then the only option is to error if you want to be ensured of cryptographically secure bytes.
>
> It's a bit like if you did open('somefile.txt'): it's reasonable to say that we should ensure that open('somefile.txt') actually opens ./somefile.txt, and doesn't randomly open a different file if ./somefile.txt doesn't exist -- if it can't open ./somefile.txt it should error.
> If I *need* cryptographically secure random bytes, and I'm on a platform that doesn't provide those, then erroring is oftentimes the correct behavior. This is such an important thing that OS X will flat out kernel panic and refuse to boot if it can't ensure that it can give people cryptographically secure random bytes.

But what is a Python script going to do with that error? IIUC this kind of error would only happen very early during boot time, and rarely, so the most likely outcome is a hard-to-debug mystery failure.

> It's a fairly simple decision tree: I go "hey, give me cryptographically secure random bytes, and only cryptographically secure random bytes". If it cannot give them to me because the APIs of the system cannot guarantee they are cryptographically secure, then there are only two options: either A) it is explicit about its inability to do this and raises an error, or B) it does something completely different than what I asked it to do and pretends that it's what I wanted.

I really don't believe that there is only one kind of cryptographically secure random bytes. There are many different applications (use cases) of randomness and they need different behaviors. (If it was simple we wouldn't still be arguing. :-)

>> * If you want to *ensure* that there's no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that's the only way to ensure not blocking cross platform).
>
> That's fine with me.

>> * If you just don't care, YOLO it up with either os.urandom or os.getrandom or random.random.
>
> Now you're just taking the mickey.

> No I'm not -- random.Random is such a use case where it wants to seed with as secure of bytes as it can get its hands on, but it doesn't care if it falls back to insecure bytes if it's not possible to get secure bytes. This code even falls back to using time as a seed if all else fails.

Fair enough.
The hash randomization is the other case I suppose (since not running any Python code at all isn't an option, and neither is waiting indefinitely before the user's code gets control).

It does show the point that there are different use cases with different needs. But I think the stdlib should limit the choices.

--
--Guido van Rossum (python.org/~guido)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Sat Jun 11 17:24:40 2016
From: guido at python.org (Guido van Rossum)
Date: Sat, 11 Jun 2016 14:24:40 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <5AF510E5-5DE4-4FF0-9D3E-916987627463@stufft.io>
References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <575C6C40.7020403@hastings.org> <5AF510E5-5DE4-4FF0-9D3E-916987627463@stufft.io>
Message-ID:

On Sat, Jun 11, 2016 at 2:16 PM, Donald Stufft wrote:
>
> On Jun 11, 2016, at 4:48 PM, Guido van Rossum wrote:
>
> But I find an os.getrandom() that only exists on those (few?) platforms that support it a nuisance too -- this just encourages cargo cult code that's unnecessarily complicated and believed to be secure without anybody ever verifying.
>
> Well, new enough Linux has getrandom(0), OpenBSD has getentropy(), Solaris has getrandom(), and Windows has CryptGenRandom, all of which make it possible (or it's the only way to invoke them) to get cryptographically secure random bytes or block, with no in-between. So it'd likely be possible to have os.getrandom() with blocking semantics and no FD on all of the most popular platforms we support.
> If we relax the no-FD requirement then FreeBSD and OS X also have /dev/random (or /dev/urandom, it's the same thing) which will ensure that you get cryptographically secure random bytes.

OK, so we should implement the best we can do for the secrets module, and leave os.urandom() alone. I think the requirement that the secrets module remain pure Python has to be dropped. I'm not sure what it should do if even blocking can't give it sufficiently strong random bytes, but I care much less -- it's a new API and it doesn't resemble any OS function, so as long as it is documented it should be fine.

An alternative would be to keep the secrets module linked to SystemRandom, and improve the latter. Its link with os.urandom() is AFAIK undocumented. Its API is clumsy but for code that needs some form of secret-ish bytes and requires platform and Python version independence it might be better than anything else. Then the secrets module is just what we recommend to new users on Python 3.6.

--
--Guido van Rossum (python.org/~guido)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tim.peters at gmail.com Sat Jun 11 17:35:23 2016
From: tim.peters at gmail.com (Tim Peters)
Date: Sat, 11 Jun 2016 16:35:23 -0500
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To:
References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <575C6C40.7020403@hastings.org> <5AF510E5-5DE4-4FF0-9D3E-916987627463@stufft.io>
Message-ID:

[Guido]
> ...
> An alternative would be to keep the secrets module linked to SystemRandom, and improve the latter. Its link with os.urandom() is AFAIK undocumented.
> Its API is clumsy but for code that needs some form of secret-ish bytes and requires platform and Python version independence it might be better than anything else. Then the secrets module is just what we recommend to new users on Python 3.6.

There's an issue currently open about this:

    http://bugs.python.org/issue27288

The docs for SystemRandom are very brief, so people may have actually noticed ;-) the first sentence:

    Class that uses the os.urandom() function for generating random numbers ...

IOW, "uses os.urandom()" has been one of its only advertised qualities.

From donald at stufft.io Sat Jun 11 17:46:29 2016
From: donald at stufft.io (Donald Stufft)
Date: Sat, 11 Jun 2016 17:46:29 -0400
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To:
References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io>
Message-ID: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io>

> On Jun 11, 2016, at 5:16 PM, Guido van Rossum wrote:
>
> On Sat, Jun 11, 2016 at 1:48 PM, Donald Stufft wrote:
>
>> On Jun 11, 2016, at 3:40 PM, Guido van Rossum wrote:
>>
>> Yeah, but we've already established that there's a lot more upset, rhetoric and worry than warranted by the situation.
>
> Have we? There are real, documented security failures in the wild because of /dev/urandom's behavior.
> This isn't just a theoretical problem, it actually has had consequences in real life, and those same consequences could just as easily have happened to Python (in one of the cases that most recently comes to mind it was a C program, but that's not really relevant because the same problem would have happened if they had written it in Python using os.urandom in 3.4 but not in 3.5.0 or 3.5.1).
>
> Actually it's not clear to me at all that it could have happened to Python. (Wasn't it an embedded system?)

It was a Raspberry Pi that ran a shell script on boot that called ssh-keygen. That shell script could just as easily have been a Python script that called os.urandom via https://github.com/sybrenstuvel/python-rsa instead of a shell script that called ssh-keygen.

>> Actually the proposal for that was the secrets module. And the secrets module would be the only user of os.urandom(blocking=True).
>
> I'm fine if this lives in the secrets module -- Steven asked for it to be an os function so that secrets.py could continue to be pure python.
>
> The main thing that I want to avoid is that people start cargo-culting whatever the secrets module uses rather than just using the secrets module. Having it redundantly available as os.getrandom() is just begging for people to show off how much they know about writing secure code.

I guess one question would be: what does the secrets module do if it's on a Linux that is too old to have getrandom(0)? Off the top of my head I can think of:

* Silently fall back to reading os.urandom and hope that it's been seeded.

* Fall back to os.urandom, hope that it's been seeded, and add a SecurityWarning or something like it to mention that it's falling back to os.urandom and may be getting predictable random from /dev/urandom.

* Hard fail because it can't guarantee secure cryptographic random.

Of the three, I would probably suggest the second one: it doesn't let the problem happen silently, but it still "works"
(where it's basically just hoping it's being called late enough that /dev/urandom has been seeded), and people can convert it to the third case using the warnings module to turn the warning into an exception.

>> * If you want to ensure you get cryptographically secure bytes, os.getrandom, falling back to os.urandom on non-Linux platforms and erroring on Linux.
>
> "Erroring" doesn't sound like it satisfies the "ensure" part of the requirement. And I don't see the advantage of os.getrandom() over the secrets module. (Either way you have to fall back on os.urandom() to support Python 3.5 and before.)

> Erroring does satisfy the ensure part, because if it's not possible to get cryptographically secure bytes then the only option is to error if you want to be ensured of cryptographically secure bytes.
>
> It's a bit like if you did open('somefile.txt'): it's reasonable to say that we should ensure that open('somefile.txt') actually opens ./somefile.txt, and doesn't randomly open a different file if ./somefile.txt doesn't exist -- if it can't open ./somefile.txt it should error. If I *need* cryptographically secure random bytes, and I'm on a platform that doesn't provide those, then erroring is oftentimes the correct behavior. This is such an important thing that OS X will flat out kernel panic and refuse to boot if it can't ensure that it can give people cryptographically secure random bytes.

> But what is a Python script going to do with that error? IIUC this kind of error would only happen very early during boot time, and rarely, so the most likely outcome is a hard-to-debug mystery failure.

Depends on why they're calling it, which is sort of the underlying problem, I suspect, with why there isn't agreement about what the right default behavior is. The correct answer for some application might be to hard fail and wait for the operator to fix the environment that it's running in. It depends on how important the thing that is getting this random is.
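A rough sketch of that warn-but-continue option (the `SecurityWarning` category is hypothetical -- nothing like it exists in the stdlib -- and `os.getrandom()` is only a proposal here, hence the `getattr()` probe):

```python
import os
import warnings


class SecurityWarning(RuntimeWarning):
    """Hypothetical category for insecure-fallback warnings."""


def secure_bytes(n):
    getrandom = getattr(os, "getrandom", None)  # proposed, not yet in os
    if getrandom is not None:
        return getrandom(n)  # blocks until the kernel pool is seeded
    warnings.warn("falling back to os.urandom(); the entropy pool may "
                  "not be initialized yet", SecurityWarning)
    return os.urandom(n)
```

Callers who want the hard-fail behavior instead can promote the warning with `warnings.simplefilter("error", SecurityWarning)`.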
One example: if I was writing a communication platform for people who are fighting oppressive regimes, or to securely discuss sexual orientation in more dangerous parts of the world, I would want to make this program hard fail if it couldn't ensure that it was using an interface that ensured cryptographic random, because the alternative is predictable numbers and someone possibly being arrested or executed. I know that's a bit of an extreme edge case, but it's also the kind of thing that people might use Python for, where the predictability of the CSPRNG it's using is of the utmost importance. For other things, the importance will fall somewhere between best effort being good enough and predictable random numbers being catastrophic.

> It's a fairly simple decision tree: I go "hey, give me cryptographically secure random bytes, and only cryptographically secure random bytes". If it cannot give them to me because the APIs of the system cannot guarantee they are cryptographically secure, then there are only two options: either A) it is explicit about its inability to do this and raises an error, or B) it does something completely different than what I asked it to do and pretends that it's what I wanted.

> I really don't believe that there is only one kind of cryptographically secure random bytes. There are many different applications (use cases) of randomness and they need different behaviors. (If it was simple we wouldn't still be arguing. :-)

I mean, for a CSPRNG there's only one real important property: can an attacker predict the next byte? Any other property for a CSPRNG doesn't really matter. For other, non-cryptographic kinds of RNGs they want other behaviors (equidistribution, etc.) but those aren't cryptographically secure (nor do they need to be).

>> * If you want to *ensure* that there's no blocking, then os.urandom on Linux (or os.urandom wrapped with timeout code anywhere else, as that's the only way to ensure not blocking cross platform).
> That's fine with me.

>> * If you just don't care, YOLO it up with either os.urandom or os.getrandom or random.random.
>
> Now you're just taking the mickey.

>> No I'm not -- random.Random is such a use case where it wants to seed with as secure of bytes as it can get its hands on, but it doesn't care if it falls back to insecure bytes if it's not possible to get secure bytes. This code even falls back to using time as a seed if all else fails.
>
> Fair enough. The hash randomization is the other case I suppose (since not running any Python code at all isn't an option, and neither is waiting indefinitely before the user's code gets control).
>
> It does show the point that there are different use cases with different needs. But I think the stdlib should limit the choices.
>
> --
> --Guido van Rossum (python.org/~guido)

--
Donald Stufft

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From leewangzhong+python at gmail.com Sat Jun 11 17:53:34 2016
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Sat, 11 Jun 2016 17:53:34 -0400
Subject: Re: [Python-Dev] PEP 468
In-Reply-To:
References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com>
Message-ID:

I am. I was just wondering if there was an in-progress effort I should be looking at, because I am interested in extensions to it.

P.S.: If anyone is missing the relevance, Raymond Hettinger's compact dicts are inherently ordered until a delitem happens.[1] That could be "good enough" for many purposes, including kwargs and class definition. If CPython implements efficient compact dicts, it would be easier to propose order-preserving (or initially-order-preserving) dicts in some places in the standard.

[1] Whether delitem preserves order depends on whether you want to allow gaps in your compact entry table. PyPy implemented compact dicts and chose(?) to make dicts ordered.
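To make [1] concrete, here is a toy sketch of the compact layout (illustrative only -- neither Raymond's proposal nor PyPy's implementation, both of which use a hash-addressed index array rather than a plain dict as the index). A dense entries list preserves insertion order; a deletion leaves a gap instead of reordering:

```python
class CompactDict:
    """Toy compact dict: sparse index into a dense, ordered entries list."""

    def __init__(self):
        self._index = {}        # key -> slot in the dense entries list
        self._entries = []      # [(key, value)] in insertion order

    def __setitem__(self, key, value):
        slot = self._index.get(key)
        if slot is None:                 # new key: append, order preserved
            self._index[key] = len(self._entries)
            self._entries.append((key, value))
        else:                            # existing key: overwrite in place
            self._entries[slot] = (key, value)

    def __getitem__(self, key):
        return self._entries[self._index[key]][1]

    def __delitem__(self, key):
        # Leave a gap rather than compacting: surviving keys keep their
        # relative order, at the cost of a wasted slot.
        self._entries[self._index.pop(key)] = None

    def keys(self):
        return [entry[0] for entry in self._entries if entry is not None]
```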
On Saturday, June 11, 2016, Eric Snow wrote:

> On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee wrote:
> > Eric, have you any work in progress on compact dicts?
>
> Nope. I presume you are talking about the proposal Raymond made a while back.
>
> -eric

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From larry at hastings.org Sat Jun 11 17:58:07 2016
From: larry at hastings.org (Larry Hastings)
Date: Sat, 11 Jun 2016 14:58:07 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To:
References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <575C6C40.7020403@hastings.org>
Message-ID: <575C896F.3060201@hastings.org>

On 06/11/2016 01:48 PM, Guido van Rossum wrote:
> So I still don't see why we need os.getrandom() -- it has nothing to recommend it over the secrets module (since both won't happen before 3.6).

I have two reasons, neither of which I think are necessarily all that persuasive. Don't consider this an argument--merely some observations.

First, simply as a practical matter: the secrets module is currently pure Python. ISTM that the os module is where we put miscellaneous bits of os functionality; getrandom() definitely falls into that category. Rather than adding a new _secrets module or whatever, it seemed easiest just to add it there.

Second, I'd put this under the "consenting adults" rule. Clearly cryptography is a contentious subject with sharply differing opinions. There are many, many cryptography libraries available on PyPI; perhaps those libraries would like to use getrandom(), or /dev/urandom, or even getentropy(), in a way different than how secrets does it.
My thinking is, the os module should provide platform support, the secrets module should be our codified best practices, and we encourage everyone to use secrets. I'd go so far as to add that recommendation to the doc *and* the docstrings of os.urandom(), random.SystemRandom, and os.getrandom() (and os.getentropy()) if we add it. But by providing the OS functionality in a neutral way we allow external cryptographers to write what *they* view as best-practices code without wading into implementation details of secrets, or using ctypes, or whatnot.

But like I said I don't have a strong opinion. As long as we're not adding mysterious flags to os.urandom() I'll probably sit the rest of this one out.

//arry/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Sat Jun 11 18:44:36 2016
From: guido at python.org (Guido van Rossum)
Date: Sat, 11 Jun 2016 15:44:36 -0700
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <575C896F.3060201@hastings.org>
References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <575C6C40.7020403@hastings.org> <575C896F.3060201@hastings.org>
Message-ID:

Fortunately, 3.6 feature freeze isn't until September, so we can all cool off and figure out the best way forward. I'm going on vacation for a week, and after sending this I'm going to mute the thread so I won't be pulled into it while I'm supposed to be relaxing.

--
--Guido van Rossum (python.org/~guido)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From random832 at fastmail.com Sat Jun 11 19:43:18 2016
From: random832 at fastmail.com (Random832)
Date: Sat, 11 Jun 2016 19:43:18 -0400
Subject: [Python-Dev] Stop using timeit, use perf.timeit!
In-Reply-To: <20160611014549.GK27919@ando.pearwood.info>
References: <20160611014549.GK27919@ando.pearwood.info>
Message-ID: <1465688598.1078301.634935153.258EE6BF@webmail.messagingengine.com>

On Fri, Jun 10, 2016, at 21:45, Steven D'Aprano wrote:
> If you express your performances as speeds (as "calculations per second") then the harmonic mean is the right way to average them.

That's true insofar as you get the same result as if you were to take the arithmetic mean of the times and then convert from that to calculations per second. Is there any other particular basis for considering it "right"?

From stephen at xemacs.org Sat Jun 11 20:16:44 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 12 Jun 2016 09:16:44 +0900
Subject: [Python-Dev] writing to /dev/*random [was: BDFL ruling request: should we block ...]
In-Reply-To: <20160611082437.GN27919@ando.pearwood.info>
References: <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <20160611082437.GN27919@ando.pearwood.info>
Message-ID: <22364.43500.337805.137220@turnbull.sk.tsukuba.ac.jp>

This is related to David Mertz's request for backward compatible initialization, not to the bdfl decision.

Steven D'Aprano writes:
> I don't think that's something which the Python interpreter ought to do for you, but you can write to /dev/urandom or /dev/random (both keep their own, separate, entropy pools):
>
> open("/dev/urandom", "w").write("hello world")

This fails for unprivileged users on Mac. I'm not sure what happens on Linux; it appears to succeed, but the result wasn't what I expected.
Also, when entropy gets low, it's not clear how additional entropy is allocated between the /dev/random and /dev/urandom pools.

> But of course there's the question of where you're going to get a source of noise to write to the file. While it's (probably?) harmless to write a hard-coded string to it, I don't think it's going to give you much entropy.

Use a Raspberry Pi, or other advanced expensive hardware. There's no real excuse for not having a hardware generator if the Pi has one! I would guess you can probably get something with a USB interface for $20 or so.

http://scruss.com/blog/2013/06/07/well-that-was-unexpected-the-raspberry-pis-hardware-random-number-generator/

From larry at hastings.org Sat Jun 11 20:28:16 2016
From: larry at hastings.org (Larry Hastings)
Date: Sat, 11 Jun 2016 17:28:16 -0700
Subject: [Python-Dev] writing to /dev/*random [was: BDFL ruling request: should we block ...]
In-Reply-To: <22364.43500.337805.137220@turnbull.sk.tsukuba.ac.jp>
References: <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <20160611082437.GN27919@ando.pearwood.info> <22364.43500.337805.137220@turnbull.sk.tsukuba.ac.jp>
Message-ID: <575CACA0.7030400@hastings.org>

On 06/11/2016 05:16 PM, Stephen J. Turnbull wrote:
> Use a Raspberry Pi, or other advanced expensive hardware. There's no real excuse for not having a hardware generator if the Pi has one!

Intel CPUs added the RDRAND instruction as of Ivy Bridge, although there's an ongoing debate as to whether or not it's a suitable source of entropy to use for seeding urandom.

https://en.wikipedia.org/wiki/RdRand#Reception

Wikipedia goes on to describe the very-new RDSEED instruction which might be more suitable.

//arry/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From tim.peters at gmail.com Sat Jun 11 22:21:41 2016 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 11 Jun 2016 21:21:41 -0500 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <57595210.4000508@hastings.org> <817C1F1A-5BCE-40C9-B148-0B4919B307EE@lukasa.co.uk> <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> Message-ID: [Sebastian Krause] > ... > Ideally I would only want to use the random module for > non-secure and (in 3.6) the secrets module (which could block) for > secure random data and never bother with os.urandom (and knowing how > it behaves). But then those modules should probably get new > functions to directly return bytes. `secrets.token_bytes()` does just that, and other token_XXX() functions return bytes too but with different spellings (e.g., if you want, with the byte values represented as ASCII hex digits). I believe everyone agrees token_bytes() will potentially block in 3.6 (along with all the other `secrets` facilities) on platforms supporting getrandom(). You're right that `random` doesn't expose such a function, and that the closest it gets is .getrandbits() (which returns a potentially giant int). So far, nobody has proposed adding new functions to `random`. From ericsnowcurrently at gmail.com Sat Jun 11 22:37:17 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 11 Jun 2016 19:37:17 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace (round 3) Message-ID: I've updated the PEP to reflect feedback up to this point. The reception has been positive. The only change to the original proposal has been that a manually set __definition_order__ must be a tuple of identifiers or None (rather than using the value as-is). All other updates to the PEP have been clarification.
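For readers who want to try the decorator use case the PEP is aimed at, a minimal sketch follows; the decorator name is hypothetical, and ``__definition_order__`` is set by hand here, which the proposal explicitly allows:

```python
def record_fields(cls):
    """Hypothetical class decorator that consumes __definition_order__."""
    # Keep only non-callable attributes, in definition order.
    cls._fields = tuple(name for name in cls.__definition_order__
                        if not callable(getattr(cls, name, None)))
    return cls

@record_fields
class Spam:
    ham = None
    eggs = 5
    # Under the PEP this attribute would be filled in automatically;
    # setting it manually (as the proposal permits) works today.
    __definition_order__ = ('ham', 'eggs')
```

``Spam._fields`` then comes out as ``('ham', 'eggs')``, exactly the information an independent decorator cannot recover from ``__dict__`` alone.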
Guido, at this point I believe the PEP is ready for pronouncement. * I've included the most recent copy of the text below. Thanks.

-eric

==============================
PEP: 520
Title: Ordered Class Definition Namespace
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 7-Jun-2016
Python-Version: 3.6
Post-History: 7-Jun-2016

Abstract
========

When a class is defined using a ``class`` statement, the class body is executed within a namespace. After the execution completes, that namespace is copied into a new ``dict`` and the original definition namespace is discarded. The new copy is stored away as the class's namespace and is exposed as ``__dict__`` through a read-only proxy.

This PEP changes the default class definition namespace to ``OrderedDict``. The long-lived class namespace (``__dict__``) will remain a ``dict``. Furthermore, the order in which the attributes are defined in each class body will now be preserved in the ``__definition_order__`` attribute of the class. This allows introspection of the original definition order, e.g. by class decorators.

Motivation
==========

Currently the namespace used during execution of a class body defaults to ``dict``. If the metaclass defines ``__prepare__()`` then the result of calling it is used. Thus, before this PEP, if you needed your class definition namespace to be ``OrderedDict`` you had to use a metaclass. Metaclasses introduce an extra level of complexity to code and in some cases (e.g. conflicts) are a problem. So reducing the need for them is worth doing when the opportunity presents itself. Given that we now have a C implementation of ``OrderedDict`` and that ``OrderedDict`` is the common use case for ``__prepare__()``, we have such an opportunity by defaulting to ``OrderedDict``.
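For illustration, the metaclass workaround described above (which this PEP would make unnecessary) looks roughly like this; the names are illustrative only:

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # Use an ordered mapping as the class body's namespace.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Preserve the body's definition order before it is lost.
        cls.members = tuple(namespace)
        return cls

class Config(metaclass=OrderedMeta):
    host = "localhost"
    port = 8080
```

``Config.members`` then lists ``'host'`` before ``'port'`` (alongside dunders such as ``'__module__'``); with the PEP, no metaclass would be needed to get at that order.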
The usefulness of ``OrderedDict``-by-default is greatly increased if the definition order is directly introspectable on classes afterward, particularly by code that is independent of the original class definition. One of the original motivating use cases for this PEP is generic class decorators that make use of the definition order.

Changing the default class definition namespace has been discussed a number of times, including on the mailing lists and in PEP 422 and PEP 487 (see the References section below).

Specification
=============

* the default class *definition* namespace is now ``OrderedDict``
* the order in which class attributes are defined is preserved in the new ``__definition_order__`` attribute on each class
* "dunder" attributes (e.g. ``__init__``, ``__module__``) are ignored
* ``__definition_order__`` is a ``tuple`` (or ``None``)
* ``__definition_order__`` is a read-only attribute
* ``__definition_order__`` is always set:

  1. if ``__definition_order__`` is defined in the class body then it must be a ``tuple`` of identifiers or ``None``; any other value will result in ``TypeError``
  2. classes that do not have a class definition (e.g. builtins) have their ``__definition_order__`` set to ``None``
  3. classes for which ``__prepare__()`` returned something other than ``OrderedDict`` (or a subclass) have their ``__definition_order__`` set to ``None`` (except where #1 applies)

The following code demonstrates roughly equivalent semantics for the default behavior::

   class Meta(type):
       def __prepare__(cls, *args, **kwargs):
           return OrderedDict()

   class Spam(metaclass=Meta):
       ham = None
       eggs = 5
       __definition_order__ = tuple(k for k in locals()
                                    if not (k.startswith('__') and
                                            k.endswith('__')))

Note that [pep487_] proposes a similar solution, albeit as part of a broader proposal.

Why a tuple?
------------

Use of a tuple reflects the fact that we are exposing the order in which attributes on the class were *defined*.
Since the definition is already complete by the time ``definition_order__`` is set, the content and order of the value won't be changing. Thus we use a type that communicates that state of immutability.

Why a read-only attribute?
--------------------------

As with the use of tuple, making ``__definition_order__`` a read-only attribute communicates the fact that the information it represents is complete. Since it represents the state of a particular one-time event (execution of the class definition body), allowing the value to be replaced would reduce confidence that the attribute corresponds to the original class body. If a use case for a writable (or mutable) ``__definition_order__`` arises, the restriction may be loosened later. Presently this seems unlikely and furthermore it is usually best to go immutable-by-default.

Note that ``__definition_order__`` is centered on the class definition body. The use cases for dealing with the class namespace (``__dict__``) post-definition are a separate matter. ``__definition_order__`` would be a significantly misleading name for a feature focused on more than class definition. See [nick_concern_] for more discussion.

Why ignore "dunder" names?
--------------------------

Names starting and ending with "__" are reserved for use by the interpreter. In practice they should not be relevant to the users of ``__definition_order__``. Instead, for nearly everyone they would only be clutter, causing the same extra work for everyone.

Why None instead of an empty tuple?
-----------------------------------

A key objective of adding ``__definition_order__`` is to preserve information in class definitions which was lost prior to this PEP. One consequence is that ``__definition_order__`` implies an original class definition. Using ``None`` allows us to clearly distinguish classes that do not have a definition order. An empty tuple clearly indicates a class that came from a definition statement but did not define any attributes there.
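To make the distinction concrete, a consumer could branch on the three cases along these lines (hypothetical helper; the example classes set the attribute by hand to simulate the PEP's semantics):

```python
def describe(cls):
    """Report what a class's __definition_order__ reveals."""
    do = getattr(cls, '__definition_order__', None)
    if do is None:
        # No class definition statement (e.g. builtins), or an
        # unordered __prepare__() namespace.
        return 'no definition order available'
    if not do:
        return 'class body defined no attributes'
    return 'defined: ' + ', '.join(do)

class Defined:
    x = 1
    __definition_order__ = ('x',)   # set by hand to simulate the PEP

class Empty:
    __definition_order__ = ()       # defined, but nothing in the body
```

``describe(Defined)`` gives ``'defined: x'``; ``describe(int)`` gives the ``None`` case, since builtins carry no definition order.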
Why None instead of not setting the attribute?
----------------------------------------------

The absence of an attribute requires more complex handling than ``None`` does for consumers of ``__definition_order__``.

Why constrain manually set values?
----------------------------------

If ``__definition_order__`` is manually set in the class body then it will be used. We require it to be a tuple of identifiers (or ``None``) so that consumers of ``__definition_order__`` may have a consistent expectation for the value. That helps maximize the feature's usefulness.

Why is __definition_order__ even necessary?
-------------------------------------------

Since the definition order is not preserved in ``__dict__``, it is lost once class definition execution completes. Classes *could* explicitly set the attribute as the last thing in the body. However, then independent decorators could only make use of classes that had done so. Instead, ``__definition_order__`` preserves this one bit of info from the class body so that it is universally available.

Compatibility
=============

This PEP does not break backward compatibility, except in the case that someone relies *strictly* on ``dict`` as the class definition namespace. This shouldn't be a problem.

Changes
=============

In addition to the class syntax, the following expose the new behavior:

* builtins.__build_class__
* types.prepare_class
* types.new_class

Other Python Implementations
============================

Pending feedback, the impact on Python implementations is expected to be minimal. If a Python implementation cannot support switching to ``OrderedDict``-by-default then it can always set ``__definition_order__`` to ``None``.

Implementation
==============

The implementation is found in the tracker.
[impl_]

Alternatives
============

<class>.__dict__ as OrderedDict
-------------------------------

Instead of storing the definition order in ``__definition_order__``, the now-ordered definition namespace could be copied into a new ``OrderedDict``. This would then be used as the mapping proxied as ``__dict__``. Doing so would mostly provide the same semantics. However, using ``OrderedDict`` for ``__dict__`` would obscure the relationship with the definition namespace, making it less useful. Additionally, doing this would require significant changes to the semantics of the concrete ``dict`` C-API.

A "namespace" Keyword Arg for Class Definition
----------------------------------------------

PEP 422 introduced a new "namespace" keyword arg to class definitions that effectively replaces the need for ``__prepare__()``. [pep422_] However, the proposal was withdrawn in favor of the simpler PEP 487.

References
==========

.. [impl] issue #24254 (https://bugs.python.org/issue24254)
.. [nick_concern] Nick's concerns about mutability (https://mail.python.org/pipermail/python-dev/2016-June/144883.html)
.. [pep422] PEP 422 (https://www.python.org/dev/peps/pep-0422/#order-preserving-classes)
.. [pep487] PEP 487 (https://www.python.org/dev/peps/pep-0487/#defining-arbitrary-namespaces)
.. [orig] original discussion (https://mail.python.org/pipermail/python-ideas/2013-February/019690.html)
.. [followup1] follow-up 1 (https://mail.python.org/pipermail/python-dev/2013-June/127103.html)
.. [followup2] follow-up 2 (https://mail.python.org/pipermail/python-dev/2015-May/140137.html)

Copyright
===========

This document has been placed in the public domain.

From tytso at mit.edu Sat Jun 11 22:37:37 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Sat, 11 Jun 2016 22:37:37 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <1465593290.2349072.634239529.67EEE9C8@webmail.messagingengine.com> References: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <20160610195411.GA3932@thunk.org> <1465593290.2349072.634239529.67EEE9C8@webmail.messagingengine.com> Message-ID: <20160612023737.GB5489@thunk.org> On Fri, Jun 10, 2016 at 05:14:50PM -0400, Random832 wrote: > On Fri, Jun 10, 2016, at 15:54, Theodore Ts'o wrote: > > So even on Python pre-3.5.0, realistically speaking, the "weakness" of > > os.random would only be an issue (a) if it is run within the first few > > seconds of boot, and (b) os.random is used to directly generate a > > long-term cryptographic secret. If you are fork openssl or ssh-keygen > > to generate a public/private keypair, then you aren't using os.random. > > So, I have a question. If this "weakness" in /dev/urandom is so > unimportant to 99% of situations... why isn't there a flag that can be > passed to getrandom() to allow the same behavior? The intention behind getrandom() is that it is intended *only* for cryptographic purposes. For that use case, there's no point having a "return a potentially unseeded cryptographic secret" option. This makes this much like FreeBSD's /dev/random and getentropy system calls. (BTW, I've seen an assertion on this thread that FreeBSD's getentropy(2) never blocks. As far as I know, this is **not** true. FreeBSD's getentropy(2) works like its /dev/random device, in that if it is not fully seeded, it will block. The only reason why OpenBSD's getentropy(2) and /dev/random devices will never block is because they only support architectures where they can make sure that entropy is passed from a previous boot session to the next, given specialized bootloader support. 
Linux can't do this because we support a very large number of bootloaders, and the bootloaders are not under the kernel developers' control. Fundamentally, you can't guarantee both (a) that your RNG will never block, and (b) will always be of high cryptographic quality, in a completely general sense. You can if you make caveats about your hardware or when the code runs, but that's fundamentally the problem with the documentation of os.urandom(); it's making promises which can't be true 100% of the time, for all hardware, operating environments, etc.)

Anyway, if you don't need cryptographic guarantees, you don't need getrandom(2) or getentropy(2); something like this will do just fine:

   long getrand()
   {
       static int initialized = 0;
       struct timeval tv;

       if (!initialized) {
           gettimeofday(&tv, NULL);
           srandom(tv.tv_sec ^ tv.tv_usec ^ getpid());
           initialized++;
       }
       return random();
   }

So this is why I did what I did. If Python decides to go down this same path, you could define a new interface ala getrandom(2), which is specifically designed for cryptographic purposes, and perhaps a new, more efficient interface for those people who don't need cryptographic guarantees --- and then keep the behavior of os.urandom consistent with Python 3.4, but update the documentation to reflect the reality. Alternatively, you could keep the implementation of os.urandom consistent with Python 3.5, and then document that under some circumstances, it will block. Both approaches have certain tradeoffs, but it's not going to be the end of the world regardless of which way you decide to go. I'd suggest that you use your existing mechanisms to decide on which approach is more Pythony, and then make sure you communicate and over-communicate it to your user/developer base. And then --- relax. It may seem like a big deal today, but in a year or so people will have gotten used to whatever interface or documentation changes you decide to make, and it will be all fine.
As Dame Julian of Norwich once said, "All shall be well, and all shall be well, and all manner of things shall be well." Cheers, - Ted

From vgr255 at live.ca Sat Jun 11 22:51:29 2016 From: vgr255 at live.ca (=?iso-8859-1?Q?=C9manuel_Barry?=) Date: Sat, 11 Jun 2016 22:51:29 -0400 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace (round 3) In-Reply-To: References: Message-ID:

> From: Eric Snow
> Sent: Saturday, June 11, 2016 10:37 PM
> To: Python-Dev; Guido van Rossum
> Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace (round 3)
> The only change to the original proposal
> has been that a manually set __definition_order__ must be a tuple of
> identifiers or None (rather than using the value as-is).
> 1. if ``__definition_order__`` is defined in the class body then it
> must be a ``tuple`` of identifiers or ``None``; any other value
> will result in ``TypeError``

Why not just any arbitrary iterable, which gets converted to a tuple at runtime? __slots__ allows any arbitrary iterable:

>>> def g():
...     yield "foo"
...     yield "bar"
...     yield "baz"
>>> class C:
...     __slots__ = g()
>>> C.__slots__
<generator object g at 0x...>
>>> C.__slots__.gi_running
False
>>> dir(C)
[<...>, 'bar', 'baz', 'foo']

> Use of a tuple reflects the fact that we are exposing the order in
> which attributes on the class were *defined*. Since the definition
> is already complete by the time ``definition_order__`` is set, the
> content and order of the value won't be changing. Thus we use a type
> that communicates that state of immutability.

Typo: missing leading underscores in __definition_order__

> Compatibility
> =============
>
> This PEP does not break backward compatibility, except in the case that
> someone relies *strictly* on ``dict`` as the class definition namespace.
> This shouldn't be a problem.

Perhaps add a mention that isinstance(namespace, dict) will still be true, so users don't get unnecessarily confused.
> <class>.__dict__ as OrderedDict
> -------------------------------

looks weird to me. I tend to use `cls` (although `klass` isn't uncommon). `C` might also not be a bad choice. Thanks! -Emanuel

From ericsnowcurrently at gmail.com Sat Jun 11 23:01:33 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 11 Jun 2016 20:01:33 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace (round 3) In-Reply-To: References: Message-ID: On Sat, Jun 11, 2016 at 7:51 PM, Émanuel Barry wrote:

>> From: Eric Snow
>> 1. if ``__definition_order__`` is defined in the class body then it
>> must be a ``tuple`` of identifiers or ``None``; any other value
>> will result in ``TypeError``
>
> Why not just any arbitrary iterable, which gets converted to a tuple at
> runtime?

An arbitrary iterable does not necessarily infer a definition order. For example, dict is an iterable but the order is undefined. Also, I'd rather favor simplicity for this (most likely) uncommon corner case of manually setting __definition_order__, particularly at the start. If it proves to be a problematic restriction in the future we can loosen it.

> __slots__ allows any arbitrary iterable:

Yes, but __slots__ is not order-sensitive.

>> is already complete by the time ``definition_order__`` is set, the
>
> Typo: missing leading underscores in __definition_order__

I'll fix that.

>
>> Compatibility
>> =============
>>
>> This PEP does not break backward compatibility, except in the case that
>> someone relies *strictly* on ``dict`` as the class definition namespace.
>> This shouldn't be a problem.
>
> Perhaps add a mention that isinstance(namespace, dict) will still be true,
> so users don't get unnecessarily confused.

Good point.

>
>> <class>.__dict__ as OrderedDict
>> -------------------------------
>
> looks weird to me. I tend to use `cls` (although `klass` isn't
> uncommon). `C` might also not be a bad choice.

Yes, that is better.
-eric From vgr255 at live.ca Sat Jun 11 23:04:32 2016 From: vgr255 at live.ca (=?utf-8?Q?=C3=89manuel_Barry?=) Date: Sat, 11 Jun 2016 23:04:32 -0400 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace (round 3) In-Reply-To: References: Message-ID: > From: Eric Snow > Sent: Saturday, June 11, 2016 11:02 PM > To: Émanuel Barry > Cc: Python-Dev > Subject: Re: [Python-Dev] PEP 520: Ordered Class Definition Namespace > (round 3) > > On Sat, Jun 11, 2016 at 7:51 PM, Émanuel Barry wrote: > >> From: Eric Snow > >> 1. if ``__definition_order__`` is defined in the class body then it > >> must be a ``tuple`` of identifiers or ``None``; any other value > >> will result in ``TypeError`` > > > > Why not just any arbitrary iterable, which gets converted to a tuple at > > runtime? > > An arbitrary iterable does not necessarily infer a definition order. > For example, dict is an iterable but the order is undefined. Also, > I'd rather favor simplicity for this (most likely) uncommon corner > case of manually setting __definition_order__, particularly at the > start. If it proves to be a problematic restriction in the future we > can loosen it. Point. This can always be revised later (I'm probably overthinking this as always ;) > > -eric -Emanuel From steve at pearwood.info Sat Jun 11 23:15:17 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jun 2016 13:15:17 +1000 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> Message-ID: <20160612031517.GQ27919@ando.pearwood.info> On Sat, Jun 11, 2016 at 02:16:21PM -0700, Guido van Rossum wrote: [on the real-world consequences of degraded randomness from /dev/urandom] > Actually it's not clear to me at all that it could have happened to Python.
> (Wasn't it an embedded system?) A Raspberry Pi. But don't people run Python on at least some embedded systems? The wiki thinks so: https://wiki.python.org/moin/EmbeddedPython And I thought that was the purpose of µPython. > > Actually the proposal for that was the secrets module. And the secrets > > module would be the only user of os.urandom(blocking=True). > > > > I'm fine if this lives in the secrets module -- Steven asked for it to be an > > os function so that secrets.py could continue to be pure python. > > The main thing that I want to avoid is that people start cargo-culting > whatever the secrets module uses rather than just using the secrets module. > Having it redundantly available as os.getrandom() is just begging for > people to show off how much they know about writing secure code. That makes sense. I'm happy for getrandom to be an implementation detail of secrets, but I'll need help with that part. > >> * If you want to ensure you get cryptographically secure bytes, > >> os.getrandom, falling back to os.urandom on non Linux platforms and > >> erroring on Linux. [...] > But what is a Python script going to do with that error? IIUC this kind of > error would only happen very early during boot time, and rarely, so the > most likely outcome is a hard-to-debug mystery failure.
In my day job, I work for a Linux sys admin consulting company, and I can tell you from our experience that debugging a process that occasionally hangs mysteriously during boot is much harder than debugging a process that occasionally fails with an explicit error in the logs, especially if the error message is explicit about the cause:

    OSError: entropy pool has not been initialized yet

At that point, you can take whatever action is appropriate for your script:

- fail altogether, just as it might fail if it requires a writable file system and can't find one;
- sleep for three seconds and try again;
- log the error and proceed with degraded randomness or functionality;
- change it so the script runs later in the boot process.

-- Steve

From random832 at fastmail.com Sun Jun 12 01:49:34 2016 From: random832 at fastmail.com (Random832) Date: Sun, 12 Jun 2016 01:49:34 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160612023737.GB5489@thunk.org> References: <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <20160610195411.GA3932@thunk.org> <1465593290.2349072.634239529.67EEE9C8@webmail.messagingengine.com> <20160612023737.GB5489@thunk.org> Message-ID: <1465710574.1145257.635059377.3AF7C63B@webmail.messagingengine.com> On Sat, Jun 11, 2016, at 22:37, Theodore Ts'o wrote: > On Fri, Jun 10, 2016 at 05:14:50PM -0400, Random832 wrote: > > So, I have a question. If this "weakness" in /dev/urandom is so > > unimportant to 99% of situations... why isn't there a flag that can be > > passed to getrandom() to allow the same behavior? > > The intention behind getrandom() is that it is intended *only* for > cryptographic purposes. I'm somewhat confused now because if that's the case it seems to accomplish multiple unrelated things.
Why was this implemented as a system call rather than a device (or an ioctl on the existing ones)? If there's a benefit in not going through the non-atomic (and possibly resource limited) procedure of acquiring a file descriptor, reading from it, and closing it, why is that benefit not also extended to non-cryptographic users of urandom via allowing the system call to be used in that way? > Anyway, if you don't need cryptographic guarantees, you don't need > getrandom(2) or getentropy(2); something like this will do just fine: Then what's /dev/urandom *for*, anyway? From tytso at mit.edu Sun Jun 12 02:11:42 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Sun, 12 Jun 2016 02:11:42 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> References: <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> Message-ID: <20160612061142.GA1986@thunk.org> On Sat, Jun 11, 2016 at 05:46:29PM -0400, Donald Stufft wrote: > > It was a RaspberryPI that ran a shell script on boot that called > ssh-keygen. That shell script could have just as easily been a > Python script that called os.urandom via > https://github.com/sybrenstuvel/python-rsa instead of a shell script > that called ssh-keygen. So I'm going to argue that the primary bug was in how the systemd init scripts were configured. In general, creating keypairs at boot time is just a bad idea. They should be created lazily, in a just-in-time paradigm.
Consider that if you assume that os.urandom can block, this isn't necessarily going to do the right thing either --- if you use getrandom and it blocks, and it's part of a systemd unit which is blocking further boot progress, then the system will hang for 90 seconds, and while it's hanging, there won't be any interrupts, so the system will be dead in the water, just like the original bug report complaining that Python was hanging when it was using getrandom() to initialize its SipHash. At which point there will be another bug complaining about how python was causing systemd to hang for 90 seconds, and there will be demand to make os.random no longer block. (Since by definition, systemd can do no wrong; it's always other programs that have to change to accommodate systemd. :-) So some people will freak out when the keygen systemd unit hangs, blocking the boot --- and other people will freak out if the systemd unit doesn't hang, and you get predictable SSH keys --- and some wiser folks will be asking the question, why the *heck* is it not openssh/systemd's fault for trying to generate keys this early, instead of after the first time sshd needs host ssh keys? If you wait until the first time the host ssh keys are needed, then the system is fully booted, so it's likely that the entropy will be collected -- and even if it isn't, networking will already be brought up, and the system will be in multi-user mode, so entropy will be collected very quickly. Sometimes, we can't solve the problem at the Python level or at the Kernel level. It will require security-savvy userspace/application programmers as well. Cheers, - Ted From cory at lukasa.co.uk Sun Jun 12 06:40:58 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Sun, 12 Jun 2016 11:40:58 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <20160612061142.GA1986@thunk.org> References: <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> Message-ID: <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> > On 12 Jun 2016, at 07:11, Theodore Ts'o wrote: > > On Sat, Jun 11, 2016 at 05:46:29PM -0400, Donald Stufft wrote: >> >> It was a RaspberryPI that ran a shell script on boot that called >> ssh-keygen. That shell script could have just as easily been a >> Python script that called os.urandom via >> https://github.com/sybrenstuvel/python-rsa instead of a shell script >> that called ssh-keygen. > > So I'm going to argue that the primary bug was in how the systemd > init scripts were configured. In general, creating keypairs at boot > time is just a bad idea. They should be created lazily, in a > just-in-time paradigm. Agreed. I hope that if there is only one thing every participant has learned from this (extremely painful for all concerned) discussion, it's that doing anything that requires really good random numbers should be delayed as long as possible on all systems, and should absolutely not be done during the boot process on Linux. Don't generate key pairs, don't make TLS connections, just don't perform any action that requires really good randomness at all. > So some people will freak out when the keygen systemd unit hangs, > blocking the boot --- and other people will freak out if the systemd > unit doesn't hang, and you get predictable SSH keys --- and some wiser > folks will be asking the question, why the *heck* is it not > openssh/systemd's fault for trying to generate keys this early, > instead of after the first time sshd needs host ssh keys?
> If you wait until the first time the host ssh keys are needed, then the system is fully booted, so it's likely that the entropy will be collected -- and even if it isn't, networking will already be brought up, and the system will be in multi-user mode, so entropy will be collected very quickly.

As far as I know we still only have three programs that were encountering this problem: Debian's autopkgtest (which patched with PYTHONHASHSEED=0), systemd-cron (which is moving from Python to Rust anyway), and cloud-init (not formally reported but mentioned to me by a third-party). It remains unclear to me why the systemd-cron service files can't simply request to be delayed until the kernel CSPRNG is seeded: I guess systemd doesn't have any way to express that constraint? Perhaps it should. Of this set, only cloud-init worries me, and it worries me for the *opposite* reason that Guido and Larry are worried. Guido and Larry are worried that programs like cloud-init will be delayed by two minutes while they wait for entropy: that's an understandable concern. I'm much more worried that programs like cloud-init may attempt to establish TLS connections or create keys during this two minute window, leaving them staring down the possibility of performing "secure" actions with insecure keys. This is why I advocate, like Donald does, for having *some* tool in Python that allows Python programs to crash if they attempt to generate cryptographically secure random bytes on a system that is incapable of providing them (which, in practice, can only happen on Linux systems). I don't care how it's spelled, I just care that programs that want to use a properly-seeded CSPRNG can error out effectively when one is not available. That allows us to ensure that Python programs that want to do TLS or build key pairs correctly refuse to do so when used in this state, *and* that they provide a clearly debuggable reason for why they refused.
That allows the savvy application developers that Ted talked about to make their own decisions about whether their rapid startup is sufficiently important to take the risk. Cory [0]: https://github.com/systemd-cron/systemd-cron/issues/43#issuecomment-160343989 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From p.f.moore at gmail.com Sun Jun 12 07:10:00 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Jun 2016 12:10:00 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> Message-ID: On 11 June 2016 at 22:46, Donald Stufft wrote: > I guess one question would be, what does the secrets module do if it's on a > Linux that is too old to have getrandom(0), off the top of my head I can > think of: > > * Silently fall back to reading os.urandom and hope that it's been seeded. > * Fall back to os.urandom and hope that it's been seeded and add a > SecurityWarning or something like it to mention that it's falling back to > os.urandom and it may be getting predictable random from /dev/urandom. > * Hard fail because it can't guarantee secure cryptographic random. > > Of the three, I would probably suggest the second one, it doesn't let the > problem happen silently, but it still "works"
(where it's basically just > hoping it's being called late enough that /dev/urandom has been seeded), and > people can convert it to the third case using the warnings module to turn > the warning into an exception. I have kept out of this discussion as I don't know enough about security to comment, but in this instance I think the answer is clear - there is no requirement for Python to protect the user against security bugs in the underlying OS (sure, it's nice if it can, but it's not necessary) so falling back to os.urandom (with no warning) is fine. A warning, or even worse a hard fail, that 99.99% of the time should be ignored (because you're *not* writing a boot script) seems like a very bad idea. By all means document "if your OS provides no means of getting guaranteed secure random numbers (e.g., older versions of Linux very early in the boot sequence) then the secrets module cannot give you results that are any better than the OS provides". It seems self-evident to me that this would be the case, but I see no reason to object if the experts feel it's worth adding. Paul From tytso at mit.edu Sun Jun 12 09:28:44 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Sun, 12 Jun 2016 09:28:44 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <1465710574.1145257.635059377.3AF7C63B@webmail.messagingengine.com> References: <5759EC2B.8040208@hastings.org> <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <20160610195411.GA3932@thunk.org> <1465593290.2349072.634239529.67EEE9C8@webmail.messagingengine.com> <20160612023737.GB5489@thunk.org> <1465710574.1145257.635059377.3AF7C63B@webmail.messagingengine.com> Message-ID: <20160612132844.GB1986@thunk.org> On Sun, Jun 12, 2016 at 01:49:34AM -0400, Random832 wrote: > > The intention behind getrandom() is that it is intended *only* for > > cryptographic purposes.
> > I'm somewhat confused now because if that's the case it seems to > accomplish multiple unrelated things. Why was this implemented as a > system call rather than a device (or an ioctl on the existing ones)? If > there's a benefit in not going through the non-atomic (and possibly > resource limited) procedure of acquiring a file descriptor, reading from > it, and closing it, why is that benefit not also extended to > non-cryptographic users of urandom via allowing the system call to be > used in that way? This design was taken from OpenBSD, and the goal with getentropy(2) (which is also designed only for cryptographic use cases), was so that a denial of service attack (fd exhaustion) could not force an application to fall back to a weaker --- in some cases, very weak or non-existent --- source of randomness. Non-cryptographic users don't need to use this interface at all. They can just use srandom(3)/random(3) and be happy. > > Anyway, if you don't need cryptographic guarantees, you don't need > getrandom(2) or getentropy(2); something like this will do just fine: > Then what's /dev/urandom *for*, anyway? /dev/urandom is a legacy interface. It was intended originally for cryptographic use cases, but it was intended for the days when very few programs needed a secure cryptographic random generator, and it was assumed that application programmers would be very careful in checking error codes, etc. It also dates back to a time when the NSA was still pushing very hard for cryptographic export controls (hence the use of SHA-1 versus an encryption algorithm) and when many people questioned whether or not the SHA-1 algorithm, as designed by the NSA, had a backdoor in it. (As it turns out, the NSA put a back door into DUAL-EC, so in retrospect this concern really wasn't that unreasonable.)
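[Editor's note: the split Ted draws between srandom(3)/random(3) and the cryptographic interfaces maps directly onto Python's standard library -- the random module for non-cryptographic uses, and os.urandom() (or the secrets module proposed for 3.6) for anything security-sensitive. A minimal sketch:]

```python
import os
import random

# Non-cryptographic: reproducible and seedable -- fine for simulations,
# sampling, and tests, but predictable by design.
rng = random.Random(12345)
draws = [rng.randrange(100) for _ in range(5)]

# Cryptographic: unpredictable bytes from the OS CSPRNG -- for keys,
# tokens, and anything an attacker must not be able to guess.
key_material = os.urandom(16)
```

The seedable generator gives the same sequence on every run with the same seed, which is exactly the property a cryptographic consumer must avoid.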
Because of those concerns, the assumption was that those few applications that really wanted to get security right (e.g., PGP, which still uses /dev/random for long-term key generation), would want to use /dev/random and deal with entropy accounting, and asking the user to type randomness on the keyboard and move their mouse around while generating a random key. But times change, and these days people are much more likely to believe that SHA-1 is in fact cryptographically secure, and future crypto hash algorithms are designed by teams from all over the world and NIST/NSA merely review the submissions (along with everyone else). So for example, SHA-3 was *not* designed by the NSA, and it was evaluated using a much more open process than SHA-1. Also, we have a much larger set of people writing code which is sensitive to cryptographic issues (back when I wrote /dev/random, I probably had met, or at least electronically corresponded with, a large number of the folks who were working on network security protocols, at least in the non-classified world), and these days, there is much less trust that people writing code to use /dev/[u]random are in fact careful and competent security engineers. Whether or not this is a fair concern, it is true that there has been a change in API design ethos away from the "Unix let's make things as general as possible, in case someone clever comes up with a use case we didn't think of", to "idiots are ingenious so they will come up with ways to misuse an idiot-proof interface, so we need to lock it down as much as possible." OpenBSD's getentropy(2) interface is a strong example of this new attitude towards API design, and getrandom(2) is not quite so doctrinaire (I added a flags field when getentropy(2) didn't even give those options to programmers), but it is following in the same tradition.
Cheers, - Ted From tytso at mit.edu Sun Jun 12 09:43:15 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Sun, 12 Jun 2016 09:43:15 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> References: <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> Message-ID: <20160612134315.GC1986@thunk.org> On Sun, Jun 12, 2016 at 11:40:58AM +0100, Cory Benfield wrote: > > Of this set, only cloud-init worries me, and it worries me for the > *opposite* reason that Guido and Larry are worried. Guido and Larry > are worried that programs like cloud-init will be delayed by two > minutes while they wait for entropy: that's an understandable > concern. I'm much more worried that programs like cloud-init may > attempt to establish TLS connections or create keys during this two > minute window, leaving them staring down the possibility of > performing "secure" actions with insecure keys. There are patches in the dev branch of: https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/ which will automatically use virtio-rng (if it is provided by the cloud provider) to initialize /dev/urandom. It also uses a much more aggressive mechanism to initialize the /dev/urandom pool, so that getrandom(2) will block for a much shorter period of time immediately after boot time on real hardware. I'm confident it's secure for x86 platforms. I'm still thinking about whether I should fall back to something more conservative for crappy embedded processors that don't have a cycle counter or a CPU-provided RDRAND-like instruction. Related to this is whether I should finally make the change so that /dev/urandom will block until it is initialized.
(This would make Linux work like FreeBSD, which *will* also block if its entropy pool is not initialized.) > This is why I advocate, like Donald does, for having *some* tool in > Python that allows Python programs to crash if they attempt to > generate cryptographically secure random bytes on a system that is > incapable of providing them (which, in practice, can only happen on > Linux systems). Well, it can only happen on Linux because you insist on falling back to /dev/urandom --- and because other OS's have the good taste not to use systemd and/or Python very early in the boot process. If someone tried to run a python script in early FreeBSD init scripts, it would block just as you were seeing on Linux --- you just haven't seen that yet, because arguably the FreeBSD developers have better taste in their choice of init scripts than Red Hat and Debian. :-) So the question is whether I should do what FreeBSD did, which will satisfy those people who are freaking out and whinging about how Linux could allow stupidly written or deployed Python scripts to get cryptographically insecure bytes, by removing that option from Python developers. Or should I remove that one line from changes in the random.git patch series, and allow /dev/urandom to be used even when it might be insecure, so as to satisfy all of the people who are freaking out and whinging about the fact that a stupidly written and/or deployed Python script might block during early boot and hang a system? Note that I've tried to do what I can to make the time that /dev/urandom might block as small as possible, but at the end of the day, there is still the question of whether I should remove the choice re: blocking from userspace, a la FreeBSD, or not. And either way, some number of people will be whinging and freaking out. Which is why I'm completely sympathetic to how Guido might be getting a little exasperated over this whole thread.
:-) - Ted From christian at python.org Sun Jun 12 10:36:56 2016 From: christian at python.org (Christian Heimes) Date: Sun, 12 Jun 2016 16:36:56 +0200 Subject: [Python-Dev] New hash algorithms: SHA3, SHAKE, BLAKE2, truncated SHA512 In-Reply-To: <52d03a08-8d5e-9751-405d-aeeca740d832@python.org> References: <52d03a08-8d5e-9751-405d-aeeca740d832@python.org> Message-ID: On 2016-05-25 12:29, Christian Heimes wrote: > Hi everybody, > > I have three hashing-related patches for Python 3.6 that are waiting for > review. Altogether the three patches add ten new hash algorithms to the > hashlib module: SHA3 (224, 256, 384, 512), SHAKE (SHA3 XOF 128, 256), > BLAKE2 (blake2b, blake2s) and truncated SHA512 (224, 256). > > > SHA-3 / SHAKE: https://bugs.python.org/issue16113 > BLAKE2: https://bugs.python.org/issue26798 > SHA512/224 / SHA512/256: https://bugs.python.org/issue26834 > > > I'd like to push the patches during the sprints at PyCon. Please assist > with reviews. Hi, I have unassigned myself from the tickets and will no longer pursue the addition of new crypto hash algorithms. I might try again when blake2 and sha3 are more widely adopted and the opposition from other core contributors has diminished. Acceptance is simply not high enough to be worth the trouble. Kind regards, Christian From michael at felt.demon.nl Sun Jun 12 10:06:43 2016 From: michael at felt.demon.nl (Michael Felt) Date: Sun, 12 Jun 2016 16:06:43 +0200 Subject: [Python-Dev] C99 In-Reply-To: References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> Message-ID: <4b260bf4-0eb8-960e-3e98-8e852c190dd9@felt.demon.nl> I am using IBM xlc aka vac - version 11. afaik it will deal with C99 features (by default I set it to behave that way) because a common 'issue' is C++-style comments appearing where they should not be used (fyi: not seen that in Python). IMHO: GCC is not just a compiler - it brings with it a whole set of infrastructure requirements (aka run-time environment, rte).
Certainly not an issue for GNU environments, but non-gnu (e.g., posix) will/may have continual side-effects from "competing" rte. At least that was my experience when I was using gcc rather than xlc. On 6/4/2016 9:53 AM, Martin Panter wrote: > Sounds good for features that are well-supported by compilers that > people use. (Are there other compilers used than just GCC and MSVC?) From stefan at bytereef.org Sun Jun 12 11:10:07 2016 From: stefan at bytereef.org (Stefan Krah) Date: Sun, 12 Jun 2016 15:10:07 +0000 (UTC) Subject: [Python-Dev] C99 References: <1465020691.2818312.627646289.6A6F4D74@webmail.messagingengine.com> <4b260bf4-0eb8-960e-3e98-8e852c190dd9@felt.demon.nl> Message-ID: Michael Felt felt.demon.nl> writes: > I am using IBM xlc aka vac - version 11. > > afaik it will deal with c99 features (by default I set it to behave that > way because a common 'issue' is C++ style comments, when they should not > be that style (fyi: not seen that in Python). We had a couple of exotic build machines a while ago: xlc, the HPUX compiler and a couple of others all support the subset of C99 we are aiming for. In fact the support of the commercial Unix compilers for C99 is quite good -- the common error messages suggest that several of them use the same front end (Comeau?). Stefan Krah From donald at stufft.io Sun Jun 12 12:35:38 2016 From: donald at stufft.io (Donald Stufft) Date: Sun, 12 Jun 2016 12:35:38 -0400 Subject: [Python-Dev] writing to /dev/*random [was: BDFL ruling request: should we block ...] In-Reply-To: <22364.43500.337805.137220@turnbull.sk.tsukuba.ac.jp> References: <20160609215343.00b0190e.barry@wooz.org> <575A2FCC.5070101@hastings.org> <981CD440-71B6-46AD-A057-585A812E083B@stufft.io> <048901d1c33a$5bf13930$13d3ab90$@sdamon.com> <20160611082437.GN27919@ando.pearwood.info> <22364.43500.337805.137220@turnbull.sk.tsukuba.ac.jp> Message-ID: <844F63E6-2B12-4F05-8FC5-611111A3F276@stufft.io> > On Jun 11, 2016, at 8:16 PM, Stephen J.
Turnbull wrote: > > This fails for unprivileged users on Mac. I'm not sure what happens > on Linux; it appears to succeed, but the result wasn't what I > expected. I think that on Linux it will mix in whatever you write into the entropy pool, but it won't increase the entropy counter for it. -- Donald Stufft From njs at pobox.com Sun Jun 12 14:07:22 2016 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 12 Jun 2016 11:07:22 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160612061142.GA1986@thunk.org> References: <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> Message-ID: On Jun 11, 2016 11:13 PM, "Theodore Ts'o" wrote: > > On Sat, Jun 11, 2016 at 05:46:29PM -0400, Donald Stufft wrote: > > > > It was a RaspberryPI that ran a shell script on boot that called > > ssh-keygen. That shell script could have just as easily been a > > Python script that called os.urandom via > > https://github.com/sybrenstuvel/python-rsa instead of a shell script > > that called ssh-keygen. > > So I'm going to argue that the primary bug was in how the systemd > init scripts were configured. In general, creating keypairs at boot > time is just a bad idea. They should be created lazily, in a > just-in-time paradigm.
> > Consider that if you assume that os.urandom can block, this isn't > necessarily going to do the right thing either --- if you use > getrandom and it blocks, and it's part of a systemd unit which is > blocking further boot progress, then the system will hang for 90 > seconds, and while it's hanging, there won't be any interrupts, so the > system will be dead in the water, just like the original bug report > complaining that Python was hanging when it was using getrandom() to > initialize its SipHash. Hi Ted, From another perspective, I guess one could also argue that the best place to fix this is in the kernel: if a process is blocked waiting for entropy then the kernel probably shouldn't take that as its cue to turn off all the entropy generation mechanisms, just like how if a process is blocked waiting for disk I/O then we probably shouldn't power down the disk controller. Obviously this is a weird case because the kernel is architected in a way that makes the dependency between the disk controller and the I/O request obvious, while the dependency between the random pool and... well... everything else, more or less, is much more subtle and goes outside the usual channels, and we wouldn't want to rearchitect everything just for this. But for example, if a process is actively blocked waiting for the initial entropy, one could spawn a kernel thread that keeps the system from quiescing by attempting to scrounge up entropy as fast as possible, via whatever mechanisms are locally appropriate (e.g. doing a busy-loop racing two clocks against each other, or just scheduling lots of interrupts -- which I guess is the same thing, more or less). And the thread would go away again as soon as userspace wasn't blocked on entropy. That way this deadlock wouldn't be possible.
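[Editor's note: the pool state Nathaniel describes can at least be observed from userspace on Linux; a small sketch -- the /proc paths are Linux-specific, and the number reported is the kernel's accounting estimate, not a hard guarantee.]

```python
import os

def entropy_estimate():
    """The kernel's current estimate of entropy bits in the input pool."""
    with open("/proc/sys/kernel/random/entropy_avail") as f:
        return int(f.read())

def pool_size_bits():
    """Total size of the input pool, in bits."""
    with open("/proc/sys/kernel/random/poolsize") as f:
        return int(f.read())
```

A process stuck in getrandom() during early boot would see entropy_estimate() pinned near zero until interrupts start trickling in -- which is exactly the quiescent-system deadlock being discussed.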
I guess someone *might* complain about the idea of the entropy pool actually spending resources instead of being quietly parasitic, because this is the kernel and someone will always complain about everything :-). But complaining about this makes about as much sense as complaining about the idea of spending resources trying to service I/O when a process is blocked on that ("maybe if we wait long enough then some other part of the system will just kind of accidentally page in the data we need as a side effect of whatever it's doing, and then this thread will be able to proceed"). Is this an approach that you've considered? > At which point there will be another bug complaining about how python > was causing systemd to hang for 90 seconds, and there will be demand > to make os.random no longer block. (Since by definition, systemd can > do no wrong; it's always other programs that have to change to > accommodate systemd. :-) FWIW, the systemd thing is a red herring -- this was debian's configuration of a particular daemon that is not maintained by the systemd project, and the exact same thing would have happened with sysvinit if debian had tried using python 3.5 early in their rcS. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From cory at lukasa.co.uk Sun Jun 12 16:01:09 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Sun, 12 Jun 2016 21:01:09 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <20160612134315.GC1986@thunk.org> References: <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> Message-ID: <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> > On 12 Jun 2016, at 14:43, Theodore Ts'o wrote: > > Well, it can only happen on Linux because you insist on falling back > to /dev/urandom --- and because other OS's have the good taste not to > use systemd and/or Python very early in the boot process. If someone > tried to run a python script in early FreeBSD init scripts, it would > block just as you were seeing on Linux --- you just haven't seen that > yet, because arguably the FreeBSD developers have better taste in > their choice of init scripts than Red Hat and Debian. :-) Heh, yes, so to be clear, I said "this can only happen on Linux" because I'm talking about the world that we live in: the one where I lost this debate. =D Certainly right now the codebase as it stands could encounter the same problems on FreeBSD. That's a problem for Python to deal with. > So the question is whether I should do what FreeBSD did, which will > satisfy those people who are freaking out and whinging about how > Linux could allow stupidly written or deployed Python scripts to get > cryptographically insecure bytes, by removing that option from Python > developers. Or should I remove that one line from changes in the > random.git patch series, and allow /dev/urandom to be used even when > it might be insecure, so as to satisfy all of the people who are > freaking out and whinging about the fact that a stupidly written > and/or deployed Python script might block during early boot and hang a > system?
> > Note that I've tried to do what I can to make the time that > /dev/urandom might block as small as possible, but at the end of the > day, there is still the question of whether I should remove the choice > re: blocking from userspace, a la FreeBSD, or not. And either way, > some number of people will be whinging and freaking out. Which is why > I'm completely sympathetic to how Guido might be getting a little > exasperated over this whole thread. :-) I don't know that we need to talk about removing the choice. I understand the desire to commit to backwards compatibility, of course I do. My problem with /dev/urandom is not that it *exists*, per se: all kinds of stupid stuff exists for the sake of backward compatibility. My problem with /dev/urandom is that it's a trap, lying in wait for someone who doesn't know enough about the problem they're solving to step into it. And it's the worst kind of trap: it's one you don't know you've stepped in. Nothing about the failure mode of /dev/urandom is obvious. Worse, well-written apps that try their best to do the right thing can still step into that failure mode if they're run in a situation that they weren't expecting (e.g. on an embedded device without hardware RNG or early in the boot process).
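[Editor's note: the trap Cory describes is easy to demonstrate -- reading the device gives no indication of whether the pool behind it was ever seeded. A sketch; the device path assumes a Unix-like system.]

```python
import os

# /dev/urandom never signals the unseeded state: this read "succeeds"
# even on a freshly booted system whose pool holds no entropy yet, and
# nothing in the returned bytes tells you whether they are predictable.
if os.path.exists("/dev/urandom"):
    with open("/dev/urandom", "rb") as f:
        maybe_secure = f.read(16)  # 16 bytes -- secure or not, no way to tell
```

Contrast this with getrandom(2), which can block or fail until the pool is initialized: the legacy device has no error path at all, which is why the failure mode is invisible to even careful callers.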
If you don?t change, you?ll forever have people like me saying that /dev/urandom is dangerous, and that its behaviour in the unseeded/poorly-seeded state is a misfeature. I trust you?ll understand when I tell you that that opinion has nothing to do with *you* or the Linux kernel maintainership. This is all about the way software security evolves: things that used to be ok start to become not ok over time. We learn, we improve. Of course, if you do change the behaviour, you?ll rightly have programmers stumble onto this exact problem. They?ll be unhappy too. And the worst part of all of this is that neither side of that debate is *wrong*: they just prioritise different things. Guido, Larry, and friends aren?t wrong, any more than I am: we just rate the different concerns differently. That?s fine: after all, it?s probably why Guido invented and maintains an extremely popular programming language and I haven?t and never will! I have absolutely no problem with breaking ?working? code if I believe that that code is exposing users to risks they aren?t aware of (you can check my OSS record to prove it, and I?m happy to provide references). The best advice I can give anyone in this debate, on either side, is to make decisions that you can live with. Consider the consequences, consider the promises you?ve made to users, and then do what you think is right. Guido and Larry have decided to go with backward-compatibility: fine. They?re responsible, the buck stops with them, they know that. The same is true for you, Ted, with the /dev/urandom device. If it were me, I?d change the behaviour of /dev/urandom in a heartbeat. But then again, I?m not Ted Ts?o, and I suspect that instinct is part of why. For my part, thanks for participating, Ted. It?s good to know you know what the problems are, even if your solution isn?t necessarily the one I?d go for. =) Cory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From tytso at mit.edu Sun Jun 12 17:10:38 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Sun, 12 Jun 2016 17:10:38 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> Message-ID: <20160612211038.GF1986@thunk.org> On Sun, Jun 12, 2016 at 11:07:22AM -0700, Nathaniel Smith wrote: > But for example, if a process is actively blocked waiting > for the initial entropy, one could spawn a kernel thread that keeps the > system from quiescing by attempting to scrounge up entropy as fast as > possible, via whatever mechanisms are locally appropriate (e.g. doing a > busy-loop racing two clocks against each other, or just scheduling lots of > interrupts -- which I guess is the same thing, more or less). There's a lot of snake oil, or at least, hand waving, that goes on with respect to what will actually work to gather randomness. One of the worst possible choices is a standard, kernel-defined workload that tries to just busy loop two clocks against each other. For one thing, on many embedded systems, all of your clocks are generated off of a single master oscillator anyway. And in early boot, it's not realistic for the kernel to be able to measure network interrupt timings and radio strength indicators from the WiFi, which ultimately is going to be much more likely to be unpredictable by an outside attacker sitting in Fort Meade than pretending that you can just "schedule lots of interrupts". 
Again, part of the problem here is that if you really want to be secure, it needs to be a full stack perspective, where the hardware designers, the OS developers, and the application level developers are all working together. If one side tries to exert a strong "somebody else's problem field", it's very likely the end solution isn't going to be secure. Because in many cases this is simply not practical, we all have to make assumptions at the OS and C-Python interpreter level, and hope that the assumptions that we make are are conservative enough. > Is this an approach that you've considered? Ultimately, the arguments made by approaches such as Jitterbug are, to put it succiently and perhaps a little unfairly, "gee whillikers, the Intel L1/L2 cache hierarchy is really complicated and it's a closed hardware implementation so no one can understand it, and besides, the statistical analysis of the output looks good". To which I would say, "the first argument is an argument of security through ignorance", and "AES(NSA_KEY, COUNTER++)" also has really great statistical results, and if you don't know the NSA_KEY, it will look very strong and as far as we know, we wouldn't be able to distinguish it from truly secure random number generator --- but it really isn't secure. So yeah, I don't buy it. In order for it to be secure, we need to be grabbing measurements which can't be replicated or determined by a remote attacker. So having the kernel kick off a kernel thread is not going to be useful unless we can mix in entropy from the user, or the workload, or the local configuration, or from the local environment. (Using RSSI is helpful because the remote attacker might not know whether your mobile handset is in the knapsack under the table, or on the desk, and that will change the RSSI numbers.) 
Remember, the whole *point* of modern CPU designs is that the huge amounts of engineering effort is put into making the CPU be predictable, and so spawning a kernel thread in isolation isn't going perform magic in terms of getting guaranteed unpredictability. > FWIW, the systemd thing is a red herring -- this was debian's configuration > of a particular daemon that is not maintained by the systemd project, and > the exact same thing would have happened with sysvinit if debian had tried > using python 3.5 early in their rcS. It's not a daemon. It's the script in /lib/systemd/system-generators/systemd-crontab-generator, and it's needed because systemd subsumed the cron daemon, and developers who wanted to not break user's existing crontab files turned to it. I suppose you are technically correct that it is not mainained by systemd, but the need for it was generated out of systemd's lack of concern of backwards compatibility. Because FreeBSD and Mac OS are not using systemd, they are not likely to run into this problem. I will grant that if they decided to try to run a python script out of their /etc/rc script, they would run into the same problem. - Ted From tytso at mit.edu Sun Jun 12 19:28:03 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Sun, 12 Jun 2016 19:28:03 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> Message-ID: <20160612232803.GB17328@thunk.org> On Sun, Jun 12, 2016 at 09:01:09PM +0100, Cory Benfield wrote: > My problem with /dev/urandom is that it?s a trap, lying in wait for > someone who doesn?t know enough about the problem they?re solving to > step into it. And my answer to that question is absent backwards compatibility concerns, use getrandom(2) on Linux, or getentropy(2) on *BSD, and be happy. Don't use /dev/urandom; use getrandom(2) instead. That way you also solve a number of other problems such as the file descriptor DOS attack issue, etc. The problem with Python is that you *do* have backwards compatibility concerns. At which point you are faced with the same issues that we are in the kernel; except I gather than that the commitment to backwards compatibility isn't quite as absolute (although it is strong). Which is why I've been trying very hard not to tell python-dev what to do, but rather to give you folks the best information I can, and then encouraging you to do whatever seems most "Pythony" --- which might or might not be the same as the decisions we've made in the kernel. Cheers, - Ted P.S. BTW, I probably won't change the behaviour of /dev/urandom to make it be blocking. Before I found out about Pyhton Bug #26839, I actually had patches that did make /dev/urandom blocking, and they were planned to for the next kernel merge window. But ultimately, the reason why I won't is because there is a set of real users (Debian Stretch users on Amazon AWS and Google GCE) for which if I changed how /dev/urandom worked, then I would be screwing them over, even if Python 3.5.2 falls back to /dev/urandom. 
It's not a problem for bare metal hardware and cloud systems with virtio-rng; I have patches that will take care of those scenarios. Unfortunately, both AWS and GCE don't support virtio-rng currently, and as much as some poeple are worried about the hypothetical problems of stupidly written/deployed Python scripts that try to generate long-term secrets during early boot, weighed against the very real prospect of user lossage on two of the most popular Cloud environments out there --- it's simply no contest. From larry at hastings.org Sun Jun 12 20:21:14 2016 From: larry at hastings.org (Larry Hastings) Date: Sun, 12 Jun 2016 17:21:14 -0700 Subject: [Python-Dev] Reminder: 3.6.0a2 snapshot 2016-06-13 12:00 UTC In-Reply-To: <23B6CAA5-6E07-4F2B-898F-B9EABF8E9BD0@python.org> References: <23B6CAA5-6E07-4F2B-898F-B9EABF8E9BD0@python.org> Message-ID: <575DFC7A.7000403@hastings.org> On 06/10/2016 03:23 PM, Ned Deily wrote: > Also note that Larry has announced plans to do a 3.5.2 release candidate sometime this weekend and Benjamin plans to do a 2.7.12 release candidate. So get important maintenance release fixes in ASAP. To clarify: /both/ 3.5.2rc1 /and/ 3.4.5rc1 were tagged yesterday and will ship later today. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Jun 12 21:53:54 2016 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 12 Jun 2016 18:53:54 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160612232803.GB17328@thunk.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> Message-ID: On Sun, Jun 12, 2016 at 4:28 PM, Theodore Ts'o wrote: > P.S. 
BTW, I probably won't change the behaviour of /dev/urandom to > make it be blocking. Before I found out about Python Bug #26839, I > actually had patches that did make /dev/urandom blocking, and they > were planned for the next kernel merge window. But ultimately, the > reason why I won't is because there is a set of real users (Debian > Stretch users on Amazon AWS and Google GCE) for which if I changed how > /dev/urandom worked, then I would be screwing them over, even if > Python 3.5.2 falls back to /dev/urandom. It's not a problem for bare > metal hardware and cloud systems with virtio-rng; I have patches that > will take care of those scenarios. > > Unfortunately, both AWS and GCE don't support virtio-rng currently, > and as much as some people are worried about the hypothetical problems > of stupidly written/deployed Python scripts that try to generate > long-term secrets during early boot, weighed against the very real > prospect of user lossage on two of the most popular Cloud environments > out there --- it's simply no contest. Speaking of full-stack perspectives, would it affect your decision if Debian Stretch were made robust against blocking /dev/urandom on AWS/GCE? Because I think we could find lots of people who would be overjoyed to fix Stretch before the next merge window even opens (AFAICT the quick fix is literally a 1 line patch), if that allowed the blocking /dev/urandom patches to go in upstream... (It looks like Jessie isn't affected, because while Jessie does provide a systemd-cron package for those who decide to install it, Jessie's systemd-cron is still using python2, python2 doesn't have hash randomization so it doesn't touch /dev/urandom at startup, and systemd-cron doesn't have any code that would trigger access to /dev/urandom otherwise.
It looks like Xenial *is* affected, because they ship systemd-cron with python3, but their python3 is still unconditionally using getrandom() in blocking mode, so they need to patch that regardless, and could just as easily make it robust against blocking /dev/urandom at the same time. I don't understand the RPM world as well, but I can't find any evidence that Fedora or SuSE ship systemd-cron at all.) -n -- Nathaniel J. Smith -- https://vorpus.org From larry at hastings.org Sun Jun 12 23:16:12 2016 From: larry at hastings.org (Larry Hastings) Date: Sun, 12 Jun 2016 20:16:12 -0700 Subject: [Python-Dev] [RELEASED] Python 3.4.5rc1 and Python 3.5.2rc1 are now available Message-ID: <575E257C.5020808@hastings.org> On behalf of the Python development community and the Python 3.4 and Python 3.5 release teams, I'm pleased to announce the availability of Python 3.4.5rc1 and Python 3.5.2rc1. Python 3.4 is now in "security fixes only" mode. This is the final stage of support for Python 3.4. All changes made to Python 3.4 since Python 3.4.4 should be security fixes only; conventional bug fixes are not accepted. Also, Python 3.4.5rc1 and all future releases of Python 3.4 will only be released as source code--no official binary installers will be produced. Python 3.5 is still in active "bug fix" mode. Python 3.5.2rc1 contains many incremental improvements over Python 3.5.1. Both these releases are "release candidates". They should not be considered the final releases, although the final releases should contain only minor differences. Python users are encouraged to test with these releases and report any problems they encounter. You can find Python 3.4.5rc1 here: https://www.python.org/downloads/release/python-345rc1/ And you can find Python 3.5.2rc1 here: https://www.python.org/downloads/release/python-352rc1/ Python 3.4.5 final and Python 3.5.2 final are both scheduled for release on June 26th, 2016. 
Happy Pythoneering, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Sun Jun 12 23:35:25 2016 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 12 Jun 2016 20:35:25 -0700 Subject: [Python-Dev] [RELEASE] Python 2.7.12 release candidate 1 Message-ID: <1465788925.287521.635663601.5B6BD6AA@webmail.messagingengine.com> Python 2.7.12 release candidate 1 is now available for download. This is a preview release of the next bugfix release in the Python 2.7.x series. Assuming no horrible regressions are located, a final release will follow in two weeks. Downloads for 2.7.12rc1 can be found python.org: https://www.python.org/downloads/release/python-2712rc1/ The complete changelog may be viewed at https://hg.python.org/cpython/raw-file/v2.7.12rc1/Misc/NEWS Please test the pre-release and report any bugs to https://bugs.python.org Servus, Benjamin From steve at pearwood.info Mon Jun 13 00:29:31 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 13 Jun 2016 14:29:31 +1000 Subject: [Python-Dev] Stop using timeit, use perf.timeit! In-Reply-To: <1465688598.1078301.634935153.258EE6BF@webmail.messagingengine.com> References: <20160611014549.GK27919@ando.pearwood.info> <1465688598.1078301.634935153.258EE6BF@webmail.messagingengine.com> Message-ID: <20160613042931.GS27919@ando.pearwood.info> On Sat, Jun 11, 2016 at 07:43:18PM -0400, Random832 wrote: > On Fri, Jun 10, 2016, at 21:45, Steven D'Aprano wrote: > > If you express your performances as speeds (as "calculations per > > second") then the harmonic mean is the right way to average them. > > That's true in so far as you get the same result as if you were to take > the arithmetic mean of the times and then converted from that to > calculations per second. Is there any other particular basis for > considering it "right"? I think this is getting off-topic, so extended discussion should probably go off-list. 
But the brief answer is that it gives a physically meaningful result if you replace each of the data points with the mean. Which specific mean you use depends on how you are using the data points. http://mathforum.org/library/drmath/view/69480.html Consider the question: Dave can paint a room in 5 hours, and Sue can paint the same room in 3 hours. How long will it take them, working together, to paint the room? The right answer can be found the long way: Dave paints 1/5 of a room per hour, and Sue paints 1/3 of a room per hour, so together they paint (1/5+1/3) = 8/15 of a room per hour. So to paint one full room, it takes 15/8 = 1.875 hours. (Sanity check: after 1.875 hours, Sue has painted 1.875/3 of the room, or 62.5%. In that same time, Dave has painted 1.875/5 of the room, or 37.5%. Add the percentages together, and you have 100% of the room.) Using the harmonic mean, the problem is simple: data = 5, 3 # time taken per person mean = 3.75 # time taken per person on average Since they are painting the room in parallel, each person need only paint half the room on average, giving total time of: 3.75/2 = 1.875 hours If we were to use the arithmetic mean (5+3)/2 = 4 hours, we'd get the wrong answer. -- Steve From tytso at mit.edu Mon Jun 13 08:26:54 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Mon, 13 Jun 2016 08:26:54 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> Message-ID: <20160613122654.GE17328@thunk.org> On Sun, Jun 12, 2016 at 06:53:54PM -0700, Nathaniel Smith wrote: > > Speaking of full-stack perspectives, would it affect your decision if > Debian Stretch were made robust against blocking /dev/urandom on > AWS/GCE? Because I think we could find lots of people who would be > overjoyed to fix Stretch before the next merge window even opens > (AFAICT the quick fix is literally a 1 line patch), if that allowed > the blocking /dev/urandom patches to go in upstream... Alas, it's not just Debian. Apparently it breaks the boot on Openwrt as well as Ubuntu Quantal: https://lkml.org/lkml/2016/6/13/48 https://lkml.org/lkml/2016/5/31/599 (Yay for an automated test infrastructure that fires off as soon as you push to an externally visible git repository. :-) I haven't investigated to see exactly *why* it's blowing up on these userspace setups, but it's a great reminder for why changing an established interface is something that has to be done very carefully indeed. - Ted From leewangzhong+python at gmail.com Mon Jun 13 09:35:20 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Mon, 13 Jun 2016 09:35:20 -0400 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Message-ID: I am. I was just wondering if there was an in-progress effort I should be looking at, because I am interested in extensions to it. P.S.: If anyone is missing the relevance, Raymond Hettinger's compact dicts are inherently ordered until a delitem happens.[1] That could be "good enough" for many purposes, including kwargs and class definition. 
If CPython implements efficient compact dicts, it would be easier to propose order-preserving (or initially-order-preserving) dicts in some places in the standard. [1] Whether delitem preserves order depends on whether you want to allow gaps in your compact entry table. PyPy implemented compact dicts and chose(?) to make dicts ordered. On Saturday, June 11, 2016, Eric Snow wrote: > On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee > > wrote: > > Eric, have you any work in progress on compact dicts? > > Nope. I presume you are talking the proposal Raymond made a while back. > > -eric > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jun 13 12:34:07 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jun 2016 09:34:07 -0700 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Message-ID: <575EE07F.8040102@stoneleaf.us> On 06/10/2016 02:13 PM, Franklin? Lee wrote: > P.S.: If anyone is missing the relevance, Raymond Hettinger's compact > dicts are inherently ordered until a delitem happens.[1] That could be > "good enough" for many purposes, including kwargs and class definition. It would be great for kwargs, but not for class definition: del's can happen there, so we need PEP 520 with OrderedDict so the definition order is not lost when an item is deleted during class creation. -- ~Ethan~ From berker.peksag at gmail.com Mon Jun 13 14:12:56 2016 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Mon, 13 Jun 2016 21:12:56 +0300 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace (round 3) In-Reply-To: References: Message-ID: On Sun, Jun 12, 2016 at 5:37 AM, Eric Snow wrote: > The following code demonstrates roughly equivalent semantics for the > default behavior:: > > class Meta(type): > def __prepare__(cls, *args, **kwargs): Shouldn't this be wrapped with a classmethod decorator? +1 from me. 
--Berker
URL: From gvanrossum at gmail.com Mon Jun 13 17:34:56 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon, 13 Jun 2016 14:34:56 -0700 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Message-ID: Can someone block Franklin until his mailer stops resending this message? --Guido (mobile) On Jun 13, 2016 2:26 PM, "Franklin? Lee" wrote: > I am. I was just wondering if there was an in-progress effort I should be > looking at, because I am interested in extensions to it. > > P.S.: If anyone is missing the relevance, Raymond Hettinger's compact > dicts are inherently ordered until a delitem happens.[1] That could be > "good enough" for many purposes, including kwargs and class definition. If > CPython implements efficient compact dicts, it would be easier to propose > order-preserving (or initially-order-preserving) dicts in some places in > the standard. > > [1] Whether delitem preserves order depends on whether you want to allow > gaps in your compact entry table. PyPy implemented compact dicts and > chose(?) to make dicts ordered. > > On Saturday, June 11, 2016, Eric Snow wrote: > >> On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee >> wrote: >> > Eric, have you any work in progress on compact dicts? >> >> Nope. I presume you are talking the proposal Raymond made a while back. >> >> -eric >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mail at timgolden.me.uk Mon Jun 13 18:08:26 2016 From: mail at timgolden.me.uk (Tim Golden) Date: Mon, 13 Jun 2016 23:08:26 +0100 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> Message-ID: <0b940ccd-33f1-ae21-5295-025bb2b46006@timgolden.me.uk> I've set him to moderation for now. Beyond that we'd have to unsubscribe him altogether and ask him to resubscribe later. TJG On 13/06/2016 22:34, Guido van Rossum wrote: > Can someone block Franklin until his mailer stops resending this message? > > --Guido (mobile) > > On Jun 13, 2016 2:26 PM, "Franklin? Lee" > wrote: > > I am. I was just wondering if there was an in-progress effort I > should be looking at, because I am interested in extensions to it. > > P.S.: If anyone is missing the relevance, Raymond > Hettinger's compact dicts are inherently ordered until a > delitem happens.[1] That could be "good enough" for many purposes, > including kwargs and class definition. If CPython implements > efficient compact dicts, it would be easier to propose > order-preserving (or initially-order-preserving) dicts in some > places in the standard. > > [1] Whether delitem preserves order depends on whether you want to > allow gaps in your compact entry table. PyPy implemented compact > dicts and chose(?) to make dicts ordered. > > On Saturday, June 11, 2016, Eric Snow > wrote: > > On Fri, Jun 10, 2016 at 11:54 AM, Franklin? Lee > wrote: > > Eric, have you any work in progress on compact dicts? > > Nope. I presume you are talking the proposal Raymond made a > while back. 
> > -eric > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/mail%40timgolden.me.uk > From python at mrabarnett.plus.com Mon Jun 13 20:05:06 2016 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 14 Jun 2016 01:05:06 +0100 Subject: [Python-Dev] PEP 468 In-Reply-To: <575EE07F.8040102@stoneleaf.us> References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> <575EE07F.8040102@stoneleaf.us> Message-ID: On 2016-06-13 17:34, Ethan Furman wrote: > On 06/10/2016 02:13 PM, Franklin? Lee wrote: > >> P.S.: If anyone is missing the relevance, Raymond Hettinger's compact >> dicts are inherently ordered until a delitem happens.[1] That could be >> "good enough" for many purposes, including kwargs and class definition. > > It would be great for kwargs, but not for class definition: del's can > happen there, so we need PEP 520 with OrderedDict so the definition > order is not lost when an item is deleted during class creation. > The order can be lost when an item is deleted because it moves the last item into the 'hole' left by the deleted item. This could be avoided by expanding the items to include the index of the 'previous' and 'next' item, so that they could be handled like a doubly-linked list. The disadvantage would be that it would use more memory. 
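[Editorial note: MRAB's prev/next-index idea can be sketched in a few lines of Python. This is only an illustration of the idea, not CPython's or PyPy's actual implementation, and the class name is made up. Each entry in the compact table carries `prev`/`next` indices, so deleting a key unlinks its entry instead of moving the last entry into its slot, and insertion order survives deletion; the abandoned slots are the extra memory cost MRAB mentions.]

```python
class LinkedCompactDict:
    """Illustrative ordered mapping: compact entry table + linked indices."""

    def __init__(self):
        self._index = {}    # key -> position in self._entries
        self._entries = []  # [key, value, prev, next] records; None = hole
        self._head = -1     # first entry in insertion order
        self._tail = -1     # last entry in insertion order

    def __setitem__(self, key, value):
        pos = self._index.get(key)
        if pos is not None:          # overwriting keeps the original position
            self._entries[pos][1] = value
            return
        pos = len(self._entries)
        self._entries.append([key, value, self._tail, -1])
        if self._tail == -1:
            self._head = pos
        else:
            self._entries[self._tail][3] = pos
        self._tail = pos
        self._index[key] = pos

    def __getitem__(self, key):
        return self._entries[self._index[key]][1]

    def __delitem__(self, key):
        pos = self._index.pop(key)   # raises KeyError for missing keys
        _, _, prev, nxt = self._entries[pos]
        # Unlink the entry rather than moving the last entry into its slot.
        if prev == -1:
            self._head = nxt
        else:
            self._entries[prev][3] = nxt
        if nxt == -1:
            self._tail = prev
        else:
            self._entries[nxt][2] = prev
        self._entries[pos] = None    # leave a hole: the memory cost noted above

    def __iter__(self):
        pos = self._head
        while pos != -1:
            key, _, _, nxt = self._entries[pos]
            yield key
            pos = nxt
```

With this layout, `del d["a"]` followed by more insertions still yields keys in insertion order, which a swap-from-the-back compact dict would not guarantee.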
From larry at hastings.org Mon Jun 13 20:47:12 2016 From: larry at hastings.org (Larry Hastings) Date: Mon, 13 Jun 2016 17:47:12 -0700 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> <575EE07F.8040102@stoneleaf.us> Message-ID: <575F5410.6080106@hastings.org> On 06/13/2016 05:05 PM, MRAB wrote: > This could be avoided by expanding the items to include the index of > the 'previous' and 'next' item, so that they could be handled like a > doubly-linked list. > > The disadvantage would be that it would use more memory. Another, easier technique: don't fill holes. Same disadvantage (increased memory use), but easier to write and maintain. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Mon Jun 13 21:14:26 2016 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 14 Jun 2016 02:14:26 +0100 Subject: [Python-Dev] PEP 468 In-Reply-To: <575F5410.6080106@hastings.org> References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> <575EE07F.8040102@stoneleaf.us> <575F5410.6080106@hastings.org> Message-ID: On 2016-06-14 01:47, Larry Hastings wrote: > On 06/13/2016 05:05 PM, MRAB wrote: >> This could be avoided by expanding the items to include the index of >> the 'previous' and 'next' item, so that they could be handled like a >> doubly-linked list. >> >> The disadvantage would be that it would use more memory. > > Another, easier technique: don't fill holes. Same disadvantage > (increased memory use), but easier to write and maintain. > When iterating over the dict, you'd need to skip over the holes, so it would be a good idea to compact it a some point, when there are too many holes. 
From njs at pobox.com Mon Jun 13 21:33:57 2016 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 13 Jun 2016 18:33:57 -0700 Subject: [Python-Dev] PEP 468 In-Reply-To: References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> <575EE07F.8040102@stoneleaf.us> <575F5410.6080106@hastings.org> Message-ID: On Jun 13, 2016 6:16 PM, "MRAB" wrote: > > On 2016-06-14 01:47, Larry Hastings wrote: >> >> On 06/13/2016 05:05 PM, MRAB wrote: >>> >>> This could be avoided by expanding the items to include the index of >>> the 'previous' and 'next' item, so that they could be handled like a >>> doubly-linked list. >>> >>> The disadvantage would be that it would use more memory. >> >> >> Another, easier technique: don't fill holes. Same disadvantage >> (increased memory use), but easier to write and maintain. >> > When iterating over the dict, you'd need to skip over the holes, so it would be a good idea to compact it a some point, when there are too many holes. Right -- but if you wait for some ratio of holes to filled space before compacting, you can amortize the cost down, and have a good big-O complexity for both del and iteration simultaneously. Same basic principle as using proportional overallocation when appending to a list, just in reverse. I believe this is what pypy's implementation actually does. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Mon Jun 13 22:23:10 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jun 2016 19:23:10 -0700 Subject: [Python-Dev] PEP 468 In-Reply-To: <575F5410.6080106@hastings.org> References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> <575EE07F.8040102@stoneleaf.us> <575F5410.6080106@hastings.org> Message-ID: <575F6A8E.3010803@stoneleaf.us> On 06/13/2016 05:47 PM, Larry Hastings wrote: > On 06/13/2016 05:05 PM, MRAB wrote: >> This could be avoided by expanding the items to include the index of >> the 'previous' and 'next' item, so that they could be handled like a >> doubly-linked list. >> >> The disadvantage would be that it would use more memory. > > Another, easier technique: don't fill holes. Same disadvantage > (increased memory use), but easier to write and maintain. I hope this is just an academic discussion: suddenly having Python's dicts grow continuously is going to have nasty consequences somewhere. -- ~Ethan~ From nad at python.org Mon Jun 13 23:57:02 2016 From: nad at python.org (Ned Deily) Date: Mon, 13 Jun 2016 23:57:02 -0400 Subject: [Python-Dev] Python 3.6.0a2 is now available Message-ID: On behalf of the Python development community and the Python 3.6 release team, I'm happy to announce the availability of Python 3.6.0a2. 3.6.0a2 is the first of four planned alpha releases of Python 3.6, the next major release of Python. During the alpha phase, Python 3.6 remains under heavy development: additional features will be added and existing features may be modified or deleted. Please keep in mind that this is a preview release and its use is not recommended for production environments. You can find Python 3.6.0a2 here: https://www.python.org/downloads/release/python-360a2/ The next release of Python 3.6 will be 3.6.0a3, currently scheduled for 2016-07-11. Enjoy! 
--Ned -- Ned Deily nad at python.org -- [] From g.brandl at gmx.net Tue Jun 14 02:17:13 2016 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 14 Jun 2016 08:17:13 +0200 Subject: [Python-Dev] Current Python 3.2 status? In-Reply-To: References: Message-ID: On 06/11/2016 07:41 PM, Chi Hsuan Yen wrote: > > > On Sun, Jun 12, 2016 at 1:02 AM, Berker Peksa? > wrote: > > On Sat, Jun 11, 2016 at 8:59 AM, Chi Hsuan Yen > wrote: > > Hello all, > > > > Georg said in February that 3.2.7 is going to be released, and now it's > > June. Will it ever be released? > > Hi, > > It was delayed because of a security issue. See Georg's email at > https://mail.python.org/pipermail/python-dev/2016-February/143400.html > > --Berker > > > Thanks for that. I'm just curious what's happening on the 3.2 branch. Patches being available now, I'll do the releases this weekend. Georg From nikita at nemkin.ru Tue Jun 14 05:41:39 2016 From: nikita at nemkin.ru (Nikita Nemkin) Date: Tue, 14 Jun 2016 14:41:39 +0500 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: Is there any rationale for rejecting alternatives like: 1. Adding standard metaclass with ordered namespace. 2. Adding `namespace` or `ordered` args to the default metaclass. 3. Making compiler fill in __definition_order__ for every class (just like __qualname__) without touching the runtime. ? To me, any of the above seems preferred to complicating the core part of the language forever. The vast majority of Python classes don't care about their member order, this is minority use case receiving majority treatment. Also, wiring OrderedDict into class creation means elevating it from a peripheral utility to indispensable built-in type. 
From asimkostas at gmail.com Tue Jun 14 04:44:01 2016 From: asimkostas at gmail.com (asimkon) Date: Tue, 14 Jun 2016 11:44:01 +0300 Subject: [Python-Dev] mod_python compilation error in VS 2008 for py2.7.1 Message-ID: I would like to ask you a technical question regarding python module compilation for python 2.7.1. I want to compile mod_python library for Apache 2.2 and py2.7 on Win32 in order to use it for psp - py scripts that i have written. I tried to compile it using VS 2008 (VC++) and unfortunately i get an error on pyconfig.h (Py2.7/include) error C2632: int followed by int is illegal. This problem occurs when i try to run the bat file that exists on mod_python/dist folder. Any idea or suggestion what should i do in order to run it on Win 7 Pro (win 32) environment and produce the final apache executable module (.so). For your better assistance, i attach you the necessary files and error_log (ouput that i get during compilation process). I have posted the same question here , but unfortunately i had had no luck! Additionally i give you the compilation instructions that i follow (used also MinGW-w64 and get the same error) in order to produce the final output! Compiling Open a command prompt with VS2008 support. The easiest way to do this is to use "Start | All Programs | Microsoft Visual Studio 2008 | Visual Studio Tools | Visual Studio 2008 Command Prompt". (This puts the VS2008 binaries in the path and sets up the lib/include environmental variables for the Platform SDK.) 1.cd to the mod_python\dist folder. 2.Tell mod_python where Apache is: set APACHESRC=C:\Apache 3. Run build_installer.bat. If it succeeds, an installer.exe will be created in a subfolder. Run that install the module. Kind Regards Kostas Asimakopoulos -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: unistd.h Type: text/x-chdr Size: 1753 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: getopt.h Type: text/x-chdr Size: 18564 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mod_python_error.docx Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document Size: 17248 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pyconfig.h Type: text/x-chdr Size: 22098 bytes Desc: not available URL: From leewangzhong+python at gmail.com Tue Jun 14 03:47:56 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Tue, 14 Jun 2016 03:47:56 -0400 Subject: [Python-Dev] PEP 468 In-Reply-To: <575F6A8E.3010803@stoneleaf.us> References: <1465501262.461706.633110089.19D9C3C8@webmail.messagingengine.com> <575EE07F.8040102@stoneleaf.us> <575F5410.6080106@hastings.org> <575F6A8E.3010803@stoneleaf.us> Message-ID: Compact OrderedDicts can leave gaps, and once in a while compactify. For example, whenever the entry table is full, it can decide whether to resize (and only copy non-gaps), or just compactactify Compact regular dicts can swap from the back and have no gaps. I don't see the point of discussing these details. Isn't it enough to say that these are solvable problems, which we can worry about if/when someone actually decides to sit down and implement compact dicts? P.S.: Sorry about the repeated emails. I think it was the iOS Gmail app. On Jun 13, 2016 10:23 PM, "Ethan Furman" wrote: > > On 06/13/2016 05:47 PM, Larry Hastings wrote: >> >> On 06/13/2016 05:05 PM, MRAB wrote: > > >>> This could be avoided by expanding the items to include the index of >>> the 'previous' and 'next' item, so that they could be handled like a >>> doubly-linked list. >>> >>> The disadvantage would be that it would use more memory. 
>> >> >> Another, easier technique: don't fill holes. Same disadvantage >> (increased memory use), but easier to write and maintain. > > > I hope this is just an academic discussion: suddenly having Python's dicts grow continuously is going to have nasty consequences somewhere. > > -- > ~Ethan~ -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Jun 14 11:07:14 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 15 Jun 2016 01:07:14 +1000 Subject: [Python-Dev] [Python-checkins] cpython (3.5): Fix os.urandom() using getrandom() on Linux In-Reply-To: <20160614143358.10086.1428.9B00D7BE@psf.io> References: <20160614143358.10086.1428.9B00D7BE@psf.io> Message-ID: <20160614150713.GX27919@ando.pearwood.info> Is this right? I thought we had decided that os.urandom should *not* fall back on getrandom on Linux? On Tue, Jun 14, 2016 at 02:36:27PM +0000, victor. stinner wrote: > https://hg.python.org/cpython/rev/e028e86a5b73 > changeset: 102033:e028e86a5b73 > branch: 3.5 > parent: 102031:a36238de31ae > user: Victor Stinner > date: Tue Jun 14 16:31:35 2016 +0200 > summary: > Fix os.urandom() using getrandom() on Linux > > Issue #27278: Fix os.urandom() implementation using getrandom() on Linux. > Truncate size to INT_MAX and loop until we collected enough random bytes, > instead of casting a directly Py_ssize_t to int. > > files: > Misc/NEWS | 4 ++++ > Python/random.c | 2 +- > 2 files changed, 5 insertions(+), 1 deletions(-) > > > diff --git a/Misc/NEWS b/Misc/NEWS > --- a/Misc/NEWS > +++ b/Misc/NEWS > @@ -13,6 +13,10 @@ > Library > ------- > > +- Issue #27278: Fix os.urandom() implementation using getrandom() on Linux. > + Truncate size to INT_MAX and loop until we collected enough random bytes, > + instead of casting a directly Py_ssize_t to int. > + > - Issue #26386: Fixed ttk.TreeView selection operations with item id's > containing spaces. 
> > diff --git a/Python/random.c b/Python/random.c > > --- a/Python/random.c > > +++ b/Python/random.c > > @@ -143,7 +143,7 @@ > > to 1024 bytes */ > > n = Py_MIN(size, 1024); > > #else > > - n = size; > > + n = Py_MIN(size, INT_MAX); > > #endif > > > > errno = 0; > > > > -- > > Repository URL: https://hg.python.org/cpython > > _______________________________________________ > > Python-checkins mailing list > > Python-checkins at python.org > > https://mail.python.org/mailman/listinfo/python-checkins From steve at pearwood.info Tue Jun 14 11:19:35 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 15 Jun 2016 01:19:35 +1000 Subject: [Python-Dev] Why does base64 return bytes? Message-ID: <20160614151935.GY27919@ando.pearwood.info> Normally I'd take a question like this to Python-List, but this question has turned out to be quite divisive, with people having strong opinions but no definitive answer. So I thought I'd ask here and hope that some of the core devs would have an idea. Why does base64 encoding in Python return bytes? base64.b64encode takes bytes as input and returns bytes. Some people are arguing that this is wrong behaviour, as RFC 3548 specifies that Base64 should transform bytes to characters: https://tools.ietf.org/html/rfc3548.html albeit US-ASCII characters. E.g.: The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. [...] Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string. Are they misinterpreting the standard? Has Python got it wrong? Is there a good reason for returning bytes? I see that other languages choose different strategies.
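For concreteness, the behaviour in question is easy to check (a quick Python 3 sketch):

```python
import base64

# b64encode maps arbitrary bytes to bytes whose values are all codes of
# printable ASCII characters; it never returns a str.
encoded = base64.b64encode(b"\x00\xff hello")
print(type(encoded))           # <class 'bytes'>
print(encoded)                 # b'AP8gaGVsbG8='

# Getting a str requires an explicit (and trivially safe) ASCII decode.
text = encoded.decode("ascii")
print(type(text))              # <class 'str'>

# The round trip recovers the original bytes exactly.
assert base64.b64decode(encoded) == b"\x00\xff hello"
```

That bytes-in/bytes-out shape is the baseline that the languages below diverge from.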
Microsoft's languages C#, F# and VB (plus their C++ compiler) take an array of bytes as input, and outputs a UTF-16 string: https://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx Java's base64 encoder takes and returns bytes: https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html and Javascript's Base64 encoder takes input as UTF-16 encoded text and returns the same: https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding I'm not necessarily arguing that Python's strategy is the wrong one, but I am interested in what (if any) reasons are behind it. Thanks in advance, Steve From jelle.zijlstra at gmail.com Tue Jun 14 11:27:01 2016 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Tue, 14 Jun 2016 08:27:01 -0700 Subject: [Python-Dev] [Python-checkins] cpython (3.5): Fix os.urandom() using getrandom() on Linux In-Reply-To: <20160614150713.GX27919@ando.pearwood.info> References: <20160614143358.10086.1428.9B00D7BE@psf.io> <20160614150713.GX27919@ando.pearwood.info> Message-ID: I think this is an issue unrelated to the big discussion from a little while ago. The problem isn't that os.urandom() uses getrandom(), it's that it calls it in a mode that may block. 2016-06-14 8:07 GMT-07:00 Steven D'Aprano : > Is this right? I thought we had decided that os.urandom should *not* > fall back on getrandom on Linux? > > > > On Tue, Jun 14, 2016 at 02:36:27PM +0000, victor. stinner wrote: > > https://hg.python.org/cpython/rev/e028e86a5b73 > > changeset: 102033:e028e86a5b73 > > branch: 3.5 > > parent: 102031:a36238de31ae > > user: Victor Stinner > > date: Tue Jun 14 16:31:35 2016 +0200 > > summary: > > Fix os.urandom() using getrandom() on Linux > > > > Issue #27278: Fix os.urandom() implementation using getrandom() on Linux. > > Truncate size to INT_MAX and loop until we collected enough random bytes, > > instead of casting a directly Py_ssize_t to int. 
> > > > files: > > Misc/NEWS | 4 ++++ > > Python/random.c | 2 +- > > 2 files changed, 5 insertions(+), 1 deletions(-) > > > > > > diff --git a/Misc/NEWS b/Misc/NEWS > > --- a/Misc/NEWS > > +++ b/Misc/NEWS > > @@ -13,6 +13,10 @@ > > Library > > ------- > > > > +- Issue #27278: Fix os.urandom() implementation using getrandom() on > Linux. > > + Truncate size to INT_MAX and loop until we collected enough random > bytes, > > + instead of casting a directly Py_ssize_t to int. > > + > > - Issue #26386: Fixed ttk.TreeView selection operations with item id's > > containing spaces. > > > > diff --git a/Python/random.c b/Python/random.c > > --- a/Python/random.c > > +++ b/Python/random.c > > @@ -143,7 +143,7 @@ > > to 1024 bytes */ > > n = Py_MIN(size, 1024); > > #else > > - n = size; > > + n = Py_MIN(size, INT_MAX); > > #endif > > > > errno = 0; > > > > -- > > Repository URL: https://hg.python.org/cpython > > > _______________________________________________ > > Python-checkins mailing list > > Python-checkins at python.org > > https://mail.python.org/mailman/listinfo/python-checkins > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/jelle.zijlstra%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Tue Jun 14 11:29:25 2016 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 14 Jun 2016 12:29:25 -0300 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160614151935.GY27919@ando.pearwood.info> References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On 14 June 2016 at 12:19, Steven D'Aprano wrote: > Is there > a good reason for returning bytes? 
What about: it returns 0-255 numeric values for each position in a stream, with no clue whatsoever to how those values map to text characters beyond the 32-128 range? Maybe base64.decode could take an "encoding" optional parameter - or there could be a separate 'decode_to_text' method that would explicitly take a text codec name. Otherwise, no, you simply can't take a bunch of bytes and say they represent text. João (see ^- the "ã" ?) From victor.stinner at gmail.com Tue Jun 14 11:35:15 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 14 Jun 2016 17:35:15 +0200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160614151935.GY27919@ando.pearwood.info> References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: To port OpenStack to Python 3, I wrote 4 (2x2) helper functions which accept bytes *and* Unicode as input. xxx_as_bytes() functions return bytes, xxx_as_text() return Unicode: http://docs.openstack.org/developer/oslo.serialization/api.html Victor Le 14 juin 2016 5:21 PM, "Steven D'Aprano" a écrit : > Normally I'd take a question like this to Python-List, but this question > has turned out to be quite diversive, with people having strong opinions > but no definitive answer. So I thought I'd ask here and hope that some > of the core devs would have an idea. > > Why does base64 encoding in Python return bytes? > > base64.b64encode take bytes as input and returns bytes. Some people are > arguing that this is wrong behaviour, as RFC 3548 specifies that Base64 > should transform bytes to characters: > > https://tools.ietf.org/html/rfc3548.html > > albeit US-ASCII characters. E.g.: > > The encoding process represents 24-bit groups of input bits > as output strings of 4 encoded characters. > [...] > Each 6-bit group is used as an index into an array of 64 printable > characters. The character referenced by the index is placed in the > output string. > > Are they misinterpreting the standard? Has Python got it wrong?
> Is there > a good reason for returning bytes? > > I see that other languages choose different strategies. Microsoft's > languages C#, F# and VB (plus their C++ compiler) take an array of bytes > as input, and outputs a UTF-16 string: > > https://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx > > Java's base64 encoder takes and returns bytes: > > https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html > > and Javascript's Base64 encoder takes input as UTF-16 encoded text and > returns the same: > > > https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding > > I'm not necessarily arguing that Python's strategy is the wrong one, but > I am interested in what (if any) reasons are behind it. > > > Thanks in advance, > > > > > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Jun 14 11:38:40 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 14 Jun 2016 17:38:40 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.5): Fix os.urandom() using getrandom() on Linux In-Reply-To: <20160614150713.GX27919@ando.pearwood.info> References: <20160614143358.10086.1428.9B00D7BE@psf.io> <20160614150713.GX27919@ando.pearwood.info> Message-ID: Sorry, I don't have the bandwidth to follow the huge discussion around random in Python. If you want my help, please write a PEP to summarize the discussion. My change fixes an obvious bug. Even if the Python API changes, I don't expect that all the C code will be removed. Victor Le 14 juin 2016 5:11 PM, "Steven D'Aprano" a écrit : > Is this right? I thought we had decided that os.urandom should *not* > fall back on getrandom on Linux?
> > > > On Tue, Jun 14, 2016 at 02:36:27PM +0000, victor. stinner wrote: > > https://hg.python.org/cpython/rev/e028e86a5b73 > > changeset: 102033:e028e86a5b73 > > branch: 3.5 > > parent: 102031:a36238de31ae > > user: Victor Stinner > > date: Tue Jun 14 16:31:35 2016 +0200 > > summary: > > Fix os.urandom() using getrandom() on Linux > > > > Issue #27278: Fix os.urandom() implementation using getrandom() on Linux. > > Truncate size to INT_MAX and loop until we collected enough random bytes, > > instead of casting a directly Py_ssize_t to int. > > > > files: > > Misc/NEWS | 4 ++++ > > Python/random.c | 2 +- > > 2 files changed, 5 insertions(+), 1 deletions(-) > > > > > > diff --git a/Misc/NEWS b/Misc/NEWS > > --- a/Misc/NEWS > > +++ b/Misc/NEWS > > @@ -13,6 +13,10 @@ > > Library > > ------- > > > > +- Issue #27278: Fix os.urandom() implementation using getrandom() on > Linux. > > + Truncate size to INT_MAX and loop until we collected enough random > bytes, > > + instead of casting a directly Py_ssize_t to int. > > + > > - Issue #26386: Fixed ttk.TreeView selection operations with item id's > > containing spaces. > > > > diff --git a/Python/random.c b/Python/random.c > > --- a/Python/random.c > > +++ b/Python/random.c > > @@ -143,7 +143,7 @@ > > to 1024 bytes */ > > n = Py_MIN(size, 1024); > > #else > > - n = size; > > + n = Py_MIN(size, INT_MAX); > > #endif > > > > errno = 0; > > > > -- > > Repository URL: https://hg.python.org/cpython > > > _______________________________________________ > > Python-checkins mailing list > > Python-checkins at python.org > > https://mail.python.org/mailman/listinfo/python-checkins > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From victor.stinner at gmail.com Tue Jun 14 11:40:33 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 14 Jun 2016 17:40:33 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.5): Fix os.urandom() using getrandom() on Linux In-Reply-To: References: <20160614143358.10086.1428.9B00D7BE@psf.io> <20160614150713.GX27919@ando.pearwood.info> Message-ID: Le 14 juin 2016 5:28 PM, "Jelle Zijlstra" a écrit : > The problem isn't that os.urandom() uses getrandom(), it's that it calls it in a mode that may block. Except if it changed very recently, os.urandom() doesn't block anymore thanks to my previous change ;-) Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Jun 14 11:51:44 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 14 Jun 2016 16:51:44 +0100 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160614151935.GY27919@ando.pearwood.info> References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On 14 June 2016 at 16:19, Steven D'Aprano wrote: > Why does base64 encoding in Python return bytes? I seem to recall there was a debate about this around the time of the Python 3 move. (IIRC, it was related to the fact that there used to be a base64 "codec", that wasn't available in Python 3 because it wasn't clear whether it converted bytes to text or bytes). I don't remember any of the details, let alone if a conclusion was reached, but a search of the archives may find something. Paul From a.badger at gmail.com Tue Jun 14 12:32:30 2016 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 14 Jun 2016 09:32:30 -0700 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On Jun 14, 2016 8:32 AM, "Joao S. O. Bueno" wrote: > > On 14 June 2016 at 12:19, Steven D'Aprano wrote: > > Is there > > a good reason for returning bytes?
> > What about: it returns 0-255 numeric values for each position in a stream, with > no clue whatsoever to how those values map to text characters beyond > the 32-128 range? > > Maybe base64.decode could take a "encoding" optional parameter - or > there could be > a separate 'decote_to_text" method that would explicitly take a text codec name. > Otherwise, no, you simply can't take a bunch of bytes and say they > represent text. > Although it's not explicit, the question seems to be about the output of encoding (and for symmetry, the input of decoding). In both of those cases, valid output will consist only of ascii characters. The input to encoding would have to remain bytes (that's the main purpose of base64... to turn bytes into an ascii string). -Toshio -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Jun 14 12:38:46 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jun 2016 12:38:46 -0400 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160614151935.GY27919@ando.pearwood.info> References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On 6/14/2016 11:19 AM, Steven D'Aprano wrote: > Normally I'd take a question like this to Python-List, but this question > has turned out to be quite diversive, with people having strong opinions > but no definitive answer. So I thought I'd ask here and hope that some > of the core devs would have an idea. > > Why does base64 encoding in Python return bytes? Ultimately, because we never decided to change this in 3.0. > base64.b64encode take bytes as input and returns bytes. Some people are > arguing that this is wrong behaviour, as RFC 3548 specifies that Base64 > should transform bytes to characters: > > https://tools.ietf.org/html/rfc3548.html > > albeit US-ASCII characters. E.g.: > > The encoding process represents 24-bit groups of input bits > as output strings of 4 encoded characters. 
One could argue that 'encoded character' means 'bytes' in Python, but I don't know what the standard writer meant, as unicode characters always have some internal encoding. > [...] > Each 6-bit group is used as an index into an array of 64 printable > characters. The character referenced by the index is placed in the > output string. -- Terry Jan Reedy From breamoreboy at yahoo.co.uk Tue Jun 14 12:29:12 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 14 Jun 2016 17:29:12 +0100 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On 14/06/2016 16:51, Paul Moore wrote: > On 14 June 2016 at 16:19, Steven D'Aprano wrote: >> Why does base64 encoding in Python return bytes? > > I seem to recall there was a debate about this around the time of the > Python 3 move. (IIRC, it was related to the fact that there used to be > a base64 "codec", that wasn't available in Python 3 because it wasn't > clear whether it converted bytes to text or bytes). I don't remember > any of the details, let alone if a conclusion was reached, but a > search of the archives may find something. > > Paul > As I've the time to play detective I'd suggest https://mail.python.org/pipermail/python-3000/2007-July/008975.html -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From tjreedy at udel.edu Tue Jun 14 12:43:38 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jun 2016 12:43:38 -0400 Subject: [Python-Dev] mod_python compilation error in VS 2008 for py2.7.1 In-Reply-To: References: Message-ID: <23fff550-c19f-ff6f-c663-c206a806b1d9@udel.edu> On 6/14/2016 4:44 AM, asimkon wrote: > I would like to ask you a technical question regarding python module > compilation for python 2.7.1. So you know, python-list, where you cross-posted this, is the right place for discussion of development *with* Python. 
python-dev is for development *of* Python language and future CPython and this is off-topic here. -- Terry Jan Reedy From jsbueno at python.org.br Tue Jun 14 13:05:19 2016 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 14 Jun 2016 14:05:19 -0300 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On 14 June 2016 at 13:32, Toshio Kuratomi wrote: > > On Jun 14, 2016 8:32 AM, "Joao S. O. Bueno" wrote: >> >> On 14 June 2016 at 12:19, Steven D'Aprano wrote: >> > Is there >> > a good reason for returning bytes? >> >> What about: it returns 0-255 numeric values for each position in a >> stream, with >> no clue whatsoever to how those values map to text characters beyond >> the 32-128 range? >> >> Maybe base64.decode could take a "encoding" optional parameter - or >> there could be >> a separate 'decote_to_text" method that would explicitly take a text codec >> name. >> Otherwise, no, you simply can't take a bunch of bytes and say they >> represent text. >> > Although it's not explicit, the question seems to be about the output of > encoding (and for symmetry, the input of decoding). In both of those cases, > valid output will consist only of ascii characters. > > The input to encoding would have to remain bytes (that's the main purpose of > base64... to turn bytes into an ascii string). > Sorry, it is 2016, and I don't think at this point anyone can consider an ASCII string as a representative pattern of textual data in any field of application. Bytes are not text. Bytes with an associated, meaningful, encoding are text. I thought this had been through when Python 3 was out. Unless you are working with COBOL generated data (and intending to keep the file format), it does not make sense in any real-world field. (supposing your Cobol data is ASCII and not EBCDIC).
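The decode-to-text helper suggested earlier in the thread could look something like the following hypothetical sketch (the helper and its name are illustrative only, not an actual base64 API):

```python
import base64

def decode_to_text(data, encoding):
    # Hypothetical convenience: base64-decode, then apply an *explicit*,
    # caller-chosen text codec. Bytes only become text once a codec is named.
    return base64.b64decode(data).decode(encoding)

payload = "açaí".encode("utf-8")      # text -> bytes, codec stated up front
wire = base64.b64encode(payload)      # bytes -> ASCII-safe base64 bytes

assert decode_to_text(wire, "utf-8") == "açaí"
```

Without the explicit `encoding` argument there is no defensible way to map the decoded bytes back to characters, which is the point being made above.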
> -Toshio From pmiscml at gmail.com Tue Jun 14 13:19:09 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Tue, 14 Jun 2016 20:19:09 +0300 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: <20160614201909.3cd4322b@x230> Hello, On Tue, 14 Jun 2016 16:51:44 +0100 Paul Moore wrote: > On 14 June 2016 at 16:19, Steven D'Aprano wrote: > > Why does base64 encoding in Python return bytes? > > I seem to recall there was a debate about this around the time of the > Python 3 move. (IIRC, it was related to the fact that there used to be > a base64 "codec", that wasn't available in Python 3 because it wasn't > clear whether it converted bytes to text or bytes). I don't remember > any of the details, let alone if a conclusion was reached, but a > search of the archives may find something. Well, it's easy to remember the conclusion - it was decided to return bytes. The reason also wouldn't be hard to imagine - regardless of the fact that base64 uses ASCII codes for digits and letters, it's still essentially a binary data. And the most natural step for it is to send it down the socket (socket.send() accepts bytes), etc. I'd find it a bit more surprising that binascii.hexlify() returns bytes, but I personally got used to it, and consider it a consistency thing on binascii module. Generally, with Python3 by default using (inefficient) Unicode for strings, any efficient data processing would use bytes, and then one appreciates the fact that data encoding/decoding routines also return bytes, avoiding implicit expensive conversion to strings. -- Best regards, Paul mailto:pmiscml at gmail.com From random832 at fastmail.com Tue Jun 14 13:45:00 2016 From: random832 at fastmail.com (Random832) Date: Tue, 14 Jun 2016 13:45:00 -0400 Subject: [Python-Dev] Why does base64 return bytes? 
In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: <1465926300.85154.637554673.09FC6B35@webmail.messagingengine.com> On Tue, Jun 14, 2016, at 13:05, Joao S. O. Bueno wrote: > Sorry, it is 2016, and I don't think at this point anyone can consider > an ASCII string > as a representative pattern of textual data in any field of application. > Bytes are not text. Bytes with an associated, meaningful, encoding are > text. > I thought this had been through when Python 3 was out. Of all the things that anyone has said in this thread, this makes the *least* contextual sense. The input to base64 encoding, which is what is under discussion, is not text in any way. It is images, it is zip files, it is executables, it could be the output of os.urandom (at least, provided it doesn't block ;) for all anyone cares. The *output* is only an ascii string in the sense that it is a text string consisting of characters within (a carefully chosen subset of) ASCII's repertoire, but the output wasn't what he was claiming should be bytes in the sentence you replied to. Is your objection to the phrase "ascii string"? From jsbueno at python.org.br Tue Jun 14 14:00:24 2016 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 14 Jun 2016 15:00:24 -0300 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <1465926300.85154.637554673.09FC6B35@webmail.messagingengine.com> References: <20160614151935.GY27919@ando.pearwood.info> <1465926300.85154.637554673.09FC6B35@webmail.messagingengine.com> Message-ID: On 14 June 2016 at 14:45, Random832 wrote: > On Tue, Jun 14, 2016, at 13:05, Joao S. O. Bueno wrote: >> Sorry, it is 2016, and I don't think at this point anyone can consider >> an ASCII string >> as a representative pattern of textual data in any field of application. >> Bytes are not text. Bytes with an associated, meaningful, encoding are >> text. >> I thought this had been through when Python 3 was out. 
> > Of all the things that anyone has said in this thread, this makes the > *least* contextual sense. The input to base64 encoding, which is what is > under discussion, is not text in any way. It is images, it is zip files, > it is executables, it could be the output of os.urandom (at least, > provided it doesn't block ;) for all anyone cares. > > The *output* is only an ascii string in the sense that it is a text > string consisting of characters within (a carefully chosen subset of) > ASCII's repertoire, but the output wasn't what he was claiming should be > bytes in the sentence you replied to. Is your objection to the phrase > "ascii string"? Sorry - everything I wrote, I was thinking about _decoding_ base 64. As for the result of an encoded base64, yes, of course it fits into ASCII. The arguments about compactness and what is most likely to happen next apply (transmission through a binary network protocol), but the strong objection I had was just because I thought it was a suggestion of decoding base 64 automatically to text without providing a text encoding. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/jsbueno%40python.org.br From random832 at fastmail.com Tue Jun 14 14:02:02 2016 From: random832 at fastmail.com (Random832) Date: Tue, 14 Jun 2016 14:02:02 -0400 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160614201909.3cd4322b@x230> References: <20160614151935.GY27919@ando.pearwood.info> <20160614201909.3cd4322b@x230> Message-ID: <1465927322.90071.637565817.6473B401@webmail.messagingengine.com> On Tue, Jun 14, 2016, at 13:19, Paul Sokolovsky wrote: > Well, it's easy to remember the conclusion - it was decided to return > bytes.
> The reason also wouldn't be hard to imagine - regardless of the > fact that base64 uses ASCII codes for digits and letters, it's still > essentially a binary data. Only in the sense that all text is binary data. There's nothing in the definition of base64 specifying ASCII codes. It specifies *characters* that all happen to be in ASCII's character repertoire. >And the most natural step for it is to send > it down the socket (socket.send() accepts bytes), etc. How is that more natural than to send it to a text buffer that is ultimately encoded (maybe not even in an ASCII-compatible encoding... though probably) and sent down a socket or written to a file by a layer that is outside your control? Yes, everything eventually ends up as bytes. That doesn't mean that we should obsessively convert things to bytes as early as possible. I mean if we were gonna do that why bother even having a unicode string type at all? > I'd find it a bit more surprising that binascii.hexlify() returns > bytes, but I personally got used to it, and consider it a > consistency thing on binascii module. > > Generally, with Python3 by default using (inefficient) Unicode for > strings, Why is it inefficient? > any efficient data processing would use bytes, and then one > appreciates the fact that data encoding/decoding routines also return > bytes, avoiding implicit expensive conversion to strings. From rdmurray at bitdance.com Tue Jun 14 14:05:55 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 14 Jun 2016 14:05:55 -0400 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: <20160614180556.9A1C0B1401C@webabinitio.net> On Tue, 14 Jun 2016 14:05:19 -0300, "Joao S. O. Bueno" wrote: > On 14 June 2016 at 13:32, Toshio Kuratomi wrote: > > > > On Jun 14, 2016 8:32 AM, "Joao S. O. Bueno" wrote: > >> > >> On 14 June 2016 at 12:19, Steven D'Aprano wrote: > >> > Is there > >> > a good reason for returning bytes?
> >> > >> What about: it returns 0-255 numeric values for each position in a > >> stream, with > >> no clue whatsoever to how those values map to text characters beyond > >> the 32-128 range? > >> > >> Maybe base64.decode could take a "encoding" optional parameter - or > >> there could be > >> a separate 'decote_to_text" method that would explicitly take a text codec > >> name. > >> Otherwise, no, you simply can't take a bunch of bytes and say they > >> represent text. > >> > > Although it's not explicit, the question seems to be about the output of > > encoding (and for symmetry, the input of decoding). In both of those cases, > > valid output will consist only of ascii characters. > > > > The input to encoding would have to remain bytes (that's the main purpose of > > base64... to turn bytes into an ascii string). > > > > Sorry, it is 2016, and I don't think at this point anyone can consider > an ASCII string > as a representative pattern of textual data in any field of application. > Bytes are not text. Bytes with an associated, meaningful, encoding are text. > I thought this had been through when Python 3 was out. > > Unless you are working with COBOL generated data (and intending to keep > the file format) , it does not make sense in any real-world field. > (supposing your > Cobol data is ASCII and nort EBCDIC). The fundamental purpose of the base64 encoding is to take a series of arbitrary bytes and reversibly turn them into another series of bytes in which the eighth bit is not significant. Its utility is for transmitting eight bit bytes over a channel that is not eight bit clean. Before unicode, that meant bytes. Now that we have unicode in use in lots of places, you can think of unicode as a communications channel that is not eight bit clean. So, we might want to use base64 encoding to transmit arbitrary bytes over a unicode channel. This gives a legitimate reason to want unicode output from a base64 encoder. 
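That "unicode channel" case can be made concrete with a short sketch (JSON is used here purely as an example of a text-only channel):

```python
import base64
import json

blob = bytes(range(8))                # arbitrary binary payload

# JSON is a unicode channel: it cannot carry raw bytes, so the base64
# output is explicitly decoded as ASCII before being embedded.
document = json.dumps({"payload": base64.b64encode(blob).decode("ascii")})

# The receiving side reverses the two steps.
received = json.loads(document)
assert base64.b64decode(received["payload"]) == blob
```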
However, it is equally legitimate in the Python context to say you should be explicit about your intentions by decoding the bytes output of the base64 encoder using the ASCII codec. This was indeed discussed at length. For a while we didn't even allow unicode input on either side, but we relaxed that. My understanding of Python's current stance on functions that handle both bytes and string is that *either* the function accepts both types and outputs the *same* type as the input, *or* it accepts both types but always outputs *one* type or the other. You can't have unicode output if you give unicode input to the base64 decoder in the general case. So decode, at least, has to always give bytes output. Likewise, there is small to zero utility for using unicode input to the base64 encoder, since the unicode would have to be ASCII only and there'd be no point in doing the encoding. So, the only thing that makes sense is to follow the "one output type" rule here. Now, you can argue whether or not it would make sense for the encoder to always produce unicode. However, you then immediately run into the backward compatibility issue: the primary use case of the base64 encoding is to produce *wire ready* bytes. This is what the email package uses it for, for example. So for backward compatibility reasons, which are consonant with its primary use case, it makes more sense for the encoder to produce bytes than string. If you need to transmit bytes over a unicode channel, you can decode it from ASCII. That is, unicode is the *exceptional* use case here, not the rule. That might in fact be changing, but for backward compatibility reasons, Python won't change. And that should answer Steve's original question :) --David From dholth at gmail.com Tue Jun 14 14:13:11 2016 From: dholth at gmail.com (Daniel Holth) Date: Tue, 14 Jun 2016 18:13:11 +0000 Subject: [Python-Dev] Why does base64 return bytes? 
In-Reply-To: <1465927322.90071.637565817.6473B401@webmail.messagingengine.com> References: <20160614151935.GY27919@ando.pearwood.info> <20160614201909.3cd4322b@x230> <1465927322.90071.637565817.6473B401@webmail.messagingengine.com> Message-ID: IMO this is more a philosophical problem than a programming problem. base64 has a dual-nature. It is both text and bytes. At least it should fit in a 1-byte-per-character efficient Python 3 unicode string also. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Tue Jun 14 14:17:16 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Tue, 14 Jun 2016 21:17:16 +0300 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <1465927322.90071.637565817.6473B401@webmail.messagingengine.com> References: <20160614151935.GY27919@ando.pearwood.info> <20160614201909.3cd4322b@x230> <1465927322.90071.637565817.6473B401@webmail.messagingengine.com> Message-ID: <20160614211716.354bf102@x230> Hello, On Tue, 14 Jun 2016 14:02:02 -0400 Random832 wrote: > On Tue, Jun 14, 2016, at 13:19, Paul Sokolovsky wrote: > > Well, it's easy to remember the conclusion - it was decided to > > return bytes. The reason also wouldn't be hard to imagine - > > regardless of the fact that base64 uses ASCII codes for digits and > > letters, it's still essentially a binary data. > > Only in the sense that all text is binary data. There's nothing in the > definition of base64 specifying ASCII codes. It specifies *characters* > that all happen to be in ASCII's character repertoire. > > >And the most natural step for it is to send > > it down the socket (socket.send() accepts bytes), etc. > > How is that more natural than to send it to a text buffer that is It's more natural because it's more efficient. It's more natural in the same sense that the most natural way to get from point A to point B is a straight line. > ultimately encoded (maybe not even in an ASCII-compatible encoding... 
> though probably) and sent down a socket or written to a file by a > layer that is outside your control? Yes, everything eventually ends > up as bytes. That doesn't mean that we should obsessively convert > things to bytes as early as possible. It's vice-versa - there's no need to obsessively convert the simple, primary type of bytes (everything in computers is bytes) to more complex things like Unicode strings. > I mean if we were gonna do that why bother even having a unicode > string type at all? You're trying to raise a topic which has been a subject of gigantic flame wars on python-list for years. Here's my summary: not using the unicode string type *at all* is better than not using the bytes type at all. So, feel free to use unicode strings *only* when needed, which is *only* when you accept input from or produce output for a *human* (like a real human, walking down a street to do grocery shopping). In all other cases, data should stay bytes (mind - stay, as it's bytes in the beginning, and it requires extra effort to convert it to strings). > > I'd find it a bit more surprising that binascii.hexlify() returns > > bytes, but I personally got used to it, and consider it a > > consistency thing on binascii module. > > > > Generally, with Python3 by default using (inefficient) Unicode for > > strings, > > Why is it inefficient? Because bytes is the most efficient basic representation of data. Everything which tries to convert it to something else is less efficient in general. Less efficient == inefficient. -- Best regards, Paul mailto:pmiscml at gmail.com From tjreedy at udel.edu Tue Jun 14 14:17:55 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jun 2016 14:17:55 -0400 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On 6/14/2016 12:32 PM, Toshio Kuratomi wrote: > The input to encoding would have to remain bytes (that's the main > purpose of base64...
to turn bytes into an ascii string). The purpose is to turn arbitrary binary data (commonly images) into 'safe bytes' that will not get mangled on transmission (7 bit channels were once common) and that will not mangle a display of data transmitted or received. Ignoring the EBCDIC world, which Python mostly does, the set of 'safe bytes' is the set that encodes printable ascii characters. Those bytes pass through 7 bit channels and display on ascii-based terminals. -- Terry Jan Reedy From pmiscml at gmail.com Tue Jun 14 14:25:48 2016 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Tue, 14 Jun 2016 21:25:48 +0300 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> <20160614201909.3cd4322b@x230> <1465927322.90071.637565817.6473B401@webmail.messagingengine.com> Message-ID: <20160614212548.7361f8a3@x230> Hello, On Tue, 14 Jun 2016 18:13:11 +0000 Daniel Holth wrote: > IMO this is more a philosophical problem than a programming problem. > base64 has a dual-nature. It is both text and bytes. At least it > should fit in a 1-byte-per-character efficient Python 3 unicode > string also. You probably mean "CPython3 1-byte-per-character "efficient" string". But CPython3 is merely one of half-dozen Python3 language implementations. Yup, a special one, but hopefully it's special in a respect that it doesn't abuse its powers to make language API *changes* based on its own implementation details. API changes, because API *decisions* have been done long ago already. -- Best regards, Paul mailto:pmiscml at gmail.com From tjreedy at udel.edu Tue Jun 14 14:42:36 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jun 2016 14:42:36 -0400 Subject: [Python-Dev] Why does base64 return bytes? 
In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On 6/14/2016 12:29 PM, Mark Lawrence via Python-Dev wrote: > As I've the time to play detective I'd suggest > https://mail.python.org/pipermail/python-3000/2007-July/008975.html Thank you for finding that. I reread it and still believe that bytes was the right choice. Base64 is a generic edge encoding for binary data. It fits in with the standard paradigm as an edge encoding. Receive encoded bytes. Decode bytes to python objects. Manipulate python objects. Encode python objects to bytes. Send bytes. Receive and send can be from and to either local files or sockets usually connected to remote systems. Transmissions can have blocks with different encodings. In the latter case, the bytes need to be parsed into blocks with different encodings. In the (fairly common) special case that a transmission consists entirely of text in *1* encoding (ignoring any transmission wrappers), decode and encode can be incorporated into a text-mode file object. If a transmission consists entirely or partly of binary, one can open in binary mode and .write one or more blocks of encoded bytes, possibly with encoding data. -- Terry Jan Reedy From stephen at xemacs.org Tue Jun 14 14:44:47 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 15 Jun 2016 03:44:47 +0900 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160614151935.GY27919@ando.pearwood.info> References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: <22368.20639.247590.870541@turnbull.sk.tsukuba.ac.jp> Steven D'Aprano writes: > base64.b64encode takes bytes as input and returns bytes. Some people are > arguing that this is wrong behaviour, as RFC 3548 That RFC is obsolete: the replacement is RFC 4648. However, the text is essentially unchanged. > specifies that Base64 should transform bytes to characters: Without defining "character" except as a "subset" of ASCII. That omission is evidently deliberate.
Unfortunately the RFC is unclear whether a subset of the ASCII repertoire of (abstract) characters is meant, or a subset of the ASCII codes. I believe the latter is meant, but either way, it does refer to *encoded* characters as the output of the encoding process: > The encoding process represents 24-bit groups of input bits > as output strings of 4 encoded characters. and I see no reason to deny that the bytes output by base64.b64encode are the octets representing the ASCII codes for the characters of the BASE64 alphabet. > Are they misinterpreting the standard? I think they are. As I understand it, the intention of the standard in using "character" to denote the code unit is similar to that of RFC 3986: BASE encodings are intended to be printable and recognizable to humans. If you're using a non-ASCII-superset encoding such as EBCDIC for text I/O, then you should translate from ASCII to that encoding for display, and in the (unlikely) case that a human types BASE encoding from the terminal, the reverse transformation is necessary. > Has Python got it wrong? I can't see anything in the RFC that suggests that. And, in the end, an RFC is not concerned with Python's internal fiddling, but rather with what goes out over the wire. All of the implementations you mention will eventually send to the wire octets that are interpreted as ASCII-encoded characters according to their integer values. > Is there a good reason for returning bytes? I suppose practicality over purity: BASE encodings are normally used on the wire, and so programs need to encode text to appropriately encoded octets *before* BASE encoding, and then normally immediately put the BASE-encoded content on the wire. Why round-trip from UTF-8 bytes to a str in BASE64 representation, and then do the (trivial) conversion back to bytes? OK, it's not that expensive, but still... 
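The round trip Stephen describes is easy to see with the stdlib base64 module; a minimal sketch, in which the .decode('ascii') step is the "trivial conversion" in question:

```python
import base64

payload = "héllo".encode("utf-8")    # arbitrary binary data

encoded = base64.b64encode(payload)  # bytes in, bytes out
assert isinstance(encoded, bytes)

# The base64 alphabet is a subset of ASCII, so moving the result onto a
# text (str) channel is a trivial decode:
text = encoded.decode("ascii")

# b64decode accepts ASCII-only str as well as bytes, and always
# returns the original bytes:
assert base64.b64decode(text) == payload
```

Under CPython 3, `encoded` can be written to a binary channel as-is, while `text` is what a str-based API would want.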
From mailing at franzoni.eu Tue Jun 14 16:25:24 2016 From: mailing at franzoni.eu (Alan Franzoni) Date: Tue, 14 Jun 2016 22:25:24 +0200 Subject: [Python-Dev] ValuesView abc: why doesn't it (officially) inherit from Iterable? Message-ID: Hello, I hope not to bother anyone with a somewhat trivial question, I was unable to get an answer from other channels. I was just checking out some docs on ABCs for a project of mine, where I need to do some type-related work. Those are the official docs about the ValuesView type, in both Python 2 and 3: https://docs.python.org/2/library/collections.html#collections.ValuesView https://docs.python.org/3/library/collections.abc.html and this is the source (Python 2, but same happens in Python 3) https://hg.python.org/releases/2.7.11/file/9213c70c67d2/Lib/_abcoll.py#l479 I was very puzzled about the ValuesView interface, because from a logical standpoint it should inherit from Iterable, IMHO (it's even got the __iter__ Mixin method); on the contrary the docs say that it just inherits from MappingView, which inherits from Sized, which doesn't inherit from Iterable. So I fired up my 2.7 interpreter: >>> from collections import Iterable >>> d = {1:2, 3:4} >>> isinstance(d.viewvalues(), Iterable) True >>> It looks iterable, after all, because of Iterable's own subclasshook. But I don't understand why ValuesView isn't explicitly Iterable. Other ABCs, like Sequence, are explicitly inheriting Iterable. Is there some arcane reason behind that, or it's just a documentation+implementation shortcoming (with no real-world impact) for a little-used feature? Bye, -- www.franzoni.eu - Twitter: @alanfranz contact me at public@[mysurname].eu From brett at python.org Tue Jun 14 16:44:19 2016 From: brett at python.org (Brett Cannon) Date: Tue, 14 Jun 2016 20:44:19 +0000 Subject: [Python-Dev] ValuesView abc: why doesn't it (officially) inherit from Iterable? 
In-Reply-To: References: Message-ID: On Tue, 14 Jun 2016 at 13:30 Alan Franzoni wrote: > Hello, > I hope not to bother anyone with a somewhat trivial question, I was > unable to get an answer from other channels. > > I was just checking out some docs on ABCs for a project of mine, where > I need to do some type-related work. Those are the official docs about > the ValuesView type, in both Python 2 and 3: > > https://docs.python.org/2/library/collections.html#collections.ValuesView > https://docs.python.org/3/library/collections.abc.html > > and this is the source (Python 2, but same happens in Python 3) > > https://hg.python.org/releases/2.7.11/file/9213c70c67d2/Lib/_abcoll.py#l479 > > I was very puzzled about the ValuesView interface, because from a > logical standpoint it should inherit from Iterable, IMHO (it's even > got the __iter__ Mixin method); on the contrary the docs say that it > just inherits from MappingView, which inherits from Sized, which > doesn't inherit from Iterable. > > So I fired up my 2.7 interpreter: > > >>> from collections import Iterable > >>> d = {1:2, 3:4} > >>> isinstance(d.viewvalues(), Iterable) > True > >>> > > It looks iterable, after all, because of Iterable's own subclasshook. > > But I don't understand why ValuesView isn't explicitly Iterable. Other > ABCs, like Sequence, are explicitly inheriting Iterable. Is there some > arcane reason behind that, or it's just a documentation+implementation > shortcoming (with no real-world impact) for a little-used feature? > To add some extra info, both KeysView and ItemsView inherit from Set which does inherit from Iterable. I personally don't know why ValuesView doesn't inherit from Set (although Iterable does override __subclasshook__() so there isn't a direct functional loss, which, if this turns out to be a bug, explains why no one has noticed until now). Alan, would you mind filing an issue at bugs.python.org about this? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mailing at franzoni.eu Tue Jun 14 16:47:24 2016 From: mailing at franzoni.eu (Alan Franzoni) Date: Tue, 14 Jun 2016 22:47:24 +0200 Subject: [Python-Dev] ValuesView abc: why doesn't it (officially) inherit from Iterable? In-Reply-To: References: Message-ID: ValuesView doesn't inherit from Set because the values in a dictionary can contain duplicates. That makes sense. It's just the missing Iterable, which is a weaker contract, that doesn't. I'm filing the bug tomorrow. On Tue, Jun 14, 2016 at 10:44 PM, Brett Cannon wrote: > On Tue, 14 Jun 2016 at 13:30 Alan Franzoni wrote: >> >> Hello, >> I hope not to bother anyone with a somewhat trivial question, I was >> unable to get an answer from other channels. >> >> I was just checking out some docs on ABCs for a project of mine, where >> I need to do some type-related work. Those are the official docs about >> the ValuesView type, in both Python 2 and 3: >> >> https://docs.python.org/2/library/collections.html#collections.ValuesView >> https://docs.python.org/3/library/collections.abc.html >> >> and this is the source (Python 2, but same happens in Python 3) >> >> >> https://hg.python.org/releases/2.7.11/file/9213c70c67d2/Lib/_abcoll.py#l479 >> >> I was very puzzled about the ValuesView interface, because from a >> logical standpoint it should inherit from Iterable, IMHO (it's even >> got the __iter__ Mixin method); on the contrary the docs say that it >> just inherits from MappingView, which inherits from Sized, which >> doesn't inherit from Iterable. >> >> So I fired up my 2.7 interpreter: >> >> >>> from collections import Iterable >> >>> d = {1:2, 3:4} >> >>> isinstance(d.viewvalues(), Iterable) >> True >> >>> >> >> It looks iterable, after all, because of Iterable's own subclasshook. >> >> But I don't understand why ValuesView isn't explicitly Iterable. Other >> ABCs, like Sequence, are explicitly inheriting Iterable. 
Is there some >> arcane reason behind that, or it's just a documentation+implementation >> shortcoming (with no real-world impact) for a little-used feature? > > > To add some extra info, both KeysView and ItemsView inherit from Set which > does inherit from Iterable. I personally don't know why ValuesView doesn't > inherit from Set (although Iterable does override __subclasshook__() so > there isn't a direct functional loss, which, if this turns out to be a bug, > explains why no one has noticed until now). > > Alan, would you mind filing an issue at bugs.python.org about this? -- My development blog: ollivander.franzoni.eu . @franzeur on Twitter contact me at public@[mysurname].eu From greg.ewing at canterbury.ac.nz Tue Jun 14 19:37:34 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jun 2016 11:37:34 +1200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> <1465926300.85154.637554673.09FC6B35@webmail.messagingengine.com> Message-ID: <5760953E.8050703@canterbury.ac.nz> Joao S. O. Bueno wrote: > The arguments about compactness and what is most likely to happen > next applies (transmission through a binary network protocol), I'm not convinced that this is what is most likely to happen next *in a Python program*. How many people implement their own binary network protocols in Python? It seems to me most people will be using a protocol library written by someone else. -- Greg From greg.ewing at canterbury.ac.nz Tue Jun 14 19:51:05 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jun 2016 11:51:05 +1200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160614180556.9A1C0B1401C@webabinitio.net> References: <20160614151935.GY27919@ando.pearwood.info> <20160614180556.9A1C0B1401C@webabinitio.net> Message-ID: <57609869.4060304@canterbury.ac.nz> R.
David Murray wrote: > The fundamental purpose of the base64 encoding is to take a series > of arbitrary bytes and reversibly turn them into another series of > bytes in which the eighth bit is not significant. No, it's not. If that were its only purpose, it would be called base128, and the RFC would describe it purely in terms of bit patterns and not mention characters or character sets at all. The RFC does *not* do that. It describes the output in terms of characters, and does not specify any bit patterns for the output. The intention is clearly to represent binary data as *text*. -- Greg From steve at pearwood.info Tue Jun 14 22:48:05 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 15 Jun 2016 12:48:05 +1000 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: <20160615024805.GA27919@ando.pearwood.info> On Tue, Jun 14, 2016 at 05:29:12PM +0100, Mark Lawrence via Python-Dev wrote: > As I've the time to play detective I'd suggest > https://mail.python.org/pipermail/python-3000/2007-July/008975.html Thanks Mark, that's great! -- Steve From stephen at xemacs.org Tue Jun 14 22:58:04 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 15 Jun 2016 11:58:04 +0900 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <57609869.4060304@canterbury.ac.nz> References: <20160614151935.GY27919@ando.pearwood.info> <20160614180556.9A1C0B1401C@webabinitio.net> <57609869.4060304@canterbury.ac.nz> Message-ID: <22368.50236.788324.306156@turnbull.sk.tsukuba.ac.jp> Greg Ewing writes: > The RFC does *not* do that. It describes the output in terms of > characters, and does not specify any bit patterns for the > output. The RFC is unclear on this point, but I read it as specifying the ASCII coded character set, not the ASCII repertoire of (abstract) characters. Therefore, it specifies an invertible mapping from a particular set of integers to characters. 
> The intention is clearly to represent binary data as *text*. It's more subtle than that. *RFCs do not deal with text.* Text is an internal concept of (some) programming environments. RFCs may deal with *encoded text*, and RFC 4648 indeed specifically mentions "encoded characters" as the output of the BASE64 algorithm.[1] The intention then is to represent binary data with *binary data that may be conveniently interpreted as text* (ie, without reencoding), eg, by a terminal or a printer.[2] It is also desirable that it be likely to pass unscathed through channels that are not necessarily even 7-bit clean (file system directories and JIS X 0201, for example) which *inadvertently* treat it as text. Both requirements are conveniently fulfilled by using appropriate ASCII subsets, and encoding on the wire using the usual bit patterns. However, I suppose you could also use EBCDIC or UTF-16, as long as you have agreed with the receiver to do so. So I would say that Python can do what it wants with the type that base64.b64encode returns as far as the RFC is concerned; that's an internal aspect of Python. It's purely a matter of our convenience (as programmer *in* Python) whether we return str or bytes. My own experience is biased toward email and web (not to be confused with SMTP and HTTP), and so my experience is that most composers (1) automatically handle text encodings for the users, and then the content transfer encoding as necessary for the underlying protocol, and (2) handle attachments by placing a reference in the composed content, which is replaced by the object just before transmission (and any desired content transfer encoding is applied at that time, at the option of the composing agent, which rarely needs to bother the user with such trivia). Bytes seem more convenient to me, and give an on-the-wire representation consistent with that of Python 2 str.
Footnotes: [1] Admittedly, RFC 3986 (URIs) does stretch the notion of "encoded text" to the breaking point by including marks on paper. [2] Thus, BASE64-encoding resources provides a more efficient, alternative datagram protocol for the physical links used by RFC 1149 networks. From random832 at fastmail.com Wed Jun 15 00:29:06 2016 From: random832 at fastmail.com (Random832) Date: Wed, 15 Jun 2016 00:29:06 -0400 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <22368.50236.788324.306156@turnbull.sk.tsukuba.ac.jp> References: <20160614151935.GY27919@ando.pearwood.info> <20160614180556.9A1C0B1401C@webabinitio.net> <57609869.4060304@canterbury.ac.nz> <22368.50236.788324.306156@turnbull.sk.tsukuba.ac.jp> Message-ID: <1465964946.2281576.638049745.3C35B600@webmail.messagingengine.com> On Tue, Jun 14, 2016, at 22:58, Stephen J. Turnbull wrote: > The RFC is unclear on this point, but I read it as specifying the > ASCII coded character set, not the ASCII repertoire of (abstract) > characters. Therefore, it specifies an invertible mapping from a > particular set of integers to characters. There are multiple descriptions of base 64 that specifically mention using it with EBCDIC and with local character sets of unspecified nature. > > The intention is clearly to represent binary data as *text*. > > It's more subtle than that. *RFCs do not deal with text.* Text is > an internal concept of (some) programming environments. It's also a human concept. Plenty of RFCs deal with human concepts rather than purely programming topics. From greg.ewing at canterbury.ac.nz Wed Jun 15 01:40:26 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jun 2016 17:40:26 +1200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <22368.20639.247590.870541@turnbull.sk.tsukuba.ac.jp> References: <20160614151935.GY27919@ando.pearwood.info> <22368.20639.247590.870541@turnbull.sk.tsukuba.ac.jp> Message-ID: <5760EA4A.4060404@canterbury.ac.nz> Stephen J.
Turnbull wrote: > it does refer to *encoded* characters as the output of > the encoding process: > > > The encoding process represents 24-bit groups of input bits > > as output strings of 4 encoded characters. The "encoding" being referred to there is the encoding from input bytes to output characters, not an encoding of the output characters as bytes. Nowhere in RFC 4648 does it refer to the output as being made up of "bytes" or "octets". It's always described in terms of "characters". > As I understand it, the intention of the standard > in using "character" to denote the code unit is similar to that of RFC > 3986: BASE encodings are intended to be printable and recognizable to > humans. Hmmm... so why then does it say, in section 4: The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that ... need not be human readable. > If you're using a non-ASCII-superset encoding such as EBCDIC > for text I/O, then you should translate from ASCII to that encoding > for display, What about the channel you're sending the encoded data over? Suppose I'm on Windows and I'm embedding the base64 encoded data in a text message that I'm sending through a mail client that accepts text in utf-16. I hope you would agree that, in that situation, encoding the base64 output in ASCII and giving those bytes directly to the mail client would be very much the wrong thing to do? -- Greg From larry at hastings.org Wed Jun 15 01:41:32 2016 From: larry at hastings.org (Larry Hastings) Date: Tue, 14 Jun 2016 22:41:32 -0700 Subject: [Python-Dev] [Python-checkins] cpython (3.5): Fix os.urandom() using getrandom() on Linux In-Reply-To: <20160614150713.GX27919@ando.pearwood.info> References: <20160614143358.10086.1428.9B00D7BE@psf.io> <20160614150713.GX27919@ando.pearwood.info> Message-ID: <5760EA8C.4070507@hastings.org> On 06/14/2016 08:07 AM, Steven D'Aprano wrote: > Is this right? 
I thought we had decided that os.urandom should *not* > fall back on getrandom on Linux? We decided that os.urandom() should not *block* on Linux. Which it doesn't; we now strictly call getrandom(GRND_NONBLOCK), which will never block. getrandom() is better because it's a system call, instead of reading from a file. So it's much less messy. If getrandom() wanted to block, instead it'll return EAGAIN, and we'll fail over to reading from /dev/urandom directly, just like we did in 3.4 and before. It's all working as intended, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hodgestar+pythondev at gmail.com Wed Jun 15 02:22:28 2016 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Wed, 15 Jun 2016 08:22:28 +0200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: On Tue, Jun 14, 2016 at 8:42 PM, Terry Reedy wrote: > Thank you for finding that. I reread it and still believe that bytes was > the right choice. Base64 is a generic edge encoding for binary data. It > fits in with the standard paradigm as an edge encoding.
If we only support one, I would prefer it to be bytes since (bytes -> bytes -> unicode) seems like less overhead and slightly conceptually clearer than (bytes -> unicode -> bytes), but I consider this a personal preference rather than any sort of one-true-way. Schiavo Simon From greg.ewing at canterbury.ac.nz Wed Jun 15 03:02:57 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jun 2016 19:02:57 +1200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <22368.50236.788324.306156@turnbull.sk.tsukuba.ac.jp> References: <20160614151935.GY27919@ando.pearwood.info> <20160614180556.9A1C0B1401C@webabinitio.net> <57609869.4060304@canterbury.ac.nz> <22368.50236.788324.306156@turnbull.sk.tsukuba.ac.jp> Message-ID: <5760FDA1.4000803@canterbury.ac.nz> Stephen J. Turnbull wrote: > The RFC is unclear on this point, but I read it as specifying the > ASCII coded character set, not the ASCII repertoire of (abstract) > characters. Well, I think you've misread it. Or at least there is a more general reading possible that is entirely consistent with the stated purpose and doesn't assume any particular output encoding. > It's more subtle than that. *RFCs do not deal with text.* That may be true of most RFCs, but I think this particular one really *is* talking about text, even if the authors didn't realise it at the time. > It is also desirable that it be likely to pass unscathed through channels > that ... *inadvertantly* treat it as text. Both requirements are > conveniently fulfilled by using appropriate ASCII subsets, and encoding on > the wire using the usual bit patterns. But only if the part that is (deliberately or inadvertently) treating it as text is using ASCII as its encoding. So, by your reading of the RFC, base64 is *only* intended for channels that use ASCII encoding. Whereas if you drop the assumption of ASCII and use whatever encoding the channel uses for text, then it works for all channels. 
RFC 4648 doesn't mention it, but an earlier RFC on base64 explicitly said that characters were chosen that also exist in EBCDIC, so it seems they were intending that base64 should work on EBCDIC-bases systems as well as ASCII-based ones. > It's purely a matter of our convenience > (as programmer *in* Python) whether we return str or bytes. Yes, and it seems to me the decision has been made by people with their noses stuck in low-level protocol implementations. Whenever *I've* needed to base64 encode something, I've wanted the output as text, because that's what I needed to feed into the next stage of the process. Maybe there should be two versions of the base64 codec, one producing bytes and one producing text? -- Greg From greg.ewing at canterbury.ac.nz Wed Jun 15 03:07:26 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jun 2016 19:07:26 +1200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: <5760FEAE.7000004@canterbury.ac.nz> Simon Cross wrote: > If we only support one, I would prefer it to be bytes since (bytes -> > bytes -> unicode) seems like less overhead and slightly conceptually > clearer than (bytes -> unicode -> bytes), Whereas bytes -> unicode, followed if needed by unicode -> bytes, seems conceptually clearer to me. IOW, base64 is conceptually a bytes-to-text transformation, and the usual way to represent text in Python 3 is unicode. -- Greg From steve at pearwood.info Wed Jun 15 08:34:01 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 15 Jun 2016 22:34:01 +1000 Subject: [Python-Dev] Why does base64 return bytes? 
In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> Message-ID: <20160615123401.GB27919@ando.pearwood.info> On Tue, Jun 14, 2016 at 09:40:51PM -0700, Guido van Rossum wrote: > I'm officially on vacation, but I was surprised that people now assume > RFCs, which specify internet protocols, would have a bearing on programming > languages. (With perhaps an exception for RFCs that specifically specify > how programming languages or their libraries should treat certain specific > issues -- but I found no evidence that this RFC is doing that.) Sorry to disturb your vacation! I hoped that there might have been a nice simple answer, like "the main use-case for Base64 is the email module, which needs bytes, and thus it was decided". Or even "because backwards compatibility". Thanks to everyone for their constructive comments, and expecially Mark for digging up the original discussion on the Python-3000 list. I'm satisfied that the choice made by Python is the right choice, and that it meets the spirit (if, arguably, not the letter) of the RFC. -- Steve From dholth at gmail.com Wed Jun 15 08:53:15 2016 From: dholth at gmail.com (Daniel Holth) Date: Wed, 15 Jun 2016 12:53:15 +0000 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160615123401.GB27919@ando.pearwood.info> References: <20160614151935.GY27919@ando.pearwood.info> <20160615123401.GB27919@ando.pearwood.info> Message-ID: In that case could we just add a base64_text() method somewhere? Who would like to measure whether it would be a win? On Wed, Jun 15, 2016 at 8:34 AM Steven D'Aprano wrote: > On Tue, Jun 14, 2016 at 09:40:51PM -0700, Guido van Rossum wrote: > > I'm officially on vacation, but I was surprised that people now assume > > RFCs, which specify internet protocols, would have a bearing on > programming > > languages. 
(With perhaps an exception for RFCs that specifically specify > > how programming languages or their libraries should treat certain > specific > > issues -- but I found no evidence that this RFC is doing that.) > > Sorry to disturb your vacation! > > I hoped that there might have been a nice simple answer, like "the > main use-case for Base64 is the email module, which needs bytes, and > thus it was decided". Or even "because backwards compatibility". > > Thanks to everyone for their constructive comments, and expecially Mark > for digging up the original discussion on the Python-3000 list. I'm > satisfied that the choice made by Python is the right choice, and that > it meets the spirit (if, arguably, not the letter) of the RFC. > > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Jun 15 09:17:40 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 15 Jun 2016 14:17:40 +0100 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> <20160615123401.GB27919@ando.pearwood.info> Message-ID: On 15 June 2016 at 13:53, Daniel Holth wrote: > In that case could we just add a base64_text() method somewhere? Who would > like to measure whether it would be a win? "Just adding" a method in the stdlib, means we'd have to support it long term (backward compatibility). So by the time such an experiment determined whether it was worth it, it'd be too late. Finding out whether users/projects typically write such a helper function for themselves would be a better way of getting this information. Personally, I suspect they don't, but facts beat speculation. 
Of course, "not every one liner needs to be a stdlib function" applies here too. Paul From steve at pearwood.info Wed Jun 15 11:07:50 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Jun 2016 01:07:50 +1000 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> <20160615123401.GB27919@ando.pearwood.info> Message-ID: <20160615150750.GC27919@ando.pearwood.info> On Wed, Jun 15, 2016 at 12:53:15PM +0000, Daniel Holth wrote: > In that case could we just add a base64_text() method somewhere? Who would > like to measure whether it would be a win? Just call .decode('ascii') on the output of base64.b64encode. Not every one-liner needs to be a standard function. -- Steve From ijmorlan at uwaterloo.ca Wed Jun 15 06:21:25 2016 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Wed, 15 Jun 2016 06:21:25 -0400 (EDT) Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <5760FEAE.7000004@canterbury.ac.nz> References: <20160614151935.GY27919@ando.pearwood.info> <5760FEAE.7000004@canterbury.ac.nz> Message-ID: On Wed, 15 Jun 2016, Greg Ewing wrote: > Simon Cross wrote: >> If we only support one, I would prefer it to be bytes since (bytes -> >> bytes -> unicode) seems like less overhead and slightly conceptually >> clearer than (bytes -> unicode -> bytes), > > Whereas bytes -> unicode, followed if needed by unicode -> bytes, > seems conceptually clearer to me. IOW, base64 is conceptually a > bytes-to-text transformation, and the usual way to represent > text in Python 3 is unicode. And in CPython, do I understand correctly that the output text would be represented using one byte per character? If so, would there be a way of encoding that into UTF-8 that re-used the raw memory that backs the Unicode object? And, therefore, avoids almost all the inefficiency of going via Unicode? 
If so, this would be a win - proper use of Unicode to represent a text string, combined with instantaneous conversion into a bytes object for the purpose of writing to the OS. Isaac Morland CSCF Web Guru DC 2619, x36650 WWW Software Specialist From ninosm12 at gmail.com Wed Jun 15 02:40:06 2016 From: ninosm12 at gmail.com (ninostephen mathew) Date: Wed, 15 Jun 2016 12:10:06 +0530 Subject: [Python-Dev] Bug in the DELETE statement in sqlite3 module Message-ID: Respected Developer(s), while writing a database module for one of my applications in python I encountered something interesting. I had a username and password field in my table and only one entry which was "Admin" and "password". While debugging I purposefully deleted that record. Then I ran the same statement again. To my surprise, it got executed. Then I ran the statement to delete the user "admin" (lowercase 'a') which does not exist in the table. Surprisingly, it again got executed even though the table was empty. What I expected was an error popping up. But nothing happened. I hope this error gets fixed soon. The code snippet is given below. self.cursor.execute(''' DELETE FROM Users WHERE username = ?''',(self.username,)) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Wed Jun 15 11:11:31 2016 From: dholth at gmail.com (Daniel Holth) Date: Wed, 15 Jun 2016 15:11:31 +0000 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> <5760FEAE.7000004@canterbury.ac.nz> Message-ID: It would be a codec. base64_text in the codecs module. Probably 1 line different than the existing codec. Very easy to use and maintain. Less surprising and less error prone for everyone who thinks base64 should convert between bytes and text. Sounds like an obvious win to me.
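For what it's worth, the helper being discussed amounts to a one-line wrapper over the existing API. A sketch (the names `b64encode_text`/`b64decode_text` are hypothetical; nothing like them exists in the stdlib):

```python
import base64

def b64encode_text(data):
    """Hypothetical helper: Base64-encode *data* (bytes) and return str.

    The Base64 alphabet is pure ASCII, so decoding the bytes result
    as 'ascii' can never fail.
    """
    return base64.b64encode(data).decode('ascii')

def b64decode_text(text):
    """Inverse helper: accept the str form, return the original bytes."""
    return base64.b64decode(text.encode('ascii'))

print(b64encode_text(b'hello world'))      # aGVsbG8gd29ybGQ=
print(b64decode_text('aGVsbG8gd29ybGQ='))  # b'hello world'
```

Whether such a wrapper belongs in the stdlib or stays a project-local one-liner is exactly the question Paul raises earlier in the thread.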
On Wed, Jun 15, 2016 at 11:08 AM Isaac Morland wrote: > On Wed, 15 Jun 2016, Greg Ewing wrote: > > > Simon Cross wrote: > >> If we only support one, I would prefer it to be bytes since (bytes -> > >> bytes -> unicode) seems like less overhead and slightly conceptually > >> clearer than (bytes -> unicode -> bytes), > > > > Whereas bytes -> unicode, followed if needed by unicode -> bytes, > > seems conceptually clearer to me. IOW, base64 is conceptually a > > bytes-to-text transformation, and the usual way to represent > > text in Python 3 is unicode. > > And in CPython, do I understand correctly that the output text would be > represented using one byte per character? If so, would there be a way of > encoding that into UTF-8 that re-used the raw memory that backs the > Unicode object? And, therefore, avoids almost all the inefficiency of > going via Unicode? If so, this would be a win - proper use of Unicode to > represent a text string, combined with instantaneous conversion into a > bytes object for the purpose of writing to the OS. > > Isaac Morland CSCF Web Guru > DC 2619, x36650 WWW Software Specialist > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From duda.piotr at gmail.com Wed Jun 15 11:25:53 2016 From: duda.piotr at gmail.com (Piotr Duda) Date: Wed, 15 Jun 2016 17:25:53 +0200 Subject: [Python-Dev] Bug in the DELETE statement in sqlite3 module In-Reply-To: References: Message-ID: This is not a bug, this is correct behavior of any sql database. 2016-06-15 8:40 GMT+02:00 ninostephen mathew : > Respected Developer(s), > while writing a database module for one of my applications in python I > encountered something interesting. 
I had a username and password field in my > table and only one entry which was "Admin" and "password". While debugging > I purposefully deleted that record. Then I ran the same statement again. To > my surprise, it got execute. Then I ran the statement to delete the user > "admin" (lowercase 'a') which does not exist in the table. Surprisingly > again is got executed even though the table was empty. What I expected was > an error popping up. But nothing happened. I hope this error gets fixed > soon. The code snippet is given below. > > self.cursor.execute(''' DELETE FROM Users WHERE username = > ?''',(self.username,)) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/duda.piotr%40gmail.com > -- Piotr Duda From python-dev at masklinn.net Wed Jun 15 11:34:31 2016 From: python-dev at masklinn.net (Xavier Morel) Date: Wed, 15 Jun 2016 17:34:31 +0200 Subject: [Python-Dev] Bug in the DELETE statement in sqlite3 module In-Reply-To: References: Message-ID: <0BBBD362-B7A2-4437-9F5A-0843DB686055@masklinn.net> > On 2016-06-15, at 08:40 , ninostephen mathew wrote: > > Respected Developer(s), > while writing a database module for one of my applications in python I encountered something interesting. I had a username and password field in my table and only one entry which was "Admin" and "password". While debugging I purposefully deleted that record. Then I ran the same statement again. To my surprise, it got execute. Then I ran the statement to delete the user "admin" (lowercase 'a') which does not exist in the table. Surprisingly again is got executed even though the table was empty. What I expected was an error popping up. But nothing happened. I hope this error gets fixed soon. The code snippet is given below.
> > self.cursor.execute(''' DELETE FROM Users WHERE username = ?''',(self.username,)) Despite Python bundling sqlite, the Python mailing list is not responsible for developing SQLite (only for the SQLite bindings themselves) so this is the wrong mailing list. That being said, the DELETE statement deletes whichever records in the table match the provided predicate. If no record matches the predicate, it will simply delete no records; that is not an error, it is the exact expected and documented behaviour for the statement in SQL in general and SQLite in particular. See https://www.sqlite.org/lang_delete.html for the documentation of the DELETE statement in SQLite. While you should feel free to report your expectations to the SQLite project or to the JTC1/SC32 technical committee (which is responsible for SQL itself) I fear that's what you will get told there, and that you are about 30 years too late to try to influence such a core statement of the language. Not that it would have worked, I'd think; I'm reasonably sure the behaviour of the DELETE statement is a natural consequence of SQL's set-theoretic foundations: DELETE applies to a set of records, regardless of the set's cardinality. From p.f.moore at gmail.com Wed Jun 15 11:29:43 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 15 Jun 2016 16:29:43 +0100 Subject: [Python-Dev] Bug in the DELETE statement in sqlite3 module In-Reply-To: References: Message-ID: On 15 June 2016 at 07:40, ninostephen mathew wrote: > Respected Developer(s), > while writing a database module for one of my applications in python I > encountered something interesting. I had a username and password field in my > table and only one entry which was "Admin" and "password". While debugging > I purposefully deleted that record. Then I ran the same statement again. To > my surprise, it got execute. Then I ran the statement to delete the user > "admin" (lowercase 'a') which does not exist in the table.
Surprisingly > again is got executed even though the table was empty. What I expected was > an error popping up. But nothing happened. I hope this error gets fixed > soon. The code snippet is given below. > > self.cursor.execute(''' DELETE FROM Users WHERE username = > ?''',(self.username,)) First of all, this list is for the discussions about the development of Python itself, not for developing applications with Python. You should probably be posting to python-list instead. Having said that, this is how SQL works - a DELETE statement selects all records matching the WHERE clause and deletes them. If the WHERE clause doesn't match anything, nothing gets deleted. So your code is working exactly as I would expect. Paul From ethan at stoneleaf.us Wed Jun 15 12:12:07 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jun 2016 09:12:07 -0700 Subject: [Python-Dev] proposed os.fspath() change Message-ID: <57617E57.40808@stoneleaf.us> I would like to make a change to os.fspath(). Specifically, os.fspath() currently raises an exception if something besides str, bytes, or os.PathLike is passed in, but makes no checks if an os.PathLike object returns something besides a str or bytes. I would like to change that to the opposite: if a non-os.PathLike is passed in, return it unchanged (so no change for str and bytes); if an os.PathLike object returns something that is not a str nor bytes, raise. 
An example of the difference in the lzma file: Current code (has not been upgraded to use os.fspath() yet) ----------------------------------------------------------- if isinstance(filename, (str, bytes)): if "b" not in mode: mode += "b" self._fp = builtins.open(filename, mode) self._closefp = True self._mode = mode_code elif hasattr(filename, "read") or hasattr(filename, "write"): self._fp = filename self._mode = mode_code else: raise TypeError( "filename must be a str or bytes object, or a file" ) Code change if using upgraded os.fspath() (placed before above stanza): filename = os.fspath(filename) Code change with current os.fspath() (ditto): if isinstance(filename, os.PathLike): filename = os.fspath(filename) My intention with the os.fspath() function was to minimize boiler-plate code and make PathLike objects easy and painless to support; having to discover if any given parameter is PathLike before calling os.fspath() on it is, IMHO, just the opposite. There is also precedent for having a __dunder__ check the return type: --> class Huh: ... def __int__(self): ... return 'string' ... def __index__(self): ... return b'bytestring' ... def __bool__(self): ... return 'true-ish' ... --> h = Huh() --> int(h) Traceback (most recent call last): File "", line 1, in TypeError: __int__ returned non-int (type str) --> ''[h] Traceback (most recent call last): File "", line 1, in TypeError: __index__ returned non-int (type bytes) --> bool(h) Traceback (most recent call last): File "", line 1, in TypeError: __bool__ should return bool, returned str Arguments in favor or against? -- ~Ethan~ From guido at python.org Wed Jun 15 12:33:47 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Jun 2016 09:33:47 -0700 Subject: [Python-Dev] Bug in the DELETE statement in sqlite3 module In-Reply-To: References: Message-ID: A point of order: it's not necessary to post three separate "this is the wrong list" replies. 
In fact the optimal number is probably close to zero -- I understand we all want to be helpful, and we don't want to send duplicate replies, but someone who posts an inappropriate question is likely to try another venue when they receive no replies, and three replies to the list implies that some folks are a little too eager to appear helpful (while reading the list with considerable delay). When the OP pings the thread maybe one person, preferably someone who reads the list directly via email from the list server, could post a standard "wrong list" response. On Wed, Jun 15, 2016 at 8:29 AM, Paul Moore wrote: > On 15 June 2016 at 07:40, ninostephen mathew wrote: > > Respected Developer(s), > > while writing a database module for one of my applications in python I > > encountered something interesting. I had a username and password field > in my > > table and only one entry which was "Admin" and "password". While > debugging > > I purposefully deleted that record. Then I ran the same statement again. > To > > my surprise, it got execute. Then I ran the statement to delete the user > > "admin" (lowercase 'a') which does not exist in the table. Surprisingly > > again is got executed even though the table was empty. What I expected > was > > an error popping up. But nothing happened. I hope this error gets fixed > > soon. The code snippet is given below. > > > > self.cursor.execute(''' DELETE FROM Users WHERE username = > > ?''',(self.username,)) > > First of all, this list is for the discussions about the development > of Python itself, not for developing applications with Python. You > should probably be posting to python-list instead. > > Having said that, this is how SQL works - a DELETE statement selects > all records matching the WHERE clause and deletes them. If the WHERE > clause doesn't match anything, nothing gets deleted. So your code is > working exactly as I would expect. 
> > Paul > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jun 15 12:46:46 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Jun 2016 09:46:46 -0700 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: <57617E57.40808@stoneleaf.us> References: <57617E57.40808@stoneleaf.us> Message-ID: These are really two separate proposals. I'm okay with checking the return value of calling obj.__fspath__; that's an error in the object anyways, and it doesn't matter much whether we do this or not (though when approving the PEP I considered this and decided not to insert a check for this). But it doesn't affect your example, does it? I guess it's easier to raise now and change the API in the future to avoid raising in this case (if we find that raising is undesirable) than the other way around, so I'm +0 on this. The other proposal (passing anything that's not understood right through) is more interesting and your use case is somewhat compelling. Catching the exception coming out of os.fspath() would certainly be much messier. The question remaining is whether, when this behavior is not desired (e.g. when the caller of os.fspath() just wants a string that it can pass to open()), the condition of passing that's neither a string not supports __fspath__ still produces an understandable error. I'm not sure that that's the case. E.g. open() accepts file descriptors in addition to paths, but I'm not sure that accepting an integer is a good idea in most cases -- it either gives a mystery "Bad file descriptor" error or starts reading/writing some random system file, which it then closes once the stream is closed. 
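Concretely, such a return-value check could behave as follows (a sketch of the semantics only, not the eventual CPython implementation; the demo classes are invented for illustration):

```python
def fspath(path):
    """Sketch: accept str/bytes as-is; otherwise consult __fspath__
    and type-check what it returns."""
    if isinstance(path, (str, bytes)):
        return path
    try:
        # Look the dunder up on the type, as other protocols do.
        dunder = type(path).__fspath__
    except AttributeError:
        raise TypeError('expected str, bytes or os.PathLike object, not '
                        + type(path).__name__) from None
    result = dunder(path)
    if not isinstance(result, (str, bytes)):
        # The proposed check: reject a misbehaving __fspath__.
        raise TypeError('expected __fspath__() to return str or bytes, not '
                        + type(result).__name__)
    return result

class GoodPath:
    def __fspath__(self):
        return '/tmp/demo'

class BadPath:
    def __fspath__(self):
        return 42  # not str/bytes; the proposed check rejects this

print(fspath(GoodPath()))  # /tmp/demo
# fspath(BadPath()) would raise TypeError under the proposed check.
```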
On Wed, Jun 15, 2016 at 9:12 AM, Ethan Furman wrote: > I would like to make a change to os.fspath(). > > Specifically, os.fspath() currently raises an exception if something > besides str, bytes, or os.PathLike is passed in, but makes no checks > if an os.PathLike object returns something besides a str or bytes. > > I would like to change that to the opposite: if a non-os.PathLike is > passed in, return it unchanged (so no change for str and bytes); if > an os.PathLike object returns something that is not a str nor bytes, > raise. > > An example of the difference in the lzma file: > > Current code (has not been upgraded to use os.fspath() yet) > ----------------------------------------------------------- > > if isinstance(filename, (str, bytes)): > if "b" not in mode: > mode += "b" > self._fp = builtins.open(filename, mode) > self._closefp = True > self._mode = mode_code > elif hasattr(filename, "read") or hasattr(filename, "write"): > self._fp = filename > self._mode = mode_code > else: > raise TypeError( > "filename must be a str or bytes object, or a file" > ) > > Code change if using upgraded os.fspath() (placed before above stanza): > > filename = os.fspath(filename) > > Code change with current os.fspath() (ditto): > > if isinstance(filename, os.PathLike): > filename = os.fspath(filename) > > My intention with the os.fspath() function was to minimize boiler-plate > code and make PathLike objects easy and painless to support; having to > discover if any given parameter is PathLike before calling os.fspath() > on it is, IMHO, just the opposite. > > There is also precedent for having a __dunder__ check the return type: > > --> class Huh: > ... def __int__(self): > ... return 'string' > ... def __index__(self): > ... return b'bytestring' > ... def __bool__(self): > ... return 'true-ish' > ... 
> --> h = Huh() > > --> int(h) > Traceback (most recent call last): > File "", line 1, in > TypeError: __int__ returned non-int (type str) > > --> ''[h] > Traceback (most recent call last): > File "", line 1, in > TypeError: __index__ returned non-int (type bytes) > > --> bool(h) > Traceback (most recent call last): > File "", line 1, in > TypeError: __bool__ should return bool, returned str > > Arguments in favor or against? > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tseaver at palladion.com Wed Jun 15 12:50:18 2016 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 15 Jun 2016 12:50:18 -0400 Subject: [Python-Dev] Bug in the DELETE statement in sqlite3 module In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/15/2016 12:33 PM, Guido van Rossum wrote: > A point of order: it's not necessary to post three separate "this is > the wrong list" replies. In fact the optimal number is probably close > to zero -- I understand we all want to be helpful, and we don't want > to send duplicate replies, but someone who posts an inappropriate > question is likely to try another venue when they receive no replies, > and three replies to the list implies that some folks are a little too > eager to appear helpful (while reading the list with considerable > delay). When the OP pings the thread maybe one person, preferably > someone who reads the list directly via email from the list server, > could post a standard "wrong list" response. In addition, please don't undermine the "this is the wrong list" message by responding substantively to the OP's query. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJXYYc/AAoJEPKpaDSJE9HYlSgP/1v+FpEvildmH4fEpZXG+j18 jCt3Q48ffSW22oPhx4lyfZv1Sh3EOsEuHHd3oU7jG9kUtTPyluQQYJiygfCBpSev CP8LonjJxxkFsVwK5SRGcp7JdjiFbLyqUXbtkFM6s2OE7mpXwtbn4suCRJx7MYaO CUkN2h0vAandftV4xu+lp/r7n0l8HLTTOsrUFuPZRbT4dVzKwRcM+ER1W4tCnkgZ bFRXM8YjrUcX/Um2blSi4yZT75TvHjyi44ujbQPsR3OHCPN8GAfAzIVSkbiECP2K xAqT2/h0E6VkGdEymELCMRHvhCI2wFrAoA6nWYCdyR2Ekg7VB/tnr6AGi+SNvP06 BETMf0BRxpd4sXOvS4+ydhBQQpydW4hiw61RHs8xFiy0W7pqp5Zh4ZHHcZBR2KRT TXfoxrwQIBIWKlyBdgv9d0maOWg3uq3I3MqO2vnGj/XRPsjs/BWCX9BYZqpnEATB MasQItCMPoOfmVxlS+cS7rIXXVFdwulm2s5GRZR9PwEuMS8Vmi9A5UyEpshlDYZM ZMPT3CScFOyczVgC3N+LyO7rYaJMlcNQD/HxxQDvpXoYinxQAFo4eVE2+490XN8j Od8n3UIo72+rFyyFJ8A7iBORYF9UD44VrFHQRHROTEvv7dV1OTYSVZcdqBb4Ik6S 8Wl+qMIEm8VcuFKI4b/T =4IaO -----END PGP SIGNATURE----- From brett at python.org Wed Jun 15 13:59:58 2016 From: brett at python.org (Brett Cannon) Date: Wed, 15 Jun 2016 17:59:58 +0000 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote: > These are really two separate proposals. > > I'm okay with checking the return value of calling obj.__fspath__; that's > an error in the object anyways, and it doesn't matter much whether we do > this or not (though when approving the PEP I considered this and decided > not to insert a check for this). But it doesn't affect your example, does > it? I guess it's easier to raise now and change the API in the future to > avoid raising in this case (if we find that raising is undesirable) than > the other way around, so I'm +0 on this. > +0 from me as well. 
I know that in some stdlib code which has already been ported, and which prior to adding support explicitly checked for str/bytes, this will eliminate its own checking (obviously not a motivating factor as it's pretty minor). > > The other proposal (passing anything that's not understood right through) > is more interesting and your use case is somewhat compelling. Catching the > exception coming out of os.fspath() would certainly be much messier. The > question remaining is whether, when this behavior is not desired (e.g. when > the caller of os.fspath() just wants a string that it can pass to open()), > the condition of passing something that's neither a string nor supports __fspath__ > still produces an understandable error. I'm not sure that that's the case. > E.g. open() accepts file descriptors in addition to paths, but I'm not sure > that accepting an integer is a good idea in most cases -- it either gives a > mystery "Bad file descriptor" error or starts reading/writing some random > system file, which it then closes once the stream is closed. > The FD issue of magically passing through an int was also a concern when Ethan brought this up in an issue on the tracker. My argument is that FDs are not file paths and so shouldn't magically pass through if we're going to type-check anything or claim os.fspath() only works with paths (FDs are already open file objects). So in my view either we go ahead and type-check the return value of __fspath__() and thus restrict everything coming out of os.fspath() to Union[str, bytes] or we don't type-check anything and be consistent that all os.fspath() does is call __fspath__() if present. And just because I'm thinking about it, I would special-case the FDs, not os.PathLike (clearer why you care and faster as it skips the override of __subclasshook__): # Can be a single-line ternary operator if preferred.
if not isinstance(filename, int): filename = os.fspath(filename) > On Wed, Jun 15, 2016 at 9:12 AM, Ethan Furman wrote: > >> I would like to make a change to os.fspath(). >> >> Specifically, os.fspath() currently raises an exception if something >> besides str, bytes, or os.PathLike is passed in, but makes no checks >> if an os.PathLike object returns something besides a str or bytes. >> >> I would like to change that to the opposite: if a non-os.PathLike is >> passed in, return it unchanged (so no change for str and bytes); if >> an os.PathLike object returns something that is not a str nor bytes, >> raise. >> >> An example of the difference in the lzma file: >> >> Current code (has not been upgraded to use os.fspath() yet) >> ----------------------------------------------------------- >> >> if isinstance(filename, (str, bytes)): >> if "b" not in mode: >> mode += "b" >> self._fp = builtins.open(filename, mode) >> self._closefp = True >> self._mode = mode_code >> elif hasattr(filename, "read") or hasattr(filename, "write"): >> self._fp = filename >> self._mode = mode_code >> else: >> raise TypeError( >> "filename must be a str or bytes object, or a file" >> ) >> >> Code change if using upgraded os.fspath() (placed before above stanza): >> >> filename = os.fspath(filename) >> >> Code change with current os.fspath() (ditto): >> >> if isinstance(filename, os.PathLike): >> filename = os.fspath(filename) >> >> My intention with the os.fspath() function was to minimize boiler-plate >> code and make PathLike objects easy and painless to support; having to >> discover if any given parameter is PathLike before calling os.fspath() >> on it is, IMHO, just the opposite. >> >> There is also precedent for having a __dunder__ check the return type: >> >> --> class Huh: >> ... def __int__(self): >> ... return 'string' >> ... def __index__(self): >> ... return b'bytestring' >> ... def __bool__(self): >> ... return 'true-ish' >> ... 
>> --> h = Huh() >> >> --> int(h) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: __int__ returned non-int (type str) >> >> --> ''[h] >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: __index__ returned non-int (type bytes) >> >> --> bool(h) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: __bool__ should return bool, returned str >> >> Arguments in favor or against? >> >> -- >> ~Ethan~ >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org >> > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Wed Jun 15 13:39:03 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 15 Jun 2016 20:39:03 +0300 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: My proposal at the point of the first PEP draft solved both of these issues. That version of the fspath function passed anything right through that was an instance of the keyword-only `type_constraint`. If not, it would ask __fspath__, and before returning the result, it would check that __fspath__ returned an instance of `type_constraint` and otherwise raise a TypeError. `type_constraint=object` would then have given the behavior you want. I always wanted fspath to spare the caller from all the instance checking (most of which it does even now). 
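That draft behaviour can be reconstructed roughly as follows (an illustration of the semantics described here, not the actual draft code; the name `fspath_draft` is invented):

```python
def fspath_draft(path, *, type_constraint=(str, bytes)):
    """Sketch of the earlier draft: anything already an instance of
    *type_constraint* passes straight through; otherwise __fspath__ is
    consulted and its return value checked against the same constraint."""
    if isinstance(path, type_constraint):
        return path
    try:
        dunder = type(path).__fspath__
    except AttributeError:
        raise TypeError('not a path-like object: %r' % (path,)) from None
    result = dunder(path)
    if not isinstance(result, type_constraint):
        raise TypeError('__fspath__() returned %r, which does not satisfy '
                        'the type constraint' % (result,))
    return result

print(fspath_draft('spam'))                      # spam
print(fspath_draft(42, type_constraint=object))  # 42
```

With `type_constraint=object` every input passes through unchanged, which is the behaviour Ethan is asking for; with the default constraint, non-path inputs still raise.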
The main problem with setting type_constraint to something broader than (str, bytes) is that then that parameter would affect the return type of the function, which would at least complicate the type hinting issue. Mypy might now support things like @overload def fspath(path: T, type_constraint: Type[T] = (str, bytes)) -> T: ... but then again, isinstance and Union are not compatible (for a reason?), and PEP484 for a reason does not allow tuples like (str, bytes) in place of Unions. Anyway, if we were to go back to this behavior, we would need to decide whether to officially allow a wider type constraint or whether to leave that to Stack Overflow, so to speak. -- Koos On Wed, Jun 15, 2016 at 7:46 PM, Guido van Rossum wrote: > These are really two separate proposals. > > I'm okay with checking the return value of calling obj.__fspath__; that's an > error in the object anyways, and it doesn't matter much whether we do this > or not (though when approving the PEP I considered this and decided not to > insert a check for this). But it doesn't affect your example, does it? I > guess it's easier to raise now and change the API in the future to avoid > raising in this case (if we find that raising is undesirable) than the other > way around, so I'm +0 on this. > > The other proposal (passing anything that's not understood right through) is > more interesting and your use case is somewhat compelling. Catching the > exception coming out of os.fspath() would certainly be much messier. The > question remaining is whether, when this behavior is not desired (e.g. when > the caller of os.fspath() just wants a string that it can pass to open()), > the condition of passing that's neither a string not supports __fspath__ > still produces an understandable error. I'm not sure that that's the case. > E.g. 
open() accepts file descriptors in addition to paths, but I'm not sure > that accepting an integer is a good idea in most cases -- it either gives a > mystery "Bad file descriptor" error or starts reading/writing some random > system file, which it then closes once the stream is closed. > > On Wed, Jun 15, 2016 at 9:12 AM, Ethan Furman wrote: >> >> I would like to make a change to os.fspath(). >> >> Specifically, os.fspath() currently raises an exception if something >> besides str, bytes, or os.PathLike is passed in, but makes no checks >> if an os.PathLike object returns something besides a str or bytes. >> >> I would like to change that to the opposite: if a non-os.PathLike is >> passed in, return it unchanged (so no change for str and bytes); if >> an os.PathLike object returns something that is not a str nor bytes, >> raise. >> >> An example of the difference in the lzma file: >> >> Current code (has not been upgraded to use os.fspath() yet) >> ----------------------------------------------------------- >> >> if isinstance(filename, (str, bytes)): >> if "b" not in mode: >> mode += "b" >> self._fp = builtins.open(filename, mode) >> self._closefp = True >> self._mode = mode_code >> elif hasattr(filename, "read") or hasattr(filename, "write"): >> self._fp = filename >> self._mode = mode_code >> else: >> raise TypeError( >> "filename must be a str or bytes object, or a file" >> ) >> >> Code change if using upgraded os.fspath() (placed before above stanza): >> >> filename = os.fspath(filename) >> >> Code change with current os.fspath() (ditto): >> >> if isinstance(filename, os.PathLike): >> filename = os.fspath(filename) >> >> My intention with the os.fspath() function was to minimize boiler-plate >> code and make PathLike objects easy and painless to support; having to >> discover if any given parameter is PathLike before calling os.fspath() >> on it is, IMHO, just the opposite. 
>> >> There is also precedent for having a __dunder__ check the return type: >> >> --> class Huh: >> ... def __int__(self): >> ... return 'string' >> ... def __index__(self): >> ... return b'bytestring' >> ... def __bool__(self): >> ... return 'true-ish' >> ... >> --> h = Huh() >> >> --> int(h) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: __int__ returned non-int (type str) >> >> --> ''[h] >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: __index__ returned non-int (type bytes) >> >> --> bool(h) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: __bool__ should return bool, returned str >> >> Arguments in favor or against? >> >> -- >> ~Ethan~ >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com > -- -- + Koos Zevenhoven + http://twitter.com/k7hoven + From ncoghlan at gmail.com Wed Jun 15 14:29:52 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jun 2016 11:29:52 -0700 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: On 15 June 2016 at 10:59, Brett Cannon wrote: > > > On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote: >> >> These are really two separate proposals. 
>>
>> I'm okay with checking the return value of calling obj.__fspath__; that's
>> an error in the object anyways, and it doesn't matter much whether we do
>> this or not (though when approving the PEP I considered this and decided not
>> to insert a check for this). But it doesn't affect your example, does it? I
>> guess it's easier to raise now and change the API in the future to avoid
>> raising in this case (if we find that raising is undesirable) than the other
>> way around, so I'm +0 on this.
>
> +0 from me as well. I know some code in the stdlib that has been ported
> was explicitly checking for str/bytes prior to adding support; this
> will eliminate its own checking (obviously not a motivating factor as it's
> pretty minor).

I'd like a strong assertion that the return value of os.fspath() is a
plausible filesystem path representation (so either bytes or str), and
*not* some other kind of object that can also be used for accessing
the filesystem (like a file descriptor or an IO stream).

>> The other proposal (passing anything that's not understood right through)
>> is more interesting and your use case is somewhat compelling. Catching the
>> exception coming out of os.fspath() would certainly be much messier. The
>> question remaining is whether, when this behavior is not desired (e.g. when
>> the caller of os.fspath() just wants a string that it can pass to open()),
>> the condition of passing something that's neither a string nor supports
>> __fspath__ still produces an understandable error. I'm not sure that that's
>> the case. E.g. open() accepts file descriptors in addition to paths, but
>> I'm not sure that accepting an integer is a good idea in most cases -- it
>> either gives a mystery "Bad file descriptor" error or starts
>> reading/writing some random system file, which it then closes once the
>> stream is closed.

> The FD issue of magically passing through an int was also a concern when
> Ethan brought this up in an issue on the tracker.
> My argument is that FDs
> are not file paths and so shouldn't magically pass through if we're going to
> type-check anything or claim os.fspath() only works with paths (FDs are
> already open file objects). So in my view either we go ahead and type-check
> the return value of __fspath__() and thus restrict everything coming out of
> os.fspath() to Union[str, bytes], or we don't type-check anything and stay
> consistent in that all os.fspath() does is call __fspath__() if present.
>
> And just because I'm thinking about it, I would special-case the FDs, not
> os.PathLike (clearer why you care and faster as it skips the override of
> __subclasshook__):
>
>     # Can be a single-line ternary operator if preferred.
>     if not isinstance(filename, int):
>         filename = os.fspath(filename)

Note that the LZMA case Ethan cites is one where the code accepts
either an already opened file-like object *or* a path-like object, and
does different things based on which it receives.

In that scenario, rather than introducing an unconditional "filename =
os.fspath(filename)" before the current logic, it makes more sense to
me to change the current logic to use the new protocol check rather
than a strict typecheck on str/bytes:

    if isinstance(filename, os.PathLike):  # Changed line
        filename = os.fspath(filename)  # New line
        if "b" not in mode:
            mode += "b"
        self._fp = builtins.open(filename, mode)
        self._closefp = True
        self._mode = mode_code
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        self._fp = filename
        self._mode = mode_code
    else:
        raise TypeError(
            "filename must be a path-like or file-like object"
        )

I *don't* think it makes sense to weaken the guarantees on os.fspath
to let it propagate non-path-like objects.

Cheers,
Nick.
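[Editor's note: to make the semantics under discussion concrete, here is a minimal, self-contained sketch of an fspath()-style helper that enforces the __fspath__() return-value check being agreed on here. It is a hypothetical stand-in for illustration, not CPython's actual implementation, and the GoodPath/BadPath classes are likewise made up.]

```python
def fspath(path):
    """Sketch: return the path representation of a path-like object."""
    # str and bytes are already path representations: pass them through.
    if isinstance(path, (str, bytes)):
        return path
    # Look up __fspath__ on the type, mirroring how dunders are resolved.
    dunder = getattr(type(path), "__fspath__", None)
    if dunder is None:
        raise TypeError("expected str, bytes or os.PathLike object, not "
                        + type(path).__name__)
    result = dunder(path)
    # The check debated in this thread: reject non-str/bytes returns.
    if not isinstance(result, (str, bytes)):
        raise TypeError("expected __fspath__() to return str or bytes, not "
                        + type(result).__name__)
    return result


class GoodPath:
    def __fspath__(self):
        return "/tmp/example.txt"


class BadPath:
    def __fspath__(self):
        return 42  # not str or bytes, so fspath() raises TypeError
```

With this sketch, fspath(GoodPath()) returns "/tmp/example.txt", while fspath(BadPath()) and fspath(3) both raise TypeError -- the "check the __fspath__ return, keep rejecting non-path-likes" behaviour discussed above.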
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Jun 15 14:39:43 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Jun 2016 11:39:43 -0700 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: OK, so let's add a check on the return of __fspath__() and keep the check on path-like or string/bytes. --Guido (mobile) On Jun 15, 2016 11:29 AM, "Nick Coghlan" wrote: > On 15 June 2016 at 10:59, Brett Cannon wrote: > > > > > > On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote: > >> > >> These are really two separate proposals. > >> > >> I'm okay with checking the return value of calling obj.__fspath__; > that's > >> an error in the object anyways, and it doesn't matter much whether we do > >> this or not (though when approving the PEP I considered this and > decided not > >> to insert a check for this). But it doesn't affect your example, does > it? I > >> guess it's easier to raise now and change the API in the future to avoid > >> raising in this case (if we find that raising is undesirable) than the > other > >> way around, so I'm +0 on this. > > > > +0 from me as well. I know in some code in the stdlib that has been > ported > > which prior to adding support was explicitly checking for str/bytes this > > will eliminate its own checking (obviously not a motivating factor as > it's > > pretty minor). > > I'd like a strong assertion that the return value of os.fspath() is a > plausible filesystem path representation (so either bytes or str), and > *not* some other kind of object that can also be used for accessing > the filesystem (like a file descriptor or an IO stream) > > >> The other proposal (passing anything that's not understood right > through) > >> is more interesting and your use case is somewhat compelling. Catching > the > >> exception coming out of os.fspath() would certainly be much messier. 
The > >> question remaining is whether, when this behavior is not desired (e.g. > when > >> the caller of os.fspath() just wants a string that it can pass to > open()), > >> the condition of passing that's neither a string not supports __fspath__ > >> still produces an understandable error. I'm not sure that that's the > case. > >> E.g. open() accepts file descriptors in addition to paths, but I'm not > sure > >> that accepting an integer is a good idea in most cases -- it either > gives a > >> mystery "Bad file descriptor" error or starts reading/writing some > random > >> system file, which it then closes once the stream is closed. > > > > The FD issue of magically passing through an int was also a concern when > > Ethan brought this up in an issue on the tracker. My argument is that FDs > > are not file paths and so shouldn't magically pass through if we're > going to > > type-check anything or claim os.fspath() only works with paths (FDs are > > already open file objects). So in my view either we go ahead and > type-check > > the return value of __fspath__() and thus restrict everything coming out > of > > os.fspath() to Union[str, bytes] or we don't type check anything and be > > consistent that os.fspath() simply does is call __fspath__() if present. > > > > And just because I'm thinking about it, I would special-case the FDs, > not > > os.PathLike (clearer why you care and faster as it skips the override of > > __subclasshook__): > > > > # Can be a single-line ternary operator if preferred. > > if not isinstance(filename, int): > > filename = os.fspath(filename) > > Note that the LZMA case Ethan cites is one where the code accepts > either an already opened file-like object *or* a path-like object, and > does different things based on which it receives. 
> > In that scenario, rather than introducing an unconditional "filename = > os.fspath(filename)" before the current logic, it makes more sense to > me to change the current logic to use the new protocol check rather > than a strict typecheck on str/bytes: > > if isinstance(filename, os.PathLike): # Changed line > filename = os.fspath(filename) # New line > if "b" not in mode: > mode += "b" > self._fp = builtins.open(filename, mode) > self._closefp = True > self._mode = mode_code > elif hasattr(filename, "read") or hasattr(filename, "write"): > self._fp = filename > self._mode = mode_code > else: > raise TypeError( > "filename must be a path-like or file-like object" > ) > > I *don't* think it makes sense to weaken the guarantees on os.fspath > to let it propagate non-path-like objects. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Jun 15 14:44:13 2016 From: brett at python.org (Brett Cannon) Date: Wed, 15 Jun 2016 18:44:13 +0000 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: On Wed, 15 Jun 2016 at 11:39 Guido van Rossum wrote: > OK, so let's add a check on the return of __fspath__() and keep the check > on path-like or string/bytes. > I'll update the PEP. Ethan, do you want to leave a note on the os.fspath() issue to update the code and go through where we've used os.fspath() to see where we can cut out redundant type checks? > --Guido (mobile) > On Jun 15, 2016 11:29 AM, "Nick Coghlan" wrote: > >> On 15 June 2016 at 10:59, Brett Cannon wrote: >> > >> > >> > On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote: >> >> >> >> These are really two separate proposals. 
>> >> >> >> I'm okay with checking the return value of calling obj.__fspath__; >> that's >> >> an error in the object anyways, and it doesn't matter much whether we >> do >> >> this or not (though when approving the PEP I considered this and >> decided not >> >> to insert a check for this). But it doesn't affect your example, does >> it? I >> >> guess it's easier to raise now and change the API in the future to >> avoid >> >> raising in this case (if we find that raising is undesirable) than the >> other >> >> way around, so I'm +0 on this. >> > >> > +0 from me as well. I know in some code in the stdlib that has been >> ported >> > which prior to adding support was explicitly checking for str/bytes this >> > will eliminate its own checking (obviously not a motivating factor as >> it's >> > pretty minor). >> >> I'd like a strong assertion that the return value of os.fspath() is a >> plausible filesystem path representation (so either bytes or str), and >> *not* some other kind of object that can also be used for accessing >> the filesystem (like a file descriptor or an IO stream) >> >> >> The other proposal (passing anything that's not understood right >> through) >> >> is more interesting and your use case is somewhat compelling. Catching >> the >> >> exception coming out of os.fspath() would certainly be much messier. >> The >> >> question remaining is whether, when this behavior is not desired (e.g. >> when >> >> the caller of os.fspath() just wants a string that it can pass to >> open()), >> >> the condition of passing that's neither a string not supports >> __fspath__ >> >> still produces an understandable error. I'm not sure that that's the >> case. >> >> E.g. 
open() accepts file descriptors in addition to paths, but I'm not >> sure >> >> that accepting an integer is a good idea in most cases -- it either >> gives a >> >> mystery "Bad file descriptor" error or starts reading/writing some >> random >> >> system file, which it then closes once the stream is closed. >> > >> > The FD issue of magically passing through an int was also a concern when >> > Ethan brought this up in an issue on the tracker. My argument is that >> FDs >> > are not file paths and so shouldn't magically pass through if we're >> going to >> > type-check anything or claim os.fspath() only works with paths (FDs are >> > already open file objects). So in my view either we go ahead and >> type-check >> > the return value of __fspath__() and thus restrict everything coming >> out of >> > os.fspath() to Union[str, bytes] or we don't type check anything and be >> > consistent that os.fspath() simply does is call __fspath__() if present. >> > >> > And just because I'm thinking about it, I would special-case the FDs, >> not >> > os.PathLike (clearer why you care and faster as it skips the override of >> > __subclasshook__): >> > >> > # Can be a single-line ternary operator if preferred. >> > if not isinstance(filename, int): >> > filename = os.fspath(filename) >> >> Note that the LZMA case Ethan cites is one where the code accepts >> either an already opened file-like object *or* a path-like object, and >> does different things based on which it receives. 
>> >> In that scenario, rather than introducing an unconditional "filename = >> os.fspath(filename)" before the current logic, it makes more sense to >> me to change the current logic to use the new protocol check rather >> than a strict typecheck on str/bytes: >> >> if isinstance(filename, os.PathLike): # Changed line >> filename = os.fspath(filename) # New line >> if "b" not in mode: >> mode += "b" >> self._fp = builtins.open(filename, mode) >> self._closefp = True >> self._mode = mode_code >> elif hasattr(filename, "read") or hasattr(filename, "write"): >> self._fp = filename >> self._mode = mode_code >> else: >> raise TypeError( >> "filename must be a path-like or file-like object" >> ) >> >> I *don't* think it makes sense to weaken the guarantees on os.fspath >> to let it propagate non-path-like objects. >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Jun 15 14:46:21 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jun 2016 11:46:21 -0700 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: <5761A27D.3070004@stoneleaf.us> On 06/15/2016 10:59 AM, Brett Cannon wrote: > On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote: >> These are really two separate proposals. >> >> I'm okay with checking the return value of calling obj.__fspath__; >> that's an error in the object anyways, and it doesn't matter much >> whether we do this or not (though when approving the PEP I >> considered this and decided not to insert a check for this). But it >> doesn't affect your example, does it? I guess it's easier to raise >> now and change the API in the future to avoid raising in this case >> (if we find that raising is undesirable) than the other way around, >> so I'm +0 on this. > > +0 from me as well. 
I know in some code in the stdlib that has been > ported which prior to adding support was explicitly checking for > str/bytes this will eliminate its own checking (obviously not a > motivating factor as it's pretty minor). If we accept both parts of this proposal the checking will have to stay in place as the original argument may not have been bytes, str, nor os.PathLike. >> The other proposal (passing anything that's not understood right >> through) is more interesting and your use case is somewhat >> compelling. Catching the exception coming out of os.fspath() would >> certainly be much messier. The question remaining is whether, when >> this behavior is not desired (e.g. when the caller of os.fspath() >> just wants a string that it can pass to open()), the condition of >> passing that's neither a string not supports __fspath__ still >> produces an understandable error. This is no different than before os.fspath() existed -- if the function wasn't checking that the "filename" was a str but just used it as-is, then whatever strange, possibly-hard-to-debug error they would get now is the same as what they would have gotten before. >> I'm not sure that that's the case. >> E.g. open() accepts file descriptors in addition to paths, but I'm >> not sure that accepting an integer is a good idea in most cases -- >> it either gives a mystery "Bad file descriptor" error or starts >> reading/writing some random system file, which it then closes once >> the stream is closed. My vision of os.fspath() is simply to reduce rich-path objects to their component str or bytes representation, and pass anything else through. 
The advantage:

  - if os.open accepts str/bytes/fd it can prep the argument by calling
    os.fspath() and then do its argument checking all in one place;
  - if lzma accepts bytes/str/filelike-obj it can prep its argument by
    calling os.fspath() and then do its argument checking all in one place;
  - if Path accepts str/os.PathLike it can prep its argument(s) with
    os.fspath() and then do its argument checking all in one place.

> The FD issue of magically passing through an int was also a concern when
> Ethan brought this up in an issue on the tracker. My argument is that
> FDs are not file paths and so shouldn't magically pass through if we're
> going to type-check anything or claim os.fspath() only works with paths
> (FDs are already open file objects). So in my view either we go ahead
> and type-check the return value of __fspath__() and thus restrict
> everything coming out of os.fspath() to Union[str, bytes], or we don't
> type-check anything and stay consistent in that all os.fspath() does is
> call __fspath__() if present.

This is better than what os.fspath() currently does as it has all the
advantages listed above, but why is checking the output of __fspath__
incompatible with not checking anything else?

> And just because I'm thinking about it, I would special-case the FDs,
> not os.PathLike (clearer why you care and faster as it skips the
> override of __subclasshook__):
>
>     # Can be a single-line ternary operator if preferred.
>     if not isinstance(filename, int):
>         filename = os.fspath(filename)

That example will not do the right thing in the lzma case.
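[Editor's note: for contrast, Ethan's "reduce rich-path objects, pass everything else through" vision could be sketched roughly as below. This is a hypothetical variant for illustration only -- it is not the behaviour that was ultimately adopted -- and the ArchivePath class is made up.]

```python
def fspath_passthrough(path):
    """Sketch of the pass-through variant: objects providing __fspath__
    are reduced to their str/bytes representation; any other object
    (str, bytes, a file descriptor, a file-like object, ...) is returned
    unchanged so the caller can do its own argument checking."""
    dunder = getattr(type(path), "__fspath__", None)
    if dunder is None:
        return path  # pass anything non-path-like through unchanged
    result = dunder(path)
    # Still type-check what __fspath__() itself returns.
    if not isinstance(result, (str, bytes)):
        raise TypeError("expected __fspath__() to return str or bytes, not "
                        + type(result).__name__)
    return result


class ArchivePath:  # illustrative rich-path object
    def __fspath__(self):
        return "/tmp/data.xz"
```

Under this sketch lzma could call fspath_passthrough(filename) unconditionally: an ArchivePath becomes "/tmp/data.xz", while a str, bytes, fd, or file-like object falls through to the existing isinstance()/hasattr() dispatch unchanged.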
-- ~Ethan~ From k7hoven at gmail.com Wed Jun 15 14:48:36 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 15 Jun 2016 21:48:36 +0300 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: On Wed, Jun 15, 2016 at 9:29 PM, Nick Coghlan wrote: > On 15 June 2016 at 10:59, Brett Cannon wrote: >> >> >> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote: >>> >>> These are really two separate proposals. >>> >>> I'm okay with checking the return value of calling obj.__fspath__; that's >>> an error in the object anyways, and it doesn't matter much whether we do >>> this or not (though when approving the PEP I considered this and decided not >>> to insert a check for this). But it doesn't affect your example, does it? I >>> guess it's easier to raise now and change the API in the future to avoid >>> raising in this case (if we find that raising is undesirable) than the other >>> way around, so I'm +0 on this. >> >> +0 from me as well. I know in some code in the stdlib that has been ported >> which prior to adding support was explicitly checking for str/bytes this >> will eliminate its own checking (obviously not a motivating factor as it's >> pretty minor). > > I'd like a strong assertion that the return value of os.fspath() is a > plausible filesystem path representation (so either bytes or str), and > *not* some other kind of object that can also be used for accessing > the filesystem (like a file descriptor or an IO stream) I agree, so I'm -0.5 on passing through any object (at least by default). >>> The other proposal (passing anything that's not understood right through) >>> is more interesting and your use case is somewhat compelling. Catching the >>> exception coming out of os.fspath() would certainly be much messier. The >>> question remaining is whether, when this behavior is not desired (e.g. 
when >>> the caller of os.fspath() just wants a string that it can pass to open()), >>> the condition of passing that's neither a string not supports __fspath__ >>> still produces an understandable error. I'm not sure that that's the case. >>> E.g. open() accepts file descriptors in addition to paths, but I'm not sure >>> that accepting an integer is a good idea in most cases -- it either gives a >>> mystery "Bad file descriptor" error or starts reading/writing some random >>> system file, which it then closes once the stream is closed. >> >> The FD issue of magically passing through an int was also a concern when >> Ethan brought this up in an issue on the tracker. My argument is that FDs >> are not file paths and so shouldn't magically pass through if we're going to >> type-check anything or claim os.fspath() only works with paths (FDs are >> already open file objects). So in my view either we go ahead and type-check >> the return value of __fspath__() and thus restrict everything coming out of >> os.fspath() to Union[str, bytes] or we don't type check anything and be >> consistent that os.fspath() simply does is call __fspath__() if present. >> >> And just because I'm thinking about it, I would special-case the FDs, not >> os.PathLike (clearer why you care and faster as it skips the override of >> __subclasshook__): >> >> # Can be a single-line ternary operator if preferred. >> if not isinstance(filename, int): >> filename = os.fspath(filename) > > Note that the LZMA case Ethan cites is one where the code accepts > either an already opened file-like object *or* a path-like object, and > does different things based on which it receives. 
> > In that scenario, rather than introducing an unconditional "filename = > os.fspath(filename)" before the current logic, it makes more sense to > me to change the current logic to use the new protocol check rather > than a strict typecheck on str/bytes: > > if isinstance(filename, os.PathLike): # Changed line > filename = os.fspath(filename) # New line You are making one of my earlier points here, thanks ;). The point is that the name PathLike sounds like it would mean anything path-like, except that os.PathLike does not include str and bytes. And I still think the naming should be a little different. So that would be (os.Pathlike, str, bytes) instead of just os.PathLike. > if "b" not in mode: > mode += "b" > self._fp = builtins.open(filename, mode) > self._closefp = True > self._mode = mode_code > elif hasattr(filename, "read") or hasattr(filename, "write"): > self._fp = filename > self._mode = mode_code > else: > raise TypeError( > "filename must be a path-like or file-like object" > ) > > I *don't* think it makes sense to weaken the guarantees on os.fspath > to let it propagate non-path-like objects. > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com -- + Koos Zevenhoven + http://twitter.com/k7hoven + From ncoghlan at gmail.com Wed Jun 15 14:55:34 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jun 2016 11:55:34 -0700 Subject: [Python-Dev] Smoothing the transition from Python 2 to 3 In-Reply-To: References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca> Message-ID: On 10 June 2016 at 16:36, Neil Schemenauer wrote: > Nick Coghlan wrote: >> It could be very interesting to add an "ascii-warn" codec to Python >> 2.7, and then set that as the default encoding when the -3 flag is >> set. > > I don't think that can work. The library code in Python would spew > out warnings even in the cases when nothing is wrong with the > application code. I think warnings have to be added to a Python > where str and bytes have been properly separated. Without extreme > backporting efforts, that means 3.x. > > We don't want to saddle 3.x with a bunch of backwards compatibility > cruft. Maybe some of my runtime warning changes could be merged > using a command line flag to enable them. It would be nice to have > the stepping stone version just be normal 3.x with a command line > option. However, for the sanity of people maintaining 3.x, I think > perhaps we don't want to do it. Right, my initial negative reactions were mainly to the idea of having these kinds of capabilities in the mainline 3.x codebase (where we'd then have to support them for everyone, not just the folks that genuinely need them to help in migration from Python 2). 
The standard porting instructions currently assume code bases that are *mostly* bytes/unicode clean, with perhaps a few oversights where Python 3 rejects ambiguity that Python 2 tolerates. In that context, "run your test suite, address the test failures" should generally be sufficient, without needing to use a custom Python build. However, there are a couple of cases those standard instructions still don't cover: - if there's no test suite, exploratory discovery is problematic when the app falls over at the first type ambiguity - even if there is a test suite, sufficiently pervasive type ambiguity may make it difficult to use for fault isolation That's where I now agree your proposal for a variant build specifically aimed at compatibility testing is potentially interesting: - the tool would become an escalation path for folks that aren't in a position to use their own test suite to isolate type ambiguity problems under Python 3 - using Python 3 as a basis means you get a clean standard library that shouldn't emit any false alarms - the necessary feature set is defined by the common subset of Python 2.7 and a chosen minimum Python 3 version, not any future 3.x release, so you should be able to maintain the changes as a stable patch set without needing to chase CPython trunk (with the attendant risk of merge conflicts) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Wed Jun 15 15:09:39 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jun 2016 12:09:39 -0700 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: <5761A7F3.4040401@stoneleaf.us> On 06/15/2016 11:44 AM, Brett Cannon wrote: > On Wed, 15 Jun 2016 at 11:39 Guido van Rossum wrote: >> OK, so let's add a check on the return of __fspath__() and keep the >> check on path-like or string/bytes. > > I'll update the PEP. 
> > Ethan, do you want to leave a note on the os.fspath() issue to update > the code and go through where we've used os.fspath() to see where we can > cut out redundant type checks? Will do. I didn't see this subthread before my last post, so unless you agree with those other changes feel free to ignore it. ;) -- ~Ethan~ From k7hoven at gmail.com Wed Jun 15 15:10:11 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 15 Jun 2016 22:10:11 +0300 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: >> if isinstance(filename, os.PathLike): By the way, regarding the line of code above, is there a convention regarding whether implementing some protocol/interface requires registering with (or inheriting from) the appropriate ABC for it to work in all situations. IOW, in this case, is it sufficient to implement __fspath__ to make your type pathlike? Is there a conscious trend towards requiring the ABC? -- Koos From brett at python.org Wed Jun 15 15:15:01 2016 From: brett at python.org (Brett Cannon) Date: Wed, 15 Jun 2016 19:15:01 +0000 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: On Wed, 15 Jun 2016 at 12:12 Koos Zevenhoven wrote: > >> if isinstance(filename, os.PathLike): > > By the way, regarding the line of code above, is there a convention > regarding whether implementing some protocol/interface requires > registering with (or inheriting from) the appropriate ABC for it to > work in all situations. IOW, in this case, is it sufficient to > implement __fspath__ to make your type pathlike? Is there a conscious > trend towards requiring the ABC? > ABCs like os.PathLike can override __subclasshook__ so that registration isn't required (see https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So registration is definitely good to do to be explicit that you're trying to meet an ABC, but it isn't strictly required. 
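[Editor's note: the __subclasshook__ mechanism Brett points to can be sketched as follows, patterned after the os.py code linked above; the MyPath class is illustrative. Because the hook only tests for a __fspath__ attribute, isinstance() and issubclass() succeed without inheritance or an explicit register() call.]

```python
import abc


class PathLike(abc.ABC):
    """Sketch of an os.PathLike-style ABC."""

    @abc.abstractmethod
    def __fspath__(self):
        """Return the file system path representation of the object."""
        raise NotImplementedError

    @classmethod
    def __subclasshook__(cls, subclass):
        # Any class providing __fspath__ counts as a virtual subclass.
        return hasattr(subclass, "__fspath__")


class MyPath:  # no inheritance from PathLike, no PathLike.register(MyPath)
    def __fspath__(self):
        return "/some/where"
```

Here isinstance(MyPath(), PathLike) is True purely because MyPath defines __fspath__, while isinstance(3, PathLike) is False -- which is why registration, although good for being explicit, isn't strictly required.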
-------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Jun 15 15:16:38 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jun 2016 12:16:38 -0700 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: <5761A996.3010101@stoneleaf.us> On 06/15/2016 12:10 PM, Koos Zevenhoven wrote: >>> if isinstance(filename, os.PathLike): > > By the way, regarding the line of code above, is there a convention > regarding whether implementing some protocol/interface requires > registering with (or inheriting from) the appropriate ABC for it to > work in all situations. IOW, in this case, is it sufficient to > implement __fspath__ to make your type pathlike? Is there a conscious > trend towards requiring the ABC? The ABC is not required, simply having the __fspath__ attribute is enough. Of course, to actually work that attribute should be a function that returns a str or bytes object. ;) -- ~Ethan~ From k7hoven at gmail.com Wed Jun 15 15:24:43 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 15 Jun 2016 22:24:43 +0300 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: On Wed, Jun 15, 2016 at 10:15 PM, Brett Cannon wrote: > > > On Wed, 15 Jun 2016 at 12:12 Koos Zevenhoven wrote: >> >> >> if isinstance(filename, os.PathLike): >> >> By the way, regarding the line of code above, is there a convention >> regarding whether implementing some protocol/interface requires >> registering with (or inheriting from) the appropriate ABC for it to >> work in all situations. IOW, in this case, is it sufficient to >> implement __fspath__ to make your type pathlike? Is there a conscious >> trend towards requiring the ABC? > > > ABCs like os.PathLike can override __subclasshook__ so that registration > isn't required (see > https://hg.python.org/cpython/file/default/Lib/os.py#l1136). 
So registration > is definitely good to do to be explicit that you're trying to meet an ABC, > but it isn't strictly required. Ok I suppose that's fine, so I propose we update the ABC part in the PEP with __subclasshook__. And the other question could be turned into whether to make str and bytes also PathLike in __subclasshook__. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + From brett at python.org Wed Jun 15 15:57:28 2016 From: brett at python.org (Brett Cannon) Date: Wed, 15 Jun 2016 19:57:28 +0000 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: PEP 519 updated: https://hg.python.org/peps/rev/92feff129ee4 On Wed, 15 Jun 2016 at 11:44 Brett Cannon wrote: > On Wed, 15 Jun 2016 at 11:39 Guido van Rossum wrote: > >> OK, so let's add a check on the return of __fspath__() and keep the check >> on path-like or string/bytes. >> > > I'll update the PEP. > > Ethan, do you want to leave a note on the os.fspath() issue to update the > code and go through where we've used os.fspath() to see where we can cut > out redundant type checks? > > >> --Guido (mobile) >> On Jun 15, 2016 11:29 AM, "Nick Coghlan" wrote: >> >>> On 15 June 2016 at 10:59, Brett Cannon wrote: >>> > >>> > >>> > On Wed, 15 Jun 2016 at 09:48 Guido van Rossum >>> wrote: >>> >> >>> >> These are really two separate proposals. >>> >> >>> >> I'm okay with checking the return value of calling obj.__fspath__; >>> that's >>> >> an error in the object anyways, and it doesn't matter much whether we >>> do >>> >> this or not (though when approving the PEP I considered this and >>> decided not >>> >> to insert a check for this). But it doesn't affect your example, does >>> it? I >>> >> guess it's easier to raise now and change the API in the future to >>> avoid >>> >> raising in this case (if we find that raising is undesirable) than >>> the other >>> >> way around, so I'm +0 on this. >>> > >>> > +0 from me as well. 
I know in some code in the stdlib that has been >>> ported >>> > which prior to adding support was explicitly checking for str/bytes >>> this >>> > will eliminate its own checking (obviously not a motivating factor as >>> it's >>> > pretty minor). >>> >>> I'd like a strong assertion that the return value of os.fspath() is a >>> plausible filesystem path representation (so either bytes or str), and >>> *not* some other kind of object that can also be used for accessing >>> the filesystem (like a file descriptor or an IO stream) >>> >>> >> The other proposal (passing anything that's not understood right >>> through) >>> >> is more interesting and your use case is somewhat compelling. >>> Catching the >>> >> exception coming out of os.fspath() would certainly be much messier. >>> The >>> >> question remaining is whether, when this behavior is not desired >>> (e.g. when >>> >> the caller of os.fspath() just wants a string that it can pass to >>> open()), >>> >> the condition of passing that's neither a string not supports >>> __fspath__ >>> >> still produces an understandable error. I'm not sure that that's the >>> case. >>> >> E.g. open() accepts file descriptors in addition to paths, but I'm >>> not sure >>> >> that accepting an integer is a good idea in most cases -- it either >>> gives a >>> >> mystery "Bad file descriptor" error or starts reading/writing some >>> random >>> >> system file, which it then closes once the stream is closed. >>> > >>> > The FD issue of magically passing through an int was also a concern >>> when >>> > Ethan brought this up in an issue on the tracker. My argument is that >>> FDs >>> > are not file paths and so shouldn't magically pass through if we're >>> going to >>> > type-check anything or claim os.fspath() only works with paths (FDs are >>> > already open file objects). 
So in my view either we go ahead and >>> type-check >>> > the return value of __fspath__() and thus restrict everything coming >>> out of >>> > os.fspath() to Union[str, bytes] or we don't type check anything and be >>> > consistent that os.fspath() simply does is call __fspath__() if >>> present. >>> > >>> > And just because I'm thinking about it, I would special-case the FDs, >>> not >>> > os.PathLike (clearer why you care and faster as it skips the override >>> of >>> > __subclasshook__): >>> > >>> > # Can be a single-line ternary operator if preferred. >>> > if not isinstance(filename, int): >>> > filename = os.fspath(filename) >>> >>> Note that the LZMA case Ethan cites is one where the code accepts >>> either an already opened file-like object *or* a path-like object, and >>> does different things based on which it receives. >>> >>> In that scenario, rather than introducing an unconditional "filename = >>> os.fspath(filename)" before the current logic, it makes more sense to >>> me to change the current logic to use the new protocol check rather >>> than a strict typecheck on str/bytes: >>> >>> if isinstance(filename, os.PathLike): # Changed line >>> filename = os.fspath(filename) # New line >>> if "b" not in mode: >>> mode += "b" >>> self._fp = builtins.open(filename, mode) >>> self._closefp = True >>> self._mode = mode_code >>> elif hasattr(filename, "read") or hasattr(filename, "write"): >>> self._fp = filename >>> self._mode = mode_code >>> else: >>> raise TypeError( >>> "filename must be a path-like or file-like object" >>> ) >>> >>> I *don't* think it makes sense to weaken the guarantees on os.fspath >>> to let it propagate non-path-like objects. >>> >>> Cheers, >>> Nick. >>> >>> -- >>> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Wed Jun 15 16:00:01 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jun 2016 13:00:01 -0700 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: References: <57617E57.40808@stoneleaf.us> Message-ID: <5761B3C1.3060601@stoneleaf.us> On 06/15/2016 12:24 PM, Koos Zevenhoven wrote: > On Wed, Jun 15, 2016 at 10:15 PM, Brett Cannon wrote: >> ABCs like os.PathLike can override __subclasshook__ so that registration >> isn't required (see >> https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So registration >> is definitely good to do to be explicit that you're trying to meet an ABC, >> but it isn't strictly required. > And the other question could be turned into whether to make str and > bytes also PathLike in __subclasshook__. No, for two reasons. - most str's and bytes' are not paths; - PathLike indicates a rich-path object, which str's and bytes' are not. -- ~Ethan~ From ncoghlan at gmail.com Wed Jun 15 16:01:27 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jun 2016 13:01:27 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160613122654.GE17328@thunk.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> Message-ID: [whew, actually read the whole thread] On 11 June 2016 at 10:28, Terry Reedy wrote: > On 6/11/2016 11:34 AM, Guido van Rossum wrote: >> >> In terms of API design, I'd prefer a flag to os.urandom() indicating a >> preference for >> - blocking >> - raising an exception >> - weaker random bits > > > +100 ;-) > > I proposed exactly this 2 days ago, 5 hours after Larry's initial post. No, this is a bad idea. 
Asking novice developers to make security decisions they're not yet qualified to make when it's genuinely possible for us to do the right thing by default is the antithesis of good security API design, and os.urandom() *is* a security API (whether we like it or not - third party documentation written by the cryptographic software development community has made it so, since it's part of their guidelines for writing security sensitive code in pure Python). Adding *new* APIs is also a bad idea, since "os.urandom() is the right answer on every OS except Linux, and also the best currently available answer on Linux" has been the standard security advice for generating cryptographic secrets in pure Python code for years now, so we should only change that guidance if we have extraordinarily compelling reasons to do so, and we don't. Instead, we have Ted T'so himself chiming in to say: "My preference would be that os.[u]random should block, because the odds that people would be trying to generate long-term cryptographic secrets within seconds after boot is very small, and if you *do* block for a second or two, it's not the end of the world." The *actual bug* that triggered this latest firestorm of commentary (from experts and non-experts alike) had *nothing* to do with user code calling os.urandom, and instead was a combination of: - CPython startup requesting cryptographically secure randomness when it didn't need it - a systemd init script written in Python running before the kernel RNG was fully initialised That created a deadlock between CPython startup and the rest of the Linux init process, so the latter only continued when the systemd watchdog timed out and killed the offending script. As others have noted, this kind of deadlock scenario is generally impossible on other operating systems, as the operating system doesn't provide a way to run Python code before the random number generator is ready. 
The change Victor made in 3.5.2 to fall back to reading /dev/urandom directly if the getrandom() syscall returns EAGAIN (effectively reverting to the Python 3.4 behaviour) was the simplest possible fix for that problem (and an approach I thoroughly endorse, both for 3.5.2 and for the life of the 3.5 series), but that doesn't make it the right answer for 3.6+. To repeat: the problem encountered was NOT due to user code calling os.urandom(), but rather due to the way CPython initialises its own internal hash algorithm at interpreter startup. However, due to the way CPython is currently implemented, fixing the regression in that not only changed the behaviour of CPython startup, it *also* changed the behaviour of every call to os.urandom() in Python 3.5.2+. For 3.6+, we can instead make it so that the only things that actually rely on cryptographic quality randomness being available are: - calling a secrets module API - calling a random.SystemRandom method - calling os.urandom directly These are all APIs that were either created specifically for use in security sensitive situations (secrets module), or have long been documented (both within our own documentation, and in third party documentation, books and Q&A sites) as being an appropriate choice for use in security sensitive situations (os.urandom and random.SystemRandom). However, we don't need to make those block waiting for randomness to be available - we can update them to raise BlockingIOError instead (which makes it trivial for people to decide for themselves how they want to handle that case). Along with that change, we can make it so that starting the interpreter will never block waiting for cryptographic randomness to be available (since it doesn't need it), and importing the random module won't block waiting for it either. 
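[Editor's sketch of the "loop until the exception is no longer thrown" option from the decision list that follows, assuming os.urandom() raised BlockingIOError as proposed here. This is illustrative only -- the behaviour that actually shipped in 3.6 (PEP 524) differed, and on an already-booted system os.urandom() simply returns immediately.]

```python
import os
import time


def urandom_with_retry(nbytes, retries=5, delay=1.0):
    """Fetch nbytes of OS randomness, retrying while the kernel
    pool is still uninitialised (hypothetical early-boot case)."""
    for _ in range(retries):
        try:
            return os.urandom(nbytes)
        except BlockingIOError:
            # Under the proposal, this means the entropy pool is not
            # ready yet; wait briefly and try again.
            time.sleep(delay)
    raise RuntimeError("kernel entropy pool never became ready")
```

A caller that would rather degrade than wait could instead catch the exception once and fall back to the random module, per the third option below.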
To the best of our knowledge, on all operating systems other than Linux, encountering the new exception will still be impossible in practice, as there is no known opportunity to run Python code before the kernel random number generator is ready. On Linux, init scripts may still run before the kernel random number generator is ready, but will now throw an immediate BlockingIOError if they access an API that relies on cryptographic randomness being available, rather than potentially deadlocking the init process. Folks encountering that situation will then need to make an explicit decision: - loop until the exception is no longer thrown - switch to reading from /dev/urandom directly instead of calling os.urandom() - switch to using a cross-platform non-cryptographic API (probably the random module) Victor has some additional technical details written up at http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to formalise this proposed approach as a PEP (the current reference is http://bugs.python.org/issue27282 ) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Wed Jun 15 16:30:33 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jun 2016 13:30:33 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> Message-ID: <5761BAE9.10604@stoneleaf.us> On 06/15/2016 01:01 PM, Nick Coghlan wrote: > For 3.6+, we can instead make it so that the only things that actually > rely on cryptographic quality randomness being available are: > > - calling a secrets module API > - calling a random.SystemRandom method > - calling os.urandom directly > > However, we don't need to make those block waiting for randomness to > be available - we can update them to raise BlockingIOError instead > (which makes it trivial for people to decide for themselves how they > want to handle that case). > > Along with that change, we can make it so that starting the > interpreter will never block waiting for cryptographic randomness to > be available (since it doesn't need it), and importing the random > module won't block waiting for it either. +1 -- ~Ethan~ From ncoghlan at gmail.com Wed Jun 15 16:30:19 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jun 2016 13:30:19 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On 14 June 2016 at 02:41, Nikita Nemkin wrote: > Is there any rationale for rejecting alternatives like: Good questions - Eric, it's likely worth capturing answers to these in the PEP for the benefit of future readers. > 1. Adding standard metaclass with ordered namespace. 
Adding metaclasses to an existing class can break compatibility with third party subclasses, so making it possible for people to avoid that while still gaining the ability to implicitly expose attribute ordering to class decorators and other potentially interested parties is a recurring theme behind this PEP and also PEPs 422 and 487. > 2. Adding `namespace` or `ordered` args to the default metaclass. See below (as it relates to your own complexity argument) > 3. Making compiler fill in __definition_order__ for every class > (just like __qualname__) without touching the runtime. Class scopes support conditionals and loops, so we can't necessarily be sure what names will be assigned without running the code. It's also possible to make attribute assignments via locals() that are entirely opaque to the compiler, but visible to the interpreter at runtime. > To me, any of the above seems preferred to complicating > the core part of the language forever. > > The vast majority of Python classes don't care about their member > order, this is minority use case receiving majority treatment. > > Also, wiring OrderedDict into class creation means elevating it > from a peripheral utility to indispensable built-in type. Right, that's one of the key reasons this is a PEP, rather than just an item on the issue tracker. The rationale for "Why not make this configurable, rather than switching it unilaterally?" is that it's actually *simpler* overall to just make it the default - we can then change the documentation to say "class bodies are evaluated in a collections.OrderedDict instance by default" and record the consequences of that, rather than having to document yet another class customisation mechanism. 
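[Editor's sketch of the metaclass boilerplate under discussion: capturing definition order today requires a custom metaclass whose __prepare__ returns an OrderedDict, which is exactly what PEP 520 would make unnecessary. Names here (OrderedNamespaceMeta, _definition_order) are illustrative, not part of the PEP.]

```python
from collections import OrderedDict


class OrderedNamespaceMeta(type):
    """Pre-PEP-520 approach: have the class body execute in an
    OrderedDict so definition order is observable afterwards."""

    @classmethod
    def __prepare__(mcs, name, bases, **kwds):
        return OrderedDict()

    def __new__(mcs, name, bases, namespace, **kwds):
        cls = super().__new__(mcs, name, bases, dict(namespace))
        # Record the user-defined names in the order they appeared,
        # skipping implicit dunders like __module__/__qualname__.
        cls._definition_order = [
            name for name in namespace if not name.startswith("__")
        ]
        return cls


class Point(metaclass=OrderedNamespaceMeta):
    def move(self): ...
    def draw(self): ...
    def erase(self): ...


print(Point._definition_order)  # ['move', 'draw', 'erase']
```

A class decorator can consume that ordering, but as noted above, retrofitting the metaclass onto an existing class risks breaking third-party subclasses.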
It also eliminates boilerplate from class decorator usage instructions, where people have to write "to use this class decorator, you must also specify 'namespace=collections.OrderedDict' in your class header" Folks that don't need the ordering information do end up paying a slight import time and memory cost, which is another key reason for handling the proposal as a PEP rather than just as a tracker issue. Aside from the boilerplate reduction when used in conjunction with a class decorator, a further possible category of consumers would be documentation generators like pydoc and Sphinx apidoc, which may be able to switch to displaying methods in definition order, rather than the current approach of always listing them in alphabetical order. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From k7hoven at gmail.com Wed Jun 15 16:58:17 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 15 Jun 2016 23:58:17 +0300 Subject: [Python-Dev] proposed os.fspath() change In-Reply-To: <5761B3C1.3060601@stoneleaf.us> References: <57617E57.40808@stoneleaf.us> <5761B3C1.3060601@stoneleaf.us> Message-ID: On Wed, Jun 15, 2016 at 11:00 PM, Ethan Furman wrote: > On 06/15/2016 12:24 PM, Koos Zevenhoven wrote: >> >> And the other question could be turned into whether to make str and >> bytes also PathLike in __subclasshook__. > > No, for two reasons. > > - most str's and bytes' are not paths; True. Well, at least most str and bytes objects are not *meant* to be used as paths, even if they could be. > - PathLike indicates a rich-path object, which str's and bytes' are not. This does not count as a reason. If this were called pathlib.PathABC, I would definitely agree [1]. But since this is called os.PathLike, I'm not quite as sure. Anyway, including str and bytes is more of a type hinting issue. And since type hints will also act as documentation, the naming of types is becoming more important. 
-- Koos [1] No, I'm not proposing moving this to pathlib > -- > ~Ethan~ -- + Koos Zevenhoven + http://twitter.com/k7hoven + From andersjm at stofanet.dk Wed Jun 15 16:42:09 2016 From: andersjm at stofanet.dk (Anders J. Munch) Date: Wed, 15 Jun 2016 22:42:09 +0200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: References: <20160614151935.GY27919@ando.pearwood.info> <20160615123401.GB27919@ando.pearwood.info> Message-ID: Paul Moore: > Finding out whether users/projects typically write such a helper > function for themselves would be a better way of getting this > information. Personally, I suspect they don't, but facts beat > speculation. Well, I did. It was necessary to get 2to3 conversion to work(*). I turned every occurrence of E.encode('base-64') and E.decode('base-64') into helper function calls that for Python 3 did: b64encode(E).decode('ascii') and b64decode(E.encode('ascii')) (Or something similar, I don't have the code in front of me.) Leaving out .decode/.encode('ascii') would simply not have worked. That would just be asking for TypeError's. regards, Anders (*) Yes, I use 2to3, believe it or not. Maintaining Python 2 code and doing an automated conversion to Python 3 as needed. From njs at pobox.com Wed Jun 15 19:12:57 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 15 Jun 2016 16:12:57 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> Message-ID: On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan wrote: [...] > For 3.6+, we can instead make it so that the only things that actually > rely on cryptographic quality randomness being available are: > > - calling a secrets module API > - calling a random.SystemRandom method > - calling os.urandom directly > > These are all APIs that were either created specifically for use in > security sensitive situations (secrets module), or have long been > documented (both within our own documentation, and in third party > documentation, books and Q&A sites) as being an appropriate choice for > use in security sensitive situations (os.urandom and > random.SystemRandom). > > However, we don't need to make those block waiting for randomness to > be available - we can update them to raise BlockingIOError instead > (which makes it trivial for people to decide for themselves how they > want to handle that case). > > Along with that change, we can make it so that starting the > interpreter will never block waiting for cryptographic randomness to > be available (since it doesn't need it), and importing the random > module won't block waiting for it either. This all seems exactly right to me, to the point that I've been dreading having to find the time to write pretty much this exact email. So thank you :-) > To the best of our knowledge, on all operating systems other than > Linux, encountering the new exception will still be impossible in > practice, as there is no known opportunity to run Python code before > the kernel random number generator is ready. 
> > On Linux, init scripts may still run before the kernel random number > generator is ready, but will now throw an immediate BlockingIOError if > they access an API that relies on crytographic randomness being > available, rather than potentially deadlocking the init process. Folks > encountering that situation will then need to make an explicit > decision: > > - loop until the exception is no longer thrown > - switch to reading from /dev/urandom directly instead of calling os.urandom() > - switch to using a cross-platform non-cryptographic API (probably the > random module) > > Victor has some additional technical details written up at > http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to > formalise this proposed approach as a PEP (the current reference is > http://bugs.python.org/issue27282 ) I'd make two additional suggestions: - one person did chime in on the thread to say that they've used os.urandom for non-security-sensitive purposes, simply because it provided a convenient "give me a random byte-string" API that is missing from random. I think we should go ahead and add a .randbytes method to random.Random that simply returns a random bytestring using the regular RNG, to give these users a nice drop-in replacement for os.urandom. Rationale: I don't think the existence of these users should block making os.urandom appropriate for generating secrets, because (1) a glance at github shows that this is very unusual -- if you skim through this search you get page after page of functions with names like "generate_secret_key" https://github.com/search?l=python&p=2&q=urandom&ref=searchresults&type=Code&utf8=%E2%9C%93 and (2) for the minority of people who are using os.urandom for non-security-sensitive purposes, if they find os.urandom raising an error, then this is just a regular bug that they will notice immediately and fix, and anyway it's basically never going to happen. 
(As far as we can tell, this has never yet happened in the wild, even once.) OTOH if os.urandom is allowed to fail silently, then people who are using it to generate secrets will get silent catastrophic failures, plus those users can't assume it will never happen because they have to worry about active attackers trying to drive systems into unusual states. So I'd much rather ask the non-security-sensitive users to switch to using something in random, than force the cryptographic users to switch to using secrets. But it does seem like it would be good to give those non-security-sensitive users something to switch to :-). - It's not exactly true that the Python interpreter doesn't need cryptographic randomness to initialize SipHash -- it's more that *some* Python invocations need unguessable randomness (to first approximation: all those which are exposed to hostile input), and some don't. And since the Python interpreter has no idea which case it's in, and since it's unacceptable for it to break invocations that don't need unguessable hashes, then it has to err on the side of continuing without randomness. All that's fine. But, given that the interpreter doesn't know which state it's in, there's also the possibility that this invocation *will* be exposed to hostile input, and the 3.5.2+ behavior gives absolutely no warning that this is what's happening. So instead of letting this potential error pass silently, I propose that if SipHash fails to acquire real randomness at startup, then it should issue a warning. In practice, this will almost never happen. But in the rare cases it does, it at least gives the user a fighting chance to realize that their system is in a potentially dangerous state. And by using the warnings module, we automatically get quite a bit of flexibility. If some particular invocation (e.g. 
systemd-cron) has audited their code and decided that they don't care about this issue, they can make the message go away: PYTHONWARNINGS=ignore::NoEntropyAtStartupWarning OTOH if some particular invocation knows that they do process potentially hostile input early on (e.g. cloud-init, maybe?), then they can explicitly promote the warning to an error: PYTHONWARNINGS=error::NoEntropyAtStartupWarning (I guess the way to implement this would be for the SipHash initialization code -- which runs very early -- to set some flag, and then we expose that flag in sys._something, and later in the startup sequence check for it after the warnings module is functional. Exposing the flag at the Python level would also make it possible for code like cloud-init to do its own explicit check and respond appropriately.) -n -- Nathaniel J. Smith -- https://vorpus.org From ncoghlan at gmail.com Wed Jun 15 19:26:07 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jun 2016 16:26:07 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> Message-ID: On 15 June 2016 at 16:12, Nathaniel Smith wrote: > On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan wrote: >> Victor has some additional technical details written up at >> http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to >> formalise this proposed approach as a PEP (the current reference is >> http://bugs.python.org/issue27282 ) > > I'd make two additional suggestions: > > - one person did chime in on the thread to say that they've used > os.urandom for non-security-sensitive purposes, simply because it > provided a convenient "give me a random byte-string" API that is > missing from random. I think we should go ahead and add a .randbytes > method to random.Random that simply returns a random bytestring using > the regular RNG, to give these users a nice drop-in replacement for > os.urandom. That seems reasonable. > - It's not exactly true that the Python interpreter doesn't need > cryptographic randomness to initialize SipHash -- it's more that > *some* Python invocations need unguessable randomness (to first > approximation: all those which are exposed to hostile input), and some > don't. And since the Python interpreter has no idea which case it's > in, and since it's unacceptable for it to break invocations that don't > need unguessable hashes, then it has to err on the side of continuing > without randomness. All that's fine. > > But, given that the interpreter doesn't know which state it's in, > there's also the possibility that this invocation *will* be exposed to > hostile input, and the 3.5.2+ behavior gives absolutely no warning > that this is what's happening. 
So instead of letting this potential > error pass silently, I propose that if SipHash fails to acquire real > randomness at startup, then it should issue a warning. In practice, > this will almost never happen. But in the rare cases it does, it at > least gives the user a fighting chance to realize that their system is > in a potentially dangerous state. And by using the warnings module, we > automatically get quite a bit of flexibility. > > If some particular > invocation (e.g. systemd-cron) has audited their code and decided that > they don't care about this issue, they can make the message go away: > > PYTHONWARNINGS=ignore::NoEntropyAtStartupWarning > > OTOH if some particular invocation knows that they do process > potentially hostile input early on (e.g. cloud-init, maybe?), then > they can explicitly promote the warning to an error: > > PYTHONWARNINGS=error::NoEntropyAtStartupWarning > > (I guess the way to implement this would be for the SipHash > initialization code -- which runs very early -- to set some flag, and > then we expose that flag in sys._something, and later in the startup > sequence check for it after the warnings module is functional. > Exposing the flag at the Python level would also make it possible for > code like cloud-init to do its own explicit check and respond > appropriately.) A Python level warning/flag seems overly elaborate to me, but we can easily emit a warning on stderr when SipHash is initialised via the fallback rather than the operating system's RNG. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jake at lwn.net Wed Jun 15 18:30:44 2016 From: jake at lwn.net (Jake Edge) Date: Wed, 15 Jun 2016 16:30:44 -0600 Subject: [Python-Dev] Final round of the Python Language Summit coverage at LWN Message-ID: <20160615163044.5abe13af@redtail.lan> Hola python-dev, The final batch of articles from the Python Language Summit is now ready. 
The starting point is here: https://lwn.net/Articles/688969/ I have added the final six sessions (with SubscriberLinks for those without a subscription): Python 3 in Fedora: https://lwn.net/Articles/690676/ https://lwn.net/SubscriberLink/690676/cdf118081ac0ffd5/ The Python JITs are coming: https://lwn.net/Articles/691070/ https://lwn.net/SubscriberLink/691070/2714fd6a4934f016/ Pyjion: https://lwn.net/Articles/691152/ https://lwn.net/SubscriberLink/691152/6334fd8d5a9992c0/ Why is Python slow?: https://lwn.net/Articles/691243/ https://lwn.net/SubscriberLink/691243/669cb2bf2fe220c4/ Automated testing of CPython patches: https://lwn.net/Articles/691307/ https://lwn.net/SubscriberLink/691307/89feefecfe425f58/ The Python security response team: https://lwn.net/Articles/691308/ https://lwn.net/SubscriberLink/691308/432ff50e0f9b794f/ The articles will be freely available (without using the SubscriberLink) to the world at large in a week ... until then, feel free to share the SubscriberLinks. Hopefully I have captured things reasonably well. If there are corrections or clarifications needed, though, I recommend posting them as comments on the article. With luck, I will be able to sit in on the summit again next year ... enjoy! jake -- Jake Edge - LWN - jake at lwn.net - http://lwn.net From tytso at mit.edu Thu Jun 16 01:25:41 2016 From: tytso at mit.edu (Theodore Ts'o) Date: Thu, 16 Jun 2016 01:25:41 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> Message-ID: <20160616052541.GB32689@thunk.org> On Wed, Jun 15, 2016 at 04:12:57PM -0700, Nathaniel Smith wrote: > - It's not exactly true that the Python interpreter doesn't need > cryptographic randomness to initialize SipHash -- it's more that > *some* Python invocations need unguessable randomness (to first > approximation: all those which are exposed to hostile input), and some > don't. And since the Python interpreter has no idea which case it's > in, and since it's unacceptable for it to break invocations that don't > need unguessable hashes, then it has to err on the side of continuing > without randomness. All that's fine. In practice, those Python invocations which are exposed to hostile input are those that are started while the network is up. The vast majority of time, they are launched by the web browser --- and if this happens after a second or so of the system getting networking interrupts, (a) getrandom won't block, and (b) /dev/urandom and getrandom will be initialized. Also, I wish people would stop saying that this is only an issue on Linux. Again, FreeBSD's /dev/urandom will block as well if it is uninitialized. It's just that in practice, for both Linux and FreeBSD, we try very hard to make sure /dev/urandom is fully initialized by the time it matters. It's just that so far, it's only on Linux that there was an attempt to use Python in the early init scripts, in a VM, and in a system where everything is modularized such that the deadlock became visible. 
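[Editor's sketch of probing the condition Ted describes -- whether the kernel pool is initialised -- without risking a block. This assumes the os.getrandom() wrapper and os.GRND_NONBLOCK flag that were added in Python 3.6, and a Linux kernel with getrandom(2) (3.17+); on other platforms the probe simply reports unavailability.]

```python
import os


def entropy_pool_ready():
    """Return True/False for the kernel CSPRNG's initialisation
    state, or None where the probe isn't available."""
    try:
        # Ask for one byte, but fail immediately instead of blocking
        # if the pool is not yet initialised.
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False  # early boot: getrandom() would have blocked
    except (AttributeError, OSError):
        return None   # no getrandom() on this platform/kernel
    return True
```

An init script could use such a probe to decide between waiting and falling back, rather than discovering the uninitialised state via a hang.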
> (I guess the way to implement this would be for the SipHash > initialization code -- which runs very early -- to set some flag, and > then we expose that flag in sys._something, and later in the startup > sequence check for it after the warnings module is functional. > Exposing the flag at the Python level would also make it possible for > code like cloud-init to do its own explicit check and respond > appropriately.) I really don't think it's that big of a deal in *practice*, but if you really are concerned about the very remote possibility that a Python invocation could start in early boot, and *then* also stick around for the long term, and *then* be exposed to hostile input --- what if you set the flag, and then later on, after N minutes --- either automatically, or via some trigger such as cloud-init --- try and see if /dev/urandom is initialized (even a few seconds later, so long as the init scripts are hanging, it should be initialized), and then have Python hash all of its dicts, or maybe just the non-system dicts (since those are presumably the ones most likely to be exposed to hostile input). - Ted From greg.ewing at canterbury.ac.nz Thu Jun 16 01:59:26 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Jun 2016 17:59:26 +1200 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <20160615123401.GB27919@ando.pearwood.info> References: <20160614151935.GY27919@ando.pearwood.info> <20160615123401.GB27919@ando.pearwood.info> Message-ID: <5762403E.90701@canterbury.ac.nz> Steven D'Aprano wrote: > I'm > satisfied that the choice made by Python is the right choice, and that > it meets the spirit (if, arguably, not the letter) of the RFC. IMO it meets the letter (if you read it a certain way) but *not* the spirit. 
-- Greg From barry at python.org Thu Jun 16 02:45:08 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Jun 2016 09:45:08 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> Message-ID: <20160616094508.3acf1de7.barry@wooz.org> On Jun 15, 2016, at 01:01 PM, Nick Coghlan wrote: >No, this is a bad idea. Asking novice developers to make security >decisions they're not yet qualified to make when it's genuinely >possible for us to do the right thing by default is the antithesis of >good security API design, and os.urandom() *is* a security API >(whether we like it or not - third party documentation written by the >cryptographic software development community has made it so, since >it's part of their guidelines for writing security sensitive code in >pure Python). Regardless of what third parties have said about os.urandom(), let's look at what *we* have said about it. Going back to pre-churn 3.4 documentation: os.urandom(n) Return a string of n random bytes suitable for cryptographic use. This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom(). If a randomness source is not found, NotImplementedError will be raised. For an easy-to-use interface to the random number generator provided by your platform, please see random.SystemRandom. 
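[Editor's note: the random.SystemRandom interface that docstring points to wraps os.urandom() in the familiar random API; a minimal illustrative sketch (the specific calls chosen here are just examples, not from the thread):]

```python
import random

sysrand = random.SystemRandom()  # draws its randomness from os.urandom()

n = sysrand.randrange(100)        # OS-backed integer in [0, 100)
token = sysrand.getrandbits(128)  # OS-backed 128-bit integer
assert 0 <= n < 100
assert 0 <= token < 2 ** 128

# Unlike random.Random, there is no reproducible internal state to save:
try:
    sysrand.getstate()
except NotImplementedError:
    pass  # expected -- the state lives in the OS, not the process
```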
So we very clearly provided platform-dependent caveats on the cryptographic quality of os.urandom(). We also made a strong claim that there's a direct connection between os.urandom() and /dev/urandom on "Unix-like system(s)". We broke that particular promise in 3.5, and semi-fixed it in 3.5.2. >Adding *new* APIs is also a bad idea, since "os.urandom() is the right >answer on every OS except Linux, and also the best currently available >answer on Linux" has been the standard security advice for generating >cryptographic secrets in pure Python code for years now, so we should >only change that guidance if we have extraordinarily compelling >reasons to do so, and we don't. Disagree. We have broken one long-term promise on os.urandom() ("On a Unix-like system this will query /dev/urandom") and changed another ("should be unpredictable enough for cryptographic applications, though its exact quality depends on OS implementations"). We broke the experienced Linux developer's natural and long-standing link between the API called os.urandom() and /dev/urandom. This breaks pre-3.5 code that assumes read-from-/dev/urandom semantics for os.urandom(). We have introduced churn. Predicting a future SO question such as "Can os.urandom() block on Linux?", the answer is "No in Python 3.4 and earlier, yes possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of the 3.5.x series, and yes possibly in Python 3.6 and beyond". We have a better answer for "cryptographically appropriate" use cases in Python 3.6 - the secrets module. Trying to make os.urandom() "the right answer on every OS" weakens the promotion of secrets as *the* module to use for cryptographically appropriate use cases. IMHO it would be better to leave os.urandom() well enough alone, except for the documentation which should effectively say, a la 3.4: os.urandom(n) Return a string of n random bytes suitable for cryptographic use. This function returns random bytes from an OS-specific randomness source.
The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom(). If a randomness source is not found, NotImplementedError will be raised. Cryptographic applications should use the secrets module for stronger guaranteed sources of randomness. For an easy-to-use interface to the random number generator provided by your platform, please see random.SystemRandom. Cheers, -Barry From larry at hastings.org Thu Jun 16 02:52:19 2016 From: larry at hastings.org (Larry Hastings) Date: Wed, 15 Jun 2016 23:52:19 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160616094508.3acf1de7.barry@wooz.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: <57624CA3.3020704@hastings.org> On 06/15/2016 11:45 PM, Barry Warsaw wrote: > So we very clearly provided platform-dependent caveats on the cryptographic > quality of os.urandom(). We also made a strong claim that there's a direct > connection between os.urandom() and /dev/urandom on "Unix-like system(s)". > > We broke that particular promise in 3.5. and semi-fixed it 3.5.2. Well, 3.5.2 hasn't happened yet. So if you see it as still being broken, please speak up now. Why do you call it only "semi-fixed"? As far as I understand it, the semantics of os.urandom() in 3.5.2rc1 are indistinguishable from reading from /dev/urandom directly, except it may not need to use a file handle. 
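[Editor's note: Larry's claim is easy to check empirically on a Unix box; a small sketch (the helper name is the editor's, not from the thread) comparing the two paths:]

```python
import os

def read_device(n):
    # The pre-3.5 code path: open the device and read from it directly.
    with open("/dev/urandom", "rb") as f:
        return f.read(n)

via_api = os.urandom(16)  # 3.5.2rc1: getrandom(GRND_NONBLOCK), else the device
assert len(via_api) == 16

if os.path.exists("/dev/urandom"):  # skip the device path on e.g. Windows
    via_device = read_device(16)
    assert len(via_device) == 16
    # Both paths draw from the same kernel pool; the bytes differ,
    # the source does not.
    assert via_api != via_device
```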
//arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertc at robertcollins.net Thu Jun 16 03:26:14 2016 From: robertc at robertcollins.net (Robert Collins) Date: Thu, 16 Jun 2016 19:26:14 +1200 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <57624CA3.3020704@hastings.org> Message-ID: On 16 Jun 2016 6:55 PM, "Larry Hastings" wrote: > > > Why do you call it only "semi-fixed"? As far as I understand it, the semantics of os.urandom() in 3.5.2rc1 are indistinguishable from reading from /dev/urandom directly, except it may not need to use a file handle. Which is a contract change. Someone testing in e.g. a chroot could have a different device on /dev/urandom, and now they will need to intercept syscalls for the same effect. Personally I think this is fine, but assuming I see Barry's point correctly, it is indeed not the same as it was. -rob -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jun 16 03:36:13 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Jun 2016 00:36:13 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <20160616052541.GB32689@thunk.org> References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> Message-ID: On Wed, Jun 15, 2016 at 10:25 PM, Theodore Ts'o wrote: > On Wed, Jun 15, 2016 at 04:12:57PM -0700, Nathaniel Smith wrote: >> - It's not exactly true that the Python interpreter doesn't need >> cryptographic randomness to initialize SipHash -- it's more that >> *some* Python invocations need unguessable randomness (to first >> approximation: all those which are exposed to hostile input), and some >> don't. And since the Python interpreter has no idea which case it's >> in, and since it's unacceptable for it to break invocations that don't >> need unguessable hashes, then it has to err on the side of continuing >> without randomness. All that's fine. > > In practice, those Python invocations which are exposed to hostile input > are those that are started while the network is up. The vast > majority of the time, they are launched by the web browser --- and if this > happens after a second or so of the system getting networking > interrupts, (a) getrandom won't block, and (b) /dev/urandom and > getrandom will be initialized. Not sure what you mean about the vast majority of Python invocations being launched by the web browser? But anyway, sure, usually this isn't an issue. This is just a discussion of what to do in the unlikely case when it actually has become an issue, and it's hard to be certain that this will *never* happen. E.g. it's entirely plausible that someone will write some cloud-init plugin that exposes an HTTP server or something. People do all kinds of weird things in VMs these days...
Basically this is a question of whether we should make an (unlikely) error totally invisible to the user, and "errors should never pass silently" is right there in the Zen of Python :-). > Also, I wish people wouldn't say that this is only an issue on Linux. > Again, FreeBSD's /dev/urandom will block as well if it is > uninitialized. It's just that in practice, for both Linux and > FreeBSD, we try very hard to make sure /dev/urandom is fully > initialized by the time it matters. It's just that so far, it's only > on Linux that there was an attempt to use Python in the early init > scripts, in a VM, and on a system where everything is modularized > such that the deadlock became visible. > > >> (I guess the way to implement this would be for the SipHash >> initialization code -- which runs very early -- to set some flag, and >> then we expose that flag in sys._something, and later in the startup >> sequence check for it after the warnings module is functional. >> Exposing the flag at the Python level would also make it possible for >> code like cloud-init to do its own explicit check and respond >> appropriately.) > > I really don't think it's that big of a deal in *practice*, but > if you really are concerned about the very remote possibility that a > Python invocation could start in early boot, and *then* also stick > around for the long term, and *then* be exposed to hostile input --- > what if you set the flag, and then later on, after N minutes, either > automatically or via some trigger such as cloud-init, try and see > if /dev/urandom is initialized (even a few seconds later, so long as > the init scripts aren't hanging, it should be initialized) and have Python > rehash all of its dicts, or maybe just the non-system dicts (since those > are presumably the ones most likely to be exposed to hostile input). I don't think this is technically doable.
There's no global list of hash tables, and Python exposes the actual hash values to user code with some guarantee that they won't change. -n -- Nathaniel J. Smith -- https://vorpus.org From stefan at bytereef.org Thu Jun 16 03:48:28 2016 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 16 Jun 2016 07:48:28 +0000 (UTC) Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> Message-ID: Nathaniel Smith pobox.com> writes: > On Wed, Jun 15, 2016 at 10:25 PM, Theodore Ts'o mit.edu> wrote: > > In practice, those Python invocations which are exposed to hostile input > > are those that are started while the network is up. > > Not sure what you mean about the vast majority of Python invocations > being launched by the web browser? "Python invocations which are exposed to hostile input". ;) Stefan Krah From njs at pobox.com Thu Jun 16 03:53:38 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Jun 2016 00:53:38 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <20160616094508.3acf1de7.barry@wooz.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On Wed, Jun 15, 2016 at 11:45 PM, Barry Warsaw wrote: > On Jun 15, 2016, at 01:01 PM, Nick Coghlan wrote: > >>No, this is a bad idea. Asking novice developers to make security >>decisions they're not yet qualified to make when it's genuinely >>possible for us to do the right thing by default is the antithesis of >>good security API design, and os.urandom() *is* a security API >>(whether we like it or not - third party documentation written by the >>cryptographic software development community has made it so, since >>it's part of their guidelines for writing security sensitive code in >>pure Python). > > Regardless of what third parties have said about os.urandom(), let's look at > what *we* have said about it. Going back to pre-churn 3.4 documentation: > > os.urandom(n) > Return a string of n random bytes suitable for cryptographic use. > > This function returns random bytes from an OS-specific randomness > source. The returned data should be unpredictable enough for cryptographic > applications, though its exact quality depends on the OS > implementation. On a Unix-like system this will query /dev/urandom, and on > Windows it will use CryptGenRandom(). If a randomness source is not found, > NotImplementedError will be raised. > > For an easy-to-use interface to the random number generator provided by > your platform, please see random.SystemRandom. > > So we very clearly provided platform-dependent caveats on the cryptographic > quality of os.urandom(). 
We also made a strong claim that there's a direct > connection between os.urandom() and /dev/urandom on "Unix-like system(s)". > > We broke that particular promise in 3.5. and semi-fixed it 3.5.2. > >>Adding *new* APIs is also a bad idea, since "os.urandom() is the right >>answer on every OS except Linux, and also the best currently available >>answer on Linux" has been the standard security advice for generating >>cryptographic secrets in pure Python code for years now, so we should >>only change that guidance if we have extraordinarily compelling >>reasons to do so, and we don't. > > Disagree. > > We have broken one long-term promise on os.urandom() ("On a Unix-like system > this will query /dev/urandom") and changed another ("should be unpredictable > enough for cryptographic applications, though its exact quality depends on OS > implementations"). > > We broke the experienced Linux developer's natural and long-standing link > between the API called os.urandom() and /dev/urandom. This breaks pre-3.5 > code that assumes read-from-/dev/urandom semantics for os.urandom(). > > We have introduced churn. Predicting a future SO question such as "Can > os.urandom() block on Linux?" the answer is "No in Python 3.4 and earlier, yes > possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of the > 3.5.x series, and yes possibly in Python 3.6 and beyond". It also depends on the kernel version, since it will never block on old kernels that are missing getrandom(), but it might block on future kernels if Linux's /dev/urandom ever becomes blocking. (Ted's said that this is not going to happen now, but the only reason it isn't was that he tried to make the change and it broke some distros that are still in use -- so it seems entirely possible that it will happen a few years from now.) > We have a better answer for "cryptographically appropriate" use cases in > Python 3.6 - the secrets module. 
Trying to make os.urandom() "the right > answer on every OS" weakens the promotion of secrets as *the* module to use > for cryptographically appropriate use cases. > > IMHO it would be better to leave os.urandom() well enough alone, except for > the documentation which should effectively say, a la 3.4: > > os.urandom(n) > Return a string of n random bytes suitable for cryptographic use. > > This function returns random bytes from an OS-specific randomness > source. The returned data should be unpredictable enough for cryptographic > applications, though its exact quality depends on the OS > implementation. On a Unix-like system this will query /dev/urandom, and on > Windows it will use CryptGenRandom(). If a randomness source is not found, > NotImplementedError will be raised. > > Cryptographic applications should use the secrets module for stronger > guaranteed sources of randomness. > > For an easy-to-use interface to the random number generator provided by > your platform, please see random.SystemRandom. This is not an accurate docstring, though. The more accurate docstring for your proposed behavior would be: os.urandom(n) Return a string of n bytes that will usually, but not always, be suitable for cryptographic use. This function returns random bytes from an OS-specific randomness source. On non-Linux OSes, this uses the best available source of randomness, e.g. CryptGenRandom() on Windows and /dev/urandom on OS X, and thus will be strong enough for cryptographic use. However, on Linux it uses a deprecated API (/dev/urandom) which in rare cases is known to return bytes that look random, but aren't. There is no way to know when this has happened; your code will just silently stop being secure. In some unusual configurations, where Python is not configured with any source of randomness, it will raise NotImplementedError. You should never use this function. 
If you need unguessable random bytes, then the 'secrets' module is always a strictly better choice -- unlike this function, it always uses the best available source of cryptographic randomness, even on Linux. Alternatively, if you need random bytes but it doesn't matter whether other people can guess them, then the 'random' module is always a strictly better choice -- it will be faster, as well as providing useful features like deterministic seeding. --- In practice, your proposal means that ~all existing code that uses os.urandom becomes incorrect and should be switched to either secrets or random. This is *far* more churn for end-users than Nick's proposal. ...Anyway, since there's clearly going to be at least one PEP about this, maybe we should stop rehashing bits and pieces of the argument in these long threads that most people end up skipping and then rehashing again later? -n -- Nathaniel J. Smith -- https://vorpus.org From barry at python.org Thu Jun 16 04:03:29 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Jun 2016 11:03:29 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <57624CA3.3020704@hastings.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <57624CA3.3020704@hastings.org> Message-ID: <20160616110329.37acd390.barry@wooz.org> On Jun 15, 2016, at 11:52 PM, Larry Hastings wrote: >Well, 3.5.2 hasn't happened yet. So if you see it as still being broken, >please speak up now. In discussion with other Ubuntu developers, several salient points were raised. 
The documentation for os.urandom() in 3.5.2rc1 doesn't make sense: On Linux, getrandom() syscall is used if available and the urandom entropy pool is initialized (getrandom() does not block). On a Unix-like system this will query /dev/urandom. Perhaps better would be: Where available, the getrandom() syscall is used (with the GRND_NONBLOCK flag) when the urandom entropy pool is initialized. When getrandom() returns EAGAIN because of insufficient entropy, fall back to reading from /dev/urandom. When the getrandom() syscall is unavailable on other Unix-like systems, this will query /dev/urandom. It's actually a rather twisty maze of code to verify these claims, and I'm nearly certain we don't have any tests to guarantee this is what actually happens in those cases, so there are many caveats. This means that an experienced developer can no longer just `man urandom` to understand the unique operational behavior of os.urandom() on their platform, but instead would be forced to actually read our code to find out what's actually happening when/if things break. It is unacceptable if any new exceptions are raised when insufficient entropy is available. Python 3.4 essentially promises that "if only crap entropy is available, you'll get crap, but at least it won't block and no exceptions are raised". Proper backward compatibility requires the same in 3.5 and beyond. Are we sure that's still the case? Using the system call *may* be faster in the we-have-good-entropy-case, but it will definitely be slower in the we-don't-have-good-entropy-case (because of the fallback logic). Maybe that doesn't matter in practice but it's worth noting. >Why do you call it only "semi-fixed"? As far as I understand it, the >semantics of os.urandom() in 3.5.2rc1 are indistinguishable from reading from >/dev/urandom directly, except it may not need to use a file handle. Semi-fixed because os.urandom() will still not be strictly backward compatible between Python 3.5.2 and 3.4.
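[Editor's note: the non-blocking-with-fallback behavior Barry describes can be sketched in pure Python. Note that os.getrandom() and os.GRND_NONBLOCK were only exposed in Python 3.6, so this is a model of the described decision tree, not the 3.5.2 C implementation itself; the function name is the editor's:]

```python
import errno
import os

def urandom_sketch(n):
    """Model of the described logic: try getrandom(GRND_NONBLOCK) first,
    and fall back to reading /dev/urandom when the pool is not yet
    initialized (EAGAIN) or the syscall is unavailable (ENOSYS)."""
    if hasattr(os, "getrandom"):  # Python 3.6+ on Linux
        try:
            # For small n; a real implementation would loop until n
            # bytes have been gathered.
            return os.getrandom(n, os.GRND_NONBLOCK)
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.ENOSYS):
                raise
    with open("/dev/urandom", "rb") as f:  # the fallback path
        return f.read(n)
```

Whether this matches the C code line-for-line is beside the point being made here; the point is that the documentation should spell out this decision tree explicitly.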
*If* we can guarantee that os.urandom() will never block or raise an exception when only poor entropy is available, then it may indeed be indistinguishably backward compatible for most if not all cases. Cheers, -Barry From stefan at bytereef.org Thu Jun 16 04:19:53 2016 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 16 Jun 2016 08:19:53 +0000 (UTC) Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: Nathaniel Smith pobox.com> writes: > In practice, your proposal means that ~all existing code that uses > os.urandom becomes incorrect and should be switched to either secrets > or random. This is *far* more churn for end-users than Nick's > proposal. This should only concern code that a) was specifically written for 3.5.0/3.5.1 and b) implements a serious cryptographic application in Python. I think b) is not a good idea anyway due to timing and side channel attacks and the lack of secure wiping of memory. Such applications should be written in C, where one does not have to predict the behavior of multiple layers of abstractions. Stefan Krah From barry at python.org Thu Jun 16 04:22:20 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Jun 2016 11:22:20 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <57624CA3.3020704@hastings.org> Message-ID: <20160616112220.681ff6e6.barry@wooz.org> On Jun 16, 2016, at 07:26 PM, Robert Collins wrote: >Which is a contract change. Someone testing in E.g. a chroot could have a >different device on /dev/urandom, and now they will need to intercept >syscalls for the same effect. Personally I think this is fine, but assuming >i see Barry's point correctly, it is indeed but the same as it was. It's true there could be a different device on /dev/urandom, but by my reading of the getrandom() manpage I think that *should* be transparent since By default, getrandom() draws entropy from the /dev/urandom pool. This behavior can be changed via the flags argument. and we don't pass the GRND_RANDOM flag to getrandom(). Cheers, -Barry From barry at python.org Thu Jun 16 04:33:51 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Jun 2016 11:33:51 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: <20160616113351.7e9f2c3b@python.org> On Jun 16, 2016, at 12:53 AM, Nathaniel Smith wrote: >> We have introduced churn. 
Predicting a future SO question such as "Can >> os.urandom() block on Linux?" the answer is "No in Python 3.4 and earlier, >> yes possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of >> the 3.5.x series, and yes possibly in Python 3.6 and beyond". > >It also depends on the kernel version, since it will never block on >old kernels that are missing getrandom(), but it might block on future >kernels if Linux's /dev/urandom ever becomes blocking. (Ted's said >that this is not going to happen now, but the only reason it isn't was >that he tried to make the change and it broke some distros that are >still in use -- so it seems entirely possible that it will happen a >few years from now.) Right; I noticed this and had it in my copious notes for my follow up but forgot to mention it. Thanks! >This is not an accurate docstring, though. The more accurate docstring >for your proposed behavior would be: [...] >You should never use this function. If you need unguessable random >bytes, then the 'secrets' module is always a strictly better choice -- >unlike this function, it always uses the best available source of >cryptographic randomness, even on Linux. Alternatively, if you need >random bytes but it doesn't matter whether other people can guess >them, then the 'random' module is always a strictly better choice -- >it will be faster, as well as providing useful features like >deterministic seeding. Note that I was talking about 3.5.x, where we don't have the secrets module. I'd quibble about the admonition about never using the function. It *can* be useful if the trade-offs are appropriate for your application (e.g. "almost always random enough, but maybe not though at least you won't block and you'll get back something"). Getting the words right is useful, but I agree that we should be strongly recommending crypto applications use the secrets module in Python 3.6. 
>In practice, your proposal means that ~all existing code that uses os.urandom >becomes incorrect and should be switched to either secrets or random. This is >*far* more churn for end-users than Nick's proposal. I disagree. We have a clear upgrade path for end-users. If you're using os.urandom() in pre-3.5 and understand what you're getting (or not getting as the case may be), you will continue to get or not get exactly the same bits in 3.5.x (where x >= 2). No changes to your code are necessary. This is also the case in 3.6 but there you can do much better by porting your code to the new secrets module. Go do that! >...Anyway, since there's clearly going to be at least one PEP about this, >maybe we should stop rehashing bits and pieces of the argument in these long >threads that most people end up skipping and then rehashing again later? Sure, I'll try. ;) Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From larry at hastings.org Thu Jun 16 04:40:22 2016 From: larry at hastings.org (Larry Hastings) Date: Thu, 16 Jun 2016 01:40:22 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: <20160616110329.37acd390.barry@wooz.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <57624CA3.3020704@hastings.org> <20160616110329.37acd390.barry@wooz.org> Message-ID: <576265F6.5020807@hastings.org> On 06/16/2016 01:03 AM, Barry Warsaw wrote: > *If* we can guarantee that os.urandom() will never block or raise an exception > when only poor entropy is available, then it may be indeed indistinguishably > backward compatible for most if not all cases. I stepped through the code that shipped in 3.5.2rc1. It only ever calls getrandom() with the GRND_NONBLOCK flag. If getrandom() returns -1 and errno is EAGAIN it falls back to /dev/urandom--I actually simulated this condition in gdb and watched it open /dev/urandom. I didn't see any code for raising an exception or blocking when only poor entropy is available. As Robert Collins points out, this does change the behavior ever-so-slightly from 3.4; if urandom is initialized, and the kernel has the getrandom system call, getrandom() will give us the bytes we asked for and we won't open and read from /dev/urandom. In this state os.urandom() behaves ever-so-slightly differently: * os.urandom() will now work in chroot environments where /dev/urandom doesn't exist. * If Python runs in a chroot environment with a fake /dev/urandom, we'll ignore that and use the kernel's urandom device. * If the sysadmin changed what the systemwide /dev/urandom points to, we'll ignore that and use the kernel's urandom device. But os.urandom() is documented as calling getrandom() when available in 3.5... though doesn't detail how it calls it or what it uses the result for. 
Anyway, I feel these differences were minor, and covered by the documented change in 3.5, so I thought it was reasonable and un-broken. If this isn't backwards-compatible enough to suit you, please speak up now! //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Jun 16 04:46:10 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Jun 2016 11:46:10 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> Message-ID: <20160616114610.18de2d88.barry@wooz.org> On Jun 16, 2016, at 12:36 AM, Nathaniel Smith wrote: >Basically this is a question of whether we should make an (unlikely) error >totally invisible to the user, and "errors should never pass silently" is >right there in the Zen of Python :-). I'd phrase it differently though. To me, it comes down to hand-holding our users who for whatever reason, don't use the appropriate APIs for what they're trying to accomplish. We can educate them through documentation, but I don't think it's appropriate to retrofit existing APIs to different behavior based on those faulty assumptions, because that has other negative effects, such as breaking the promises we make to experienced and knowledgeable developers. To me, the better policy is to admit our mistake in 3.5.0 and 3.5.1, restore pre-existing behavior, accurately document the trade-offs, and provide a clear, better upgrade path for our users. We've done this beautifully and effectively via the secrets module in Python 3.6. 
Cheers, -Barry From barry at python.org Thu Jun 16 05:06:38 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Jun 2016 12:06:38 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <576265F6.5020807@hastings.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <57624CA3.3020704@hastings.org> <20160616110329.37acd390.barry@wooz.org> <576265F6.5020807@hastings.org> Message-ID: <20160616120638.424be4fe.barry@wooz.org> On Jun 16, 2016, at 01:40 AM, Larry Hastings wrote: >As Robert Collins points out, this does change the behavior ever-so-slightly >from 3.4; Ah yes, I misunderstood Robert's point. >if urandom is initialized, and the kernel has the getrandom system call, >getrandom() will give us the bytes we asked for and we won't open and read >from /dev/urandom. In this state os.urandom() behaves ever-so-slightly >differently: > > * os.urandom() will now work in chroot environments where /dev/urandom > doesn't exist. > * If Python runs in a chroot environment with a fake /dev/urandom, > we'll ignore that and use the kernel's urandom device. > * If the sysadmin changed what the systemwide /dev/urandom points to, > we'll ignore that and use the kernel's urandom device. > >But os.urandom() is documented as calling getrandom() when available in >3.5... though doesn't detail how it calls it or what it uses the result for. >Anyway, I feel these differences were minor, and covered by the documented >change in 3.5, so I thought it was reasonable and un-broken. > >If this isn't backwards-compatible enough to suit you, please speak up now! 
It does seem like a narrow corner case, which of course means *someone* will be affected by it. I'll leave it up to you, though it should at least be clearly documented. Let's hope the googles will also help our hypothetical future head-scratcher.

Cheers,
-Barry

From stephen at xemacs.org Thu Jun 16 05:32:23 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 16 Jun 2016 18:32:23 +0900
Subject: [Python-Dev] Smoothing the transition from Python 2 to 3
In-Reply-To: 
References: <20160608210133.GA4318@python.ca> <20160609230807.GA8118@python.ca>
Message-ID: <22370.29223.878866.723213@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

> - even if there is a test suite, sufficiently pervasive [str/bytes]
> type ambiguity may make it difficult to use for fault isolation

Difficult yes, but I would argue that that difficulty is inherent[1]. Ie, if it's pervasive, the fault should be isolated to the whole module. Such a fault *will* regress, often in the exact same place, but if not there, elsewhere due to the same ambiguity. That was my experience in both GNU Emacs and Mailman. In GNU Emacs's case there's a paired, much more successful (in respect of encoding problems) experience with XEmacs to compare.[2] We'll see how things go in Mailman 3 (which uses a nearly completely rewritten email package), but I'll bet the experience there is even more successful.[3]

If you're looking for a band-aid that will get you back running asap, then you're better off bisecting the change history than going through a slew of warnings one-by-one, as a recent error is likely due to a recent change.

If Neil still wants to go ahead, more power to him. I don't know everything. It's just that my experience in this area is sufficiently extensive and sufficiently bad that it's worth repeating (just this once!)

Footnotes:
[1] Or as Brooks would have said, "of the essence".
[2] GNU Emacs has a multilingualization specialist in Ken Handa whose day job is writing multilingualization libraries, so their encoding detection, accuracy of implementation, and codec coverage is and always was better than XEmacs's. I'm referring here to internal bugs in the Lisp primitives dealing with text, as well as the difficulty of writing applications that handled both internal text and external bytes without confusing them.

[3] Though not strictly comparable to the XEmacs experience, due to (1) being a second implementation, not a parallel implementation, and (2) the Internet environment being much more standard conformant, even in email, these days.

From donald at stufft.io Thu Jun 16 06:04:39 2016
From: donald at stufft.io (Donald Stufft)
Date: Thu, 16 Jun 2016 06:04:39 -0400
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <20160616114610.18de2d88.barry@wooz.org>
References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org>
Message-ID: <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io>

> On Jun 16, 2016, at 4:46 AM, Barry Warsaw wrote:
>
> We can educate them through documentation, but I don't think it's appropriate
> to retrofit existing APIs to different behavior based on those faulty
> assumptions, because that has other negative effects, such as breaking the
> promises we make to experienced and knowledgeable developers.

You can't document your way out of a usability problem, in the same way that while it was true that urllib was *documented* to not verify certificates by default, that didn't matter because a large set of users used it like it did anyways.
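(For context on the urllib analogy: that default was changed by PEP 476, and since Python 2.7.9/3.4.3 the stdlib's default SSL context does verify certificates. An editor's quick check, not part of the original message:)

```python
import ssl

# PEP 476 made certificate verification the default policy for stdlib
# HTTPS clients; create_default_context() reflects that policy.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED  # certificates are checked
assert ctx.check_hostname                    # hostnames are checked too
```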
In my opinion, this is a usability issue as well. You have a ton of third-party documentation and effort around "just use urandom" for cryptographic random, which is generally the right (and best!) answer except for this one little niggle on a Linux platform where /dev/urandom *may* produce predictable bytes (but usually doesn't). That documentation typically doesn't go into telling people about this small niggle because prior to getrandom(0) there wasn't much they could do about it except use /dev/random, which is bad in every other situation but early-boot cryptographic keys.

Regardless of what we document it as, people are going to use os.urandom for cryptographic purposes, because everyone who doesn't keep up on exactly what modules are being added to Python but has any idea about cryptography at all is going to look for a Python interface to urandom. That doesn't even begin to touch the thousands upon thousands of uses that already exist in the wild that are assuming that os.urandom will always give them cryptographic random, who now *need* to write this as:

try:
    from secrets import token_bytes
except ImportError:
    from os import urandom as token_bytes

in order to get the best cryptographic random available to them on their system, which assumes they're even going to notice at all that there's a new secrets module, and requires each and every use of os.urandom to change.

Honestly, I think that the first sentence in the documentation should obviously be the most pertinent one, and the first sentence here is "Return a string of n random bytes suitable for cryptographic use.". The bit about how the exact quality depends on the OS, and documenting what device it uses, is, to my eyes, obviously a hedge to say "Hey, if this gives you bad random it's your OS's fault, not ours; we can't produce good random where your OS can't give us some" and to give people a suggestion of where to look to determine whether they're going to get good random or not.
I do not think "uses /dev/urandom" is, or should be considered, a core part of this API; it already doesn't use /dev/urandom on Windows, where it doesn't exist, nor does it use /dev/urandom in 3.5+ if it can help it. Using getrandom(0), or using getrandom(GRND_NONBLOCK) and raising an exception on EAGAIN, is still accessing the urandom CSPRNG with the same general runtime characteristics as /dev/urandom, outside of cases where it's not safe to actually use /dev/urandom.

Frankly, I think it's a disservice to Python developers to leave in this footgun.

-- Donald Stufft

From barry at python.org Thu Jun 16 07:07:56 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 16 Jun 2016 14:07:56 +0300
Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?)
In-Reply-To: <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io>
References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io>
Message-ID: <20160616140756.62be6e25@python.org>

On Jun 16, 2016, at 06:04 AM, Donald Stufft wrote:

>Regardless of what we document it as, people are going to use os.urandom for
>cryptographic purposes because for everyone who doesn't keep up on exactly
>what modules are being added to Python who has any idea about cryptography at
>all is going to look for a Python interface to urandom. That doesn't even
>begin to touch the thousands upon thousands of uses that already exist in the
>wild that are assuming that os.urandom will always give them cryptographic
>random, who now *need* to write this as: [...]
>Frankly, I think it's a disservice to Python developers to leave in this
>footgun.

This really gets to the core of our responsibility to our users. Let's start by acknowledging that good-willed people can have different opinions on this, and that we all want to do what's best for our users, although we may have different definitions of "what's best".

Since this topic comes up over and over again, it's worth exploring in more detail. Here's my take on it in this context.

We have a responsibility to provide stable, well-documented, obvious APIs to our users to provide functionality that is useful and appropriate to the best of our abilities.

We have a responsibility to provide secure implementations of that functionality wherever possible.

It's in the conflict between these two responsibilities that these heated discussions and differences of opinions come up. This conflict is exposed in the os.urandom() debate because the first responsibility informs us that backward compatibility is more important to maintain because it provides stability and predictability. The second responsibility urges us to favor retrofitting increased security into APIs that for practicality purposes are being used counter to our original intent.

It's not that you think backward compatibility is unimportant, or that I think improving security has no value. In the messy mudpit of the middle, we can't seem to have both, as much as I'd argue that providing new, better APIs can give us edible cake.

Coming down on either side has its consequences, both known and unintended, and I think in these cases consensus can't be reached. It's for these reasons that we have RMs and BDFLs to break the tie. We must lay out our arguments and trust our Larrys, Neds, and Guidos to make the right --or at least *a*-- decision on a case-by-case basis, and if not agree then accept.

Cheers,
-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From rdmurray at bitdance.com Thu Jun 16 07:08:59 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Thu, 16 Jun 2016 07:08:59 -0400 Subject: [Python-Dev] Why does base64 return bytes? In-Reply-To: <57609869.4060304@canterbury.ac.nz> References: <20160614151935.GY27919@ando.pearwood.info> <20160614180556.9A1C0B1401C@webabinitio.net> <57609869.4060304@canterbury.ac.nz> Message-ID: <20160616110900.653EAB14028@webabinitio.net> On Wed, 15 Jun 2016 11:51:05 +1200, Greg Ewing wrote: > R. David Murray wrote: > > The fundamental purpose of the base64 encoding is to take a series > > of arbitrary bytes and reversibly turn them into another series of > > bytes in which the eighth bit is not significant. > > No, it's not. If that were its only purpose, it would be > called base128, and the RFC would describe it purely in > terms of bit patterns and not mention characters or > character sets at all. Sorry, you are correct. IMO it is to encode it to a representation that consists of a limited subset of printable (makes marks on paper or screen) characters (which is an imprecise term); ie: data that will not be interpreted as having control information by most programs processing the data stream as either human-readable or raw bytes. The rest of the argument still applies, specifically the part about wire encoding to seven bit bytes being the currently-most-used[*] and backward-compatible use case. And I say this despite the fact that the email package currently handles everything as surrogate-escaped text and so does in fact decode the output of base64.encode to ASCII and only later re-encodes it. That's a design issue in the email package deriving from the fact that bytes and string used to be the same thing in python2. It might some day get corrected, but probably won't be, and it is a legacy of *not* making the distinction between bytes and string. 
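The property being argued over is easy to demonstrate (an editor's illustration, not part of the original message): the encoded form uses only a 65-character ASCII alphabet (A-Z, a-z, 0-9, '+', '/', and '=' padding), so the eighth bit is never set and the output survives channels that only pass a restricted byte repertoire.

```python
import base64

payload = bytes(range(256))          # every possible byte value
encoded = base64.b64encode(payload)  # bytes in, bytes out

# Seven-bit clean: no byte in the output has the high bit set.
assert all(b < 128 for b in encoded)
# And the transformation is reversible.
assert base64.b64decode(encoded) == payload
```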
--David

[*] Yes this is changing, I already said that :)

From nikita at nemkin.ru Thu Jun 16 07:11:33 2016
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Thu, 16 Jun 2016 16:11:33 +0500
Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace
Message-ID: 

I'll reformulate my argument:

Ordered class namespaces are a minority use case that's already covered by existing language features (custom metaclasses) and doesn't warrant the extension of the language (i.e. making OrderedDict a builtin type). This is about Python-the-Language, not CPython-the-runtime. If you disagree with this premise, there's no point arguing about the alternatives.

That being said, below are the answers to your objections to specific alternatives.

On Thu, Jun 16, 2016 at 1:30 AM, Nick Coghlan wrote:
> On 14 June 2016 at 02:41, Nikita Nemkin wrote:
>
> Adding metaclasses to an existing class can break compatibility with
> third party subclasses, so making it possible for people to avoid that
> while still gaining the ability to implicitly expose attribute
> ordering to class decorators and other potentially interested parties
> is a recurring theme behind this PEP and also PEPs 422 and 487.

The simple answer is "don't do that", i.e. don't pile an ordered metaclass on top of another metaclass. Such a use case is hypothetical anyway. Also, the namespace argument to the default metaclass doesn't cause conflicts.

>> 3. Making compiler fill in __definition_order__ for every class
>> (just like __qualname__) without touching the runtime.
>> ?
>
> Class scopes support conditionals and loops, so we can't necessarily
> be sure what names will be assigned without running the code. It's
> also possible to make attribute assignments via locals() that are
> entirely opaque to the compiler, but visible to the interpreter at
> runtime.

All explicit assignments in the class body can be detected statically. Implicit assignments via locals(), sys._getframe() etc.
can't be detected, BUT they are unlikely to have a meaningful order! It's reasonable to exclude them from __definition_order__. This also applies to documentation tools. If there really was a need, they could have easily extracted static order, solving 99.9999% of the problem.

> The rationale for "Why not make this configurable, rather than
> switching it unilaterally?" is that it's actually *simpler* overall to
> just make it the default - we can then change the documentation to say
> "class bodies are evaluated in a collections.OrderedDict instance by
> default" and record the consequences of that, rather than having to
> document yet another class customisation mechanism.

It would have been a "simpler" default if it was the core dict that became ordered. Instead, it brings in a 3rd party (OrderedDict). Documenting an extra metaclass or an extra type keyword argument would hardly take more space. And it's NOT yet another mechanism. It's the good old metaclass mechanism.

> It also eliminates boilerplate from class decorator usage
> instructions, where people have to write "to use this class decorator,
> you must also specify 'namespace=collections.OrderedDict' in your
> class header"

Statically inferred __definition_order__ would work here. Order-dependent decorators don't seem to be important enough to worry about their usability.

From donald at stufft.io Thu Jun 16 07:34:47 2016
From: donald at stufft.io (Donald Stufft)
Date: Thu, 16 Jun 2016 07:34:47 -0400
Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?)
In-Reply-To: <20160616140756.62be6e25@python.org>
References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> <20160616140756.62be6e25@python.org>
Message-ID: <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io>

> On Jun 16, 2016, at 7:07 AM, Barry Warsaw wrote:
>
> On Jun 16, 2016, at 06:04 AM, Donald Stufft wrote:
>
>> Regardless of what we document it as, people are going to use os.urandom for
>> cryptographic purposes because for everyone who doesn't keep up on exactly
>> what modules are being added to Python who has any idea about cryptography at
>> all is going to look for a Python interface to urandom. That doesn't even
>> begin to touch the thousands upon thousands of uses that already exist in the
>> wild that are assuming that os.urandom will always give them cryptographic
>> random, who now *need* to write this as:
>
> [...]
>
>> Frankly, I think it's a disservice to Python developers to leave in this
>> footgun.
>
> This really gets to the core of our responsibility to our users. Let's start
> by acknowledging that good-willed people can have different opinions on this,
> and that we all want to do what's best for our users, although we may have
> different definitions of "what's best".

Yes, I don't think anyone is being malicious :) that's why I qualified my statement with "I think", because I don't believe that whether or not this particular choice is a disservice is a fundamental property of the universe, but rather my opinion influenced by my priorities.

>
> Since this topic comes up over and over again, it's worth exploring in more
> detail. Here's my take on it in this context.
>
> We have a responsibility to provide stable, well-documented, obvious APIs to
> our users to provide functionality that is useful and appropriate to the best
> of our abilities.
>
> We have a responsibility to provide secure implementations of that
> functionality wherever possible.
>
> It's in the conflict between these two responsibilities that these heated
> discussions and differences of opinions come up. This conflict is exposed in
> the os.urandom() debate because the first responsibility informs us that
> backward compatibility is more important to maintain because it provides
> stability and predictability. The second responsibility urges us to favor
> retrofitting increased security into APIs that for practicality purposes are
> being used counter to our original intent.

Well, I don't think that for os.urandom someone using it for security is running "counter to its original intent", given that in general urandom's purpose is for cryptographic random. Someone *may* be using it for something other than that, but it's pretty explicitly there for security sensitive applications.

>
> It's not that you think backward compatibility is unimportant, or that I think
> improving security has no value. In the messy mudpit of the middle, we can't
> seem to have both, as much as I'd argue that providing new, better APIs can
> give us edible cake.

Right. I personally often fall towards securing the *existing* APIs and adding new, insecure APIs that are obviously so in cases where we can reasonably do that. That's largely because, given an API that's both being used in security sensitive applications and ones that are not, the "failure" to be properly secure is almost always a silent failure, while the "failure" to applications that don't need that security is almost always obvious and immediate. Taking os.urandom as an example, the failure case here for the security side is that you get some bytes that are, to some degree, predictable.
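To make the silent-failure point concrete, here is a toy illustration (the editor's, not Donald's): bytes from a seeded, fully predictable PRNG pass the same naive statistical glance as bytes from the kernel CSPRNG.

```python
import os
import random

rng = random.Random(1234)  # seed is known => output fully predictable
predictable = bytes(rng.getrandbits(8) for _ in range(16384))
secure = os.urandom(16384)

def looks_random(data):
    # Naive check: every byte value occurs, and none wildly dominates.
    counts = [data.count(v) for v in range(256)]
    return min(counts) > 0 and max(counts) < 4 * len(data) // 256

# Both pass: inspection cannot reveal which stream an attacker could
# reproduce from the seed.
assert looks_random(predictable) and looks_random(secure)
```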
There is nobody alive who can look at some bytes and go "oh yep, those bytes are predictable, we're using the wrong API"; thus basically anyone "incorrectly" [1] using this API for security sensitive applications is going to have it just silently doing the wrong thing. On the flip side, if someone is using this API and what they care about is it not blocking, ever, and always giving them some sort of random-ish number no matter how predictable it is, then both of the proposed failure cases are fairly noticeable (to varying degrees): either it blocks long enough for it to matter for those people and they notice and dig in, or it raises an exception and they notice and dig in. In both cases they get some indication that something is wrong.

>
> Coming down on either side has its consequences, both known and unintended,
> and I think in these cases consensus can't be reached. It's for these reasons
> that we have RMs and BDFLs to break the tie. We must lay out our arguments
> and trust our Larrys, Neds, and Guidos to make the right --or at least *a*--
> decision on a case-by-case basis, and if not agree then accept.

Right. I've personally tried not to personally be the one who keeps pushing for this even after a decree, partially because it's draining to me to argue for the security side with python-dev [2] and partially because it was ruled on and I lost. However, if there continues to be discussion I'll continue to advocate for what I think is right :)

[1] I don't think os.urandom is incorrect to use for security sensitive applications and I think it's a losing battle for Python to try and fight the rest of the world that urandom is not the right answer here.

[2] python-dev tends to favor not breaking "working" code over securing existing APIs, even if "working" is silently doing the wrong thing in a security context.
This is particularly frustrating when it comes to security because security is by its nature the act of taking code that would otherwise execute and making it error, ideally only in bad situations, but this "security's purpose is to make things break" nature clashes with python-dev's default of not breaking "working" code in a way that is personally draining to me.

-- Donald Stufft

From cory at lukasa.co.uk Thu Jun 16 07:58:29 2016
From: cory at lukasa.co.uk (Cory Benfield)
Date: Thu, 16 Jun 2016 12:58:29 +0100
Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: 
References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org>
Message-ID: <63F9DC8F-7AB2-4D1E-8DC6-771D41955DA5@lukasa.co.uk>

> On 16 Jun 2016, at 09:19, Stefan Krah wrote:
>
> This should only concern code that a) was specifically written for
> 3.5.0/3.5.1 and b) implements a serious cryptographic application
> in Python.
>
> I think b) is not a good idea anyway due to timing and side channel
> attacks and the lack of secure wiping of memory. Such applications
> should be written in C, where one does not have to predict the
> behavior of multiple layers of abstractions.

No, it concerns code that generates its random numbers from Python. For example, you may want to use AES GCM to encrypt a file at rest. AES GCM requires the use of a nonce, and has only one rule about this nonce: you MUST NOT, under any circumstances, re-use a nonce/key combination.
If you do, AES GCM fails catastrophically (I cannot emphasise this enough, re-using a nonce/key combination in AES GCM totally destroys all the properties the algorithm provides)[0].

You can use a C implementation of all of the AES logic, including offload to your x86 CPU with its fancy AES GCM instruction set. However, you *need* to provide a nonce: AES GCM can't magically guess what it is, and it needs to be communicated in some way for the decryption[1]. In situations where you do not have an easily available nonce (you do have it for TLS, for example), you will need to provide one, and the logical and obvious thing to do is to use a random number. Your Python application needs to obtain that random number, and the safest way to do it is via os.urandom().

This is the problem with this argument: we cannot wave our hands and say "os.urandom can be as unsafe as we want because crypto code must not be written in Python". Even if we never implement an algorithm in Python (and I agree with you that crypto primitives in general should not be implemented in Python for the exact reasons you suggest), most algorithms require the ability to be provided with good random numbers by their callers. As long as crypto algorithms require good nonces, Python needs access to a secure CSPRNG. Kernel CSPRNGs are *strongly* favoured for many reasons that I won't go into here, so os.urandom is our winner.

python-dev cannot wash its hands of the security decision here. As I've said many times, I'm pleased to see the decision makers have not done that: while I don't agree with their decision, I totally respect that it was theirs to make, and they made it with all of the facts.

Cory

[0]: Someone will *inevitably* point out that other algorithms resist nonce misuse somewhat better than this.
While that's true, it's a) not relevant, because some standards require use of the non-NMR algorithms, and b) unhelpful, because even if we could switch, we'd need access to the better primitives, which we don't have.

[1]: Again, to head off some questions at the pass: the reason nonces are usually provided by the user of the algorithm is that sometimes they're generated semi-deterministically. For example, TLS generates a unique key for each session (again, requiring randomness, but that's neither here nor there), and so TLS can use deterministic *but non-repeated* nonces, which in practice it derives from record numbers. Because you have two options (re-use keys with random nonces, or random keys with deterministic nonces), a generic algorithm implementation does not constrain your choice of nonce.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: 

From barry at python.org Thu Jun 16 08:24:33 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 16 Jun 2016 15:24:33 +0300
Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?)
In-Reply-To: <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io>
References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> <20160616140756.62be6e25@python.org> <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io>
Message-ID: <20160616152433.70b2d9f7@python.org>

On Jun 16, 2016, at 07:34 AM, Donald Stufft wrote:

>Well, I don't think that for os.urandom someone using it for security is
>running "counter to its original intent", given that in general urandom's
>purpose is for cryptographic random. Someone *may* be using it for something
>other than that, but it's pretty explicitly there for security sensitive
>applications.

Except that I disagree. I think os.urandom's original intent, as documented in Python 3.4, is to provide a thin layer over /dev/urandom, with all that implies, and with the documented quality caveats. I know as a Linux developer that if I need to know the details of that, I can `man urandom` and read the gory details. In Python 3.5, I can't do that any more.

>Right. I personally often fall towards securing the *existing* APIs and
>adding new, insecure APIs that are obviously so in cases where we can
>reasonably do that.

Sure, and I personally fall on the side of maintaining stable, backward compatible APIs, adding new, better, more secure APIs to address deficiencies in real-world use cases. That's because when we break APIs, even with the best of intentions, it breaks people's code in ways and places that we can't predict, and which are very often very difficult to discover.

I guess it all comes down to who's yelling at you. ;)

Cheers,
-Barry

P.S.
These discussions do not always end in despair. Witness PEP 493.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: 

From tytso at mit.edu Thu Jun 16 08:44:01 2016
From: tytso at mit.edu (Theodore Ts'o)
Date: Thu, 16 Jun 2016 08:44:01 -0400
Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?)
In-Reply-To: <20160616152433.70b2d9f7@python.org>
References: <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> <20160616140756.62be6e25@python.org> <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io> <20160616152433.70b2d9f7@python.org>
Message-ID: <20160616124401.GC32689@thunk.org>

On Thu, Jun 16, 2016 at 03:24:33PM +0300, Barry Warsaw wrote:
> Except that I disagree. I think os.urandom's original intent, as documented
> in Python 3.4, is to provide a thin layer over /dev/urandom, with all that
> implies, and with the documented quality caveats. I know as a Linux developer
> that if I need to know the details of that, I can `man urandom` and read the
> gory details. In Python 3.5, I can't do that any more.

If Python were to document os.urandom as providing a thin wrapper over /dev/urandom as implemented on Linux, also document os.getrandom as providing a thin wrapper over getrandom(2) as implemented on Linux, and then say that the best emulation of those two interfaces will be provided on other operating systems, and that today the best practice is to call getrandom with the flags set to zero (or defaulted out), that would certainly make me very happy.

I could imagine that some people might complain that it is too Linux-centric, or it is not adhering to Python's design principles, but it makes a lot of sense to me as a Linux person.
:-)

Cheers,

- Ted

From p.f.moore at gmail.com Thu Jun 16 08:50:54 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Jun 2016 13:50:54 +0100
Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?)
In-Reply-To: <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io>
References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> <20160616140756.62be6e25@python.org> <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io>
Message-ID: 

On 16 June 2016 at 12:34, Donald Stufft wrote:
> [1] I don't think using os.urandom is incorrect to use for security sensitive
> applications and I think it's a losing battle for Python to try and fight
> the rest of the world that urandom is not the right answer here.
>
> [2] python-dev tends to favor not breaking "working" code over securing existing
> APIs, even if "working" is silently doing the wrong thing in a security
> context. This is particularly frustrating when it comes to security because
> security is by its nature the act of taking code that would otherwise
> execute and making it error, ideally only in bad situations, but this
> "security's purpose is to make things break" nature clashes with python-dev's
> default of not breaking "working" code in a way that is personally draining
> to me.

Should I take it from these two statements that you do not believe that providing *new* APIs that provide better security compared to a backward compatible but flawed existing implementation is a reasonable approach?
And specifically that you don't agree with the decision to provide the new "secrets" module as the recommended interface for getting secure random numbers from Python? One of the aspects of this debate that I'm unclear about is what role the people arguing that os.urandom must change see for the new secrets module. Paul From stefan at bytereef.org Thu Jun 16 08:57:54 2016 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 16 Jun 2016 12:57:54 +0000 (UTC) Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <63F9DC8F-7AB2-4D1E-8DC6-771D41955DA5@lukasa.co.uk> Message-ID: Cory Benfield lukasa.co.uk> writes: > python-dev cannot wash its hands of the security decision here. As I?ve said many times, I?m pleased to > see the decision makers have not done that: while I don?t agree with their decision, I totally respect > that it was theirs to make, and they made it with all of the facts. I think the sysadmin's responsibility still plays a major role here. If a Linux system crucially relies on the quality of /dev/urandom, it should be possible to insert a small C program (call it ensure_random) into the boot sequence that does *exactly* what Python did in the bug report: block until entropy is available. Well, it *was* possible with SysVinit ... :) Python is not the only application that needs a secure /dev/urandom. 
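Such an ensure_random helper need not even be written in C; a minimal Python sketch of the same idea (hypothetical, for illustration, assuming a Linux boot environment) could be:

```python
import os
import sys

def ensure_random():
    """Block until the kernel CSPRNG has been seeded, then return.

    Intended to run once, early in the boot sequence, so that every
    service started afterwards can trust /dev/urandom.
    """
    if hasattr(os, "getrandom"):           # Python 3.6+, Linux 3.17+
        os.getrandom(1)                    # flags=0 blocks until the pool is seeded
    elif sys.platform.startswith("linux"):
        with open("/dev/random", "rb") as f:
            f.read(1)                      # /dev/random blocks while entropy is low
    # Other platforms: nothing to wait for by the time userspace runs.

if __name__ == "__main__":
    ensure_random()
```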
Stefan Krah From donald at stufft.io Thu Jun 16 09:02:26 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 16 Jun 2016 09:02:26 -0400 Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?) In-Reply-To: References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> <20160616140756.62be6e25@python.org> <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io> Message-ID: <82F8D8A2-57D0-4973-9B08-05D04E0D7629@stufft.io> > On Jun 16, 2016, at 8:50 AM, Paul Moore wrote: > > On 16 June 2016 at 12:34, Donald Stufft wrote: >> [1] I don?t think using os.urandom is incorrect to use for security sensitive >> applications and I think it?s a losing battle for Python to try and fight >> the rest of the world that urandom is not the right answer here. >> >> [2] python-dev tends to favor not breaking ?working? code over securing existing >> APIs, even if ?working? is silently doing the wrong thing in a security >> context. This is particularly frustrating when it comes to security because >> security is by it?s nature the act of taking code that would otherwise >> execute and making it error, ideally only in bad situations, but this >> ?security?s purpose is to make things break? nature clashes with python-dev?s >> default of not breaking ?working? code in a way that is personally draining >> to me. > > Should I take it from these two statements that you do not believe > that providing *new* APIs that provide better security compared to a > backward compatible but flawed existing implementation is a reasonable > approach? 
And specifically that you don't agree with the decision to > provide the new "secrets" module as the recommended interface for > getting secure random numbers from Python? > > One of the aspects of this debate that I'm unclear about is what role > the people arguing that os.urandom must change see for the new secrets > module. > > Paul I think the new secrets module is great, particularly for functions other than secrets.token_bytes. If that's all the secrets module was then I'd argue it shouldn't exist because we already have os.urandom. IOW I think it solves a different problem than os.urandom, if all you need is cryptographically random bytes, I think that os.urandom is the most obvious thing that someone will reach for given: * Pages upon pages of documentation both inside the Python community and outside saying "use urandom". * The sheer bulk of existing code that is already out there using os.urandom for its cryptographic properties. I also think it's a great module for providing defaults that we can't provide in os.urandom, like the number of bytes that are considered "secure" [1]. What I don't think is that the secrets module means that all of a sudden os.urandom is no longer an API that is primarily used in a security sensitive context [2] and thus we should willfully choose to use a subpar interface to the same CSPRNG when the OS provides us a better one [3] because one small edge case *might* break in a loud and obvious way for the minority of people using this API in a non security sensitive context while leaving the majority of people using this API possibly getting silently insecure behavior from it. [1] Of course, what is considered secure is going to be application dependent, but secrets can give a pretty good approximation for the general case. [2] This is one of the things that really gets me about this, it's not like folks on my side are saying we need to break the pickle module because it's possible to use it insecurely.
That would be silly because one of the primary use cases for that module is using it in a context that is not security sensitive. However, os.urandom is, to the best of my ability to determine and reason, almost always used in a security sensitive context, and thus should make security sensitive trade offs in its API. [3] Thus it's still a small wrapper around OS provided APIs, so we're not asking for os.py to implement some great big functionality, we're just asking for it to provide a thin shim over a better interface to the same thing. -- Donald Stufft From random832 at fastmail.com Thu Jun 16 09:51:52 2016 From: random832 at fastmail.com (Random832) Date: Thu, 16 Jun 2016 09:51:52 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160616110329.37acd390.barry@wooz.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <57624CA3.3020704@hastings.org> <20160616110329.37acd390.barry@wooz.org> Message-ID: <1466085112.1516432.639628113.069A20D4@webmail.messagingengine.com> On Thu, Jun 16, 2016, at 04:03, Barry Warsaw wrote: > *If* we can guarantee that os.urandom() will never block or raise an > exception when only poor entropy is available, then it may be indeed > indistinguishably backward compatible for most if not all cases. Why can't we exclude cases when only poor entropy is available from "most if not all cases"? From barry at python.org Thu Jun 16 10:04:43 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Jun 2016 17:04:43 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <1466085112.1516432.639628113.069A20D4@webmail.messagingengine.com> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <57624CA3.3020704@hastings.org> <20160616110329.37acd390.barry@wooz.org> <1466085112.1516432.639628113.069A20D4@webmail.messagingengine.com> Message-ID: <20160616170443.4dc73f23.barry@wooz.org> On Jun 16, 2016, at 09:51 AM, Random832 wrote: >On Thu, Jun 16, 2016, at 04:03, Barry Warsaw wrote: >> *If* we can guarantee that os.urandom() will never block or raise an >> exception when only poor entropy is available, then it may be indeed >> indistinguishably backward compatible for most if not all cases. > >Why can't we exclude cases when only poor entropy is available from >"most if not all cases"? Because if it blocks or raises a new exception on poor entropy it's an API break. Cheers, -Barry From srkunze at mail.de Thu Jun 16 10:53:35 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 16 Jun 2016 16:53:35 +0200 Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?) 
In-Reply-To: <82F8D8A2-57D0-4973-9B08-05D04E0D7629@stufft.io> References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> <20160616140756.62be6e25@python.org> <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io> <82F8D8A2-57D0-4973-9B08-05D04E0D7629@stufft.io> Message-ID: <5762BD6F.50809@mail.de> > I also think it's a great module for providing defaults that we can't > provide in os.urandom, like the number of bytes that are considered > "secure" [1]. > > What I don't think is that the secrets module means that all of a sudden > os.urandom is no longer an API that is primarily used in a security > sensitive context Not all of a sudden. However, I guess things will change in the future. If we want the secrets module to be the first and only place where crypto goes, we should work towards that goal. It needs proper communication, marketing etc. Deprecation periods can be years long. This change (whatever form it will take) can be carried out over 3 or 4 releases when the ultimate goal is made clear to everybody reading the docs. OTOH I don't know whether long deprecation periods are necessary here at all. Other industries are very sensitive to fast changes. Furthermore, next generations will be taught using the new way, so the Python community should not be afraid of some changes because most of them are for the better. On 16.06.2016 15:02, Donald Stufft wrote: > I think that os.urandom is the most obvious thing that someone will reach for given: > > * Pages upon pages of documentation both inside the Python community > and outside saying "use urandom".
> * The sheer bulk of existing code that is already out there using > os.urandom for its cryptographic properties. That's maybe you. However, as stated before, I am not an expert in this field. So, when I need to, I first would start researching the current state of the art in Python. If the docs say: use the secrets module (e.g. near os.urandom), I would happily comply -- especially when there's reasonable explanation. That's from a newbie's point of view. Best, Sven From random832 at fastmail.com Thu Jun 16 11:07:04 2016 From: random832 at fastmail.com (Random832) Date: Thu, 16 Jun 2016 11:07:04 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
Yes, but in only very rare cases. Which as I *just said* makes it backwards compatible for "most" cases. From contact at ionelmc.ro Thu Jun 16 07:27:01 2016 From: contact at ionelmc.ro (=?UTF-8?Q?Ionel_Cristian_M=C4=83rie=C8=99?=) Date: Thu, 16 Jun 2016 14:27:01 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> Message-ID: On Thu, Jun 16, 2016 at 1:04 PM, Donald Stufft wrote: > In my opinion, this is a usability issue as well. You have a ton of third > party documentation and effort around ?just use urandom? for Cryptographic > random which is generally the right (and best!) answer except for this one > little niggle on a Linux platform where /dev/urandom *may* produce > predictable bytes (but usually doesn?t). ?Why not consider opt-out behavior with environment variables?? Eg: people that don't care about crypto mumbojumbo and want fast interpreter startup could just use a PYTHONWEAKURANDOM=y or PYTHONFASTURANDOM=y. That ways there's no need to change api of os.urandom() and users have a clear and easy path to get old behavior. Thanks, -- Ionel Cristian M?rie?, http://blog.ionelmc.ro -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Thu Jun 16 11:39:48 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jun 2016 08:39:48 -0700 Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?) In-Reply-To: References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> <20160616140756.62be6e25@python.org> <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io> Message-ID: On 16 June 2016 at 05:50, Paul Moore wrote: > On 16 June 2016 at 12:34, Donald Stufft wrote: >> [1] I don?t think using os.urandom is incorrect to use for security sensitive >> applications and I think it?s a losing battle for Python to try and fight >> the rest of the world that urandom is not the right answer here. >> >> [2] python-dev tends to favor not breaking ?working? code over securing existing >> APIs, even if ?working? is silently doing the wrong thing in a security >> context. This is particularly frustrating when it comes to security because >> security is by it?s nature the act of taking code that would otherwise >> execute and making it error, ideally only in bad situations, but this >> ?security?s purpose is to make things break? nature clashes with python-dev?s >> default of not breaking ?working? code in a way that is personally draining >> to me. > > Should I take it from these two statements that you do not believe > that providing *new* APIs that provide better security compared to a > backward compatible but flawed existing implementation is a reasonable > approach? 
And specifically that you don't agree with the decision to > provide the new "secrets" module as the recommended interface for > getting secure random numbers from Python? > > One of the aspects of this debate that I'm unclear about is what role > the people arguing that os.urandom must change see for the new secrets > module. The secrets module is great for new code that gets to ignore any version of Python older than 3.6 - it's the "solve this problem for the next generation of developers" answer. All of the complicated "this API is safe for that purpose, this API isn't" discussions get replaced by "do the obvious thing" (i.e. use random for simulations, secrets for security). The os.urandom() debate is about taking the current obvious (because that's what the entire security community is telling you to do) low level way to do it and categorically eliminating any and all caveats on its correctness. Not "it's correct if you use these new flags that are incompatible with older Python versions". Not "it's not correct anymore, use a different API". Just "it's correct, and the newer your Python runtime, the more correct it is". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Thu Jun 16 11:58:30 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Jun 2016 08:58:30 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On Jun 16, 2016 1:23 AM, "Stefan Krah" wrote: > > Nathaniel Smith pobox.com> writes: > > In practice, your proposal means that ~all existing code that uses > > os.urandom becomes incorrect and should be switched to either secrets > > or random. This is *far* more churn for end-users than Nick's > > proposal. > > This should only concern code that a) was specifically written for > 3.5.0/3.5.1 and b) implements a serious cryptographic application > in Python. > > I think b) is not a good idea anyway due to timing and side channel > attacks and the lack of secure wiping of memory. Such applications > should be written in C, where one does not have to predict the > behavior of multiple layers of abstractions. This is completely unhelpful. Firstly because it's an argument that os.urandom and the secrets module shouldn't exist, which doesn't tell is much about what their behavior should be given that they do exist, and secondly because it fundamentally misunderstands why they exist. The word "cryptographic" here is a bit of a red herring. The guarantee that a CSPRNG makes is that the output should be *unguessable by third parties*. There are plenty of times when this is what you need even when you aren't using actual cryptography. For example, when someone logs into a web app, I may want to send back a session cookie so that I can recognize this person later without making then reauthenticate all the time. 
For this to work securely, it's extremely important that no one else be able to predict what session cookie I sent, because if you can guess the cookie then you can impersonate the user. In python 2.3-3.5, the most correct way to write this code is to use os.urandom. The question in this thread is whether we should break that in 3.6, so that conscientious users are forced to switch existing code over to using the secrets module if they want to continue to get the most correct available behavior, or whether we should preserve that in 3.6, so that code like my hypothetical web app that was correct on 2.3-3.5 remains correct on 3.6 (with the secrets module being a more friendly wrapper that we recommend for new code, but with no urgency about porting existing code to it). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Jun 16 12:39:12 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Jun 2016 17:39:12 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On 16 June 2016 at 16:58, Nathaniel Smith wrote: > The word "cryptographic" here is a bit of a red herring. The guarantee that > a CSPRNG makes is that the output should be *unguessable by third parties*. > There are plenty of times when this is what you need even when you aren't > using actual cryptography. 
For example, when someone logs into a web app, I > may want to send back a session cookie so that I can recognize this person > later without making then reauthenticate all the time. For this to work > securely, it's extremely important that no one else be able to predict what > session cookie I sent, because if you can guess the cookie then you can > impersonate the user. > > In python 2.3-3.5, the most correct way to write this code is to use > os.urandom. The question in this thread is whether we should break that in > 3.6, so that conscientious users are forced to switch existing code over to > using the secrets module if they want to continue to get the most correct > available behavior, or whether we should preserve that in 3.6, so that code > like my hypothetical web app that was correct on 2.3-3.5 remains correct on > 3.6 (with the secrets module being a more friendly wrapper that we recommend > for new code, but with no urgency about porting existing code to it). While your example is understandable and clear, it's also a bit of a red herring as well. Nobody's setting up a web session cookie during the first moments of Linux boot (are they?), so os.urandom is perfectly OK in all cases here. We have a new API in 3.6 that might better express the *intent* of generating a secret token, but (cryptographic) correctness is the same either way for this example. As someone who isn't experienced in crypto, I genuinely don't have the slightest idea of what sort of program we're talking about that is written in Python, runs in the early stages of OS startup, and needs crypto-strength random numbers. So I can't reason about whether the proposed solutions are sensible. Would such programs be used in a variety of environments with different Python versions? Would the developers be non-specialists? 
Which of the mistakes being made that result in a vulnerability is the easiest to solve (move the code to run later, modify the Python code, require a fixed version of Python)? How severe is the security hole compared to others (for example, users with weak passwords)? What attacks are possible, and what damage could be done? (I know that in principle, any security hole needs to be plugged, but I work in an environment where production services with a password of "password" exist, and applying system security patches is treated as a "think about it when things are quiet" activity - so forgive me if I don't immediately understand why obscure vulnerabilities are important). I'm willing to accept the view of the security experts that there's a problem here. But without a clear explanation of the problem, how can a non-specialist like myself have an opinion? (And I hope the security POV isn't "you don't need an opinion, just do as we say"). Paul From random832 at fastmail.com Thu Jun 16 12:57:09 2016 From: random832 at fastmail.com (Random832) Date: Thu, 16 Jun 2016 12:57:09 -0400 Subject: [Python-Dev] Our responsibilities (was Re: BDFL ruling request: should we block forever waiting for high-quality random bits?) In-Reply-To: <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io> References: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616052541.GB32689@thunk.org> <20160616114610.18de2d88.barry@wooz.org> <61CE1B75-AC9A-4493-9AF5-44D022629DEA@stufft.io> <20160616140756.62be6e25@python.org> <85CBF6B2-85A6-4285-891A-154B84C3A533@stufft.io> Message-ID: <1466096229.1556382.639822985.1A5F49FA@webmail.messagingengine.com> On Thu, Jun 16, 2016, at 07:34, Donald Stufft wrote: > python-dev tends to favor not breaking ?working? 
code over securing > existing APIs, even if ?working? is silently doing the wrong thing > in a security context. This is particularly frustrating when it > comes to security because security is by it?s nature the act of > taking code that would otherwise execute and making it error, > ideally only in bad situations, but this ?security?s purpose is to > make things break? nature clashes with python-dev?s default of > not breaking ?working? code in a way that is personally draining > to me. I was almost about to reply with "Maybe what we need is a new zen of python", then I checked. It turns out we already have "Errors should never pass silently" which fits *perfectly* in this situation. So what's needed is a change to the attitude that if an error passes silently, that making it no longer pass silently is a backward compatibility break. This isn't Java, where the exceptions not thrown by an API are part of that API's contract. We're free to throw new exceptions in a new version of Python. From mertz at gnosis.cx Thu Jun 16 13:01:00 2016 From: mertz at gnosis.cx (David Mertz) Date: Thu, 16 Jun 2016 13:01:00 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On Thu, Jun 16, 2016 at 11:58 AM, Nathaniel Smith wrote: > [...] no one else be able to predict what session cookie I sent [...] In > python 2.3-3.5, the most correct way to write this code is to use > os.urandom. 
The question in this thread is whether we should break that in > 3.6, so that conscientious users are forced to switch existing code over to > using the secrets module if they want to continue to get the most correct > available behavior, or whether we should preserve that in 3.6, so that code > like my hypothetical web app that was correct on 2.3-3.5 remains correct on > 3.6 > This is kinda silly. Unless you specifically wrote your code for Python 3.5.1, and NOT for 2.3.x through 3.4.x, your code is NO WORSE in 3.5.2 than it has been under all those prior versions. The cases where the behavior in everything other than 3.5.0-3.5.1 is suboptimal are *extremely limited*, as you understand (things that run in Python very early in the boot process, and only on recent versions of Linux, no other OS). This does not even remotely describe the web-server-with-cookies example that you outline. Python 3.6 is introducing a NEW MODULE, with new APIs. The 'secrets' module is the very first time that Python has ever really explicitly addressed cryptography in the standard library. Yes, there have been third-party modules and libraries, but any cryptographic application of Python prior to 'secrets' is very much roll-your-own and know-what-you-are-doing. Yes, there has been a history of telling people to "use os.urandom()" on StackOverflow and places like that. That's about the best advice that was available prior to 3.6. Adding a new module and API is specifically designed to allow for a better answer, otherwise there'd be no reason to include it. And that advice that's been on StackOverflow and wherever has been subject to the narrow, edge-case flaw we've discussed here for at least a decade without anyone noticing or caring. It seems to me that backporting 'secrets' and putting it on Warehouse would be a lot more productive than complaining about 3.5.2 reverting to (almost) the behavior of 2.3-3.4. 
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jun 16 13:03:34 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jun 2016 10:03:34 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On 16 June 2016 at 09:39, Paul Moore wrote: > I'm willing to accept the view of the security experts that there's a > problem here. But without a clear explanation of the problem, how can > a non-specialist like myself have an opinion? (And I hope the security > POV isn't "you don't need an opinion, just do as we say"). If you're not writing Linux (and presumably *BSD) scripts and applications that run during system initialisation or on embedded ARM hardware with no good sources of randomness, then there's zero chance of any change made in relation to this affecting you (Windows and Mac OS X are completely immune, since they don't allow Python scripts to run early enough in the boot sequence for there to ever be a problem). 
The only question at hand is what CPython should do in the case where the operating system *does* let Python scripts run before the system random number generator is ready, and the application calls a security sensitive API that relies on that RNG: - throw BlockingIOError (so the script developer knows they have a potential problem to fix) - block (so the script developer has a system hang to debug) - return low quality random data (so the script developer doesn't even know they have a potential problem) The last option is the status quo, and has a remarkable number of vocal defenders. The second option is what we changed the behaviour to in 3.5 as a side effect of switching to a syscall to save a file descriptor (and *also* inadvertently made a gating requirement for CPython starting at all, without which I'd be very surprised if anyone actually noticed the potentially blocking behaviour in os.urandom itself) The first option is the one I'm currently writing a PEP for, since it makes the longstanding advice to use os.urandom() as the low level random data API for security sensitive operations unequivocally correct (as it will either do the right thing, or throw an exception which the developer can handle as appropriate for their particular application) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 16 13:26:22 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jun 2016 10:26:22 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On 16 June 2016 at 10:01, David Mertz wrote: > It seems to me that backporting 'secrets' and putting it on Warehouse would > be a lot more productive than complaining about 3.5.2 reverting to (almost) > the behavior of 2.3-3.4. "Let Flask/Django/passlib/cryptography/whatever handle the problem rather than rolling your own" is already the higher level meta-guidance. However, there are multiple levels of improvement being pursued here, since developer ignorance of security concerns and problematic defaults at the language level is a chronic problem rather than an acute one (and one that affects all languages, not just Python). In that context, the main benefit of the secrets module is as a deterrent against people reaching for the reproducible simulation focused random module to implement security sensitive operations. By offering both secrets and random in the standard library, we help make it clear that secrecy and simulation are *not the same problem*, even though they both involve random numbers. Folks that learn Python 3.6 first and then later start supporting earlier versions are likely to be more aware of the difference, and hence go looking for "What's the equivalent of the secrets module on earlier Python versions?" 
(at which point they can just copy whichever one-liner they actually need into their particular application - just as not every 3 line function needs to be a builtin, not every 3 line function needs to be a module on PyPI) The os.urandom proposal is aimed more at removing any remaining equivocation from the longstanding "Use os.urandom() for security sensitive operations in Python" advice - it's for the benefit of folks that are *already* attempting to do the right thing given the tools they have available. The sole source of that equivocation is that in some cases, at least on Linux, and potentially on *BSD (although we haven't seen a confirmed reproducer there), os.urandom() may return results that are sufficiently predictable to be inappropriate for use in security sensitive applications. At the moment, determining whether or not you're risking exposure to that problem requires that you know a whole lot about Linux (and *BSD, where even we haven't been able to determine the level of exposure on embedded systems), and also about how ``os.urandom()`` is implemented on different platforms. My proposal is that we do away with the requirement for all that assumed knowledge and instead say "Are you using os.urandom(), random.SystemRandom(), or an API in the secrets module? Are you using Python 3.6+? Did it raise BlockingIOError? No? Then you're fine". The vast majority of Python developers will thus be free to remain entirely ignorant of these platform specific idiosyncrasies, while those that have a potential need to know will get an exception from the interpreter that they can then feed into a search engine and get pointed in the right direction. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Thu Jun 16 13:40:12 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Jun 2016 10:40:12 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On Jun 16, 2016 10:01 AM, "David Mertz" wrote: > Python 3.6 is introducing a NEW MODULE, with new APIs. The 'secrets' module is the very first time that Python has ever really explicitly addressed cryptography in the standard library. This is completely, objectively untrue. If you look up os.urandom in the official manual for the standard library, then it has always stated explicitly, as the very first line, that os.urandom returns "a string of n random bytes suitable for cryptographic use." This is *exactly* the same explicit guarantee that the secrets module makes. The motivation for adding the secrets module was to make this functionality easier to find and more convenient to use (e.g. by providing convenience functions for getting random strings of ASCII characters), not to suddenly start addressing cryptographic concerns for the first time. (Will try to address other more nuanced points later.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Jun 16 13:46:25 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 16 Jun 2016 11:46:25 -0600 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: Thanks for raising these good points, Nikita. I'll make sure the PEP reflects this discussion. (inline responses below...) -eric On Tue, Jun 14, 2016 at 3:41 AM, Nikita Nemkin wrote: > Is there any rationale for rejecting alternatives like: > > 1. Adding standard metaclass with ordered namespace. > 2. 
Adding `namespace` or `ordered` args to the default metaclass. We already have a metaclass-based solution: __prepare__(). Unfortunately, this opt-in option means that the definition order isn't preserved by default, which means folks can't rely on access to the definition order. This is effectively no different from the status quo. Furthermore, there's a practical problem with requiring the use of metaclasses to achieve some particular capability: metaclass conflicts. PEPs 422 and 487 exist, in large part, as a response to specific feedback from users about problems they've had with metaclasses. While the key objective of PEP 520 is preserving the class definition order, it also helps make it less necessary to write a metaclass. > 3. Making compiler fill in __definition_order__ for every class > (just like __qualname__) without touching the runtime. This is a great idea. I'd support any effort to do so. But keep in mind that how we derive __definition_order__ isn't as important as that it's always there. So the use of OrderedDict for the implementation isn't necessary. Instead, it's the implementation I've taken. If we later switch to using the compiler to get the definition order, then great! > ? > > To me, any of the above seems preferred to complicating > the core part of the language forever. What specific complication are you expecting? Like nearly all of Python's "power tools", folks won't need to know about the changes from this PEP in order to use the language. Then when they need the new functionality, it will be ready for them to use. Furthermore, as far as changes to the language go, this change is quite simple and straightforward (consider other recent changes, e.g. async). It is arguably a natural step and fills in some of the information that Python currently throws away. 
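For reference, the existing opt-in route mentioned above looks roughly like this. Everything here is a sketch: the ``OrderedMeta`` name is invented, and ``__definition_order__`` is filled in by a hand-written metaclass rather than by the language, which is exactly the boilerplate (and metaclass-conflict surface) the PEP aims to remove:

```python
import collections

class OrderedMeta(type):
    # Opt-in definition-order tracking via the PEP 3115 __prepare__ hook.
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        return collections.OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Record the order in which names were bound in the class body.
        cls.__definition_order__ = tuple(namespace)
        return cls

class Example(metaclass=OrderedMeta):
    b = 2
    a = 1

    def method(self):
        pass

# __definition_order__ lists 'b' before 'a', matching the source order
# (implicit entries like '__module__' and '__qualname__' appear first).
print(Example.__definition_order__)
```

Any class that wants this information today has to opt in to ``OrderedMeta`` (or something like it), which is exactly the situation the PEP argues is equivalent to the status quo.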
Finally, I've gotten broad support for the change from across the community (both on the mailing lists and in personal correspondence), from the time I first introduced the idea several years ago. > > The vast majority of Python classes don't care about their member > order, this is minority use case receiving majority treatment. The problem is that there isn't any other recourse available to code that wishes to determine the definition order of an arbitrary class. This is an obstacle to code that I personally want to write (hence my interest). > > Also, wiring OrderedDict into class creation means elevating it > from a peripheral utility to indispensable built-in type. Note that as of 3.5 CPython's OrderedDict *is* a builtin type (though exposed via the collections module rather than the builtins module). However, you're right that this change would mean OrderedDict would now be used by the interpreter in all implementations of Python 3.6+. Some of the other implementors from which I've gotten feedback have indicated this isn't a problem. From ncoghlan at gmail.com Thu Jun 16 13:53:31 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jun 2016 10:53:31 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On 16 June 2016 at 10:40, Nathaniel Smith wrote: > On Jun 16, 2016 10:01 AM, "David Mertz" wrote: >> Python 3.6 is introducing a NEW MODULE, with new APIs. 
The 'secrets' >> module is the very first time that Python has ever really explicitly >> addressed cryptography in the standard library. > > This is completely, objectively untrue. If you look up os.urandom in the > official manual for the standard library, then it has always stated > explicitly, as the very first line, that os.urandom returns "a string of n > random bytes suitable for cryptographic use." This is *exactly* the same > explicit guarantee that the secrets module makes. The motivation for adding > the secrets module was to make this functionality easier to find and more > convenient to use (e.g. by providing convenience functions for getting > random strings of ASCII characters), not to suddenly start addressing > cryptographic concerns for the first time. An analogy that occurred to me that may help some folks: secrets is a higher level API around os.urandom and some other standard library features (like base64 and binascii.hexlify) in the same way that shutil and pathlib are higher level APIs that aggregate other os module functions with other parts of the standard library. The existence of those higher level APIs doesn't make the lower level building blocks redundant. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ericsnowcurrently at gmail.com Thu Jun 16 14:15:21 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 16 Jun 2016 12:15:21 -0600 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Thu, Jun 16, 2016 at 5:11 AM, Nikita Nemkin wrote: > I'll reformulate my argument: > > Ordered class namespaces are a minority use case that's already covered > by existing language features (custom metaclasses) and doesn't warrant > the extension of the language (i.e. making OrderedDict a builtin type). > This is about Python-the-Language, not CPython-the-runtime. 
So your main objection is that OrderedDict would effectively become part of the language definition? Please elaborate on why this is a problem. > The simple answer is "don't do that", i.e. don't pile an ordered metaclass > on top of another metaclass. Such use case is hypothetical anyway. It isn't hypothetical. It's a concrete problem that folks have run into enough that it's been a point of discussion on several occasions and the motivation for several PEPs. > All explicit assignments in the class body can be detected statically. > Implicit assignments via locals(), sys._frame() etc. can't be detected, > BUT they are unlikely to have a meaningful order! > It's reasonable to exclude them from __definition_order__. Yeah, it's reasonable to exclude them. However, in cases where I've done so I would have wanted them included in the definition order. That said, explicitly setting __definition_order__ in the class body would be enough to address that corner case. > > This also applies to documentation tools. If there really was a need, > they could have easily extracted static order, solving 99.9999% of > the problem. You mean that they have the opportunity to do something like AST traversal to extract the definition order? I expect the definition order isn't important enough to them to do that work. However, if the language provided the definition order to them for free then they'd use it. > >> The rationale for "Why not make this configurable, rather than >> switching it unilaterally?" is that it's actually *simpler* overall to >> just make it the default - we can then change the documentation to say >> "class bodies are evaluated in a collections.OrderedDict instance by >> default" and record the consequences of that, rather than having to >> document yet another class customisation mechanism. > > It would have been a "simpler" default if it was the core dict that > became ordered. Instead, it brings in a 3rd party (OrderedDict). 
Obviously if dict preserved insertion order then we'd use that instead of OrderedDict. There have been proposals along those lines in the past but at the end of the day someone has to do the work. Since we can use OrderedDict right now and there's no ordered dict in sight, it makes the choice rather easy. :) Ultimately the cost of defaulting to OrderedDict is not significant, neither to the language definition nor to run-time performance. Furthermore, defaulting to OrderedDict (per the PEP) makes things possible right now that aren't otherwise a possibility. > >> It also eliminates boilerplate from class decorator usage >> instructions, where people have to write "to use this class decorator, >> you must also specify 'namespace=collections.OrderedDict' in your >> class header" > > Statically inferred __definition_order__ would work here. > Order-dependent decorators don't seem to be important enough > to worry about their usability. Please be careful about discounting seemingly unimportant use cases. There's a decent chance they are important to someone. In this case that someone is (at least) myself. :) My main motivation for PEP 520 is exactly writing a class decorator that would rely on access to the definition order. Such a decorator (which could also be used stand-alone) cannot rely on every possible class it might encounter to explicitly expose its definition order. -eric From Nikolaus at rath.org Thu Jun 16 14:29:04 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Thu, 16 Jun 2016 11:29:04 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? 
In-Reply-To: (Nick Coghlan's message of "Thu, 16 Jun 2016 10:03:34 -0700") References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: <87eg7x3s3z.fsf@thinkpad.rath.org> On Jun 16 2016, Nick Coghlan wrote: > On 16 June 2016 at 09:39, Paul Moore wrote: >> I'm willing to accept the view of the security experts that there's a >> problem here. But without a clear explanation of the problem, how can >> a non-specialist like myself have an opinion? (And I hope the security >> POV isn't "you don't need an opinion, just do as we say"). > > If you're not writing Linux (and presumably *BSD) scripts and > applications that run during system initialisation or on embedded ARM > hardware with no good sources of randomness, then there's zero chance > of any change made in relation to this affecting you (Windows and Mac > OS X are completely immune, since they don't allow Python scripts to > run early enough in the boot sequence for there to ever be a problem). > > The only question at hand is what CPython should do in the case where > the operating system *does* let Python scripts run before the system > random number generator is ready, and the application calls a security > sensitive API that relies on that RNG: > > - throw BlockingIOError (so the script developer knows they have a > potential problem to fix) > - block (so the script developer has a system hang to debug) > - return low quality random data (so the script developer doesn't even > know they have a potential problem) > > The last option is the status quo, and has a remarkable number of > vocal defenders. *applaud* Best, -Nikolaus -- GPG encrypted emails preferred. 
Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F “Time flies like an arrow, fruit flies like a Banana.” From amk at amk.ca Thu Jun 16 14:38:19 2016 From: amk at amk.ca (A.M. Kuchling) Date: Thu, 16 Jun 2016 14:38:19 -0400 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: <20160616183819.GA47680@ratlwsb1lcredwi.cmg.int> On Thu, Jun 16, 2016 at 10:26:22AM -0700, Nick Coghlan wrote: > meta-guidance. However, there are multiple levels of improvement being > pursued here, since developer ignorance of security concerns and > problematic defaults at the language level is a chronic problem rather > than an acute one (and one that affects all languages, not just > Python). For a while Christian Heimes has speculated on Twitter about writing a Secure Programming HOWTO. At the last language summit in Montreal, I told him I'd be happy to do the actual writing and editing if given a detailed outline. (I miss having an ongoing writing project since ceasing to write the "What's New", but have no ideas for anything to write about.) That offer is still open, if Christian or someone else wants to produce an outline. --amk From lkb.teichmann at gmail.com Thu Jun 16 14:56:33 2016 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Thu, 16 Jun 2016 20:56:33 +0200 Subject: [Python-Dev] PEP 487: Simpler customization of class creation Message-ID: Hi list, using metaclasses in Python is a very flexible method of customizing class creation, yet this customization comes at a cost: once you want to combine two classes with different metaclasses, you run into problems. 
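The cost Martin describes is easy to reproduce with two toy metaclasses (the names are invented for illustration):

```python
class MetaA(type):
    pass

class MetaB(type):
    pass

class A(metaclass=MetaA):
    pass

class B(metaclass=MetaB):
    pass

try:
    class C(A, B):  # the bases bring two unrelated metaclasses
        pass
except TypeError as exc:
    # CPython reports: "metaclass conflict: the metaclass of a derived
    # class must be a (non-strict) subclass of the metaclasses of all
    # its bases"
    print("conflict:", exc)
```

Neither library author did anything wrong here; the failure only appears when a user tries to combine their classes.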
This is why I proposed PEP 487 (see https://github.com/tecki/peps/blob/pep487/pep-0487.txt, which I also attached here for ease of discussion), proposing a simple hook into class creation that one can override in subclasses such that sub-subclasses get customized accordingly. Otherwise, the standard Python inheritance rules apply (super() and the MRO). I also proposed to store the order in which attributes in classes are defined. This is exactly the same as PEP 520, discussed here recently, just that unfortunately we chose different names, but I am open to suggestions for better names. After having gotten good feedback on python-ideas (see https://mail.python.org/pipermail/python-ideas/2016-February/038305.html) and from IPython traitlets as a potential user of the feature (see https://mail.scipy.org/pipermail/ipython-dev/2016-February/017066.html, and the code at https://github.com/tecki/traitlets/commits/pep487) I implemented a pure Python version of this PEP, to be introduced into the standard library. I also wrote a proof-of-concept for another potential user of this feature, django forms, at https://github.com/tecki/django/commits/no-metaclass. The code to be introduced into the standard library can be found at https://github.com/tecki/cpython/commits/pep487 (sorry for using github, I'll submit something using hg once I understand that toolchain). It introduces a new metaclass types.Type which contains the machinery, and a new base class types.Object which uses said metaclass. The naming was chosen to clarify the intention that eventually those classes may be implemented in C and replace type and object. As above, I am open to better naming. As a second step, I let abc.ABCMeta inherit from said types.Type, such that an ABC may also use the features of my metaclass, without the need to define a new mixing metaclass. I am looking forward to a lot of comments on this! 
Greetings

Martin

The proposed PEP for discussion:

PEP: 487
Title: Simpler customisation of class creation
Version: $Revision$
Last-Modified: $Date$
Author: Martin Teichmann ,
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Feb-2015
Python-Version: 3.6
Post-History: 27-Feb-2015, 5-Feb-2016
Replaces: 422

Abstract
========

Currently, customising class creation requires the use of a custom metaclass. This custom metaclass then persists for the entire lifecycle of the class, creating the potential for spurious metaclass conflicts.

This PEP proposes to instead support a wide range of customisation scenarios through a new ``__init_subclass__`` hook in the class body, a hook to initialize attributes, and a way to keep the order in which attributes are defined.

Those hooks should at first be defined in a metaclass in the standard library, with the option that this metaclass eventually becomes the default ``type`` metaclass.

The new mechanism should be easier to understand and use than implementing a custom metaclass, and thus should provide a gentler introduction to the full power of Python's metaclass machinery.

Background
==========

Metaclasses are a powerful tool to customize class creation. They have, however, the problem that there is no automatic way to combine metaclasses. If one wants to use two metaclasses for a class, a new metaclass combining those two needs to be created, typically manually.

This need often occurs as a surprise to a user: inheriting from two base classes coming from two different libraries suddenly raises the necessity to manually create a combined metaclass, where typically one is not interested in those details about the libraries at all. This becomes even worse if one library starts to make use of a metaclass which it did not use before. While the library itself continues to work perfectly, suddenly any code combining those classes with classes from another library fails. 
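A minimal sketch of the manual combination step described above (all class names invented for illustration): before the two bases can be mixed, the user has to write and maintain a merged metaclass by hand.

```python
class MetaA(type):
    pass

class MetaB(type):
    pass

class A(metaclass=MetaA):
    pass

class B(metaclass=MetaB):
    pass

# Without this hand-written merge, ``class C(A, B)`` fails with
# "TypeError: metaclass conflict".
class MetaAB(MetaA, MetaB):
    pass

class C(A, B, metaclass=MetaAB):
    pass
```

The merge class is pure boilerplate, yet it has to be written by the end user, who typically knows nothing about either library's metaclass.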
Proposal
========

While there are many possible ways to use a metaclass, the vast majority of use cases falls into just three categories: some initialization code running after class creation, the initialization of descriptors, and keeping the order in which class attributes were defined.

Those three use cases can easily be performed by just one metaclass. If this metaclass is put into the standard library, and all libraries that wish to customize class creation use this very metaclass, no combination of metaclasses is necessary anymore.

Said metaclass should live in the ``types`` module under the name ``Type``. This should hint the user that in the future, this metaclass may become the default metaclass ``type``. The three use cases are achieved as follows:

1. The metaclass contains an ``__init_subclass__`` hook that initializes all subclasses of a given class,
2. the metaclass calls a ``__set_owner__`` hook on all the attributes (descriptors) defined in the class, and
3. an ``__attribute_order__`` tuple is left in the class in order to inspect the order in which attributes were defined.

For ease of use, a base class ``types.Object`` is defined, which uses said metaclass and contains an empty stub for the hook described for use case 1. It will eventually become the new replacement for the standard ``object``.

As an example, the first use case looks as follows::

    >>> class SpamBase(types.Object):
    ...     # this is implicitly a @classmethod
    ...     def __init_subclass__(cls, **kwargs):
    ...         cls.class_args = kwargs
    ...         super().__init_subclass__(**kwargs)

    >>> class Spam(SpamBase, a=1, b="b"):
    ...     pass

    >>> Spam.class_args
    {'a': 1, 'b': 'b'}

The base class ``types.Object`` contains an empty ``__init_subclass__`` method which serves as an endpoint for cooperative multiple inheritance. Note that this method has no keyword arguments, meaning that all methods which are more specialized have to process all keyword arguments. 
This general proposal is not a new idea (it was first suggested for inclusion in the language definition `more than 10 years ago`_, and a similar mechanism has long been supported by `Zope's ExtensionClass`_), but the situation has changed sufficiently in recent years that the idea is worth reconsidering for inclusion.

The second part of the proposal adds an ``__set_owner__`` initializer for class attributes, especially if they are descriptors. Descriptors are defined in the body of a class, but they do not know anything about that class; they do not even know the name they are accessed with. They do get to know their owner once ``__get__`` is called, but they still do not know their name. This is unfortunate: for example, they cannot put their associated value into their object's ``__dict__`` under their name, since they do not know that name. This problem has been solved many times, and is one of the most important reasons to have a metaclass in a library. While it would be easy to implement such a mechanism using the first part of the proposal, it makes sense to have one solution for this problem for everyone.

To give an example of its usage, imagine a descriptor representing weak referenced values (note the trailing call in ``__get__``, which dereferences the stored weak reference)::

    import weakref

    class WeakAttribute:
        def __get__(self, instance, owner):
            return instance.__dict__[self.name]()

        def __set__(self, instance, value):
            instance.__dict__[self.name] = weakref.ref(value)

        # this is the new initializer:
        def __set_owner__(self, owner, name):
            self.name = name

While this example looks very trivial, it should be noted that until now such an attribute cannot be defined without the use of a metaclass. And given that such a metaclass can make life very hard, this kind of attribute does not exist yet.

The third part of the proposal is to leave a tuple called ``__attribute_order__`` in the class that contains the order in which the attributes were defined. This is a very common use case; many libraries use an ``OrderedDict`` to store this order. 
This is a very simple way to achieve the same goal. Under the hood, the implementation *does* ``__prepare__`` an ``OrderedDict`` namespace; it just retains the order of the keys in ``__attribute_order__``, since ``type.__new__`` will cripple the ``OrderedDict`` into a normal ``dict``, discarding the order information.

Key Benefits
============

Easier inheritance of definition time behaviour
-----------------------------------------------

Understanding Python's metaclasses requires a deep understanding of the type system and the class construction process. This is legitimately seen as challenging, due to the need to keep multiple moving parts (the code, the metaclass hint, the actual metaclass, the class object, instances of the class object) clearly distinct in your mind. 
A path of introduction into Python ================================== Most of the benefits of this PEP can already be implemented using a simple metaclass. For the ``__init_subclass__`` hook this works all the way down to Python 2.7, while the attribute order needs Python 3.0 to work. Such a class has been `uploaded to PyPI`_. The only drawback of such a metaclass are the mentioned problems with metaclasses and multiple inheritance. Two classes using such a metaclass can only be combined, if they use exactly the same such metaclass. This fact calls for the inclusion of such a class into the standard library, as ``types.Type``, with a ``types.Object`` base class using it. Once all users use this standard library metaclass, classes from different packages can easily be combined. But still such classes cannot be easily combined with other classes using other metaclasses. Authors of metaclasses should bear that in mind and inherit from the standard metaclass if it seems useful for users of the metaclass to add more functionality. Ultimately, if the need for combining with other metaclasses is strong enough, the proposed functionality may be introduced into Python's ``type``. Those arguments strongly hint to the following procedure to include the proposed functionality into Python: 1. The metaclass implementing this proposal is put onto PyPI, so that it can be used and scrutinized. 2. Introduce this class into the Python 3.6 standard library. 3. Consider this as the default behavior for Python 3.7. Steps 2 and 3 would be similar to how the ``set`` datatype was first introduced as ``sets.Set``, and only later made a builtin type (with a slightly different API) based on wider experiences with the ``sets`` module. While the metaclass is still in the standard library and not in the language, it may still clash with other metaclasses. The most prominent metaclass in use is probably ABCMeta. It is also a particularly good example for the need of combining metaclasses. 
For users who want to define an ABC with subclass initialization, we should support a ``types.ABCMeta`` class, or let ``abc.ABCMeta`` inherit from this PEP's metaclass. As it turns out, most of the behavior of ``abc.ABCMeta`` can be achieved with our ``types.Type``, except its core behavior, ``__instancecheck__`` and ``__subclasscheck__``, which can be supplied, as per the definition of the Python language, exclusively in a metaclass.

Extensions written in C or C++ also often define their own metaclass. It would be very useful if those could also inherit from the metaclass defined here, but this is probably not possible.

New Ways of Using Classes
=========================

This proposal has many use cases like the following. In the examples, we still inherit from the ``types.Object`` base class. This would become unnecessary once this PEP is included in Python directly.

Subclass registration
---------------------

Especially when writing a plugin system, one likes to register new subclasses of a plugin baseclass. This can be done as follows::

    class PluginBase(Object):
        subclasses = []

        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            cls.subclasses.append(cls)

In this example, ``PluginBase.subclasses`` will contain a plain list of all subclasses in the entire inheritance tree. One should note that this also works nicely as a mixin class.

Trait descriptors
-----------------

There are many designs of Python descriptors in the wild which, for example, check boundaries of values. Often those "traits" need some support of a metaclass to work. 
This is how this would look with this PEP::

    class Trait:
        def __get__(self, instance, owner):
            return instance.__dict__[self.key]

        def __set__(self, instance, value):
            instance.__dict__[self.key] = value

        def __set_owner__(self, owner, name):
            self.key = name

Rejected Design Options
=======================

Calling the hook on the class itself
------------------------------------

Adding an ``__autodecorate__`` hook that would be called on the class itself was the proposed idea of PEP 422. Most examples work the same way or even better if the hook is called on the subclass. In general, it is much easier to explicitly call the hook on the class in which it is defined (to opt in to such a behavior) than to opt out, meaning that one does not want the hook to be called on the class it is defined in. This becomes most evident if the class in question is designed as a mixin: it is very unlikely that the code of the mixin is to be executed for the mixin class itself, as it is not supposed to be a complete class on its own.

The original proposal also made major changes in the class initialization process, rendering it impossible to back-port the proposal to older Python versions. More importantly, having a pure Python implementation allows us to take two preliminary steps before we actually change the interpreter, giving us the chance to iron out all possible wrinkles in the API.

Other variants of calling the hook
----------------------------------

Other names for the hook were presented, namely ``__decorate__`` or ``__autodecorate__``. This proposal opts for ``__init_subclass__`` as it is very close to the ``__init__`` method, just for the subclass, while it is not very close to decorators, as it does not return the class.

Requiring an explicit decorator on ``__init_subclass__``
--------------------------------------------------------

One could require the explicit use of ``@classmethod`` on the ``__init_subclass__`` decorator. 
It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.

This decision was reinforced after noticing that the user experience of defining ``__prepare__`` and forgetting the ``@classmethod`` method decorator is singularly incomprehensible (particularly since PEP 3115 documents it as an ordinary method, and the current documentation doesn't explicitly say anything one way or the other).

Defining arbitrary namespaces
-----------------------------

PEP 422 defined a generic way to add arbitrary namespaces for class definitions. This approach is much more flexible than just leaving the definition order in a tuple. The ``__prepare__`` method in a metaclass supports exactly this behavior. But given that effectively the only use case that could be found in the wild was the ``OrderedDict`` way of determining the attribute order, it seemed reasonable to only support this special case.

The metaclass described in this PEP has been designed to be very simple such that it could be reasonably made the default metaclass. This was especially important when designing the attribute order functionality: this was a highly demanded feature and has been enabled through the ``__prepare__`` method of metaclasses. This method can be abused in very weird ways, making it hard to correctly maintain this feature in CPython. This is why it has been proposed to deprecate this feature, and instead use ``OrderedDict`` as the standard namespace, supporting the most important feature while dropping most of the complexity. But this would have meant that ``OrderedDict`` becomes a language builtin like dict and set, and not just a standard library class. The choice of the ``__attribute_order__`` tuple is a much simpler solution to the problem.
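As a rough illustration (not part of the PEP's reference implementation), the ``__prepare__``-based mechanism described above can be sketched in pure Python; the attribute name follows the PEP's proposed ``__attribute_order__``, and the metaclass name here is hypothetical:

```python
from collections import OrderedDict

class OrderedMeta(type):
    # Return an ordered namespace so the class body's assignment
    # order is preserved while the class is being created.
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Record the runtime definition order as a plain tuple,
        # as the PEP proposes (rather than exposing the OrderedDict).
        cls.__attribute_order__ = tuple(namespace)
        return cls

class Point(metaclass=OrderedMeta):
    x = 0
    y = 0

    def move(self, dx, dy):
        self.x += dx
        self.y += dy
```

Here ``Point.__attribute_order__`` ends with ``('x', 'y', 'move')``, preceded by implicit entries such as ``__module__`` that the compiler stores into the namespace.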
A more ``__new__``-like hook
----------------------------

In PEP 422 the hook worked more like the ``__new__`` method than the ``__init__`` method, meaning that it returned a class instead of modifying one. This allows a bit more flexibility, but at the cost of a much harder implementation and undesired side effects.

History
=======

This used to be a competing proposal to PEP 422 by Nick Coghlan and Daniel Urban. PEP 422 intended to achieve the same goals as this PEP, but with a different implementation. In the meantime, PEP 422 has been withdrawn in favour of this approach.

References
==========

.. _published code:
   http://mail.python.org/pipermail/python-dev/2012-June/119878.html

.. _more than 10 years ago:
   http://mail.python.org/pipermail/python-dev/2001-November/018651.html

.. _Zope's ExtensionClass:
   http://docs.zope.org/zope_secrets/extensionclass.html

.. _uploaded to PyPI:
   https://pypi.python.org/pypi/metaclass

Copyright
=========

This document has been placed in the public domain.

From kevin-lists at theolliviers.com Thu Jun 16 15:22:12 2016 From: kevin-lists at theolliviers.com (Kevin Ollivier) Date: Thu, 16 Jun 2016 12:22:12 -0700 Subject: [Python-Dev] Discussion overload Message-ID: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> Hi all, Recent joiner here, I signed up after PyCon made me want to get more involved and have been lurking. I woke up this morning again to about 30 new messages in my inbox, almost all of which revolve around the os.urandom blocking discussion. There are just about hourly new posts showing up on this topic. There is such a thing as too much of a good thing. Discussion of issues is certainly good, but so far since joining this list I am seeing too much discussion happening too fast, and as someone who has been involved in open source for approaching two decades now, frankly, that is not really a good sign.
The discussions are somewhat overlapping as so many people write back so quickly, there are multiple sub-discussions happening at once, and really at this point I'm not sure how much new each message is really adding, if anything at all. It seems to me the main solutions to this problem have all been identified, as have the tradeoffs of each. The discussion is now mostly at a point where people are just repeatedly debating (or promoting) the merits of their preferred solution and tradeoff. It is even spawning more abstract sub-discussions about things like project compatibility policies. This discussion has really taken on a life of its own. For someone like me, a new joiner, seeing this makes me feel like wanting to simply unsubscribe. I've been on mailing lists where issues get debated endlessly, and at some point what inevitably happens is that the project starts to lose members who feel that even just trying to follow the discussions is eating up too much of their time. It really can suck the energy right out of a community. I don't want to see that happen to Python. I had a blast at PyCon, my first, and I really came away feeling more than ever that the community you have here is really special. The one problem I felt concerned about though, was that the core dev community risked a sense of paralysis caused by having too many cooks in the kitchen and too much worry about the potential unseen ramifications of changing things. That creates a sort of paralysis and difficulty achieving consensus on anything that, eventually, causes projects to slowly decline and be disrupted by a more agile alternative. Please consider taking a step back from this issue. Take a deep breath, and consider responding more slowly and letting people's points stew in your head for a day or two first. (Including this one pls. :) Python will not implode if you don't get that email out right away.
If I understand what I've read of this torrent of messages correctly, we don't even know if there's a single real world use case where a user of os.urandom is hitting the same problem CPython did, so we don't even know if the blocking at startup issue is actually even happening in any real world Python code out there. It's clearly far from a rampant problem, in any case. Stop and think about that for a second. This is, in practice, potentially a complete non-issue. Fixing it in any number of ways may potentially change things for no one at all. You could even introduce a real problem while trying to fix a hypothetical one. There are more than enough real problems to deal with, so why push hypothetical problems to the top of your priority list? It's too easy to get caught up in the abstract nature of problems and to lose sight of the real people and code behind them, or sometimes, the lack thereof. Be practical, be pragmatic. Before you hit that reply button, think - in a practical sense, of all the things I could be doing right now, is this discussion the place where my involvement could generate the greatest positive impact for the project? Is this the biggest and most substantial problem the project should be focusing on right now? Projects and developers who know how to manage focus go on to achieve the greatest things, in my experience. Having been critical, I will end with a compliment. :) It is nice to see that with only a couple small exceptions, this discussion has remained very civil and respectful, which should be expected, but I know from experience that far too often these discussions start to take a nasty tone as people get frustrated. This is one of the things I really do love about the Python community, and it's one reason I want to see both the product and community grow and succeed even more. That, in fact, is why I'm choosing to write this message first rather than simply unsubscribe. 
Kevin From p.f.moore at gmail.com Thu Jun 16 15:33:16 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Jun 2016 20:33:16 +0100 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: On 16 June 2016 at 18:03, Nick Coghlan wrote: > On 16 June 2016 at 09:39, Paul Moore wrote: >> I'm willing to accept the view of the security experts that there's a >> problem here. But without a clear explanation of the problem, how can >> a non-specialist like myself have an opinion? (And I hope the security >> POV isn't "you don't need an opinion, just do as we say"). > > If you're not writing Linux (and presumably *BSD) scripts and > applications that run during system initialisation or on embedded ARM > hardware with no good sources of randomness, then there's zero chance > of any change made in relation to this affecting you (Windows and Mac > OS X are completely immune, since they don't allow Python scripts to > run early enough in the boot sequence for there to ever be a problem). Understood. I could quite happily ignore this thread for all the impact it will have on me. 
However, I've seen enough of these debates (and witnessed the frustration of the security advocates) that I want to try to understand the issues better - as much as anything so that I don't end up adding uninformed opposition to these threads (in my day job, unfortunately, security is generally the excuse for all sorts of counter-productive rules, and never offers any practical benefits that I am aware of, so I'm predisposed to rejecting arguments based on security - that background isn't accurate in this environment and I'm actively trying to counter it). > The only question at hand is what CPython should do in the case where > the operating system *does* let Python scripts run before the system > random number generator is ready, and the application calls a security > sensitive API that relies on that RNG: > > - throw BlockingIOError (so the script developer knows they have a > potential problem to fix) > - block (so the script developer has a system hang to debug) > - return low quality random data (so the script developer doesn't even > know they have a potential problem) > > The last option is the status quo, and has a remarkable number of > vocal defenders. Understood. It seems to me that there are two arguments here - backward compatibility (which is always a pressure, but sometimes applied too vigorously and not always consistently) and "we've always done it that way" (aka "people will have to consider what happens when they run under 3.4 anyway, so how will changing help?"). Judging backward compatibility is always a matter of trade-offs, hence my interest in the actual benefits.
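To make the first of those options concrete: under the proposed (and, at this point in the thread, still hypothetical) behaviour where ``os.urandom()`` raises ``BlockingIOError`` before the system RNG is ready, an application that prefers to wait could handle it explicitly. A sketch, not code from the thread; the function name is made up:

```python
import os
import time

def urandom_when_ready(nbytes, retry_interval=0.1, timeout=30.0):
    """Return nbytes of OS randomness, waiting for the RNG if needed.

    Assumes the proposed behaviour where os.urandom() raises
    BlockingIOError while the entropy pool is uninitialized; on an
    already-initialized system the first call simply succeeds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return os.urandom(nbytes)
        except BlockingIOError:
            if time.monotonic() >= deadline:
                raise  # give up with the original error rather than hang forever
            time.sleep(retry_interval)

seed = urandom_when_ready(32)
```

The point of the exception-based design is exactly that this choice (wait, abort, or fall back) lands in the application author's hands instead of being made silently by the interpreter.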
> The second option is what we changed the behaviour to in 3.5 as a side > effect of switching to a syscall to save a file descriptor (and *also* > inadvertently made a gating requirement for CPython starting at all, > without which I'd be very surprised if anyone actually noticed the > potentially blocking behaviour in os.urandom itself) OK, so (given that the issue of CPython starting at all was an accidental, and now corrected, side effect) why is this so bad? Maybe not in a minor release, but at least for 3.6? How come this has caused such a fuss? I genuinely don't understand why people see blocking as such an issue (and as far as I can tell, Ted Tso seems to agree). The one case where this had an impact was a quickly fixed bug - so as far as I can tell, the risk of problems caused by blocking is purely hypothetical. > The first option is the one I'm currently writing a PEP for, since it > makes the longstanding advice to use os.urandom() as the low level > random data API for security sensitive operations unequivocally > correct (as it will either do the right thing, or throw an exception > which the developer can handle as appropriate for their particular > application) In my code, I typically prefer Python to make detailed decisions for me (e.g. requests follows redirects by default, it doesn't expect me to do so manually). Now certainly this is a low-level interface so the rules are different, but I don't see why blocking by default isn't "unequivocally correct" in the same way that it is on other platforms, rather than raising an exception and requiring the developer to do the wait manually. (What else would they do - fall back to insecure data? I thought the point here was that that's the wrong thing to do?) 
Having a blocking default with a non-blocking version seems just as arguable, and has the advantage that naive users (I don't even know if we're allowing for naive users here) won't get an unexpected exception and handle it badly because they don't know what to do (a sadly common practice in my experience). OK. Guido has pronounced, you're writing a PEP. None of this debate is really constructive any more. But I still don't understand the trade-offs, which frustrates me. Surely security isn't so hard that it can't be explained in a way that an interested layman like myself can follow? :-( Paul From barry at python.org Thu Jun 16 16:09:40 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Jun 2016 23:09:40 +0300 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> Message-ID: <20160616230940.570b8553.barry@wooz.org> On Jun 16, 2016, at 01:01 PM, David Mertz wrote: >It seems to me that backporting 'secrets' and putting it on Warehouse would >be a lot more productive than complaining about 3.5.2 reverting to (almost) >the behavior of 2.3-3.4. Very wise suggestion indeed. We have all kinds of stdlib modules backported and released as third party packages. Why not secrets too? If such were on PyPI, I'd happily package it up for the Debian ecosystem. Problem solved . But I'm *really* going to try to disengage from this discussion until Nick's PEP is posted. 
Cheers, -Barry From ncoghlan at gmail.com Thu Jun 16 16:50:55 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jun 2016 13:50:55 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits? In-Reply-To: <20160616230940.570b8553.barry@wooz.org> References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <20160616230940.570b8553.barry@wooz.org> Message-ID: On 16 June 2016 at 13:09, Barry Warsaw wrote: > On Jun 16, 2016, at 01:01 PM, David Mertz wrote: > >>It seems to me that backporting 'secrets' and putting it on Warehouse would >>be a lot more productive than complaining about 3.5.2 reverting to (almost) >>the behavior of 2.3-3.4. > > Very wise suggestion indeed. We have all kinds of stdlib modules backported > and released as third party packages. Why not secrets too? If such were on > PyPI, I'd happily package it up for the Debian ecosystem. Problem solved > . The secrets module is just a collection of one liners pulling together other stdlib components that have been around for years - the main problem it aims to address is one of discoverability (rather than one of code complexity), while also eliminating the "simulation is in the standard library, secrecy requires a third party module" discrepancy in the long term. Once you're aware the problem exists, the easiest way to use it in a version independent manner is to just copy the relevant snippet into your own project's utility library - adding an entire new dependency to your project just for those utility functions would be overkill. 
If you *do* add a dependency, you'd typically be better off with something more comprehensive and tailored to the particular problem domain you're dealing with, like passlib or cryptography or itsdangerous. Cheers, Nick. P.S. Having the secrets module available on PyPI wouldn't *hurt*, I just don't think it would help much. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ericsnowcurrently at gmail.com Thu Jun 16 16:24:04 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 16 Jun 2016 14:24:04 -0600 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Thu, Jun 16, 2016 at 12:56 PM, Martin Teichmann wrote: > I am looking forward to a lot of comments on this! I'd be glad to give feedback on this, probably later today or tomorrow. In particular, I'd like to help resolve the intersection with PEP 520. :) -eric From lkb.teichmann at gmail.com Thu Jun 16 17:17:14 2016 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Thu, 16 Jun 2016 23:17:14 +0200 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: Hi Eric, hi List, > I'd be glad to give feedback on this, probably later today or > tomorrow. In particular, I'd like to help resolve the intersection > with PEP 520. :) Thanks in advance! Let me already elaborate on the differences, so that others can follow: You chose the name "__definition_order__", I chose "__attribute_order__", I am fine with either, what are other people's opinions? The bigger difference is actually the path to inclusion into Python: my idea is to first make it a standard library feature, with the later option to put it into the C core, while you want to put the feature directly into the C core. Again I'm fine with either, as long as the feature is eventually in. As a side note, you propose to use OrderedDict as the class definition namespace, and this is exactly how I implemented it. 
Nonetheless, I would like to keep this fact as an implementation detail, such that other implementations of Python (PyPy comes to mind) or even CPython at a later time may switch to a different way to implement this feature. I am thinking especially about the option to determine the -_order__ already at compile time. Sure, this would mean that someone could trick us by dynamically changing the order of attribute definition, but I would document that as an abuse of the functionality with undocumented outcome. Greetings Martin From ncoghlan at gmail.com Thu Jun 16 17:36:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jun 2016 14:36:36 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On 16 June 2016 at 14:17, Martin Teichmann wrote: > As a side note, you propose to use OrderedDict as the class definition > namespace, and this is exactly how I implemented it. Nonetheless, I > would like to keep this fact as an implementation detail, such that > other implementations of Python (PyPy comes to mind) or even CPython > at a later time may switch to a different way to implement this > feature. I am thinking especially about the option to determine the > -_order__ already at compile time. Sure, this would mean that someone > could trick us by dynamically changing the order of attribute > definition, but I would document that as an abuse of the functionality > with undocumented outcome. I don't think that's a side note, I think it's an important point (and relates to one of Nikita's questions as well): we have the option of carving out certain aspects of PEP 520 as CPython implementation details. 
In particular, the language level guarantee can be that "class statements set __definition_order__ by default, but may not do so when using a metaclass that returns a custom namespace from __prepare__", with the implementation detail that CPython does that by using collections.OrderedDict for the class namespace by default. An implementation like PyPy, with an inherently ordered standard dict implementation, can just rely on that rather than being obliged to switch to their full collections.OrderedDict type. However, I don't think we should leave the compile-time vs runtime definition order question as an implementation detail - I think we should be explicit that the definition order attribute captures the runtime definition order, with conditionals, loops and reassignment being handled accordingly. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ericsnowcurrently at gmail.com Thu Jun 16 17:57:16 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 16 Jun 2016 15:57:16 -0600 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Thu, Jun 16, 2016 at 3:36 PM, Nick Coghlan wrote: > I don't think that's a side note, I think it's an important point (and > relates to one of Nikita's questions as well): we have the option of > carving out certain aspects of PEP 520 as CPython implementation > details. > > In particular, the language level guarantee can be that "class > statements set __definition_order__ by default, but may not do so when > using a metaclass that returns a custom namespace from __prepare__", > with the implementation detail that CPython does that by using > collections.OrderedDict for the class namespace by default. > > An implementation like PyPy, with an inherently ordered standard dict > implementation, can just rely on that rather than being obliged to > switch to their full collections.OrderedDict type. Excellent point from you both.
:) I'll rework PEP 520 accordingly (to focus on __definition_order__). At that point I expect the definition order part of PEP 487 could be dropped (as redundant). > > However, I don't think we should leave the compile-time vs runtime > definition order question as an implementation detail - I think we > should be explicit that the definition order attribute captures the > runtime definition order, with conditionals, loops and reassignment > being handled accordingly. Yeah, I'll make that clear. We can discuss these changes in a separate thread once I've updated PEP 520. So let's focus back on the rest of PEP 487! :) -eric From nikita at nemkin.ru Thu Jun 16 18:24:15 2016 From: nikita at nemkin.ru (Nikita Nemkin) Date: Fri, 17 Jun 2016 03:24:15 +0500 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Fri, Jun 17, 2016 at 2:36 AM, Nick Coghlan wrote: > On 16 June 2016 at 14:17, Martin Teichmann wrote: > An implementation like PyPy, with an inherently ordered standard dict > implementation, can just rely on that rather than being obliged to > switch to their full collections.OrderedDict type. I didn't know that PyPy has actually implemented packed ordered dicts! https://morepypy.blogspot.ru/2015/01/faster-more-memory-efficient-and-more.html https://mail.python.org/pipermail/python-dev/2012-December/123028.html This old idea by Raymond Hettinger is vastly superior to __definition_order__ duct tape (now that PyPy has validated it). It also gives kwarg order for free, which is important in many metaprogramming scenarios. Not to mention memory usage reduction and dict operations speedup... From mertz at gnosis.cx Thu Jun 16 18:33:42 2016 From: mertz at gnosis.cx (David Mertz) Date: Thu, 16 Jun 2016 15:33:42 -0700 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: References: <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> <20160612061142.GA1986@thunk.org> <147ACCD6-17A5-42DE-A3C6-15758F45D289@lukasa.co.uk> <20160612134315.GC1986@thunk.org> <1A3E7FD6-4BF5-4097-BEC3-77EAB6956487@lukasa.co.uk> <20160612232803.GB17328@thunk.org> <20160613122654.GE17328@thunk.org> <20160616094508.3acf1de7.barry@wooz.org> <20160616230940.570b8553.barry@wooz.org> Message-ID: Yes 'secrets' is one-liners. However, it might grow a few more lines around the blocking in getrandom() on Linux. But still, not more than a few. But the reason it should be on PyPI is so that programs can have a uniform API across various Python versions. There's no real reason that someone stuck on Python 2.7 or 3.3 shouldn't be able to include the future-style:

    import secrets

    Answer = secrets.token_bytes(42)

On Jun 16, 2016 4:53 PM, "Nick Coghlan" wrote: > On 16 June 2016 at 13:09, Barry Warsaw wrote: > > On Jun 16, 2016, at 01:01 PM, David Mertz wrote: > > > >>It seems to me that backporting 'secrets' and putting it on Warehouse > would > >>be a lot more productive than complaining about 3.5.2 reverting to > (almost) > >>the behavior of 2.3-3.4. > > > > Very wise suggestion indeed. We have all kinds of stdlib modules > backported > > and released as third party packages. Why not secrets too? If such > were on > > PyPI, I'd happily package it up for the Debian ecosystem. Problem solved > > . > > The secrets module is just a collection of one liners pulling together > other stdlib components that have been around for years - the main > problem it aims to address is one of discoverability (rather than one > of code complexity), while also eliminating the "simulation is in the > standard library, secrecy requires a third party module" discrepancy > in the long term.
> > Once you're aware the problem exists, the easiest way to use it in a > version independent manner is to just copy the relevant snippet into > your own project's utility library - adding an entire new dependency > to your project just for those utility functions would be overkill. > > If you *do* add a dependency, you'd typically be better off with > something more comprehensive and tailored to the particular problem > domain you're dealing with, like passlib or cryptography or > itsdangerous. > > Cheers, > Nick. > > P.S. Having the secrets module available on PyPI wouldn't *hurt*, I > just don't think it would help much. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/mertz%40gnosis.cx > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jun 16 20:27:51 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Jun 2016 17:27:51 -0700 Subject: [Python-Dev] Discussion overload In-Reply-To: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> Message-ID: Hi Kevin, I often feel the same way. Are you using GMail? It combines related messages in threads and lets you mute threads. I often use this feature so I can manage my inbox. (I presume other mailers have the same features, but I don't know if all of them do.) There are also many people who read the list on a website, e.g. gmane. (Though I think that sometimes the delays incurred there add to the noise -- e.g. when a decision is reached on the list sometimes people keep responding to earlier threads.) 
--Guido (don't get me started on top-posting :-) On Thu, Jun 16, 2016 at 12:22 PM, Kevin Ollivier < kevin-lists at theolliviers.com> wrote: > Hi all, > > Recent joiner here, I signed up after PyCon made me want to get more > involved and have been lurking. I woke up this morning again to about 30 > new messages in my inbox, almost all of which revolve around the os.urandom > blocking discussion. There are just about hourly new posts showing up on > this topic. > > > > > There is such a thing as too much of a good thing. Discussion of issues is > certainly good, but so far since joining this list I am seeing too much > discussion happening too fast, and as someone who has been involved in open > source for approaching two decades now, frankly, that is not really a good > sign. The discussions are somewhat overlapping as so many people write back > so quickly, there are multiple sub-discussions happening at once, and > really at this point I'm not sure how much new each message is really > adding, if anything at all. It seems to me the main solutions to this > problem have all been identified, as have the tradeoffs of each. The > discussion is now mostly at a point where people are just repeatedly > debating (or promoting) the merits of their preferred solution and > tradeoff. It is even spawning more abstract sub-discsussions about things > like project compatibility policies. This discussion has really taken on a > life of its own. > > For someone like me, a new joiner, seeing this makes me feel like wanting > to simply unsubscribe. I've been on mailing lists where issues get debated > endlessly, and at some point what inevitably happens is that the project > starts to lose members who feel that even just trying to follow the > discussions is eating up too much of their time. It really can suck the > energy right out of a community. I don't want to see that happen to Python. 
> I had a blast at PyCon, my first, and I really came away feeling more than > ever that the community you have here is really special. The one problem I > felt concerned about though, was that the core dev community risked a sense > of paralysis caused by having too many cooks in the kitchen and too much > worry about the potential unseen ramifications of changing things. That > creates a sort of paralysis and difficulty achieving consensus on anything > that, eventually, causes projects to slowly decline and be disrupted by a > more agile alternative. > > Please consider taking a step back from this issue. Take a deep breath, > and consider responding more slowly and letting people's points stew in > your head for a day or two first. (Including this one pls. :) Python will > not implode if you don't get that email out right away. If I understand > what I've read of this torrent of messages correctly, we don't even know if > there's a single real world use case where a user of os.urandom is hitting > the same problem CPython did, so we don't even know if the blocking at > startup issue is actually even happening in any real world Python code out > there. It's clearly far from a rampant problem, in any case. Stop and think > about that for a second. This is, in practice, potentially a complete > non-issue. Fixing it in any number of ways may potentially change things > for no one at all. You could even introduce a real problem while trying to > fix a hypothetical one. There are more than enough real problems to deal > with, so why push hypothetical problems to the top of your priority list? > > It's too easy to get caught up in the abstract nature of problems and to > lose sight of the real people and code behind them, or sometimes, the lack > thereof. Be practical, be pragmatic.
Before you hit that reply button, > think - in a practical sense, of all the things I could be doing right now, > is this discussion the place where my involvement could generate the > greatest positive impact for the project? Is this the biggest and most > substantial problem the project should be focusing on right now? Projects > and developers who know how to manage focus go on to achieve the greatest > things, in my experience. > > Having been critical, I will end with a compliment. :) It is nice to see > that with only a couple small exceptions, this discussion has remained very > civil and respectful, which should be expected, but I know from experience > that far too often these discussions start to take a nasty tone as people > get frustrated. This is one of the things I really do love about the Python > community, and it's one reason I want to see both the product and community > grow and succeed even more. That, in fact, is why I'm choosing to write > this message first rather than simply unsubscribe. > > Kevin > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin-lists at theolliviers.com Thu Jun 16 22:00:59 2016 From: kevin-lists at theolliviers.com (Kevin Ollivier) Date: Thu, 16 Jun 2016 19:00:59 -0700 Subject: [Python-Dev] Discussion overload In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> Message-ID: <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Hi Guido, From: on behalf of Guido van Rossum Reply-To: Date: Thursday, June 16, 2016 at 5:27 PM To: Kevin Ollivier Cc: Python Dev Subject: Re: [Python-Dev] Discussion overload Hi Kevin, I often feel the same way. Are you using GMail? 
It combines related messages in threads and lets you mute threads. I often use this feature so I can manage my inbox. (I presume other mailers have the same features, but I don't know if all of them do.) There are also many people who read the list on a website, e.g. gmane. (Though I think that sometimes the delays incurred there add to the noise -- e.g. when a decision is reached on the list sometimes people keep responding to earlier threads.) I fear I did quite a poor job of making my point. :( I've been on open source mailing lists since the late 90s, so I've learned strategies for dealing with mailing list overload. I've got my mail folders, my mail rules, etc. Having been on many mailing lists over the years, I've seen many productive discussions and many unproductive ones, and over time you start to see patterns. You also see what happens to those communities over time. On the mailing lists where discussions become these unwieldy floods with 30-40 posts a day on one topic, over time what I have seen is that that rapid fire of posts generally does not lead to better decisions being made. In fact, usually it is the opposite. Faster discussions are not usually better discussions, and the chances of that gem of knowledge getting lost in the flood of posts is much greater. The more long-term consequence is that people start hesitating to bring up ideas, sometimes even very good ones, simply because even the discussion of them gets to be so draining that it's better to just leave things be. As an example, I do have work to do :) and I know if I was the one who had wanted to propose a fix for os.urandom or what have you, waking up to 30 messages I need to read to get caught up each day would be a pretty disheartening prospect, and possibly not even possible with my work obligations. It raises the bar to participating, in a way. 
Perhaps some of this is inherent in mailing list discussions, but really in my experience, just a conscious decision on the part of contributors to slow down the discussion and "think more, write less", can do quite a lot to ensure the discussion is in fact a better one. I probably should have taken more time to write my initial message, in fact, in order to better coalesce my points into something more succinct and clearly understandable. I somehow managed to convince people I need to learn mail management strategies. :) Anyway, that is just my $0.02 cents on the matter. With inflation it accounts for less every day, so make of it what you will. :P Thanks, Kevin --Guido (don't get me started on top-posting :-) On Thu, Jun 16, 2016 at 12:22 PM, Kevin Ollivier wrote: [...] -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Thu Jun 16 23:25:51 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Jun 2016 20:25:51 -0700 Subject: [Python-Dev] Discussion overload In-Reply-To: <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: More likely your post was too long... :-( On Thu, Jun 16, 2016 at 7:00 PM, Kevin Ollivier < kevin-lists at theolliviers.com> wrote: > [...] -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL:
From kevin-lists at theolliviers.com Fri Jun 17 00:10:56 2016 From: kevin-lists at theolliviers.com (Kevin Ollivier) Date: Thu, 16 Jun 2016 21:10:56 -0700 Subject: [Python-Dev] Discussion overload In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: <76676028-A883-4B08-869E-E68FB2CB0D21@theolliviers.com> Yes, it most certainly was. :( Sorry about that! From: on behalf of Guido van Rossum Reply-To: Date: Thursday, June 16, 2016 at 8:25 PM To: Kevin Ollivier Cc: Python Dev Subject: Re: [Python-Dev] Discussion overload More likely your post was too long... :-( On Thu, Jun 16, 2016 at 7:00 PM, Kevin Ollivier wrote: [...] -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL:
From songofacandy at gmail.com Fri Jun 17 05:15:53 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 17 Jun 2016 18:15:53 +0900 Subject: [Python-Dev] Compact dict implementations (was: PEP 468 Message-ID: Hi, developers. I'm trying to implement compact dict.
https://github.com/methane/cpython/pull/1

Current status is passing most of the tests. Some tests are failing because I haven't updated `sizeof` yet, pending the layout fix. And I haven't dropped OrderedDict's linked list yet. Before finishing the implementation, I want to see comments and tests from core developers. Please come to the core-mentorship ML or the pull request and try it if you are interested.

Regards,
-- INADA Naoki

From status at bugs.python.org Fri Jun 17 12:08:38 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 17 Jun 2016 18:08:38 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160617160838.8966856A1A@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (2016-06-10 - 2016-06-17)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5544 ( -9)
  closed 33557 (+66)
  total  39101 (+57)
Open issues with patches: 2417

Issues opened (41)
==================

#10839: email module should not allow some header field repetitions
http://bugs.python.org/issue10839 reopened by rhettinger

#26171: heap overflow in zipimporter module
http://bugs.python.org/issue26171 reopened by ned.deily

#27288: secrets should use getrandom() on Linux
http://bugs.python.org/issue27288 opened by dstufft

#27292: Warn users that os.urandom() can return insecure values
http://bugs.python.org/issue27292 opened by christian.heimes

#27293: Summarize issues related to urandom, getrandom etc in secrets
http://bugs.python.org/issue27293 opened by steven.daprano

#27294: Better repr for Tkinter event objects
http://bugs.python.org/issue27294 opened by serhiy.storchaka

#27297: Add support for /dev/random to "secrets"
http://bugs.python.org/issue27297 opened by larry

#27298: redundant iteration over digits in _PyLong_AsUnsignedLongMask
http://bugs.python.org/issue27298 opened by Oren Milman

#27299: urllib does not splitport while putrequest realhost to HTTP he
http://bugs.python.org/issue27299 opened by gr zhang #27300: tempfile.TemporaryFile(): missing errors=... argument http://bugs.python.org/issue27300 opened by mmarkk #27302: csv.Sniffer guesses wrong when unquoted fields contain quotes http://bugs.python.org/issue27302 opened by Redoute #27303: [argparse] Unify options in help output http://bugs.python.org/issue27303 opened by memeplex #27304: Create "Source Code" links in module sections, where relevant http://bugs.python.org/issue27304 opened by Yoni Lavi #27305: Crash with "pip list --outdated" on Windows 10 with Python 2.7 http://bugs.python.org/issue27305 opened by James.Paget #27307: string.Formatter does not support key/attribute access on unnu http://bugs.python.org/issue27307 opened by tbeadle #27309: Visual Styles support http://bugs.python.org/issue27309 opened by [HYBRID BEING] #27312: test_setupapp (idlelib.idle_test.test_macosx.SetupTest) fails http://bugs.python.org/issue27312 opened by ned.deily #27313: test case failures in test_widgets.ComboboxTest.of test_ttk_gu http://bugs.python.org/issue27313 opened by ned.deily #27314: Cannot install 3.5.2 with 3.6.0a1 installed http://bugs.python.org/issue27314 opened by steve.dower #27315: pydoc: prefer the pager command in favor of the specifc less c http://bugs.python.org/issue27315 opened by doko #27317: Handling data_files: too much is removed in uninstall http://bugs.python.org/issue27317 opened by sylvain.corlay #27318: Add support for symlinks to zipfile http://bugs.python.org/issue27318 opened by ldoktor #27319: Multiple item arguments for selection operations http://bugs.python.org/issue27319 opened by serhiy.storchaka #27320: ./setup.py --help-commands should sort extra commands http://bugs.python.org/issue27320 opened by Antony.Lee #27321: Email parser creates a message object that can't be flattened http://bugs.python.org/issue27321 opened by msapiro #27322: test_compile_path fails when python has been installed http://bugs.python.org/issue27322 
opened by xdegaye #27323: ncurses putwin() fails in test_module_funcs http://bugs.python.org/issue27323 opened by xdegaye #27326: SIGSEV in test_window_funcs of test_curses http://bugs.python.org/issue27326 opened by xdegaye #27328: Documentation corrections for email defects http://bugs.python.org/issue27328 opened by martin.panter #27329: Document behavior when CDLL is called with None as an argumen http://bugs.python.org/issue27329 opened by Jeffrey Esquivel Sibaja #27331: Add a policy argument to email.mime.MIMEBase http://bugs.python.org/issue27331 opened by berker.peksag #27332: Clinic: first parameter for module-level functions should be P http://bugs.python.org/issue27332 opened by encukou #27333: validate_step in rangeobject.c, incorrect code logic but right http://bugs.python.org/issue27333 opened by xiang.zhang #27334: pysqlite3 context manager not performing rollback when a datab http://bugs.python.org/issue27334 opened by lciti #27335: Clarify that writing to locals() inside a class body is suppor http://bugs.python.org/issue27335 opened by steven.daprano #27337: 3.6.0a2 tarball has weird paths http://bugs.python.org/issue27337 opened by petere #27340: bytes-like objects with socket.sendall(), SSL, and http.client http://bugs.python.org/issue27340 opened by martin.panter #27341: mock.patch decorator fails silently on generators http://bugs.python.org/issue27341 opened by shoshber #27342: Clean up some Py_XDECREFs in rangeobject.c and bltinmodule.c http://bugs.python.org/issue27342 opened by xiang.zhang #27343: Incorrect error message for conflicting initializers of ctypes http://bugs.python.org/issue27343 opened by serhiy.storchaka #27344: zipfile *does* support utf-8 filenames http://bugs.python.org/issue27344 opened by dholth Most recent 15 issues with no replies (15) ========================================== #27344: zipfile *does* support utf-8 filenames http://bugs.python.org/issue27344 #27343: Incorrect error message for conflicting initializers 
of ctypes http://bugs.python.org/issue27343 #27341: mock.patch decorator fails silently on generators http://bugs.python.org/issue27341 #27340: bytes-like objects with socket.sendall(), SSL, and http.client http://bugs.python.org/issue27340 #27332: Clinic: first parameter for module-level functions should be P http://bugs.python.org/issue27332 #27331: Add a policy argument to email.mime.MIMEBase http://bugs.python.org/issue27331 #27329: Document behavior when CDLL is called with None as an argumen http://bugs.python.org/issue27329 #27328: Documentation corrections for email defects http://bugs.python.org/issue27328 #27326: SIGSEV in test_window_funcs of test_curses http://bugs.python.org/issue27326 #27323: ncurses putwin() fails in test_module_funcs http://bugs.python.org/issue27323 #27322: test_compile_path fails when python has been installed http://bugs.python.org/issue27322 #27317: Handling data_files: too much is removed in uninstall http://bugs.python.org/issue27317 #27309: Visual Styles support http://bugs.python.org/issue27309 #27307: string.Formatter does not support key/attribute access on unnu http://bugs.python.org/issue27307 #27304: Create "Source Code" links in module sections, where relevant http://bugs.python.org/issue27304 Most recent 15 issues waiting for review (15) ============================================= #27343: Incorrect error message for conflicting initializers of ctypes http://bugs.python.org/issue27343 #27342: Clean up some Py_XDECREFs in rangeobject.c and bltinmodule.c http://bugs.python.org/issue27342 #27334: pysqlite3 context manager not performing rollback when a datab http://bugs.python.org/issue27334 #27333: validate_step in rangeobject.c, incorrect code logic but right http://bugs.python.org/issue27333 #27332: Clinic: first parameter for module-level functions should be P http://bugs.python.org/issue27332 #27331: Add a policy argument to email.mime.MIMEBase http://bugs.python.org/issue27331 #27328: Documentation corrections for 
email defects
     http://bugs.python.org/issue27328

#27321: Email parser creates a message object that can't be flattened
     http://bugs.python.org/issue27321

#27320: ./setup.py --help-commands should sort extra commands
     http://bugs.python.org/issue27320

#27319: Multiple item arguments for selection operations
     http://bugs.python.org/issue27319

#27318: Add support for symlinks to zipfile
     http://bugs.python.org/issue27318

#27315: pydoc: prefer the pager command in favor of the specifc less c
     http://bugs.python.org/issue27315

#27307: string.Formatter does not support key/attribute access on unnu
     http://bugs.python.org/issue27307

#27304: Create "Source Code" links in module sections, where relevant
     http://bugs.python.org/issue27304

#27298: redundant iteration over digits in _PyLong_AsUnsignedLongMask
     http://bugs.python.org/issue27298

Top 10 most discussed issues (10)
=================================

#27305: Crash with "pip list --outdated" on Windows 10 with Python 2.7
     http://bugs.python.org/issue27305  18 msgs

#27294: Better repr for Tkinter event objects
     http://bugs.python.org/issue27294  13 msgs

#10839: email module should not allow some header field repetitions
     http://bugs.python.org/issue10839  12 msgs

#25782: CPython hangs on error __context__ set to the error itself
     http://bugs.python.org/issue25782  12 msgs

#27186: add os.fspath()
     http://bugs.python.org/issue27186  11 msgs

#27292: Warn users that os.urandom() can return insecure values
     http://bugs.python.org/issue27292  11 msgs

#27263: Tkinter sets the HOME environment variable, breaking scripts
     http://bugs.python.org/issue27263  9 msgs

#25455: Some repr implementations don't check for self-referential str
     http://bugs.python.org/issue25455  8 msgs

#27025: More human readable generated widget names
     http://bugs.python.org/issue27025  8 msgs

#27288: secrets should use getrandom() on Linux
     http://bugs.python.org/issue27288  8 msgs

Issues closed (62)
==================

#5124: IDLE - pasting text doesn't delete selection
     http://bugs.python.org/issue5124  closed by terry.reedy

#8637: Add MANPAGER envvar to specify pager for pydoc
     http://bugs.python.org/issue8637  closed by doko

#14209: pkgutil.iter_zipimport_modules ignores the prefix parameter fo
     http://bugs.python.org/issue14209  closed by lukasz.langa

#15468: Edit docs to hide hashlib.md5()
     http://bugs.python.org/issue15468  closed by gregory.p.smith

#16182: readline: Wrong tab completion scope indices in Unicode termin
     http://bugs.python.org/issue16182  closed by martin.panter

#16234: Implement correct block_size and tests for HMAC-SHA3
     http://bugs.python.org/issue16234  closed by christian.heimes

#16864: sqlite3.Cursor.lastrowid isn't populated when executing a SQL
     http://bugs.python.org/issue16864  closed by berker.peksag

#17500: move PC/icons/source.xar to http://www.python.org/community/lo
     http://bugs.python.org/issue17500  closed by doko

#19328: Improve PBKDF2 documentation
     http://bugs.python.org/issue19328  closed by christian.heimes

#20508: IndexError from ipaddress._BaseNetwork.__getitem__ has no mess
     http://bugs.python.org/issue20508  closed by berker.peksag

#20699: Document that binary IO classes work with bytes-likes objects
     http://bugs.python.org/issue20699  closed by martin.panter

#20900: distutils register command should print text, not bytes repr
     http://bugs.python.org/issue20900  closed by berker.peksag

#21386: ipaddress.IPv4Address.is_global not implemented
     http://bugs.python.org/issue21386  closed by berker.peksag

#22558: Missing doc links to source code for Python-coded modules.
     http://bugs.python.org/issue22558  closed by terry.reedy

#22970: asyncio: Cancelling wait() after notification leaves Condition
     http://bugs.python.org/issue22970  closed by yselivanov

#24086: Configparser interpolation is unexpected
     http://bugs.python.org/issue24086  closed by lukasz.langa

#24136: document PEP 448: unpacking generalization
     http://bugs.python.org/issue24136  closed by martin.panter

#24750: IDLE: Cosmetic improvements for main window
     http://bugs.python.org/issue24750  closed by terry.reedy

#24887: Sqlite3 has no option to provide open flags
     http://bugs.python.org/issue24887  closed by berker.peksag

#25529: Provide access to the validated certificate chain in ssl modul
     http://bugs.python.org/issue25529  closed by berker.peksag

#25724: SSLv3 test failure on Ubuntu 16.04 LTS
     http://bugs.python.org/issue25724  closed by martin.panter

#26282: Add support for partial keyword arguments in extension functio
     http://bugs.python.org/issue26282  closed by serhiy.storchaka

#26386: tkinter - Treeview - .selection_add and selection_toggle
     http://bugs.python.org/issue26386  closed by serhiy.storchaka

#26556: Update expat to 2.2.1
     http://bugs.python.org/issue26556  closed by python-dev

#26862: android: SYS_getdents64 does not need to be defined on android
     http://bugs.python.org/issue26862  closed by xdegaye

#27029: Remove support of deprecated mode 'U' in zipfile
     http://bugs.python.org/issue27029  closed by serhiy.storchaka

#27030: Remove deprecated re features
     http://bugs.python.org/issue27030  closed by serhiy.storchaka

#27095: Simplify MAKE_FUNCTION
     http://bugs.python.org/issue27095  closed by serhiy.storchaka

#27122: Hang with contextlib.ExitStack and subprocess.Popen (regressio
     http://bugs.python.org/issue27122  closed by gregory.p.smith

#27140: Opcode for creating dict with constant keys
     http://bugs.python.org/issue27140  closed by serhiy.storchaka

#27188: sqlite3 execute* methods return value not documented
     http://bugs.python.org/issue27188  closed by berker.peksag

#27190:
Check sqlite3_version before allowing check_same_thread = Fals
     http://bugs.python.org/issue27190  closed by berker.peksag

#27194: Tarfile superfluous truncate calls slows extraction.
     http://bugs.python.org/issue27194  closed by lukasz.langa

#27221: multiprocessing documentation is outdated regarding method pic
     http://bugs.python.org/issue27221  closed by berker.peksag

#27223: _read_ready and _write_ready should respect _conn_lost
     http://bugs.python.org/issue27223  closed by yselivanov

#27227: argparse fails to parse [] when using choices and nargs='*'
     http://bugs.python.org/issue27227  closed by berker.peksag

#27233: Missing documentation for PyOS_FSPath
     http://bugs.python.org/issue27233  closed by Jelle Zijlstra

#27238: Bare except: usages in turtle.py
     http://bugs.python.org/issue27238  closed by serhiy.storchaka

#27245: IDLE: Fix deletion of custom themes and key bindings
     http://bugs.python.org/issue27245  closed by terry.reedy

#27262: IDLE: move Aqua context menu code to maxosx
     http://bugs.python.org/issue27262  closed by terry.reedy

#27270: 'parentheses-equality' warnings when building with clang and c
     http://bugs.python.org/issue27270  closed by xdegaye

#27272: random.Random should not read 2500 bytes from urandom
     http://bugs.python.org/issue27272  closed by rhettinger

#27278: py_getrandom() uses an int for syscall() result
     http://bugs.python.org/issue27278  closed by haypo

#27286: str object got multiple values for keyword argument
     http://bugs.python.org/issue27286  closed by serhiy.storchaka

#27289: test_ftp_timeout fails with EOFError
     http://bugs.python.org/issue27289  closed by berker.peksag

#27290: Turn heaps library into a more OOP data structure?
     http://bugs.python.org/issue27290  closed by rhettinger

#27291: two heap corruption issues when running modified pyc code.
     http://bugs.python.org/issue27291  closed by gregory.p.smith

#27295: heaps library does not have support for max heap
     http://bugs.python.org/issue27295  closed by rhettinger

#27296: Urllib/Urlopen IncompleteRead with HTTP header with new line c
     http://bugs.python.org/issue27296  closed by martin.panter

#27301: Incorrect return codes in compile.c
     http://bugs.python.org/issue27301  closed by serhiy.storchaka

#27306: Grammatical Error in Documentation - Tarfile page
     http://bugs.python.org/issue27306  closed by berker.peksag

#27308: Inconsistency in cgi.FieldStorage() causes unicode/byte TypeEr
     http://bugs.python.org/issue27308  closed by berker.peksag

#27310: 3.6.0a2 IDLE.app on OS X fails to launch, use command line idl
     http://bugs.python.org/issue27310  closed by ned.deily

#27311: Incorrect documentation for zipfile.writestr()
     http://bugs.python.org/issue27311  closed by martin.panter

#27316: [PDB] NameError in list comprehension in PDB
     http://bugs.python.org/issue27316  closed by SilentGhost

#27324: Error when building Python extension
     http://bugs.python.org/issue27324  closed by zach.ware

#27325: random failure of test_builtin
     http://bugs.python.org/issue27325  closed by berker.peksag

#27327: re documentation: typo "escapes consist of"
     http://bugs.python.org/issue27327  closed by ned.deily

#27330: Possible leaks in ctypes
     http://bugs.python.org/issue27330  closed by serhiy.storchaka

#27336: --without-threads build fails due to undeclared _PyGILState_ch
     http://bugs.python.org/issue27336  closed by berker.peksag

#27338: python 2.7 platform.system reports wrong on Mac OS X El Capita
     http://bugs.python.org/issue27338  closed by Audric D'Hoest (Dr.
Pariolo)

#27339: Security Issue: Typosquatting
     http://bugs.python.org/issue27339  closed by haypo

From ncoghlan at gmail.com Fri Jun 17 21:12:43 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 17 Jun 2016 18:12:43 -0700 Subject: [Python-Dev] Discussion overload In-Reply-To: <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: On 16 June 2016 at 19:00, Kevin Ollivier wrote: > Hi Guido, > > From: on behalf of Guido van Rossum > > Reply-To: > Date: Thursday, June 16, 2016 at 5:27 PM > To: Kevin Ollivier > Cc: Python Dev > Subject: Re: [Python-Dev] Discussion overload > > Hi Kevin, > > I often feel the same way. Are you using GMail? It combines related messages > in threads and lets you mute threads. I often use this feature so I can > manage my inbox. (I presume other mailers have the same features, but I > don't know if all of them do.) There are also many people who read the list > on a website, e.g. gmane. (Though I think that sometimes the delays incurred > there add to the noise -- e.g. when a decision is reached on the list > sometimes people keep responding to earlier threads.) > > > I fear I did quite a poor job of making my point. :( I've been on open > source mailing lists since the late 90s, so I've learned strategies for > dealing with mailing list overload. I've got my mail folders, my mail rules, > etc. Having been on many mailing lists over the years, I've seen many > productive discussions and many unproductive ones, and over time you start > to see patterns. You also see what happens to those communities over time. This is one of the major reasons we have the option of escalating things to the PEP process (and that's currently in train for os.urandom), as well as the SIGs for when folks really need to dig into topics that risk incurring a relatively low signal-to-noise ratio on python-dev.
It's also why python-ideas was turned into a separate list, since folks without the time for more speculative discussions and brainstorming can safely ignore it, while remaining confident that any ideas considered interesting enough for further review will be brought to python-dev's attention. But yes, one of the more significant design errors I've made with the contextlib API was due to just such a draining pile-on by folks that weren't happy the original name wasn't a 100% accurate description of the underlying mechanics (even though it was an accurate description of the intended use case), and "people yelling at you on project communication channels without doing adequate research first" is the number one reason we see otherwise happily engaged core developers decide to find something else to do with their time. The challenge and art in community management in that context is balancing telling both old and new list participants "It's OK to ask 'Why is this so?', as sometimes the answer is that there isn't a good reason and we may want to change it" and "Learn to be a good peer manager, and avoid behaving like a micro-managing autocrat that chases away experienced contributors". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Jun 17 21:32:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 17 Jun 2016 18:32:36 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On 7 June 2016 at 17:50, Eric Snow wrote: > Why is __definition_order__ even necessary? > ------------------------------------------- > > Since the definition order is not preserved in ``__dict__``, it would be > lost once class definition execution completes. Classes *could* > explicitly set the attribute as the last thing in the body. However, > then independent decorators could only make use of classes that had done > so. 
Instead, ``__definition_order__`` preserves this one bit of info > from the class body so that it is universally available. The discussion in the PEP 487 thread made me realise that I'd like to see a discussion in PEP 520 regarding whether or not to define __definition_order__ for builtin types initialised via PyType_Ready or created via PyType_FromSpec in addition to defining it for types created via the class statement or types.new_class(). For static types, PyType_Ready could potentially set it based on tp_members, tp_methods & tp_getset (see https://docs.python.org/3/c-api/typeobj.html ) Similarly, PyType_FromSpec could potentially set it based on the contents of Py_tp_members, Py_tp_methods and Py_tp_getset slot definitions Having definition order support in both types.new_class() and builtin types would also make it clear why we can't rely purely on the compiler to provide the necessary ordering information - in both of those cases, the Python compiler isn't directly involved in the type creation process. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Fri Jun 17 22:58:48 2016 From: brett at python.org (Brett Cannon) Date: Sat, 18 Jun 2016 02:58:48 +0000 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: I have taken PEP 523 for this: https://github.com/python/peps/blob/master/pep-0523.txt . I'm waiting until Guido gets back from vacation, at which point I'll ask for a pronouncement or assignment of a BDFL delegate. On Fri, 3 Jun 2016 at 14:37 Brett Cannon wrote: > For those of you who follow python-ideas or were at the PyCon US 2016 > language summit, you have already seen/heard about this PEP. For those of > you who don't fall into either of those categories, this PEP proposed a > frame evaluation API for CPython. The motivating example of this work has > been Pyjion, the experimental CPython JIT Dino Viehland and I have been > working on in our spare time at Microsoft. 
The API also works for > debugging, though, as already demonstrated by Google having added a very > similar API internally for debugging purposes. > > The PEP is pasted in below and also available in rendered form at > https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I will assign > myself a PEP # once discussion is finished as it's easier to work in git > for this for the rich rendering of the in-progress PEP). > > I should mention that the differences from python-ideas and the language > summit in the PEP are the listed support from Google's use of a very > similar API as well as clarifying the co_extra field on code objects > doesn't change their immutability (at least from the view of the PEP). > > ---------- > PEP: NNN > Title: Adding a frame evaluation API to CPython > Version: $Revision$ > Last-Modified: $Date$ > Author: Brett Cannon , > Dino Viehland > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 16-May-2016 > Post-History: 16-May-2016 > 03-Jun-2016 > > > Abstract > ======== > > This PEP proposes to expand CPython's C API [#c-api]_ to allow for > the specification of a per-interpreter function pointer to handle the > evaluation of frames [#pyeval_evalframeex]_. This proposal also > suggests adding a new field to code objects [#pycodeobject]_ to store > arbitrary data for use by the frame evaluation function. > > > Rationale > ========= > > One place where flexibility has been lacking in Python is in the direct > execution of Python code. While CPython's C API [#c-api]_ allows for > constructing the data going into a frame object and then evaluating it > via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control over the > execution of Python code comes down to individual objects instead of a > holistic control of execution at the frame level.
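To make the Rationale concrete: because control today attaches to individual objects, interception has to wrap each callable explicitly, and any code path that was not wrapped escapes the hook; a per-interpreter frame evaluation function would sit beneath all of them. A toy illustration, not part of the PEP (the ``traced`` decorator is made up here):

```python
import functools

def traced(func):
    """Per-object interception: every callable must be wrapped explicitly."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@traced
def add(a, b):
    return a + b

add(1, 2)
add(3, 4)
print(add.calls)  # 2 -- counted only because ``add`` was explicitly wrapped
```

Any function left undecorated runs unobserved, which is exactly the gap a frame-level entry point closes.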
> > While wanting to have influence over frame evaluation may seem a bit > too low-level, it does open the possibility for things such as a > method-level JIT to be introduced into CPython without CPython itself > having to provide one. By allowing external C code to control frame > evaluation, a JIT can participate in the execution of Python code at > the key point where evaluation occurs. This then allows for a JIT to > conditionally recompile Python bytecode to machine code as desired > while still allowing for executing regular CPython bytecode when > running the JIT is not desired. This can be accomplished by allowing > interpreters to specify what function to call to evaluate a frame. And > by placing the API at the frame evaluation level it allows for a > complete view of the execution environment of the code for the JIT. > > This ability to specify a frame evaluation function also allows for > other use-cases beyond just opening CPython up to a JIT. For instance, > it would not be difficult to implement a tracing or profiling function > at the call level with this API. While CPython does provide the > ability to set a tracing or profiling function at the Python level, > this would be able to match the data collection of the profiler and > quite possibly be faster for tracing by simply skipping per-line > tracing support. > > It also opens up the possibility of debugging where the frame > evaluation function only performs special debugging work when it > detects it is about to execute a specific code object. In that > instance the bytecode could be theoretically rewritten in-place to > inject a breakpoint function call at the proper point for help in > debugging while not having to do a heavy-handed approach as > required by ``sys.settrace()``. > > To help facilitate these use-cases, we are also proposing the adding > of a "scratch space" on code objects via a new field. 
This will allow > per-code object data to be stored with the code object itself for easy > retrieval by the frame evaluation function as necessary. The field > itself will simply be a ``PyObject *`` type so that any data stored in > the field will participate in normal object memory management. > > > Proposal > ======== > > All proposed C API changes below will not be part of the stable ABI. > > > Expanding ``PyCodeObject`` > -------------------------- > > One field is to be added to the ``PyCodeObject`` struct > [#pycodeobject]_:: > > typedef struct { > ... > PyObject *co_extra; /* "Scratch space" for the code object. */ > } PyCodeObject; > > The ``co_extra`` will be ``NULL`` by default and will not be used by > CPython itself. Third-party code is free to use the field as desired. > Values stored in the field are expected to not be required in order > for the code object to function, allowing the loss of the data of the > field to be acceptable (this keeps the code object as immutable from > a functionality point-of-view; this is slightly contentious and so is > listed as an open issue in `Is co_extra needed?`_). The field will be > freed like all other fields on ``PyCodeObject`` during deallocation > using ``Py_XDECREF()``. > > It is not recommended that multiple users attempt to use the > ``co_extra`` simultaneously. While a dictionary could theoretically be > set to the field and various users could use a key specific to the > project, there is still the issue of key collisions as well as > performance degradation from using a dictionary lookup on every frame > evaluation. Users are expected to do a type check to make sure that > the field has not been previously set by someone else. > > > Expanding ``PyInterpreterState`` > -------------------------------- > > The entrypoint for the frame evaluation function is per-interpreter:: > > // Same type signature as PyEval_EvalFrameEx().
> typedef PyObject* (__stdcall *PyFrameEvalFunction)(PyFrameObject*, int); > > typedef struct { > ... > PyFrameEvalFunction eval_frame; > } PyInterpreterState; > > By default, the ``eval_frame`` field will be initialized to a function > pointer that represents what ``PyEval_EvalFrameEx()`` currently is > (called ``PyEval_EvalFrameDefault()``, discussed later in this PEP). > Third-party code may then set their own frame evaluation function > instead to control the execution of Python code. A pointer comparison > can be used to detect if the field is set to > ``PyEval_EvalFrameDefault()`` and thus has not been mutated yet. > > > Changes to ``Python/ceval.c`` > ----------------------------- > > ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it currently stands > will be renamed to ``PyEval_EvalFrameDefault()``. The new > ``PyEval_EvalFrameEx()`` will then become:: > > PyObject * > PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag) > { > PyThreadState *tstate = PyThreadState_GET(); > return tstate->interp->eval_frame(frame, throwflag); > } > > This allows third-party code to place themselves directly in the path > of Python code execution while being backwards-compatible with code > already using the pre-existing C API. > > > Updating ``python-gdb.py`` > -------------------------- > > The generated ``python-gdb.py`` file used for Python support in GDB > makes some hard-coded assumptions about ``PyEval_EvalFrameEx()``, e.g. > the names of local variables. It will need to be updated to work with > the proposed changes. > > > Performance impact > ================== > > As this PEP is proposing an API to add pluggability, performance > impact is considered only in the case where no third-party code has > made any changes. > > Several runs of pybench [#pybench]_ consistently showed no performance > cost from the API change alone. > > A run of the Python benchmark suite [#py-benchmarks]_ showed no > measurable cost in performance. 
> > In terms of memory impact, since there are typically not many CPython > interpreters executing in a single process that means the impact of > ``co_extra`` being added to ``PyCodeObject`` is the only worry. > According to [#code-object-count]_, a run of the Python test suite > results in about 72,395 code objects being created. On a 64-bit > CPU that would result in 579,160 bytes of extra memory being used if > all code objects were alive at once and had nothing set in their > ``co_extra`` fields. > > > Example Usage > ============= > > A JIT for CPython > ----------------- > > Pyjion > '''''' > > The Pyjion project [#pyjion]_ has used this proposed API to implement > a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each code > object has its ``co_extra`` field set to a ``PyjionJittedCode`` object > which stores four pieces of information: > > 1. Execution count > 2. A boolean representing whether a previous attempt to JIT failed > 3. A function pointer to a trampoline (which can be type tracing or not) > 4. A void pointer to any JIT-compiled machine code > > The frame evaluation function has (roughly) the following algorithm:: > > def eval_frame(frame, throw_flag): > pyjion_code = frame.code.co_extra > if not pyjion_code: > frame.code.co_extra = PyjionJittedCode() > elif not pyjion_code.jit_failed: > if not pyjion_code.jit_code: > return pyjion_code.eval(pyjion_code.jit_code, frame) > elif pyjion_code.exec_count > 20_000: > if jit_compile(frame): > return pyjion_code.eval(pyjion_code.jit_code, frame) > else: > pyjion_code.jit_failed = True > pyjion_code.exec_count += 1 > return PyEval_EvalFrameDefault(frame, throw_flag) > > The key point, though, is that all of this work and logic is separate > from CPython and yet with the proposed API changes it is able to > provide a JIT that is compliant with Python semantics (as of this > writing, performance is almost equivalent to CPython without the new > API). 
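The dispatch algorithm quoted above can be exercised as ordinary Python, with a dictionary standing in for the frame, a string standing in for the code object, and a plain callable standing in for JIT-compiled machine code. Everything below (the ``JittedCode`` class, the tiny threshold) is an illustrative stand-in rather than the PEP's actual API:

```python
JIT_THRESHOLD = 3  # the PEP's sketch uses 20_000; a tiny value keeps the demo short

class JittedCode:
    """Stand-in for the per-code-object data the PEP keeps in ``co_extra``."""
    def __init__(self):
        self.exec_count = 0
        self.jit_failed = False
        self.jit_code = None  # stand-in for a pointer to machine code

def default_eval(frame):
    # Stand-in for PyEval_EvalFrameDefault(): the ordinary interpreter path.
    return frame["func"](*frame["args"])

def jit_compile(frame):
    # Pretend to JIT-compile the code object; returns a callable or None.
    func = frame["func"]
    return lambda args: func(*args)

extra = {}  # code-object name -> JittedCode, simulating co_extra

def eval_frame(frame):
    state = extra.setdefault(frame["code"], JittedCode())
    if not state.jit_failed:
        if state.jit_code is not None:
            return state.jit_code(frame["args"])
        if state.exec_count >= JIT_THRESHOLD:
            compiled = jit_compile(frame)
            if compiled is None:
                state.jit_failed = True
            else:
                state.jit_code = compiled
                return state.jit_code(frame["args"])
    state.exec_count += 1
    return default_eval(frame)

def square(x):
    return x * x

frame = {"code": "square", "func": square, "args": (7,)}
results = [eval_frame(dict(frame)) for _ in range(5)]
print(results)  # [49, 49, 49, 49, 49]; the last two calls went through the "JIT"
```

The first three calls take the default path while the counter warms up; once the threshold is reached, the "compiled" callable is cached and reused, mirroring the Pyjion flow sketched above.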
This means there's nothing technically preventing others from > implementing their own JITs for CPython by utilizing the proposed API. > > > Other JITs > '''''''''' > > It should be mentioned that the Pyston team was consulted on an > earlier version of this PEP that was more JIT-specific and they were > not interested in utilizing the changes proposed because they want > control over memory layout and had no interest in directly supporting > CPython itself. An informal discussion with a developer on the PyPy > team led to a similar comment. > > Numba [#numba]_, on the other hand, suggested that they would be > interested in the proposed change in a post-1.0 future for > themselves [#numba-interest]_. > > The experimental Coconut JIT [#coconut]_ could have benefitted from > this PEP. In private conversations with Coconut's creator we were told > that our API was probably superior to the one they developed for > Coconut to add JIT support to CPython. > > > Debugging > --------- > > In conversations with the Python Tools for Visual Studio team (PTVS) > [#ptvs]_, they thought they would find these API changes useful for > implementing more performant debugging. As mentioned in the Rationale_ > section, this API would allow for switching on debugging functionality > only in frames where it is needed. This could allow for > skipping information that ``sys.settrace()`` normally provides, or even > go as far as dynamically rewriting bytecode prior to execution > to inject e.g. breakpoints in the bytecode. > > It also turns out that Google has provided a very similar API > internally for years. It has been used for performant debugging > purposes. > > > Implementation > ============== > > A set of patches implementing the proposed API is available through > the Pyjion project [#pyjion]_. In its current form it has more > changes to CPython than just this proposed API, but that is for ease > of development instead of strict requirements to accomplish its goals.
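The Debugging section's "switch on debugging only in frames where it is needed" idea can be approximated today with ``sys.settrace``, at the cost of the hook firing for every call in the interpreter; with the proposed API the check could instead live inside the frame evaluation function. A rough sketch (the ``watch`` helper is made up for illustration):

```python
import sys

def watch(code_obj, events):
    """Trace hook that reacts only when the targeted code object is entered."""
    def tracer(frame, event, arg):
        if event == "call" and frame.f_code is code_obj:
            events.append(dict(frame.f_locals))  # snapshot the arguments
        return None  # no per-line tracing anywhere
    return tracer

def interesting(x):
    return x + 1

def boring(x):
    return x - 1

seen = []
sys.settrace(watch(interesting.__code__, seen))
interesting(10)
boring(10)
sys.settrace(None)
print(seen)  # [{'x': 10}]: only the targeted function was recorded
```

Note the hook still gets invoked for ``boring``'s frame and merely ignores it; a frame-level evaluation function would avoid even that overhead.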
> > > Open Issues > =========== > > Allow ``eval_frame`` to be ``NULL`` > ----------------------------------- > > Currently the frame evaluation function is expected to always be set. > It could very easily simply default to ``NULL`` instead which would > signal to use ``PyEval_EvalFrameDefault()``. The current proposal of > not special-casing the field seemed the most straight-forward, but it > does require that the field not accidentally be cleared, else a crash > may occur. > > > Is co_extra needed? > ------------------- > > While discussing this PEP at PyCon US 2016, some core developers > expressed their worry of the ``co_extra`` field making code objects > mutable. The thinking seemed to be that having a field that was > mutated after the creation of the code object made the object seem > mutable, even though no other aspect of code objects changed. > > The view of this PEP is that the `co_extra` field doesn't change the > fact that code objects are immutable. The field is specified in this > PEP as to not contain information required to make the code object > usable, making it more of a caching field. It could be viewed as > similar to the UTF-8 cache that string objects have internally; > strings are still considered immutable even though they have a field > that is conditionally set. > > The field is also not strictly necessary. While the field greatly > simplifies attaching extra information to code objects, other options > such as keeping a mapping of code object memory addresses to what > would have been kept in ``co_extra`` or perhaps using a weak reference > of the data on the code object and then iterating through the weak > references until the attached data is found is possible. But obviously > all of these solutions are not as simple or performant as adding the > ``co_extra`` field. 
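The side-table alternative mentioned under "Is co_extra needed?", i.e. a mapping from code objects to their data rather than a new struct field, can be sketched with a weak-keyed dictionary, since code objects support weak references. The cost is an extra hash lookup on every frame evaluation, which is the PEP's argument for the simpler and faster ``co_extra`` field (the helper names below are illustrative):

```python
import weakref

# Side table keyed weakly by code objects: an entry disappears when its
# code object does, approximating co_extra without touching PyCodeObject.
_extra = weakref.WeakKeyDictionary()

def set_extra(code, value):
    _extra[code] = value

def get_extra(code, default=None):
    return _extra.get(code, default)

def f():
    return "f"

def g():
    return "g"

set_extra(f.__code__, {"exec_count": 5})
print(get_extra(f.__code__))           # {'exec_count': 5}
print(get_extra(g.__code__, "unset"))  # unset
```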
> > > Rejected Ideas > ============== > > A JIT-specific C API > -------------------- > > Originally this PEP was going to propose a much larger API change > which was more JIT-specific. After soliciting feedback from the Numba > team [#numba]_, though, it became clear that the API was unnecessarily > large. The realization was made that all that was truly needed was the > opportunity to provide a trampoline function to handle execution of > Python code that had been JIT-compiled and a way to attach that > compiled machine code along with other critical data to the > corresponding Python code object. Once it was shown that there was no > loss in functionality or in performance while minimizing the API > changes required, the proposal was changed to its current form. > > > References > ========== > > .. [#pyjion] Pyjion project > (https://github.com/microsoft/pyjion) > > .. [#c-api] CPython's C API > (https://docs.python.org/3/c-api/index.html) > > .. [#pycodeobject] ``PyCodeObject`` > (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) > > .. [#coreclr] .NET Core Runtime (CoreCLR) > (https://github.com/dotnet/coreclr) > > .. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()`` > ( > https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx > ) > > .. [#pycodeobject] ``PyCodeObject`` > (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) > > .. [#numba] Numba > (http://numba.pydata.org/) > > .. [#numba-interest] numba-users mailing list: > "Would the C API for a JIT entrypoint being proposed by Pyjion help out > Numba?" > ( > https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g > ) > > .. [#code-object-count] [Python-Dev] Opcode cache in ceval loop > (https://mail.python.org/pipermail/python-dev/2016-February/143025.html > ) > > .. [#py-benchmarks] Python benchmark suite > (https://hg.python.org/benchmarks) > > .. [#pyston] Pyston > (http://pyston.org) > > .. [#pypy] PyPy > (http://pypy.org/) > > .. 
[#ptvs] Python Tools for Visual Studio > (http://microsoft.github.io/PTVS/) > > .. [#coconut] Coconut > (https://github.com/davidmalcolm/coconut) > > > Copyright > ========= > > This document has been placed in the public domain. > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Jun 17 23:06:32 2016 From: brett at python.org (Brett Cannon) Date: Sat, 18 Jun 2016 03:06:32 +0000 Subject: [Python-Dev] security SIG? (was: Discussion overload) In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: On Fri, 17 Jun 2016 at 18:13 Nick Coghlan wrote: > On 16 June 2016 at 19:00, Kevin Ollivier > wrote: > > Hi Guido, > > > > From: on behalf of Guido van Rossum > > > > Reply-To: > > Date: Thursday, June 16, 2016 at 5:27 PM > > To: Kevin Ollivier > > Cc: Python Dev > > Subject: Re: [Python-Dev] Discussion overload > > > > Hi Kevin, > > > > I often feel the same way. Are you using GMail? It combines related > messages > > in threads and lets you mute threads. I often use this feature so I can > > manage my inbox. (I presume other mailers have the same features, but I > > don't know if all of them do.) There are also many people who read the > list > > on a website, e.g. gmane. (Though I think that sometimes the delays > incurred > > there add to the noise -- e.g. when a decision is reached on the list > > sometimes people keep responding to earlier threads.) > > > > > > I fear I did quite a poor job of making my point. :( I've been on open > > source mailing lists since the late 90s, so I've learned strategies for > > dealing with mailing list overload. I've got my mail folders, my mail > rules, > > etc. 
Having been on many mailing lists over the years, I've seen many > > productive discussions and many unproductive ones, and over time you > start > > to see patterns. You also see what happens to those communities over > time. > > This is one of the major reasons we have the option of escalating > things to the PEP process (and that's currently in train for > os.urandom), as well as the SIGs for when folks really need to dig > into topics that risk incurring a relatively low signal-to-noise > ratio on python-dev. It's also why python-ideas was turned into a > separate list, since folks without the time for more speculative > discussions and brainstorming can safely ignore it, while remaining > confident that any ideas considered interesting enough for further > review will be brought to python-dev's attention. > Do we need a security SIG? E.g. would people like Christian and Cory like to have a separate place to talk about the ssl stuff brought up at the language summit? -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Sat Jun 18 00:47:05 2016 From: barry at python.org (Barry Warsaw) Date: Sat, 18 Jun 2016 07:47:05 +0300 Subject: [Python-Dev] security SIG? (was: Discussion overload) In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: <20160618074705.58bddccf.barry@wooz.org> On Jun 18, 2016, at 03:06 AM, Brett Cannon wrote: >Do we need a security SIG? E.g. would people like Christian and Cory like >to have a separate place to talk about the ssl stuff brought up at the >language summit? The only thing I'd be worried about is people thinking that the sig is the place to report confidential security issues. Thesaurusly suggesting danger-sig and not just because that sounds so much cooler.
not-a-serious-suggestion-ly y'rs, -Barry From songofacandy at gmail.com Sat Jun 18 03:12:50 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Sat, 18 Jun 2016 16:12:50 +0900 Subject: [Python-Dev] Compact dict implementations (was: PEP 468 In-Reply-To: References: Message-ID: Now I have fixed the failing tests (some tests relied on the underlying layout). Before posting it to bugs.python.org, I want to confirm that it has a chance of being merged. The first big problem is the language spec. If the builtin dict in both PyPy and CPython is ordered, many people will rely on it. It will force other Python implementations to implement it for compatibility. In other words, it may become de-facto "Python Language", even if the Python Language spec says it's an implementation detail. Is it OK? The second problem is performance. In a quick benchmark on my laptop (sorry, I don't have dedicated hardware for long-running, stable benchmarking), it reduces memory usage by 3% and increases CPU time by 3%. I'll run a longer benchmark next week. I think I can't avoid the penalty, because the index hashtable and the (hash, key, value) entries are not in the same cache line. (I hope my thought is wrong and there is a way to optimize more.) pybench: https://gist.github.com/methane/cfad1427d87ceff9310350e78a214880 benchmark: https://gist.github.com/methane/5eb11fdd93863813b222e795ca0bfc1f Is it acceptable? I have some other minor problems (e.g. how can I use a 2-byte integer? Is using int16_t from stdint.h OK?). I'll discuss them in core-mentor ML or bugs.python.org. Thanks -- INADA Naoki From stephen at xemacs.org Sat Jun 18 06:38:34 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 18 Jun 2016 19:38:34 +0900 Subject: [Python-Dev] security SIG? (was: Discussion overload) In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: <22373.9386.96945.48324@turnbull.sk.tsukuba.ac.jp> Brett Cannon writes: > Do we need a security SIG? E.g.
would people like Christian and > Cory like to have a separate place to talk about the ssl stuff > brought up at the language summit? Besides what Barry brought up about the potential for attractive nuisance where people post security issues that should be confidential (I don't think it's that great, though), I don't see it solving the "clash of cultures" issue. The people who have invested in learning a lot of technical stuff related to security post as if they believe that "consenting adults" cannot be applied to security issues (more on that below), while RMs and folks working on distros tend to take the position that, of course, "consenting adults" covers security too. A SIG does help to address Christian's "ya gotta be this tall" to contribute to security discussions, at least in the early stages of discussion, but eventually it's going to arrive at python-dev.[1] ISTM that in this case sufficient behind the scenes discussion took place that the main contributors to the ultimate decision had a pretty good idea of where each other stood, and (I'm guessing here) Larry said "OK, we agree to disagree. I could say I'm RM, you lose, but to be fair I'll ask for a BDFL ruling." Even though there really wasn't anything for most of us to do but wait for that ruling (really -- Guido talks to Ted Ts'o and Theo de Raadt when he wants advice, there are very few among us who travel in those circles), it ended up that several of the security guys say they're not sure they can participate in Python development any more. I see the security issue as a backyard swimming pool. The law may say you must put a fence around it, but even 6 year olds can climb the fence, fall in the pool, and drown. The hard-line security advocate position then is "the risk is a *kid's life*, backyard pools must be banned". You have to sympathize with their honest and deep concern, but the community accepts that risk in the case of swimming pools.
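Concretely, the "fence" here is which stdlib API you reach for. As a minimal sketch of the two styles at issue (assuming the secrets module from PEP 506 that is slated for 3.6; the variable names are purely illustrative):

```python
import random
import secrets

# The climbable fence: random is a deterministic Mersenne Twister.
# Fine for simulations and games, but fully reproducible if the seed
# is known, so unsuitable for security purposes.
rng = random.Random(42)
simulation_sample = rng.getrandbits(128)

# The explicit, security-first choice: secrets always draws from the
# OS CSPRNG, so there is no internal state for an attacker to recover.
session_token = secrets.token_hex(16)  # 16 random bytes as 32 hex chars
```

Re-seeding random.Random(42) always reproduces simulation_sample exactly, which is the whole point for simulations and exactly the danger for tokens.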
I suspect the Python community at large is going to be happy with Larry's decision and the strategy of emphasizing the secrets module starting with 3.6. If so, the hard-line security advocates are going to have to accept that, or stay painfully frustrated. That would be very unfortunate, because their knowledge is very much needed. Footnotes: [1] Keeping the BDFL ruling within the security group isn't going to work, either -- the news of a secret patch will become public quickly, and it will just seriously harm the trust the community has in its leaders. From cory at lukasa.co.uk Sat Jun 18 10:25:49 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Sat, 18 Jun 2016 15:25:49 +0100 Subject: [Python-Dev] security SIG? (was: Discussion overload) In-Reply-To: <22373.9386.96945.48324@turnbull.sk.tsukuba.ac.jp> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <22373.9386.96945.48324@turnbull.sk.tsukuba.ac.jp> Message-ID: <1BFEC08A-C620-4E89-B801-AA8072F5391A@lukasa.co.uk> > On 18 Jun 2016, at 11:38, Stephen J. Turnbull wrote: > > I see the security issue as a backyard swimming pool. The law may say > you must put a fence around it, but even 6 year olds can climb the > fence, fall in the pool, and drown. The hard-line security advocate > position then is "the risk is a *kid's life*, backyard pools must be > banned". You have to sympathize with their honest and deep concern, > but the community accepts that risk in the case of swimming pools. I > suspect the Python community at large is going to be happy with > Larry's decision and the strategy of emphasizing the secrets module > starting with 3.6. I don't think that's really an accurate representation of any of the arguments put forward here. A better analogy is this: - Right now, we have a fence around the neighbourhood swimming pool. This fence has a gate with a guard, and the guard prevents 6 year olds from getting through the gate. - Kids can climb the fence (by using random.choice or something like it).
- Kids can climb the fence (by using random.choice or something like it). The security community is mostly in agreement with the stdlib folks: we?re happy to say that this problem is best dealt with by educating children to not climb the fence (don?t use random.choice in a security context). - In Python 3.4 and earlier, the guard on this gate will, in some circumstances, turn up to work drunk. The guard is very good at pretending to be sober, so you cannot tell just by looking that he?s drunk, but in this state he will let anyone through the gate. He sobers up fast, so it only matters if a child tries to get in very shortly after you open the swimming pool. - In Python 3.5 we included a patch that, by accident, installed a breathalyser on the gate. Now the guard can only open the gate when he?s sober. - The problem is that he cannot open the gate *for anyone* while he?s drunk. All entry to the pool stops if he shows up drunk. - The security folks want to say ?yes, this breathalyser is awesome, leave it in place, it should always have been there?. - The compat folks want to say ?the gate hasn?t had a breathalyser for years, and it represents a genuine inconvenience to adults who want to swim, so we need to remove it?. We are not trying to take non-CSPRNGs away from you. We are not trying to remove the random module, we are not trying to say that everyone must use urandom for all cases. We totally agree with the consenting adults policy. We just believe that the number of people who have used os.urandom and actively wanted the Linux behaviour may well be zero, and is certainly a tiny fraction of the user base of os.urandom, whereas the number of people who have used os.urandom and expected it to produce safe random bytes is dramatically larger. We believe that invisible risks are bad. We believe that it is difficult to meaningfully consent to behaviour you do not know about. 
And we believe that when your expectations are violated, it is better for the system to fail hard than to subtly degrade to a behaviour that puts you at risk, because one of these provides a trigger for action and one does not. In the case of "consenting adults": users cannot be said to have meaningfully consented to behaviours they do not understand. Consider urllib2 and PEP 476. Prior to Python 2.7.9, urllib2 did not validate TLS certificates. It *could*, if a user was willing to configure it to do so, but by default it did not. We could defend that behaviour under "consenting adults": users *technically* consented to the behaviour by using the code. However, most of these users *passively* consented: their consent is inferred by their use of the API, but they have written no code that actively asserts their consent. In Requests, we allow consenting adults to turn off cert verification if they want to, but they have to *actively* consent: they *have* to say verify=False. And they have to say it every time: we deliberately provide no global switch to turn off cert validation in Requests, you have to set verify=False every single time. This is very deliberate. If a user wants to shoot themselves in the foot they are welcome to do so, but we don't hand the gun to the user pointed at their foot. We're arguing for the same here with os.urandom(). If you want the Linux default urandom behaviour, that's fine, but we think that it's surprising to people and that they should be forced to *actively ask* for that behaviour, rather than passively be given it. The TL;DR is: consent is not the absence of saying no, it's the presence of saying yes. Cory -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From cory at lukasa.co.uk Sat Jun 18 10:30:31 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Sat, 18 Jun 2016 15:30:31 +0100 Subject: [Python-Dev] security SIG? (was: Discussion overload) In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> > On 18 Jun 2016, at 04:06, Brett Cannon wrote: > > Do we need a security SIG? E.g. would people like Christian and Cory like to have a separate place to talk about the ssl stuff brought up at the language summit? Honestly, I'm not sure what we would gain. Unless that SIG is empowered to take action, all it will be is a factory for generating arguments like this one. It will inevitably be either a toxic environment in itself, or a source of toxic threads on python-dev as the security SIG brings new threads like this one to the table. It should be noted that of the three developers that originally stepped forward on the security side of things here (myself, Donald, and Christian), only I am left subscribed to python-dev and nosy'd on the relevant issues. Put another way: each time we do this, several people on the security side burn themselves out in the thread and walk away (it's possible that those on the other side of the threads do too, I just don't know those people so well). It's hard to get enthusiastic about signing people up for that. =) Cory -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From leewangzhong+rsm at gmail.com Sat Jun 18 12:57:16 2016 From: leewangzhong+rsm at gmail.com (Franklin Lee) Date: Sat, 18 Jun 2016 12:57:16 -0400 Subject: [Python-Dev] Compact dict implementations (was: PEP 468 In-Reply-To: References: Message-ID: In the original discussion, I think they decided to reimplement set before dict. The original discussion is here, for anyone else: https://mail.python.org/pipermail/python-dev/2012-December/123028.html On Jun 18, 2016 3:15 AM, "INADA Naoki" wrote: > If the builtin dict in both PyPy and CPython is ordered, many people > will rely on it. > It will force other Python implementations to implement it for compatibility. > In other words, it may become de-facto "Python Language", even if the Python > Language spec > says it's an implementation detail. > > Is it OK? Ordered, or just initially ordered? I mean, "ordered if no deletion". They discussed scrambling the order. (Subdiscussion was here: https://mail.python.org/pipermail/python-dev/2012-December/123041.html) -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Sat Jun 18 13:13:55 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 19 Jun 2016 02:13:55 +0900 Subject: [Python-Dev] Compact dict implementations (was: PEP 468 In-Reply-To: References: Message-ID: > > pybench: https://gist.github.com/methane/cfad1427d87ceff9310350e78a214880 > benchmark: https://gist.github.com/methane/5eb11fdd93863813b222e795ca0bfc1f > > Is it acceptable? The latest result is here: https://gist.github.com/methane/22cf5d1dadb62bc87a15e9244a9d0ab8 -- INADA Naoki From ethan at stoneleaf.us Sat Jun 18 13:36:56 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 18 Jun 2016 10:36:56 -0700 Subject: [Python-Dev] security SIG?
In-Reply-To: <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> Message-ID: <576586B8.5090009@stoneleaf.us> On 06/18/2016 07:30 AM, Cory Benfield wrote: > On 18 Jun 2016, at 04:06, Brett Cannon wrote: >> Do we need a security SIG? E.g. would people like Christian and Cory like >> to have a separate place to talk about the ssl stuff brought up at the >> language summit? > > Honestly, I'm not sure what we would gain. We would gain a place where security enhancements/fixes can be discussed by those interested, where the environment is "how do we fix/improve such-and-such while breaking as little as possible" (those that want backward-compatibility at all costs need not apply ;). Once a consensus has been reached (and possibly a PEP written, but hopefully that part will only rarely be necessary) then the proposal can be made to py-dev, complete with the "this portion is backwards incompatible, this is the expected impact, this is why it's important, here are the other far more painful alternatives". > Unless that SIG is empowered to take action, all it will be is a factory for > generating arguments like this one. It will inevitably be either a toxic > environment in itself, or a source of toxic threads on python-dev as the > security SIG brings new threads like this one to the table. I suspect the resulting thread on py-dev will be far less painful when the initial discussions on ways to fix/improve this-or-that have already been done, the various options are being laid out, it's clear the new method will be in the next major release (unless incredibly serious, of course).
> It should be noted that of the three developers that originally stepped forward > on the security side of things here (myself, Donald, and Christian), only I am > left subscribed to python-dev and nosy'd on the relevant issues. Put another way: > each time we do this, several people on the security side burn themselves out in > the thread and walk away (it's possible that those on the other side of the > threads do too, I just don't know those people so well). It's hard to get > enthusiastic about signing people up for that. =) One of the big advantages of a SIG is the much reduced pool of participants, and that those participants are usually interested in forward progress. It would also be helpful to have a single person both champion and act as buffer for the proposals (not necessarily the same person each time). I am reminded of the matrix-multiply PEP brought forward by Nathaniel a few months ago -- the proposal was researched outside of py-dev, presented to py-dev when ready, Nathaniel acted as the gateway between py-dev and those that wanted/needed the change, the discussion stayed (pretty much) on track, and it felt like the whole thing was very smooth. (If it was somebody else, my apologies for my terrible memory! ;) To sum up: I think it would be a good idea. -- ~Ethan~ From songofacandy at gmail.com Sat Jun 18 13:55:45 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 19 Jun 2016 02:55:45 +0900 Subject: [Python-Dev] Compact dict implementations (was: PEP 468 In-Reply-To: References: Message-ID: > > Ordered, or just initially ordered? I mean, "ordered if no deletion". > I implemented "ordered". Because: * "ordered" is easier to explain than "ordered if no deletion". * I don't want to split sparse index hash and dense entry array. In case of a very small dict, the index hash (8 bytes) and the first two entries (24*2 = 48 bytes) can be on one cache line. * Easy to implement "split dictionary" (aka. key-sharing dictionary).
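To illustrate the layout, here is a rough pure-Python sketch (illustration only, with hypothetical names; no resizing or deletion, and of course the real implementation is C):

```python
# Compact dict sketch: a sparse index table holding small integers,
# plus a dense array of (hash, key, value) entries in insertion order.

class CompactDict:
    FREE = -1

    def __init__(self):
        self.indices = [self.FREE] * 8   # sparse hash table of entry indices
        self.entries = []                # dense, insertion-ordered entries

    def _probe(self, key):
        # Find the slot in the index table for `key` (linear probing
        # for simplicity; no resize, so keep the dict small).
        mask = len(self.indices) - 1
        i = hash(key) & mask
        while True:
            idx = self.indices[i]
            if idx == self.FREE or self.entries[idx][1] == key:
                return i
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        i = self._probe(key)
        if self.indices[i] == self.FREE:
            self.indices[i] = len(self.entries)      # point at new entry
            self.entries.append((hash(key), key, value))
        else:
            h, k, _ = self.entries[self.indices[i]]  # overwrite in place,
            self.entries[self.indices[i]] = (h, k, value)  # order unchanged

    def __getitem__(self, key):
        idx = self.indices[self._probe(key)]
        if idx == self.FREE:
            raise KeyError(key)
        return self.entries[idx][2]

    def keys(self):
        # Iteration order is exactly insertion order.
        return [k for _, k, _ in self.entries]
```

Deletion would need a DUMMY marker (or compaction of the entry array) on top of this, which is where the "ordered if no deletion" question comes from.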
You can see what I implemented in here. https://github.com/methane/cpython/pull/1/files -- INADA Naoki From brett at python.org Sat Jun 18 14:10:29 2016 From: brett at python.org (Brett Cannon) Date: Sat, 18 Jun 2016 18:10:29 +0000 Subject: [Python-Dev] security SIG? (was: Discussion overload) In-Reply-To: <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> Message-ID: On Sat, 18 Jun 2016 at 07:30 Cory Benfield wrote: > > > On 18 Jun 2016, at 04:06, Brett Cannon wrote: > > > > Do we need a security SIG? E.g. would people like Christian and Cory > like to have a separate place to talk about the ssl stuff brought up at the > language summit? > > > Honestly, I?m not sure what we would gain. > > Unless that SIG is empowered to take action, all it will be is a factory > for generating arguments like this one. It will inevitably be either a > toxic environment in itself, or a source of toxic threads on python-dev as > the security SIG brings new threads like this one to the table. > > It should be noted that of the three developers that originally stepped > forward on the security side of things here (myself, Donald, and > Christian), only I am left subscribed to python-dev and nosy?d on the > relevant issues. Put another way: each time we do this, several people on > the security side burn themselves out in the thread and walk away (it?s > possible that those on the other side of the threads do too, I just don?t > know those people so well). It?s hard to get enthusiastic about signing > people up for that. =) > And this is the problem I'm trying to solve. As various people have pointed out, the conversation was pretty much cordial, but it did end up feeling like "you're not listening to me" on both sides on top of the volume, which is what I think burned people out on this thread. 
I think Nick brought up the point that we as a group need to come up with some guideline that we more-or-less stick with to help guide this kind of discussion or else we are going to burn out regularly any time security comes up; we can't keep holding security discussions like this or else we're going to end up in a bad place when everyone burns out and stops caring. -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Sat Jun 18 15:22:48 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 18 Jun 2016 12:22:48 -0700 Subject: [Python-Dev] Compact dict implementations (was: PEP 468 In-Reply-To: References: Message-ID: <417DB1ED-405D-417A-B868-EF82F9AEB712@gmail.com> > On Jun 18, 2016, at 9:57 AM, Franklin Lee wrote: > > In the original discussion, I think they decided to reimplement set before dict. I ended up going in a different direction with sets (using linear probes to reduce the cost of collisions). Also, after the original discussion, PyPy implemented the idea for dicts and achieved some nice improvements. So, I think Inada Naoki is going in the right direction by focusing on compact dicts. Raymond From c4obi at yahoo.com Sat Jun 18 17:04:10 2016 From: c4obi at yahoo.com (Obiesie ike-nwosu) Date: Sat, 18 Jun 2016 22:04:10 +0100 Subject: [Python-Dev] JUMP_ABSOLUTE in nested if statements Message-ID: <5509708F-76C5-431F-A1BB-7F379E86B184@yahoo.com> Hi, Could someone give a hand with explaining to me why we have a JUMP_ABSOLUTE followed by a JUMP_FORWARD op code when this function is disassembled.

>>> def f1():
...     a, b = 10, 11
...     if a >= 10:
...         if b >= 11:
...             print("hello world")
...

The disassembled function is shown below.
>>> dis(f1)
  2           0 LOAD_CONST               4 ((10, 11))
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               0 (a)
              9 STORE_FAST               1 (b)

  3          12 LOAD_FAST                0 (a)
             15 LOAD_CONST               1 (10)
             18 COMPARE_OP               5 (>=)
             21 POP_JUMP_IF_FALSE       47

  4          24 LOAD_FAST                1 (b)
             27 LOAD_CONST               2 (11)
             30 COMPARE_OP               5 (>=)
             33 POP_JUMP_IF_FALSE       47

  5          36 LOAD_CONST               3 ('hello world')
             39 PRINT_ITEM
             40 PRINT_NEWLINE
             41 JUMP_ABSOLUTE           47
             44 JUMP_FORWARD             0 (to 47)
        >>   47 LOAD_CONST               0 (None)
             50 RETURN_VALUE

From my understanding, once JUMP_ABSOLUTE is executed, then JUMP_FORWARD is never gotten to so must be dead code so why is it being generated? Furthermore why is JUMP_ABSOLUTE rather than JUMP_FORWARD used in this particular case of nested if statements? I have tried other types of nested if statements and it has always been JUMP_FORWARD that is generated. From victor.stinner at gmail.com Sat Jun 18 18:18:42 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 19 Jun 2016 00:18:42 +0200 Subject: [Python-Dev] JUMP_ABSOLUTE in nested if statements In-Reply-To: <5509708F-76C5-431F-A1BB-7F379E86B184@yahoo.com> References: <5509708F-76C5-431F-A1BB-7F379E86B184@yahoo.com> Message-ID: Python has a peephole optimizer which does not remove dead code that it just created. Victor On 18 June 2016 at 23:14, "Obiesie ike-nwosu via Python-Dev" < python-dev at python.org> wrote: > Hi, > > Could someone give a hand with explaining to me why we have a > JUMP_ABSOLUTE followed by a JUMP_FORWARD op code when this function is > disassembled.
>
> >>> def f1():
> ...     a, b = 10, 11
> ...     if a >= 10:
> ...         if b >= 11:
> ...             print("hello world")
> ...
>
> The disassembled function is shown below.
> >>> dis(f1)
>   2           0 LOAD_CONST               4 ((10, 11))
>               3 UNPACK_SEQUENCE          2
>               6 STORE_FAST               0 (a)
>               9 STORE_FAST               1 (b)
>
>   3          12 LOAD_FAST                0 (a)
>              15 LOAD_CONST               1 (10)
>              18 COMPARE_OP               5 (>=)
>              21 POP_JUMP_IF_FALSE       47
>
>   4          24 LOAD_FAST                1 (b)
>              27 LOAD_CONST               2 (11)
>              30 COMPARE_OP               5 (>=)
>              33 POP_JUMP_IF_FALSE       47
>
>   5          36 LOAD_CONST               3 ('hello world')
>              39 PRINT_ITEM
>              40 PRINT_NEWLINE
>              41 JUMP_ABSOLUTE           47
>              44 JUMP_FORWARD             0 (to 47)
>         >>   47 LOAD_CONST               0 (None)
>              50 RETURN_VALUE
>
> From my understanding, once JUMP_ABSOLUTE is executed, then JUMP_FORWARD > is never gotten to so must be dead code so why is it being generated? > Furthermore why is JUMP_ABSOLUTE rather than JUMP_FORWARD used in this > particular case of nested if statements? I have tried other types of nested > if statements and it has always been JUMP_FORWARD that > is generated. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Sat Jun 18 18:10:21 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 18 Jun 2016 15:10:21 -0700 Subject: [Python-Dev] JUMP_ABSOLUTE in nested if statements In-Reply-To: <5509708F-76C5-431F-A1BB-7F379E86B184@yahoo.com> References: <5509708F-76C5-431F-A1BB-7F379E86B184@yahoo.com> Message-ID: <3646EE2F-C372-49DC-8ADF-5360F3F60BE8@gmail.com> > On Jun 18, 2016, at 2:04 PM, Obiesie ike-nwosu via Python-Dev wrote: > > Hi, > > Could someone give a hand with explaining to me why we have a JUMP_ABSOLUTE followed by a JUMP_FORWARD op code when this function is disassembled. > < snipped> > From my understanding, once JUMP_ABSOLUTE is executed, then JUMP_FORWARD is never gotten to so must be dead code so why is it being generated?
> Furthermore why is JUMP_ABSOLUTE rather than JUMP_FORWARD used in this particular case of nested if statements? I have tried other types of nested if statements and it has always been JUMP_FORWARD that > is generated. The AST compilation step generates code with two JUMP_FORWARDs (see below). Then, the peephole optimizer recognizes a jump-to-an-unconditional-jump and replaces the first one with a JUMP_ABSOLUTE to save an unnecessary step. The reason that it uses JUMP_ABSOLUTE instead of JUMP_FORWARD is that the former is more general (it can jump backwards). Using the more general form reduces the complexity of the optimizer. The reason that the remaining jump-to-jump isn't optimized is that the peepholer is intentionally kept simplistic, making only a single pass over the opcodes. That misses some optimizations but gets the most common cases. FWIW, the jump opcodes are very fast, so missing the final jump-to-jump isn't much of a loss. If you're curious, the relevant code is in Python/compile.c and Python/peephole.c. The compile.c code generates opcodes in the most straightforward way possible and then the peephole optimizer gets some of the low-hanging fruit by making a few simple transformations. Raymond ------------ AST generated code before peephole optimization -----------------

  5           0 LOAD_CONST               1 (10)
              3 LOAD_CONST               2 (11)
              6 BUILD_TUPLE              2
              9 UNPACK_SEQUENCE          2
             12 STORE_FAST               0 (a)
             15 STORE_FAST               1 (b)

  6          18 LOAD_FAST                0 (a)
             21 LOAD_CONST               1 (10)
             24 COMPARE_OP               5 (>=)
             27 POP_JUMP_IF_FALSE       53

  7          30 LOAD_FAST                1 (b)
             33 LOAD_CONST               2 (11)
             36 COMPARE_OP               5 (>=)
             39 POP_JUMP_IF_FALSE       50

  8          42 LOAD_CONST               3 ('hello world')
             45 PRINT_ITEM
             46 PRINT_NEWLINE
             47 JUMP_FORWARD             0 (to 50)
        >>   50 JUMP_FORWARD             0 (to 53)
        >>   53 LOAD_CONST               0 (None)
             56 RETURN_VALUE

From barry at python.org Sat Jun 18 18:41:23 2016 From: barry at python.org (Barry Warsaw) Date: Sat, 18 Jun 2016 18:41:23 -0400 Subject: [Python-Dev] security SIG?
(was: Discussion overload) In-Reply-To: <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> Message-ID: <20160618184123.4ad9b93b.barry@wooz.org> On Jun 18, 2016, at 03:30 PM, Cory Benfield wrote: >Unless that SIG is empowered to take action It wouldn't be, but there *is* a private security mailing list that is. Christian was on it, and I'm sad that he got burned out. If you are willing and able to help out there, please contact security at python dot org. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From steve.dower at python.org Sat Jun 18 18:47:57 2016 From: steve.dower at python.org (Steve Dower) Date: Sat, 18 Jun 2016 15:47:57 -0700 Subject: [Python-Dev] security SIG? (was: Discussion overload) In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> Message-ID: It's not just security discussions. The same thing happened with fspath, tzinfo, and many others that I have erased from my own memory. distutils-sig sees them often as well. The whole thing seems like a limitation of written communication. There's no way to indicate or define whether something should be nitpicked or not, and so everything gets line-by-line analysis whether it deserves it or not, which is what leads to such huge and fragmented threads, regardless of topic. At work, when we start seeing email or IM discussions going this way, we schedule a meeting. Perhaps we need a formal outlet for suspending discussion (and moderating incoming emails with a particular subject?) 
until an online call can be held and outcomes presented back to the list. Maybe we should schedule monthly online language summits and defer these discussions/decisions to that? I know that change won't be popular with some people. Honestly, if you haven't contributed more than the people who quit python-dev over these threads, you don't get to demand status quo. We need to change something, and I don't think more email or mute buttons (sorry Guido :) ) are the answer. Top-posted from my Windows Phone -----Original Message----- From: "Brett Cannon" Sent: 6/18/2016 11:13 To: "Cory Benfield" Cc: "Nick Coghlan" ; "Python Dev" Subject: Re: [Python-Dev] security SIG? (was: Discussion overload) On Sat, 18 Jun 2016 at 07:30 Cory Benfield wrote: > On 18 Jun 2016, at 04:06, Brett Cannon wrote: > > Do we need a security SIG? E.g. would people like Christian and Cory like to have a separate place to talk about the ssl stuff brought up at the language summit? Honestly, I'm not sure what we would gain. Unless that SIG is empowered to take action, all it will be is a factory for generating arguments like this one. It will inevitably be either a toxic environment in itself, or a source of toxic threads on python-dev as the security SIG brings new threads like this one to the table. It should be noted that of the three developers that originally stepped forward on the security side of things here (myself, Donald, and Christian), only I am left subscribed to python-dev and nosy'd on the relevant issues. Put another way: each time we do this, several people on the security side burn themselves out in the thread and walk away (it's possible that those on the other side of the threads do too, I just don't know those people so well). It's hard to get enthusiastic about signing people up for that. =) And this is the problem I'm trying to solve.
As various people have pointed out, the conversation was pretty much cordial, but it did end up feeling like "you're not listening to me" on both sides on top of the volume, which is what I think burned people out on this thread. I think Nick brought up the point that we as a group need to come up with some guideline that we more-or-less stick with to help guide this kind of discussion or else we are going to burn out regularly any time security comes up; we can't keep holding security discussions like this or else we're going to end up in a bad place when everyone burns out and stops caring. -------------- next part -------------- An HTML attachment was scrubbed... URL: From c4obi at yahoo.com Sat Jun 18 18:32:52 2016 From: c4obi at yahoo.com (Obiesie ike-nwosu) Date: Sat, 18 Jun 2016 23:32:52 +0100 Subject: [Python-Dev] JUMP_ABSOLUTE in nested if statements In-Reply-To: <3646EE2F-C372-49DC-8ADF-5360F3F60BE8@gmail.com> References: <5509708F-76C5-431F-A1BB-7F379E86B184@yahoo.com> <3646EE2F-C372-49DC-8ADF-5360F3F60BE8@gmail.com> Message-ID: That is much clearer now. Thanks a lot Raymond for taking the time out to explain this to me. On a closing note, is this mailing list the right place to ask these kinds of n00b questions? Obi. > On 18 Jun 2016, at 23:10, Raymond Hettinger wrote: > > >> On Jun 18, 2016, at 2:04 PM, Obiesie ike-nwosu via Python-Dev wrote: >> >> Hi, >> >> Could some one give a hand with explaining to me why we have a JUMP_ABSOLUTE followed by a JUMP_FORWARD op code when this function is disassembled. >> < snipped> >> From my understanding, once JUMP_ABSOLUTE is executed, then JUMP_FORWARD is never gotten to so must be dead code so why is it being generated? >> Furthermore why is JUMP_ABSOLUTE rather than JUMP_FORWARD used in this particular case of nested if statements? I have tried other types of nested if statements and it has always been JUMP_FORWARD that >> is generated. 
> > The AST compilation step generates code with two JUMP_FORWARDs (see below). Then, the peephole optimizer recognizes a jump-to-an-unconditional-jump and replaces the first one with a JUMP_ABSOLUTE to save an unnecessary step. > > The reason that it uses JUMP_ABSOLUTE instead of JUMP_FORWARD is that the former is more general (it can jump backwards). Using the more general form reduces the complexity of the optimizer. > > The reason that the remaining jump-to-jump isn't optimized is that the peepholer is intentionally kept simplistic, making only a single pass over the opcodes. That misses some optimizations but gets the most common cases. > > FWIW, the jump opcodes are very fast, so missing the final jump-to-jump isn't much of a loss. > > If you're curious, the relevant code is in Python/compile.c and Python/peephole.c. The compile.c code generated opcodes in the most straight-forward way possible and then the peephole optimizer gets some of the low-hanging fruit by making a few simple transformations. > > > Raymond > > > ------------ AST generated code before peephole optimization ----------------- > > > 5 0 LOAD_CONST 1 (10) > 3 LOAD_CONST 2 (11) > 6 BUILD_TUPLE 2 > 9 UNPACK_SEQUENCE 2 > 12 STORE_FAST 0 (a) > 15 STORE_FAST 1 (b) > > 6 18 LOAD_FAST 0 (a) > 21 LOAD_CONST 1 (10) > 24 COMPARE_OP 5 (>=) > 27 POP_JUMP_IF_FALSE 53 > > 7 30 LOAD_FAST 1 (b) > 33 LOAD_CONST 2 (11) > 36 COMPARE_OP 5 (>=) > 39 POP_JUMP_IF_FALSE 50 > > 8 42 LOAD_CONST 3 ('hello world') > 45 PRINT_ITEM > 46 PRINT_NEWLINE > 47 JUMP_FORWARD 0 (to 50) >>> 50 JUMP_FORWARD 0 (to 53) >>> 53 LOAD_CONST 0 (None) > 56 RETURN_VALUE > From guido at python.org Sat Jun 18 20:39:54 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Jun 2016 17:39:54 -0700 Subject: [Python-Dev] security SIG? 
(was: Discussion overload) In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> Message-ID: Like it or not, written communication is all we have. However, I do think we are running into some kind of limitation: the ancient concept of mailing lists (or newsgroups). I would like to continue the discussion of this limitation in the original thread. PS. I think it's somewhat ironic that Steve posted his idea to deal with discussions run amok in the forked thread that was meant specifically to discuss the proposal for a security-sig. Ditto that Cory used this same thread to bring up his philosophy about computer security -- that topic itself belongs clearly in the proposed SIG or on python-dev (if we don't create a SIG) but not (yet) in the discussion about whether we should create a SIG. On Sat, Jun 18, 2016 at 3:47 PM, Steve Dower wrote: > It's not just security discussions. The same thing happened with fspath, > tzinfo, and many others that I have erased from my own memory. > distutils-sig sees them often as well. > > The whole thing seems like a limitation of written communication. There's > no way to indicate or define whether something should be nitpicked or not, > and so everything gets line-by-line analysis whether it deserves it or not, > which is what leads to such huge and fragmented threads, regardless of > topic. > > At work, when we start seeing email or IM discussions going this way, we > schedule a meeting. Perhaps we need a formal outlet for suspending > discussion (and moderating incoming emails with a particular subject?) > until an online call can be held and outcomes presented back to the list. > Maybe we should schedule monthly online language summits and defer these > discussions/decisions to that? > > I know that change won't be popular with some people. 
Honestly, if you > haven't contributed more than the people who quit python-dev over these > threads, you don't get to demand status quo. We need to change something, > and I don't think more email or mute buttons (sorry Guido :) ) are the > answer. > > Top-posted from my Windows Phone > ------------------------------ > From: Brett Cannon > Sent: 6/18/2016 11:13 > To: Cory Benfield > Cc: Nick Coghlan ; Python Dev > Subject: Re: [Python-Dev] security SIG? (was: Discussion overload) > > > > On Sat, 18 Jun 2016 at 07:30 Cory Benfield wrote: > >> >> > On 18 Jun 2016, at 04:06, Brett Cannon wrote: >> > >> > Do we need a security SIG? E.g. would people like Christian and Cory >> like to have a separate place to talk about the ssl stuff brought up at the >> language summit? >> >> >> Honestly, I'm not sure what we would gain. >> >> Unless that SIG is empowered to take action, all it will be is a factory >> for generating arguments like this one. It will inevitably be either a >> toxic environment in itself, or a source of toxic threads on python-dev as >> the security SIG brings new threads like this one to the table. >> >> It should be noted that of the three developers that originally stepped >> forward on the security side of things here (myself, Donald, and >> Christian), only I am left subscribed to python-dev and nosy'd on the >> relevant issues. Put another way: each time we do this, several people on >> the security side burn themselves out in the thread and walk away (it's >> possible that those on the other side of the threads do too, I just don't >> know those people so well). It's hard to get enthusiastic about signing >> people up for that. =) >> > > And this is the problem I'm trying to solve. As various people have > pointed out, the conversation was pretty much cordial, but it did end up > feeling like "you're not listening to me" on both sides on top of the > volume, which is what I think burned people out on this thread. 
> > I think Nick brought up the point that we as a group need to come up with > some guideline that we more-or-less stick with to help guide this kind of > discussion or else we are going to burn out regularly any time security > comes up; we can't keep holding security discussions like this or else > we're going to end up in a bad place when everyone burns out and stops > caring. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Sat Jun 18 21:17:03 2016 From: brett at python.org (Brett Cannon) Date: Sun, 19 Jun 2016 01:17:03 +0000 Subject: [Python-Dev] Discussion overload In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: Over on the "security SIG" thread, the point has been made that we seem to be hitting some limits in communication (Steve Dower said written communication, Guido said mailing lists/newsgroups). Based on the burnout we are seeing from these centi-threads we need to try and come up with some solution to this problem, else we are heading towards a bad place due to communication burn-out. For me, I don't think we can give up written communication thanks to how worldwide we all are and thus make scheduling some monthly video chat very difficult. What I would like to consider, though, is something like Discourse where we at least have a chance to have tools available to us to manage discussions better than through federated email where everyone has different experiences in terms of delivery rate, ability to filter, splitting discussions, locking down out-of-control discussions, etc. 
I think harmonizing the experience along with better controls could help make all of this more manageable. On Fri, Jun 17, 2016, 18:13 Nick Coghlan wrote: > On 16 June 2016 at 19:00, Kevin Ollivier > wrote: > > Hi Guido, > > > > From: on behalf of Guido van Rossum > > > > Reply-To: > > Date: Thursday, June 16, 2016 at 5:27 PM > > To: Kevin Ollivier > > Cc: Python Dev > > Subject: Re: [Python-Dev] Discussion overload > > > > Hi Kevin, > > > > I often feel the same way. Are you using GMail? It combines related messages > > in threads and lets you mute threads. I often use this feature so I can > > manage my inbox. (I presume other mailers have the same features, but I > > don't know if all of them do.) There are also many people who read the list > > on a website, e.g. gmane. (Though I think that sometimes the delays incurred > > there add to the noise -- e.g. when a decision is reached on the list > > sometimes people keep responding to earlier threads.) > > > > > > I fear I did quite a poor job of making my point. :( I've been on open > > source mailing lists since the late 90s, so I've learned strategies for > > dealing with mailing list overload. I've got my mail folders, my mail rules, > > etc. Having been on many mailing lists over the years, I've seen many > > productive discussions and many unproductive ones, and over time you start > > to see patterns. You also see what happens to those communities over time. > > This is one of the major reasons we have the option of escalating > things to the PEP process (and that's currently in train for > os.urandom), as well as the SIGs for when folks really need to dig > into topics that risk incurring a relatively low signal-to-noise > ratio on python-dev. 
It's also why python-ideas was turned into a > separate list, since folks without the time for more speculative > discussions and brainstorming can safely ignore it, while remaining > confident that any ideas considered interesting enough for further > review will be brought to python-dev's attention. > > But yes, one of the more significant design errors I've made with the > contextlib API was due to just such a draining pile-on by folks that > weren't happy the original name wasn't a 100% accurate description of > the underlying mechanics (even though it was an accurate description > of the intended use case), and "people yelling at you on project > communication channels without doing adequate research first" is the > number one reason we see otherwise happily engaged core developers > decide to find something else to do with their time. > > The challenge and art in community management in that context is > balancing telling both old and new list participants "It's OK to ask > 'Why is this so?', as sometimes the answer is that there isn't a good > reason and we may want to change it" and "Learn to be a good peer > manager, and avoid behaving like a micro-managing autocrat that chases > away experienced contributors". > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sat Jun 18 21:57:49 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Jun 2016 18:57:49 -0700 Subject: [Python-Dev] Discussion overload In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: On Sat, Jun 18, 2016 at 6:17 PM, Brett Cannon wrote: > Over on the "security SIG" thread, the point has been made that we seem to > be hitting some limits in communication (Steve Dower said written > communication, Guido said mailing lists/newsgroups). Based on the burnout > we are seeing from these centi-threads we need to try and come up with some > solution to this problem, else we are heading towards a bad place [d]ue to > communication burn-out. > > For me, I don't think we can give up written communication thanks to how > worldwide we all are and thus make scheduling some monthly video chat very > difficult. What I would like to consider, though, is something like > Discourse where we at least have a chance to have tools available to us to > manage discussions better than through federated email where everyone has > different experiences in terms of delivery rate, ability to filter, > splitting discussions, locking down out-of-control discussions, etc. I > think harmonizing the experience along with better controls could help make > all of this more manageable. > Agreed that any form of real-time communication is out. First, I want to apologize to Kevin -- I only skimmed his message. I only saw that he had carefully qualified himself as a long-time open source contributor and list participant when I re-read his message. I also want to keep this short, so I'm proof-reading this before posting. Many projects on which I am currently working use one or more GitHub issue trackers as their main communication mechanism (mypy et al. don't even have a mailing list). I find that this works quite well to stay focused. 
We have quite a few issues that track important discussions over many days, weeks or months, and there is very little noise or cross-talk. It's easy to stay on topic, it's much easier to refer to other topics, it's easy to mute individual topics, and it's much less likely that a topic degenerates into a different discussion altogether (because it's easy to create a new issue for it). It's also easier to moderate, and you can even edit conversations (with restraint). I also like that it's possible to do sentence-by-sentence quotation, but the extra effort required (copy/paste) encourages a linear thread of conversation within one issue. I did a quick check of my inbox and I think over the past week I had about as many mypy-related messages generated by GitHub as there were python-dev messages. And I felt much less bad for ignoring much of the mypy traffic while I was on vacation than I felt for ignoring python-dev, because it's easy to catch up using GitHub's web UI. (And no, I don't want to use gmane. I think it doesn't solve any of the other problems.) I don't know Discourse, but if it has a similar (or even better) feature set maybe we should give it a try. Or, now that we're going to migrate the CPython repo to GitHub, maybe we could just give GitHub's issue tracker a try? We could create a repo that has just a tracker (or a tracker plus a README.md explaining its purpose -- eventually we could add more resources and even a wiki). I'm sure that in the venerable python-dev tradition everyone is now jumping to give their opinion about Discourse, the GitHub tracker, their favorite alternative, the need for free-form discussion, the need to have a GitHub account to participate, Slack, and the upcoming Mailman 3.0. But let's not do that, because it would be too self-referential (and defeat the purpose). I think we seriously need to rethink the way we have conversations here, and that includes the conversation about conversations. 
Here's my proposal: let's decide what to do about this roughly the same way we decided what to do with Mercurial. We don't have to take as long, but we'll use a similar process: a small committee run by a dedicated volunteer will compare alternatives and pick a strategy. If you're interested in serving on this committee, send me email off-list. If you want to head the committee, ditto. If you reply-all, you're automatically disqualified. :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Jun 18 23:46:51 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 19 Jun 2016 13:46:51 +1000 Subject: [Python-Dev] JUMP_ABSOLUTE in nested if statements In-Reply-To: References: <5509708F-76C5-431F-A1BB-7F379E86B184@yahoo.com> <3646EE2F-C372-49DC-8ADF-5360F3F60BE8@gmail.com> Message-ID: <20160619034651.GP27919@ando.pearwood.info> On Sat, Jun 18, 2016 at 11:32:52PM +0100, Obiesie ike-nwosu via Python-Dev wrote: > That is much clearer now. > Thanks a lot Raymond for taking the time out to explain this to me. > On a closing note, is this mailing list the right place to ask these kinds of n00b questions? That depends what sort of n00b question. If they are specifically related to the internals of the CPython interpreter, then this is certainly the right place. Code generation will count as an internal function of the interpreter. If they're general questions about Python the language, then the python-list mailing list is better. (Also available as comp.lang.python on Usenet.) Beware: it tends to be a high-volume, easily distracted forum where people often go off on long discussions which are only peripherally related to Python. 
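The jump-to-jump collapse Raymond described earlier in the JUMP_ABSOLUTE thread is easy to observe with the `dis` module. A minimal sketch (the exact opcodes, offsets, and peephole behaviour vary by CPython version, so none are asserted here; on CPython 3.5 the inner block ends with the JUMP_ABSOLUTE discussed above):

```python
import dis

def nested(a=10, b=11):
    # Same shape as the function from the original question.
    if a >= 10:
        if b >= 11:
            return "hello world"
    return None

# Print the optimized bytecode.  On CPython 3.5 the peephole pass collapses
# the inner block's jump-to-an-unconditional-jump into a single
# JUMP_ABSOLUTE; newer interpreters optimize this differently.
dis.dis(nested)

# The instruction stream can also be inspected programmatically.
ops = [ins.opname for ins in dis.get_instructions(nested)]
```

Comparing `ops` across interpreter versions is a quick way to see how much the peephole pass has evolved since this exchange.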
-- Steve From songofacandy at gmail.com Sat Jun 18 23:48:43 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 19 Jun 2016 12:48:43 +0900 Subject: [Python-Dev] Compact dict implementations (was: PEP 468 In-Reply-To: References: Message-ID: I've sent my patch to the issue tracker, since I can't fix some remaining TODOs by myself. http://bugs.python.org/issue27350 On Fri, Jun 17, 2016 at 6:15 PM, INADA Naoki wrote: > Hi, developers. > > I'm trying to implement compact dict. > https://github.com/methane/cpython/pull/1 > > Current status is passing most of the tests. > Some tests are failing because I haven't updated `sizeof` pending the layout fix. > And I haven't dropped OrderedDict's linked list yet. > > Before finishing implementation, I want to see comments and tests from core > developers. > Please come to the core-mentorship ML or the pull request and try it if you > are interested. > > Regards, > -- > INADA Naoki > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com -- INADA Naoki From guido at python.org Sun Jun 19 00:48:47 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Jun 2016 21:48:47 -0700 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: Hi Brett, I've got a few questions about the specific design. Probably you know the answers, it would be nice to have them in the PEP. First, why not have a global hook? What does a hook per interpreter give you? Would even finer granularity buy anything? Next, I'm a bit (but no more than a bit) concerned about the extra 8 bytes per code object, especially since for most people this is just waste (assuming most people won't be using Pyjion or Numba). Could it be a compile-time feature (requiring recompilation of CPython but not extensions)? 
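For readers following the compact-dict thread above: the layout being implemented keeps a small sparse index table pointing into a dense, insertion-ordered entries array, which is where the ordering behaviour comes from. A toy pure-Python sketch of the idea — illustrative only, with linear probing, no resizing, and no deletion, nothing like the real C implementation:

```python
class CompactDict:
    """Toy sketch of the compact dict layout: a sparse index table maps
    hash slots to positions in a dense, insertion-ordered entries list."""

    def __init__(self, size=8):
        # size must be a power of two for the mask below; no resizing,
        # so keep toy inputs well under `size` entries.
        self.indices = [None] * size   # sparse: slot -> entry position
        self.entries = []              # dense: (hash, key, value) tuples

    def _probe(self, key):
        # Linear probing for simplicity; CPython uses a perturb-based probe.
        h = hash(key)
        mask = len(self.indices) - 1
        i = h & mask
        while True:
            pos = self.indices[i]
            if pos is None or self.entries[pos][1] == key:
                return i, pos
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        i, pos = self._probe(key)
        if pos is None:
            self.indices[i] = len(self.entries)
            self.entries.append((hash(key), key, value))
        else:  # overwrite in place; insertion order is preserved
            self.entries[pos] = (hash(key), key, value)

    def __getitem__(self, key):
        _, pos = self._probe(key)
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def keys(self):
        # Iteration order falls out of the dense entries list for free.
        return [k for _, k, _ in self.entries]
```

The memory saving in the real patch comes from the index table holding small integers (1, 2, 4, or 8 bytes each) instead of full hash/key/value slots.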
Could you figure out some other way to store per-code-object data? It seems you considered this but decided that the co_extra field was simpler and faster; I'm basically pushing a little harder on this. Of course most of the PEP would disappear without this feature; the extra interpreter field is fine. Finally, there are some error messages from pep2html.py: https://www.python.org/dev/peps/pep-0523/#copyright --Guido On Fri, Jun 17, 2016 at 7:58 PM, Brett Cannon wrote: > I have taken PEP 523 for this: > https://github.com/python/peps/blob/master/pep-0523.txt . > > I'm waiting until Guido gets back from vacation, at which point I'll ask > for a pronouncement or assignment of a BDFL delegate. > > On Fri, 3 Jun 2016 at 14:37 Brett Cannon wrote: > >> For those of you who follow python-ideas or were at the PyCon US 2016 >> language summit, you have already seen/heard about this PEP. For those of >> you who don't fall into either of those categories, this PEP proposed a >> frame evaluation API for CPython. The motivating example of this work has >> been Pyjion, the experimental CPython JIT Dino Viehland and I have been >> working on in our spare time at Microsoft. The API also works for >> debugging, though, as already demonstrated by Google having added a very >> similar API internally for debugging purposes. >> >> The PEP is pasted in below and also available in rendered form at >> https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I will assign >> myself a PEP # once discussion is finished as it's easier to work in git >> for this for the rich rendering of the in-progress PEP). >> >> I should mention that the difference from python-ideas and the language >> summit in the PEP are the listed support from Google's use of a very >> similar API as well as clarifying the co_extra field on code objects >> doesn't change their immutability (at least from the view of the PEP). 
>> >> ---------- >> PEP: NNN >> Title: Adding a frame evaluation API to CPython >> Version: $Revision$ >> Last-Modified: $Date$ >> Author: Brett Cannon , >> Dino Viehland >> Status: Draft >> Type: Standards Track >> Content-Type: text/x-rst >> Created: 16-May-2016 >> Post-History: 16-May-2016 >> 03-Jun-2016 >> >> >> Abstract >> ======== >> >> This PEP proposes to expand CPython's C API [#c-api]_ to allow for >> the specification of a per-interpreter function pointer to handle the >> evaluation of frames [#pyeval_evalframeex]_. This proposal also >> suggests adding a new field to code objects [#pycodeobject]_ to store >> arbitrary data for use by the frame evaluation function. >> >> >> Rationale >> ========= >> >> One place where flexibility has been lacking in Python is in the direct >> execution of Python code. While CPython's C API [#c-api]_ allows for >> constructing the data going into a frame object and then evaluating it >> via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control over the >> execution of Python code comes down to individual objects instead of a >> holistic control of execution at the frame level. >> >> While wanting to have influence over frame evaluation may seem a bit >> too low-level, it does open the possibility for things such as a >> method-level JIT to be introduced into CPython without CPython itself >> having to provide one. By allowing external C code to control frame >> evaluation, a JIT can participate in the execution of Python code at >> the key point where evaluation occurs. This then allows for a JIT to >> conditionally recompile Python bytecode to machine code as desired >> while still allowing for executing regular CPython bytecode when >> running the JIT is not desired. This can be accomplished by allowing >> interpreters to specify what function to call to evaluate a frame. And >> by placing the API at the frame evaluation level it allows for a >> complete view of the execution environment of the code for the JIT. 
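For readers who want to feel the difference at the Python level: `sys.setprofile` is the closest existing hook to the per-frame entry point the Rationale describes, but it can only observe a frame being entered — it cannot substitute its own evaluation of the frame, which is exactly what the proposed C-level hook adds. A small sketch:

```python
import sys

calls = []

def profiler(frame, event, arg):
    # Unlike the proposed eval_frame hook, this can only watch a frame
    # being entered; it cannot replace how the frame is evaluated.
    if event == "call":
        calls.append(frame.f_code.co_name)

def add(x, y):
    return x + y

sys.setprofile(profiler)
try:
    add(1, 2)
    add(3, 4)
finally:
    sys.setprofile(None)

print(calls)  # names of the profiled Python-level calls
```

A C-level `eval_frame` replacement sits one layer deeper: it receives the frame before any bytecode runs and decides how (or whether) to execute it.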
>> >> This ability to specify a frame evaluation function also allows for >> other use-cases beyond just opening CPython up to a JIT. For instance, >> it would not be difficult to implement a tracing or profiling function >> at the call level with this API. While CPython does provide the >> ability to set a tracing or profiling function at the Python level, >> this would be able to match the data collection of the profiler and >> quite possibly be faster for tracing by simply skipping per-line >> tracing support. >> >> It also opens up the possibility of debugging where the frame >> evaluation function only performs special debugging work when it >> detects it is about to execute a specific code object. In that >> instance the bytecode could be theoretically rewritten in-place to >> inject a breakpoint function call at the proper point for help in >> debugging while not having to do a heavy-handed approach as >> required by ``sys.settrace()``. >> >> To help facilitate these use-cases, we are also proposing the adding >> of a "scratch space" on code objects via a new field. This will allow >> per-code object data to be stored with the code object itself for easy >> retrieval by the frame evaluation function as necessary. The field >> itself will simply be a ``PyObject *`` type so that any data stored in >> the field will participate in normal object memory management. >> >> >> Proposal >> ======== >> >> All proposed C API changes below will not be part of the stable ABI. >> >> >> Expanding ``PyCodeObject`` >> -------------------------- >> >> One field is to be added to the ``PyCodeObject`` struct >> [#pycodeobject]_:: >> >> typedef struct { >> ... >> PyObject *co_extra; /* "Scratch space" for the code object. */ >> } PyCodeObject; >> >> The ``co_extra`` will be ``NULL`` by default and will not be used by >> CPython itself. Third-party code is free to use the field as desired. 
>> Values stored in the field are expected to not be required in order >> for the code object to function, allowing the loss of the data of the >> field to be acceptable (this keeps the code object as immutable from >> a functionality point-of-view; this is slightly contentious and so is >> listed as an open issue in `Is co_extra needed?`_). The field will be >> freed like all other fields on ``PyCodeObject`` during deallocation >> using ``Py_XDECREF()``. >> >> It is not recommended that multiple users attempt to use the >> ``co_extra`` simultaneously. While a dictionary could theoretically be >> set to the field and various users could use a key specific to the >> project, there is still the issue of key collisions as well as >> performance degradation from using a dictionary lookup on every frame >> evaluation. Users are expected to do a type check to make sure that >> the field has not been previously set by someone else. >> >> >> Expanding ``PyInterpreterState`` >> -------------------------------- >> >> The entrypoint for the frame evaluation function is per-interpreter:: >> >> // Same type signature as PyEval_EvalFrameEx(). >> typedef PyObject* (__stdcall *PyFrameEvalFunction)(PyFrameObject*, int); >> >> typedef struct { >> ... >> PyFrameEvalFunction eval_frame; >> } PyInterpreterState; >> >> By default, the ``eval_frame`` field will be initialized to a function >> pointer that represents what ``PyEval_EvalFrameEx()`` currently is >> (called ``PyEval_EvalFrameDefault()``, discussed later in this PEP). >> Third-party code may then set their own frame evaluation function >> instead to control the execution of Python code. A pointer comparison >> can be used to detect if the field is set to >> ``PyEval_EvalFrameDefault()`` and thus has not been mutated yet. >> >> >> Changes to ``Python/ceval.c`` >> ----------------------------- >> >> ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it currently stands >> will be renamed to ``PyEval_EvalFrameDefault()``. 
The new >> ``PyEval_EvalFrameEx()`` will then become:: >> >> PyObject * >> PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag) >> { >> PyThreadState *tstate = PyThreadState_GET(); >> return tstate->interp->eval_frame(frame, throwflag); >> } >> >> This allows third-party code to place themselves directly in the path >> of Python code execution while being backwards-compatible with code >> already using the pre-existing C API. >> >> >> Updating ``python-gdb.py`` >> -------------------------- >> >> The generated ``python-gdb.py`` file used for Python support in GDB >> makes some hard-coded assumptions about ``PyEval_EvalFrameEx()``, e.g. >> the names of local variables. It will need to be updated to work with >> the proposed changes. >> >> >> Performance impact >> ================== >> >> As this PEP is proposing an API to add pluggability, performance >> impact is considered only in the case where no third-party code has >> made any changes. >> >> Several runs of pybench [#pybench]_ consistently showed no performance >> cost from the API change alone. >> >> A run of the Python benchmark suite [#py-benchmarks]_ showed no >> measurable cost in performance. >> >> In terms of memory impact, since there are typically not many CPython >> interpreters executing in a single process that means the impact of >> ``co_extra`` being added to ``PyCodeObject`` is the only worry. >> According to [#code-object-count]_, a run of the Python test suite >> results in about 72,395 code objects being created. On a 64-bit >> CPU that would result in 579,160 bytes of extra memory being used if >> all code objects were alive at once and had nothing set in their >> ``co_extra`` fields. >> >> >> Example Usage >> ============= >> >> A JIT for CPython >> ----------------- >> >> Pyjion >> '''''' >> >> The Pyjion project [#pyjion]_ has used this proposed API to implement >> a JIT for CPython using the CoreCLR's JIT [#coreclr]_. 
Each code >> object has its ``co_extra`` field set to a ``PyjionJittedCode`` object >> which stores four pieces of information: >> >> 1. Execution count >> 2. A boolean representing whether a previous attempt to JIT failed >> 3. A function pointer to a trampoline (which can be type tracing or not) >> 4. A void pointer to any JIT-compiled machine code >> >> The frame evaluation function has (roughly) the following algorithm:: >> >> def eval_frame(frame, throw_flag): >> pyjion_code = frame.code.co_extra >> if not pyjion_code: >> frame.code.co_extra = PyjionJittedCode() >> elif not pyjion_code.jit_failed: >> if pyjion_code.jit_code: >> return pyjion_code.eval(pyjion_code.jit_code, frame) >> elif pyjion_code.exec_count > 20_000: >> if jit_compile(frame): >> return pyjion_code.eval(pyjion_code.jit_code, frame) >> else: >> pyjion_code.jit_failed = True >> pyjion_code.exec_count += 1 >> return PyEval_EvalFrameDefault(frame, throw_flag) >> >> The key point, though, is that all of this work and logic is separate >> from CPython and yet with the proposed API changes it is able to >> provide a JIT that is compliant with Python semantics (as of this >> writing, performance is almost equivalent to CPython without the new >> API). This means there's nothing technically preventing others from >> implementing their own JITs for CPython by utilizing the proposed API. >> >> >> Other JITs >> '''''''''' >> >> It should be mentioned that the Pyston team was consulted on an >> earlier version of this PEP that was more JIT-specific and they were >> not interested in utilizing the changes proposed because they want >> control over memory layout and had no interest in directly supporting >> CPython itself. An informal discussion with a developer on the PyPy >> team led to a similar comment. >> >> Numba [#numba]_, on the other hand, suggested that they would be >> interested in the proposed change in a post-1.0 future for >> themselves [#numba-interest]_. 
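The control flow of the `eval_frame` algorithm above can be mimicked in plain Python to make the hot-path dispatch concrete. Everything here is illustrative — `JittedCode`, `fake_compile`, the `scratch` dict (standing in for `co_extra`), and the tiny threshold are stand-ins; the real trampoline executes machine code:

```python
HOT_THRESHOLD = 3  # Pyjion uses ~20,000; kept tiny for demonstration

class JittedCode:
    """Per-function scratch data, standing in for co_extra."""
    def __init__(self):
        self.exec_count = 0
        self.jit_failed = False
        self.compiled = None

def fake_compile(func):
    # Stand-in for JIT compilation: just wrap the original callable.
    return lambda *args: func(*args)

scratch = {}  # func -> JittedCode, standing in for the co_extra field

def eval_call(func, *args):
    code = scratch.setdefault(func, JittedCode())
    if not code.jit_failed:
        if code.compiled is not None:
            return code.compiled(*args)          # fast (jitted) path
        elif code.exec_count >= HOT_THRESHOLD:
            code.compiled = fake_compile(func)   # "JIT" the hot function
            return code.compiled(*args)
    code.exec_count += 1
    return func(*args)                           # default interpreter path

def square(x):
    return x * x

for n in range(5):
    eval_call(square, n)
print(scratch[square].exec_count)  # 3: counting stops once "compiled"
```

Note how, exactly as in the pseudocode, the counter is only advanced on the default path, so a function that has been compiled never pays the bookkeeping cost again.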
>> >> The experimental Coconut JIT [#coconut]_ could have benefitted from >> this PEP. In private conversations with Coconut's creator we were told >> that our API was probably superior to the one they developed for >> Coconut to add JIT support to CPython. >> >> >> Debugging >> --------- >> >> In conversations with the Python Tools for Visual Studio team (PTVS) >> [#ptvs]_, they thought they would find these API changes useful for >> implementing more performant debugging. As mentioned in the Rationale_ >> section, this API would allow for switching on debugging functionality >> only in frames where it is needed. This could allow for either >> skipping information that ``sys.settrace()`` normally provides and >> even go as far as to dynamically rewrite bytecode prior to execution >> to inject e.g. breakpoints in the bytecode. >> >> It also turns out that Google has provided a very similar API >> internally for years. It has been used for performant debugging >> purposes. >> >> >> Implementation >> ============== >> >> A set of patches implementing the proposed API is available through >> the Pyjion project [#pyjion]_. In its current form it has more >> changes to CPython than just this proposed API, but that is for ease >> of development instead of strict requirements to accomplish its goals. >> >> >> Open Issues >> =========== >> >> Allow ``eval_frame`` to be ``NULL`` >> ----------------------------------- >> >> Currently the frame evaluation function is expected to always be set. >> It could very easily simply default to ``NULL`` instead which would >> signal to use ``PyEval_EvalFrameDefault()``. The current proposal of >> not special-casing the field seemed the most straight-forward, but it >> does require that the field not accidentally be cleared, else a crash >> may occur. >> >> >> Is co_extra needed? 
>> ------------------- >> >> While discussing this PEP at PyCon US 2016, some core developers >> expressed their worry of the ``co_extra`` field making code objects >> mutable. The thinking seemed to be that having a field that was >> mutated after the creation of the code object made the object seem >> mutable, even though no other aspect of code objects changed. >> >> The view of this PEP is that the `co_extra` field doesn't change the >> fact that code objects are immutable. The field is specified in this >> PEP as to not contain information required to make the code object >> usable, making it more of a caching field. It could be viewed as >> similar to the UTF-8 cache that string objects have internally; >> strings are still considered immutable even though they have a field >> that is conditionally set. >> >> The field is also not strictly necessary. While the field greatly >> simplifies attaching extra information to code objects, other options >> such as keeping a mapping of code object memory addresses to what >> would have been kept in ``co_extra`` or perhaps using a weak reference >> of the data on the code object and then iterating through the weak >> references until the attached data is found is possible. But obviously >> all of these solutions are not as simple or performant as adding the >> ``co_extra`` field. >> >> >> Rejected Ideas >> ============== >> >> A JIT-specific C API >> -------------------- >> >> Originally this PEP was going to propose a much larger API change >> which was more JIT-specific. After soliciting feedback from the Numba >> team [#numba]_, though, it became clear that the API was unnecessarily >> large. The realization was made that all that was truly needed was the >> opportunity to provide a trampoline function to handle execution of >> Python code that had been JIT-compiled and a way to attach that >> compiled machine code along with other critical data to the >> corresponding Python code object. 
Once it was shown that there was no >> loss in functionality or in performance while minimizing the API >> changes required, the proposal was changed to its current form. >> >> >> References >> ========== >> >> .. [#pyjion] Pyjion project >> (https://github.com/microsoft/pyjion) >> >> .. [#c-api] CPython's C API >> (https://docs.python.org/3/c-api/index.html) >> >> .. [#pycodeobject] ``PyCodeObject`` >> (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) >> >> .. [#coreclr] .NET Core Runtime (CoreCLR) >> (https://github.com/dotnet/coreclr) >> >> .. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()`` >> ( >> https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx >> ) >> >> .. [#pycodeobject] ``PyCodeObject`` >> (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) >> >> .. [#numba] Numba >> (http://numba.pydata.org/) >> >> .. [#numba-interest] numba-users mailing list: >> "Would the C API for a JIT entrypoint being proposed by Pyjion help >> out Numba?" >> ( >> https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g >> ) >> >> .. [#code-object-count] [Python-Dev] Opcode cache in ceval loop >> ( >> https://mail.python.org/pipermail/python-dev/2016-February/143025.html) >> >> .. [#py-benchmarks] Python benchmark suite >> (https://hg.python.org/benchmarks) >> >> .. [#pyston] Pyston >> (http://pyston.org) >> >> .. [#pypy] PyPy >> (http://pypy.org/) >> >> .. [#ptvs] Python Tools for Visual Studio >> (http://microsoft.github.io/PTVS/) >> >> .. [#coconut] Coconut >> (https://github.com/davidmalcolm/coconut) >> >> >> Copyright >> ========= >> >> This document has been placed in the public domain. >> >> >> .. 
>> Local Variables: >> mode: indented-text >> indent-tabs-mode: nil >> sentence-end-double-space: t >> fill-column: 70 >> coding: utf-8 >> End: >> >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin-lists at theolliviers.com Sun Jun 19 12:12:36 2016 From: kevin-lists at theolliviers.com (Kevin Ollivier) Date: Sun, 19 Jun 2016 09:12:36 -0700 Subject: [Python-Dev] Discussion overload In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> Message-ID: Hi Nick, On 6/17/16, 6:12 PM, "Nick Coghlan" wrote: >On 16 June 2016 at 19:00, Kevin Ollivier wrote: >> Hi Guido, >> >> From: on behalf of Guido van Rossum >> >> Reply-To: >> Date: Thursday, June 16, 2016 at 5:27 PM >> To: Kevin Ollivier >> Cc: Python Dev >> Subject: Re: [Python-Dev] Discussion overload >> >> Hi Kevin, >> >> I often feel the same way. Are you using GMail? It combines related messages >> in threads and lets you mute threads. I often use this feature so I can >> manage my inbox. (I presume other mailers have the same features, but I >> don't know if all of them do.) There are also many people who read the list >> on a website, e.g. gmane. (Though I think that sometimes the delays incurred >> there add to the noise -- e.g. when a decision is reached on the list >> sometimes people keep responding to earlier threads.) >> >> >> I fear I did quite a poor job of making my point. :( I've been on open >> source mailing lists since the late 90s, so I've learned strategies for >> dealing with mailing list overload. I've got my mail folders, my mail rules, >> etc. 
Having been on many mailing lists over the years, I've seen many >> productive discussions and many unproductive ones, and over time you start >> to see patterns. You also see what happens to those communities over time. > >This is one of the major reasons we have the option of escalating >things to the PEP process (and that's currently in train for >os.urandom), as well as the SIGs for when folks really need to dig >into topics that risk incurring a relatively low signal-to-noise >ratio on python-dev. It's also why python-ideas was turned into a >separate list, since folks without the time for more speculative >discussions and brainstorming can safely ignore it, while remaining >confident that any ideas considered interesting enough for further >review will be brought to python-dev's attention. > >But yes, one of the more significant design errors I've made with the >contextlib API was due to just such a draining pile-on by folks that >weren't happy the original name wasn't a 100% accurate description of >the underlying mechanics (even though it was an accurate description >of the intended use case), and "people yelling at you on project >communication channels without doing adequate research first" is the >number one reason we see otherwise happily engaged core developers >decide to find something else to do with their time. Yeah, the sad truth is that when you start having these problems, it's the good people that leave. The key though is not to treat this as some unsolvable problem, which honestly is what I've seen many projects do. :( My guess is that once these issues are addressed, at least some of the people who left would be willing to give it another try. I had written a couple paragraphs about some different tools and approaches that might help with that, but I think Guido's got the right idea by taking it off-list to determine the best way to move forward first.
Regards, Kevin >The challenge and art in community management in that context is >balancing telling both old and new list participants "It's OK to ask >'Why is this so?', as sometimes the answer is that there isn't a good >reason and we may want to change it" and "Learn to be a good peer >manager, and avoid behaving like a micro-managing autocrat that chases >away experienced contributors". >Cheers, >Nick. > >-- >Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 19 15:39:14 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Jun 2016 12:39:14 -0700 Subject: [Python-Dev] security SIG? In-Reply-To: <576586B8.5090009@stoneleaf.us> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> <576586B8.5090009@stoneleaf.us> Message-ID: On 18 June 2016 at 10:36, Ethan Furman wrote: > One of the big advantages of a SIG is the much reduced pool of participants, > and that those participants are usually interested in forward progress. It > would also be helpful to have a single person both champion and act as > buffer for the proposals (not necessarily the same person each time). I am > reminded of the matrix-multiply PEP brought forward by Nathaniel a few > months ago -- the proposal was researched outside of py-dev, presented to > py-dev when ready, Nathaniel acted as the gateway between py-dev and those > that wanted/needed the change, the discussion stayed (pretty much) on track, > and it felt like the whole thing was very smooth. (If it was somebody else, > my apologies for my terrible memory! ;) > > To sum up: I think it would be a good idea. I'm coming around to this point of view as well. 
import-sig, for example, is a very low traffic SIG, but I think it serves three key useful purposes: - it clearly indicates that import is a specialist topic with additional considerations to take into account that may not be obvious to developers touching the import system for the first time - it provides a forum to collaboratively craft explanations of proposed changes that should make sense to folks that *aren't* specialists - anyone that wants to become an "import system expert" can join the SIG and learn from the intermittent discussions of proposed changes distutils-sig is an example at the other end of the scale - while distutils-sig and python-dev subscribers aren't a disjoint set, those of us that fall into the intersection are a clear minority on both lists, and can act as representatives of the interests of the other group when needed. As far as names go, my vote would be for "paranoia-sig" - it nicely avoids any risk of folks submitting security bugs there instead of to the PSRT, and "We're professionally paranoid, so you don't need to be" is an apt description of good security sensitive API design in a general purpose language like Python :) Cheers, Nick. P.S. Hopefully we could get some of the Python Cryptographic Authority folks to sign up, just as distutils-sig is a point of collaboration between python-dev and PyPA. "Secure software design in Python" covers a lot more than just the standard library, since in many cases you really want to reach beyond the standard library and grab something like cryptography or passlib, or delegate the problem to a domain specific framework like Django or the relevant components of the Flask or Pyramid ecosystems. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Sun Jun 19 15:54:47 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 19 Jun 2016 12:54:47 -0700 Subject: [Python-Dev] security SIG? 
In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> <576586B8.5090009@stoneleaf.us> Message-ID: <5766F887.6090302@stoneleaf.us> On 06/19/2016 12:39 PM, Nick Coghlan wrote: > On 18 June 2016 at 10:36, Ethan Furman wrote: >> To sum up: I think it would be a good idea. > > I'm coming around to this point of view as well. import-sig, for > example, is a very low traffic SIG, but I think it serves three key > useful purposes: > > - it clearly indicates that import is a specialist topic with > additional considerations to take into account that may not be obvious > to developers touching the import system for the first time > - it provides a forum to collaboratively craft explanations of > proposed changes that should make sense to folks that *aren't* > specialists > - anyone that wants to become an "import system expert" can join the > SIG and learn from the intermittent discussions of proposed changes [...] > As far as names go, my vote would be for "paranoia-sig" - it nicely > avoids any risk of folks submitting security bugs there instead of to > the PSRT, and "We're professionally paranoid, so you don't need to be" > is an apt description of good security sensitive API design in a > general purpose language like Python :) Heh. I like it. If no one comes up with any other names I'll get the SIG requested mid-week-ish. -- ~Ethan~ From guido at python.org Sun Jun 19 18:51:24 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 19 Jun 2016 15:51:24 -0700 Subject: [Python-Dev] security SIG? In-Reply-To: <5766F887.6090302@stoneleaf.us> References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> <576586B8.5090009@stoneleaf.us> <5766F887.6090302@stoneleaf.us> Message-ID: I think it's fine to have this SIG. 
I could see it going different ways in terms of discussions and membership, but it's definitely worth a try. I don't like clever names, and I very much doubt that it'll be mistaken for an address to report sensitive issues, so I think it should just be security-sig. (The sensitive-issues people are usually paranoid enough to check before they post; the script kiddies reporting python.org "issues" probably will get a faster and more appropriate response from the security-sig.) So let's just do it. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Sun Jun 19 21:29:45 2016 From: brett at python.org (Brett Cannon) Date: Mon, 20 Jun 2016 01:29:45 +0000 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On Sat, 18 Jun 2016 at 21:49 Guido van Rossum wrote: > Hi Brett, > > I've got a few questions about the specific design. Probably you know the > answers, it would be nice to have them in the PEP. > Once you're happy with my answers I'll update the PEP. > > First, why not have a global hook? What does a hook per interpreter give > you? Would even finer granularity buy anything? > We initially considered a per-code object hook, but we figured it was unnecessary to have that level of control, especially since people like Numba have gotten away with not needing it for this long (although I suspect that's because they are a decorator so they can just return an object that overrides __call__()). We didn't think that a global one was appropriate as different workloads may call for different JITs/debuggers/etc. and there is no guarantee that you are executing every interpreter with the same workload. Plus we figured people might simply import their JIT of choice and as a side-effect set the hook, and since imports are a per-interpreter thing that seemed to suggest the granularity of interpreters. 
IOW it seemed to be more in line with sys.settrace() than some global thing for the process. > > Next, I'm a bit (but no more than a bit) concerned about the extra 8 bytes > per code object, especially since for most people this is just waste > (assuming most people won't be using Pyjion or Numba). Could it be a > compile-time feature (requiring recompilation of CPython but not > extensions)? > Probably. It does water down potential usage thanks to needing a special build. If the decision is "special build or not", I would simply pull out this part of the proposal as I wouldn't want to add a flag that influences what is or is not possible for an interpreter. > Could you figure out some other way to store per-code-object data? It > seems you considered this but decided that the co_extra field was simpler > and faster; I'm basically pushing a little harder on this. Of course most > of the PEP would disappear without this feature; the extra interpreter > field is fine. > Dino and I thought of two potential alternatives, neither of which we have taken the time to implement and benchmark. One is to simply have a hash table of memory addresses to JIT data that is kept on the JIT side of things. Obviously it would be nice to avoid the overhead of a hash table lookup on every function call. This also doesn't help minimize memory when the code object gets GC'ed. The other potential solution we came up with was to use weakrefs. I have not looked into the details, but we were thinking that if we registered the JIT data object as a weakref on the code object, couldn't we iterate through the weakrefs attached to the code object to look for the JIT data object, and then get the reference that way? 
It would let us avoid a more expensive hash table lookup if we assume most code objects won't have a weakref on it (assuming weakrefs are stored in a list), and it gives us the proper cleanup semantics we want by getting the weakref cleanup callback execution to make sure we decref the JIT data object appropriately. But as I said, I have not looked into the feasibility of this at all to know if I'm remembering the weakref implementation details correctly. > > Finally, there are some error messages from pep2html.py: > https://www.python.org/dev/peps/pep-0523/#copyright > All fixed in https://github.com/python/peps/commit/6929f850a5af07e51d0163558a5fe8d6b85dccfe . -Brett > > > --Guido > > On Fri, Jun 17, 2016 at 7:58 PM, Brett Cannon wrote: > >> I have taken PEP 523 for this: >> https://github.com/python/peps/blob/master/pep-0523.txt . >> >> I'm waiting until Guido gets back from vacation, at which point I'll ask >> for a pronouncement or assignment of a BDFL delegate. >> >> On Fri, 3 Jun 2016 at 14:37 Brett Cannon wrote: >> >>> For those of you who follow python-ideas or were at the PyCon US 2016 >>> language summit, you have already seen/heard about this PEP. For those of >>> you who don't fall into either of those categories, this PEP proposed a >>> frame evaluation API for CPython. The motivating example of this work has >>> been Pyjion, the experimental CPython JIT Dino Viehland and I have been >>> working on in our spare time at Microsoft. The API also works for >>> debugging, though, as already demonstrated by Google having added a very >>> similar API internally for debugging purposes. >>> >>> The PEP is pasted in below and also available in rendered form at >>> https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I will assign >>> myself a PEP # once discussion is finished as it's easier to work in git >>> for this for the rich rendering of the in-progress PEP). 
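The per-interpreter granularity Brett describes above — each interpreter gets its own ``eval_frame`` slot, with a pointer comparison against the default to detect whether it was ever replaced — can be modeled in a few lines of Python. ``InterpreterState``, ``run()``, and the dict-based frame below are illustrative stand-ins for the C-level structures, not real CPython API.

```python
# Toy model of the per-interpreter frame evaluation hook: the default
# evaluator is shared, but each "interpreter" can swap in its own.

def eval_frame_default(frame):
    return frame["func"](frame["arg"])


class InterpreterState:
    def __init__(self):
        self.eval_frame = eval_frame_default  # per-interpreter slot


def run(interp, frame):
    # What PyEval_EvalFrameEx() would do: delegate to the current hook.
    return interp.eval_frame(frame)


interp_a = InterpreterState()
interp_b = InterpreterState()

seen = []
def tracing_eval(frame):
    seen.append(frame["arg"])          # a debugger/JIT does its work...
    return eval_frame_default(frame)   # ...then defers to the default

interp_b.eval_frame = tracing_eval     # only interpreter B is hooked

frame = {"func": lambda x: x + 1, "arg": 41}
```

Running the same frame through both interpreters yields the same result, but only interpreter B's hook observes the call — and the identity check ``interp_a.eval_frame is eval_frame_default`` mirrors the PEP's pointer comparison against ``PyEval_EvalFrameDefault()``.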
>>> >>> I should mention that the differences from python-ideas and the >>> language summit in the PEP are the listed support from Google's use of a >>> very similar API as well as clarifying that the co_extra field on code objects >>> doesn't change their immutability (at least from the view of the PEP). >>> >>> ---------- >>> PEP: NNN >>> Title: Adding a frame evaluation API to CPython >>> Version: $Revision$ >>> Last-Modified: $Date$ >>> Author: Brett Cannon , >>> Dino Viehland >>> Status: Draft >>> Type: Standards Track >>> Content-Type: text/x-rst >>> Created: 16-May-2016 >>> Post-History: 16-May-2016 >>> 03-Jun-2016 >>> >>> >>> Abstract >>> ======== >>> >>> This PEP proposes to expand CPython's C API [#c-api]_ to allow for >>> the specification of a per-interpreter function pointer to handle the >>> evaluation of frames [#pyeval_evalframeex]_. This proposal also >>> suggests adding a new field to code objects [#pycodeobject]_ to store >>> arbitrary data for use by the frame evaluation function. >>> >>> >>> Rationale >>> ========= >>> >>> One place where flexibility has been lacking in Python is in the direct >>> execution of Python code. While CPython's C API [#c-api]_ allows for >>> constructing the data going into a frame object and then evaluating it >>> via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control over the >>> execution of Python code comes down to individual objects instead of a >>> holistic control of execution at the frame level. >>> >>> While wanting to have influence over frame evaluation may seem a bit >>> too low-level, it does open the possibility for things such as a >>> method-level JIT to be introduced into CPython without CPython itself >>> having to provide one. By allowing external C code to control frame >>> evaluation, a JIT can participate in the execution of Python code at >>> the key point where evaluation occurs.
This then allows for a JIT to >>> conditionally recompile Python bytecode to machine code as desired >>> while still allowing for executing regular CPython bytecode when >>> running the JIT is not desired. This can be accomplished by allowing >>> interpreters to specify what function to call to evaluate a frame. And >>> by placing the API at the frame evaluation level it allows for a >>> complete view of the execution environment of the code for the JIT. >>> >>> This ability to specify a frame evaluation function also allows for >>> other use-cases beyond just opening CPython up to a JIT. For instance, >>> it would not be difficult to implement a tracing or profiling function >>> at the call level with this API. While CPython does provide the >>> ability to set a tracing or profiling function at the Python level, >>> this would be able to match the data collection of the profiler and >>> quite possibly be faster for tracing by simply skipping per-line >>> tracing support. >>> >>> It also opens up the possibility of debugging where the frame >>> evaluation function only performs special debugging work when it >>> detects it is about to execute a specific code object. In that >>> instance the bytecode could be theoretically rewritten in-place to >>> inject a breakpoint function call at the proper point for help in >>> debugging while not having to do a heavy-handed approach as >>> required by ``sys.settrace()``. >>> >>> To help facilitate these use-cases, we are also proposing the adding >>> of a "scratch space" on code objects via a new field. This will allow >>> per-code object data to be stored with the code object itself for easy >>> retrieval by the frame evaluation function as necessary. The field >>> itself will simply be a ``PyObject *`` type so that any data stored in >>> the field will participate in normal object memory management. >>> >>> >>> Proposal >>> ======== >>> >>> All proposed C API changes below will not be part of the stable ABI. 
>>> >>> Expanding ``PyCodeObject`` >>> -------------------------- >>> >>> One field is to be added to the ``PyCodeObject`` struct >>> [#pycodeobject]_:: >>> >>> typedef struct { >>> ... >>> PyObject *co_extra; /* "Scratch space" for the code object. */ >>> } PyCodeObject; >>> >>> The ``co_extra`` field will be ``NULL`` by default and will not be used by >>> CPython itself. Third-party code is free to use the field as desired. >>> Values stored in the field are expected to not be required in order >>> for the code object to function, allowing the loss of the data of the >>> field to be acceptable (this keeps the code object as immutable from >>> a functionality point-of-view; this is slightly contentious and so is >>> listed as an open issue in `Is co_extra needed?`_). The field will be >>> freed like all other fields on ``PyCodeObject`` during deallocation >>> using ``Py_XDECREF()``. >>> >>> It is not recommended that multiple users attempt to use the >>> ``co_extra`` field simultaneously. While a dictionary could theoretically be >>> set to the field and various users could use a key specific to the >>> project, there is still the issue of key collisions as well as >>> performance degradation from using a dictionary lookup on every frame >>> evaluation. Users are expected to do a type check to make sure that >>> the field has not been previously set by someone else. >>> >>> >>> Expanding ``PyInterpreterState`` >>> -------------------------------- >>> >>> The entry point for the frame evaluation function is per-interpreter:: >>> >>> // Same type signature as PyEval_EvalFrameEx(). >>> typedef PyObject* (__stdcall *PyFrameEvalFunction)(PyFrameObject*, >>> int); >>> >>> typedef struct { >>> ... >>> PyFrameEvalFunction eval_frame; >>> } PyInterpreterState; >>> >>> By default, the ``eval_frame`` field will be initialized to a function >>> pointer that represents what ``PyEval_EvalFrameEx()`` currently is >>> (called ``PyEval_EvalFrameDefault()``, discussed later in this PEP).
>>> Third-party code may then set their own frame evaluation function >>> instead to control the execution of Python code. A pointer comparison >>> can be used to detect if the field is set to >>> ``PyEval_EvalFrameDefault()`` and thus has not been mutated yet. >>> >>> >>> Changes to ``Python/ceval.c`` >>> ----------------------------- >>> >>> ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it currently stands >>> will be renamed to ``PyEval_EvalFrameDefault()``. The new >>> ``PyEval_EvalFrameEx()`` will then become:: >>> >>> PyObject * >>> PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag) >>> { >>> PyThreadState *tstate = PyThreadState_GET(); >>> return tstate->interp->eval_frame(frame, throwflag); >>> } >>> >>> This allows third-party code to place themselves directly in the path >>> of Python code execution while being backwards-compatible with code >>> already using the pre-existing C API. >>> >>> >>> Updating ``python-gdb.py`` >>> -------------------------- >>> >>> The generated ``python-gdb.py`` file used for Python support in GDB >>> makes some hard-coded assumptions about ``PyEval_EvalFrameEx()``, e.g. >>> the names of local variables. It will need to be updated to work with >>> the proposed changes. >>> >>> >>> Performance impact >>> ================== >>> >>> As this PEP is proposing an API to add pluggability, performance >>> impact is considered only in the case where no third-party code has >>> made any changes. >>> >>> Several runs of pybench [#pybench]_ consistently showed no performance >>> cost from the API change alone. >>> >>> A run of the Python benchmark suite [#py-benchmarks]_ showed no >>> measurable cost in performance. >>> >>> In terms of memory impact, since there are typically not many CPython >>> interpreters executing in a single process that means the impact of >>> ``co_extra`` being added to ``PyCodeObject`` is the only worry. 
>>> According to [#code-object-count]_, a run of the Python test suite >>> results in about 72,395 code objects being created. On a 64-bit >>> CPU that would result in 579,160 bytes of extra memory being used if >>> all code objects were alive at once and had nothing set in their >>> ``co_extra`` fields. >>> >>> >>> Example Usage >>> ============= >>> >>> A JIT for CPython >>> ----------------- >>> >>> Pyjion >>> '''''' >>> >>> The Pyjion project [#pyjion]_ has used this proposed API to implement >>> a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each code >>> object has its ``co_extra`` field set to a ``PyjionJittedCode`` object >>> which stores four pieces of information: >>> >>> 1. Execution count >>> 2. A boolean representing whether a previous attempt to JIT failed >>> 3. A function pointer to a trampoline (which can be type tracing or not) >>> 4. A void pointer to any JIT-compiled machine code >>> >>> The frame evaluation function has (roughly) the following algorithm:: >>> >>> def eval_frame(frame, throw_flag): >>> pyjion_code = frame.code.co_extra >>> if not pyjion_code: >>> frame.code.co_extra = PyjionJittedCode() >>> elif not pyjion_code.jit_failed: >>> if pyjion_code.jit_code: >>> return pyjion_code.eval(pyjion_code.jit_code, frame) >>> elif pyjion_code.exec_count > 20_000: >>> if jit_compile(frame): >>> return pyjion_code.eval(pyjion_code.jit_code, frame) >>> else: >>> pyjion_code.jit_failed = True >>> pyjion_code.exec_count += 1 >>> return PyEval_EvalFrameDefault(frame, throw_flag) >>> >>> The key point, though, is that all of this work and logic is separate >>> from CPython and yet with the proposed API changes it is able to >>> provide a JIT that is compliant with Python semantics (as of this >>> writing, performance is almost equivalent to CPython without the new >>> API). This means there's nothing technically preventing others from >>> implementing their own JITs for CPython by utilizing the proposed API.
>>> >>> >>> Other JITs >>> '''''''''' >>> >>> It should be mentioned that the Pyston team was consulted on an >>> earlier version of this PEP that was more JIT-specific and they were >>> not interested in utilizing the changes proposed because they want >>> control over memory layout and had no interest in directly supporting >>> CPython itself. An informal discussion with a developer on the PyPy >>> team led to a similar comment. >>> >>> Numba [#numba]_, on the other hand, suggested that they would be >>> interested in the proposed change in a post-1.0 future for >>> themselves [#numba-interest]_. >>> >>> The experimental Coconut JIT [#coconut]_ could have benefitted from >>> this PEP. In private conversations with Coconut's creator we were told >>> that our API was probably superior to the one they developed for >>> Coconut to add JIT support to CPython. >>> >>> >>> Debugging >>> --------- >>> >>> In conversations with the Python Tools for Visual Studio team (PTVS) >>> [#ptvs]_, they thought they would find these API changes useful for >>> implementing more performant debugging. As mentioned in the Rationale_ >>> section, this API would allow for switching on debugging functionality >>> only in frames where it is needed. This could allow for either >>> skipping information that ``sys.settrace()`` normally provides or >>> even going as far as to dynamically rewrite bytecode prior to execution >>> to inject e.g. breakpoints in the bytecode. >>> >>> It also turns out that Google has provided a very similar API >>> internally for years. It has been used for performant debugging >>> purposes. >>> >>> >>> Implementation >>> ============== >>> >>> A set of patches implementing the proposed API is available through >>> the Pyjion project [#pyjion]_. In its current form it has more >>> changes to CPython than just this proposed API, but that is for ease >>> of development instead of strict requirements to accomplish its goals.
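The selective-debugging idea in the Debugging section above — only pay the instrumentation cost when the frame about to execute belongs to a watched code object — can be sketched in pure Python. The function names and the set-based "breakpoint table" are hypothetical stand-ins for what a real tool would do in C against real code objects.

```python
# Sketch of a replacement frame evaluation function that instruments
# only watched code and falls through to normal evaluation otherwise.

watched = set()   # "code objects" we want breakpoint-style reporting on
hits = []


def default_eval(func, arg):
    return func(arg)


def debugging_eval(func, arg):
    if func in watched:                    # cheap membership test per call
        hits.append((func.__name__, arg))  # "expensive" work only here
    return default_eval(func, arg)


def hot_path(x):
    return x * x


def buggy(x):
    return x + 1


watched.add(buggy)
out = [debugging_eval(hot_path, 3), debugging_eval(buggy, 3)]
```

Unlike ``sys.settrace()``, which fires for every frame once enabled, the untraced ``hot_path`` call pays only the membership test.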
>>> >>> >>> Open Issues >>> =========== >>> >>> Allow ``eval_frame`` to be ``NULL`` >>> ----------------------------------- >>> >>> Currently the frame evaluation function is expected to always be set. >>> It could very easily simply default to ``NULL`` instead which would >>> signal to use ``PyEval_EvalFrameDefault()``. The current proposal of >>> not special-casing the field seemed the most straight-forward, but it >>> does require that the field not accidentally be cleared, else a crash >>> may occur. >>> >>> >>> Is co_extra needed? >>> ------------------- >>> >>> While discussing this PEP at PyCon US 2016, some core developers >>> expressed their worry of the ``co_extra`` field making code objects >>> mutable. The thinking seemed to be that having a field that was >>> mutated after the creation of the code object made the object seem >>> mutable, even though no other aspect of code objects changed. >>> >>> The view of this PEP is that the `co_extra` field doesn't change the >>> fact that code objects are immutable. The field is specified in this >>> PEP as to not contain information required to make the code object >>> usable, making it more of a caching field. It could be viewed as >>> similar to the UTF-8 cache that string objects have internally; >>> strings are still considered immutable even though they have a field >>> that is conditionally set. >>> >>> The field is also not strictly necessary. While the field greatly >>> simplifies attaching extra information to code objects, other options >>> such as keeping a mapping of code object memory addresses to what >>> would have been kept in ``co_extra`` or perhaps using a weak reference >>> of the data on the code object and then iterating through the weak >>> references until the attached data is found is possible. But obviously >>> all of these solutions are not as simple or performant as adding the >>> ``co_extra`` field. 
>>> >>> >>> Rejected Ideas >>> ============== >>> >>> A JIT-specific C API >>> -------------------- >>> >>> Originally this PEP was going to propose a much larger API change >>> which was more JIT-specific. After soliciting feedback from the Numba >>> team [#numba]_, though, it became clear that the API was unnecessarily >>> large. The realization was made that all that was truly needed was the >>> opportunity to provide a trampoline function to handle execution of >>> Python code that had been JIT-compiled and a way to attach that >>> compiled machine code along with other critical data to the >>> corresponding Python code object. Once it was shown that there was no >>> loss in functionality or in performance while minimizing the API >>> changes required, the proposal was changed to its current form. >>> >>> >>> References >>> ========== >>> >>> .. [#pyjion] Pyjion project >>> (https://github.com/microsoft/pyjion) >>> >>> .. [#c-api] CPython's C API >>> (https://docs.python.org/3/c-api/index.html) >>> >>> .. [#pycodeobject] ``PyCodeObject`` >>> (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) >>> >>> .. [#coreclr] .NET Core Runtime (CoreCLR) >>> (https://github.com/dotnet/coreclr) >>> >>> .. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()`` >>> ( >>> https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx >>> ) >>> >>> .. [#pycodeobject] ``PyCodeObject`` >>> (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) >>> >>> .. [#numba] Numba >>> (http://numba.pydata.org/) >>> >>> .. [#numba-interest] numba-users mailing list: >>> "Would the C API for a JIT entrypoint being proposed by Pyjion help >>> out Numba?" >>> ( >>> https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g >>> ) >>> >>> .. [#code-object-count] [Python-Dev] Opcode cache in ceval loop >>> ( >>> https://mail.python.org/pipermail/python-dev/2016-February/143025.html) >>> >>> .. 
[#py-benchmarks] Python benchmark suite >>> (https://hg.python.org/benchmarks) >>> >>> .. [#pyston] Pyston >>> (http://pyston.org) >>> >>> .. [#pypy] PyPy >>> (http://pypy.org/) >>> >>> .. [#ptvs] Python Tools for Visual Studio >>> (http://microsoft.github.io/PTVS/) >>> >>> .. [#coconut] Coconut >>> (https://github.com/davidmalcolm/coconut) >>> >>> >>> Copyright >>> ========= >>> >>> This document has been placed in the public domain. >>> >>> >>> .. >>> Local Variables: >>> mode: indented-text >>> indent-tabs-mode: nil >>> sentence-end-double-space: t >>> fill-column: 70 >>> coding: utf-8 >>> End: >>> >>> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org >> >> > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sun Jun 19 22:14:37 2016 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 20 Jun 2016 03:14:37 +0100 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On 2016-06-20 02:29, Brett Cannon wrote: > > On Sat, 18 Jun 2016 at 21:49 Guido van Rossum > wrote: > [snip] > > Could you figure out some other way to store per-code-object data? > It seems you considered this but decided that the co_extra field was > simpler and faster; I'm basically pushing a little harder on this. > Of course most of the PEP would disappear without this feature; the > extra interpreter field is fine. > > Dino and I thought of two potential alternatives, neither of which we > have taken the time to implement and benchmark. One is to simply have a > hash table of memory addresses to JIT data that is kept on the JIT side > of things. Obviously it would be nice to avoid the overhead of a hash > table lookup on every function call. 
This also doesn't help minimize > memory when the code object gets GC'ed. > [snip] If you had a flag in co_flags that said whether it should look in the hash table, then that might reduce the overhead. From guido at python.org Sun Jun 19 22:36:36 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 19 Jun 2016 19:36:36 -0700 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On Sun, Jun 19, 2016 at 6:29 PM, Brett Cannon wrote: > > > On Sat, 18 Jun 2016 at 21:49 Guido van Rossum wrote: > >> Hi Brett, >> >> I've got a few questions about the specific design. Probably you know the >> answers, it would be nice to have them in the PEP. >> > > Once you're happy with my answers I'll update the PEP. > Soon! > > >> >> First, why not have a global hook? What does a hook per interpreter give >> you? Would even finer granularity buy anything? >> > > We initially considered a per-code object hook, but we figured it was > unnecessary to have that level of control, especially since people like > Numba have gotten away with not needing it for this long (although I > suspect that's because they are a decorator so they can just return an > object that overrides __call__()). > So they do it at the function object level? > We didn't think that a global one was appropriate as different workloads > may call for different JITs/debuggers/etc. and there is no guarantee that > you are executing every interpreter with the same workload. Plus we figured > people might simply import their JIT of choice and as a side-effect set the > hook, and since imports are a per-interpreter thing that seemed to suggest > the granularity of interpreters. > I like import as the argument here. > > IOW it seemed to be more in line with sys.settrace() than some global > thing for the process. 
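MRAB's co_flags suggestion could be modelled in Python roughly like this (a hypothetical sketch: the flag value and all names are invented, and a dict stands in for the C-level hash table):

```python
CO_HAS_EXTRA = 0x1000000  # made-up co_flags bit meaning "has a side-table entry"

class FlaggedCode:
    # Stand-in for a code object carrying a co_flags bitfield.
    def __init__(self):
        self.co_flags = 0

_flagged_table = {}  # id(code) -> JIT data

def tag_extra(code, data):
    _flagged_table[id(code)] = data
    code.co_flags |= CO_HAS_EXTRA

def lookup_extra(code):
    # Fast path: an untagged code object never touches the hash table,
    # so the common (non-JITed) case only pays for one bit test.
    if not (code.co_flags & CO_HAS_EXTRA):
        return None
    return _flagged_table[id(code)]

plain, jitted = FlaggedCode(), FlaggedCode()
tag_extra(jitted, "jitted machine code")
assert lookup_extra(plain) is None
assert lookup_extra(jitted) == "jitted machine code"
```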
> > >> >> Next, I'm a bit (but no more than a bit) concerned about the extra 8 >> bytes per code object, especially since for most people this is just waste >> (assuming most people won't be using Pyjion or Numba). Could it be a >> compile-time feature (requiring recompilation of CPython but not >> extensions)? >> > > Probably. It does water down potential usage thanks to needing a special > build. If the decision is "special build or not", I would simply pull out > this part of the proposal as I wouldn't want to add a flag that influences > what is or is not possible for an interpreter. > MRAB's response made me think of a possible approach: the co_extra field could be the very last field of the PyCodeObject struct and only present if a certain flag is set in co_flags. This is similar to a trick used by X11 (I know, it's long ago :-). > > >> Could you figure out some other way to store per-code-object data? It >> seems you considered this but decided that the co_extra field was simpler >> and faster; I'm basically pushing a little harder on this. Of course most >> of the PEP would disappear without this feature; the extra interpreter >> field is fine. >> > > Dino and I thought of two potential alternatives, neither of which we have > taken the time to implement and benchmark. One is to simply have a hash > table of memory addresses to JIT data that is kept on the JIT side of > things. Obviously it would be nice to avoid the overhead of a hash table > lookup on every function call. This also doesn't help minimize memory when > the code object gets GC'ed. > I guess the prospect of the extra hash lookup per call isn't great given that this is about perf... > > The other potential solution we came up with was to use weakrefs. 
I have > not looked into the details, but we were thinking that if we registered the > JIT data object as a weakref on the code object, couldn't we iterate > through the weakrefs attached to the code object to look for the JIT data > object, and then get the reference that way? It would let us avoid a more > expensive hash table lookup if we assume most code objects won't have a > weakref on it (assuming weakrefs are stored in a list), and it gives us the > proper cleanup semantics we want by getting the weakref cleanup callback > execution to make sure we decref the JIT data object appropriately. But as > I said, I have not looked into the feasibility of this at all to know if > I'm remembering the weakref implementation details correctly. > That would be even slower than the hash table lookup, and unbounded. So let's not go there. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at hotpy.org Mon Jun 20 00:01:16 2016 From: mark at hotpy.org (Mark Shannon) Date: Sun, 19 Jun 2016 21:01:16 -0700 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: <57676A8C.8070207@hotpy.org> On 19/06/16 18:29, Brett Cannon wrote: > > > On Sat, 18 Jun 2016 at 21:49 Guido van Rossum > wrote: > > Hi Brett, > > I've got a few questions about the specific design. Probably you > know the answers, it would be nice to have them in the PEP. > > > Once you're happy with my answers I'll update the PEP. > > > First, why not have a global hook? What does a hook per interpreter > give you? Would even finer granularity buy anything? > > > We initially considered a per-code object hook, but we figured it was > unnecessary to have that level of control, especially since people like > Numba have gotten away with not needing it for this long (although I > suspect that's because they are a decorator so they can just return an > object that overrides __call__()). 
We didn't think that a global one was > appropriate as different workloads may call for different > JITs/debuggers/etc. and there is no guarantee that you are executing > every interpreter with the same workload. Plus we figured people might > simply import their JIT of choice and as a side-effect set the hook, and > since imports are a per-interpreter thing that seemed to suggest the > granularity of interpreters. > > IOW it seemed to be more in line with sys.settrace() than some global > thing for the process. > > > Next, I'm a bit (but no more than a bit) concerned about the extra 8 > bytes per code object, especially since for most people this is just > waste (assuming most people won't be using Pyjion or Numba). Could > it be a compile-time feature (requiring recompilation of CPython but > not extensions)? > > > Probably. It does water down potential usage thanks to needing a special > build. If the decision is "special build or not", I would simply pull > out this part of the proposal as I wouldn't want to add a flag that > influences what is or is not possible for an interpreter. > > Could you figure out some other way to store per-code-object data? > It seems you considered this but decided that the co_extra field was > simpler and faster; I'm basically pushing a little harder on this. > Of course most of the PEP would disappear without this feature; the > extra interpreter field is fine. > > > Dino and I thought of two potential alternatives, neither of which we > have taken the time to implement and benchmark. One is to simply have a > hash table of memory addresses to JIT data that is kept on the JIT side > of things. Obviously it would be nice to avoid the overhead of a hash > table lookup on every function call. This also doesn't help minimize > memory when the code object gets GC'ed. Hash lookups aren't that slow. 
If you combine it with the custom flags suggested by MRAB, then you would only suffer the lookup penalty when actually entering the special interpreter. You can use a weakref callback to ensure things get GC'd properly. Also, if there is a special extra field on code-object, then everyone will want to use it. How do you handle clashes? > > The other potential solution we came up with was to use weakrefs. I have > not looked into the details, but we were thinking that if we registered > the JIT data object as a weakref on the code object, couldn't we iterate > through the weakrefs attached to the code object to look for the JIT > data object, and then get the reference that way? It would let us avoid > a more expensive hash table lookup if we assume most code objects won't > have a weakref on it (assuming weakrefs are stored in a list), and it > gives us the proper cleanup semantics we want by getting the weakref > cleanup callback execution to make sure we decref the JIT data object > appropriately. But as I said, I have not looked into the feasibility of > this at all to know if I'm remembering the weakref implementation > details correctly. > > > Finally, there are some error messages from pep2html.py: > https://www.python.org/dev/peps/pep-0523/#copyright > > > All fixed in > https://github.com/python/peps/commit/6929f850a5af07e51d0163558a5fe8d6b85dccfe . > > -Brett > > > > --Guido > > On Fri, Jun 17, 2016 at 7:58 PM, Brett Cannon > wrote: > > I have taken PEP 523 for this: > https://github.com/python/peps/blob/master/pep-0523.txt . > > I'm waiting until Guido gets back from vacation, at which point > I'll ask for a pronouncement or assignment of a BDFL delegate. > > On Fri, 3 Jun 2016 at 14:37 Brett Cannon > wrote: > > For those of you who follow python-ideas or were at the > PyCon US 2016 language summit, you have already seen/heard > about this PEP. 
For those of you who don't fall into either > of those categories, this PEP proposed a frame evaluation > API for CPython. The motivating example of this work has > been Pyjion, the experimental CPython JIT Dino Viehland and > I have been working on in our spare time at Microsoft. The > API also works for debugging, though, as already > demonstrated by Google having added a very similar API > internally for debugging purposes. > > The PEP is pasted in below and also available in rendered > form at > https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I > will assign myself a PEP # once discussion is finished as > it's easier to work in git for this for the rich rendering > of the in-progress PEP). > > I should mention that the difference from python-ideas and > the language summit in the PEP are the listed support from > Google's use of a very similar API as well as clarifying the > co_extra field on code objects doesn't change their > immutability (at least from the view of the PEP). > > ---------- > PEP: NNN > Title: Adding a frame evaluation API to CPython > Version: $Revision$ > Last-Modified: $Date$ > Author: Brett Cannon >, > Dino Viehland > > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 16-May-2016 > Post-History: 16-May-2016 > 03-Jun-2016 > > > Abstract > ======== > > This PEP proposes to expand CPython's C API [#c-api]_ to > allow for > the specification of a per-interpreter function pointer to > handle the > evaluation of frames [#pyeval_evalframeex]_. This proposal also > suggests adding a new field to code objects [#pycodeobject]_ > to store > arbitrary data for use by the frame evaluation function. > > > Rationale > ========= > > One place where flexibility has been lacking in Python is in > the direct > execution of Python code. 
While CPython's C API [#c-api]_ > allows for > constructing the data going into a frame object and then > evaluating it > via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control > over the > execution of Python code comes down to individual objects > instead of a > holistic control of execution at the frame level. > > While wanting to have influence over frame evaluation may > seem a bit > too low-level, it does open the possibility for things such as a > method-level JIT to be introduced into CPython without > CPython itself > having to provide one. By allowing external C code to > control frame > evaluation, a JIT can participate in the execution of Python > code at > the key point where evaluation occurs. This then allows for > a JIT to > conditionally recompile Python bytecode to machine code as > desired > while still allowing for executing regular CPython bytecode when > running the JIT is not desired. This can be accomplished by > allowing > interpreters to specify what function to call to evaluate a > frame. And > by placing the API at the frame evaluation level it allows for a > complete view of the execution environment of the code for > the JIT. > > This ability to specify a frame evaluation function also > allows for > other use-cases beyond just opening CPython up to a JIT. For > instance, > it would not be difficult to implement a tracing or > profiling function > at the call level with this API. While CPython does provide the > ability to set a tracing or profiling function at the Python > level, > this would be able to match the data collection of the > profiler and > quite possibly be faster for tracing by simply skipping per-line > tracing support. > > It also opens up the possibility of debugging where the frame > evaluation function only performs special debugging work when it > detects it is about to execute a specific code object. 
In that > instance the bytecode could be theoretically rewritten > in-place to > inject a breakpoint function call at the proper point for > help in > debugging while not having to take a heavy-handed approach as > required by ``sys.settrace()``. > > To help facilitate these use-cases, we are also proposing > the addition > of a "scratch space" on code objects via a new field. This > will allow > per-code object data to be stored with the code object > itself for easy > retrieval by the frame evaluation function as necessary. The > field > itself will simply be a ``PyObject *`` type so that any data > stored in > the field will participate in normal object memory management. > > > Proposal > ======== > > All proposed C API changes below will not be part of the > stable ABI. > > > Expanding ``PyCodeObject`` > -------------------------- > > One field is to be added to the ``PyCodeObject`` struct > [#pycodeobject]_::
>
>     typedef struct {
>         ...
>         PyObject *co_extra;  /* "Scratch space" for the code object. */
>     } PyCodeObject;
>
> The ``co_extra`` will be ``NULL`` by default and will not be > used by > CPython itself. Third-party code is free to use the field as > desired. > Values stored in the field are expected not to be required > in order > for the code object to function, allowing the loss of the > data of the > field to be acceptable (this keeps the code object as > immutable from > a functionality point-of-view; this is slightly contentious > and so is > listed as an open issue in `Is co_extra needed?`_). The > field will be > freed like all other fields on ``PyCodeObject`` during > deallocation > using ``Py_XDECREF()``. > > It is not recommended that multiple users attempt to use the > ``co_extra`` simultaneously. 
While a dictionary could > theoretically be > set to the field and various users could use a key specific > to the > project, there is still the issue of key collisions as well as > performance degradation from using a dictionary lookup on > every frame > evaluation. Users are expected to do a type check to make > sure that > the field has not been previously set by someone else. > > > Expanding ``PyInterpreterState`` > -------------------------------- > > The entrypoint for the frame evaluation function is > per-interpreter::
>
>     // Same type signature as PyEval_EvalFrameEx().
>     typedef PyObject* (__stdcall *PyFrameEvalFunction)(PyFrameObject*, int);
>
>     typedef struct {
>         ...
>         PyFrameEvalFunction eval_frame;
>     } PyInterpreterState;
>
> By default, the ``eval_frame`` field will be initialized to > a function > pointer that represents what ``PyEval_EvalFrameEx()`` > currently is > (called ``PyEval_EvalFrameDefault()``, discussed later in > this PEP). > Third-party code may then set their own frame evaluation > function > instead to control the execution of Python code. A pointer > comparison > can be used to detect if the field is set to > ``PyEval_EvalFrameDefault()`` and thus has not been mutated yet. > > > Changes to ``Python/ceval.c`` > ----------------------------- > > ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it > currently stands > will be renamed to ``PyEval_EvalFrameDefault()``. The new > ``PyEval_EvalFrameEx()`` will then become::
>
>     PyObject *
>     PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag)
>     {
>         PyThreadState *tstate = PyThreadState_GET();
>         return tstate->interp->eval_frame(frame, throwflag);
>     }
>
> This allows third-party code to place itself directly in > the path
> > > Updating ``python-gdb.py`` > -------------------------- > > The generated ``python-gdb.py`` file used for Python support > in GDB > makes some hard-coded assumptions about > ``PyEval_EvalFrameEx()``, e.g. > the names of local variables. It will need to be updated to > work with > the proposed changes. > > > Performance impact > ================== > > As this PEP is proposing an API to add pluggability, performance > impact is considered only in the case where no third-party > code has > made any changes. > > Several runs of pybench [#pybench]_ consistently showed no > performance > cost from the API change alone. > > A run of the Python benchmark suite [#py-benchmarks]_ showed no > measurable cost in performance. > > In terms of memory impact, since there are typically not > many CPython > interpreters executing in a single process that means the > impact of > ``co_extra`` being added to ``PyCodeObject`` is the only worry. > According to [#code-object-count]_, a run of the Python test > suite > results in about 72,395 code objects being created. On a 64-bit > CPU that would result in 579,160 bytes of extra memory being > used if > all code objects were alive at once and had nothing set in their > ``co_extra`` fields. > > > Example Usage > ============= > > A JIT for CPython > ----------------- > > Pyjion > '''''' > > The Pyjion project [#pyjion]_ has used this proposed API to > implement > a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each code > object has its ``co_extra`` field set to a > ``PyjionJittedCode`` object > which stores four pieces of information: > > 1. Execution count > 2. A boolean representing whether a previous attempt to JIT > failed > 3. A function pointer to a trampoline (which can be type > tracing or not) > 4. 
A void pointer to any JIT-compiled machine code > > The frame evaluation function has (roughly) the following > algorithm::
>
>     def eval_frame(frame, throw_flag):
>         pyjion_code = frame.code.co_extra
>         if not pyjion_code:
>             frame.code.co_extra = PyjionJittedCode()
>         elif not pyjion_code.jit_failed:
>             if pyjion_code.jit_code:
>                 return pyjion_code.eval(pyjion_code.jit_code, frame)
>             elif pyjion_code.exec_count > 20_000:
>                 if jit_compile(frame):
>                     return pyjion_code.eval(pyjion_code.jit_code, frame)
>                 else:
>                     pyjion_code.jit_failed = True
>         pyjion_code.exec_count += 1
>         return PyEval_EvalFrameDefault(frame, throw_flag)
>
> The key point, though, is that all of this work and logic is > separate > from CPython and yet with the proposed API changes it is able to > provide a JIT that is compliant with Python semantics (as of > this > writing, performance is almost equivalent to CPython without > the new > API). This means there's nothing technically preventing > others from > implementing their own JITs for CPython by utilizing the > proposed API. > > > Other JITs > '''''''''' > > It should be mentioned that the Pyston team was consulted on an > earlier version of this PEP that was more JIT-specific and > they were > not interested in utilizing the changes proposed because > they want > control over memory layout and had no interest in directly > supporting > CPython itself. An informal discussion with a developer on > the PyPy > team led to a similar comment. > > Numba [#numba]_, on the other hand, suggested that they would be > interested in the proposed change in a post-1.0 future for > themselves [#numba-interest]_. > > The experimental Coconut JIT [#coconut]_ could have > benefitted from > this PEP. In private conversations with Coconut's creator we > were told > that our API was probably superior to the one they developed for > Coconut to add JIT support to CPython. 
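The dispatch logic in the quoted pseudocode above can be turned into a runnable sketch by stubbing out the Pyjion pieces. ``PyjionJittedCode``, ``jit_compile``, the trampoline, and the lowered threshold are all invented stand-ins, not the real Pyjion code, and the inner condition is written so the trampoline runs only once machine code actually exists, which is presumably the intent.

```python
class PyjionJittedCode:
    def __init__(self):
        self.exec_count = 0
        self.jit_failed = False
        self.jit_code = None
    def eval(self, jit_code, frame):
        return ("jit", jit_code(frame))

def jit_compile(frame):
    # Stub: "compile" every frame successfully.
    frame.code.co_extra.jit_code = lambda f: f.result
    return True

THRESHOLD = 3  # the PEP uses 20_000

class Code:
    def __init__(self):
        self.co_extra = None  # the proposed scratch-space field

class Frame:
    def __init__(self, code, result):
        self.code, self.result = code, result

def interpret_frame(frame):
    # Stand-in for PyEval_EvalFrameDefault().
    return ("interp", frame.result)

def eval_frame(frame):
    pyjion_code = frame.code.co_extra
    if pyjion_code is None:
        pyjion_code = frame.code.co_extra = PyjionJittedCode()
    elif not pyjion_code.jit_failed:
        if pyjion_code.jit_code is not None:
            return pyjion_code.eval(pyjion_code.jit_code, frame)
        elif pyjion_code.exec_count > THRESHOLD:
            if jit_compile(frame):
                return pyjion_code.eval(pyjion_code.jit_code, frame)
            pyjion_code.jit_failed = True
    pyjion_code.exec_count += 1
    return interpret_frame(frame)

code = Code()
results = [eval_frame(Frame(code, 7)) for _ in range(6)]
assert results[0] == ("interp", 7)  # cold: interpreted
assert results[-1] == ("jit", 7)    # hot: runs through the JIT stub
```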
> > > Debugging > --------- > > In conversations with the Python Tools for Visual Studio > team (PTVS) > [#ptvs]_, they thought they would find these API changes > useful for > implementing more performant debugging. As mentioned in the > Rationale_ > section, this API would allow for switching on debugging > functionality > only in frames where it is needed. This could allow for either > skipping information that ``sys.settrace()`` normally > provides and > even go as far as to dynamically rewrite bytecode prior to > execution > to inject e.g. breakpoints in the bytecode. > > It also turns out that Google has provided a very similar API > internally for years. It has been used for performant debugging > purposes. > > > Implementation > ============== > > A set of patches implementing the proposed API is available > through > the Pyjion project [#pyjion]_. In its current form it has more > changes to CPython than just this proposed API, but that is > for ease > of development instead of strict requirements to accomplish > its goals. > > > Open Issues > =========== > > Allow ``eval_frame`` to be ``NULL`` > ----------------------------------- > > Currently the frame evaluation function is expected to > always be set. > It could very easily simply default to ``NULL`` instead > which would > signal to use ``PyEval_EvalFrameDefault()``. The current > proposal of > not special-casing the field seemed the most > straight-forward, but it > does require that the field not accidentally be cleared, > else a crash > may occur. > > > Is co_extra needed? > ------------------- > > While discussing this PEP at PyCon US 2016, some core developers > expressed their worry of the ``co_extra`` field making code > objects > mutable. The thinking seemed to be that having a field that was > mutated after the creation of the code object made the > object seem > mutable, even though no other aspect of code objects changed. 
> > The view of this PEP is that the `co_extra` field doesn't > change the > fact that code objects are immutable. The field is specified > in this > PEP as to not contain information required to make the code > object > usable, making it more of a caching field. It could be viewed as > similar to the UTF-8 cache that string objects have internally; > strings are still considered immutable even though they have > a field > that is conditionally set. > > The field is also not strictly necessary. While the field > greatly > simplifies attaching extra information to code objects, > other options > such as keeping a mapping of code object memory addresses to > what > would have been kept in ``co_extra`` or perhaps using a weak > reference > of the data on the code object and then iterating through > the weak > references until the attached data is found is possible. But > obviously > all of these solutions are not as simple or performant as > adding the > ``co_extra`` field. > > > Rejected Ideas > ============== > > A JIT-specific C API > -------------------- > > Originally this PEP was going to propose a much larger API > change > which was more JIT-specific. After soliciting feedback from > the Numba > team [#numba]_, though, it became clear that the API was > unnecessarily > large. The realization was made that all that was truly > needed was the > opportunity to provide a trampoline function to handle > execution of > Python code that had been JIT-compiled and a way to attach that > compiled machine code along with other critical data to the > corresponding Python code object. Once it was shown that > there was no > loss in functionality or in performance while minimizing the API > changes required, the proposal was changed to its current form. > > > References > ========== > > .. [#pyjion] Pyjion project > (https://github.com/microsoft/pyjion) > > .. [#c-api] CPython's C API > (https://docs.python.org/3/c-api/index.html) > > .. 
[#pycodeobject] ``PyCodeObject`` > (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) > > .. [#coreclr] .NET Core Runtime (CoreCLR) > (https://github.com/dotnet/coreclr) > > .. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()`` > > (https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx) > > .. [#pycodeobject] ``PyCodeObject`` > (https://docs.python.org/3/c-api/code.html#c.PyCodeObject) > > .. [#numba] Numba > (http://numba.pydata.org/) > > .. [#numba-interest] numba-users mailing list: > "Would the C API for a JIT entrypoint being proposed by > Pyjion help out Numba?" > > (https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g) > > .. [#code-object-count] [Python-Dev] Opcode cache in ceval loop > > (https://mail.python.org/pipermail/python-dev/2016-February/143025.html) > > .. [#py-benchmarks] Python benchmark suite > (https://hg.python.org/benchmarks) > > .. [#pyston] Pyston > (http://pyston.org) > > .. [#pypy] PyPy > (http://pypy.org/) > > .. [#ptvs] Python Tools for Visual Studio > (http://microsoft.github.io/PTVS/) > > .. [#coconut] Coconut > (https://github.com/davidmalcolm/coconut) > > > Copyright > ========= > > This document has been placed in the public domain. > > > > .. 
> Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > -- > --Guido van Rossum (python.org/~guido ) > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/mark%40hotpy.org > From ethan at stoneleaf.us Mon Jun 20 02:17:53 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 19 Jun 2016 23:17:53 -0700 Subject: [Python-Dev] security SIG? In-Reply-To: References: <90F89145-596F-403D-B789-59E4DA866491@theolliviers.com> <52FF5A38-7AD3-4C8D-9248-FE1FFFA6A6C6@theolliviers.com> <5B4E973C-B09E-487E-9074-3B42DC773B99@lukasa.co.uk> <576586B8.5090009@stoneleaf.us> <5766F887.6090302@stoneleaf.us> Message-ID: <57678A91.9010704@stoneleaf.us> On 06/19/2016 03:51 PM, Guido van Rossum wrote: > I think it's fine to have this SIG. I could see it going different ways > in terms of discussions and membership, but it's definitely worth a try. > I don't like clever names, and I very much doubt that it'll be mistaken > for an address to report sensitive issues, so I think it should just be > security-sig. (The sensitive-issues people are usually paranoid enough > to check before they post; the script kiddies reporting python.org > "issues" probably will get a faster and more > appropriate response from the security-sig.) > > So let's just do it. Started the process of creating "security-sig". 
-- ~Ethan~ From guido at python.org Mon Jun 20 11:49:36 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Jun 2016 08:49:36 -0700 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: I agree it's better to define the order as computed at runtime. I don't think there's much of a point to mandate that all builtin/extension types reveal their order too -- I doubt there will be many uses for that -- but I don't want to disallow it either. But we can allow types to define this, as long as it's in their documentation (so users can rely on it in those cases). As another point of review, I don't like the exception for dunder names. I can see that __module__, __name__ etc. are distractions, but since you're adding methods, you should also add methods with dunder names. The overlap with PEP 487 makes me think that this feature is clearly desirable (I like the name you give it in PEP 520 better, and PEP 487 is too vague about its definition). Finally, it seems someone is working on making all dicts ordered. Does that mean this will soon be obsolete? On Fri, Jun 17, 2016 at 6:32 PM, Nick Coghlan wrote: > On 7 June 2016 at 17:50, Eric Snow wrote: > > Why is __definition_order__ even necessary? > > ------------------------------------------- > > > > Since the definition order is not preserved in ``__dict__``, it would be > > lost once class definition execution completes. Classes *could* > > explicitly set the attribute as the last thing in the body. However, > > then independent decorators could only make use of classes that had done > > so. Instead, ``__definition_order__`` preserves this one bit of info > > from the class body so that it is universally available. 
> > The discussion in the PEP 487 thread made me realise that I'd like to > see a discussion in PEP 520 regarding whether or not to define > __definition_order__ for builtin types initialised via PyType_Ready or > created via PyType_FromSpec in addition to defining it for types > created via the class statement or types.new_class(). > > For static types, PyType_Ready could potentially set it based on > tp_members, tp_methods & tp_getset (see > https://docs.python.org/3/c-api/typeobj.html ) > Similarly, PyType_FromSpec could potentially set it based on the > contents of Py_tp_members, Py_tp_methods and Py_tp_getset slot > definitions > > Having definition order support in both types.new_class() and builtin > types would also make it clear why we can't rely purely on the > compiler to provide the necessary ordering information - in both of > those cases, the Python compiler isn't directly involved in the type > creation process. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Jun 20 12:32:36 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Jun 2016 09:32:36 -0700 Subject: [Python-Dev] Review of PEP 520: Ordered Class Definition Namespace Message-ID: PEP 520 review notes. (From previous message; edited.) - I agree it's better to define the order as computed at runtime. - I don't think there's much of a point to mandate that all builtin/extension types reveal their order too -- I doubt there will be many uses for that -- but I don't want to disallow it either. 
We can allow types to define this, as long as it's in their documentation (so users can rely on it in those cases). - I don't like the exception for dunder names. I can see that __module__, __name__ etc. that occur in every class are distractions, but since you're adding methods, you should also add methods with dunder names like __init__ or __getattr__. (Otherwise, what if someone really wanted to create a Django form with a field named __dunder__?) - The overlap with PEP 487 makes me think that this feature is clearly desirable (I like the name you give it in PEP 520 better, and PEP 487 is too vague about its definition). - It seems someone is working on making all dicts ordered. Does that mean this will soon be obsolete? It would be a shame if we ended up having to give every class an extra attribute that is just a materialization of C.__dict__.keys() with (some) dunder names filtered out. (New) - It's a shame we can't just make __dict__ (a proxy to) an OrderedDict, then we wouldn't need an extra attribute. Hm, maybe we could customize the proxy class so its keys(), values(), items() views return things in the order of __definition_order__? (In the tracker discussion this was considered a big deal, but given that a class __dict__ is already a readonly proxy I'm not sure I agree. Or is this about C level access? How much of that would break?) - I don't see why it needs to be a read-only attribute. There are very few of those -- in general we let users play around with things unless we have a hard reason to restrict assignment (e.g. the interpreter's internal state could be compromised). I don't see such a hard reason here. - All in all the motivation is fairly weak -- it seems to be mostly motivated on avoiding a custom metaclass for this purpose because combining metaclasses is a pain. 
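(The "combining metaclasses is a pain" point is easy to demonstrate; the names below are invented for the sketch.)

```python
class OrderedMeta(type):      # e.g. a metaclass tracking definition order
    pass

class RegistryMeta(type):     # e.g. an unrelated framework metaclass
    pass

class A(metaclass=OrderedMeta):
    pass

class B(metaclass=RegistryMeta):
    pass

# Inheriting from both fails, because neither metaclass is a
# subclass of the other:
try:
    class Broken(A, B):
        pass
except TypeError as exc:
    print(exc)   # "metaclass conflict: ..."

# ... until you write the combining metaclass by hand:
class CombinedMeta(OrderedMeta, RegistryMeta):
    pass

class C(A, B, metaclass=CombinedMeta):
    pass
```

Every library that needs per-class bookkeeping via a metaclass forces this boilerplate on anyone mixing it with another such library, which is the friction PEP 520 sidesteps.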
I realize it's only a small patch in a small corner of the language, but I do worry about repercussions -- it's an API that's going to be used for new (and useful) purposes so we will never be able to get rid of it. Note: I'm neither accepting nor rejecting the PEP; I'm merely inviting more contemplation. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Jun 20 12:48:34 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Jun 2016 09:48:34 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Thu, Jun 16, 2016 at 3:24 PM, Nikita Nemkin wrote: > On Fri, Jun 17, 2016 at 2:36 AM, Nick Coghlan wrote: > > On 16 June 2016 at 14:17, Martin Teichmann > wrote: > > > An implementation like PyPy, with an inherently ordered standard dict > > implementation, can just rely on that rather than being obliged to > > switch to their full collections.OrderedDict type. > > I didin't know that PyPy has actually implemented packed ordered dicts! > > https://morepypy.blogspot.ru/2015/01/faster-more-memory-efficient-and-more.html > https://mail.python.org/pipermail/python-dev/2012-December/123028.html > > This old idea by Raymond Hettinger is vastly superior to > __definition_order__ duct tape (now that PyPy has validated it). > It also gives kwarg order for free, which is important in many > metaprogramming scenarios. > Not to mention memory usage reduction and dict operations speedup... > That idea is only vastly superior if we want to force all other Python implementations to also have an order-preserving dict with the same semantics and API. I'd like to hear more about your metaprogramming scenarios -- often such things end up being code the author is ashamed of. Perhaps they should stay in the shadows? Or could we do something to make it so you won't have to be ashamed of it? 
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From nikita at nemkin.ru Mon Jun 20 14:31:54 2016 From: nikita at nemkin.ru (Nikita Nemkin) Date: Mon, 20 Jun 2016 23:31:54 +0500 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Mon, Jun 20, 2016 at 9:48 PM, Guido van Rossum wrote: > On Thu, Jun 16, 2016 at 3:24 PM, Nikita Nemkin wrote: >> >> I didin't know that PyPy has actually implemented packed ordered dicts! >> >> https://morepypy.blogspot.ru/2015/01/faster-more-memory-efficient-and-more.html >> https://mail.python.org/pipermail/python-dev/2012-December/123028.html >> >> This old idea by Raymond Hettinger is vastly superior to >> __definition_order__ duct tape (now that PyPy has validated it). >> It also gives kwarg order for free, which is important in many >> metaprogramming scenarios. >> Not to mention memory usage reduction and dict operations speedup... > > > That idea is only vastly superior if we want to force all other Python > implementations to also have an order-preserving dict with the same > semantics and API. Right. Ordered by default is a very serious implementation constraint. It's only superior in a sense that it completely subsumes/obsoletes PEP 520. > I'd like to hear more about your metaprogramming scenarios -- often such > things end up being code the author is ashamed of. Perhaps they should stay > in the shadows? Or could we do something to make it so you won't have to be > ashamed of it? What I meant is embedding declarative domain-specific languages in Python. Examples of such languages include SQL table definitions, binary data definitions (in-memory C structs or wire protocol), GUI definitions (look up enaml for an interesting example), etc. etc. 
DSLs are a well defined field and the point of embedding into Python is to implement in Python and to empower DSL with Python constructs for generation and logic. Basic blocks for a declarative language are lists and "objects" - groups of ordered, named fields. Representing lists is easy and elegant, commas make a tuple and [] makes a list. It's when trying to represent "objects" the issues arise. Literal dicts are "ugly" (for DSL purposes) and unordered. Lists of 2-tuples are even uglier. Py3 gave us __prepare__ for ordered class bodies, and this became a first valid option. For example, SQL table: class MyTable(SqlTable): field1 = Type1(options...) field2 = Type2() Unfortunately, class declarations don't look good when nested, and nesting is a common thing. class MainWindow: caption = "Window" class HSplit: label1 = Label(...) text1 = Text(...) You get the idea. Another option for expressing "objects" are function calls with kwargs: packet = Struct(type=uint8, length=uint32, body=Array(uint8, 'type')) Looks reasonably clean, but more often than not requires kwargs to be ordered. THIS is the scenario I was talking about. Function attributes also have a role, but being attached to function definitions, their scope is somewhat limited. Of course, all of the above is largely theoretical, for two basic reasons: 1) Python syntax/runtime is too rigid for a declarative DSL. (Specifically, _embedded_ DSL. The syntax alone can be re-used with ast.parse, but it's a different scenario.) 2) DSLs in general are grossly unpythonic, hiding loads of magic and unfamiliar semantics behind what looks like a normal Python. It's not something to be ashamed of, but the benefit rarely justifies the (maintenance) cost. To be clear: I'm NOT advocating for ordered kwargs. Embedding DSLs into Python is generally a bad idea. PS. __prepare__ enables many DSL tricks. In fact, it's difficult to imagine a use case that's not related to some attempt at DSL. 
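(To make the ``__prepare__`` point concrete, here is a minimal ordered-field sketch along the lines of the SQL table example above. ``Field``, ``TableMeta`` and ``SqlTable`` are invented names, not any real library's API.)

```python
from collections import OrderedDict

class Field:
    def __init__(self, **options):
        self.options = options

class TableMeta(type):
    # An ordered class-body namespace is what makes the DSL work.
    @classmethod
    def __prepare__(mcls, name, bases):
        return OrderedDict()

    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, dict(ns))
        # Column order matters for the generated schema, so keep
        # the fields in the order they were written.
        cls._fields = [k for k, v in ns.items() if isinstance(v, Field)]
        return cls

class SqlTable(metaclass=TableMeta):
    pass

class MyTable(SqlTable):
    field1 = Field(primary_key=True)
    field2 = Field()

print(MyTable._fields)   # -> ['field1', 'field2']
```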
Keyword-only args also help: ordered part of the definition can go into *args, while attributes/options are kw-only args. From ethan at stoneleaf.us Mon Jun 20 15:24:22 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 20 Jun 2016 12:24:22 -0700 Subject: [Python-Dev] New security-sig mailling list Message-ID: <576842E6.2030805@stoneleaf.us> has been created: https://mail.python.org/mailman/listinfo/security-sig The purpose of this list is to discuss security-related enhancements to Python while having as little impact on backwards compatibility as possible. Once a proposal is ready it will be presented to Python Dev. (This text is subject to change. ;) -- ~Ethan~ From brett at python.org Mon Jun 20 15:52:23 2016 From: brett at python.org (Brett Cannon) Date: Mon, 20 Jun 2016 19:52:23 +0000 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: <57676A8C.8070207@hotpy.org> References: <57676A8C.8070207@hotpy.org> Message-ID: On Sun, 19 Jun 2016 at 21:01 Mark Shannon wrote: > > > On 19/06/16 18:29, Brett Cannon wrote: > > > > > > On Sat, 18 Jun 2016 at 21:49 Guido van Rossum > > wrote: > > > > Hi Brett, > > > > I've got a few questions about the specific design. Probably you > > know the answers, it would be nice to have them in the PEP. > > > > > > Once you're happy with my answers I'll update the PEP. > > > > > > First, why not have a global hook? What does a hook per interpreter > > give you? Would even finer granularity buy anything? > > > > > > We initially considered a per-code object hook, but we figured it was > > unnecessary to have that level of control, especially since people like > > Numba have gotten away with not needing it for this long (although I > > suspect that's because they are a decorator so they can just return an > > object that overrides __call__()). We didn't think that a global one was > > appropriate as different workloads may call for different > > JITs/debuggers/etc. 
and there is no guarantee that you are executing > > every interpreter with the same workload. Plus we figured people might > > simply import their JIT of choice and as a side-effect set the hook, and > > since imports are a per-interpreter thing that seemed to suggest the > > granularity of interpreters. > > > > IOW it seemed to be more in line with sys.settrace() than some global > > thing for the process. > > > > > > Next, I'm a bit (but no more than a bit) concerned about the extra 8 > > bytes per code object, especially since for most people this is just > > waste (assuming most people won't be using Pyjion or Numba). Could > > it be a compile-time feature (requiring recompilation of CPython but > > not extensions)? > > > > > > Probably. It does water down potential usage thanks to needing a special > > build. If the decision is "special build or not", I would simply pull > > out this part of the proposal as I wouldn't want to add a flag that > > influences what is or is not possible for an interpreter. > > > > Could you figure out some other way to store per-code-object data? > > It seems you considered this but decided that the co_extra field was > > simpler and faster; I'm basically pushing a little harder on this. > > Of course most of the PEP would disappear without this feature; the > > extra interpreter field is fine. > > > > > > Dino and I thought of two potential alternatives, neither of which we > > have taken the time to implement and benchmark. One is to simply have a > > hash table of memory addresses to JIT data that is kept on the JIT side > > of things. Obviously it would be nice to avoid the overhead of a hash > > table lookup on every function call. This also doesn't help minimize > > memory when the code object gets GC'ed. > > Hash lookups aren't that slow. There's "slow" and there's "slower". 
> If you combine it with the custom flags > suggested by MRAB, then you would only suffer the lookup penalty when > actually entering the special interpreter. > You actually will always need the lookup in the JIT case to increment the execution count if you're not always immediately JIT-ing. That means MRAB's flag won't necessarily be that useful in the JIT case (it could in the debugging case, though, if you're really aiming for the fastest debugger possible). > You can use a weakref callback to ensure things get GC'd properly. > Yes, that was already the plan if we lost co_extra. > > Also, if there is a special extra field on code-object, then everyone > will want to use it. How do you handle clashes? > As already explained in the PEP in https://www.python.org/dev/peps/pep-0523/#expanding-pycodeobject, like consenting adults. The expectation is that there will not be multiple users of the object at the same time. -Brett > > > > > The other potential solution we came up with was to use weakrefs. I have > > not looked into the details, but we were thinking that if we registered > > the JIT data object as a weakref on the code object, couldn't we iterate > > through the weakrefs attached to the code object to look for the JIT > > data object, and then get the reference that way? It would let us avoid > > a more expensive hash table lookup if we assume most code objects won't > > have a weakref on it (assuming weakrefs are stored in a list), and it > > gives us the proper cleanup semantics we want by getting the weakref > > cleanup callback execution to make sure we decref the JIT data object > > appropriately. But as I said, I have not looked into the feasibility of > > this at all to know if I'm remembering the weakref implementation > > details correctly. 
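(A rough sketch of the weakref-scanning idea, for concreteness: it does work with today's ``weakref`` module, since code objects are weak-referenceable and ``weakref.ref`` subclasses can carry attributes -- though the weak references themselves still have to be kept alive somewhere. Names are invented.)

```python
import weakref

class JitRef(weakref.ref):
    # Subclasses of weakref.ref can carry extra attributes, so the
    # JIT data rides along on the reference itself.
    pass

_live_refs = set()   # something must keep the JitRef objects alive

def attach(code, data):
    ref = JitRef(code, _live_refs.discard)  # self-removing on collection
    ref.jit_data = data
    _live_refs.add(ref)

def lookup(code):
    # Iterate the weakrefs attached to the code object, as suggested,
    # until we find ours.
    for ref in weakref.getweakrefs(code):
        if isinstance(ref, JitRef):
            return ref.jit_data
    return None

code = compile("x * 2", "<jit-demo>", "eval")
attach(code, {"exec_count": 0})
print(lookup(code))   # -> {'exec_count': 0}
```

The linear scan is cheap only under the stated assumption that most code objects carry few or no weakrefs.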
> > > > > > Finally, there are some error messages from pep2html.py: > > https://www.python.org/dev/peps/pep-0523/#copyright > > > > > > All fixed in > > > https://github.com/python/peps/commit/6929f850a5af07e51d0163558a5fe8d6b85dccfe > . > > > > -Brett > > > > > > > > --Guido > > > > On Fri, Jun 17, 2016 at 7:58 PM, Brett Cannon > > wrote: > > > > I have taken PEP 523 for this: > > https://github.com/python/peps/blob/master/pep-0523.txt . > > > > I'm waiting until Guido gets back from vacation, at which point > > I'll ask for a pronouncement or assignment of a BDFL delegate. > > > > On Fri, 3 Jun 2016 at 14:37 Brett Cannon > > wrote: > > > > For those of you who follow python-ideas or were at the > > PyCon US 2016 language summit, you have already seen/heard > > about this PEP. For those of you who don't fall into either > > of those categories, this PEP proposed a frame evaluation > > API for CPython. The motivating example of this work has > > been Pyjion, the experimental CPython JIT Dino Viehland and > > I have been working on in our spare time at Microsoft. The > > API also works for debugging, though, as already > > demonstrated by Google having added a very similar API > > internally for debugging purposes. > > > > The PEP is pasted in below and also available in rendered > > form at > > https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I > > will assign myself a PEP # once discussion is finished as > > it's easier to work in git for this for the rich rendering > > of the in-progress PEP). > > > > I should mention that the difference from python-ideas and > > the language summit in the PEP are the listed support from > > Google's use of a very similar API as well as clarifying the > > co_extra field on code objects doesn't change their > > immutability (at least from the view of the PEP). 
> > > > ---------- > > PEP: NNN > > Title: Adding a frame evaluation API to CPython > > Version: $Revision$ > > Last-Modified: $Date$ > > Author: Brett Cannon > >, > > Dino Viehland > > > > Status: Draft > > Type: Standards Track > > Content-Type: text/x-rst > > Created: 16-May-2016 > > Post-History: 16-May-2016 > > 03-Jun-2016 > > > > > > Abstract > > ======== > > > > This PEP proposes to expand CPython's C API [#c-api]_ to > > allow for > > the specification of a per-interpreter function pointer to > > handle the > > evaluation of frames [#pyeval_evalframeex]_. This proposal > also > > suggests adding a new field to code objects [#pycodeobject]_ > > to store > > arbitrary data for use by the frame evaluation function. > > > > > > Rationale > > ========= > > > > One place where flexibility has been lacking in Python is in > > the direct > > execution of Python code. While CPython's C API [#c-api]_ > > allows for > > constructing the data going into a frame object and then > > evaluating it > > via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control > > over the > > execution of Python code comes down to individual objects > > instead of a > > hollistic control of execution at the frame level. > > > > While wanting to have influence over frame evaluation may > > seem a bit > > too low-level, it does open the possibility for things such > as a > > method-level JIT to be introduced into CPython without > > CPython itself > > having to provide one. By allowing external C code to > > control frame > > evaluation, a JIT can participate in the execution of Python > > code at > > the key point where evaluation occurs. This then allows for > > a JIT to > > conditionally recompile Python bytecode to machine code as > > desired > > while still allowing for executing regular CPython bytecode > when > > running the JIT is not desired. This can be accomplished by > > allowing > > interpreters to specify what function to call to evaluate a > > frame. 
And > > by placing the API at the frame evaluation level it allows > for a > > complete view of the execution environment of the code for > > the JIT. > > > > This ability to specify a frame evaluation function also > > allows for > > other use-cases beyond just opening CPython up to a JIT. For > > instance, > > it would not be difficult to implement a tracing or > > profiling function > > at the call level with this API. While CPython does provide > the > > ability to set a tracing or profiling function at the Python > > level, > > this would be able to match the data collection of the > > profiler and > > quite possibly be faster for tracing by simply skipping > per-line > > tracing support. > > > > It also opens up the possibility of debugging where the frame > > evaluation function only performs special debugging work > when it > > detects it is about to execute a specific code object. In > that > > instance the bytecode could be theoretically rewritten > > in-place to > > inject a breakpoint function call at the proper point for > > help in > > debugging while not having to do a heavy-handed approach as > > required by ``sys.settrace()``. > > > > To help facilitate these use-cases, we are also proposing > > the adding > > of a "scratch space" on code objects via a new field. This > > will allow > > per-code object data to be stored with the code object > > itself for easy > > retrieval by the frame evaluation function as necessary. The > > field > > itself will simply be a ``PyObject *`` type so that any data > > stored in > > the field will participate in normal object memory > management. > > > > > > Proposal > > ======== > > > > All proposed C API changes below will not be part of the > > stable ABI. > > > > > > Expanding ``PyCodeObject`` > > -------------------------- > > > > One field is to be added to the ``PyCodeObject`` struct > > [#pycodeobject]_:: > > > > typedef struct { > > ... > > PyObject *co_extra; /* "Scratch space" for the code > > object. 
*/ > > } PyCodeObject; > > > > The ``co_extra`` will be ``NULL`` by default and will not be > > used by > > CPython itself. Third-party code is free to use the field as > > desired. > > Values stored in the field are expected to not be required > > in order > > for the code object to function, allowing the loss of the > > data of the > > field to be acceptable (this keeps the code object as > > immutable from > > a functionality point-of-view; this is slightly contentious > > and so is > > listed as an open issue in `Is co_extra needed?`_). The > > field will be > > freed like all other fields on ``PyCodeObject`` during > > deallocation > > using ``Py_XDECREF()``. > > > > It is not recommended that multiple users attempt to use the > > ``co_extra`` simultaneously. While a dictionary could > > theoretically be > > set to the field and various users could use a key specific > > to the > > project, there is still the issue of key collisions as well > as > > performance degradation from using a dictionary lookup on > > every frame > > evaluation. Users are expected to do a type check to make > > sure that > > the field has not been previously set by someone else. > > > > > > Expanding ``PyInterpreterState`` > > -------------------------------- > > > > The entrypoint for the frame evalution function is > > per-interpreter:: > > > > // Same type signature as PyEval_EvalFrameEx(). > > typedef PyObject* (__stdcall > > *PyFrameEvalFunction)(PyFrameObject*, int); > > > > typedef struct { > > ... > > PyFrameEvalFunction eval_frame; > > } PyInterpreterState; > > > > By default, the ``eval_frame`` field will be initialized to > > a function > > pointer that represents what ``PyEval_EvalFrameEx()`` > > currently is > > (called ``PyEval_EvalFrameDefault()``, discussed later in > > this PEP). > > Third-party code may then set their own frame evaluation > > function > > instead to control the execution of Python code. 
A pointer > > comparison > > can be used to detect if the field is set to > > ``PyEval_EvalFrameDefault()`` and thus has not been mutated > yet. > > > > > > Changes to ``Python/ceval.c`` > > ----------------------------- > > > > ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it > > currently stands > > will be renamed to ``PyEval_EvalFrameDefault()``. The new > > ``PyEval_EvalFrameEx()`` will then become:: > > > > PyObject * > > PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag) > > { > > PyThreadState *tstate = PyThreadState_GET(); > > return tstate->interp->eval_frame(frame, throwflag); > > } > > > > This allows third-party code to place themselves directly in > > the path > > of Python code execution while being backwards-compatible > > with code > > already using the pre-existing C API. > > > > > > Updating ``python-gdb.py`` > > -------------------------- > > > > The generated ``python-gdb.py`` file used for Python support > > in GDB > > makes some hard-coded assumptions about > > ``PyEval_EvalFrameEx()``, e.g. > > the names of local variables. It will need to be updated to > > work with > > the proposed changes. > > > > > > Performance impact > > ================== > > > > As this PEP is proposing an API to add pluggability, > performance > > impact is considered only in the case where no third-party > > code has > > made any changes. > > > > Several runs of pybench [#pybench]_ consistently showed no > > performance > > cost from the API change alone. > > > > A run of the Python benchmark suite [#py-benchmarks]_ showed > no > > measurable cost in performance. > > > > In terms of memory impact, since there are typically not > > many CPython > > interpreters executing in a single process that means the > > impact of > > ``co_extra`` being added to ``PyCodeObject`` is the only > worry. > > According to [#code-object-count]_, a run of the Python test > > suite > > results in about 72,395 code objects being created. 
On a > 64-bit > > CPU that would result in 579,160 bytes of extra memory being > > used if > > all code objects were alive at once and had nothing set in > their > > ``co_extra`` fields. > > > > > > Example Usage > > ============= > > > > A JIT for CPython > > ----------------- > > > > Pyjion > > '''''' > > > > The Pyjion project [#pyjion]_ has used this proposed API to > > implement > > a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each > code > > object has its ``co_extra`` field set to a > > ``PyjionJittedCode`` object > > which stores four pieces of information: > > > > 1. Execution count > > 2. A boolean representing whether a previous attempt to JIT > > failed > > 3. A function pointer to a trampoline (which can be type > > tracing or not) > > 4. A void pointer to any JIT-compiled machine code > > > > The frame evaluation function has (roughly) the following > > algorithm:: > > > > def eval_frame(frame, throw_flag): > > pyjion_code = frame.code.co_extra > > if not pyjion_code: > > frame.code.co_extra = PyjionJittedCode() > > elif not pyjion_code.jit_failed: > > if not pyjion_code.jit_code: > > return > > pyjion_code.eval(pyjion_code.jit_code, frame) > > elif pyjion_code.exec_count > 20_000: > > if jit_compile(frame): > > return > > pyjion_code.eval(pyjion_code.jit_code, frame) > > else: > > pyjion_code.jit_failed = True > > pyjion_code.exec_count += 1 > > return PyEval_EvalFrameDefault(frame, throw_flag) > > > > The key point, though, is that all of this work and logic is > > separate > > from CPython and yet with the proposed API changes it is > able to > > provide a JIT that is compliant with Python semantics (as of > > this > > writing, performance is almost equivalent to CPython without > > the new > > API). This means there's nothing technically preventing > > others from > > implementing their own JITs for CPython by utilizing the > > proposed API. 
> > > > > > Other JITs > > '''''''''' > > > > It should be mentioned that the Pyston team was consulted on > an > > earlier version of this PEP that was more JIT-specific and > > they were > > not interested in utilizing the changes proposed because > > they want > > control over memory layout they had no interest in directly > > supporting > > CPython itself. An informal discusion with a developer on > > the PyPy > > team led to a similar comment. > > > > Numba [#numba]_, on the other hand, suggested that they > would be > > interested in the proposed change in a post-1.0 future for > > themselves [#numba-interest]_. > > > > The experimental Coconut JIT [#coconut]_ could have > > benefitted from > > this PEP. In private conversations with Coconut's creator we > > were told > > that our API was probably superior to the one they developed > for > > Coconut to add JIT support to CPython. > > > > > > Debugging > > --------- > > > > In conversations with the Python Tools for Visual Studio > > team (PTVS) > > [#ptvs]_, they thought they would find these API changes > > useful for > > implementing more performant debugging. As mentioned in the > > Rationale_ > > section, this API would allow for switching on debugging > > functionality > > only in frames where it is needed. This could allow for > either > > skipping information that ``sys.settrace()`` normally > > provides and > > even go as far as to dynamically rewrite bytecode prior to > > execution > > to inject e.g. breakpoints in the bytecode. > > > > It also turns out that Google has provided a very similar API > > internally for years. It has been used for performant > debugging > > purposes. > > > > > > Implementation > > ============== > > > > A set of patches implementing the proposed API is available > > through > > the Pyjion project [#pyjion]_. 
In its current form it has > more > > changes to CPython than just this proposed API, but that is > > for ease > > of development instead of strict requirements to accomplish > > its goals. > > > > > > Open Issues > > =========== > > > > Allow ``eval_frame`` to be ``NULL`` > > ----------------------------------- > > > > Currently the frame evaluation function is expected to > > always be set. > > It could very easily simply default to ``NULL`` instead > > which would > > signal to use ``PyEval_EvalFrameDefault()``. The current > > proposal of > > not special-casing the field seemed the most > > straight-forward, but it > > does require that the field not accidentally be cleared, > > else a crash > > may occur. > > > > > > Is co_extra needed? > > ------------------- > > > > While discussing this PEP at PyCon US 2016, some core > developers > > expressed their worry of the ``co_extra`` field making code > > objects > > mutable. The thinking seemed to be that having a field that > was > > mutated after the creation of the code object made the > > object seem > > mutable, even though no other aspect of code objects changed. > > > > The view of this PEP is that the `co_extra` field doesn't > > change the > > fact that code objects are immutable. The field is specified > > in this > > PEP as to not contain information required to make the code > > object > > usable, making it more of a caching field. It could be > viewed as > > similar to the UTF-8 cache that string objects have > internally; > > strings are still considered immutable even though they have > > a field > > that is conditionally set. > > > > The field is also not strictly necessary. 
While the field > > greatly > > simplifies attaching extra information to code objects, > > other options > > such as keeping a mapping of code object memory addresses to > > what > > would have been kept in ``co_extra`` or perhaps using a weak > > reference > > of the data on the code object and then iterating through > > the weak > > references until the attached data is found is possible. But > > obviously > > all of these solutions are not as simple or performant as > > adding the > > ``co_extra`` field. > > > > > > Rejected Ideas > > ============== > > > > A JIT-specific C API > > -------------------- > > > > Originally this PEP was going to propose a much larger API > > change > > which was more JIT-specific. After soliciting feedback from > > the Numba > > team [#numba]_, though, it became clear that the API was > > unnecessarily > > large. The realization was made that all that was truly > > needed was the > > opportunity to provide a trampoline function to handle > > execution of > > Python code that had been JIT-compiled and a way to attach > that > > compiled machine code along with other critical data to the > > corresponding Python code object. Once it was shown that > > there was no > > loss in functionality or in performance while minimizing the > API > > changes required, the proposal was changed to its current > form. > > > > > > References > > ========== > > > > .. [#pyjion] Pyjion project > > (https://github.com/microsoft/pyjion) > > > > .. [#c-api] CPython's C API > > (https://docs.python.org/3/c-api/index.html) > > > > .. [#pycodeobject] ``PyCodeObject`` > > ( > https://docs.python.org/3/c-api/code.html#c.PyCodeObject) > > > > .. [#coreclr] .NET Core Runtime (CoreCLR) > > (https://github.com/dotnet/coreclr) > > > > .. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()`` > > > > ( > https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx > ) > > > > .. 
[#numba] Numba > > (http://numba.pydata.org/) > > > > .. [#numba-interest] numba-users mailing list: > > "Would the C API for a JIT entrypoint being proposed by > > Pyjion help out Numba?" > > > > ( > https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g > ) > > > > .. [#code-object-count] [Python-Dev] Opcode cache in ceval > loop > > > > ( > https://mail.python.org/pipermail/python-dev/2016-February/143025.html) > > > > .. [#py-benchmarks] Python benchmark suite > > (https://hg.python.org/benchmarks) > > > > .. [#pyston] Pyston > > (http://pyston.org) > > > > .. [#pypy] PyPy > > (http://pypy.org/) > > > > .. [#ptvs] Python Tools for Visual Studio > > (http://microsoft.github.io/PTVS/) > > > > .. [#coconut] Coconut > > (https://github.com/davidmalcolm/coconut) > > > > > > Copyright > > ========= > > > > This document has been placed in the public domain. > > > > > > > > ..
> > Local Variables: > > mode: indented-text > > indent-tabs-mode: nil > > sentence-end-double-space: t > > fill-column: 70 > > coding: utf-8 > > End: > > > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > > > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > > > > > > -- > > --Guido van Rossum (python.org/~guido ) > > > > > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/mark%40hotpy.org > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Mon Jun 20 15:59:45 2016 From: brett at python.org (Brett Cannon) Date: Mon, 20 Jun 2016 19:59:45 +0000 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On Sun, 19 Jun 2016 at 19:37 Guido van Rossum wrote: > On Sun, Jun 19, 2016 at 6:29 PM, Brett Cannon wrote: > >> >> >> On Sat, 18 Jun 2016 at 21:49 Guido van Rossum wrote: >> >>> Hi Brett, >>> >>> I've got a few questions about the specific design. Probably you know >>> the answers, it would be nice to have them in the PEP. >>> >> >> Once you're happy with my answers I'll update the PEP. >> > > Soon! > > >> >> >>> >>> First, why not have a global hook? What does a hook per interpreter give >>> you? Would even finer granularity buy anything? 
>>> >> >> We initially considered a per-code object hook, but we figured it was >> unnecessary to have that level of control, especially since people like >> Numba have gotten away with not needing it for this long (although I >> suspect that's because they are a decorator so they can just return an >> object that overrides __call__()). >> > > So they do it at the function object level? > Yes. They use a decorator, allowing them to completely control what function object gets returned. > > >> We didn't think that a global one was appropriate as different workloads >> may call for different JITs/debuggers/etc. and there is no guarantee that >> you are executing every interpreter with the same workload. Plus we figured >> people might simply import their JIT of choice and as a side-effect set the >> hook, and since imports are a per-interpreter thing that seemed to suggest >> the granularity of interpreters. >> > > I like import as the argument here. > > >> >> IOW it seemed to be more in line with sys.settrace() than some global >> thing for the process. >> >> >>> >>> Next, I'm a bit (but no more than a bit) concerned about the extra 8 >>> bytes per code object, especially since for most people this is just waste >>> (assuming most people won't be using Pyjion or Numba). Could it be a >>> compile-time feature (requiring recompilation of CPython but not >>> extensions)? >>> >> >> Probably. It does water down potential usage thanks to needing a special >> build. If the decision is "special build or not", I would simply pull out >> this part of the proposal as I wouldn't want to add a flag that influences >> what is or is not possible for an interpreter. >> > > MRAB's response made me think of a possible approach: the co_extra field > could be the very last field of the PyCodeObject struct and only present if > a certain flag is set in co_flags. This is similar to a trick used by X11 > (I know, it's long ago :-). > But that doesn't resolve your memory worry, right? 
For a JIT you will have to access the memory regardless for execution count (unless Yury's patch to add caching goes in, in which case it will be provided by code objects already). > >> >>> Could you figure out some other way to store per-code-object data? It >>> seems you considered this but decided that the co_extra field was simpler >>> and faster; I'm basically pushing a little harder on this. Of course most >>> of the PEP would disappear without this feature; the extra interpreter >>> field is fine. >>> >> >> Dino and I thought of two potential alternatives, neither of which we >> have taken the time to implement and benchmark. One is to simply have a >> hash table of memory addresses to JIT data that is kept on the JIT side of >> things. Obviously it would be nice to avoid the overhead of a hash table >> lookup on every function call. This also doesn't help minimize memory when >> the code object gets GC'ed. >> > > I guess the prospect of the extra hash lookup per call isn't great given > that this is about perf... > It's not desirable, but it isn't the end of the world either. I think Dino doesn't believe it will be that big of a deal to switch to a hash table. > >> The other potential solution we came up with was to use weakrefs. I have >> not looked into the details, but we were thinking that if we registered the >> JIT data object as a weakref on the code object, couldn't we iterate >> through the weakrefs attached to the code object to look for the JIT data >> object, and then get the reference that way? It would let us avoid a more >> expensive hash table lookup if we assume most code objects won't have a >> weakref on it (assuming weakrefs are stored in a list), and it gives us the >> proper cleanup semantics we want by getting the weakref cleanup callback >> execution to make sure we decref the JIT data object appropriately. 
But as >> I said, I have not looked into the feasibility of this at all to know if >> I'm remembering the weakref implementation details correctly. >> > > That would be even slower than the hash table lookup, and unbounded. So > let's not go there. > OK. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Jun 20 16:12:08 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Jun 2016 13:12:08 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: OK, basically you're arguing that knowing the definition order of class attributes is often useful when (ab)using Python for things like schema or form definitions. There are a few ways to go about it: 1. A hack using a global creation counter 2. Metaclass with __prepare__ 3. PEP 520 4a. Make all dicts OrderedDicts in CPython 4b. Ditto in the language standard If we can make the jump to (4b) soon enough I think we should skip PEP 520; if not, I am still hemming and hawing about whether PEP 520 has enough benefits over (2) to bother. Sorry Eric for making this so hard. The better is so often the enemy of the good. I am currently somewhere between -0 and +0 on PEP 520. I'm not sure if the work on (4a) is going to bear fruit in time for the 3.6 feature freeze ; if it goes well I think we should have a separate conversation (maybe even a PEP?) about (4b). Maybe we should ask for feedback from the Jython developers? (PyPy already has this IIUC, and IronPython seems moribund?) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
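The "global creation counter" in option 1 above is the hack that form and ORM libraries have traditionally used to recover field definition order without any metaclass machinery. A rough sketch of the idea -- the Field/Form names are illustrative, not any particular library's API:

```python
import itertools

# A module-level counter shared by every Field instance (the "global
# creation counter" hack): each field remembers the order in which it
# was instantiated inside the class body.
_creation_counter = itertools.count()

class Field:
    def __init__(self):
        self._order = next(_creation_counter)

class Form:
    @classmethod
    def fields_in_definition_order(cls):
        fields = [(value._order, name)
                  for name, value in vars(cls).items()
                  if isinstance(value, Field)]
        return [name for _, name in sorted(fields)]

class Signup(Form):
    email = Field()
    password = Field()
    nickname = Field()

print(Signup.fields_in_definition_order())  # ['email', 'password', 'nickname']
```

The shared mutable module state, and the bookkeeping forced onto every field type, is exactly what options 2-4 would make unnecessary.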
URL: From guido at python.org Mon Jun 20 16:18:26 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Jun 2016 13:18:26 -0700 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On Mon, Jun 20, 2016 at 12:59 PM, Brett Cannon wrote: > MRAB's response made me think of a possible approach: the co_extra field >> could be the very last field of the PyCodeObject struct and only present if >> a certain flag is set in co_flags. This is similar to a trick used by X11 >> (I know, it's long ago :-) >> > > But that doesn't resolve your memory worry, right? For a JIT you will have > to access the memory regardless for execution count (unless Yury's patch to > add caching goes in, in which case it will be provided by code objects > already). > If you make the code object constructor another function pointer in the interpreter struct, you could solve this quite well IMO. An interpreter with a JIT installed would always create code objects with the co_extra field. But interpreters without a JIT would have code objects without it. This would mean the people who aren't using a JIT at all don't pay for co_extra. The flag would still be needed so the JIT can tell when you pass it a code object that was created before the JIT was installed (or belonging to a different interpreter). Would that work? Or is it important to be able to import a lot of code and then later import+install the JIT and have it benefit the code you already imported? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
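The "very last field plus a flag" layout being discussed can be made concrete with ctypes structs standing in for the real PyCodeObject -- the struct members and flag value below are invented for illustration only:

```python
import ctypes

CO_HAS_EXTRA = 0x1000  # hypothetical flag bit

class Code(ctypes.Structure):
    _fields_ = [("co_flags", ctypes.c_uint),
                ("co_firstlineno", ctypes.c_int)]

class CodeWithExtra(ctypes.Structure):
    # Same layout as Code with one trailing pointer appended, so a
    # CodeWithExtra can be handled as a Code by anyone who does not
    # know (or care) about the extra field.
    _fields_ = [("base", Code),
                ("co_extra", ctypes.c_void_p)]

def new_code(with_extra):
    # An interpreter with a JIT installed would allocate the bigger
    # struct; everyone else never pays for the extra pointer.
    if with_extra:
        code = CodeWithExtra()
        code.base.co_flags |= CO_HAS_EXTRA
        return code
    return Code()

def get_extra(code):
    flags = code.co_flags if isinstance(code, Code) else code.base.co_flags
    if flags & CO_HAS_EXTRA:
        return code.co_extra
    return None  # created before the JIT was installed

plain = new_code(False)
jitted = new_code(True)
jitted.co_extra = 0xdead
print(get_extra(plain))   # None
print(get_extra(jitted))  # 57005
```

The flag check is what would let a JIT distinguish a full-sized code object from one created before the JIT was installed, per the question above.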
URL: From christian at python.org Mon Jun 20 16:41:40 2016 From: christian at python.org (Christian Heimes) Date: Mon, 20 Jun 2016 22:41:40 +0200 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On 2016-06-20 22:18, Guido van Rossum wrote: > On Mon, Jun 20, 2016 at 12:59 PM, Brett Cannon > > wrote: > > MRAB's response made me think of a possible approach: the > co_extra field could be the very last field of the PyCodeObject > struct and only present if a certain flag is set in co_flags. > This is similar to a trick used by X11 (I know, it's long ago :-) > > > But that doesn't resolve your memory worry, right? For a JIT you > will have to access the memory regardless for execution count > (unless Yury's patch to add caching goes in, in which case it will > be provided by code objects already). > > > If you make the code object constructor another function pointer in the > interpreter struct, you could solve this quite well IMO. An interpreter > with a JIT installed would always create code objects with the co_extra > field. But interpreters without a JIT would have code objects > without it. This would mean the people who aren't using a JIT at all > don't pay for co_extra. The flag would still be needed so the JIT can > tell when you pass it a code object that was created before the JIT was > installed (or belonging to a different interpreter). > > Would that work? Or is it important to be able to import a lot of code > and then later import+install the JIT and have it benefit the code you > already imported? Ha, I had the same idea. co_flags has some free bits. You could store extra flags there. Is the PyCodeObject's layout part of Python's stable ABI? I'm asking because the PyCodeObject struct has a suboptimal layout. It's wasting a couple of bytes because it mixes int and ptr.
If we move the int co_firstlineno member below the co_flags member, then the struct size shrinks by 64 bits on a 64-bit system -- the exact same size as a PyObject *co_extra member. Christian From brett at python.org Mon Jun 20 16:50:56 2016 From: brett at python.org (Brett Cannon) Date: Mon, 20 Jun 2016 20:50:56 +0000 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On Mon, 20 Jun 2016 at 13:43 Christian Heimes wrote: > On 2016-06-20 22:18, Guido van Rossum wrote: > > On Mon, Jun 20, 2016 at 12:59 PM, Brett Cannon > > wrote: > > > > MRAB's response made me think of a possible approach: the > > co_extra field could be the very last field of the PyCodeObject > > struct and only present if a certain flag is set in co_flags. > > This is similar to a trick used by X11 (I know, it's long ago :-) > > > > > > But that doesn't resolve your memory worry, right? For a JIT you > > will have to access the memory regardless for execution count > > (unless Yury's patch to add caching goes in, in which case it will > > be provided by code objects already). > > > > > > If you make the code object constructor another function pointer in the > > interpreter struct, you could solve this quite well IMO. An interpreter > > with a JIT installed would always create code objects with the co_extra > > field. But interpreters without a JIT would have code objects > > without it. This would mean the people who aren't using a JIT at all > > don't pay for co_extra. The flag would still be needed so the JIT can > > tell when you pass it a code object that was created before the JIT was > > installed (or belonging to a different interpreter). > > > > Would that work? Or is it important to be able to import a lot of code > > and then later import+install the JIT and have it benefit the code you > > already imported? > > Ha, I had the same idea. co_flags has some free bits. You could store > extra flags there.
> > Is the PyCodeObject's layout part of Python's stable ABI? No: https://docs.python.org/3/c-api/code.html#c.PyCodeObject > I'm asking > because the PyCodeObject struct has a suboptimal layout. It's wasting a > couple of bytes because it mixes int and ptr. If we move the int > co_firstlineno member below the co_flags member, then the struct size > shrinks by 64 bits on 64bit system -- the exact same size a PyObject > *co_extras member. > :) We should probably do that reordering regardless of the result of this PEP. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dinov at microsoft.com Mon Jun 20 16:20:22 2016 From: dinov at microsoft.com (Dino Viehland) Date: Mon, 20 Jun 2016 20:20:22 +0000 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: <57676A8C.8070207@hotpy.org> References: <57676A8C.8070207@hotpy.org> Message-ID: Mark wrote: > > Dino and I thought of two potential alternatives, neither of which we > > have taken the time to implement and benchmark. One is to simply have > > a hash table of memory addresses to JIT data that is kept on the JIT > > side of things. Obviously it would be nice to avoid the overhead of a > > hash table lookup on every function call. This also doesn't help > > minimize memory when the code object gets GC'ed. > > Hash lookups aren't that slow. If you combine it with the custom flags > suggested by MRAB, then you would only suffer the lookup penalty when > actually entering the special interpreter. > You can use a weakref callback to ensure things get GC'd properly. > > Also, if there is a special extra field on code-object, then everyone will want > to use it. How do you handle clashes? This is exactly what I've started prototyping and have mostly coded up, I've just been getting randomized and haven't gotten back to it yet. It may have some impact in the short-term but my expectation is that as the JIT gets better that this will become less and less important. 
Currently we're just JITing one method at a time and have no inlining support. But once we can start putting guards in place and inlining across multiple function calls we will start reducing the transitions from JIT -> Function Call -> JIT and get rid of those hash table lookups entirely. And if we can't succeed at inlining then I suspect the JIT won't end up offering the performance we'd hope for. From sandranel at comcast.net Mon Jun 20 17:55:38 2016 From: sandranel at comcast.net (Sandranel) Date: Mon, 20 Jun 2016 17:55:38 -0400 Subject: [Python-Dev] Problem Message-ID: <001701d1cb3e$7badbe90$73093bb0$@comcast.net> Hi: My daughter and I are trying to update to 8.1.2, but every time we try this happens From the API To the Python window: Please advise -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.jpg Type: image/jpeg Size: 35588 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 20791 bytes Desc: not available URL: From chris.jerdonek at gmail.com Mon Jun 20 18:02:30 2016 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Mon, 20 Jun 2016 15:02:30 -0700 Subject: [Python-Dev] New security-sig mailling list In-Reply-To: <576842E6.2030805@stoneleaf.us> References: <576842E6.2030805@stoneleaf.us> Message-ID: On Mon, Jun 20, 2016 at 12:24 PM, Ethan Furman wrote: > > has been created: > > https://mail.python.org/mailman/listinfo/security-sig > > The purpose of this list is to discuss security-related enhancements to Python while having as little impact on backwards compatibility as possible.
I would recommend clarifying the relationship of the SIG to the Python Security Response Team ( https://www.python.org/news/security ), or at least clarifying that the SIG is different from the PSRT (and that security reports should not be sent to the SIG). --Chris > > Once a proposal is ready it will be presented to Python Dev. > > (This text is subject to change. ;) > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.jerdonek%40gmail.com From phd at phdru.name Mon Jun 20 18:13:14 2016 From: phd at phdru.name (Oleg Broytman) Date: Tue, 21 Jun 2016 00:13:14 +0200 Subject: [Python-Dev] Problem In-Reply-To: <001701d1cb3e$7badbe90$73093bb0$@comcast.net> References: <001701d1cb3e$7badbe90$73093bb0$@comcast.net> Message-ID: <20160620221314.GA31892@phdru.name> Hello. We are sorry but we cannot help you. This mailing list is to work on developing Python (adding new features to Python itself and fixing bugs); if you're having problems learning, understanding or using Python, please find another forum. Probably python-list/comp.lang.python mailing list/news group is the best place; there are Python developers who participate in it; you may get a faster, and probably more complete, answer there. See http://www.python.org/community/ for other lists/news groups/fora. Thank you for understanding. On Mon, Jun 20, 2016 at 05:55:38PM -0400, Sandranel wrote: > Hi: > > My daughter and I are trying to update to 8.1.2,but every time we try this > happens As for your question: the command "python -m pip install" must be run from OS command line, not from Python itself. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
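Oleg's point -- that ``python -m pip install`` is an operating-system shell command rather than Python source -- is easy to demonstrate by handing the typed text to the compiler; the package name here is just an example:

```python
# "python -m pip install requests" is shell syntax, not Python, so the
# interactive >>> prompt raises SyntaxError if you type it there.
try:
    compile("python -m pip install requests", "<session>", "exec")
except SyntaxError:
    print("SyntaxError: run this from the OS shell, not the >>> prompt")
```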
From guido at python.org Mon Jun 20 18:13:15 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Jun 2016 15:13:15 -0700 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: <57676A8C.8070207@hotpy.org> Message-ID: Couple uses of "it" here are ambiguous -- are you saying we don't need co_extra after all, or that we can safely insist it's a dict, or...? On Mon, Jun 20, 2016 at 1:20 PM, Dino Viehland via Python-Dev < python-dev at python.org> wrote: > Mark wrote: > > > Dino and I thought of two potential alternatives, neither of which we > > > have taken the time to implement and benchmark. One is to simply have > > > a hash table of memory addresses to JIT data that is kept on the JIT > > > side of things. Obviously it would be nice to avoid the overhead of a > > > hash table lookup on every function call. This also doesn't help > > > minimize memory when the code object gets GC'ed. > > > > Hash lookups aren't that slow. If you combine it with the custom flags > > suggested by MRAB, then you would only suffer the lookup penalty when > > actually entering the special interpreter. > > You can use a weakref callback to ensure things get GC'd properly. > > > > Also, if there is a special extra field on code-object, then everyone > will want > > to use it. How do you handle clashes? > > This is exactly what I've started prototyping and have mostly coded up, > I've > just been getting randomized and haven't gotten back to it yet. > > It may have some impact in the short-term but my expectation is that as the > JIT gets better that this will become less and less important. Currently > we're > just JITing one method at a time and have no inlining support. But once we > can start putting guards in place and inlining across multiple function > calls > we will start reducing the transitions from JIT -> Function Call -> JIT > and get > rid of those hash table lookups entirely. 
And if we can't succeed at > inlining then > I suspect the JIT won't end up offering the performance we'd hope for. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcgoble3 at gmail.com Mon Jun 20 18:13:23 2016 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Mon, 20 Jun 2016 18:13:23 -0400 Subject: [Python-Dev] Problem In-Reply-To: <001701d1cb3e$7badbe90$73093bb0$@comcast.net> References: <001701d1cb3e$7badbe90$73093bb0$@comcast.net> Message-ID: General questions like this belong on python-list, not python-dev. To answer your question, though, you need to run that command from the Windows Command Prompt, not from the Python interpreter. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dinov at microsoft.com Mon Jun 20 16:32:54 2016 From: dinov at microsoft.com (Dino Viehland) Date: Mon, 20 Jun 2016 20:32:54 +0000 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On Mon, Jun 20, 2016 at 12:59 PM, Brett Cannon > wrote: MRAB's response made me think of a possible approach: the co_extra field could be the very last field of the PyCodeObject struct and only present if a certain flag is set in co_flags. This is similar to a trick used by X11 (I know, it's long ago :-) But that doesn't resolve your memory worry, right? For a JIT you will have to access the memory regardless for execution count (unless Yury's patch to add caching goes in, in which case it will be provided by code objects already). If you make the code object constructor another function pointer in the interpreter struct, you could solve this quite well IMO. 
An interpreter with a JIT installed would always create code objects with the co_extra field. But interpreters without a JIT would have have code objects without it. This would mean the people who aren't using a JIT at all don't pay for co_extra. The flag would still be needed so the JIT can tell when you pass it a code object that was created before the JIT was installed (or belonging to a different interpreter). Would that work? Or is it important to be able to import a lot of code and then later import+install the JIT and have it benefit the code you already imported? That?s a pretty interesting idea. We actually load the JIT DLL before we execute any Python code so currently it wouldn?t have any issues with not having the full sized code objects created. But it could also let JITs store all of the info they need right there instead of having to create yet another place to track code data. And it fits in nicely with having the extra data being truly ephemeral that no one else should care about. It doesn?t help with the issue of potentially multiple consumers of that field that has been brought up before but I?m not sure how concerned we should be about that scenario anyway. I still want to check and see what the hash table overhead looks like but if that does end up looking bad I can definitely look at giving this a shot. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Mon Jun 20 18:39:04 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 20 Jun 2016 15:39:04 -0700 Subject: [Python-Dev] New security-sig mailling list In-Reply-To: References: <576842E6.2030805@stoneleaf.us> Message-ID: <57687088.5040801@stoneleaf.us> On 06/20/2016 03:02 PM, Chris Jerdonek wrote: > I would recommend clarifying the relationship of the SIG to the Python > Security Response Team ( https://www.python.org/news/security ), or at > least clarifying that the SIG is different from the PSRT (and that > security reports should not be sent to the SIG). Attempted to do so. Let me know if it can be clearer still. -- ~Ethan~ From timothy.c.delaney at gmail.com Mon Jun 20 20:41:23 2016 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Tue, 21 Jun 2016 10:41:23 +1000 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On 21 June 2016 at 06:12, Guido van Rossum wrote: > OK, basically you're arguing that knowing the definition order of class > attributes is often useful when (ab)using Python for things like schema or > form definitions. There are a few ways to go about it: > > 1. A hack using a global creation counter > > 2. Metaclass with __prepare__ > > 3. PEP 520 > 4a. Make all dicts OrderedDicts in CPython > > 4b. Ditto in the language standard > > If we can make the jump to (4b) soon enough I think we should skip PEP > 520; if not, I am still hemming and hawing about whether PEP 520 has enough > benefits over (2) to bother. > > Sorry Eric for making this so hard. The better is so often the enemy of > the good. I am currently somewhere between -0 and +0 on PEP 520. I'm not > sure if the work on (4a) is going to bear fruit in time for the 3.6 > feature freeze ; if > it goes well I think we should have a separate conversation (maybe even a > PEP?) about (4b). Maybe we should ask for feedback from the Jython > developers? 
(PyPy already has this IIUC, and IronPython > seems moribund?) > Although not a Jython developer, I've looked into the code a few times. The major stumbling block as I understand it will be that Jython uses a ConcurrentHashMap as the underlying structure for a dictionary. This would need to change to a concurrent LinkedHashMap, but there's no such thing in the standard library. The best option would appear to be https://github.com/ben-manes/concurrentlinkedhashmap. There are also plenty of other places that use maps and all of them would need to be looked at. In a lot of cases they're things like IdentityHashMap which may also need an ordered equivalent. There is a repo for Jython 3.5 development: https://github.com/jython/jython3 but it doesn't seem to be very active - only 11 commits in the last year (OTOH that's also in the last 3 months). Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Mon Jun 20 21:30:20 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 20 Jun 2016 19:30:20 -0600 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Fri, Jun 17, 2016 at 7:32 PM, Nick Coghlan wrote: > The discussion in the PEP 487 thread made me realise that I'd like to > see a discussion in PEP 520 regarding whether or not to define > __definition_order__ for builtin types initialised via PyType_Ready or > created via PyType_FromSpec in addition to defining it for types > created via the class statement or types.new_class(). 
> > For static types, PyType_Ready could potentially set it based on > tp_members, tp_methods & tp_getset (see > https://docs.python.org/3/c-api/typeobj.html ) > Similarly, PyType_FromSpec could potentially set it based on the > contents of Py_tp_members, Py_tp_methods and Py_tp_getset slot > definitions > > Having definition order support in both types.new_class() and builtin > types would also make it clear why we can't rely purely on the > compiler to provide the necessary ordering information - in both of > those cases, the Python compiler isn't directly involved in the type > creation process. I'll mention this in the PEP, but I'd rather not make it a part of the proposal. -eric From ericsnowcurrently at gmail.com Mon Jun 20 21:37:44 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 20 Jun 2016 19:37:44 -0600 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Mon, Jun 20, 2016 at 9:49 AM, Guido van Rossum wrote: > I agree it's better to define the order as computed at runtime. I don't > think there's much of a point to mandate that all builtin/extension types > reveal their order too -- I doubt there will be many uses for that -- but I > don't want to disallow it either. But we can allow types to define this, as > long as it's in their documentation (so users can rely on it in those > cases). Agreed. > > As another point of review, I don't like the exception for dunder names. I > can see that __module__, __name__ etc. are distractions, but since you're > adding methods, you should also add methods with dunder names. I still think that in practice the dunder names will be clutter that folks have to ignore. However, it's a relatively weak point given that it's easy to ignore dunder names. So I don't mind including them. 
> > The overlap with PEP 487 makes me think that this feature is clearly > desirable (I like the name you give it in PEP 520 better, and PEP 487 is too > vague about its definition). Agreed. > > Finally, it seems someone is working on making all dicts ordered. Does that > mean this will soon be obsolete? Nope. Having an ordered definition namespace by default does not give us __definition_order__ for free. Furthermore, the compact dict under consideration isn't strictly order-preserving (re-orders for deletion). -eric From ericsnowcurrently at gmail.com Mon Jun 20 22:11:09 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 20 Jun 2016 20:11:09 -0600 Subject: [Python-Dev] Review of PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Mon, Jun 20, 2016 at 10:32 AM, Guido van Rossum wrote: > - I don't like the exception for dunder names. I can see that __module__, > __name__ etc. that occur in every class are distractions, but since you're > adding methods, you should also add methods with dunder names like > __init__ or __getattr__. (Otherwise, what if someone really wanted to create > a Django form with a field named __dunder__?) Note that in that case they could set __definition_order__ manually in their class body. That said, I don't mind relaxing this if you think the common-case clutter is worth it for the case where a dunder name is relevant. You have a keen sense for this sort of situation. :) > - It's a shame we can't just make __dict__ (a proxy to) an OrderedDict, then > we wouldn't need an extra attribute. Hm, maybe we could customize the proxy > class so its keys(), values(), items() views return things in the order of > __definition_order__? I'm not sure it's worth it to mess with the proxy like that. Plus, I like how __definition_order__ makes it obvious what it is as well as more discoverable. 
> (In the tracker discussion this was considered a big > deal, but given that a class __dict__ is already a readonly proxy I'm not > sure I agree. Or is this about C level access? How much of that would > break?) I actually tried making the underlying class namespace (behind the proxy at __dict__) an OrderedDict. I ended up with a number of problems because of the pervasive use of the concrete dict API relative to the class dict. That API does not play well with subclasses. > > - I don't see why it needs to be a read-only attribute. There are very few > of those -- in general we let users play around with things unless we have a > hard reason to restrict assignment (e.g. the interpreter's internal state > could be compromised). I don't see such a hard reason here. I'm willing to change that. I figured we would start off treating it like we have other dunder attributes, some of which have become writable while others remain read-only. However, you are right that there is no danger in making it writable. > > - All in all the motivation is fairly weak -- it seems to be mostly > motivated on avoiding a custom metaclass for this purpose because combining > metaclasses is a pain. I realize it's only a small patch in a small corner > of the language, but I do worry about repercussions -- it's an API that's > going to be used for new (and useful) purposes so we will never be able to > get rid of it. True. It's certainly a very specific feature. The point is that we currently throw away the attribute order from class definitions. You can opt in to preserving the order using an appropriate metaclass. However, everything that would make use of that information (e.g. class decorators) would then have a prerequisite of that metaclass. That means such a tool could only consume classes that were designed to be used by the tool. Then there's the whole problem of metaclass conflicts (see PEP 487). 
If, instead, we always preserved the definition order then these problems (again, for an admittedly corner use case) go away. FWIW, regarding repercussions, I do not expect any other potential future feature will subsume the functionality of PEP 520. The closest thing would be if cls.__dict__ became ordered. However, that would intersect with __definition_order__ only at first. Furthermore, cls.__dict__ would only ever be able to make vague promises about any relationship with the definition order. The point of __definition_order__ is to provide the one obvious place to get a specific bit of information about a class. > > Note: I'm neither accepting nor rejecting the PEP; I'm merely inviting more > contemplation. Thanks. :) -eric From songofacandy at gmail.com Mon Jun 20 22:14:39 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Tue, 21 Jun 2016 11:14:39 +0900 Subject: [Python-Dev] PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: >> >> Finally, it seems someone is working on making all dicts ordered. Does that >> mean this will soon be obsolete? > > Nope. Having an ordered definition namespace by default does not give > us __definition_order__ for free. Furthermore, the compact dict under > consideration isn't strictly order-preserving (re-orders for > deletion). > compact ordered dict I proposed preserves insertion order even when some items are deleted. http://bugs.python.org/issue27350 Should I post a PEP for the compact dict? Here is my draft, but I haven't posted it yet since my English is much worse than C.
https://www.dropbox.com/s/s85n9b2309k03cq/pep-compact-dict.txt?dl=0 From raymond.hettinger at gmail.com Mon Jun 20 22:17:00 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 20 Jun 2016 19:17:00 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: <38A44E02-93EA-44BC-A68E-0A6F490136F9@gmail.com> > On Jun 20, 2016, at 5:41 PM, Tim Delaney wrote: > > Although not a Jython developer, I've looked into the code a few times. > > The major stumbling block as I understand it will be that Jython uses a ConcurrentHashMap as the underlying structure for a dictionary. This would need to change to a concurrent LinkedHashMap, but there's no such thing in the standard library. The best option would appear to be https://github.com/ben-manes/concurrentlinkedhashmap. > > There are also plenty of other places that use maps and all of them would need to be looked at. In a lot of cases they're things like IdentityHashMap which may also need an ordered equivalent. If you can, check with Jim Baker. At the language summit a few years ago, he and I sketched out a solution that he thought was doable without much effort and without much of a performance hit. IIRC, it involved using a ConcurrentHashMap augmented by an auxiliary 2-by-n-row array of indices (one for forward links and the other for backward links). There was also need to add a reentrant lock around the mutating methods. Raymond Hettinger From ericsnowcurrently at gmail.com Mon Jun 20 22:17:04 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 20 Jun 2016 20:17:04 -0600 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 4) Message-ID: I've updated PEP 520 to reflect a clearer focus on the definition order and less emphasis on OrderedDict. 
-eric ======================================= PEP: 520 Title: Preserving Class Attribute Definition Order Version: $Revision$ Last-Modified: $Date$ Author: Eric Snow Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 7-Jun-2016 Python-Version: 3.6 Post-History: 7-Jun-2016, 11-Jun-2016, 20-Jun-2016 Abstract ======== When a class is defined using a ``class`` statement, the class body is executed within a namespace. After the execution completes, that namespace is copied into a new ``dict`` and the original definition namespace is discarded. The new copy is stored away as the class's namespace and is exposed as ``__dict__`` through a read-only proxy. This PEP preserves the order in which the attributes in the definition namespace were added to it, before that namespace is discarded. This means it reflects the definition order of the class body. That order will now be preserved in the ``__definition_order__`` attribute of the class. This allows introspection of the original definition order, e.g. by class decorators. Additionally, this PEP changes the default class definition namespace to ``OrderedDict``. The long-lived class namespace (``__dict__``) will remain a ``dict``. Motivation ========== Currently Python does not preserve the order in which attributes are added to the class definition namespace. The namespace used during execution of a class body defaults to ``dict``. If the metaclass defines ``__prepare__()`` then the result of calling it is used. Thus, before this PEP, to access the order of your class definition namespace you must use ``OrderedDict`` along with a metaclass. Then you must preserve the definition order (from the ``OrderedDict``) yourself. This has a couple of problems. First, it requires the use of a metaclass. Metaclasses introduce an extra level of complexity to code and in some cases (e.g. conflicts) are a problem. So reducing the need for them is worth doing when the opportunity presents itself.
PEP 422 and PEP 487 discuss this at length. Given that we now have a C implementation of ``OrderedDict`` and that ``OrderedDict`` is the common use case for ``__prepare__()``, we have such an opportunity by defaulting to ``OrderedDict``. Second, only classes that opt in to using the ``OrderedDict``-based metaclass will have access to the definition order. This is problematic for cases where universal access to the definition order is important. One of the original motivating use cases for this PEP is generic class decorators that make use of the definition order. Specification ============= Part 1: * the order in which class attributes are defined is preserved in the new ``__definition_order__`` attribute on each class * "dunder" attributes (e.g. ``__init__``, ``__module__``) are ignored * ``__definition_order__`` is a ``tuple`` (or ``None``) * ``__definition_order__`` is a read-only attribute * ``__definition_order__`` is always set: 1. if ``__definition_order__`` is defined in the class body then it must be a ``tuple`` of identifiers or ``None``; any other value will result in ``TypeError`` 2. classes that do not have a class definition (e.g. builtins) have their ``__definition_order__`` set to ``None`` 3. classes for which ``__prepare__()`` returned something other than ``OrderedDict`` (or a subclass) have their ``__definition_order__`` set to ``None`` (except where #1 applies) Part 2: * the default class *definition* namespace is now ``OrderedDict`` The following code demonstrates roughly equivalent semantics for the default behavior:: class Meta(type): @classmethod def __prepare__(cls, *args, **kwargs): return OrderedDict() class Spam(metaclass=Meta): ham = None eggs = 5 __definition_order__ = tuple(k for k in locals() if not (k.startswith('__') and k.endswith('__'))) Note that [pep487_] proposes a similar solution, albeit as part of a broader proposal. Why a tuple?
------------ Use of a tuple reflects the fact that we are exposing the order in which attributes on the class were *defined*. Since the definition is already complete by the time ``__definition_order__`` is set, the content and order of the value won't be changing. Thus we use a type that communicates that state of immutability. Why a read-only attribute? -------------------------- As with the use of tuple, making ``__definition_order__`` a read-only attribute communicates the fact that the information it represents is complete. Since it represents the state of a particular one-time event (execution of the class definition body), allowing the value to be replaced would reduce confidence that the attribute corresponds to the original class body. If a use case for a writable (or mutable) ``__definition_order__`` arises, the restriction may be loosened later. Presently this seems unlikely and furthermore it is usually best to go immutable-by-default. Note that ``__definition_order__`` is centered on the class definition body. The use cases for dealing with the class namespace (``__dict__``) post-definition are a separate matter. ``__definition_order__`` would be a significantly misleading name for a feature focused on more than class definition. See [nick_concern_] for more discussion. Why ignore "dunder" names? -------------------------- Names starting and ending with "__" are reserved for use by the interpreter. In practice they should not be relevant to the users of ``__definition_order__``. Instead, for nearly everyone they would only be clutter, causing the same extra work for everyone. Why None instead of an empty tuple? ----------------------------------- A key objective of adding ``__definition_order__`` is to preserve information in class definitions which was lost prior to this PEP. One consequence is that ``__definition_order__`` implies an original class definition. 
Using ``None`` allows us to clearly distinguish classes that do not have a definition order. An empty tuple clearly indicates a class that came from a definition statement but did not define any attributes there. Why None instead of not setting the attribute? ---------------------------------------------- The absence of an attribute requires more complex handling than ``None`` does for consumers of ``__definition_order__``. Why constrain manually set values? ---------------------------------- If ``__definition_order__`` is manually set in the class body then it will be used. We require it to be a tuple of identifiers (or ``None``) so that consumers of ``__definition_order__`` may have a consistent expectation for the value. That helps maximize the feature's usefulness. We could also allow an arbitrary iterable for a manually set ``__definition_order__`` and convert it into a tuple. However, not all iterables infer a definition order (e.g. ``set``). So we opt in favor of requiring a tuple. Why is __definition_order__ even necessary? ------------------------------------------- Since the definition order is not preserved in ``__dict__``, it is lost once class definition execution completes. Classes *could* explicitly set the attribute as the last thing in the body. However, then independent decorators could only make use of classes that had done so. Instead, ``__definition_order__`` preserves this one bit of info from the class body so that it is universally available. Support for C-API Types ======================= Arguably, most C-defined Python types (e.g. built-in, extension modules) have a roughly equivalent concept of a definition order. So conceivably ``__definition_order__`` could be set for such types automatically. This PEP does not introduce any such support. However, it does not prohibit it either.
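To make these rules concrete, the Part 1 semantics can be emulated today in pure Python (a rough sketch only; ``DefinitionOrderMeta`` is a hypothetical helper, the sketch leaves the attribute writable, and with this PEP in place none of this machinery would be needed):

```python
from collections import OrderedDict

class DefinitionOrderMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        if '__definition_order__' in namespace:
            # Rule 1: honor a manually set value (tuple or None only).
            order = namespace['__definition_order__']
            if order is not None and not isinstance(order, tuple):
                raise TypeError('__definition_order__ must be a tuple or None')
        else:
            # Default: the definition order, with dunder names ignored.
            order = tuple(k for k in namespace
                          if not (k.startswith('__') and k.endswith('__')))
        cls = super().__new__(mcls, name, bases, dict(namespace))
        cls.__definition_order__ = order
        return cls

class Spam(metaclass=DefinitionOrderMeta):
    ham = None
    eggs = 5

Spam.__definition_order__       # ('ham', 'eggs')

class Explicit(metaclass=DefinitionOrderMeta):
    __definition_order__ = ('b', 'a')
    a = 1
    b = 2

Explicit.__definition_order__   # ('b', 'a')
```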
Compatibility ============= This PEP does not break backward compatibility, except in the case that someone relies *strictly* on ``dict`` as the class definition namespace. This shouldn't be a problem since ``issubclass(OrderedDict, dict)`` is true. Changes ============= In addition to the class syntax, the following expose the new behavior: * builtins.__build_class__ * types.prepare_class * types.new_class Other Python Implementations ============================ Pending feedback, the impact on Python implementations is expected to be minimal. If a Python implementation cannot support switching to ``OrderedDict``-by-default then it can always set ``__definition_order__`` to ``None``. Implementation ============== The implementation is found in the tracker. [impl_] Alternatives ============ cls.__dict__ as OrderedDict ------------------------------- Instead of storing the definition order in ``__definition_order__``, the now-ordered definition namespace could be copied into a new ``OrderedDict``. This would then be used as the mapping proxied as ``__dict__``. Doing so would mostly provide the same semantics. However, using ``OrderedDict`` for ``__dict__`` would obscure the relationship with the definition namespace, making it less useful. Additionally, doing this would require significant changes to the semantics of the concrete ``dict`` C-API. A "namespace" Keyword Arg for Class Definition ---------------------------------------------- PEP 422 introduced a new "namespace" keyword arg to class definitions that effectively replaces the need for ``__prepare__()``. [pep422_] However, the proposal was withdrawn in favor of the simpler PEP 487. A stdlib Metaclass that Implements __prepare__() with OrderedDict ----------------------------------------------------------------- This has all the same problems as writing your own metaclass. The only advantage is that you don't have to actually write this metaclass. So it doesn't offer any benefit in the context of this PEP.
Set __definition_order__ at Compile-time ---------------------------------------- Each class's ``__qualname__`` is determined at compile-time. This same concept could be applied to ``__definition_order__``. The result of composing ``__definition_order__`` at compile-time would be nearly the same as doing so at run-time. Comparative implementation difficulty aside, the key difference would be that at compile-time it would not be practical to preserve definition order for attributes that are set dynamically in the class body (e.g. ``locals()[name] = value``). However, they should still be reflected in the definition order. One possible resolution would be to require class authors to manually set ``__definition_order__`` if they define any class attributes dynamically. Ultimately, whether we use ``OrderedDict`` at run-time or compile-time discovery is almost entirely an implementation detail. References ========== .. [impl] issue #24254 (https://bugs.python.org/issue24254) .. [nick_concern] Nick's concerns about mutability (https://mail.python.org/pipermail/python-dev/2016-June/144883.html) .. [pep422] PEP 422 (https://www.python.org/dev/peps/pep-0422/#order-preserving-classes) .. [pep487] PEP 487 (https://www.python.org/dev/peps/pep-0487/#defining-arbitrary-namespaces) .. [orig] original discussion (https://mail.python.org/pipermail/python-ideas/2013-February/019690.html) .. [followup1] follow-up 1 (https://mail.python.org/pipermail/python-dev/2013-June/127103.html) .. [followup2] follow-up 2 (https://mail.python.org/pipermail/python-dev/2015-May/140137.html) Copyright =========== This document has been placed in the public domain. From turnbull at sk.tsukuba.ac.jp Sun Jun 12 02:43:20 2016 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sun, 12 Jun 2016 15:43:20 +0900 Subject: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
In-Reply-To: <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> References: <20160609124102.5EE4EB14024@webabinitio.net> <1465476616-sup-8510@lrrr.local> <5034383A-3A95-41EE-9326-983AA0AFEDC2@stufft.io> <5759EC2B.8040208@hastings.org> <87lh2dycuo.fsf@vostro.rath.org> <20160611074013.GL27919@ando.pearwood.info> <649D18FA-5076-4A69-8433-5D8A01EE23B4@stufft.io> <9F5471E7-CA58-4B87-A6BE-297C76222BA3@stufft.io> <9BA06FA0-62F1-4491-AB57-8A1CFBF8334A@stufft.io> Message-ID: Donald Stufft writes: > I guess one question would be, what does the secrets module do if > it's on a Linux that is too old to have getrandom(0), off the top > of my head I can think of: > > * Silently fall back to reading os.urandom and hope that it's been > seeded. > * Fall back to os.urandom and hope that it's been seeded and add a > SecurityWarning or something like it to mention that it's > falling back to os.urandom and it may be getting predictable > random from /dev/urandom. > * Hard fail because it can't guarantee secure cryptographic > random. I'm going to hide behind the Linux manpage (which actually suggests saving the data in a file to speed initialization at boot) in mentioning this: * if random_initialized_timestamp_pre_boot(): r = open("/dev/random", "rb") u = open("/dev/urandom", "wb") u.write(r.read(enough_bytes)) set_random_initialized_timestamp() # in theory, secrets can now use os.urandom From phd at phdru.name Mon Jun 20 23:17:27 2016 From: phd at phdru.name (Oleg Broytman) Date: Tue, 21 Jun 2016 05:17:27 +0200 Subject: [Python-Dev] PEP XXX: Compact ordered dict In-Reply-To: References: Message-ID: <20160621031727.GA7518@phdru.name> Hi! On Tue, Jun 21, 2016 at 11:14:39AM +0900, INADA Naoki wrote: > Here is my draft, but I haven't > posted it yet since > my English is much worse than C. > https://www.dropbox.com/s/s85n9b2309k03cq/pep-compact-dict.txt?dl=0 It's good enough for a start (if a PEP is needed at all). If you push it to Github I'm sure they will come with pull requests.
Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From songofacandy at gmail.com Tue Jun 21 01:02:52 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Tue, 21 Jun 2016 14:02:52 +0900 Subject: [Python-Dev] PEP XXX: Compact ordered dict In-Reply-To: <20160621031727.GA7518@phdru.name> References: <20160621031727.GA7518@phdru.name> Message-ID: On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote: > Hi! > > On Tue, Jun 21, 2016 at 11:14:39AM +0900, INADA Naoki wrote: >> Here is my draft, but I haven't >> posted it yet since >> my English is much worse than C. >> https://www.dropbox.com/s/s85n9b2309k03cq/pep-compact-dict.txt?dl=0 > > It's good enough for a start (if a PEP is needed at all). If you push > it to Github I'm sure they will come with pull requests. > > Oleg. Thank you for reading my draft. > (if a PEP is needed at all) I don't think so. My PEP is not for changing the Python language, just for describing an implementation detail. Python 3.5 got a new OrderedDict implemented in C without a PEP. My patch is smaller than that one was, and the idea is well known. -- INADA Naoki From songofacandy at gmail.com Tue Jun 21 11:10:15 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 22 Jun 2016 00:10:15 +0900 Subject: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict Message-ID: I'm sorry, but I hadn't realized that the compact ordered dict is not ordered for split tables. For example: >>> class A: ... ... ... >>> a = A() >>> b = A() >>> a.a = 1 >>> a.b = 2 >>> b.b = 3 >>> b.a = 4 >>> a.__dict__.items() dict_items([('a', 1), ('b', 2)]) >>> b.__dict__.items() dict_items([('a', 4), ('b', 3)]) This doesn't affect **kwargs or class namespaces. But if we change the language spec so that dict preserves insertion order, this should be addressed. On Tue, Jun 21, 2016 at 2:02 PM, INADA Naoki wrote: > On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote: >> Hi!
>> >> On Tue, Jun 21, 2016 at 11:14:39AM +0900, INADA Naoki wrote: >>> Here is my draft, but I haven't >>> posted it yet since >>> my English is much worse than C. >>> https://www.dropbox.com/s/s85n9b2309k03cq/pep-compact-dict.txt?dl=0 >> >> It's good enough for a start (if a PEP is needed at all). If you push >> it to Github I'm sure they will come with pull requests. >> >> Oleg. > > Thank you for reading my draft. > >> (if a PEP is needed at all) > > I don't think so. My PEP is not for changing Python Language, > just describe implementation detail. > > Python 3.5 has new OrderedDict implemented in C without PEP. > My patch is relatively small than it. And the idea has been well known. > > -- > INADA Naoki -- INADA Naoki From ericsnowcurrently at gmail.com Tue Jun 21 13:12:26 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 21 Jun 2016 11:12:26 -0600 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Mon, Jun 20, 2016 at 12:31 PM, Nikita Nemkin wrote: > Right. Ordered by default is a very serious implementation constraint. > It's only superior in a sense that it completely subsumes/obsoletes > PEP 520. Just to be clear, PEP 520 is more than just OrderedDict-by-default. In fact, the key point is preserving the definition order, which the PEP now reflects better. Raymond's compact dict would only provide the ordered-by-default part and does nothing to persist the definition order like the PEP specifies. -eric From guido at python.org Tue Jun 21 13:18:50 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 21 Jun 2016 10:18:50 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Tue, Jun 21, 2016 at 10:12 AM, Eric Snow wrote: > On Mon, Jun 20, 2016 at 12:31 PM, Nikita Nemkin wrote: > > Right. Ordered by default is a very serious implementation constraint. 
> > It's only superior in a sense that it completely subsumes/obsoletes > > PEP 520. > > Just to be clear, PEP 520 is more than just OrderedDict-by-default. > In fact, the key point is preserving the definition order, which the > PEP now reflects better. Raymond's compact dict would only provide > the ordered-by-default part and does nothing to persist the definition > order like the PEP specifies. > Judging from Inada's message there seems to be some confusion about how well the compact dict preserves order (personally I think if it doesn't guarantee order after deletions it's pretty useless). Assuming it preserves order across deletions/compactions (like IIUC OrderedDict does) isn't that good enough for any of the use cases considered? It would require a delete+insert to change an item's order. If we had had these semantics in the language from the start, there would have been plenty uses of this order, and I suspect nobody would have considered asking for __definition_order__. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Tue Jun 21 13:50:00 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 21 Jun 2016 10:50:00 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: <32C4B383-4DD4-4F44-A522-55E6EA96FFE1@gmail.com> > On Jun 21, 2016, at 10:18 AM, Guido van Rossum wrote: > > Judging from Inada's message there seems to be some confusion about how well the compact dict preserves order (personally I think if it doesn't guarantee order after deletions it's pretty useless). Inada should follow PyPy's implementation of the compact dict which does preserve order after deletions (see below). My original proof-of-concept code didn't have that feature; instead, it was aimed at saving space and speeding-up iteration. The key ordering was just a by-product. 
Additional logic was needed to preserve order for interleaved insertions and deletions. Raymond ---(PyPy test of order preservation)------------------------------------------------------------- 'Demonstrate PyPy preserves order across repeated insertions and deletions' from random import randrange import string s = list(string.letters) d = dict.fromkeys(s) n = len(s) for _ in range(10000): i = randrange(n) c = s.pop(i); s.append(c) d.pop(c); d[c] = None assert d.keys() == s ---(PyPy session showing order preservation)-------------------------------------------------- $ pypy Python 2.7.10 (c09c19272c990a0611b17569a0085ad1ab00c8ff, Jun 13 2016, 03:59:08) [PyPy 5.3.0 with GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>>> d = dict(raymond='red', rachel='blue', matthew='yellow') >>>> del d['rachel'] >>>> d['cindy'] = 'green' >>>> d['rachel'] = 'azure' >>>> d {'raymond': 'red', 'matthew': 'yellow', 'cindy': 'green', 'rachel': 'azure'} From storchaka at gmail.com Tue Jun 21 16:48:09 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 21 Jun 2016 23:48:09 +0300 Subject: [Python-Dev] When to use EOFError? Message-ID: There is a design question. If you read a file in some format or with some protocol, and the data ends unexpectedly, when should you use the general EOFError exception and when a format/protocol-specific exception? For example, when loading truncated pickle data, an unpickler can raise EOFError, UnpicklingError, ValueError or AttributeError. It is possible to avoid ValueError or AttributeError, but what exception should be raised instead, EOFError or UnpicklingError? Maybe convert all EOFError to UnpicklingError? Or all UnpicklingError caused by unexpectedly ended input to EOFError? Or raise EOFError if the input ends after a completed opcode, and UnpicklingError if it contains a truncated opcode?
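One quick way to see the current mixture is to truncate a pickle at every byte offset and record what gets raised (a rough sketch; exactly which exception appears at which offset depends on the pickle protocol and implementation):

```python
import pickle
from collections import Counter

data = pickle.dumps({"spam": 1, "eggs": [1, 2, 3]})
outcomes = Counter()
for cut in range(len(data)):
    try:
        pickle.loads(data[:cut])
    except Exception as exc:        # survey whatever actually gets raised
        outcomes[type(exc).__name__] += 1
    else:
        outcomes["no error"] += 1

print(dict(outcomes))   # typically a mix (e.g. EOFError and UnpicklingError)
```

Every proper prefix lacks the final STOP opcode, so every truncation fails; the interesting part is how many different exception types show up.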
http://bugs.python.org/issue25761 From ncoghlan at gmail.com Tue Jun 21 17:01:15 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jun 2016 14:01:15 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On 21 June 2016 at 10:18, Guido van Rossum wrote: > On Tue, Jun 21, 2016 at 10:12 AM, Eric Snow > wrote: >> >> On Mon, Jun 20, 2016 at 12:31 PM, Nikita Nemkin wrote: >> > Right. Ordered by default is a very serious implementation constraint. >> > It's only superior in a sense that it completely subsumes/obsoletes >> > PEP 520. >> >> Just to be clear, PEP 520 is more than just OrderedDict-by-default. >> In fact, the key point is preserving the definition order, which the >> PEP now reflects better. Raymond's compact dict would only provide >> the ordered-by-default part and does nothing to persist the definition >> order like the PEP specifies. > > > Judging from Inada's message there seems to be some confusion about how well > the compact dict preserves order (personally I think if it doesn't guarantee > order after deletions it's pretty useless). > > Assuming it preserves order across deletions/compactions (like IIUC > OrderedDict does) isn't that good enough for any of the use cases > considered? It would require a delete+insert to change an item's order. If > we had had these semantics in the language from the start, there would have > been plenty uses of this order, and I suspect nobody would have considered > asking for __definition_order__. 
Right, if *tp_dict itself* on type objects is guaranteed to be order-preserving, then we don't need to do anything except perhaps provide a helper method or descriptor on type that automatically filters out the dunder-attributes, and spell out the type dict population order for: - class statements (universal) - types.new_class (universal) - calling type() directly (universal) - PyType_Ready (CPython-specific) - PyType_FromSpec (CPython-specific) Something that isn't currently defined in PEP 520, and probably should be regardless of whether the final implementation is an order preserving tp_dict or a new __definition_order__ attribute, is where descriptors implicitly defined via __slots__ will appear relative to other attributes. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Jun 21 17:10:58 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jun 2016 14:10:58 -0700 Subject: [Python-Dev] frame evaluation API PEP In-Reply-To: References: Message-ID: On 20 June 2016 at 13:32, Dino Viehland via Python-Dev wrote: > It doesn't help with the issue of potentially multiple consumers of that field > that has been brought up before but I'm not sure how concerned we should be > about that scenario anyway. Brett's comparison with sys.settrace seems relevant here - we don't allow multiple trace hooks at once, which means if you want more than one active at once, either they need to cooperate with each other, or you need to install a meta-tracehook to manage them somehow. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Tue Jun 21 17:17:00 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 21 Jun 2016 23:17:00 +0200 Subject: [Python-Dev] When to use EOFError? In-Reply-To: References: Message-ID: When loading truncated data with pickle, I expect a pickle error, not a generic ValueError nor EOFError.
Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jun 21 17:21:21 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jun 2016 14:21:21 -0700 Subject: [Python-Dev] Review of PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On 20 June 2016 at 19:11, Eric Snow wrote: > FWIW, regarding repercussions, I do not expect any other potential > future feature will subsume the functionality of PEP 520. The closest > thing would be if cls.__dict__ became ordered. However, that would > intersect with __definition_order__ only at first. Furthermore, > cls.__dict__ would only ever be able to make vague promises about any > relationship with the definiton order. The point of > __definiton_order__ is to provide the one obvious place to get a > specific bit of information about a class. It occurs to me that a settable __definition_order__ provides a benefit that an ordered tp_dict doesn't: to get the "right" definition order in something like Cython or dynamic type creation, you don't need to carefully craft the order in which attributes are defined, you just need to set __definition_order__ appropriately. It also means that the "include dunder-attributes or not" decision is easy to override, regardless of what we set as the default. By contrast, if the *only* ordering information is cls.__dict__.keys(), then there's no way for a type implementor to hide implementation details. Cheers, Nick. 
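P.S. A quick sketch of the "just set it" idiom for dynamic type creation (``make_record`` is a made-up helper; today this merely sets an ordinary class attribute, which matches the manually-set ``__definition_order__`` form the PEP already allows):

```python
def make_record(name, fields):
    # Declare the definition order directly instead of choreographing
    # the order in which the namespace is populated.
    ns = {field: None for field in fields}
    ns['__definition_order__'] = tuple(fields)
    return type(name, (), ns)

Point = make_record('Point', ['x', 'y', 'z'])
Point.__definition_order__   # ('x', 'y', 'z')
```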
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From songofacandy at gmail.com Tue Jun 21 19:09:09 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 22 Jun 2016 08:09:09 +0900 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: <32C4B383-4DD4-4F44-A522-55E6EA96FFE1@gmail.com> References: <32C4B383-4DD4-4F44-A522-55E6EA96FFE1@gmail.com> Message-ID: On Wed, Jun 22, 2016 at 2:50 AM, Raymond Hettinger wrote: > >> On Jun 21, 2016, at 10:18 AM, Guido van Rossum wrote: >> >> Judging from Inada's message there seems to be some confusion about how well the compact dict preserves order (personally I think if it doesn't guarantee order after deletions it's pretty useless). > > Inada should follow PyPy's implementation of the compact dict which does preserve order after deletions (see below). I follow it, for most cases. The one case where my compact dict doesn't preserve order is the PEP 412 key-sharing dict. >>> class A: ... ... ... >>> a = A() >>> b = A() # a and b share the same keys, with separate values >>> a.a = 1 >>> a.b = 2 # The order in the shared keys is (a, b) >>> b.b = 3 >>> b.a = 4 >>> a.__dict__.items() dict_items([('a', 1), ('b', 2)]) >>> b.__dict__.items() dict_items([('a', 4), ('b', 3)]) It's possible to split the keys when the insertion order is not strictly the same. But that decreases the efficiency of the key-sharing dict. If the key-sharing dict is effective only in such very strict cases, I feel __slots__ can be used instead.
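For comparison, a quick illustration of the ``__slots__`` route mentioned above (``SlottedA`` is just an illustrative name): slotted instances have no per-instance ``__dict__`` at all, so there is nothing to share and no order to disagree about:

```python
class SlottedA:
    __slots__ = ('a', 'b')   # fixed attribute layout, no per-instance __dict__

obj = SlottedA()
obj.a = 1
obj.b = 2

hasattr(obj, '__dict__')   # False
```

Trying to set any attribute outside the declared slots (e.g. obj.c) raises AttributeError, which is also why __slots__ is the more predictable option memory-wise.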
From ericsnowcurrently at gmail.com Tue Jun 21 20:33:11 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 21 Jun 2016 18:33:11 -0600 Subject: [Python-Dev] Review of PEP 520: Ordered Class Definition Namespace In-Reply-To: References: Message-ID: On Tue, Jun 21, 2016 at 3:21 PM, Nick Coghlan wrote: > It occurs to me that a settable __definition_order__ provides a > benefit that an ordered tp_dict doesn't: to get the "right" definition > order in something like Cython or dynamic type creation, you don't > need to carefully craft the order in which attributes are defined, you > just need to set __definition_order__ appropriately. > > It also means that the "include dunder-attributes or not" decision is > easy to override, regardless of what we set as the default. > > By contrast, if the *only* ordering information is > cls.__dict__.keys(), then there's no way for a type implementor to > hide implementation details. Good point. I'll make a note of this in the PEP. -eric From ericsnowcurrently at gmail.com Tue Jun 21 20:41:53 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 21 Jun 2016 18:41:53 -0600 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Tue, Jun 21, 2016 at 11:18 AM, Guido van Rossum wrote: > If we had had these semantics in the language from the start, there would have > been plenty uses of this order, and I suspect nobody would have considered > asking for __definition_order__. True. The key thing that __definition_order__ provides is an explicit relationship with the class definition. Since we have the opportunity to capture that now, I think we should take it, regardless of the type of the class definition namespace or even of cls.__dict__. For me the strong association with the order in the class definition is worth having. 
-eric From ericsnowcurrently at gmail.com Tue Jun 21 20:50:19 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 21 Jun 2016 18:50:19 -0600 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Tue, Jun 21, 2016 at 3:01 PM, Nick Coghlan wrote: > Right, if *tp_dict itself* on type objects is guaranteed to be > order-preserving, then we don't need to do anything except perhaps > provide a helper method or descriptor on type that automatically > filters out the dunder-attributes, and spell out the type dict > population order for: > > - class statements (universal) > - types.new_class (universal) > - calling type() directly (universal) > - PyType_Ready (CPython-specific) > - PyType_FromSpec (CPython-specific) The problem I have with this is that it still doesn't give any strong relationship with the class definition. Certainly in most cases it will amount to the same thing. However, there is no way to know if cls.__dict__ represents the class definition or not. You also lose knowing whether or not a class came from a definition (or acts as though it did). Finally, __definition_order__ makes the relationship with the definition order clear, whereas cls.__dict__ does not. Instead of being an obvious tool, with cls.__dict__ that relationship would be tucked away where only a few folks with deep knowledge of Python would be in a position to take advantage of it. > > Something that isn't currently defined in PEP 520, and probably should > be regardless of whether the final implementation is an order > preserving tp_dict or a new __definition_order__ attribute, is where > descriptors implicitly defined via __slots__ will appear relative to > other attributes. I'll add that. -eric From songofacandy at gmail.com Tue Jun 21 23:40:48 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 22 Jun 2016 12:40:48 +0900 Subject: [Python-Dev] Compact ordered dict is not ordered for split table.
(was: PEP XXX: Compact ordered dict In-Reply-To: References: Message-ID: There are three options I can think of.

1) Revert the key-shared dict (PEP 412).

pros: Removing the key-shared dict makes the dict implementation simple.
cons: In some applications, PEP 412 is far more compact than the compact ordered dict. (Note: using __slots__ may help in such situations.)

2) Don't make "keeping insertion order" part of the Python language spec.

pros: Best efficiency.
cons: Different behavior between a normal dict and instance.__dict__ may confuse people.

3) A stricter rule for the key-sharing dict.

My idea is:
* Increasing the number of entries (inserting a new key) is possible only if the refcnt of the keys object == 1.
* Inserting a new item (with an existing key) into a dict is allowed only when the insertion position == the number of items in the dict (PyDictObject.ma_used).

pros: We can have "dict keeping insertion order".
cons: Can't use the key-sharing dict in many cases. A small and seemingly harmless change may cause a sudden memory usage increase. (__slots__ is more predictable.)

On Wed, Jun 22, 2016 at 12:10 AM, INADA Naoki wrote: > I'm sorry, but I hadn't realized which compact ordered dict is > not ordered for split table. > > For example: >>>> class A: > ... ... > ... >>>> a = A() >>>> b = A() >>>> a.a = 1 >>>> a.b = 2 >>>> b.b = 3 >>>> b.a = 4 >>>> a.__dict__.items() > dict_items([('a', 1), ('b', 2)]) >>>> b.__dict__.items() > dict_items([('a', 4), ('b', 3)]) > > > This doesn't affects to **kwargs and class namespace. > > But if we change the language spec to dict preserves insertion order, > this should be addressed. > > > On Tue, Jun 21, 2016 at 2:02 PM, INADA Naoki wrote: >> On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote: >>> Hi! >>> >>> On Tue, Jun 21, 2016 at 11:14:39AM +0900, INADA Naoki wrote: >>>> Here is my draft, but I haven't >>>> posted it yet since >>>> my English is much worse than C. >>>> https://www.dropbox.com/s/s85n9b2309k03cq/pep-compact-dict.txt?dl=0 >>> >>> It's good enough for a start (if a PEP is needed at all).
If you push >>> it to Github I'm sure they will come with pull requests. >>> >>> Oleg. >> >> Thank you for reading my draft. >> >>> (if a PEP is needed at all) >> >> I don't think so. My PEP is not for changing Python Language, >> just describe implementation detail. >> >> Python 3.5 has new OrderedDict implemented in C without PEP. >> My patch is relatively small than it. And the idea has been well known. >> >> -- >> INADA Naoki > > > > -- > INADA Naoki -- INADA Naoki From greg.ewing at canterbury.ac.nz Wed Jun 22 01:34:48 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 22 Jun 2016 17:34:48 +1200 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: <576A2378.9090508@canterbury.ac.nz> Nick Coghlan wrote: > Something that isn't currently defined in PEP 520 ... is where > descriptors implicitly defined via __slots__ will appear relative to > other attributes. In the place where the __slots__ attribute appears? -- Greg From songofacandy at gmail.com Wed Jun 22 07:48:59 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 22 Jun 2016 20:48:59 +0900 Subject: [Python-Dev] PEP XXX: Compact ordered dict In-Reply-To: <20160621031727.GA7518@phdru.name> References: <20160621031727.GA7518@phdru.name> Message-ID: FYI, Here is calculated size of each dict by len(d). https://docs.google.com/spreadsheets/d/1nN5y6IsiJGdNxD7L7KBXmhdUyXjuRAQR_WbrS8zf6mA/edit?usp=sharing On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote: > Hi! > > On Tue, Jun 21, 2016 at 11:14:39AM +0900, INADA Naoki wrote: >> Here is my draft, but I haven't >> posted it yet since >> my English is much worse than C. >> https://www.dropbox.com/s/s85n9b2309k03cq/pep-compact-dict.txt?dl=0 > > It's good enough for a start (if a PEP is needed at all). If you push > it to Github I'm sure they will come with pull requests. > > Oleg. 
> -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com -- INADA Naoki From random832 at fastmail.com Wed Jun 22 10:17:27 2016 From: random832 at fastmail.com (Random832) Date: Wed, 22 Jun 2016 10:17:27 -0400 Subject: [Python-Dev] Why are class dictionaries not accessible? Message-ID: <1466605047.4114458.645279001.61C40E1D@webmail.messagingengine.com> The documentation states: """Objects such as modules and instances have an updateable __dict__ attribute; however, other objects may have write restrictions on their __dict__ attributes (for example, classes use a dictproxy to prevent direct dictionary updates).""" However, it's not clear from that *why* direct dictionary updates are undesirable. This not only prevents you from getting a reference to the real class dict (which is the apparent goal), but is also the fundamental reason why you can't use a metaclass to put, say, an OrderedDict in its place - because the type constructor has to copy the dict that was used in class preparation into a new dict rather than using the one that was actually returned by __prepare__. [Also, the name of the type used for this is mappingproxy, not dictproxy] From random832 at fastmail.com Wed Jun 22 10:31:27 2016 From: random832 at fastmail.com (Random832) Date: Wed, 22 Jun 2016 10:31:27 -0400 Subject: [Python-Dev] When to use EOFError? In-Reply-To: References: Message-ID: <1466605887.4117246.645287633.6E053CD9@webmail.messagingengine.com> On Tue, Jun 21, 2016, at 16:48, Serhiy Storchaka wrote: > There is a design question. 
If you read file in some format or with some > protocol, and the data is ended unexpectedly, when to use general > EOFError exception and when to use format/protocol specific exception? > > For example when load truncated pickle data, an unpickler can raise > EOFError, UnpicklingError, ValueError or AttributeError. It is possible > to avoid ValueError or AttributeError, but what exception should be > raised instead, EOFError or UnpicklingError? Maybe convert all EOFError > to UnpicklingError? I think this is the most appropriate. If the calling code needs to know the original reason it can find it in __cause__. My instinct, though, (and I'm aware that others may not agree, but I thought it was worth bringing up) is that loads should actually always raise a ValueError, i.e. my mental model of loads is like: def loads(s): f = BytesIO(s) try: return load(f) except UnpicklingError as e: raise ValueError from e From guido at python.org Wed Jun 22 11:11:19 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Jun 2016 08:11:19 -0700 Subject: [Python-Dev] Why are class dictionaries not accessible? In-Reply-To: <1466605047.4114458.645279001.61C40E1D@webmail.messagingengine.com> References: <1466605047.4114458.645279001.61C40E1D@webmail.messagingengine.com> Message-ID: On Wed, Jun 22, 2016 at 7:17 AM, Random832 wrote: > The documentation states: """Objects such as modules and instances have > an updateable __dict__ attribute; however, other objects may have write > restrictions on their __dict__ attributes (for example, classes use a > dictproxy to prevent direct dictionary updates).""" > > However, it's not clear from that *why* direct dictionary updates are > undesirable. 
This not only prevents you from getting a reference to the > real class dict (which is the apparent goal), but is also the > fundamental reason why you can't use a metaclass to put, say, an > OrderedDict in its place - because the type constructor has to copy the > dict that was used in class preparation into a new dict rather than > using the one that was actually returned by __prepare__. > > [Also, the name of the type used for this is mappingproxy, not > dictproxy] > This is done in order to force all mutations of the class dict to go through attribute assignments on the class. The latter takes care of updating the class struct, e.g. if you were to add an `__add__` method dynamically it would update tp_as_number->nb_add. If you could modify the dict object directly it would be more difficult to arrange for this side effect. -- --Guido van Rossum (python.org/~guido) From nd at perlig.de Wed Jun 22 12:22:49 2016 From: nd at perlig.de (=?iso-8859-1?q?Andr=E9_Malo?=) Date: Wed, 22 Jun 2016 18:22:49 +0200 Subject: [Python-Dev] When to use EOFError? In-Reply-To: References: Message-ID: <201606221822.49308@news.perlig.de> * Serhiy Storchaka wrote: > There is a design question. If you read file in some format or with some > protocol, and the data is ended unexpectedly, when to use general > EOFError exception and when to use format/protocol specific exception? > > For example when load truncated pickle data, an unpickler can raise > EOFError, UnpicklingError, ValueError or AttributeError. It is possible > to avoid ValueError or AttributeError, but what exception should be > raised instead, EOFError or UnpicklingError? Maybe convert all EOFError > to UnpicklingError? Or all UnpicklingError caused by unexpectedly ended > input to EOFError? Or raise EOFError if the input is ended after > completed opcode, and UnpicklingError if it contains truncated opcode?
I often concatenate multiple pickles into one file. When reading them, it works like this: try: while True: yield pickle.load(fp) except EOFError: pass In this case the truncation is not really unexpected. Maybe it should distinguish between truncated-in-the-middle and truncated-because-empty. (Same goes for marshal) Cheers, -- Real programmers confuse Christmas and Halloween because DEC 25 = OCT 31. -- Unknown (found in ssl_engine_mutex.c) From ericfahlgren at gmail.com Wed Jun 22 13:05:50 2016 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Wed, 22 Jun 2016 10:05:50 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: <08cf01d1cca8$54e2d960$fea88c20$@gmail.com> On Wed 2016-06-22 Eric Snow [mailto:ericsnowcurrently at gmail.com] wrote: > The problem I have with this is that it still doesn't give any strong relationship with the class definition. > Certainly in most cases it will amount to the same thing. However, there is no way to know if cls.__dict__ > represents the class definition or not. You also lose knowing whether or not a class came from a definition > (or acts as though it did). Finally, __definition_order__ makes the relationship with the definition order clear, > whereas cls.__dict__ does not. > Instead of being an obvious tool, with cls.__dict__ that relationship would be tucked away where only a > few folks with deep knowledge of Python would be in a position to take advantage. I see this as being grossly/loosely analogous to traversing __bases__ vs calling mro(), so I feel the same rationale applies to adding __definition_order__ as mro. Eric From songofacandy at gmail.com Wed Jun 22 13:23:18 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 23 Jun 2016 02:23:18 +0900 Subject: [Python-Dev] Idea: more compact, interned string key only dict for namespace. Message-ID: As my last email, compact ordered dict can't preserve insertion order of key sharing dict (PEP 412). 
I'm thinking about deprecating the key-shared dict for now.

Instead, my new idea is to introduce a more compact dict specialized for namespaces.

If the BDFL (or a BDFL delegate) likes this idea, I'll take another week to implement it.

Background
----------------

* Most keys of a namespace dict are strings.
* Calculating the hash of a string is cheap (one memory access, thanks to the cached hash).
* And most keys are interned already.

Design
----------

Instead of the normal PyDictKeyEntry, use a PyInternedKeyEntry like this:

typedef struct {
    // no me_hash
    PyObject *me_key, *me_value;
} PyInternedKeyEntry;

insertdict() interns the key if it's unicode; otherwise it converts the dict to a normal compact ordered dict.

lookdict_interned() compares only pointers (it doesn't call unicode_eq()) when the search key is interned.

And add a new internal API to create an interned-key-only dict:

PyDictObject* _PyDict_NewForNamespace();

Memory usage
--------------------

on the amd64 arch.

key-sharing dict:
* 96 bytes for ~3 items
* 128 bytes for 4~5 items.

compact dict:
* 224 bytes for ~5 items.
(232 bytes if we keep supporting the key-shared dict)

interned key only dict:
* 184 bytes for ~5 items

Note
------

The interned-key-only dict is still larger than the key-shared dict.

But it can be used for more purposes. For example, it can be used for interning strings, or for the kwargs dict when all keys are interned already.

If we provide _PyDict_NewForNamespace to extension modules, the json decoder could have an option to use this, too. -- INADA Naoki From mark at hotpy.org Wed Jun 22 21:30:01 2016 From: mark at hotpy.org (Mark Shannon) Date: Wed, 22 Jun 2016 18:30:01 -0700 Subject: [Python-Dev] Idea: more compact, interned string key only dict for namespace. In-Reply-To: References: Message-ID: <576B3B99.6070204@hotpy.org> Hi all, I think we need some more data before going any further reimplementing dicts.
What I would like to know is, across a set of Python programs (ideally a representative set), what the proportion of dicts in memory at any one time are: a) instance dicts b) other namespace dicts (classes and modules) c) data dicts with all string keys d) other data dicts e) keyword argument dicts (I'm guessing this is vanishingly small) I would expect that (a) far exceeds (b) and depending on the application also considerably exceeds (c), but I would like some real data. From that we can compute the (approximate) memory costs of the competing designs. As an aside, if anyone is really keen to save memory, then removing the cycle GC header is the thing to do. That uses 24 bytes per object and *half* of all live objects have it. And don't forget that any Python object is really two objects, the object and its dict, so that is 48 extra bytes every time you create a new object. Cheers, Mark. On 22/06/16 10:23, INADA Naoki wrote: > As my last email, compact ordered dict can't preserve > insertion order of key sharing dict (PEP 412). > > I'm thinking about deprecating key shared dict for now. > > Instead, my new idea is introducing more compact dict > specialized for namespace. > > If BDFL (or BDFL delegate) likes this idea, I'll take another > one week to implement this. > > > Background > ---------------- > > * Most keys of namespace dict are string. > * Calculating hash of string is cheap (one memory access, thanks for cache). > * And most keys are interned already. > > > Design > ---------- > > Instead of normal PyDictKeyEntry, use PyInternedKeyEntry like this. > > typedef struct { > // no me_hash > PyObject *me_key, *me_value; > } PyInternedKeyEntry; > > > insertdict() interns key if it's unicode, otherwise it converts dict to > normal compact ordered dict. > > lookdict_interned() compares only pointer (doesn't call unicode_eq()) > when searching key is interned. > > And add new internal API to create interned key only dict. 
> > PyDictObject* _PyDict_NewForNamespace(); > > > Memory usage > -------------------- > > on amd64 arch. > > key-sharing dict: > > * 96 bytes for ~3 items > * 128 bytes for 4~5 items. > > compact dict: > > * 224 bytes for ~5 items. > > (232 bytes when keep supporting key-shared dict) > > interned key only dict: > > * 184 bytes for ~5 items > > > Note > ------ > > Interned key only dict is still larger than key-shared dict. > > But it can be used for more purpose. It can be used for interning string > for example. It can be used to kwargs dict when all keys are interned already. > > If we provide _PyDict_NewForNamespace to extension modules, > json decoder can have option to use this, too. > > From songofacandy at gmail.com Thu Jun 23 00:08:27 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 23 Jun 2016 13:08:27 +0900 Subject: [Python-Dev] Idea: more compact, interned string key only dict for namespace. In-Reply-To: References: Message-ID: > Memory usage > -------------------- > > on amd64 arch. > > key-sharing dict: > > * 96 bytes for ~3 items > * 128 bytes for 4~5 items. Note: There are another shared key. * 128 bytes for ~3 items * 224 bytes for 4~5 items So, let S = how many instances shares the key, * 90 + (96 / S) bytes for ~3 items * 128 + (224 / S) bytes for 4~5 items > > compact dict: > > * 224 bytes for ~5 items. > > (232 bytes when keep supporting key-shared dict) > > interned key only dict: > > * 184 bytes for ~5 items > > > Note > ------ > > Interned key only dict is still larger than key-shared dict. > > But it can be used for more purpose. It can be used for interning string > for example. It can be used to kwargs dict when all keys are interned already. > > If we provide _PyDict_NewForNamespace to extension modules, > json decoder can have option to use this, too. 
> > > -- > INADA Naoki -- INADA Naoki From songofacandy at gmail.com Thu Jun 23 00:43:17 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 23 Jun 2016 13:43:17 +0900 Subject: [Python-Dev] Idea: more compact, interned string key only dict for namespace. In-Reply-To: <576B3B99.6070204@hotpy.org> References: <576B3B99.6070204@hotpy.org> Message-ID: Hi, Mark. Thank you for the reply. On Thu, Jun 23, 2016 at 10:30 AM, Mark Shannon wrote: > Hi all, > > I think we need some more data before going any further reimplementing > dicts. > > What I would like to know is, across a set of Python programs (ideally a > representative set), what the proportion of dicts in memory at any one time > are: > > a) instance dicts > b) other namespace dicts (classes and modules) > c) data dicts with all string keys > d) other data dicts > e) keyword argument dicts (I'm guessing this is vanishingly small) > > I would expect that (a) far exceeds (b) and depending on the application > also considerably exceeds (c), but I would like some real data. > From that we can compute the (approximate) memory costs of the competing > designs. I think you're right, but I don't have a clear idea of how to do it. Is there an existing effort to collect statistics on dicts? > > As an aside, if anyone is really keen to save memory, then removing the > cycle GC header is the thing to do. > That uses 24 bytes per object and *half* of all live objects have it. > And don't forget that any Python object is really two objects, the object > and its dict, so that is 48 extra bytes every time you create a new object. > It's a great idea, but I can't do it before Python 3.6. My main concern is not saving memory; it is an ordered dict for **kwargs without significant overhead. If "ordered, except for the key-sharing dict" is acceptable, no problem. The key-sharing compact dict is smaller than the current key-sharing dict of Python 3.5 in most cases.
https://docs.google.com/spreadsheets/d/1nN5y6IsiJGdNxD7L7KBXmhdUyXjuRAQR_WbrS8zf6mA/edit#gid=0 Regards, -- INADA Naoki From songofacandy at gmail.com Thu Jun 23 03:41:21 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 23 Jun 2016 16:41:21 +0900 Subject: [Python-Dev] Idea: more compact, interned string key only dict for namespace. In-Reply-To: References: <576B3B99.6070204@hotpy.org> Message-ID: I've checked time and maxrss of sphinx-build. In case of sphinx, ## master $ rm -rf build/ $ /usr/bin/time ~/local/python-master/bin/sphinx-build -b html -d build/doctrees -D latex_paper_size= . build/html -QN 71.76user 0.27system 1:12.06elapsed 99%CPU (0avgtext+0avgdata 176248maxresident)k 80inputs+202888outputs (2major+58234minor)pagefaults 0swaps 71.86user 0.28system 1:12.16elapsed 99%CPU (0avgtext+0avgdata 176312maxresident)k 0inputs+201480outputs (0major+59897minor)pagefaults 0swaps ## compact-dict w/ shared $ rm -rf build/ $ /usr/bin/time ~/local/python-compact/bin/sphinx-build -b html -d build/doctrees -D latex_paper_size= . build/html -QN 72.18user 0.27system 1:12.47elapsed 99%CPU (0avgtext+0avgdata 158104maxresident)k 728inputs+200792outputs (0major+53409minor)pagefaults 0swaps 72.79user 0.30system 1:13.11elapsed 99%CPU (0avgtext+0avgdata 157916maxresident)k 0inputs+200792outputs (0major+54072minor)pagefaults 0swaps ## compact w/o shared key (Only shared key removed. No interned key only dict) $ rm -rf build/ $ /usr/bin/time ~/local/python-intern/bin/sphinx-build -b html -d build/doctrees -D latex_paper_size= . 
build/html -QN 71.79user 0.34system 1:12.16elapsed 99%CPU (0avgtext+0avgdata 165884maxresident)k 480inputs+200792outputs (0major+56947minor)pagefaults 0swaps 71.84user 0.27system 1:12.13elapsed 99%CPU (0avgtext+0avgdata 166888maxresident)k 640inputs+200792outputs (5major+56834minor)pagefaults 0swaps -- INADA Naoki From random832 at fastmail.com Thu Jun 23 11:01:05 2016 From: random832 at fastmail.com (Random832) Date: Thu, 23 Jun 2016 11:01:05 -0400 Subject: [Python-Dev] Why are class dictionaries not accessible? In-Reply-To: References: <1466605047.4114458.645279001.61C40E1D@webmail.messagingengine.com> Message-ID: <1466694065.230465.646395065.5FE7CA44@webmail.messagingengine.com> On Wed, Jun 22, 2016, at 11:11, Guido van Rossum wrote: > This is done in order to force all mutations of the class dict to go > through attribute assignments on the class. The latter takes care of > updating the class struct, e.g. if you were to add an `__add__` method > dynamically it would update tp_as_number->nb_add. If you could modify the > dict object directly it would be more difficult to arrange for this side > effect. Why is this different from the fact that updating a normal object's dict bypasses descriptors and any special logic in __setattr__? Dunder methods are already "special" in the sense that you can't use them as object attributes; I wouldn't be surprised by "assigning a dunder method via the class's dict breaks things". From ericsnowcurrently at gmail.com Thu Jun 23 11:03:57 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 23 Jun 2016 09:03:57 -0600 Subject: [Python-Dev] PEP XXX: Compact ordered dict In-Reply-To: References: <20160621031727.GA7518@phdru.name> Message-ID: On Mon, Jun 20, 2016 at 11:02 PM, INADA Naoki wrote: > On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote: >> (if a PEP is needed at all) > > I don't think so. My PEP is not for changing Python Language, > just describe implementation detail. 
> > Python 3.5 has new OrderedDict implemented in C without PEP. > My patch is relatively small than it. And the idea has been well known. How about, for 3.6, target re-implementing OrderedDict using the compact dict approach (and leave dict alone for now). That way we have an extra release cycle to iron out the kinks before switching dict over for 3.7. :) -eric From guido at python.org Thu Jun 23 11:19:44 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Jun 2016 08:19:44 -0700 Subject: [Python-Dev] Why are class dictionaries not accessible? In-Reply-To: <1466694065.230465.646395065.5FE7CA44@webmail.messagingengine.com> References: <1466605047.4114458.645279001.61C40E1D@webmail.messagingengine.com> <1466694065.230465.646395065.5FE7CA44@webmail.messagingengine.com> Message-ID: On Thu, Jun 23, 2016 at 8:01 AM, Random832 wrote: > On Wed, Jun 22, 2016, at 11:11, Guido van Rossum wrote: > > This is done in order to force all mutations of the class dict to go > > through attribute assignments on the class. The latter takes care of > > updating the class struct, e.g. if you were to add an `__add__` method > > dynamically it would update tp_as_number->nb_add. If you could modify the > > dict object directly it would be more difficult to arrange for this side > > effect. > > Why is this different from the fact that updating a normal object's dict > bypasses descriptors and any special logic in __setattr__? Dunder > methods are already "special" in the sense that you can't use them as > object attributes; I wouldn't be surprised by "assigning a dunder method > via the class's dict breaks things". > It was a long time when I wrote this, but IIRC the breakage could express itself as a segfault or other C-level crash due to some internal state invariant of the type object being violated, not just an exception. The existence of ctypes notwithstanding, we take C-level crashes very seriously. 
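[Editorial note: the constraint Guido describes is easy to observe from pure Python. A small sketch, assuming CPython 3, where a class's __dict__ is exposed as a read-only mappingproxy; the class and method here are invented for illustration:]

```python
# Illustration of the behavior described above: direct mutation of a
# class __dict__ is refused, while assignment through the type goes via
# type.__setattr__, which also updates the C-level slots (e.g. nb_add).

class A:
    pass

try:
    A.__dict__['x'] = 1          # direct mutation is refused
except TypeError as e:
    print('TypeError:', e)

a, b = A(), A()
try:
    a + b                        # no __add__ defined yet
except TypeError:
    print('no __add__ yet')

A.__add__ = lambda self, other: 'added'   # goes through type.__setattr__
print(a + b)                     # 'added' -- the slot was updated immediately
```

This is why the mappingproxy exists: funneling every mutation through attribute assignment lets the type keep its internal C structures consistent.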
-- --Guido van Rossum (python.org/~guido) From songofacandy at gmail.com Thu Jun 23 11:26:28 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 24 Jun 2016 00:26:28 +0900 Subject: [Python-Dev] PEP XXX: Compact ordered dict In-Reply-To: References: <20160621031727.GA7518@phdru.name> Message-ID: On Fri, Jun 24, 2016 at 12:03 AM, Eric Snow wrote: > On Mon, Jun 20, 2016 at 11:02 PM, INADA Naoki wrote: >> On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote: >>> (if a PEP is needed at all) >> >> I don't think so. My PEP is not for changing Python Language, >> just describe implementation detail. >> >> Python 3.5 has new OrderedDict implemented in C without PEP. >> My patch is relatively small than it. And the idea has been well known. > > How about, for 3.6, target re-implementing OrderedDict using the > compact dict approach (and leave dict alone for now). That way we > have an extra release cycle to iron out the kinks before switching > dict over for 3.7. :) > > -eric I can't: OrderedDict inherits from dict, so the OrderedDict implementation is based on the dict implementation. Since I'm not an expert on the Python object system, I don't know how to separate the OrderedDict implementation from dict. -- INADA Naoki From jeanpierreda at gmail.com Thu Jun 23 16:03:00 2016 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 23 Jun 2016 13:03:00 -0700 Subject: [Python-Dev] Why are class dictionaries not accessible? In-Reply-To: References: <1466605047.4114458.645279001.61C40E1D@webmail.messagingengine.com> <1466694065.230465.646395065.5FE7CA44@webmail.messagingengine.com> Message-ID: On Thu, Jun 23, 2016 at 8:19 AM, Guido van Rossum wrote: > > It was a long time when I wrote this, but IIRC the breakage could express > itself as a segfault or other C-level crash due to some internal state > invariant of the type object being violated, not just an exception.
The > existence of ctypes notwithstanding, we take C-level crashes very seriously. > Big digression: one can still obtain the dict if they really want to, even without using ctypes. I suppose don't actually mutate it unless you want to segfault.

>>> import gc
>>> class A(object): pass
>>> type(A.__dict__)
<class 'mappingproxy'>
>>> type(gc.get_referents(A.__dict__)[0])
<class 'dict'>
>>> gc.get_referents(A.__dict__)[0]['abc'] = 1
>>> A.abc
1

(One can also get it right from A, but A can have other references, so maybe that's less reliable.) I think I wanted this at the time so I could better measure the sizes of objects. sys.getsizeof(A.__dict__) is very different from sys.getsizeof(gc.get_referents(A.__dict__)[0]), and also different from sys.getsizeof(A). For example:

>>> import gc
>>> class A(object): pass
>>> sys.getsizeof(A); sys.getsizeof(A.__dict__); sys.getsizeof(gc.get_referents(A.__dict__)[0])
976
48
288
>>> for i in range(10000): setattr(A, 'attr_%s' % i, i)
>>> sys.getsizeof(A); sys.getsizeof(A.__dict__); sys.getsizeof(gc.get_referents(A.__dict__)[0])
976
48
393312

(Fortunately, if you want to walk the object graph to measure memory usage per object type, you're probably going to be using gc.get_referents already anyway, so this is just confirmation that you're getting what you want in one corner case.) -- Devin From guido at python.org Thu Jun 23 16:40:51 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Jun 2016 13:40:51 -0700 Subject: [Python-Dev] Why are class dictionaries not accessible? In-Reply-To: References: <1466605047.4114458.645279001.61C40E1D@webmail.messagingengine.com> <1466694065.230465.646395065.5FE7CA44@webmail.messagingengine.com> Message-ID: "Er, among our chief weapons are fear, surprise, ctypes, gc, and fanatical devotion to the Pope!"
On Thu, Jun 23, 2016 at 1:03 PM, Devin Jeanpierre wrote: > On Thu, Jun 23, 2016 at 8:19 AM, Guido van Rossum > wrote: >> >> It was a long time when I wrote this, but IIRC the breakage could express >> itself as a segfault or other C-level crash due to some internal state >> invariant of the type object being violated, not just an exception. The >> existence of ctypes notwithstanding, we take C-level crashes very seriously. >> > > Big digression: one can still obtain the dict if they really want to, even > without using ctypes. I suppose don't actually mutate it unless you want to > segfault. > > >>> import gc > >>> class A(object): pass > >>> type(A.__dict__) > > >>> type(gc.get_referents(A.__dict__)[0]) > > >>> gc.get_referents(A.__dict__)[0]['abc'] = 1 > >>> A.abc > 1 > >>> > > (One can also get it right from A, but A can have other references, so > maybe that's less reliable.) > > I think I wanted this at the time so I could better measure the sizes of > objects. sys.getsizeof(A.__dict__) is very different > from sys.getsizeof(gc.get_referents(A.__dict__)[0]), and also different > from sys.getsizeof(A). For example: > > >>> import gc > >>> class A(object): pass > >>> sys.getsizeof(A); sys.getsizeof(A.__dict__); > sys.getsizeof(gc.get_referents(A.__dict__)[0]) > 976 > 48 > 288 > >>> for i in range(10000): setattr(A, 'attr_%s' % i, i) > >>> sys.getsizeof(A); sys.getsizeof(A.__dict__); > sys.getsizeof(gc.get_referents(A.__dict__)[0]) > 976 > 48 > 393312 > > (Fortunately, if you want to walk the object graph to measure memory usage > per object type, you're probably going to be using gc.get_referents already > anyway, so this is just confirmation that you're getting what you want in > one corner case.) > > -- Devin > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From njs at pobox.com  Thu Jun 23 21:00:18 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 23 Jun 2016 18:00:18 -0700
Subject: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jun 21, 2016 at 8:40 PM, INADA Naoki wrote:
> There are three options I can think of.
>
> 1) Revert key-shared dict (PEP 412).
>
> pros: Removing the key-shared dict makes the dict implementation simple.
>
> cons: In some applications, PEP 412 is far more compact than the compact
> ordered dict. (Note: Using __slots__ may help in such situations.)
>
> 2) Don't make "keeping insertion order" part of the Python language spec.
>
> pros: Best efficiency
>
> cons: Different behavior between a normal dict and instance.__dict__ may
> confuse people.
>
> 3) A more strict rule for the key-sharing dict.
>
> My idea is:
> * Increasing the number of entries (inserting a new key) is possible
> only if the refcnt of the keys == 1.
>
> * Inserting an item with an existing key into the dict is allowed only when
> insertion position == number of items in the dict (PyDictObject.ma_used).
>
> pros: We can have "dict keeping insertion order".
>
> cons: Can't use the key-sharing dict in many cases. A small and harmless
> change may cause a sudden memory usage increase. (__slots__ is more predictable.)

IIUC, key-sharing dicts are a best-effort optimization where if I have a class like:

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

f1 = Foo(1, 2)
f2 = Foo(3, 4)

then f1.__dict__ and f2.__dict__ can share their key arrays... but if I do f1.c = "c", then f1.__dict__ gets automatically switched to a regular dict. The idea being that in, say, 99% of cases, different objects of the same type all share the same set of keys, and in the other 1%, oh well, we fall back on the regular behavior.
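Concretely, the sharing in the Foo example above is invisible at the Python level; only the internal memory accounting changes. A small sketch (the sizes are printed rather than asserted, since they are CPython implementation details):

```python
import sys

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

f1 = Foo(1, 2)
f2 = Foo(3, 4)

# Both __dict__s were filled with the same keys in the same order,
# so CPython can share a single keys array between them (PEP 412).
before = sys.getsizeof(f1.__dict__)

# A key that f2 doesn't have forces f1 off the shared keys array...
f1.c = "c"
after = sys.getsizeof(f1.__dict__)

# ...but that is purely internal bookkeeping; semantics don't change.
assert vars(f1) == {'a': 1, 'b': 2, 'c': 'c'}
assert vars(f2) == {'a': 3, 'b': 4}
print(before, after)  # exact sizes are a CPython implementation detail
```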
It seems to me that all this works fine for ordered dicts too, if we add the restriction that key arrays can be shared if and only if the two dicts have the same set of keys *and* initially assign those keys in the same order. In, say, 98.9% of cases, different objects of the same type all share the same set of keys and initially assign those keys in the same order, and in the other 1.1%, oh well, we can silently fall back on unshared keys, same as before. (And crucially, the OrderedDict semantics are that only adding *new* keys changes the order; assignments to existing keys preserve the existing order. So if a given type always creates the same instance attributes in the same order at startup and never adds or deletes any, then its key values *and* key order will stay the same even if it later mutates some of those existing attributes in-place.) It's possible that there will be some weird types that mess this up, like: class WeirdFoo: def __init__(self, a, b): if a % 2 == 0: self.a = a self.b = b else: self.b = b self.a = a assert list(WeirdFoo(1, 2).__dict__.keys()) != list(WeirdFoo(2, 3).__dict__.keys()) but, who cares? It'd be good due-diligence to collect data on this to confirm that it isn't a big issue, but intuitively, code like WeirdFoo.__init__ is vanishingly rare, and this is just a best-effort optimization anyway. Catching 98.9% of cases is good enough. Is there something I'm missing here? Is this your option #3? -n -- Nathaniel J. Smith -- https://vorpus.org From songofacandy at gmail.com Fri Jun 24 00:14:55 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 24 Jun 2016 13:14:55 +0900 Subject: [Python-Dev] Compact ordered dict is not ordered for split table. 
(was: PEP XXX: Compact ordered dict
In-Reply-To: 
References: 
Message-ID: 

> IIUC, key-sharing dicts are a best-effort optimization where if I have
> a class like:
>
> class Foo:
>     def __init__(self, a, b):
>         self.a = a
>         self.b = b
>
> f1 = Foo(1, 2)
> f2 = Foo(3, 4)
>
> then f1.__dict__ and f2.__dict__ can share their key arrays... but if
> I do f1.c = "c", then f1.__dict__ gets automatically switched to a
> regular dict. The idea being that in, say, 99% of cases, different
> objects of the same type all share the same set of keys, and in the
> other 1%, oh well, we fall back on the regular behavior.

Small correction: giving up the shared dict can happen when the keys table is resized.

f1 = Foo(1, 2)  # f1 has [a, b] keys. Let's call them k1. Foo caches k1.
f2 = Foo(3, 4)  # the new instance uses the cached k1 keys.
f1.c = "c"      # since k1 can contain three keys, nothing happens.
f1.d = "d"      # gives up; Foo doesn't use shared keys anymore.
f3 = Foo(5, 6)  # f3 has a normal dict.

You can see this via `sys.getsizeof(f1.__dict__), sys.getsizeof(f2.__dict__)`.

> It seems to me that all this works fine for ordered dicts too, if we
> add the restriction that key arrays can be shared if and only if the
> two dicts have the same set of keys *and* initially assign those keys
> in the same order. In, say, 98.9% of cases, different objects of the
> same type all share the same set of keys and initially assign those
> keys in the same order, and in the other 1.1%, oh well, we can
> silently fall back on unshared keys, same as before. (And crucially,
> the OrderedDict semantics are that only adding *new* keys changes the
> order; assignments to existing keys preserve the existing order. So if
> a given type always creates the same instance attributes in the same
> order at startup and never adds or deletes any, then its key values
> *and* key order will stay the same even if it later mutates some of
> those existing attributes in-place.)
> It's possible that there will be some weird types that mess this up, like:
>
> class WeirdFoo:
>     def __init__(self, a, b):
>         if a % 2 == 0:
>             self.a = a
>             self.b = b
>         else:
>             self.b = b
>             self.a = a
>
> assert list(WeirdFoo(1, 2).__dict__.keys()) != list(WeirdFoo(2, 3).__dict__.keys())
>
> but, who cares? It'd be good due diligence to collect data on this to
> confirm that it isn't a big issue, but intuitively, code like
> WeirdFoo.__init__ is vanishingly rare, and this is just a best-effort
> optimization anyway. Catching 98.9% of cases is good enough.

While I think it's less than 98.9% (see the examples below), I agree with you.

1) not shared even in the current implementation

class A:
    n = 0
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c
    def add(self):
        self.n += 1  # creates an instance attribute after (a, b, c)

a = A(1, 2, 3)
b = A(4, 5, 6)
a.add()

2) not shared under a strict ordering rule

class A:
    file = None
    def __init__(self, a, *, filename=None):
        if filename is not None:
            self.file = open(filename, 'w')
        self.a = a

a = A(42, filename="logfile.txt")
b = A(43)

3) a web application's model objects

class User(Model):
    id = IntColumn()
    name = StringColumn()
    age = IntColumn()

# When creating a new instance, (name, age) is initialized, and id is filled in after insert.
user = User(name="methane", age=32)
db.add(user)

# When instances are fetched from the DB, the ORM populates attributes in (id, name, age) order,
# so 100 fetched instances don't share keys under the "strict ordering rule".
users = User.query.limit(100).all()

> Is there something I'm missing here? Is this your option #3?

Yes. It may work well, but "one special instance disables key-sharing for all instances created afterwards" may cause memory usage to increase over a long time. People watching a monitoring graph will think their application has a memory leak.

My new idea may have more stable memory usage, without decreasing memory efficiency so much. See https://mail.python.org/pipermail/python-dev/2016-June/145391.html

Compact ordered dict is more efficient than key-sharing dict in the case of Sphinx.
It means instance __dict__ is not dominant. I'll implement a POC of my new idea and compare it with Sphinx. If you know another good *real application* which is easy to benchmark, please tell me.

-- 
INADA Naoki 

From lkb.teichmann at gmail.com  Fri Jun 24 03:41:36 2016
From: lkb.teichmann at gmail.com (Martin Teichmann)
Date: Fri, 24 Jun 2016 09:41:36 +0200
Subject: [Python-Dev] PEP 487: Simpler customization of class creation
Message-ID: 

Hi list, just recently, I posted about the implementation of PEP 487. The discussion quickly diverted to PEP 520, which happened to be strongly related. Hoping to get some comments about the rest of PEP 487, I took out the part that is also in PEP 520. I attached the new version of the PEP. The implementation can be found on the Python issue tracker: http://bugs.python.org/issue27366

So PEP 487 is about simplifying the customization of class creation. Currently, this is done via metaclasses, which are very powerful, but often inflexible, as it is hard to combine two metaclasses. PEP 487 proposes a new metaclass which calls a method on all newly created subclasses. This way, in order to customize the creation of subclasses, one just needs to write a simple method.

An absolutely classic example for metaclasses is the need to tell descriptors who they belong to. There are many large frameworks out there, e.g. enthought's traits, IPython's traitlets, Django's forms and many more. Their problem is: they're all fully incompatible. It's really hard to inherit from two classes which have different metaclasses. PEP 487 proposes one simple metaclass which can do everything those frameworks need, making them all compatible.

As an example, imagine the framework has a generic descriptor called Integer, which describes, well, an integer. Typically you use it like that:

class MyClass(FrameworkBaseClass):
    my_value = Integer()

How does my_value know what it's called, and how it should put its data into the object's __dict__?
Well, this is what the framework's metaclass is for. With PEP 487, a framework doesn't need to declare its own metaclass anymore, but simply uses types.Object of PEP 487 as a base class:

class FrameworkBaseClass(types.Object):
    def __init_subclass__(cls):
        super().__init_subclass__()
        for k, v in cls.__dict__.items():
            if isinstance(v, FrameworkDescriptorBase):
                v.__set_owner__(cls, k)

and all the framework's descriptors know their name. And if another framework should be used as well: no problem, they just work together easily.

Actually, the above example is so common that PEP 487 includes it directly: a method __set_owner__ is called for every descriptor. That could make most descriptors in frameworks work out of the box. So now I am hoping for comments!

Greetings

Martin

New version of the PEP follows:

PEP: 487
Title: Simpler customisation of class creation
Version: $Revision$
Last-Modified: $Date$
Author: Martin Teichmann ,
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Feb-2015
Python-Version: 3.6
Post-History: 27-Feb-2015, 5-Feb-2016, 24-Jun-2016
Replaces: 422

Abstract
========

Currently, customising class creation requires the use of a custom metaclass. This custom metaclass then persists for the entire lifecycle of the class, creating the potential for spurious metaclass conflicts. This PEP proposes to instead support a wide range of customisation scenarios through a new ``__init_subclass__`` hook in the class body, together with a hook to initialize attributes. Those hooks should at first be defined in a metaclass in the standard library, with the option that this metaclass eventually becomes the default ``type`` metaclass. The new mechanism should be easier to understand and use than implementing a custom metaclass, and thus should provide a gentler introduction to the full power of Python's metaclass machinery.

Background
==========

Metaclasses are a powerful tool to customize class creation.
They have, however, the problem that there is no automatic way to combine metaclasses. If one wants to use two metaclasses for a class, a new metaclass combining those two needs to be created, typically manually. This need often occurs as a surprise to a user: inheriting from two base classes coming from two different libraries suddenly raises the necessity to manually create a combined metaclass, where typically one is not interested in those details about the libraries at all. This becomes even worse if one library starts to make use of a metaclass which it has not done before. While the library itself continues to work perfectly, suddenly all code combining those classes with classes from another library fails.

Proposal
========

While there are many possible ways to use a metaclass, the vast majority of use cases falls into just two categories: some initialization code running after class creation, and the initialization of descriptors. Those two use cases can easily be performed by just one metaclass. If this metaclass is put into the standard library, and all libraries that wish to customize class creation use this very metaclass, no combination of metaclasses is necessary anymore.

Said metaclass should live in the ``types`` module under the name ``Type``. This should hint to the user that in the future, this metaclass may become the default metaclass ``type``.

The two use cases are achieved as follows:

1. The metaclass contains an ``__init_subclass__`` hook that initializes all subclasses of a given class, and

2. the metaclass calls a ``__set_owner__`` hook on all the attributes (descriptors) defined in the class.

For ease of use, a base class ``types.Object`` is defined, which uses said metaclass and contains an empty stub for the hook described for use case 1. It will eventually become the new replacement for the standard ``object``.
As an example, the first use case looks as follows::

    >>> class SpamBase(types.Object):
    ...     # this is implicitly a @classmethod
    ...     def __init_subclass__(cls, **kwargs):
    ...         cls.class_args = kwargs
    ...         super().__init_subclass__(**kwargs)

    >>> class Spam(SpamBase, a=1, b="b"):
    ...     pass

    >>> Spam.class_args
    {'a': 1, 'b': 'b'}

The base class ``types.Object`` contains an empty ``__init_subclass__`` method which serves as an endpoint for cooperative multiple inheritance. Note that this method has no keyword arguments, meaning that all methods which are more specialized have to process all keyword arguments.

This general proposal is not a new idea (it was first suggested for inclusion in the language definition `more than 10 years ago`_, and a similar mechanism has long been supported by `Zope's ExtensionClass`_), but the situation has changed sufficiently in recent years that the idea is worth reconsidering for inclusion.

The second part of the proposal adds an ``__set_owner__`` initializer for class attributes, especially if they are descriptors. Descriptors are defined in the body of a class, but they do not know anything about that class, they do not even know the name they are accessed with. They do get to know their owner once ``__get__`` is called, but still they do not know their name. This is unfortunate: for example, they cannot put their associated value into their object's ``__dict__`` under their name, since they do not know that name. This problem has been solved many times, and is one of the most important reasons to have a metaclass in a library. While it would be easy to implement such a mechanism using the first part of the proposal, it makes sense to have one solution for this problem for everyone.
To give an example of its usage, imagine a descriptor representing weakly referenced values::

    import weakref

    class WeakAttribute:
        def __get__(self, instance, owner):
            # call the stored weak reference to retrieve the value
            return instance.__dict__[self.name]()

        def __set__(self, instance, value):
            instance.__dict__[self.name] = weakref.ref(value)

        # this is the new initializer:
        def __set_owner__(self, owner, name):
            self.name = name

While this example looks very trivial, it should be noted that until now such an attribute could not be defined without the use of a metaclass. And given that such a metaclass can make life very hard, this kind of attribute does not exist yet.

Key Benefits
============

Easier inheritance of definition time behaviour
-----------------------------------------------

Understanding Python's metaclasses requires a deep understanding of the type system and the class construction process. This is legitimately seen as challenging, due to the need to keep multiple moving parts (the code, the metaclass hint, the actual metaclass, the class object, instances of the class object) clearly distinct in your mind. Even when you know the rules, it's still easy to make a mistake if you're not being extremely careful.

Understanding the proposed implicit class initialization hook only requires ordinary method inheritance, which isn't quite as daunting a task. The new hook provides a more gradual path towards understanding all of the phases involved in the class definition process.

Reduced chance of metaclass conflicts
-------------------------------------

One of the big issues that makes library authors reluctant to use metaclasses (even when they would be appropriate) is the risk of metaclass conflicts. These occur whenever two unrelated metaclasses are used by the desired parents of a class definition. This risk also makes it very difficult to *add* a metaclass to a class that has previously been published without one.
By contrast, adding an ``__init_subclass__`` method to an existing type poses a similar level of risk to adding an ``__init__`` method: technically, there is a risk of breaking poorly implemented subclasses, but when that occurs, it is recognised as a bug in the subclass rather than the library author breaching backwards compatibility guarantees.

A path of introduction into Python
==================================

Most of the benefits of this PEP can already be implemented using a simple metaclass. The ``__init_subclass__`` hook can be emulated this way all the way down to Python 2.7. Such a class has been `uploaded to PyPI`_.

The only drawback of such a metaclass is the mentioned problems with metaclasses and multiple inheritance. Two classes using such a metaclass can only be combined if they use exactly the same metaclass. This fact calls for the inclusion of such a class into the standard library, as ``types.Type``, with a ``types.Object`` base class using it. Once all users use this standard library metaclass, classes from different packages can easily be combined. But still such classes cannot be easily combined with other classes using other metaclasses. Authors of metaclasses should bear that in mind and inherit from the standard metaclass if it seems useful for users of the metaclass to add more functionality. Ultimately, if the need for combining with other metaclasses is strong enough, the proposed functionality may be introduced into Python's ``type``.

Those arguments strongly hint at the following procedure to include the proposed functionality into Python:

1. The metaclass implementing this proposal is put onto PyPI, so that it can be used and scrutinized.

2. Introduce this class into the Python 3.6 standard library.

3. Consider this as the default behavior for Python 3.7.
Steps 2 and 3 would be similar to how the ``set`` datatype was first introduced as ``sets.Set``, and only later made a builtin type (with a slightly different API) based on wider experiences with the ``sets`` module.

While the metaclass is still in the standard library and not in the language, it may still clash with other metaclasses. The most prominent metaclass in use is probably ABCMeta. It is also a particularly good example of the need to combine metaclasses. For users who want to define an ABC with subclass initialization, we should support a ``types.ABCMeta`` class, or let ``abc.ABCMeta`` inherit from this PEP's metaclass. As it turns out, most of the behavior of ``abc.ABCMeta`` can be achieved with our ``types.Type``, except its core behavior, ``__instancecheck__`` and ``__subclasscheck__``, which can be supplied, as per the definition of the Python language, exclusively in a metaclass.

Extensions written in C or C++ also often define their own metaclass. It would be very useful if those could also inherit from the metaclass defined here, but this is probably not possible.

New Ways of Using Classes
=========================

This proposal has many use cases like the following. In the examples, we still inherit from the ``types.Object`` base class. This would become unnecessary once this PEP is included in Python directly.
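To make the examples below self-contained, here is a minimal sketch of how such a base class can be emulated today with a plain metaclass. Only the descriptor-initialization half of the proposal is sketched, and the helper names (``SubclassMeta``, ``Trait``, ``Point``) are illustrative, not part of the proposal:

```python
class SubclassMeta(type):
    """Sketch of the proposed behaviour: after a class body has been
    executed, tell each descriptor its owner class and its name."""
    def __init__(cls, name, bases, ns, **kwargs):
        super().__init__(name, bases, ns)
        for attr, value in ns.items():
            hook = getattr(value, '__set_owner__', None)
            if hook is not None:
                hook(cls, attr)  # the __set_owner__ hook from this PEP

class Object(metaclass=SubclassMeta):
    pass

class Trait:
    """Descriptor storing its value in the instance __dict__ under
    the attribute name it learns via __set_owner__."""
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self.key]

    def __set__(self, instance, value):
        instance.__dict__[self.key] = value

    def __set_owner__(self, owner, name):
        self.key = name

class Point(Object):
    x = Trait()
    y = Trait()

p = Point()
p.x = 3
assert p.x == 3 and p.__dict__ == {'x': 3}
```

Each descriptor learns its name without any cooperation from the class body, which is exactly what the frameworks mentioned above currently need a custom metaclass for.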
Often those "traits" need some support of a metaclass to work. This is how this would look like with this PEP:: class Trait: def __get__(self, instance, owner): return instance.__dict__[self.key] def __set__(self, instance, value): instance.__dict__[self.key] = value def __set_owner__(self, owner, name): self.key = name Rejected Design Options ======================= Calling the hook on the class itself ------------------------------------ Adding an ``__autodecorate__`` hook that would be called on the class itself was the proposed idea of PEP 422. Most examples work the same way or even better if the hook is called on the subclass. In general, it is much easier to explicitly call the hook on the class in which it is defined (to opt-in to such a behavior) than to opt-out, meaning that one does not want the hook to be called on the class it is defined in. This becomes most evident if the class in question is designed as a mixin: it is very unlikely that the code of the mixin is to be executed for the mixin class itself, as it is not supposed to be a complete class on its own. The original proposal also made major changes in the class initialization process, rendering it impossible to back-port the proposal to older Python versions. More importantly, having a pure Python implementation allows us to take two preliminary steps before before we actually change the interpreter, giving us the chance to iron out all possible wrinkles in the API. Other variants of calling the hook ---------------------------------- Other names for the hook were presented, namely ``__decorate__`` or ``__autodecorate__``. This proposal opts for ``__init_subclass__`` as it is very close to the ``__init__`` method, just for the subclass, while it is not very close to decorators, as it does not return the class. 
Requiring an explicit decorator on ``__init_subclass__`` -------------------------------------------------------- One could require the explicit use of ``@classmethod`` on the ``__init_subclass__`` decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message. This decision was reinforced after noticing that the user experience of defining ``__prepare__`` and forgetting the ``@classmethod`` method decorator is singularly incomprehensible (particularly since PEP 3115 documents it as an ordinary method, and the current documentation doesn't explicitly say anything one way or the other). Defining arbitrary namespaces ----------------------------- PEP 422 defined a generic way to add arbitrary namespaces for class definitions. This approach is much more flexible than just leaving the definition order in a tuple. The ``__prepare__`` method in a metaclass supports exactly this behavior. But given that effectively the only use cases that could be found out in the wild were the ``OrderedDict`` way of determining the attribute order, it seemed reasonable to only support this special case. The metaclass described in this PEP has been designed to be very simple such that it could be reasonably made the default metaclass. This was especially important when designing the attribute order functionality: This was a highly demanded feature and has been enabled through the ``__prepare__`` method of metaclasses. This method can be abused in very weird ways, making it hard to correctly maintain this feature in CPython. This is why it has been proposed to deprecated this feature, and instead use ``OrderedDict`` as the standard namespace, supporting the most important feature while dropping most of the complexity. But this would have meant that ``OrderedDict`` becomes a language builtin like dict and set, and not just a standard library class. 
The choice of the ``__attribute_order__`` tuple is a much simpler solution to the problem. A more ``__new__``-like hook ---------------------------- In PEP 422 the hook worked more like the ``__new__`` method than the ``__init__`` method, meaning that it returned a class instead of modifying one. This allows a bit more flexibility, but at the cost of much harder implementation and undesired side effects. Adding a class attribute with the attribute order ------------------------------------------------- This got its own PEP 520. History ======= This used to be a competing proposal to PEP 422 by Nick Coghlan and Daniel Urban. PEP 422 intended to achieve the same goals as this PEP, but with a different way of implementation. In the meantime, PEP 422 has been withdrawn favouring this approach. References ========== .. _published code: http://mail.python.org/pipermail/python-dev/2012-June/119878.html .. _more than 10 years ago: http://mail.python.org/pipermail/python-dev/2001-November/018651.html .. _Zope's ExtensionClass: http://docs.zope.org/zope_secrets/extensionclass.html .. _uploaded to PyPI: https://pypi.python.org/pypi/metaclass Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From larry at hastings.org Fri Jun 24 05:14:12 2016 From: larry at hastings.org (Larry Hastings) Date: Fri, 24 Jun 2016 02:14:12 -0700 Subject: [Python-Dev] Here's what's going into 3.5.2 final and 3.4.5 final Message-ID: <576CF9E4.60104@hastings.org> Heads up! This is a courtesy reminder from your friendly 3.4 and 3.5 release manager. 
Here's a list of all the changes since 3.5.2rc1 that are currently going into 3.5.2 final:

* 155e665428c6 - Zachary: OpenSSL 1.0.2h build changes for Windows
* cae0b7ffeb9f - Benjamin: fix % in Doc/whatsnew/3.5.rst that confuses latex
* 783dfd77e4c1 - Terry Reedy: allow non-ascii in idlelib/NEWS.txt
*  - Matthias: fix for test_ssl test_options on Ubuntu

3.4.5 final only has one change from 3.4.5rc1: the test_ssl test_options fix from Matthias.

If there's something else that needs to go into one of these releases, and it's not on the list above, speak up now. I may actually tag these late Friday as I'm traveling Saturday. So you have approximately, let's say, 20 hours from when I post this.

Thanks,

//arry/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Ilya.Kazakevich at JetBrains.com  Fri Jun 24 11:52:27 2016
From: Ilya.Kazakevich at JetBrains.com (Ilya Kazakevich)
Date: Fri, 24 Jun 2016 18:52:27 +0300
Subject: [Python-Dev] unittest.TestResult lacks API to separate subtests
Message-ID: <045b01d1ce30$688f2a20$39ad7e60$@JetBrains.com>

Hello,

We're developing a Python IDE and integrated it with the unittest module using a TestResult subclass to track test start, end, etc. With Py3K, it supports the addSubTest method, which is called after all subtests. But there is no method called before and after _each_ subtest (as is done for regular tests). Without it, I can't fetch each subtest's output and display it correctly.

I suggest adding subTestStart / subTestEnd methods to help with my issue and other people's similar issues. I can send a patch if you think this is a good idea.

Ilya Kazakevich

JetBrains
http://www.jetbrains.com
The Drive to Develop

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From status at bugs.python.org Fri Jun 24 12:08:44 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 24 Jun 2016 18:08:44 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160624160844.50B2E56B74@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-06-17 - 2016-06-24) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 5531 (-13) closed 33609 (+52) total 39140 (+39) Open issues with patches: 2419 Issues opened (27) ================== #21106: Updated Mac folder icon http://bugs.python.org/issue21106 reopened by r.david.murray #27196: Eliminate 'ThemeChanged' warning when running IDLE tests http://bugs.python.org/issue27196 reopened by ned.deily #27346: Implement os.readv() / os.writev() http://bugs.python.org/issue27346 opened by mmarkk #27348: Non-main thread exception handler drops exception message http://bugs.python.org/issue27348 opened by martin.panter #27350: Compact and ordered dict http://bugs.python.org/issue27350 opened by naoki #27351: Unexpected ConfigParser.read() behavior when passed fileobject http://bugs.python.org/issue27351 opened by Rich.Rauenzahn #27352: Bug in IMPORT_NAME http://bugs.python.org/issue27352 opened by serhiy.storchaka #27353: Add nroot function to math http://bugs.python.org/issue27353 opened by steven.daprano #27354: SSLContext.load_verify_locations cannot handle paths on Window http://bugs.python.org/issue27354 opened by Ilya.Kulakov #27355: Strip out the last lingering vestiges of Windows CE support http://bugs.python.org/issue27355 opened by larry #27357: Enhancing the Windows installer http://bugs.python.org/issue27357 opened by Sworddragon #27358: BUILD_MAP_UNPACK_WITH_CALL is slow http://bugs.python.org/issue27358 opened by Demur Rumed #27359: OrderedDict pseudo-literals (WIP) http://bugs.python.org/issue27359 opened by llllllllll #27363: Complex with 
negative zero imaginary part http://bugs.python.org/issue27363 opened by Vedran.??a??i?? #27364: Deprecate invalid unicode escape sequences http://bugs.python.org/issue27364 opened by ebarry #27365: Allow non-ascii chars in IDLE NEWS.txt (for contributor names) http://bugs.python.org/issue27365 opened by terry.reedy #27366: PEP487: Simpler customization of class creation http://bugs.python.org/issue27366 opened by Martin.Teichmann #27367: Windows buildbot: random timeout failure on test_threading http://bugs.python.org/issue27367 opened by haypo #27369: [PATCH] Tests break with --with-system-expat and Expat 2.2.0 http://bugs.python.org/issue27369 opened by sping #27372: Test_idle should stop changing locale http://bugs.python.org/issue27372 opened by terry.reedy #27373: logging.handlers.SysLogHandler with TCP not working on rsyslog http://bugs.python.org/issue27373 opened by ????????? #27374: Cygwin: Makefile does not install DLL import library http://bugs.python.org/issue27374 opened by erik.bray #27376: Add mock_import method to mock module http://bugs.python.org/issue27376 opened by Eyal Posener #27377: Add smarter socket.fromfd() http://bugs.python.org/issue27377 opened by nascheme #27379: SocketType changed in Python 3 http://bugs.python.org/issue27379 opened by martin.panter #27380: IDLE: add base Query dialog with ttk widgets http://bugs.python.org/issue27380 opened by terry.reedy #27383: executuable in distutils triggering microsoft anti virus http://bugs.python.org/issue27383 opened by Rob Bairos Most recent 15 issues with no replies (15) ========================================== #27383: executuable in distutils triggering microsoft anti virus http://bugs.python.org/issue27383 #27380: IDLE: add base Query dialog with ttk widgets http://bugs.python.org/issue27380 #27379: SocketType changed in Python 3 http://bugs.python.org/issue27379 #27376: Add mock_import method to mock module http://bugs.python.org/issue27376 #27374: Cygwin: Makefile does not install 
DLL import library http://bugs.python.org/issue27374 #27372: Test_idle should stop changing locale http://bugs.python.org/issue27372 #27367: Windows buildbot: random timeout failure on test_threading http://bugs.python.org/issue27367 #27366: PEP487: Simpler customization of class creation http://bugs.python.org/issue27366 #27348: Non-main thread exception handler drops exception message http://bugs.python.org/issue27348 #27340: bytes-like objects with socket.sendall(), SSL, and http.client http://bugs.python.org/issue27340 #27332: Clinic: first parameter for module-level functions should be P http://bugs.python.org/issue27332 #27331: Add a policy argument to email.mime.MIMEBase http://bugs.python.org/issue27331 #27329: Document behavior when CDLL is called with None as an argumen http://bugs.python.org/issue27329 #27326: SIGSEV in test_window_funcs of test_curses http://bugs.python.org/issue27326 #27323: ncurses putwin() fails in test_module_funcs http://bugs.python.org/issue27323 Most recent 15 issues waiting for review (15) ============================================= #27380: IDLE: add base Query dialog with ttk widgets http://bugs.python.org/issue27380 #27377: Add smarter socket.fromfd() http://bugs.python.org/issue27377 #27376: Add mock_import method to mock module http://bugs.python.org/issue27376 #27374: Cygwin: Makefile does not install DLL import library http://bugs.python.org/issue27374 #27372: Test_idle should stop changing locale http://bugs.python.org/issue27372 #27369: [PATCH] Tests break with --with-system-expat and Expat 2.2.0 http://bugs.python.org/issue27369 #27366: PEP487: Simpler customization of class creation http://bugs.python.org/issue27366 #27365: Allow non-ascii chars in IDLE NEWS.txt (for contributor names) http://bugs.python.org/issue27365 #27364: Deprecate invalid unicode escape sequences http://bugs.python.org/issue27364 #27359: OrderedDict pseudo-literals (WIP) http://bugs.python.org/issue27359 #27358: BUILD_MAP_UNPACK_WITH_CALL is 
slow http://bugs.python.org/issue27358 #27355: Strip out the last lingering vestiges of Windows CE support http://bugs.python.org/issue27355 #27352: Bug in IMPORT_NAME http://bugs.python.org/issue27352 #27350: Compact and ordered dict http://bugs.python.org/issue27350 #27334: pysqlite3 context manager not performing rollback when a datab http://bugs.python.org/issue27334 Top 10 most discussed issues (10) ================================= #27365: Allow non-ascii chars in IDLE NEWS.txt (for contributor names) http://bugs.python.org/issue27365 19 msgs #27051: Create PIP gui http://bugs.python.org/issue27051 15 msgs #27350: Compact and ordered dict http://bugs.python.org/issue27350 9 msgs #27172: Undeprecate inspect.getfullargspec() http://bugs.python.org/issue27172 8 msgs #27344: zipfile *does* support utf-8 filenames http://bugs.python.org/issue27344 8 msgs #27309: Visual Styles support to tk/tkinter file and message dialogs http://bugs.python.org/issue27309 7 msgs #27353: Add nroot function to math http://bugs.python.org/issue27353 7 msgs #27359: OrderedDict pseudo-literals (WIP) http://bugs.python.org/issue27359 7 msgs #27364: Deprecate invalid unicode escape sequences http://bugs.python.org/issue27364 7 msgs #27363: Complex with negative zero imaginary part http://bugs.python.org/issue27363 6 msgs Issues closed (50) ================== #5225: OS X "Update Shell Profile" may not update $PATH if run more t http://bugs.python.org/issue5225 closed by willingc #8406: Make some setup.py paths exclude-able http://bugs.python.org/issue8406 closed by willingc #9156: socket._fileobject: read raises AttributeError when closed in http://bugs.python.org/issue9156 closed by martin.panter #11623: Distutils is reporting OSX 10.6 w/ XCode 4 as "universal" http://bugs.python.org/issue11623 closed by willingc #14354: Crash in _ctypes_alloc_callback http://bugs.python.org/issue14354 closed by willingc #16821: bundlebuilder broken in 2.7 http://bugs.python.org/issue16821 closed by 
willingc #18300: script_helper._assert_python should set TERM='' by default. http://bugs.python.org/issue18300 closed by berker.peksag #20256: Argument Clinic: compare signed and unsigned ints http://bugs.python.org/issue20256 closed by serhiy.storchaka #22463: Warnings when building on AIX http://bugs.python.org/issue22463 closed by martin.panter #22636: avoid using a shell in ctypes.util: replace os.popen with subp http://bugs.python.org/issue22636 closed by martin.panter #23641: Got rid of bad dunder names http://bugs.python.org/issue23641 closed by serhiy.storchaka #24137: Force not using _default_root in IDLE http://bugs.python.org/issue24137 closed by terry.reedy #24314: irrelevant cross-link in documentation of user-defined functio http://bugs.python.org/issue24314 closed by martin.panter #24419: In argparse action append_const doesn't work for positional ar http://bugs.python.org/issue24419 closed by paul.j3 #26290: fileinput and 'for line in sys.stdin' do strange mockery of in http://bugs.python.org/issue26290 closed by martin.panter #26536: Add the SIO_LOOPBACK_FAST_PATH option to socket.ioctl http://bugs.python.org/issue26536 closed by steve.dower #26547: Undocumented use of the term dictproxy in vars() documentation http://bugs.python.org/issue26547 closed by berker.peksag #26930: Upgrade installers to OpenSSL 1.0.2h http://bugs.python.org/issue26930 closed by steve.dower #26975: Decimal.from_float works incorrectly for non-binary floats http://bugs.python.org/issue26975 closed by skrah #27006: C implementation of Decimal.from_float() bypasses __new__ and http://bugs.python.org/issue27006 closed by skrah #27021: It is not documented that os.writev() suffer from SC_IOV_MAX http://bugs.python.org/issue27021 closed by orsenthil #27048: distutils._msvccompiler._get_vc_env() fails with UnicodeDecode http://bugs.python.org/issue27048 closed by steve.dower #27079: Bugs in curses.ascii predicates http://bugs.python.org/issue27079 closed by serhiy.storchaka 
#27177: re match.group should support __index__ http://bugs.python.org/issue27177 closed by serhiy.storchaka #27244: print(';;') fails in pdb with SyntaxError http://bugs.python.org/issue27244 closed by ned.deily #27287: SIGSEGV when calling os.forkpty() http://bugs.python.org/issue27287 closed by ned.deily #27294: Better repr for Tkinter event objects http://bugs.python.org/issue27294 closed by serhiy.storchaka #27297: Add support for /dev/random to "secrets" http://bugs.python.org/issue27297 closed by ncoghlan #27299: urllib does not splitport while putrequest realhost to HTTP he http://bugs.python.org/issue27299 closed by r.david.murray #27304: Create "Source Code" links in module sections, where relevant http://bugs.python.org/issue27304 closed by terry.reedy #27312: test_setupapp (idlelib.idle_test.test_macosx.SetupTest) fails http://bugs.python.org/issue27312 closed by ned.deily #27319: Multiple item arguments for selection operations http://bugs.python.org/issue27319 closed by serhiy.storchaka #27333: validate_step in rangeobject.c, incorrect code logic but right http://bugs.python.org/issue27333 closed by serhiy.storchaka #27337: 3.6.0a2 tarball has weird paths http://bugs.python.org/issue27337 closed by ned.deily #27342: Clean up some Py_XDECREFs in rangeobject.c and bltinmodule.c http://bugs.python.org/issue27342 closed by serhiy.storchaka #27343: Incorrect error message for conflicting initializers of ctypes http://bugs.python.org/issue27343 closed by serhiy.storchaka #27345: GzipFile's readinto() reads gzip data instead of file data. 
http://bugs.python.org/issue27345 closed by Ryan Birmingham #27347: Spam http://bugs.python.org/issue27347 closed by berker.peksag #27349: distutils.command.upload: typo "protcol_version" http://bugs.python.org/issue27349 closed by berker.peksag #27356: csv http://bugs.python.org/issue27356 closed by berker.peksag #27360: _deque and _islice are sometimes None http://bugs.python.org/issue27360 closed by SilentGhost #27361: ValueError on eval after 'from pandas import *' http://bugs.python.org/issue27361 closed by SilentGhost #27362: json.dumps to check for obj.__json__ before raising TypeError http://bugs.python.org/issue27362 closed by r.david.murray #27368: os.mkdir is not working for multiple level of directory creati http://bugs.python.org/issue27368 closed by SilentGhost #27370: Inconsistency in docs for list.extend http://bugs.python.org/issue27370 closed by martin.panter #27371: Runaway memory consumption using tkinter update() http://bugs.python.org/issue27371 closed by jeremyblow #27375: error in "make test", while trying to install python on linux http://bugs.python.org/issue27375 closed by skrah #27378: remove ref to Phil Schwartz's 'Kodos' in regex HOWTO http://bugs.python.org/issue27378 closed by berker.peksag #27381: Typo in zipfile documentation http://bugs.python.org/issue27381 closed by berker.peksag #27382: calendar module .isleap() probleam http://bugs.python.org/issue27382 closed by ebarry From songofacandy at gmail.com Fri Jun 24 13:51:16 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Sat, 25 Jun 2016 02:51:16 +0900 Subject: [Python-Dev] Idea: more compact, interned string key only dict for namespace. In-Reply-To: References: <576B3B99.6070204@hotpy.org> Message-ID: Hi, all. I implemented my new idea. 
(still wip) https://github.com/methane/cpython/pull/3/files Memory usage when building Python doc with sphinx is: 1) master (shared key) 176MB 2) compact (w/ shared key) 158MB 3) compact (w/o shared key) 166MB 4) compact & interned (new) 160MB Memory usage is close to compact w/ shared key, and more efficient than current. In case of Python benchmark (master vs compact & interned): $ python perf.py -fm ~/local/python-master/bin/python3 ~/local/python-intern/bin/python3 ### 2to3 ### Mem max: 20392.000 -> 16936.000: 1.2041x smaller ### chameleon_v2 ### Mem max: 364604.000 -> 359904.000: 1.0131x smaller ### django_v3 ### Mem max: 26648.000 -> 24948.000: 1.0681x smaller ### fastpickle ### Mem max: 8296.000 -> 8996.000: 1.0844x larger ### fastunpickle ### Mem max: 8332.000 -> 7964.000: 1.0462x smaller ### json_dump_v2 ### Mem max: 10400.000 -> 9972.000: 1.0429x smaller ### json_load ### Mem max: 8088.000 -> 7644.000: 1.0581x smaller ### nbody ### Mem max: 7460.000 -> 7036.000: 1.0603x smaller ### regex_v8 ### Mem max: 12572.000 -> 12520.000: 1.0042x smaller ### tornado_http ### Mem max: 27860.000 -> 26792.000: 1.0399x smaller I'll do more hacking next week to prove my idea (interned string only vs string only, revive embedded small table or not). If someone is interested, please try my interned-dict branch and report differences in performance and memory usage. https://github.com/methane/cpython/tree/interned-dict (cb0a125c79 passes most tests, except tests using sys.getsizeof()). -- INADA Naoki From guido at python.org Fri Jun 24 14:53:02 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Jun 2016 11:53:02 -0700 Subject: [Python-Dev] unittest.TestResult lacks API to separate subtests In-Reply-To: <045b01d1ce30$688f2a20$39ad7e60$@JetBrains.com> References: <045b01d1ce30$688f2a20$39ad7e60$@JetBrains.com> Message-ID: Hi Ilya, That sounds like a fine idea. Can you submit a patch to our bug tracker? bugs.python.org.
You'll need to fill out a contributor form as well ( https://www.python.org/psf/contrib/contrib-form/) --Guido On Fri, Jun 24, 2016 at 8:52 AM, Ilya Kazakevich < Ilya.Kazakevich at jetbrains.com> wrote: > Hello, > > > > We're developing a Python IDE and integrated it with the unittest module using > a TestResult > > inheritor to track test start, end etc. With Py3K, it supports the addSubTest method, > that is called after all subtests. But there is no method called before and > after _*each*_ subtest (like it is done for regular tests). Without > it I can't fetch each subtest output and display it correctly. > > > > I suggest adding subTestStart / subTestEnd methods to help me with my > issue and other people with similar issues. I can send a patch if you think > this is a good idea. > > > > > > Ilya Kazakevich > > > > JetBrains > > http://www.jetbrains.com > > The Drive to Develop > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jun 24 15:50:18 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Jun 2016 12:50:18 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On 24 June 2016 at 00:41, Martin Teichmann wrote: > Hi list, > > just recently, I posted about the implementation of PEP 487. The > discussion quickly diverted to PEP 520, which happened to be > strongly related. > > Hoping to get some comments about the rest of PEP 487, I took > out the part that is also in PEP 520.
Good idea :) ========================= > Proposal > ======== > > While there are many possible ways to use a metaclass, the vast majority > of use cases falls into just three categories: some initialization code > running after class creation, the initialization of descriptors and > keeping the order in which class attributes were defined. > > Those three use cases can easily be performed by just one metaclass. This section needs to be tweaked a bit to defer to PEP 520 for discussion of the 3rd case. > If > this metaclass is put into the standard library, and all libraries that > wish to customize class creation use this very metaclass, no combination > of metaclasses is necessary anymore. Said metaclass should live in the > ``types`` module under the name ``Type``. This should hint the user that > in the future, this metaclass may become the default metaclass ``type``. As long as the PEP still proposes phased integration into the standard library and builtins (more on that below) I'd suggest being explicit here in the proposal section that the non-default metaclasses in the standard library (abc.ABCMeta and enum.EnumMeta) should be updated to inherit from the new types.Type. > The three use cases are achieved as follows: "The three ..." -> "These ..." > 1. The metaclass contains an ``__init_subclass__`` hook that initializes > all subclasses of a given class, > 2.
the metaclass calls a ``__set_owner__`` hook on all the attribute > (descriptors) defined in the class, and This part isn't entirely clear to me, so you may want to give some Python pseudo-code that: - is explicit regarding exactly when this new code runs in the type creation process - whether the __set_owner__ hooks are called before or after __init_subclass__ runs, or only when the subclass calls up to super().__init_subclass__, and the implications of each choice (either descriptors see a partially initialised class object, or init_subclass sees partially initialised descriptor objects, or that choice is delegated to individual subclasses) - how the list of objects to be checked for "__set_owner__" methods is retrieved (presumably via "ns.items()" on the class definition namespace, but the PEP should be explicit) For the second point, my personal preference would be for descriptors to have their owner set first and independently of __init_subclass__ super calls (as it seems more likely that __init_subclass__ will depend on having access to fully initialised descriptors than the other way around). > Reduced chance of metaclass conflicts > ------------------------------------- > > One of the big issues that makes library authors reluctant to use metaclasses > (even when they would be appropriate) is the risk of metaclass conflicts. > These occur whenever two unrelated metaclasses are used by the desired > parents of a class definition. This risk also makes it very difficult to > *add* a metaclass to a class that has previously been published without one. > > By contrast, adding an ``__init_subclass__`` method to an existing type poses > a similar level of risk to adding an ``__init__`` method: technically, there > is a risk of breaking poorly implemented subclasses, but when that occurs, > it is recognised as a bug in the subclass rather than the library author > breaching backwards compatibility guarantees. 
This section needs some additional explanation of how it fares given the proposed migration plan below. I *think* it would be fine, assuming that in 3.7, the types module gains the lines: Type = type Object = object As that would collapse the hierarchy again, even for classes that had declared inheritance from types.Object or the direct use of types.Type as their metaclass in 3.6 Honestly though, I'm not sure this additional user-visible complexity is worth it - "The default type metaclass has this new behaviour" is a lot easier to document and explain than "We added a new opt-in alternate metaclass that you can use if you want, and in the next version that will just become an alias for the builtin types again". We'd also end up being stuck with types.Type and types.Object as aliases for the type and object builtins forever (with the associated "How does 'class name:' or 'class name(object)' differ from 'class name(types.Object)'?" question and "It doesn't, unless you're using Python 3.6" answer for folks learning the language for the first time). If we decide __init_subclass__ and __set_owner__ are good ideas, let's just implement them, with a backport available on PyPI for folks that want to use them on earlier versions, including in Python 2/3 compatible code. > A path of introduction into Python > ================================== > > Most of the benefits of this PEP can already be implemented using > a simple metaclass. For the ``__init_subclass__`` hook this works > all the way down to Python 2.7, while the attribute order needs Python 3.0 > to work. Such a class has been `uploaded to PyPI`_. This paragraph should refer to just __init_subclass__ and __set_owner__ now, since the attribute ordering problem has been moved out to PEP 520. 
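A rough Python-level sketch of the ordering argued for above (owner hooks run first, then the nearest base's ``__init_subclass__``) could look like the following. The metaclass and the demo names here are purely illustrative, not the PEP's reference implementation:

```python
class ProvisionalType(type):
    """Sketch only: descriptors get ``__set_owner__`` before the
    nearest base's ``__init_subclass__`` sees the new class."""

    def __new__(mcls, name, bases, ns, **kwargs):
        # Treat a plain __init_subclass__ function as an implicit
        # classmethod, as the PEP proposes.
        hook = ns.get('__init_subclass__')
        if hook is not None and not isinstance(hook, classmethod):
            ns['__init_subclass__'] = classmethod(hook)
        cls = super().__new__(mcls, name, bases, ns)
        # 1. Owner hooks run first, so __init_subclass__ later sees
        #    fully initialised descriptors.
        for attr, value in ns.items():
            set_owner = getattr(type(value), '__set_owner__', None)
            if set_owner is not None:
                set_owner(value, cls, attr)
        # 2. Then the nearest ancestor (skipping cls itself) is told
        #    about its new subclass.
        init = getattr(super(cls, cls), '__init_subclass__', None)
        if init is not None:
            init(**kwargs)
        return cls


registry = set()

class Base(metaclass=ProvisionalType):
    def __init_subclass__(cls, **kwargs):
        registry.add(cls.__name__)

class Descriptor:
    def __set_owner__(self, owner, name):
        self.owner, self.name = owner, name

class Widget(Base):
    field = Descriptor()

# After the class statement: 'Widget' is in registry, and
# Widget.__dict__['field'] knows its owner and attribute name.
```

With this ordering, an ``__init_subclass__`` that inspects the class body can rely on descriptors already knowing their owner; swapping steps 1 and 2 would give the opposite trade-off discussed above.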
[snip: see above for further comments on why I think this additional complexity in the migration plan might not be worth it] > Rejected Design Options > ======================= > Defining arbitrary namespaces > ----------------------------- > > PEP 422 defined a generic way to add arbitrary namespaces for class > definitions. This approach is much more flexible than just leaving > the definition order in a tuple. The ``__prepare__`` method in a metaclass > supports exactly this behavior. But given that effectively > the only use cases that could be found out in the wild were the > ``OrderedDict`` way of determining the attribute order, it seemed > reasonable to only support this special case. Since it isn't tackling definition order any more, this section can now be left out of this PEP. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ericsnowcurrently at gmail.com Fri Jun 24 17:52:51 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 24 Jun 2016 15:52:51 -0600 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) Message-ID: - a clearer motivation section - include "dunder" names - 2 open questions (__slots__? drop read-only requirement?) -eric ----------------------------------- PEP: 520 Title: Preserving Class Attribute Definition Order Version: $Revision$ Last-Modified: $Date$ Author: Eric Snow Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 7-Jun-2016 Python-Version: 3.6 Post-History: 7-Jun-2016, 11-Jun-2016, 20-Jun-2016, 24-Jun-2016 Abstract ======== The class definition syntax is ordered by its very nature. Class attributes defined there are thus ordered. Aside from helping with readability, that ordering is sometimes significant. If it were automatically available outside the class definition then the attribute order could be used without the need for extra boilerplate (such as metaclasses or manually enumerating the attribute order). 
Given that this information already exists, access to the definition order of attributes is a reasonable expectation. However, currently Python does not preserve the attribute order from the class definition. This PEP changes that by preserving the order in which attributes are introduced in the class definition body. That order will now be preserved in the ``__definition_order__`` attribute of the class. This allows introspection of the original definition order, e.g. by class decorators. Additionally, this PEP changes the default class definition namespace to ``OrderedDict``. The long-lived class namespace (``__dict__``) will remain a ``dict``. Motivation ========== The attribute order from a class definition may be useful to tools that rely on name order. However, without the automatic availability of the definition order, those tools must impose extra requirements on users. For example, use of such a tool may require that your class use a particular metaclass. Such requirements are often enough to discourage use of the tool. Some tools that could make use of this PEP include: * documentation generators * testing frameworks * CLI frameworks * web frameworks * config generators * data serializers * enum factories (my original motivation) Background ========== When a class is defined using a ``class`` statement, the class body is executed within a namespace. Currently that namespace defaults to ``dict``. If the metaclass defines ``__prepare__()`` then the result of calling it is used for the class definition namespace. After the execution completes, the definition namespace is copied into a new ``dict``. Then the original definition namespace is discarded. The new copy is stored away as the class's namespace and is exposed as ``__dict__`` through a read-only proxy. The class attribute definition order is represented by the insertion order of names in the *definition* namespace.
Thus, we can have access to the definition order by switching the definition namespace to an ordered mapping, such as ``collections.OrderedDict``. This is feasible using a metaclass and ``__prepare__``, as described above. In fact, exactly this is by far the most common use case for using ``__prepare__`` (see PEP 487). At that point, the only missing thing for later access to the definition order is storing it on the class before the definition namespace is thrown away. Again, this may be done using a metaclass. However, this means that the definition order is preserved only for classes that use such a metaclass. There are two practical problems with that: First, it requires the use of a metaclass. Metaclasses introduce an extra level of complexity to code and in some cases (e.g. conflicts) are a problem. So reducing the need for them is worth doing when the opportunity presents itself. PEP 422 and PEP 487 discuss this at length. Given that we now have a C implementation of ``OrderedDict`` and that ``OrderedDict`` is the common use case for ``__prepare__()``, we have such an opportunity by defaulting to ``OrderedDict``. Second, only classes that opt in to using the ``OrderedDict``-based metaclass will have access to the definition order. This is problematic for cases where universal access to the definition order is important. Specification ============= Part 1: * all classes have a ``__definition_order__`` attribute * ``__definition_order__`` is a ``tuple`` of identifiers (or ``None``) * ``__definition_order__`` is a read-only attribute * ``__definition_order__`` is always set: 1. during execution of the class body, the insertion order of names into the class *definition* namespace is stored in a tuple 2. if ``__definition_order__`` is defined in the class body then it must be a ``tuple`` of identifiers or ``None``; any other value will result in ``TypeError`` 3. classes that do not have a class definition (e.g. 
builtins) have their ``__definition_order__`` set to ``None`` 4. classes for which ``__prepare__()`` returned something other than ``OrderedDict`` (or a subclass) have their ``__definition_order__`` set to ``None`` (except where #2 applies) Part 2: * the default class *definition* namespace is now ``OrderedDict`` The following code demonstrates roughly equivalent semantics for the default behavior:: class Meta(type): @classmethod def __prepare__(cls, *args, **kwargs): return OrderedDict() class Spam(metaclass=Meta): ham = None eggs = 5 __definition_order__ = tuple(locals()) Why a tuple? ------------ Use of a tuple reflects the fact that we are exposing the order in which attributes on the class were *defined*. Since the definition is already complete by the time ``__definition_order__`` is set, the content and order of the value won't be changing. Thus we use a type that communicates that state of immutability. Why a read-only attribute? -------------------------- As with the use of tuple, making ``__definition_order__`` a read-only attribute communicates the fact that the information it represents is complete. Since it represents the state of a particular one-time event (execution of the class definition body), allowing the value to be replaced would reduce confidence that the attribute corresponds to the original class body. If a use case for a writable (or mutable) ``__definition_order__`` arises, the restriction may be loosened later. Presently this seems unlikely and furthermore it is usually best to go immutable-by-default. Note that the ability to set ``__definition_order__`` manually allows a dynamically created class (e.g. Cython, ``type()``) to still have ``__definition_order__`` properly set. Why not "__attribute_order__"? ------------------------------ ``__definition_order__`` is centered on the class definition body. The use cases for dealing with the class namespace (``__dict__``) post-definition are a separate matter.
``__definition_order__`` would be a significantly misleading name for a feature focused on more than the class definition. Why not ignore "dunder" names? ------------------------------ Names starting and ending with "__" are reserved for use by the interpreter. In practice they should not be relevant to the users of ``__definition_order__``. Instead, for nearly everyone they would only be clutter, causing the same extra work for everyone. However, dropping dunder names by default may inadvertently cause problems for classes that use dunder names unconventionally. In this case it's better to play it safe and preserve *all* the names from the class definition. Note that a couple of dunder names (``__module__`` and ``__qualname__``) are injected by default by the compiler. So they will be included even though they are not strictly part of the class definition body. Why None instead of an empty tuple? ----------------------------------- A key objective of adding ``__definition_order__`` is to preserve information in class definitions which was lost prior to this PEP. One consequence is that ``__definition_order__`` implies an original class definition. Using ``None`` allows us to clearly distinguish classes that do not have a definition order. An empty tuple clearly indicates a class that came from a definition statement but did not define any attributes there. Why None instead of not setting the attribute? ---------------------------------------------- The absence of an attribute requires more complex handling than ``None`` does for consumers of ``__definition_order__``. Why constrain manually set values? ---------------------------------- If ``__definition_order__`` is manually set in the class body then it will be used. We require it to be a tuple of identifiers (or ``None``) so that consumers of ``__definition_order__`` may have a consistent expectation for the value. That helps maximize the feature's usefulness.
We could also allow an arbitrary iterable for a manually set ``__definition_order__`` and convert it into a tuple. However, not all iterables infer a definition order (e.g. ``set``). So we opt in favor of requiring a tuple. Why is __definition_order__ even necessary? ------------------------------------------- Since the definition order is not preserved in ``__dict__``, it is lost once class definition execution completes. Classes *could* explicitly set the attribute as the last thing in the body. However, then independent decorators could only make use of classes that had done so. Instead, ``__definition_order__`` preserves this one bit of info from the class body so that it is universally available. Support for C-API Types ======================= Arguably, most C-defined Python types (e.g. built-in, extension modules) have a roughly equivalent concept of a definition order. So conceivably ``__definition_order__`` could be set for such types automatically. This PEP does not introduce any such support. However, it does not prohibit it either. The specific cases: * builtin types * PyType_Ready * PyType_FromSpec Compatibility ============= This PEP does not break backward compatibility, except in the case that someone relies *strictly* on ``dict`` as the class definition namespace. This shouldn't be a problem since ``issubclass(OrderedDict, dict)`` is true. Changes ============= In addition to the class syntax, the following expose the new behavior: * builtins.__build_class__ * types.prepare_class * types.new_class Other Python Implementations ============================ Pending feedback, the impact on Python implementations is expected to be minimal. If a Python implementation cannot support switching to ``OrderedDict``-by-default then it can always set ``__definition_order__`` to ``None``. Open Questions ============== * What about ``__slots__``? * Drop the "read-only attribute" requirement? Per Guido: I don't see why it needs to be a read-only attribute.
There are very few of those -- in general we let users play around with things unless we have a hard reason to restrict assignment (e.g. the interpreter's internal state could be compromised). I don't see such a hard reason here. Implementation ============== The implementation is found in the tracker. [impl_] Alternatives ============ An Order-preserving cls.__dict__ -------------------------------- Instead of storing the definition order in ``__definition_order__``, the now-ordered definition namespace could be copied into a new ``OrderedDict``. This would then be used as the mapping proxied as ``__dict__``. Doing so would mostly provide the same semantics. However, using ``OrderedDict`` for ``__dict__`` would obscure the relationship with the definition namespace, making it less useful. Additionally, (in the case of ``OrderedDict`` specifically) doing this would require significant changes to the semantics of the concrete ``dict`` C-API. There has been some discussion about moving to a compact dict implementation which would (mostly) preserve insertion order. However, the lack of an explicit ``__definition_order__`` would still remain as a pain point. A "namespace" Keyword Arg for Class Definition ---------------------------------------------- PEP 422 introduced a new "namespace" keyword arg to class definitions that effectively replaces the need for ``__prepare__()``. [pep422_] However, the proposal was withdrawn in favor of the simpler PEP 487. A stdlib Metaclass that Implements __prepare__() with OrderedDict ----------------------------------------------------------------- This has all the same problems as writing your own metaclass. The only advantage is that you don't have to actually write this metaclass. So it doesn't offer any benefit in the context of this PEP. Set __definition_order__ at Compile-time ---------------------------------------- Each class's ``__qualname__`` is determined at compile-time.
This same concept could be applied to ``__definition_order__``. The result of composing ``__definition_order__`` at compile-time would be nearly the same as doing so at run-time. Comparative implementation difficulty aside, the key difference would be that at compile-time it would not be practical to preserve definition order for attributes that are set dynamically in the class body (e.g. ``locals()[name] = value``). However, they should still be reflected in the definition order. One possible resolution would be to require class authors to manually set ``__definition_order__`` if they define any class attributes dynamically. Ultimately, the use of ``OrderedDict`` at run-time or compile-time discovery is almost entirely an implementation detail. References ========== .. [impl] issue #24254 (https://bugs.python.org/issue24254) .. [nick_concern] Nick's concerns about mutability (https://mail.python.org/pipermail/python-dev/2016-June/144883.html) .. [pep422] PEP 422 (https://www.python.org/dev/peps/pep-0422/#order-preserving-classes) .. [pep487] PEP 487 (https://www.python.org/dev/peps/pep-0487/#defining-arbitrary-namespaces) .. [orig] original discussion (https://mail.python.org/pipermail/python-ideas/2013-February/019690.html) .. [followup1] follow-up 1 (https://mail.python.org/pipermail/python-dev/2013-June/127103.html) .. [followup2] follow-up 2 (https://mail.python.org/pipermail/python-dev/2015-May/140137.html) Copyright =========== This document has been placed in the public domain.
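For experimenting with the proposed semantics today, the run-time behavior described above can be approximated using the ``__prepare__`` machinery from the Background section. This is an illustrative emulation only (the metaclass name below is made up), not the patch from the tracker:

```python
from collections import OrderedDict


class DefinitionOrderMeta(type):
    """Illustrative emulation of ``__definition_order__``."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # An ordered definition namespace, per the Background section.
        return OrderedDict()

    def __new__(mcls, name, bases, ns, **kwargs):
        # Copy the definition namespace into a plain dict, as the
        # class machinery does...
        cls = super().__new__(mcls, name, bases, dict(ns))
        # ...but record the insertion order before it is discarded,
        # honoring a manually set __definition_order__ if present.
        if '__definition_order__' not in ns:
            cls.__definition_order__ = tuple(ns)
        return cls


class Spam(metaclass=DefinitionOrderMeta):
    ham = None
    eggs = 5


# Compiler-injected dunder names are preserved as well:
# Spam.__definition_order__ == ('__module__', '__qualname__', 'ham', 'eggs')
```

A class decorator or other tool could then consume ``Spam.__definition_order__`` directly; the point of the PEP is to make exactly this available without imposing a metaclass requirement on users.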
From ericsnowcurrently at gmail.com Fri Jun 24 18:17:01 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 24 Jun 2016 16:17:01 -0600 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: On Fri, Jun 24, 2016 at 1:50 PM, Nick Coghlan wrote: > Honestly though, I'm not sure this additional user-visible complexity > is worth it - "The default type metaclass has this new behaviour" is a > lot easier to document and explain than "We added a new opt-in > alternate metaclass that you can use if you want, and in the next > version that will just become an alias for the builtin types again". > We'd also end up being stuck with types.Type and types.Object as > aliases for the type and object builtins forever (with the associated > "How does 'class name:' or 'class name(object)' differ from 'class > name(types.Object)'?" question and "It doesn't, unless you're using > Python 3.6" answer for folks learning the language for the first > time). > > If we decide __init_subclass__ and __set_owner__ are good ideas, let's > just implement them, with a backport available on PyPI for folks that > want to use them on earlier versions, including in Python 2/3 > compatible code. +1 Could you clarify the value of the staged approach over jumping straight to changing builtins.type? -eric From ericsnowcurrently at gmail.com Fri Jun 24 18:27:18 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 24 Jun 2016 16:27:18 -0600 Subject: [Python-Dev] PEP XXX: Compact ordered dict In-Reply-To: References: <20160621031727.GA7518@phdru.name> Message-ID: There are a number of ways to make it work (mostly). However, I'll defer to Raymond on how strictly OrderedDict should "subclass" from dict. 
-eric

On Thu, Jun 23, 2016 at 9:26 AM, INADA Naoki wrote:
> On Fri, Jun 24, 2016 at 12:03 AM, Eric Snow wrote:
>> On Mon, Jun 20, 2016 at 11:02 PM, INADA Naoki wrote:
>>> On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote:
>>>> (if a PEP is needed at all)
>>>
>>> I don't think so.  My PEP is not for changing the Python language,
>>> just describing an implementation detail.
>>>
>>> Python 3.5 got a new OrderedDict implemented in C without a PEP.
>>> My patch is relatively small compared to it.  And the idea has been
>>> well known.
>>
>> How about, for 3.6, target re-implementing OrderedDict using the
>> compact dict approach (and leave dict alone for now).  That way we
>> have an extra release cycle to iron out the kinks before switching
>> dict over for 3.7. :)
>>
>> -eric
>
> I can't, since OrderedDict inherits from dict: the OrderedDict
> implementation is based on the dict implementation.
> Since I'm not an expert on the Python object system, I don't know how
> to separate the OrderedDict implementation from dict.
>
> --
> INADA Naoki

From ericsnowcurrently at gmail.com  Fri Jun 24 18:46:11 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 24 Jun 2016 16:46:11 -0600
Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition
 Order (round 5)
In-Reply-To: References: Message-ID:

On Fri, Jun 24, 2016 at 4:37 PM, Nick Coghlan wrote:
> This version looks fine to me.

\o/

> The definition order question has been dropped from PEP 487, so this
> cross-reference doesn't really make sense any more :)

Ah, so much for my appeal to authority.

> I'd characterise this section at the language definition level as the
> default class definition namespace now being *permitted* to be an
> OrderedDict.  For implementations where dict is ordered by default,
> there's no requirement to switch specifically to
> collections.OrderedDict.

Yeah, I'd meant to fix that.

> This paragraph is a little confusing, since "set
> ``__definition_order__`` manually" is ambiguous.
> > "supply an explicit ``__definition_order__`` via the class namespace" > might be clearer. ack > I realised there's another important reason for doing it this way by > default: it's *really easy* to write a "skip_dunder_names" filter that > leaves out dunder names from an arbitrary interable of strings. It's > flatout *impossible* to restore the dunder attribute order if the > class definition process throws it away. Yep. That's why I felt fine with relaxing that. I guess I didn't actually put that in the PEP though. :) -eric From levkivskyi at gmail.com Fri Jun 24 19:04:24 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sat, 25 Jun 2016 01:04:24 +0200 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: I think in any case Type is a bad name, since we now have typing.Type (and it is completely different) I could imagine a lot of confusion. -- Ivan On 25 June 2016 at 00:17, Eric Snow wrote: > On Fri, Jun 24, 2016 at 1:50 PM, Nick Coghlan wrote: > > Honestly though, I'm not sure this additional user-visible complexity > > is worth it - "The default type metaclass has this new behaviour" is a > > lot easier to document and explain than "We added a new opt-in > > alternate metaclass that you can use if you want, and in the next > > version that will just become an alias for the builtin types again". > > We'd also end up being stuck with types.Type and types.Object as > > aliases for the type and object builtins forever (with the associated > > "How does 'class name:' or 'class name(object)' differ from 'class > > name(types.Object)'?" question and "It doesn't, unless you're using > > Python 3.6" answer for folks learning the language for the first > > time). > > > > If we decide __init_subclass__ and __set_owner__ are good ideas, let's > > just implement them, with a backport available on PyPI for folks that > > want to use them on earlier versions, including in Python 2/3 > > compatible code. 
> > +1 > > Could you clarify the value of the staged approach over jumping > straight to changing builtins.type? > > -eric > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/levkivskyi%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Fri Jun 24 19:56:04 2016 From: random832 at fastmail.com (Random832) Date: Fri, 24 Jun 2016 19:56:04 -0400 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: <1466812564.435734.647885009.35EDDBB1@webmail.messagingengine.com> On Fri, Jun 24, 2016, at 17:52, Eric Snow wrote: > - 2 open questions (__slots__? drop read-only requirement?) It's worth noting that __slots__ itself doesn't have a read-only requirement. It can be a tuple, any iterable of strings, or a single string (which means the object has a single slot). Should dir() iterate in the order of __definition_order__? What, if so, should be done about instance attributes, or attributes of multiple classes, or class attributes not present in __definition_order__? What happens to classes whose __prepare__ doesn't return an OrderedDict? Can __definition_order__ be reassigned at runtime? Will it have the same constraints? What if a metaclass defines __getattribute__ in a way that specially handles __definition_order__? If someone really wants to put a non-tuple there they will find a way. How hard do we want to think about ways to stop consenting adults from doing weird things with the __definition_order__ attribute? 
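Random832's point about the accepted forms of ``__slots__`` is easy to
demonstrate -- a quick sketch (class names invented for the example):

```python
class Single:
    __slots__ = 'x'          # a single string: exactly one slot, named 'x'

class Pair:
    __slots__ = ['x', 'y']   # any iterable of strings works; a list here

s = Single()
s.x = 1                      # 'x' is a slot, so this is fine
try:
    s.y = 2                  # no __dict__ and no 'y' slot
    blocked = False
except AttributeError:
    blocked = True
```

So ``__slots__`` is not read-only in any of its forms, which is what makes
the analogy to a writable ``__definition_order__`` worth raising.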
From ericsnowcurrently at gmail.com Fri Jun 24 21:28:34 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 24 Jun 2016 19:28:34 -0600 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: <1466812564.435734.647885009.35EDDBB1@webmail.messagingengine.com> References: <1466812564.435734.647885009.35EDDBB1@webmail.messagingengine.com> Message-ID: On Fri, Jun 24, 2016 at 5:56 PM, Random832 wrote: > On Fri, Jun 24, 2016, at 17:52, Eric Snow wrote: >> - 2 open questions (__slots__? drop read-only requirement?) > > It's worth noting that __slots__ itself doesn't have a read-only > requirement. It can be a tuple, any iterable of strings, or a single > string (which means the object has a single slot). That is somewhat orthogonal to this PEP. > > Should dir() iterate in the order of __definition_order__? What, if so, > should be done about instance attributes, or attributes of multiple > classes, or class attributes not present in __definition_order__? dir() relates to the object's namespace, not its class's definition namespace. > > What happens to classes whose __prepare__ doesn't return an OrderedDict? The PEP already indicates that __definition_order__ will be set to None. > > Can __definition_order__ be reassigned at runtime? That is the subject of one of the open questions. Guido has suggested that it should. I don't agree, but then I'm not Dutch. :) > Will it have the same > constraints? Given that folks generally shouldn't be setting it at runtime, there isn't much point to constraining it. :) > > What if a metaclass defines __getattribute__ in a way that specially > handles __definition_order__? If someone really wants to put a non-tuple > there they will find a way. How hard do we want to think about ways to > stop consenting adults from doing weird things with the > __definition_order__ attribute? 
The point of "consenting adults" is that the person breaking the rules is aware that they are doing so and that they are willing to accept the consequences. Also, note that the interpreter does not depend on __definition_order__ in any way. As I say in the PEP, I'd rather __definition_order__ remain read-only until there's a need. -eric From ncoghlan at gmail.com Fri Jun 24 18:37:55 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Jun 2016 15:37:55 -0700 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: This version looks fine to me. On 24 June 2016 at 14:52, Eric Snow wrote: > Background > ========== > > When a class is defined using a ``class`` statement, the class body > is executed within a namespace. Currently that namespace defaults to > ``dict``. If the metaclass defines ``__prepare__()`` then the result > of calling it is used for the class definition namespace. > > After the execution completes, the definition namespace namespace is > copied into new ``dict``. Then the original definition namespace is > discarded. The new copy is stored away as the class's namespace and > is exposed as ``__dict__`` through a read-only proxy. > > The class attribute definition order is represented by the insertion > order of names in the *definition* namespace. Thus, we can have > access to the definition order by switching the definition namespace > to an ordered mapping, such as ``collections.OrderedDict``. This is > feasible using a metaclass and ``__prepare__``, as described above. > In fact, exactly this is by far the most common use case for using > ``__prepare__`` (see PEP 487). 
The definition order question has been dropped from PEP 487, so this
cross-reference doesn't really make sense any more :)

> Part 2:
>
> * the default class *definition* namespace is now ``OrderedDict``
>
> The following code demonstrates roughly equivalent semantics for the
> default behavior::
>
>    class Meta(type):
>        @classmethod
>        def __prepare__(cls, *args, **kwargs):
>            return OrderedDict()
>
>    class Spam(metaclass=Meta):
>        ham = None
>        eggs = 5
>        __definition_order__ = tuple(locals())

I'd characterise this section at the language definition level as the
default class definition namespace now being *permitted* to be an
OrderedDict.  For implementations where dict is ordered by default,
there's no requirement to switch specifically to
collections.OrderedDict.

> Why a read-only attribute?
> --------------------------
>
> As with the use of tuple, making ``__definition_order__`` a read-only
> attribute communicates the fact that the information it represents is
> complete.  Since it represents the state of a particular one-time event
> (execution of the class definition body), allowing the value to be
> replaced would reduce confidence that the attribute corresponds to the
> original class body.
>
> If a use case for a writable (or mutable) ``__definition_order__``
> arises, the restriction may be loosened later.  Presently this seems
> unlikely and furthermore it is usually best to go immutable-by-default.
>
> Note that the ability to set ``__definition_order__`` manually allows
> a dynamically created class (e.g. Cython, ``type()``) to still have
> ``__definition_order__`` properly set.

This paragraph is a little confusing, since "set
``__definition_order__`` manually" is ambiguous.

"supply an explicit ``__definition_order__`` via the class namespace"
might be clearer.

> Why not ignore "dunder" names?
> ------------------------------
>
> Names starting and ending with "__" are reserved for use by the
> interpreter.
In practice they should not be relevant to the users of
> ``__definition_order__``.  Instead, for nearly everyone they would only
> be clutter, causing the same extra work for everyone.
>
> However, dropping dunder names by default may inadvertently cause
> problems for classes that use dunder names unconventionally.  In this
> case it's better to play it safe and preserve *all* the names from
> the class definition.
>
> Note that a couple of dunder names (``__name__`` and ``__qualname__``)
> are injected by default by the compiler.  So they will be included even
> though they are not strictly part of the class definition body.

I realised there's another important reason for doing it this way by
default: it's *really easy* to write a "skip_dunder_names" filter that
leaves out dunder names from an arbitrary iterable of strings.  It's
flat-out *impossible* to restore the dunder attribute order if the
class definition process throws it away.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From lkb.teichmann at gmail.com  Sat Jun 25 03:43:35 2016
From: lkb.teichmann at gmail.com (Martin Teichmann)
Date: Sat, 25 Jun 2016 09:43:35 +0200
Subject: [Python-Dev] PEP 487: Simpler customization of class creation
In-Reply-To: References: Message-ID:

Hi Nick, hi List,

thanks for the good comments! I'll update the PEP and the
implementation soon; as discussed at the end, I'll do it in C this
time. For now just some quick responses:

> This part isn't entirely clear to me, so you may want to give some
> Python pseudo-code that:

I will actually give the actual code, it's just 10 lines, that should
be understandable.
> - is explicit regarding exactly when this new code runs in the type > creation process > - whether the __set_owner__ hooks are called before or after > __init_subclass__ runs, or only when the subclass calls up to > super().__init_subclass__, and the implications of each choice (either > descriptors see a partially initialised class object, or init_subclass > sees partially initialised descriptor objects, or that choice is > delegated to individual subclasses) > - how the list of objects to be checked for "__set_owner__" methods is > retrieved (presumably via "ns.items()" on the class definition > namespace, but the PEP should be explicit) > > For the second point, my personal preference would be for descriptors > to have their owner set first and independently of __init_subclass__ > super calls (as it seems more likely that __init_subclass__ will > depend on having access to fully initialised descriptors than the > other way around). I intuitively programmed it the other way around, but I get your point and will change it. I agree that it should not be done in super().__init_subclass__ as people often forget to call it, or do weird things. > Honestly though, I'm not sure this additional user-visible complexity > is worth it - "The default type metaclass has this new behaviour" is a > lot easier to document and explain than "We added a new opt-in > alternate metaclass that you can use if you want, and in the next > version that will just become an alias for the builtin types again". > We'd also end up being stuck with types.Type and types.Object as > aliases for the type and object builtins forever (with the associated > "How does 'class name:' or 'class name(object)' differ from 'class > name(types.Object)'?" question and "It doesn't, unless you're using > Python 3.6" answer for folks learning the language for the first > time). 
My idea with a stepped approach was to have a chance to look at how
people use the feature, so that when we make it standard eventually we
actually get it right. In the end, this is a maintainability question.

I am fully OK with following experienced core developers here, so if
the general idea is that a two-step approach is not needed, then no
problem, let's do it in the Python core directly!

Greetings

Martin

From storchaka at gmail.com  Sat Jun 25 17:21:36 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 26 Jun 2016 00:21:36 +0300
Subject: [Python-Dev] When to use EOFError?
In-Reply-To: References: Message-ID:

On 22.06.16 00:17, Victor Stinner wrote:
> When loading truncated data with pickle, I expect a pickle error, not a
> generic ValueError nor EOFError.

Many modules raise EOFError when reading truncated data.

From storchaka at gmail.com  Sat Jun 25 17:26:26 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 26 Jun 2016 00:26:26 +0300
Subject: [Python-Dev] When to use EOFError?
In-Reply-To: <201606221822.49308@news.perlig.de>
References: <201606221822.49308@news.perlig.de>
Message-ID:

On 22.06.16 19:22, André Malo wrote:
> I often concatenate multiple pickles into one file. When reading them, it
> works like this:
>
> try:
>     while True:
>         yield pickle.load(fp)
> except EOFError:
>     pass
>
> In this case the truncation is not really unexpected. Maybe it should
> distinguish between truncated-in-the-middle and truncated-because-empty.
>
> (Same goes for marshal)

This is an interesting application, but it works only for non-truncated
data. If the data is truncated, you just lose the last item without
notice.

From leewangzhong+python at gmail.com  Sat Jun 25 19:40:08 2016
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Sat, 25 Jun 2016 19:40:08 -0400
Subject: [Python-Dev] Compact ordered dict is not ordered for split table.
(was: PEP XXX: Compact ordered dict In-Reply-To: References: Message-ID: On Jun 21, 2016 11:12 AM, "INADA Naoki" wrote: > > I'm sorry, but I hadn't realized which compact ordered dict is > not ordered for split table. > > For example: > >>> class A: > ... ... > ... > >>> a = A() > >>> b = A() > >>> a.a = 1 > >>> a.b = 2 > >>> b.b = 3 > >>> b.a = 4 > >>> a.__dict__.items() > dict_items([('a', 1), ('b', 2)]) > >>> b.__dict__.items() > dict_items([('a', 4), ('b', 3)]) > > > This doesn't affects to **kwargs and class namespace. > > But if we change the language spec to dict preserves insertion order, > this should be addressed. Is that really how it works? From my understanding of PEP 412, they should have different keysets, because one diverged in keys from the other at an intermediate step. Another idea (though it has several issues and seems like a step backward): a split-table dict can have a separate iteration list, indexing into the entry table. There are ways to share iteration lists, and make it so that adding the same keys in the same order each time results in the same iteration list each time, but this costs overhead. There might be ways of reducing the overhead, or the overhead might be replacing bigger overhead, but we should decide if the behavior is what we want in the first place. -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Sun Jun 26 00:46:21 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 26 Jun 2016 13:46:21 +0900 Subject: [Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict In-Reply-To: References: Message-ID: On Sun, Jun 26, 2016 at 8:40 AM, Franklin? Lee wrote: > On Jun 21, 2016 11:12 AM, "INADA Naoki" wrote: >> >> I'm sorry, but I hadn't realized which compact ordered dict is >> not ordered for split table. >> >> For example: >> >>> class A: >> ... ... >> ... 
>> >>> a = A()
>> >>> b = A()
>> >>> a.a = 1
>> >>> a.b = 2
>> >>> b.b = 3
>> >>> b.a = 4
>> >>> a.__dict__.items()
>> dict_items([('a', 1), ('b', 2)])
>> >>> b.__dict__.items()
>> dict_items([('a', 4), ('b', 3)])
>>
>>
>> This doesn't affect **kwargs and class namespaces.
>>
>> But if we change the language spec so that dict preserves insertion
>> order, this should be addressed.
>
> Is that really how it works? From my understanding of PEP 412, they should
> have different keysets, because one diverged in keys from the other at an
> intermediate step.

See here
https://github.com/python/cpython/blob/3.5/Objects/dictobject.c#L3855-L3866

When the keys table is resized:

1) if the refcount of the old keys is one, the new keys are shared with
   instances created afterwards;
2) otherwise, key sharing for the class is disabled entirely.

> Another idea (though it has several issues and seems like a step backward):
> a split-table dict can have a separate iteration list, indexing into the
> entry table. There are ways to share iteration lists, and make it so that
> adding the same keys in the same order each time results in the same
> iteration list each time, but this costs overhead. There might be ways of
> reducing the overhead, or the overhead might be replacing bigger overhead,
> but we should decide if the behavior is what we want in the first place.

I'll test some ideas.

But for now, I'll update http://bugs.python.org/issue27350 to stop key
sharing when the order is different (deletion is not allowed, and the
insertion order must be the same).

It may reduce the key sharing rate, but total memory usage should not
increase much, thanks to the compact dict.

--
INADA Naoki

From songofacandy at gmail.com  Sun Jun 26 07:25:36 2016
From: songofacandy at gmail.com (INADA Naoki)
Date: Sun, 26 Jun 2016 20:25:36 +0900
Subject: [Python-Dev] Compact ordered dict is not ordered for split table.
(was: PEP XXX: Compact ordered dict In-Reply-To: References: Message-ID: >> Another idea (though it has several issues and seems like a step backward): >> a split-table dict can have a separate iteration list, indexing into the >> entry table. There are ways to share iteration lists, and make it so that >> adding the same keys in the same order each time results in the same >> iteration list each time, but this costs overhead. There might be ways of >> reducing the overhead, or the overhead might be replacing bigger overhead, >> but we should decide if the behavior is what we want in the first place. >> > > I'll test some ideas. > > But for now, I'll update http://bugs.python.org/issue27350 to stop key > sharing when > order is different. (a. deletion is not allowed, and insertion order > must be same). > > It may reduce key sharing rate, but total memory usage must not increase so > much thanks to compact dict. I did it. issue27350 is now ordered for key sharing dict, too. -- INADA Naoki From nd at perlig.de Sun Jun 26 08:00:49 2016 From: nd at perlig.de (=?windows-1252?q?Andr=E9_Malo?=) Date: Sun, 26 Jun 2016 14:00:49 +0200 Subject: [Python-Dev] When to use EOFError? In-Reply-To: References: <201606221822.49308@news.perlig.de> Message-ID: <201606261400.49620@news.perlig.de> * Serhiy Storchaka wrote: > On 22.06.16 19:22, Andr? Malo wrote: > > I often concatenate multiple pickles into one file. When reading them, > > it works like this: > > > > try: > > while True: > > yield pickle.load(fp) > > except EOFError: > > pass > > > > In this case the truncation is not really unexpected. Maybe it should > > distinguish between truncated-in-the-middle and > > truncated-because-empty. > > > > (Same goes for marshal) > > This is interesting application, but works only for non-truncated data. > If the data is truncated, you just lose the last item without a notice. Yes (as said). In my case it's typically not a problem, because I write them myself right before reading them. 
It's basically about spooling data to disk in order to keep it out of
RAM.
However, because of the truncation issue it would be nice to have a
distinction between no-data and truncated-data.

Cheers,
--
Winnetous Erbe:

From gvanrossum at gmail.com  Sun Jun 26 16:42:42 2016
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Sun, 26 Jun 2016 13:42:42 -0700
Subject: [Python-Dev] When to use EOFError?
In-Reply-To: <201606261400.49620@news.perlig.de>
References: <201606221822.49308@news.perlig.de>
 <201606261400.49620@news.perlig.de>
Message-ID:

I think this is an interesting idea and quite in line with the meaning
of EOFError.

--Guido (mobile)

On Jun 26, 2016 5:02 AM, "André Malo" wrote:

> * Serhiy Storchaka wrote:
>
> > On 22.06.16 19:22, André Malo wrote:
> > > I often concatenate multiple pickles into one file. When reading them,
> > > it works like this:
> > >
> > > try:
> > >     while True:
> > >         yield pickle.load(fp)
> > > except EOFError:
> > >     pass
> > >
> > > In this case the truncation is not really unexpected. Maybe it should
> > > distinguish between truncated-in-the-middle and
> > > truncated-because-empty.
> > >
> > > (Same goes for marshal)
> >
> > This is an interesting application, but works only for non-truncated
> > data. If the data is truncated, you just lose the last item without
> > notice.
>
> Yes (as said). In my case it's typically not a problem, because I write
> them myself right before reading them. It's basically about spooling
> data to disk in order to keep it out of RAM.
> However, because of the truncation issue it would be nice to have a
> distinction between no-data and truncated-data.
> > Cheers, > -- > Winnetous Erbe: > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jun 26 18:25:53 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Jun 2016 15:25:53 -0700 Subject: [Python-Dev] PEP 487: Simpler customization of class creation In-Reply-To: References: Message-ID: > One of the big issues that makes library authors reluctant to use metaclasses > (even when they would be appropriate) is the risk of metaclass conflicts. Really? I've written and reviewed a lot of metaclasses and this has never worried me. The problem is limited to multiple inheritance, right? I worry a lot about MI being imposed on classes that weren't written with MI in mind, but I've never particularly worried about the special case of metaclasses. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jun 26 19:55:05 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Jun 2016 16:55:05 -0700 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: On Fri, Jun 24, 2016 at 3:46 PM, Eric Snow wrote: > On Fri, Jun 24, 2016 at 4:37 PM, Nick Coghlan wrote: > > This version looks fine to me. > > \o/ > Same to me, mostly. > > The definition order question has been dropped from PEP 487, so this > > cross-reference doesn't really make sense any more :) > > Ah, so much for my appeal to authority. > > > I'd characterise this section at the language definition level as the > > default class definition namespace now being *permitted* to be an > > OrderedDict. 
For implementations where dict is ordered by default,
> > there's no requirement to switch specifically to
> > collections.OrderedDict.
>
> Yeah, I'd meant to fix that.

Please do.

> > This paragraph is a little confusing, since "set
> > ``__definition_order__`` manually" is ambiguous.
> >
> > "supply an explicit ``__definition_order__`` via the class namespace"
> > might be clearer.
>
> ack
>
> > I realised there's another important reason for doing it this way by
> > default: it's *really easy* to write a "skip_dunder_names" filter that
> > leaves out dunder names from an arbitrary iterable of strings.  It's
> > flat-out *impossible* to restore the dunder attribute order if the
> > class definition process throws it away.
>
> Yep.  That's why I felt fine with relaxing that.  I guess I didn't
> actually put that in the PEP though. :)

Please add it. I'd also like the PEP to point out that there might be
other things that an app wouldn't want in the definition order, e.g.
anything that's a method, or anything that starts with '_', etc.

I still think that it should not be read-only. If __slots__ and
__name__ can be writable I think __definition_order__ can be too. (I
believe an easy way to make it so should be to add it to the dict
that's passed to type.__new__().)

Other nits:

- I don't think it's great to let other implementations leave
__definition_order__ set to None when they don't care to support it;
this would break apps/libraries that want to use it and the
implementation could refuse to fix it, claiming PEP 520 doesn't
mandate it. I think it's better to mandate this for a conforming
implementation.

- I don't think there's much of a use case for setting
__definition_order__ (to a tuple) for builtin classes. However I do
think extension modules should be allowed to set it, in case they are
substituting for a previous Python-level class whose users expect this
to work.
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.cliffe at btinternet.com Sun Jun 26 20:22:49 2016 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Mon, 27 Jun 2016 01:22:49 +0100 Subject: [Python-Dev] When to use EOFError? In-Reply-To: References: <201606221822.49308@news.perlig.de> <201606261400.49620@news.perlig.de> Message-ID: So how about an EmptyFileError (or similar name) as a subclass of EOFError? On 26/06/2016 21:42, Guido van Rossum wrote: > > I think this is an interesting idea and quite in line with the meaning > of EOFError. > > --Guido (mobile) > > On Jun 26, 2016 5:02 AM, "Andr? Malo" > wrote: > > * Serhiy Storchaka wrote: > > > On 22.06.16 19:22, Andr? Malo wrote: > > > I often concatenate multiple pickles into one file. When > reading them, > > > it works like this: > > > > > > try: > > > while True: > > > yield pickle.load(fp) > > > except EOFError: > > > pass > > > > > > In this case the truncation is not really unexpected. Maybe it > should > > > distinguish between truncated-in-the-middle and > > > truncated-because-empty. > > > > > > (Same goes for marshal) > > > > This is interesting application, but works only for > non-truncated data. > > If the data is truncated, you just lose the last item without a > notice. > > Yes (as said). In my case it's typically not a problem, because I > write them > myself right before reading them. It's a basically about spooling > data to > disk in order to keep them out of the RAM. > However, because of the truncation issue it would be nice, to have a > distinction between no-data and truncated-data. 
> > Cheers, > -- > Winnetous Erbe: > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/rob.cliffe%40btinternet.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jun 26 20:40:32 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Jun 2016 17:40:32 -0700 Subject: [Python-Dev] When to use EOFError? In-Reply-To: References: <201606221822.49308@news.perlig.de> <201606261400.49620@news.perlig.de> Message-ID: But that use case is not about an empty file. It's about finding nothing at the current position where something was expected. This is similar to the original use case for EOFError, which was raised by input() (or rather, raw_input()) when there was no more data on sys.stdin. On Sun, Jun 26, 2016 at 5:22 PM, Rob Cliffe wrote: > So how about an EmptyFileError (or similar name) as a subclass of EOFError? > > On 26/06/2016 21:42, Guido van Rossum wrote: > > I think this is an interesting idea and quite in line with the meaning of > EOFError. > > --Guido (mobile) > On Jun 26, 2016 5:02 AM, "Andr? Malo" wrote: > >> * Serhiy Storchaka wrote: >> >> > On 22.06.16 19:22, Andr? Malo wrote: >> > > I often concatenate multiple pickles into one file. When reading them, >> > > it works like this: >> > > >> > > try: >> > > while True: >> > > yield pickle.load(fp) >> > > except EOFError: >> > > pass >> > > >> > > In this case the truncation is not really unexpected. Maybe it should >> > > distinguish between truncated-in-the-middle and >> > > truncated-because-empty. 
>> > > >> > > (Same goes for marshal) >> > >> > This is interesting application, but works only for non-truncated data. >> > If the data is truncated, you just lose the last item without a notice. >> >> Yes (as said). In my case it's typically not a problem, because I write >> them >> myself right before reading them. It's a basically about spooling data to >> disk in order to keep them out of the RAM. >> However, because of the truncation issue it would be nice, to have a >> distinction between no-data and truncated-data. >> >> Cheers, >> -- >> Winnetous Erbe: >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org >> > > > _______________________________________________ > Python-Dev mailing listPython-Dev at python.orghttps://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: https://mail.python.org/mailman/options/python-dev/rob.cliffe%40btinternet.com > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Sun Jun 26 22:32:12 2016 From: larry at hastings.org (Larry Hastings) Date: Sun, 26 Jun 2016 19:32:12 -0700 Subject: [Python-Dev] [RELEASED] Python 3.4.5 and Python 3.5.2 are now available Message-ID: <5770902C.4010809@hastings.org> On behalf of the Python development community and the Python 3.4 and Python 3.5 release teams, I'm thrilled to announce the availability of Python 3.4.5 and Python 3.5.2. Python 3.4 is now in "security fixes only" mode. This is the final stage of support for Python 3.4. 
All changes made to Python 3.4 since Python 3.4.4 should be security fixes only; conventional bug fixes are not accepted. Also, Python 3.4.5 and all future releases of Python 3.4 will only be released as source code--no official binary installers will be produced. Python 3.5 is still in active "bug fix" mode. Python 3.5.2 contains many incremental improvements and bug fixes over Python 3.5.1. You can find Python 3.4.5 here: https://www.python.org/downloads/release/python-345/ And you can find Python 3.5.2 here: https://www.python.org/downloads/release/python-352/ Releasing software from 30,000 feet, //arry /p.s. There appears to be a small oops with the Windows installers for 3.5.2--uploaded to the wrong directory or something. They'll be available soon, honest! -------------- next part -------------- An HTML attachment was scrubbed... URL: From arigo at tunes.org Mon Jun 27 02:34:28 2016 From: arigo at tunes.org (Armin Rigo) Date: Mon, 27 Jun 2016 08:34:28 +0200 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: Hi, On 24 June 2016 at 23:52, Eric Snow wrote: > Pending feedback, the impact on Python implementations is expected to > be minimal. If a Python implementation cannot support switching to > ``OrderedDict``-by-default then it can always set ``__definition_order__`` > to ``None``. That's wishful thinking. Any Python implementation that sets ``__definition_order__`` to None where CPython sets it to something useful is likely going to break programs and be deemed not fully compatible. (Note: this PEP is not a problem for PyPy.) A bientôt, Armin.
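For readers who want to see the mechanism PEP 520 standardizes, its effect can already be approximated with a metaclass whose ``__prepare__`` returns an ``OrderedDict``. The sketch below is only an illustration of the idea: the ``__definition_order__`` attribute name follows the PEP draft, and filtering out dunder names is a simplification made here, not something the PEP prescribes.

```python
from collections import OrderedDict

class DefOrderMeta(type):
    """Toy metaclass approximating PEP 520's proposed behavior."""
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # An ordered namespace records assignments in source order.
        return OrderedDict()

    def __new__(mcls, name, bases, ns, **kwds):
        cls = super().__new__(mcls, name, bases, dict(ns))
        # Expose the order under the PEP's proposed attribute name,
        # skipping dunder names for brevity.
        cls.__definition_order__ = tuple(
            k for k in ns
            if not (k.startswith('__') and k.endswith('__')))
        return cls

class Example(metaclass=DefOrderMeta):
    b = 1
    a = 2
    def method(self):
        pass

print(Example.__definition_order__)  # ('b', 'a', 'method')
```

An implementation that cannot provide the order would, per the quoted text, set the attribute to ``None`` instead, which is exactly the degradation Armin objects to.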
From steve.dower at python.org Mon Jun 27 11:25:40 2016 From: steve.dower at python.org (Steve Dower) Date: Mon, 27 Jun 2016 08:25:40 -0700 Subject: [Python-Dev] [RELEASED] Python 3.4.5 and Python 3.5.2 are now available In-Reply-To: <5770902C.4010809@hastings.org> References: <5770902C.4010809@hastings.org> Message-ID: <794952a4-f50e-00f1-cad8-b424f8d1e945@python.org> On 26Jun2016 1932, Larry Hastings wrote: > https://www.python.org/downloads/release/python-352/ > ... > /p.s. There appears to be a small oops with the Windows installers for > 3.5.2--uploaded to the wrong directory or something. They'll be > available soon, honest! That oops is now fixed, but I wanted to mention one other thing. Microsoft Security Essentials, now a very common antivirus/antimalware scanner on Windows, is incorrectly detecting Lib/distutils/command/wininst-14.0.exe as malware (originally reported at http://bugs.python.org/issue27383). My assumption is that someone distributed malware using a bdist_exe package, and our stub executable got picked up in the signature. I rebuilt the executable on my own machine from early source code and it still triggered the scan, so there does not appear to have been any change to the behaviour of the executable. I've already submitted a false positive report, so I expect an update to correct it at some point in the future, but please do not be alarmed to see this warning when installing Python 3.5.2, or when scanning any earlier version of 3.5. Feel free to contact me off-list or steve.dower at microsoft.com if you have concerns. Cheers, Steve From Nikolaus at rath.org Mon Jun 27 12:09:25 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Mon, 27 Jun 2016 09:09:25 -0700 Subject: [Python-Dev] When to use EOFError? In-Reply-To: (Serhiy Storchaka's message of "Tue, 21 Jun 2016 23:48:09 +0300") References: Message-ID: <8737nyzk96.fsf@thinkpad.rath.org> On Jun 21 2016, Serhiy Storchaka wrote: > There is a design question. 
If you read file in some format or with > some protocol, and the data is ended unexpectedly, when to use general > EOFError exception and when to use format/protocol specific exception? > > For example, when loading truncated pickle data, an unpickler can raise > EOFError, UnpicklingError, ValueError or AttributeError. It is > possible to avoid ValueError or AttributeError, but what exception > should be raised instead, EOFError or UnpicklingError? I think EOFError conveys more information. UnpicklingError can mean a lot of things, EOFError tells you the precise problem: pickle expected more data, but there was nothing left. I think in doubt the more specific exception (in this case EOFError) should be raised. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F “Time flies like an arrow, fruit flies like a Banana.” From ethan at stoneleaf.us Mon Jun 27 12:40:16 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Jun 2016 09:40:16 -0700 Subject: [Python-Dev] When to use EOFError? In-Reply-To: References: Message-ID: <577156F0.7090403@stoneleaf.us> On 06/21/2016 01:48 PM, Serhiy Storchaka wrote: > There is a design question. If you read file in some format or with some > protocol, and the data is ended unexpectedly, when to use general > EOFError exception and when to use format/protocol specific exception? I believe that EOFError was created for the situation when a file unexpectedly ends. -- ~Ethan~ From greg.ewing at canterbury.ac.nz Mon Jun 27 17:54:20 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 28 Jun 2016 09:54:20 +1200 Subject: [Python-Dev] When to use EOFError? In-Reply-To: <8737nyzk96.fsf@thinkpad.rath.org> References: <8737nyzk96.fsf@thinkpad.rath.org> Message-ID: <5771A08C.50709@canterbury.ac.nz> Nikolaus Rath wrote: > I think EOFError conveys more information.
UnpicklingError can mean a > lot of things, EOFError tells you the precise problem: pickle expected > more data, but there was nothing left. I think EOFError should be used for EOF between pickles, but UnpicklingError should be used for EOF in the middle of a pickle. The former is not necessarily an error, but the latter definitely is. -- Greg From ethan at stoneleaf.us Mon Jun 27 18:06:31 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Jun 2016 15:06:31 -0700 Subject: [Python-Dev] When to use EOFError? In-Reply-To: <5771A08C.50709@canterbury.ac.nz> References: <8737nyzk96.fsf@thinkpad.rath.org> <5771A08C.50709@canterbury.ac.nz> Message-ID: <5771A367.3030208@stoneleaf.us> On 06/27/2016 02:54 PM, Greg Ewing wrote: > Nikolaus Rath wrote: >> I think EOFError conveys more information. UnpicklingError can mean a >> lot of things, EOFError tells you the precise problem: pickle expected >> more data, but there was nothing left. > > I think EOFError should be used for EOF between pickles, > but UnpicklingError should be used for EOF in the middle of > a pickle. The former is not necessarily an error, but the > latter definitely is. Why is hitting the end of a file between pickles an error? -- ~Ethan~ From guido at python.org Mon Jun 27 18:20:40 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Jun 2016 15:20:40 -0700 Subject: [Python-Dev] When to use EOFError? In-Reply-To: <5771A367.3030208@stoneleaf.us> References: <8737nyzk96.fsf@thinkpad.rath.org> <5771A08C.50709@canterbury.ac.nz> <5771A367.3030208@stoneleaf.us> Message-ID: The point is that it's not an error. In Andre Malo's use case, at least, EOFError is used as a control flow exception, not as an error. On Mon, Jun 27, 2016 at 3:06 PM, Ethan Furman wrote: > On 06/27/2016 02:54 PM, Greg Ewing wrote: >> >> Nikolaus Rath wrote: >>> >>> I think EOFError conveys more information. 
UnpicklingError can mean a >>> lot of things, EOFError tells you the precise problem: pickle expected >>> more data, but there was nothing left. >> >> >> I think EOFError should be used for EOF between pickles, >> but UnpicklingError should be used for EOF in the middle of >> a pickle. The former is not necessarily an error, but the >> latter definitely is. > > > Why is hitting the end of a file between pickles an error? > > -- > ~Ethan~ > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Jun 27 18:47:31 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Jun 2016 15:47:31 -0700 Subject: [Python-Dev] When to use EOFError? In-Reply-To: References: <8737nyzk96.fsf@thinkpad.rath.org> <5771A08C.50709@canterbury.ac.nz> <5771A367.3030208@stoneleaf.us> Message-ID: <5771AD03.2040005@stoneleaf.us> On 06/27/2016 03:20 PM, Guido van Rossum wrote: > The point is that it's not an error. In Andre Malo's use case, at > least, EOFError is used as a control flow exception, not as an error. Like StopIteration then: only an error if it escapes. -- ~Ethan~ From random832 at fastmail.com Mon Jun 27 20:31:35 2016 From: random832 at fastmail.com (Random832) Date: Mon, 27 Jun 2016 20:31:35 -0400 Subject: [Python-Dev] When to use EOFError? In-Reply-To: <577156F0.7090403@stoneleaf.us> References: <577156F0.7090403@stoneleaf.us> Message-ID: <1467073895.3675050.650277729.53864CFA@webmail.messagingengine.com> On Mon, Jun 27, 2016, at 12:40, Ethan Furman wrote: > On 06/21/2016 01:48 PM, Serhiy Storchaka wrote: > > > There is a design question. 
If you read file in some format or with some > > protocol, and the data is ended unexpectedly, when to use general > > EOFError exception and when to use format/protocol specific exception? > > I believe that EOFError was created for the situation when a file > unexpectedly ends. The problem is that's not a good abstraction for the class of errors we're discussing, because it means you've got to pick: the thing your parser parses is a file [and non-files are supported by wrapping them in a StringIO/BytesIO] or it is a str/bytes [and files are supported by reading their data into a string]. Or you could use a third option: a method that accepts a file raises EOFError, and a method that accepts a string raises some other error (ValueError?), and if either is implemented in terms of the other it's got to wrap the exception. (Also, that's nonsense. EOFError is also used when a file *expectedly* ends - EAFP i.e. "Exceptions As Flow-control is Pythonic" ;) From steve at pearwood.info Mon Jun 27 20:25:42 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jun 2016 10:25:42 +1000 Subject: [Python-Dev] When to use EOFError? In-Reply-To: <5771AD03.2040005@stoneleaf.us> References: <8737nyzk96.fsf@thinkpad.rath.org> <5771A08C.50709@canterbury.ac.nz> <5771A367.3030208@stoneleaf.us> <5771AD03.2040005@stoneleaf.us> Message-ID: <20160628002540.GE27919@ando.pearwood.info> On Mon, Jun 27, 2016 at 03:47:31PM -0700, Ethan Furman wrote: > On 06/27/2016 03:20 PM, Guido van Rossum wrote: > > >The point is that it's not an error. In Andre Malo's use case, at > >least, EOFError is used as a control flow exception, not as an error. > > Like StopIteration then: only an error if it escapes. Well, not quite -- if you're expecting four pickles in a file, and get EOFError after pickle #2, then it's an actual error. But that's up to the caller to decide. EOFError just means there's nothing more to read in a situation where returning an empty (byte) string isn't an option.
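The concatenated-pickle pattern the thread keeps returning to can be shown end to end. This is a minimal, self-contained illustration of EOFError used as flow control, not code from any of the quoted messages:

```python
import io
import pickle

def iter_pickles(fp):
    """Yield each object from a stream of concatenated pickles.

    pickle.load() raises EOFError when the stream is exhausted, so the
    exception acts as flow control rather than a genuine error.  An
    empty stream simply yields nothing.
    """
    while True:
        try:
            yield pickle.load(fp)
        except EOFError:
            return

# Write several pickles back to back, then read them all out again.
stream = io.BytesIO()
for obj in (1, "two", [3.0]):
    pickle.dump(obj, stream)
stream.seek(0)
print(list(iter_pickles(stream)))  # [1, 'two', [3.0]]
```

Note that, as pointed out above, this loop cannot tell a clean end of input from a pickle truncated mid-stream; a truncated final record surfaces as UnpicklingError or a similar exception rather than EOFError, which is the distinction the thread is debating.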
The meaning you give to that depends on your expectations. I think Greg had the right idea: raise a pickle error if you hit EOF in the middle of a pickle, because that absolutely means your data is corrupt; raise EOFError when you hit EOF at the very beginning of the file, or after a complete pickle. -- Steve From benjamin at python.org Tue Jun 28 00:36:55 2016 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 27 Jun 2016 21:36:55 -0700 Subject: [Python-Dev] [RELEASE] Python 2.7.12 Message-ID: <1467088615.1149970.650417777.5119C4CF@webmail.messagingengine.com> It is my privilege to present you with another release in the Python 2.7 series, Python 2.7.12. Since the release candidate, there were two changes: - The Windows binaries have been changed to use OpenSSL 1.0.2h. - The "about" dialog in IDLE was fixed. Downloads, as always, are on python.org: https://www.python.org/downloads/release/python-2712/ The complete 2.7.12 changelog is available at https://hg.python.org/cpython/raw-file/v2.7.12/Misc/NEWS Yet another Python 2.7.x release is anticipated near the end of the year. Numerologists may wish to upgrade to Python 3 before we hit the unlucky 2.7.13. Servus, Benjamin 2.7 release manager From ericsnowcurrently at gmail.com Tue Jun 28 13:30:37 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 28 Jun 2016 11:30:37 -0600 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: On Sun, Jun 26, 2016 at 5:55 PM, Guido van Rossum wrote: >> On Fri, Jun 24, 2016 at 4:37 PM, Nick Coghlan wrote: >> > This version looks fine to me. > > Same to me, mostly. I've updated the PEP per everyone's comments [1], except I still haven't dropped the read-only __definition_order__ constraint. I'll do that when I resolve the open questions, on which I'd like some feedback: * What about __slots__? 
In addition to including __slots__ in __definition_order__, the options I see are to either ignore the names in __slots__, put them into __definition_order__ right after __slots__, or stick them in at the end (since their descriptors are added afterward). I'm leaning toward the first one, leaving the slot names out of __definition_order__ since the names aren't actually part of the definition (__slots__ itself is). Doing so doesn't lose any information and more closely reflects the class definition body. * Allow setting __definition_order__ in type()? I don't see any reason to disallow "__definition_order__" in the namespace passed in to the 3 argument form of builtins.type(). Then dynamically created types can have a definition order (without needing to set cls.__definition_order__ afterward). * C-API for setting __definition_order__? I'd rather avoid any extra complexity in the PEP due to diving into C-API support for *creating* types with a __definition_order__. However, if it would be convenient enough and not a complex endeavor, I'm willing to accommodate that case in the PEP. At the same time, at the C-API level is it so important to accommodate __definition_order__ at class-creation time? Can't you directly modify cls.__dict__ in C? Perhaps it would be safer to have a simple C-API function to set __definition_order__ for you? * Drop the "read-only attribute" requirement? I really like that read-only implies "complete", which is a valuable message for __definition_order__ to convey. I think that there's a lot to be said for communicating about a value in that way. At the same time, most of the time Python doesn't keep you from fiddling with similar "complete" values (e.g. __name__, __slots__), so I see that point too. And since neither the interpreter nor the stdlib relies on __definition_order__, it isn't much of a footgun (nor would setting __definition_order__ be much of an attractive nuisance).
I suppose I'm having a hard time letting go of the attractiveness of "read-only == complete". However, given that you've been pretty clear what you think, I'm more at ease about it. :) Anyway, thoughts on the above would be helpful. I'll try to be responsive so we can wrap this up. -eric [1] https://github.com/python/peps/blob/master/pep-0520.txt From guido at python.org Tue Jun 28 13:43:18 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Jun 2016 10:43:18 -0700 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: On Tue, Jun 28, 2016 at 10:30 AM, Eric Snow wrote: > On Sun, Jun 26, 2016 at 5:55 PM, Guido van Rossum wrote: >>> On Fri, Jun 24, 2016 at 4:37 PM, Nick Coghlan wrote: >>> > This version looks fine to me. >> >> Same to me, mostly. > > I've updated the PEP per everyone's comments [1], except I still > haven't dropped the read-only __definition_order__ constraint. I'll > do that when I resolve the open questions, on which I'd like some > feedback: > > * What about __slots__? > > In addition to including __slots__ in __definition_order__, the > options I see are to either ignore the names in __slots__, put them > into __definition_order__ right after __slots__, or stick them in at > the end (since their descriptors are added afterward). I'm leaning > toward the first one, leaving the slot names out of > __definition_order__ since the > names aren't actually part of the definition (__slots__ itself is). > Doing so doesn't lose any information and more closely reflects the > class definition body. Sounds fine. I guess this means you don't have to do anything special, right? > * Allow setting __definition_order__ in type()? > > I don't see any reason to disallow "__definition_order__" in the > namespace passed in to the 3 argument form of builtins.type(). Then > dynamically created types can have a definition order (without needing > to set cls.__definition_order__ afterward). 
Right. > * C-API for setting __definition_order__? > > I'd rather avoid any extra complexity in the PEP due to diving into > C-API support for *creating* types with a __definition_order__. > However, if it would be convenient enough and not a complex endeavor, > I'm willing to accommodate that case in the PEP. At the same time, at > the C-API level is it so important to accommodate > __definition_order__ at class-creation time? Can't you directly > modify cls.__dict__ in C? Perhaps it would be safer to have a simple > C-API function to set __definition_order__ for you? What's the use case even? I think if __definition_order__ is writable then C code can just use PyObject_SetAttrString(, "__definition_order__", ). > * Drop the "read-only attribute" requirement? > > I really like that read-only implies "complete", which is a valuable > message for __definition_order__ to convey. I think that there's a > lot to be said for communicating about a value in that way. But it's still unique behavior, and it's not needed to protect CPython internals. > At the same time, most of the time Python doesn't keep you from > fiddling with similar "complete" values (e.g. __name__, __slots__), so > I see that point too. And since the interpreter (nor stdlib) doesn't > rely on __definition_order__, it isn't much of a footgun (nor would > setting __definition_order__ be much of an attractive nuisance). > > I suppose I'm having a hard time letting go of the attractiveness of > "read-only == complete". However, given that you've been pretty clear > what you think, I'm more at ease about it. :) Yeah, it's time to drop it. ;-) > Anyway, thoughts on the above would be helpful. I'll try to be > responsive so we can wrap this up. 
> > -eric > > > [1] https://github.com/python/peps/blob/master/pep-0520.txt -- --Guido van Rossum (python.org/~guido) From ericsnowcurrently at gmail.com Tue Jun 28 16:43:45 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 28 Jun 2016 14:43:45 -0600 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: On Tue, Jun 28, 2016 at 11:43 AM, Guido van Rossum wrote: > On Tue, Jun 28, 2016 at 10:30 AM, Eric Snow wrote: >> I suppose I'm having a hard time letting go of the attractiveness of >> "read-only == complete". However, given that you've been pretty clear >> what you think, I'm more at ease about it. :) > > Yeah, it's time to drop it. ;-) Thanks for the feedback. I've updated the PEP to resolve the open questions. Most notably, I've dropped the read-only constraint on the __definition_order__ attribute. Please let me know if you have any other outstanding concerns. Otherwise I think this PEP is ready for pronouncement. :) -eric From guido at python.org Tue Jun 28 16:55:56 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Jun 2016 13:55:56 -0700 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: Awesome. That addresses my last concerns. PEP 520 is now accepted. Congratulations! On Tue, Jun 28, 2016 at 1:43 PM, Eric Snow wrote: > On Tue, Jun 28, 2016 at 11:43 AM, Guido van Rossum wrote: >> On Tue, Jun 28, 2016 at 10:30 AM, Eric Snow wrote: >>> I suppose I'm having a hard time letting go of the attractiveness of >>> "read-only == complete". However, given that you've been pretty clear >>> what you think, I'm more at ease about it. :) >> >> Yeah, it's time to drop it. ;-) > > Thanks for the feedback. I've updated the PEP to resolve the open > questions. Most notably, I've dropped the read-only constraint on the > __definition_order__ attribute. 
Please let me know if you have any > other outstanding concerns. Otherwise I think this PEP is ready for > pronouncement. :) > > -eric -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Tue Jun 28 17:10:31 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 28 Jun 2016 14:10:31 -0700 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: <5772E7C7.6070203@stoneleaf.us> On 06/28/2016 01:55 PM, Guido van Rossum wrote: > Awesome. That addresses my last concerns. PEP 520 is now accepted. > Congratulations! And more Congratulations!! -- ~Ethan~ From yselivanov.ml at gmail.com Tue Jun 28 17:05:53 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 28 Jun 2016 17:05:53 -0400 Subject: [Python-Dev] Request for CPython 3.5.3 release Message-ID: Long story short, I've discovered that asyncio is broken in 3.5.2. Specifically, there is a callback race in `loop.sock_connect` which can make subsequent `loop.sock_sendall` calls hang forever. This thing is very tricky and hard to detect and debug; I had to spend a few hours investigating what's going on with a failing unittest in uvloop (asyncio-compatible event loop). I can only imagine how hard it would be to understand what's going on in a larger codebase. For those who are interested, here's a PR for the asyncio repo: https://github.com/python/asyncio/pull/366 It explains the bug in detail and has a proposed patch to fix the problem. Larry and the release team: would it be possible to make an "emergency" 3.5.3 release? Going forward, we need to increase the number of functional tests for asyncio, as most of the current tests use mocks. I'm going to port all functional tests from uvloop to asyncio as a start.
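For readers unfamiliar with the API pair involved, a normal ``loop.sock_connect()`` / ``loop.sock_sendall()`` round trip looks like the sketch below. It is written against the modern ``asyncio.run()`` entry point rather than 3.5-era boilerplate; per the report above, the ``sock_sendall()`` call in exactly this sequence is what could hang in 3.5.2.

```python
import asyncio
import socket

async def echo_handler(reader, writer):
    # Echo one chunk back to the client, then close the connection.
    writer.write(await reader.read(100))
    await writer.drain()
    writer.close()

async def roundtrip(payload):
    loop = asyncio.get_running_loop()
    server = await asyncio.start_server(echo_handler, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]

    sock = socket.socket()
    sock.setblocking(False)  # the loop.sock_* APIs require a non-blocking socket
    await loop.sock_connect(sock, ('127.0.0.1', port))
    await loop.sock_sendall(sock, payload)  # the call affected by the 3.5.2 race
    data = await loop.sock_recv(sock, 100)
    sock.close()
    server.close()
    await server.wait_closed()
    return data

print(asyncio.run(roundtrip(b'ping')))  # b'ping'
```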
Yury From ericsnowcurrently at gmail.com Tue Jun 28 17:22:36 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 28 Jun 2016 15:22:36 -0600 Subject: [Python-Dev] PEP 520: Preserving Class Attribute Definition Order (round 5) In-Reply-To: References: Message-ID: On Jun 28, 2016 2:56 PM, "Guido van Rossum" wrote: > > Awesome. That addresses my last concerns. PEP 520 is now accepted. > Congratulations! Yay! Thank you and to all those that gave such good feedback. -eric (phone) -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Tue Jun 28 17:42:33 2016 From: larry at hastings.org (Larry Hastings) Date: Tue, 28 Jun 2016 14:42:33 -0700 Subject: [Python-Dev] Request for CPython 3.5.3 release In-Reply-To: References: Message-ID: <5772EF49.2020203@hastings.org> On 06/28/2016 02:05 PM, Yury Selivanov wrote: > Long story short, I've discovered that asyncio is broken in 3.5.2. > Specifically, there is a callbacks race in `loop.sock_connect` which > can make subsequent `loop.sock_sendall` calls to hang forever. This > thing is very tricky and hard to detect and debug; I had to spend a > few hours investigating what's going on with a failing unittest in > uvloop (asyncio-compatible event loop). I can only imagine how hard > it would be to understand what's going on in a larger codebase. > > For those who is interested, here's a PR for asyncio repo: > https://github.com/python/asyncio/pull/366 It explains the bug in > detail and there has a proposed patch to fix the problem. > > Larry and the release team: would it be possible to make an > "emergency" 3.5.3 release? I've looped in the rest of the 3.5 release team. By the way, I don't know why you Cc'd Nick and Brett. While they're fine fellows, they aren't on the release team, and they aren't involved in these sorts of decisions. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From larry at hastings.org Tue Jun 28 17:51:54 2016 From: larry at hastings.org (Larry Hastings) Date: Tue, 28 Jun 2016 14:51:54 -0700 Subject: [Python-Dev] Request for CPython 3.5.3 release In-Reply-To: References: Message-ID: <5772F17A.1080902@hastings.org> On 06/28/2016 02:05 PM, Yury Selivanov wrote: > Larry and the release team: would it be possible to make an > "emergency" 3.5.3 release? I'd like to hear from the other asyncio reviewers: is this bug bad enough to merit such an "emergency" release? Thanks, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Tue Jun 28 19:23:05 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Wed, 29 Jun 2016 02:23:05 +0300 Subject: [Python-Dev] Request for CPython 3.5.3 release In-Reply-To: <5772EF49.2020203@hastings.org> References: <5772EF49.2020203@hastings.org> Message-ID: > On Jun 29, 2016, at 12:42 AM, Larry Hastings wrote: > > By the way, I don't know why you Cc'd Nick and Brett. While they're fine fellows, they aren't on the release team, and they aren't involved in these sorts of decisions. We're all involved in those sort of decisions. Raymond From larry at hastings.org Tue Jun 28 19:39:13 2016 From: larry at hastings.org (Larry Hastings) Date: Tue, 28 Jun 2016 16:39:13 -0700 Subject: [Python-Dev] Request for CPython 3.5.3 release In-Reply-To: References: <5772EF49.2020203@hastings.org> Message-ID: <57730AA1.4000305@hastings.org> On 06/28/2016 04:23 PM, Raymond Hettinger wrote: >> On Jun 29, 2016, at 12:42 AM, Larry Hastings wrote: >> >> By the way, I don't know why you Cc'd Nick and Brett. While they're fine fellows, they aren't on the release team, and they aren't involved in these sorts of decisions. > We're all involved in those sort of decisions. > > > Raymond Perhaps, but that would make the Cc: list intractably long. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Tue Jun 28 21:26:18 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 28 Jun 2016 21:26:18 -0400 Subject: [Python-Dev] PEP 495 implementation In-Reply-To: References: Message-ID: Dear All, I have not received any responses since my first post in this thread in September last year. This time I am adding python-dev to BCC in hopes of reaching a larger audience. With the date of the first beta (2016-09-07) fast approaching, I would like to commit the PEP 495 implementation hopefully before alpha 3 (2016-07-11). I have a patch published as a pull request [1] against the python/cpython repository on github. If anyone still prefers the bug tracker workflow, I can publish it as a patch for issue #24773 [2] as well. Please also see my post from March below. Since that post, I have implemented a -utzdata option to the regression test, so now the long exhaustive test only runs when you do python -mtest -utzdata. Note that this test currently fails on a couple exotic timezones such as Asia/Riyadh87, but it is likely to be an issue with the timezone database rather than python code. I still don't have access to a Windows development box and I know that the current implementation will not work there because I use localtime_r. I need help/advice on this front. I have not started updating the documentation, so the PEP text [3] should serve as the documentation for now. I will try to get the documentation patch ready before the beta, but I don't want to delay checking in the code. On Mon, Mar 21, 2016 at 10:34 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > Dear All, > > I am getting close to completing PEP 495 implementation, but I need someone to help me with a port to Windows. One of the major obstacles is that my implementation relies heavily on the POSIX localtime_r function which apparently is not available on Windows.
> > I would also appreciate help from a unittest expert to improve test_datetime. One of the pressing tasks is to make ZoneInfoCompleteTest optional because it takes several minutes to complete. > > I am maintaining the patch as a pull request [1] against the python/cpython repository on github. Code reviews are most welcome. > > [1]: https://github.com/python/cpython/pull/20 [2]: http://bugs.python.org/issue24773 [3]: https://www.python.org/dev/peps/pep-0495 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Jun 29 13:40:40 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 29 Jun 2016 10:40:40 -0700 Subject: [Python-Dev] AutoNumber Enum Message-ID: <57740818.30704@stoneleaf.us> There is a several-month-old request to add aenum's [1] AutoNumberEnum to the stdlib [2]. The requester and two of the three developers of Enum are in favor (the third hasn't chimed in yet). This new addition would enable the following: from Enum import AutoNumberEnum class Color(AutoNumberEnum): # auto-number magic is on Red Green Blue Cyan # magic turns off when non-enum is defined def is_primary(self): # typos in methods, etc, will raise return self in (self.Red, self.Grene, self.Blue) # typos after the initial definition stanza will raise BlueGreen = Blue + Grene There is, of course, the risk of typos during the initial member definition stanza, but since this magic only happens when the user explicitly asks for it (AutoNumberEnum), I think it is acceptable. The `start` parameter is still available, and assigning a number is supported (subsequent numbers will (re)start from the assigned number). Thoughts? Opinions? Flames? 
-- ~Ethan~ [1] https://pypi.python.org/pypi/aenum [2] http://bugs.python.org/issue26988 From brett at python.org Wed Jun 29 14:15:11 2016 From: brett at python.org (Brett Cannon) Date: Wed, 29 Jun 2016 18:15:11 +0000 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: <57740818.30704@stoneleaf.us> References: <57740818.30704@stoneleaf.us> Message-ID: On Wed, 29 Jun 2016 at 10:41 Ethan Furman wrote: > There is a several-month-old request to add aenum's [1] AutoNumberEnum > to the stdlib [2]. > > The requester and two of the three developers of Enum are in favor (the > third hasn't chimed in yet). > > This new addition would enable the following: > > from Enum import AutoNumberEnum > > class Color(AutoNumberEnum): > # auto-number magic is on > Red > Green > Blue > Cyan > # magic turns off when non-enum is defined > > def is_primary(self): > # typos in methods, etc, will raise > return self in (self.Red, self.Grene, self.Blue) > > # typos after the initial definition stanza will raise > BlueGreen = Blue + Grene > > There is, of course, the risk of typos during the initial member > definition stanza, but since this magic only happens when the user > explicitly asks for it (AutoNumberEnum), I think it is acceptable. > > The `start` parameter is still available, and assigning a number is > supported (subsequent numbers will (re)start from the assigned number). > > Thoughts? Opinions? Flames? > Is it going to subclass Enum or IntEnum? Personally I would be quite happy to never have to specify a value for enums ever again, but only if they subclass Enum (since IntEnum is for compatibility with C stuff where a specific value is needed I don't think users need to mess that up by having the automatic numbering not work how they would expect). -------------- next part -------------- An HTML attachment was scrubbed... 
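As a point of comparison for the proposal above: the stdlib's existing `enum` functional API already auto-numbers members starting from 1 with an optional `start`; what it cannot do is auto-number inside a class statement. A quick illustration:

```python
from enum import Enum

# The stdlib's current spelling of an auto-numbered enum: values are
# assigned 1, 2, 3, ... in declaration order.
Color = Enum('Color', 'Red Green Blue Cyan')
assert Color.Red.value == 1 and Color.Cyan.value == 4

# `start` (added in 3.5) shifts the first value, mirroring the parameter
# the AutoNumberEnum proposal keeps.
Shifted = Enum('Shifted', 'Red Green Blue', start=10)
assert Shifted.Blue.value == 12
```

The proposal is essentially about bringing this convenience to the class syntax, where methods like `is_primary` can also be defined.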
URL: From steve.dower at python.org Wed Jun 29 14:22:15 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 29 Jun 2016 11:22:15 -0700 Subject: [Python-Dev] Issue 27417: Call CoInitializeEx on startup Message-ID: I know this is of fairly limited interest, so this is just advertising http://bugs.python.org/issue27417 where I propose enabling COM by default on startup. If you are someone who knows what CoInitializeEx is or why you may want to call it, I'm interested in your feedback/concerns. Come and post on http://bugs.python.org/issue27417. Cheers, Steve From guido at python.org Wed Jun 29 15:11:39 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Jun 2016 12:11:39 -0700 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: References: <57740818.30704@stoneleaf.us> Message-ID: And how would you implement that without support from the compiler? Does it use a hook that catches the NameError? On Wed, Jun 29, 2016 at 11:15 AM, Brett Cannon wrote: > > > On Wed, 29 Jun 2016 at 10:41 Ethan Furman wrote: >> >> There is a several-month-old request to add aenum's [1] AutoNumberEnum >> to the stdlib [2]. >> >> The requester and two of the three developers of Enum are in favor (the >> third hasn't chimed in yet). >> >> This new addition would enable the following: >> >> from Enum import AutoNumberEnum >> >> class Color(AutoNumberEnum): >> # auto-number magic is on >> Red >> Green >> Blue >> Cyan >> # magic turns off when non-enum is defined >> >> def is_primary(self): >> # typos in methods, etc, will raise >> return self in (self.Red, self.Grene, self.Blue) >> >> # typos after the initial definition stanza will raise >> BlueGreen = Blue + Grene >> >> There is, of course, the risk of typos during the initial member >> definition stanza, but since this magic only happens when the user >> explicitly asks for it (AutoNumberEnum), I think it is acceptable. 
>> >> The `start` parameter is still available, and assigning a number is >> supported (subsequent numbers will (re)start from the assigned number). >> >> Thoughts? Opinions? Flames? > > > Is it going to subclass Enum or IntEnum? Personally I would be quite happy > to never have to specify a value for enums ever again, but only if they > subclass Enum (since IntEnum is for compatibility with C stuff where a > specific value is needed I don't think users need to mess that up by having > the automatic numbering not work how they would expect). > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Wed Jun 29 15:13:40 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 29 Jun 2016 12:13:40 -0700 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: References: <57740818.30704@stoneleaf.us> Message-ID: <57741DE4.8010505@stoneleaf.us> On 06/29/2016 11:15 AM, Brett Cannon wrote: > On Wed, 29 Jun 2016 at 10:41 Ethan Furman wrote: >> There is a several-month-old request to add aenum's [1] AutoNumberEnum >> to the stdlib [2]. >> >> The requester and two of the three developers of Enum are in favor (the >> third hasn't chimed in yet). >> >> This new addition would enable the following: >> >> from Enum import AutoNumberEnum >> >> class Color(AutoNumberEnum): >> # auto-number magic is on >> Red >> Green >> Blue >> Cyan >> # magic turns off when non-enum is defined > > Is it going to subclass Enum or IntEnum? Enum. 
> Personally I would be quite > happy to never have to specify a value for enums ever again, but only if > they subclass Enum (since IntEnum is for compatibility with C stuff > where a specific value is needed I don't think users need to mess that > up by having the automatic numbering not work how they would expect). If a user really wants that they can, of course, specify both AutoNumberEnum and IntEnum in the class header. -- ~Ethan~ From ethan at stoneleaf.us Wed Jun 29 15:23:32 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 29 Jun 2016 12:23:32 -0700 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: References: <57740818.30704@stoneleaf.us> Message-ID: <57742034.2000704@stoneleaf.us> On 06/29/2016 12:11 PM, Guido van Rossum wrote: > And how would you implement that without support from the compiler? > Does it use a hook that catches the NameError? It's built into the _EnumDict class dictionary used during class creation. Current (edited) code from the aenum package that implements this: class _EnumDict(dict): """Track enum member order and ensure member names are not reused. EnumMeta will use the names found in self._member_names as the enumeration member names. """ def __init__(self, locked=True, start=1, multivalue=False): super(_EnumDict, self).__init__() # list of enum members self._member_names = [] # starting value for AutoNumber self._value = start - 1 # when the magic turns off self._locked = locked ... def __getitem__(self, key): if ( self._locked or key in self or _is_sunder(key) or _is_dunder(key) ): return super(_EnumDict, self).__getitem__(key) try: # try to generate the next value value = self._value + 1 self.__setitem__(key, value) return value except: # couldn't work the magic, report error raise KeyError('%s not found' % key) def __setitem__(self, key, value): """Changes anything not sundured, dundered, nor a descriptor. Single underscore (sunder) names are reserved. 
""" if _is_sunder(key): raise ValueError('_names_ are reserved for future Enum use') elif _is_dunder(key): if key == '__order__': key = '_order_' if _is_descriptor(value): self._locked = True elif key in self._member_names: # descriptor overwriting an enum? raise TypeError('Attempted to reuse name: %r' % key) elif not _is_descriptor(value): if key in self: # enum overwriting a descriptor? raise TypeError('%s already defined as: %r' % ... self._member_names.append(key) if not self._locked: if isinstance(value, int): self._value = value else: count = self._value + 1 self._value = count value = count, value else: # not a new member, turn off the autoassign magic self._locked = True super(_EnumDict, self).__setitem__(key, value) Disclaimer: some errors may have crept in as I deleted unrelated content. For the full code check out the _EnumDict class in the aenum package. -- ~Ethan~ From levkivskyi at gmail.com Wed Jun 29 16:01:11 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 29 Jun 2016 22:01:11 +0200 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: <57742034.2000704@stoneleaf.us> References: <57740818.30704@stoneleaf.us> <57742034.2000704@stoneleaf.us> Message-ID: It looks like the values in AutoNumberEnum are consecutive integers 1,2,3,... Have you considered an option (keyword argument) to change this to powers of two 1,2,4,8,...? -- Ivan On 29 June 2016 at 21:23, Ethan Furman wrote: > On 06/29/2016 12:11 PM, Guido van Rossum wrote: > > And how would you implement that without support from the compiler? >> Does it use a hook that catches the NameError? >> > > It's built into the _EnumDict class dictionary used during class creation. > > Current (edited) code from the aenum package that implements this: > > class _EnumDict(dict): > """Track enum member order and ensure member names are not reused. > > EnumMeta will use the names found in self._member_names as the > enumeration member names. 
> """ > def __init__(self, locked=True, start=1, multivalue=False): > super(_EnumDict, self).__init__() > # list of enum members > self._member_names = [] > # starting value for AutoNumber > self._value = start - 1 > # when the magic turns off > self._locked = locked > ... > > def __getitem__(self, key): > if ( > self._locked > or key in self > or _is_sunder(key) > or _is_dunder(key) > ): > return super(_EnumDict, self).__getitem__(key) > try: > # try to generate the next value > value = self._value + 1 > self.__setitem__(key, value) > return value > except: > # couldn't work the magic, report error > raise KeyError('%s not found' % key) > > def __setitem__(self, key, value): > """Changes anything not sundured, dundered, nor a descriptor. > Single underscore (sunder) names are reserved. > """ > if _is_sunder(key): > raise ValueError('_names_ are reserved for future Enum use') > elif _is_dunder(key): > if key == '__order__': > key = '_order_' > if _is_descriptor(value): > self._locked = True > elif key in self._member_names: > # descriptor overwriting an enum? > raise TypeError('Attempted to reuse name: %r' % key) > elif not _is_descriptor(value): > if key in self: > # enum overwriting a descriptor? > raise TypeError('%s already defined as: %r' % ... > self._member_names.append(key) > if not self._locked: > if isinstance(value, int): > self._value = value > else: > count = self._value + 1 > self._value = count > value = count, value > else: > # not a new member, turn off the autoassign magic > self._locked = True > super(_EnumDict, self).__setitem__(key, value) > > Disclaimer: some errors may have crept in as I deleted unrelated > content. For the full code check out the _EnumDict class in the aenum > package. 
> > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/levkivskyi%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Jun 29 16:19:58 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 29 Jun 2016 13:19:58 -0700 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: References: <57740818.30704@stoneleaf.us> <57742034.2000704@stoneleaf.us> Message-ID: <57742D6E.6080609@stoneleaf.us> On 06/29/2016 01:01 PM, Ivan Levkivskyi wrote: > It looks like the values in AutoNumberEnum are consecutive integers > 1,2,3,... > Have you considered an option (keyword argument) to change this to > powers of two 1,2,4,8,...? There is another issue relating to bitwise enums that deals with that. It is not part of this proposal. -- ~Ethan~ From larry at hastings.org Wed Jun 29 16:28:48 2016 From: larry at hastings.org (Larry Hastings) Date: Wed, 29 Jun 2016 13:28:48 -0700 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: References: <57740818.30704@stoneleaf.us> <57742034.2000704@stoneleaf.us> Message-ID: <57742F80.4040705@hastings.org> On 06/29/2016 01:01 PM, Ivan Levkivskyi wrote: > It looks like the values in AutoNumberEnum are consecutive integers > 1,2,3,... > Have you considered an option (keyword argument) to change this to > powers of two 1,2,4,8,...? Why would you want that? I remind you that this descends from Enum, so its members won't be directly interchangeable with ints. Presumably you want a bitfield enum, and those should descend from IntEnum. TBH I'd prefer the AutoNumberEnum *not* have this feature; it's already a little too magical for my tastes. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
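The machinery Guido asks about needs no compiler support: the mapping returned by a metaclass's `__prepare__` hook sees every name lookup in the class body, so a missing name can be assigned the next value on the spot. Here is a stripped-down sketch of the idea (an illustration only, not aenum's actual code; it has no guards for methods, `start`, or name reuse):

```python
class _AutoDict(dict):
    """Class namespace that invents the next integer for unknown names."""
    def __init__(self):
        super().__init__()
        self._count = 0  # attribute on the dict object, never a member

    def __missing__(self, key):
        # Let dunder lookups (__name__, etc.) fail normally so they fall
        # through to globals/builtins instead of becoming members.
        if key.startswith('__') and key.endswith('__'):
            raise KeyError(key)
        self._count += 1
        self[key] = self._count
        return self._count


class AutoMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # The class body executes with this mapping as its namespace.
        return _AutoDict()

    def __new__(mcls, name, bases, ns):
        return super().__new__(mcls, name, bases, dict(ns))


class Color(metaclass=AutoMeta):
    # Each bare name triggers _AutoDict.__missing__, which assigns 1, 2, 3.
    Red
    Green
    Blue

assert (Color.Red, Color.Green, Color.Blue) == (1, 2, 3)
```

Because the namespace is not an exact dict, the interpreter routes class-body name lookups through the mapping's item access, where `__missing__` manufactures the value; no NameError is ever raised for member names.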
URL: From levkivskyi at gmail.com Wed Jun 29 16:49:19 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 29 Jun 2016 22:49:19 +0200 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: <57742F80.4040705@hastings.org> References: <57740818.30704@stoneleaf.us> <57742034.2000704@stoneleaf.us> <57742F80.4040705@hastings.org> Message-ID: > Presumably you want a bitfield enum, and those should descend from IntEnum. Yes, and probably having an AutoNumberIntEnum would indeed be too much magic in one place. Anyway, it is easy to implement a bitfield IntEnum without magic. To be clear, I like Ethan's original proposal. -- Ivan > TBH I'd prefer the AutoNumberEnum *not* have this feature; it's already a > little too magical for my tastes. > > > */arry* > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/levkivskyi%40gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertomartinezp at gmail.com Wed Jun 29 18:40:50 2016 From: robertomartinezp at gmail.com (Roberto Martínez) Date: Wed, 29 Jun 2016 22:40:50 +0000 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: <57742034.2000704@stoneleaf.us> References: <57740818.30704@stoneleaf.us> <57742034.2000704@stoneleaf.us> Message-ID: Why is the 'start' parameter default 1? 0 (zero) is more consistent with other parts of the language: indexes, enumerate, range... On Wed, Jun 29, 2016 at 21:26, Ethan Furman wrote: > On 06/29/2016 12:11 PM, Guido van Rossum wrote: > > > And how would you implement that without support from the compiler? > > Does it use a hook that catches the NameError? > > It's built into the _EnumDict class dictionary used during class creation. 
> > Current (edited) code from the aenum package that implements this: > > class _EnumDict(dict): > """Track enum member order and ensure member names are not reused. > > EnumMeta will use the names found in self._member_names as the > enumeration member names. > """ > def __init__(self, locked=True, start=1, multivalue=False): > super(_EnumDict, self).__init__() > # list of enum members > self._member_names = [] > # starting value for AutoNumber > self._value = start - 1 > # when the magic turns off > self._locked = locked > ... > > def __getitem__(self, key): > if ( > self._locked > or key in self > or _is_sunder(key) > or _is_dunder(key) > ): > return super(_EnumDict, self).__getitem__(key) > try: > # try to generate the next value > value = self._value + 1 > self.__setitem__(key, value) > return value > except: > # couldn't work the magic, report error > raise KeyError('%s not found' % key) > > def __setitem__(self, key, value): > """Changes anything not sundured, dundered, nor a descriptor. > Single underscore (sunder) names are reserved. > """ > if _is_sunder(key): > raise ValueError('_names_ are reserved for future Enum use') > elif _is_dunder(key): > if key == '__order__': > key = '_order_' > if _is_descriptor(value): > self._locked = True > elif key in self._member_names: > # descriptor overwriting an enum? > raise TypeError('Attempted to reuse name: %r' % key) > elif not _is_descriptor(value): > if key in self: > # enum overwriting a descriptor? > raise TypeError('%s already defined as: %r' % ... > self._member_names.append(key) > if not self._locked: > if isinstance(value, int): > self._value = value > else: > count = self._value + 1 > self._value = count > value = count, value > else: > # not a new member, turn off the autoassign magic > self._locked = True > super(_EnumDict, self).__setitem__(key, value) > > Disclaimer: some errors may have crept in as I deleted unrelated > content. 
For the full code check out the _EnumDict class in the aenum > package. > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/robertomartinezp%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jun 30 00:29:30 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 29 Jun 2016 21:29:30 -0700 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: References: <57740818.30704@stoneleaf.us> <57742034.2000704@stoneleaf.us> Message-ID: <5774A02A.6030205@stoneleaf.us> On 06/29/2016 03:40 PM, Roberto Martínez wrote: > Why is the 'start' parameter default 1? 0 (zero) is more consistent with > other parts of the language: indexes, enumerate, range... An excerpt from [1]: > The reason for defaulting to 1 as the starting number and not 0 is that 0 is False in a boolean sense, but enum members all evaluate to True. -- ~Ethan~ [1] https://docs.python.org/3/library/enum.html#functional-api From wes.turner at gmail.com Thu Jun 30 02:33:22 2016 From: wes.turner at gmail.com (Wes Turner) Date: Thu, 30 Jun 2016 01:33:22 -0500 Subject: [Python-Dev] AutoNumber Enum In-Reply-To: <5774A02A.6030205@stoneleaf.us> References: <57740818.30704@stoneleaf.us> <57742034.2000704@stoneleaf.us> <5774A02A.6030205@stoneleaf.us> Message-ID: It may be worth mentioning that pandas Categoricals are mutable and zero-based: https://pandas-docs.github.io/pandas-docs-travis/categorical.html Serialization to SQL and CSV is (also?) lossy, though: - https://pandas-docs.github.io/pandas-docs-travis/categorical.html#getting-data-in-out - https://pandas-docs.github.io/pandas-docs-travis/io.html#io-stata-categorical On 06/29/2016 03:40 PM, Roberto Martínez wrote: Why is the 'start' parameter default 1? 
0 (zero) is more consistent with > other parts of the language: indexes, enumerate, range... > An excerpt from [1]: The reason for defaulting to 1 as the starting number and not 0 is that 0 > is False in a boolean sense, but enum members all evaluate to True. > -- ~Ethan~ [1] https://docs.python.org/3/library/enum.html#functional-api _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Jun 30 03:41:53 2016 From: larry at hastings.org (Larry Hastings) Date: Thu, 30 Jun 2016 00:41:53 -0700 Subject: [Python-Dev] Request for CPython 3.5.3 release In-Reply-To: <5772F17A.1080902@hastings.org> References: <5772F17A.1080902@hastings.org> Message-ID: <5774CD41.9030601@hastings.org> On 06/28/2016 02:51 PM, Larry Hastings wrote: > > On 06/28/2016 02:05 PM, Yury Selivanov wrote: >> Larry and the release team: would it be possible to make an >> "emergency" 3.5.3 release? > > I'd like to hear from the other asyncio reviewers: is this bug bad > enough to merit such an "emergency" release? > > > Thanks, > > > //arry/ There has been a distinct lack of "dear god yes Larry" emails so far. This absence suggests that, no, it is not a bad enough bug to merit such a release. If we stay to our usual schedule, I expect 3.5.3 to ship December-ish. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL:
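The documentation rationale Ethan quotes in the `start` discussion above is straightforward to verify: a member whose value is 0 is still truthy, and only its `.value` is falsy, so members themselves never behave like False:

```python
from enum import Enum

class Switch(Enum):
    Off = 0  # a falsy *value*...
    On = 1

# ...but the *member* is an ordinary object with default truthiness,
# so it evaluates True regardless of its value.
assert bool(Switch.Off) is True
assert bool(Switch.Off.value) is False
assert bool(Switch.On) is True
```

Defaulting `start` to 1 simply keeps value-based truth tests (`if member.value:`) from surprising users with a falsy first member.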