From greg.ewing at canterbury.ac.nz Fri Apr 1 00:57:37 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 01 Apr 2016 17:57:37 +1300
Subject: [Python-Dev] The next major Python version will be Python 8
In-Reply-To:
References:
Message-ID: <56FDFFC1.5020207@canterbury.ac.nz>

Serhiy Storchaka wrote:
> Does it combine the base of Python 2 with the power of Python 3?

No, that would be Python Backwards-Six.

--
Greg

From ncoghlan at gmail.com Fri Apr 1 08:43:21 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 1 Apr 2016 22:43:21 +1000
Subject: [Python-Dev] Adding a Pip GUI to IDLE and idlelib (GSOC project)
In-Reply-To:
References:
Message-ID:

On 27 March 2016 at 16:13, Terry Reedy wrote:
> Thoughts?

+1 from me - being able to teach package installation without teaching
the command line first has been an oft-requested capability for a long
time.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From rymg19 at gmail.com Fri Apr 1 10:19:04 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Fri, 1 Apr 2016 09:19:04 -0500
Subject: [Python-Dev] The future of Python: fixing broken error handling in Python 8
Message-ID:

Python's exception handling system is currently badly brokeTypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'n. Therefore, with the recent news of the joyous release of Python 8 (https://mail.python.org/pipermail/python-dev/2016-March/143603.html), I have decided to propose a revolutionary idea: safe mock objects.

A "safe" mock object (qualified name `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`; Java-style naming was adopted for readability purposes; comments are now no longer necessary) is a magic object that supports everything and returns itself.

Since examples speak more words than are in the Python source code, here are some (examples, not words in the Python source code):

a = 1
b = None
c = a + b  # Returns a _frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8
print(c)  # Prints the empty string.
d = c+1  # All operations on `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`'s return a new one.
e = d.xyz(1, 2, 3)  # `e` is now a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.

def f():
    assert 0  # Causes the function to return a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.
    raise 123  # Does the same thing.

print(L)  # L is undefined, so it becomes a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.

Safe mock objects are obviously the Next Error Handling Revolution™. Unicode errors now simply disappear and return more `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`s.

As for `try` and `catch` (protest the naming of `except`!!) statements, they will be completely ignored. The `try`, `except`, and `finally` bodies will all be executed in sequence, except that printing and returning values with an `except` statement does nothing:

try:
    xyz = None.a  # `xyz` becomes a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.
except:
    print(123)  # Does nothing.
    return None  # Does nothing.
finally:
    return xyz  # Returns a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.

Aggressive error handling (as shown in PanicSort [https://xkcd.com/1185/]) that does destructive actions (such as `rm -rf /`) will always execute the destructive code, encouraging more honest development. In addition, due to errors simply being ignored, nothing can ever quite go wrong.

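For anyone curious about the mechanics, here is a minimal sketch of an object that "supports everything and returns itself" in today's Python. It is not the code from the `_frozensafemockobjectimplementation` module; the names `SafeMock` and `safe_add` are hypothetical, made up for illustration, and only `+`, attribute access, calls, and `str()` are covered. It also cannot change what `None + 1` does in the interpreter, so the addition is wrapped in a helper.

```python
class SafeMock:
    """Hypothetical stand-in: absorbs operations and returns itself."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        return self

    def __call__(self, *args, **kwargs):
        return self

    # Operators are looked up on the type, not through __getattr__,
    # so each one has to be defined explicitly.
    def __add__(self, other):
        return self

    __radd__ = __add__

    def __str__(self):
        return ""


def safe_add(a, b):
    """Assumed helper: swallow the TypeError and hand back a SafeMock."""
    try:
        return a + b
    except TypeError:
        return SafeMock()


c = safe_add(1, None)   # a SafeMock instead of a TypeError
d = c + 1               # still the same SafeMock
e = d.xyz(1, 2, 3)      # attribute access and call both return it
```

Operators need their own methods here because Python looks up special methods on the class and bypasses `__getattr__`, which is why a real "supports everything" object would need many more of them.
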
All discussions about a safe navigation operator can now be immediately halted, since any undefined attributes will simply return a `_frozensafemockobjectimplementation.SafeMockObjectThatIsIncludedWithPython8`.

Although I have not yet destroy--I mean, improved CPython to allow for this amazing idea, I have created a primitive implementation of the `_frozensafemockobjectimplementation` module:

https://github.com/kirbyfan64/_frozensafemockobjectimplementation

I hope you will all realize that this new idea is a drastic improvement over current technologies and therefore support it, because we can Make Python Great Again™.

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your program. Something's wrong.
http://kirbyfan64.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robertomartinezp at gmail.com Fri Apr 1 06:42:10 2016
From: robertomartinezp at gmail.com (Roberto Martínez)
Date: Fri, 01 Apr 2016 10:42:10 +0000
Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
Message-ID:

Hi,

I am having a hard time trying to choose one of this two products:

Phyton 27:
http://www.amazon.com/Phyton-27-Systemic-Bactericide-Fungicide/dp/B00VKPL8FU
Phyton 35:
http://www.amazon.com/Phyton-Bactericide-fungicide-Substitute-Liter/dp/B00BGE65VM

Phyton 35 is announced as the "Substitute for Phyton 27" but I feel that Phyton 27 is more tested and have a bigger user base.

Can you help to choose?

Best regards,
Roberto
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From status at bugs.python.org Fri Apr 1 12:08:40 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 1 Apr 2016 18:08:40 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160401160840.165F456909@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-03-25 - 2016-04-01) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 5471 (+10) closed 32971 (+33) total 38442 (+43) Open issues with patches: 2379 Issues opened (32) ================== #26643: regrtest: rework libregrtest.save_env submodule http://bugs.python.org/issue26643 opened by haypo #26646: Allow built-in module in package http://bugs.python.org/issue26646 opened by Daniel Shaulov #26647: ceval: use Wordcode, 16-bit bytecode http://bugs.python.org/issue26647 opened by Demur Rumed #26648: csv.reader Error message indicates to use deprecated http://bugs.python.org/issue26648 opened by Philip Martin #26650: calendar: OverflowErrors for year == 1 and firstweekday > 0 http://bugs.python.org/issue26650 opened by mjpieters #26651: Deprecate register_adapter() and register_converter() in sqlit http://bugs.python.org/issue26651 opened by berker.peksag #26652: Cannot install Python 2.7.11 on Windows Server 2008 R2 http://bugs.python.org/issue26652 opened by Hung-Hsuan Chen #26654: asyncio is not inspecting keyword arguments of functools.parti http://bugs.python.org/issue26654 opened by iceboy #26656: Documentation for re.compile is a bit outdated http://bugs.python.org/issue26656 opened by Sworddragon #26657: Directory traversal with http.server and SimpleHTTPServer on w http://bugs.python.org/issue26657 opened by Thomas #26658: test_os fails when run on Windows ramdisk http://bugs.python.org/issue26658 opened by jkloth #26659: slice() leaks memory when part of a cycle http://bugs.python.org/issue26659 opened by Kevin Modzelewski #26660: 
tempfile.TemporaryDirectory() cleanup exception on Windows if http://bugs.python.org/issue26660 opened by Laurent.Mazuel #26661: python fails to locate system libffi http://bugs.python.org/issue26661 opened by rkuska #26662: configure/Makefile doesn't check if "python" command works, ne http://bugs.python.org/issue26662 opened by haypo #26663: asyncio _UnixWritePipeTransport._close abandons unflushed writ http://bugs.python.org/issue26663 opened by Robert Smallshire #26664: find a bug in activate.fish of venv of cpython3.6 http://bugs.python.org/issue26664 opened by ????????? #26665: pip is not bootstrapped by default on 2.7 http://bugs.python.org/issue26665 opened by Axel #26666: File object hook to modify select(ors) event mask http://bugs.python.org/issue26666 opened by zwol #26667: Update importlib to accept pathlib.Path objects http://bugs.python.org/issue26667 opened by brett.cannon #26668: Remove Lib/test/test_importlib/regrtest.py? http://bugs.python.org/issue26668 opened by haypo #26669: time.localtime(float("NaN")) does not raise a ValueError on al http://bugs.python.org/issue26669 opened by gregory.p.smith #26671: Clean up path_converter in posixmodule.c http://bugs.python.org/issue26671 opened by serhiy.storchaka #26672: regrtest missing in the module name http://bugs.python.org/issue26672 opened by Axel #26673: Tkinter error when opening IDLE configuration menu http://bugs.python.org/issue26673 opened by wysaard #26677: pyvenv: activate.fish breaks $PATH for bash scripts http://bugs.python.org/issue26677 opened by Florian.Dold #26678: Incorrect linking to elements in datetime package http://bugs.python.org/issue26678 opened by andymaier #26679: curses: Descripton of KEY_NPAGE and KEY_PPAGE inverted http://bugs.python.org/issue26679 opened by Robert Bachmann #26680: Incorporating float.is_integer into the numeric tower and Deci http://bugs.python.org/issue26680 opened by Robert Smallshire2 #26682: Ttk Notebook tabs do not show with 1-2 char names 
http://bugs.python.org/issue26682 opened by terry.reedy #26683: Questionable terminology for describing what locals() does http://bugs.python.org/issue26683 opened by rhettinger #26685: Raise errors from socket.close() http://bugs.python.org/issue26685 opened by martin.panter Most recent 15 issues with no replies (15) ========================================== #26677: pyvenv: activate.fish breaks $PATH for bash scripts http://bugs.python.org/issue26677 #26672: regrtest missing in the module name http://bugs.python.org/issue26672 #26669: time.localtime(float("NaN")) does not raise a ValueError on al http://bugs.python.org/issue26669 #26667: Update importlib to accept pathlib.Path objects http://bugs.python.org/issue26667 #26665: pip is not bootstrapped by default on 2.7 http://bugs.python.org/issue26665 #26663: asyncio _UnixWritePipeTransport._close abandons unflushed writ http://bugs.python.org/issue26663 #26661: python fails to locate system libffi http://bugs.python.org/issue26661 #26660: tempfile.TemporaryDirectory() cleanup exception on Windows if http://bugs.python.org/issue26660 #26656: Documentation for re.compile is a bit outdated http://bugs.python.org/issue26656 #26652: Cannot install Python 2.7.11 on Windows Server 2008 R2 http://bugs.python.org/issue26652 #26626: test_dbm_gnu http://bugs.python.org/issue26626 #26618: _overlapped extension module of asyncio uses deprecated WSAStr http://bugs.python.org/issue26618 #26615: Missing entry in WRAPPER_ASSIGNMENTS in update_wrapper's doc http://bugs.python.org/issue26615 #26609: Wrong request target in test_httpservers.py http://bugs.python.org/issue26609 #26600: MagickMock __str__ sometimes returns MagickMock instead of str http://bugs.python.org/issue26600 Most recent 15 issues waiting for review (15) ============================================= #26685: Raise errors from socket.close() http://bugs.python.org/issue26685 #26680: Incorporating float.is_integer into the numeric tower and Deci 
http://bugs.python.org/issue26680 #26679: curses: Descripton of KEY_NPAGE and KEY_PPAGE inverted http://bugs.python.org/issue26679 #26671: Clean up path_converter in posixmodule.c http://bugs.python.org/issue26671 #26661: python fails to locate system libffi http://bugs.python.org/issue26661 #26658: test_os fails when run on Windows ramdisk http://bugs.python.org/issue26658 #26657: Directory traversal with http.server and SimpleHTTPServer on w http://bugs.python.org/issue26657 #26651: Deprecate register_adapter() and register_converter() in sqlit http://bugs.python.org/issue26651 #26650: calendar: OverflowErrors for year == 1 and firstweekday > 0 http://bugs.python.org/issue26650 #26648: csv.reader Error message indicates to use deprecated http://bugs.python.org/issue26648 #26647: ceval: use Wordcode, 16-bit bytecode http://bugs.python.org/issue26647 #26646: Allow built-in module in package http://bugs.python.org/issue26646 #26643: regrtest: rework libregrtest.save_env submodule http://bugs.python.org/issue26643 #26642: Replace stdout and stderr with simple standard printers at Pyt http://bugs.python.org/issue26642 #26639: Tools/i18n/pygettext.py: replace deprecated imp module with im http://bugs.python.org/issue26639 Top 10 most discussed issues (10) ================================= #26488: hashlib command line interface http://bugs.python.org/issue26488 15 msgs #26647: ceval: use Wordcode, 16-bit bytecode http://bugs.python.org/issue26647 15 msgs #26624: Windows hangs in call to CRT setlocale() http://bugs.python.org/issue26624 10 msgs #18844: allow weights in random.choice http://bugs.python.org/issue18844 8 msgs #26632: __all__ decorator http://bugs.python.org/issue26632 6 msgs #26658: test_os fails when run on Windows ramdisk http://bugs.python.org/issue26658 6 msgs #26680: Incorporating float.is_integer into the numeric tower and Deci http://bugs.python.org/issue26680 6 msgs #23551: IDLE to provide menu link to PIP gui. 
http://bugs.python.org/issue23551 5 msgs #23735: Readline not adjusting width after resize with 6.3 http://bugs.python.org/issue23735 5 msgs #26606: logging.baseConfig is missing the encoding parameter http://bugs.python.org/issue26606 5 msgs Issues closed (30) ================== #15117: Please document top-level sqlite3 module variables http://bugs.python.org/issue15117 closed by berker.peksag #18691: sqlite3.Cursor.execute expects sequence as second argument. http://bugs.python.org/issue18691 closed by berker.peksag #19065: sqlite3 timestamp adapter chokes on timezones http://bugs.python.org/issue19065 closed by berker.peksag #22218: Fix more compiler warnings "comparison between signed and unsi http://bugs.python.org/issue22218 closed by haypo #22854: Documentation/implementation out of sync for IO http://bugs.python.org/issue22854 closed by martin.panter #23758: Improve documenation about num_params in sqlite3 create_functi http://bugs.python.org/issue23758 closed by berker.peksag #23804: SSLSocket.recv(0) receives up to 1024 bytes http://bugs.python.org/issue23804 closed by martin.panter #25195: mock.ANY doesn't match mock.MagicMock() object http://bugs.python.org/issue25195 closed by berker.peksag #25256: Add sys.debug_build public variable to check if Python was com http://bugs.python.org/issue25256 closed by haypo #25276: Intermittent segfaults on PPC64 AIX 3.x http://bugs.python.org/issue25276 closed by haypo #25289: test_strptime hangs sometimes on AMD64 Windows7 SP1 3.x buildb http://bugs.python.org/issue25289 closed by haypo #25940: SSL tests failed due to expired svn.python.org SSL certificate http://bugs.python.org/issue25940 closed by martin.panter #26130: redundant local copy of a char pointer in classify in Parser\p http://bugs.python.org/issue26130 closed by berker.peksag #26492: Exhausted array iterator should left exhausted http://bugs.python.org/issue26492 closed by serhiy.storchaka #26494: Double deallocation on iterator exhausting 
http://bugs.python.org/issue26494 closed by serhiy.storchaka #26591: datetime datetime.time to datetime.time comparison does nothin http://bugs.python.org/issue26591 closed by belopolsky #26616: A bug in datetime.astimezone() method http://bugs.python.org/issue26616 closed by belopolsky #26640: xmlrpc.server imports xmlrpc.client http://bugs.python.org/issue26640 closed by brett.cannon #26641: doctest doesn't support packages http://bugs.python.org/issue26641 closed by haypo #26644: SSLSocket.recv(-1) triggers SystemError http://bugs.python.org/issue26644 closed by martin.panter #26645: argparse prints help messages to stdout instead of stderr by d http://bugs.python.org/issue26645 closed by serhiy.storchaka #26649: Fail update installation: 'utf-8' codec can't decode http://bugs.python.org/issue26649 closed by haypo #26653: bisect raises a TypeError when hi is None http://bugs.python.org/issue26653 closed by rhettinger #26655: pathlib glob case sensitivity issue on Windows http://bugs.python.org/issue26655 closed by SilentGhost #26670: Add a developer mode: -X dev command line option http://bugs.python.org/issue26670 closed by haypo #26674: ???typo??? Japanese Documentation http://bugs.python.org/issue26674 closed by ezio.melotti #26675: Appending to a large list flushes old entries http://bugs.python.org/issue26675 closed by Swaprava Nath #26676: Add missing XMLPullParser to ElementTree.__all__ http://bugs.python.org/issue26676 closed by martin.panter #26681: decorators for attributes http://bugs.python.org/issue26681 closed by ethan.furman #26684: pathlib.Path.with_name() and .with_suffix do not allow combini http://bugs.python.org/issue26684 closed by ethan.furman From rosuav at gmail.com Fri Apr 1 12:21:15 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 2 Apr 2016 03:21:15 +1100 Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35? 
In-Reply-To:
References:
Message-ID:

On Fri, Apr 1, 2016 at 9:42 PM, Roberto Martínez wrote:
> I am having a hard time trying to choose one of this two products:
>
> Phyton 27:
> http://www.amazon.com/Phyton-27-Systemic-Bactericide-Fungicide/dp/B00VKPL8FU
> Phyton 35:
> http://www.amazon.com/Phyton-Bactericide-fungicide-Substitute-Liter/dp/B00BGE65VM
>
> Phyton 35 is announced as the "Substitute for Phyton 27" but I feel that
> Phyton 27 is more tested and have a bigger user base.
>
> Can you help to choose?

Sure! This is a fairly common question, and it comes down to what sort
of plants you're trying to use this with. Some plants prefer Phyton
27, while others prefer Phyton 35. Most plants are happy with either,
though, so unless you have a good reason to do otherwise, use Phyton
35.

Phyton 35 has some significant improvements that make it far better at
handling plants from different parts of the world. And even some
American plants have special black markings on them, or cost so much
money that they're priced in Euros, or for some similar reason need
the advanced care of Phyton 35. As such, I strongly recommend that you
develop a taste for Phyton 35, as it will serve you better in the long
run. In this era of international foods in every supermarket aisle,
you cannot simply dismiss the black marks as "funny spots" and wish
they'd just go away; you MUST have a fungicide which can adequately
handle them.

ChrisA

PS. This is an *awesome* find! Nice going.

From bussonniermatthias at gmail.com Fri Apr 1 12:35:42 2016
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Fri, 1 Apr 2016 09:35:42 -0700
Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
In-Reply-To:
References:
Message-ID:

On Fri, Apr 1, 2016 at 9:21 AM, Chris Angelico wrote:
> On Fri, Apr 1, 2016 at 9:42 PM, Roberto Martínez wrote:
>> I am having a hard time trying to choose one of this two products:
>>
>> Phyton 27:
>> http://www.amazon.com/Phyton-27-Systemic-Bactericide-Fungicide/dp/B00VKPL8FU
>> Phyton 35:
>> http://www.amazon.com/Phyton-Bactericide-fungicide-Substitute-Liter/dp/B00BGE65VM
>>
>> Phyton 35 is announced as the "Substitute for Phyton 27" but I feel that
>> Phyton 27 is more tested and have a bigger user base.
>>
>> Can you help to choose?
>
> Sure! This is a fairly common question, and it comes down to what sort
> of plants you're trying to use this with. Some plants prefer Phyton
> 27, while others prefer Phyton 35. Most plants are happy with either,
> though, so unless you have a good reason to do otherwise, use Phyton
> 35.
>
> Phyton 35 has some significant improvements that make it far better at
> handling plants from different parts of the world. And even some
> American plants have special black markings on them, or cost so much
> money that they're priced in Euros, or for some similar reason need
> the advanced care of Phyton 35. As such, I strongly recommend that you
> develop a taste for Phyton 35, as it will serve you better in the long
> run. In this era of international foods in every supermarket aisle,
> you cannot simply dismiss the black marks as "funny spots" and wish
> they'd just go away; you MUST have a fungicide which can adequately
> handle them.
>

Also keep in mind that Phyton 35 improves on previous fungicides by
allowing asynchronous plant growing using eukaryotic microorganisms,
also known as `yeast from`.

--
M

From rymg19 at gmail.com Fri Apr 1 13:08:37 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Fri, 1 Apr 2016 12:08:37 -0500
Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
In-Reply-To:
References:
Message-ID:

Well, based on recent feedback, you should wait for Phyton 80, which will also make your bean plants start growing hair.

(Side note: This is seriously weird. :O )

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your program. Something's wrong.
http://kirbyfan64.github.io/

> Hi,
>
> I am having a hard time trying to choose one of this two products:
>
> Phyton 27:
> http://www.amazon.com/Phyton-27-Systemic-Bactericide-Fungicide/dp/B00VKPL8FU
> Phyton 35:
> http://www.amazon.com/Phyton-Bactericide-fungicide-Substitute-Liter/dp/B00BGE65VM
>
> Phyton 35 is announced as the "Substitute for Phyton 27" but I feel that
> Phyton 27 is more tested and have a bigger user base.
>
> Can you help to choose?
>
> Best regards,
> Roberto
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brett at snarky.ca Fri Apr 1 14:07:18 2016
From: brett at snarky.ca (Brett Cannon)
Date: Fri, 01 Apr 2016 18:07:18 +0000
Subject: [Python-Dev] [Python-checkins] cpython: Python 8: no pep8, no chocolate!
In-Reply-To: <20160331214027.11092.50943.0083A2D0@psf.io>
References: <20160331214027.11092.50943.0083A2D0@psf.io>
Message-ID:

Are you planning on removing this after today? My worry about leaving it in is if it's a modified copy that follows your Python 8 April Fools joke then it will quite possibly trip people up who try and run pep8 but don't have it installed, leading them to wonder why the heck their imports are now all flagged as broken.

On Thu, 31 Mar 2016 at 14:40 victor.stinner wrote:

> https://hg.python.org/cpython/rev/9aedec2dbc01
> changeset: 100818:9aedec2dbc01
> user: Victor Stinner
> date: Thu Mar 31 23:30:53 2016 +0200
> summary:
> Python 8: no pep8, no chocolate!
> > files: > Include/patchlevel.h | 6 +- > Lib/pep8.py | 2151 ++++++++++++++++++++++++++++++ > Lib/site.py | 56 + > 3 files changed, 2210 insertions(+), 3 deletions(-) > > > diff --git a/Include/patchlevel.h b/Include/patchlevel.h > --- a/Include/patchlevel.h > +++ b/Include/patchlevel.h > @@ -16,14 +16,14 @@ > > /* Version parsed out into numeric values */ > /*--start constants--*/ > -#define PY_MAJOR_VERSION 3 > -#define PY_MINOR_VERSION 6 > +#define PY_MAJOR_VERSION 8 > +#define PY_MINOR_VERSION 0 > #define PY_MICRO_VERSION 0 > #define PY_RELEASE_LEVEL PY_RELEASE_LEVEL_ALPHA > #define PY_RELEASE_SERIAL 0 > > /* Version as a string */ > -#define PY_VERSION "3.6.0a0" > +#define PY_VERSION "8.0.0a0" > /*--end constants--*/ > > /* Version as a single 4-byte hex number, e.g. 0x010502B2 == 1.5.2b2. > diff --git a/Lib/pep8.py b/Lib/pep8.py > new file mode 100644 > --- /dev/null > +++ b/Lib/pep8.py > @@ -0,0 +1,2151 @@ > +#!/usr/bin/env python > +# pep8.py - Check Python source code formatting, according to PEP 8 > +# Copyright (C) 2006-2009 Johann C. Rocholl > +# Copyright (C) 2009-2014 Florent Xicluna > +# Copyright (C) 2014-2016 Ian Lee > +# > +# Permission is hereby granted, free of charge, to any person > +# obtaining a copy of this software and associated documentation files > +# (the "Software"), to deal in the Software without restriction, > +# including without limitation the rights to use, copy, modify, merge, > +# publish, distribute, sublicense, and/or sell copies of the Software, > +# and to permit persons to whom the Software is furnished to do so, > +# subject to the following conditions: > +# > +# The above copyright notice and this permission notice shall be > +# included in all copies or substantial portions of the Software. > +# > +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > +# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > +# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > +# NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > +# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > +# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > +# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > +# SOFTWARE. > + > +r""" > +Check Python source code formatting, according to PEP 8. > + > +For usage and a list of options, try this: > +$ python pep8.py -h > + > +This program and its regression test suite live here: > +https://github.com/pycqa/pep8 > + > +Groups of errors and warnings: > +E errors > +W warnings > +100 indentation > +200 whitespace > +300 blank lines > +400 imports > +500 line length > +600 deprecation > +700 statements > +900 syntax error > +""" > +from __future__ import with_statement > + > +import os > +import sys > +import re > +import time > +import inspect > +import keyword > +import tokenize > +from optparse import OptionParser > +from fnmatch import fnmatch > +try: > + from configparser import RawConfigParser > + from io import TextIOWrapper > +except ImportError: > + from ConfigParser import RawConfigParser > + > +__version__ = '1.7.0' > + > +DEFAULT_EXCLUDE = '.svn,CVS,.bzr,.hg,.git,__pycache__,.tox' > +DEFAULT_IGNORE = 'E121,E123,E126,E226,E24,E704' > +try: > + if sys.platform == 'win32': > + USER_CONFIG = os.path.expanduser(r'~\.pep8') > + else: > + USER_CONFIG = os.path.join( > + os.getenv('XDG_CONFIG_HOME') or > os.path.expanduser('~/.config'), > + 'pep8' > + ) > +except ImportError: > + USER_CONFIG = None > + > +PROJECT_CONFIG = ('setup.cfg', 'tox.ini', '.pep8') > +TESTSUITE_PATH = os.path.join(os.path.dirname(__file__), 'testsuite') > +MAX_LINE_LENGTH = 79 > +REPORT_FORMAT = { > + 'default': '%(path)s:%(row)d:%(col)d: %(code)s %(text)s', > + 'pylint': '%(path)s:%(row)d: [%(code)s] %(text)s', > +} > + > +PyCF_ONLY_AST = 1024 > +SINGLETONS = frozenset(['False', 'None', 'True']) > +KEYWORDS = frozenset(keyword.kwlist + ['print']) - SINGLETONS > +UNARY_OPERATORS = 
frozenset(['>>', '**', '*', '+', '-']) > +ARITHMETIC_OP = frozenset(['**', '*', '/', '//', '+', '-']) > +WS_OPTIONAL_OPERATORS = ARITHMETIC_OP.union(['^', '&', '|', '<<', '>>', > '%']) > +WS_NEEDED_OPERATORS = frozenset([ > + '**=', '*=', '/=', '//=', '+=', '-=', '!=', '<>', '<', '>', > + '%=', '^=', '&=', '|=', '==', '<=', '>=', '<<=', '>>=', '=']) > +WHITESPACE = frozenset(' \t') > +NEWLINE = frozenset([tokenize.NL, tokenize.NEWLINE]) > +SKIP_TOKENS = NEWLINE.union([tokenize.INDENT, tokenize.DEDENT]) > +# ERRORTOKEN is triggered by backticks in Python 3 > +SKIP_COMMENTS = SKIP_TOKENS.union([tokenize.COMMENT, tokenize.ERRORTOKEN]) > +BENCHMARK_KEYS = ['directories', 'files', 'logical lines', 'physical > lines'] > + > +INDENT_REGEX = re.compile(r'([ \t]*)') > +RAISE_COMMA_REGEX = re.compile(r'raise\s+\w+\s*,') > +RERAISE_COMMA_REGEX = re.compile(r'raise\s+\w+\s*,.*,\s*\w+\s*$') > +ERRORCODE_REGEX = re.compile(r'\b[A-Z]\d{3}\b') > +DOCSTRING_REGEX = re.compile(r'u?r?["\']') > +EXTRANEOUS_WHITESPACE_REGEX = re.compile(r'[[({] | []}),;:]') > +WHITESPACE_AFTER_COMMA_REGEX = re.compile(r'[,;:]\s*(?: |\t)') > +COMPARE_SINGLETON_REGEX = re.compile(r'(\bNone|\bFalse|\bTrue)?\s*([=!]=)' > + r'\s*(?(1)|(None|False|True))\b') > +COMPARE_NEGATIVE_REGEX = re.compile(r'\b(not)\s+[^][)(}{ ]+\s+(in|is)\s') > +COMPARE_TYPE_REGEX = > re.compile(r'(?:[=!]=|is(?:\s+not)?)\s*type(?:s.\w+Type' > + r'|\s*\(\s*([^)]*[^ )])\s*\))') > +KEYWORD_REGEX = re.compile(r'(\s*)\b(?:%s)\b(\s*)' % r'|'.join(KEYWORDS)) > +OPERATOR_REGEX = re.compile(r'(?:[^,\s])(\s*)(?:[-+*/|!<=>%&^]+)(\s*)') > +LAMBDA_REGEX = re.compile(r'\blambda\b') > +HUNK_REGEX = re.compile(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@.*$') > + > +# Work around Python < 2.6 behaviour, which does not generate NL after > +# a comment which is on a line by itself. 
> +COMMENT_WITH_NL = tokenize.generate_tokens(['#\n'].pop).send(None)[1] == > '#\n' > + > + > > +############################################################################## > +# Plugins (check functions) for physical lines > > +############################################################################## > + > + > +def tabs_or_spaces(physical_line, indent_char): > + r"""Never mix tabs and spaces. > + > + The most popular way of indenting Python is with spaces only. The > + second-most popular way is with tabs only. Code indented with a > mixture > + of tabs and spaces should be converted to using spaces exclusively. > When > + invoking the Python command line interpreter with the -t option, it > issues > + warnings about code that illegally mixes tabs and spaces. When using > -tt > + these warnings become errors. These options are highly recommended! > + > + Okay: if a == 0:\n a = 1\n b = 1 > + E101: if a == 0:\n a = 1\n\tb = 1 > + """ > + indent = INDENT_REGEX.match(physical_line).group(1) > + for offset, char in enumerate(indent): > + if char != indent_char: > + return offset, "E101 indentation contains mixed spaces and > tabs" > + > + > +def tabs_obsolete(physical_line): > + r"""For new projects, spaces-only are strongly recommended over tabs. > + > + Okay: if True:\n return > + W191: if True:\n\treturn > + """ > + indent = INDENT_REGEX.match(physical_line).group(1) > + if '\t' in indent: > + return indent.index('\t'), "W191 indentation contains tabs" > + > + > +def trailing_whitespace(physical_line): > + r"""Trailing whitespace is superfluous. > + > + The warning returned varies on whether the line itself is blank, for > easier > + filtering for those who want to indent their blank lines. 
> + > + Okay: spam(1)\n# > + W291: spam(1) \n# > + W293: class Foo(object):\n \n bang = 12 > + """ > + physical_line = physical_line.rstrip('\n') # chr(10), newline > + physical_line = physical_line.rstrip('\r') # chr(13), carriage > return > + physical_line = physical_line.rstrip('\x0c') # chr(12), form feed, ^L > + stripped = physical_line.rstrip(' \t\v') > + if physical_line != stripped: > + if stripped: > + return len(stripped), "W291 trailing whitespace" > + else: > + return 0, "W293 blank line contains whitespace" > + > + > +def trailing_blank_lines(physical_line, lines, line_number, total_lines): > + r"""Trailing blank lines are superfluous. > + > + Okay: spam(1) > + W391: spam(1)\n > + > + However the last line should end with a new line (warning W292). > + """ > + if line_number == total_lines: > + stripped_last_line = physical_line.rstrip() > + if not stripped_last_line: > + return 0, "W391 blank line at end of file" > + if stripped_last_line == physical_line: > + return len(physical_line), "W292 no newline at end of file" > + > + > +def maximum_line_length(physical_line, max_line_length, multiline): > + r"""Limit all lines to a maximum of 79 characters. > + > + There are still many devices around that are limited to 80 character > + lines; plus, limiting windows to 80 characters makes it possible to > have > + several windows side-by-side. The default wrapping on such devices > looks > + ugly. Therefore, please limit all lines to a maximum of 79 > characters. > + For flowing long blocks of text (docstrings or comments), limiting the > + length to 72 characters is recommended. > + > + Reports error E501. > + """ > + line = physical_line.rstrip() > + length = len(line) > + if length > max_line_length and not noqa(line): > + # Special case for long URLs in multi-line docstrings or comments, > + # but still report the error when the 72 first chars are > whitespaces. 
> +        chunks = line.split()
> +        if ((len(chunks) == 1 and multiline) or
> +            (len(chunks) == 2 and chunks[0] == '#')) and \
> +                len(line) - len(chunks[-1]) < max_line_length - 7:
> +            return
> +        if hasattr(line, 'decode'):  # Python 2
> +            # The line could contain multi-byte characters
> +            try:
> +                length = len(line.decode('utf-8'))
> +            except UnicodeError:
> +                pass
> +        if length > max_line_length:
> +            return (max_line_length, "E501 line too long "
> +                    "(%d > %d characters)" % (length, max_line_length))
> +
> +
> +##############################################################################
> +# Plugins (check functions) for logical lines
> +##############################################################################
> +
> +
> +def blank_lines(logical_line, blank_lines, indent_level, line_number,
> +                blank_before, previous_logical, previous_indent_level):
> +    r"""Separate top-level function and class definitions with two blank lines.
> +
> +    Method definitions inside a class are separated by a single blank line.
> +
> +    Extra blank lines may be used (sparingly) to separate groups of related
> +    functions. Blank lines may be omitted between a bunch of related
> +    one-liners (e.g. a set of dummy implementations).
> +
> +    Use blank lines in functions, sparingly, to indicate logical sections.
> +
> +    Okay: def a():\n    pass\n\n\ndef b():\n    pass
> +    Okay: def a():\n    pass\n\n\n# Foo\n# Bar\n\ndef b():\n    pass
> +
> +    E301: class Foo:\n    b = 0\n    def bar():\n        pass
> +    E302: def a():\n    pass\n\ndef b(n):\n    pass
> +    E303: def a():\n    pass\n\n\n\ndef b(n):\n    pass
> +    E303: def a():\n\n\n\n    pass
> +    E304: @decorator\n\ndef a():\n    pass
> +    """
> +    if line_number < 3 and not previous_logical:
> +        return  # Don't expect blank lines before the first line
> +    if previous_logical.startswith('@'):
> +        if blank_lines:
> +            yield 0, "E304 blank lines found after function decorator"
> +    elif blank_lines > 2 or (indent_level and blank_lines == 2):
> +        yield 0, "E303 too many blank lines (%d)" % blank_lines
> +    elif logical_line.startswith(('def ', 'class ', '@')):
> +        if indent_level:
> +            if not (blank_before or previous_indent_level < indent_level or
> +                    DOCSTRING_REGEX.match(previous_logical)):
> +                yield 0, "E301 expected 1 blank line, found 0"
> +        elif blank_before != 2:
> +            yield 0, "E302 expected 2 blank lines, found %d" % blank_before
> +
> +
> +def extraneous_whitespace(logical_line):
> +    r"""Avoid extraneous whitespace.
> +
> +    Avoid extraneous whitespace in these situations:
> +    - Immediately inside parentheses, brackets or braces.
> +    - Immediately before a comma, semicolon, or colon.
> +
> +    Okay: spam(ham[1], {eggs: 2})
> +    E201: spam( ham[1], {eggs: 2})
> +    E201: spam(ham[ 1], {eggs: 2})
> +    E201: spam(ham[1], { eggs: 2})
> +    E202: spam(ham[1], {eggs: 2} )
> +    E202: spam(ham[1 ], {eggs: 2})
> +    E202: spam(ham[1], {eggs: 2 })
> +
> +    E203: if x == 4: print x, y; x, y = y , x
> +    E203: if x == 4: print x, y ; x, y = y, x
> +    E203: if x == 4 : print x, y; x, y = y, x
> +    """
> +    line = logical_line
> +    for match in EXTRANEOUS_WHITESPACE_REGEX.finditer(line):
> +        text = match.group()
> +        char = text.strip()
> +        found = match.start()
> +        if text == char + ' ':
> +            # assert char in '([{'
> +            yield found + 1, "E201 whitespace after '%s'" % char
> +        elif line[found - 1] != ',':
> +            code = ('E202' if char in '}])' else 'E203')  # if char in ',;:'
> +            yield found, "%s whitespace before '%s'" % (code, char)
> +
> +
> +def whitespace_around_keywords(logical_line):
> +    r"""Avoid extraneous whitespace around keywords.
> +
> +    Okay: True and False
> +    E271: True and  False
> +    E272: True  and False
> +    E273: True and\tFalse
> +    E274: True\tand False
> +    """
> +    for match in KEYWORD_REGEX.finditer(logical_line):
> +        before, after = match.groups()
> +
> +        if '\t' in before:
> +            yield match.start(1), "E274 tab before keyword"
> +        elif len(before) > 1:
> +            yield match.start(1), "E272 multiple spaces before keyword"
> +
> +        if '\t' in after:
> +            yield match.start(2), "E273 tab after keyword"
> +        elif len(after) > 1:
> +            yield match.start(2), "E271 multiple spaces after keyword"
> +
> +
> +def missing_whitespace(logical_line):
> +    r"""Each comma, semicolon or colon should be followed by whitespace.
> +
> +    Okay: [a, b]
> +    Okay: (3,)
> +    Okay: a[1:4]
> +    Okay: a[:4]
> +    Okay: a[1:]
> +    Okay: a[1:4:2]
> +    E231: ['a','b']
> +    E231: foo(bar,baz)
> +    E231: [{'a':'b'}]
> +    """
> +    line = logical_line
> +    for index in range(len(line) - 1):
> +        char = line[index]
> +        if char in ',;:' and line[index + 1] not in WHITESPACE:
> +            before = line[:index]
> +            if char == ':' and before.count('[') > before.count(']') and \
> +                    before.rfind('{') < before.rfind('['):
> +                continue  # Slice syntax, no space required
> +            if char == ',' and line[index + 1] == ')':
> +                continue  # Allow tuple with only one element: (3,)
> +            yield index, "E231 missing whitespace after '%s'" % char
> +
> +
> +def indentation(logical_line, previous_logical, indent_char,
> +                indent_level, previous_indent_level):
> +    r"""Use 4 spaces per indentation level.
> +
> +    For really old code that you don't want to mess up, you can continue to
> +    use 8-space tabs.
> +
> +    Okay: a = 1
> +    Okay: if a == 0:\n    a = 1
> +    E111:   a = 1
> +    E114:   # a = 1
> +
> +    Okay: for item in items:\n    pass
> +    E112: for item in items:\npass
> +    E115: for item in items:\n# Hi\n    pass
> +
> +    Okay: a = 1\nb = 2
> +    E113: a = 1\n    b = 2
> +    E116: a = 1\n    # b = 2
> +    """
> +    c = 0 if logical_line else 3
> +    tmpl = "E11%d %s" if logical_line else "E11%d %s (comment)"
> +    if indent_level % 4:
> +        yield 0, tmpl % (1 + c, "indentation is not a multiple of four")
> +    indent_expect = previous_logical.endswith(':')
> +    if indent_expect and indent_level <= previous_indent_level:
> +        yield 0, tmpl % (2 + c, "expected an indented block")
> +    elif not indent_expect and indent_level > previous_indent_level:
> +        yield 0, tmpl % (3 + c, "unexpected indentation")
> +
> +
> +def continued_indentation(logical_line, tokens, indent_level, hang_closing,
> +                          indent_char, noqa, verbose):
> +    r"""Continuation lines indentation.
> +
> +    Continuation lines should align wrapped elements either vertically
> +    using Python's implicit line joining inside parentheses, brackets
> +    and braces, or using a hanging indent.
> +
> +    When using a hanging indent these considerations should be applied:
> +    - there should be no arguments on the first line, and
> +    - further indentation should be used to clearly distinguish itself as a
> +      continuation line.
> +
> +    Okay: a = (\n)
> +    E123: a = (\n    )
> +
> +    Okay: a = (\n    42)
> +    E121: a = (\n   42)
> +    E122: a = (\n42)
> +    E123: a = (\n    42\n    )
> +    E124: a = (24,\n     42\n)
> +    E125: if (\n    b):\n    pass
> +    E126: a = (\n        42)
> +    E127: a = (24,\n      42)
> +    E128: a = (24,\n    42)
> +    E129: if (a or\n    b):\n    pass
> +    E131: a = (\n    42\n 24)
> +    """
> +    first_row = tokens[0][2][0]
> +    nrows = 1 + tokens[-1][2][0] - first_row
> +    if noqa or nrows == 1:
> +        return
> +
> +    # indent_next tells us whether the next block is indented; assuming
> +    # that it is indented by 4 spaces, then we should not allow 4-space
> +    # indents on the final continuation line; in turn, some other
> +    # indents are allowed to have an extra 4 spaces.
> +    indent_next = logical_line.endswith(':')
> +
> +    row = depth = 0
> +    valid_hangs = (4,) if indent_char != '\t' else (4, 8)
> +    # remember how many brackets were opened on each line
> +    parens = [0] * nrows
> +    # relative indents of physical lines
> +    rel_indent = [0] * nrows
> +    # for each depth, collect a list of opening rows
> +    open_rows = [[0]]
> +    # for each depth, memorize the hanging indentation
> +    hangs = [None]
> +    # visual indents
> +    indent_chances = {}
> +    last_indent = tokens[0][2]
> +    visual_indent = None
> +    last_token_multiline = False
> +    # for each depth, memorize the visual indent column
> +    indent = [last_indent[1]]
> +    if verbose >= 3:
> +        print(">>> " + tokens[0][4].rstrip())
> +
> +    for token_type, text, start, end, line in tokens:
> +
> +        newline = row < start[0] - first_row
> +        if newline:
> +            row = start[0] - first_row
> +            newline = not last_token_multiline and token_type not in NEWLINE
> +
> +        if newline:
> +            # this is the beginning of a continuation line.
> +            last_indent = start
> +            if verbose >= 3:
> +                print("... " + line.rstrip())
> +
> +            # record the initial indent.
> +            rel_indent[row] = expand_indent(line) - indent_level
> +
> +            # identify closing bracket
> +            close_bracket = (token_type == tokenize.OP and text in ']})')
> +
> +            # is the indent relative to an opening bracket line?
> +            for open_row in reversed(open_rows[depth]):
> +                hang = rel_indent[row] - rel_indent[open_row]
> +                hanging_indent = hang in valid_hangs
> +                if hanging_indent:
> +                    break
> +            if hangs[depth]:
> +                hanging_indent = (hang == hangs[depth])
> +            # is there any chance of visual indent?
> +            visual_indent = (not close_bracket and hang > 0 and
> +                             indent_chances.get(start[1]))
> +
> +            if close_bracket and indent[depth]:
> +                # closing bracket for visual indent
> +                if start[1] != indent[depth]:
> +                    yield (start, "E124 closing bracket does not match "
> +                           "visual indentation")
> +            elif close_bracket and not hang:
> +                # closing bracket matches indentation of opening bracket's line
> +                if hang_closing:
> +                    yield start, "E133 closing bracket is missing indentation"
> +            elif indent[depth] and start[1] < indent[depth]:
> +                if visual_indent is not True:
> +                    # visual indent is broken
> +                    yield (start, "E128 continuation line "
> +                           "under-indented for visual indent")
> +            elif hanging_indent or (indent_next and rel_indent[row] == 8):
> +                # hanging indent is verified
> +                if close_bracket and not hang_closing:
> +                    yield (start, "E123 closing bracket does not match "
> +                           "indentation of opening bracket's line")
> +                hangs[depth] = hang
> +            elif visual_indent is True:
> +                # visual indent is verified
> +                indent[depth] = start[1]
> +            elif visual_indent in (text, str):
> +                # ignore token lined up with matching one from a previous line
> +                pass
> +            else:
> +                # indent is broken
> +                if hang <= 0:
> +                    error = "E122", "missing indentation or outdented"
> +                elif indent[depth]:
> +                    error = "E127", "over-indented for visual indent"
> +                elif not close_bracket and hangs[depth]:
> +                    error = "E131", "unaligned for hanging indent"
> +                else:
> +                    hangs[depth] = hang
> +                    if hang > 4:
> +                        error = "E126", "over-indented for hanging indent"
> +                    else:
> +                        error = "E121", "under-indented for hanging indent"
> +                yield start, "%s continuation line %s" % error
> +
> +        # look for visual indenting
> +        if (parens[row] and
> +                token_type not in (tokenize.NL, tokenize.COMMENT) and
> +                not indent[depth]):
> +            indent[depth] = start[1]
> +            indent_chances[start[1]] = True
> +            if verbose >= 4:
> +                print("bracket depth %s indent to %s" % (depth, start[1]))
> +        # deal with implicit string concatenation
> +        elif (token_type in (tokenize.STRING, tokenize.COMMENT) or
> +              text in ('u', 'ur', 'b', 'br')):
> +            indent_chances[start[1]] = str
> +        # special case for the "if" statement because len("if (") == 4
> +        elif not indent_chances and not row and not depth and text == 'if':
> +            indent_chances[end[1] + 1] = True
> +        elif text == ':' and line[end[1]:].isspace():
> +            open_rows[depth].append(row)
> +
> +        # keep track of bracket depth
> +        if token_type == tokenize.OP:
> +            if text in '([{':
> +                depth += 1
> +                indent.append(0)
> +                hangs.append(None)
> +                if len(open_rows) == depth:
> +                    open_rows.append([])
> +                open_rows[depth].append(row)
> +                parens[row] += 1
> +                if verbose >= 4:
> +                    print("bracket depth %s seen, col %s, visual min = %s" %
> +                          (depth, start[1], indent[depth]))
> +            elif text in ')]}' and depth > 0:
> +                # parent indents should not be more than this one
> +                prev_indent = indent.pop() or last_indent[1]
> +                hangs.pop()
> +                for d in range(depth):
> +                    if indent[d] > prev_indent:
> +                        indent[d] = 0
> +                for ind in list(indent_chances):
> +                    if ind >= prev_indent:
> +                        del indent_chances[ind]
> +                del open_rows[depth + 1:]
> +                depth -= 1
> +                if depth:
> +                    indent_chances[indent[depth]] = True
> +                for idx in range(row, -1, -1):
> +                    if parens[idx]:
> +                        parens[idx] -= 1
> +                        break
> +            assert len(indent) == depth + 1
> +            if start[1] not in indent_chances:
> +                # allow to line up tokens
> +                indent_chances[start[1]] = text
> +
> +        last_token_multiline = (start[0] != end[0])
> +        if last_token_multiline:
> +            rel_indent[end[0] - first_row] = rel_indent[row]
> +
> +    if indent_next and expand_indent(line) == indent_level + 4:
> +        pos = (start[0], indent[0] + 4)
> +        if visual_indent:
> +            code = "E129 visually indented line"
> +        else:
> +            code = "E125 continuation line"
> +        yield pos, "%s with same indent as next logical line" % code
> +
> +
> +def whitespace_before_parameters(logical_line, tokens):
> +    r"""Avoid extraneous whitespace.
> +
> +    Avoid extraneous whitespace in the following situations:
> +    - before the open parenthesis that starts the argument list of a
> +      function call.
> +    - before the open parenthesis that starts an indexing or slicing.
> +
> +    Okay: spam(1)
> +    E211: spam (1)
> +
> +    Okay: dict['key'] = list[index]
> +    E211: dict ['key'] = list[index]
> +    E211: dict['key'] = list [index]
> +    """
> +    prev_type, prev_text, __, prev_end, __ = tokens[0]
> +    for index in range(1, len(tokens)):
> +        token_type, text, start, end, __ = tokens[index]
> +        if (token_type == tokenize.OP and
> +                text in '([' and
> +                start != prev_end and
> +                (prev_type == tokenize.NAME or prev_text in '}])') and
> +                # Syntax "class A (B):" is allowed, but avoid it
> +                (index < 2 or tokens[index - 2][1] != 'class') and
> +                # Allow "return (a.foo for a in range(5))"
> +                not keyword.iskeyword(prev_text)):
> +            yield prev_end, "E211 whitespace before '%s'" % text
> +        prev_type = token_type
> +        prev_text = text
> +        prev_end = end
> +
> +
> +def whitespace_around_operator(logical_line):
> +    r"""Avoid extraneous whitespace around an operator.
> +
> +    Okay: a = 12 + 3
> +    E221: a = 4  + 5
> +    E222: a = 4 +  5
> +    E223: a = 4\t+ 5
> +    E224: a = 4 +\t5
> +    """
> +    for match in OPERATOR_REGEX.finditer(logical_line):
> +        before, after = match.groups()
> +
> +        if '\t' in before:
> +            yield match.start(1), "E223 tab before operator"
> +        elif len(before) > 1:
> +            yield match.start(1), "E221 multiple spaces before operator"
> +
> +        if '\t' in after:
> +            yield match.start(2), "E224 tab after operator"
> +        elif len(after) > 1:
> +            yield match.start(2), "E222 multiple spaces after operator"
> +
> +
> +def missing_whitespace_around_operator(logical_line, tokens):
> +    r"""Surround operators with a single space on either side.
> +
> +    - Always surround these binary operators with a single space on
> +      either side: assignment (=), augmented assignment (+=, -= etc.),
> +      comparisons (==, <, >, !=, <=, >=, in, not in, is, is not),
> +      Booleans (and, or, not).
> +
> +    - If operators with different priorities are used, consider adding
> +      whitespace around the operators with the lowest priorities.
> +
> +    Okay: i = i + 1
> +    Okay: submitted += 1
> +    Okay: x = x * 2 - 1
> +    Okay: hypot2 = x * x + y * y
> +    Okay: c = (a + b) * (a - b)
> +    Okay: foo(bar, key='word', *args, **kwargs)
> +    Okay: alpha[:-i]
> +
> +    E225: i=i+1
> +    E225: submitted +=1
> +    E225: x = x /2 - 1
> +    E225: z = x **y
> +    E226: c = (a+b) * (a-b)
> +    E226: hypot2 = x*x + y*y
> +    E227: c = a|b
> +    E228: msg = fmt%(errno, errmsg)
> +    """
> +    parens = 0
> +    need_space = False
> +    prev_type = tokenize.OP
> +    prev_text = prev_end = None
> +    for token_type, text, start, end, line in tokens:
> +        if token_type in SKIP_COMMENTS:
> +            continue
> +        if text in ('(', 'lambda'):
> +            parens += 1
> +        elif text == ')':
> +            parens -= 1
> +        if need_space:
> +            if start != prev_end:
> +                # Found a (probably) needed space
> +                if need_space is not True and not need_space[1]:
> +                    yield (need_space[0],
> +                           "E225 missing whitespace around operator")
> +                need_space = False
> +            elif text == '>' and prev_text in ('<', '-'):
> +                # Tolerate the "<>" operator, even if running Python 3
> +                # Deal with Python 3's annotated return value "->"
> +                pass
> +            else:
> +                if need_space is True or need_space[1]:
> +                    # A needed trailing space was not found
> +                    yield prev_end, "E225 missing whitespace around operator"
> +                elif prev_text != '**':
> +                    code, optype = 'E226', 'arithmetic'
> +                    if prev_text == '%':
> +                        code, optype = 'E228', 'modulo'
> +                    elif prev_text not in ARITHMETIC_OP:
> +                        code, optype = 'E227', 'bitwise or shift'
> +                    yield (need_space[0], "%s missing whitespace "
> +                           "around %s operator" % (code, optype))
> +                need_space = False
> +        elif token_type == tokenize.OP and prev_end is not None:
> +            if text == '=' and parens:
> +                # Allow keyword args or defaults: foo(bar=None).
> +                pass
> +            elif text in WS_NEEDED_OPERATORS:
> +                need_space = True
> +            elif text in UNARY_OPERATORS:
> +                # Check if the operator is being used as a binary operator
> +                # Allow unary operators: -123, -x, +1.
> +                # Allow argument unpacking: foo(*args, **kwargs).
> +                if (prev_text in '}])' if prev_type == tokenize.OP
> +                        else prev_text not in KEYWORDS):
> +                    need_space = None
> +            elif text in WS_OPTIONAL_OPERATORS:
> +                need_space = None
> +
> +            if need_space is None:
> +                # Surrounding space is optional, but ensure that
> +                # trailing space matches opening space
> +                need_space = (prev_end, start != prev_end)
> +            elif need_space and start == prev_end:
> +                # A needed opening space was not found
> +                yield prev_end, "E225 missing whitespace around operator"
> +                need_space = False
> +        prev_type = token_type
> +        prev_text = text
> +        prev_end = end
> +
> +
> +def whitespace_around_comma(logical_line):
> +    r"""Avoid extraneous whitespace after a comma or a colon.
> +
> +    Note: these checks are disabled by default
> +
> +    Okay: a = (1, 2)
> +    E241: a = (1,  2)
> +    E242: a = (1,\t2)
> +    """
> +    line = logical_line
> +    for m in WHITESPACE_AFTER_COMMA_REGEX.finditer(line):
> +        found = m.start() + 1
> +        if '\t' in m.group():
> +            yield found, "E242 tab after '%s'" % m.group()[0]
> +        else:
> +            yield found, "E241 multiple spaces after '%s'" % m.group()[0]
> +
> +
> +def whitespace_around_named_parameter_equals(logical_line, tokens):
> +    r"""Don't use spaces around the '=' sign in function arguments.
> +
> +    Don't use spaces around the '=' sign when used to indicate a
> +    keyword argument or a default parameter value.
> +
> +    Okay: def complex(real, imag=0.0):
> +    Okay: return magic(r=real, i=imag)
> +    Okay: boolean(a == b)
> +    Okay: boolean(a != b)
> +    Okay: boolean(a <= b)
> +    Okay: boolean(a >= b)
> +    Okay: def foo(arg: int = 42):
> +
> +    E251: def complex(real, imag = 0.0):
> +    E251: return magic(r = real, i = imag)
> +    """
> +    parens = 0
> +    no_space = False
> +    prev_end = None
> +    annotated_func_arg = False
> +    in_def = logical_line.startswith('def')
> +    message = "E251 unexpected spaces around keyword / parameter equals"
> +    for token_type, text, start, end, line in tokens:
> +        if token_type == tokenize.NL:
> +            continue
> +        if no_space:
> +            no_space = False
> +            if start != prev_end:
> +                yield (prev_end, message)
> +        if token_type == tokenize.OP:
> +            if text == '(':
> +                parens += 1
> +            elif text == ')':
> +                parens -= 1
> +            elif in_def and text == ':' and parens == 1:
> +                annotated_func_arg = True
> +            elif parens and text == ',' and parens == 1:
> +                annotated_func_arg = False
> +            elif parens and text == '=' and not annotated_func_arg:
> +                no_space = True
> +                if start != prev_end:
> +                    yield (prev_end, message)
> +            if not parens:
> +                annotated_func_arg = False
> +
> +        prev_end = end
> +
> +
> +def whitespace_before_comment(logical_line, tokens):
> +    r"""Separate inline comments by at least two spaces.
> +
> +    An inline comment is a comment on the same line as a statement. Inline
> +    comments should be separated by at least two spaces from the statement.
> +    They should start with a # and a single space.
> +
> +    Each line of a block comment starts with a # and a single space
> +    (unless it is indented text inside the comment).
> +
> +    Okay: x = x + 1  # Increment x
> +    Okay: x = x + 1    # Increment x
> +    Okay: # Block comment
> +    E261: x = x + 1 # Increment x
> +    E262: x = x + 1  #Increment x
> +    E262: x = x + 1  #  Increment x
> +    E265: #Block comment
> +    E266: ### Block comment
> +    """
> +    prev_end = (0, 0)
> +    for token_type, text, start, end, line in tokens:
> +        if token_type == tokenize.COMMENT:
> +            inline_comment = line[:start[1]].strip()
> +            if inline_comment:
> +                if prev_end[0] == start[0] and start[1] < prev_end[1] + 2:
> +                    yield (prev_end,
> +                           "E261 at least two spaces before inline comment")
> +            symbol, sp, comment = text.partition(' ')
> +            bad_prefix = symbol not in '#:' and (symbol.lstrip('#')[:1] or '#')
> +            if inline_comment:
> +                if bad_prefix or comment[:1] in WHITESPACE:
> +                    yield start, "E262 inline comment should start with '# '"
> +            elif bad_prefix and (bad_prefix != '!' or start[0] > 1):
> +                if bad_prefix != '#':
> +                    yield start, "E265 block comment should start with '# '"
> +                elif comment:
> +                    yield start, "E266 too many leading '#' for block comment"
> +        elif token_type != tokenize.NL:
> +            prev_end = end
> +
> +
> +def imports_on_separate_lines(logical_line):
> +    r"""Imports should usually be on separate lines.
> +
> +    Okay: import os\nimport sys
> +    E401: import sys, os
> +
> +    Okay: from subprocess import Popen, PIPE
> +    Okay: from myclas import MyClass
> +    Okay: from foo.bar.yourclass import YourClass
> +    Okay: import myclass
> +    Okay: import foo.bar.yourclass
> +    """
> +    line = logical_line
> +    if line.startswith('import '):
> +        found = line.find(',')
> +        if -1 < found and ';' not in line[:found]:
> +            yield found, "E401 multiple imports on one line"
> +
> +
> +def module_imports_on_top_of_file(
> +        logical_line, indent_level, checker_state, noqa):
> +    r"""Imports are always put at the top of the file, just after any module
> +    comments and docstrings, and before module globals and constants.
> +
> +    Okay: import os
> +    Okay: # this is a comment\nimport os
> +    Okay: '''this is a module docstring'''\nimport os
> +    Okay: r'''this is a module docstring'''\nimport os
> +    Okay: try:\n    import x\nexcept:\n    pass\nelse:\n    pass\nimport y
> +    Okay: try:\n    import x\nexcept:\n    pass\nfinally:\n    pass\nimport y
> +    E402: a=1\nimport os
> +    E402: 'One string'\n"Two string"\nimport os
> +    E402: a=1\nfrom sys import x
> +
> +    Okay: if x:\n    import os
> +    """
> +    def is_string_literal(line):
> +        if line[0] in 'uUbB':
> +            line = line[1:]
> +        if line and line[0] in 'rR':
> +            line = line[1:]
> +        return line and (line[0] == '"' or line[0] == "'")
> +
> +    allowed_try_keywords = ('try', 'except', 'else', 'finally')
> +
> +    if indent_level:  # Allow imports in conditional statements or functions
> +        return
> +    if not logical_line:  # Allow empty lines or comments
> +        return
> +    if noqa:
> +        return
> +    line = logical_line
> +    if line.startswith('import ') or line.startswith('from '):
> +        if checker_state.get('seen_non_imports', False):
> +            yield 0, "E402 module level import not at top of file"
> +    elif any(line.startswith(kw) for kw in allowed_try_keywords):
> +        # Allow try, except, else, finally keywords intermixed with imports in
> +        # order to support conditional importing
> +        return
> +    elif is_string_literal(line):
> +        # The first literal is a docstring, allow it. Otherwise, report error.
> +        if checker_state.get('seen_docstring', False):
> +            checker_state['seen_non_imports'] = True
> +        else:
> +            checker_state['seen_docstring'] = True
> +    else:
> +        checker_state['seen_non_imports'] = True
> +
> +
> +def compound_statements(logical_line):
> +    r"""Compound statements (on the same line) are generally discouraged.
> +
> +    While sometimes it's okay to put an if/for/while with a small body
> +    on the same line, never do this for multi-clause statements.
> +    Also avoid folding such long lines!
> +
> +    Always use a def statement instead of an assignment statement that
> +    binds a lambda expression directly to a name.
> +
> +    Okay: if foo == 'blah':\n    do_blah_thing()
> +    Okay: do_one()
> +    Okay: do_two()
> +    Okay: do_three()
> +
> +    E701: if foo == 'blah': do_blah_thing()
> +    E701: for x in lst: total += x
> +    E701: while t < 10: t = delay()
> +    E701: if foo == 'blah': do_blah_thing()
> +    E701: else: do_non_blah_thing()
> +    E701: try: something()
> +    E701: finally: cleanup()
> +    E701: if foo == 'blah': one(); two(); three()
> +    E702: do_one(); do_two(); do_three()
> +    E703: do_four();  # useless semicolon
> +    E704: def f(x): return 2*x
> +    E731: f = lambda x: 2*x
> +    """
> +    line = logical_line
> +    last_char = len(line) - 1
> +    found = line.find(':')
> +    while -1 < found < last_char:
> +        before = line[:found]
> +        if ((before.count('{') <= before.count('}') and  # {'a': 1} (dict)
> +             before.count('[') <= before.count(']') and  # [1:2] (slice)
> +             before.count('(') <= before.count(')'))):  # (annotation)
> +            lambda_kw = LAMBDA_REGEX.search(before)
> +            if lambda_kw:
> +                before = line[:lambda_kw.start()].rstrip()
> +                if before[-1:] == '=' and isidentifier(before[:-1].strip()):
> +                    yield 0, ("E731 do not assign a lambda expression, use a "
> +                              "def")
> +                break
> +            if before.startswith('def '):
> +                yield 0, "E704 multiple statements on one line (def)"
> +            else:
> +                yield found, "E701 multiple statements on one line (colon)"
> +        found = line.find(':', found + 1)
> +    found = line.find(';')
> +    while -1 < found:
> +        if found < last_char:
> +            yield found, "E702 multiple statements on one line (semicolon)"
> +        else:
> +            yield found, "E703 statement ends with a semicolon"
> +        found = line.find(';', found + 1)
> +
> +
> +def explicit_line_join(logical_line, tokens):
> +    r"""Avoid explicit line join between brackets.
> +
> +    The preferred way of wrapping long lines is by using Python's implied line
> +    continuation inside parentheses, brackets and braces. Long lines can be
> +    broken over multiple lines by wrapping expressions in parentheses. These
> +    should be used in preference to using a backslash for line continuation.
> +
> +    E502: aaa = [123, \\n       123]
> +    E502: aaa = ("bbb " \\n       "ccc")
> +
> +    Okay: aaa = [123,\n       123]
> +    Okay: aaa = ("bbb "\n       "ccc")
> +    Okay: aaa = "bbb " \\n    "ccc"
> +    Okay: aaa = 123  # \\
> +    """
> +    prev_start = prev_end = parens = 0
> +    comment = False
> +    backslash = None
> +    for token_type, text, start, end, line in tokens:
> +        if token_type == tokenize.COMMENT:
> +            comment = True
> +        if start[0] != prev_start and parens and backslash and not comment:
> +            yield backslash, "E502 the backslash is redundant between brackets"
> +        if end[0] != prev_end:
> +            if line.rstrip('\r\n').endswith('\\'):
> +                backslash = (end[0], len(line.splitlines()[-1]) - 1)
> +            else:
> +                backslash = None
> +            prev_start = prev_end = end[0]
> +        else:
> +            prev_start = start[0]
> +        if token_type == tokenize.OP:
> +            if text in '([{':
> +                parens += 1
> +            elif text in ')]}':
> +                parens -= 1
> +
> +
> +def break_around_binary_operator(logical_line, tokens):
> +    r"""
> +    Avoid breaks before binary operators.
> +
> +    The preferred place to break around a binary operator is after the
> +    operator, not before it.
> +
> +    W503: (width == 0\n + height == 0)
> +    W503: (width == 0\n and height == 0)
> +
> +    Okay: (width == 0 +\n height == 0)
> +    Okay: foo(\n    -x)
> +    Okay: foo(x\n    [])
> +    Okay: x = '''\n''' + ''
> +    Okay: foo(x,\n    -y)
> +    Okay: foo(x,  # comment\n    -y)
> +    """
> +    def is_binary_operator(token_type, text):
> +        # The % character is strictly speaking a binary operator, but the
> +        # common usage seems to be to put it next to the format parameters,
> +        # after a line break.
> +        return ((token_type == tokenize.OP or text in ['and', 'or']) and
> +                text not in "()[]{},:.;@=%")
> +
> +    line_break = False
> +    unary_context = True
> +    for token_type, text, start, end, line in tokens:
> +        if token_type == tokenize.COMMENT:
> +            continue
> +        if ('\n' in text or '\r' in text) and token_type != tokenize.STRING:
> +            line_break = True
> +        else:
> +            if (is_binary_operator(token_type, text) and line_break and
> +                    not unary_context):
> +                yield start, "W503 line break before binary operator"
> +            unary_context = text in '([{,;'
> +            line_break = False
> +
> +
> +def comparison_to_singleton(logical_line, noqa):
> +    r"""Comparison to singletons should use "is" or "is not".
> +
> +    Comparisons to singletons like None should always be done
> +    with "is" or "is not", never the equality operators.
> +
> +    Okay: if arg is not None:
> +    E711: if arg != None:
> +    E711: if None == arg:
> +    E712: if arg == True:
> +    E712: if False == arg:
> +
> +    Also, beware of writing if x when you really mean if x is not None --
> +    e.g. when testing whether a variable or argument that defaults to None was
> +    set to some other value. The other value might have a type (such as a
> +    container) that could be false in a boolean context!
> +    """
> +    match = not noqa and COMPARE_SINGLETON_REGEX.search(logical_line)
> +    if match:
> +        singleton = match.group(1) or match.group(3)
> +        same = (match.group(2) == '==')
> +
> +        msg = "'if cond is %s:'" % (('' if same else 'not ') + singleton)
> +        if singleton in ('None',):
> +            code = 'E711'
> +        else:
> +            code = 'E712'
> +            nonzero = ((singleton == 'True' and same) or
> +                       (singleton == 'False' and not same))
> +            msg += " or 'if %scond:'" % ('' if nonzero else 'not ')
> +        yield match.start(2), ("%s comparison to %s should be %s" %
> +                               (code, singleton, msg))
> +
> +
> +def comparison_negative(logical_line):
> +    r"""Negative comparison should be done using "not in" and "is not".
> +
> +    Okay: if x not in y:\n    pass
> +    Okay: assert (X in Y or X is Z)
> +    Okay: if not (X in Y):\n    pass
> +    Okay: zz = x is not y
> +    E713: Z = not X in Y
> +    E713: if not X.B in Y:\n    pass
> +    E714: if not X is Y:\n    pass
> +    E714: Z = not X.B is Y
> +    """
> +    match = COMPARE_NEGATIVE_REGEX.search(logical_line)
> +    if match:
> +        pos = match.start(1)
> +        if match.group(2) == 'in':
> +            yield pos, "E713 test for membership should be 'not in'"
> +        else:
> +            yield pos, "E714 test for object identity should be 'is not'"
> +
> +
> +def comparison_type(logical_line, noqa):
> +    r"""Object type comparisons should always use isinstance().
> +
> +    Do not compare types directly.
> +
> +    Okay: if isinstance(obj, int):
> +    E721: if type(obj) is type(1):
> +
> +    When checking if an object is a string, keep in mind that it might be a
> +    unicode string too! In Python 2.3, str and unicode have a common base
> +    class, basestring, so you can do:
> +
> +    Okay: if isinstance(obj, basestring):
> +    Okay: if type(a1) is type(b1):
> +    """
> +    match = COMPARE_TYPE_REGEX.search(logical_line)
> +    if match and not noqa:
> +        inst = match.group(1)
> +        if inst and isidentifier(inst) and inst not in SINGLETONS:
> +            return  # Allow comparison for types which are not obvious
> +        yield match.start(), "E721 do not compare types, use 'isinstance()'"
> +
> +
> +def python_3000_has_key(logical_line, noqa):
> +    r"""The {}.has_key() method is removed in Python 3: use the 'in' operator.
> +
> +    Okay: if "alph" in d:\n    print d["alph"]
> +    W601: assert d.has_key('alph')
> +    """
> +    pos = logical_line.find('.has_key(')
> +    if pos > -1 and not noqa:
> +        yield pos, "W601 .has_key() is deprecated, use 'in'"
> +
> +
> +def python_3000_raise_comma(logical_line):
> +    r"""When raising an exception, use "raise ValueError('message')".
> +
> +    The older form is removed in Python 3.
> +
> +    Okay: raise DummyError("Message")
> +    W602: raise DummyError, "Message"
> +    """
> +    match = RAISE_COMMA_REGEX.match(logical_line)
> +    if match and not RERAISE_COMMA_REGEX.match(logical_line):
> +        yield match.end() - 1, "W602 deprecated form of raising exception"
> +
> +
> +def python_3000_not_equal(logical_line):
> +    r"""New code should always use != instead of <>.
> +
> +    The older syntax is removed in Python 3.
> +
> +    Okay: if a != 'no':
> +    W603: if a <> 'no':
> +    """
> +    pos = logical_line.find('<>')
> +    if pos > -1:
> +        yield pos, "W603 '<>' is deprecated, use '!='"
> +
> +
> +def python_3000_backticks(logical_line):
> +    r"""Backticks are removed in Python 3: use repr() instead.
> +
> +    Okay: val = repr(1 + 2)
> +    W604: val = `1 + 2`
> +    """
> +    pos = logical_line.find('`')
> +    if pos > -1:
> +        yield pos, "W604 backticks are deprecated, use 'repr()'"
> +
> +
> +##############################################################################
> +# Helper functions
> +##############################################################################
> +
> +
> +if sys.version_info < (3,):
> +    # Python 2: implicit encoding.
> + def readlines(filename): > + """Read the source code.""" > + with open(filename, 'rU') as f: > + return f.readlines() > + isidentifier = re.compile(r'[a-zA-Z_]\w*$').match > + stdin_get_value = sys.stdin.read > +else: > + # Python 3 > + def readlines(filename): > + """Read the source code.""" > + try: > + with open(filename, 'rb') as f: > + (coding, lines) = tokenize.detect_encoding(f.readline) > + f = TextIOWrapper(f, coding, line_buffering=True) > + return [l.decode(coding) for l in lines] + f.readlines() > + except (LookupError, SyntaxError, UnicodeError): > + # Fall back if file encoding is improperly declared > + with open(filename, encoding='latin-1') as f: > + return f.readlines() > + isidentifier = str.isidentifier > + > + def stdin_get_value(): > + return TextIOWrapper(sys.stdin.buffer, errors='ignore').read() > +noqa = re.compile(r'# no(?:qa|pep8)\b', re.I).search > + > + > +def expand_indent(line): > + r"""Return the amount of indentation. > + > + Tabs are expanded to the next multiple of 8. > + > + >>> expand_indent(' ') > + 4 > + >>> expand_indent('\t') > + 8 > + >>> expand_indent(' \t') > + 8 > + >>> expand_indent(' \t') > + 16 > + """ > + if '\t' not in line: > + return len(line) - len(line.lstrip()) > + result = 0 > + for char in line: > + if char == '\t': > + result = result // 8 * 8 + 8 > + elif char == ' ': > + result += 1 > + else: > + break > + return result > + > + > +def mute_string(text): > + """Replace contents with 'xxx' to prevent syntax matching. > + > + >>> mute_string('"abc"') > + '"xxx"' > + >>> mute_string("'''abc'''") > + "'''xxx'''" > + >>> mute_string("r'abc'") > + "r'xxx'" > + """ > + # String modifiers (e.g. 
u or r) > + start = text.index(text[-1]) + 1 > + end = len(text) - 1 > + # Triple quotes > + if text[-3:] in ('"""', "'''"): > + start += 2 > + end -= 2 > + return text[:start] + 'x' * (end - start) + text[end:] > + > + > +def parse_udiff(diff, patterns=None, parent='.'): > + """Return a dictionary of matching lines.""" > + # For each file of the diff, the entry key is the filename, > + # and the value is a set of row numbers to consider. > + rv = {} > + path = nrows = None > + for line in diff.splitlines(): > + if nrows: > + if line[:1] != '-': > + nrows -= 1 > + continue > + if line[:3] == '@@ ': > + hunk_match = HUNK_REGEX.match(line) > + (row, nrows) = [int(g or '1') for g in hunk_match.groups()] > + rv[path].update(range(row, row + nrows)) > + elif line[:3] == '+++': > + path = line[4:].split('\t', 1)[0] > + if path[:2] == 'b/': > + path = path[2:] > + rv[path] = set() > + return dict([(os.path.join(parent, path), rows) > + for (path, rows) in rv.items() > + if rows and filename_match(path, patterns)]) > + > + > +def normalize_paths(value, parent=os.curdir): > + """Parse a comma-separated list of paths. > + > + Return a list of absolute paths. > + """ > + if not value: > + return [] > + if isinstance(value, list): > + return value > + paths = [] > + for path in value.split(','): > + path = path.strip() > + if '/' in path: > + path = os.path.abspath(os.path.join(parent, path)) > + paths.append(path.rstrip('/')) > + return paths > + > + > +def filename_match(filename, patterns, default=True): > + """Check if patterns contains a pattern that matches filename. > + > + If patterns is unspecified, this always returns True. 
> + """ > + if not patterns: > + return default > + return any(fnmatch(filename, pattern) for pattern in patterns) > + > + > +def _is_eol_token(token): > + return token[0] in NEWLINE or token[4][token[3][1]:].lstrip() == > '\\\n' > +if COMMENT_WITH_NL: > + def _is_eol_token(token, _eol_token=_is_eol_token): > + return _eol_token(token) or (token[0] == tokenize.COMMENT and > + token[1] == token[4]) > + > > +############################################################################## > +# Framework to run all checks > > +############################################################################## > + > + > +_checks = {'physical_line': {}, 'logical_line': {}, 'tree': {}} > + > + > +def _get_parameters(function): > + if sys.version_info >= (3, 3): > + return [parameter.name > + for parameter > + in inspect.signature(function).parameters.values() > + if parameter.kind == parameter.POSITIONAL_OR_KEYWORD] > + else: > + return inspect.getargspec(function)[0] > + > + > +def register_check(check, codes=None): > + """Register a new check object.""" > + def _add_check(check, kind, codes, args): > + if check in _checks[kind]: > + _checks[kind][check][0].extend(codes or []) > + else: > + _checks[kind][check] = (codes or [''], args) > + if inspect.isfunction(check): > + args = _get_parameters(check) > + if args and args[0] in ('physical_line', 'logical_line'): > + if codes is None: > + codes = ERRORCODE_REGEX.findall(check.__doc__ or '') > + _add_check(check, args[0], codes, args) > + elif inspect.isclass(check): > + if _get_parameters(check.__init__)[:2] == ['self', 'tree']: > + _add_check(check, 'tree', codes, None) > + > + > +def init_checks_registry(): > + """Register all globally visible functions. > + > + The first argument name is either 'physical_line' or 'logical_line'. 
> + """ > + mod = inspect.getmodule(register_check) > + for (name, function) in inspect.getmembers(mod, inspect.isfunction): > + register_check(function) > +init_checks_registry() > + > + > +class Checker(object): > + """Load a Python source file, tokenize it, check coding style.""" > + > + def __init__(self, filename=None, lines=None, > + options=None, report=None, **kwargs): > + if options is None: > + options = StyleGuide(kwargs).options > + else: > + assert not kwargs > + self._io_error = None > + self._physical_checks = options.physical_checks > + self._logical_checks = options.logical_checks > + self._ast_checks = options.ast_checks > + self.max_line_length = options.max_line_length > + self.multiline = False # in a multiline string? > + self.hang_closing = options.hang_closing > + self.verbose = options.verbose > + self.filename = filename > + # Dictionary where a checker can store its custom state. > + self._checker_states = {} > + if filename is None: > + self.filename = 'stdin' > + self.lines = lines or [] > + elif filename == '-': > + self.filename = 'stdin' > + self.lines = stdin_get_value().splitlines(True) > + elif lines is None: > + try: > + self.lines = readlines(filename) > + except IOError: > + (exc_type, exc) = sys.exc_info()[:2] > + self._io_error = '%s: %s' % (exc_type.__name__, exc) > + self.lines = [] > + else: > + self.lines = lines > + if self.lines: > + ord0 = ord(self.lines[0][0]) > + if ord0 in (0xef, 0xfeff): # Strip the UTF-8 BOM > + if ord0 == 0xfeff: > + self.lines[0] = self.lines[0][1:] > + elif self.lines[0][:3] == '\xef\xbb\xbf': > + self.lines[0] = self.lines[0][3:] > + self.report = report or options.report > + self.report_error = self.report.error > + > + def report_invalid_syntax(self): > + """Check if the syntax is valid.""" > + (exc_type, exc) = sys.exc_info()[:2] > + if len(exc.args) > 1: > + offset = exc.args[1] > + if len(offset) > 2: > + offset = offset[1:3] > + else: > + offset = (1, 0) > + self.report_error(offset[0], 
offset[1] or 0, > + 'E901 %s: %s' % (exc_type.__name__, > exc.args[0]), > + self.report_invalid_syntax) > + > + def readline(self): > + """Get the next line from the input buffer.""" > + if self.line_number >= self.total_lines: > + return '' > + line = self.lines[self.line_number] > + self.line_number += 1 > + if self.indent_char is None and line[:1] in WHITESPACE: > + self.indent_char = line[0] > + return line > + > + def run_check(self, check, argument_names): > + """Run a check plugin.""" > + arguments = [] > + for name in argument_names: > + arguments.append(getattr(self, name)) > + return check(*arguments) > + > + def init_checker_state(self, name, argument_names): > + """ Prepares a custom state for the specific checker plugin.""" > + if 'checker_state' in argument_names: > + self.checker_state = self._checker_states.setdefault(name, {}) > + > + def check_physical(self, line): > + """Run all physical checks on a raw input line.""" > + self.physical_line = line > + for name, check, argument_names in self._physical_checks: > + self.init_checker_state(name, argument_names) > + result = self.run_check(check, argument_names) > + if result is not None: > + (offset, text) = result > + self.report_error(self.line_number, offset, text, check) > + if text[:4] == 'E101': > + self.indent_char = line[0] > + > + def build_tokens_line(self): > + """Build a logical line from tokens.""" > + logical = [] > + comments = [] > + length = 0 > + prev_row = prev_col = mapping = None > + for token_type, text, start, end, line in self.tokens: > + if token_type in SKIP_TOKENS: > + continue > + if not mapping: > + mapping = [(0, start)] > + if token_type == tokenize.COMMENT: > + comments.append(text) > + continue > + if token_type == tokenize.STRING: > + text = mute_string(text) > + if prev_row: > + (start_row, start_col) = start > + if prev_row != start_row: # different row > + prev_text = self.lines[prev_row - 1][prev_col - 1] > + if prev_text == ',' or (prev_text not in '{[(' and > + 
text not in '}])'): > + text = ' ' + text > + elif prev_col != start_col: # different column > + text = line[prev_col:start_col] + text > + logical.append(text) > + length += len(text) > + mapping.append((length, end)) > + (prev_row, prev_col) = end > + self.logical_line = ''.join(logical) > + self.noqa = comments and noqa(''.join(comments)) > + return mapping > + > + def check_logical(self): > + """Build a line from tokens and run all logical checks on it.""" > + self.report.increment_logical_line() > + mapping = self.build_tokens_line() > + > + if not mapping: > + return > + > + (start_row, start_col) = mapping[0][1] > + start_line = self.lines[start_row - 1] > + self.indent_level = expand_indent(start_line[:start_col]) > + if self.blank_before < self.blank_lines: > + self.blank_before = self.blank_lines > + if self.verbose >= 2: > + print(self.logical_line[:80].rstrip()) > + for name, check, argument_names in self._logical_checks: > + if self.verbose >= 4: > + print(' ' + name) > + self.init_checker_state(name, argument_names) > + for offset, text in self.run_check(check, argument_names) or > (): > + if not isinstance(offset, tuple): > + for token_offset, pos in mapping: > + if offset <= token_offset: > + break > + offset = (pos[0], pos[1] + offset - token_offset) > + self.report_error(offset[0], offset[1], text, check) > + if self.logical_line: > + self.previous_indent_level = self.indent_level > + self.previous_logical = self.logical_line > + self.blank_lines = 0 > + self.tokens = [] > + > + def check_ast(self): > + """Build the file's AST and run all AST checks.""" > + try: > + tree = compile(''.join(self.lines), '', 'exec', PyCF_ONLY_AST) > + except (ValueError, SyntaxError, TypeError): > + return self.report_invalid_syntax() > + for name, cls, __ in self._a -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nad at python.org Fri Apr 1 16:16:28 2016 From: nad at python.org (Ned Deily) Date: Fri, 1 Apr 2016 16:16:28 -0400 Subject: [Python-Dev] [Python-checkins] cpython: Python 8: no pep8, no chocolate! In-Reply-To: References: <20160331214027.11092.50943.0083A2D0@psf.io> Message-ID: <7B294850-BFD9-48D3-8E12-0FA3312136BB@python.org> On Apr 1, 2016, at 14:07, Brett Cannon wrote: > Are you planning on removing this after today? My worry about leaving it in is if it's a modified copy that follows your Python 8 April Fools joke then it will quite possibly trip people up who try and run pep8 but don't have it installed, leading them to wonder why the heck their imports are now all flagged as broken. > > On Thu, 31 Mar 2016 at 14:40 victor.stinner wrote: > https://hg.python.org/cpython/rev/9aedec2dbc01 > changeset: 100818:9aedec2dbc01 > user: Victor Stinner > date: Thu Mar 31 23:30:53 2016 +0200 > summary: > Python 8: no pep8, no chocolate! > > files: > Include/patchlevel.h | 6 +- > Lib/pep8.py | 2151 ++++++++++++++++++++++++++++++ > Lib/site.py | 56 + > 3 files changed, 2210 insertions(+), 3 deletions(-) [...] It has already been removed, a few hours after it was pushed, since it broke all of the 3.x buildbots, and would have confused and/or added extra work to anyone trying to build or push changes. On behalf of my fellow release managers, may I suggest that, in the future, if anyone feels the urge to check something like this in to the live cpython repository, please resist that urge? :) A patch would be just as amusing without the need to use the soft cushion or the comfy chair. Inquisitorly yours, --Ned -- Ned Deily nad at python.org -- [] From greg.ewing at canterbury.ac.nz Fri Apr 1 20:16:06 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 02 Apr 2016 13:16:06 +1300 Subject: [Python-Dev] Which version is better? Phyton 27 or Phyton 35?
In-Reply-To: References: Message-ID: <56FF0F46.2010603@canterbury.ac.nz> Chris Angelico wrote: > In this era of international foods in every supermarket aisle, > you cannot simply dismiss the black marks as "funny spots" and wish > they'd just go away; you MUST have a fungicide which can adequately > handle them. At least there's a standard for the spots now. It used to be a real mess -- Japanese plants had yellow spots, Chinese ones had red spots, all the European countries had their own slightly different variations on the spots, and you had to keep a dozen different fungicides in your shed for treating them. But now, fortunately, more and more growers are producing plants with the standard spots, and Phyton 35 is widely acknowledged as being one of the best fungicides for dealing with them. (Except for one person who seems to have his own inscrutable ideas on what should be done with spots.) -- Greg From victor.stinner at gmail.com Sat Apr 2 03:53:50 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 2 Apr 2016 09:53:50 +0200 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: <20160330133041.72218B14159@webabinitio.net> Message-ID: Any progress on the issue? Victor Le jeudi 31 mars 2016, Martin Panter a écrit : > On 30 March 2016 at 13:30, R. David Murray > wrote: > > Anyone know how to find out what changed from Google's POV? As far as > > we know nothing changed at the bugs end, but it is certainly possible > > that something did change in the hosting infrastructure without our > > knowledge. Knowing what is setting google off would help track it down, > > if so...or perhaps something changed at the google end, in which case we > > *really* need to know what. > > My only guess is that Google decided to get stricter regarding > something mentioned in > , maybe > something in its sending guidelines. Perhaps to do with IPv6 DNS > .
> > FYI I am now working around the problem for myself by pointing my > bugs.python.org account at a Yahoo email address, and setting up Yahoo > to forward all emails to my G Mail address. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Sat Apr 2 11:42:57 2016 From: brett at python.org (Brett Cannon) Date: Sat, 02 Apr 2016 15:42:57 +0000 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: <20160330133041.72218B14159@webabinitio.net> Message-ID: This is probably the wrong place to be posting as there's an issue tracker for the issue tracker. Anyways this might be a solution: http://psf.upfronthosting.co.za/roundup/meta/issue568 On Sat, Apr 2, 2016, 00:54 Victor Stinner wrote: > Any progress on the issue? > > Victor > > > Le jeudi 31 mars 2016, Martin Panter a écrit : > >> On 30 March 2016 at 13:30, R. David Murray wrote: >> > Anyone know how to find out what changed from Google's POV? As far as >> > we know nothing changed at the bugs end, but it is certainly possible >> > that something did change in the hosting infrastructure without our >> > knowledge. Knowing what is setting google off would help track it down, >> > if so...or perhaps something changed at the google end, in which case we >> > *really* need to know what. >> >> My only guess is that Google decided to get stricter regarding >> something mentioned in >> , maybe >> something in its sending guidelines. Perhaps to do with IPv6 DNS >> . >> >> FYI I am now working around the problem for myself by pointing my >> bugs.python.org account at a Yahoo email address, and setting up Yahoo >> to forward all emails to my G Mail address.
>> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sun Apr 3 03:32:19 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 3 Apr 2016 10:32:19 +0300 Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF Message-ID: Originally I proposed a pair of macros for safe reference replacing to reflects the duality of Py_DECREF/Py_XDECREF. [1], [2] The one should use Py_DECREF and the other should use Py_XDECREF. But then I got a number of voices for the single name [3], and no one voice (except mine) for the pair of names. Thus in final patches the single name Py_SETREF that uses Py_XDECREF is used. Due to adding some overhead in comparison with using Py_DECREF, this macros is not used in critical performance code such as PyDict_SetItem(). Now Raymond says that we should have separate Py_SETREF/Py_XSETREF names to avoid any overhead. [4] And so I'm raising this issue on Python-Dev. Should we rename Py_SETREF to Py_XSETREF and introduce new Py_SETREF that uses Py_DECREF? [1] http://comments.gmane.org/gmane.comp.python.devel/145346 [2] http://comments.gmane.org/gmane.comp.python.devel/145974 [3] http://bugs.python.org/issue26200#msg259784 [4] http://bugs.python.org/issue26200 From python at mrabarnett.plus.com Sun Apr 3 09:29:31 2016 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 3 Apr 2016 14:29:31 +0100 Subject: [Python-Dev] Py_SETREF vs. 
Py_XSETREF In-Reply-To: References: Message-ID: <57011ABB.8070509@mrabarnett.plus.com> On 2016-04-03 08:32, Serhiy Storchaka wrote: > Originally I proposed a pair of macros for safe reference replacing to > reflects the duality of Py_DECREF/Py_XDECREF. [1], [2] The one should > use Py_DECREF and the other should use Py_XDECREF. > > But then I got a number of voices for the single name [3], and no one > voice (except mine) for the pair of names. Thus in final patches the > single name Py_SETREF that uses Py_XDECREF is used. Due to adding some > overhead in comparison with using Py_DECREF, this macros is not used in > critical performance code such as PyDict_SetItem(). > > Now Raymond says that we should have separate Py_SETREF/Py_XSETREF names > to avoid any overhead. [4] And so I'm raising this issue on Python-Dev. > > Should we rename Py_SETREF to Py_XSETREF and introduce new Py_SETREF > that uses Py_DECREF? > > [1] http://comments.gmane.org/gmane.comp.python.devel/145346 > [2] http://comments.gmane.org/gmane.comp.python.devel/145974 > [3] http://bugs.python.org/issue26200#msg259784 > [4] http://bugs.python.org/issue26200 > Checking for NULL is convenient (and safer), but, on the other hand, it _would_ be consistent with the others. From arigo at tunes.org Sun Apr 3 10:00:39 2016 From: arigo at tunes.org (Armin Rigo) Date: Sun, 3 Apr 2016 16:00:39 +0200 Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF In-Reply-To: <57011ABB.8070509@mrabarnett.plus.com> References: <57011ABB.8070509@mrabarnett.plus.com> Message-ID: Hi, On 3 April 2016 at 15:29, MRAB wrote: >> Should we rename Py_SETREF to Py_XSETREF and introduce new Py_SETREF >> that uses Py_DECREF? > > Checking for NULL is convenient (and safer), but, on the other hand, it > _would_ be consistent with the others. My 2 cents would be to call the new macro Py_XSETREF for consistency, at least, whether you decide to go with two macros or not. 
Otherwise it's kind of obvious that if you add Py_SETREF that checks for nulls, in 2 or 3 releases people will really want a "fast" variant anyway, and there will be no consistent name for that. A bientôt, Armin. From jeog.dev at gmail.com Sun Apr 3 14:31:12 2016 From: jeog.dev at gmail.com (J.E. Ogden) Date: Sun, 3 Apr 2016 14:31:12 -0400 Subject: [Python-Dev] review/proof docs about private memory heap Message-ID: After digging through obmalloc.c to optimize some memory intensive code, I put a paper together on the entire private memory heap that may or may not be a useful addition to docs. I was hoping someone could review/proof it for errors in content. Not sure the policy on links but I've uploaded it to google drive: https://drive.google.com/open?id=0B6IkX5KnPHVLamwxSTNYR3dJYkE thanks, jon -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Apr 4 05:09:56 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 4 Apr 2016 19:09:56 +1000 Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF In-Reply-To: References: Message-ID: On 3 April 2016 at 17:32, Serhiy Storchaka wrote: > Originally I proposed a pair of macros for safe reference replacing to > reflects the duality of Py_DECREF/Py_XDECREF. [1], [2] The one should use > Py_DECREF and the other should use Py_XDECREF. > > But then I got a number of voices for the single name [3], and no one voice > (except mine) for the pair of names. Thus in final patches the single name > Py_SETREF that uses Py_XDECREF is used. Due to adding some overhead in > comparison with using Py_DECREF, this macros is not used in critical > performance code such as PyDict_SetItem().
I was one of those arguing for the single macro, and I think Alexander raises a good point in http://bugs.python.org/issue26200#msg262204 that I don't recall seeing in the original discussion: the "X" in the macro serves as a good shorthand for indicating that the code in question isn't closely tracking whether or not the manipulated reference might be NULL, and hence may be a good candidate for additional micro-optimisations that keep better track of whether or not the pointer is NULL. > Should we rename Py_SETREF to Py_XSETREF and introduce new Py_SETREF that > uses Py_DECREF? With the single-macro design put into effect and concrete problems arising from that, I'm now more persuaded by the consistency argument than I was originally, so +1 from me for reverting to your original dual-macro proposal. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Mon Apr 4 05:35:41 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 4 Apr 2016 11:35:41 +0200 Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF In-Reply-To: References: Message-ID: If some devs don't want to use the single macro for good or bad reasons, it's maybe better to have two macros to generalize their usage. The macro makes the C code shorter and easier to review. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertc at robertcollins.net Mon Apr 4 06:04:58 2016 From: robertc at robertcollins.net (Robert Collins) Date: Mon, 4 Apr 2016 22:04:58 +1200 Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7? Message-ID: I'm working on teaching funcsigs - the backport of inspect.signature - better handling for wrapped functions, and the key enabler to do that is capturing the wrapped function in __wrapped__. I'm wondering what folks' thoughts are on backporting that to 2.7 - seems cleaner than monkeypatching functools.wraps, which would tend to be subject to import ordering races and general ick.
I'll likely prep such a monkeypatch for folk that are stuck on older versions of 2.7 anyhow... so its not a huge win... -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud From ncoghlan at gmail.com Mon Apr 4 09:24:25 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 4 Apr 2016 23:24:25 +1000 Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7? In-Reply-To: References: Message-ID: On 4 April 2016 at 20:04, Robert Collins wrote: > I'm working on teaching funcsigs - the backport of inspect.signature - > better handling for wrapped functions, and the key enabler to do that > is capturing the wrapped function in __wrapped__. I'm wondering what > folks thoughts are on backporting that to 2.7 - seems cleaner than > monkeypatching functools.wraps, which would tend to be subject to > import ordering races and general ick. I'll likely prep such a > monkeypatch for folk that are stuck on older versions of 2.7 anyhow... > so its not a huge win... Right, the baseline there is really 2.7.5 + selected backports, and the backport set is small for RHEL 7.x, and even smaller for Debian stable and Ubuntu LTS. Even getting the network security enhancements backported has proven to be challenging - other feature updates have next to no chance. Given that, I don't see a compelling reason to change the existing policy - the "no new features in point releases" restriction only gets waived in cases that have implications beyond the Python 2.7 process itself (which pretty much restricts potential waivers to network security enhancements). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Mon Apr 4 11:47:01 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 4 Apr 2016 08:47:01 -0700 Subject: [Python-Dev] Py_SETREF vs. Py_XSETREF In-Reply-To: References: Message-ID: Agreed, let's go with two macros. The time discussing this further could be spent more productively. 
On Mon, Apr 4, 2016 at 2:35 AM, Victor Stinner wrote: > If some dev don't want to use the single macro for good or bad reasons, it's > maybe better to have two macros to generalize their usage. The macro makes > to C code shorter and easier to review. > > Victor > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Mon Apr 4 17:05:23 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 4 Apr 2016 17:05:23 -0400 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: Message-ID: On 3/29/2016 7:30 PM, Martin Panter wrote: > For the last ~36 hours I have stopped receiving emails for messages > posted in the bug tracker. Is anyone else having this problem? Has > anything changed recently? My udel dot edu account is handled by google. I am also not getting anything at all, not even in spam, since at least 3/31 when I was added to https://bugs.python.org/issue26673 I only discovered it in the Friday weekly New Issues report. More emails were missing on Friday. The problem continues. I just added a question to https://bugs.python.org/issue19944 and got nothing. > I have had it set to send to my gmail.com address since the beginning. > At the moment the last bug message email is > with "Date: Mon, 28 Mar > 2016 12:19:49 +0000". I have checked spam and they are not going > there. Since at least last summer, Rietveld reviews have consistently gone to Junk. Normal tracker emails sometimes went to Inbox, sometimes to Junk. Since normal emails (but not reviews, unfortunately) are tagged in the subject line, I added a rule to Thunderbird to move tracker email to Inbox when I open Junk. This is no longer happening as they do not even get to Junk.
I tried changing my tracker email to verizon.net and posted a message on an issue where I am the only nosy person. After half an hour, nothing. I am not surprised as Verizon rarely delivers anything it considers junk. I had this confirmed by a game site that said that its emails are deleted unless one contacts Verizon to whitelist their site. I will see if I can again find the page to do that. I do get checkins and core-mentorship mail. I have not seen anything on core-developers since the discussion of new commit privileges a month ago. -- Terry Jan Reedy From brett at python.org Mon Apr 4 17:13:15 2016 From: brett at python.org (Brett Cannon) Date: Mon, 04 Apr 2016 21:13:15 +0000 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: Message-ID: On Mon, 4 Apr 2016 at 14:05 Terry Reedy wrote: > On 3/29/2016 7:30 PM, Martin Panter wrote: > > For the last ~36 hours I have stopped receiving emails for messages > > posted in the bug tracker. Is anyone else having this problem? Has > > anything changed recently? > > My udel dot edu account is handled by google. I am also not getting > anything at all, not even in spam, since at least 3/31 when I was added > to https://bugs.python.org/issue26673 I only discovered it in the > Friday weekly New Issues report. More emails were missing on Friday. > The problem continues. I just added a question to > https://bugs.python.org/issue19944 and got nothing. > I have reached out to Upfront -- our Roundup host -- to see if the fix proposed in http://psf.upfronthosting.co.za/roundup/meta/issue568 will solve the issue to make sure this gets resolved. When I know something I will post here. > > > I have had it set to send to my gmail.com address since the beginning. > > At the moment the last bug message email is > > with "Date: Mon, 28 Mar > > 2016 12:19:49 +0000". I have checked spam and they are not going > > there. > > Since at least last summer, Rietveld reviews have consistently gone to > Junk.
Normal tracker emails sometimes went to Inbox, sometimes to Junk. > Since normal emails (but not reviews, unfortunately) are tagged in the > subject line, I added a rule to Thunderbird to move tracker email to > Inbox when I open Junk. This is no longer happening at they do not even > get to Junk. > > I tried changing my tracker email to verizon.net and posted a message on > on issue where I am the only nosy person. After half an hour, nothing. > I am not surprised as Verizon rarely delivers anything it considers > junk. I had this confirmed by a game site that said that its emails are > deleted unless one contacts Verizon to whitelist their site. I will see > if I can again find the page to do that. > > I do get checkins and core-mentorship mail. I have not seen anything on > core-developers since the discussion of new commits privileges a month ago. > Do you mean python-committers? I don't know of any core-developers mailing list. If you do mean python-committers just let me know and I will see what address you're subscribed under. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stefan.Richthofer at gmx.de Mon Apr 4 23:38:51 2016 From: Stefan.Richthofer at gmx.de (Stefan Richthofer) Date: Tue, 5 Apr 2016 05:38:51 +0200 Subject: [Python-Dev] Help/advice needed with JyNI issue #4 (Tkinter on OSX) In-Reply-To: References: Message-ID: Hey everybody, I need help/advice for this JyNI-related issue: https://github.com/Stewori/JyNI/issues/4 Especially I need advice from someone familiar with TCL and TK internals, preferably also Tkinter. The issue is rather strange in the sense that it works well on Linux, while the program hangs on OSX. Everything we found out so far was collected in the thread linked above. Briefly speaking, on OSX TCL/TK does not produce a particular event the loop is waiting for and does not display the window. 
However, logging suggests that calls to the TCL/TK API are identical between Linux and OSX runs, so we are really stuck here in finding out what is different on Linux (our current logging does not cover function argument values though). Any advice on how I can debug interaction with TCL/TK to find the reason for the missing event would be helpful. (Sorry if you might regard this off-topic for Python-dev; since JyNI is somewhat a crossover-project (also containing lots of CPython 2.7 code) I am asking in various locations. Starting here, because in this list I see best chances to find someone who can help within the Python ecosystem. Next I would look for a TCL/TK forum or something.) Thanks! Stefan From victor.stinner at gmail.com Tue Apr 5 04:10:45 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 5 Apr 2016 10:10:45 +0200 Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7? In-Reply-To: References: Message-ID: See https://pypi.python.org/pypi/functools32 for the functools backport for Python 2.7. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertc at robertcollins.net Tue Apr 5 04:20:54 2016 From: robertc at robertcollins.net (Robert Collins) Date: Tue, 5 Apr 2016 20:20:54 +1200 Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7? In-Reply-To: References: Message-ID: Sadly that has the ordering bug of assigning __wrapped__ first and appears a little unmaintained based on the bug tracker :( On 5 Apr 2016 8:10 PM, "Victor Stinner" wrote: > See https://pypi.python.org/pypi/functools32 for the functools backport > for Python 2.7. > > Victor > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Tue Apr 5 11:46:24 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 5 Apr 2016 08:46:24 -0700 Subject: [Python-Dev] Help/advice needed with JyNI issue #4 (Tkinter on OSX) In-Reply-To: References: Message-ID: Since this seems tcl/tk related your best bet is the tkinter mailing list: https://mail.python.org/mailman/listinfo/tkinter-discuss On Mon, Apr 4, 2016 at 8:38 PM, Stefan Richthofer wrote: > Hey everybody, > > I need help/advice for this JyNI-related issue: https://github.com/Stewori/JyNI/issues/4 > Especially I need advice from someone familiar with TCL and TK internals, preferably also Tkinter. > The issue is rather strange in the sense that it works well on Linux, while the program hangs on OSX. Everything we found out so far was collected in the thread linked above. Briefly speaking, on OSX TCL/TK does not produce a particular event the loop is waiting for and does not display the window. However logging suggests that calls to TCL/TK API are identical between Linux and OSX runs, so we are really stuck here in finding out what is different on Linux (our current logging does not cover function argument values though). > Any advise how I can debug interaction with TCL/TK to find the reason for the missing event would be helpful. > > (Sorry if you might regard this off-topic for Python-dev; since JyNI is somewhat a crossover-project (also containing lots of CPython 2.7 code) I am asking in various locations. Starting here, because in this list I see best chances to find someone who can help within the Python ecosystem. Next I would look for a TCL/TK forum or something.) > > Thanks! 
> > Stefan > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From brett at python.org Tue Apr 5 12:36:59 2016 From: brett at python.org (Brett Cannon) Date: Tue, 05 Apr 2016 16:36:59 +0000 Subject: [Python-Dev] Anyone want to lead the sprints at PyCon US 2016? In-Reply-To: References: Message-ID: The call has started to go out for sprint groups to list themselves online. Anyone want to specifically lead the core sprint this year? If no one specifically does then I will sign us up and do my usual thing of pointing people at the devguide and encouraging people to ask questions but not do a lot of hand-holding (I'm expecting to be busy either working on GitHub migration stuff or doing other things that I have been neglecting due to my GitHub migration work). ---------- Forwarded message --------- From: Ewa Jodlowska Date: Mon, 4 Apr 2016 at 07:14 Subject: [PSF-Community] Sprinting at PyCon US 2016 To: Are you coming to PyCon US? Have you thought about sprinting? The coding Sprints are the hidden gem of PyCon, up to 4 days (June 2-5) of coding with many Python projects and their maintainers. And if you're coming to PyCon, taking part in the Sprints is easy! You don't need to change your registration* to join the Sprints. There's no additional registration fee, and you even get lunch. You do need to cover the additional lodging and other meals, but that's it. If you've booked a room through the PyCon registration system, you'll need to contact the registration team at pycon2016 at cteusa.com as soon as possible to request the extra nights. The sprinting itself (along with lunch every day) is free, so your only expenses are your room and other meals.
If you're interested in what projects will be sprinting, just keep an eye on the sprints page on the PyCon web site at https://us.pycon.org/2016/community/sprints/ Be sure to check back, as groups are being added all the time. If you haven't sprinted before, or if you just need to brush up on sprinting tools and techniques, there will again be an 'Intro to Sprinting' session the evening of June 1, led by Shauna Gordon-McKeon and other members of the Python community. To grab a free ticket for this session, just visit https://www.eventbrite.com/e/introduction-to-open-source-the-pycon-sprints-tickets-22435151141 . *Please note that conference registration is sold out, but you do not need a conference registration to come to the Sprints. _______________________________________________ PSF-Community mailing list PSF-Community at python.org https://mail.python.org/mailman/listinfo/psf-community -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmurray at bitdance.com Tue Apr 5 15:56:30 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 05 Apr 2016 15:56:30 -0400 Subject: [Python-Dev] bugs.python.org email blockage at gmail In-Reply-To: References: Message-ID: <20160405195631.B2F35B14156@webabinitio.net> We think we have a partial (and hopefully temporary) solution to the bugs email blockage: ipv6 has been turned off on bugs, so it is sending only from the ipv4 address. Google appears to be accepting the emails again. However, the IPV4 address has a poor reputation, and Verizon at least appears to be blocking it. So more work is still needed. --David From brett at python.org Tue Apr 5 18:41:14 2016 From: brett at python.org (Brett Cannon) Date: Tue, 05 Apr 2016 22:41:14 +0000 Subject: [Python-Dev] When should pathlib stop being provisional?
Message-ID: After a rather extensive discussion on python-ideas about pathlib.PurePath not inheriting from str, another point that came up was that the use of pathlib has been rather light. Unfortunately even the stdlib doesn't really use pathlib because it's currently marked as provisional (or at least that's why I haven't tried to use it where possible in importlib). Do we have a plan of what is required to remove the provisional label from pathlib? -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Apr 5 18:55:28 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 5 Apr 2016 15:55:28 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: Message-ID: It's been provisional since 3.4. I think if it is still there in 3.6.0 it should be considered no longer provisional. But this may indeed be a test case for the ultimate fate of provisional modules -- should we remove it? I have to admit I got tired of the discussions and muted them all. Personally I am not worried about the light use (I always expected it would take a long time to get adoption) but I am worried about the hostility towards the module. My last/only comment in the discussion was about there possibly being a dichotomy between people who use Python for scripting and those who use it to write more substantial programs (I'm trying not to judge one group more important than another -- I'm just observing there seem to be these two groups). But I didn't stick around long enough to watch for responses to this idea. Would making it inherit from str cause most hostility to disappear? I'm sure there was a discussion about this when PEP 428 was originally proposed, and I recall I was strongly in the camp of "it should not inherit from str", but unfortunately the PEP has no mention of this discussion or even the stated reason. 
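As an illustrative aside (a sketch, not something posted in the thread): the behaviour under discussion is that a `pathlib` path is not a `str`, so string-only code needs an explicit conversion. The path `/tmp/example.txt` is a made-up value used only for demonstration.

```python
from pathlib import PurePosixPath

# Hypothetical example path; nothing here touches the file system.
p = PurePosixPath("/tmp") / "example.txt"

# PurePath does not subclass str, so isinstance checks in
# string-only code will not recognise it.
is_string = isinstance(p, str)

# An explicit conversion yields the plain text form.
as_text = str(p)

print(is_string)  # False
print(as_text)    # /tmp/example.txt
```

Making `PurePath` inherit from `str` would flip the first result to `True`, which is the convenience some participants want and the ambiguity others want to avoid.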
--Guido On Tue, Apr 5, 2016 at 3:41 PM, Brett Cannon wrote: > After a rather extensive discussion on python-ideas about pathlib.PurePath > not inheriting from str, another point that came up was that the use of > pathlib has been rather light. Unfortunately even the stdlib doesn't really > use pathlib because it's currently marked as provisional (or at least that's > why I haven't tried to use it where possible in importlib). > > Do we have a plan of what is required to remove the provisional label from > pathlib? > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From antoine at python.org Tue Apr 5 18:45:23 2016 From: antoine at python.org (Antoine Pitrou) Date: Wed, 6 Apr 2016 00:45:23 +0200 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: Message-ID: <57044003.3050400@python.org> I think the provisional status can be safely lifted now. Even though pathlib hasn't seen that much use, there have been enough reports and discussion since its acceptance that I think the API has proven it's sane for general use. (As for importlib, pathlib might have too many dependencies for sane bootstrapping.) Regards Antoine. On 06/04/2016 00:41, Brett Cannon wrote: > After a rather extensive discussion on python-ideas about > pathlib.PurePath not inheriting from str, another point that came up was > that the use of pathlib has been rather light. Unfortunately even the > stdlib doesn't really use pathlib because it's currently marked as > provisional (or at least that's why I haven't tried to use it where > possible in importlib). > > Do we have a plan of what is required to remove the provisional label > from pathlib?
From tritium-list at sdamon.com Tue Apr 5 19:08:23 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Tue, 5 Apr 2016 19:08:23 -0400 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: Message-ID: <57044567.6070308@sdamon.com> On 4/5/2016 18:55, Guido van Rossum wrote: > My last/only comment in the discussion > was about there possibly being a dichotomy between people who use > Python for scripting and those who use it to write more substantial > programs (I'm trying not to judge one group more important than > another -- I'm just observing there seem to be these two groups). But > I didn't stick around long enough to watch for responses to this idea. This was all but ignored. The opinions mentioned in the thread, without throwing my opinion behind any of them, were: * pathlib should be improved (specifically by making it inherit from str) * the stdlib should be made to deal with pathlib without changing pathlib * pathlib is redundant to third party modules which work better * the continued existence of pathlib was briefly discussed You can insert the never-ending arguments for and against each of those points in your head - none of them were particularly convincing (in that I don't think anyone changed their position.) The split between utility scripting and application development was not really discussed. From rosuav at gmail.com Tue Apr 5 19:13:24 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 6 Apr 2016 09:13:24 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <57044567.6070308@sdamon.com> References: <57044567.6070308@sdamon.com> Message-ID: On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters wrote: > * pathlib should be improved (specifically by making it inherit from str) I'd like to see this specific change settled on in the PEP, actually.
There are some arguments on both sides, and some hybrid solutions being proposed, and it looks to be an important enough issue to people for there to be an answer somewhere. It seems to come down to a sloppiness vs strictness concern, I think, but I'm not sure. ChrisA From guido at python.org Tue Apr 5 19:45:50 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 5 Apr 2016 16:45:50 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On Tue, Apr 5, 2016 at 4:13 PM, Chris Angelico wrote: > On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters > wrote: >> * pathlib should be improved (specifically by making it inherit from str) > > I'd like to see this specific change settled on in the PEP, actually. > There are some arguments on both sides, and some hybrid solutions > being proposed, and it looks to be an important enough issue to people > for there to be an answer somewhere. It seems to come down to a > sloppiness vs strictness concern, I think, but I'm not sure. This does sound like it's the crucial issue, and it is worth writing up clearly the pros and cons. Let's draft those lists in a thread (this one's fine) and then add them to the PEP. We can then decide to: - keep the status quo - change PurePath to inherit from str - decide it's never going to be settled and kill pathlib.py (And yes, I'm dead serious about the latter, rather Solomonic option.) -- --Guido van Rossum (python.org/~guido) From brett at python.org Tue Apr 5 19:47:32 2016 From: brett at python.org (Brett Cannon) Date: Tue, 05 Apr 2016 23:47:32 +0000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: Message-ID: On Tue, 5 Apr 2016 at 15:55 Guido van Rossum wrote: > It's been provisional since 3.4. I think if it is still there in 3.6.0 > it should be considered no longer provisional. 
But this may indeed be > a test case for the ultimate fate of provisional modules -- should we > remove it? > > I have to admit I got tired of the discussions and muted them all. > :) I figured. I was close myself until I decided to be the "not inheriting from str is a sane decision" camp because people weren't understanding where the design decision probably came from, hence http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str . > Personally I am not worried about the light use (I always expected it > would take a long time to get adoption) Ditto. My expectation/hope is that once we stop having it be provisional and we start using it in the stdlib then usage will pick up, especially if libraries pick up the `getattr(path, 'path', path)` idiom as an easy transition technique until they decide to drop support for str-based paths. The main motivation of this email is actually to have newcomers to the sprints at PyCon US sprint on adding support for pathlib (after we add "path-like object" to the glossary to say something like "a `str` object or an object that has a `path` attribute that itself is a `str`"). > but I am worried about the > hostility towards the module. My last/only comment in the discussion > was about there possibly being a dichotomy between people who use > Python for scripting and those who use it to write more substantial > programs (I'm trying not to judge one group more important than > another -- I'm just observing there seem to be these two groups). But > I didn't stick around long enough to watch for responses to this idea. > Nope, no response (as Alexander pointed out). > > Would making it inherit from str cause most hostility to disappear? > Probably. Most people were upset with pathlib because they couldn't use it immediately with all of the third-party libraries out there on top of the stdlib because adoption has been so low. 
Now if we make a concerted effort to accept pathlib in the stdlib then this may be the kick in the pants that it takes to start getting people to accept it externally and the transition band-aid of inheriting from str may not be needed. To me it seems to basically be a question of whether people can be patient during a transition and embrace pathlib over time or if they will simply refuse to add support in libraries and refuse to use `getattr(path, 'path', path)` or `str(path)` in the mean time. Personally, if we can wait out the Python 3 transition I have no issue waiting on a transition like this that has no backward-compatibility issues and has a one-liner solution for adding shallow support (and thus is ripe for quick patches to projects). After the whole str thing the only other major topic was coming up with some easier way to produce pathlib.Path instances (e.g. the p-string suggestion). Nothing really came of those discussions that seemed concrete and reach consensus, though (I think that may have been where your scripting/substantial programming comment came from). > I'm sure there was a discussion about this when PEP 428 was originally > proposed, and I recall I was strongly in the camp of "it should not > inherit from str", but unfortunately the PEP has no mention of this > discussion or even the stated reason. > https://www.python.org/dev/peps/pep-0428/#no-confusion-with-builtins is the best you get in the PEP. -Brett > > --Guido > > > On Tue, Apr 5, 2016 at 3:41 PM, Brett Cannon wrote: > > After a rather extensive discussion on python-ideas about > pathlib.PurePath > > not inheriting from str, another point that came up was that the use of > > pathlib has been rather light. Unfortunately even the stdlib doesn't > really > > use pathlib because it's currently marked as provisional (or at least > that's > > why I haven't tried to use it where possible in importlib). 
> > > > Do we have a plan of what is required to remove the provisional label > from > > pathlib? > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Apr 5 20:02:30 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 6 Apr 2016 10:02:30 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On Wed, Apr 6, 2016 at 9:45 AM, Guido van Rossum wrote: > On Tue, Apr 5, 2016 at 4:13 PM, Chris Angelico wrote: >> On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters >> wrote: >>> * pathlib should be improved (specifically by making it inherit from str) >> >> I'd like to see this specific change settled on in the PEP, actually. >> There are some arguments on both sides, and some hybrid solutions >> being proposed, and it looks to be an important enough issue to people >> for there to be an answer somewhere. It seems to come down to a >> sloppiness vs strictness concern, I think, but I'm not sure. > > This does sound like it's the crucial issue, and it is worth writing > up clearly the pros and cons. Let's draft those lists in a thread > (this one's fine) and then add them to the PEP. We can then decide to: > > - keep the status quo > - change PurePath to inherit from str > - decide it's never going to be settled and kill pathlib.py > > (And yes, I'm dead serious about the latter, rather Solomonic option.) Summarizing from memory to get things started. Inheriting from str makes it easier for code to support pathlib without really caring about the details. 
NOT inheriting from str forces code to be aware that it's working with a path, in the same way that text and bytes are fundamentally different things, and the Unicode string doesn't inherit from the byte string, nor vice versa. If a few crucial built-in functions support Path objects (notably open() and a handful of os.* functions), the bulk of stdlib support will be easy (sometimes trivial) to implement. Paths are [or are not] fundamentally different from strings. <-- argued point Paths might be backed by Unicode text, and might be backed by bytes. Should a Path be able to be implicitly constructed from either? Should there be some sort of "Path literal"? <-- possibly a completely separate question, to be resolved after this one How should .. be handled? Can you canonicalize a Path? Can Path handle URIs as well as file system paths? ----- My personal view on the text/bytes debate is that a path is fundamentally a human concept, and consists therefore of text. The fact that some file systems store (at the low level) bytes and some store (I think) UTF-16 code units should be immaterial; path components exist for people. We can smuggle unrecognized bytes around, but ultimately, those bytes came from characters at some point - we just don't know the encoding. So a Path object has no relationship with bytes, only with str. Whether a Path is fundamentally "a text string that uses slashes to separate components" or "a tuple of path components" is up for debate. Both make a lot of sense, and I'm somewhat inclined to the latter view; it allows for other forms of path component, such as an open directory (for statat/openat etc), or a special thing representing "current directory" or "root directory". 
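To make the "tuple of path components" view concrete, here is a small sketch (an illustration only, not a proposal from the thread) using `PurePosixPath.parts`, which pathlib already provides. The path is the same example used elsewhere in this discussion.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/rosuav/cpython/python")

# The same path viewed as a tuple of components...
components = p.parts
print(components)  # ('/', 'home', 'rosuav', 'cpython', 'python')

# ...and as slash-separated text.
text = str(p)
print(text)  # /home/rosuav/cpython/python

# The two views round-trip: rebuilding from the components gives an equal path.
rebuilt = PurePosixPath(*components)
print(rebuilt == p)  # True
```

The existing class only models textual components; richer component types such as an open directory handle would need something beyond what this sketch shows.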
ChrisA From tjreedy at udel.edu Tue Apr 5 21:27:05 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 5 Apr 2016 21:27:05 -0400 Subject: [Python-Dev] bugs.python.org email blockage at gmail In-Reply-To: <20160405195631.B2F35B14156@webabinitio.net> References: <20160405195631.B2F35B14156@webabinitio.net> Message-ID: On 4/5/2016 3:56 PM, R. David Murray wrote: > We think we have a partial (and hopefully temporary) solution to the > bugs email blockage: ipv6 has been turned off on bugs, so it is sending > only from the ipv4 address. Google appears to be accepting the emails > again. However, the IPV4 address has a poor reputation, and Verizon > at least appears to be blocking it. So more work is still needed. Switching back to Google from Verizon. How is bugs email sent differently from list email? What the latter does works fine, at least for gmail. -- Terry Jan Reedy From tjreedy at udel.edu Tue Apr 5 21:39:15 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 5 Apr 2016 21:39:15 -0400 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On 4/5/2016 7:45 PM, Guido van Rossum wrote: > This does sound like it's the crucial issue, and it is worth writing > up clearly the pros and cons. Let's draft those lists in a thread > (this one's fine) and then add them to the PEP. We can then decide to: > > - keep the status quo > - change PurePath to inherit from str > - decide it's never going to be settled and kill pathlib.py > > (And yes, I'm dead serious about the latter, rather Solomonic option.) My sense of the discussion was that some people think that the new-in-upcoming 3.5.2 PurePath.path should serve as a substitute for inheriting from str. In particular, it should make it easy for stringpath functions to also accept path objects. 
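The idiom mentioned in this thread, `getattr(path, 'path', path)`, can be sketched as follows. This is a hedged illustration: `RichPath`, `as_text_path` and `basename_of` are hypothetical names invented for the example, and `RichPath` merely stands in for any object exposing a `path` attribute, so the sketch does not depend on which pathlib release provides that attribute.

```python
import posixpath


class RichPath:
    """Hypothetical stand-in for a path object with a `path` attribute."""

    def __init__(self, text):
        self.path = text


def as_text_path(obj):
    # Objects with a `path` attribute yield that attribute;
    # plain strings (and anything else) pass through unchanged.
    return getattr(obj, "path", obj)


def basename_of(path):
    # A string-based function that also accepts path objects
    # by unwrapping its argument first.
    return posixpath.basename(as_text_path(path))


print(basename_of("/tmp/spam.txt"))            # spam.txt
print(basename_of(RichPath("/tmp/spam.txt")))  # spam.txt
```

Because strings have no `path` attribute, existing callers are unaffected, which is why the idiom is described as an easy transition technique.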
-- Terry Jan Reedy From ncoghlan at gmail.com Tue Apr 5 22:21:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Apr 2016 12:21:04 +1000 Subject: [Python-Dev] bugs.python.org email blockage at gmail In-Reply-To: References: <20160405195631.B2F35B14156@webabinitio.net> Message-ID: On 6 April 2016 at 11:27, Terry Reedy wrote: > On 4/5/2016 3:56 PM, R. David Murray wrote: >> >> We think we have a partial (and hopefully temporary) solution to the >> bugs email blockage: ipv6 has been turned off on bugs, so it is sending >> only from the ipv4 address. Google appears to be accepting the emails >> again. However, the IPV4 address has a poor reputation, and Verizon >> at least appears to be blocking it. So more work is still needed. > > Switching back to Google from Verizon. > > How is bugs email sent differently from list email? What the latter does > works fine, at least for gmail. bugs.python.org is currently sending notification emails directly to recipients, rather than routing them via the outbound SMTP server on mail.python.org. Reconfiguring it to relay notifications via the main outgoing server is the longer term fix, but an initial attempt at enabling that resulted in errors in the bugs.python.org mail logs, so David reverted to the direct email configuration for the time being. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Apr 5 22:40:13 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 6 Apr 2016 12:40:13 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: Message-ID: <20160406024012.GG12526@ando.pearwood.info> I haven't really been following this discussion, but a couple of comments... On Tue, Apr 05, 2016 at 11:47:32PM +0000, Brett Cannon wrote: > http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str Nice write-up, thanks. [...] 
> To me it seems to basically be a question of whether people can be patient > during a transition and embrace pathlib over time or if they will simply > refuse to add support in libraries and refuse to use `getattr(path, 'path', > path)` or `str(path)` in the mean time. Wait, what? Is that what the whole fuss is about? That some people refuse to call str(path) when passing a path object to a function that expects a string? Really? That's it? The mind boggles. -- Steve From ncoghlan at gmail.com Tue Apr 5 22:44:47 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Apr 2016 12:44:47 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On 6 April 2016 at 09:45, Guido van Rossum wrote: > On Tue, Apr 5, 2016 at 4:13 PM, Chris Angelico wrote: >> On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters >> wrote: >>> * pathlib should be improved (specifically by making it inherit from str) >> >> I'd like to see this specific change settled on in the PEP, actually. >> There are some arguments on both sides, and some hybrid solutions >> being proposed, and it looks to be an important enough issue to people >> for there to be an answer somewhere. It seems to come down to a >> sloppiness vs strictness concern, I think, but I'm not sure. > > This does sound like it's the crucial issue, and it is worth writing > up clearly the pros and cons. Let's draft those lists in a thread > (this one's fine) and then add them to the PEP. 
We can then decide to: > > - keep the status quo > - change PurePath to inherit from str > - decide it's never going to be settled and kill pathlib.py Option 4: define a rich-object-to-text path serialisation convention, as paths are not conceptually the same as arbitrary strings, and we can define a new protocol accepted by builtins and standard library modules, while third parties can't The most promising option for that is probably "getattr(path, 'path', path)", since the "path" attribute is being added to pathlib, and the given idiom can be readily adopted in Python 2/3 compatible code (since normal strings and any other object without a "path" attribute are passed through unchanged). Alternatively, since it's a protocol, double-underscores on the property name may be appropriate (i.e. "getattr(path, '__path__', path)") The next challenge would then be to make a list of APIs to be updated for 3.6 to implicitly accept "rich path" objects via the agreed convention, with pathlib.PurePath used as a test class: * open() * codecs.open() (et al) * io.* * os.path.* * other os functions * shutil.* * tempfile.* * shelve.* * csv.* The list wouldn't necessarily need to be 100% comprehensive (similar to the rollout of context management, "support rich path objects in API " may appear as future RFEs), but it should be comprehensive enough for rich path objects to mostly "just work" with other APIs that aren't specifically limiting their inputs to str objects (although using lower level APIs may force a conversion to the lower level plain text representation as a side-effect). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Apr 5 22:51:55 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 6 Apr 2016 12:51:55 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <20160406025154.GH12526@ando.pearwood.info> On Wed, Apr 06, 2016 at 10:02:30AM +1000, Chris Angelico wrote: > My personal view on the text/bytes debate is that a path is > fundamentally a human concept, and consists therefore of text. The > fact that some file systems store (at the low level) bytes and some > store (I think) UTF-16 code units should be immaterial; path > components exist for people. We can smuggle unrecognized bytes around, > but ultimately, those bytes came from characters at some point - we > just don't know the encoding. So a Path object has no relationship > with bytes, only with str. That might be usually true in practice, but it is incorrect in principle. Paths in POSIX systems like Linux are fundamentally byte-strings with only two restrictions: \0 and \x2f are forbidden. The fact that paths in Linux mostly happen to look like English words (often heavily abbreviated) is a historical accident. The file system itself supported paths containing (say) \xff even back in the days when text was pure US-ASCII and bytes over \x7f had no textual meaning, and these days paths still support sequences of bytes that have no human meaning in any encoding. I don't know if this makes the tiniest lick of difference for Pathlib. I would be perfectly content if we stuck with the design decision that Pathlib can only represent paths representable as Unicode strings, and left weird POSIX filenames to the legacy byte-string interface. -- Steve From stephen at xemacs.org Tue Apr 5 23:03:36 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 6 Apr 2016 12:03:36 +0900 Subject: [Python-Dev] bugs.python.org email blockage at gmail In-Reply-To: <20160405195631.B2F35B14156@webabinitio.net> References: <20160405195631.B2F35B14156@webabinitio.net> Message-ID: <22276.31880.854091.86500@turnbull.sk.tsukuba.ac.jp> R. David Murray writes: > again. 
However, the IPV4 address has a poor reputation, and Verizon > at least appears to be blocking it. So more work is still needed. Don't take Verizon's policy as meaningful. Tell Verizon customers to get another address. That is the only solution that works for Verizon subscribers for very long (based on 15 years of Mailman-Users posts), they have never been a high-quality email provider. Further, Verizon (as an email provider) is in the process of dying anyway (they are very much alive as the new owner of AOL), so improvements in their email practices have a likelihood of zero to the resolution of a C float. From stephen at xemacs.org Tue Apr 5 23:03:59 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 6 Apr 2016 12:03:59 +0900 Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7? In-Reply-To: References: Message-ID: <22276.31903.569346.438240@turnbull.sk.tsukuba.ac.jp> Robert Collins writes: > Sadly that has the ordering bug of assigning __wrapped__ first and appears > a little unmaintained based on the bug tracker :( You can fix two problems with one patch, then! From tritium-list at sdamon.com Tue Apr 5 23:06:36 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Tue, 5 Apr 2016 23:06:36 -0400 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <57047D3C.2030700@sdamon.com> On 4/5/2016 22:44, Nick Coghlan wrote: > Option 4: define a rich-object-to-text path serialisation convention, > as paths are not conceptually the same as arbitrary strings Just as a nit to pick, it is perfectly acceptable for hypothetical path objects to raise when someone tries to shoehorn them into acting like arbitrary strings - open() will gladly halt and set fire if you try and pass the text of war and peace as an argument. I think the naysayers would be satisfied with an object that... while not str or bytes or a derived class of either... 
acted like str when it had to. Is that possible without deriving from str or bytes? From rosuav at gmail.com Tue Apr 5 23:18:09 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 6 Apr 2016 13:18:09 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <20160406025154.GH12526@ando.pearwood.info> References: <57044567.6070308@sdamon.com> <20160406025154.GH12526@ando.pearwood.info> Message-ID: On Wed, Apr 6, 2016 at 12:51 PM, Steven D'Aprano wrote: > On Wed, Apr 06, 2016 at 10:02:30AM +1000, Chris Angelico wrote: > >> My personal view on the text/bytes debate is that a path is >> fundamentally a human concept, and consists therefore of text. The >> fact that some file systems store (at the low level) bytes and some >> store (I think) UTF-16 code units should be immaterial; path >> components exist for people. We can smuggle unrecognized bytes around, >> but ultimately, those bytes came from characters at some point - we >> just don't know the encoding. So a Path object has no relationship >> with bytes, only with str. > > That might be usually true in practice, but it is incorrect in > principle. Paths in POSIX systems like Linux are fundamentally > byte-strings with only two restrictions: \0 and \x2f are forbidden. That's the file system level. But more fundamentally than that, a path exists so that humans can refer to files. That's why they have *names*, not just dirent numbers. We could assign dirent number -1 to mean "parent directory", and then represent everything with tuples of directory entries. Follow the chain and you get an inode. Absolute paths would start with an inode (the root directory being inode 2) and proceed with dirents thereafter. Maybe we'd need a pseudo-inode to mean "current directory". Should we do paths like this? No way! Much better to have either "/home/rosuav/cpython/python" or (P.ROOT, "home", "rosuav", "cpython", "python") to represent them, because they exist for the human. 
The POSIX file system rules aren't insignificant, but my point is that every byte value seen in a file name was once representing a character. Outside of deliberate tests, we don't create files on our disks whose names are strings of random bytes; the normal use of a file system is to store files that a human has named. Hence my recommendation that a Path object be tied to str, but *not* to bytes. > The fact that paths in Linux mostly happen to look like English words > (often heavily abbreviated) is a historical accident. The file system > itself supported paths containing (say) \xff even back in the days when > text was pure US-ASCII and bytes over \x7f had no textual meaning, and > these days paths still support sequences of bytes that have no human > meaning in any encoding. > > I don't know if this makes the tiniest lick of difference for Pathlib. I > would be perfectly content if we stuck with the design decision that > Pathlib can only represent paths representable as Unicode strings, and > left weird POSIX filenames to the legacy byte-string interface. I'd prefer to keep the surrogateescape compatibility hack with U+DC00 to U+DCFF being used to smuggle bytes around. That means that every path can be represented as a Unicode string, with only minor loss of functionality (imagine a path with only a single character that can't be decoded - chances are a human can figure out what the file is), but it still strongly pushes to a Unicode interpretation of the path. An *actual* byte-string interface (such as os.listdir and friends support) would be completely outside of anything involving Pathlib. If you give bytes, you'll get bytes. And I'd deprecate that once Path objects are more broadly accepted. ChrisA From ethan at stoneleaf.us Wed Apr 6 00:29:18 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 05 Apr 2016 21:29:18 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: References: Message-ID: <5704909E.8070908@stoneleaf.us> On 04/05/2016 03:55 PM, Guido van Rossum wrote: > It's been provisional since 3.4. I think if it is still there in 3.6.0 > it should be considered no longer provisional. But this may indeed be > a test case for the ultimate fate of provisional modules -- should we > remove it? We should either remove it or make the rest of the stdlib work with it. Currently, pathlib.*Paths are second-class citizens, and working with them is not significantly better than working with os.path.* simply because we have to cast to str every time we want to deal with any other part of the stdlib. > Would making it inherit from str cause most hostility to disappear? I don't think that is necessary. The hostility (of which I have some) is because we can't do: app_root = Path(...) config = app_root/'settings.cfg' with open(config) as blah: # whatever It feels like instead of addressing this basic disconnect, the answer has instead been: add that to pathlib! Which works great -- until a user or a library gets this path object and tries to use something from os on it. To come at this from a different angle: Python now has Enum; it is arguable that Path is more important, or at least much more useful. We have IntEnum whose sole purpose in life is to make it possible to (mostly) seamlessly work with the stdlib and other libraries where ints are being used to represent enumerations; and in pathlib we have . . . absolutely nothing. We have the promise of great things and wonderful usability, but in reality we have just as much pain as before -- or more if we forget to str(path) somewhere. I said that pathlib.Path does not need to inherit from str, and I still think that; however, to be a good stepping stone / transitional library I think the pathlib backport does need to have its Paths inherit from str. 
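The IntEnum comparison can be made concrete with a small sketch. The names `Whence` and `PlainWhence` are hypothetical and chosen only for illustration. The point is that an int-derived enum member is accepted wherever an int is expected, while a plain Enum member has to be converted by hand first, much as a Path currently needs str().

```python
from enum import Enum, IntEnum


class Whence(IntEnum):
    """Hypothetical int-derived enumeration."""
    START = 0
    CURRENT = 1
    END = 2


class PlainWhence(Enum):
    """Same members, but not derived from int."""
    START = 0
    CURRENT = 1
    END = 2


letters = "abc"

# An IntEnum member is an int, so int-based APIs accept it directly.
via_int_enum = letters[Whence.CURRENT]

# A plain Enum member is rejected until it is converted explicitly.
try:
    letters[PlainWhence.CURRENT]
    plain_accepted = True
except TypeError:
    plain_accepted = False

via_plain_enum = letters[PlainWhence.CURRENT.value]
```

The `.value` access plays the same role here as the str(path) cast does for Path objects.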
-- ~Ethan~ From ethan at stoneleaf.us Wed Apr 6 00:35:57 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 05 Apr 2016 21:35:57 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <20160406024012.GG12526@ando.pearwood.info> References: <20160406024012.GG12526@ando.pearwood.info> Message-ID: <5704922D.7060905@stoneleaf.us> On 04/05/2016 07:40 PM, Steven D'Aprano wrote: > On Tue, Apr 05, 2016 at 11:47:32PM +0000, Brett Cannon wrote: >> To me it seems to basically be a question of whether people can be patient >> during a transition and embrace pathlib over time or if they will simply >> refuse to add support in libraries and refuse to use `getattr(path, 'path', >> path)` or `str(path)` in the mean time. > > Wait, what? Is that what the whole fuss is about? That some people > refuse to call str(path) when passing a path object to a function that > expects a string? No, Stephen, that is not what this is about. This is about the ugliness of code with str(path) this and str(path) that and let's not forget the Path(this_returned_string) and Path(that_returned_string), not to mention the frustrations of forgetting to cast a str to Path or a Path to str. It's about the horror of boiler-plate infecting our otherwise beautiful Python code. -- ~Ethan~ From ncoghlan at gmail.com Wed Apr 6 00:49:33 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Apr 2016 14:49:33 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <57047D3C.2030700@sdamon.com> References: <57044567.6070308@sdamon.com> <57047D3C.2030700@sdamon.com> Message-ID: On 6 April 2016 at 13:06, Alexander Walters wrote: > I think the naysayers would be satisfied with an object that... while not > str or bytes or a derived class of either... acted like str when it had to. > Is that possible without deriving from str or bytes? 
Only if the consuming code explicitly casts with "str()", and that's *too* permissive for most use cases (since __str__ and the __repr__ fallback are completely inappropriate as a "convert to a text representation of a filesystem path" command). A "__text__" protocol for non-lossy conversions to str would arguably be feasible, but its scope goes way beyond what's needed for a "rich path object" conversion protocol. Implementing that model in the general case would require something more akin to https://www.python.org/dev/peps/pep-0357/, which added __index__ as a guaranteed-non-lossy conversion from other types to a builtin integer, allowing non-builtin integers to be accepted for things like slicing and sequence repetition, without inadvertently also accepting non-integral types like builtin floats. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Apr 6 01:00:01 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 5 Apr 2016 22:00:01 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <5704909E.8070908@stoneleaf.us> References: <5704909E.8070908@stoneleaf.us> Message-ID: On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman wrote: > [...] we can't do: > > app_root = Path(...) > config = app_root/'settings.cfg' > with open(config) as blah: > # whatever > > It feels like instead of addressing this basic disconnect, the answer has > instead been: add that to pathlib! Which works great -- until a user or a > library gets this path object and tries to use something from os on it. I agree that asking for config.open() isn't the right answer here (even if it happens to work). But in this example, once 3.5.2 is out, the solution would be to use open(config.path), and that will also work when passing it to a library. Is it still unacceptable then?
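A minimal sketch of how a library function could accept both plain strings and objects carrying a "path" attribute, using the getattr(path, 'path', path) idiom mentioned in this thread. `RichPath`, `unwrap` and `read_text` are hypothetical names used only for illustration; they are not pathlib or stdlib APIs.

```python
class RichPath:
    """Hypothetical stand-in for a path object exposing a ``path`` attribute."""

    def __init__(self, text):
        self.path = text


def unwrap(arg):
    # Objects with a ``path`` attribute are unwrapped; anything else
    # (str, bytes, None, ...) is passed through unchanged.
    return getattr(arg, "path", arg)


def read_text(target):
    # Because nothing is cast with str(), a wrong argument such as None
    # still reaches open() as-is and is rejected there with TypeError.
    with open(unwrap(target)) as handle:
        return handle.read()
```

With this shape, read_text(name) and read_text(RichPath(name)) behave the same, and a caller that passes None by mistake gets an error from open() rather than an attempt to open a file called "None".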
-- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Apr 6 01:03:22 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 5 Apr 2016 22:03:22 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On Tue, Apr 5, 2016 at 7:44 PM, Nick Coghlan wrote: > Option 4: define a rich-object-to-text path serialisation convention, Unfortunately that sounds like a classic "serious programming" solution (objects, abstractions, serialization, all big important words :-). -- --Guido van Rossum (python.org/~guido) From storchaka at gmail.com Wed Apr 6 01:28:29 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 6 Apr 2016 08:28:29 +0300 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: Message-ID: On 06.04.16 01:41, Brett Cannon wrote: > After a rather extensive discussion on python-ideas about > pathlib.PurePath not inheriting from str, another point that came up was > that the use of pathlib has been rather light. Unfortunately even the > stdlib doesn't really use pathlib because it's currently marked as > provisional (or at least that's why I haven't tried to use it where > possible in importlib). > > Do we have a plan of what is required to remove the provisional label > from pathlib? The behavior of the Path.resolve() method likely should be changed with breaking backward compatibility. There is an open issue about this. From stephen at xemacs.org Wed Apr 6 01:37:27 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 6 Apr 2016 14:37:27 +0900 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: References: <57044567.6070308@sdamon.com> <20160406025154.GH12526@ando.pearwood.info> Message-ID: <22276.41111.455186.755173@turnbull.sk.tsukuba.ac.jp> Chris Angelico writes: > Outside of deliberate tests, we don't create files on our disks > whose names are strings of random bytes; Wishful thinking. First, names made of control characters have often been deliberately used by miscreants to conceal their warez. Second, in some systems it's all too easy to create paths with components in different locales (the place I've seen it most frequently is in NFS mounts). I think that's much less true today, but perhaps that's only because my employer figured out that it was much less pain if system paths were pure ASCII so that it mostly didn't matter what encoding users chose for their subtrees. It remains important to be able to handle nearly arbitrary bytestrings in file names as far as I can see. Please note that 100 million Japanese and 1 billion Chinese by and large still prefer their homegrown encodings (plural!!) to Unicode, while many systems are now defaulting filenames to UTF-8. There's plenty of room remaining for copying bytestrings to arguments of open and friends. From stephen at xemacs.org Wed Apr 6 01:40:06 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 6 Apr 2016 14:40:06 +0900 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <5704922D.7060905@stoneleaf.us> References: <20160406024012.GG12526@ando.pearwood.info> <5704922D.7060905@stoneleaf.us> Message-ID: <22276.41270.715557.562304@turnbull.sk.tsukuba.ac.jp> Ethan Furman writes: > No, Stephen, that is not what this is about. Wrong Steven. Spelling matters in email too. And he's more worth paying attention to than I am. But I'll have my say anyway. ;-) > This is about the ugliness of code with str(path) this and > str(path) that -1 Not good enough. 
I wouldn't do it that often that "ugly" overrides the reasoning Brett presented, and if you do, I bet one or two personal helpers would clean up 95% of your cases. But see Nick's comment that "str(var)" is too permissive. I'll have to think about that, but my first take is he's right, and we need to do something about making use of Path more straightforward within the stdlib. Whatever that is, preferably would make life easier for 3rd party usage too, of course. Is error-checking within Path sufficiently robust in the light of "too permissive"? (I don't know exactly what I mean by that, but something like if "str(var_purporting_to_be_Path)" is too permissive, are we sure that "str(really_is_Path_var)" is "safe"? Apparently we haven't had a lot of beta testing.) > and let's not forget the Path(this_returned_string) and > Path(that_returned_string), But we don't object to (de)serializing dicts to (from) str (as JSON or pickle). I think Path vs. string is similarly different to justify saying so (especially when treating user input). Note, too, that based on discussion in that thread it seems likely that Path is likely to be inappropriate as an internal representation of URL.RFC3986.Path. Thus, strings that look like paths (as strings) actually will have multiple internal representations, similarly to the way that a dict can have multiple serializations. If representation transformation is not invertible, EIBTI says we need the "boilerplate". YMMV, but that's my take. From ncoghlan at gmail.com Wed Apr 6 01:44:41 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Apr 2016 15:44:41 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On 6 April 2016 at 15:03, Guido van Rossum wrote: > On Tue, Apr 5, 2016 at 7:44 PM, Nick Coghlan wrote: >> Option 4: define a rich-object-to-text path serialisation convention, > > Unfortunately that sounds like a classic "serious programming" > solution (objects, abstractions, serialization, all big important > words :-). Yeah, my choice of phrasing made the idea sound more complicated than it is. The actual change would be to add the following to some Python standard library APIs that accept a filesystem path as an argument: arg = getattr(arg, "path", arg) and the C API based equivalent to some C modules. (With the main bike-sheddable part being whether to use the generic "path" or something more explicit like "__fspath__" for the property name, since pathlib can readily support either/both of them, and "__fspath__" would be in line with the "os.fsencode" and "os.fsdecode" abbreviations) The key goal of this approach would be to make it so that most third party libraries would "just work" with path objects if they were already using os.path and other standard library APIs for path manipulation (rather than using string methods directly), while still avoiding the type confusion that comes from inheriting directly from str. From a testing perspective, it would arguably make sense to tackle it as a separate "test_path_protocol" test case that checked pathlib compatibility with the APIs of interest, simply to avoid adding a pathlib dependency to all those module tests. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Wed Apr 6 01:50:41 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 6 Apr 2016 08:50:41 +0300 Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On 06.04.16 05:44, Nick Coghlan wrote: > The next challenge would then be to make a list of APIs to be updated > for 3.6 to implicitly accept "rich path" objects via the agreed > convention, with pathlib.PurePath used as a test class: > > * open() > * codecs.open() (et al) > * io.* > * os.path.* > * other os functions > * shutil.* > * tempfile.* > * shelve.* > * csv.* Not sure about os.path.*. The purpose of os.path module is manipulating string paths. From the perspective of pathlib it can look lower level. Supporting pathlib.Path will complicate and slow down os.path functions (they are already more complex and slower than they were in Python 2). Since os.path functions are often called several times in a loop, their performance is important. On the other hand, some Path methods are more efficient than os.path functions, and Path-specialized code at a higher level can be preferable. From greg.ewing at canterbury.ac.nz Wed Apr 6 01:52:34 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 06 Apr 2016 17:52:34 +1200 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <5704A422.8020008@canterbury.ac.nz> Nick Coghlan wrote: > The most promising option for that is probably "getattr(path, 'path', > path)", Is there something seriously wrong with str(path)? -- Greg From storchaka at gmail.com Wed Apr 6 01:57:12 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 6 Apr 2016 08:57:12 +0300 Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On 06.04.16 05:44, Nick Coghlan wrote: > The most promising option for that is probably "getattr(path, 'path', > path)", since the "path" attribute is being added to pathlib, and the > given idiom can be readily adopted in Python 2/3 compatible code > (since normal strings and any other object without a "path" attribute > are passed through unchanged). Alternatively, since it's a protocol, > double-underscores on the property name may be appropriate (i.e. > "getattr(path, '__path__', path)") This was already discussed. Current conclusion is using the "path" attribute. See http://bugs.python.org/issue22570 . From storchaka at gmail.com Wed Apr 6 01:59:04 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 6 Apr 2016 08:59:04 +0300 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <5704A422.8020008@canterbury.ac.nz> References: <57044567.6070308@sdamon.com> <5704A422.8020008@canterbury.ac.nz> Message-ID: On 06.04.16 08:52, Greg Ewing wrote: > Nick Coghlan wrote: >> The most promising option for that is probably "getattr(path, 'path', >> path)", > > Is there something seriously wrong with str(path)? What if path is None or bytes? From ethan at stoneleaf.us Wed Apr 6 02:20:47 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 05 Apr 2016 23:20:47 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <22276.41270.715557.562304@turnbull.sk.tsukuba.ac.jp> References: <20160406024012.GG12526@ando.pearwood.info> <5704922D.7060905@stoneleaf.us> <22276.41270.715557.562304@turnbull.sk.tsukuba.ac.jp> Message-ID: <5704AABF.9080201@stoneleaf.us> On 04/05/2016 10:40 PM, Stephen J. Turnbull wrote: > Ethan Furman writes: > > > No, Stephen, that is not what this is about. > > Wrong Steven. Spelling matters in email too. Yes, it absolutely does. My apologies. > -1 Not good enough. 
I wouldn't do it that often that "ugly" overrides > the reasoning Brett presented [...] > But we don't object to (de)serializing dicts to (from) str (as JSON or > pickle). Amusingly enough, I don't have to deal with serializing dicts. :) However, as a comparison: imagine you had to transform your dict to JSON every time some function wanted a dict as input. And had to transform returned JSON strings in to dicts. > I think Path vs. string is similarly different to justify > saying so (especially when treating user input). [...] > Thus, strings that look like paths (as strings) actually will have > multiple internal representations, similarly to the way that a dict > can have multiple serializations. I don't follow. When dealing with the file system one passes a string* representing the path of the object one wants -- pretty much the same string that was passed in to Path. -- ~Ethan~ * or bytes, but the same sameness, really. From ncoghlan at gmail.com Wed Apr 6 02:23:00 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Apr 2016 16:23:00 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> <5704A422.8020008@canterbury.ac.nz> Message-ID: On 6 April 2016 at 15:59, Serhiy Storchaka wrote: > On 06.04.16 08:52, Greg Ewing wrote: >> >> Nick Coghlan wrote: >>> >>> The most promising option for that is probably "getattr(path, 'path', >>> path)", >> >> >> Is there something seriously wrong with str(path)? > > What if path is None or bytes? Or an int, float, list, dict, or arbitrary other object. To be more explicit, the problem isn't what happens when the API doing "str(path)" internally is used correctly, it's what happens when it's used incorrectly: you end up proceeding with a nonsense string as your path name, rather than failing early with TypeError or AttributeError. Doing "getattr(path, 'path', path)" instead means that in the error case (i.e. 
no "path" attribute), any existing argument checking is still triggered normally. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Wed Apr 6 02:25:05 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 6 Apr 2016 16:25:05 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <22276.41111.455186.755173@turnbull.sk.tsukuba.ac.jp> References: <57044567.6070308@sdamon.com> <20160406025154.GH12526@ando.pearwood.info> <22276.41111.455186.755173@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Apr 6, 2016 at 3:37 PM, Stephen J. Turnbull wrote: > Chris Angelico writes: > > > Outside of deliberate tests, we don't create files on our disks > > whose names are strings of random bytes; > > Wishful thinking. First, names made of control characters have often > been deliberately used by miscreants to conceal their warez. Second, > in some systems it's all too easy to create paths with components in > different locales (the place I've seen it most frequently is in NFS > mounts). I think that's much less true today, but perhaps that's only > because my employer figured out that it was much less pain if system > paths were pure ASCII so that it mostly didn't matter what encoding > users chose for their subtrees. Control characters are still characters, though. You can take a bytestring consisting of byte values less than 32, decode it as UTF-8, and have a series of codepoints to work with. If your employer has "solved" the problem by restricting system paths to ASCII, that's a fine solution for a single system with a single ASCII-compatible encoding; a better solution is to mandate UTF-8 as the file system encoding, as that's what most people are expecting anyway. > It remains important to be able to handle nearly arbitrary bytestrings > in file names as far as I can see. 
Please note that 100 million > Japanese and 1 billion Chinese by and large still prefer their > homegrown encodings (plural!!) to Unicode, while many systems are now > defaulting filenames to UTF-8. There's plenty of room remaining for > copying bytestrings to arguments of open and friends. Why exactly do they prefer these other encodings? Are they representing characters that Unicode doesn't contain? If so, we have a fundamental problem (no Python program is going to be able to cope with these, without a third party library or some stupid mess of local code); if not, you can always represent it as Unicode and encode it as UTF-8 when it reaches the file system. Re-encoding is something that's easy when you treat something as text, and impossible when you treat it as bytes. So far, you're still actually agreeing with me: paths are *text*, but sometimes we don't know the encoding (and that's a problem to be solved). ChrisA From ethan at stoneleaf.us Wed Apr 6 02:25:59 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 05 Apr 2016 23:25:59 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <5704ABF7.1010905@stoneleaf.us> On 04/05/2016 10:50 PM, Serhiy Storchaka wrote: > On 06.04.16 05:44, Nick Coghlan wrote: >> The next challenge would then be to make a list of APIs to be updated >> for 3.6 to implicitly accept "rich path" objects via the agreed >> convention, with pathlib.PurePath used as a test class: >> >> * open() >> * codecs.open() (et al) >> * io.* >> * os.path.* >> * other os functions >> * shutil.* >> * tempfile.* >> * shelve.* >> * csv.* > > Not sure about os.path.*. The purpose of os.path module is manipulating > string paths. From the perspective of pathlib it can look lower level. 
The point is that a function that receives a "path" object (whether str or Path) shouldn't have to care: it should be able to call os.path.split on the thing it received and get back a usable answer. -- ~Ethan~ From ncoghlan at gmail.com Wed Apr 6 02:29:34 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Apr 2016 16:29:34 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On 6 April 2016 at 15:57, Serhiy Storchaka wrote: > On 06.04.16 05:44, Nick Coghlan wrote: >> >> The most promising option for that is probably "getattr(path, 'path', >> path)", since the "path" attribute is being added to pathlib, and the >> given idiom can be readily adopted in Python 2/3 compatible code >> (since normal strings and any other object without a "path" attribute >> are passed through unchanged). Alternatively, since it's a protocol, >> double-underscores on the property name may be appropriate (i.e. >> "getattr(path, '__path__', path)") > > This was already discussed. Current conclusion is using the "path" > attribute. See http://bugs.python.org/issue22570 . I'd missed the existing precedent in DirEntry.path, so simply taking that and running with it sounds good to me. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Wed Apr 6 02:50:06 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 05 Apr 2016 23:50:06 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <5704909E.8070908@stoneleaf.us> Message-ID: <5704B19E.3000405@stoneleaf.us> On 04/05/2016 10:00 PM, Guido van Rossum wrote: > On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman wrote: >> [...] we can't do: >> >> app_root = Path(...) >> config = app_root/'settings.cfg' >> with open(config) as blah: >> # whatever >> >> It feels like instead of addressing this basic disconnect, the answer has >> instead been: add that to pathlib! 
Which works great -- until a user or a >> library gets this path object and tries to use something from os on it. > > I agree that asking for config.open() isn't the right answer here > (even if it happens to work). But in this example, once 3.5.2 is out, > the solution would be to use open(config.path), and that will also > work when passing it to a library. Is it still unacceptable then? On the one hand that is definitely more palatable. On the other hand it doesn't address having the stdlib itself directly support Path. On the gripping hand this feels reminiscent of the arguments over bytes vs unicode, but without any of the "This is why unicode is better!" bits. Why is pathlib better than plain strings? - attribute access to different parts such as the dirname, the filename, the extension (suffix) - easy access to on-disk answers such as .exists(), .stat(), .chdir - easy creation/modification of Path objects What problem is it solving that makes the pain worth dealing with? - no idea This is an especially important point considering the str-derived Path libraries already out there that have the same advantages as pathlib, but none of the pain. -- ~Ethan~ From ncoghlan at gmail.com Wed Apr 6 02:52:53 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Apr 2016 16:52:53 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: <5704ABF7.1010905@stoneleaf.us> References: <57044567.6070308@sdamon.com> <5704ABF7.1010905@stoneleaf.us> Message-ID: On 6 April 2016 at 16:25, Ethan Furman wrote: > On 04/05/2016 10:50 PM, Serhiy Storchaka wrote: >> On 06.04.16 05:44, Nick Coghlan wrote: >>> The next challenge would then be to make a list of APIs to be updated >>> for 3.6 to implicitly accept "rich path" objects via the agreed >>> convention, with pathlib.PurePath used as a test class: >>> >>> * open() >>> * codecs.open() (et al) >>> * io.* >>> * os.path.* >>> * other os functions >>> * shutil.* >>> * tempfile.* >>> * shelve.* >>> * csv.* >> >> >> Not sure about os.path.*. The purpose of os.path module is manipulating >> string paths. From the perspective of pathlib it can look lower level. > > The point is that a function that receives a "path" object (whether str or > Path) shouldn't have to care: it should be able to call os.path.split on the > thing it received and get back a usable answer. I actually think it makes sense to pursue this question in a test driven manner: create "test_pathlib_support" as a new test case, start passing pathlib.PurePath instances to a relatively high level API like shutil, and see what low level interfaces need to be updated to accept filesystem path objects (in addition to strings) in order to make that work. If shutil can be updated to support pathlib with changes solely at the io and os module layer, then that bodes well for transparently enabling support in 3rd party APIs as well. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Wed Apr 6 02:53:05 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Apr 2016 23:53:05 -0700 Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan wrote: > On 6 April 2016 at 15:57, Serhiy Storchaka wrote: >> On 06.04.16 05:44, Nick Coghlan wrote: >>> >>> The most promising option for that is probably "getattr(path, 'path', >>> path)", since the "path" attribute is being added to pathlib, and the >>> given idiom can be readily adopted in Python 2/3 compatible code >>> (since normal strings and any other object without a "path" attribute >>> are passed through unchanged). Alternatively, since it's a protocol, >>> double-underscores on the property name may be appropriate (i.e. >>> "getattr(path, '__path__', path)") >> >> This was already discussed. Current conclusion is using the "path" >> attribute. See http://bugs.python.org/issue22570 . > > I'd missed the existing precedent in DirEntry.path, so simply taking > that and running with it sounds good to me. This makes me twitch slightly, because NumPy has had a whole set of problems due to the ancient and minimally-considered decision to assume a bunch of ad hoc non-namespaced method names fulfilled some protocol -- like all .sum methods will have a signature that's compatible with numpy's, and if an object has a .log method then surely that computes the logarithm (what else in computing could "log" possibly refer to?), etc. This experience may or may not be relevant, I'm not sure -- sometimes these kinds of twitches are good guides to intuition, and sometimes they are just knee-jerk responses to an old and irrelevant problem :-). But you might want to at least think about how common it might be to have existing objects with unrelated attributes that happen to be called "path", and the bizarro problems that might be caused if someone accidentally passes one of them to a function that expects all .path attributes to be instances of this new protocol. -n -- Nathaniel J. 
Smith -- https://vorpus.org From ncoghlan at gmail.com Wed Apr 6 02:57:49 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Apr 2016 16:57:49 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On 6 April 2016 at 16:53, Nathaniel Smith wrote: > On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan wrote: >> I'd missed the existing precedent in DirEntry.path, so simply taking >> that and running with it sounds good to me. > > This makes me twitch slightly, because NumPy has had a whole set of > problems due to the ancient and minimally-considered decision to > assume a bunch of ad hoc non-namespaced method names fulfilled some > protocol -- like all .sum methods will have a signature that's > compatible with numpy's, and if an object has a .log method then > surely that computes the logarithm (what else in computing could "log" > possibly refer to?), etc. This experience may or may not be relevant, > I'm not sure -- sometimes these kinds of twitches are good guides to > intuition, and sometimes they are just knee-jerk responses to an old > and irrelevant problem :-) > > But you might want to at least think about > how common it might be to have existing objects with unrelated > attributes that happen to be called "path", and the bizarro problems > that might be caused if someone accidentally passes one of them to a > function that expects all .path attributes to be instances of this new > protocol. sys.path, for example. That's why I'd actually prefer the implicit conversion protocol to be the more explicitly named "__fspath__", with suitable "__fspath__ = path" assignments added to DirEntry and pathlib. However, I'm also not offering to actually *do* the work here, and the casting vote goes to the folks pursuing the implementation effort. Cheers, Nick. 
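A minimal sketch of the name-collision concern raised above; it is an illustration, not code from the thread. The helper names path_via_attr and path_via_dunder and the class FakeDirEntry are hypothetical. Only the getattr idiom and the "__fspath__ = path" idea are taken from the discussion, and the attribute form shown here is just one of the options being debated.

```python
import sys


def path_via_attr(obj):
    # The idiom quoted in the thread: use obj.path if present, else obj itself.
    return getattr(obj, 'path', obj)


def path_via_dunder(obj):
    # Hypothetical variant using the more explicit name proposed above.
    return getattr(obj, '__fspath__', obj)


class FakeDirEntry:
    # Illustrative stand-in for an object that opts in to the protocol.
    def __init__(self, path):
        self.path = path
        self.__fspath__ = path  # attribute form, mirroring "__fspath__ = path"


entry = FakeDirEntry('/tmp/example.txt')
print(path_via_attr(entry))         # /tmp/example.txt
print(path_via_attr('plain.txt'))   # plain.txt (plain strings pass through)

# The collision: the sys module has an unrelated attribute named "path".
print(path_via_attr(sys) is sys.path)  # True: a list of strings, not a path
print(path_via_dunder(sys) is sys)     # True: sys does not define __fspath__
```

With the plain attribute name, an unrelated object is silently treated as path-like; with the dunder name it passes through unchanged and the caller can fail loudly later.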
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wes.turner at gmail.com Wed Apr 6 03:14:53 2016 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 6 Apr 2016 02:14:53 -0500 Subject: [Python-Dev] When should pathlib stop being provisional? Message-ID: On Apr 6, 2016 1:26 AM, "Chris Angelico" wrote: > > On Wed, Apr 6, 2016 at 3:37 PM, Stephen J. Turnbull wrote: > > Chris Angelico writes: > > > > > Outside of deliberate tests, we don't create files on our disks > > > whose names are strings of random bytes; > > > > Wishful thinking. First, names made of control characters have often > > been deliberately used by miscreants to conceal their warez. Second, > > in some systems it's all too easy to create paths with components in > > different locales (the place I've seen it most frequently is in NFS > > mounts). I think that's much less true today, but perhaps that's only > > because my employer figured out that it was much less pain if system > > paths were pure ASCII so that it mostly didn't matter what encoding > > users chose for their subtrees. > > Control characters are still characters, though. You can take a > bytestring consisting of byte values less than 32, decode it as UTF-8, > and have a series of codepoints to work with. > > If your employer has "solved" the problem by restricting system paths > to ASCII, that's a fine solution for a single system with a single > ASCII-compatible encoding; a better solution is to mandate UTF-8 as > the file system encoding, as that's what most people are expecting > anyway. > > > It remains important to be able to handle nearly arbitrary bytestrings > > in file names as far as I can see. Please note that 100 million > > Japanese and 1 billion Chinese by and large still prefer their > > homegrown encodings (plural!!) to Unicode, while many systems are now > > defaulting filenames to UTF-8. There's plenty of room remaining for > > copying bytestrings to arguments of open and friends. 
> > Why exactly do they prefer these other encodings? Are they > representing characters that Unicode doesn't contain? If so, we have a > fundamental problem (no Python program is going to be able to cope > with these, without a third party library or some stupid mess of local > code); if not, you can always represent it as Unicode and encode it as > UTF-8 when it reaches the file system. Re-encoding is something that's > easy when you treat something as text, and impossible when you treat > it as bytes. > > So far, you're still actually agreeing with me: paths are *text*, but > sometimes we don't know the encoding (and that's a problem to be > solved). re: bytestring, unicode, encodings after e.g. os.path.split / Path.split: from "[Python-ideas] Type hints for text/binary data in Python 2+3 code" https://mail.python.org/pipermail/python-ideas/2016-March/038869.html >> would/will it be possible to use Typing.Text as a base class for even-more abstract string types https://mail.python.org/pipermail/python-ideas/2016-March/039016.html >> * Text.encoding >> * Text.lang (urn:ietf:rfc:3066) ... forgot to CC: >> * https://tools.ietf.org/html/rfc5646 "Tags for Identifying Languages" urn:ietf:rfc:5646 is this (Path) a narrower case of string types (#strypes), because after transformations we want to preserve string metadata like e.g encoding? I'd vote for * adding DirEntry.__path__ as a proxy to DirEntry.path * standardizing on __path__ (over .path) * because this operation *is* fundamentally similar to e.g. __str__ * operator.path pathify, pathifize > > ChrisA
From p.f.moore at gmail.com Wed Apr 6 05:02:09 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 Apr 2016 10:02:09 +0100 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <5704909E.8070908@stoneleaf.us> Message-ID: On 6 April 2016 at 06:00, Guido van Rossum wrote: > On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman wrote: >> [...] we can't do: >> >> app_root = Path(...) >> config = app_root/'settings.cfg' >> with open(config) as blah: >> # whatever >> >> It feels like instead of addressing this basic disconnect, the answer has >> instead been: add that to pathlib! Which works great -- until a user or a >> library gets this path object and tries to use something from os on it. > > I agree that asking for config.open() isn't the right answer here > (even if it happens to work). But in this example, once 3.5.2 is out, > the solution would be to use open(config.path), and that will also > work when passing it to a library. Is it still unacceptable then? My sense is that this will remain unacceptable to those people who have a problem here. The issue is not so much the ugliness of the code (in spite of the fact that this is what people focus on) but rather the disconnect between the mental model people have and the reality of the code they have to write. The basic idea behind pathlib.Path objects is that they represent a *path*. And when you call open, you should pass it a path. So (the argument goes) why should you have to convert the path you have (a Path object) to pass it to a function (like open) that requires a path argument? Making stdlib functions work with Path objects would fix a lot of the conceptual difficulties here. And it would also mean that (thanks to duck typing) a lot of 3rd party code would work without change, further alleviating the issue. But ultimately, there will still be code that needs changing to be aware of Path objects.
The change is simple enough (patharg = str(patharg), or the getattr('path') approach) but it's a change in mental model (this time by library authors) and the benefit of the change is not sufficiently obvious. Inheriting from str is the commonly-proposed solution, because in practical terms it works. But it does so by mixing layers of abstraction in a way that is difficult to explain to someone who thinks of a "path" as an abstract object rather than as a (text? byte?) string. Ultimately, all that's happening is that the burden of keeping the abstractions separate is placed on the design, rather than being explicit in the code. But while I have no evidence that this is a problem, it does leave me with a nagging feeling that it "seems similar to the bytes/text issue". My feelings: - I'd *like* to push for the cleaner separation of abstractions that a "pure" Path object provides. - It does need library writers (and in particular the stdlib) to "buy into" the model and make changes to support Path objects - I don't have a huge problem with using str(p) or p.path as a workaround during the transition, but that's from the POV of throwaway scripting. I'm not sure I'd be so happy using the workaround in code that would need to be supported for a long time. - I'd rather compromise on principles than abandon the idea of a stdlib Path object - In practical terms, inheriting from str is probably fine. At least evidence from 3rd party path libraries indicates so. Paul From encukou at gmail.com Wed Apr 6 05:30:32 2016 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 6 Apr 2016 11:30:32 +0200 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <5704D738.4070507@gmail.com> On 04/06/2016 08:53 AM, Nathaniel Smith wrote: > On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan wrote: >> On 6 April 2016 at 15:57, Serhiy Storchaka wrote: >>> On 06.04.16 05:44, Nick Coghlan wrote: >>>> >>>> The most promising option for that is probably "getattr(path, 'path', >>>> path)", since the "path" attribute is being added to pathlib, and the >>>> given idiom can be readily adopted in Python 2/3 compatible code >>>> (since normal strings and any other object without a "path" attribute >>>> are passed through unchanged). Alternatively, since it's a protocol, >>>> double-underscores on the property name may be appropriate (i.e. >>>> "getattr(path, '__path__', path)") >>> >>> This was already discussed. Current conclusion is using the "path" >>> attribute. See http://bugs.python.org/issue22570 . >> >> I'd missed the existing precedent in DirEntry.path, so simply taking >> that and running with it sounds good to me. > > This makes me twitch slightly, because NumPy has had a whole set of > problems due to the ancient and minimally-considered decision to > assume a bunch of ad hoc non-namespaced method names fulfilled some > protocol -- like all .sum methods will have a signature that's > compatible with numpy's, and if an object has a .log method then > surely that computes the logarithm (what else in computing could "log" > possibly refer to?), etc. This experience may or may not be relevant, > I'm not sure -- sometimes these kinds of twitches are good guides to > intuition, and sometimes they are just knee-jerk responses to an old > and irrelevant problem :-). 
But you might want to at least think about > how common it might be to have existing objects with unrelated > attributes that happen to be called "path", and the bizarro problems > that might be caused if someone accidentally passes one of them to a > function that expects all .path attributes to be instances of this new > protocol. > > -n > Python was in a similar situation with the .next method on iterators, which changed to __next__ in Python 3. PEP 3114 (which explains this change) says: > Code that nowhere contains an explicit call to a next method can > nonetheless be silently affected by the presence of such > a method. Therefore, this PEP proposes that iterators should have > a __next__ method instead of a next method (with no change in > semantics). How well does that apply to path/__path__? That PEP also introduced the next() builtin. This suggests that a protocol with __path__/__fspath__ would need a corresponding path()/fspath() builtin. From antoine at python.org Wed Apr 6 05:41:18 2016 From: antoine at python.org (Antoine Pitrou) Date: Wed, 6 Apr 2016 09:41:18 +0000 (UTC) Subject: [Python-Dev] When should pathlib stop being provisional? References: Message-ID: Brett Cannon <brett at python.org> writes: > > :) I figured. I was close myself until I decided to be in the "not inheriting from str is a sane decision" camp because people weren't understanding where the design decision probably came from, hence http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str That's a good write-up, thank you. Paths don't have to inherit str any more than IP addresses or any other thing that happens to be passed as a string in traditional APIs. On a concrete point, inheriting str would make the API a horrible, confusing, dangerous mess mixing regular string semantics (concatenation with +, for example, or indexing) with path-specific semantics and various grey areas (should .split() have path semantics or str semantics?
what is the rule and how are people supposed to remember it?). (of course, for PHP or JavaScript programmers it may not sound like a problem. Let "adding" two IP addresses return the concatenation of their string representations...) Regards Antoine. From antoine at python.org Wed Apr 6 05:44:30 2016 From: antoine at python.org (Antoine Pitrou) Date: Wed, 6 Apr 2016 09:44:30 +0000 (UTC) Subject: [Python-Dev] When should pathlib stop being provisional? References: <57044567.6070308@sdamon.com> Message-ID: Nick Coghlan <ncoghlan at gmail.com> writes: > > sys.path, for example. > > That's why I'd actually prefer the implicit conversion protocol to be > the more explicitly named "__fspath__", with suitable "__fspath__ = > path" assignments added to DirEntry and pathlib. That was my preference as well. > However, I'm also not > offering to actually *do* the work here, and the casting vote goes to > the folks pursuing the implementation effort. Indeed. Regards Antoine. From antoine at python.org Wed Apr 6 05:50:45 2016 From: antoine at python.org (Antoine Pitrou) Date: Wed, 6 Apr 2016 09:50:45 +0000 (UTC) Subject: [Python-Dev] When should pathlib stop being provisional? References: <57044567.6070308@sdamon.com> <5704ABF7.1010905@stoneleaf.us> Message-ID: Ethan Furman <ethan at stoneleaf.us> writes: > > > > Not sure about os.path.*. The purpose of os.path module is manipulating > > string paths. From the perspective of pathlib it can look lower level. > > The point is that a function that receives a "path" object (whether str > or Path) shouldn't have to care: it should be able to call os.path.split > on the thing it received and get back a usable answer. pathlib should already replicate the useful parts of os.path. That was the design goal after all. So this is like saying you want a Python file or socket object to be accepted by os.read(). In the rare case where you want that, you call the .fileno() method explicitly. The equivalent for Path objects is to look up the .path attribute explicitly.
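The analogy above can be made concrete with a short sketch; it is an illustration, not code from the thread. The file name demo.txt is arbitrary, and str() is used as the explicit conversion because it works on any pathlib version, so the .path attribute under discussion is not assumed to exist.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'demo.txt'
    with open(str(target), 'wb') as out:
        out.write(b'hello')

    with open(str(target), 'rb') as f:
        # os.read() wants a file descriptor, not a file object, so the
        # conversion is spelled out with .fileno().
        data = os.read(f.fileno(), 5)

    # The analogous explicit step for a path object: convert it before
    # handing it to a string-based API such as os.path.split().
    head, tail = os.path.split(str(target))

print(data)  # b'hello'
print(tail)  # demo.txt
```

In both cases the low-level function stays strict about its argument type and the caller performs the conversion, which is the position being argued for here.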
Regards Antoine. From steve at pearwood.info Wed Apr 6 06:45:08 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 6 Apr 2016 20:45:08 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <20160406104508.GI12526@ando.pearwood.info> On Tue, Apr 05, 2016 at 11:53:05PM -0700, Nathaniel Smith wrote: > This makes me twitch slightly, because NumPy has had a whole set of > problems due to the ancient and minimally-considered decision to > assume a bunch of ad hoc non-namespaced method names fulfilled some > protocol -- like all .sum methods will have a signature that's > compatible with numpy's, and if an object has a .log method then > surely that computes the logarithm (what else in computing could "log" > possibly refer to?), etc. It's the down-side of duck-typing. It's all well and good accepting anything with a quack method, but not everything is that straight- forward: artist.draw() gunslinger.draw() I think that file system paths are important enough, and tricky enough, to justify their own protocol. I like Nick's suggestion of a special dunder method for converting path-like objects into paths, without the problems that str(x) has, or the risk of assuming that anything with a .path attribute refers to a file system path. (maze.path, garden.path, career.path perhaps?) -- Steve From p.f.moore at gmail.com Wed Apr 6 07:03:29 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 Apr 2016 12:03:29 +0100 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: On 6 April 2016 at 00:45, Guido van Rossum wrote: > This does sound like it's the crucial issue, and it is worth writing > up clearly the pros and cons. Let's draft those lists in a thread > (this one's fine) and then add them to the PEP. 
We can then decide to: > > - keep the status quo > - change PurePath to inherit from str > - decide it's never going to be settled and kill pathlib.py > > (And yes, I'm dead serious about the latter, rather Solomonic option.) By the way, even if there's no solution that satisfies everyone to the "inherit from str" question, I'd still be unhappy if pathlib disappeared from the stdlib. It's useful for quick admin scripts that don't justify an external dependency. Those typically do quite a bit of path manipulation, and as such benefit from the improved API of pathlib over os.path. +1 on making (and documenting) a final decision on the "inherit from str" question -1 on removing pathlib just because that decision might not satisfy everyone Paul From rdmurray at bitdance.com Wed Apr 6 10:04:15 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 06 Apr 2016 10:04:15 -0400 Subject: [Python-Dev] bugs.python.org email blockage at gmail In-Reply-To: References: <20160405195631.B2F35B14156@webabinitio.net> Message-ID: <20160406140417.57BB1B14023@webabinitio.net> On Wed, 06 Apr 2016 12:21:04 +1000, Nick Coghlan wrote: > On 6 April 2016 at 11:27, Terry Reedy wrote: > bugs.python.org is currently sending notification emails directly to > recipients, rather than routing them via the outbound SMTP server on > mail.python.org. Correct. > Reconfiguring it to relay notifications via the main outgoing server > is the longer term fix, but an initial attempt at enabling that > resulted in errors in the bugs.python.org mail logs, so David reverted > to the direct email configuration for the time being. Specifically, I think we should clean up the issues that are causing reputation loss (which pretty much means dropping rietveld, although in theory we could fix rietveld instead if someone wants to finish Ezio's patch). And then we need to understand the issue that caused me to back out the change: something is sending null-Sender emails to multiple recipients. 
We may not need to fix it (mail.python.org rejected them but they may be useless messages), but we probably should. I suspect they are actual bounces, but I don't have the time to investigate further at this time. --David From rdmurray at bitdance.com Wed Apr 6 10:08:39 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 06 Apr 2016 10:08:39 -0400 Subject: [Python-Dev] bugs.python.org email blockage at gmail In-Reply-To: <22276.31880.854091.86500@turnbull.sk.tsukuba.ac.jp> References: <20160405195631.B2F35B14156@webabinitio.net> <22276.31880.854091.86500@turnbull.sk.tsukuba.ac.jp> Message-ID: <20160406140842.E13AEB14023@webabinitio.net> On Wed, 06 Apr 2016 12:03:36 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > again. However, the IPV4 address has a poor reputation, and Verizon > > at least appears to be blocking it. So more work is still needed. > > Don't take Verizon's policy as meaningful. Tell Verizon customers to > get another address. That is the only solution that works for Verizon > subscribers for very long (based on 15 years of Mailman-Users posts), > they have never been a high-quality email provider. Further, Verizon > (as an email provider) is in the process of dying anyway (they are > very much alive as the new owner of AOL), so improvements in their > email practices have a likelihood of zero to the resolution of a C > float. Yes, Mark reminded me that Verizon still isn't accepting mail from mail.python.org, despite multiple contacts from the postmaster team. So they are pretty much a lost cause and no one should use them for email, I think. However, the "poor reputation" comment came from the error message returned by gmail when it bounced the spam-bounce-reports bugs was trying to send to Ezio. --David From steve at pearwood.info Wed Apr 6 10:39:12 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 7 Apr 2016 00:39:12 +1000 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: <5704D738.4070507@gmail.com> References: <57044567.6070308@sdamon.com> <5704D738.4070507@gmail.com> Message-ID: <20160406143909.GJ12526@ando.pearwood.info> On Wed, Apr 06, 2016 at 11:30:32AM +0200, Petr Viktorin wrote: > Python was in a similar situation with the .next method on iterators, > which changed to __next__ in Python 3. PEP 3114 (which explains this > change) says: > > > Code that nowhere contains an explicit call to a next method can > > nonetheless be silently affected by the presence of such > > a method. Therefore, this PEP proposes that iterators should have > > a __next__ method instead of a next method (with no change in > > semantics). > > How well does that apply to path/__path__? I think it's potentially the same. Possibly there are fewer existing uses of "obj.path" out there which conflict with this use, but there's at least one in the std lib: sys.path. > That PEP also introduced the next() builtin. This suggests that a > protocol with __path__/__fspath__ would need a corresponding > path()/fspath() builtin. Not necessarily. Take a look at (say) dir(object()) and you'll see a few dunders that don't correspond to built-ins: __reduce__ and __reduce_ex__ are used by pickle; __sizeof__ is used by sys.getsizeof; __subclasshook__ is used by the ABC system; Another example is __trunc__ used by math.trunc(). So any such fspath function should stand on its own as a useful feature, not just because there's a dunder method __fspath__. -- Steve From njs at pobox.com Wed Apr 6 10:50:23 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Apr 2016 07:50:23 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: <20160406143909.GJ12526@ando.pearwood.info> References: <57044567.6070308@sdamon.com> <5704D738.4070507@gmail.com> <20160406143909.GJ12526@ando.pearwood.info> Message-ID: On Apr 6, 2016 07:44, "Steven D'Aprano" wrote: > > On Wed, Apr 06, 2016 at 11:30:32AM +0200, Petr Viktorin wrote: > > > Python was in a similar situation with the .next method on iterators, > > which changed to __next__ in Python 3. PEP 3114 (which explains this > > change) says: > > > > > Code that nowhere contains an explicit call to a next method can > > > nonetheless be silently affected by the presence of such > > > a method. Therefore, this PEP proposes that iterators should have > > > a __next__ method instead of a next method (with no change in > > > semantics). > > > > How well does that apply to path/__path__? > > I think it's potentially the same. Possibly there are fewer existing > uses of "obj.path" out there which conflict with this use, but there's > at least one in the std lib: sys.path. > > > > That PEP also introduced the next() builtin. This suggests that a > > protocol with __path__/__fspath__ would need a corresponding > > path()/fspath() builtin. > > Not necessarily. Take a look at (say) dir(object()) and you'll see a few > dunders that don't correspond to built-ins: > > __reduce__ and __reduce_ex__ are used by pickle; > __sizeof__ is used by sys.getsizeof; > __subclasshook__ is used by the ABC system; > > Another example is __trunc__ used by math.trunc(). > > So any such fspath function should stand on its own as a useful > feature, not just because there's a dunder method __fspath__. An even more precise analogy is provided by __index__, whose semantics are to provide safe casting to integer (the name is a historical accident), as opposed to __int__'s tendency to cast things to integer willy-nilly, including things that really shouldn't be silently accepted as integers. Basically __index__ is to __int__ as __(fs)path__ would be to __str__. 
There's an operator.index but no builtins.index. -n From ethan at stoneleaf.us Wed Apr 6 11:01:30 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 08:01:30 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> <5704ABF7.1010905@stoneleaf.us> Message-ID: <570524CA.50108@stoneleaf.us> On 04/06/2016 02:50 AM, Antoine Pitrou wrote: > Ethan Furman <ethan at stoneleaf.us> writes: >>> >>> Not sure about os.path.*. The purpose of os.path module is manipulating >>> string paths. From the perspective of pathlib it can look lower level. >> >> The point is that a function that receives a "path" object (whether str >> or Path) shouldn't have to care: it should be able to call os.path.split >> on the thing it received and get back a usable answer. > > pathlib should already replicate the useful parts of os.path. That was > the design goal after all. Yes it does, and very well. > So this is like saying you want a Python file or socket object to be > accepted by os.read(). In the rare case where you want that, you call the > .fileno() method explicitly. The equivalent for Path objects is to > lookup the .path attribute explicitly. Unfortunately for Path objects, there is already a well-established ecosystem for dealing with paths as strings, and it currently breaks when passed a Path object. This is a high barrier to entry. Having the stdlib support Path objects would lower that barrier significantly. -- ~Ethan~ From ethan at stoneleaf.us Wed Apr 6 11:10:06 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 08:10:06 -0700 Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <570526CE.5080401@stoneleaf.us> On 04/05/2016 11:57 PM, Nick Coghlan wrote: > On 6 April 2016 at 16:53, Nathaniel Smith wrote: >> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan wrote: >>> I'd missed the existing precedent in DirEntry.path, so simply taking >>> that and running with it sounds good to me. >> >> This makes me twitch slightly, because NumPy has had a whole set of >> problems due to the ancient and minimally-considered decision to >> assume a bunch of ad hoc non-namespaced method names fulfilled some >> protocol -- like all .sum methods will have a signature that's >> compatible with numpy's, and if an object has a .log method then >> surely that computes the logarithm (what else in computing could "log" >> possibly refer to?), etc. This experience may or may not be relevant, >> I'm not sure -- sometimes these kinds of twitches are good guides to >> intuition, and sometimes they are just knee-jerk responses to an old >> and irrelevant problem :-) >> >> But you might want to at least think about >> how common it might be to have existing objects with unrelated >> attributes that happen to be called "path", and the bizarro problems >> that might be caused if someone accidentally passes one of them to a >> function that expects all .path attributes to be instances of this new >> protocol. > > sys.path, for example. > > That's why I'd actually prefer the implicit conversion protocol to be > the more explicitly named "__fspath__", with suitable "__fspath__ = > path" assignments added to DirEntry and pathlib. However, I'm also not > offering to actually *do* the work here, and the casting vote goes to > the folks pursuing the implementation effort. If we decide upon __fspath__ (or __path__) I will do the work on pathlib and scandir to add those attributes. 
-- ~Ethan~ From brett at python.org Wed Apr 6 13:26:36 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 17:26:36 +0000 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: <570526CE.5080401@stoneleaf.us> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: With Ethan volunteering to do the work to help make a path protocol a thing -- and I'm willing to help along with propagating this through the stdlib where I think Serhiy might be interested in helping as well -- and a seeming consensus this is a good idea, it seems like this proposal has a chance of actually coming to fruition. Now we need clear details. :) Some open questions are: 1. Name: __path__, __fspath__, or something else? 2. Method or attribute? (changes what kind of one-liner you might use in libraries, but I think historically all protocols have been methods and the serialized string representation might be costly to build) 3. Built-in? (name is dependent on #1 if we add one) 4. Add the method/attribute to str? (I assume so, much like __index__() is on int, but I have not seen it explicitly stated so I would rather clarify it) 5. Expand the C API to have something like PyObject_Path()? Some people have asked for the pathlib PEP to have more fleshed-out reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't want to do it I can try to distil my blog post into a more succinct paragraph or two and update the PEP myself. Is this going to require a PEP or if we can agree on the points here are we just going to do it? If we think it requires a PEP I'm willing to write it, but I obviously have no issue if we skip that step either. :) Oh, and we should resolve this before the next release of Python 3.4, 3.5, or 3.6 so that pathlib can be updated in those releases.
-Brett
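One way to picture how the open questions in Brett's message interact is the sketch below; it is an illustration, not a proposal from the thread. It assumes the name __fspath__ (question 1), a method rather than an attribute (question 2), and a helper function instead of a method added to str (questions 3 and 4). The function name fspath and the class ConfigFile are hypothetical, and bytes paths are deliberately left out.

```python
def fspath(obj):
    """Return a str path for obj or raise TypeError (hypothetical helper)."""
    if isinstance(obj, str):
        # Plain strings pass through, so str itself needs no new method.
        return obj
    try:
        # Look the method up on the type, as is done for other dunders.
        method = type(obj).__fspath__
    except AttributeError:
        raise TypeError('expected str or an object with __fspath__, got '
                        + type(obj).__name__) from None
    result = method(obj)
    if not isinstance(result, str):
        raise TypeError('__fspath__ must return str')
    return result


class ConfigFile:
    """Toy path-like class used only for this example; not pathlib.Path."""

    def __init__(self, *parts):
        self._parts = parts

    def __fspath__(self):
        # The string form is only built when a caller asks for it.
        return '/'.join(self._parts)


print(fspath('settings.cfg'))                  # settings.cfg
print(fspath(ConfigFile('etc', 'app.cfg')))    # etc/app.cfg
```

Unlike str(obj), this helper rejects objects that never opted in, for example fspath(42) raises TypeError, which mirrors the __index__ versus __int__ comparison made earlier in the thread.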
URL:

From desmoulinmichel at gmail.com Wed Apr 6 13:35:25 2016
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Wed, 6 Apr 2016 19:35:25 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us>
Message-ID: <570548DD.7080108@gmail.com>

Wouldn't it be better to generalize that to a "__location__" protocol, which would allow returning any kind of location, including path, url, coordinate, ip_address, etc.?

On 06/04/2016 19:26, Brett Cannon wrote:
> With Ethan volunteering to do the work to help make a path protocol a
> thing -- and I'm willing to help along with propagating this through the
> stdlib where I think Serhiy might be interested in helping as well --
> and a seeming consensus this is a good idea, it seems like this proposal
> has a chance of actually coming to fruition.
>
> Now we need clear details. :) Some open questions are:
>
> 1. Name: __path__, __fspath__, or something else?
> 2. Method or attribute? (changes what kind of one-liner you might use
>    in libraries, but I think historically all protocols have been
>    methods and the serialized string representation might be costly to
>    build)
> 3. Built-in? (name is dependent on #1 if we add one)
> 4. Add the method/attribute to str? (I assume so, much like __index__()
>    is on int, but I have not seen it explicitly stated so I would
>    rather clarify it)
> 5. Expand the C API to have something like PyObject_Path()?
>
>
> Some people have asked for the pathlib PEP to have more fleshed-out
> reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
> want to do it I can try to distill my blog post into a more succinct
> paragraph or two and update the PEP myself.
>
> Is this going to require a PEP or, if we can agree on the points here, are
> we just going to do it? If we think it requires a PEP I'm willing to
> write it, but I obviously have no issue if we skip that step either.
:) > > Oh, and we should resolve this before the next release of Python 3.4, > 3.5, or 3.6 so that pathlib can be updated in those releases. > > -Brett > > > On Wed, 6 Apr 2016 at 08:09 Ethan Furman > wrote: > > On 04/05/2016 11:57 PM, Nick Coghlan wrote: > > On 6 April 2016 at 16:53, Nathaniel Smith > wrote: > >> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan > wrote: > > >>> I'd missed the existing precedent in DirEntry.path, so simply taking > >>> that and running with it sounds good to me. > >> > >> This makes me twitch slightly, because NumPy has had a whole set of > >> problems due to the ancient and minimally-considered decision to > >> assume a bunch of ad hoc non-namespaced method names fulfilled some > >> protocol -- like all .sum methods will have a signature that's > >> compatible with numpy's, and if an object has a .log method then > >> surely that computes the logarithm (what else in computing could > "log" > >> possibly refer to?), etc. This experience may or may not be relevant, > >> I'm not sure -- sometimes these kinds of twitches are good guides to > >> intuition, and sometimes they are just knee-jerk responses to an old > >> and irrelevant problem :-) > >> > >> But you might want to at least think about > >> how common it might be to have existing objects with unrelated > >> attributes that happen to be called "path", and the bizarro problems > >> that might be caused if someone accidentally passes one of them to a > >> function that expects all .path attributes to be instances of > this new > >> protocol. > > > > sys.path, for example. > > > > That's why I'd actually prefer the implicit conversion protocol to be > > the more explicitly named "__fspath__", with suitable "__fspath__ = > > path" assignments added to DirEntry and pathlib. However, I'm also not > > offering to actually *do* the work here, and the casting vote goes to > > the folks pursuing the implementation effort. 
> > If we decide upon __fspath__ (or __path__) I will do the work on pathlib > and scandir to add those attributes. > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/desmoulinmichel%40gmail.com > From wes.turner at gmail.com Wed Apr 6 13:37:06 2016 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 6 Apr 2016 12:37:06 -0500 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <570526CE.5080401@stoneleaf.us> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: * +1 for __path__, __fspath__ (though I don't know what each does) * why not Text(basestring / bytestring) and pathlib.Path(Text)? * are there examples of cases where this cannot be? * if not, +1 for subclassing str/Text * where are the examples of method collisions between the str interface and the pathlib.Path interface? * str.__div__ is nonsensical * pathlib.Path.__div__ is super-useful On Apr 6, 2016 10:10 AM, "Ethan Furman" wrote: > On 04/05/2016 11:57 PM, Nick Coghlan wrote: > >> On 6 April 2016 at 16:53, Nathaniel Smith wrote: >> >>> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan >>> wrote: >>> >> > I'd missed the existing precedent in DirEntry.path, so simply taking >>>> that and running with it sounds good to me. >>>> >>> >>> This makes me twitch slightly, because NumPy has had a whole set of >>> problems due to the ancient and minimally-considered decision to >>> assume a bunch of ad hoc non-namespaced method names fulfilled some >>> protocol -- like all .sum methods will have a signature that's >>> compatible with numpy's, and if an object has a .log method then >>> surely that computes the logarithm (what else in computing could "log" >>> possibly refer to?), etc. 
This experience may or may not be relevant, >>> I'm not sure -- sometimes these kinds of twitches are good guides to >>> intuition, and sometimes they are just knee-jerk responses to an old >>> and irrelevant problem :-) >>> >>> But you might want to at least think about >>> how common it might be to have existing objects with unrelated >>> attributes that happen to be called "path", and the bizarro problems >>> that might be caused if someone accidentally passes one of them to a >>> function that expects all .path attributes to be instances of this new >>> protocol. >>> >> >> sys.path, for example. >> >> That's why I'd actually prefer the implicit conversion protocol to be >> the more explicitly named "__fspath__", with suitable "__fspath__ = >> path" assignments added to DirEntry and pathlib. However, I'm also not >> offering to actually *do* the work here, and the casting vote goes to >> the folks pursuing the implementation effort. >> > > If we decide upon __fspath__ (or __path__) I will do the work on pathlib > and scandir to add those attributes. > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Apr 6 13:41:14 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 17:41:14 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <570548DD.7080108@gmail.com> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <570548DD.7080108@gmail.com> Message-ID: On Wed, 6 Apr 2016 at 10:36 Michel Desmoulin wrote: > Wouldn't be better to generalize that to a "__location__" protocol, > which allow to return any kind of location, including path, url or > coordinate, ip_address, etc ? 
> No because all of those things have different semantic meaning. See the __index__ PEP for reasons why you would tightly bound protocols instead of overloading ones like __int__ for multiple meanings. -Brett > > Le 06/04/2016 19:26, Brett Cannon a ?crit : > > WIth Ethan volunteering to do the work to help make a path protocol a > > thing -- and I'm willing to help along with propagating this through the > > stdlib where I think Serhiy might be interested in helping as well -- > > and a seeming consensus this is a good idea, it seems like this proposal > > has a chance of actually coming to fruition. > > > > Now we need clear details. :) Some open questions are: > > > > 1. Name: __path__, __fspath__, or something else? > > 2. Method or attribute? (changes what kind of one-liner you might use > > in libraries, but I think historically all protocols have been > > methods and the serialized string representation might be costly to > > build) > > 3. Built-in? (name is dependent on #1 if we add one) > > 4. Add the method/attribute to str? (I assume so, much like __index__() > > is on int, but I have not seen it explicitly stated so I would > > rather clarify it) > > 5. Expand the C API to have something like PyObject_Path()? > > > > > > Some people have asked for the pathlib PEP to have a more flushed out > > reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't > > want to do it I can try to instil my blog post into a more succinct > > paragraph or two and update the PEP myself. > > > > Is this going to require a PEP or if we can agree on the points here are > > we just going to do it? If we think it requires a PEP I'm willing to > > write it, but I obviously have no issue if we skip that step either. :) > > > > Oh, and we should resolve this before the next release of Python 3.4, > > 3.5, or 3.6 so that pathlib can be updated in those releases. 
> > > > -Brett > > > > > > On Wed, 6 Apr 2016 at 08:09 Ethan Furman > > wrote: > > > > On 04/05/2016 11:57 PM, Nick Coghlan wrote: > > > On 6 April 2016 at 16:53, Nathaniel Smith > > wrote: > > >> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan > > wrote: > > > > >>> I'd missed the existing precedent in DirEntry.path, so simply > taking > > >>> that and running with it sounds good to me. > > >> > > >> This makes me twitch slightly, because NumPy has had a whole set > of > > >> problems due to the ancient and minimally-considered decision to > > >> assume a bunch of ad hoc non-namespaced method names fulfilled > some > > >> protocol -- like all .sum methods will have a signature that's > > >> compatible with numpy's, and if an object has a .log method then > > >> surely that computes the logarithm (what else in computing could > > "log" > > >> possibly refer to?), etc. This experience may or may not be > relevant, > > >> I'm not sure -- sometimes these kinds of twitches are good guides > to > > >> intuition, and sometimes they are just knee-jerk responses to an > old > > >> and irrelevant problem :-) > > >> > > >> But you might want to at least think about > > >> how common it might be to have existing objects with unrelated > > >> attributes that happen to be called "path", and the bizarro > problems > > >> that might be caused if someone accidentally passes one of them > to a > > >> function that expects all .path attributes to be instances of > > this new > > >> protocol. > > > > > > sys.path, for example. > > > > > > That's why I'd actually prefer the implicit conversion protocol to > be > > > the more explicitly named "__fspath__", with suitable "__fspath__ = > > > path" assignments added to DirEntry and pathlib. However, I'm also > not > > > offering to actually *do* the work here, and the casting vote goes > to > > > the folks pursuing the implementation effort. 
> > > > If we decide upon __fspath__ (or __path__) I will do the work on > pathlib > > and scandir to add those attributes. > > > > > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/desmoulinmichel%40gmail.com > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Apr 6 13:46:51 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 17:46:51 +0000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: On Wed, 6 Apr 2016 at 10:41 Wes Turner wrote: > * +1 for __path__, __fspath__ > (though I don't know what each does) > Returns a string representing a file system path. > * why not Text(basestring / bytestring) and pathlib.Path(Text)? > See the points about next() vs __next__() > * are there examples of cases where this cannot be? > I don't understand what you think "cannot be". > * if not, +1 for subclassing str/Text > > * where are the examples of method collisions between the str > interface and the pathlib.Path interface? > There aren't any and that's partially why some people wanted the str subclass to begin with. Please consider this thread a str-subclass-free zone. This line of discussion is to flesh out the proposal for a path protocol as a proposal against subclassing str, not to settle the whole discussion outright. If you want to continue to debate the subclassing-str side of this please use the other thread. 
-Brett > * str.__div__ is nonsensical > * pathlib.Path.__div__ is super-useful > > > On Apr 6, 2016 10:10 AM, "Ethan Furman" wrote: > >> On 04/05/2016 11:57 PM, Nick Coghlan wrote: >> >>> On 6 April 2016 at 16:53, Nathaniel Smith wrote: >>> >>>> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan >>>> wrote: >>>> >>> >> I'd missed the existing precedent in DirEntry.path, so simply taking >>>>> that and running with it sounds good to me. >>>>> >>>> >>>> This makes me twitch slightly, because NumPy has had a whole set of >>>> problems due to the ancient and minimally-considered decision to >>>> assume a bunch of ad hoc non-namespaced method names fulfilled some >>>> protocol -- like all .sum methods will have a signature that's >>>> compatible with numpy's, and if an object has a .log method then >>>> surely that computes the logarithm (what else in computing could "log" >>>> possibly refer to?), etc. This experience may or may not be relevant, >>>> I'm not sure -- sometimes these kinds of twitches are good guides to >>>> intuition, and sometimes they are just knee-jerk responses to an old >>>> and irrelevant problem :-) >>>> >>>> But you might want to at least think about >>>> how common it might be to have existing objects with unrelated >>>> attributes that happen to be called "path", and the bizarro problems >>>> that might be caused if someone accidentally passes one of them to a >>>> function that expects all .path attributes to be instances of this new >>>> protocol. >>>> >>> >>> sys.path, for example. >>> >>> That's why I'd actually prefer the implicit conversion protocol to be >>> the more explicitly named "__fspath__", with suitable "__fspath__ = >>> path" assignments added to DirEntry and pathlib. However, I'm also not >>> offering to actually *do* the work here, and the casting vote goes to >>> the folks pursuing the implementation effort. >>> >> >> If we decide upon __fspath__ (or __path__) I will do the work on pathlib >> and scandir to add those attributes. 
>> >> -- >> ~Ethan~ >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Apr 6 14:05:47 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 11:05:47 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: <57054FFB.5070709@stoneleaf.us> On 04/06/2016 10:26 AM, Brett Cannon wrote: > WIth Ethan volunteering to do the work to help make a path protocol a > thing -- and I'm willing to help along with propagating this through the > stdlib where I think Serhiy might be interested in helping as well -- > and a seeming consensus this is a good idea, it seems like this proposal > has a chance of actually coming to fruition. Excellent! Let's proceed along this path ;) until somebody objects. > Now we need clear details. :) Some open questions are: > > 1. Name: __path__, __fspath__, or something else? __fspath__ > 2. Method or attribute? (changes what kind of one-liner you might use > in libraries, but I think historically all protocols have been > methods and the serialized string representation might be costly to > build) I would prefer an attribute, but yeah I think dunders are typically methods, and I don't see this being special enough to not follow that trend. > 3. Built-in? 
(name is dependent on #1 if we add one)

fspath() -- and it would be handy to have a function that returns either the __fspath__ result or the string (if it was one), or raises an exception if neither of the above works out.

> 4. Add the method/attribute to str? (I assume so, much like __index__()
>    is on int, but I have not seen it explicitly stated so I would
>    rather clarify it)

I don't think that's needed. With Path() and fspath() it's trivial to make sure one has what one wants.

> 5. Expand the C API to have something like PyObject_Path()?

No opinion.

> Some people have asked for the pathlib PEP to have more fleshed-out
> reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
> want to do it I can try to distill my blog post into a more succinct
> paragraph or two and update the PEP myself.

Nice.

> Is this going to require a PEP or, if we can agree on the points here, are
> we just going to do it? If we think it requires a PEP I'm willing to
> write it, but I obviously have no issue if we skip that step either. :)

If there are no (serious?) objections I don't think a PEP is needed.

> Oh, and we should resolve this before the next release of Python 3.4,
> 3.5, or 3.6 so that pathlib can be updated in those releases.

Agreed.
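
A minimal sketch of the helper described in the answer to question 3, assuming the method spelling `__fspath__()` (the thread had not yet settled on the name, or on method versus attribute). `DemoPath` is a hypothetical class used only for illustration; nothing here is taken from pathlib or the stdlib.

```python
def fspath(path):
    """Return a str path: the protocol result, or the str itself, else raise."""
    # Look the method up on the type, the way other dunder protocols are resolved.
    method = getattr(type(path), "__fspath__", None)
    if method is not None:
        return method(path)
    if isinstance(path, str):
        return path
    raise TypeError(
        "expected str or an object with __fspath__(), not " + type(path).__name__
    )


class DemoPath:
    """Hypothetical path-like class (illustration only)."""

    def __init__(self, *parts):
        self._parts = parts

    def __fspath__(self):
        return "/".join(self._parts)


print(fspath("spam/eggs.txt"))
print(fspath(DemoPath("spam", "eggs.txt")))
```

Both calls yield a plain str, while something like `fspath(42)` raises TypeError, which matches the "raise if neither works out" behaviour asked for above.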
-- ~Ethan~ From brett at python.org Wed Apr 6 14:32:07 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 18:32:07 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <57054FFB.5070709@stoneleaf.us> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: On Wed, 6 Apr 2016 at 11:06 Ethan Furman wrote: > On 04/06/2016 10:26 AM, Brett Cannon wrote: > > > WIth Ethan volunteering to do the work to help make a path protocol a > > thing -- and I'm willing to help along with propagating this through the > > stdlib where I think Serhiy might be interested in helping as well -- > > and a seeming consensus this is a good idea, it seems like this proposal > > has a chance of actually coming to fruition. > > Excellent! Let's proceed along this path ;) until somebody objects. > > > > Now we need clear details. :) Some open questions are: > > > > 1. Name: __path__, __fspath__, or something else? > > __fspath__ > +1 for __path__, +0 for __fspath__ (I don't know how widespread the notion that "fs" means "file system" is). > > > > 2. Method or attribute? (changes what kind of one-liner you might use > > in libraries, but I think historically all protocols have been > > methods and the serialized string representation might be costly to > > build) > > I would prefer an attribute, but yeah I think dunders are typically > methods, and I don't see this being special enough to not follow that > trend. > Depends on what we want to tell 3rd-party libraries to do to support pathlib if they are on 3.3 or if they are worried about people using Python 3.4.2 or 3.5.1. An attribute still works with `getattr(path, '__path__', path)`. But with a method you probably want either `path.__path__() if hasattr(path, '__path__') else path` or `getattr(path, '__path__', lambda: path)()`. > > > > 3. Built-in? 
(name is dependent on #1 if we add one)
>
> fspath() -- and it would be handy to have a function that returns either
> the __fspath__ result or the string (if it was one), or raises an
> exception if neither of the above works out.
>

So:

  # Attribute
  def fspath(path):
      if hasattr(path, '__path__'):
          return path.__path__
      if isinstance(path, str):
          return path
      raise NotImplementedError  # Or TypeError?

  # Method
  def fspath(path):
      try:
          return path.__path__()
      except AttributeError:
          if isinstance(path, str):
              return path
          raise TypeError  # Or NotImplementedError?

Or you can drop the isinstance() check and simply check for the attribute/method and use it and otherwise return `path` and let the code's duck-typing of str handle catching an unexpected type for a path. At which point the built-in becomes whatever idiom we promote for pathlib usage that pre-dates this protocol.

>
> > 4. Add the method/attribute to str? (I assume so, much like __index__()
> >    is on int, but I have not seen it explicitly stated so I would
> >    rather clarify it)
>
> I don't think that's needed. With Path() and fspath() it's trivial to
> make sure one has what one wants.
>

If we add str.__fspath__ then the function becomes:

  def fspath(path):
      return path.__fspath__()

Which might be too simplistic for a built-in, but that also means adding it on str would potentially negate the need for a built-in.

>
>
> > 5. Expand the C API to have something like PyObject_Path()?
>
> No opinion.
>

If we add a built-in then I say we add an equivalent function in the C API.

-Brett

>
>
> > Some people have asked for the pathlib PEP to have more fleshed-out
> > reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't
> > want to do it I can try to distill my blog post into a more succinct
> > paragraph or two and update the PEP myself.
>
> Nice.
>
>
> > Is this going to require a PEP or, if we can agree on the points here, are
> > we just going to do it?
If we think it requires a PEP I'm willing to > > write it, but I obviously have no issue if we skip that step either. :) > > If there are no (serious?) objects I don't think a PEP is needed. > > > > Oh, and we should resolve this before the next release of Python 3.4, > > 3.5, or 3.6 so that pathlib can be updated in those releases. > > Agreed. > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Apr 6 14:54:08 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 11:54:08 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: <57055B50.2030209@stoneleaf.us> On 04/06/2016 11:32 AM, Brett Cannon wrote: > On Wed, 6 Apr 2016 at 11:06 Ethan Furman wrote: >> On 04/06/2016 10:26 AM, Brett Cannon wrote: >>> Now we need clear details. :) Some open questions are: >>> >>> 1. Name: __path__, __fspath__, or something else? >> >> __fspath__ > > +1 for __path__, +0 for __fspath__ (I don't know how widespread the > notion that "fs" means "file system" is). Maybe __os_path__ then? I would rather be explicit about the type of path we are dealing with -- who knows if we won't have __url_path__ in the future (besides Guido, of course ;) > def fspath(path): > try: > return path.__path__() > except AttributeError: > if isinstance(path, str): > return path > raise TypeError # Or NotImplementedError? > > Or you can drop the isinstance() check and [...] If the purpose of fspath() is to return a usable path-as-string then we should raise if unable to do it. 
> If we add str.__fspath__ then the function becomes: > > def fspath(path): > return path.__fspath__() > > Which might be too simplistic for a built-in, but that also means adding > it on str would potentially negate the need for a built-in. That is an attractive option. -- ~Ethan~ From alexander.belopolsky at gmail.com Wed Apr 6 15:02:35 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 6 Apr 2016 15:02:35 -0400 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: On Wed, Apr 6, 2016 at 2:32 PM, Brett Cannon wrote: > +1 for __path__, +0 for __fspath__ (I don't know how widespread the notion > that "fs" means "file system" is). Same here. In the good old days, "fs" stood for a "Font Server." And in even older (and better?) days, FS was a "Field Separator." -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Apr 6 15:18:06 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 7 Apr 2016 05:18:06 +1000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <57055B50.2030209@stoneleaf.us> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57055B50.2030209@stoneleaf.us> Message-ID: On Thu, Apr 7, 2016 at 4:54 AM, Ethan Furman wrote: > Maybe __os_path__ then? I would rather be explicit about the type of path > we are dealing with -- who knows if we won't have __url_path__ in the future > (besides Guido, of course ;) > Bikeshedding furiously... I don't like os_path here as it's too similar to os.path; unless that's deliberate? ChrisA From ericfahlgren at gmail.com Wed Apr 6 15:28:02 2016 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Wed, 6 Apr 2016 12:28:02 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: <20160406143909.GJ12526@ando.pearwood.info>
References: <57044567.6070308@sdamon.com> <5704D738.4070507@gmail.com> <20160406143909.GJ12526@ando.pearwood.info>
Message-ID: <01d601d1903a$70a1f460$51e5dd20$@gmail.com>

On Wednesday, April 06, 2016 07:39, Steven D'Aprano wrote:
> > How well does that apply to path/__path__?
>
> I think it's potentially the same. Possibly there are fewer existing uses of
> "obj.path" out there which conflict with this use, but there's at least one in the
> std lib: sys.path.

Somewhat ironically, also os.

>>> import os.path
>>> getattr(os, "path")

From rymg19 at gmail.com Wed Apr 6 15:29:51 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 6 Apr 2016 14:29:51 -0500
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us>
Message-ID:

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your program. Something's wrong.
http://kirbyfan64.github.io/

On Apr 6, 2016 12:28 PM, "Brett Cannon" wrote:
>
> With Ethan volunteering to do the work to help make a path protocol a thing -- and I'm willing to help along with propagating this through the stdlib where I think Serhiy might be interested in helping as well -- and a seeming consensus this is a good idea, it seems like this proposal has a chance of actually coming to fruition.
>
> Now we need clear details. :) Some open questions are:

My votes:

> Name: __path__, __fspath__, or something else?

__path__. Considering everything related to `pathlib` uses the word `path`, __fspath__ seems kind of odd.

> Method or attribute? (changes what kind of one-liner you might use in libraries, but I think historically all protocols have been methods and the serialized string representation might be costly to build)

Method. Using an attribute would be needlessly inconsistent.

> Built-in?
(name is dependent on #1 if we add one) > Add the method/attribute to str? (I assume so, much like __index__() is on int, but I have not seen it explicitly stated so I would rather clarify it) I agree; this would avoid lots of excess complexity. > Expand the C API to have something like PyObject_Path()? -1. PyFileObject was already removed from Python 3; it seems useless to add another one. > > Some people have asked for the pathlib PEP to have a more flushed out reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't want to do it I can try to instil my blog post into a more succinct paragraph or two and update the PEP myself. > > Is this going to require a PEP or if we can agree on the points here are we just going to do it? If we think it requires a PEP I'm willing to write it, but I obviously have no issue if we skip that step either. :) > > Oh, and we should resolve this before the next release of Python 3.4, 3.5, or 3.6 so that pathlib can be updated in those releases. > > -Brett > > > On Wed, 6 Apr 2016 at 08:09 Ethan Furman wrote: >> >> On 04/05/2016 11:57 PM, Nick Coghlan wrote: >> > On 6 April 2016 at 16:53, Nathaniel Smith wrote: >> >> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan wrote: >> >> >>> I'd missed the existing precedent in DirEntry.path, so simply taking >> >>> that and running with it sounds good to me. >> >> >> >> This makes me twitch slightly, because NumPy has had a whole set of >> >> problems due to the ancient and minimally-considered decision to >> >> assume a bunch of ad hoc non-namespaced method names fulfilled some >> >> protocol -- like all .sum methods will have a signature that's >> >> compatible with numpy's, and if an object has a .log method then >> >> surely that computes the logarithm (what else in computing could "log" >> >> possibly refer to?), etc. 
This experience may or may not be relevant, >> >> I'm not sure -- sometimes these kinds of twitches are good guides to >> >> intuition, and sometimes they are just knee-jerk responses to an old >> >> and irrelevant problem :-) >> >> >> >> But you might want to at least think about >> >> how common it might be to have existing objects with unrelated >> >> attributes that happen to be called "path", and the bizarro problems >> >> that might be caused if someone accidentally passes one of them to a >> >> function that expects all .path attributes to be instances of this new >> >> protocol. >> > >> > sys.path, for example. >> > >> > That's why I'd actually prefer the implicit conversion protocol to be >> > the more explicitly named "__fspath__", with suitable "__fspath__ = >> > path" assignments added to DirEntry and pathlib. However, I'm also not >> > offering to actually *do* the work here, and the casting vote goes to >> > the folks pursuing the implementation effort. >> >> If we decide upon __fspath__ (or __path__) I will do the work on pathlib >> and scandir to add those attributes. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Apr 6 15:32:39 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 Apr 2016 20:32:39 +0100 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: On 6 April 2016 at 19:32, Brett Cannon wrote: >> > Now we need clear details. :) Some open questions are: >> > >> > 1. Name: __path__, __fspath__, or something else? 
>> >> __fspath__ > > +1 for __path__, +0 for __fspath__ (I don't know how widespread the notion > that "fs" means "file system" is). Agreed. But if we have a builtin, it should follow the name of the special attribute/method. And I'm not that keen on having a builtin with a generic name like 'path'. >> > 2. Method or attribute? (changes what kind of one-liner you might use >> > in libraries, but I think historically all protocols have been >> > methods and the serialized string representation might be costly to >> > build) >> >> I would prefer an attribute, but yeah I think dunders are typically >> methods, and I don't see this being special enough to not follow that >> trend. > > Depends on what we want to tell 3rd-party libraries to do to support pathlib > if they are on 3.3 or if they are worried about people using Python 3.4.2 or > 3.5.1. An attribute still works with `getattr(path, '__path__', path)`. But > with a method you probably want either `path.__path__() if hasattr(path, > '__path__') else path` or `getattr(path, '__path__', lambda: path)()`. I'm a little confused by this. To support the older pathlib, they have to do patharg = str(patharg), because *none* of the proposed attributes (path or __path__) will exist. The getattr trick is needed to support the *new* pathlib, when you need a real string. Currently you need a string if you call stdlib functions or builtins. If we fix the stdlib/builtins, the need goes away for those cases, but remains if you need to call libraries that *don't* support pathlib (os.path will likely be one of those) or do direct string manipulation. In practice, I see the getattr trick as an "easy fix" for libraries that want to add support but in a minimally-intrusive way. On that basis, making the trick easy to use is important, which argues for an attribute. >> > 3. Built-in? 
(name is dependent on #1 if we add one)
>>
>> fspath() -- and it would be handy to have a function that returns either
>> the __fspath__ results, or the string (if it was one), or raises an
>> exception if neither of the above work out.

fspath regardless of the name chosen in #1 - a new builtin called path
just has too much likelihood of clashing with user code.

But I'm not sure we need a builtin. I'm not at all clear how
frequently we expect user code to need to use this protocol. Users
can't use the builtin if they want to be backward compatible. But code
that doesn't need backward compatibility can probably just work with
pathlib (and the stdlib support for it) directly. For display, the
implicit conversion to str is fine. For "get me a string representing
the path", is the "path" attribute being abandoned in favour of this
special method? I'm inclined to think that if you are writing "pure
pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
certainly no *less* readable.

But I'm not one of the people who disliked using .path, so I'm
probably not best placed to judge. It would be good if someone who
*does* feel strongly could explain why fspath(pathobj) is better than
pathobj.path.

> So:
>
>   # Attribute
>   def fspath(path):
>       if hasattr(path, '__path__'):
>           return path.__path__
>       if isinstance(path, str):
>           return path
>       raise NotImplementedError  # Or TypeError?
>
>   # Method
>   def fspath(path):
>       try:
>           return path.__path__()
>       except AttributeError:
>           if isinstance(path, str):
>               return path
>       raise TypeError  # Or NotImplementedError?

You could of course use try/except for the attribute case. Or hasattr
for the method case (where it would avoid masking AttributeError
exceptions raised within the dunder method call, a possibility if user
classes implement their own version of the protocol).
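
To make the masking point concrete, here is a minimal sketch of both
variants for the method case. It assumes the dunder ends up being called
__path__ (the name is still undecided in this thread), and Broken is a
made-up user class with a bug in its protocol method:

```python
def fspath_try(path):
    """try/except variant: an AttributeError raised inside __path__() is swallowed."""
    try:
        return path.__path__()
    except AttributeError:
        if isinstance(path, str):
            return path
        raise TypeError("expected str or an object with __path__()")


def fspath_hasattr(path):
    """hasattr variant: errors raised inside __path__() propagate unchanged."""
    if hasattr(path, '__path__'):
        return path.__path__()
    if isinstance(path, str):
        return path
    raise TypeError("expected str or an object with __path__()")


class Broken:
    # Hypothetical user class whose protocol method contains a bug.
    def __path__(self):
        return self.no_such_attribute  # raises AttributeError
```

With fspath_try(Broken()) the real bug is reported as a misleading
TypeError; with fspath_hasattr(Broken()) the original AttributeError
surfaces, which is the behaviour I would want when debugging.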
Paul From phd at phdru.name Wed Apr 6 15:26:42 2016 From: phd at phdru.name (Oleg Broytman) Date: Wed, 6 Apr 2016 21:26:42 +0200 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <57055B50.2030209@stoneleaf.us> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57055B50.2030209@stoneleaf.us> Message-ID: <20160406192642.GA11074@phdru.name> On Wed, Apr 06, 2016 at 11:54:08AM -0700, Ethan Furman wrote: > On 04/06/2016 11:32 AM, Brett Cannon wrote: > >On Wed, 6 Apr 2016 at 11:06 Ethan Furman wrote: > >>On 04/06/2016 10:26 AM, Brett Cannon wrote: > > >>>Now we need clear details. :) Some open questions are: > >>> > >>> 1. Name: __path__, __fspath__, or something else? > >> > >>__fspath__ > > > >+1 for __path__, +0 for __fspath__ (I don't know how widespread the > >notion that "fs" means "file system" is). > > Maybe __os_path__ then? I would rather be explicit about the type of path > we are dealing with -- who knows if we won't have __url_path__ in the future > (besides Guido, of course ;) __pathstr__? __urlstr__? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From barry at python.org Wed Apr 6 15:33:30 2016 From: barry at python.org (Barry Warsaw) Date: Wed, 6 Apr 2016 15:33:30 -0400 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <5704909E.8070908@stoneleaf.us> References: <5704909E.8070908@stoneleaf.us> Message-ID: <20160406153330.024a66a5@subdivisions.wooz.org> On Apr 05, 2016, at 09:29 PM, Ethan Furman wrote: >We should either remove it or make the rest of the stdlib work with it. >Currently, pathlib.*Paths are second-class citizens, and working with them is >not significantly better than working with os.path.* simply because we have >to cast to str every time we want to deal with any other part of the stdlib. This. I've tried to use them in a couple of projects and in many ways pathlib objects are nice to work with. 
But rarely can they be used exclusively. There are just too many other packages and APIs that use os.path and the two do not interoperate very well. That makes practical use of pathlib objects just too unwieldy for project-wide adoption. I don't know if inheriting them from str would fix this problem. I'm +0 on removing the provisional status of pathlib and in trying to figure out ways for them to work better with other libraries (both stdlib and 3rd party) that will continue to be os.path based for the foreseeable future. Cheers, -Barry From brett at python.org Wed Apr 6 15:31:17 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 19:31:17 +0000 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: On Wed, 6 Apr 2016 at 12:29 Ryan Gonzalez wrote: > -- > Ryan > [ERROR]: Your autotools build scripts are 200 lines longer than your > program. Something?s wrong. > http://kirbyfan64.github.io/ > > > On Apr 6, 2016 12:28 PM, "Brett Cannon" wrote: > > > > WIth Ethan volunteering to do the work to help make a path protocol a > thing -- and I'm willing to help along with propagating this through the > stdlib where I think Serhiy might be interested in helping as well -- and a > seeming consensus this is a good idea, it seems like this proposal has a > chance of actually coming to fruition. > > > > Now we need clear details. :) Some open questions are: > > My votes: > > > Name: __path__, __fspath__, or something else? > > __path__. Considering everything related to `pathlib` uses the word > `path`, __fspath__ seems kind of odd. > > > Method or attribute? (changes what kind of one-liner you might use in > libraries, but I think historically all protocols have been methods and the > serialized string representation might be costly to build) > > Method. Using an attribute would be needlessly inconsistent. > > > Built-in? 
(name is dependent on #1 if we add one) > > Add the method/attribute to str? (I assume so, much like __index__() is > on int, but I have not seen it explicitly stated so I would rather clarify > it) > > I agree; this would avoid lots of excess complexity. > > > Expand the C API to have something like PyObject_Path()? > > -1. PyFileObject was already removed from Python 3; it seems useless to > add another one. > But that was removing a custom object, not a function that will implement whatever idiom we come up with for getting the string representation of a path. -Brett > > > > Some people have asked for the pathlib PEP to have a more flushed out > reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't > want to do it I can try to instil my blog post into a more succinct > paragraph or two and update the PEP myself. > > > > Is this going to require a PEP or if we can agree on the points here are > we just going to do it? If we think it requires a PEP I'm willing to write > it, but I obviously have no issue if we skip that step either. :) > > > > Oh, and we should resolve this before the next release of Python 3.4, > 3.5, or 3.6 so that pathlib can be updated in those releases. > > > > -Brett > > > > > > On Wed, 6 Apr 2016 at 08:09 Ethan Furman wrote: > >> > >> On 04/05/2016 11:57 PM, Nick Coghlan wrote: > >> > On 6 April 2016 at 16:53, Nathaniel Smith wrote: > >> >> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan > wrote: > >> > >> >>> I'd missed the existing precedent in DirEntry.path, so simply taking > >> >>> that and running with it sounds good to me. 
> >> >> > >> >> This makes me twitch slightly, because NumPy has had a whole set of > >> >> problems due to the ancient and minimally-considered decision to > >> >> assume a bunch of ad hoc non-namespaced method names fulfilled some > >> >> protocol -- like all .sum methods will have a signature that's > >> >> compatible with numpy's, and if an object has a .log method then > >> >> surely that computes the logarithm (what else in computing could > "log" > >> >> possibly refer to?), etc. This experience may or may not be relevant, > >> >> I'm not sure -- sometimes these kinds of twitches are good guides to > >> >> intuition, and sometimes they are just knee-jerk responses to an old > >> >> and irrelevant problem :-) > >> >> > >> >> But you might want to at least think about > >> >> how common it might be to have existing objects with unrelated > >> >> attributes that happen to be called "path", and the bizarro problems > >> >> that might be caused if someone accidentally passes one of them to a > >> >> function that expects all .path attributes to be instances of this > new > >> >> protocol. > >> > > >> > sys.path, for example. > >> > > >> > That's why I'd actually prefer the implicit conversion protocol to be > >> > the more explicitly named "__fspath__", with suitable "__fspath__ = > >> > path" assignments added to DirEntry and pathlib. However, I'm also not > >> > offering to actually *do* the work here, and the casting vote goes to > >> > the folks pursuing the implementation effort. > >> > >> If we decide upon __fspath__ (or __path__) I will do the work on pathlib > >> and scandir to add those attributes. > > > > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Wed Apr 6 15:39:12 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 19:39:12 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: On Wed, 6 Apr 2016 at 12:32 Paul Moore wrote: > On 6 April 2016 at 19:32, Brett Cannon wrote: > >> > Now we need clear details. :) Some open questions are: > >> > > >> > 1. Name: __path__, __fspath__, or something else? > >> > >> __fspath__ > > > > +1 for __path__, +0 for __fspath__ (I don't know how widespread the > notion > > that "fs" means "file system" is). > > Agreed. But if we have a builtin, it should follow the name of the > special attribute/method. And I'm not that keen on having a builtin > with a generic name like 'path'. > > >> > 2. Method or attribute? (changes what kind of one-liner you might use > >> > in libraries, but I think historically all protocols have been > >> > methods and the serialized string representation might be costly > to > >> > build) > >> > >> I would prefer an attribute, but yeah I think dunders are typically > >> methods, and I don't see this being special enough to not follow that > >> trend. > > > > Depends on what we want to tell 3rd-party libraries to do to support > pathlib > > if they are on 3.3 or if they are worried about people using Python > 3.4.2 or > > 3.5.1. An attribute still works with `getattr(path, '__path__', path)`. > But > > with a method you probably want either `path.__path__() if hasattr(path, > > '__path__') else path` or `getattr(path, '__path__', lambda: path)()`. > > I'm a little confused by this. To support the older pathlib, they have > to do patharg = str(patharg), because *none* of the proposed > attributes (path or __path__) will exist. > > The getattr trick is needed to support the *new* pathlib, when you > need a real string. 
Currently you need a string if you call stdlib > functions or builtins. If we fix the stdlib/builtins, the need goes > away for those cases, but remains if you need to call libraries that > *don't* support pathlib (os.path will likely be one of those) or do > direct string manipulation. > > In practice, I see the getattr trick as an "easy fix" for libraries > that want to add support but in a minimally-intrusive way. On that > basis, making the trick easy to use is important, which argues for an > attribute. > So then where's the confusion? :) You seem to get the points. I personally find `path.__path__() if hasattr(path, '__path__') else path` also readable (if obviously a bit longer). -Brett > > >> > 3. Built-in? (name is dependent on #1 if we add one) > >> > >> fspath() -- and it would be handy to have a function that return either > >> the __fspath__ results, or the string (if it was one), or raise an > >> exception if neither of the above work out. > > fspath regardless of the name chosen in #1 - a new builtin called path > just has too much likelihood of clashing with user code. > > But I'm not sure we need a builtin. I'm not at all clear how > frequently we expect user code to need to use this protocol. Users > can't use the builtin if they want to be backward compatible, But code > that doesn't need backward compatibility can probably just work with > pathlib (and the stdlib support for it) directly. For display, the > implicit conversion to str is fine. For "get me a string representing > the path", is the "path" attribute being abandoned in favour of this > special method? Yes. > I'm inclined to think that if you are writing "pure > pathlib" code, pathobj.path looks more readable than fspath(pathobj) - > certainly no *less* readable. > I don't' know what you mean by "pure pathlib". You mean code that only works with pathlib objects? Or do you mean code that accepts pathlib objects but uses strings internally? 
-Brett > > But I'm not one of the people who disliked using .path, so I'm > probably not best placed to judge. It would be good if someone who > *does* feel strongly could explain why fspath(pathobj) is better than > pathobj.path. > > > > So: > > > > # Attribute > > def fspath(path): > > hasattr(path, '__path__'): > > return path.__path__ > > if isinstance(path, str): > > return path > > raise NotImplementedError # Or TypeError? > > > > # Method > > def fspath(path): > > try: > > return path.__path__() > > except AttributeError: > > if isinstance(path, str): > > return path > > raise TypeError # Or NotImplementedError? > > You could of course use try/except for the attribute case. Or hasattr > for the method case (where it would avoid masking AttributeError > exceptions raised within the dunder method call (a possibility if user > classes implement their own version of the protocol). > > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Apr 6 15:40:16 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 19:40:16 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <20160406192642.GA11074@phdru.name> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57055B50.2030209@stoneleaf.us> <20160406192642.GA11074@phdru.name> Message-ID: On Wed, 6 Apr 2016 at 12:38 Oleg Broytman wrote: > On Wed, Apr 06, 2016 at 11:54:08AM -0700, Ethan Furman > wrote: > > On 04/06/2016 11:32 AM, Brett Cannon wrote: > > >On Wed, 6 Apr 2016 at 11:06 Ethan Furman wrote: > > >>On 04/06/2016 10:26 AM, Brett Cannon wrote: > > > > >>>Now we need clear details. :) Some open questions are: > > >>> > > >>> 1. Name: __path__, __fspath__, or something else? > > >> > > >>__fspath__ > > > > > >+1 for __path__, +0 for __fspath__ (I don't know how widespread the > > >notion that "fs" means "file system" is). > > > > Maybe __os_path__ then? 
I would rather be explicit about the type of > path > > we are dealing with -- who knows if we won't have __url_path__ in the > future > > (besides Guido, of course ;) > > __pathstr__? __urlstr__? > But we didn't call it __indexint__ either. No need to embed the type in the name. -Brett > > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Apr 6 15:43:34 2016 From: barry at python.org (Barry Warsaw) Date: Wed, 6 Apr 2016 15:43:34 -0400 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <20160406154334.058182b6@subdivisions.wooz.org> On Apr 06, 2016, at 12:44 PM, Nick Coghlan wrote: >The next challenge would then be to make a list of APIs to be updated >for 3.6 to implicitly accept "rich path" objects via the agreed >convention, with pathlib.PurePath used as a test class: > >* open() >* codecs.open() (et al) >* io.* >* os.path.* >* other os functions >* shutil.* >* tempfile.* >* shelve.* >* csv.* Aside from the name of the attribute (though I'm partial to __path__), I think this would go a long way toward making path objects nicer to work with. And right, it doesn't have to be 100% but this would be a big improvement. 
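
A rough sketch of what "implicitly accept rich path objects via the agreed
convention" could look like inside one of those APIs. It assumes the
attribute form and the name __path__ (neither is settled in this thread);
RichPath and basename are made-up names for illustration, not anything in
pathlib or os.path:

```python
import os.path


class RichPath:
    # Hypothetical stand-in for a path class exposing the proposed attribute.
    def __init__(self, *parts):
        self.__path__ = os.path.join(*parts)


def basename(path):
    # Accept either a plain str or any object carrying __path__.
    path = getattr(path, '__path__', path)
    return os.path.basename(path)
```

Callers could then pass basename('/tmp/settings.cfg') or
basename(RichPath('home', 'ethan', 'settings.cfg')) without casting to str
first, which is the kind of interoperability I am missing today.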
Cheers,
-Barry

From ethan at stoneleaf.us Wed Apr 6 16:07:54 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 13:07:54 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us>
Message-ID: <57056C9A.5090105@stoneleaf.us>

On 04/06/2016 12:32 PM, Paul Moore wrote:

> But I'm not one of the people who disliked using .path, so I'm
> probably not best placed to judge. It would be good if someone who
> *does* feel strongly could explain why fspath(pathobj) is better than
> pathobj.path.

fspath() would be useful because you can pass it a str or a Path and get
a str back (or an exception if you pass the wrong thing in). Just like
with Path you can pass a str or a Path and get a Path back (or an
exception if ...).

--
~Ethan~

From ethan at stoneleaf.us Wed Apr 6 16:09:19 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 13:09:19 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57055B50.2030209@stoneleaf.us>
Message-ID: <57056CEF.8010404@stoneleaf.us>

On 04/06/2016 12:18 PM, Chris Angelico wrote:
> On Thu, Apr 7, 2016 at 4:54 AM, Ethan Furman wrote:

>> Maybe __os_path__ then? I would rather be explicit about the type of path
>> we are dealing with -- who knows if we won't have __url_path__ in the future
>> (besides Guido, of course ;)
>>
>
> Bikeshedding furiously... I don't like os_path here as it's too
> similar to os.path; unless that's deliberate?

Well, it is an Operating System Path. ;)

--
~Ethan~

From srkunze at mail.de Wed Apr 6 16:13:09 2016
From: srkunze at mail.de (Sven R.
Kunze)
Date: Wed, 6 Apr 2016 22:13:09 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us>
Message-ID: <57056DD5.5050503@mail.de>

On 06.04.2016 21:02, Alexander Belopolsky wrote:
>
> On Wed, Apr 6, 2016 at 2:32 PM, Brett Cannon wrote:
>
>     +1 for __path__, +0 for __fspath__ (I don't know how widespread
>     the notion that "fs" means "file system" is).
>
> Same here. In the good old days, "fs" stood for a "Font Server."
> And in even older (and better?) days, FS was a "Field Separator."

The future is not the past. ;)

What about

__file_path__

?

Best,
Sven

From ethan at stoneleaf.us Wed Apr 6 16:20:59 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 13:20:59 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To:
References:
Message-ID: <57056FAB.2010703@stoneleaf.us>

On 04/06/2016 02:41 AM, Antoine Pitrou wrote:

> On a concrete point, inheriting str would make the API a horrible,
> confusing, dangerous mess mixing regular string semantics (concatenation
> with +, for example, or indexing) with path-specific semantics and various
> grey areas (should .split() have path semantics or str semantics? what
> is the rule and how are people supposed to remember it?).

While I agree in principle...

> (of course, for PHP or Javascript programmers it may not sound like a
> problem. Let "adding" two IP addresses return the concatenation of
> their string representations...)

Like if you had a subnet of '192.168' and a host of '.11.16' and adding
them together gave you '192.168.11.16'? (yeah, a bit weak)

Or, more appropriately: a path of '/home/ethan/mystuff' + '_bak' so I can
make a copy?
Actually, that would be

  stuff = pathlib.Path('/home/ethan/mystuff')  # no issue here
  backup_stuff = stuff.with_name(stuff.name + '_bak')  # eww

Sure, you can make the argument that `with_suffix('.bak')` is cleaner, but
it is not up to the stdlib to micromanage my code.

Oh, and I do not consort with PHP, and only do so with Javascript when
forced.

--
~Ethan~

From ethan at stoneleaf.us Wed Apr 6 16:22:04 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 13:22:04 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To:
References: <57044567.6070308@sdamon.com>
Message-ID: <57056FEC.6010201@stoneleaf.us>

On 04/05/2016 11:53 PM, Nathaniel Smith wrote:
> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan wrote:

>> I'd missed the existing precedent in DirEntry.path, so simply taking
>> that and running with it sounds good to me.
>
> This makes me twitch slightly, because NumPy has had a whole set of
> problems due to the ancient and minimally-considered decision to
> assume a bunch of ad hoc non-namespaced method names fulfilled some
> protocol -- like all .sum methods will have a signature that's
> compatible with numpy's, and if an object has a .log method then
> surely that computes the logarithm (what else in computing could "log"
> possibly refer to?), etc. This experience may or may not be relevant,
> I'm not sure -- sometimes these kinds of twitches are good guides to
> intuition, and sometimes they are just knee-jerk responses to an old
> and irrelevant problem :-). But you might want to at least think about
> how common it might be to have existing objects with unrelated
> attributes that happen to be called "path", and the bizarro problems
> that might be caused if someone accidentally passes one of them to a
> function that expects all .path attributes to be instances of this new
> protocol.

A very good point, thank you.
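
For the record, a small sketch of how that can bite, using the sys module
as the object with an unrelated attribute called "path". The helper name
naive_fs_str is made up for illustration and is not a proposed API:

```python
import sys


def naive_fs_str(obj):
    # Naive convention: trust any attribute that happens to be called "path".
    return getattr(obj, 'path', obj)


# The sys module has a .path attribute that has nothing to do with this
# protocol: it is the module search path, a list of strings.
result = naive_fs_str(sys)
print(type(result).__name__)
```

A dunder name such as __fspath__ or __path__ would not collide with
existing attributes like this one.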
-- ~Ethan~ From brett at python.org Wed Apr 6 16:28:02 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 20:28:02 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <57056DD5.5050503@mail.de> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57056DD5.5050503@mail.de> Message-ID: On Wed, 6 Apr 2016 at 13:20 Sven R. Kunze wrote: > On 06.04.2016 21:02, Alexander Belopolsky wrote: > > On Wed, Apr 6, 2016 at 2:32 PM, Brett Cannon wrote: > > +1 for __path__, +0 for __fspath__? (I don't know how widespread the >> notion that "fs" means "file system" is). > > > Same here.? In the good old days, "fs" stood for a "Font Server." ? And > in even older (and better?) days, FS was a "Field Separator." > > > The future is not the past. ;) > > > What about > > __file_path__ > Can be a directory as well (and you could argue semantics of file system inodes, beginners won't know the subtlety and/or wonder where __dir_path__ is). -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Apr 6 16:47:05 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 6 Apr 2016 22:47:05 +0200 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <5704909E.8070908@stoneleaf.us> Message-ID: <570575C9.7060208@mail.de> On 06.04.2016 07:00, Guido van Rossum wrote: > On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman wrote: >> [...] we can't do: >> >> app_root = Path(...) >> config = app_root/'settings.cfg' >> with open(config) as blah: >> # whatever >> >> It feels like instead of addressing this basic disconnect, the answer has >> instead been: add that to pathlib! Which works great -- until a user or a >> library gets this path object and tries to use something from os on it. > I agree that asking for config.open() isn't the right answer here > (even if it happens to work). How come? 
> But in this example, once 3.5.2 is out, > the solution would be to use open(config.path), and that will also > work when passing it to a library. Is it still unacceptable then? I think so. Although in this example I would prefer the shorter config.open alternative as I am lazy. I still cannot remember what the concrete issue was why we dropped pathlib the same day we gave it a try. It was something really stupid and although I hoped to reduce the size of the code, it was less readable. But it was not the path->str issue but something more mundane. It was something that forced us to use os[.path] as Path didn't provide something equivalent. Cannot remember..... Best, Sven From srkunze at mail.de Wed Apr 6 16:54:13 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 6 Apr 2016 22:54:13 +0200 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57056DD5.5050503@mail.de> Message-ID: <57057775.3040307@mail.de> On 06.04.2016 22:28, Brett Cannon wrote: > On Wed, 6 Apr 2016 at 13:20 Sven R. Kunze > wrote: > > > What about > > __file_path__ > > > Can be a directory as well (and you could argue semantics of file > system inodes, beginners won't know the subtlety and/or wonder where > __dir_path__ is). Good point. Well, then __fspath__ for me. I knew instantly what it means especially considering btrfs, ntfs, xfs, zfs, etc. Furthermore, we MIGHT later want some URI support, so I don't know off the top of my head if there's a difference between __fspath__ and __urlpath__ but better separate it now. Later we can re-merge then if necessary. Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Wed Apr 6 16:55:41 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 20:55:41 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <57057775.3040307@mail.de> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57056DD5.5050503@mail.de> <57057775.3040307@mail.de> Message-ID: On Wed, 6 Apr 2016 at 13:54 Sven R. Kunze wrote: > On 06.04.2016 22:28, Brett Cannon wrote: > > On Wed, 6 Apr 2016 at 13:20 Sven R. Kunze < > srkunze at mail.de> wrote: > > >> What about >> >> __file_path__ >> > > Can be a directory as well (and you could argue semantics of file system > inodes, beginners won't know the subtlety and/or wonder where __dir_path__ > is). > > > Good point. > > Well, then __fspath__ for me. > > > I knew instantly what it means especially considering btrfs, ntfs, xfs, > zfs, etc. > > Furthermore, we MIGHT later want some URI support, so I don't know off the > top of my head if there's a difference between __fspath__ and __urlpath__ > but better separate it now. Later we can re-merge then if necessary. > There's a difference as a URL represents something different than a file system path (URI doesn't necessarily). Plus the serialized format would be different, etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Wed Apr 6 17:03:05 2016 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 6 Apr 2016 16:03:05 -0500 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: On Apr 6, 2016 12:47 PM, "Brett Cannon" wrote: > > > > On Wed, 6 Apr 2016 at 10:41 Wes Turner wrote: >> >> * +1 for __path__, __fspath__ >> (though I don't know what each does) > > > Returns a string representing a file system path. Why two methods? __uripath__? 
(scheme, host (port), path, query, fragment) so, not __uripath__ what would be the difference between __path__ and __fspath__? > >> >> * why not Text(basestring / bytestring) and pathlib.Path(Text)? > > > See the points about next() vs __next__() Path(b'123') / u'456' similarly, Path(b'123') / UTF8 / UTF16 > >> >> * are there examples of cases where this cannot be? > > > I don't understand what you think "cannot be". What one recommends (path.py(str) / str(pathlib.Path()) + getattr) is distinct from what any given programmer chooses to do with their code. > >> >> * if not, +1 for subclassing str/Text >> >> * where are the examples of method collisions between the str interface and the pathlib.Path interface? > > > There aren't any and that's partially why some people wanted the str subclass to begin with. > > Please consider this thread a str-subclass-free zone. This line of discussion is to flesh out the proposal for a path protocol as a proposal against subclassing str, not to settle the whole discussion outright. If you want to continue to debate the subclassing-str side of this please use the other thread. this seems to be a sudden, arbitrary distinction. are these proposals necessarily disjoint? so, adding getattr(path, '__path__', path) to stdlib and other code is going to prevent which edge cases (before os.path.normpath()* anyway) for which benefit? when do I do getattr(path, '__fspath__', path)? 
> > -Brett > >> >> * str.__div__ is nonsensical >> * pathlib.Path.__div__ is super-useful ah, not .__add__() but .append() I suppose the request here is for the cases which would be prevented (that we need to learn to look for) >> >> >> >> On Apr 6, 2016 10:10 AM, "Ethan Furman" wrote: >>> >>> On 04/05/2016 11:57 PM, Nick Coghlan wrote: >>>> >>>> On 6 April 2016 at 16:53, Nathaniel Smith wrote: >>>>> >>>>> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan wrote: >>> >>> >>>>>> I'd missed the existing precedent in DirEntry.path, so simply taking >>>>>> that and running with it sounds good to me. >>>>> >>>>> >>>>> This makes me twitch slightly, because NumPy has had a whole set of >>>>> problems due to the ancient and minimally-considered decision to >>>>> assume a bunch of ad hoc non-namespaced method names fulfilled some >>>>> protocol -- like all .sum methods will have a signature that's >>>>> compatible with numpy's, and if an object has a .log method then >>>>> surely that computes the logarithm (what else in computing could "log" >>>>> possibly refer to?), etc. This experience may or may not be relevant, >>>>> I'm not sure -- sometimes these kinds of twitches are good guides to >>>>> intuition, and sometimes they are just knee-jerk responses to an old >>>>> and irrelevant problem :-) >>>>> >>>>> But you might want to at least think about >>>>> how common it might be to have existing objects with unrelated >>>>> attributes that happen to be called "path", and the bizarro problems >>>>> that might be caused if someone accidentally passes one of them to a >>>>> function that expects all .path attributes to be instances of this new >>>>> protocol. >>>> >>>> >>>> sys.path, for example. >>>> >>>> That's why I'd actually prefer the implicit conversion protocol to be >>>> the more explicitly named "__fspath__", with suitable "__fspath__ = >>>> path" assignments added to DirEntry and pathlib. 
However, I'm also not >>>> offering to actually *do* the work here, and the casting vote goes to >>>> the folks pursuing the implementation effort. >>> >>> >>> If we decide upon __fspath__ (or __path__) I will do the work on pathlib and scandir to add those attributes. >>> >>> -- >>> ~Ethan~ >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> >>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/brett%40python.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Apr 6 17:03:55 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 14:03:55 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: <570575C9.7060208@mail.de> References: <5704909E.8070908@stoneleaf.us> <570575C9.7060208@mail.de> Message-ID: <570579BB.5070602@stoneleaf.us> On 04/06/2016 01:47 PM, Sven R. Kunze wrote: > I still cannot remember what the concrete issue was why we dropped > pathlib the same day we gave it a try. It was something really stupid > and although I hoped to reduce the size of the code, it was less > readable. But it was not the path->str issue but something more mundane. > It was something that forced us to use os[.path] as Path didn't provide > something equivalent. Cannot remember..... I'm willing to guess that if you had been able to just call os.whatever(your_path_obj) it would have been at most a minor annoyance. 
-- ~Ethan~ From brett at python.org Wed Apr 6 17:07:59 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 21:07:59 +0000 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: On Wed, 6 Apr 2016 at 14:03 Wes Turner wrote: > > On Apr 6, 2016 12:47 PM, "Brett Cannon" wrote: > > > > > > > > On Wed, 6 Apr 2016 at 10:41 Wes Turner wrote: > >> > >> * +1 for __path__, __fspath__ > >> (though I don't know what each does) > > > > > > Returns a string representing a file system path. > > Why two methods? __uripath__? > > (scheme, host (port), path, query, fragment) so, not __uripath__ > > what would be the difference between __path__ and __fspath__? > There is no difference; we're trying to choose a name. > > > >> > >> * why not Text(basestring / bytestring) and pathlib.Path(Text)? > > > > > > See the points about next() vs __next__() > > Path(b'123') / u'456' > > similarly, > Path(b'123') / UTF8 / UTF16 > As other people pointed out on the other thread, while bytes paths do exist, we don't want to promote them as they are a mess to work with. -Brett > > > >> > >> * are there examples of cases where this cannot be? > > > > > > I don't understand what you think "cannot be". > > What one recommends (path.py(str) / str(pathlib.Path()) + getattr) is > distinct from what any given programmer chooses to do with their code. > > > > >> > >> * if not, +1 for subclassing str/Text > >> > >> * where are the examples of method collisions between the str > interface and the pathlib.Path interface? > > > > > > There aren't any and that's partially why some people wanted the str > subclass to begin with. > > > > Please consider this thread a str-subclass-free zone. This line of > discussion is to flesh out the proposal for a path protocol as a proposal > against subclassing str, not to settle the whole discussion outright. 
If > you want to continue to debate the subclassing-str side of this please use > the other thread. > > this seems to be a sudden, arbitrary distinction. > > are these proposals necessarily disjoint? > > so, > adding getattr(path, '__path__', path) to stdlib and other code is going > to prevent which edge cases (before os.path.normpath()* anyway) for which > benefit? > > when do I do getattr(path, '__fspath__', path)? > > > > > -Brett > > > >> > >> * str.__div__ is nonsensical > >> * pathlib.Path.__div__ is super-useful > > ah, not .__add__() but .append() > > I suppose the request here is for the cases which would be prevented (that > we need to learn to look for) > > >> > >> > >> > >> On Apr 6, 2016 10:10 AM, "Ethan Furman" wrote: > >>> > >>> On 04/05/2016 11:57 PM, Nick Coghlan wrote: > >>>> > >>>> On 6 April 2016 at 16:53, Nathaniel Smith wrote: > >>>>> > >>>>> On Tue, Apr 5, 2016 at 11:29 PM, Nick Coghlan > wrote: > >>> > >>> > >>>>>> I'd missed the existing precedent in DirEntry.path, so simply taking > >>>>>> that and running with it sounds good to me. > >>>>> > >>>>> > >>>>> This makes me twitch slightly, because NumPy has had a whole set of > >>>>> problems due to the ancient and minimally-considered decision to > >>>>> assume a bunch of ad hoc non-namespaced method names fulfilled some > >>>>> protocol -- like all .sum methods will have a signature that's > >>>>> compatible with numpy's, and if an object has a .log method then > >>>>> surely that computes the logarithm (what else in computing could > "log" > >>>>> possibly refer to?), etc. 
This experience may or may not be relevant, > >>>>> I'm not sure -- sometimes these kinds of twitches are good guides to > >>>>> intuition, and sometimes they are just knee-jerk responses to an old > >>>>> and irrelevant problem :-) > >>>>> > >>>>> But you might want to at least think about > >>>>> how common it might be to have existing objects with unrelated > >>>>> attributes that happen to be called "path", and the bizarro problems > >>>>> that might be caused if someone accidentally passes one of them to a > >>>>> function that expects all .path attributes to be instances of this > new > >>>>> protocol. > >>>> > >>>> > >>>> sys.path, for example. > >>>> > >>>> That's why I'd actually prefer the implicit conversion protocol to be > >>>> the more explicitly named "__fspath__", with suitable "__fspath__ = > >>>> path" assignments added to DirEntry and pathlib. However, I'm also not > >>>> offering to actually *do* the work here, and the casting vote goes to > >>>> the folks pursuing the implementation effort. > >>> > >>> > >>> If we decide upon __fspath__ (or __path__) I will do the work on > pathlib and scandir to add those attributes. > >>> > >>> -- > >>> ~Ethan~ > >>> _______________________________________________ > >>> Python-Dev mailing list > >>> Python-Dev at python.org > >>> https://mail.python.org/mailman/listinfo/python-dev > >>> > >>> Unsubscribe: > https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > >> > >> _______________________________________________ > >> Python-Dev mailing list > >> Python-Dev at python.org > >> https://mail.python.org/mailman/listinfo/python-dev > >> Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Apr 6 17:15:17 2016 From: srkunze at mail.de (Sven R. 
Kunze)
Date: Wed, 6 Apr 2016 23:15:17 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57056DD5.5050503@mail.de> <57057775.3040307@mail.de>
Message-ID: <57057C65.3070802@mail.de>

On 06.04.2016 22:55, Brett Cannon wrote:
> On Wed, 6 Apr 2016 at 13:54 Sven R. Kunze wrote:
>
>     Furthermore, we MIGHT later want some URI support, so I don't know
>     off the top of my head if there's a difference between __fspath__
>     and __urlpath__ but better separate it now. Later we can re-merge
>     then if necessary.
>
>
> There's a difference as a URL represents something different than a
> file system path (URI doesn't necessarily). Plus the serialized format
> would be different, etc.

Sure. URLs and URIs are more than just paths. I would expect
__urlpath__ to be different than __url__ itself, but that is a
different discussion.

So, __fspath__ for me. :)

Best,
Sven

From srkunze at mail.de Wed Apr 6 17:27:07 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Wed, 6 Apr 2016 23:27:07 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <570579BB.5070602@stoneleaf.us>
References: <5704909E.8070908@stoneleaf.us> <570575C9.7060208@mail.de> <570579BB.5070602@stoneleaf.us>
Message-ID: <57057F2B.1020205@mail.de>

Yeah, sure. But it was more like this on a single line:

os.missing1(str(our_path.something1)) *** os.missing2(str(our_path.something1)) *** os.missing1(str(our_path.something1))

And then it started to get messy because you need to work on a single
long line or you need to open more than one line.

It was a simple thing actually. Like repeating the same calls to pathlib
just because we need to switch to os.path....

I will ask my colleague if he remembers or if we can recover the code
tomorrow...
Best, Sven NOTE to myself: getting old, need to write down everything On 06.04.2016 23:03, Ethan Furman wrote: > On 04/06/2016 01:47 PM, Sven R. Kunze wrote: > >> I still cannot remember what the concrete issue was why we dropped >> pathlib the same day we gave it a try. It was something really stupid >> and although I hoped to reduce the size of the code, it was less >> readable. But it was not the path->str issue but something more mundane. >> It was something that forced us to use os[.path] as Path didn't provide >> something equivalent. Cannot remember..... > > I'm willing to guess that if you had been able to just call > > os.whatever(your_path_obj) > > it would have been at most a minor annoyance. > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/srkunze%40mail.de From p.f.moore at gmail.com Wed Apr 6 18:22:50 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 Apr 2016 23:22:50 +0100 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: On 6 April 2016 at 20:39, Brett Cannon wrote: >> I'm a little confused by this. To support the older pathlib, they have >> to do patharg = str(patharg), because *none* of the proposed >> attributes (path or __path__) will exist. >> >> The getattr trick is needed to support the *new* pathlib, when you >> need a real string. Currently you need a string if you call stdlib >> functions or builtins. If we fix the stdlib/builtins, the need goes >> away for those cases, but remains if you need to call libraries that >> *don't* support pathlib (os.path will likely be one of those) or do >> direct string manipulation. 
>> >> In practice, I see the getattr trick as an "easy fix" for libraries >> that want to add support but in a minimally-intrusive way. On that >> basis, making the trick easy to use is important, which argues for an >> attribute. > > So then where's the confusion? :) You seem to get the points. I personally > find `path.__path__() if hasattr(path, '__path__') else path` also readable > (if obviously a bit longer). The confusion is that you seem to be saying that people can use getattr(path, '__path__', path) to support older versions of Python. But the older versions are precisely the ones that don't have __path__ so you won't be supporting them. >> >> > 3. Built-in? (name is dependent on #1 if we add one) >> >> >> >> fspath() -- and it would be handy to have a function that return either >> >> the __fspath__ results, or the string (if it was one), or raise an >> >> exception if neither of the above work out. >> >> fspath regardless of the name chosen in #1 - a new builtin called path >> just has too much likelihood of clashing with user code. >> >> But I'm not sure we need a builtin. I'm not at all clear how >> frequently we expect user code to need to use this protocol. Users >> can't use the builtin if they want to be backward compatible, But code >> that doesn't need backward compatibility can probably just work with >> pathlib (and the stdlib support for it) directly. For display, the >> implicit conversion to str is fine. For "get me a string representing >> the path", is the "path" attribute being abandoned in favour of this >> special method? > > Yes. OK. So the idiom to get a string from a known Path object would be any of: 1. str(path) 2. fspath(path) 3. path.__path__() (1) is safe if you know you have a Path object, but could incorrectly convert non-Path objects. (2) is safe in all cases. (3) is ugly. Did I miss any options? So I think we need a builtin. 
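To make the difference between options (1) and (2) concrete, here is a minimal sketch. `strict_fspath` is a hypothetical stand-in, not a proposed API: it assumes the eventual helper would reject anything that is neither a str nor a pathlib path, which this thread has not settled.

```python
import pathlib


def strict_fspath(obj):
    """Hypothetical stand-in for option (2); the real behaviour is undecided."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, pathlib.PurePath):
        return str(obj)
    # Assumption: non-path objects are rejected rather than converted.
    raise TypeError("expected str or a pathlib path, got %s" % type(obj).__name__)


# Option (1): str() works for a real path object...
print(str(pathlib.PurePosixPath("a/b")))
# ...but it also converts objects that are not paths at all.
print(repr(str(None)))

# Option (2), under the assumption above, refuses the same mistake.
try:
    strict_fspath(None)
except TypeError as exc:
    print("rejected:", exc)
```

The point is only that str() accepts every object, so a mistake such as passing None becomes the file name 'None' instead of an error.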
Code that needs to be backward compatible will still have to use
str(path), because neither the builtin nor the __path__ protocol will
exist in older versions of Python. Maybe a compatibility library could
add

    try:
        fspath  # already available as a builtin?
    except NameError:
        try:
            import pathlib
            def fspath(p):
                if isinstance(p, pathlib.Path):
                    return str(p)
                return p
        except ImportError:
            # no pathlib at all, so only strings can be passed in
            def fspath(p):
                return p

It's messy, like all compatibility code, but it allows code to use
fspath(p) in older versions.

>> I'm inclined to think that if you are writing "pure
>> pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
>> certainly no *less* readable.
>
> I don't know what you mean by "pure pathlib". You mean code that only works
> with pathlib objects? Or do you mean code that accepts pathlib objects but
> uses strings internally?

I mean code that knows it has a Path object to work with (and not a
string or anything else). But the point is moot if the path attribute
is going away.

Other than to say that I do prefer the name "path", I just don't think
it's a reasonable name for a builtin. Even if it's OK for user
variables to have the same name as builtins, IDEs tend to colour
builtins differently, which is distracting. (Temporary variables named
"file" or "dir" are the ones I hit frequently...)

If all we're debating is the name, though, I think we're pretty much there :-)

Paul

From brett at python.org Wed Apr 6 18:46:24 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 06 Apr 2016 22:46:24 +0000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us>
Message-ID:

On Wed, 6 Apr 2016 at 15:22 Paul Moore wrote:

> On 6 April 2016 at 20:39, Brett Cannon wrote:
> >> I'm a little confused by this. To support the older pathlib, they have
> >> to do patharg = str(patharg), because *none* of the proposed
> >> attributes (path or __path__) will exist.
> >> > >> The getattr trick is needed to support the *new* pathlib, when you > >> need a real string. Currently you need a string if you call stdlib > >> functions or builtins. If we fix the stdlib/builtins, the need goes > >> away for those cases, but remains if you need to call libraries that > >> *don't* support pathlib (os.path will likely be one of those) or do > >> direct string manipulation. > >> > >> In practice, I see the getattr trick as an "easy fix" for libraries > >> that want to add support but in a minimally-intrusive way. On that > >> basis, making the trick easy to use is important, which argues for an > >> attribute. > > > > So then where's the confusion? :) You seem to get the points. I > personally > > find `path.__path__() if hasattr(path, '__path__') else path` also > readable > > (if obviously a bit longer). > > The confusion is that you seem to be saying that people can use > getattr(path, '__path__', path) to support older versions of Python. > But the older versions are precisely the ones that don't have __path__ > so you won't be supporting them. > Because pathlib is provisional the change will go into the next releases of Python 3.4, 3.5, and in 3.6 so new-old will have whatever we do. :) I think the key point is that this sort of thing will occur before you have access to some new built-in or something. > > >> >> > 3. Built-in? (name is dependent on #1 if we add one) > >> >> > >> >> fspath() -- and it would be handy to have a function that return > either > >> >> the __fspath__ results, or the string (if it was one), or raise an > >> >> exception if neither of the above work out. > >> > >> fspath regardless of the name chosen in #1 - a new builtin called path > >> just has too much likelihood of clashing with user code. > >> > >> But I'm not sure we need a builtin. I'm not at all clear how > >> frequently we expect user code to need to use this protocol. 
Users > >> can't use the builtin if they want to be backward compatible, But code > >> that doesn't need backward compatibility can probably just work with > >> pathlib (and the stdlib support for it) directly. For display, the > >> implicit conversion to str is fine. For "get me a string representing > >> the path", is the "path" attribute being abandoned in favour of this > >> special method? > > > > Yes. > > OK. So the idiom to get a string from a known Path object would be any of: > > 1. str(path) > 2. fspath(path) > 3. path.__path__() > > (1) is safe if you know you have a Path object, but could incorrectly > convert non-Path objects. (2) is safe in all cases. (3) is ugly. Did I > miss any options? > Other than path.__path__ being an attribute, nope. > > So I think we need a builtin. > Well, the ugliness shouldn't survive forever if the community shifts over to using pathlib while the built-in will. We also don't have a built-in for __index__() so it depends on whether we expect this sort of thing to be the purview of library authors or if normal people will be interacting with it (it's probably both during the transition, but I don't know afterwards). > > Code that needs to be backward compatible will still have to use > str(path), because neither the builtin nor the __path__ protocol will > exist in older versions of Python. str(path) will definitely work, path.__path__ will work if you're running the next set of bugfix releases. fspath(path) will only work in Python 3.6 and newer. > Maybe a compatibility library could > add > > try: > fspath > except NameError: > try: > import pathlib > def fspath(p): > if isinstance(p, pathlib.Path): > return str(p) > return p > except ImportError: > def fspath(p): > return p > > It's messy, like all compatibility code, but it allows code to use > fspath(p) in older versions. > I would tweak it to check for __fspath__ before it resorted to calling str(), but yes, that could be something people use. 
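A minimal sketch of that tweak, under assumptions this thread has not settled: that `__fspath__` is a zero-argument method and that pathlib is importable. The name `compat_fspath` is hypothetical.

```python
import pathlib


def compat_fspath(p):
    """Return a path string for p, preferring the proposed __fspath__ hook.

    Hypothetical helper: assumes __fspath__ is a zero-argument method.
    """
    # Look the hook up on the type, as is done for other special methods.
    hook = getattr(type(p), "__fspath__", None)
    if hook is not None:
        return hook(p)
    # Fall back to str() only for objects known to be pathlib paths.
    if isinstance(p, pathlib.PurePath):
        return str(p)
    # Anything else (typically a plain str) is returned unchanged.
    return p


class DemoPath:
    """Illustrative third-party path type that implements the hook."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


print(compat_fspath(DemoPath("/tmp/demo")))
print(compat_fspath(pathlib.PurePosixPath("a/b")))
print(compat_fspath("plain.txt"))
```

Returning the argument unchanged in the last branch keeps the behaviour of the original shim for plain strings; whether non-path objects should instead raise is one of the open questions.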
>
> >> I'm inclined to think that if you are writing "pure
> >> pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
> >> certainly no *less* readable.
> >
> > I don't know what you mean by "pure pathlib". You mean code that only works
> > with pathlib objects? Or do you mean code that accepts pathlib objects but
> > uses strings internally?
>
> I mean code that knows it has a Path object to work with (and not a
> string or anything else). But the point is moot if the path attribute
> is going away.
>
> Other than to say that I do prefer the name "path", I just don't think
> it's a reasonable name for a builtin. Even if it's OK for user
> variables to have the same name as builtins, IDEs tend to colour
> builtins differently, which is distracting. (Temporary variables named
> "file" or "dir" are the ones I hit frequently...)
>
> If all we're debating is the name, though, I think we're pretty much there
> :-)
>

It seems like __fspath__ may be leading as a name, but not that many
people have spoken up. But that is not the only thing still up for debate. :)

We have not settled on whether a built-in is necessary. Maybe whatever
function we come up with should live in pathlib itself and not be a
built-in?

We have also not settled on whether __fspath__ should be a method or
attribute, as that changes the boilerplate one-liner people may use if a
built-in isn't available. This is the first half of the protocol.

What exactly should this helper function do? E.g. does it simply return
its argument if __fspath__ isn't defined, or does it check for __fspath__,
then check whether the argument is an instance of str, and otherwise raise
TypeError? This is the second half of the protocol and will end up
defining what a "path-like object" represents.

From greg at krypto.org Wed Apr 6 18:54:42 2016
From: greg at krypto.org (Gregory P.
Smith) Date: Wed, 06 Apr 2016 22:54:42 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: Note: While I do not object to the bike shed colors being proposed, if you call the attribute .__path__ that is somewhat confusing when thinking about the import system which declares that *"any module that contains a __path__ attribute is considered a package"*. So would module.__path__ become a Path instance in a potential future making module.__path__.__path__ meaningfully confusing? ;) I'm not worried about people who shove pathlib.Path instances in as values into sys.modules and expect anything but pain. :P __gps__ On Wed, Apr 6, 2016 at 3:46 PM Brett Cannon wrote: > On Wed, 6 Apr 2016 at 15:22 Paul Moore wrote: > >> On 6 April 2016 at 20:39, Brett Cannon wrote: >> >> I'm a little confused by this. To support the older pathlib, they have >> >> to do patharg = str(patharg), because *none* of the proposed >> >> attributes (path or __path__) will exist. >> >> >> >> The getattr trick is needed to support the *new* pathlib, when you >> >> need a real string. Currently you need a string if you call stdlib >> >> functions or builtins. If we fix the stdlib/builtins, the need goes >> >> away for those cases, but remains if you need to call libraries that >> >> *don't* support pathlib (os.path will likely be one of those) or do >> >> direct string manipulation. >> >> >> >> In practice, I see the getattr trick as an "easy fix" for libraries >> >> that want to add support but in a minimally-intrusive way. On that >> >> basis, making the trick easy to use is important, which argues for an >> >> attribute. >> > >> > So then where's the confusion? :) You seem to get the points. I >> personally >> > find `path.__path__() if hasattr(path, '__path__') else path` also >> readable >> > (if obviously a bit longer). 
>> >> The confusion is that you seem to be saying that people can use >> getattr(path, '__path__', path) to support older versions of Python. >> But the older versions are precisely the ones that don't have __path__ >> so you won't be supporting them. >> > > Because pathlib is provisional the change will go into the next releases > of Python 3.4, 3.5, and in 3.6 so new-old will have whatever we do. :) I > think the key point is that this sort of thing will occur before you have > access to some new built-in or something. > > >> >> >> >> > 3. Built-in? (name is dependent on #1 if we add one) >> >> >> >> >> >> fspath() -- and it would be handy to have a function that return >> either >> >> >> the __fspath__ results, or the string (if it was one), or raise an >> >> >> exception if neither of the above work out. >> >> >> >> fspath regardless of the name chosen in #1 - a new builtin called path >> >> just has too much likelihood of clashing with user code. >> >> >> >> But I'm not sure we need a builtin. I'm not at all clear how >> >> frequently we expect user code to need to use this protocol. Users >> >> can't use the builtin if they want to be backward compatible, But code >> >> that doesn't need backward compatibility can probably just work with >> >> pathlib (and the stdlib support for it) directly. For display, the >> >> implicit conversion to str is fine. For "get me a string representing >> >> the path", is the "path" attribute being abandoned in favour of this >> >> special method? >> > >> > Yes. >> >> OK. So the idiom to get a string from a known Path object would be any of: >> >> 1. str(path) >> 2. fspath(path) >> 3. path.__path__() >> >> (1) is safe if you know you have a Path object, but could incorrectly >> convert non-Path objects. (2) is safe in all cases. (3) is ugly. Did I >> miss any options? >> > > Other than path.__path__ being an attribute, nope. > > >> >> So I think we need a builtin. 
>> > > Well, the ugliness shouldn't survive forever if the community shifts over > to using pathlib while the built-in will. We also don't have a built-in for > __index__() so it depends on whether we expect this sort of thing to be the > purview of library authors or if normal people will be interacting with it > (it's probably both during the transition, but I don't know afterwards). > > >> >> Code that needs to be backward compatible will still have to use >> str(path), because neither the builtin nor the __path__ protocol will >> exist in older versions of Python. > > > str(path) will definitely work, path.__path__ will work if you're running > the next set of bugfix releases. fspath(path) will only work in Python 3.6 > and newer. > > >> Maybe a compatibility library could >> add >> >> try: >> fspath >> except NameError: >> try: >> import pathlib >> def fspath(p): >> if isinstance(p, pathlib.Path): >> return str(p) >> return p >> except ImportError: >> def fspath(p): >> return p >> >> It's messy, like all compatibility code, but it allows code to use >> fspath(p) in older versions. >> > > I would tweak it to check for __fspath__ before it resorted to calling > str(), but yes, that could be something people use. > > >> >> >> I'm inclined to think that if you are writing "pure >> >> pathlib" code, pathobj.path looks more readable than fspath(pathobj) - >> >> certainly no *less* readable. >> > >> > I don't' know what you mean by "pure pathlib". You mean code that only >> works >> > with pathlib objects? Or do you mean code that accepts pathlib objects >> but >> > uses strings internally? >> >> I mean code that knows it has a Path object to work with (and not a >> string or anything else). But the point is moot if the path attribute >> is going away. >> >> Other than to say that I do prefer the name "path", I just don't think >> it's a reasonable name for a builtin. 
Even if it's OK for user >> variables to have the same name as builtins, IDEs tend to colour >> builtins differently, which is distracting. (Temporary variables named >> "file" or "dir" are the ones I hit frequently...) >> >> If all we're debating is the name, though, I think we're pretty much >> there :-) >> > > It seems like __fspath__ may be leading as a name, but not that many > people have spoken up. But that is not the only thing still up for debate. > :) > > We have not settled on whether a built-in is necessary. Maybe whatever > function we come with should live in pathlib itself and not have it be a > built-in? > > We have also not settled on whether __fspath__ should be a method or > attribute as that changes the boilerplate one-liner people may use if a > built-in isn't available. This is the first half of the protocol. > > What exactly should this helper function do? E.g. does it simply return > its argument if __fspath__ isn't defined, or does it check for __fspath__, > then if it's an instance of str, then TypeError? This is the second half of > the protocol and will end up defining what a "path-like object" represents. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Apr 6 18:59:25 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Apr 2016 10:59:25 +1200 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: <57044567.6070308@sdamon.com> Message-ID: <570594CD.5010805@canterbury.ac.nz> Nick Coghlan wrote: > I'd missed the existing precedent in DirEntry.path, so simply taking > that and running with it sounds good to me. It's not quite the same thing, though. 
DirEntry.path takes something that is not a path
(a DirEntry instance) and gives you a path representing
it, so the name makes sense. But a Path instance is
already "a path", so Path.path is weird. Path.str
would make more sense.

--
Greg

From njs at pobox.com Wed Apr 6 19:25:15 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 Apr 2016 16:25:15 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us>
Message-ID:

On Wed, Apr 6, 2016 at 3:46 PM, Brett Cannon wrote:
>
>
> On Wed, 6 Apr 2016 at 15:22 Paul Moore wrote:
>>
>> So I think we need a builtin.
>
>
> Well, the ugliness shouldn't survive forever if the community shifts over to
> using pathlib while the built-in will. We also don't have a built-in for
> __index__() so it depends on whether we expect this sort of thing to be the
> purview of library authors or if normal people will be interacting with it
> (it's probably both during the transition, but I don't know afterwards).

For __index__ the "built-in" is:

    from operator import index

-n

--
Nathaniel J. Smith -- https://vorpus.org

From njs at pobox.com Wed Apr 6 19:27:13 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 Apr 2016 16:27:13 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us>
Message-ID:

On Wed, Apr 6, 2016 at 3:54 PM, Gregory P. Smith wrote:
> Note: While I do not object to the bike shed colors being proposed, if you
> call the attribute .__path__ that is somewhat confusing when thinking about
> the import system which declares that "any module that contains a __path__
> attribute is considered a package".

To me this observation seems to rule out __path__ as an option: even
if they wouldn't clash in practice, then right now googling __path__
sends you straight to the import system documentation.
If we overload the meaning of the string then it'll make a mess of the trying-to-figure-out-what-this-__thing__-is experience. -n -- Nathaniel J. Smith -- https://vorpus.org From brett at python.org Wed Apr 6 19:26:58 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 23:26:58 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: On Wed, 6 Apr 2016 at 16:25 Nathaniel Smith wrote: > On Wed, Apr 6, 2016 at 3:46 PM, Brett Cannon wrote: > > > > > > On Wed, 6 Apr 2016 at 15:22 Paul Moore wrote: > >> > >> So I think we need a builtin. > > > > > > Well, the ugliness shouldn't survive forever if the community shifts > over to > > using pathlib while the built-in will. We also don't have a built-in for > > __index__() so it depends on whether we expect this sort of thing to be > the > > purview of library authors or if normal people will be interacting with > it > > (it's probably both during the transition, but I don't know afterwards). > > For __index__ the "built-in" is: > > from operator import index > Which suggests perhaps we should have pathlib.fspath() instead of a built-in. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Apr 6 19:27:27 2016 From: brett at python.org (Brett Cannon) Date: Wed, 06 Apr 2016 23:27:27 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: On Wed, 6 Apr 2016 at 15:54 Gregory P. Smith wrote: > Note: While I do not object to the bike shed colors being proposed, if you > call the attribute .__path__ that is somewhat confusing when thinking about > the import system which declares that *"any module that contains a > __path__ attribute is considered a package"*. 
> > So would module.__path__ become a Path instance in a potential future > making module.__path__.__path__ meaningfully confusing? ;) > > I'm not worried about people who shove pathlib.Path instances in as values > into sys.modules and expect anything but pain. :P > Ah, good point. I think that kills __path__ then as an option. -Brett > > __gps__ > > > > On Wed, Apr 6, 2016 at 3:46 PM Brett Cannon wrote: > >> On Wed, 6 Apr 2016 at 15:22 Paul Moore wrote: >> >>> On 6 April 2016 at 20:39, Brett Cannon wrote: >>> >> I'm a little confused by this. To support the older pathlib, they have >>> >> to do patharg = str(patharg), because *none* of the proposed >>> >> attributes (path or __path__) will exist. >>> >> >>> >> The getattr trick is needed to support the *new* pathlib, when you >>> >> need a real string. Currently you need a string if you call stdlib >>> >> functions or builtins. If we fix the stdlib/builtins, the need goes >>> >> away for those cases, but remains if you need to call libraries that >>> >> *don't* support pathlib (os.path will likely be one of those) or do >>> >> direct string manipulation. >>> >> >>> >> In practice, I see the getattr trick as an "easy fix" for libraries >>> >> that want to add support but in a minimally-intrusive way. On that >>> >> basis, making the trick easy to use is important, which argues for an >>> >> attribute. >>> > >>> > So then where's the confusion? :) You seem to get the points. I >>> personally >>> > find `path.__path__() if hasattr(path, '__path__') else path` also >>> readable >>> > (if obviously a bit longer). >>> >>> The confusion is that you seem to be saying that people can use >>> getattr(path, '__path__', path) to support older versions of Python. >>> But the older versions are precisely the ones that don't have __path__ >>> so you won't be supporting them. >>> >> >> Because pathlib is provisional the change will go into the next releases >> of Python 3.4, 3.5, and in 3.6 so new-old will have whatever we do. 
:) I >> think the key point is that this sort of thing will occur before you have >> access to some new built-in or something. >> >> >>> >>> >> >> > 3. Built-in? (name is dependent on #1 if we add one) >>> >> >> >>> >> >> fspath() -- and it would be handy to have a function that return >>> either >>> >> >> the __fspath__ results, or the string (if it was one), or raise an >>> >> >> exception if neither of the above work out. >>> >> >>> >> fspath regardless of the name chosen in #1 - a new builtin called path >>> >> just has too much likelihood of clashing with user code. >>> >> >>> >> But I'm not sure we need a builtin. I'm not at all clear how >>> >> frequently we expect user code to need to use this protocol. Users >>> >> can't use the builtin if they want to be backward compatible, But code >>> >> that doesn't need backward compatibility can probably just work with >>> >> pathlib (and the stdlib support for it) directly. For display, the >>> >> implicit conversion to str is fine. For "get me a string representing >>> >> the path", is the "path" attribute being abandoned in favour of this >>> >> special method? >>> > >>> > Yes. >>> >>> OK. So the idiom to get a string from a known Path object would be any >>> of: >>> >>> 1. str(path) >>> 2. fspath(path) >>> 3. path.__path__() >>> >>> (1) is safe if you know you have a Path object, but could incorrectly >>> convert non-Path objects. (2) is safe in all cases. (3) is ugly. Did I >>> miss any options? >>> >> >> Other than path.__path__ being an attribute, nope. >> >> >>> >>> So I think we need a builtin. >>> >> >> Well, the ugliness shouldn't survive forever if the community shifts over >> to using pathlib while the built-in will. We also don't have a built-in for >> __index__() so it depends on whether we expect this sort of thing to be the >> purview of library authors or if normal people will be interacting with it >> (it's probably both during the transition, but I don't know afterwards). 
>>
>>> Code that needs to be backward compatible will still have to use
>>> str(path), because neither the builtin nor the __path__ protocol will
>>> exist in older versions of Python.
>>
>> str(path) will definitely work, path.__path__ will work if you're running
>> the next set of bugfix releases. fspath(path) will only work in Python 3.6
>> and newer.
>>
>>> Maybe a compatibility library could add
>>>
>>> try:
>>>     fspath
>>> except NameError:
>>>     try:
>>>         import pathlib
>>>         def fspath(p):
>>>             if isinstance(p, pathlib.Path):
>>>                 return str(p)
>>>             return p
>>>     except ImportError:
>>>         def fspath(p):
>>>             return p
>>>
>>> It's messy, like all compatibility code, but it allows code to use
>>> fspath(p) in older versions.
>>
>> I would tweak it to check for __fspath__ before it resorted to calling
>> str(), but yes, that could be something people use.
>>
>>> >> I'm inclined to think that if you are writing "pure
>>> >> pathlib" code, pathobj.path looks more readable than fspath(pathobj) -
>>> >> certainly no *less* readable.
>>> >
>>> > I don't know what you mean by "pure pathlib". You mean code that only works
>>> > with pathlib objects? Or do you mean code that accepts pathlib objects but
>>> > uses strings internally?
>>>
>>> I mean code that knows it has a Path object to work with (and not a
>>> string or anything else). But the point is moot if the path attribute
>>> is going away.
>>>
>>> Other than to say that I do prefer the name "path", I just don't think
>>> it's a reasonable name for a builtin. Even if it's OK for user
>>> variables to have the same name as builtins, IDEs tend to colour
>>> builtins differently, which is distracting. (Temporary variables named
>>> "file" or "dir" are the ones I hit frequently...)
>>>
>>> If all we're debating is the name, though, I think we're pretty much
>>> there :-)
>>
>> It seems like __fspath__ may be leading as a name, but not that many
>> people have spoken up.
But that is not the only thing still up for debate. >> :) >> >> We have not settled on whether a built-in is necessary. Maybe whatever >> function we come with should live in pathlib itself and not have it be a >> built-in? >> >> We have also not settled on whether __fspath__ should be a method or >> attribute as that changes the boilerplate one-liner people may use if a >> built-in isn't available. This is the first half of the protocol. >> >> What exactly should this helper function do? E.g. does it simply return >> its argument if __fspath__ isn't defined, or does it check for __fspath__, >> then if it's an instance of str, then TypeError? This is the second half of >> the protocol and will end up defining what a "path-like object" represents. >> > _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Apr 6 19:37:11 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 16:37:11 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: <57059DA7.2090504@stoneleaf.us> On 04/06/2016 04:26 PM, Brett Cannon wrote: > On Wed, 6 Apr 2016 at 16:25 Nathaniel Smith wrote: >> For __index__ the "built-in" is: >> >> from operator import index > > Which suggests perhaps we should have pathlib.fspath() instead of a > built-in. 

+1

--
~Ethan~

From ethan at stoneleaf.us Wed Apr 6 19:44:59 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 06 Apr 2016 16:44:59 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us>
Message-ID: <57059F7B.3090901@stoneleaf.us>

On 04/06/2016 04:27 PM, Brett Cannon wrote:
> On Wed, 6 Apr 2016 at 15:54 Gregory P. Smith wrote:
>>
>> So would module.__path__ become a Path instance in a potential
>> future making module.__path__.__path__ meaningfully confusing? ;)
>>
>> I'm not worried about people who shove pathlib.Path instances in as
>> values into sys.modules and expect anything but pain. :P
>
> Ah, good point. I think that kills __path__ then as an option.

Excellent! Narrowing the field then to:

  __fspath__

  __os_path__

Step right up! Cast yer votes!

--
~Ethan~

From v+python at g.nevcal.com Wed Apr 6 20:21:03 2016
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 6 Apr 2016 17:21:03 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57059F7B.3090901@stoneleaf.us>
References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us>
Message-ID: <5705A7EF.1070401@g.nevcal.com>

On 4/6/2016 4:44 PM, Ethan Furman wrote:
> On 04/06/2016 04:27 PM, Brett Cannon wrote:
>> On Wed, 6 Apr 2016 at 15:54 Gregory P. Smith wrote:
>>>
>>> So would module.__path__ become a Path instance in a potential
>>> future making module.__path__.__path__ meaningfully confusing? ;)
>>>
>>> I'm not worried about people who shove pathlib.Path instances in as
>>> values into sys.modules and expect anything but pain. :P
>>
>> Ah, good point. I think that kills __path__ then as an option.
>
> Excellent! Narrowing the field then to:
>
> __fspath__

-1: not all os names that look like files actually refer to the file
system: pipes, devices, etc.

>
> __os_path__

+1: the special names are os dependent, so os seems like an appropriate
prefix.
> > > Step right up! Cast yer votes! > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/v%2Bpython%40g.nevcal.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Apr 6 20:43:42 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 6 Apr 2016 17:43:42 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <57059F7B.3090901@stoneleaf.us> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> Message-ID: <-5625672377616017435@unknownmsgid> >> Ah, good point. I think that kills __path__ then as an option. Darn. I really preferred that. Oh well. > __fspath__ +0.1 But not a big deal. I think this is pretty much for occasional use by library authors, so not a big deal what it is named. Which also means that I don't think we need a built-in function that calls it, either. How often do people need a stringified-path version of an arbitrary object? Which makes me think: str() calls __str__ on an arbitrary object, and creates a new string object. But fspath(), if it exists, would call __fspath__ on an arbitrary object, and create a new string -- not a new Path. That may be confusing... If we were starting from scratch, I suppose __path__ would return a Path object -- it would be a protocol one could use to duck-type a path. But since we have history, we are creating a protocol that conforms to the existing string-as-path protocol. So are we imagining that future libs will be written that only take objects with a __fspath__ method? In which case, do we need to add it to str? In which case, this is all kind of pointless. Or maybe all future libs will continue to accept either an str or an object with __fspath__. 
In which case, this is pretty pointless, too. I guess what I'm wondering is if we are stuck with str-paths as the lingua-Franca for paths forever. In which case, we should embrace that and just call str() on anything passed in as a path argument. Sure, then open(3.5) will give you a file not found error, or maybe create a file with a weird name, but really? Who's going to make that mistake and not figure it out really quickly? -CHB From ethan at stoneleaf.us Wed Apr 6 20:57:21 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 17:57:21 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <-5625672377616017435@unknownmsgid> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> Message-ID: <5705B071.1010207@stoneleaf.us> On 04/06/2016 05:43 PM, Chris Barker - NOAA Federal wrote: >> __fspath__ > > +0.1 > > But not a big deal. I think this is pretty much for occasional use by > library authors, so not a big deal what it is named. It's mostly for the stdlib itself. I imagine that most libraries would just take what they are given and pass it along to open or os.stat or whatever. > Which also means that I don't think we need a built-in function that > calls it, either. How often do people need a stringified-path version > of an arbitrary object? Not often. > Which makes me think: str() calls __str__ on an arbitrary object, and > creates a new string object. > > But fspath(), if it exists, would call __fspath__ on an arbitrary > object, and create a new string -- not a new Path. That may be > confusing... It would be more along the lines of pickle -- give me the standard serialized form of this Path, please. > If we were starting from scratch, I suppose __path__ would return a > Path object -- it would be a protocol one could use to duck-type a > path. Sure. 
> But since we have history, we are creating a protocol that conforms to > the existing string-as-path protocol. Yup. > So are we imagining that future libs will be written that only take > objects with a __fspath__ method? In which case, do we need to add it > to str? In which case, this is all kind of pointless. We are imagining that future libraries that have to muck about with paths will work with Path objects, either by accepting them or converting to them as the (possibly) stringified paths are passed in -- and when necessary those libs can pass either the Path obj or the stringified path to the stdlib. > Or maybe all future libs will continue to accept either an str or an > object with __fspath__. In which case, this is pretty pointless, too. The point is to allow future programs to work with Path and be able to work with the stdlib as seamlessly and painlessly as possible. > I guess what I'm wondering is if we are stuck with str-paths as the > lingua-Franca for paths forever. In which case, we should embrace that > and just call str() on anything passed in as a path argument. Nah. That's inviting trouble and pain, and we're trying to get away from that. > Sure, then open(3.5) will give you a file not found error, or maybe > create a file with a weird name, but really? Who's going to make that > mistake and not figure it out really quickly? Well, since the 3.5 was actually in my_var, and could have been written before it was read, it could easily be days, weeks, or even months -- probably after the last guy quit, you took the job, the server died, and you had to restore from backup -- at which point you'll see all the really, really strange file names and wonder what they are. And of course, whatever logic was determining those weird names is now out of sync because of the server swap. And, yeah, I've seen weirder things happen. 
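
To make the open(3.5) scenario concrete, here is a minimal sketch contrasting "just call str() on anything" with a helper that only accepts str or objects carrying the protocol. It assumes __fspath__ ends up being a method, which this thread has not settled; `coerce_with_str` and `strict_fspath` are hypothetical names used only for illustration, not proposed API.

```python
def coerce_with_str(path):
    """The 'call str() on whatever is passed in' approach."""
    return str(path)


def strict_fspath(path):
    """Accept str, or objects whose type defines __fspath__; reject the rest."""
    if isinstance(path, str):
        return path
    # Look the method up on the type, as special methods normally are.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("expected str or path-like object, got "
                        + type(path).__name__)
    return method(path)


my_var = 3.5  # a value that was never meant to be a path

# str() coercion quietly turns the float into the file name '3.5'.
print(coerce_with_str(my_var))

# The strict helper refuses it at the point of the mistake.
try:
    strict_fspath(my_var)
except TypeError as exc:
    print("rejected:", exc)
```

With the str() approach the bad value only surfaces later as an oddly named file; with the stricter check it fails where the wrong object was passed.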
-- ~Ethan~ From wes.turner at gmail.com Wed Apr 6 22:24:19 2016 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 6 Apr 2016 21:24:19 -0500 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: On Apr 6, 2016 6:31 PM, "Brett Cannon" wrote: > > > > On Wed, 6 Apr 2016 at 16:25 Nathaniel Smith wrote: >> >> On Wed, Apr 6, 2016 at 3:46 PM, Brett Cannon wrote: >> > >> > >> > On Wed, 6 Apr 2016 at 15:22 Paul Moore wrote: >> >> >> >> So I think we need a builtin. >> > >> > >> > Well, the ugliness shouldn't survive forever if the community shifts over to >> > using pathlib while the built-in will. We also don't have a built-in for >> > __index__() so it depends on whether we expect this sort of thing to be the >> > purview of library authors or if normal people will be interacting with it >> > (it's probably both during the transition, but I don't know afterwards). >> >> For __index__ the "built-in" is: >> >> from operator import index > > > Which suggests perhaps we should have pathlib.fspath() instead of a built-in. Would it make sense to instead have pathlib.Path.__init__? > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Wed Apr 6 22:40:55 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 19:40:55 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> Message-ID: <5705C8B7.6000802@stoneleaf.us> On 04/06/2016 07:24 PM, Wes Turner wrote: > On Apr 6, 2016 6:31 PM, "Brett Cannon" wrote: >> Which suggests perhaps we should have pathlib.fspath() instead of a >> built-in. > > Would it make sense to instead have pathlib.Path.__init__? We already have that -- it's what makes a Path. What we are looking for is a function that accepts a Path or a str and returns the Path as a str, or the str passed in. -- ~Ethan~ From wes.turner at gmail.com Wed Apr 6 23:12:47 2016 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 6 Apr 2016 22:12:47 -0500 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <5705C8B7.6000802@stoneleaf.us> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <5705C8B7.6000802@stoneleaf.us> Message-ID: My mistake. On Wed, Apr 6, 2016 at 9:40 PM, Ethan Furman wrote: > On 04/06/2016 07:24 PM, Wes Turner wrote: > >> On Apr 6, 2016 6:31 PM, "Brett Cannon" wrote: >> > > Which suggests perhaps we should have pathlib.fspath() instead of a >>> built-in. >>> >> >> Would it make sense to instead have pathlib.Path.__init__? >> > > We already have that -- it's what makes a Path. > > What we are looking for is a function that accepts a Path or a str and > returns the Path as a str, or the str passed in. > > -- > ~Ethan~ > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Wed Apr 6 23:50:23 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 6 Apr 2016 20:50:23 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <5705B071.1010207@stoneleaf.us> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705B071.1010207@stoneleaf.us> Message-ID: On Wed, Apr 6, 2016 at 5:57 PM, Ethan Furman wrote: > But not a big deal. I think this is pretty much for occasional use by > > library authors, so not a big deal what it is named. >> > > It's mostly for the stdlib itself. I imagine that most libraries would > just take what they are given and pass it along to open or os.stat or > whatever. > Exactly -- so we really don't need a builtin shortcut. > Which makes me think: str() calls __str__ on an arbitrary object, and >> creates a new string object. >> >> But fspath(), if it exists, would call __fspath__ on an arbitrary >> object, and create a new string -- not a new Path. That may be >> confusing... >> > > It would be more along the lines of pickle -- give me the standard > serialized form of this Path, please. > well, give me the standard serialized-path of this arbitrary object, yes? > So are we imagining that future libs will be written that only take >> objects with a __fspath__ method? In which case, do we need to add it >> to str? In which case, this is all kind of pointless. >> > > We are imagining that future libraries that have to muck about with paths > will work with Path objects, either by accepting them or converting to them > as the (possibly) stringified paths are passed in -- and when necessary > those libs can pass either the Path obj or the stringified path to the > stdlib. 
if that's the case, we don't need the __fspath__ protocol -- the reason for the protocol is that we imagine there may be any number of third-party objects to represent/work-with paths, that aren't strings or stdlib Path objects. Or maybe all future libs will continue to accept either an str or an >> object with __fspath__. In which case, this is pretty pointless, too. >> > > The point is to allow future programs to work with Path and be able to > work with the stdlib as seamlessly and painlessly as possible. > again, we don't need a new protocol for that -- we only need the protocol if we want arbitrary future programs to work with arbitrary path implementations. which I suppose we do -- there are already other path implimentaitons out there (though at least some are strings :-) ) > I guess what I'm wondering is if we are stuck with str-paths as the >> lingua-Franca for paths forever. In which case, we should embrace that >> and just call str() on anything passed in as a path argument. >> > > Nah. That's inviting trouble and pain, and we're trying to get away from > that. > > Sure, then open(3.5) will give you a file not found error, or maybe >> create a file with a weird name, but really? Who's going to make that >> mistake and not figure it out really quickly? >> > > Well, since the 3.5 was actually in my_var, and could have been written > before it was read, it could easily be days, weeks, or even months -- > probably after the last guy quit, you took the job, the server died, and > you had to restore from backup -- at which point you'll see all the really, > really strange file names and wonder what they are. And of course, > whatever logic was determining those weird names is now out of sync because > of the server swap. > > And, yeah, I've seen weirder things happen. 
> People can totally screw up path variables as strings or Path objects too -- I'm having trouble seeing that this is all that more likely -- after all, python is a dynamic language -- if we wanted full type safety, we wouldn't be using python... Speaking of which, how is this going to work with the new type system? Do we need an ABC, rather than just a protocol? But as long as we get to the stdlib taking Path objects, I'm happy :-) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Apr 7 00:15:19 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 21:15:19 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705B071.1010207@stoneleaf.us> Message-ID: <5705DED7.2070303@stoneleaf.us> On 04/06/2016 08:50 PM, Chris Barker wrote: > On Wed, Apr 6, 2016 at 5:57 PM, Ethan Furman wrote: >> It's mostly for the stdlib itself. I imagine that most libraries >> would just take what they are given and pass it along to open or >> os.stat or whatever. > > Exactly -- so we really don't need a builtin shortcut. Hey, we have to program the stdlib too! No need to make it harder for ourselves. >> It would be more along the lines of pickle -- give me the standard >> serialized form of this Path, please. > > well, give me the standard serialized-path of this arbitrary object, > yes? Yes. 
:)

>> We are imagining that future libraries that have to muck about with
>> paths will work with Path objects, either by accepting them or
>> converting to them as the (possibly) stringified paths are passed in
>> -- and when necessary those libs can pass either the Path obj or the
>> stringified path to the stdlib.
>
> if that's the case, we don't need the __fspath__ protocol -- the
> reason for the protocol is that we imagine there may be any number of
> third-party objects to represent/work-with paths, that aren't strings
> or stdlib Path objects.

The purpose of the __os_path__ method is two-fold:

- its presence declares that the object is a path (or convertible to one)
- it does the conversion

Since we need it for ourselves there's no reason to prevent others from
taking advantage of it.

>> The point is to allow future programs to work with Path and be able
>> to work with the stdlib as seamlessly and painlessly as possible.
>
> again, we don't need a new protocol for that -- we only need the
> protocol if we want arbitrary future programs to work with arbitrary
> path implementations.

I am certainly not opposed to that.

> which I suppose we do -- there are already other path implementations
> out there (though at least some are strings :-) )

Right. And I'm already making changes to mine to work with this new stuff.

> People can totally screw up path variables as strings or Path objects
> too -- I'm having trouble seeing that this is all that more likely --
> after all, python is a dynamic language -- if we wanted full type
> safety, we wouldn't be using python...

Very True. ;)

> Speaking of which, how is this going to work with the new type
> system? Do we need an ABC, rather than just a protocol?

I do not know, good question.

> But as long as we get to the stdlib taking Path objects, I'm happy :-)

Excellent!

--
~Ethan~

From stephen at xemacs.org Thu Apr 7 00:37:48 2016
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 7 Apr 2016 13:37:48 +0900 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705B071.1010207@stoneleaf.us> Message-ID: <22277.58396.936738.919834@turnbull.sk.tsukuba.ac.jp> Chris Barker writes: > which I suppose we do -- there are already other path implimentaitons out > there (though at least some are strings :-) ) Even so, their __fspath__ implementation might return syntactically canonicalized or realpath paths, rather than whatever is input. If cached and the path frequently accessed, the realpath implementation might be a significant win in some applications. From raymond.hettinger at gmail.com Thu Apr 7 01:08:53 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Wed, 6 Apr 2016 22:08:53 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? In-Reply-To: References: Message-ID: <7324A271-1736-4385-8D35-CD48EA74F4C8@gmail.com> > On Apr 5, 2016, at 3:55 PM, Guido van Rossum wrote: > > It's been provisional since 3.4. I think if it is still there in 3.6.0 > it should be considered no longer provisional. But this may indeed be > a test case for the ultimate fate of provisional modules -- should we > remove it? I lean slightly towards for removal. Having worked through the API when it is first released, I find it to be highly forgettable (i.e. I have to re-read the docs each time I've revisited it). While I haven't seen any uptake in real code, there are occasional questions about it on StackOverflow, so we do know that there is at least some interest. I'm not sure that it needs to live in the standard library though. Raymond From ericsnowcurrently at gmail.com Thu Apr 7 01:45:56 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 6 Apr 2016 23:45:56 -0600 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: References: <57044567.6070308@sdamon.com> <20160406154334.058182b6@subdivisions.wooz.org> Message-ID: On Apr 6, 2016 14:00, "Barry Warsaw" wrote: > Aside from the name of the attribute (though I'm partial to __path__), Ahem, pkg.__path__. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Apr 7 02:15:40 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Apr 2016 18:15:40 +1200 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <-5625672377616017435@unknownmsgid> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> Message-ID: <5705FB0C.2090705@canterbury.ac.nz> Chris Barker - NOAA Federal wrote: > But fspath(), if it exists, would call __fspath__ on an arbitrary > object, and create a new string -- not a new Path. That may be > confusing... Maybe something like fspathstr/__fspathstr__ would be better? -- Greg From ethan at stoneleaf.us Thu Apr 7 02:31:27 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 23:31:27 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <5705FB0C.2090705@canterbury.ac.nz> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705FB0C.2090705@canterbury.ac.nz> Message-ID: <5705FEBF.3070301@stoneleaf.us> On 04/06/2016 11:15 PM, Greg Ewing wrote: > Chris Barker - NOAA Federal wrote: >> But fspath(), if it exists, would call __fspath__ on an arbitrary >> object, and create a new string -- not a new Path. That may be >> confusing... > > Maybe something like fspathstr/__fspathstr__ would be better? As someone already said, we don't need to embed the type in the name. The point of the __os_path__ protocol is to return the serialized version of the Path the object represents. 
This would be somewhat similar to the various __reduce*__ protocols (which I thought had something to do with adding until I learned what they were for). -- ~Ethan~ From ethan at stoneleaf.us Thu Apr 7 02:34:27 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 06 Apr 2016 23:34:27 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: <5705FF73.2050806@stoneleaf.us> On 04/06/2016 10:26 AM, Brett Cannon wrote: > 2. Method or attribute? (changes what kind of one-liner you might use > in libraries, but I think historically all protocols have been > methods and the serialized string representation might be costly to > build) Having thought about this some more, it seems we have enough __dunder__ attributes that are plain strings that having this one also be a plain string should not be a problem: - __name__ - __module__ - __file__ Since Paths are immutable the __os_path__ attribute isn't going to change and doesn't need to be a method. -- ~Ethan~ From songofacandy at gmail.com Thu Apr 7 03:00:49 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 7 Apr 2016 16:00:49 +0900 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <570548DD.7080108@gmail.com> Message-ID: On Thu, Apr 7, 2016 at 2:41 AM, Brett Cannon wrote: > > > On Wed, 6 Apr 2016 at 10:36 Michel Desmoulin > wrote: > >> Wouldn't be better to generalize that to a "__location__" protocol, >> which allow to return any kind of location, including path, url or >> coordinate, ip_address, etc ? >> > > No because all of those things have different semantic meaning. See the > __index__ PEP for reasons why you would tightly bound protocols instead of > overloading ones like __int__ for multiple meanings. 
>
> -Brett
>

https://www.python.org/dev/peps/pep-0357/
> It is not possible to use the nb_int (and __int__ special method)
> for this purpose because that method is used to *coerce* objects
> to integers.

I feel adding a protocol only for paths is a bit of over-engineering.
So I'm -0.5 on adding __fspath__.

I'm +1 on adding a general protocol for *coerce to string*, like __index__.
+0.5 on inheriting from str (and dropping byte path support).

--
INADA Naoki

From songofacandy at gmail.com Thu Apr 7 03:04:28 2016
From: songofacandy at gmail.com (INADA Naoki)
Date: Thu, 7 Apr 2016 16:04:28 +0900
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <570548DD.7080108@gmail.com>
Message-ID:

FYI, Ruby's Pathname class doesn't inherit from String.
http://ruby-doc.org/stdlib-2.1.0/libdoc/pathname/rdoc/Pathname.html

Ruby has two "convert to string" methods.
`.to_s` is like `__str__`.
`.to_str` is like `__index__` but for str. It is used for implicit
conversion.
File.open accepts any object that implements `.to_str`.

From g.brandl at gmx.net Thu Apr 7 03:19:17 2016
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 7 Apr 2016 09:19:17 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us>
Message-ID:

On 04/06/2016 07:26 PM, Brett Cannon wrote:
> With Ethan volunteering to do the work to help make a path protocol a thing --
> and I'm willing to help along with propagating this through the stdlib where I
> think Serhiy might be interested in helping as well -- and a seeming consensus
> this is a good idea, it seems like this proposal has a chance of actually coming
> to fruition.
>
> Now we need clear details. :) Some open questions are:

Throwing in my 2 bikesheds here, not having read all subthreads:

> 1. Name: __path__, __fspath__, or something else?

__path__ is already taken as a module attribute, so I would avoid it.
__fspath__ is fine with me, although the more explicit variants are also
ok. It's not like you need to read/write it constantly (that's the goal).

> 2. Method or attribute? (changes what kind of one-liner you might use in
> libraries, but I think historically all protocols have been methods and the
> serialized string representation might be costly to build)

An attribute would be somewhat inconsistent with the special-method lookup
rules (looked up on the type, not the instance), so a method is probably
a better choice.

> 3. Built-in? (name is dependent on #1 if we add one)

I don't think it warrants a builtin. I'd place it as a function in pathlib.

> 4. Add the method/attribute to str? (I assume so, much like __index__() is on
> int, but I have not seen it explicitly stated so I would rather clarify it)

+1.

> 5. Expand the C API to have something like PyObject_Path()?

+1 (with _Py_ at first) since you're going to need it in a lot of C functions.

Georg

From p.f.moore at gmail.com Thu Apr 7 03:59:14 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 Apr 2016 08:59:14 +0100
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us>
Message-ID:

On 6 April 2016 at 23:46, Brett Cannon wrote:
> str(path) will definitely work, path.__path__ will work if you're running
> the next set of bugfix releases. fspath(path) will only work in Python 3.6
> and newer.

Ah, that was something I hadn't appreciated, that the builtin would be
3.6+ whereas the protocol would be added to current bugfix releases.
>> Maybe a compatibility library could
>> add
>>
>> try:
>>     fspath
>> except NameError:
>>     try:
>>         import pathlib
>>         def fspath(p):
>>             if isinstance(p, pathlib.Path):
>>                 return str(p)
>>             return p
>>     except ImportError:
>>         def fspath(p):
>>             return p
>>
>> It's messy, like all compatibility code, but it allows code to use
>> fspath(p) in older versions.
>
>
> I would tweak it to check for __fspath__ before it resorted to calling
> str(), but yes, that could be something people use.

Yeah, the above code assumes that if the builtin isn't available, nor will the protocol be (see my misunderstanding above).

Paul

From Nikolaus at rath.org Thu Apr 7 06:48:28 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Thu, 07 Apr 2016 12:48:28 +0200
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <5705FEBF.3070301@stoneleaf.us> (Ethan Furman's message of "Wed, 06 Apr 2016 23:31:27 -0700")
References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us>
Message-ID: <87oa9l3dab.fsf@thinkpad.rath.org>

On Apr 06 2016, Ethan Furman wrote:
> On 04/06/2016 11:15 PM, Greg Ewing wrote:
>> Chris Barker - NOAA Federal wrote:
>>> But fspath(), if it exists, would call __fspath__ on an arbitrary
>>> object, and create a new string -- not a new Path. That may be
>>> confusing...
>>
>> Maybe something like fspathstr/__fspathstr__ would be better?
>
> As someone already said, we don't need to embed the type in the name.
>
> The point of the __os_path__ protocol is to return the serialized
> version of the Path the object represents. This would be somewhat
> similar to the various __reduce*__ protocols (which I thought had
> something to do with adding until I learned what they were for).

Does anyone anticipate any classes other than those from pathlib to come with such a method?
It seems odd to me to introduce a special method (and potentially a built-in too) if it's only going to be used by a single module.

Why is:

    path = getattr(obj, '__fspath__') if hasattr(obj, '__fspath__') else obj

better than

    path = str(obj) if isinstance(obj, pathlib.Path) else obj

?

Yes, I know there are other pathlib-like modules out there. But isn't pathlib meant to replace them?

Best,
Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."

From donald at stufft.io Thu Apr 7 07:03:56 2016
From: donald at stufft.io (Donald Stufft)
Date: Thu, 7 Apr 2016 07:03:56 -0400
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <87oa9l3dab.fsf@thinkpad.rath.org>
References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us> <87oa9l3dab.fsf@thinkpad.rath.org>
Message-ID: <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io>

> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath wrote:
>
> Does anyone anticipate any classes other than those from pathlib to come
> with such a method?

It seems like it would be reasonable for pathlib.Path to call fspath on the path passed to pathlib.Path.__init__, which would mean that if other libraries implemented __fspath__ then you could pass their path objects to pathlib and it would just work (and similarly, if they also called fspath it would enable interoperation between all of the various path libraries).

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
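
The interoperation Donald describes can be sketched roughly as below. This is only an illustration under stated assumptions: the thread has not settled the name or the method-versus-attribute question, so the protocol is assumed here to be a method called `__fspath__` looked up on the type, and `fspath()`, `ThirdPartyPath` and `as_pathlib()` are hypothetical names invented for the example.

```python
import pathlib


def fspath(obj):
    """Hypothetical helper: return the str/bytes form of a path-like object.

    Assumes the protocol is a *method* named __fspath__, looked up on the
    type like other special methods. Neither point was settled in the thread.
    """
    if isinstance(obj, (str, bytes)):
        return obj
    method = getattr(type(obj), "__fspath__", None)
    if method is None:
        raise TypeError(
            "expected str, bytes or a path object, not " + type(obj).__name__
        )
    return method(obj)


class ThirdPartyPath:
    """Made-up stand-in for a path class from some other library."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


def as_pathlib(obj):
    """Roughly what Path.__init__ could do: coerce first, then wrap."""
    return pathlib.PurePosixPath(fspath(obj))


print(as_pathlib(ThirdPartyPath("/tmp/spam")) / "eggs")
```

With a helper along these lines, an object that does not opt in (a list, an open file) is rejected with TypeError instead of being silently stringified.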
From p.f.moore at gmail.com Thu Apr 7 07:05:43 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 Apr 2016 12:05:43 +0100
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <87oa9l3dab.fsf@thinkpad.rath.org>
References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us> <87oa9l3dab.fsf@thinkpad.rath.org>
Message-ID:

On 7 April 2016 at 11:48, Nikolaus Rath wrote:
> Why is:
>
> path = getattr(obj, '__fspath__') if hasattr(obj, '__fspath__') else obj
>
> better than
>
> path = str(obj) if isinstance(obj, pathlib.Path) else obj

One reason is that the former doesn't need you to import pathlib, which is good if you need to work with older versions of Python that don't have pathlib at all (yes, it's just some standard conditional import boilerplate, but it's additional messiness).

Paul

From rosuav at gmail.com Thu Apr 7 08:11:34 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 7 Apr 2016 22:11:34 +1000
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <57059F7B.3090901@stoneleaf.us>
References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us>
Message-ID:

On Thu, Apr 7, 2016 at 9:44 AM, Ethan Furman wrote:
> Excellent! Narrowing the field then to:
>
> __fspath__
>
> __os_path__
>
>
> Step right up! Cast yer votes!

+0.9 for __fspath__; I'd prefer a one-word name, but with __path__ out of the running (which I agree with), there's no other obvious word. __fspath__ is a close second.

-1 for __os_path__, unless it's reasonable to justify it as "most of the standard library uses Path objects, but os.path uses strings, so before you pass a Path to anything in os.path, you call path.ospath() on it, which calls __os_path__()".
And that seems a bit hairy and roundabout; what it's _really_ doing is giving you back a string, and that has little to do with os.path. ChrisA From ericsnowcurrently at gmail.com Thu Apr 7 10:21:34 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 7 Apr 2016 08:21:34 -0600 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: On Apr 7, 2016 1:22 AM, "Georg Brandl" wrote: > > On 04/06/2016 07:26 PM, Brett Cannon wrote: > > 1. Name: __path__, __fspath__, or something else? > > __path__ is already taken as a module attribute, so I would avoid it. > __fspath__ is fine with me, although the more explicit variants are also > ok. It's not like you need to read/write it constantly (that's the goal). +1 I also think that __ospath__ may be more correct since it is an OS-dependent representation, e.g. slash vs. backslash. > > > 2. Method or attribute? (changes what kind of one-liner you might use in > > libraries, but I think historically all protocols have been methods and the > > serialized string representation might be costly to build) > > An attribute would be somewhat inconsistent with the special-method lookup rules > (looked up on the type, not the instance), so a method is probably a better > choice. I was just about to point this out. The deviation by pickle (lookup on instance rather than type) has been a source of pain. > > > 3. Built-in? (name is dependent on #1 if we add one) > > I don't think it warrants a builtin. I'd place it as a function in pathlib. +1 > > > 4. Add the method/attribute to str? (I assume so, much like __index__() is on > > int, but I have not seen it explicitly stated so I would rather clarify it) > > +1. +1 > > > 5. Expand the C API to have something like PyObject_Path()? > > +1 (with _Py_ at first) since you're going to need it in a lot of C functions. 
+1

-eric

From jimjjewett at gmail.com Thu Apr 7 10:25:43 2016
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Thu, 7 Apr 2016 10:25:43 -0400
Subject: [Python-Dev] pathlib (was: Defining a path protocol)
Message-ID:

(1) I think the "built-in" should instead be a module-level function in the pathlib. If you aren't already expecting pathlib paths, then you're just expecting strings to work anyhow, and a builtin isn't likely to be helpful.

(2) I prefer that the function be explicit about the fact that it is downcasting the representation to a string. e.g., pathlib.path_as_string(my_path)

But if the final result is ospath or fspath or ... I won't fight too hard, particularly since the output may be a bytestring rather than a str.

-jJ

From ericsnowcurrently at gmail.com Thu Apr 7 10:40:37 2016
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 7 Apr 2016 08:40:37 -0600
Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib stop being provisional?
In-Reply-To:
References:
Message-ID:

On Apr 6, 2016 11:11 PM, "Raymond Hettinger" wrote:
> Having worked through the API when it is first released, I find it to be highly forgettable (i.e. I have to re-read the docs each time I've revisited it).

Agreed, though it's arguably better than argparse, logging, unittest, or several other stdlib modules. To some extent the challenge with those is the complexity of the problem space. Furthermore, the key for any sufficiently complex module is that the common-case usage is intuitive and simple enough. Some stdlib modules do a better job of that than others. :/ How much would you say that any of that applies to pathlib? What about relative to other similar packages on the cheeseshop?

Regardless, are there any specific improvements you'd recommend while the module is still provisional? Are your concerns a matter of structure vs. naming? Usability vs. (intuitive) discoverability?

-eric

From p.f.moore at gmail.com Thu Apr 7 11:18:55 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 Apr 2016 16:18:55 +0100
Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib stop being provisional?
In-Reply-To:
References:
Message-ID:

On 7 April 2016 at 15:40, Eric Snow wrote:
> On Apr 6, 2016 11:11 PM, "Raymond Hettinger" wrote:
>> Having worked through the API when it is first released, I find it to be
>> highly forgettable (i.e. I have to re-read the docs each time I've revisited
>> it).
>
> Agreed, though it's arguably better than argparse, logging, unittest, or
> several other stdlib modules. To some extent the challenge with those is
> the complexity of the problem space. Furthermore, the key for any
> sufficiently complex module is that the common-case usage is intuitive and
> simple enough. Some stdlib modules do a better job of that than others. :/
> How much would you say that any of that applies to pathlib? What about
> relative to other similar packages on the cheeseshop?

Personally, the main issue I have with remembering pathlib method names, is the inconsistency with the existing modules. I always have to check that it's path.is_dir() compared to os.path.isdir(pathstr). And it's os.path.dirname(pathstr) vs path.parent. On the other hand, the consistency between path.parent (for the immediate parent) and path.parents (for the sequence of parents) is useful, so it's not clear cut.

There's nothing fundamentally *wrong* with the pathlib method names, but there's no obvious reason why they needed to change. I'll get used to them. It's just one more stumbling block that makes me feel like it's a bit too hard to bother, and I go back to os.path.

Would I change the names? I honestly don't know. If os.path was going to disappear, then no - the inconsistency is a short term problem.
But even if there's a major switch to pathlib, I expect os.path to remain indefinitely, and that inconsistency will be a wart that we'll have to live with for a long time. Paul From ethan at stoneleaf.us Thu Apr 7 11:33:13 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 07 Apr 2016 08:33:13 -0700 Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib stop being provisional? In-Reply-To: References: Message-ID: <57067DB9.7060804@stoneleaf.us> On 04/07/2016 08:18 AM, Paul Moore wrote: > On 7 April 2016 at 15:40, Eric Snow wrote: >> On Apr 6, 2016 11:11 PM, "Raymond Hettinger" wrote: >>> Having worked through the API when it is first released, I find it to be >>> highly forgettable (i.e. I have to re-read the docs each time I've revisited >>> it). >> >> Agreed, though it's arguably better than argparse, logging, unittest, or >> several other stdlib modules. > Personally, the main issue I have with remembering pathlib method > names, is the inconsistency with the existing modules. That is one of the things I really dislike. If the behaviour is the same as the os version, it should have the same name. I also have no problem with new names that makes more sense so long as an alias exists for the os version (can even be deprecated without removal). > Would I change the names? I honestly don't know. If os.path was going > to disappear, then no - the inconsistency is a short term problem. But > even if there's a major switch to pathlib, I expect os.path to remain > indefinitely, and that inconsistency will be a wart that we'll have to > live with for a long time. os.path isn't going anywhere. -- ~Ethan~ From desmoulinmichel at gmail.com Thu Apr 7 06:50:42 2016 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Thu, 7 Apr 2016 12:50:42 +0200 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: <570575C9.7060208@mail.de>
References: <5704909E.8070908@stoneleaf.us> <570575C9.7060208@mail.de>
Message-ID: <57063B82.3090307@gmail.com>

On 06/04/2016 22:47, Sven R. Kunze wrote:
> On 06.04.2016 07:00, Guido van Rossum wrote:
>> On Tue, Apr 5, 2016 at 9:29 PM, Ethan Furman wrote:
>>> [...] we can't do:
>>>
>>>     app_root = Path(...)
>>>     config = app_root/'settings.cfg'
>>>     with open(config) as blah:
>>>         # whatever
>>>
>>> It feels like instead of addressing this basic disconnect, the answer has
>>> instead been: add that to pathlib! Which works great -- until a user or a
>>> library gets this path object and tries to use something from os on it.
>> I agree that asking for config.open() isn't the right answer here
>> (even if it happens to work).
>
> How come?
>
>> But in this example, once 3.5.2 is out,
>> the solution would be to use open(config.path), and that will also
>> work when passing it to a library. Is it still unacceptable then?
>
> I think so. Although in this example I would prefer the shorter
> config.open alternative as I am lazy.
>
>
> I still cannot remember what the concrete issue was why we dropped
> pathlib the same day we gave it a try. It was something really stupid
> and although I hoped to reduce the size of the code, it was less
> readable. But it was not the path->str issue but something more mundane.
> It was something that forced us to use os[.path] as Path didn't provide
> something equivalent. Cannot remember.....

Path objects don't have splitext() and don't allow "string" / path. Those are the ones bugging me the most.
>
>
> Best,
> Sven
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/desmoulinmichel%40gmail.com
>

From projetmbc at gmail.com Thu Apr 7 01:24:42 2016
From: projetmbc at gmail.com (Christophe Bal)
Date: Thu, 7 Apr 2016 07:24:42 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To:
References: <57044567.6070308@sdamon.com>
Message-ID:

As a simple user, I find that pathlib simplifies playing with paths. A lot of things are easy to do. For example, Pathlib / "subfile" is so useful.

I also have a subclass of pathlib.Path on GitHub that makes it easy to search for files and directories.

So keep pathlib alive!

On 6 Apr 2016 13:06, "Paul Moore" wrote:

On 6 April 2016 at 00:45, Guido van Rossum wrote:
> This does sound like it's the crucial issue, and it is worth writing
> up clearly the pros and cons. Let's draft those lists in a thread
> (this one's fine) and then add them to the PEP. We can then decide to:
>
> - keep the status quo
> - change PurePath to inherit from str
> - decide it's never going to be settled and kill pathlib.py
>
> (And yes, I'm dead serious about the latter, rather Solomonic option.)

By the way, even if there's no solution that satisfies everyone to the "inherit from str" question, I'd still be unhappy if pathlib disappeared from the stdlib. It's useful for quick admin scripts that don't justify an external dependency. Those typically do quite a bit of path manipulation, and as such benefit from the improved API of pathlib over os.path.

+1 on making (and documenting) a final decision on the "inherit from str" question
-1 on removing pathlib just because that decision might not satisfy everyone

Paul

From chris.barker at noaa.gov Thu Apr 7 11:44:12 2016
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Thu, 7 Apr 2016 08:44:12 -0700
Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib stop being provisional?
In-Reply-To: <57067DB9.7060804@stoneleaf.us>
References: <57067DB9.7060804@stoneleaf.us>
Message-ID: <-4168706305295605305@unknownmsgid>

>> Personally, the main issue I have with remembering pathlib method
>> names, is the inconsistency with the existing modules.

Was this *really* not brought up when this was introduced? Oh well.

We could add aliases, but I think it's not such a big deal. I'm convinced that the largest barrier to adoption has been that it can't be used with the stdlib. And I think the discussion on Python-ideas supports that.

That, and py2 compatibility. There is a backport on PyPI, but it can't be used with the stdlib, either. Not sure what to do about that--maybe it should inherit from unicode?

-CHB

> That is one of the things I really dislike. If the behaviour is the same as the os version, it should have the same name. I also have no problem with new names that make more sense so long as an alias exists for the os version (can even be deprecated without removal).
>
>> Would I change the names? I honestly don't know. If os.path was going
>> to disappear, then no - the inconsistency is a short term problem.
But >> even if there's a major switch to pathlib, I expect os.path to remain >> indefinitely, and that inconsistency will be a wart that we'll have to >> live with for a long time. > > os.path isn't going anywhere. > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov From ethan at stoneleaf.us Thu Apr 7 11:47:56 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 07 Apr 2016 08:47:56 -0700 Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib stop being provisional? In-Reply-To: <-4168706305295605305@unknownmsgid> References: <57067DB9.7060804@stoneleaf.us> <-4168706305295605305@unknownmsgid> Message-ID: <5706812C.6060507@stoneleaf.us> On 04/07/2016 08:44 AM, Chris Barker - NOAA Federal wrote: > We could add aliases, but I think it's not such a big deal. I'm > convinced that the largest barrier to adoption has been that it can't > be used with the stdlib. And I think the discussion on Python-ideas > supports that. Lack of interoperability is a huge issue; using different but similar names is still an issue. > That, and py2 compatibility. There is a back port on PyPi, but it > can't be used with the stdlib, either. Not sure what to do about > that--maybe it should inherit from Unicode? Also huge, and agree it (the backport) should inherit from unicode. -- ~Ethan~ From ethan at stoneleaf.us Thu Apr 7 11:52:11 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 07 Apr 2016 08:52:11 -0700 Subject: [Python-Dev] When should pathlib stop being provisional? 
In-Reply-To: <57063B82.3090307@gmail.com>
References: <5704909E.8070908@stoneleaf.us> <570575C9.7060208@mail.de> <57063B82.3090307@gmail.com>
Message-ID: <5706822B.1010801@stoneleaf.us>

On 04/07/2016 03:50 AM, Michel Desmoulin wrote:
> Path objects don't have splitext() and don't allow "string" / path.
> Those are the ones bugging me the most.

--> Path('README.md')
--> p = Path('README.md') # PosixPath('README.md')
--> '/home/ethan' / p # PosixPath('/home/ethan/README.md')
--> p.splitext()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PosixPath' object has no attribute 'splitext'

So, yeah, no .splitext()

--
~Ethan~

From zachary.ware+pydev at gmail.com Thu Apr 7 12:13:22 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Thu, 7 Apr 2016 11:13:22 -0500
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <57063B82.3090307@gmail.com>
References: <5704909E.8070908@stoneleaf.us> <570575C9.7060208@mail.de> <57063B82.3090307@gmail.com>
Message-ID:

On Thu, Apr 7, 2016 at 5:50 AM, Michel Desmoulin wrote:
> Path objects don't have splitext() and don't allow "string" / path.
> Those are the ones bugging me the most.

>>> import pathlib
>>> p = '/some/test' / pathlib.Path('path') / 'file_with.ext'
>>> p
PosixPath('/some/test/path/file_with.ext')
>>> p.parent, p.stem, p.suffix
(PosixPath('/some/test/path'), 'file_with', '.ext')

--
Zach

From chris.barker at noaa.gov Thu Apr 7 12:50:49 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 09:50:49 -0700
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To: <57063B82.3090307@gmail.com>
References: <5704909E.8070908@stoneleaf.us> <570575C9.7060208@mail.de> <57063B82.3090307@gmail.com>
Message-ID:

On Thu, Apr 7, 2016 at 3:50 AM, Michel Desmoulin wrote:
>
> Path objects don't have splitext()

that is useful -- let's add it. (and others if need be)

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov Thu Apr 7 12:56:21 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 09:56:21 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <570548DD.7080108@gmail.com>
Message-ID:

On Thu, Apr 7, 2016 at 12:00 AM, INADA Naoki wrote:
>
> I feel adding a protocol only for paths is a bit over-engineered. So I'm -0.5
> on adding __fspath__.
>
> I'm +1 on adding a general protocol for *coerce to string* like __index__.
>

isn't __str__ the protocol for "coerce to string"?

__index__ is a protocol for "coerce to an integer that can be used as an index", which is like __fspath__ would be "coerce to a string that can be used as a path"

the whole point is that __str__ will "work" with virtually anything -- whether it can reasonably be used as a path or not. I'm not sure that's a problem, but if it is, then that's what this new protocol is trying to solve, just like __index__ enforces that only things that are intended to be used as indexes will work.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
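
The contrast Chris draws between an opt-in protocol and `__str__` can be seen with today's `__index__`. The snippet is a small illustration rather than anything proposed in the thread; `Slot` is a made-up class used only for this example.

```python
class Slot:
    """Made-up integer-like type that opts in to being used as an index."""

    def __init__(self, n):
        self._n = n

    def __index__(self):
        return self._n


items = ["a", "b", "c"]

# Slot opted in via __index__, so indexing accepts it.
print(items[Slot(1)])

# float defines __int__ but not __index__, so indexing refuses it
# even though int(1.0) would succeed.
try:
    items[1.0]
except TypeError as exc:
    print("rejected:", exc)

# str(), by contrast, succeeds for anything, path-like or not.
print(str(["Hello", "World"]))
```

A path protocol along the lines discussed would play the `__index__` role here: only types that deliberately implement it would be accepted where a path is expected.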

From chris.barker at noaa.gov Thu Apr 7 12:59:22 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 09:59:22 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To: <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io>
References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us> <87oa9l3dab.fsf@thinkpad.rath.org> <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io>
Message-ID:

On Thu, Apr 7, 2016 at 4:03 AM, Donald Stufft wrote:
> It seems like it would be reasonable for pathlib.Path to call fspath on the
> path passed to pathlib.Path.__init__, which would mean that if other
> libraries
> implemented __fspath__ then you could pass their path objects to pathlib
> and
> it would just work

and then any lib that needed a path, could simply wrap Path() around whatever was passed in.

This is much like using np.array() if you want numpy arrays -- it works great.

numpy is trickier because they are mutable and can be big, so you don't want to make a copy if you don't need to -- hence the np.asarray() function -- but Paths are immutable and far more lightweight.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From antoine at python.org Thu Apr 7 13:02:11 2016
From: antoine at python.org (Antoine Pitrou)
Date: Thu, 7 Apr 2016 19:02:11 +0200
Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib stop being provisional?
In-Reply-To:
References:
Message-ID: <57069293.9040800@python.org>

On 07/04/2016 16:40, Eric Snow wrote:
>
> On Apr 6, 2016 11:11 PM, "Raymond Hettinger" wrote:
>> Having worked through the API when it is first released, I find it to
> be highly forgettable (i.e. I have to re-read the docs each time I've
> revisited it).
>
> Agreed, though it's arguably better than argparse, logging, unittest, or
> several other stdlib modules. To some extent the challenge with those
> is the complexity of the problem space. Furthermore, the key for any
> sufficiently complex module is that the common-case usage is intuitive
> and simple enough.

This is terribly unspecific as far as criticism goes. "Highly forgettable" depends on who you ask. I tend to find unittest and logging quite useful myself, even if I have to look at the docs from time to time (and I'm certainly not the only one).

I don't think you'll find an API that doesn't need any learning or getting used to, unless it's simply copying another API. os.path is extremely forgettable as well, but after years of getting used to it people may feel it's "natural".

Put Python in the hands of a non-Python programmer, they will find many things bizarre and uncomfortable compared to their language of choice...

Regards

Antoine.

From desmoulinmichel at gmail.com Thu Apr 7 14:19:04 2016
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Thu, 7 Apr 2016 20:19:04 +0200
Subject: [Python-Dev] When should pathlib stop being provisional?
In-Reply-To:
References: <5704909E.8070908@stoneleaf.us> <570575C9.7060208@mail.de> <57063B82.3090307@gmail.com>
Message-ID: <5706A498.90507@gmail.com>

Fair enough, I stand corrected for both points.

On 07/04/2016 18:13, Zachary Ware wrote:
> On Thu, Apr 7, 2016 at 5:50 AM, Michel Desmoulin wrote:
>> Path objects don't have splitext() and don't allow "string" / path.
>> Those are the ones bugging me the most.
>
> >>> import pathlib
> >>> p = '/some/test' / pathlib.Path('path') / 'file_with.ext'
> >>> p
> PosixPath('/some/test/path/file_with.ext')
> >>> p.parent, p.stem, p.suffix
> (PosixPath('/some/test/path'), 'file_with', '.ext')
>
>

From njs at pobox.com Thu Apr 7 14:44:30 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 7 Apr 2016 11:44:30 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <570548DD.7080108@gmail.com>
Message-ID:

On Apr 7, 2016 10:00 AM, "Chris Barker" wrote:
>
> On Thu, Apr 7, 2016 at 12:00 AM, INADA Naoki wrote:
>>
>>
>> I feel adding a protocol only for paths is a bit over-engineered. So I'm -0.5 on adding __fspath__.
>>
>> I'm +1 on adding a general protocol for *coerce to string* like __index__.
>
>
> isn't __str__ the protocol for "coerce to string"?
>
> __index__ is a protocol for "coerce to an integer that can be used as an index", which is like __fspath__ would be "coerce to a string that can be used as a path"

No, __index__ is the protocol for "do a safe coerce to integer". The name is misleading, but its use in non-indexing contexts is well established. E.g.

    " ab" * obj

will return a string with obj.__index__() repetitions.

-n

From chris.barker at noaa.gov Thu Apr 7 15:03:31 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 12:03:31 -0700
Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib stop being provisional?
In-Reply-To: <57069293.9040800@python.org>
References: <57069293.9040800@python.org>
Message-ID:

On Thu, Apr 7, 2016 at 10:02 AM, Antoine Pitrou wrote:
> >> Having worked through the API when it is first released, I find it to
> > be highly forgettable
>
> This is terribly unspecific as far as criticism goes. "Highly
> forgettable" depends on who you ask.

Exactly -- for my part, I need to look up most of os.path every time I use it....

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov Thu Apr 7 15:06:19 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 Apr 2016 12:06:19 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <570548DD.7080108@gmail.com>
Message-ID:

On Thu, Apr 7, 2016 at 11:44 AM, Nathaniel Smith wrote:
> No, __index__ is the protocol for "do a safe coerce to integer". The name
> is misleading, but its use in non-indexing contexts is well established.
> E.g.
>
> " ab" * obj
>
> will return a string with obj.__index__() repetitions.
>

A good argument for Chris A's proposal over on python-ideas to have a dunder method for "coerce to a lossless string", that could be used for Path, but also for who knows what else?

As I see it , exactly the same as the __fspath__ idea, except that we'd use a name that made it clear you might want to use it for other things (and str would grow that method...)

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
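
Nathaniel's point that `__index__` is used well beyond subscripting is easy to check. The following is a small sketch with a made-up `Count` class; it only exercises existing built-in behaviour and says nothing about how a string-coercion protocol would eventually be spelled.

```python
class Count:
    """Made-up type that is losslessly convertible to an integer."""

    def __init__(self, n):
        self._n = n

    def __index__(self):
        return self._n


three = Count(3)

# Sequence repetition calls __index__ on the right-hand operand.
print("ab" * three)

# range() and hex() also accept anything that implements __index__.
print(list(range(three)))
print(hex(three))
```

None of these three uses is an index in the literal sense, which is the sense in which the name is misleading.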
URL: From greg.ewing at canterbury.ac.nz Fri Apr 8 01:59:43 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Apr 2016 17:59:43 +1200 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> Message-ID: <570748CF.3090503@canterbury.ac.nz> Chris Angelico wrote: > -1 for __os_path__, unless it's reasonable to justify it as "most of > the standard library uses Path objects, but os.path uses strings, so > before you pass a Path to anything in os.path, you call path.ospath() > on it, which calls __os_path__()". A less roundabout interpretation would be that it returns the path in a form that is directly acceptable to the OS. BTW, if __fspath__ is acceptable, __ospath__ (without the embedded _) should be as well. -- Greg From ethan at stoneleaf.us Fri Apr 8 02:27:28 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 07 Apr 2016 23:27:28 -0700 Subject: [Python-Dev] summary: a Path protocol Message-ID: <57074F50.7080407@stoneleaf.us> The discussion has ranged all over, so let me try to sum up: Name: __ospath__ Method or attribute? Method (implementations are of course free to pre-build and/or cache the value) Built-in? no, rather a function in pathlib - ospath() Add the method/attribute to str? Not necessary -- but if somebody else wants to do that part I am not opposed Expand the C API to have something like PyObject_Path()? Yes - and if I understood correctly this function will do the same as pathlib.ospath(), just at the C level? And what will its name be, exactly? -- ~Ethan~ From victor.stinner at gmail.com Fri Apr 8 02:35:54 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 8 Apr 2016 08:35:54 +0200 Subject: [Python-Dev] summary: a Path protocol In-Reply-To: <57074F50.7080407@stoneleaf.us> References: <57074F50.7080407@stoneleaf.us> Message-ID: Sorry, I don't have time to read the whole discussion. 
What is the problem with adding a __str__ to pathlib? Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Apr 8 02:57:55 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 8 Apr 2016 16:57:55 +1000 Subject: [Python-Dev] summary: a Path protocol In-Reply-To: References: <57074F50.7080407@stoneleaf.us> Message-ID: On Fri, Apr 8, 2016 at 4:35 PM, Victor Stinner wrote: > Sorry, I don't have time to read the whole discussion. What is the problem > with adding a __str__ to pathlib? > > Victor Everything else has __str__ too, so you run the risk of open(["Hello", "World"], "w") working and doing something weird. Or of passing an open file object to something that was expecting a file name, and having *that* work too. Calling str(p) on something that ought to be either a Path or a string should raise an exception if given something else. ChrisA From ncoghlan at gmail.com Fri Apr 8 05:50:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Apr 2016 19:50:04 +1000 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: On 7 April 2016 at 03:26, Brett Cannon wrote: > WIth Ethan volunteering to do the work to help make a path protocol a thing > -- and I'm willing to help along with propagating this through the stdlib > where I think Serhiy might be interested in helping as well -- and a seeming > consensus this is a good idea, it seems like this proposal has a chance of > actually coming to fruition. > > Now we need clear details. :) Some open questions are: > > Name: __path__, __fspath__, or something else? __fspath__ > Method or attribute? 
(changes what kind of one-liner you might use in > libraries, but I think historically all protocols have been methods and the > serialized string representation might be costly to build) Method, as long as there's a helper function somewhere > Built-in? (name is dependent on #1 if we add one) os.fspath (alongside os.fsencode and os.fsdecode) (Putting this in a module low in the dependency stack makes it easy for other modules to access without pulling in all of pathlib's dependencies) > Add the method/attribute to str? (I assume so, much like __index__() is on > int, but I have not seen it explicitly stated so I would rather clarify it) Makes sense > Expand the C API to have something like PyObject_Path()? PyUnicode_FromFSPath, perhaps? The return type is well-defined here, so it can be done as an alternate constructor, and the C API counterparts of os.fsdecode and os.fsencode are PyUnicode functions (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault) > Some people have asked for the pathlib PEP to have a more flushed out > reasoning as to why pathlib doesn't inherit from str. If Antoine doesn't > want to do it I can try to instil my blog post into a more succinct > paragraph or two and update the PEP myself. > > Is this going to require a PEP or if we can agree on the points here are we > just going to do it? If we think it requires a PEP I'm willing to write it, > but I obviously have no issue if we skip that step either. :) It's worth summarising in a PEP at least for communications purposes - very easy for folks that don't follow python-dev to miss otherwise. Plus my specific API suggestions are pretty different from Ethan's :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Fri Apr 8 09:31:49 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 8 Apr 2016 15:31:49 +0200 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) 
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us>
Message-ID:

Please write a new PEP. The topic seems to have been discussed for many
months by many different people on different mailing lists. A PEP is a good
way to reach a decision, and it has become clear that a decision must be
made for pathlib.

Victor

From victor.stinner at gmail.com Fri Apr 8 09:45:36 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Apr 2016 15:45:36 +0200
Subject: [Python-Dev] Other pathlib improvements? was: When should pathlib stop being provisional?
In-Reply-To: <57069293.9040800@python.org>
References: <57069293.9040800@python.org>
Message-ID:

FYI the doc of the builtin functions is #1 in the stats of docs.python.org.
I also read this doc every week, even though I consider that I know Python
well. IMHO it's not an issue to regularly read the doc.

Victor

From victor.stinner at gmail.com Fri Apr 8 09:56:04 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Apr 2016 15:56:04 +0200
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us>
Message-ID:

I like __fspath__ because it looks like os.fsencode() and os.fsdecode().

Please no builtin function, we have enough of them, but make sure that
__fspath__ is accepted in all functions expecting a filename.

If you consider that a function would make your change simpler, I suggest
adding os.fspath():

    if isinstance(obj, str):
        return obj
    try:
        return obj.__fspath__()
    except AttributeError:
        raise TypeError(...)

Victor

From jon+python-dev at unequivocal.co.uk Fri Apr 8 10:18:47 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Fri, 8 Apr 2016 15:18:47 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
Message-ID: <20160408141847.GQ4951@unequivocal.co.uk>

I've made another attempt at Python sandboxing, which does something
which I've not seen tried before - using the 'ast' module to do static
analysis of the untrusted code before it's executed, to prevent most
of the sneaky tricks that have been used to break out of past attempts
at sandboxes.

In short, I'm turning Python's usual "gentleman's agreement" that you
should not access names and attributes that are indicated as private
by starting with an underscore into a rigidly enforced rule: try and
access anything starting with an underscore and your code will not be
run.

Anyway the code is at https://github.com/jribbens/unsafe
It requires Python 3.4 or later (it could probably be made to work on
Python 2.7 as well, but it would need some changes).

I would be very interested to see if anyone can manage to break it.
Bugs which are trivially fixable are of course welcomed, but the real
question is: is this approach basically sound, or is it fundamentally
unworkable?

From p.f.moore at gmail.com Fri Apr 8 10:37:45 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 8 Apr 2016 15:37:45 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID:

On 8 April 2016 at 15:18, Jon Ribbens wrote:
> I would be very interested to see if anyone can manage to break it.
> Bugs which are trivially fixable are of course welcomed, but the real
> question is: is this approach basically sound, or is it fundamentally
> unworkable?

What are the limitations? It seems to even block "import" which seems
over-zealous (no import math?)
Paul From jon+python-dev at unequivocal.co.uk Fri Apr 8 10:55:36 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Fri, 8 Apr 2016 15:55:36 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: <20160408145536.GA17895@unequivocal.co.uk> On Fri, Apr 08, 2016 at 03:37:45PM +0100, Paul Moore wrote: > On 8 April 2016 at 15:18, Jon Ribbens wrote: > > I would be very interested to see if anyone can manage to break it. > > Bugs which are trivially fixable are of course welcomed, but the real > > question is: is this approach basically sound, or is it fundamentally > > unworkable? > > What are the limitations? It seems to even block "import" which seems > over-zealous (no import math?) The restrictions are: Of the builtins, __import__, compile, globals, input, locals, memoryview, open, print, type and vars are unavailable (and some of the exceptions, but mostly because they're irrelevant). You cannot access any name or attribute which starts with "_", or is called "gi_frame" or "gi_code". You cannot use the "with" statement (although it's possible it might be safe for me to add that back in if I also disallow access to attributes called "tb_frame"). Importing modules is fundamentally unsafe because the untrusted code might alter the module, and the altered version would then be used by the containing application. My code has a "_copy_module" function which copies (some of) the contents of modules, so some sort of import functionality of a white-list of modules could be added using this, but there's no point in me going through working out which modules are safe to white-list until I'm vaguely confident that my approach isn't fundamentally broken in the first place. From arthur at darcet.fr Fri Apr 8 11:21:38 2016 From: arthur at darcet.fr (Arthur Darcet) Date: Fri, 8 Apr 2016 17:21:38 +0200 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID:

On 8 April 2016 at 16:18, Jon Ribbens wrote:
> I've made another attempt at Python sandboxing, which does something
> which I've not seen tried before - using the 'ast' module to do static
> analysis of the untrusted code before it's executed, to prevent most
> of the sneaky tricks that have been used to break out of past attempts
> at sandboxes.
>
> In short, I'm turning Python's usual "gentleman's agreement" that you
> should not access names and attributes that are indicated as private
> by starting with an underscore into a rigidly enforced rule: try and
> access anything starting with an underscore and your code will not be
> run.
>
> Anyway the code is at https://github.com/jribbens/unsafe
> It requires Python 3.4 or later (it could probably be made to work on
> Python 2.7 as well, but it would need some changes).
>
> I would be very interested to see if anyone can manage to break it.
> Bugs which are trivially fixable are of course welcomed, but the real
> question is: is this approach basically sound, or is it fundamentally
> unworkable?

If I'm not mistaken, this breaks out:

> exec('open("out", "w").write("a")', {})

because if the second argument of exec does not contain a __builtins__
key, then a copy of the original builtins module is inserted:
https://docs.python.org/3/library/functions.html#exec

From ethan at stoneleaf.us Fri Apr 8 11:33:39 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 08 Apr 2016 08:33:39 -0700
Subject: [Python-Dev] Defining a path protocol
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us>
Message-ID: <5707CF53.5060502@stoneleaf.us>

On 04/08/2016 02:50 AM, Nick Coghlan wrote:

>> Built-in?
(name is dependent on #1 if we add one) > > os.fspath (alongside os.fsencode and os.fsdecode) I like this better. >> Add the method/attribute to str? (I assume so, much like __index__() is on >> int, but I have not seen it explicitly stated so I would rather clarify it) > > Makes sense What will this do? Return a Path or a str? I don't think we need either. >> Expand the C API to have something like PyObject_Path()? > > PyUnicode_FromFSPath, perhaps? The return type is well-defined here, > so it can be done as an alternate constructor, and the C API > counterparts of os.fsdecode and os.fsencode are PyUnicode functions > (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault) So this will do the same thing as os.fspath() at the C level, yes? > It's worth summarising in a PEP at least for communications purposes - > very easy for folks that don't follow python-dev to miss otherwise. > Plus my specific API suggestions are pretty different from Ethan's :) *sigh* Okay. -- ~Ethan~ From brett at python.org Fri Apr 8 11:41:30 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 15:41:30 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <5707CF53.5060502@stoneleaf.us> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <5707CF53.5060502@stoneleaf.us> Message-ID: On Fri, 8 Apr 2016 at 08:33 Ethan Furman wrote: > On 04/08/2016 02:50 AM, Nick Coghlan wrote: > > >> Built-in? (name is dependent on #1 if we add one) > > > > os.fspath (alongside os.fsencode and os.fsdecode) > > I like this better. > > > >> Add the method/attribute to str? (I assume so, much like __index__() is > on > >> int, but I have not seen it explicitly stated so I would rather clarify > it) > > > > Makes sense > > What will this do? Return a Path or a str? I don't think we need either. > When I brought this up it was to return self. > > > >> Expand the C API to have something like PyObject_Path()? > > > > PyUnicode_FromFSPath, perhaps? 
The return type is well-defined here, > > so it can be done as an alternate constructor, and the C API > > counterparts of os.fsdecode and os.fsencode are PyUnicode functions > > (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault) > > So this will do the same thing as os.fspath() at the C level, yes? > Yes. > > > > It's worth summarising in a PEP at least for communications purposes - > > very easy for folks that don't follow python-dev to miss otherwise. > > Plus my specific API suggestions are pretty different from Ethan's :) > > *sigh* Okay > Chris Angelico and I have been asked by Guido to work together to come up with a proposal after all the discussions are finished and it will most likely be a patch to the pathlib PEP. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon+python-dev at unequivocal.co.uk Fri Apr 8 11:44:15 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Fri, 8 Apr 2016 16:44:15 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: <20160408154415.GB17895@unequivocal.co.uk> On Fri, Apr 08, 2016 at 05:21:38PM +0200, Arthur Darcet wrote: > If i'm not mistaken, this breaks out: > > exec('open("out", "w").write("a")', {}) > because if the second argument of exec does not contain a __builtins__ > key, then a copy of the original builtins module is inserted: > https://docs.python.org/3/library/functions.html#exec Ah, that's a good point. I did think allowing eval/exec was a bit ambitious. I've updated it to disallow passing namespace arguments to them. From koriakin at 0x04.net Fri Apr 8 11:49:12 2016 From: koriakin at 0x04.net (=?UTF-8?Q?Marcin_Ko=c5=9bcielnicki?=) Date: Fri, 8 Apr 2016 17:49:12 +0200 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited)
In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk>
Message-ID: <5707D2F8.9080901@0x04.net>

On 08/04/16 16:18, Jon Ribbens wrote:
> I've made another attempt at Python sandboxing, which does something
> which I've not seen tried before - using the 'ast' module to do static
> analysis of the untrusted code before it's executed, to prevent most
> of the sneaky tricks that have been used to break out of past attempts
> at sandboxes.
>
> In short, I'm turning Python's usual "gentleman's agreement" that you
> should not access names and attributes that are indicated as private
> by starting with an underscore into a rigidly enforced rule: try and
> access anything starting with an underscore and your code will not be
> run.
>
> Anyway the code is at https://github.com/jribbens/unsafe
> It requires Python 3.4 or later (it could probably be made to work on
> Python 2.7 as well, but it would need some changes).
>
> I would be very interested to see if anyone can manage to break it.
> Bugs which are trivially fixable are of course welcomed, but the real
> question is: is this approach basically sound, or is it fundamentally
> unworkable?

That one is trivially fixable, but here goes:

async def a():
    global c
    c = b.cr_frame.f_back.f_back.f_back

b = a()
b.send(None)
c.f_builtins['print']('broken')

Also, if the point of giving me a subclass of datetime is to prevent access
to the actual class, that can be circumvented:

>>> real_datetime = datetime.datetime.mro()[1]
>>> real_datetime

But I'm not sure what good that is.

From chris.barker at noaa.gov Fri Apr 8 12:04:23 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 8 Apr 2016 09:04:23 -0700
Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)
In-Reply-To:
References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us>
Message-ID:

On Fri, Apr 8, 2016 at 2:50 AM, Nick Coghlan wrote:

> On 7 April 2016 at 03:26, Brett Cannon wrote:
>
> > Method or attribute? (changes what kind of one-liner you might use in
> > libraries, but I think historically all protocols have been methods and the
> > serialized string representation might be costly to build)

couldn't it be a property?

> Method, as long as there's a helper function somewhere

what has the helper function got to do with whether it's a method or
attribute (would we call a property an attribute here?)

> > Built-in? (name is dependent on #1 if we add one)
>
> os.fspath (alongside os.fsencode and os.fsdecode)
>
> (Putting this in a module low in the dependency stack makes it easy
> for other modules to access without pulling in all of pathlib's
> dependencies)

I like that -- though still -0.5 on having one at all -- this is only going
to be used by the stdlib and other path-using libraries, not user code -- is
it that hard to call obj.__fspath__() ?

> > Add the method/attribute to str? (I assume so, much like __index__() is on
> > int, but I have not seen it explicitly stated so I would rather clarify it)

I thought the whole point of all this is that not any old string can be a
path! (whereas any int can be an index). Unless we go with Chris A's
suggestion that this be a more generic lossless string protocol, rather
than just for paths.

> It's worth summarising in a PEP at least for communications purposes -
> very easy for folks that don't follow python-dev to miss otherwise.

I'd say add it to the existing pathlib PEP -- along with the extra
discussion of why Path does not inherit from str.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From status at bugs.python.org Fri Apr 8 12:08:40 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 8 Apr 2016 18:08:40 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160408160840.E1D44568A7@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-04-01 - 2016-04-08) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 5477 ( +6) closed 32993 (+22) total 38470 (+28) Open issues with patches: 2381 Issues opened (23) ================== #26686: email.parser stops parsing headers too soon. http://bugs.python.org/issue26686 opened by msapiro #26687: Use Py_RETURN_NONE in sqlite3 module http://bugs.python.org/issue26687 opened by berker.peksag #26689: Add `has_flag` method to `distutils.CCompiler` http://bugs.python.org/issue26689 opened by sylvain.corlay #26692: cgroups support in multiprocessing http://bugs.python.org/issue26692 opened by Satrajit Ghosh #26693: Exception ignored in: "may NOT be returned http://bugs.python.org/issue26479 closed by Samuel Colvin #26509: asyncio: spurious ConnectionAbortedError logged on Windows http://bugs.python.org/issue26509 closed by haypo #26586: Simple enhancement to BaseHTTPRequestHandler http://bugs.python.org/issue26586 closed by martin.panter #26671: Clean up path_converter in posixmodule.c http://bugs.python.org/issue26671 closed by serhiy.storchaka #26673: Tkinter error when opening IDLE configuration menu http://bugs.python.org/issue26673 closed by terry.reedy #26678: Incorrect linking to elements in datetime package http://bugs.python.org/issue26678 closed by martin.panter #26679: curses: 
Descripton of KEY_NPAGE and KEY_PPAGE inverted http://bugs.python.org/issue26679 closed by berker.peksag #26688: unittest2 referenced in unittest.mock documentation http://bugs.python.org/issue26688 closed by berker.peksag #26690: PyUnicode_Decode breaks when Python / sqlite3 is built with sq http://bugs.python.org/issue26690 closed by zzzeek #26691: Update the typing module to match what's in github.com/python/ http://bugs.python.org/issue26691 closed by gvanrossum #26709: Year 2038 problem in plistlib http://bugs.python.org/issue26709 closed by serhiy.storchaka #26713: Change f-literal grammar so that escaping isn???t possible or http://bugs.python.org/issue26713 closed by r.david.murray From k7hoven at gmail.com Fri Apr 8 12:02:04 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 8 Apr 2016 19:02:04 +0300 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) Message-ID: Nick Coghlan wrote: > On 7 April 2016 at 03:26, Brett Cannon wrote: >> >> Name: __path__, __fspath__, or something else? > > __fspath__ > I think I might like this dunder name because it does not clutter the list of regular methods and attributes, and is perhaps more pythonic. >> Method or attribute? (changes what kind of one-liner you might use in >> libraries, but I think historically all protocols have been methods and the >> serialized string representation might be costly to build) > > Method, as long as there's a helper function somewhere As a further minor benefit of it being a method, it may be easier to distinguish it from from `__path__`, which is an iterable attribute. >> Built-in? (name is dependent on #1 if we add one) > > os.fspath (alongside os.fsencode and os.fsdecode) > > (Putting this in a module low in the dependency stack makes it easy > for other modules to access without pulling in all of pathlib's > dependencies) Strong +1 on putting it in os. 
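The helper being discussed here (os.fspath, alongside os.fsencode and os.fsdecode) can be sketched as follows. This is only a minimal illustration under the assumptions made in this thread: the protocol is a method named __fspath__, plain str passes through unchanged, and anything else is rejected with TypeError. The function name fspath and the DemoPath class are hypothetical stand-ins for illustration, and the open question of bytes paths is deliberately left out.

```python
def fspath(obj):
    """Return the string form of a path-like object (sketch of the proposal)."""
    # Plain strings are already acceptable as paths.
    if isinstance(obj, str):
        return obj
    # Look the method up on the type, as is usual for dunder protocols.
    try:
        method = type(obj).__fspath__
    except AttributeError:
        raise TypeError(
            "expected str or an object with __fspath__, got "
            + type(obj).__name__
        ) from None
    return method(obj)


class DemoPath:
    """Hypothetical path-like class, used only to exercise fspath()."""

    def __init__(self, *parts):
        self._parts = parts

    def __fspath__(self):
        return "/".join(self._parts)
```

Unlike calling str() on the argument, a list such as ["Hello", "World"] or an open file object is refused here instead of being silently turned into a file name, which is the concern raised earlier about relying on __str__.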
This should also be implemented in DirEntry, instances of which are "yielded" by os.scandir. Also, you have a strong case regarding naming with the 'fs' prefix. It is also easier to read fspath as f-s-path than it is to read ospath as o-s-path, because ospath could also be pronounced as a single (meaningless?) word. I'm still thinking a little bit about 'pathname', which to me sounds more like a string than fspath does [1]. It would be nice to have the string/path distinction especially when pathlib adoption grows larger. But who knows, maybe somewhere in the far future, no-one will care much about fspath, fsencode, fsdecode or os.path. >> Add the method/attribute to str? (I assume so, much like __index__() is on >> int, but I have not seen it explicitly stated so I would rather clarify it) > > Makes sense If added to str, it should also be added to bytes. But will that then return str or bytes? See also the next point. > Expand the C API to have something like PyObject_Path()? > > PyUnicode_FromFSPath, perhaps? The return type is well-defined here, > so it can be done as an alternate constructor, and the C API > counterparts of os.fsdecode and os.fsencode are PyUnicode functions > (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault) What about DirEntry, which may have a bytes representation? I would expect the function return type of os.fspath to be Union[str, bytes], unless bytes pathnames are decoded with surrogate escapes. [1] https://mail.python.org/pipermail/python-ideas/2016-April/039595.html PS. I have been reading this list occasionally on the google groups mirror, and I now subscribed to it just to send this. (BTW, I probably broke the thread, as I did not have Nick's email in my inbox to reply to. Sorry about that.) I'll have to mention that I was surprised, to say the least, to find that the pathlib discussion had moved here from python-ideas, where I had mentioned I was working on a proposal. 
Then, I also found that the solution discussed here was seemingly an improved version of what I had proposed on python-ideas somewhat earlier [1], but did not get any reactions to. While I can only make guesses about what happened, these kinds of things easily make you go from "Hey, maybe I'll be able to do something to improve Python!" to "These people don't seem to want me here or appreciate my efforts.". Not to accuse anyone in particular; just to let people know. Anyway, I somehow got sucked into thinking deeply about pathlib etc. (which I do use). Not that I really have much at stake here, except spending ridiculous amounts of time thinking about paths, mainly during my Easter holidays and after that. I really had a hard time explaining to friends and family what the heck I was doing ;). From rosuav at gmail.com Fri Apr 8 12:20:49 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 9 Apr 2016 02:20:49 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens wrote: > Anyway the code is at https://github.com/jribbens/unsafe > It requires Python 3.4 or later (it could probably be made to work on > Python 2.7 as well, but it would need some changes). Not being a security expert, I'm not the best one to try to break it maliciously; but I can break things accidentally. Pull request sent through. 
:) ChrisA From ethan at stoneleaf.us Fri Apr 8 12:30:38 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 08 Apr 2016 09:30:38 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <5707CF53.5060502@stoneleaf.us> Message-ID: <5707DCAE.9010407@stoneleaf.us> On 04/08/2016 08:41 AM, Brett Cannon wrote: > On Fri, 8 Apr 2016 at 08:33 Ethan Furman wrote: >> Brett previously queried: >>> Add the method/attribute to str? (I assume so, much like >>> __index__() is on int, but I have not seen it explicitly >>> stated so I would rather clarify it) > >> What will this do? Return a Path or a str? I don't think >> we need either. > > When I brought this up it was to return self. Okay, thanks. > Chris Angelico and I have been asked by Guido to work together to come > up with a proposal after all the discussions are finished and it will > most likely be a patch to the pathlib PEP. Cool. I wasn't looking forward to that part. -- ~Ethan~ From ethan at stoneleaf.us Fri Apr 8 12:36:03 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 08 Apr 2016 09:36:03 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: <5707DDF3.5030106@stoneleaf.us> On 04/08/2016 09:04 AM, Chris Barker wrote: > On Fri, Apr 8, 2016 at 2:50 AM, Nick Coghlan wrote: >> Method, as long as there's a helper function somewhere > > what has the helper function got to do with whether it's a method or > attribute (would we call a property an attribute here?) > >> Built-in? (name is dependent on #1 if we add one) > > os.fspath (alongside os.fsencode and os.fsdecode) > > [...] this is only going to be used by the stdlib and other > path-using libraries, not user code -- is that that hard to > call obj.__fspath__() ? 
1) user code may call it 2) folks who write libraries are still users ;) 3) using __dunder__s directly is usually poor form. > I thought the whole point off all this is that not any old string can be > a path! (whereas any int can be an index). Unless we go with Chris A's > suggestion that this be a more generic lossless string protocol, rather > than just for paths. That does seem to be a valid point against str.__fspath__. -- ~Ethan~ From chris.barker at noaa.gov Fri Apr 8 12:42:49 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 8 Apr 2016 09:42:49 -0700 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: Message-ID: On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote: > I'm still thinking a little bit about 'pathname', which to me sounds > more like a string than fspath does [1]. I like that a lot - or even "__pathstr__" or "__pathstring__" after all, we're making a big deal out of the fact that a path is *not a string*, but rather a string is a *representation* (or serialization) of a path. > If added to str, it should also be added to bytes. ouch! not sure I want to go there, though... > I'll have to mention that I was surprised, to > say the least, to find that the pathlib discussion had moved here from > python-ideas, where I had mentioned I was working on a proposal. ... > While I can only make > guesses about what happened, these kinds of things easily make you go > from "Hey, maybe I'll be able to do something to improve Python!" to > "These people don't seem to want me here or appreciate my efforts.". > For the record, this is pretty rare -- and it was announced on -ideas that the discussion had started up here -- maybe you missed that post? I think in this case, there were ideas over on -ideas, but then it was decided (by whom, who knows?) 
that the goal of supporting Path in the stdlib was decided upon, so it was time to talk implementation, rather than ideas -- thus python-dev. In fact, the implementation turned out to be less straightforward than originally thought, so maybe it should have stayed on -ideas, but there you go. > Not to accuse anyone in particular; just to let people know. Anyway, I > somehow got sucked into thinking deeply about pathlib etc. (which I do > use). Not that I really have much at stake here, except spending > ridiculous amounts of time thinking about paths, mainly during my > Easter holidays and after that. I really had a hard time explaining to > friends and family what the heck I was doing ;). speaking only for me - thanks for your contribution -- I'm glad you found the discussion here. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From jon+python-dev at unequivocal.co.uk Fri Apr 8 12:47:16 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Fri, 8 Apr 2016 17:47:16 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <5707D2F8.9080901@0x04.net> References: <20160408141847.GQ4951@unequivocal.co.uk> <5707D2F8.9080901@0x04.net> Message-ID: <20160408164715.GC17895@unequivocal.co.uk> On Fri, Apr 08, 2016 at 05:49:12PM +0200, Marcin Kościelnicki wrote: > On 08/04/16 16:18, Jon Ribbens wrote: > That one is trivially fixable, but here goes: > > async def a(): > global c > c = b.cr_frame.f_back.f_back.f_back > > b = a() > b.send(None) > c.f_builtins['print']('broken') Ah, I've not used Python 3.5, and I can't find any documentation on this cr_frame business, but I've added cr_frame and f_back to the disallowed attributes list.
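For readers who have not met these attributes, the following is a minimal sketch of the mechanism involved. It is neither the sandbox code nor the exact exploit quoted above: it only shows that a coroutine object exposes its frame as cr_frame, and that a frame object carries a reference to the builtins mapping as f_builtins. The names demo, coro and recovered_len are made up for illustration.

```python
async def demo():
    # Never awaited; only the coroutine object itself is inspected.
    return None

coro = demo()                     # calling an async function creates a coroutine object
frame = coro.cr_frame             # the frame the coroutine would run in (Python 3.5+)
builtins_map = frame.f_builtins   # builtins namespace visible to that frame

# Any builtin can now be looked up by name, whatever names the
# surrounding namespace was meant to hide.
recovered_len = builtins_map["len"]
result = recovered_len("abc")

coro.close()                      # avoids a "coroutine was never awaited" warning
```

Walking f_back from such a frame, as the quoted snippet does, reaches the frames of the calling code, which is presumably why both attribute names ended up on the disallowed list.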
> Also, if the point of giving me a subclass of datetime is to prevent access > to the actual class, that can be circumvented: > > >>> real_datetime = datetime.datetime.mro()[1] > >>> real_datetime > > > But I'm not sure what good that is. It means you can alter the datetime class that is used by the containing application, which is bad - you could lie to it about what day it is for example ;-) I've made it so instead of a direct subclass it now makes an intermediate subclass which makes mro() return an empty list. From brett at python.org Fri Apr 8 13:26:36 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 17:26:36 +0000 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> Message-ID: On Fri, 8 Apr 2016 at 09:05 Chris Barker wrote: > On Fri, Apr 8, 2016 at 2:50 AM, Nick Coghlan wrote: > >> On 7 April 2016 at 03:26, Brett Cannon wrote: >> > > >> > Method or attribute? (changes what kind of one-liner you might use in >> > libraries, but I think historically all protocols have been methods and >> the >> > serialized string representation might be costly to build) >> > > couldn't it be a property? > A property is a method pretending to be an attribute, so yes. :) > > >> Method, as long as there's a helper function somewhere > > > what has the helper function got to do with whether it's a method or > attribute (would we call a property an attribute here?) > Yes, a property is an attribute in this instance. And it somewhat tweaks how simple of a one-liner is needed which in turn makes the function either nearly redundant or helpful. With an attribute: getattr(path, '__ospath__', path) With a method: path.__ospath__() if hasattr(path, '__ospath__') else path > > > Built-in? 
(name is dependent on #1 if we add one) >> >> os.fspath (alongside os.fsencode and os.fsdecode) >> >> (Putting this in a module low in the dependency stack makes it easy >> for other modules to access without pulling in all of pathlib's >> dependencies) > > > Iike that -- though still =0.5 on having one at all -- this is only going > to be used by the stdlib and other path-using libraries, not user code -- > is that that hard to call obj.__fspath__() ? > With a function we can add some type checking so that you know you are getting back a string and not something else like an file descriptor int or something. > > > Add the method/attribute to str? (I assume so, much like __index__() is >> on >> > int, but I have not seen it explicitly stated so I would rather clarify >> it) >> > > I thought the whole point off all this is that not any old string can be a > path! (whereas any int can be an index). Unless we go with Chris A's > suggestion that this be a more generic lossless string protocol, rather > than just for paths. > The whole point is to not treat a path object like any old string. We still have to support a string someone created that is a valid path. Remember, what we're trying to avoid is people simply doing `str(path)` everywhere since that works with e.g. None. > > >> It's worth summarising in a PEP at least for communications purposes - >> very easy for folks that don't follow python-dev to miss otherwise. >> > > I'd say add it to the existing pathlib PEP -- along with the extra > discussion of why Path does not inherit from str. > That's the plan. -Brett > > -CHB > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Fri Apr 8 13:32:23 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 17:32:23 +0000 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: Message-ID: On Fri, 8 Apr 2016 at 09:13 Koos Zevenhoven wrote: > Nick Coghlan wrote: > > On 7 April 2016 at 03:26, Brett Cannon wrote: > >> > >> Name: __path__, __fspath__, or something else? > > > > __fspath__ > > > > I think I might like this dunder name because it does not clutter the > list of regular methods and attributes, and is perhaps more pythonic. > > >> Method or attribute? (changes what kind of one-liner you might use in > >> libraries, but I think historically all protocols have been methods and > the > >> serialized string representation might be costly to build) > > > > Method, as long as there's a helper function somewhere > > As a further minor benefit of it being a method, it may be easier to > distinguish it from from `__path__`, which is an iterable attribute. > > >> Built-in? (name is dependent on #1 if we add one) > > > > os.fspath (alongside os.fsencode and os.fsdecode) > > > > (Putting this in a module low in the dependency stack makes it easy > > for other modules to access without pulling in all of pathlib's > > dependencies) > > Strong +1 on putting it in os. This should also be implemented in > DirEntry, instances of which are "yielded" by os.scandir. > > Also, you have a strong case regarding naming with the 'fs' prefix. It > is also easier to read fspath as f-s-path than it is to read ospath as > o-s-path, because ospath could also be pronounced as a single > (meaningless?) word. > > I'm still thinking a little bit about 'pathname', which to me sounds > more like a string than fspath does [1]. It would be nice to have the > string/path distinction especially when pathlib adoption grows larger. 
> But who knows, maybe somewhere in the far future, no-one will care > much about fspath, fsencode, fsdecode or os.path. > > >> Add the method/attribute to str? (I assume so, much like __index__() is > on > >> int, but I have not seen it explicitly stated so I would rather clarify > it) > > > > Makes sense > > If added to str, it should also be added to bytes. But will that then > return str or bytes? See also the next point. > > > Expand the C API to have something like PyObject_Path()? > > > > PyUnicode_FromFSPath, perhaps? The return type is well-defined here, > > so it can be done as an alternate constructor, and the C API > > counterparts of os.fsdecode and os.fsencode are PyUnicode functions > > (specifically PyUnicode_DecodeFSDefault and PyUnicode_EncodeFSDefault) > > What about DirEntry, which may have a bytes representation? I would > expect the function return type of os.fspath to be Union[str, bytes], > unless bytes pathnames are decoded with surrogate escapes. > > [1] https://mail.python.org/pipermail/python-ideas/2016-April/039595.html > > > PS. I have been reading this list occasionally on the google groups > mirror, and I now subscribed to it just to send this. (BTW, I probably > broke the thread, as I did not have Nick's email in my inbox to reply > to. Sorry about that.) I'll have to mention that I was surprised, to > say the least, to find that the pathlib discussion had moved here from > python-ideas, where I had mentioned I was working on a proposal. Then, > I also found that the solution discussed here was seemingly an > improved version of what I had proposed on python-ideas somewhat > earlier [1], but did not get any reactions to. While I can only make > guesses about what happened, these kinds of things easily make you go > from "Hey, maybe I'll be able to do something to improve Python!" to > "These people don't seem to want me here or appreciate my efforts.". > Not to accuse anyone in particular; just to let people know. 
Anyway, I > somehow got sucked into thinking deeply about pathlib etc. (which I do > use). Not that I really have much at stake here, except spending > ridiculous amounts of time thinking about paths, mainly during my > Easter holidays and after that. I really had a hard time explaining to > friends and family what the heck I was doing ;). > Since I kicked up the discussion here on python-dev, I can explain what happened. After the python-ideas threads kicked up I realized I was not using pathlib in importlib and there were a handful of places it could be supported. But since pathlib is provisional I didn't want to have to start making the stdlib support it if we removed the whole module itself. So I simply asked over here on python-dev what it would take to remove the provisional label from pathlib. People then pulled over the python-ideas discussion of what people were upset about in regards to pathlib to help decide what it would require to remove the provisional label and the conversation forked (I also assumed Guido and others had muted the discussion over on python-ideas so it would have been a new thread somewhere regardless). And then when I realized what had happened I was going to reply to one of your emails on python-ideas to point out the bifurcation but someone beat me to it. So the whole thing just became a tangled mess of discussion. :) I viewed the threads on improving pathlib as separate from a discussion of what the requirements were to remove the provisional label and very specific to python-dev since this isn't an idea of a concrete development/maintenance question, but people tied the two together and that's how we ended up here. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon+python-dev at unequivocal.co.uk Fri Apr 8 13:34:49 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Fri, 8 Apr 2016 18:34:49 +0100 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: <20160408173449.GD17895@unequivocal.co.uk> On Sat, Apr 09, 2016 at 02:20:49AM +1000, Chris Angelico wrote: > On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens > wrote: > > Anyway the code is at https://github.com/jribbens/unsafe > > It requires Python 3.4 or later (it could probably be made to work on > > Python 2.7 as well, but it would need some changes). > > Not being a security expert, I'm not the best one to try to break it > maliciously; but I can break things accidentally. Pull request sent > through. :) Thanks, I've merged that in. From brett at python.org Fri Apr 8 13:34:57 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 17:34:57 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <5707DDF3.5030106@stoneleaf.us> References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <5707DDF3.5030106@stoneleaf.us> Message-ID: On Fri, 8 Apr 2016 at 09:39 Ethan Furman wrote: > On 04/08/2016 09:04 AM, Chris Barker wrote: > > On Fri, Apr 8, 2016 at 2:50 AM, Nick Coghlan wrote: > > >> Method, as long as there's a helper function somewhere > > > > what has the helper function got to do with whether it's a method or > > attribute (would we call a property an attribute here?) > > > >> Built-in? (name is dependent on #1 if we add one) > > > > os.fspath (alongside os.fsencode and os.fsdecode) > > > > [...] this is only going to be used by the stdlib and other > > path-using libraries, not user code -- is that that hard to > > call obj.__fspath__() ? > > 1) user code may call it > 2) folks who write libraries are still users ;) > 3) using __dunder__s directly is usually poor form. > > > I thought the whole point off all this is that not any old string can be > > a path! (whereas any int can be an index). 
Unless we go with Chris A's > > suggestion that this be a more generic lossless string protocol, rather > > than just for paths. > > That does seem to be a valid point against str.__fspath__. > Yep, and I'm expecting we won't want that at this point. The fact that paths need strings for low-level OS stuff is a historical and technical detail, so no need to drag the entire str type into it if we can provide a reasonable helper function (for either the ABC or magic method solution). -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Apr 8 13:36:13 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 08 Apr 2016 10:36:13 -0700 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: Message-ID: <5707EC0D.907@stoneleaf.us> On 04/08/2016 09:42 AM, Chris Barker wrote: > On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote: >> While I can only make guesses about what happened, these kinds of >> things easily make you go from "Hey, maybe I'll be able to do something >> to improve Python!" to "These people don't seem to want me here or >> appreciate my efforts.". Ouch, sorry about that. Glad to have you on -Dev, too. -- ~Ethan~ From k7hoven at gmail.com Fri Apr 8 13:46:14 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 8 Apr 2016 20:46:14 +0300 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: Message-ID: On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker wrote: > On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote: >> >> I'm still thinking a little bit about 'pathname', which to me sounds >> more like a string than fspath does [1]. > > > I like that a lot - or even "__pathstr__" or "__pathstring__" > > after all, we're making a big deal out of the fact that a path is *not a > string*, but rather a string is a *representation* (or serialization) of a > path. 
For me, the point here is the reverse: that any str is not a path, and that it is misleading to call it *path* when whole point is to make it *not* a specialized path object but a plain string. I think it's ok to think of a path as special kind of string. For instance, an URI is explicitly defined as a *sequence of characters*, and URIs can be thought of as a more recent, improved and broadened concept than paths. This is the point of view I took in my recent proposal, but I don't think it's the only valid way to think about paths "in theory". I like the "serialization" interpretation as well, but i tend to think that that string serialization is what is called a path. Anyway, I don't think these philosophical considerations should dictate how Python is implemented. But it is always good to also have a valid theoretical point of view to back up a design decision. > For the record, this is pretty rare -- and it was announced on -ideas that > the discussion had started up here -- maybe you missed that post? If you mean in Ethan's response to my proposal, I noticed that, but the discussions here had already gone quite far by that time. Even more so by the time I had time to see what was going on. I do have to say this is not the first time I felt there was some sort of hostility towards newcomers on python-ideas. Sure, it might be partly because those people don't know the culture on the list, but I'm not sure if that should be used as an excuse. -Koos From ethan at stoneleaf.us Fri Apr 8 14:13:47 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 08 Apr 2016 11:13:47 -0700 Subject: [Python-Dev] Pathlib enhancments - method name only Message-ID: <5707F4DB.7000501@stoneleaf.us> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote: > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker wrote: >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote: >>> I'm still thinking a little bit about 'pathname', which to me sounds >>> more like a string than fspath does. 
>> >> >> I like that a lot - or even "__pathstr__" or "__pathstring__" >> after all, we're making a big deal out of the fact that a path is >> *not a string*, but rather a string is a *representation* (or >> serialization) of a path. That's a decent point. So the plausible choices are, I think: - __fspath__ # File System Path -- possible confusion with Path - __fsstr__ # File System String - __fspathstr__ # File System Path String -- zero ambiguity, but # what a mouthful -- ~Ethan~ From chris.barker at noaa.gov Fri Apr 8 14:20:21 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 8 Apr 2016 11:20:21 -0700 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <5707F4DB.7000501@stoneleaf.us> References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On Fri, Apr 8, 2016 at 11:13 AM, Ethan Furman wrote: > So the plausible choices are, I think: > > - __fspath__ # File System Path -- possible confusion with Path > > - __fsstr__ # File System String I think we really need "path" in there somewhere.... > > - __fspathstr__ # File System Path String -- zero ambiguity, but > # what a mouthful > we rejected plain old __path__ because this is already used in another context, but if we add "str" on the end, that's no longer an issue, so do we need the "fs"? __pathstr__ # pathstring -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
From brett at python.org Fri Apr 8 14:25:25 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 18:25:25 +0000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <5707F4DB.7000501@stoneleaf.us> References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On Fri, 8 Apr 2016 at 11:13 Ethan Furman wrote: > On 04/08/2016 10:46 AM, Koos Zevenhoven wrote: > > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker wrote: > >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote: > > >>> I'm still thinking a little bit about 'pathname', which to me sounds > >>> more like a string than fspath does. > >> > >> > >> I like that a lot - or even "__pathstr__" or "__pathstring__" > >> after all, we're making a big deal out of the fact that a path is > >> *not a string*, but rather a string is a *representation* (or > >> serialization) of a path. > > That's a decent point. > > So the plausible choices are, I think: > > - __fspath__ # File System Path -- possible confusion with Path > +1 > > - __fsstr__ # File System String > -1 Looks like a cat walked across my keyboard or someone trying to come up with a trendy startup name. > > - __fspathstr__ # File System Path String -- zero ambiguity, but > # what a mouthful > -1 See above. I personally still like __ospath__ as well. From k7hoven at gmail.com Fri Apr 8 14:34:00 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 8 Apr 2016 21:34:00 +0300 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On Fri, Apr 8, 2016 at 9:20 PM, Chris Barker wrote: > > we rejected plain old __path__ because this is already used in another > context, but if we add "str" on the end, that's no longer an issue, so do > we need the "fs"? > > __pathstr__ # pathstring > Or perhaps __pathstring__ in case it may be or return byte strings.
-Koos From chris.barker at noaa.gov Fri Apr 8 15:03:48 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 8 Apr 2016 12:03:48 -0700 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven wrote: > > > > __pathstr__ # pathstring > > > > Or perhaps __pathstring__ in case it may be or return byte strings. > I'm fine with __pathstring__ , but I thought it was already decided that it would NOT return a bytestring! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Apr 8 15:09:03 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 9 Apr 2016 05:09:03 +1000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker wrote: > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven wrote: >> >> > >> > __pathstr__ # pathstring >> > >> >> Or perhaps __pathstring__ in case it may be or return byte strings. > > > I'm fine with __pathstring__ , but I thought it was already decided that it > would NOT return a bytestring! I sincerely hope that's been settled on. There's no reason to have this ever return anything other than a str. (Famous last words, I know.) 
ChrisA From brett at python.org Fri Apr 8 15:24:44 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 19:24:44 +0000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On Fri, 8 Apr 2016 at 12:10 Chris Angelico wrote: > On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker > wrote: > > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven > wrote: > >> > >> > > >> > __pathstr__ # pathstring > >> > > >> > >> Or perhaps __pathstring__ in case it may be or return byte strings. > > > > > > I'm fine with __pathstring__ , but I thought it was already decided that > it > > would NOT return a bytestring! > > I sincerely hope that's been settled on. There's no reason to have > this ever return anything other than a str. (Famous last words, I > know.) > It has been settled: pathlib.Path itself won't accept bytes anyway so there's no reason to expect this to ever return anything but str. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmurray at bitdance.com Fri Apr 8 16:39:31 2016 From: rdmurray at bitdance.com (R. David Murray) Date: Fri, 08 Apr 2016 16:39:31 -0400 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: <20160408203932.740D7B14158@webabinitio.net> On Fri, 08 Apr 2016 19:24:44 -0000, Brett Cannon wrote: > On Fri, 8 Apr 2016 at 12:10 Chris Angelico wrote: > > > On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker > > wrote: > > > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven > > wrote: > > >> > > >> > > > >> > __pathstr__ # pathstring > > >> > > > >> > > >> Or perhaps __pathstring__ in case it may be or return byte strings. But there are other paths than OS file system paths. I prefer __fspath__ or __os_path__ myself. 
I think the fact that it is a string is implied by the fact that it is getting us the thing we can pass to the os (since Python3 deals with os paths as strings unless you specify otherwise, only converting them back to bytes, on unix, at the last moment). Heh, although I suppose one could make the argument that it should return whatever the native OS wants, and save the low level code from having to do that? Pass the path object all the way down to that "final step" in the C layer? (Just ignore me, I'm sure I'm only making trouble :) --David From k7hoven at gmail.com Fri Apr 8 16:51:00 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 8 Apr 2016 23:51:00 +0300 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <20160408203932.740D7B14158@webabinitio.net> References: <5707F4DB.7000501@stoneleaf.us> <20160408203932.740D7B14158@webabinitio.net> Message-ID: On Fri, Apr 8, 2016 at 11:39 PM, R. David Murray wrote: > On Fri, 08 Apr 2016 19:24:44 -0000, Brett Cannon wrote: >> On Fri, 8 Apr 2016 at 12:10 Chris Angelico wrote: >> >> > On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker >> > wrote: >> > > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven >> > wrote: >> > >> >> > >> > >> > >> > __pathstr__ # pathstring >> > >> > >> > >> >> > >> Or perhaps __pathstring__ in case it may be or return byte strings. > > But there are other paths than OS file system paths. I prefer > __fspath__ or __os_path__ myself. I think the fact that it is a string > is implied by the fact that it is getting us the thing we can pass > to the os (since Python3 deals with os paths as strings unless you > specify otherwise, only converting them back to bytes, on unix, at the last > moment). > > Heh, although I suppose one could make the argument that it should > return whatever the native OS wants, and save the low level code > from having to do that? Pass the path object all the way down > to that "final step" in the C layer? 
(Just ignore me, I'm sure > I'm only making trouble :) My favorites are fspath and pathname, and since this is a dunder method, it is not as crucial what it is called. I have the feeling the consensus is converging towards fspath? I'll comment on the bytes issue in the other thread. Boy these threads are all over the place! -Koos From sunnycemetery at gmail.com Fri Apr 8 17:00:06 2016 From: sunnycemetery at gmail.com (Grady Martin) Date: Fri, 8 Apr 2016 17:00:06 -0400 Subject: [Python-Dev] Incomplete Internationalization in Argparse Module Message-ID: <20160408210006.GB1484@slim> Hello, all. I was wondering if the following string was left untouched by gettext for a purpose (from line 720 of argparse.py, in class ArgumentError): 'argument %(argument_name)s: %(message)s' There may be other untranslatable strings in the argparse module, but I have yet to encounter them in the wild. Thank you. From brett at python.org Fri Apr 8 17:07:48 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 21:07:48 +0000 Subject: [Python-Dev] Incomplete Internationalization in Argparse Module In-Reply-To: <20160408210006.GB1484@slim> References: <20160408210006.GB1484@slim> Message-ID: On Fri, 8 Apr 2016 at 14:05 Grady Martin wrote: > Hello, all. I was wondering if the following string was left untouched by > gettext for a purpose (from line 720 of argparse.py, in class > ArgumentError): > > 'argument %(argument_name)s: %(message)s' > > There may be other untranslatable strings in the argparse module, but I > have yet to encounter them in the wild. > Probably so that anyone introspecting on the error message can count on somewhat of a consistent format (comes into play with doctest typically). -------------- next part -------------- An HTML attachment was scrubbed...
URL: From k7hoven at gmail.com Fri Apr 8 17:23:50 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sat, 9 Apr 2016 00:23:50 +0300 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <5707DDF3.5030106@stoneleaf.us> Message-ID: On Fri, Apr 8, 2016 at 8:34 PM, Brett Cannon wrote: > On Fri, 8 Apr 2016 at 09:39 Ethan Furman wrote: >> > I thought the whole point off all this is that not any old string can be >> > a path! (whereas any int can be an index). Unless we go with Chris A's >> > suggestion that this be a more generic lossless string protocol, rather >> > than just for paths. >> >> That does seem to be a valid point against str.__fspath__. > > Yep, and I'm expecting we won't want that at this point. The fact that paths > need strings for low-level OS stuff is a historical and technical detail, so > no need to drag the entire str type into it if we can provide a reasonable > helper function (for either the ABC or magic method solution). I'm not sure I understand what these points are about. Anyway, disallowing str or bytes as pathnames will break backwards compatibility if done at some point in the future. There's no way around that. But regarding all this talk of mine about bytes is because it has not been completely clear to me if something can break when converting a bytes path to str. I did originally propose guaranteeing a str, but I am so far only 85% convinced that that does not cause any problems. I understand that fsencode(fsdecode(bytes_path)) should always be equal to bytes_path. But can some other path operations fail when there are surrogates in the strings? And again, not to forget DirEntry, which may have a byte string path. Either way, I suppose os.fspath should accept anything that has __fspath__ or is a str or bytes (whether these have the dunder method or not). Then the options are either to return Union[str, bytes] or to always return str. 
And if the latter does not cause any problems, I like it way better, and it seems others would do too. And in that case it would probably be time to deprecate bytes paths on posix too (on Windows, this is already the case). But do we know that converting all paths to str does not cause any problems? -Koos From brett at python.org Fri Apr 8 17:53:18 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 21:53:18 +0000 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <5707DDF3.5030106@stoneleaf.us> Message-ID: On Fri, 8 Apr 2016 at 14:23 Koos Zevenhoven wrote: > On Fri, Apr 8, 2016 at 8:34 PM, Brett Cannon wrote: > > On Fri, 8 Apr 2016 at 09:39 Ethan Furman wrote: > >> > I thought the whole point off all this is that not any old string can > be > >> > a path! (whereas any int can be an index). Unless we go with Chris A's > >> > suggestion that this be a more generic lossless string protocol, > rather > >> > than just for paths. > >> > >> That does seem to be a valid point against str.__fspath__. > > > > Yep, and I'm expecting we won't want that at this point. The fact that > paths > > need strings for low-level OS stuff is a historical and technical > detail, so > > no need to drag the entire str type into it if we can provide a > reasonable > > helper function (for either the ABC or magic method solution). > > I'm not sure I understand what these points are about. It means we most likely won't add a new method to str in regards to this proposal. > Anyway, > disallowing str or bytes as pathnames will break backwards > compatibility if done at some point in the future. There's no way > around that. > No one is proposing disallowing str or bytes for a pre-existing API that supports either. The whole point of this is to make APIs work with strings and pathlib. 
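As a rough illustration of "make APIs work with strings and pathlib", here is a minimal sketch of the kind of helper under discussion, assuming the method spelling __fspath__ is the one chosen. The function name fspath_sketch and the class DemoPath are hypothetical and exist only for this example; nothing here is the final stdlib design. Plain strings pass through unchanged, and objects implementing the protocol are asked for their string form.

```python
class DemoPath:
    """Hypothetical path-like object, used only for this illustration."""

    def __init__(self, *parts):
        self._parts = parts

    def __fspath__(self):
        # Build the string representation on demand.
        return "/".join(self._parts)


def fspath_sketch(path):
    """Return `path` unchanged if it is a str, else call its __fspath__()."""
    if isinstance(path, str):
        return path
    # Look the method up on the type, as is usual for dunder protocols.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("expected str or path-like object, got "
                        + type(path).__name__)
    return method(path)
```

A library function that currently takes a string path could call such a helper once on entry and keep the rest of its code unchanged, which also avoids str(path) silently accepting objects such as None.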
> > But regarding all this talk of mine about bytes is because it has not > been completely clear to me if something can break when converting a > bytes path to str. I did originally propose guaranteeing a str, but I > am so far only 85% convinced that that does not cause any problems. Depends on your definition of "problem". If you somehow blindly converted a bytes object representing a path to a str without knowing its encoding you will definitely break someone silently (and even os.fsdecode() isn't fool-proof thanks to multiple encodings on a single file system). > I > understand that fsencode(fsdecode(bytes_path)) should always be equal > to bytes_path. But can some other path operations fail when there are > surrogates in the strings? And again, not to forget DirEntry, which > may have a byte string path. > At this point no one wants to touch bytes paths. If you need that level of control because of multiple encodings within a single file system then you will probably have to stick with managing bytes paths on your own to get the encoding right. And just because DirEntry supports bytes doesn't mean that any magic method it gains has to carry that forward (it can always raise a TypeError if necessary). > > Either way, I suppose os.fspath should accept anything that has > __fspath__ or is a str or bytes (whether these have the dunder method > or not). Maybe. I'm not sure if we will want to down that route of both bytes and str being supported out of the same function as that gets messy quickly. The main reason os.scandir() supports it is so it can be a drop-in replacement for os.listdir(). It really depends on how we choose to structure the function in terms of just doing the right thing for objects that follow the protocol or if we want to introduce some required structure for the resulting path and implement some type guarantees so you have a better idea of what you will be working with after calling the function. 
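On the surrogate question quoted above, a minimal sketch may help. It assumes the POSIX behaviour specified in PEP 383, where os.fsdecode() and os.fsencode() use the surrogateescape error handler. The handler is applied directly here so the result does not depend on the local filesystem encoding, and the file name is made up for illustration.

```python
# A file name encoded as Latin-1; the byte 0xE9 is not valid UTF-8.
raw = b"caf\xe9.txt"

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = text.encode("utf-8", "surrogateescape")

# Any strict encode (printing to a UTF-8 terminal, writing to a text file)
# rejects the lone surrogate, which is where such strings can cause trouble.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ascii(text), round_trip == raw, strict_failed)
```

So the round trip itself is lossless, but a str carrying escaped bytes is not safe to hand to code that encodes strictly.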
> Then the options are either to return Union[str, bytes] or to > always return str. And if the latter does not cause any problems, I > like it way better, and it seems others would do too. You don't have to convert byte paths to str, you can simply raise an exception in the face of them. > And in that case > it would probably be time to deprecate bytes paths on posix too (on > Windows, this is already the case). > Can't do that as Stephen Turnbull will tell you. :) At best we can marginalize the support of bytes-based paths to only low-level APIs exposed through the os package. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Fri Apr 8 17:57:52 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 8 Apr 2016 15:57:52 -0600 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On Fri, Apr 8, 2016 at 12:25 PM, Brett Cannon wrote: > I personally still like __ospath__ as well. Same here. The strings are essentially an OS-dependent serialization, rather than related to a particular file system. -eric From chris.barker at noaa.gov Fri Apr 8 18:21:12 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 8 Apr 2016 15:21:12 -0700 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: <-8088150910827119255@unknownmsgid> > On Apr 8, 2016, at 3:00 PM, Eric Snow wrote: > >> On Fri, Apr 8, 2016 at 12:25 PM, Brett Cannon wrote: >> I personally still like __ospath__ as well. > > Same here. The strings are essentially an OS-dependent serialization, > rather than related to a particular file system. Huh? I though the strings were a OS-independent, human readable serialization and interchange format. Bytes would be the OS-dependent serialization. But yes, I suppose the file-system-level version would be inodes or something. 
But this is a string that represents a path, thus __pathstr__. And the term "path" is used all over the place (including os.path and pathlib) for this particular type of path, so I don't see why we need the "fs" or "os", other than the fact that __path__ is already taken. But I'm looking forward to using this bike shed regardless of its color, so that's the last I'll comment on that. -CHB From brett at python.org Fri Apr 8 18:23:41 2016 From: brett at python.org (Brett Cannon) Date: Fri, 08 Apr 2016 22:23:41 +0000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <-8088150910827119255@unknownmsgid> References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> Message-ID: On Fri, 8 Apr 2016 at 15:21 Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > > On Apr 8, 2016, at 3:00 PM, Eric Snow > wrote: > > > >> On Fri, Apr 8, 2016 at 12:25 PM, Brett Cannon wrote: > >> I personally still like __ospath__ as well. > > > > Same here. The strings are essentially an OS-dependent serialization, > > rather than related to a particular file system. > > Huh? I though the strings were a OS-independent, human readable > serialization and interchange format. > Depends if you use `/` or `\` as your path separator if they are truly OS-independent. :) -Brett > > Bytes would be the OS-dependent serialization. > > But yes, I suppose the file-system-level version would be inodes or > something. > > But this is a string that represents a path, thus __pathstr__. And the > term "path" is used all over the place (including os.path and pathlib) > for this particular type of path, so I don't see why we need the "fs" > or "os", other than the fact that __path__ is already taken. > > But I'm looking forward to using this bike shed regardless of its > color, so that's the last I'll comment on that. > > -CHB > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Fri Apr 8 18:28:03 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 8 Apr 2016 16:28:03 -0600 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On Fri, Apr 8, 2016 at 3:57 PM, Eric Snow wrote: > On Fri, Apr 8, 2016 at 12:25 PM, Brett Cannon wrote: >> I personally still like __ospath__ as well. > > Same here. The strings are essentially an OS-dependent serialization, > rather than related to a particular file system. Hmm. It's important to note the distinction between a standardized representation of a path and the OS-dependent representation. That is essentially the same distinction as provided by Go's "path" vs. "path/filepath" packages. pathlib provides an abstraction of FS paths, but does it provide a standardized representation? From what I can tell you only ever get some OS-dependent representation. All this matters because it impacts the value returned from __ospath__(). Should it return the string representation of the path for the current OS or some standardized representation? I'd expect the former. However, if that is the expectation then something like pathlib.PureWindowsPath will give you the wrong thing if your current OS is linux. pathlib.PureWindowsPath.__ospath__() would have to fail or first internally convert to pathlib.PurePosixPath? On the other hand, it seems like the caller should be in charge of deciding the required meaning. That implies that returning a standardized representation or even something like a pathlib.PureGenericPath would be more appropriate.
-eric From k7hoven at gmail.com Fri Apr 8 19:05:41 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sat, 9 Apr 2016 02:05:41 +0300 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <57044567.6070308@sdamon.com> <570526CE.5080401@stoneleaf.us> <5707DDF3.5030106@stoneleaf.us> Message-ID: On Sat, Apr 9, 2016 at 12:53 AM, Brett Cannon wrote: > On Fri, 8 Apr 2016 at 14:23 Koos Zevenhoven wrote: > > At this point no one wants to touch bytes paths. If you need that level of > control because of multiple encodings within a single file system then you > will probably have to stick with managing bytes paths on your own to get the > encoding right. What does this mean? I assume you don't mean os.path.* would stop dealing with bytes? And if not, then you seem to mean that os.fspath would do nothing except call .__fspath__(). In that case, I think we should go back to it being an attribute (or property) and a variation of the now very famous idiom getattr(path, '__fspath__', path) and perhaps have os.fspath do exactly that. > And just because DirEntry supports bytes doesn't mean that any magic method > it gains has to carry that forward (it can always raise a TypeError if > necessary). No, but what if some code gets pathnames from whatever other places and passes them on to os.scandir. Whenever it happens to get a bytes path, a TypeError gets raised, but only when it picks one of the DirEntry objects and for instance tries to open(...) it. Of course, I'm not sure how common this is. > It really depends on how we choose to structure the > function in terms of just doing the right thing for objects that follow the > protocol or if we want to introduce some required structure for the > resulting path and implement some type guarantees so you have a better idea > of what you will be working with after calling the function. Do you have an example of potential 'required structure'? 
>> Then the options are either to return Union[str, bytes] or to >> always return str. And if the latter does not cause any problems, I >> like it way better, and it seems others would do too. > > You don't have to convert byte paths to str, you can simply raise an > exception in the face of them. > I thought the point was for existing APIs to start supporting path objects, wouldn't raising an exception break the API? -Koos From v+python at g.nevcal.com Fri Apr 8 19:09:13 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 8 Apr 2016 16:09:13 -0700 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: <57083A19.3070808@g.nevcal.com> On 4/8/2016 3:28 PM, Eric Snow wrote: > All this matters because it impacts the value returned from > __ospath__(). Should it return the string representation of the path > for the current OS or some standardized representation? I'd expect > the former. However, if that is the expectation then something like > pathlib.PureWindowsPath will give you the wrong thing if your current > OS is linux. pathlib.PureWindowsPath.__ospath__() would have to fail > or first internally convert to pathlib.PurePosixPath? Now that Windows 10++ will run Ubuntu apps, will Python be able to tell the difference for when it should return Windows-format paths and Posix-format paths? (I'm sure the answer is yes, the Python-for-Ubuntu running on Windows would do the latter, and the Python-for-Windows would do the former. Although, it is not clear what sys.platform will return, yet...) -------------- next part -------------- An HTML attachment was scrubbed... 
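Eric's question about what a PureWindowsPath yields on another OS can be checked against pathlib as it already exists. A small sketch with made-up paths; it only uses str() and as_posix(), not the proposed __ospath__/__fspath__ method, whose behaviour was still undecided in this thread.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths never touch the file system, so both flavours can be
# instantiated on any platform.
win = PureWindowsPath("C:/Users/guido/file.txt")
posix = PurePosixPath("/home/guido/file.txt")

# str() always uses the flavour's own separator, whatever OS runs the code.
win_text = str(win)
posix_text = str(posix)

# as_posix() is the closest existing thing to a standardized form.
win_generic = win.as_posix()

print(win_text)
print(posix_text)
print(win_generic)
```

This suggests the string form follows the path flavour rather than the running OS, which is the mismatch Eric describes.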
URL: From v+python at g.nevcal.com Fri Apr 8 19:05:01 2016 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 8 Apr 2016 16:05:01 -0700 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: <5708391D.5030503@g.nevcal.com> On 4/8/2016 11:25 AM, Brett Cannon wrote: > I personally still like __ospath__ as well. +1. Because they aren't always files... but what else they might be is os dependent. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Apr 8 19:33:31 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 08 Apr 2016 16:33:31 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: References: <570526CE.5080401@stoneleaf.us> <5707DDF3.5030106@stoneleaf.us> Message-ID: <57083FCB.1000808@stoneleaf.us> On 04/08/2016 04:05 PM, Koos Zevenhoven wrote: > On Sat, Apr 9, 2016 at 12:53 AM, Brett Cannon wrote: >> On Fri, 8 Apr 2016 at 14:23 Koos Zevenhoven wrote: >> >> At this point no one wants to touch bytes paths. If you need that level of >> control because of multiple encodings within a single file system then you >> will probably have to stick with managing bytes paths on your own to get the >> encoding right. > > What does this mean? I assume you don't mean os.path.* would stop > dealing with bytes? No, it does not mean that. It means the stuff in place won't change, but the stuff we're adding now to integrate with Path will only support str (which is one reason why os.path isn't going to die). > And if not, then you seem to mean that os.fspath > would do nothing except call .__fspath__(). Fair point. So it should be something like this:

    def fspath(thing):
        # look for path attribute
        string = getattr(thing, '__fspath__', None)
        if string is not None:
            return string
        # not found, do we have a str or bytes object?
        if isinstance(thing, (str, bytes)):
            return thing
        raise TypeError('`thing` must implement the __fspath__ protocol or be an instance of str or bytes')

>> And just because DirEntry supports bytes doesn't mean that any magic method >> it gains has to carry that forward (it can always raise a TypeError if >> necessary). > > No, but what if some code gets pathnames from whatever other places > and passes them on to os.scandir. Whenever it happens to get a bytes > path, a TypeError gets raised, but only when it picks one of the > DirEntry objects and for instance tries to open(...) it. Of course, > I'm not sure how common this is. Yeah, I don't think this is a good idea. Given that fspath() should be able to return bytes if bytes are passed in, DirEntry's __fspath__ could return bytes to no ill effect. I realize this may not be ideal, but throwing bytes to the wind is going to bite us in the end. After all, the idea is to make these things work with the stdlib, and the stdlib accepts bytes for path strings. -- ~Ethan~ From larry at hastings.org Fri Apr 8 20:56:10 2016 From: larry at hastings.org (Larry Hastings) Date: Fri, 8 Apr 2016 17:56:10 -0700 Subject: [Python-Dev] Question about the current implementation of str Message-ID: <5708532A.3040207@hastings.org> I have a straightforward question about the str object, specifically the PyUnicodeObject. I've tried reading the source to answer the question myself but it's nearly impenetrable. So I was hoping someone here who understands the current implementation could answer it for me. Although the str object is immutable from Python's perspective, the C object itself is mutable. For example, for dynamically-created strings the hash field may be lazy-computed and cached inside the object. I was wondering if there were other fields like this. For example, are there similar lazy-computed cached objects for the different encoded versions (utf8 utf16) of the str?
What would really help is an exhaustive list of the fields of a str object that may ever change after the object's initial creation. Thanks! We now return you to the debate about the pathlib module, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Apr 8 21:12:54 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 8 Apr 2016 18:12:54 -0700 Subject: [Python-Dev] Incomplete Internationalization in Argparse Module In-Reply-To: References: <20160408210006.GB1484@slim> Message-ID: That string looks like it is aimed at the developer, not the user of the program, so it makes sense not to translate it. On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon wrote: > > > On Fri, 8 Apr 2016 at 14:05 Grady Martin wrote: >> >> Hello, all. I was wondering if the following string was left untouched by >> gettext for a purpose (from line 720 of argparse.py, in class >> ArgumentError): >> >> 'argument %(argument_name)s: %(message)s' >> >> There may be other untranslatable strings in the argparse module, but I >> have yet to encounter them in the wild. > > > Probably so that anyone introspecting on the error message can count on > somewhat of a consistent format (comes into play with doctest typically).
> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From brett at python.org Fri Apr 8 21:41:19 2016 From: brett at python.org (Brett Cannon) Date: Sat, 09 Apr 2016 01:41:19 +0000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <57083A19.3070808@g.nevcal.com> References: <5707F4DB.7000501@stoneleaf.us> <57083A19.3070808@g.nevcal.com> Message-ID: On Fri, Apr 8, 2016, 16:13 Glenn Linderman wrote: > On 4/8/2016 3:28 PM, Eric Snow wrote: > > All this matters because it impacts the value returned from > __ospath__(). Should it return the string representation of the path > for the current OS or some standardized representation? I'd expect > the former. However, if that is the expectation then something like > pathlib.PureWindowsPath will give you the wrong thing if your current > OS is linux. pathlib.PureWindowsPath.__ospath__() would have to fail > or first internally convert to pathlib.PurePosixPath? > > Now that Windows 10++ will run Ubuntu apps, will Python be able to tell > the difference for when it should return Windows-format paths and > Posix-format paths? > All the bits of code in Python accept / as a separator on Windows so it doesn't matter (but Ubuntu on Windows is Linux, so it will be / just like any other Linux install). > (I'm sure the answer is yes, the Python-for-Ubuntu running on Windows > would do the latter, and the Python-for-Windows would do the former. > Although, it is not clear what sys.platform will return, yet...) > It should return Linux. 
-Brett _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Fri Apr 8 22:45:32 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 8 Apr 2016 20:45:32 -0600 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> <57083A19.3070808@g.nevcal.com> Message-ID: On Fri, Apr 8, 2016 at 7:41 PM, Brett Cannon wrote: > > > On Fri, Apr 8, 2016, 16:13 Glenn Linderman wrote: >> >> On 4/8/2016 3:28 PM, Eric Snow wrote: >> >> All this matters because it impacts the value returned from >> __ospath__(). Should it return the string representation of the path >> for the current OS or some standardized representation? I'd expect >> the former. However, if that is the expectation then something like >> pathlib.PureWindowsPath will give you the wrong thing if your current >> OS is linux. pathlib.PureWindowsPath.__ospath__() would have to fail >> or first internally convert to pathlib.PurePosixPath? >> >> Now that Windows 10++ will run Ubuntu apps, will Python be able to tell >> the difference for when it should return Windows-format paths and >> Posix-format paths? > > > All the bits of code in Python accept / as a separator on Windows so it > doesn't matter (but Ubuntu on Windows is Linux, so it will be / just like > any other Linux install). Technically it isn't linux. :) It's the Ubuntu user-space using the linux syscalls (like normal), and those syscalls are implemented as light wrappers around the Windows kernel. They even implemented fork. On Windows. There's no linux kernel involved. 
> >> >> (I'm sure the answer is yes, the Python-for-Ubuntu running on Windows >> would do the latter, and the Python-for-Windows would do the former. >> Although, it is not clear what sys.platform will return, yet...) > > > It should return Linux. From screenshots it looks like lsb_release -a returns the normal Ubuntu info [1] and uname -a says linux (don't know if that will change). [2] So yeah, sys.platform should return Linux. -eric [1] https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/ [2] https://insights.ubuntu.com/2016/03/30/ubuntu-on-windows-the-ubuntu-userspace-for-windows-developers/ From ncoghlan at gmail.com Sat Apr 9 02:48:45 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Apr 2016 16:48:45 +1000 Subject: [Python-Dev] pathlib (was: Defining a path protocol) In-Reply-To: References: Message-ID: On 8 April 2016 at 00:25, Jim J. Jewett wrote: > (1) I think the "built-in" should instead be a module-level function > in the pathlib. If you aren't already expecting pathlib paths, then > you're just expecting strings to work anyhow, and a builtin isn't > likely to be helpful. Concrete data in relation to "Why not put the helper function in pathlib?":

    >>> import sys
    >>> orig_modules = set(sys.modules)
    >>> "os" in orig_modules
    True
    >>> import pathlib
    >>> extra_dependencies = set(sys.modules) - orig_modules
    >>> print(sorted(extra_dependencies))
    ['_collections', '_functools', '_heapq', '_operator', '_sre', 'collections', 'contextlib', 'copyreg', 'fnmatch', 'functools', 'heapq', 'itertools', 'keyword', 'ntpath', 'operator', 'pathlib', 're', 'reprlib', 'sre_compile', 'sre_constants', 'sre_parse', 'urllib', 'urllib.parse', 'weakref']

We want to be able to readily use the protocol helper in builtin modules like os and low level Python modules like os.path, which means we want it to be much lower down in the import hierarchy than pathlib. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Apr 9 02:58:54 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Apr 2016 16:58:54 +1000 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: Message-ID: On 9 April 2016 at 02:02, Koos Zevenhoven wrote: > I'm still thinking a little bit about 'pathname', which to me sounds > more like a string than fspath does [1]. It would be nice to have the > string/path distinction especially when pathlib adoption grows larger. > But who knows, maybe somewhere in the far future, no-one will care > much about fspath, fsencode, fsdecode or os.path. Ah, I like it - adding the "name" suffix nicely distinguishes the protocol from the rich path objects in pathlib. I'll catch up on Ethan's dedicated naming thread before commenting further, though :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Sat Apr 9 03:07:01 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 9 Apr 2016 09:07:01 +0200 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: Message-ID: os.DirEntry doesn't support bytes: os.scandir() only accept str. It's a deliberate choice. I strongly suggest to only support Unicode for filenames in Python 3. So __fspath__ must only return str, or a TypeError must be raised. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sat Apr 9 03:16:29 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 09 Apr 2016 00:16:29 -0700 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: Message-ID: <5708AC4D.3080809@stoneleaf.us> On 04/09/2016 12:07 AM, Victor Stinner wrote: > os.DirEntry doesn't support bytes: os.scandir() only accept str. 
It's a > deliberate choice. 3.5.0 scandir supports bytes: --> huh = list(scandir(b'.')) --> huh [, , , , , ] --> huh[0].path b'./minicourse-ajax-project' -- ~Ethan~ From ncoghlan at gmail.com Sat Apr 9 03:18:10 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Apr 2016 17:18:10 +1000 Subject: [Python-Dev] Question about the current implementation of str In-Reply-To: <5708532A.3040207@hastings.org> References: <5708532A.3040207@hastings.org> Message-ID: On 9 April 2016 at 10:56, Larry Hastings wrote: > > > I have a straightforward question about the str object, specifically the > PyUnicodeObject. I've tried reading the source to answer the question > myself but it's nearly impenetrable. So I was hoping someone here who > understands the current implementation could answer it for me. > > Although the str object is immutable from Python's perspective, the C object > itself is mutable. For example, for dynamically-created strings the hash > field may be lazy-computed and cached inside the object. I was wondering if > there were other fields like this. For example, are there similar > lazy-computed cached objects for the different encoded versions (utf8 utf16) > of the str? What would really help an exhaustive list of the fields of a > str object that may ever change after the object's initial creation. https://www.python.org/dev/peps/pep-0393/#specification should have most of the relevant details. Aside from the hash and the interned-or-not flag in the state, most things should be locked once the string is ready, except that generating the utf-8 and wchar_t forms is deferred until they're needed if they're not the same as the canonical form. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Apr 9 03:48:38 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Apr 2016 17:48:38 +1000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: On 9 April 2016 at 04:25, Brett Cannon wrote: > On Fri, 8 Apr 2016 at 11:13 Ethan Furman wrote: >> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote: >> > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker wrote: >> >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote: >> >> >>> I'm still thinking a little bit about 'pathname', which to me sounds >> >>> more like a string than fspath does. >> >> >> >> >> >> I like that a lot - or even "__pathstr__" or "__pathstring__" >> >> after all, we're making a big deal out of the fact that a path is >> >> *not a string*, but rather a string is a *representation* (or >> >> serialization) of a path. >> >> That's a decent point. >> >> So the plausible choices are, I think: >> >> - __fspath__ # File System Path -- possible confusion with Path > > +1 I like __fspath__, but I'm also sympathetic to Koos' point that we're really dealing with path *names* being produced via this protocol, rather than the paths themselves. That would bring the completely explicit "__fspathname__" into the mix, which would be comparable in length to "__getattribute__" as a magic method name (both in terms of number of syllables and number of characters). Considering the helper function usage, here are some examples in combination with os.fsencode and os.fsdecode:

    # Status quo for binary/text path conversions
    text_path = os.fsdecode(bytes_path)
    bytes_path = os.fsencode(text_path)

    # Getting a text path from an arbitrary object
    text_path = os.fspath(obj) # This doesn't scream "returns text!" to me
    text_path = os.fspathname(obj) # This does

    # Getting a binary path from an arbitrary object
    bytes_path = os.fsencode(os.fspath(obj))
    bytes_path = os.fsencode(os.fspathname(obj))

I'm starting to think the semantic nudge from the "name" suffix when reading the code is worth the extra four characters when writing it (keeping in mind that the whole point of this exercise is that most folks *won't* be writing explicit conversions - the stdlib will handle it on their behalf). I also think the more explicit name helps answer some of the type signature questions that have arisen: 1. Does os.fspathname return rich Path objects? No, it returns names as str objects 2. Will file descriptors pass through os.fspathname? No, as they're not names, they're numeric descriptors. 3. Will bytes-like objects pass through os.fspathname? No, as they're not names, they're encodings of names When the name is instead "os.fspath", the appropriate answers to those three questions are far more debatable. > I personally still like __ospath__ as well. That one fails the "Is it ambiguous when spoken aloud?" test for me: if someone mentions "oh-ess-path", are they talking about os.path or __ospath__? With "eff-ess-path" or "eff-ess-path-name", that problem doesn't arise. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Sat Apr 9 03:52:24 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 9 Apr 2016 09:52:24 +0200 Subject: [Python-Dev] Question about the current implementation of str In-Reply-To: <5708532A.3040207@hastings.org> References: <5708532A.3040207@hastings.org> Message-ID: Le 9 avr. 2016 03:04, "Larry Hastings" a écrit : > Although the str object is immutable from Python's perspective, the C object itself is mutable. For example, for dynamically-created strings the hash field may be lazy-computed and cached inside the object. Yes, the hash is computed once on demand.
It doesn't matter how you build the string. > I was wondering if there were other fields like this. For example, are there similar lazy-computed cached objects for the different encoded versions (utf8 utf16) of the str? Cached utf8 is only cached when you call the C functions filling this cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it. On Windows, there is a cache for wchar_t* which is utf16. This format is used by all C functions of the Windows API (Python should only use the Unicode flavor of the Windows API). I don't recall other caches. > What would really help an exhaustive list of the fields of a str object that may ever change after the object's initial creation. I don't recall exactly what happens if a cache is created and then the string is modified. If I recall correctly, the cache is invalidated. But the hash is used as an heuristic to decide if a string is "immutable" or not, the refcount is also used by the heuristic. If the string is immutable, an operation like resize must create a new string. You can document the PEP 393 in Include/unicodeobject.h. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From sunnycemetery at gmail.com Sat Apr 9 04:25:55 2016 From: sunnycemetery at gmail.com (Grady Martin) Date: Sat, 9 Apr 2016 04:25:55 -0400 Subject: [Python-Dev] Incomplete Internationalization in Argparse Module In-Reply-To: References: <20160408210006.GB1484@slim> Message-ID: <20160409082555.GF1484@slim> I agree. However, an incorrect choice for an argument with a choices parameter results in this string. On 2016?04?08? 18?12?, Guido van Rossum wrote: > >That string looks like it is aimed at the developer, not the user of >the program, so it makes sense not to translate it. > >On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon wrote: >> >> >> On Fri, 8 Apr 2016 at 14:05 Grady Martin wrote: >>> >>> Hello, all. 
I was wondering if the following string was left untouched by >>> gettext for a purpose (from line 720 of argparse.py, in class >>> ArgumentError): >>> >>> 'argument %(argument_name)s: %(message)s' >>> >>> There may be other untranslatable strings in the argparse module, but I >>> have yet to encounter them in the wild. >> >> >> Probably so that anyone introspecting on the error message can count on >> somewhat of a consistent format (comes into play with doctest typically). >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org >> > > > >-- >--Guido van Rossum (python.org/~guido) From storchaka at gmail.com Sat Apr 9 05:00:30 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 9 Apr 2016 12:00:30 +0300 Subject: [Python-Dev] Question about the current implementation of str In-Reply-To: References: <5708532A.3040207@hastings.org> Message-ID: On 09.04.16 10:52, Victor Stinner wrote: > Le 9 avr. 2016 03:04, "Larry Hastings" > a ?crit : > > Although the str object is immutable from Python's perspective, the C > object itself is mutable. For example, for dynamically-created strings > the hash field may be lazy-computed and cached inside the object. > > Yes, the hash is computed once on demand. It doesn't matter how you > build the string. > > > I was wondering if there were other fields like this. For example, > are there similar lazy-computed cached objects for the different encoded > versions (utf8 utf16) of the str? > > Cached utf8 is only cached when you call the C functions filling this > cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it. > > On Windows, there is a cache for wchar_t* which is utf16. This format is > used by all C functions of the Windows API (Python should only use the > Unicode flavor of the Windows API). 
> > I don't recall other caches. > > > What would really help is an exhaustive list of the fields of a str > object that may ever change after the object's initial creation. > > I don't recall exactly what happens if a cache is created and then the > string is modified. If I recall correctly, the cache is invalidated.

You must remember: some bugs with desynchronized utf8 and wchar_t* caches were fixed just a few months ago.

> But the hash is used as a heuristic to decide if a string is > "immutable" or not, the refcount is also used by the heuristic. If the > string is immutable, an operation like resize must create a new string. > > You can document the PEP 393 in Include/unicodeobject.h.

In the normal case the string object can be mutated only at creation time. But CPython uses some tricks that modify already created strings if they have no external references and are not interned. For example "a += b" or "a = a + b" can resize the "a" string.

From victor.stinner at gmail.com Sat Apr 9 05:09:37 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 9 Apr 2016 11:09:37 +0200 Subject: [Python-Dev] Question about the current implementation of str In-Reply-To: References: <5708532A.3040207@hastings.org> Message-ID:

2016-04-09 9:52 GMT+02:00 Victor Stinner : > But the hash is used as a heuristic to decide if a string is "immutable" or > not, the refcount is also used by the heuristic. If the string is immutable, > an operation like resize must create a new string.
I'm talking about this private function:

    static int
    unicode_modifiable(PyObject *unicode)
    {
        assert(_PyUnicode_CHECK(unicode));
        if (Py_REFCNT(unicode) != 1)
            return 0;
        if (_PyUnicode_HASH(unicode) != -1)
            return 0;
        if (PyUnicode_CHECK_INTERNED(unicode))
            return 0;
        if (!PyUnicode_CheckExact(unicode))
            return 0;
    #ifdef Py_DEBUG
        /* singleton refcount is greater than 1 */
        assert(!unicode_is_singleton(unicode));
    #endif
        return 1;
    }

Victor

From g.rodola at gmail.com Sat Apr 9 06:37:23 2016 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Sat, 9 Apr 2016 12:37:23 +0200 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID:

On Fri, Apr 8, 2016 at 9:09 PM, Chris Angelico wrote: > On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker > wrote: > > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven > wrote: > >> > >> > > >> > __pathstr__ # pathstring > >> > > >> > >> Or perhaps __pathstring__ in case it may be or return byte strings. > > > > > > I'm fine with __pathstring__ , but I thought it was already decided that > it > > would NOT return a bytestring! > > I sincerely hope that's been settled on. There's no reason to have > this ever return anything other than a str. (Famous last words, I > know.) > > ChrisA

I'm kind of scared about this: scared to state with 100% certainty that bytes will *never ever* be returned. As such I would call this __fspath__ or something, but I would definitely avoid using "str".

-- Giampaolo - http://grodola.blogspot.com
From k7hoven at gmail.com Sat Apr 9 06:51:23 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sat, 9 Apr 2016 13:51:23 +0300 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: <5708AC4D.3080809@stoneleaf.us> References: <5708AC4D.3080809@stoneleaf.us> Message-ID:

On Sat, Apr 9, 2016 at 10:16 AM, Ethan Furman wrote: > On 04/09/2016 12:07 AM, Victor Stinner wrote: >> >> os.DirEntry doesn't support bytes: os.scandir() only accepts str. It's a >> deliberate choice. > > > 3.5.0 scandir supports bytes: > > --> huh = list(scandir(b'.')) > --> huh > [, , b'__MACOSX'>, , , b'index.html'>] > > --> huh[0].path > b'./minicourse-ajax-project' > >

Maybe it's the bytes support in scandir that should be deprecated? (And not bytes support in general, which cannot be done on posix, as I hear Stephen T. will tell me).

-Koos

From victor.stinner at gmail.com Sat Apr 9 08:43:19 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 9 Apr 2016 14:43:19 +0200 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID:

Please don't lose time trying yet another sandbox inside CPython. It's just a waste of time. It's broken by design.

Please read my email about my attempt (pysandbox): https://lwn.net/Articles/574323/

And the LWN article: https://lwn.net/Articles/574215/

There are a lot of safe ways to run CPython inside a sandbox (and not the opposite).

I started as you did, adding more and more things to a blacklist, but it doesn't work.

See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code that crashes CPython (I don't recall the directory in the sources), even with the latest version of CPython.

Victor
From fijall at gmail.com Sat Apr 9 08:47:55 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sat, 9 Apr 2016 15:47:55 +0300 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID:

I'm with Victor here. In fact I tried (and failed) to convince Victor that the approach is entirely unworkable when he was starting; don't be the next one :-)

On Sat, Apr 9, 2016 at 3:43 PM, Victor Stinner wrote: > Please don't lose time trying yet another sandbox inside CPython. It's just > a waste of time. It's broken by design. > > Please read my email about my attempt (pysandbox): > https://lwn.net/Articles/574323/ > > And the LWN article: > https://lwn.net/Articles/574215/ > > There are a lot of safe ways to run CPython inside a sandbox (and not the > opposite). > > I started as you did, adding more and more things to a blacklist, but it doesn't > work. > > See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has > a list of known code that crashes CPython (I don't recall the directory in > the sources), even with the latest version of CPython. > > Victor

From rdmurray at bitdance.com Sat Apr 9 09:02:04 2016 From: rdmurray at bitdance.com (R.
David Murray) Date: Sat, 09 Apr 2016 09:02:04 -0400 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: <20160409130206.6E4B1B14158@webabinitio.net> On Sat, 09 Apr 2016 17:48:38 +1000, Nick Coghlan wrote: > On 9 April 2016 at 04:25, Brett Cannon wrote: > > On Fri, 8 Apr 2016 at 11:13 Ethan Furman wrote: > >> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote: > >> > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker wrote: > >> >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote: > >> > >> >>> I'm still thinking a little bit about 'pathname', which to me sounds > >> >>> more like a string than fspath does. > >> >> > >> >> > >> >> I like that a lot - or even "__pathstr__" or "__pathstring__" > >> >> after all, we're making a big deal out of the fact that a path is > >> >> *not a string*, but rather a string is a *representation* (or > >> >> serialization) of a path. > >> > >> That's a decent point. > >> > >> So the plausible choices are, I think: > >> > >> - __fspath__ # File System Path -- possible confusion with Path > > > > +1 > > I like __fspath__, but I'm also sympathetic to Koos' point that we're > really dealing with path *names* being produced via this protocol, > rather than the paths themselves. > > That would bring the completely explicit "__fspathname__" into the > mix, which would be comparable in length to "__getattribute__" as a > magic method name (both in terms of number of syllable and number of > characters). I'm not going to vote -1, but for the record I have no real intuition as to what a "path name" would be. An arbitrary identifier that we're using to refer to an os path? That is, a 'filename' is the identifier we've assigned to this thing pointed to by an inode in linux, but an os path is a text representation of the path from the root filename to a specified filename. That is, the path *is* the name, so to say "path name" sounds redundant and confusing to me. 
--David From Nikolaus at rath.org Sat Apr 9 10:32:28 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Sat, 09 Apr 2016 07:32:28 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io> (Donald Stufft's message of "Thu, 7 Apr 2016 07:03:56 -0400") References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us> <87oa9l3dab.fsf@thinkpad.rath.org> <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io> Message-ID: <878u0mzwcj.fsf@vostro.rath.org> On Apr 07 2016, Donald Stufft wrote: >> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath wrote: >> >> Does anyone anticipate any classes other than those from pathlib to come >> with such a method? > > > It seems like it would be reasonable for pathlib.Path to call fspath on the > path passed to pathlib.Path.__init__, which would mean that if other libraries > implemented __fspath__ then you could pass their path objects to pathlib and > it would just work (and similarly, if they also called fspath it would enable > interoperation between all of the various path libraries). Indeed, but my question is: is this actually going to happen? Are there going to be other libraries that will implement __fspath__, and will there be demand for pathlib to support them? Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F ?Time flies like an arrow, fruit flies like a Banana.? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 997 bytes Desc: not available URL: From guido at python.org Sat Apr 9 11:16:41 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 9 Apr 2016 08:16:41 -0700 Subject: [Python-Dev] Incomplete Internationalization in Argparse Module In-Reply-To: <20160409082555.GF1484@slim> References: <20160408210006.GB1484@slim> <20160409082555.GF1484@slim> Message-ID: OK, so this should be taken to the bug tracker. On Saturday, April 9, 2016, Grady Martin wrote: > I agree. However, an incorrect choice for an argument with a choices > parameter results in this string. > > On 2016?04?08? 18?12?, Guido van Rossum wrote: > >> >> That string looks like it is aimed at the developer, not the user of >> the program, so it makes sense not to translate it. >> >> On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon wrote: >> >>> >>> >>> On Fri, 8 Apr 2016 at 14:05 Grady Martin >>> wrote: >>> >>>> >>>> Hello, all. I was wondering if the following string was left untouched >>>> by >>>> gettext for a purpose (from line 720 of argparse.py, in class >>>> ArgumentError): >>>> >>>> 'argument %(argument_name)s: %(message)s' >>>> >>>> There may be other untranslatable strings in the argparse module, but I >>>> have yet to encounter them in the wild. >>>> >>> >>> >>> Probably so that anyone introspecting on the error message can count on >>> somewhat of a consistent format (comes into play with doctest typically). >>> >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: >>> https://mail.python.org/mailman/options/python-dev/guido%40python.org >>> >>> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > -- --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Sat Apr 9 11:30:05 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 09 Apr 2016 08:30:05 -0700 Subject: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?) In-Reply-To: References: <5708AC4D.3080809@stoneleaf.us> Message-ID: <57091FFD.3070909@stoneleaf.us> On 04/09/2016 03:51 AM, Koos Zevenhoven wrote: > On Sat, Apr 9, 2016 at 10:16 AM, Ethan Furman wrote: >> 3.5.0 scandir supports bytes: > > Maybe it's the bytes support in scandir that should be deprecated? > (And not bytes support in general, which cannot be done on posix, as I > hear Stephen T. will tell me). No, scandir is a low-level function -- it needs to support bytes. -- ~Ethan~ From ethan at stoneleaf.us Sat Apr 9 11:39:30 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 09 Apr 2016 08:39:30 -0700 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <878u0mzwcj.fsf@vostro.rath.org> References: <570526CE.5080401@stoneleaf.us> <57054FFB.5070709@stoneleaf.us> <57059F7B.3090901@stoneleaf.us> <-5625672377616017435@unknownmsgid> <5705FB0C.2090705@canterbury.ac.nz> <5705FEBF.3070301@stoneleaf.us> <87oa9l3dab.fsf@thinkpad.rath.org> <12E898F1-0A5D-4504-9E7F-08A509BCAEEB@stufft.io> <878u0mzwcj.fsf@vostro.rath.org> Message-ID: <57092232.6010002@stoneleaf.us> On 04/09/2016 07:32 AM, Nikolaus Rath wrote: > On Apr 07 2016, Donald Stufft wrote: >>> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath wrote: >>> >>> Does anyone anticipate any classes other than those from pathlib to come >>> with such a method? >> >> >> It seems like it would be reasonable for pathlib.Path to call fspath on the >> path passed to pathlib.Path.__init__, which would mean that if other libraries >> implemented __fspath__ then you could pass their path objects to pathlib and >> it would just work (and similarly, if they also called fspath it would enable >> interoperation between all of the various path libraries). 
> > Indeed, but my question is: is this actually going to happen? Are there > going to be other libraries that will implement __fspath__, and will > there be demand for pathlib to support them? There will be at least one. :) -- ~Ethan~ From ethan at stoneleaf.us Sat Apr 9 12:41:01 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 09 Apr 2016 09:41:01 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() Message-ID: <5709309D.8030007@stoneleaf.us> On 04/09/2016 12:48 AM, Nick Coghlan wrote: > Considering the helper function usage, here's some examples in > combination with os.fsencode and os.fsdecode: > > # Status quo for binary/text path conversions > text_path = os.fsdecode(bytes_path) > bytes_path = os.fsencode(text_path) > > # Getting a text path from an arbitrary object > text_path = os.fspath(obj) # This doesn't scream "returns text!" > text_path = os.fspathname(obj) # This does > > # Getting a binary path from an arbitrary object > bytes_path = os.fsencode(os.fspath(obj)) > bytes_path = os.fsencode(os.fspathname(obj)) > > I'm starting to think the semantic nudge from the "name" suffix when > reading the code is worth the extra four characters when writing it > (keeping in mind that the whole point of this exercise is that most > folks *won't* be writing explicit conversions - the stdlib will handle > it on their behalf). > > I also think the more explicit name helps answer some of the type > signature questions that have arisen: > > 1. Does os.fspathname return rich Path objects? No, it returns names > as str objects > 2. Will file descriptors pass through os.fspathname? No, as they're > not names, they're numeric descriptors. > 3. Will bytes-like objects pass through os.fspathname? No, as they're > not names, they're encodings of names This worries me. I know the primary purpose of this change is to enable pathlib and os and the rest of the stdlib to work together, but consider . . . 
If adding a new attribute/method was as far as we went, new code (stdlib or otherwise) would look like:

    if isinstance(a_path_thingy, bytes):
        # because os can accept bytes
        pass
    elif isinstance(a_path_thingy, str):
        # but it's usually text
        pass
    elif hasattr(a_path_thingy, '__fspath__'):
        a_path_thingy = a_path_thingy.__fspath__()
    else:
        raise TypeError('not a valid path')
    # do something with the path

If we add os.fspath(), but don't allow bytes to be returned from it, our above example looks more like:

    if isinstance(a_path_thingy, bytes):
        # because os can accept bytes
        pass
    else:
        a_path_thingy = os.fspath(a_path_thingy)
    # do something with the path

Yes, it's better -- but it still requires a pre-check before calling os.fspath().

It is my contention that this is better:

    a_path_thingy = os.fspath(a_path_thingy)

This raises two issues:

1) Part of the stdlib is the new os.scandir() function, which can work with, and return, both bytes and text -- if __fspath__ can only hold text, DirEntry will not get the __fspath__ method added, and the pre-check, boiler-plate code will flourish;

2) pathlib.Path accepts bytes -- so what happens when a byte-derived Path is passed to os.fspath()? Is a TypeError raised? Do we guess and auto-convert with fsdecode()?

I think the best answer is to

- let __fspath__ hold bytes as well as text
- let fspath() return bytes as well as text

-- ~Ethan~

From sunnycemetery at gmail.com Sat Apr 9 18:55:41 2016 From: sunnycemetery at gmail.com (Grady Martin) Date: Sat, 9 Apr 2016 18:55:41 -0400 Subject: [Python-Dev] Incomplete Internationalization in Argparse Module In-Reply-To: References: <20160408210006.GB1484@slim> <20160409082555.GF1484@slim> Message-ID: <20160409225541.GI1484@slim>

Excellent. Issue/patch here: http://bugs.python.org/issue26726

On 2016-04-09 08:16, Guido van Rossum wrote: > >OK, so this should be taken to the bug tracker. > >On Saturday, April 9, 2016, Grady Martin wrote: > >> I agree.
However, an incorrect choice for an argument with a choices >> parameter results in this string. >> >> On 2016?04?08? 18?12?, Guido van Rossum wrote: >> >>> >>> That string looks like it is aimed at the developer, not the user of >>> the program, so it makes sense not to translate it. >>> >>> On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon wrote: >>> >>>> >>>> >>>> On Fri, 8 Apr 2016 at 14:05 Grady Martin >>>> wrote: >>>> >>>>> >>>>> Hello, all. I was wondering if the following string was left untouched >>>>> by >>>>> gettext for a purpose (from line 720 of argparse.py, in class >>>>> ArgumentError): >>>>> >>>>> 'argument %(argument_name)s: %(message)s' >>>>> >>>>> There may be other untranslatable strings in the argparse module, but I >>>>> have yet to encounter them in the wild. >>>>> >>>> >>>> >>>> Probably so that anyone introspecting on the error message can count on >>>> somewhat of a consistent format (comes into play with doctest typically). >>>> >>>> _______________________________________________ >>>> Python-Dev mailing list >>>> Python-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/python-dev >>>> Unsubscribe: >>>> https://mail.python.org/mailman/options/python-dev/guido%40python.org >>>> >>>> >>> >>> >>> -- >>> --Guido van Rossum (python.org/~guido) >>> >> > >-- >--Guido (mobile) From ncoghlan at gmail.com Sun Apr 10 00:51:23 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Apr 2016 14:51:23 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: On 9 April 2016 at 22:43, Victor Stinner wrote: > Please don't loose time trying yet another sandbox inside CPython. It's just > a waste of time. It's broken by design. 
> > Please read my email about my attempt (pysandbox): > https://lwn.net/Articles/574323/ > > And the LWN article: > https://lwn.net/Articles/574215/ > > There are a lot of safe ways to run CPython inside a sandbox (and not the > opposite). > > I started as you did, adding more and more things to a blacklist, but it doesn't > work. > > See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has > a list of known code that crashes CPython (I don't recall the directory in > the sources), even with the latest version of CPython.

They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers

There's also https://hg.python.org/cpython/file/tip/Lib/test/test_crashers.py which was designed to run them regularly to catch when they were resolved, but it was too fragile and tended to hang the buildbots.

Even without those considerations though, there are system level denial of service attacks that untrusted code can perform without even trying to break out of the sandbox - the most naive is "while 1: pass", but there are more interesting ones like "from itertools import count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int, 1))".

Operating system level security sandboxes still aren't particularly easy to use correctly, but they're a lot more reliable than language runtime level sandboxes, can be used to defend against many more attack vectors, and even offer increased flexibility (e.g. "can write to these directories, but no others", "can read these files, but no others", "can contact these IP addresses, but no others").

Cheers, Nick.
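
As a rough illustration of why containment outside the interpreter copes better with the "while 1: pass" case, here is a minimal sketch that runs a snippet in a child interpreter with a wall-clock limit. The helper name run_with_timeout is hypothetical and invented for this example. It is emphatically not a security sandbox: it applies no filesystem, memory or network restrictions, and only shows that a busy loop can be stopped from the outside, which an in-process blacklist cannot guarantee.

```python
import subprocess
import sys


def run_with_timeout(source, seconds=5.0):
    """Run `source` in a separate interpreter process.

    Hypothetical helper for illustration only. Returns 'ok', 'error'
    or 'timeout'. It limits wall-clock time and nothing else, so it
    must not be mistaken for a real sandbox.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the exception.
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    print(run_with_timeout("print('hello')"))
    print(run_with_timeout("while 1: pass", seconds=0.5))
```

The memory-hungry variants mentioned above (such as "list(iter(int, 1))") would additionally need an operating system level memory limit, which this sketch deliberately leaves out.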
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Apr 10 01:04:47 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Apr 2016 15:04:47 +1000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <20160409130206.6E4B1B14158@webabinitio.net> References: <5707F4DB.7000501@stoneleaf.us> <20160409130206.6E4B1B14158@webabinitio.net> Message-ID: On 9 April 2016 at 23:02, R. David Murray wrote: > That is, a 'filename' is the identifier we've assigned to this thing > pointed to by an inode in linux, but an os path is a text representation > of the path from the root filename to a specified filename. That is, > the path *is* the name, so to say "path name" sounds redundant and > confusing to me. "The path is the name" is a true statement in the context of: 1. The way *nix APIs work 2. Existing filesystem interfaces in the standard library 3. Path abstractions that inherit from str/unicode It's no longer true in the context of pathlib - there, the path name is a serialised representation of a rich path object. It's also not really true in the context of Python 3 in general - bytes-like objects are an encoding of the path name, rather than the name itself. This means that "path" has become ambiguous due to the changing context - do we mean the path name representation, the binary encoding of that name, or a higher level rich path object? We're never going to be able to eliminate that ambiguity (Python's *nix & C roots run too deep for that), but we *can* potentially standardise the terms used when disambiguation is needed: path name (str), encoded path name (bytes-like object), rich path object (object implementing the new protocol) Cheers, Nick. 
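
The three terms proposed above can be shown side by side in a small sketch. It uses only pathlib.PurePosixPath, str(), os.fsencode() and os.fsdecode(), which already exist; the variable names are made up for the example, and it deliberately does not rely on the protocol method still being discussed in this thread.

```python
import os
import pathlib

# Rich path object: supports joining, parents, suffixes, and so on.
rich_path = pathlib.PurePosixPath("/tmp") / "report.txt"

# Path name: the str serialisation of the rich path object.
path_name = str(rich_path)

# Encoded path name: the bytes form, using the filesystem encoding.
encoded_path_name = os.fsencode(path_name)

# Decoding the encoded path name gives back the path name.
round_tripped = os.fsdecode(encoded_path_name)

print(type(rich_path).__name__, repr(path_name), repr(encoded_path_name))
```

The example sticks to an ASCII-only name so the encoded form does not depend on the platform's filesystem encoding.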
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Sun Apr 10 01:08:45 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 Apr 2016 15:08:45 +1000 Subject: [Python-Dev] PEP 506 secrets module In-Reply-To: References: <20151016005711.GC11980@ando.pearwood.info> Message-ID: <20160410050845.GA12526@ando.pearwood.info> I've just spotted this email from Guido, sorry about the delay in responding. Further comments below. On Thu, Jan 14, 2016 at 10:47:09AM -0800, Guido van Rossum wrote: > I think the discussion petered out and nobody asked me to approve it yet > (or I lost track of it). I'm almost happy to approve it in the current > state. My only quibble is with some naming -- I'm not sure that a > super-generic name like 'equal' is better than the original > ('compare_digest'), Changed. > and I would have picked a different name for token_url > -- probably token_urlsafe. But maybe Steven can convince me that the names > currently in the PEP are better. Changed. > (I also don't like the wishy-washy > position of the PEP on the actual specs of the proposed functions. But I'm > fine with the actual implementation shown as the spec.) I'm not really sure what you want me to do to improve that. Can you be more concrete about what you would like the PEP to say? 
I haven't updated the PEP yet, but the newest version of the secrets module with the changes requested is here: https://bitbucket.org/sdaprano/secrets -- Steve From ncoghlan at gmail.com Sun Apr 10 01:31:30 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Apr 2016 15:31:30 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <5709309D.8030007@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> Message-ID: On 10 April 2016 at 02:41, Ethan Furman wrote: > If we add os.fspath(), but don't allow bytes to be returned from it, our > above example looks more like: > > if isinstance(a_path_thingy, bytes): > # because os can accept bytes > pass > else: > a_path_thingy = os.fspath(a_path_thingy) > # do something with the path > > Yes, it's better -- but it still requires a pre-check before calling > os.fspath(). > > It is my contention that this is better: > > a_path_thingy = os.fspath(a_path_thingy) That approach often doesn't work, though - by design, there are situations where you can't transparently handle bytes and str with the same code path in Python 3 the way you could in Python 2. When somebody hands you bytes rather than text you need to worry about the encoding, and you need to worry about returning bytes rather than text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1 provides an illustration of how fiddly that can get, and that's in the URL context - cross-platform filesystem path handling is worse, since you need to worry about the significant differences between the way Windows and *nix handle binary paths, and you can't use os.sep directly any more (since that's always text). 
> This raises two issues: > > 1) Part of the stdlib is the new scandir module, which can work > with, and return, both bytes and text -- if __fspath__ can only > hold text, DirEntry will not get the __fspath__ method added, > and the pre-check, boiler-plate code will flourish;

DirEntry can still get the check; it can just throw TypeError when it represents a binary path (that's one of the advantages of using a method-based protocol - exceptions on method calls are more acceptable than exceptions on property access).

> 2) pathlib.Path accepts bytes -- so what happens when a byte-derived > Path is passed to os.fspath()? Is a TypeError raised? Do we > guess and auto-convert with fsdecode()?

pathlib is str-only (which makes sense, since it's a cross-platform API and binary paths basically don't work on Windows):

    >>> pathlib.Path(b".")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.4/pathlib.py", line 907, in __new__
        self = cls._from_parts(args, init=False)
      File "/usr/lib64/python3.4/pathlib.py", line 589, in _from_parts
        drv, root, parts = self._parse_args(args)
      File "/usr/lib64/python3.4/pathlib.py", line 581, in _parse_args
        % type(a))
    TypeError: argument should be a path or str object, not <class 'bytes'>

The only specific mention of binary support in the pathlib docs is to state that "bytes(p)" uses os.fsencode() to convert to the binary representation.

Cheers, Nick.
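
Both halves of that statement can be checked with a short sketch: bytes are rejected when constructing a path, and bytes(p) gives the os.fsencode() form of the path name. The function name accepts_bytes is hypothetical, and the exact wording of the TypeError message is not relied upon, since it may differ between Python versions.

```python
import os
import pathlib


def accepts_bytes(path_class):
    """Return True if `path_class` can be built from a bytes argument.

    Hypothetical helper for illustration; only the exception type is
    checked, not its message.
    """
    try:
        path_class(b"docs")
    except TypeError:
        return False
    return True


p = pathlib.PurePosixPath("docs") / "index.rst"

# bytes(p) is the filesystem-encoded form of str(p).
as_bytes = bytes(p)

print(accepts_bytes(pathlib.PurePosixPath), as_bytes)
```

Using the pure path classes keeps the example independent of the operating system it runs on.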
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Sun Apr 10 01:58:08 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 10 Apr 2016 17:58:08 +1200 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> Message-ID: <5709EB70.7030308@canterbury.ac.nz> Brett Cannon wrote: > Depends if you use `/` or `\` as your path separator Or whether your pathnames look entirely different, e.g VMS: device:[topdir.subdir.subsubdir]filename.ext;version Pathnames are very much OS-dependent in both syntax *and* semantics. Even the main two in use today (unix and windows) can't be mapped directly onto each other, because windows has drive letters and unix doesn't. -- Greg From greg.ewing at canterbury.ac.nz Sun Apr 10 02:24:23 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 10 Apr 2016 18:24:23 +1200 Subject: [Python-Dev] pathlib (was: Defining a path protocol) In-Reply-To: References: Message-ID: <5709F197.7020802@canterbury.ac.nz> Nick Coghlan wrote: > We want to be able to readily use the protocol helper in builtin > modules like os and low level Python modules like os.path, which means > we want it to be much lower down in the import hierarchy than pathlib. Also, it's more general than that. It works on any object that wants to behave as a path, not just pathlib ones, so it should be in a neutral place. -- Greg From greg.ewing at canterbury.ac.nz Sun Apr 10 02:38:35 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 10 Apr 2016 18:38:35 +1200 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> Message-ID: <5709F4EB.6060200@canterbury.ac.nz> Eric Snow wrote: > All this matters because it impacts the value returned from > __ospath__(). 
Should it return the string representation of the path > for the current OS or some standardized representation? What standardized representation? I'm not aware of such a thing. > I'd expect > the former. However, if that is the expectation then something like > pathlib.PureWindowsPath will give you the wrong thing if your current > OS is linux. No, you should get the representation corresponding to the kind of path object you started with. If you're working with Windows path objects on a Unix system, they must be representing something on some Windows system somewhere, not the one you're running the code on. The only reason to ask for a string representation of such a path is for use by that other system. I don't think it even makes sense to ask for a Unix representation of a Windows path or vice versa, because the semantics are different. How do you translate a Windows drive letter into Unix? What drive letter do you use for an absolute Unix path? -- Greg From ncoghlan at gmail.com Sun Apr 10 02:43:16 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Apr 2016 16:43:16 +1000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <5709EB70.7030308@canterbury.ac.nz> References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> <5709EB70.7030308@canterbury.ac.nz> Message-ID: On 10 April 2016 at 15:58, Greg Ewing wrote: > Brett Cannon wrote: > >> Depends if you use `/` or `\` as your path separator > > > Or whether your pathnames look entirely different, e.g VMS: > > device:[topdir.subdir.subsubdir]filename.ext;version > > Pathnames are very much OS-dependent in both syntax *and* semantics. > > Even the main two in use today (unix and windows) can't be > mapped directly onto each other, because windows has drive > letters and unix doesn't. This does raise a concrete API design question: how should PurePath.__fspath__ behave when called on a mismatched OS? 
For PurePath vs Path, the latter raises NotImplementedError if you try to create a concrete path that doesn't match the running system:

    >>> pathlib.PureWindowsPath(".")
    PureWindowsPath('.')
    >>> pathlib.WindowsPath(".")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.4/pathlib.py", line 910, in __new__
        % (cls.__name__,))
    NotImplementedError: cannot instantiate 'WindowsPath' on your system

The question we need to address is what happens if you do:

    >>> os.fspath(pathlib.PureWindowsPath("."))

on a *nix system?

Similar to my proposal for dealing with DirEntry.path being a bytes-like object, I'd like to suggest raising TypeError in __fspath__ if the request is nonsensical for the currently running system - *nix systems can *manipulate* Windows paths (and vice-versa), but actually trying to *use* them with the local filesystem isn't going to work properly, since the syntax and semantics are different.

    >>> os.fspath(pathlib.PureWindowsPath("."))
    Traceback (most recent call last):
      ...
    TypeError: cannot render 'PureWindowsPath' as filesystem path on 'posix' system

(I'm also suggesting replacing "your" with the value of os.name)

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From greg.ewing at canterbury.ac.nz Sun Apr 10 02:51:23 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 10 Apr 2016 18:51:23 +1200 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> <20160409130206.6E4B1B14158@webabinitio.net> Message-ID: <5709F7EB.9010603@canterbury.ac.nz>

> On 9 April 2016 at 23:02, R. David Murray wrote: > >>That is, a 'filename' is the identifier we've assigned to this thing >>pointed to by an inode in linux, but an os path is a text representation >>of the path from the root filename to a specified filename. That is, >>the path *is* the name, so to say "path name" sounds redundant and >>confusing to me.
The term "pathname" is what is conventionally used to refer to a textual string passed to the OS to identify an object in the file system. It's often abbreviated to just "path", but that's ambiguous for our purposes, because "path" can also refer to one of our higher-level objects. -- Greg From greg.ewing at canterbury.ac.nz Sun Apr 10 03:12:18 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 10 Apr 2016 19:12:18 +1200 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> <5709EB70.7030308@canterbury.ac.nz> Message-ID: <5709FCD2.3070808@canterbury.ac.nz> Nick Coghlan wrote: > Similar to my proposal for dealing with DirEntry.path being a > bytes-like object, I'd like to suggest raising TypeError in __fspath__ > if the request is nonsensical for the currently running system - *nix > systems can *manipulate* Windows paths (and vice-versa), but actually > trying to *use* them with the local filesystem isn't going to work > properly, since the syntax and semantics are different. That sounds reasonable, since it would be preferable to fail early if you mistakenly pass a PureWindowsPath to e.g. open(). But there needs to be some way to ask a path object for its native string representation, otherwise there would be no point in using foreign path objects at all. 
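[Editorial illustration, not part of the original message.] The point about asking a foreign path object for its native string form can be sketched with str(), which pathlib already supports on pure paths. This is only a minimal sketch; the example paths and variable names are made up for illustration.

```python
import pathlib

# Pure path objects never touch the filesystem, so both flavours can be
# created on any operating system.
win = pathlib.PureWindowsPath("C:/Users/greg") / "notes.txt"
posix = pathlib.PurePosixPath("/home/greg") / "notes.txt"

# str() renders each object in the syntax of its own flavour,
# regardless of the OS the code is running on.
win_text = str(win)      # backslash separators, drive letter kept
posix_text = str(posix)  # forward slashes, no drive

print(win_text)
print(posix_text)
print(repr(win.drive), repr(posix.drive))
```

Note that the Windows flavour normalises the forward slashes given to the constructor, while the POSIX flavour has no notion of a drive at all, which is the mismatch discussed in this thread.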
-- Greg From ncoghlan at gmail.com Sun Apr 10 03:36:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Apr 2016 17:36:36 +1000 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <5709FCD2.3070808@canterbury.ac.nz> References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> <5709EB70.7030308@canterbury.ac.nz> <5709FCD2.3070808@canterbury.ac.nz> Message-ID: On 10 April 2016 at 17:12, Greg Ewing wrote: > Nick Coghlan wrote: >> >> Similar to my proposal for dealing with DirEntry.path being a >> bytes-like object, I'd like to suggest raising TypeError in __fspath__ >> if the request is nonsensical for the currently running system - *nix >> systems can *manipulate* Windows paths (and vice-versa), but actually >> trying to *use* them with the local filesystem isn't going to work >> properly, since the syntax and semantics are different. > > > That sounds reasonable, since it would be preferable to > fail early if you mistakenly pass a PureWindowsPath to > e.g. open(). > > But there needs to be some way to ask a path object for > its native string representation, otherwise there would > be no point in using foreign path objects at all. In addition to the existing "str(pathobj)", a "path" property was recently added for that purpose: >>> import pathlib >>> pathlib.PureWindowsPath(".") PureWindowsPath('.') >>> pathlib.PureWindowsPath(".").path '.' (The specific property name was chosen to match os.scandir's DirEntry.path) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Sun Apr 10 03:58:06 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 10 Apr 2016 08:58:06 +0100 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> <5709EB70.7030308@canterbury.ac.nz> <5709FCD2.3070808@canterbury.ac.nz> Message-ID: On 10 April 2016 at 08:36, Nick Coghlan wrote: > In addition to the existing "str(pathobj)", a "path" property was > recently added for that purpose: > > >>> import pathlib > >>> pathlib.PureWindowsPath(".") > PureWindowsPath('.') > >>> pathlib.PureWindowsPath(".").path > '.' > > (The specific property name was chosen to match os.scandir's DirEntry.path) I believe that under the current proposal, the ".path" property will be removed again in favour of the new protocol, so the only actual option would be str(pathobj). Paul From srkunze at mail.de Sun Apr 10 10:07:50 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Sun, 10 Apr 2016 16:07:50 +0200 Subject: [Python-Dev] pathlib+os/shutil feedback Message-ID: <570A5E36.2070606@mail.de> I talked to my colleague. He didn't remember the concrete use-case, though, he instantly mentioned three possible things (no order of preference): 1) pathlib + mtime 2) os.walk and pathlib 3) creation/removal of paths He wasn't too sure but I checked with the docs and his memories seemed to be correct: ----- 1) https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat High-level path objects should return high-level [insert type here] objects. Put differently, an API for retrieving time-stats as real date/time objects would be nice. I think that can be expanded to other pathlib methods as well, to make them less "os-wrapper"-like and provide added value. ----- 2) I remember a discussion on python-ideas about using "glob" or "rglob". 
However, when searching the docs for "walk" like in "os.walk" or for
"iter", I don't find "glob"/"rglob". I can imagine that we (pathlib
newbies back then) simply didn't discover them.

It would be great if the docs could be improved like the following:

"""
Path.rglob(pattern)

Walk down a given path; a wrapper for "os.scandir"/"os.listdir". This is
like calling glob() with “**” added in front of the given pattern:
"""

I think it would make "glob" and "rglob" more discoverable to new users.

NOTE:
"""
Using the “**” pattern in large directory trees may consume an
inordinate amount of time.
"""
does not sound very encouraging. This is especially true for "rglob" as
it is defined as "like calling glob() with “**”". That leads to
wondering whether "rglob" performs slow globbing instead of an
"os.scandir"/"os.listdir".

https://docs.python.org/3/library/pathlib.html#basic-use even promotes
"glob" with "**" in the beginning, which seems rather discouraging to
use "rglob" as a fast alternative to "os.walk/scandir/listdir".

Renaming "rglob"/adding a "scan" method would definitely help here.

-----

3) Again searching the docs for "create", "delete" (nothing found) and
"remove", I found "Path.touch", "Path.rmdir" and "Path.unlink".

It would be great if we had an easy way to remove a complete subtree as
with "shutil.rmtree". We mostly don't care if a directory is empty. We
need the system to be in a state of "this path does not exist anymore".

Moreover, touching a file is good enough to "create" it if you don't
care about changing its mtime. If you care about its mtime, it's a
problem to "touch".

------

That's it for our issues with pathlib from the past. Additionally, I got
two further observations:

A) pathlib tries to mimic/publish some low-level APIs to its users.
"unlink" is not something people would expect to use when they want to
"delete" or to "remove" a file or a directory. I know where the term
stems from but it's the wrong layer of abstraction IMHO.
Same for "touch" or "chmod".

B) "rename" vs "replace". The difference is not really clear from the
docs. You need to read "Path.replace" in order to understand
"Path.rename" completely (one raises an exception, the other doesn't,
if I read it correctly).

If there's some agreement to change things with respect to those 5
points, I am willing to put some time into it.

Best,
Sven
From p.f.moore at gmail.com Sun Apr 10 10:51:05 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 10 Apr 2016 15:51:05 +0100
Subject: [Python-Dev] pathlib+os/shutil feedback
In-Reply-To: <570A5E36.2070606@mail.de>
References: <570A5E36.2070606@mail.de>
Message-ID:

On 10 April 2016 at 15:07, Sven R. Kunze wrote:
> If there's some agreement to change things with respect to those 5 points, I
> am willing to put some time into it.

In broad terms I agree with these points. Thanks for doing the
research. It would certainly be good to try to improve pathlib based
on this sort of feedback while it is still provisional.

One specific point - you say:

"""
Path.rglob(pattern)

Walk down a given path; a wrapper for "os.scandir"/"os.listdir".
"""

However, at least in 3.5, Path.rglob does *not* wrap scandir. There's
a difference in principle, in that scandir (DirEntry) objects cache
stat data, where pathlib does not. Whether that makes using scandir in
Path.rglob impossible, I don't know.

Ideally I'd like to see pathlib modified to use scandir (because
otherwise there will always be people saying "use os.walk rather than
scandir, as it's faster") - or if it's not possible to do so because
of the difference in principle, then I'd like to see a clear
discussion of the issue in the docs, including the recommended
approach for people who want scandir performance *without* having to
abandon pathlib for lower level functions.

Paul
From rdmurray at bitdance.com Sun Apr 10 11:03:27 2016
From: rdmurray at bitdance.com (R.
David Murray) Date: Sun, 10 Apr 2016 11:03:27 -0400 Subject: [Python-Dev] Pathlib enhancments - method name only In-Reply-To: <5709F7EB.9010603@canterbury.ac.nz> References: <5707F4DB.7000501@stoneleaf.us> <20160409130206.6E4B1B14158@webabinitio.net> <5709F7EB.9010603@canterbury.ac.nz> Message-ID: <20160410150329.A58AEB14158@webabinitio.net> On Sun, 10 Apr 2016 18:51:23 +1200, Greg Ewing wrote: > > On 9 April 2016 at 23:02, R. David Murray wrote: > > > >>That is, a 'filename' is the identifier we've assigned to this thing > >>pointed to by an inode in linux, but an os path is a text representation > >>of the path from the root filename to a specified filename. That is, > >>the path *is* the name, so to say "path name" sounds redundant and > >>confusing to me. > > The term "pathname" is what is conventionally used to refer > to a textual string passed to the OS to identify an object > in the file system. > > It's often abbreviated to just "path", but that's ambiguous > for our purposes, because "path" can also refer to one of > our higher-level objects. I find it interesting that in all my years of unix computing I've never run into this (at least so that I became concious of it). I see now that in fact the Posix spec uses 'pathname'. Objection, such as it was, completely withdrawn :) (Nick's point about Path object vs path is also a good one.) 
--David
From ethan at stoneleaf.us Sun Apr 10 11:26:31 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 10 Apr 2016 08:26:31 -0700
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To:
References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> <5709EB70.7030308@canterbury.ac.nz> <5709FCD2.3070808@canterbury.ac.nz>
Message-ID: <570A70A7.3000609@stoneleaf.us>

On 04/10/2016 12:36 AM, Nick Coghlan wrote:
> On 10 April 2016 at 17:12, Greg Ewing wrote:

>> But there needs to be some way to ask a path object for
>> its native string representation, otherwise there would
>> be no point in using foreign path objects at all.
>
> In addition to the existing "str(pathobj)", a "path" property was
> recently added for that purpose:
>
> >>> import pathlib
> >>> pathlib.PureWindowsPath(".")
> PureWindowsPath('.')
> >>> pathlib.PureWindowsPath(".").path
> '.'
>
> (The specific property name was chosen to match os.scandir's DirEntry.path)

But with the new __fspath__ enhancements wouldn't the .path attribute go
away?

--
~Ethan~
From donald at stufft.io Sun Apr 10 11:50:24 2016
From: donald at stufft.io (Donald Stufft)
Date: Sun, 10 Apr 2016 11:50:24 -0400
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To:
References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> <5709EB70.7030308@canterbury.ac.nz>
Message-ID:

> On Apr 10, 2016, at 2:43 AM, Nick Coghlan wrote:
>
> This does raise a concrete API design question: how should
> PurePath.__fspath__ behave when called on a mismatched OS?

I think that PurePath.__fspath__ should return a string. There's no
reason why we can't in my opinion and doing so just limits the
usefulness of the method. For instance, it'd prevent it from being
possible to serialize a pure windows path and send it over the wire to a
process running on a Windows machine, like say if you have a build
master running on Linux and a build slave running on Windows.
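[Editorial illustration, not part of the original message.] The serialization use case described above can be sketched as a round trip through JSON. This is a rough sketch under stated assumptions: the helper names encode_path and decode_path, the payload layout, and the example path are all hypothetical and are not taken from pathlib or from any proposal in this thread.

```python
import json
import pathlib

# Hypothetical mapping from a flavour tag to the pure path class to rebuild.
_FLAVOURS = {
    "PureWindowsPath": pathlib.PureWindowsPath,
    "PurePosixPath": pathlib.PurePosixPath,
}


def encode_path(path):
    """Serialize a pure path as JSON text, tagging it with its flavour."""
    return json.dumps({"flavour": type(path).__name__, "text": str(path)})


def decode_path(payload):
    """Rebuild a pure path of the same flavour from encode_path() output."""
    data = json.loads(payload)
    return _FLAVOURS[data["flavour"]](data["text"])


# Built on any OS (e.g. a Linux build master), meant for a Windows machine.
artifact = pathlib.PureWindowsPath(r"C:\build\out") / "pkg.zip"
payload = encode_path(artifact)
restored = decode_path(payload)

print(payload)
print(restored)
```

The sketch relies only on str() of a pure path, so it works today whichever way the __fspath__ question is settled.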
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
From ethan at stoneleaf.us Sun Apr 10 12:16:39 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 10 Apr 2016 09:16:39 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To:
References: <5709309D.8030007@stoneleaf.us>
Message-ID: <570A7C67.3010304@stoneleaf.us>

On 04/09/2016 10:31 PM, Nick Coghlan wrote:
> On 10 April 2016 at 02:41, Ethan Furman wrote:

> When somebody hands you bytes rather than text you need to worry about
> the encoding, and you need to worry about returning bytes rather than
> text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1
> provides an illustration of how fiddly that can get, and that's in the
> URL context - cross-platform filesystem path handling is worse, since
> you need to worry about the significant differences between the way
> Windows and *nix handle binary paths, and you can't use os.sep
> directly any more (since that's always text).

Okay, that makes sense.

> DirEntry can still get the check, it can just throw TypeError when it
> represents a binary path (that's one of the advantages of using a
> method-based protocol - exceptions on method calls are more acceptable
> than exceptions on property access).

I guess I don't see the point of this. Either DirEntry's [1] only get
partial support (which is only marginally better than the no support
pathlib currently has), or stdlib code will need to catch those errors
and then do an isinstance check to see if it knows what the type is and
how to deal with it [2].
On the other hand, if __fspath__ is allowed to hold bytes then the algorithm gets easier: - get the serialized form - check for bytes or str and act accordingly As a practicality argument that seems a lot easier for everybody. -- ~Ethan~ [1] Being a low-level function I think working with either bytes or str is entirely appropriate for DirEntry. [2] DirEntry? Oh yeah, grab the .path attribute. Something else? Bah, let the exception propogate. From stephen at xemacs.org Sun Apr 10 12:29:00 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 11 Apr 2016 01:29:00 +0900 Subject: [Python-Dev] Defining a path protocol In-Reply-To: <57083FCB.1000808@stoneleaf.us> References: <570526CE.5080401@stoneleaf.us> <5707DDF3.5030106@stoneleaf.us> <57083FCB.1000808@stoneleaf.us> Message-ID: <22282.32588.172670.633359@turnbull.sk.tsukuba.ac.jp> Ethan Furman writes: > It means the stuff in place won't change, but the stuff we're > adding now to integrate with Path will only support str (which is > one reason why os.path isn't going to die). I don't think this is a reason for keeping os.path. (Backward compatibility with existing code is sufficient, of course.) Support of str for all file names is provided by PEP 383. ISTM there's no big loss to using PEP 383's 'surrogateescape' handler to allow un-decode- able filenames in pathlib.Path: they're very rare. AFAIK pathlib doesn't care about surrogates -- after all, they're entirely "consenting adults" stuff. Of course that detracts a bit from the attractiveness of pathlib.Path vs. os.path or bytes methods, but only for a use case most people won't encounter in practice. We continue to support bytes at the os/io/open level for the same reasons you added formatting back to bytes: there are times when it's as least as natural to work with bytes as str (eg, when the path is passed around without manipulation) and more convenient (eg, you don't have to deal with encodings and UnicodeError handling). 
> After all, the idea is to make these things work with the stdlib, and > the stdlib accepts bytes for path strings. I don't see a problem. In dealing with legacy data (archives that include paths, such as .zips and .isos) we may find un-decode-able paths, or paths that are decode-able but by undetermined encoding, for a while to come (decades). For those, the bytes interfaces are preferable to unlovely expedients like decoding as 'iso8859-1'. But those are specialized use cases. Sane people dealing with current file systems won't need bytes in pathlib, and most "out of bounds" uses for pathlib I can think of in my own experience will be able to use surrogateescape. From jon+python-dev at unequivocal.co.uk Sun Apr 10 12:43:08 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Sun, 10 Apr 2016 17:43:08 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: <20160410164308.GE17895@unequivocal.co.uk> On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote: > Please don't loose time trying yet another sandbox inside CPython. It's > just a waste of time. It's broken by design. > > Please read my email about my attempt (pysandbox): > https://lwn.net/Articles/574323/ > > And the LWN article: > https://lwn.net/Articles/574215/ > > There are a lot of safe ways to run CPython inside a sandbox (and not rhe > opposite). > > I started as you, add more and more things to a blacklist, but it doesn't > work. That's the opposite of my approach though - I'm starting small and adding things, not starting with everything and removing stuff. Even if what we end up with is an extremely restricted subset of Python, there are still cases where that could be a useful tool to have. I've read your links above, and indeed everything I can find written by anyone about historical attempts to sandbox Python. 
I'm aware that others have tried and failed at this in the past, so it's certainly true that there is room for suspicion that this simply cannot be done. However on the other hand, nobody has tried before to do what I am doing (static code analysis), so it's not necessarily a safe assumption that the idea is doomed. For example, as far as I can see, none of the methods used to break out of your pysandbox would work to break out of my experiment. From jon+python-dev at unequivocal.co.uk Sun Apr 10 12:51:13 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Sun, 10 Apr 2016 17:51:13 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: <20160410165113.GF17895@unequivocal.co.uk> On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote: > On 9 April 2016 at 22:43, Victor Stinner wrote: > > See pysandbox test suite for a lot of ways to escape a sandbox. CPython has > > a list of know code to crash CPython (I don't recall the dieectory in > > sources), even with the latest version of CPython. > > They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers Thanks. I take your point that sandboxing Python requires CPython to free of code execution bugs. However I will note that none of the crashers in that directory will work inside my experiment (except "infinite_loop_re.py", which isn't a crasher just a long loop). > Even without those considerations though, there are system level > denial of service attacks that untrusted code can perform without even > trying to break out of the sandbox - the most naive is "while 1: > pass", but there are more interesting ones like "from itertools import > count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int, > 1))". Yes, of course. I have already explicitly noted that infinite loops and memory exhausation are not preventable. 
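[Editorial illustration, not part of the original message.] The denial-of-service examples quoted above ("while 1: pass" and friends) are usually bounded from outside the interpreter rather than from within it. The following is only a minimal sketch of that idea, assuming Python 3.5+ for subprocess.run; the helper name run_with_timeout is hypothetical. It limits running time only. It is not a sandbox and does nothing about memory use or file access.

```python
import subprocess
import sys


def run_with_timeout(source, seconds=1.0):
    """Run `source` in a separate interpreter process.

    Returns the child's standard output as text, or None if the child
    had to be killed because it exceeded the time limit.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=seconds,  # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout.decode()


# A trivial calculation finishes well within the limit ...
print(run_with_timeout("print(6 * 7)", seconds=10))
# ... while an infinite loop is cut off instead of hanging the caller.
print(run_with_timeout("while 1: pass", seconds=0.5))
```

Memory exhaustion would need a separate operating system level limit, which this sketch deliberately leaves out.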
> Operating system level security sandboxes still aren't particularly > easy to use correctly, but they're a lot more reliable than language > runtime level sandboxes, can be used to defend against many more > attack vectors, and even offer increased flexibility (e.g. "can write > to these directories, but no others", "can read these files, but no > others", "can contact these IP addresses, but no others"). I don't entirely trust operating system sandboxes either - I generally assume that if someone can execute arbitrary code on my machine, then they can do anything they want to that machine. What I *might* trust, though, would be a "sandbox Python" that is itself running inside an operating system sandbox... From guido at python.org Sun Apr 10 14:43:08 2016 From: guido at python.org (Guido van Rossum) Date: Sun, 10 Apr 2016 11:43:08 -0700 Subject: [Python-Dev] PEP 506 secrets module In-Reply-To: <20160410050845.GA12526@ando.pearwood.info> References: <20151016005711.GC11980@ando.pearwood.info> <20160410050845.GA12526@ando.pearwood.info> Message-ID: Hi Steven, No probIem with the delay -- it's still before 3.6.0. I do think it's just about a record gap in the PEP review process. :-) I will approve the PEP as soon as you've updated the two function names in the PEP. (If you don't have write access to the peps repo, send the new version to peps at python.org -- or send a link to the new draft somewhere online, e.g. github if you're using that. If you do have peps repo write access, just reply here when it's done.) Regarding the alluded vagueness of the PEP on the specs, I think I was mostly about the phrase "At the time of writing, the following functions have been suggested" which doesn't seem to commit very strongly to a specific API. The later phrase "The following pseudo-code can be taken as a possible starting point for the real implementation" doesn't really do much to take away the feeling that the PEP is non-committal on the actual API it proposes. 
But I don't want to approve the *idea* of a secrets module -- I want to approve a specific API. Maybe you can just change the words a bit to say something like "this PEP proposes the following API; the implementations given here are not final". None of this will prevent adding more functions to secrets.py before 3.6.0 is released (or, of course, in 3.7, 3.8 etc.), but it should send a clear message that we've agreed on these specific names and signatures, and that those are what I'm approving. If we change our minds about the API of the module before releasing 3.6.0, we should treat it as an amendment to the PEP and take it pretty seriously (but it's happened before so it's not impossible). Hopefully this message isn't drowned in the infinity of pathlib and ~bool threads, and we can proceed to add secrets.py to the 3.6 stdlib. You should be proud of that accomplishment! --Guido On Sat, Apr 9, 2016 at 10:08 PM, Steven D'Aprano wrote: > I've just spotted this email from Guido, sorry about the delay in > responding. > > Further comments below. > > > On Thu, Jan 14, 2016 at 10:47:09AM -0800, Guido van Rossum wrote: > >> I think the discussion petered out and nobody asked me to approve it yet >> (or I lost track of it). I'm almost happy to approve it in the current >> state. My only quibble is with some naming -- I'm not sure that a >> super-generic name like 'equal' is better than the original >> ('compare_digest'), > > Changed. > > >> and I would have picked a different name for token_url >> -- probably token_urlsafe. But maybe Steven can convince me that the names >> currently in the PEP are better. > > Changed. > > >> (I also don't like the wishy-washy >> position of the PEP on the actual specs of the proposed functions. But I'm >> fine with the actual implementation shown as the spec.) > > I'm not really sure what you want me to do to improve that. Can you be > more concrete about what you would like the PEP to say? 
> > > I haven't updated the PEP yet, but the newest version of the secrets > module with the changes requested is here: > > https://bitbucket.org/sdaprano/secrets > > > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From wes.turner at gmail.com Sun Apr 10 17:05:59 2016 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 10 Apr 2016 16:05:59 -0500 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk> Message-ID: On Apr 10, 2016 11:51 AM, "Jon Ribbens" wrote: > > On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote: > > On 9 April 2016 at 22:43, Victor Stinner wrote: > > > See pysandbox test suite for a lot of ways to escape a sandbox. CPython has > > > a list of know code to crash CPython (I don't recall the dieectory in > > > sources), even with the latest version of CPython. > > > > They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers > > Thanks. I take your point that sandboxing Python requires CPython to > free of code execution bugs. However I will note that none of the > crashers in that directory will work inside my experiment (except > "infinite_loop_re.py", which isn't a crasher just a long loop). > > > Even without those considerations though, there are system level > > denial of service attacks that untrusted code can perform without even > > trying to break out of the sandbox - the most naive is "while 1: > > pass", but there are more interesting ones like "from itertools import > > count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int, > > 1))". > > Yes, of course. 
I have already explicitly noted that infinite loops > and memory exhausation are not preventable. > > > Operating system level security sandboxes still aren't particularly > > easy to use correctly, but they're a lot more reliable than language > > runtime level sandboxes, can be used to defend against many more > > attack vectors, and even offer increased flexibility (e.g. "can write > > to these directories, but no others", "can read these files, but no > > others", "can contact these IP addresses, but no others"). > > I don't entirely trust operating system sandboxes either - I generally > assume that if someone can execute arbitrary code on my machine, then > they can do anything they want to that machine. > > What I *might* trust, though, would be a "sandbox Python" that is > itself running inside an operating system sandbox... > * https://github.com/jupyter/jupyterhub/wiki/Spawners - Docker LXC Containers - https://github.com/jupyter/jupyterhub/wiki/Authenticators - DOS is still trivial - Segfault is still trivial * http://doc.pypy.org/en/latest/sandbox.html#introduction _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sun Apr 10 17:07:48 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 11 Apr 2016 00:07:48 +0300 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160410165113.GF17895@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk> Message-ID: On 10.04.16 19:51, Jon Ribbens wrote: > On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote: >> On 9 April 2016 at 22:43, Victor Stinner wrote: >>> See pysandbox test suite for a lot of ways to escape a sandbox. CPython has >>> a list of know code to crash CPython (I don't recall the dieectory in >>> sources), even with the latest version of CPython. >> >> They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers > > Thanks. I take your point that sandboxing Python requires CPython to > free of code execution bugs. However I will note that none of the > crashers in that directory will work inside my experiment (except > "infinite_loop_re.py", which isn't a crasher just a long loop). Try following example: it = iter([1]) for i in range(1000000): it = filter(None, it) next(it) From Nikolaus at rath.org Sun Apr 10 17:08:16 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Sun, 10 Apr 2016 14:08:16 -0700 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160410164308.GE17895@unequivocal.co.uk> (Jon Ribbens's message of "Sun, 10 Apr 2016 17:43:08 +0100") References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> Message-ID: <87k2k5p3y7.fsf@vostro.rath.org> On Apr 10 2016, Jon Ribbens wrote: > On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote: >> Please don't loose time trying yet another sandbox inside CPython. It's >> just a waste of time. It's broken by design. >> >> Please read my email about my attempt (pysandbox): >> https://lwn.net/Articles/574323/ >> >> And the LWN article: >> https://lwn.net/Articles/574215/ >> >> There are a lot of safe ways to run CPython inside a sandbox (and not rhe >> opposite). 
>>
>> I started as you, add more and more things to a blacklist, but it doesn't
>> work.
>
> That's the opposite of my approach though - I'm starting small and
> adding things, not starting with everything and removing stuff.

That contradicts what you said in another mail:

On Apr 08 2016, Jon Ribbens wrote:
> Ah, I've not used Python 3.5, and I can't find any documentation on
> this cr_frame business, but I've added cr_frame and f_back to the
> disallowed attributes list.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
From jon+python-dev at unequivocal.co.uk Sun Apr 10 17:53:41 2016
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Sun, 10 Apr 2016 22:53:41 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
In-Reply-To:
References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk>
Message-ID: <20160410215341.GI17895@unequivocal.co.uk>

On Mon, Apr 11, 2016 at 12:07:48AM +0300, Serhiy Storchaka wrote:
> On 10.04.16 19:51, Jon Ribbens wrote:
> >On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
> >>On 9 April 2016 at 22:43, Victor Stinner wrote:
> >>>See pysandbox test suite for a lot of ways to escape a sandbox. CPython has
> >>>a list of know code to crash CPython (I don't recall the dieectory in
> >>>sources), even with the latest version of CPython.
> >>
> >>They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
> >
> >Thanks. I take your point that sandboxing Python requires CPython to
> >free of code execution bugs. However I will note that none of the
> >crashers in that directory will work inside my experiment (except
> >"infinite_loop_re.py", which isn't a crasher just a long loop).
> > Try following example: > > it = iter([1]) > for i in range(1000000): > it = filter(None, it) > next(it) That does indeed segfault. I guess you should report that as a bug! From jon+python-dev at unequivocal.co.uk Sun Apr 10 18:31:57 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Sun, 10 Apr 2016 23:31:57 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <87k2k5p3y7.fsf@vostro.rath.org> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <87k2k5p3y7.fsf@vostro.rath.org> Message-ID: <20160410223157.GJ17895@unequivocal.co.uk> On Sun, Apr 10, 2016 at 02:08:16PM -0700, Nikolaus Rath wrote: > On Apr 10 2016, Jon Ribbens wrote: > > On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote: > > That's the opposite of my approach though - I'm starting small and > > adding things, not starting with everything and removing stuff. > > That contradicts what you said in another mail: > > On Apr 08 2016, Jon Ribbens wrote: > > Ah, I've not used Python 3.5, and I can't find any documentation on > > this cr_frame business, but I've added cr_frame and f_back to the > > disallowed attributes list. No, you've just misunderstood my meaning. Obviously I'm not only allowing access to whitelisted variable and property names, that would be ridiculous ("your code may only use variables called 'foo', 'bar' and 'baz'..."). The point is that we can start with, say, only allowing expressions and not statements, and a __builtins__ that contains literally nothing. We can even limit ourselves to disallow, say, lambda and yield and generator expressions if we like. Can this minimal language be made "safe"? If so, we have already won something - the ability to use "eval" as a powerful calculator function. Then, can we allow statements? Can we allow user-defined classes? Can we allow try/catch? etc. 
With regard to names by the way, I suspect that disallowing just
anything starting "_" and the names of the properties of frame objects
would be good enough. Unless someone knows a way to get to an object's
__dict__ or its type without using vars() or type() or underscore
attributes...

From oscar.j.benjamin at gmail.com Sun Apr 10 19:02:28 2016
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 11 Apr 2016 00:02:28 +0100
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
In-Reply-To: <20160410215341.GI17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk> <20160410215341.GI17895@unequivocal.co.uk>
Message-ID:

On 10 Apr 2016 22:55, "Jon Ribbens" wrote:
>
> On Mon, Apr 11, 2016 at 12:07:48AM +0300, Serhiy Storchaka wrote:
> > On 10.04.16 19:51, Jon Ribbens wrote:
> > >On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:
> > >>On 9 April 2016 at 22:43, Victor Stinner wrote:
> > >>>See pysandbox test suite for a lot of ways to escape a sandbox. CPython has
> > >>>a list of known code to crash CPython (I don't recall the directory in
> > >>>sources), even with the latest version of CPython.
> > >>
> > >>They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers
> > >
> > >Thanks. I take your point that sandboxing Python requires CPython to
> > >be free of code execution bugs. However I will note that none of the
> > >crashers in that directory will work inside my experiment (except
> > >"infinite_loop_re.py", which isn't a crasher just a long loop).
> >
> > Try following example:
> >
> >     it = iter([1])
> >     for i in range(1000000):
> >         it = filter(None, it)
> >     next(it)
>
> That does indeed segfault. I guess you should report that as a bug!

There will always be obscure ways to crash the interpreter. That one
can be fixed but if someone really wants to break your sandbox this way
then they will be able to.
Remember that exploits are often based on bugs and any codebase the size
of CPython has bugs.

I haven't looked at your sandbox but for a different approach try this one:

    L = [None]
    L.extend(iter(L))

On my Linux machine that doesn't just crash Python.

--
Oscar

From jcgoble3 at gmail.com Sun Apr 10 20:12:30 2016
From: jcgoble3 at gmail.com (Jonathan Goble)
Date: Sun, 10 Apr 2016 20:12:30 -0400
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
In-Reply-To:
References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk> <20160410215341.GI17895@unequivocal.co.uk>
Message-ID:

On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin wrote:
> I haven't looked at your sandbox but for a different approach try this one:
>
>     L = [None]
>     L.extend(iter(L))
>
> On my Linux machine that doesn't just crash Python.

For the record: don't try this if you have unsaved files open on your
computer, because you will lose them. When I typed these two lines
into the Py3.5 interactive prompt, it completely and totally froze
Windows to the point that nothing would respond and I had to resort to
the old trick of holding the power button down for five seconds to
forcibly shut the computer down. Fortunately, I made extra certain
everything was fully saved before I opened the Python interpreter, so
I'm not TOTALLY dumb. :-P

From tseaver at palladion.com Sun Apr 10 21:49:27 2016
From: tseaver at palladion.com (Tres Seaver)
Date: Sun, 10 Apr 2016 21:49:27 -0400
Subject: [Python-Dev] Challenge: Please break this!
(a.k.a restricted mode revisited)
In-Reply-To: <20160410223157.GJ17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <87k2k5p3y7.fsf@vostro.rath.org> <20160410223157.GJ17895@unequivocal.co.uk>
Message-ID:

On 04/10/2016 06:31 PM, Jon Ribbens wrote:
> Unless someone knows a way to get to an object's __dict__ or its type
> without using vars() or type() or underscore attributes...

Hmm, 'classmethod'-wrapped functions get passed the type.

Tres.
--
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com

From steve at pearwood.info Sun Apr 10 23:09:19 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 11 Apr 2016 13:09:19 +1000
Subject: [Python-Dev] Challenge: Please break this!
(a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk> <20160410215341.GI17895@unequivocal.co.uk> Message-ID: <20160411030919.GC12526@ando.pearwood.info> On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote: > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin > wrote: > > I haven't looked at your sandbox but for a different approach try this one: > > > > L = [None] > > L.extend(iter(L)) > > > > On my Linux machine that doesn't just crash Python. > > For the record: don't try this if you have unsaved files open on your > computer, because you will lose them. When I typed these two lines > into the Py3.5 interactive prompt, it completely and totally froze > Windows to the point that nothing would respond and I had to resort to > the old trick of holding the power button down for five seconds to > forcibly shut the computer down. I think this might improve matters: http://bugs.python.org/issue26351 although I must admit I don't understand why the entire OS is effected. -- Steve From phd at phdru.name Sun Apr 10 23:50:31 2016 From: phd at phdru.name (Oleg Broytman) Date: Mon, 11 Apr 2016 05:50:31 +0200 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160411030919.GC12526@ando.pearwood.info> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk> <20160410215341.GI17895@unequivocal.co.uk> <20160411030919.GC12526@ando.pearwood.info> Message-ID: <20160411035031.GA7952@phdru.name> On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano wrote: > On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote: > > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin > > wrote: > > > I haven't looked at your sandbox but for a different approach try this one: > > > > > > L = [None] > > > L.extend(iter(L)) > > > > > > On my Linux machine that doesn't just crash Python. 
> > > > For the record: don't try this if you have unsaved files open on your > > computer, because you will lose them. When I typed these two lines > > into the Py3.5 interactive prompt, it completely and totally froze > > Windows to the point that nothing would respond and I had to resort to > > the old trick of holding the power button down for five seconds to > > forcibly shut the computer down. > > > I think this might improve matters: > > http://bugs.python.org/issue26351 > > although I must admit I don't understand why the entire OS is effected. Memory exhaustion? > -- > Steve Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From wes.turner at gmail.com Mon Apr 11 01:42:47 2016 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 11 Apr 2016 00:42:47 -0500 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160411035031.GA7952@phdru.name> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk> <20160410215341.GI17895@unequivocal.co.uk> <20160411030919.GC12526@ando.pearwood.info> <20160411035031.GA7952@phdru.name> Message-ID: On Sun, Apr 10, 2016 at 10:50 PM, Oleg Broytman wrote: > On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano < > steve at pearwood.info> wrote: > > On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote: > > > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin > > > wrote: > > > > I haven't looked at your sandbox but for a different approach try > this one: > > > > > > > > L = [None] > > > > L.extend(iter(L)) > > > > > > > > On my Linux machine that doesn't just crash Python. > > > > > > For the record: don't try this if you have unsaved files open on your > > > computer, because you will lose them. 
When I typed these two lines > > > into the Py3.5 interactive prompt, it completely and totally froze > > > Windows to the point that nothing would respond and I had to resort to > > > the old trick of holding the power button down for five seconds to > > > forcibly shut the computer down. > > > > > > I think this might improve matters: > > > > http://bugs.python.org/issue26351 > > > > although I must admit I don't understand why the entire OS is effected. > > Memory exhaustion? > * https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-domainname-hostname-ipc-mac-address-mem-limit-memswap-limit-privileged-read-only-restart-stdin-open-tty-user-working-dir * https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile > > > -- > > Steve > > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Mon Apr 11 02:06:34 2016 From: phd at phdru.name (Oleg Broytman) Date: Mon, 11 Apr 2016 08:06:34 +0200 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: References: <20160410165113.GF17895@unequivocal.co.uk> <20160410215341.GI17895@unequivocal.co.uk> <20160411030919.GC12526@ando.pearwood.info> <20160411035031.GA7952@phdru.name> Message-ID: <20160411060634.GA16992@phdru.name> On Mon, Apr 11, 2016 at 12:42:47AM -0500, Wes Turner wrote: > On Sun, Apr 10, 2016 at 10:50 PM, Oleg Broytman wrote: > > > On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano < > > steve at pearwood.info> wrote: > > > On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote: > > > > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin > > > > wrote: > > > > > I haven't looked at your sandbox but for a different approach try > > this one: > > > > > > > > > > L = [None] > > > > > L.extend(iter(L)) > > > > > > > > > > On my Linux machine that doesn't just crash Python. > > > > > > > > For the record: don't try this if you have unsaved files open on your > > > > computer, because you will lose them. When I typed these two lines > > > > into the Py3.5 interactive prompt, it completely and totally froze > > > > Windows to the point that nothing would respond and I had to resort to > > > > the old trick of holding the power button down for five seconds to > > > > forcibly shut the computer down. > > > > > > > > > I think this might improve matters: > > > > > > http://bugs.python.org/issue26351 > > > > > > although I must admit I don't understand why the entire OS is effected. > > > > Memory exhaustion? > > * > https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-domainname-hostname-ipc-mac-address-mem-limit-memswap-limit-privileged-read-only-restart-stdin-open-tty-user-working-dir > > * https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile I think memory control groups in Linux can be used to limit memory usage. I have mem. c. g. configured and I'll try to find time to experiment with the code above. > > > -- > > > Steve Oleg. 
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From ncoghlan at gmail.com Mon Apr 11 02:20:05 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Apr 2016 16:20:05 +1000
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To:
References: <5707F4DB.7000501@stoneleaf.us> <-8088150910827119255@unknownmsgid> <5709EB70.7030308@canterbury.ac.nz>
Message-ID:

On 11 April 2016 at 01:50, Donald Stufft wrote:
>
>> On Apr 10, 2016, at 2:43 AM, Nick Coghlan wrote:
>>
>> This does raise a concrete API design question: how should
>> PurePath.__fspath__ behave when called on a mismatched OS?
>
> I think that PurePath.__fspath__ should return a string. There's no
> reason why we can't in my opinion and doing so just limits the usefulness
> of the method. For instance, it'd prevent it from being possible to
> serialize a pure windows path and send it over the wire to a process running
> on a Windows machine, like say if you have a build master running on Linux
> and a build slave running on Windows.

Yeah, given that you have to go out of your way to create a path
object for an alternate platform, this makes sense - the "I know what
I'm doing" indicator is calling pathlib.Pure[Windows|Posix]Path
instead of pathlib.PurePath in the first place, and so __fspath__ can
just do its thing as a pure text-based operation, without worrying
about the current platform.

Cheers,
Nick.
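
[Editorial illustration] A small sketch of the behaviour agreed on above,
using os.fspath() as it later shipped in Python 3.6 (PEP 519), which
postdates this message. The example paths and variable names are made up.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths for either flavour can be created on any platform.
win = PureWindowsPath("C:/build/output.txt")
posix = PurePosixPath("/srv/build/output.txt")

# __fspath__ is a pure text operation: it returns the str form of the path
# in that flavour's own syntax, whatever the current platform is.
win_name = os.fspath(win)      # 'C:\\build\\output.txt'
posix_name = os.fspath(posix)  # '/srv/build/output.txt'
```

The resulting strings can be serialized and sent to a process on the
matching platform, which is the build master and build slave case Donald
describes.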
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Apr 11 02:27:07 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Apr 2016 16:27:07 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570A7C67.3010304@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> Message-ID: On 11 April 2016 at 02:16, Ethan Furman wrote: > On 04/09/2016 10:31 PM, Nick Coghlan wrote: >> >> On 10 April 2016 at 02:41, Ethan Furman wrote: > > >> When somebody hands you bytes rather than text you need to worry about >> the encoding, and you need to worry about returning bytes rather than >> text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1 >> provides an illustration of how fiddly that can get, and that's in the >> URL context - cross-platform filesystem path handling is worse, since >> you need to worry about the significant differences between the way >> Windows and *nix handle binary paths, and you can't use os.sep >> directly any more (since that's always text). > > > Okay, that makes sense. > >> DirEntry can still get the check, it can just throw TypeError when it >> represents a binary path (that's one of the advantages of using a >> method-based protocol - exceptions on method calls are more acceptable >> than exceptions on property access). > > > I guess I don't see the point of this. Either DirEntry's [1] only get > partial support (which is only marginally better than the no support pathlib > currently has), or stdlib code will need to catch those errors and then do > an isinstance check to see if knows what the type is and how to deal with it > [1]. What's wrong with only gaining partial support? 
Standard library code that doesn't currently support DirEntry at all will gain the ability to support str-based DirEntry objects, while bytes-based DirEntry objects will continue to be a low level object that isn't interoperable with most other APIs (which is fine - anyone writing low level POSIX-specific code can deal with unpacking the values explicitly, it just won't happen implicitly anywhere). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From raymond.hettinger at gmail.com Mon Apr 11 02:36:29 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 10 Apr 2016 23:36:29 -0700 Subject: [Python-Dev] PEP 506 secrets module In-Reply-To: References: <20151016005711.GC11980@ando.pearwood.info> <20160410050845.GA12526@ando.pearwood.info> Message-ID: > On Apr 10, 2016, at 11:43 AM, Guido van Rossum wrote: > > I will approve the PEP as soon as you've updated the two function > names in the PEP. Congratulations Steven. Raymond From robertc at robertcollins.net Mon Apr 11 03:08:54 2016 From: robertc at robertcollins.net (Robert Collins) Date: Mon, 11 Apr 2016 19:08:54 +1200 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <87k2k5p3y7.fsf@vostro.rath.org> <20160410223157.GJ17895@unequivocal.co.uk> Message-ID: On 11 April 2016 at 13:49, Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 04/10/2016 06:31 PM, Jon Ribbens wrote: >> Unless someone knows a way to get to an object's __dict__ or its type >> without using vars() or type() or underscore attributes... > > Hmm, 'classmethod'-wrapped functions get passed the type. yeah, but to access that you need to assign the descriptor to the type - circular loop. 
If you can arrange that assignment it's easy:

    thetype = []

    class gettype:
        def __get__(self, obj, type=None):
            thetype.append((obj, type))
            return None

    classIwant.query = gettype()
    classIwant().query
    thetype[0][1]

... but you've already gotten to classIwant there.

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud

From storchaka at gmail.com Mon Apr 11 03:26:57 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 11 Apr 2016 10:26:57 +0300
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
In-Reply-To: <20160410215341.GI17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410165113.GF17895@unequivocal.co.uk> <20160410215341.GI17895@unequivocal.co.uk>
Message-ID:

On 11.04.16 00:53, Jon Ribbens wrote:
>> Try following example:
>>
>>     it = iter([1])
>>     for i in range(1000000):
>>         it = filter(None, it)
>>     next(it)
>
> That does indeed segfault. I guess you should report that as a bug!

There is an old issue that doesn't have an adequate solution. And this
is only one example, you can get a segfault with other recursive
iterators.

From phd at phdru.name Mon Apr 11 05:17:58 2016
From: phd at phdru.name (Oleg Broytman)
Date: Mon, 11 Apr 2016 11:17:58 +0200
Subject: [Python-Dev] Challenge: Please break this!
(a.k.a restricted mode revisited) In-Reply-To: <20160411060634.GA16992@phdru.name> References: <20160410165113.GF17895@unequivocal.co.uk> <20160410215341.GI17895@unequivocal.co.uk> <20160411030919.GC12526@ando.pearwood.info> <20160411035031.GA7952@phdru.name> <20160411060634.GA16992@phdru.name> Message-ID: <20160411091758.GA20672@phdru.name> On Mon, Apr 11, 2016 at 08:06:34AM +0200, Oleg Broytman wrote: > On Mon, Apr 11, 2016 at 12:42:47AM -0500, Wes Turner wrote: > > On Sun, Apr 10, 2016 at 10:50 PM, Oleg Broytman wrote: > > > > > On Mon, Apr 11, 2016 at 01:09:19PM +1000, Steven D'Aprano < > > > steve at pearwood.info> wrote: > > > > On Sun, Apr 10, 2016 at 08:12:30PM -0400, Jonathan Goble wrote: > > > > > On Sun, Apr 10, 2016 at 7:02 PM, Oscar Benjamin > > > > > wrote: > > > > > > I haven't looked at your sandbox but for a different approach try > > > this one: > > > > > > > > > > > > L = [None] > > > > > > L.extend(iter(L)) > > > > > > > > > > > > On my Linux machine that doesn't just crash Python. > > > > > > > > > > For the record: don't try this if you have unsaved files open on your > > > > > computer, because you will lose them. When I typed these two lines > > > > > into the Py3.5 interactive prompt, it completely and totally froze > > > > > Windows to the point that nothing would respond and I had to resort to > > > > > the old trick of holding the power button down for five seconds to > > > > > forcibly shut the computer down. > > > > > > > > > > > > I think this might improve matters: > > > > > > > > http://bugs.python.org/issue26351 > > > > > > > > although I must admit I don't understand why the entire OS is effected. > > > > > > Memory exhaustion? 
> > *
> > https://docs.docker.com/compose/compose-file/#cpu-shares-cpu-quota-cpuset-domainname-hostname-ipc-mac-address-mem-limit-memswap-limit-privileged-read-only-restart-stdin-open-tty-user-working-dir
> >
> > * https://github.com/jupyter/dockerspawner/blob/master/systemuser/Dockerfile
>
> I think memory control groups in Linux can be used to limit memory
> usage. I have mem. c. g. configured and I'll try to find time to
> experiment with the code above.

With limited memory it was fast:

$ ulimit -d 50000 -m 80000 -s 10000 -v 100000
$ python
Python 2.7.9 (default, Mar 1 2015, 18:22:53)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> L = [None]
>>> L.extend(iter(L))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Memory control groups don't help because they don't limit virtual memory
so the process simply starts thrashing.

> > > > --
> > > > Steve

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From victor.stinner at gmail.com Mon Apr 11 05:40:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 11 Apr 2016 11:40:05 +0200
Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
In-Reply-To: <20160410164308.GE17895@unequivocal.co.uk>
References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk>
Message-ID:

2016-04-10 18:43 GMT+02:00 Jon Ribbens:
> On Sat, Apr 09, 2016 at 02:43:19PM +0200, Victor Stinner wrote:
>> Please don't lose time trying yet another sandbox inside CPython. It's
>> just a waste of time. It's broken by design.
>>
>> Please read my email about my attempt (pysandbox):
>> https://lwn.net/Articles/574323/
>>
>> And the LWN article:
>> https://lwn.net/Articles/574215/
>>
>> There are a lot of safe ways to run CPython inside a sandbox (and not the
>> opposite).
>>
>> I started as you, add more and more things to a blacklist, but it doesn't
>> work.
>
> That's the opposite of my approach though - I'm starting small and
> adding things, not starting with everything and removing stuff. Even
> if what we end up with is an extremely restricted subset of Python,
> there are still cases where that could be a useful tool to have.

Your design relies on the assumption that CPython is only pure Python.
That's wrong. A *lot* of Python features are implemented in C and
"ignore" your sandboxing code. Quick reminder: 50% of CPython is
written in the C language.

It means that your protections like hiding builtin functions from the
Python scope don't work. If an attacker gets access to a C function
giving access to the hidden builtin, the game is over.

pysandbox is based on the idea of tav (his project safelite.py):
remove features in the dictionary of builtin C types like FrameType,
CodeObject, etc. See sandbox/attributes.py. It's not enough to be 100%
safe, a C function can still access fields of the C structure
directly, but it was enough to protect "most" C functions.

It's hard to list all features of the C code which are indirectly
accessible from the Python scope. Some examples: warnings and
tracebacks. These features killed the pysandbox project because they
directly open files on the filesystem, it's not possible to control
these features from the Python scope.

Another example which exposes a vulnerability of your sandbox:
str.format() reads object attributes directly, without the getattr()
builtin function, so it's possible to escape your sandbox. Example:
"{0.__class__}".format(obj) shows the type of an object.

Think also about the new f-string which allows arbitrary Python code:
f"{code}".

> However on the other hand, nobody has tried before to do what I am
> doing (static code analysis),

You're wrong.

Zope Security ("RestrictedPython") has a similar design. Analyzing AST
is a common design to build a sandbox. But it's not safe.
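
[Editorial illustration] A small sketch of the str.format() point above:
a check that only inspects ast.Attribute nodes never sees the name
__class__, because it sits inside a string literal. The helper
attribute_names is made up for this example and is not taken from either
sandbox discussed here.

```python
import ast


def attribute_names(source):
    """Return every attribute name that appears as an ast.Attribute node."""
    tree = ast.parse(source, mode="eval")
    return {node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)}


source = '"{0.__class__}".format(obj)'

# The only attribute a static check can see is the method name "format".
names = attribute_names(source)

# Yet evaluating the expression still reads obj.__class__ at run time,
# because str.format() resolves the field name itself.
result = eval(source, {"__builtins__": {}}, {"obj": 42})
```

Here names is {"format"} and result is the string "<class 'int'>". What
comes back is a string describing the attribute, not the attribute object
itself.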
The "See also" section of my pysandbox has a long list of Python
sandboxes with various designs.

> so it's not necessarily a safe
> assumption that the idea is doomed. For example, as far as I can see,
> none of the methods used to break out of your pysandbox would work to
> break out of my experiment.

What I see is that you asked to break your sandbox, and less than 1
hour later, a first vulnerability was found (exec called with two
parameters). A few hours later, a second vulnerability was found
(async generator and cr_frame).

By the way, are you sure that you fixed the vulnerability? You
blacklisted "cb_frame", not cr_frame ;-)

You should look closer, pysandbox is very close to your project. It
also uses whitelists for some protections (ex: builtins) and blacklist
for other protections (ex: hide sensitive attributes).

You are using a blacklist for attributes. By the way, you hide
cr_frame but not cr_code. I'm quite sure that it's possible to execute
arbitrary bytecode in your sandbox, I just don't have enough time to
dig into the code.

Your sandbox is not fully based on whitelists.

Victor

From k7hoven at gmail.com Mon Apr 11 07:46:12 2016
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Mon, 11 Apr 2016 14:46:12 +0300
Subject: [Python-Dev] Pathlib enhancments - method name only
In-Reply-To:
References: <5707F4DB.7000501@stoneleaf.us>
Message-ID:

On Sat, Apr 9, 2016 at 10:48 AM, Nick Coghlan wrote:
> On 9 April 2016 at 04:25, Brett Cannon wrote:
>> On Fri, 8 Apr 2016 at 11:13 Ethan Furman wrote:
>>> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
>>> > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker wrote:
>>> >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:
>>> >>>
>>> >>> I'm still thinking a little bit about 'pathname', which to me sounds
>>> >>> more like a string than fspath does.
>>> >> >>> >> >>> >> I like that a lot - or even "__pathstr__" or "__pathstring__" >>> >> after all, we're making a big deal out of the fact that a path is >>> >> *not a string*, but rather a string is a *representation* (or >>> >> serialization) of a path. >>> >>> That's a decent point. >>> >>> So the plausible choices are, I think: >>> >>> - __fspath__ # File System Path -- possible confusion with Path >> >> +1 > > I like __fspath__, but I'm also sympathetic to Koos' point that we're > really dealing with path *names* being produced via this protocol, > rather than the paths themselves. > > That would bring the completely explicit "__fspathname__" into the > mix, which would be comparable in length to "__getattribute__" as a > magic method name (both in terms of number of syllable and number of > characters). > > Considering the helper function usage, here's some examples in > combination with os.fsencode and os.fsdecode: > > # Status quo for binary/text path conversions > text_path = os.fsdecode(bytes_path) > bytes_path = os.fsencode(text_path) > > # Getting a text path from an arbitrary object > text_path = os.fspath(obj) # This doesn't scream "returns text!" to me > text_path = os.fspathname(obj) # This does > > # Getting a binary path from an arbitrary object > bytes_path = os.fsencode(os.fspath(obj)) > bytes_path = os.fsencode(os.fspathname(obj)) > > I'm starting to think the semantic nudge from the "name" suffix when > reading the code is worth the extra four characters when writing it > (keeping in mind that the whole point of this exercise is that most > folks *won't* be writing explicit conversions - the stdlib will handle > it on their behalf). > Regarding the name, I completely agree with Nick's reasoning (above). I'm not sure it's a high priority to make dunder-method names short. 
They are not typed very often, and when the number of these "protocols" increases, you face potentially ambiguous names more and more often (there already is a '__path__' and a '__file__' etc., as has been brought up earlier in these threads.). In other words, it's a good idea to have some information in the name. > I also think the more explicit name helps answer some of the type > signature questions that have arisen: > > 1. Does os.fspathname return rich Path objects? No, it returns names > as str objects Or byte strings, it seems, unfortunately. > 2. Will file descriptors pass through os.fspathname? No, as they're > not names, they're numeric descriptors. > 3. Will bytes-like objects pass through os.fspathname? No, as they're > not names, they're encodings of names > If fspathname(...) is to be used in os.path.*, it will break things if it starts to turn encoded bytes pathnames into str pathnames, which it did not previously do. And if fspathname is not to be used in os.path.*, who would be our intended user of fspathname? I assume we we don't want to encourage typical 'users' to manipulate pathnames by hand. >> I personally still like __ospath__ as well. > > That one fails the "Is it ambiguous when spoken aloud?" test for me: > if someone mentions "oh-ess-path", are they talking about os.path or > __ospath__? With "eff-ess-path" or "eff-ess-path-name", that problem > doesn't arise. > +1 to this too. -Koos From mail at timgolden.me.uk Mon Apr 11 10:41:35 2016 From: mail at timgolden.me.uk (Tim Golden) Date: Mon, 11 Apr 2016 15:41:35 +0100 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #25910: Fixed more links in the docs. In-Reply-To: <20160411143851.18859.27207.908B2F75@psf.io> References: <20160411143851.18859.27207.908B2F75@psf.io> Message-ID: <570BB79F.2000708@timgolden.me.uk> On 11/04/2016 15:38, serhiy.storchaka wrote: > - `__. > + `__. Is there any intended irony in our link to openssl not being via https? 
:) TJG From jon+python-dev at unequivocal.co.uk Mon Apr 11 10:46:44 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Mon, 11 Apr 2016 15:46:44 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> Message-ID: <20160411144644.GA8206@unequivocal.co.uk> On Mon, Apr 11, 2016 at 11:40:05AM +0200, Victor Stinner wrote: > 2016-04-10 18:43 GMT+02:00 Jon Ribbens : > > That's the opposite of my approach though - I'm starting small and > > adding things, not starting with everything and removing stuff. Even > > if what we end up with is an extremely restricted subset of Python, > > there are still cases where that could be a useful tool to have. > > You design rely on the assumption that CPython is only pure Python. No it doesn't. Obviously I know CPython is written in C - the clue is in the name. I'm not sure what you mean here. > It means that your protections like hiding builtin functions from the > Python scope don't work. If an attacker gets access to a C function > giving access to the hidden builtin, the game is over. The former is only true if you assume the latter is possible. Is there any reason to believe it is? > It's hard to list all features of the C code which are indirectly > accessible from the Python scope. Some examples: warnings and > tracebacks. These features killed the pysandbox project because they > open directly files on the filesystem, it's not possible to control > these features from the Python scope. I think what you're referring to is when they show context for errors, for which they try and find the source code lines to display by identifying the filename, and you can subvert that process by changing __file__ and/or __name__. If so, you can't do that within my experiment because you're not allowed to access either of those names. 
> Another example which exposes a vulnerability of your sandbox: > str.format() gets directly object attributes without the getattr() > builtin function, so it's possible to escape your sandbox. Example: > "{0.__class__}".format(obj) shows the type of an object. Yes, I'd thought of that. However getting access to a string which contains the name or a representation of an object is not at all the same thing as getting access to the object itself. > Think also about the new f-string which allows arbitrary Python > code: f"{code}". Obviously I can't speak to features of future versions of Python. I'd have to see the ast generated by an f-string to know if they pose a problem or not, but I suspect they would compile to expression nodes and hence be caught by the existing checks. > > However on the other hand, nobody has tried before to do what I am > > doing (static code analysis), > > You're wrong. > > Zope Security ("RestrictedPython") has a similar design. Analyzing AST > is a common design to build a sanbox. But it's not safe. Ah, I hadn't seen that one. Yes, they are doing something similar (but also much more complex!) I don't know why you say this is a "common design" though, that one is the only one that appears to use it. > What I see is that you asked to break your sandbox, and less than 1 > hour later, a first vulnerability was found (exec called with two > parameters). A few hours later, a second vulnerability was found > (async generator and cr_frame). The former was just a stupid bug, it says nothing about the viability of the methodology. The latter was a new feature in a Python version later than I have ever used, and again does not imply anything much about the viability. I think now I've blocked the names of frame object attributes it wouldn't be a vulnerability any more anyway. > By the way, are you sure that you fixed the vulnerability? You > blacklisted "cb_frame", not cr_frame ;-) Ah, thanks. 
As above, I think this doesn't actually make any difference, but I've updated the code regardless. > You should look closer, pysandbox is very close to you project. I've just looked through it all again, and I don't understand why you are saying that. It's nothing like my experiment. It's trying to alter the global Python environment so that arbitrary code can be executed, whereas I am not even trying to allow execution of arbitrary code and am not altering the global environment. From antoine at python.org Mon Apr 11 10:48:27 2016 From: antoine at python.org (Antoine Pitrou) Date: Mon, 11 Apr 2016 14:48:27 +0000 (UTC) Subject: [Python-Dev] Pathlib enhancments - method name only References: <5707F4DB.7000501@stoneleaf.us> Message-ID: Ethan Furman stoneleaf.us> writes: > > That's a decent point. > > So the plausible choices are, I think: > > - __fspath__ # File System Path -- possible confusion with Path This would have my preference. Regards Antoine. From antoine at python.org Mon Apr 11 10:56:25 2016 From: antoine at python.org (Antoine Pitrou) Date: Mon, 11 Apr 2016 14:56:25 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Pathlib_enhancements_-_acceptable_inputs_?= =?utf-8?q?and_outputs_for_=5F=5Ffspath=5F=5F_and_os=2Efspath=28=29?= References: <5709309D.8030007@stoneleaf.us> Message-ID: Ethan Furman stoneleaf.us> writes: > > I also think the more explicit name helps answer some of the type > > signature questions that have arisen: > > > > 1. Does os.fspathname return rich Path objects? No, it returns names > > as str objects > > 2. Will file descriptors pass through os.fspathname? No, as they're > > not names, they're numeric descriptors. > > 3. Will bytes-like objects pass through os.fspathname? 
No, as they're > > not names, they're encodings of names > > If we add os.fspath(), but don't allow bytes to be returned from it, our > above example looks more like: > > if isinstance(a_path_thingy, bytes): > # because os can accept bytes > pass > else: > a_path_thingy = os.fspath(a_path_thingy) > # do something with the path > > Yes, it's better -- but it still requires a pre-check before calling > os.fspath(). > > It is my contention that this is better: > > a_path_thingy = os.fspath(a_path_thingy) It's not better, because a_path_thingy then may be a bytes object, and the os.fspath() caller has to deal with it. Conversely, if os.fspath() is guaranteed to return a unicode string, then the caller only has to worry about bytes paths if it really wants to; most callers probably don't care. I know what some people say: support for bytes paths is necessary for "low-level functions" (definition required ;-)). But in a PEP 383 world, it's not necessary at all. > 2) pathlib.Path accepts bytes -- Does it? Or are you proposing such a change? >>> pathlib.Path(b".") Traceback (most recent call last): File "", line 1, in File "/home/antoine/35/lib/python3.5/pathlib.py", line 956, in __new__ self = cls._from_parts(args, init=False) File "/home/antoine/35/lib/python3.5/pathlib.py", line 638, in _from_parts drv, root, parts = self._parse_args(args) File "/home/antoine/35/lib/python3.5/pathlib.py", line 630, in _parse_args % type(a)) TypeError: argument should be a path or str object, not Regards Antoine. 
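The behaviour shown in the traceback above is easy to check. What follows is a minimal sketch, not part of the original message: `path_accepts` is a hypothetical helper name, and `PurePath` is used instead of `Path` so that nothing touches the filesystem. The exact wording of the error differs between Python versions, so only the exception type is checked.

```python
import pathlib


def path_accepts(value):
    """Return True if pathlib can build a path object from value."""
    try:
        pathlib.PurePath(value)
    except TypeError:
        # pathlib refuses anything that is not str (or path-like yielding str).
        return False
    return True


print(path_accepts("."))   # str input
print(path_accepts(b"."))  # bytes input
```

A caller holding a bytes path therefore has to decode it before handing it to pathlib.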
From k7hoven at gmail.com Mon Apr 11 11:02:47 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Mon, 11 Apr 2016 18:02:47 +0300 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> Message-ID: On Mon, Apr 11, 2016 at 9:27 AM, Nick Coghlan wrote: > On 11 April 2016 at 02:16, Ethan Furman wrote: >> >> I guess I don't see the point of this. Either DirEntry's [1] only get >> partial support (which is only marginally better than the no support pathlib >> currently has), or stdlib code will need to catch those errors and then do >> an isinstance check to see if knows what the type is and how to deal with it >> [1]. > > What's wrong with only gaining partial support? Standard library code > that doesn't currently support DirEntry at all will gain the ability > to support str-based DirEntry objects, while bytes-based DirEntry > objects will continue to be a low level object that isn't > interoperable with most other APIs (which is fine - anyone writing low > level POSIX-specific code can deal with unpacking the values > explicitly, it just won't happen implicitly anywhere). > While I'm also tempted to lean towards 'marginalizing bytes support', it seems a little bit dangerous to me. Currently, os.path is heavily based on duck typing of str and bytes, so there may be code out there that does all kinds of things with paths without knowing whether it deals with bytes or str objects. If such code gets in contact with this pathname protocol, it will raise an exception whenever it happens to be fed a bytes path. That is, if the approach of 'partial support' is taken. And still there is the question I just posted in another branch of this mess: Who should use os.fspathname(...)? If it's os.path.* and other traditional (low-level?) 
functions that deal with paths, then fspathname should, in the name of backwards compatibility, be able to deal with bytes and return bytes in those cases. Otherwise fspathname would do nothing for you, and all the work of isinstance/hasattr/whatever would be left to the caller of os.fspathname (or maybe this is what you want?). So a somewhat useful fspathname might indeed look something like this:

from typing import Union

def fspathname(pathlike) -> Union[str, bytes]:
    pathname = getattr(pathlike, '__fspathname__', pathlike)
    if not isinstance(pathname, (str, bytes)):
        raise TypeError("your thing is not pathlike")
    return pathname

But maybe it is enough to have the __fspathname__ attribute, and make fspathname() some internal implementation detail of os.path.* and the like. -Koos From p.f.moore at gmail.com Mon Apr 11 11:04:21 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 11 Apr 2016 16:04:21 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160411144644.GA8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> Message-ID: On 11 April 2016 at 15:46, Jon Ribbens wrote: > It's trying to alter > the global Python environment so that arbitrary code can be executed, > whereas I am not even trying to allow execution of arbitrary code and > am not altering the global environment. However, it's not at all clear (to me at least) what you *are* trying to do. You're limiting the subset of Python that people can use, understood. And you're trying to ensure that people can't do "bad things". Again, understood. But what subset are you actually allowing, and what things are you trying to protect against? (For example, I can't calculate sin(1.2) using the math module - why is that not allowed?
It's just as safe as using the built in exponential operator, and indeed I could write a sin() function in pure Python, although it would be too slow to be useful, unlike math.sin...) It feels at the moment as if I'm playing a game where I don't know the rules, and every time I think I scored a point, the rules are changed to retroactively disallow it. Paul From ethan at stoneleaf.us Mon Apr 11 11:18:47 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 08:18:47 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> Message-ID: <570BC057.4040809@stoneleaf.us> On 04/11/2016 07:56 AM, Antoine Pitrou wrote: >> 2) pathlib.Path accepts bytes -- > > Does it? Or are you proposing such a change? It used to (I posted a couple examples from 3.5.0). I finally rebuilt with the latest and it no longer does. -- ~Ethan~ From Nikolaus at rath.org Mon Apr 11 11:35:11 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Mon, 11 Apr 2016 08:35:11 -0700 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160411144644.GA8206@unequivocal.co.uk> (Jon Ribbens's message of "Mon, 11 Apr 2016 15:46:44 +0100") References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> Message-ID: <8760vorweo.fsf@thinkpad.rath.org> On Apr 11 2016, Jon Ribbens wrote: >> What I see is that you asked to break your sandbox, and less than 1 >> hour later, a first vulnerability was found (exec called with two >> parameters). A few hours later, a second vulnerability was found >> (async generator and cr_frame). > > The former was just a stupid bug, it says nothing about the viability > of the methodology. The latter was a new feature in a Python version > later than I have ever used, and again does not imply anything much > about the viability. 
It implies that new versions of Python may break your sandbox. That doesn't sound like a viable long-term solution. > I think now I've blocked the names of frame > object attributes it wouldn't be a vulnerability any more anyway. It seems like you're playing whack-a-mole. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." From ijmorlan at uwaterloo.ca Mon Apr 11 07:04:56 2016 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Mon, 11 Apr 2016 07:04:56 -0400 (EDT) Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> Message-ID: On Mon, 11 Apr 2016, Victor Stinner wrote: > 2016-04-10 18:43 GMT+02:00 Jon Ribbens : >> >> That's the opposite of my approach though - I'm starting small and >> adding things, not starting with everything and removing stuff. Even >> if what we end up with is an extremely restricted subset of Python, >> there are still cases where that could be a useful tool to have. > > You design rely on the assumption that CPython is only pure Python. > That's wrong. A *lot* of Python features are implemented in C and > "ignore" your sandboxing code. Quick reminder: 50% of CPython is > written in the C language. > > It means that your protections like hiding builtin functions from the > Python scope don't work. If an attacker gets access to a C function > giving access to the hidden builtin, the game is over. [....] Non-Python core developer, non-expert-specifically-in-computer-security here, so won't take up much room on this list. I know enough about almost everything in Computer Science to know just how ignorant I am about almost everything in Computer Science.
But I would not use for security purposes a Python sandbox that was not formally verified to be correct and unbreakable. Of course in order for this to be possible, there first has to be a formal semantics for Python. Has anybody made a formal semantics for Python? If not, then this project is missing a pretty important pre-requisite. Isaac Morland CSCF Web Guru DC 2619, x36650 WWW Software Specialist From jcristau at debian.org Mon Apr 11 09:49:10 2016 From: jcristau at debian.org (Julien Cristau) Date: Mon, 11 Apr 2016 15:49:10 +0200 Subject: [Python-Dev] tp_new selection regression in the 2.7 branch Message-ID: <20160411134910.GG2889@betterave.cristau.org> Hi, changeset https://hg.python.org/cpython/rev/e7062dd9085e in the 2.7 branch changes how tp_new is assigned, and causes regressions with multiple inheritance from extension classes. http://bugs.python.org/issue25731#msg262922 has a fairly simple reproducer using cython. The __base__ attribute is set correctly, but tp_new is now wrong and thus the object initialization is broken. Can this change be fixed or reverted before the next 2.7.x release? (I have not verified if this regression also affects the 3.5 branch) Thanks, Julien From rosuav at gmail.com Mon Apr 11 12:01:33 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 02:01:33 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> Message-ID: On Mon, Apr 11, 2016 at 9:04 PM, Isaac Morland wrote: > But I would not use for security purposes a Python sandbox that was not > formally verified to be correct and unbreakable. Of course in order for > this to be possible, there first has to be a formal semantics for Python.
> Has anybody made a formal semantics for Python? If not, then this project > is missing a pretty important pre-requisite. Formal semantics for the language? Yes; most of docs.python.org is about the language, independently of any particular implementation. (There are odd notes here and there about "CPython implementation detail" and such, and there are some entire modules that are specifically stated as being implementation-specific, but they're a tiny proportion.) You can also read through the PEPs, which (again, for the most part) deal with language-level concerns ahead of implementation details. However, even with that information, it's virtually impossible to formally verify that the sandbox is unbreakable. A Python-in-Python sandbox is almost guaranteed to leak information across the boundary, and when information is leaked, it's extremely hard to prove that privilege escalation is impossible. ChrisA From ethan at stoneleaf.us Mon Apr 11 12:18:01 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 09:18:01 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> Message-ID: <570BCE39.8090306@stoneleaf.us> On 04/10/2016 11:27 PM, Nick Coghlan wrote: > On 11 April 2016 at 02:16, Ethan Furman wrote: >>> DirEntry can still get the check, it can just throw TypeError when it >>> represents a binary path (that's one of the advantages of using a >>> method-based protocol - exceptions on method calls are more acceptable >>> than exceptions on property access). >> >> >> I guess I don't see the point of this. Either DirEntry's [1] only get >> partial support (which is only marginally better than the no support pathlib >> currently has), or stdlib code will need to catch those errors and then do >> an isinstance check to see if knows what the type is and how to deal with it >> [1]. 
> > What's wrong with only gaining partial support? Standard library code > that doesn't currently support DirEntry at all will gain the ability > to support str-based DirEntry objects, while bytes-based DirEntry > objects will continue to be a low level object [...]

Let's consider two functions, one that accepts bytes/str for the path, and one that only accepts str:

str-only support
----------------

# before new protocol
def do_fritz(a_path):
    if not isinstance(a_path, str):
        raise TypeError('str required')
    ...

# after new protocol with str-only support
def do_fritz(a_path):
    a_path = fspath(a_path)
    ...

# after new protocol with bytes/str support
def do_fritz(a_path):
    a_path = fspath(a_path)
    if not isinstance(a_path, str):
        raise TypeError('str required')
    ...

bytes/str support
-----------------

# before new protocol
def zingar(a_path):
    if not isinstance(a_path, (bytes,str)):
        raise TypeError('bytes or str required')
    ...

# after new protocol with str-only support
def zingar(a_path):
    if not isinstance(a_path, bytes):
        try:
            a_path = fspath(a_path)
        except FSPathError:
            raise TypeError('bytes or str required')
    ...

# after new protocol with bytes/str support
def zingar(a_path):
    a_path = fspath(a_path)
    if not isinstance(a_path, (bytes,str)):
        raise TypeError('bytes or str required')
    ...

If those examples are anywhere close to accurate, an fspath protocol that supported both bytes and str seems a lot easier to work with. -- ~Ethan~ From ajm at flonidan.dk Mon Apr 11 09:54:49 2016 From: ajm at flonidan.dk (Anders Munch) Date: Mon, 11 Apr 2016 13:54:49 +0000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) Message-ID: Steven D'Aprano: > although I must admit I don't understand why the entire OS is affected. A consequence of memory overcommit, I'd wager. The crasher code not only allocates vast swathes of memory, but accesses it as well, which is bad news for Linux with overcommit enabled.
When the OS runs out of backing store to handle page faults, anything can happen. - Anders From zachary.ware+pydev at gmail.com Mon Apr 11 12:32:29 2016 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Mon, 11 Apr 2016 11:32:29 -0500 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570BCE39.8090306@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> Message-ID: On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman wrote: > If those examples are anywhere close to accurate, an fspath protocol that > supported both bytes and str seems a lot easier to work with. But why are you working with bytes paths in the first place? Where did you get them from, and why couldn't you decode them at that boundary? In 7ish years of working with Python (almost exclusively Python 3) on Windows and UNIX, I have never used bytes paths on any platform. -- Zach From jon+python-dev at unequivocal.co.uk Mon Apr 11 12:44:49 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Mon, 11 Apr 2016 17:44:49 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <8760vorweo.fsf@thinkpad.rath.org> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <8760vorweo.fsf@thinkpad.rath.org> Message-ID: <20160411164449.GB8206@unequivocal.co.uk> On Mon, Apr 11, 2016 at 08:35:11AM -0700, Nikolaus Rath wrote: > On Apr 11 2016, Jon Ribbens wrote: > >> What I see is that you asked to break your sandbox, and less than 1 > >> hour later, a first vulnerability was found (exec called with two > >> parameters). A few hours later, a second vulnerability was found > >> (async generator and cr_frame). > > > > The former was just a stupid bug, it says nothing about the viability > > of the methodology. 
The latter was a new feature in a Python version > > later than I have ever used, and again does not imply anything much > > about the viability. > > It implies that new versions of Python may break your sandbox. That > doesn't sound like a viable long-term solution. That is obviously always going to be true of major new versions with major new features, no matter what language we're talking about or what method is being used to sandbox - unless the sandboxing were to be built in to the language itself, which I have deliberately not suggested. But having said that, I already pointed out in the message you're responding to that with the method I'm using now, coroutines would not have been an issue even if I hadn't specifically fixed them. > > I think now I've blocked the names of frame > > object attributes it wouldn't be a vulnerability any more anyway. > > It seems like you're playing whack-a-mole. Well, no, quite the opposite in fact. If that was true then I would have given up already as the method having been proved useless. So far it looks like blocking "_*" and the frame object attributes appears to be sufficient. From jon+python-dev at unequivocal.co.uk Mon Apr 11 12:53:54 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Mon, 11 Apr 2016 17:53:54 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> Message-ID: <20160411165354.GC8206@unequivocal.co.uk> On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote: > However, it's not at all clear (to me at least) what you *are* trying > to do. I'm trying to see to what extent we can use ast node inspection to remedy the failures of prior attempts at Python sandboxing. Is there *any* extent to which Python can be sandboxed, or is even trying to use it as a calculator function unfixably insecure? 
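For readers unfamiliar with the technique, AST node inspection for the "calculator" case can be sketched roughly as follows. This is not the code being tested in this thread, which is not reproduced here; `ALLOWED_NODES`, `check_expression` and `safe_eval` are hypothetical names chosen for illustration. The sketch whitelists a handful of node types, leaves out exponentiation on purpose, and makes no attempt to bound CPU or memory use.

```python
import ast

# Hypothetical whitelist: numeric literals and basic arithmetic only.
# Names, attribute access, calls and subscripts are all absent, so any
# expression using them is rejected before it is compiled.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub,
)


def check_expression(source):
    """Parse source as an expression; raise ValueError on disallowed syntax."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: " + type(node).__name__)
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric literals are allowed")
    return tree


def safe_eval(source):
    """Evaluate an arithmetic expression that passed check_expression()."""
    tree = check_expression(source)
    # Empty builtins as a second line of defence; the whitelist already
    # rejects every Name node.
    return eval(compile(tree, "<sandbox>", "eval"), {"__builtins__": {}}, {})


print(safe_eval("1 + 2 * 3"))
```

An input such as `(1).__class__` fails the check because it contains an `Attribute` node. Whether the same idea scales beyond arithmetic is exactly the open question in this thread.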
> You're limiting the subset of Python that people can use, > understood. And you're trying to ensure that people can't do "bad > things". Again, understood. But what subset are you actually allowing, > and what things are you trying to protect against? (For example, I > can't calculate sin(1.2) using the math module - why is that not > alllowed? It wasn't allowed in the earlier version because I wasn't allowing import at all, because this is just an experiment. As it happens, I added 'import' yesterday so yes you can use math.sin. > It feels at the moment as if I'm playing a game where I don't know the > rules, and every time I think I scored a point, the rules are changed > to retroactively disallow it. The challenge is to show some code that will escape from the sandbox, in a way that is not trivially fixable with a tiny patch, or in a way that demonstrates that such a large number of tiny patches would be required as to be unworkable. From rosuav at gmail.com Mon Apr 11 13:02:54 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 03:02:54 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160411165354.GC8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> Message-ID: On Tue, Apr 12, 2016 at 2:53 AM, Jon Ribbens wrote: > On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote: >> However, it's not at all clear (to me at least) what you *are* trying >> to do. > > I'm trying to see to what extent we can use ast node inspection to > remedy the failures of prior attempts at Python sandboxing. Is there > *any* extent to which Python can be sandboxed, or is even trying to > use it as a calculator function unfixably insecure? > It all depends on how much functionality you want. 
If all you need is a numeric expression evaluator, that's not too hard - disallow all forms of attribute access, etc, and just have simple numbers and operators. That's pretty useful, and safe. Alternatively, go completely the other way. Let people run whatever code they like... in an environment where it can't hurt anyone else. That's what PyPyJS does - don't bother looking for security holes in it, because all you're doing is attacking your own computer. The hard part comes when you want to allow *some*, but not all, interaction with the outside world. When I was looking into this kind of sandboxing (although it was Python-in-C++ rather than Python-in-Python), it was to allow untrusted users to control certain parts of server-side execution. The result was dismal, because it's fundamentally impossible to allow the level of control I wanted without allowing a level of control I didn't want. So before you can ask whether Python is unfixably insecure, you first have to decide what the minimum level of functionality is that you'll accept. Do you need basic arithmetic plus trigonometric functions? Easy enough - disallow all attribute access and imports, and populate builtins with "from math import *". Need them to be able to assign variables and define functions? That's gonna be harder. ChrisA From ethan at stoneleaf.us Mon Apr 11 13:12:55 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 10:12:55 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> Message-ID: <570BDB17.5000601@stoneleaf.us> On 04/11/2016 09:32 AM, Zachary Ware wrote: > On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman wrote: >> If those examples are anywhere close to accurate, an fspath protocol that >> supported both bytes and str seems a lot easier to work with.
> > But why are you working with bytes paths in the first place? Where did > you get them from, and why couldn't you decode them at that boundary? > In 7ish years of working with Python (almost exclusively Python 3) on > Windows and UNIX, I have never used bytes paths on any platform. I'm not saying that bytes paths are common -- and if this was a brand-new feature I wouldn't be pushing for it so hard; however, bytes paths are already supported and it seems to me to be much less of a headache to continue the support in this new protocol instead of drawing an artificial line in the sand. Also, let me be clear that the new protocol will not adversely affect my own library as it directly subclasses bytes and strings (bPath and uPath), so they will pass through either way (or be appropriately rejected if the function only supports str -- are there any?). This kind of feels like PEP 461 again -- the vast majority of Python programmers do not need %-interpolation for bytes, but what a pain in the rear for those that did! (Yes, I was one of those.) Admittedly, the pain from this will not be nearly as severe as that was, but why should we have any unnecessary pain at all? Asked another way, what are we gaining by disallowing bytes in this new way of getting paths versus the pain caused when bytes are needed and/or accepted? From my point of view the pain of simply implementing this without bytes support in the existing os and os.path modules is not worth excluding bytes.
-- ~Ethan~ From donald at stufft.io Mon Apr 11 13:18:01 2016 From: donald at stufft.io (Donald Stufft) Date: Mon, 11 Apr 2016 13:18:01 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570BDB17.5000601@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> Message-ID: > On Apr 11, 2016, at 1:12 PM, Ethan Furman wrote: > > Asked another way, what are we gaining by disallowing bytes in this new way of getting paths versus the pain caused when bytes are needed and/or accepted? It seems fine to me to allow __fspath__ to return bytes as well as str. The only argument I can think of against it is that something like pathlib.Path() would not work with a bytes returning __fspath__, but that's not any different than what happens if you pass a bytes object directly into pathlib.Path as well. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From brett at python.org Mon Apr 11 13:36:33 2016 From: brett at python.org (Brett Cannon) Date: Mon, 11 Apr 2016 17:36:33 +0000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570BDB17.5000601@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> Message-ID: On Mon, 11 Apr 2016 at 10:13 Ethan Furman wrote: > On 04/11/2016 09:32 AM, Zachary Ware wrote: > > On Mon, Apr 11, 2016 at 11:18 AM, Ethan Furman wrote: > > >> If those examples are anywhere close to accurate, an fspath protocol > that > >> supported both bytes and str seems a lot easier to work with. > > > > But why are you working with bytes paths in the first place? Where did > > you get them from, and why couldn't you decode them at that boundary? > > In 7ish years of working with Python (almost exclusively Python 3) on > > Windows and UNIX, I have never used bytes paths on any platform. > > I'm not saying that bytes paths are common -- and if this was a > brand-new feature I wouldn't be pushing for it so hard; however, bytes > paths are already supported and it seems to me to be much less of a > headache to continue the support in this new protocol instead of drawing > an artificial line in the sand. > Headache for you? The stdlib? Library authors? Users of libraries? There are a lot of users of this who have varying levels of pain for this. > > Also, let me be clear that the new protocol will not adversely affect my > own library as it directly subclasses bytes and strings (bPath and > uPath), so they will pass through either way (or be appropriately > rejected if the function only supports str -- are there any?).
>
Well, technically it depends on whether we prefer the protocol or explicit
type checking and how we define the protocol. If we say __ospath__ has to
return str and we check for that first then that would be bad for you. If we
do isinstance() checks before calling the protocol or allow both str and
bytes then we open it up.

>
> This kind of feels like PEP 461 again -- the vast majority of Python
> programmers do not need %-interpolation for bytes, but what a pain in
> the rear for those that did! (Yes, I was one of those.) Admittedly,
> the pain from this will not be nearly as severe as that was, but why
> should we have any unnecessary pain at all?
>
> Asked another way, what are we gaining by disallowing bytes in this new
> way of getting paths versus the pain caused when bytes are needed and/or
> accepted?
>

Type consistency. E.g. if I pass in a DirEntry object into os.fspath() and I
don't know what the heck I'm getting back then that can lead to subtle bugs,
especially when you didn't check ahead of time what DirEntry.path was. To me,
that bumps up against "In the face of ambiguity, refuse the temptation to
guess". Having the return type vary even when the argument type doesn't can
get messy if you don't expect it to always vary (i.e. this isn't getattr()).

>
> From my point of view the pain of simply implementing this without
> bytes support in the existing os and os.path modules is not worth
> excluding bytes.
>

How about we take something from the "explicit is better than implicit"
playbook and add a keyword argument to os.fspath() to allow bytes to pass
through?

    def fspath(path, *, allow_bytes=False):
        if isinstance(path, str):
            return path
        # Allow bytearray?
        elif allow_bytes and isinstance(path, bytes):
            return path
        try:
            protocol_path = path.__fspath__()
        except AttributeError:
            pass
        else:
            # Explicit type check worth it, or better to rely on duck typing?
            if isinstance(protocol_path, str):
                return protocol_path
        raise TypeError(f"expected a path-like object, str, or bytes (if allowed), not {type(path)}")

For DirEntry users who use bytes, they will simply have to pass around
DirEntry.path which is not as nice as simply passing around DirEntry, but it
does allow them to continue to operate without having to decode the bytes if
allow_bytes is True. We get type consistency in the protocol as we can
continue to expect people to return strings for __fspath__. And for those
APIs where supporting bytes won't be an issue, they can explicitly choose to
support bytes or not and then not have to juggle support for both str and
bytes if they choose not to. IOW adults who consent to bytes paths don't get
cut out or have a ton of hoops to jump through as long as they opt in, but
those adults who don't consent to bytes paths have their lives simplified.

From antoine at python.org Mon Apr 11 13:48:51 2016
From: antoine at python.org (Antoine Pitrou)
Date: Mon, 11 Apr 2016 17:48:51 +0000 (UTC)
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
References: <5709309D.8030007@stoneleaf.us> <570BC057.4040809@stoneleaf.us>
Message-ID:

Ethan Furman <ethan at stoneleaf.us> writes:
>
> On 04/11/2016 07:56 AM, Antoine Pitrou wrote:
>
> >> 2) pathlib.Path accepts bytes --
> >
> > Does it? Or are you proposing such a change?
>
> It used to (I posted a couple examples from 3.5.0). I finally rebuilt
> with the latest and it no longer does.

This is surprising, since in its entire lifetime, pathlib was never supposed
to support bytes inputs. See the argument check in the initial checkin of
pathlib.py:
https://hg.python.org/cpython/rev/43377dcfb801/#l6.571

Perhaps that slipped through at some point (and obviously no test was there
to prevent it :-)).

Regards

Antoine.
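
For anyone who wants to poke at the allow_bytes idea from the message before
Antoine's, the following is a minimal runnable sketch of that proposal as
posted, not a definitive implementation and not necessarily what os.fspath()
will end up doing. The names fspath_sketch and DemoEntry are made up for
illustration; DemoEntry only stands in for a DirEntry-like object whose
__fspath__ may hand back either str or bytes.

```python
def fspath_sketch(path, *, allow_bytes=False):
    """Return a str path; pass bytes through only when the caller opts in."""
    if isinstance(path, str):
        return path
    if allow_bytes and isinstance(path, bytes):
        return path
    # Hypothetical protocol lookup: ask the object's type for __fspath__.
    fspath_method = getattr(type(path), "__fspath__", None)
    if fspath_method is not None:
        protocol_path = fspath_method(path)
        # In this variant the protocol itself is only allowed to return str.
        if isinstance(protocol_path, str):
            return protocol_path
    raise TypeError(
        "expected a path-like object, str, or bytes (if allowed), "
        f"not {type(path).__name__}"
    )


class DemoEntry:
    """Made-up stand-in for a DirEntry-like object; not a stdlib class."""

    def __init__(self, path):
        self.path = path

    def __fspath__(self):
        return self.path


print(fspath_sketch("spam.txt"))
print(fspath_sketch(b"spam.txt", allow_bytes=True))
print(fspath_sketch(DemoEntry("eggs.txt")))
try:
    fspath_sketch(b"spam.txt")  # bytes without opting in
except TypeError as exc:
    print("rejected:", exc)
```

Under this variant a bytes-holding DemoEntry is rejected even with
allow_bytes=True, so bytes users would pass entry.path explicitly, which is
the trade-off described in the proposal.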
From steve at pearwood.info Mon Apr 11 13:50:37 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 12 Apr 2016 03:50:37 +1000 Subject: [Python-Dev] PEP 506 secrets module In-Reply-To: References: <20151016005711.GC11980@ando.pearwood.info> <20160410050845.GA12526@ando.pearwood.info> Message-ID: <20160411175036.GA1819@ando.pearwood.info> On Sun, Apr 10, 2016 at 11:43:08AM -0700, Guido van Rossum wrote: > Hi Steven, > > No probIem with the delay -- it's still before 3.6.0. I do think it's > just about a record gap in the PEP review process. :-) > > I will approve the PEP as soon as you've updated the two function > names in the PEP. (If you don't have write access to the peps repo, > send the new version to peps at python.org -- or send a link to the new > draft somewhere online, e.g. github if you're using that. If you do > have peps repo write access, just reply here when it's done.) I have done that, and updated the API and Implementation section to be less wishy-washy and more commital about what exactly will be included. Hope it meets with your approval, and thanks for your guidance! -- Steve From random832 at fastmail.com Mon Apr 11 14:18:08 2016 From: random832 at fastmail.com (Random832) Date: Mon, 11 Apr 2016 14:18:08 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> Message-ID: <1460398688.3275807.575485137.7B32BC19@webmail.messagingengine.com> On Mon, Apr 11, 2016, at 13:36, Brett Cannon wrote: > How about we take something from the "explicit is better than implicit" > playbook and add a keyword argument to os.fspath() to allow bytes to pass > through? Except, we already know how to convert a bytes-path into a str (and vice versa) with sys.getfilesystemencoding and surrogateescape. 
So why not just have the argument specify what return type is desired? def fspath(path, *, want_bytes=False): if isinstance(path, (bytes, str)): ppath = path else: try: ppath = path.__fspath__() except AttributeError: raise TypeError if isinstance(ppath, str): return ppath.encode(...) if want_bytes else ppath elif isinstance(ppath, bytes): return ppath if want_bytes else ppath.decode(...) else: raise TypeError This way the posix os module can call the function and have the bytes value already prepared for it to pass to the real open() syscall. You could even add the same thing in other places, e.g. os.path.join (defaulting to if the first argument is a bytes). From ethan at stoneleaf.us Mon Apr 11 14:28:22 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 11:28:22 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> Message-ID: <570BECC6.1080708@stoneleaf.us> On 04/11/2016 10:36 AM, Brett Cannon wrote: > On Mon, 11 Apr 2016 at 10:13 Ethan Furman wrote: >> I'm not saying that bytes paths are common -- and if this was a >> brand-new feature I wouldn't be pushing for it so hard; however, bytes >> paths are already supported and it seems to me to be much less of a >> headache to continue the support in this new protocol instead of drawing >> an artificial line in the sand. > > Headache for you? The stdlib? Library authors? Users of libraries? There > are a lot of users of this who have varying levels of pain for this. Yes, yes, maybe, maybe. :) >> Asked another way, what are we gaining by disallowing bytes in this new >> way of getting paths versus the pain caused when bytes are needed and/or >> accepted? > > Type consistency. E.g. 
if I pass in a DirEntry object into os.fspath() > and I don't know what the heck I'm getting back then that can lead to > subtle bugs [...] > How about we take something from the "explicit is better than implicit" > playbook and add a keyword argument to os.fspath() to allow bytes to > pass through? > > def fspath(path, *, allow_bytes=False): > if isinstance(path, str): > return path > # Allow bytearray? > elif allow_bytes and isinstance(path, bytes): > return path > try: > protocol = path.__fspath__() > except AttributeError: > pass > else: > # Explicit type check worth it, or better to rely on duck typing? > if isinstance(protocol_path, str): > return protocol_path > raise TypeError("expected a path-like object, str, or bytes (if > allowed), not {type(path)}") I think that might work. We currently have four path related things: bytes, str, Path, DirEntry -- two are str-only, one is bytes-only, and one can be either. I would write the above as: def fspath(path, *, allow_bytes=False): try: path = path.__fspath__() except AttributeError: pass if isinstance(path, str): return path elif allow_bytes and isinstance(path, bytes): return path else: raise SomeError() > For DirEntry users who use bytes, they will simply have to pass around > DirEntry.path which is not as nice as simply passing around DirEntry, If we go with the above we allow DirEntry.__fspath__ to return bytes and still get type-consistency of str unless the user explicitly declares they're okay with getting either (and even then the field is narrowed from four possible source types (or more as time goes on) to two. To recap, this would allow both str & bytes in __fspath__, but the fspath() function defaults to only allowing str through. I can live with that. -- ~Ethan~ From storchaka at gmail.com Mon Apr 11 14:29:02 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 11 Apr 2016 21:29:02 +0300 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #25910: Fixed more links in the docs. 
In-Reply-To: <570BB79F.2000708@timgolden.me.uk> References: <20160411143851.18859.27207.908B2F75@psf.io> <570BB79F.2000708@timgolden.me.uk> Message-ID: On 11.04.16 17:41, Tim Golden wrote: > On 11/04/2016 15:38, serhiy.storchaka wrote: >> - `__. >> + `__. > > Is there any intended irony in our link to openssl not being via https? > > :) http://bugs.python.org/issue26736 From guido at python.org Mon Apr 11 14:35:31 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 11 Apr 2016 11:35:31 -0700 Subject: [Python-Dev] PEP 506 secrets module In-Reply-To: <20160411175036.GA1819@ando.pearwood.info> References: <20151016005711.GC11980@ando.pearwood.info> <20160410050845.GA12526@ando.pearwood.info> <20160411175036.GA1819@ando.pearwood.info> Message-ID: Most excellent! PEP 506 is hereby approved. Congrats again. On Mon, Apr 11, 2016 at 10:50 AM, Steven D'Aprano wrote: > On Sun, Apr 10, 2016 at 11:43:08AM -0700, Guido van Rossum wrote: >> Hi Steven, >> >> No probIem with the delay -- it's still before 3.6.0. I do think it's >> just about a record gap in the PEP review process. :-) >> >> I will approve the PEP as soon as you've updated the two function >> names in the PEP. (If you don't have write access to the peps repo, >> send the new version to peps at python.org -- or send a link to the new >> draft somewhere online, e.g. github if you're using that. If you do >> have peps repo write access, just reply here when it's done.) > > I have done that, and updated the API and Implementation section to be > less wishy-washy and more commital about what exactly will be included. > Hope it meets with your approval, and thanks for your guidance! 
> > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Apr 11 14:45:21 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 11:45:21 -0700 Subject: [Python-Dev] PEP 506 secrets module In-Reply-To: References: <20151016005711.GC11980@ando.pearwood.info> <20160410050845.GA12526@ando.pearwood.info> <20160411175036.GA1819@ando.pearwood.info> Message-ID: <570BF0C1.6070209@stoneleaf.us> On 04/11/2016 11:35 AM, Guido van Rossum wrote: > Most excellent! PEP 506 is hereby approved. Congrats again. Congratulations, Steven! -- ~Ethan~ From brett at python.org Mon Apr 11 15:00:41 2016 From: brett at python.org (Brett Cannon) Date: Mon, 11 Apr 2016 19:00:41 +0000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570BECC6.1080708@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> Message-ID: On Mon, 11 Apr 2016 at 11:28 Ethan Furman wrote: > On 04/11/2016 10:36 AM, Brett Cannon wrote: > > On Mon, 11 Apr 2016 at 10:13 Ethan Furman wrote: > > >> I'm not saying that bytes paths are common -- and if this was a > >> brand-new feature I wouldn't be pushing for it so hard; however, bytes > >> paths are already supported and it seems to me to be much less of a > >> headache to continue the support in this new protocol instead of drawing > >> an artificial line in the sand. > > > > Headache for you? The stdlib? Library authors? Users of libraries? There > > are a lot of users of this who have varying levels of pain for this. > > Yes, yes, maybe, maybe. 
:)
>
> >> Asked another way, what are we gaining by disallowing bytes in this new
> >> way of getting paths versus the pain caused when bytes are needed and/or
> >> accepted?
> >
> > Type consistency. E.g. if I pass in a DirEntry object into os.fspath()
> > and I don't know what the heck I'm getting back then that can lead to
> > subtle bugs [...]
>
> > How about we take something from the "explicit is better than implicit"
> > playbook and add a keyword argument to os.fspath() to allow bytes to
> > pass through?
> >
> > def fspath(path, *, allow_bytes=False):
> >     if isinstance(path, str):
> >         return path
> >     # Allow bytearray?
> >     elif allow_bytes and isinstance(path, bytes):
> >         return path
> >     try:
> >         protocol_path = path.__fspath__()
> >     except AttributeError:
> >         pass
> >     else:
> >         # Explicit type check worth it, or better to rely on duck typing?
> >         if isinstance(protocol_path, str):
> >             return protocol_path
> >     raise TypeError(f"expected a path-like object, str, or bytes (if allowed), not {type(path)}")
>
> I think that might work. We currently have four path related things:
> bytes, str, Path, DirEntry -- two are str-only, one is bytes-only, and
> one can be either.
>
> I would write the above as:
>
> def fspath(path, *, allow_bytes=False):
>     try:
>         path = path.__fspath__()
>     except AttributeError:
>         pass
>     if isinstance(path, str):
>         return path
>     elif allow_bytes and isinstance(path, bytes):
>         return path
>     else:
>         raise SomeError()
>
> > For DirEntry users who use bytes, they will simply have to pass around
> > DirEntry.path which is not as nice as simply passing around DirEntry,
>
> If we go with the above we allow DirEntry.__fspath__ to return bytes and
> still get type-consistency of str unless the user explicitly declares
> they're okay with getting either (and even then the field is narrowed
> from four possible source types (or more as time goes on) to two).
>

You get type consistency from os.fspath(), not the protocol, though.
> > To recap, this would allow both str & bytes in __fspath__, but the > fspath() function defaults to only allowing str through. > > I can live with that. > I'm -0 on allowing __fspath__ to return bytes, but we can see what others think. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Apr 11 16:19:39 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 13:19:39 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> Message-ID: <570C06DB.3050705@stoneleaf.us> On 04/11/2016 12:00 PM, Brett Cannon wrote: > On Mon, 11 Apr 2016 at 11:28 Ethan Furman wrote: >> I would write the above as: >> >> def fspath(path, *, allow_bytes=False): > > You get type consistency from so.fspath(), not the protocol, though. Well, since the protocol is also a function, we could put the allow_bytes on that as well -- not sure if that is a good idea or not. -- ~Ethan~ From tritium-list at sdamon.com Mon Apr 11 16:33:15 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 11 Apr 2016 16:33:15 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. Message-ID: <570C0A0B.90109@sdamon.com> In reviewing the ongoing arguments about how to make pathlib better, there have been circular arguments about if it is even broken, if it should support bytes, if there should be a path protocol that all functions that touch the filesystem should use, if that protocol should support bytes, how that protocol should be open or closed to allow third party modules to act as paths, etc., etc. If there is headway being made, I do not see it. 
I don't think we can come to an agreement that will make anyone happy, or have any effect on the adoption of the pathlib module in the standard library. Maybe, just maybe, since there is an ecosystem of third party modules already doing this job (and arguably doing it much better than pathlib, and for more supported versions of python than any future version of pathlib will), it should be dropped from the standard library and left on pypi as a third party module. From srkunze at mail.de Mon Apr 11 16:39:19 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 11 Apr 2016 22:39:19 +0200 Subject: [Python-Dev] pathlib+os/shutil feedback In-Reply-To: References: <570A5E36.2070606@mail.de> Message-ID: <570C0B77.7080505@mail.de> On 10.04.2016 16:51, Paul Moore wrote: > On 10 April 2016 at 15:07, Sven R. Kunze wrote: >> If there's some agreement to change things with respect to those 5 points, I >> am willing to put some time into it. > In broad terms I agree with these points. Thanks for doing the > research. It would certainly be good to try to improve pathlib based > on this sort of feedback while it is still provisional. I'd appreciate some guidance on this. Just let me know what I can do since I don't know the processes of hacking CPython. > """ > Path.rglob(pattern) > Walk down a given path; a wrapper for "os.scandir"/"os.listdir". > """ > > However, at least in 3.5, Path.rglob does *not* wrap scandir. There's > a difference in principle, in that scandir (DirEntry) objects cache > stat data, where pathlib does not. Whether that makes using scandir in > Path.rglob impossible, I don't know. 
Ideally I'd like to see pathlib > modified to use scandir (because otherwise there will always be people > saying "use os.walk rather than scandir, as it's faster) - or if it's > not possible to do so because of the difference in principle, then I'd > like to see a clear discussion of the issue in the docs, including the > recommended approach for people who want scandir performance *without* > having to abandon pathlib for lower level functions. Good point. The proposed docstring was just to illustrate the functionality to the uninformed reader. People mostly trust the docs without digging deeper but they should be accurate of course. Best, Sven From marky1991 at gmail.com Mon Apr 11 16:40:15 2016 From: marky1991 at gmail.com (marky1991 .) Date: Mon, 11 Apr 2016 16:40:15 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C0A0B.90109@sdamon.com> References: <570C0A0B.90109@sdamon.com> Message-ID: Neverending email chains aside, as a mere user, I like pathlib even as it is today and like the convenience of it being in the stdlib. (And would like it even more if the stdlib played nicely with it) I would be disappointed if it were taken out. (It's one of the few recent additions that I find useful to be honest) -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Mon Apr 11 16:42:20 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 11 Apr 2016 22:42:20 +0200 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> Message-ID: 2016-04-11 21:00 GMT+02:00 Brett Cannon : > I'm -0 on allowing __fspath__ to return bytes, but we can see what others > think. 
With the PEP 383, a bytes filename can be stored as str using the surrogateescape error handler. So DirEntry can convert a bytes path to str using os.fsdecode(). A "byte string" is unclear in Python. There is the immutable "bytes" type. But there is also the mutable "bytearray" type. And the buffer protocol which can have different shapes. I like the idea of a simple protocol: only allow a single type, str. Victor From srkunze at mail.de Mon Apr 11 16:48:39 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 11 Apr 2016 22:48:39 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C0A0B.90109@sdamon.com> References: <570C0A0B.90109@sdamon.com> Message-ID: <570C0DA7.6030407@mail.de> On 11.04.2016 22:33, Alexander Walters wrote: > If there is headway being made, I do not see it. Funny that you brought it up. I was about posting something myself. I cannot agree completely. But starting with a comment from Paul, I realized that pathlib is something different than a string. After doing the research and our issues with pathlib, I found: - pathlib just needs to be improved (see my 5 points) - os[.path] should not tinkered with I know that all of those discussions of a new protocol (path->str, __fspath__ etc. etc.) might be rendered worthless by these two statements. But that's my conclusion. "os" and "os.path" are just lower level. "pathlib" is a high-level, convenience library. When using it, I don't want to use "os" or "os.path" anymore. If I still do, "pathlib" needs improving. *Not "os" nor "os.path"*. Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Apr 11 16:51:28 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 13:51:28 -0700 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. 
In-Reply-To: <570C0A0B.90109@sdamon.com> References: <570C0A0B.90109@sdamon.com> Message-ID: <570C0E50.3080502@stoneleaf.us> On 04/11/2016 01:33 PM, Alexander Walters wrote: > In reviewing the ongoing arguments about how to make pathlib better, > there have been circular arguments about if it is even broken, if it > should support bytes, if there should be a path protocol that all > functions that touch the filesystem should use, if that protocol should > support bytes, how that protocol should be open or closed to allow third > party modules to act as paths, etc., etc. Do not take lots of discussion as a negative. It's better to thrash it out thoroughly first. > If there is headway being made, I do not see it. It's being made, and I dare say we are close to the end. -- ~Ethan~ From tritium-list at sdamon.com Mon Apr 11 16:55:05 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 11 Apr 2016 16:55:05 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C0DA7.6030407@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> Message-ID: <570C0F29.5010904@sdamon.com> If i had my druthers, this thread would be kept to either: "Shut up alex, we are really close to figuring this out" or "Ok, maybe you have a point." Every conceivable way to fix pathlib have already been argued. Are any of them worth doing? Can we get consensus enough to implement one of them? If not, we should consider either dropping the matter or dropping the module. On 4/11/2016 16:48, Sven R. Kunze wrote: > On 11.04.2016 22:33, Alexander Walters wrote: >> If there is headway being made, I do not see it. > > Funny that you brought it up. I was about posting something myself. I > cannot agree completely. But starting with a comment from Paul, I > realized that pathlib is something different than a string. 
After > doing the research and our issues with pathlib, I found: > > > - pathlib just needs to be improved (see my 5 points) > - os[.path] should not tinkered with > > > I know that all of those discussions of a new protocol (path->str, > __fspath__ etc. etc.) might be rendered worthless by these two > statements. But that's my conclusion. > > "os" and "os.path" are just lower level. "pathlib" is a high-level, > convenience library. When using it, I don't want to use "os" or > "os.path" anymore. If I still do, "pathlib" needs improving. *Not "os" > nor "os.path"*. > > > Best, > Sven > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Mon Apr 11 16:56:15 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 11 Apr 2016 16:56:15 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C0E50.3080502@stoneleaf.us> References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us> Message-ID: <570C0F6F.6060606@sdamon.com> That is great news. I just couldn't see it myself in the threads On 4/11/2016 16:51, Ethan Furman wrote: >> If there is headway being made, I do not see it. > > It's being made, and I dare say we are close to the end. From srkunze at mail.de Mon Apr 11 17:04:29 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 11 Apr 2016 23:04:29 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C0F29.5010904@sdamon.com> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> Message-ID: <570C115D.1030104@mail.de> On 11.04.2016 22:55, Alexander Walters wrote: > Every conceivable way to fix pathlib have already been argued. 
Are any
> of them worth doing? Can we get consensus enough to implement one of
> them? If not, we should consider either dropping the matter or
> dropping the module.

Right now, I don't see pathlib being removed. Why? Because using strings
alone has its caveats (we all know that). So, I cannot imagine an alternative
concept to pathlib right now. We might call it differently, but the concept
stays unchanged.

MAYBE, if there's an alternative concept, I could be convinced to support
dropping the module.

Best,
Sven

PS: The only way out that I can imagine is to fix pathlib. I am not in favor
of fixing functions of "os" and "os.path" to accept "path" objects, which is
what the majority here is discussing now with the new __fspath__ protocol.
But shaping what we have is definitely worth it.

From ethan at stoneleaf.us Mon Apr 11 17:10:26 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Apr 2016 14:10:26 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To:
References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us>
Message-ID: <570C12C2.9000602@stoneleaf.us>

On 04/11/2016 01:42 PM, Victor Stinner wrote:
> 2016-04-11 21:00 GMT+02:00 Brett Cannon:
>> I'm -0 on allowing __fspath__ to return bytes, but we can see what others
>> think.
>
> With the PEP 383, a bytes filename can be stored as str using the
> surrogateescape error handler. So DirEntry can convert a bytes path to
> str using os.fsdecode().

I am far from a Unicode expert, but if I understand this correctly you are
proposing that DirEntry.__whatever__ can always return a str using the
surrogateescape (SE) method. However, before this SE string can be used, it
would need to be converted back to bytes, and with the same SE method, yes?
And this has already been implemented in the stdlib? So my concern in such a case is what happens if we pass this SE string somewhere else: a UTF-8 file, or over a socket, or into a database? Does this have issues that we wouldn't face if we just used bytes? -- ~Ethan~ From random832 at fastmail.com Mon Apr 11 17:05:59 2016 From: random832 at fastmail.com (Random832) Date: Mon, 11 Apr 2016 17:05:59 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C0DA7.6030407@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> Message-ID: <1460408759.3318333.575686073.50BEA1FB@webmail.messagingengine.com> On Mon, Apr 11, 2016, at 16:48, Sven R. Kunze wrote: > On 11.04.2016 22:33, Alexander Walters wrote: > > If there is headway being made, I do not see it. > > Funny that you brought it up. I was about posting something myself. I > cannot agree completely. But starting with a comment from Paul, I > realized that pathlib is something different than a string. After doing > the research and our issues with pathlib, I found: > > > - pathlib just needs to be improved (see my 5 points) > - os[.path] should not tinkered with I'm not so sure. Is there any particular reason os.path.join should require its arguments to be homogenous, rather than allowing os.path.join('a', b'b', Path('c')) to return 'a/b/c'? > I know that all of those discussions of a new protocol (path->str, > __fspath__ etc. etc.) might be rendered worthless by these two > statements. But that's my conclusion. > > "os" and "os.path" are just lower level. "pathlib" is a high-level, > convenience library. When using it, I don't want to use "os" or > "os.path" anymore. If I still do, "pathlib" needs improving. *Not "os" > nor "os.path"*. The problem isn't you using os. It's you using other modules that use os. or io, shutil, or builtins.open. Or pathlib, if what *you're* using is some other path library. 
Are you content living in a walled garden where there is only your code and pathlib, and you never might want to pass a Path to some function someone else (who didn't use pathlib) wrote? os is being used as an example because fixing os probably gets you most other things (that just pass it through to builtins.open which passes it through to os.open) for free. From random832 at fastmail.com Mon Apr 11 17:08:51 2016 From: random832 at fastmail.com (Random832) Date: Mon, 11 Apr 2016 17:08:51 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C115D.1030104@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> Message-ID: <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> On Mon, Apr 11, 2016, at 17:04, Sven R. Kunze wrote: > PS: The only way out that I can imagine is to fix pathlib. I am not in > favor of fixing functions of "os" and "os.path" to except "path" > objects; Why not? From tritium-list at sdamon.com Mon Apr 11 17:11:22 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Mon, 11 Apr 2016 17:11:22 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C115D.1030104@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> Message-ID: <570C12FA.3030609@sdamon.com> This stance was probably already argued in the threads in question. This thread is more of a health-check. As an observer, it did not look like any headway was being made, and I suggested the solimaic solution. It has been pointed out to me that headway IS being made and they are close to a solution. I think this thread can safely be sunset. On 4/11/2016 17:04, Sven R. Kunze wrote: > On 11.04.2016 22:55, Alexander Walters wrote: >> Every conceivable way to fix pathlib have already been argued. Are >> any of them worth doing? 
Can we get consensus enough to implement >> one of them? If not, we should consider either dropping the matter >> or dropping the module. > > Right now, I don't see pathlib removed. Why? Because using strings > alone has its caveats (we all know that). So, I cannot imagine an > alternative concept to pathlib right now. We might call it > differently, but the concept stays unchanged. > > MAYBE, if there's an alternative concept, I could be convinced to > support dropping the module. > > Best, > Sven > > PS: The only way out that I can imagine is to fix pathlib. I am not in > favor of fixing functions of "os" and "os.path" to except "path" > objects; which does the majority here discuss now with the new > __fspath__ protocol. But shaping what we have is definitely worth it. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Apr 11 17:15:02 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 14:15:02 -0700 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C115D.1030104@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> Message-ID: <570C13D6.4090609@stoneleaf.us> On 04/11/2016 02:04 PM, Sven R. Kunze wrote: > On 11.04.2016 22:55, Alexander Walters wrote: >> Every conceivable way to fix pathlib have already been argued. Are any >> of them worth doing? Can we get consensus enough to implement one of >> them? If not, we should consider either dropping the matter or >> dropping the module. > > Right now, I don't see pathlib removed. Why? Because using strings alone > has its caveats (we all know that). 
So, I cannot imagine an alternative > concept to pathlib right now. We might call it differently, but the > concept stays unchanged. We've pretty much decided that we have two options: 1. remove pathlib 2. make the stdlib work with pathlib So we're trying to make option 2 work before falling back to option 1. If you have a way to make pathlib work with the stdlib that doesn't involve "fixing" os and os.path, now is the time to speak up. -- ~Ethan~ From srkunze at mail.de Mon Apr 11 17:21:36 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 11 Apr 2016 23:21:36 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> Message-ID: <570C1560.7070105@mail.de> On 11.04.2016 23:08, Random832 wrote: > On Mon, Apr 11, 2016, at 17:04, Sven R. Kunze wrote: >> PS: The only way out that I can imagine is to fix pathlib. I am not in >> favor of fixing functions of "os" and "os.path" to accept "path" >> objects; > Why not? It occurred to me after pondering over Paul's comments. "os" and "os.path" are just a completely different level of abstraction. There is just no need to mess with them. The initial failure of my colleague and me in using pathlib can be solely attributed to pathlib's lack of functionality, not to the incompatibility of "os" or "os.path" with "Path" objects. Best, Sven From srkunze at mail.de Mon Apr 11 17:33:38 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 11 Apr 2016 23:33:38 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <1460408759.3318333.575686073.50BEA1FB@webmail.messagingengine.com> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <1460408759.3318333.575686073.50BEA1FB@webmail.messagingengine.com> Message-ID: <570C1832.6010509@mail.de> On 11.04.2016 23:05, Random832 wrote: > On Mon, Apr 11, 2016, at 16:48, Sven R. Kunze wrote: >> On 11.04.2016 22:33, Alexander Walters wrote: >>> If there is headway being made, I do not see it. >> Funny that you brought it up. I was about posting something myself. I >> cannot agree completely. But starting with a comment from Paul, I >> realized that pathlib is something different than a string. After doing >> the research and our issues with pathlib, I found: >> >> >> - pathlib just needs to be improved (see my 5 points) >> - os[.path] should not tinkered with > I'm not so sure. Is there any particular reason os.path.join should > require its arguments to be homogenous, rather than allowing > os.path.join('a', b'b', Path('c')) to return 'a/b/c'? Besides the fact, that I don't like mixing types (this was something that worried me about the discussion from the beginning), you can achieve the same using pathlib alone. There's no need of it let alone the maintenance and slowdown of these implicit conversions. >> I know that all of those discussions of a new protocol (path->str, >> __fspath__ etc. etc.) might be rendered worthless by these two >> statements. But that's my conclusion. >> >> "os" and "os.path" are just lower level. "pathlib" is a high-level, >> convenience library. When using it, I don't want to use "os" or >> "os.path" anymore. If I still do, "pathlib" needs improving. *Not "os" >> nor "os.path"*. > The problem isn't you using os. It's you using other modules that use > os. or io, shutil, or builtins.open. Or pathlib, if what *you're* using > is some other path library. 
Are you content living in a walled garden > where there is only your code and pathlib, and you never might want to > pass a Path to some function someone else (who didn't use pathlib) > wrote? > > os is being used as an example because fixing os probably gets you most > other things (that just pass it through to builtins.open which passes it > through to os.open) for free. Hypothetical assumptions meeting implicit type conversions. You might prefer those, I don't because of good reason. I was one of those starting the discussion around pathlib improvements. I understand now, that this is one of its minor issues. And btw. using some "other pathlib" is no argument for or against improving "THE pathlib". The .path attribute will do it from what I can see. Best, Sven From brett at python.org Mon Apr 11 17:40:29 2016 From: brett at python.org (Brett Cannon) Date: Mon, 11 Apr 2016 21:40:29 +0000 Subject: [Python-Dev] pathlib+os/shutil feedback In-Reply-To: <570C0B77.7080505@mail.de> References: <570A5E36.2070606@mail.de> <570C0B77.7080505@mail.de> Message-ID: On Mon, 11 Apr 2016 at 13:40 Sven R. Kunze wrote: > On 10.04.2016 16:51, Paul Moore wrote: > > On 10 April 2016 at 15:07, Sven R. Kunze wrote: > >> If there's some agreement to change things with respect to those 5 > points, I > >> am willing to put some time into it. > > In broad terms I agree with these points. Thanks for doing the > > research. It would certainly be good to try to improve pathlib based > > on this sort of feedback while it is still provisional. > > I'd appreciate some guidance on this. Just let me know what I can do > since I don't know the processes of hacking CPython. > https://docs.python.org/devguide/ and https://mail.python.org/mailman/listinfo/core-mentorship are your friends. :) For new features of a module you can discuss it on python-ideas first before proposing a patch if you're worried a patch implementing the feature might get rejected and you don't want to risk wasting your time. 
-Brett From ben+python at benfinney.id.au Mon Apr 11 17:41:30 2016 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 12 Apr 2016 07:41:30 +1000 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us> <570C0F6F.6060606@sdamon.com> Message-ID: <85mvoz2585.fsf@benfinney.id.au> Alexander Walters writes: > That is great news. I just couldn't see it myself in the threads Agreed. A summary posting, from someone who has a good handle on the issue and outcome, would be very helpful. -- \ “Firmness in decision is often merely a form of stupidity. It | `\ indicates an inability to think the same thing out twice.” | _o__) -- Henry L. Mencken | Ben Finney From brett at python.org Mon Apr 11 17:43:01 2016 From: brett at python.org (Brett Cannon) Date: Mon, 11 Apr 2016 21:43:01 +0000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570C12C2.9000602@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> Message-ID: On Mon, 11 Apr 2016 at 14:11 Ethan Furman wrote: > On 04/11/2016 01:42 PM, Victor Stinner wrote: > > 2016-04-11 21:00 GMT+02:00 Brett Cannon: > > >> I'm -0 on allowing __fspath__ to return bytes, but we can see what others > >> think. > > > > With PEP 383, a bytes filename can be stored as str using the > > surrogateescape error handler. So DirEntry can convert a bytes path to > > str using os.fsdecode(). > > I am far from a Unicode expert, but if I understand this correctly you > are proposing that DirEntry.__whatever__ can always return a str using > the surrogateescape (SE) method.
> > However, before this SE string can be used, it would need to be > converted back to bytes, and with the same SE method, yes? And this has > already been implemented in the stdlib? > > So my concern in such a case is what happens if we pass this SE string > somewhere else: a UTF-8 file, or over a socket, or into a database? > Does this have issues that we wouldn't face if we just used bytes? > This is my worry as well and why I have not proposed this kind of universal normalizing of bytes paths using os.fsdecode() w/ surrogateescape. Doing this sort of thing from the system boundary and documenting as such as PEP 383 proposed makes a bit more sense as the expectation is more controlled and is a clear input boundary. -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Mon Apr 11 17:43:49 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 11 Apr 2016 23:43:49 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C13D6.4090609@stoneleaf.us> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <570C13D6.4090609@stoneleaf.us> Message-ID: <570C1A95.1060100@mail.de> On 11.04.2016 23:15, Ethan Furman wrote: > We've pretty decided that we have two options: > > 1. remove pathlib > 2. make the stdlib work with pathlib > > So we're trying to make option 2 work before falling back to option 1. > > If you have a way to make pathlib work with the stdlib that doesn't > involve "fixing" os and os.path, now is the time to speak up. As I said, I don't like messing with os or os.path. They are built with a different level of abstraction in mind. What makes people want to go down from pathlib to os (speaking in terms of abstraction) is the fact that pathlib suggests/promise a convenience that it cannot hold. You might have seen my "feedback" post here on python-dev. 
If those points were corrected in a reasonable way, we wouldn't have had the need to go down to os or other stdlib modules. As it presents itself, it feels like a poor wrapper for os and os.path. I hope that makes sense. So, I might add: 3. add more high-level features to pathlib to prevent a downgrade to os or os.path Best, Sven From brett at python.org Mon Apr 11 17:55:55 2016 From: brett at python.org (Brett Cannon) Date: Mon, 11 Apr 2016 21:55:55 +0000 Subject: [Python-Dev] Summary of the pathlib discussion (Re: Maybe, just maybe, pathlib doesn't belong.) In-Reply-To: <85mvoz2585.fsf@benfinney.id.au> References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us> <570C0F6F.6060606@sdamon.com> <85mvoz2585.fsf@benfinney.id.au> Message-ID: On Mon, 11 Apr 2016 at 14:42 Ben Finney wrote: > Alexander Walters writes: > > > That is great news. I just couldn't see it myself in the threads > > Agreed. A summary posting, from someone who has a good handle on the > issue and outcome, would be very helpful. > - Guido has put Chris Angelico and myself in charge of drafting a proposal once we are done discussing things as a PEP (probably an amendment to the pathlib PEP where I will also explain why we are still not subclassing str) - Ethan Furman has volunteered to help out with code work (as have I) - Name bikeshedding never seems to end, but there seems to be coalescing around __fspath__ or __fspathname__ (I think, although __fspath__ seems to be what everyone has been typing today; I'm trying to stay out of it so as to not influence too much) - We are only discussing two things still (all going on in the threads relating to return values, arguments, types, etc. in their titles)... - Should path.__fspath__() be allowed to return bytes on top of strings? 
(we seem to have found an amicable way to allow os.fspath() to let a bytes argument pass through just like str in an explicit fashion) - Should we explicitly type check in os.fspath() what path.__fspath__() returns or just let it fall through and hope people do the right thing? That's pretty much it unless Chris or Ethan disagree. So I think pathlib is far from being as dead as a parrot. ;) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Apr 11 17:58:43 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 14:58:43 -0700 Subject: [Python-Dev] pathlib - current status of discussions Message-ID: <570C1E13.4090909@stoneleaf.us> name: ---- We are down to two choices: - __fspath__, or - __fspathname__ The final choice I suspect will be affected by the choice to allow (or not) bytes. method or attribute: ------------------- method built-in: -------- Almost - we'll put it in the os module add to str: ---------- No, not all strings are paths. add to C API: ------------ Yes. Possible names include PyUnicode_FromFSPath and PyObject_Path -- again, the choice of bytes inclusion will affect the final choice of name. add a Path ABC: -------------- undecided Sticking points: --------------- Do we allow bytes to be returned from os.fspath()? If yes, then do we allow bytes from __fspath__()? -- ~Ethan~ From ethan at stoneleaf.us Mon Apr 11 18:00:46 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 15:00:46 -0700 Subject: [Python-Dev] Summary of the pathlib discussion (Re: Maybe, just maybe, pathlib doesn't belong.) In-Reply-To: References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us> <570C0F6F.6060606@sdamon.com> <85mvoz2585.fsf@benfinney.id.au> Message-ID: <570C1E8E.1080205@stoneleaf.us> On 04/11/2016 02:55 PM, Brett Cannon wrote: > That's pretty much it unless Chris or Ethan disagree. So I think pathlib > is far from being as dead as a parrot. 
;) That's nearly exactly what I wrote in my summary. :) So, yes, we are nearly there! -- ~Ethan~ From wes.turner at gmail.com Mon Apr 11 18:02:46 2016 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 11 Apr 2016 17:02:46 -0500 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> Message-ID: You seem to be defining a (restricted subset of an existing) language, which will need version strings and ABI tags for compatibility purposes: * Build Tags (for Python variants): * https://www.python.org/dev/peps/pep-0425/ * Python tag * ABI tag * Platform tag * https://www.python.org/dev/peps/pep-0513/ manylinux1 * https://www.python.org/dev/peps/pep-3149/ .so file tags * RestrictedPython does not have ABI tags An Android CPython build discussion about just exposing an extra attribute in the platform module (the Android build also ships without some modules IIRC): * https://mail.python.org/pipermail/python-dev/2014-August/135606.html * https://mail.python.org/pipermail/python-dev/2014-August/thread.html#135640 On 11 April 2016 at 15:46, Jon Ribbens wrote: > It's trying to alter > the global Python environment so that arbitrary code can be executed, > whereas I am not even trying to allow execution of arbitrary code and > am not altering the global environment. However, it's not at all clear (to me at least) what you *are* trying to do. You're limiting the subset of Python that people can use, understood. And you're trying to ensure that people can't do "bad things". Again, understood. But what subset are you actually allowing, and what things are you trying to protect against? (For example, I can't calculate sin(1.2) using the math module - why is that not allowed?
It's just as safe as using the built-in exponential operator, and indeed I could write a sin() function in pure Python, although it would be too slow to be useful, unlike math.sin...) It feels at the moment as if I'm playing a game where I don't know the rules, and every time I think I scored a point, the rules are changed to retroactively disallow it. Paul From donald at stufft.io Mon Apr 11 18:38:56 2016 From: donald at stufft.io (Donald Stufft) Date: Mon, 11 Apr 2016 18:38:56 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570C1E13.4090909@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> Message-ID: > On Apr 11, 2016, at 5:58 PM, Ethan Furman wrote: > > name: > ---- > > We are down to two choices: > > - __fspath__, or > - __fspathname__ > > The final choice I suspect will be affected by the choice to allow (or not) bytes. +1 on __fspath__, -0 on __fspathname__ > > > > add a Path ABC: > -------------- > > undecided I think it makes sense to add it, but maybe only in 3.6? Path-accepting code could be updated to do something like `isinstance(obj, (bytes, str, PathMeta))` which seems like a net win to me. > > > Sticking points: > --------------- > > Do we allow bytes to be returned from os.fspath()? If yes, then do we allow bytes from __fspath__()? I think yes and yes; it seems like making it needlessly harder to deal with a bytes path in the scenarios that you're actually dealing with them is the kind of change that 3.0 made that ended up getting rolled back where it could.
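A minimal sketch of the "yes and yes" position, for illustration only and not code posted in this thread. The names `fspath_sketch` and `DemoPath` are hypothetical, and the explicit check on what `__fspath__()` returns is an assumption: whether to type check or let the value fall through is one of the questions still open in the discussion.

```python
def fspath_sketch(path):
    """Hypothetical stand-in for the proposed os.fspath().

    str and bytes pass through unchanged; any other object must
    provide __fspath__(), which may return either str or bytes.
    """
    if isinstance(path, (str, bytes)):
        return path
    try:
        # Look the method up on the type, as is usual for dunder protocols.
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError(
            "expected str, bytes or an object with __fspath__, not "
            + type(path).__name__
        ) from None
    result = method(path)
    # Assumed behaviour: reject anything that is not str or bytes.
    if not isinstance(result, (str, bytes)):
        raise TypeError(
            "__fspath__() returned " + type(result).__name__
            + ", expected str or bytes"
        )
    return result


class DemoPath:
    """Hypothetical path-like object used only to exercise the sketch."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw
```

With this shape, code that receives a bytes path (for example from scanning a directory given as bytes) gets bytes back without a decode and re-encode round trip, which is the ease-of-use argument made above.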
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From jon+python-dev at unequivocal.co.uk Mon Apr 11 18:43:17 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Mon, 11 Apr 2016 23:43:17 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> Message-ID: <20160411224317.GD8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 03:02:54AM +1000, Chris Angelico wrote: > On Tue, Apr 12, 2016 at 2:53 AM, Jon Ribbens > wrote: > > On Mon, Apr 11, 2016 at 04:04:21PM +0100, Paul Moore wrote: > >> However, it's not at all clear (to me at least) what you *are* trying > >> to do. > > > > I'm trying to see to what extent we can use ast node inspection to > > remedy the failures of prior attempts at Python sandboxing. Is there > > *any* extent to which Python can be sandboxed, or is even trying to > > use it as a calculator function unfixably insecure? > > It all depends on how much functionality you want. If all you need is > a numeric expression evaluator, that's not too hard - disallow all > forms of attribute access, etc, and just have simple numbers and > operators. That's pretty useful, and safe. By "calculator" I didn't necessarily mean to imply numeric-only, sorry if I was unclear. Also perhaps I should have said "non-trivial", inasmuch as if we restrict it that far then it would quite possibly be simpler and quicker just to write the expression evaluator from scratch and not use the Python interpreter at all. > Alternatively, go completely the other way.
Let people run whatever > code they like... in an environment where it can't hurt anyone else. > That's what PyPyJS does - don't bother looking for security holes in > it, because all you're doing is attacking your own computer. That's a very specific use case though: running client-side in the user's browser. > So before you can ask whether Python is unfixably insecure, you first > have to decide what the minimum level of functionality is that you'll > accept. Do you need basic arithmetic plus trignometric functions? Easy > enough - disallow all attribute access and imports, and populate > builtins with "from math import *". Need them to be able to assign > variables and define functions? That's gonna be harder. I think calling functions and accessing variables and attributes is likely a minimum. Defining functions would be useful, and of course defining classes would be another useful step further. From random832 at fastmail.com Mon Apr 11 18:56:05 2016 From: random832 at fastmail.com (Random832) Date: Mon, 11 Apr 2016 18:56:05 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C1A95.1060100@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de> Message-ID: <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com> On Mon, Apr 11, 2016, at 17:15, Ethan Furman wrote: > So we're trying to make option 2 work before falling back to option 1. > > If you have a way to make pathlib work with the stdlib that doesn't > involve "fixing" os and os.path, now is the time to speak up. Fully general re-dispatch from argument types on any call to a function that raises TypeError or NotImplemented? [e.g. call Path.__missing_func__(os.open, path, mode)] Have pathlib monkey-patch things at import? On Mon, Apr 11, 2016, at 17:43, Sven R. Kunze wrote: > So, I might add: > > 3. 
add more high-level features to pathlib to prevent a downgrade to os > or os.path 3. reimplement the entire ecosystem in every walled garden so no-one has to leave their walled gardens. What's the point of batteries being included if you can't wire them to anything? I don't get what you mean by this whole "different level of abstraction" thing, anyway. The fact that there is one obvious thing to want to do with open and a Path strongly suggests that that should be able to be done by passing the Path to open. Also, what level of abstraction is builtin open? Maybe we should _just_ leave os alone on the grounds of some holy sacred lowest-level-itude, but allow io and shutil to accept Path? From victor.stinner at gmail.com Mon Apr 11 19:43:16 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 12 Apr 2016 01:43:16 +0200 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570C12C2.9000602@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> Message-ID: On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote: > So my concern in such a case is what happens if we pass this SE string somewhere else: a UTF-8 file, or over a socket, or into a database? Does this have issues that we wouldn't face if we just used bytes? "SE strings" are returned by os.listdir(str), os.walk(str), os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under the sun. Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding error. A surrogate is created to store an undecodable byte in a filename. IMHO it's safer to get an encoding error rather than no error when you concatenate two byte strings encoded to two different encodings (mojibake). print(os.fspath(obj)) will more likely do what you expect if os.fspath() always returns str.
I mean that it will encode your filename to the encoding of the terminal which can be different than the filesystem encoding. If fspath() can return bytes, you should write print(os.fsdecode(os.fspath(obj))). -- On Linux, open(DirEntry) for a bytes entry (os.scandir(bytes)) would have to first decode a bytes filename with os.fsdecode() to then encode it back with os.fsencode(). Yeah, that's inefficient. But we now have super fast codecs (ex: encode and decode is almost memcpy for pure ascii). And filenames are usually very short (less than 300 bytes). IMHO the interface matters more than performance. As I showed with my print example, filenames are not only used to access the filesystem, you also want to display them. Using Unicode avoids bad surprises (mojibake). -- Well, the question is more why you want to get bytes at the first place. Why not only using Unicode? I understood that some people expect mojibake when using Unicode, whereas using bytes cannot lead to mojibake. Well, in practice it's simply the opposite :-) Maybe devs read that Linux syscalls and C functions take bytes, so using bytes give access to any filenames including "invalid filenames". That's true. But it's also true for Unicode if you use os.fsdecode(). Maybe dev don't understand, don't know and fear Unicode :-) My goal is more to educate users and help them to avoid mojibake. Did I mention that you must not use bytes filename on Windows? So using Unicode everywhere helps to write really portable code. On Windows, using Unicode is requied to be able to open any file. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Apr 11 20:01:14 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 10:01:14 +1000 Subject: [Python-Dev] Summary of the pathlib discussion (Re: Maybe, just maybe, pathlib doesn't belong.) 
In-Reply-To: References: <570C0A0B.90109@sdamon.com> <570C0E50.3080502@stoneleaf.us> <570C0F6F.6060606@sdamon.com> <85mvoz2585.fsf@benfinney.id.au> Message-ID: On Tue, Apr 12, 2016 at 7:55 AM, Brett Cannon wrote: > That's pretty much it unless Chris or Ethan disagree. So I think pathlib is > far from being as dead as a parrot. ;) That looks like an accurate summary! ChrisA From ethan at stoneleaf.us Mon Apr 11 20:40:50 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 17:40:50 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> Message-ID: <570C4412.4070600@stoneleaf.us> On 04/11/2016 01:42 PM, Victor Stinner wrote: > With the PEP 383, a bytes filename can be stored as str using the > surrogateescape error handler. So DirEntry can convert a bytes path to > str using os.fsdecode(). Does this mean that os.fsdecode() is simply a wrapper that sets the errors to the surrogateescape handler? -- ~Ethan~ From songofacandy at gmail.com Mon Apr 11 20:51:21 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Tue, 12 Apr 2016 09:51:21 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> Message-ID: Sorry, I've forgot to use "Reply All". On Tue, Apr 12, 2016 at 9:49 AM, INADA Naoki wrote: > IHMO it's safer to get an encoding error rather than no error when you >> concatenate two byte strings encoded to two different encodings (mojibake). >> >> print(os.fspath(obj)) will more likely do what you expect if os.fspath() >> always return str. 
I mean that it will encode your filename to the encoding >> of the terminal which can be different than the filesystem encoding. >> >> If fspath() can return bytes, you should write >> print(os.fsdecode(os.fspath(obj))). >> >> > Why not print(obj)? > str() is normal high-level API, and __fspath__ and os.fspath() should be > low level API. > Normal users shouldn't use __fspath__ and os.fspath(). Only library > developers should use it. > > -- > INADA Naoki > -- INADA Naoki -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Apr 11 20:55:43 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Apr 2016 12:55:43 +1200 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570BCE39.8090306@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> Message-ID: <570C478F.6050400@canterbury.ac.nz> Ethan Furman wrote: > # after new protocol with bytes/str support > def zingar(a_path): > a_path = fspath(a_path) > if not isinstance(a_path, (bytes,str)): > raise TypeError('bytes or str required') > ... I think that one would be just def zingar(a_path): a_path = fspath(a_path) because fspath() would presumably check the result for str/bytesness itself. At least I can't think of a reason for it not to, since returning either str or bytes is part of its contract. -- Greg From greg.ewing at canterbury.ac.nz Mon Apr 11 21:08:36 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Apr 2016 13:08:36 +1200 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160411164449.GB8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <8760vorweo.fsf@thinkpad.rath.org> <20160411164449.GB8206@unequivocal.co.uk> Message-ID: <570C4A94.1010402@canterbury.ac.nz> Jon Ribbens wrote: > So far it looks like blocking "_*" and the frame object attributes > appears to be sufficient. Even if your sandbox as it currently exists is secure, it's only an extremely restricted subset. You seem to be assuming that if your technique works so far, then it can be extended to cover a larger subset, but I don't think that's certain. One problem that's been raised is how to prevent untrusted code from monkeypatching imported modules. Possibly that could be addressed by giving the untrusted code a copy of the module, but I'm not entirely sure -- accidentally importing two copies of the same source file is a well-known source of bugs, after all. A related, but more difficult problem is that if we allow the untrusted code to import any pure-Python classes, it will be able to monkeypatch them. So it seems like it will need its own copy of those classes as well -- and having two copies of the same class around is *another* well known source of bugs. -- Greg From wes.turner at gmail.com Mon Apr 11 21:52:10 2016 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 11 Apr 2016 20:52:10 -0500 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <570C4A94.1010402@canterbury.ac.nz> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <8760vorweo.fsf@thinkpad.rath.org> <20160411164449.GB8206@unequivocal.co.uk> <570C4A94.1010402@canterbury.ac.nz> Message-ID: On Mon, Apr 11, 2016 at 8:08 PM, Greg Ewing wrote: > Jon Ribbens wrote: > >> So far it looks like blocking "_*" and the frame object attributes >> appears to be sufficient. >> > > Even if your sandbox as it currently exists is secure, it's > only an extremely restricted subset. You seem to be assuming > that if your technique works so far, then it can be extended > to cover a larger subset, but I don't think that's certain. > How would you test that? > One problem that's been raised is how to prevent untrusted > code from monkeypatching imported modules. Possibly that > could be addressed by giving the untrusted code a copy of > the module, but I'm not entirely sure -- accidentally > importing two copies of the same source file is a well-known > source of bugs, after all. > https://en.wikipedia.org/wiki/Monkey_patch#Pitfalls * https://pypi.python.org/pypi?%3Aaction=search&term=monkeypatch&submit=search * https://pypi.python.org/pypi/apparmor_monkeys * http://eventlet.net/doc/patching.html#monkeypatching-the-standard-library * http://www.gevent.org/gevent.monkey.html * https://docs.python.org/3/library/asyncio-sync.html#locks * https://docs.python.org/2/library/threading.html#lock-objects * https://docs.python.org/2/library/sets.html?highlight=immutable#sets.ImmutableSet * http://doc.pypy.org/en/latest/stm.html#locks - " Infinite recursion just segfaults for now." 
* https://github.com/tobgu/pyrsistent #justfoundthis - https://github.com/tobgu/pyrsistent#invariants - https://github.com/tobgu/pyrsistent#freeze-and-thaw - freeze, thaw * define a @property (and no @propname.setter) - https://docs.python.org/2/howto/descriptor.html#properties - https://docs.python.org/2/library/functions.html#property > A related, but more difficult problem is that if we allow > the untrusted code to import any pure-Python classes, it > will be able to monkeypatch them. So it seems like it will > need its own copy of those classes as well -- * https://docs.python.org/3/library/importlib.html#importlib.__import__ * > and having > two copies of the same class around is *another* well > known source of bugs. One way to reduce the likelihood of this is to bundle all dependencies into a self-contained PEX ZIP package and specify entry points. * http://legacy.python.org/dev/peps/pep-0441/ * https://pex.readthedocs.org/en/stable/buildingpex.html#specifying-entry-points * https://pex.readthedocs.org/en/stable/buildingpex.html#tailoring-pex-execution-at-build-time > > > -- > Greg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon+python-dev at unequivocal.co.uk Mon Apr 11 22:00:29 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 03:00:29 +0100 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <570C4A94.1010402@canterbury.ac.nz> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <8760vorweo.fsf@thinkpad.rath.org> <20160411164449.GB8206@unequivocal.co.uk> <570C4A94.1010402@canterbury.ac.nz> Message-ID: <20160412020029.GE8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 01:08:36PM +1200, Greg Ewing wrote: > Jon Ribbens wrote: > >So far it looks like blocking "_*" and the frame object attributes > >appears to be sufficient. > > Even if your sandbox as it currently exists is secure, it's > only an extremely restricted subset. I'm not sure what you think the restrictions are, but yes a highly restricted Python that was secure would be very useful sometimes. > You seem to be assuming that if your technique works so far, then it > can be extended to cover a larger subset, but I don't think that's > certain. No, I'm not assuming that. > One problem that's been raised is how to prevent untrusted > code from monkeypatching imported modules. Possibly that > could be addressed by giving the untrusted code a copy of > the module, Yes, that's what it does. > but I'm not entirely sure -- accidentally importing two copies of > the same source file is a well-known source of bugs, after all. I'm not sure what you mean by that. > A related, but more difficult problem is that if we allow > the untrusted code to import any pure-Python classes, it > will be able to monkeypatch them. So it seems like it will > need its own copy of those classes as well Yes, that's also what it does. > -- and having two copies of the same class around is *another* well > known source of bugs. I'm not sure what you mean by that either. From rosuav at gmail.com Mon Apr 11 22:13:07 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 12:13:07 +1000 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160411224317.GD8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160411224317.GD8206@unequivocal.co.uk> Message-ID: On Tue, Apr 12, 2016 at 8:43 AM, Jon Ribbens wrote: > On Tue, Apr 12, 2016 at 03:02:54AM +1000, Chris Angelico wrote: >> It all depends on how much functionality you want. If all you need is >> a numeric expression evaluator, that's not too hard - disallow all >> forms of attribute access, etc, and just have simple numbers and >> operators. That's pretty useful, and safe. > > By "calculator" I didn't necessarily mean to imply numeric-only, > sorry if I was unclear. Also perhaps I should have said "non-trivial", > inasmuch as if we restrict it that far then it would quite possibly be > simpler and quicker just to write the expression evaluator from scratch > and not use the Python interpreter at all. I'm aware you wanted more. My point is that it's not hard to secure the trivially simple, and it doesn't have to be entirely useless. But every bit of additional power brings with it additional risk. ChrisA From ncoghlan at gmail.com Mon Apr 11 23:45:00 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Apr 2016 13:45:00 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570C1E13.4090909@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> Message-ID: On 12 April 2016 at 07:58, Ethan Furman wrote: > Sticking points: > --------------- > > Do we allow bytes to be returned from os.fspath()? If yes, then do we allow > bytes from __fspath__()? I've come around to the point of view that allowing both str and bytes-like objects to pass through unchanged makes sense, with the rationale being the one someone mentioned regarding ease-of-use in os.path. 
Consider os.path.join: with a permissive os.fspath, the necessary update should just be to introduce "map(os.fspath, args)" (or its C equivalent), and then continue with the existing bytes vs str handling logic. Functions consuming os.fspath can then decide on a case-by-case basis how they want to handle binary paths: either use them as is (which will usually work on mostly-ASCII systems), convert them to text with os.fsdecode (which will usually work on *nix systems), or disallow them entirely (which would probably only be appropriate for libraries that wanted to ensure support for non-ASCII paths on Windows systems). That then cascades into the other open questions mentioned: - permitted return types for both fspath and __fspath__ would be (str, bytes) - the names would be fspath and __fspath__, since the result may be either a path name as text, or an encoded path name as bytes Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Apr 11 23:58:29 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Apr 2016 13:58:29 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: On 12 April 2016 at 13:45, Nick Coghlan wrote: > Consider os.path.join: with a permissive os.fspath, the necessary > update should just be to introduce "map(os.fspath, args)" (or its C > equivalent), and then continue with the existing bytes vs str handling > logic. That does remind me: once a patch is available, we should check the benchmark numbers with the patch applied. I'd expect the new protocol overhead to be swamped by the actual IO costs, but this kind of low level change can have surprising consequences. 
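A minimal sketch of the permissive behaviour described above, assuming the protocol ends up as a `__fspath__` method that may return either str or bytes. The names `sketch_fspath`, `sketch_join` and `DemoPath` are hypothetical stand-ins for illustration only; they are not the eventual os.fspath implementation, which is expected to live in C.

```python
import os.path


def sketch_fspath(path):
    """Hypothetical pure-Python stand-in for the proposed os.fspath()."""
    # str and bytes pass through unchanged (the permissive variant).
    if isinstance(path, (str, bytes)):
        return path
    # Otherwise look up the proposed __fspath__ hook on the object's type.
    try:
        hook = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or an object with __fspath__, "
                        "not " + type(path).__name__) from None
    result = hook(path)
    # Only str or bytes may come back, so consumers have two types to handle.
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ returned " + type(result).__name__)
    return result


class DemoPath:
    """Illustrative path-like object; not part of any real library."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


def sketch_join(*args):
    # The "map(os.fspath, args)" step, followed by the existing
    # str-versus-bytes handling that os.path.join already performs.
    return os.path.join(*map(sketch_fspath, args))
```

With this shape, `sketch_join(DemoPath("usr"), "lib")` behaves like `os.path.join("usr", "lib")`, and all-bytes arguments keep working as they do today; mixing str and bytes is still rejected by the existing join logic rather than by the new hook.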
Regarding the type checks, PyObject_AsFilesystemPath (or whatever we call it) will be implemented in C, with os.fspath just calling that, so doing "PyUnicode_Check(path) || PyBytes_Check(path)" on the result will be both cheap and convenient for API consumers (since it means they know they only have to cope with bytes or str instances internally, and will get a clear error message if handed something else). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From chris.barker at noaa.gov Tue Apr 12 01:14:29 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 11 Apr 2016 22:14:29 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: <-9219200259368253896@unknownmsgid> > with the > rationale being the one someone mentioned regarding ease-of-use in > os.path. > > Consider os.path.join: Why in the world do the os.path functions need to work with Path objects? (and other conforming objects) This all started with the goal of using Path objects in the stdlib, but that's for opening files, etc. Path is an alternative to os.path -- you don't need to use both. And if you do have a byte path, you can stick with os.path.... BTW, I'm confused about what a bytes path IS -- is it encoded? Can you assume it can be decoded? It seems to me that the ONLY time you should get a byte path is from a low level system call on a posix system, and you may have no idea how it's encoded. So the ONLY thing you should do with it is pass it along to another low level system call. I can't see why we should support anything else with bytes objects. > - the names would be fspath and __fspath__, since the result may be > either a path name as text, or an encoded path name as bytes You just used the phrase "path name as bytes" -- so why is __pathname__ inappropriate if it might return bytes?
I like __pathname__ better because this entire effort is because we've decided it's important to make the distinction between a "path" and the text representation of said path. Just sayin' -CHB From stephen at xemacs.org Tue Apr 12 01:28:51 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 12 Apr 2016 14:28:51 +0900 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> Donald Stufft writes: > I think yes and yes [__fspath__ and fspath should be allowed to > handle bytes, otherwise] it seems like making it needlessly harder > to deal with a bytes path It's not needless. This kind of polymorphism makes it hard to review code locally. Once bytes get a foothold inside a text application, they metastasize altogether too easily, and you end up with TypeErrors or UnicodeErrors quite far from the origin. Debugging often requires tracing data flows over hill and over dale while choking from the dusty trail, or band-aids like a top-level "except UnicodeError: log_and_quarantine(bytes)". I can't prove that returning bytes from these APIs is a big risk in this sense, but I can't see a way to prove that it's not, either, given that their point is duck-typing, and therefore they may be generalized in the future, and by third parties. I understand that there are applications where it's bytes all the way down, but by the very nature of computing systems, there are systems where bytes are decoded to text. For historical reasons (the encoding Tower of Babel), it's very error-prone to do that on demand. Best practice is to do the conversion as close to the boundary as possible, and process only text internally. In text applications, "bytes as carcinogen" is an apt metaphor.
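The "convert at the boundary" practice described above might look roughly like the following sketch. It assumes the existing os.fsdecode and os.fsencode helpers are an acceptable way to cross the boundary; the function names `to_text_path` and `describe` and the sample file names are made up for illustration.

```python
import os


def to_text_path(raw):
    """Decode a path received from a low-level call into str, once."""
    # os.fsdecode passes str through unchanged and decodes bytes using the
    # filesystem encoding together with its error handler.
    return os.fsdecode(raw)


def describe(path_text):
    # Internal code sees only str and can reject anything else early,
    # close to where a stray bytes object would have entered.
    if not isinstance(path_text, str):
        raise TypeError("internal code expects str paths")
    return "entry: " + path_text


raw_names = [b"notes.txt", "report.txt"]  # mixed input at the boundary
lines = [describe(to_text_path(name)) for name in raw_names]
```

On the way back out, `os.fsencode(to_text_path(raw))` recovers the bytes form for a low-level call, so the text-only core never has to carry bytes around.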
Now, I'm not Dutch, so I can't tell you it's obvious that the risk to text-processing applications is more important than the inconvenience to byte-shoveling applications. But there is a need to be parsimonious with polymorphism. From robertc at robertcollins.net Tue Apr 12 01:30:04 2016 From: robertc at robertcollins.net (Robert Collins) Date: Tue, 12 Apr 2016 17:30:04 +1200 Subject: [Python-Dev] thoughts on backporting __wrapped__ to 2.7? In-Reply-To: <22276.31903.569346.438240@turnbull.sk.tsukuba.ac.jp> References: <22276.31903.569346.438240@turnbull.sk.tsukuba.ac.jp> Message-ID: On 6 April 2016 at 15:03, Stephen J. Turnbull wrote: > Robert Collins writes: > > > Sadly that has the ordering bug of assigning __wrapped__ first and appears > > a little unmaintained based on the bug tracker :( > > You can fix two problems with one patch, then! > Not really - taking over a project is somewhat long winded; it would be centralising yet another backport which may-or-may-not-be-a-good-thing, and I'm not exactly overflowing with spare tuits. If someone wants to do it - great, more power to them, but the last thing we need is to move it from one unmaintained spot to another unmaintained spot. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud From greg.ewing at canterbury.ac.nz Tue Apr 12 01:40:16 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Apr 2016 17:40:16 +1200 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <-9219200259368253896@unknownmsgid> References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> Message-ID: <570C8A40.6020903@canterbury.ac.nz> Chris Barker - NOAA Federal wrote: > Why in the world do the os.path functions need to work with Path > objects? So that applications using path objects can pass them to library code that uses os.path to manipulate them. > I'm confused about what a bytes path IS -- is it encoded? It's a sequence of bytes identifying a file. 
Often it will be an encoding of some piece of text in the file system encoding, but there's no guarantee of that. > Can you assume it can be decoded? Only if you use an encoding in which all byte sequences are valid, such as latin1 or utf8+surrogateescape. > So the ONLY thing > you should do with it is pass it along to another low level system > call. Not quite -- you can separate it into components and work with them. Essentially the same set of operations that os.path provides. >>- the names would be fspath and __fspath__, since the result may be >>either a path name as text, or an encoded path name as bytes > > I like __pathname__ better because this entire effort is because we've > decided it's important to make the distinction between a "path" and > the text representation of said path. I agree -- the term "pathname" can cover both text and bytes. When posix talks about pathnames it's really talking about bytes. -- Greg From ethan at stoneleaf.us Tue Apr 12 02:00:14 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Apr 2016 23:00:14 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <-9219200259368253896@unknownmsgid> References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> Message-ID: <570C8EEE.6050904@stoneleaf.us> On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote: >> Consider os.path.join: > > Why in the world do the os.path functions need to work with Path > objects? (and other conforming objects) Because library XYZ that takes a path and wants to open it shouldn't have to care whether that path is a string or pathlib.Path -- but if os.open can't use pathlib.Path then the library has to care (or the user has to care). > This all started with the goal of using Path objects in the stdlib, > but that's for opening files, etc. Etc. as in os.join? os.stat? os.path.split? > Path is an alternative to os.path -- you don't need to use both. As a user you don't, no.
As a library that has no control over what kind of "path" is passed to you -- well, if os and os.path can accept Path objects then you can just use os and os.path; otherwise you have to use os and os.path if passed a str or bytes, and pathlib.Path if passed a pathlib.Path -- so you do have to use both. >> - the names would be fspath and __fspath__, since the result may be >> either a path name as text, or an encoded path name as bytes > > You just used the phrase "path name as bytes" -- so why is > __pathname__ inappropriate if it might return bytes? No, he used the phrase "*encoded* path name as bytes". Names are typically represented as text, and since bytes might be returned we don't want a signal that says text. > I like __pathname__ better because this entire effort is because we've > decided it's important to make the distinction between a "path" and > the text representation of said path. No, this entire effort is to make pathlib work with the rest of the stdlib. -- ~Ethan~ From stephen at xemacs.org Tue Apr 12 02:21:12 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 12 Apr 2016 15:21:12 +0900 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C0A0B.90109@sdamon.com> References: <570C0A0B.90109@sdamon.com> Message-ID: <22284.37848.204411.503483@turnbull.sk.tsukuba.ac.jp> Alexander Walters writes: > If there is headway being made, I do not see it. Filter out everything but the posts by Brett, and see if you still feel that way. (Other people have contributed[1], but that filter has about 20dB better S/N than the whole thread does.) Footnotes: [1] Brett may even claim none of the ideas are his. From stephen at xemacs.org Tue Apr 12 03:52:19 2016 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Tue, 12 Apr 2016 16:52:19 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> Message-ID: <22284.43315.54899.838953@turnbull.sk.tsukuba.ac.jp> INADA Naoki writes: > > Why not print(obj)? print(obj) will give mojibake by default if sys.getfilesystemencoding() != sys.getdefaultencoding(). > > str() is normal high-level API, and __fspath__ and os.fspath() should be > > low level API. > > Normal users shouldn't use __fspath__ and os.fspath(). Only library > > developers should use it. This is the price we pay for the stubbornness of the bytes-are-text-too meme. From p.f.moore at gmail.com Tue Apr 12 04:17:28 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 12 Apr 2016 09:17:28 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160411165354.GC8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> Message-ID: On 11 April 2016 at 17:53, Jon Ribbens wrote: >> You're limiting the subset of Python that people can use, >> understood. And you're trying to ensure that people can't do "bad >> things". Again, understood. But what subset are you actually allowing, >> and what things are you trying to protect against? (For example, I >> can't calculate sin(1.2) using the math module - why is that not >> allowed? > > It wasn't allowed in the earlier version because I wasn't allowing > import at all, because this is just an experiment. As it happens, > I added 'import' yesterday so yes you can use math.sin. Well, I'll ask the obvious question, then.
In allowing "import" did you allow "import ctypes"? If so, then I win :-) Or did you explicitly whitelist certain modules? And if so, which ones are they, and did I succeed if I manage to import a module you hadn't whitelisted? >> It feels at the moment as if I'm playing a game where I don't know the >> rules, and every time I think I scored a point, the rules are changed >> to retroactively disallow it. > > The challenge is to show some code that will escape from the sandbox, > in a way that is not trivially fixable with a tiny patch, or in a way > that demonstrates that such a large number of tiny patches would be > required as to be unworkable. But I'm still not clear when I count as "outside the sandbox", given that I don't know what the rules of what is allowed *in* the sandbox are... Paul From rosuav at gmail.com Tue Apr 12 04:28:34 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 18:28:34 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens wrote: > Anyway the code is at https://github.com/jribbens/unsafe > It requires Python 3.4 or later (it could probably be made to work on > Python 2.7 as well, but it would need some changes). Rather annoying point: Your interactive mode allows no editing keys (readline etc), and also doesn't have underscore for "last result", as that's a forbidden name. :( Makes tinkering fiddly. ChrisA From p.f.moore at gmail.com Tue Apr 12 04:31:21 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 12 Apr 2016 09:31:21 +0100 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> References: <570C1E13.4090909@stoneleaf.us> <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> Message-ID: On 12 April 2016 at 06:28, Stephen J. 
Turnbull wrote: > Donald Stufft writes: > > > I think yes and yes [__fspath__ and fspath should be allowed to > > handle bytes, otherwise] it seems like making it needlessly harder > > to deal with a bytes path > > It's not needless. This kind of polymorphism makes it hard to review > code locally. Once bytes get a foothold inside a text application, > they metastasize altogether too easily, and you end up with TypeErrors > or UnicodeErrors quite far from the origin. Debugging often requires > tracing data flows over hill and over dale while choking from the > dusty trail, or band-aids like a top-level "except UnicodeError: > log_and_quarantine(bytes)". I can't prove that returning bytes from > these APIs is a big risk in this sense, but I can't see a way to prove > that it's not, either, given that their point is duck-typing, and > therefore they may be generalized in the future, and by third parties. > > I understand that there are applications where it's bytes all the way > down, but by the very nature of computing systems, there are systems > where bytes are decoded to text. For historical reasons (the encoding > Tower of Babel), it's very error-prone to do that on demand. Best > practice is to do the conversion as close to the boundary as possible, > and process only text internally. > > In text applications, "bytes as carcinogen" is an apt metaphor. > > Now, I'm not Dutch, so I can't tell you it's obvious that the risk to > text-processing applications is more important than the inconvenience > to byte-shoveling applications. But there is a need to be > parsimonious with polymorphism. As someone who has done a lot of work helping projects to port from the 2.x bytes/text model to the 3.x model, I have similar concerns that rooting out the source of bytes objects appearing in a program could be an issue with the proposed "return either" approach. 
The most effective tool I have found in fixing programs with text/bytes issues is carefully and thoroughly annotating precisely which functions accept and return bytes, and which accept and return text. The sort of mixed-mode processing we're talking about here makes that substantially harder. And note that the signature of os.fspath can return bytes or text *independent* of the type of the argument - it's not a "bytes in, bytes out" function like the usual pattern of "polymorphic support for bytes". But just like Stephen, I have no feel for how significant the risk will be in real life. I've never worked on code that actually has a need for bytestring paths (particularly now that surrogateescape ensures that most cases "just work"). Paul From ncoghlan at gmail.com Tue Apr 12 04:56:44 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Apr 2016 18:56:44 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> References: <570C1E13.4090909@stoneleaf.us> <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> Message-ID: On 12 April 2016 at 15:28, Stephen J. Turnbull wrote: > Donald Stufft writes: > > > I think yes and yes [__fspath__ and fspath should be allowed to > > handle bytes, otherwise] it seems like making it needlessly harder > > to deal with a bytes path > > It's not needless. This kind of polymorphism makes it hard to review > code locally. Once bytes get a foothold inside a text application, > they metastasize altogether too easily, and you end up with TypeErrors > or UnicodeErrors quite far from the origin. Debugging often requires > tracing data flows over hill and over dale while choking from the > dusty trail, or band-aids like a top-level "except UnicodeError: > log_and_quarantine(bytes)". 
I can't prove that returning bytes from > these APIs is a big risk in this sense, but I can't see a way to prove > that it's not, either, given that their point is duck-typing, and > therefore they may be generalized in the future, and by third parties. > > I understand that there are applications where it's bytes all the way > down, but by the very nature of computing systems, there are systems > where bytes are decoded to text. For historical reasons (the encoding > Tower of Babel), it's very error-prone to do that on demand. Best > practice is to do the conversion as close to the boundary as possible, > and process only text internally. One possible way to address this concern would be to have the underlying protocol be bytes/str (since boundary code frequently needs to handle the paths-are-bytes assumption in POSIX), but offer an "os.fspathname" API that rejected bytes output from os.fspath. That is, it would be equivalent to: def fspathname(path): name = os.fspath(path) if not isinstance(name, str): raise TypeError("Expected str for pathname, not {}".format(type(name))) return name That way folks that wanted the clean "must be str" signature could use os.fspathname, while those that wanted to accept either could use the lower level os.fspath. The ambiguity in question here is inherent in the differences between the way POSIX and Windows work, so there are limits to how far we can go in hiding it without making things worse rather than better. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Tue Apr 12 04:57:37 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 18:57:37 +1000 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> Message-ID: On Tue, Apr 12, 2016 at 6:17 PM, Paul Moore wrote: > Well, I'll ask the obvious question, then. In allowing "import" did > you allow "import ctypes"? If so, then I win :-) Or did you explicitly > whitelist certain modules? And if so, which ones are they, and did I > succeed if I manage to import a module you hadn't whitelisted? The module whitelist is given at the top of the source code: _SAFE_MODULES = frozenset(( "base64", "binascii", "bisect", "calendar", "cmath", "crypt", "datetime", "decimal", "enum", "errno", "fractions", "functools", "hashlib", "hmac", "ipaddress", "itertools", "math", "numbers", "queue", "re", "statistics", "textwrap", "unicodedata", "urllib.parse", )) And yes, you win if you get another module. Interestingly, you're allowed to import urllib.parse, but not urllib itself; but "import urllib.parse" makes urllib available - and, since modules inside modules are blacklisted, "urllib.parse" doesn't exist (AttributeError). You can access the decimal module, and call decimal.getcontext(). This returns the same default context object that the "outer" Python uses; consequently, this sandboxing technique MUST NOT be used in any program that, now or ever in the future, uses the decimal module (or at least its default context; but I'm not sure how you'd be absolutely sure you never EVER use the default context). Even more curiously, you can "import fractions", but you don't get fractions.Fraction - though you *do* get fractions.Decimal. And importing enum gives you EnumMeta, but metaclasses seem to be broken, and you can't get enum.Enum. The sandbox code assumes that an attacker cannot create files in the current directory. 
rosuav at sikorsky:~/tmp/unsafe$ echo 'import sys; real_module = lambda mod: sys.modules[mod]' >hashlib.py rosuav at sikorsky:~/tmp/unsafe$ ./unsafe.py -i Python 3.6.0a0 (default:78b84ae0b745+, Apr 6 2016, 03:43:18) [GCC 5.3.1 20160323] on linux Type "help", "copyright", "credits" or "license" for more information. (SafeInteractiveConsole) >>> import hashlib >>> hashlib.real_module("sys") Setting LC_ALL and then working with calendar.LocaleTextCalendar() causes locale files to be read. I'm not sure if you can turn that into an exploit, but the attack surface depends on the installed locales on the system. This is still a massive game of whack-a-mole. ChrisA From jon+python-dev at unequivocal.co.uk Tue Apr 12 05:08:05 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 10:08:05 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: <20160412090805.GF8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 06:28:34PM +1000, Chris Angelico wrote: > On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens > wrote: > > Anyway the code is at https://github.com/jribbens/unsafe > > It requires Python 3.4 or later (it could probably be made to work on > > Python 2.7 as well, but it would need some changes). > > Rather annoying point: Your interactive mode allows no editing keys > (readline etc), and also doesn't have underscore for "last result", as > that's a forbidden name. :( Makes tinkering fiddly. It's just a subclass of the stdlib class code.InteractiveConsole, which seems not to offer those features unfortunately. From jon+python-dev at unequivocal.co.uk Tue Apr 12 06:06:23 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 11:06:23 +0100 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> Message-ID: <20160412100623.GG8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 06:57:37PM +1000, Chris Angelico wrote: > And yes, you win if you get another module. Interestingly, you're > allowed to import urllib.parse, but not urllib itself; but "import > urllib.parse" makes urllib available - and, since modules inside > modules are blacklisted, "urllib.parse" doesn't exist > (AttributeError). Yes, this is issue #3 on github. I'd need to spend a few minutes thinking about how to make importing of submodules work out properly. > You can access the decimal module, and call decimal.getcontext(). This > returns the same default context object that the "outer" Python uses; OK, decimal goes ;-) > Even more curiously, you can "import fractions", but you don't get > fractions.Fraction - though you *do* get fractions.Decimal. That seems to be because Fraction inherits from numbers.Number, which has a metaclass, so type(Fraction) is abc.ABCMeta not 'type'. That's obviously not a security hole and may well be fixable. > The sandbox code assumes that an attacker cannot create files in the > current directory. If the attacker can create such files then the system is already compromised even if you're not using any sandboxing system, because you won't be able to trust any normal imports from your own code. > Setting LC_ALL and then working with calendar.LocaleTextCalendar() > causes locale files to be read. I don't think that has any obvious relevance. Doing "import enum" causes "enum.py" to be read too, and that isn't a security hole. > This is still a massive game of whack-a-mole. No, it still isn't. If the names blacklist had to keep being extended then you would be right, but that hasn't happened so far. 
Whitelists by definition contain only a small, limited number of potential moles. The only thing you found above that even remotely approaches an exploit is the decimal.getcontext() thing, and even that I don't think you could use to do any code execution. From rosuav at gmail.com Tue Apr 12 06:27:14 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 20:27:14 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160412100623.GG8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> Message-ID: On Tue, Apr 12, 2016 at 8:06 PM, Jon Ribbens wrote: > On Tue, Apr 12, 2016 at 06:57:37PM +1000, Chris Angelico wrote: >> The sandbox code assumes that an attacker cannot create files in the >> current directory. > > If the attacker can create such files then the system is already > compromised even if you're not using any sandboxing system, because > you won't be able to trust any normal imports from your own code. Just confirming that, yeah. Though you could protect against it somewhat by pre-importing everything that can legally be imported; that way, at least the attack surface ceases once untrusted code starts executing. Consider it a privilege escalation attack; you can move from "create file in current directory" to "remote code execution" simply by creating hashlib.py and then importing it. >> Setting LC_ALL and then working with calendar.LocaleTextCalendar() >> causes locale files to be read. > > I don't think that has any obvious relevance. Doing "import enum" > causes "enum.py" to be read too, and that isn't a security hole. I mean the system locale files, not just locale.py itself. If nothing else, it's a means of discovering info about the system. 
I don't know what you can get by figuring out what locales are installed, but it's another concern to think about. >> This is still a massive game of whack-a-mole. > > No, it still isn't. If the names blacklist had to keep being extended > then you would be right, but that hasn't happened so far. Whitelists > by definition contain only a small, limited number of potential moles. > > The only thing you found above that even remotely approaches an > exploit is the decimal.getcontext() thing, and even that I don't > think you could use to do any code execution. decimal.getcontext is a simple and obvious example of a way that global mutable objects can be accessed across the boundary. There is no way to mathematically prove that there are no more, so it's still a matter of blacklisting. I still think you need to work out a "minimum viable set" and set down some concrete rules: if any feature in this set has to be blacklisted in order to achieve security, the experiment has failed. ChrisA From p.f.moore at gmail.com Tue Apr 12 06:41:14 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 12 Apr 2016 11:41:14 +0100 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C1560.7070105@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> <570C1560.7070105@mail.de> Message-ID: On 11 April 2016 at 22:21, Sven R. Kunze wrote: > On 11.04.2016 23:08, Random832 wrote: >> >> On Mon, Apr 11, 2016, at 17:04, Sven R. Kunze wrote: >>> >>> PS: The only way out that I can imagine is to fix pathlib. I am not in >>> favor of fixing functions of "os" and "os.path" to except "path" >>> objects; >> >> Why not? > > > It occurred to me after pondering over Paul's comments. > > "os" and "os.path" is just a completely different level of abstraction. > There is just no need to mess with them. 
> > The initial failure of my colleague and me of using pathlib can be solely > attributed to pathlib's lack of functionality. Not to the incompatibility of > "os" nor "os.path" with "Path" objects. As your thoughts appear to have been triggered by my comments, I feel I should clarify. 1. I like pathlib even as it is right now, and I'm strongly -1 on removing it. 2. The "external dependency" aspect of 3rd party solutions makes them far less useful to me. 3. The work on improving integration with the stdlib (which is nearly sorted now, as far as I can see) is a big improvement, and I'm all in favour. But even without it, I wouldn't want pathlib to be removed. 4. There are further improvements that could be made to pathlib, certainly, but again they are optional, and pathlib is fine without them. 5. I wish more 3rd party code integrated better with pathlib. The improved integration work might help with this. But ultimately, Python 2 compatibility is likely to be the biggest block (either perceived or real - we can make pathlib support as simple as possible, but some 3rd party authors will remain unwilling to add support for Python 3 only features in the short term). This isn't a pathlib problem. 6. There will probably always be a place for low-level os/os.path code. Adding support in those modules for pathlib doesn't affect that fact, but does make it easier to use pathlib "seamlessly", so why not do so? tl; dr; I'm 100% in favour of pathlib, and in the direction the current discussion (excluding "let's give up on pathlib" digressions) is going. Paul From jon+python-dev at unequivocal.co.uk Tue Apr 12 07:10:40 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 12:10:40 +0100 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: References: <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> Message-ID: <20160412111040.GH8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 08:27:14PM +1000, Chris Angelico wrote: > On Tue, Apr 12, 2016 at 8:06 PM, Jon Ribbens > wrote: > > No, it still isn't. If the names blacklist had to keep being extended > > then you would be right, but that hasn't happened so far. Whitelists > > by definition contain only a small, limited number of potential moles. > > > > The only thing you found above that even remotely approaches an > > exploit is the decimal.getcontext() thing, and even that I don't > > think you could use to do any code execution. > > decimal.getcontext is a simple and obvious example of a way that > global mutable objects can be accessed across the boundary. There is > no way to mathematically prove that there are no more, so it's still a > matter of blacklisting. No, it's a matter of reducing the whitelist. I must admit that I don't understand in what way this is not already clear. Look: >>> len(unsafe._SAFE_MODULES) 23 I could "mathematically prove" that there are no more security holes in that list by reducing its length to zero. There are still plenty of circumstances in which the experiment would be a useful tool even with no modules allowed to be imported. > I still think you need to work out a "minimum viable set" and set down > some concrete rules: if any feature in this set has to be blacklisted > in order to achieve security, the experiment has failed. The "minimum viable set" in my view would be: no builtins at all, only allowing eval() not exec(), and disallowing yield [from], lambdas and generator expressions. 
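As an illustration of the "minimum viable set" described above, here is a minimal sketch, assuming the listed restrictions map onto the ast node classes Yield, YieldFrom, Lambda and GeneratorExp. The names BANNED_NODES and restricted_eval are hypothetical; this is not code from the unsafe module under discussion, and nothing here shows that the result is secure (attribute access, for one, is not restricted at all).

```python
import ast

# Assumed mapping of "yield [from], lambdas and generator expressions"
# onto ast node classes.
BANNED_NODES = (ast.Yield, ast.YieldFrom, ast.Lambda, ast.GeneratorExp)


def restricted_eval(source, names=None):
    """Evaluate a single expression with no builtins (hypothetical helper)."""
    # mode="eval" accepts one expression only, so statements are a SyntaxError.
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, BANNED_NODES):
            raise ValueError("disallowed syntax: " + type(node).__name__)
    code = compile(tree, "<untrusted>", "eval")
    # Supplying an explicit "__builtins__" key stops eval() from inserting
    # a reference to the real builtins module into the globals dict.
    return eval(code, {"__builtins__": {}}, dict(names or {}))
```

With this sketch, restricted_eval("x * 2", {"x": 21}) evaluates normally, while "lambda: 1" is rejected by the node check and "len('a')" fails with NameError because no builtins are visible.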
From jon+python-dev at unequivocal.co.uk Tue Apr 12 07:14:45 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 12:14:45 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> Message-ID: <20160412111445.GI8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 06:21:04AM -0400, Isaac Morland wrote: > On Tue, 12 Apr 2016, Jon Ribbens wrote: > >>This is still a massive game of whack-a-mole. > > > >No, it still isn't. If the names blacklist had to keep being extended > >then you would be right, but that hasn't happened so far. Whitelists > >by definition contain only a small, limited number of potential moles. > > > >The only thing you found above that even remotely approaches an > >exploit is the decimal.getcontext() thing, and even that I don't > >think you could use to do any code execution. > > "I don't think"? > > Where's the formal proof? I disallowed the module completely, that's the proof. > Without a proof, this is indeed just a game of whack-a-mole. Almost no computer programs are ever "formally proved" to be secure. None of those that run the global Internet are. I don't see why it makes any sense to demand that my experiment be held to a massively higher standard than the rest of the code everyone relies on every day. From fijall at gmail.com Tue Apr 12 07:38:09 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Tue, 12 Apr 2016 13:38:09 +0200 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160412111445.GI8206@unequivocal.co.uk> References: <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111445.GI8206@unequivocal.co.uk> Message-ID: On Tue, Apr 12, 2016 at 1:14 PM, Jon Ribbens wrote: > On Tue, Apr 12, 2016 at 06:21:04AM -0400, Isaac Morland wrote: >> On Tue, 12 Apr 2016, Jon Ribbens wrote: >> >>This is still a massive game of whack-a-mole. >> > >> >No, it still isn't. If the names blacklist had to keep being extended >> >then you would be right, but that hasn't happened so far. Whitelists >> >by definition contain only a small, limited number of potential moles. >> > >> >The only thing you found above that even remotely approaches an >> >exploit is the decimal.getcontext() thing, and even that I don't >> >think you could use to do any code execution. >> >> "I don't think"? >> >> Where's the formal proof? > > I disallowed the module completely, that's the proof. > >> Without a proof, this is indeed just a game of whack-a-mole. > > Almost no computer programs are ever "formally proved" to be secure. > None of those that run the global Internet are. I don't see why it > makes any sense to demand that my experiment be held to a massively > higher standard than the rest of the code everyone relies on every day. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com Jon, let me reiterate. You asked people to break it (that's the title of the thread) and they did so almost immediately. Then you patched the thing and asked them to break it again and they did. 
Now the faulty assumption here is that this procedure, repeated enough times will produce a secure environment - this is not how security works, you need to be secure against people who will spend more than 5 minutes and who are not on this list or reading this incredibly long email chain. You can't do that just by asking on the mailing list and whacking all the examples. As others pointed out, this particular approach (with maybe different details) has been tried again and again and again and the result has been the same - you end up with either a completely unusable python (the python that can't run anything is trivially secure) or you end up with something that's insecure. I suggest you look instead at something like PyPy sandbox - which systematically replaces all external calls with a call to a proxy. Because PyPy is written in RPython, you can do that - the amount of code that needs reviewing is relatively small, a couple pages of code. The code you need to review in order to be even remotely secure is much larger - it's the amount of C code you can call from your python with or without knowing that it can happen. Cheers, fijal From rosuav at gmail.com Tue Apr 12 08:05:22 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 22:05:22 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160412111040.GH8206@unequivocal.co.uk> References: <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111040.GH8206@unequivocal.co.uk> Message-ID: On Tue, Apr 12, 2016 at 9:10 PM, Jon Ribbens wrote: > On Tue, Apr 12, 2016 at 08:27:14PM +1000, Chris Angelico wrote: >> decimal.getcontext is a simple and obvious example of a way that >> global mutable objects can be accessed across the boundary. 
There is >> no way to mathematically prove that there are no more, so it's still a >> matter of blacklisting. > > No, it's a matter of reducing the whitelist. I must admit that > I don't understand in what way this is not already clear. Look: > > >>> len(unsafe._SAFE_MODULES) > 23 > > I could "mathematically prove" that there are no more security holes > in that list by reducing its length to zero. There are still plenty > of circumstances in which the experiment would be a useful tool even > with no modules allowed to be imported. Yes, you just removed decimal because of getcontext. What about the next module with that kind of issue? Or what about the next non-underscore attribute on a core type that can cause you grief (like how async functions leak stack frames)? >> I still think you need to work out a "minimum viable set" and set down >> some concrete rules: if any feature in this set has to be blacklisted >> in order to achieve security, the experiment has failed. > > The "minimum viable set" in my view would be: no builtins at all, > only allowing eval() not exec(), and disallowing yield [from], > lambdas and generator expressions. Then start with that. Don't give ANYTHING else. Otherwise you're still playing with the blacklist. But at that point, you pretty much have something that can't be recognized as Python. You may as well start from a completely different basis and design your own expression evaluator, maybe making use of parse-to-AST, but not actually eval'ing the source code. That's how fundamental this issue is - to dodge the security problems, you get to the point where you've dodged all of what makes Python Python. ChrisA From victor.stinner at gmail.com Tue Apr 12 08:05:06 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 12 Apr 2016 14:05:06 +0200 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160412111040.GH8206@unequivocal.co.uk> References: <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111040.GH8206@unequivocal.co.uk> Message-ID: 2016-04-12 13:10 GMT+02:00 Jon Ribbens : > No, it's a matter of reducing the whitelist. I must admit that > I don't understand in what way this is not already clear. Look: > > >>> len(unsafe._SAFE_MODULES) > 23 You don't understand that even if the visible "Python scope", "Python namespace", or whatever you want to call it (the code that is accessible from your sandbox) looks very tiny, the real effective code is HUGE. For example, you give full access to the str type which is made of 20K lines of C code:
haypo at smithers$ wc -l Objects/unicodeobject.c Objects/unicodectype.c Objects/stringlib/*h
 15670 Objects/unicodeobject.c
   297 Objects/unicodectype.c
    29 Objects/stringlib/asciilib.h
   827 Objects/stringlib/codecs.h
    27 Objects/stringlib/count.h
   109 Objects/stringlib/ctype.h
    25 Objects/stringlib/eq.h
   250 Objects/stringlib/fastsearch.h
   201 Objects/stringlib/find.h
   133 Objects/stringlib/find_max_char.h
   140 Objects/stringlib/join.h
   180 Objects/stringlib/localeutil.h
   116 Objects/stringlib/partition.h
    53 Objects/stringlib/replace.h
   390 Objects/stringlib/split.h
    28 Objects/stringlib/stringdefs.h
   266 Objects/stringlib/transmogrify.h
    30 Objects/stringlib/ucs1lib.h
    29 Objects/stringlib/ucs2lib.h
    29 Objects/stringlib/ucs4lib.h
    11 Objects/stringlib/undef.h
    32 Objects/stringlib/unicodedefs.h
  1284 Objects/stringlib/unicode_format.h
 20156 total
Did you carefully review *all* these lines? If a single C line gives access to the real Python namespace, the game is over. In a few minutes, I found "{0.__class__}".format(obj) which is not a full escape of the sandbox, but it's just to give one example.
With more time, I'm sure that a line can be found in the str type to escape your sandbox. > I could "mathematically prove" that there are no more security holes > in that list by reducing its length to zero. You only see a very tiny portion of the real attack surface. > The "minimum viable set" in my view would be: no builtins at all, > only allowing eval() not exec(), and disallowing yield [from], > lambdas and generator expressions. IMHO it's a waste of time to try to reduce the great Python with batteries included to a simple calculator to compute 1+2. You will never be able to fix all holes, there are too many holes in your sandbox. It's very easy to implement your own calculator in pure Python, from the parser to the code to compute the operators. If you write the whole code yourself, it's much easier to control what is allowed and put limits. For example, with your own code, you can put limits on the maximum number, whereas your sandbox will kill your CPU and memory if you try 2**(2**100) (no builtin function required for this "exploit"). Victor From victor.stinner at gmail.com Tue Apr 12 08:16:57 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 12 Apr 2016 14:16:57 +0200 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160408141847.GQ4951@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: 2016-04-08 16:18 GMT+02:00 Jon Ribbens : > I've made another attempt at Python sandboxing, which does something > which I've not seen tried before - using the 'ast' module to do static > analysis of the untrusted code before it's executed, to prevent most > of the sneaky tricks that have been used to break out of past attempts > at sandboxes. Right, it blocks the most trivial attacks against sandboxes. But you only fixed a few holes, there is still a wide range of holes to escape your sandbox. I read your code and the code of CPython. I found many issues.
Your sandbox runs untrusted code in a new namespace. The game is to get access to the outer namespace, the real Python namespace. For example, get the namespace of the unsafe module. Your bet is that blocking access to "_" variables, using a whitelist of modules and a few other protections is enough to block access to the real namespace. The problem is that Python provides a very wide range of tools for introspection. I expected to find a hole using the C code, but in fact, it was much simpler than that. Your "safe import" hides real functions with a proxy. Ok. But the code of modules is still run in the real namespace, whereas I expected modules to run in the untrusted (restricted) namespace. The game is now to find a way to retrieve content from the real namespace using any function exposed in modules. I found functools.update_wrapper(). I was very surprised because this function calls getattr() and setattr(), whereas your sandbox replaces these builtin functions. In fact, the "safe" getattr and setattr are only installed in the untrusted namespace, and as I wrote, the modules run in the real Python namespace. > I would be very interested to see if anyone can manage to break it.
So here you have:
---
import functools

# any proxy function from unsafe.py
import base64
src = base64.main

# hack to get any attribute of an object
def getattr(obj, attr):
    secret = None

    class A:
        def __setattr__(self, key, value):
            nonlocal secret
            if key == attr:
                secret = value

    dst = A()
    functools.update_wrapper(dst, src, assigned=(attr,), updated=())
    return secret

builtins = getattr(base64.main, "__globals__")["__builtins__"]
fn = "/tmp/owned"
with builtins.open(fn, "w") as f:
    f.write("game over!\n")
---
The exploit is based on two things: * update_wrapper() is used to get the secret attribute using the real getattr() function * update_wrapper() + A.__setattr__ are used to pass the secret from the real namespace to the untrusted namespace > Bugs which are trivially fixable are of course welcomed, but the real > question is: is this approach basically sound, or is it fundamentally > unworkable? You can block the functools.update_wrapper(), or even the whole functools module. But it will not fix the root cause: modules must run in the untrusted namespace. In pysandbox, I have code to ensure that all modules run in the untrusted namespace: see CleanupBuiltins in sandbox/builtins.py. But it was not enough, many vulnerabilities were found even with all my protections. I'm sure that many others will find other ways to escape your sandbox with enough time. It's a matter of time, not a matter of whitelists. As I wrote in my long explanation of why pysandbox is broken by design, writing a sandbox inside CPython doesn't work. In fact, what you want to restrict is the access to limited resources like CPU and memory, and block access to the filesystem. This is the job of the operating system, and external sandboxes help to block access to the filesystem. Victor From jon+python-dev at unequivocal.co.uk Tue Apr 12 08:18:33 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 13:18:33 +0100 Subject: [Python-Dev] Challenge: Please break this!
(a.k.a restricted mode revisited) In-Reply-To: References: <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111445.GI8206@unequivocal.co.uk> Message-ID: <20160412121833.GK8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 01:38:09PM +0200, Maciej Fijalkowski wrote: > Jon, let me reiterate. You asked people to break it (that's the title > of the thread) and they did so almost immediately. Then you patched > the thing and asked them to break it again and they did. Now the > faulty assumption here is that this procedure, repeated enough times > will produce a secure environment - this is not how security works, That is not an accurate summary of what has happened so far, nor am I making that assumption. You are misunderstanding the purpose of the experiment - I am not sure how, as I have tried to be quite clear. The question is: with a minimal (or empty) set of builtins, and a restriction on ast.Name and ast.Attribute nodes, can exec/eval be made 'safe' so they cannot execute code outside the sandbox. The answer appears to be "yes", if the restriction is "^f?_". (If you additionally inject external objects to the namespace then they need to be proxied and mro() prevented.) > You can't do that just by asking on the mailing list and whacking > all the examples. If anyone had managed to find any more examples of holes in the original featureset after the first couple then I would agree with you, but they haven't. > As others pointed out, this particular approach (with maybe > different details) has been tried again and again and again This simply isn't true either. As far as I can see, only RestrictedPython has tried anything remotely similar, and to the best of my ability to determine, that project is not considered a failure.
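The "^f?_" restriction on ast.Name and ast.Attribute nodes mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the unsafe module: FORBIDDEN and check_names are hypothetical names, and the sketch inspects only the two node types named in the message (function arguments, keywords, import aliases and so on are not covered).

```python
import ast
import re

# The restriction quoted above: identifiers starting with "_" or "f_".
FORBIDDEN = re.compile(r"^f?_")


def check_names(source):
    """Raise ValueError if a Name or Attribute node uses a forbidden identifier.

    Returns the parsed tree so the caller can compile it afterwards.
    """
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            ident = node.id
        elif isinstance(node, ast.Attribute):
            ident = node.attr
        else:
            continue
        if FORBIDDEN.match(ident):
            raise ValueError("forbidden identifier: " + ident)
    return tree
```

Under this sketch, "().__class__" and "frame.f_globals" are rejected, while "foo.bar" passes. String literals are not inspected, so an expression such as "{0.__class__}".format(obj), which comes up elsewhere in this thread, passes the static check and would have to be handled some other way.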
From victor.stinner at gmail.com Tue Apr 12 08:20:37 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 12 Apr 2016 14:20:37 +0200 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111445.GI8206@unequivocal.co.uk> Message-ID: 2016-04-12 13:38 GMT+02:00 Maciej Fijalkowski : > (...) you end up with either a > completely unusable python (the python that can't run anything is > trivially secure) Yeah, that's the obvious question: what's the purpose of such a very limited Python subset, for example something limited to int with a few operators (+ - * /)? That's also why I gave up on pysandbox. It became impossible to execute anything more complex than a hello world. By the way, I noticed that enum.Enum and enum.EnumMeta don't work in your sandbox. Victor From victor.stinner at gmail.com Tue Apr 12 08:24:31 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 12 Apr 2016 14:24:31 +0200 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160412121833.GK8206@unequivocal.co.uk> References: <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111445.GI8206@unequivocal.co.uk> <20160412121833.GK8206@unequivocal.co.uk> Message-ID: 2016-04-12 14:18 GMT+02:00 Jon Ribbens : > The question is: with a minimal (or empty) set of builtins, and a > restriction on ast.Name and ast.Attribute nodes, can exec/eval be > made 'safe' so they cannot execute code outside the sandbox. According to multiple exploits listed in this thread, no, it's not possible.
> If anyone had managed to find any more examples of holes in the > original featureset after the first couple then I would agree with > you, but they haven't. See my latest exploit using functools.update_wrapper() + A.__setattr__() ;-) >> As others pointed out, this particular approach (with maybe >> different details) has been tried again and again and again > > This simply isn't true either. As far as I can see, only > RestrictedPython has tried anything remotely similar, and > to the best of my ability to determine, that project is not > considerd a failure. IMHO nobody seriously audited RestrictedPython. It doesn't mean that it's secure. When it was created, security was less important than nowadays. Victor From victor.stinner at gmail.com Tue Apr 12 08:31:19 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 12 Apr 2016 14:31:19 +0200 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: 2016-04-12 14:16 GMT+02:00 Victor Stinner : > I read your code and the code of CPython. I found many issues. > (...) > The exploit is based on two things: > > * update_wrapper() is used to get the secret attribute using the real > getattr() function > * update_wrapper() + A.__setattr__ are used to pass the secret from > the real namespace to the untrusted namespace Oh, I forgot to mention another vulnerability: you block access to attributes by replacing getattr and by analyzing the AST. Ok, but one more time, it's not enough. If you get access to obj.__dict__, you will likely get access to any attribute using obj_dict[attr] instead of obj.attr. I wrote pysandbox because I liked Tav's idea of *removing* sensitive dictionary keys of sensitive types like functions, frames and code objects. Again, it was not enough. 
Victor From jon+python-dev at unequivocal.co.uk Tue Apr 12 08:31:45 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 13:31:45 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111040.GH8206@unequivocal.co.uk> Message-ID: <20160412123145.GL8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 02:05:06PM +0200, Victor Stinner wrote: > 2016-04-12 13:10 GMT+02:00 Jon Ribbens : > > No, it's a matter of reducing the whitelist. I must admit that > > I don't understand in what way this is not already clear. Look: > > > > >>> len(unsafe._SAFE_MODULES) > > 23 > > You don't understand that even if the visible "Python scope", "Python > namespace", or call it as you want (the code that is accessible from > your sandbox) looks very tiny, the real effictive code is HUGE. You are mistaken, I do understand that. > In a few minutes, I found "{0.__class__}".format(obj) which is not a > full escape of the sandbox, but it's just to give one example. It's something I'd already thought of, and it's not an escape at all. > > I could "mathematically prove" that there are no more security holes > > in that list by reducing its length to zero. > > You only see a very tiny portion of the real attack surface. You've misunderstood my comment - I was saying that the security holes from imported modules can be easily eliminated. That doesn't say anything about security holes not from imported modules, of course. > > The "minimum viable set" in my view would be: no builtins at all, > > only allowing eval() not exec(), and disallowing yield [from], > > lambdas and generator expressions. > > IMHO it's a waste of time to try to reduce the great Python with > battery included to a simple calculator to compute 1+2. And in my opinion it isn't. 
There are plenty of use cases for such a thing. Take a look at this for example: https://developer.blender.org/D1862 > It's very easy to implement your own calculator in pure Python, from > the parser to the code to compute the operators. If you write yourself > the whole code, it's much easier to control what is allowed and put > limits. For example, with your own code, you can put limits on the > maximum number, whereas your sandbox will kill your CPU and memory if > you try 2**(2**100) (no builtin function required for this "exploit"). Yes, I'd already thought of that too, although if you allow functions and methods to be called (which they are, in my minimal viable set suggestion above) then I think perhaps you've not actually bought yourself very much with all that work. From jon+python-dev at unequivocal.co.uk Tue Apr 12 08:42:31 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 13:42:31 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: <20160412124231.GM8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 02:31:19PM +0200, Victor Stinner wrote: > Oh, I forgot to mention another vulnerability: you block access to > attributes by replacing getattr and by analyzing the AST. Ok, but one > more time, it's not enough. If you get access to obj.__dict__, you > will likely get access to any attribute using obj_dict[attr] instead > of obj.attr. That's not a vulnerability, and it's something I already explicitly mentioned - if you can get a function to return an object's __dict__ then you win. The question is: can you do that? From rosuav at gmail.com Tue Apr 12 08:45:06 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 22:45:06 +1000 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160412124231.GM8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160412124231.GM8206@unequivocal.co.uk> Message-ID: On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens wrote: > On Tue, Apr 12, 2016 at 02:31:19PM +0200, Victor Stinner wrote: >> Oh, I forgot to mention another vulnerability: you block access to >> attributes by replacing getattr and by analyzing the AST. Ok, but one >> more time, it's not enough. If you get access to obj.__dict__, you >> will likely get access to any attribute using obj_dict[attr] instead >> of obj.attr. > > That's not a vulnerability, and it's something I already explicitly > mentioned - if you can get a function to return an object's __dict__ > then you win. The question is: can you do that? The question is, rather: Can you prove that we cannot? ChrisA From jon+python-dev at unequivocal.co.uk Tue Apr 12 08:48:22 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 13:48:22 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> Message-ID: <20160412124822.GN8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 02:16:57PM +0200, Victor Stinner wrote: > I read your code and the code of CPython. I found many issues. Thanks for your efforts. > Your "safe import" hides real functions with a proxy. Ok. But the code > of modules is still run in the real namespace, Yes, that was the intention. > I found functools.update_wrapper(). I was very surprised because this > function calls getattr() and setattr(), whereas your sandbox replaces > these builtin functions. Good point. It seems it was almost certainly foolish of me to add 'import' back in in response to peoples' comments while my original concept was still being discussed. > So here you have: > --- > import functools Thanks, that was pretty clever. 
I've of course fixed it by reducing the list of imports (a lot, since I hadn't really audited them at all). But you make a good point. From jon+python-dev at unequivocal.co.uk Tue Apr 12 08:49:50 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 13:49:50 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160412124231.GM8206@unequivocal.co.uk> Message-ID: <20160412124950.GO8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 10:45:06PM +1000, Chris Angelico wrote: > On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens > wrote: > > That's not a vulnerability, and it's something I already explicitly > > mentioned - if you can get a function to return an object's __dict__ > > then you win. The question is: can you do that? > > The question is, rather: Can you prove that we cannot? I refer you to the answer given previously. Can you prove you cannot write code to escape JavaScript sandboxes? No? Then why have you not disabled JavaScript in your browser? From rosuav at gmail.com Tue Apr 12 09:03:11 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 23:03:11 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160412124950.GO8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160412124231.GM8206@unequivocal.co.uk> <20160412124950.GO8206@unequivocal.co.uk> Message-ID: On Tue, Apr 12, 2016 at 10:49 PM, Jon Ribbens wrote: > On Tue, Apr 12, 2016 at 10:45:06PM +1000, Chris Angelico wrote: >> On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens >> wrote: >> > That's not a vulnerability, and it's something I already explicitly >> > mentioned - if you can get a function to return an object's __dict__ >> > then you win. The question is: can you do that? >> >> The question is, rather: Can you prove that we cannot?
> > I refer you to the answer given previously. Can you prove you cannot > write code to escape JavaScript sandboxes? No? Then why have you not > disabled JavaScript in your browser? I personally cannot, any more than I can prove that SSL is secure or that my Linux+Apache system doesn't allow remote code execution [1]. I trust other people to, and then make a value judgement: is it worth breaking all the web sites that depend on it? (And sometimes the answer is "yes".) One of the key differences with scripts in web browsers is that there *is* no "outer environment" to access. Remember what I said about the difference between Python-in-Python sandboxing and, say, Lua-in-Python? One tiny exploit in Python-in-Python and you suddenly gain access to the entire outer environment, and it's game over. One tiny exploit in Lua-in-Python and you have whatever that exploit gave you, nothing more. In fact, if you're prepared to forfeit almost all of Python's power to achieve security, you probably should look into embedding a JavaScript or Lua engine in your Python code. You'll get a comparable expression evaluator, and most people won't be able to tell the difference. You've already cut the set of modules down to just cmath, datetime, math, and re; I suspect re is next on the chopping block (it has a global cache - if the outer system uses a regular expression more than once, it would potentially be possible to mess with it in the cache, and then next time it gets used, the injected code gets run), and datetime might not be that far behind. And if they do go, all you have left is a scientific calculator. You can implement that in any language you like. ChrisA [1] And if anyone mentions PHP, I will set him to work on the hardest PHP problem I know of - no, not securing it. I mean convincing end users that it's not necessary. Securing it is trivial by comparison. 
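The "scientific calculator" end state Chris describes can be sketched in a few lines of Python by walking the AST and accepting only a whitelist of node types. This is a minimal illustration under stated assumptions, not Jon's sandbox and not a security boundary: the names `calc`, `_BINARY`, `_UNARY` and `_FUNCTIONS` are hypothetical, exponentiation is deliberately left out, nothing here guards against resource exhaustion, and `ast.Constant` assumes Python 3.8 or later.

```python
import ast
import math
import operator

# Whitelisted operators; any node type not listed here is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
# Hypothetical whitelist of callable names for this sketch.
_FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}


def calc(expression):
    """Evaluate a numeric expression using only whitelisted AST nodes."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # type() rather than isinstance() so that bool is not accepted as int.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCTIONS and not node.keywords):
            return _FUNCTIONS[node.func.id](*[walk(arg) for arg in node.args])
        # Attribute access, names, subscripts, etc. all end up here.
        raise ValueError("disallowed syntax: " + type(node).__name__)
    return walk(ast.parse(expression, mode="eval"))
```

Because attribute access and bare names are never evaluated, an input such as `(1).__class__` is rejected before any Python object is touched. That is the trade-off discussed above: very little power is left in exchange for a small surface to audit.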
From steve at pearwood.info Tue Apr 12 09:12:27 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 12 Apr 2016 23:12:27 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111040.GH8206@unequivocal.co.uk> Message-ID: <20160412131226.GB1819@ando.pearwood.info> I haven't been following this thread in detail, so perhaps I have missed something, but I have a question... On Tue, Apr 12, 2016 at 02:05:06PM +0200, Victor Stinner wrote: > You don't understand that even if the visible "Python scope", "Python > namespace", or call it as you want (the code that is accessible from > your sandbox) looks very tiny, the real effictive code is HUGE. For > example, you give a full access to the str type which is made of 20K > lines of C code: > > haypo at smithers$ wc -l Objects/unicodeobject.c Objects/unicodectype.c > Objects/stringlib/*h > 15670 Objects/unicodeobject.c [...] > 1284 Objects/stringlib/unicode_format.h > 20156 total > > Did you review carefully *all* these lines? If a single C line gives > access to the real Python namespace, the game is over. I don't follow this logic. Jon's sandbox doesn't provide an interface to calling arbitrary lines of C code from Python. It is limited to only a restricted set of Python operations. So sticking to string methods for the sake of discussion, it doesn't matter if (let's say) str.upper has access to the real Python namespace. There is no API for str.upper to return that namespace. It only returns a new string. So where is the error in the following reasoning? There are 44 string methods, excluding those that start with an underscore. 
So if Jon audits those 44 methods, and determines which ones return (let's say) strings and which give access to namespaces, then he can block the ones which give access to namespaces and allow the ones which return strings. To give a concrete example... suppose that the C locale library is unsafe. Further, let's suppose that the str.isdigit method calls code from the C locale library, to determine whether or not the string is made up of locale-specific digits. How does this make str.isdigit (potentially) unsafe? Regardless of what happens inside the method, it still returns either True or False and nothing else. There's no str.isdigit API to access the locale library. I can think of one possible threat. Suppose that the locale library has a bug, so that calling "aardvark".isdigit seg faults, potentially executing arbitrary C code, but at the very least crashing the application. Is that the sort of attack you're concerned by? > In a few minutes, I found "{0.__class__}".format(obj) which is not a > full escape of the sandbox, but it's just to give one example. With > more time, I'm sure that a line can be found in the str type to escape > your sandbox. Maybe so. And then Jon will fix that vulnerability. And somebody will find a new one. And he'll fix that too, or decide that it is too hard to fix and give up. That's how security works. Even software designed for security can have exploitable bugs: http://securityvulns.com/news/FreeBSD/jail/chdir.html It seems unfair to me to hold Jon to a higher standard than we hold people like Apple, or the Linux kernal devs. I fully accept and respect your personal opinion, based on your experience, that Jon's tactic is doomed to failure. But if he needs to learn this for himself, just as you had to learn it for yourself (otherwise you wouldn't have started your own sandbox project), I can respect that too. Progress depends on the unreasonable person who thinks they can overturn the conventional wisdom. 
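Victor's `"{0.__class__}".format(obj)` observation is easy to reproduce. The sketch below assumes a hypothetical `guarded_getattr` standing in for a sandbox's replacement of the `getattr` builtin (it is not Jon's code). It shows that the attribute lookup performed inside `str.format` never goes through that replacement.

```python
def guarded_getattr(obj, name, *default):
    """Hypothetical stand-in for a sandbox's getattr replacement."""
    if name.startswith("_"):
        raise AttributeError("access to %r is blocked" % name)
    return getattr(obj, name, *default)


def demo():
    # Sandboxed code only sees the guarded getattr as a builtin.
    namespace = {"__builtins__": {"getattr": guarded_getattr}}
    blocked = False
    try:
        eval("getattr(42, '__class__')", namespace)
    except AttributeError:
        blocked = True
    # The format mini-language resolves ".__class__" in C code,
    # without consulting the replaced builtin.
    leaked = eval("'{0.__class__}'.format(42)", namespace)
    return blocked, leaked
```

Here the direct call is refused while the format string still reaches `__class__`. As Victor says, this is not a full escape by itself, only an example of a lookup path that a Python-level guard does not see.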
You're telling Jon not to bother trying to sandbox CPython, he should use PyPy's sandbox instead. But if the PyPy people had believed the conventional wisdom that you can't sandbox Python, they wouldn't have a sandbox either. Even if the only thing we learn from Jon's experiment is a new set of tricks for breaking out of the sandbox, that's still interesting, if not useful. And maybe he'll find some combination of whielist and OS-level jail that together makes a practical sandbox. And if not, well, it's his own time he is wasting. > IMHO it's a waste of time to try to reduce the great Python with > battery included to a simple calculator to compute 1+2. Completely agree. But hopefully the whitelist won't be that restrictive, and will allow subtraction and multiplication as well :-) -- Steve From rosuav at gmail.com Tue Apr 12 09:19:53 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Apr 2016 23:19:53 +1000 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: <20160412131226.GB1819@ando.pearwood.info> References: <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111040.GH8206@unequivocal.co.uk> <20160412131226.GB1819@ando.pearwood.info> Message-ID: On Tue, Apr 12, 2016 at 11:12 PM, Steven D'Aprano wrote: > To give a concrete example... suppose that the C locale library is > unsafe. Further, let's suppose that the str.isdigit method calls code > from the C locale library, to determine whether or not the string is > made up of locale-specific digits. How does this make str.isdigit > (potentially) unsafe? Regardless of what happens inside the method, it > still returns either True or False and nothing else. There's no > str.isdigit API to access the locale library. > > I can think of one possible threat. 
Suppose that the locale library has > a bug, so that calling "aardvark".isdigit seg faults, potentially > executing arbitrary C code, but at the very least crashing the > application. Is that the sort of attack you're concerned by? That is a potentially significant attack vector, as it depends on a lot of external-to-Python information (the current locale, for instance; and we've seen exploits that involve remotely setting environment variables, which could include LC_ALL). However, you're right that it isn't the concern here. There is one other thing to worry about, and that's anything where the "inner" system can affect or influence the "outer" system. With the str type, that's unlikely (since strings are immutable), but I raised the potential concern of the regex cache, as there's a chance someone could attack that. The mere presence of decimal.getcontext() resulted in the whole module getting off the whitelist. If you want complete isolation of one and the other, that's easy: have no communication whatsoever. But then there's no point in having them both execute in the same interpreter. You may as well create a chroot and run Python inside that, have it serialize the result to JSON and write it to stdout, which you can then retrieve. That would pretty much solve the problem. (And in fact, if I were to do-over the project where I wanted Python sandboxing, that's probably what I'd do.) ChrisA From ijmorlan at uwaterloo.ca Tue Apr 12 06:21:04 2016 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Tue, 12 Apr 2016 06:21:04 -0400 (EDT) Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160412100623.GG8206@unequivocal.co.uk> References: <20160408141847.GQ4951@unequivocal.co.uk> <20160410164308.GE17895@unequivocal.co.uk> <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> Message-ID: On Tue, 12 Apr 2016, Jon Ribbens wrote: >> This is still a massive game of whack-a-mole. > > No, it still isn't. If the names blacklist had to keep being extended > then you would be right, but that hasn't happened so far. Whitelists > by definition contain only a small, limited number of potential moles. > > The only thing you found above that even remotely approaches an > exploit is the decimal.getcontext() thing, and even that I don't > think you could use to do any code execution. "I don't think"? Where's the formal proof? Without a proof, this is indeed just a game of whack-a-mole. I don't "think" Python is a suitable foundation for a sandboxing system intended for security purposes, but my "think" won't lead to security holes whereas yours will. So, I would respectfully suggest that unless you increase the rigour of your effort substantially, it is not worthwhile. Python is great for lots of applications already - there is no need to force it into unsuitable problem domains. Isaac Morland CSCF Web Guru DC 2619, x36650 WWW Software Specialist From dw+python-dev at hmmz.org Tue Apr 12 09:40:57 2016 From: dw+python-dev at hmmz.org (David Wilson) Date: Tue, 12 Apr 2016 13:40:57 +0000 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160412131226.GB1819@ando.pearwood.info> References: <20160411144644.GA8206@unequivocal.co.uk> <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111040.GH8206@unequivocal.co.uk> <20160412131226.GB1819@ando.pearwood.info> Message-ID: <20160412134057.GA15550@k3> On Tue, Apr 12, 2016 at 11:12:27PM +1000, Steven D'Aprano wrote: > I can think of one possible threat. Suppose that the locale library > has a bug, so that calling "aardvark".isdigit seg faults, potentially > executing arbitrary C code, but at the very least crashing the > application. Is that the sort of attack you're concerned by? This thread already covered the need to address SEGV at length. For a truly evil user, almost any kind of crash is an opportunity to take control of the system, and a security solution ignoring this is no security solution at all. > Maybe so. And then Jon will fix that vulnerability. And somebody will > find a new one. And he'll fix that too, or decide that it is too hard > to fix and give up. > > That's how security works. Even software designed for security can > have exploitable bugs: > > It seems unfair to me to hold Jon to a higher standard than we hold > people like Apple, or the Linux kernal devs. I don't believe that's what is happening here. In the OS analogy, Jon is generating busywork trying to secure an environment similar to Windows 3.1 that was simply never designed with e.g. memory protection in mind to begin with, and there is no evidence after numerous attempts spanning many years by multiple people that such an environment can be secured meaningfully while still remaining generally useful. > I fully accept and respect your personal opinion, based on your > experience, that Jon's tactic is doomed to failure. 
But if he needs to > learn this for himself, just as you had to learn it for yourself > (otherwise you wouldn't have started your own sandbox project), I can > respect that too. Progress depends on the unreasonable person who > thinks they can overturn the conventional wisdom. I'd deeply prefer it if this turned into an investigation or patchset making CPython work nicely with seccomp, sandbox(7), pledge(2) or whatever capability minimization mechanisms exist on Windows; they are all mechanisms to make it much safer for random code to be executing on your system, designed by folk who at all times expressly had security in mind. But that's not what's happening, instead a dead horse is being flogged over a hundred messages in our inboxes and IMHO it is excruciating to watch. > Even if the only thing we learn from Jon's experiment is a new set of > tricks for breaking out of the sandbox, that's still interesting, if > not useful. Don't forget the worst case: a fundamentally broken security module heavily marketed to the naive using claims the core team couldn't break it. David From jon+python-dev at unequivocal.co.uk Tue Apr 12 09:48:12 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 14:48:12 +0100 Subject: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited) In-Reply-To: References: <20160408141847.GQ4951@unequivocal.co.uk> <20160412124231.GM8206@unequivocal.co.uk> <20160412124950.GO8206@unequivocal.co.uk> Message-ID: <20160412134812.GP8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 11:03:11PM +1000, Chris Angelico wrote: > One of the key differences with scripts in web browsers is that there > *is* no "outer environment" to access. If you think that then I think you considerably misunderstand how modern browsers work. > Remember what I said about the difference between Python-in-Python > sandboxing and, say, Lua-in-Python?
One tiny exploit in > Python-in-Python and you suddenly gain access to the entire outer > environment, and it's game over. One tiny exploit in Lua-in-Python > and you have whatever that exploit gave you, nothing more. Are you imagining the Lua-in-Python as being completely isolated from the Python namespace then? > In fact, if you're prepared to forfeit almost all of Python's power to > achieve security, you probably should look into embedding a JavaScript > or Lua engine in your Python code. Yes, I have in fact already done this (JavaScript using SpiderMonkey). It allows the JavaScript to access Python objects and methods directly from JavaScript so it doesn't actually help, but I think I could put limits on that (e.g. making things read-only) and unlike most of this Python stuff, that could be made a solid rule with no clever ways around it. > I suspect re is next on the chopping block (it has a global cache - > if the outer system uses a regular expression more than once, it > would potentially be possible to mess with it in the cache, and then > next time it gets used, the injected code gets run), All you could do would be to give misleading results from the regular expression methods, but yes that is a good point. I regret that I added the import stuff at all now - it has just been a distraction from my original point. > [1] And if anyone mentions PHP, I will set him to work on the hardest > PHP problem I know of - no, not securing it. I mean convincing end > users that it's not necessary. Securing it is trivial by comparison. Fortunately I have managed to exclude PHP completely these days from any system I have anything to do with! From jon+python-dev at unequivocal.co.uk Tue Apr 12 10:03:47 2016 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Tue, 12 Apr 2016 15:03:47 +0100 Subject: [Python-Dev] Challenge: Please break this! 
(a.k.a restricted mode revisited) In-Reply-To: <20160412134057.GA15550@k3> References: <20160411165354.GC8206@unequivocal.co.uk> <20160412100623.GG8206@unequivocal.co.uk> <20160412111040.GH8206@unequivocal.co.uk> <20160412131226.GB1819@ando.pearwood.info> <20160412134057.GA15550@k3> Message-ID: <20160412140347.GQ8206@unequivocal.co.uk> On Tue, Apr 12, 2016 at 01:40:57PM +0000, David Wilson wrote: > On Tue, Apr 12, 2016 at 11:12:27PM +1000, Steven D'Aprano wrote: > > I can think of one possible threat. Suppose that the locale library > > has a bug, so that calling "aardvark".isdigit seg faults, potentially > > executing arbitrary C code, but at the very least crashing the > > application. Is that the sort of attack you're concerned by? > > This thread already covered the need to address SEGV at length. For a > truly evil user, almost any kind of crash is an opportunity to take > control of the system, and a security solution ignoring this is no > security solution at all. Indeed. > But that's not what's happening, instead a dead horse is being flogged > over a hundred messages in our inboxes and IMHO it is excruciating to > watch. I don't think that is true at all, and I personally have found this thread very interesting. I apologise if others have not. > > Even if the only thing we learn from Jon's experiment is a new set of > > tricks for breaking out of the sandbox, that's still interesting, if > > not useful. > > Don't forget the worst case: a fundamentally broken security module > heavily marketed to the naive using claims the core team couldn't break > it. I should point out that my module is called "unsafe.py", is titled an "experiment", and prominently states in the README: Do not use this code for any purpose in the real world. I will not be putting it up as an installable package, and as already stated it was never my intention to suggest that it or anything like it be included in the stdlib.
I will however leave it on GitHub for anyone who wants to have a go at breaking into it in the future. From srkunze at mail.de Tue Apr 12 10:52:30 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 12 Apr 2016 16:52:30 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de> <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com> Message-ID: <570D0BAE.9020404@mail.de> On 12.04.2016 00:56, Random832 wrote: > Fully general re-dispatch from argument types on any call to a function > that raises TypeError or NotImplemented? [e.g. call > Path.__missing_func__(os.open, path, mode)] > > Have pathlib monkey-patch things at import? Implicit conversion. No, thanks. > On Mon, Apr 11, 2016, at 17:43, Sven R. Kunze wrote: >> So, I might add: >> >> 3. add more high-level features to pathlib to prevent a downgrade to os >> or os.path > 3. reimplement the entire ecosystem in every walled garden so no-one has > to leave their walled gardens. > > What's the point of batteries being included if you can't wire them to > anything? Huh? That makes no sense to me. > I don't get what you mean by this whole "different level of abstraction" > thing, anyway. Strings are strings. Paths are paths. That's where the difference is. > The fact that there is one obvious thing to want to do > with open and a Path strongly suggests that that should be able to be > done by passing the Path to open. Path(...).open() is your friend then. I don't see why you need os.open. Refusing to upgrade is like saying everything was better in the old days. So let's use os.open instead of Path(...).open(). > Also, what level of abstraction is builtin open?
Maybe we should _just_ > leave os alone on the grounds of some holy sacred lowest-level-itude, > but allow io and shutils to accept Path? os, io and shutils accept strings. Not Path objects. Why? Because the semantics of "being a path" are applied implicitly by those modules. You are free to use a random string as a path and later as the name of your pet. Semantics of a string comes from usage. Path objects however have built-in semantics. Furthermore, if os, io and shutils are changed, we allow code like the following: my_path.touch() os.remove(my_path) I don't know how to explain reasonably why my_path sometimes stays in front of the method call and sometimes behind it to newbies. Best, Sven From srkunze at mail.de Tue Apr 12 10:54:03 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 12 Apr 2016 16:54:03 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> <570C1560.7070105@mail.de> Message-ID: <570D0C0B.7000208@mail.de> On 12.04.2016 12:41, Paul Moore wrote: > As your thoughts appear to have been triggered by my comments, I feel > I should clarify. > > 1. I like pathlib even as it is right now, and I'm strongly -1 on removing it. > 2. The "external dependency" aspect of 3rd party solutions makes them > far less useful to me. > 3. The work on improving integration with the stdlib (which is nearly > sorted now, as far as I can see) is a big improvement, and I'm all in > favour. But even without it, I wouldn't want pathlib to be removed. > 4. There are further improvements that could be made to pathlib, > certainly, but again they are optional, and pathlib is fine without > them. My conclusion is that these changes are not optional and tweaking os, io and shutil is just yet another workaround for a clean solution. :) Just my two cents. 
> 5. I wish more 3rd party code integrated better with pathlib. The > improved integration work might help with this. But ultimately, Python > 2 compatibility is likely to be the biggest block (either perceived or > real - we can make pathlib support as simple as possible, but some 3rd > party authors will remain unwilling to add support for Python 3 only > features in the short term). This isn't a pathlib problem. > 6. There will probably always be a place for low-level os/os.path > code. Adding support in those modules for pathlib doesn't affect that > fact, but does make it easier to use pathlib "seamlessly", so why not > do so? > > tl; dr; I'm 100% in favour of pathlib, and in the direction the > current discussion (excluding "let's give up on pathlib" digressions) > is going. Best, Sven From donald at stufft.io Tue Apr 12 10:58:11 2016 From: donald at stufft.io (Donald Stufft) Date: Tue, 12 Apr 2016 10:58:11 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570D0BAE.9020404@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de> <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com> <570D0BAE.9020404@mail.de> Message-ID: <196C5476-DEC9-4822-93BD-A7C53D76D50C@stufft.io> > On Apr 12, 2016, at 10:52 AM, Sven R. Kunze wrote: > > Path(...).open() is your friend then. I don't see why you need os.open. > > Refusing to upgrade is like saying everything was better in the old days. So let's use os.open instead of Path(...).open(). I think it was a mistake to have Path(...).open to be honest and I think the main reason it exists is because open(Path(...)) doesn't work (yet!). You can't hang every single thing you might ever want to do to a Path off the path object.
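For readers following along today: the behaviour Donald refers to with "(yet!)" did arrive later. From Python 3.6 (PEP 519), builtin `open` and `os.fspath` accept any object that implements `__fspath__`. The sketch below assumes Python 3.6 or later; the `Workspace` class and `write_then_read` helper are hypothetical names used only for illustration.

```python
import os
import pathlib
import tempfile


class Workspace:
    """Hypothetical path-like object: anything with __fspath__ qualifies."""

    def __init__(self, directory, name):
        self.directory = directory
        self.name = name

    def __fspath__(self):
        # This sketch returns str, the representation most posters argue for.
        return os.path.join(self.directory, self.name)


def write_then_read(text):
    """Write via a pathlib.Path, read back via a custom path-like object."""
    with tempfile.TemporaryDirectory() as tmp:
        as_path = pathlib.Path(tmp) / "a.txt"
        with open(as_path, "w") as handle:   # builtin open accepts a Path
            handle.write(text)
        custom = Workspace(tmp, "a.txt")
        with open(custom) as handle:         # and any __fspath__ object
            contents = handle.read()
        return contents, type(os.fspath(custom)) is str
```

With this protocol a library can call `os.fspath(arg)` once at its boundary and then work with plain strings, which is roughly the position Ethan argues for elsewhere in the thread.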
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From random832 at fastmail.com Tue Apr 12 10:59:05 2016 From: random832 at fastmail.com (Random832) Date: Tue, 12 Apr 2016 10:59:05 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570D0BAE.9020404@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de> <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com> <570D0BAE.9020404@mail.de> Message-ID: <1460473145.3562520.576479217.31E08CDE@webmail.messagingengine.com> On Tue, Apr 12, 2016, at 10:52, Sven R. Kunze wrote: > On 12.04.2016 00:56, Random832 wrote: > > Fully general re-dispatch from argument types on any call to a function > > that raises TypeError or NotImplemented? [e.g. call > > Path.__missing_func__(os.open, path, mode)] > > > > Have pathlib monkey-patch things at import? > > Implicit conversion. No, thanks. No more so than __radd__ - I didn't actually mean this as a serious suggestion, but Python *does* already have multiple dispatch. > > On Mon, Apr 11, 2016, at 17:43, Sven R. Kunze wrote: > > I don't get what you mean by this whole "different level of abstraction" > > thing, anyway. > > Strings are strings. Paths are paths. That's where the difference is. Yes but why aren't these both "things that you may want to use to open a file"? > > The fact that there is one obvious thing to want to do > > with open and a Path strongly suggests that that should be able to be > > done by passing the Path to open. > > Path(...).open() is your friend then. I don't see why you need os.open.
Because I'm passing it to modfoo.dosomethingwithafile() which takes a filename and passes it to shutils, which passes it to builtin open, which passes it to os.open. Should Path grow a dosomethingwithmodfoo method? From rosuav at gmail.com Tue Apr 12 11:25:15 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 13 Apr 2016 01:25:15 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570C1E13.4090909@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> Message-ID: On Tue, Apr 12, 2016 at 7:58 AM, Ethan Furman wrote: > Sticking points: > --------------- > > Do we allow bytes to be returned from os.fspath()? If yes, then do we allow > bytes from __fspath__()? > I would say No and No, on the basis that it's *far* easier to widen their scope in 3.7 than to narrow it. Once you declare that one or both of these may return bytes, it becomes an annoying incompatibility to change that (even if it *is* marked provisional), which almost certainly means it won't happen. By restricting them both, we force the issue: if you want bytes, you'll know about it. I'd also prefer to stick to Unicode path names, for reasons I've stated in other threads. Undecodable path byte streams can be handled already, so what are we really gaining by allowing a Path-like object to emit bytes? If it becomes a major issue for a lot of types, it wouldn't be hard to add a helper function somewhere (or a mixin class that provides a ready-to-go __fspath__, which might well be sufficient). ChrisA From srkunze at mail.de Tue Apr 12 11:38:36 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 12 Apr 2016 17:38:36 +0200 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570C8EEE.6050904@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> <570C8EEE.6050904@stoneleaf.us> Message-ID: <570D167C.8040202@mail.de> Sorry for disturbing this thread's harmony. 
On 12.04.2016 08:00, Ethan Furman wrote: > On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote: > >>> Consider os.path.join: >> >> Why in the world do the os.path functions need to work with Path >> objects? ( and other conforming objects) > > Because library XYZ that takes a path and wants to open it shouldn't > have to care whether that path is a string or pathlib.Path -- but if > os.open can't use pathlib.Path then the library has to care (or the > user has to care). > >> This all started with the goal of using Path objects in the stdlib, >> but that's for opening files, etc. > > Etc. as in os.join? os.stat? os.path.split? > >> Path is an alternative to os.path -- you don't need to use both. > I agree with that quote of Chris. > As a user you don't, no. As a library that has no control over what > kind of "path" is passed to you -- well, if os and os.path can accept > Path objects then you can just use os and os.path; otherwise you have > to use os and os.path if passed a str or bytes, and pathlib.Path if > passed a pathlib.Path -- so you do have to use both. I don't agree here. There's no need to increase the convenience for a library maintainer when it comes to implicit conversions. When people want to use your library and it requires a string, they can simply use "my_path.path" and everything still works for them when they switch to pathlib. Best, Sven From stephen at xemacs.org Tue Apr 12 11:52:43 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 13 Apr 2016 00:52:43 +0900 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> Message-ID: <22285.6603.756030.873091@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > One possible way to address this concern would be to have the > underlying protocol be bytes/str (since boundary code frequently > needs to handle the paths-are-bytes assumption in POSIX), What "needs"?
As has been pointed out several times, with PEP 383 you can deal with bytes losslessly by using an arbitrary codec and errors=surrogateescape. I know why *I* use bytes nevertheless: because when I must guess the encoding, it just makes more sense to read bytes and then iterate over codecs until the result looks like words I know in some language. I don't understand why people who mostly believe "bytes are text, too" because almost all they ever see are bytes in the range 0x00-0x7f need bytes. For them, fsdecode and fsencode DTRT. If you want to claim "efficiency", I can't gainsay since I don't know the applications, but if you're trying to manipulate file names millions of times per second, I have to wonder what you're doing with them that benefits so much from Path. > but offer an "os.fspathname" API that rejected bytes output from > os.fspath. Either it's a YAGNI because I'm not going to get any bytes in the first place, or it raises where I probably could have done something useful with bytes if I were expecting them (see "pathological" below). > That way folks that wanted the clean "must be str" signature Er, I don't need no steenkin' "clean signature". I need str, and if I can't get it from __fspath__, there's always os.fsdecode. But this is serious horse-before cart-putting, punishing those who do things Python-3-ishly right. > The ambiguity in question here is inherent in the differences between > the way POSIX and Windows work, Not with PEP 383, it's not. And I don't do Windows, so my preference for str has nothing to do with it mapping to native OS APIs well. The ambiguity in question here is inherent in the differences between the ways Python 2 and Python 3 programmers work on POSIX AFAICS. Certainly, there will be times when fsdecode doesn't DTRT. So those times you have to use an explicit bytes.decode. Note that when you *do* care enough to do that, it's because the Path is *text* -- you're going to display it to a human, or pass it out of the module. 
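The lossless round trip that PEP 383 provides can be seen with an explicit codec, which avoids depending on the platform's filesystem encoding. This is a minimal sketch; the helper name `roundtrip` is hypothetical. On POSIX systems, `os.fsdecode` and `os.fsencode` apply the same `surrogateescape` error handler using the filesystem encoding.

```python
def roundtrip(raw, encoding="utf-8"):
    """Decode bytes to str and back without losing undecodable bytes."""
    # Bytes that are invalid in the codec become lone surrogates U+DC80..U+DCFF.
    text = raw.decode(encoding, errors="surrogateescape")
    # Encoding with the same handler turns those surrogates back into bytes.
    return text, text.encode(encoding, errors="surrogateescape")
```

For example, the Latin-1 file name `b"caf\xe9.txt"` is not valid UTF-8, yet it decodes to a `str` that can be joined, split and compared like any other, and encodes back to exactly the original bytes.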
If all you're going to do is access the filesystem object denoted, fsdecode does a sufficiently accurate job. So if for some reason you're getting bytes at the boundary, I see no reason why you can't have a convenience constructor

    import os
    import pathlib

    def pathological(str_or_bytes_or_path_seq):
        # Decode bytes components with os.fsdecode; pass str and Path through.
        args = []
        for s_o_b in str_or_bytes_or_path_seq:
            args.append(os.fsdecode(s_o_b) if isinstance(s_o_b, bytes) else s_o_b)
        return pathlib.Path(*args)

for when that's good enough (maybe Antoine would even allow it into pathlib?) > so there are limits to how far we can go in hiding it without > making things worse rather than better. What "hide"? Nobody is suggesting that the polymorphic os APIs should go away. Indeed, they are perfect TOOWTDI, giving the programmer exactly the flexibility needed *and no more*, *at* the boundary. The questions on my mind are: (A) Why does anybody need bytes out of a pathlib.Path (or other __fspath__-toting, higher-level API) *inside* the boundary? Note that the APIs in os (etc) *don't need* bytes because they are already polymorphic. (B) If they do, why can't they just apply bytes() to the object? I understand that that would offend Ethan's aesthetic sense, so it's worth looking for a nice way around it. But allowing __fspath__ to return bytes or str is hideous, because Paths are clearly on the application side of the boundary. Note that bytes() may not have the serious problem that str() does of being too catholic about its argument: nothing in __builtins__ has a __bytes__! Of course there are a few things that do work: ints, and sequences of ints. From srkunze at mail.de Tue Apr 12 11:57:24 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 12 Apr 2016 17:57:24 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.
In-Reply-To: <1460473145.3562520.576479217.31E08CDE@webmail.messagingengine.com> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de> <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com> <570D0BAE.9020404@mail.de> <1460473145.3562520.576479217.31E08CDE@webmail.messagingengine.com> Message-ID: <570D1AE4.9010607@mail.de> On 12.04.2016 16:59, Random832 wrote: > >> Strings are strings. Paths are paths. That's were the difference is. > Yes but why aren't these both "things that you may want to use to open a > file"? Because "things that you may want to use to open a file" is a bit vague and thus conceal the fact that we really need. As an example: time.sleep takes a number of seconds (notice the primitive datatype just like a string) and does not take timedelta. Why don't we add datetime.timedelta support to time.sleep? Very same thing. >>> The fact that there is one obvious thing to want to do >>> with open and a Path strongly suggests that that should be able to be >>> done by passing the Path to open. >> Path(...).open() is your friend then. I don't see why you need os.open. > Because I'm passing it to modfoo.dosomethingwithafile() which takes a > filename and passes it to shutils, which passes it to builtin open, > which passes it to os.open. > > Should Path grow a dosomethingwithmodfoo method? Because we can argue here the other way round and say: "oh, pathlib can do things, I cannot do with os.path." Should os.path grow those things? Put differently, you cannot do everything. But the most common issues should be resolved in the correct module. This is no argument for or against either solution. I am sorry, if my contribution on the threads of python-ideas made it seem that I would always support this idea. I don't anymore. 
However, I will still be happy with the outcome even if not perfect, will help making the Python stdlib better. :) Best, Sven From chris.barker at noaa.gov Tue Apr 12 11:59:19 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 12 Apr 2016 08:59:19 -0700 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> <570C1560.7070105@mail.de> Message-ID: one little note: On Tue, Apr 12, 2016 at 3:41 AM, Paul Moore wrote: > 4. There are further improvements that could be made to pathlib, > certainly, but again they are optional, and pathlib is fine without > them. > Exactly -- "improvements to pathlib" and "make the stdlib pathlib compatible" are completely orthogonal. > 5. I wish more 3rd party code integrated better with pathlib. The > improved integration work might help with this. But ultimately, Python > 2 compatibility is likely to be the biggest block (either perceived or > real - we can make pathlib support as simple as possible, but some 3rd > party authors will remain unwilling to add support for Python 3 only > features in the short term). This isn't a pathlib problem. > true -- though the proposed protocol approach opens doors there -- any third party lib can check for a __whatever_it's_called__ and run fine in py2 or py3 or, indeed, any version of python. Also if you really don't like pathlib, then the protocol allows you to write/use a different path implementation -- really win-win. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Tue Apr 12 12:04:21 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 12 Apr 2016 09:04:21 -0700 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570D0C0B.7000208@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> <570C1560.7070105@mail.de> <570D0C0B.7000208@mail.de> Message-ID: On Tue, Apr 12, 2016 at 7:54 AM, Sven R. Kunze wrote: > > My conclusion is that these changes are not optional and tweaking os, io > and shutil is just yet another workaround for a clean solution. :) > Is the clean solution to re-implement EVERYTHING in the stdlib that involves a path in a new, fancy pathlib way? If we were starting from scratch, I _might_ like that idea, but we're not starting from scratch. And that would cement in pathlib itself, leaving no room for other path implementations. kind of like how the pre-__Index__ python cemented in python integers as the only objects once could use to index a sequence. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Apr 12 12:10:55 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 12 Apr 2016 09:10:55 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570C1E13.4090909@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> Message-ID: <570D1E0F.5040502@stoneleaf.us> On 04/11/2016 02:58 PM, Ethan Furman wrote: > Sticking points: > --------------- > > Do we allow bytes to be returned from os.fspath()? If yes, then do we > allow bytes from __fspath__()? On 04/11/2016 10:28 PM, Stephen J. 
Turnbull wrote: > In text applications, "bytes as carcinogen" is an apt metaphor. On 04/12/2016 08:25 AM, Chris Angelico wrote: > I would say No and No, on the basis that it's *far* easier to widen > their scope in 3.7 than to narrow it. On 04/11/2016 08:45 PM, Nick Coghlan wrote: > I've come around to the point of view that allowing both str and > bytes-like objects to pass through unchanged makes sense, with the > rationale being the one someone mentioned regarding ease-of-use in > os.path. [...] > One possible way to address this concern would be to have the > underlying protocol be bytes/str (since boundary code frequently needs > to handle the paths-are-bytes assumption in POSIX), but offer an > "os.fspathname" API that rejected bytes output from os.fspath. I think this is the way forward: offer a standard way to get paths-as-strings, with an easily supported way of working with paths-as-bytes. This could be done with an os.fspathname() & os.fspath() pair of functions, or with a single function that has a parameter specifying what to do with bytes objects: reject (default), accept, or (maybe) an encoding to use to coerce to bytes. -- ~Ethan~ From srkunze at mail.de Tue Apr 12 12:14:29 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 12 Apr 2016 18:14:29 +0200 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> <570C1560.7070105@mail.de> <570D0C0B.7000208@mail.de> Message-ID: <570D1EE5.4090904@mail.de> On 12.04.2016 18:04, Chris Barker wrote: > On Tue, Apr 12, 2016 at 7:54 AM, Sven R. Kunze > wrote: > > > My conclusion is that these changes are not optional and tweaking > os, io and shutil is just yet another workaround for a clean > solution.
:) > > > Is the clean solution to re-implement EVERYTHING in the stdlib that > involves a path in a new, fancy pathlib way? > > If we were starting from scratch, I _might_ like that idea, but we're > not starting from scratch. And that would cement in pathlib itself, > leaving no room for other path implementations. Kind of like how the > pre-__index__ Python cemented in Python integers as the only objects > one could use to index a sequence. I cannot remember us using another datetime library. So, I don't value this "advantage" as much as you do. Best, Sven From ethan at stoneleaf.us Tue Apr 12 12:15:34 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 12 Apr 2016 09:15:34 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> Message-ID: <570D1F26.5090800@stoneleaf.us> On 04/11/2016 04:43 PM, Victor Stinner wrote: > On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote: >> So my concern in such a case is what happens if we pass this SE >> string somewhere else: a UTF-8 file, or over a socket, or into a >> database? Does this have issues that we wouldn't face if we just used bytes? > > "SE string" are returned by os.listdir(str), os.walk(str), > os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under > the sun. So when we pass a bytes object in, Python (on posix) converts that to a string using surrogateescape, gets back strings from the os, and encodes them back to bytes, again using surrogateescape? > Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding > error. latin1? I thought latin1 had a code point for 0-255, so how could using it raise an encoding error?
-- ~Ethan~ From rosuav at gmail.com Tue Apr 12 12:20:17 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 13 Apr 2016 02:20:17 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570D1F26.5090800@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> Message-ID: On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman wrote: > On 04/11/2016 04:43 PM, Victor Stinner wrote: >> >> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote: > > >>> So my concern in such a case is what happens if we pass this SE >>> string somewhere else: a UTF-8 file, or over a socket, or into a >>> database? Does this have issues that we wouldn't face if we just used >>> bytes? >> >> >> "SE string" are returned by os.listdir(str), os.walk(str), >> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under >> the sun. > > > So when we pass a bytes object in, Python (on posix) converts that to a > string using surrogateescape, gets back strings from the os, and encodes > them back to bytes, again using surrogateescape? > > >> Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding >> error. > > > latin1? I thought latin1 had a code point for 0-255, so how could using it > raise an encoding error? Latin-1 / ISO-8859-1 defines a character for every byte, so any byte string will *decode*. It only defines 256 characters as having equivalent bytes, though, so *encoding* can fail for any character outside that set, a surrogate included.
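[Editorial sketch, not part of the original message.] The decode/encode asymmetry described above can be checked directly. The helper `can_encode` and the sample strings are assumptions made for illustration; nothing here comes from the thread itself.

```python
def can_encode(text, encoding):
    """Return True if `text` encodes with the default (strict) error handler."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Decoding: every byte 0x00-0xFF maps to the code point with the same number,
# so arbitrary bytes always decode as Latin-1, and the result encodes back.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
latin1_round_trip = decoded.encode("latin-1") == all_bytes

# Encoding: only U+0000-U+00FF have a Latin-1 byte. A lone surrogate produced
# by surrogateescape lies outside that range, as does the euro sign.
lone_surrogate = b"\xff".decode("utf-8", errors="surrogateescape")
euro_sign = "\u20ac"

strict_results = {
    ("surrogate", "ascii"): can_encode(lone_surrogate, "ascii"),
    ("surrogate", "latin-1"): can_encode(lone_surrogate, "latin-1"),
    ("surrogate", "utf-8"): can_encode(lone_surrogate, "utf-8"),
    ("euro", "latin-1"): can_encode(euro_sign, "latin-1"),
}

# With the same error handler on the way out, the original byte comes back.
restored = lone_surrogate.encode("utf-8", errors="surrogateescape")
```

All four strict encodes fail, which matches the statement that encoding a surrogate to ascii, latin1 or utf8 raises an error, while `restored` is the original byte again.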
ChrisA From chris.barker at noaa.gov Tue Apr 12 12:19:41 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 12 Apr 2016 09:19:41 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570C8A40.6020903@canterbury.ac.nz> References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> <570C8A40.6020903@canterbury.ac.nz> Message-ID: On Mon, Apr 11, 2016 at 10:40 PM, Greg Ewing wrote: > > So the ONLY thing >> you should do with it is pass it along to another low level system >> call. >> > > Not quite -- you can separate it into components and > work with them. Essentially the same set of operations > that os.path provides. > Ahh yes, so while POSIX claims that paths are "just a char*", they are really bytes where we can assume that the byte with value 0x2F is the path separator (and that 0x2E separates an extension?), so I suppose os.path is useful. But I still think that most of us should never deal with bytes paths, and the few that need to should just work with the low level functions and be done with it. One more thought came up just now: there are different levels of abstraction and representation for paths. We don't want to make Path a subclass of string, because Path is supposed to be a higher level abstraction -- good. Then at the bottom of the stack, we NEED the bytes level path, because that's what ultimately gets passed to the OS. The legacy from the single-byte encoding days is that bytes and strings were the same, so we could let people work with nice human readable strings, while also working with byte paths in the same way -- but those days are gone -- py3 makes a clear (and important) distinction between nice human readable strings and the bytes that represent them. So: why use strings as the lingua franca of paths? i.e. the basis of the path protocol. Maybe we should support only two path representations: 1) A "proper" path object -- i.e. pathlib.Path or anything else that supports the path protocol.
2) the bytes that the OS actually needs. this would mean that the protocol would be to have a __pathbytes__() method that woulde return the bytes that should be passed off to the OS. A posix Path implementation could store that internal bytes representation, so it could pass it off unchanged if that's all you need to do. Any current API that takes bytes could be made to easily work. I'm SURE I'm missing something really big here, but it seems like maybe it's better to get farther from "strings as paths" rather than closer to it.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Tue Apr 12 12:26:24 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 12 Apr 2016 19:26:24 +0300 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Apr 12, 2016 at 11:56 AM, Nick Coghlan wrote: > One possible way to address this concern would be to have the > underlying protocol be bytes/str (since boundary code frequently needs > to handle the paths-are-bytes assumption in POSIX), but offer an > "os.fspathname" API that rejected bytes output from os.fspath. That > is, it would be equivalent to: > > def fspathname(path): > name = os.fspath(path) > if not isinstance(name, str): > raise TypeError("Expected str for pathname, not > {}".format(type(name))) > return name > > That way folks that wanted the clean "must be str" signature could use > os.fspathname, while those that wanted to accept either could use the > lower level os.fspath. I'm not necessarily opposed to this. I kept bringing up bytes in the discussion because os.path.* etc. 
and DirEntry support bytes and will need to keep doing so for backwards compatibility. I have no intention to use bytes pathnames myself. But it may break existing code if functions, for instance, began to decode bytes paths to str if they did not previously do so (or to reject them). It is indeed a lot safer to make new code not support bytes paths than to change the behavior of old code. But then again, do we really recommend new code to use os.fspath (or os.fspathname)? Should they not be using either pathlib or os.path.* etc. so they don't have to care? I'm sure Ethan and his library (or some other path library) will manage without the function in the stdlib, as long as the dunder attribute is there. So I'm, once again, posing this question (that I don't think got any reactions previously): Is there a significant audience for this new function, or is it enough to keep it a private function for the stdlib to use? That handful of third-party path libraries can decide for themselves if they want to (a) reject bytes or (b) implicitly fsdecode them or (c) pass them through just like str, depending on whatever their case requires in terms of backwards compatiblity or other goals. If we forget about the os.fswhatever function, we only have to decide whether the magic dunder attribute can be str or bytes or just str. -Koos From k7hoven at gmail.com Tue Apr 12 12:32:13 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 12 Apr 2016 19:32:13 +0300 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> <570C8A40.6020903@canterbury.ac.nz> Message-ID: On Tue, Apr 12, 2016 at 7:19 PM, Chris Barker wrote: > > One more though came up just now: there are different level sof abstractions > and representations for paths. We don't want to make Path a subclass of > string, because Path is supposed to be a higher level abstraction -- good. 
> > then at the bottom of the stack, we NEED the bytes level path, because that > what ultimately gets passed to the OS. > > THe legacy from the single-byte encoding days is that bytes and strings were > the same, so we could let people work with nice human readable strings, > while also working with byte paths in the same way -- but those days are > gone -- py3 make s clear (and important) distiction between nice human > readable strings and the bytes that represent them. > > So: why use strings as the lingua franca of paths? i.e. the basis of the > path protocol. maybe we should support only two path representations: > > 1) A "proper" path object -- i.e. pathlib.Path or anything else that > supports the path protocol. > > 2) the bytes that the OS actually needs. > You do have a point there. But since bytes pathnames are deprecated on windows, this seems to lead to supporting both str and bytes in the protocol, or having two protocols __fspathbytes__ and __fspathstr__ (and one being preferred over the other, potentially even depending on the platform)., -Koos From chris.barker at noaa.gov Tue Apr 12 12:33:54 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 12 Apr 2016 09:33:54 -0700 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570D1AE4.9010607@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <570C13D6.4090609@stoneleaf.us> <570C1A95.1060100@mail.de> <1460415365.1410453.575777673.71BE8F33@webmail.messagingengine.com> <570D0BAE.9020404@mail.de> <1460473145.3562520.576479217.31E08CDE@webmail.messagingengine.com> <570D1AE4.9010607@mail.de> Message-ID: On Tue, Apr 12, 2016 at 8:57 AM, Sven R. Kunze wrote: > As an example: time.sleep takes a number of seconds (notice the primitive > datatype just like a string) and does not take timedelta. > > Why don't we add datetime.timedelta support to time.sleep? Very same thing. 
Yup -- and if there were a lot of commonly used APIs that took strings, and multiple timedelta implementations, then it would make sense to introduce a __seconds_int__ protocol. I don't think the use-cases rise to that level, myself. Though if someone wanted to put a call to obj.total_seconds() into time.sleep, that might actually be worth it :-) (Now that you mention it -- I have a substantial library that uses seconds internally, and currently has an ugly sometimes integer seconds, sometimes timedelta API -- maybe I'll introduce that protocol. Not sure why I didn't think of that before now.) >> Because I'm passing it to modfoo.dosomethingwithafile() which takes a >> filename and passes it to shutils, which passes it to builtin open, >> which passes it to os.open. >> >> Should Path grow a dosomethingwithmodfoo method? > > It can't -- modfoo could be a third-party module -- it is impossible for Path to grow everything that any third party module might support. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ethan at stoneleaf.us Tue Apr 12 12:37:13 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 12 Apr 2016 09:37:13 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> Message-ID: <570D2439.7010005@stoneleaf.us> On 04/12/2016 09:20 AM, Chris Angelico wrote: > On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman >> latin1?
I thought latin1 had a code point for 0-255, so how could using it >> raise an encoding error? > > Latin-1 / ISO-8859-1 defines a character for every byte, so any byte > string will *decode*. It only defines 256 characters as having > equivalent bytes, though, so *encoding* can fail. Ah, right -- so if you start with bytes it cannot fail, if you start with a string it can. -- ~Ethan~ From chris.barker at noaa.gov Tue Apr 12 12:36:32 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 12 Apr 2016 09:36:32 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> Message-ID: On Tue, Apr 12, 2016 at 9:20 AM, Chris Angelico wrote: > > latin1? I thought latin1 had a code point for 0-255, so how could using > it > > raise an encoding error? > > Latin-1 / ISO-8859-1 defines a character for every byte, so any byte > string will *decode*. It only defines 256 characters as having > equivalent bytes, though, so *encoding* can fail. > unless it was decoded as latin-1 in the first place. doesn't the surrogate escape thing only work properly if you decode/encode with the same encoding? -CHB Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Tue Apr 12 12:39:59 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 12 Apr 2016 09:39:59 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> Message-ID: <570D24DF.4050902@stoneleaf.us> On 04/12/2016 09:26 AM, Koos Zevenhoven wrote: > So I'm, once again, posing this question (that I don't think got any > reactions previously): Is there a significant audience for this new > function, or is it enough to keep it a private function for the stdlib > to use? Quite frankly, I expect the stdlib itself to be the primary consumer. But I see no reason to not publish the function so that users who need the advanced functionality have easy access to it. -- ~Ethan~ From chris.barker at noaa.gov Tue Apr 12 12:40:00 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 12 Apr 2016 09:40:00 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> <570C8A40.6020903@canterbury.ac.nz> Message-ID: On Tue, Apr 12, 2016 at 9:32 AM, Koos Zevenhoven wrote: > > 1) A "proper" path object -- i.e. pathlib.Path or anything else that > > supports the path protocol. > > > > 2) the bytes that the OS actually needs. > > > > You do have a point there. But since bytes pathnames are deprecated on > windows, Ah -- there's the fatal flaw -- even Windows needs bytes at the lowest level, but the decision was already made there to use str as the the lingua-franca -- i.e. the user NEVER sees a path as a bytestring on Windows? I guess that's decided then. str is the exchange format. -CHB -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Tue Apr 12 12:45:42 2016 From: random832 at fastmail.com (Random832) Date: Tue, 12 Apr 2016 12:45:42 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> <570C8A40.6020903@canterbury.ac.nz> Message-ID: <1460479542.3589839.576602801.5DD53A06@webmail.messagingengine.com> On Tue, Apr 12, 2016, at 12:40, Chris Barker wrote: > Ah -- there's the fatal flaw -- even Windows needs bytes at the lowest > level, Only in the sense that literally everything's bytes at the lowest level. But the bytes Windows needs are not in an ASCII-compatible encoding so it's not reasonable to talk about them in the same way as every other kind of bytes filename. > but the decision was already made there to use str as the the > lingua-franca -- i.e. the user NEVER sees a path as a bytestring on > Windows? I guess that's decided then. str is the exchange format. From barry at barrys-emacs.org Tue Apr 12 13:03:56 2016 From: barry at barrys-emacs.org (Barry Scott) Date: Tue, 12 Apr 2016 18:03:56 +0100 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. In-Reply-To: <570C13D6.4090609@stoneleaf.us> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <570C13D6.4090609@stoneleaf.us> Message-ID: <20160412180356.000005a2@barrys-emacs.org> On Mon, 11 Apr 2016 14:15:02 -0700 Ethan Furman wrote: > We've pretty decided that we have two options: > > 1. remove pathlib > 2. make the stdlib work with pathlib > > So we're trying to make option 2 work before falling back to option 1. 
I have been doing a lot of porting to Python 3 and have really enjoyed having pathlib, even in its current state. In one of my previous projects using Python 2 on Linux we had to write code to handle files with names that were not UTF-8. (Users could FTP a file into the file system and it could end up non-UTF-8.) Today we would have used pathlib to represent paths in the app. But we would need to be able to detect the paths that do not follow the fs encoding rules. I would suggest a predicate in Path to report that the path cannot be encoded without the use of surrogates. Not sure what to call the predicate. This can be used by code that cares to handle converting the path into a suitable presentation string for showing to a user. I'm assuming here that PEP 383 may not provide a presentation string that is suitable for showing to users. In the case of our product we refused to use files that did not encode to UTF-8 and had a UI to allow the user to fix the name. I suspect that one reason for making files that can only be represented as bytes() detectable is to avoid security issues. If I had my black hat on, I would probe a Python 3 app with filenames that are non-UTF-8 and see if I can break the app. Barry From k7hoven at gmail.com Tue Apr 12 13:31:21 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 12 Apr 2016 20:31:21 +0300 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <22285.6603.756030.873091@turnbull.sk.tsukuba.ac.jp> References: <570C1E13.4090909@stoneleaf.us> <22284.34707.332239.239088@turnbull.sk.tsukuba.ac.jp> <22285.6603.756030.873091@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Apr 12, 2016 at 6:52 PM, Stephen J. Turnbull wrote: > > (A) Why does anybody need bytes out of a pathlib.Path (or other > __fspath__-toting, higher-level API) *inside* the boundary? Note > that the APIs in os (etc) *don't need* bytes because they are > already polymorphic.
> Indeed not from pathlib.*Path , but from DirEntry, which may have a path as bytes. So the options for DirEntry (or things like Ethan's 'antipathy') are: (1) Provide bytes or str via the protocol, depending on which type this DirEntry has Downside: The protocol needs to support str and bytes. (2) Decode bytes using os.fsdecode and provide a str via the protocol Downside: The user passed in bytes and maybe had a reason to do so. This might lead to a weird mixture of str and bytes in the same code. (3) Do not implement the protocol when dealing with bytes Downside: If a function calling os.scandir accepts both bytes and str in a duck-typing fashion, then, if this adopted something that uses the new protocol, it will lose its bytes compatiblity. This risk might not be huge, so perhaps (3) is an option? > (B) If they do, why can't they just apply bytes() to the object? I > understand that that would offend Ethan's aesthetic sense, so it's > worth looking for a nice way around it. But allowing __fspath__ > to return bytes or str is hideous, because Paths are clearly on > the application side of the boundary. > > Note that bytes() may not have the serious problem that str() does of > being too catholic about its argument: nothing in __builtins__ has a > __bytes__! Of course there are a few things that do work: ints, and > sequences of ints. Good point. But this only applies to when the user _explicitly_ deals with bytes. But when the user just deals with the type (str or bytes) that is passed in, as os.path.* as well as DirEntry now do, this does not work. -Koos From tritium-list at sdamon.com Tue Apr 12 13:41:12 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Tue, 12 Apr 2016 13:41:12 -0400 Subject: [Python-Dev] Maybe, just maybe, pathlib doesn't belong. 
In-Reply-To: <570D1EE5.4090904@mail.de> References: <570C0A0B.90109@sdamon.com> <570C0DA7.6030407@mail.de> <570C0F29.5010904@sdamon.com> <570C115D.1030104@mail.de> <1460408931.3318740.575696409.02DA49B5@webmail.messagingengine.com> <570C1560.7070105@mail.de> <570D0C0B.7000208@mail.de> <570D1EE5.4090904@mail.de> Message-ID: <570D3338.5060402@sdamon.com> On 4/12/2016 12:14, Sven R. Kunze wrote: > I cannot remember us using another datetime library. So, I don't value > this "advantage" as much as you do. They exist, and there are many cases where you would use a datetime library other than datetime for various reasons (integration in third party systems is only one reason). But this is just a tangent. In fact the situation with pathlib is similar to datetime - before the inclusion of datetime in the stdlib, there were several datetime libraries available. Before pathlib, there were several path object libraries. Only now, the third party options offer a great deal of competition over the stdlib option, thus these many hundreds, if not thousands, of emails on the subject. From chris.barker at noaa.gov Tue Apr 12 18:37:14 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 12 Apr 2016 15:37:14 -0700 Subject: [Python-Dev] ping on issue 18378: locale.getdefaultlocale() fails on recent Mac OS X Message-ID: Hi folks, There have been multiple reports of folks having failures on startup of matplotlib, which appears to be due to the most recent OS-X version setting the locale weirdly. This was identified last summer in this issue: http://bugs.python.org/issue18378 It looks like the issue was figured out, and even a patch contributed, but it stalled out before being applied. I have no idea if the patch is any good, but it would be great to get this fixed! -Thanks, -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From stephen at xemacs.org Tue Apr 12 22:56:50 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 13 Apr 2016 11:56:50 +0900 Subject: [Python-Dev] List posting custom [was: current status of discussions] In-Reply-To: <570D167C.8040202@mail.de> References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> <570C8EEE.6050904@stoneleaf.us> <570D167C.8040202@mail.de> Message-ID: <22285.46450.255405.357217@turnbull.sk.tsukuba.ac.jp> The following is my opinion, as will become obvious, but it's based on over a decade of observing these lists, and other open source development lists. In a context where some core developers have unsubscribed from these lists, and others regularly report muting threads with a certain air of asperity, I think it's worth the risk of seeming arrogant to explain some of the customs (which are complex and subtle) around posting to Python developer lists. I'm posting publicly because there are several new developers whose activity and fresh perspective is very welcome, but harmony *is* being disturbed, IMO unnecessarily. This particular post caught my eye, but it's only an example of one of the most unharmonious posting styles that has become common recently. Attribution deliberately removed. > Sorry for disturbing this thread's harmony. *sigh* There is way too much of this on Python-Ideas recently, and there shouldn't be any on Python-Dev. So please don't. Specifically, disagreement with an apparently developing consensus is fine but please avoid this: > >> Path is an alternative to os.path -- you don't need to use both. > > I agree with that quote of Chris.
It's a waste of time to post *what* you agree with.[1] Decisions are not taken by vote in this community, except for the color of the bikeshed, where it is agreed that *what* decision is taken doesn't matter, but that some decision should be taken expeditiously.[2] Chris already stated this position clearly and it's not a "color", so there is no need to reiterate. It simply wastes others' time to read it. (Whether it was a waste of the poster's time is not for me to comment on.) What matters to the decision is *why* you agree (or disagree). If you think that some of Chris's arguments are bogus (and should be disregarded) and others are important, that is valuable information. It's even better if you can shed additional light on the matter (example below). Also, expression of agreement is often a prelude to a request for information. "I agree with Z's post. At least, I have never needed X. *When* do you need X? Let's look for a better way than X!" Unsupported (dis)agreement to statements about "needs" also may be taken as *rude*, because others may infer your arrogant claim to know what *they* do or don't need. Admittedly there's a difficult distinction here between Chris's *idiom* where "you don't need to" translates to "In my understanding, it is generally not necessary to", and your *unsupported* agreement, which in my dialect of English changes the emphasis to imply you know better than those who disagree with you and Chris. And, of course, the position that others are "too easily offended" is often reasonable, but you should be aware that there will be an impact on your reputation and ability to influence development of Python (even if it doesn't come near the point where a moderator invokes "Code of Conduct"). "Me too" posts aren't entirely forbidden, but I feel that in Python custom they are most appropriate when voting on bikeshed colors, and as applause for a *technically* excellent suggestion. 
They should be avoided in the context of value judgments (of "need" and "simplicity", for example) for the reason given above. > When people want to use your library and it requires a string, they > can simply use "my_path.path" and everything still works for them > when they switch to pathlib. This is disrespectful in tone. I don't know if you're responding to Ethan here, but he's one of the authors in question. We *know* that Ethan doesn't like such inelegant idioms -- he said so -- where "this object has an appropriate conversion to your argument type, so you should apply it implicitly" is unambiguous.[3] So for him, it's *not* so simple. Since it's not a matter of voting, each proponent should provide more contexts where preferred programming idioms are "Pythonic" to sway the sense of the community, or if necessary, the BDFL. Where that aesthetic came up was in the context of consistently wrapping arguments that might be Paths in str, as in p = Path(*stuff) or defaultstring # 500 lines crossing function and module boundaries! with open(str(p)) as f: process(f) I think it was Nick who posted agreement with Ethan on the aesthetics of str-wrapping. If that were all, he probably wouldn't have posted (see fn. 1), but he further pointed out that this application of str is *dangerous* because *everything* in Python can be coerced to str. That was a very valuable observation, which swayed the list in favor of "Uh-oh, we can't recommend 'os.method(str(Path))'!" This is my last post on this particular topic, but I will be happy to discuss off-list. (I may discuss further in public on my blog, but first I have to get a blog. :-) Footnotes: [1] "You" is generic here. There are a couple of developers whose agreement has the status of pronouncement of Pythonicity. Aspire to that, but don't assume it -- very few have it, and it's actually *very* rarely exercised.
And you can recognize them because they are *asked* to pronounce -- by people whose statements you thought were already authoritative! [2] And even so votes are often overturned by later arguments, both theoretical and based in experience. See for example the several threads over time on the naming of Py_XSETREF. [3] Interpreting Zen koans frequently requires figure-ground inversion. In this case we can apply "In the face of ambiguity, refuse to guess" in the form "in the absence of ambiguity, don't wait to be asked". I'm hardly authoritative, but FWIW :-) I think Ethan's esthetic sense here accords with Pythonicity. From tjreedy at udel.edu Wed Apr 13 00:39:11 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 13 Apr 2016 00:39:11 -0400 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: Message-ID: On 4/4/2016 5:05 PM, Terry Reedy wrote: Since a few days, I am getting bug tracker emails again, in my Inbox. I just got a Rietveld review in the Inbox and I believe it went there directly instead of first to Junk. Thank you to whoever made the improvements. -- Terry Jan Reedy From cybersol at yahoo.com Wed Apr 13 01:37:01 2016 From: cybersol at yahoo.com (Michael Mysinger) Date: Wed, 13 Apr 2016 05:37:01 +0000 (UTC) Subject: [Python-Dev] pathlib - current status of discussions References: <570C1E13.4090909@stoneleaf.us> Message-ID: Ethan Furman stoneleaf.us> writes: > Do we allow bytes to be returned from os.fspath()? If yes, then do we > allow bytes from __fspath__()? De-lurking. Especially since the ultimate goal is better interoperability, I feel like an implementation that people can play with would help guide the few remaining decisions. To help test the various options you could temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both pathlib.__fspath__() and os.fspath(), with distinct configurable defaults for each. 
In the spirit of Python 3 I feel like bytes might not be needed in practice, but something like this with defaults of False will allow people to easily test all the various options. From victor.stinner at gmail.com Wed Apr 13 07:40:44 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 13 Apr 2016 13:40:44 +0200 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! Message-ID: Hi, Last months, most 3.x buildbots failed randomly. Some of them were always failing. I spent some time to fix almost all Windows and Linux buildbots. There were a lot of different issues. So please try not to break the buildbots again and remember to watch them from time to time: http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable In the coming weeks, I will try to backport some fixes to Python 3.5 (if needed) to make these buildbots more stable too. Python 2.7 buildbots are also in a sad state (e.g. test_marshal segfaults on Windows, see issue #25264). But it's not easy to get a Windows machine with the right compiler to develop Python 2.7 on Windows. -- Maybe it's time to move more 3.x buildbots to the "stable" category? http://buildbot.python.org/all/waterfall?category=3.x.stable By the way, I don't understand why "AMD64 OpenIndiana 3.x" is considered as stable since it's failing with multiple issues since many months and nobody is working on these failures. I suggest to move this buildbot back to the unstable category. -- We have many offline buildbots. What's the status of these buildbots? Should we expect them to come back soon? Or would it be possible to hide them? It would help when checking the status of all buildbots. -- Failing buildbots: - AMD64 FreeBSD CURRENT 3.x: http://bugs.python.org/issue26566 -- I installed a fresh FreeBSD CURRENT in a VM and I'm unable to reproduce the failures. Maybe the buildbot slave is outdated and FreeBSD must be upgraded?
- AMD64 OpenIndiana 3.x, x86 OpenIndiana 3.x: test_socket failures on sendfile. Sorry but I'm not really interested in this OS. - PPC64 AIX 3.x: failing tests: test_httplib, test_httpservers, test_socket, test_distutils, test_asyncio, (...); random timeout failure in test_eintr, etc. I don't have access to AIX and I'm not interested to acquire an AIX license, nor to install it. I'm not sure that it's useful to have an AIX buildbot and no core developer have access to AIX, and nobody is working on AIX failures. Maybe HP wants to help us to support AIX? (Provide manpower, access to AIX servers, or something like that.) - x86 OpenBSD 3.x: 5 tests failed, test_crypt test_socket test_ssl test_strptime test_time. This OS needs some love ;-) - the 4 ICC buildbots are failing with stack overflow, segfault, etc. Again, I'm not sure that these buildbots are useful since it looks like we don't support this compiler yet. Or does it help to work on supporting this compiler? Who is working on ICC support? -- FYI I also made some enhancements to regrtest (our test runner for the test suite), mostly to debug failures: - display the duration of tests taking longer than 30 seconds - new timestamp prefix, used to debug buildbot hangs - when parallel tests are interrupted, display progress while waiting for completion - add timeout to main process when using -jN: it should help to debug buildbot hangs - "Run tests in parallel using 3 child processes" or "Run tests sequentially" message which helps to understand how tests are running. There is the -j1 trap which has no effect: tests are still run sequentially. By the way, I proposed to really use subprocesses when -j1 is used: http://bugs.python.org/issue25285 The default timeout changed from 1 hour to 15 min; it's the maximum duration to run a single test file (e.g. test_os.py).
On my Linux box, running the whole test suite in parallel (10 child processes for my 4 CPU cores with hyperthreading) with Python compiled in debug mode (slow) takes 4 min 37 sec. Tell me if the default timeout is too low. It can be configured per buildbot if needed (TESTTIMEOUT env var). -- By the way, I'm always surprised by the huge difference in the time needed to run a build on the different slaves: from a few minutes to more than 3 hours. The fastest Windows slave takes 28 minutes (it runs tests in parallel using 4 child processes), whereas the 3 others (which run tests sequentially) take between 2 hours and more than 3 hours! Why does running tests on Windows take so long? Maybe we should make sure that no buildbot runs tests sequentially, because it creates a lot of annoying side effects (even if sometimes it helps to find tricky bugs, sometimes bugs restricted to the tests themselves) and because a lot of tests simply wait for a few seconds. So running multiple tests in parallel doesn't burn your CPU, it's just faster. IMHO the risk of random timeout failures is low compared to the speedup. -- The most interesting bug was a deadlock in locale.setlocale() on Windows 7: the bug made the buildbot hang "sometimes" (randomly). Jeremy Kloth identified the bug, but Steve Dower told us that it's already fixed in Visual Studio 2015 Update 1: so please update VS if you have not done so yet. Steve added a post-build test to check if the ucrtbase/ucrtbased DLL has the known bug. => http://bugs.python.org/issue26624 Victor From rosuav at gmail.com Wed Apr 13 08:19:34 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 13 Apr 2016 22:19:34 +1000 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: On Wed, Apr 13, 2016 at 9:40 PM, Victor Stinner wrote: > Maybe it's time to move more 3.x buildbots to the "stable" category?
> http://buildbot.python.org/all/waterfall?category=3.x.stable Move the Bruces into stable, perhaps? The AMD64 Debian Root one. Been fairly consistently green. ChrisA From eric at trueblade.com Wed Apr 13 08:32:34 2016 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 13 Apr 2016 08:32:34 -0400 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: <570E3C62.2080305@trueblade.com> On 4/13/2016 7:40 AM, Victor Stinner wrote: > Last months, most 3.x buildbots failed randomly. Some of them were > always failing. I spent some time to fix almost all Windows and Linux > buildbots. There were a lot of different issues. Thanks for all of your work on this, Victor. It's much appreciated. Eric. From mail at timgolden.me.uk Wed Apr 13 08:56:53 2016 From: mail at timgolden.me.uk (Tim Golden) Date: Wed, 13 Apr 2016 13:56:53 +0100 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: <570E4215.5090101@timgolden.me.uk> On 13/04/2016 12:40, Victor Stinner wrote: > Last months, most 3.x buildbots failed randomly. Some of them were > always failing. I spent some time to fix almost all Windows and Linux > buildbots. There were a lot of different issues. Can I state the obvious and offer a huge vote of thanks for this work, which is often tedious and unrewarding? Thank you TJG From stefan at bytereef.org Wed Apr 13 09:13:07 2016 From: stefan at bytereef.org (Stefan Krah) Date: Wed, 13 Apr 2016 13:13:07 +0000 (UTC) Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! References: Message-ID: Victor Stinner gmail.com> writes: > Maybe it's time to move more 3.x buildbots to the "stable" category? > http://buildbot.python.org/all/waterfall?category=3.x.stable +1 I think anything that is actually stable should be in that category. 
> By the way, I don't understand why "AMD64 OpenIndiana 3.x" is > considered as stable since it's failing with multiple issues since > many months and nobody is working on these failures. I suggest to move > this buildbot back to the unstable category. +1 The bot was very stable and fast for some time but has been unstable for at least a year. > - PPC64 AIX 3.x: failing tests: test_httplib, test_httpservers, > test_socket, test_distutils, test_asyncio, (...); random timeout > failure in test_eintr, etc. I don't have access to AIX and I'm not > interested to acquire an AIX license, nor to install it. I'm not sure > that it's useful to have an AIX buildbot and no core developer have > access to AIX, and nobody is working on AIX failures. Maybe HP wants > to help us to support AIX? (Provide manpower, access to AIX servers, > or something like that.) Well, I think in this case it's the gcc AIX maintainer running it, so... I think we should have a policy to stop reporting issues on unstable bots unless someone has a concrete fix OR the bot maintainers are known to fix issues fast (but that does not seem to be the case). Stefan Krah From ncoghlan at gmail.com Wed Apr 13 09:51:02 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 13 Apr 2016 23:51:02 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570D1F26.5090800@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> Message-ID: On 13 April 2016 at 02:15, Ethan Furman wrote: > On 04/11/2016 04:43 PM, Victor Stinner wrote: >> >> On 11 Apr 2016, 11:11 PM, "Ethan Furman" wrote: > > >>> So my concern in such a case is what happens if we pass this SE >>> string somewhere else: a UTF-8 file, or over a socket, or into a >>> database?
Does this have issues that we wouldn't face if we just used >>> bytes? >> >> >> "SE string" are returned by os.listdir(str), os.walk(str), >> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under >> the sun. > > > So when we pass a bytes object in, Python (on posix) converts that to a > string using surrogateescape, gets back strings from the os, and encodes > them back to bytes, again using surrogateescape? On POSIX, if you pass bytes to the os module, it will pass bytes to the underlying system API, and then pass bytes back to your application. The potential SE-strings only come back when you pass str, and the operating system data isn't properly encoded according to the nominal filesystem encoding. They round trip nicely to other operating system APIs, but can indeed be a problem if they escape to other parts of your program (hence ideas like http://bugs.python.org/issue18814#msg251694 and the preceding discussion in that issue). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Apr 13 10:04:29 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Apr 2016 00:04:29 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> <570C8A40.6020903@canterbury.ac.nz> Message-ID: On 13 April 2016 at 02:19, Chris Barker wrote: > So: why use strings as the lingua franca of paths? i.e. the basis of the > path protocol. maybe we should support only two path representations: > > 1) A "proper" path object -- i.e. pathlib.Path or anything else that > supports the path protocol. > > 2) the bytes that the OS actually needs. > > this would mean that the protocol would be to have a __pathbytes__() method > that would return the bytes that should be passed off to the OS.
The reason to favour strings over raw bytes for path manipulation is the same reason to favour them anywhere else: to avoid having to worry about encodings *while* you're manipulating things, and instead only worry about the encoding when actually talking to the OS (which may be UTF-16-LE to talk to a Windows API, or UTF-8 to talk to a *nix API, or something else entirely if your OS is set up that way, or you're writing the path to a file or network packet, rather than using it locally). Regardless of what we decide about os.fspath's return type, that general principle won't change - if you're manipulating bytes paths directly, you're doing something relatively specialised (like working on CPython's own os module). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Wed Apr 13 10:11:13 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 13 Apr 2016 15:11:13 +0100 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> Message-ID: On 13 April 2016 at 14:51, Nick Coghlan wrote: > The potentially SE-strings only come back when you pass str, and the > operating system data isn't properly encoded according to the nominal > filesystem encoding. They round trip nicely to other operating system > APIs, but can indeed be a problem if they escape to other parts of > your program If the operating system APIs handle SE-strings correctly, is it not acceptable to require the fspath protocol to return strings, and then places like DirEntry or Ethan's module, when they want to return bytes, can just SE-encode the bytes and return those? 
Or will the fspath protocol be used at a low enough level that it's *below* the point where SE-encoded strings are handled properly? Paul From ncoghlan at gmail.com Wed Apr 13 10:21:37 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Apr 2016 00:21:37 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> Message-ID: On 14 April 2016 at 00:11, Paul Moore wrote: > On 13 April 2016 at 14:51, Nick Coghlan wrote: >> The potentially SE-strings only come back when you pass str, and the >> operating system data isn't properly encoded according to the nominal >> filesystem encoding. They round trip nicely to other operating system >> APIs, but can indeed be a problem if they escape to other parts of >> your program > > If the operating system APIs handle SE-strings correctly, is it not > acceptable to require the fspath protocol to return strings, and then > places like DirEntry or Ethan's module, when they want to return > bytes, can just SE-encode the bytes and return those? > > Or will the fspath protocol be used at a low enough level that it's > *below* the point where SE-encoded strings are handled properly? I'd expect the main consumers to be os and os.path, and would honestly be surprised if we needed many explicit invocations above that layer, other than in pathlib itself. That's actually the main factor in my suggesting the two level API design - from a protocol consumer perspective, bytes-or-str is a natural fit for os and os.path, while str-only is a natural fit for pathlib. 
I also now believe it makes sense to postpone a final decision on this aspect of the design until after a draft implementation has been put together, as my and Ethan's assumption that os and os.path will be the main consumers is exactly that: an assumption. Putting the draft implementation together will let us know whether or not it's an accurate one. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Wed Apr 13 11:09:36 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 13 Apr 2016 08:09:36 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> Message-ID: <570E6130.6080507@stoneleaf.us> On 04/13/2016 07:21 AM, Nick Coghlan wrote: > On 14 April 2016 at 00:11, Paul Moore wrote: >> On 13 April 2016 at 14:51, Nick Coghlan wrote: >>> The potential SE-strings only come back when you pass str, and the >>> operating system data isn't properly encoded according to the nominal >>> filesystem encoding. They round trip nicely to other operating system >>> APIs, but can indeed be a problem if they escape to other parts of >>> your program >> >> If the operating system APIs handle SE-strings correctly, is it not >> acceptable to require the fspath protocol to return strings, and then >> places like DirEntry or Ethan's module, when they want to return >> bytes, can just SE-encode the bytes and return those? >> >> Or will the fspath protocol be used at a low enough level that it's >> *below* the point where SE-encoded strings are handled properly? 
> > I'd expect the main consumers to be os and os.path, and would honestly > be surprised if we needed many explicit invocations above that layer, > other than in pathlib itself. > > That's actually the main factor in my suggesting the two level API > design - from a protocol consumer perspective, bytes-or-str is a > natural fit for os and os.path, while str-only is a natural fit for > pathlib. > > I also now believe it makes sense to postpone a final decision on this > aspect of the design until after a draft implementation has been put > together, as my and Ethan's assumption that os and os.path will be the > main consumers is exactly that: an assumption. Putting the draft > implementation together will let us know whether or not it's an > accurate one. Sounds reasonable. However, there is still one choice that needs to be made: - a single os.fspath() with an allow_bytes parameter (mostly True in os and os.path, mostly False everywhere else) - a str-only os.fspathname() and a str/bytes os.fspath() I'm partial to the first choice as it is simplicity itself to know when looking at it if bytes might be coming back by the presence or absence of a second argument to the call; otherwise one has to keep straight in one's head which is str-only and which might allow bytes (I'm not very good at keeping similar sounding functions separate -- what's the difference between shutil.copy and shutil.copy2? I have to look it up every time). 
-- ~Ethan~ From random832 at fastmail.com Wed Apr 13 11:17:41 2016 From: random832 at fastmail.com (Random832) Date: Wed, 13 Apr 2016 11:17:41 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> Message-ID: <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote: > I'd expect the main consumers to be os and os.path, and would honestly > be surprised if we needed many explicit invocations above that layer, > other than in pathlib itself. I made a toy implementation to try this out, and making os.open support it does not get you builtin open "for free" as I had suspected; builtin open has its own type checks in _iomodule.c. Probably anything not implemented in pure python that deals with filenames is going to have to have its type checking revised. 
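For readers following the thread, here is a minimal sketch of the first option Ethan describes above (a single function with an allow_bytes parameter). It is not Random832's toy implementation and not a settled design; the function is a plain module-level `fspath()` written only for illustration, `DemoPath` is a hypothetical stand-in for a path-like object, and the exact error messages are assumptions.

```python
class DemoPath:
    """Hypothetical path-like object (stands in for pathlib.Path or DirEntry)."""

    def __init__(self, raw):
        self._raw = raw  # may be str or bytes, like DirEntry.path

    def __fspath__(self):
        return self._raw


def fspath(path, allow_bytes=False):
    """Illustrative stand-in for the proposed os.fspath(); names are assumptions.

    str is always accepted; bytes only when allow_bytes is true.
    """
    if isinstance(path, str):
        return path
    if isinstance(path, bytes):
        if allow_bytes:
            return path
        raise TypeError("bytes path given but allow_bytes is False")
    # Look the method up on the type, as is done for other special methods.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError(
            "expected str, bytes or an object with __fspath__, got "
            + type(path).__name__
        )
    result = method(path)
    if isinstance(result, str):
        return result
    if isinstance(result, bytes) and allow_bytes:
        return result
    raise TypeError(
        "__fspath__ returned " + type(result).__name__ + ", which is not allowed here"
    )


print(fspath(DemoPath("spam/eggs")))                      # spam/eggs
print(fspath(DemoPath(b"spam/eggs"), allow_bytes=True))   # b'spam/eggs'
```

With this shape, a str-only caller (the pathlib-style consumer) simply omits the second argument, while os and os.path style consumers would pass allow_bytes=True. As Random832 notes, code that does its own type checks outside pure Python would still need to call such a function explicitly.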
From ethan at stoneleaf.us Wed Apr 13 11:28:28 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 13 Apr 2016 08:28:28 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> Message-ID: <570E659C.8010108@stoneleaf.us> On 04/13/2016 08:17 AM, Random832 wrote: > On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote: >> I'd expect the main consumers to be os and os.path, and would honestly >> be surprised if we needed many explicit invocations above that layer, >> other than in pathlib itself. > > I made a toy implementation to try this out, and making os.open support > it does not get you builtin open "for free" as I had suspected; builtin > open has its own type checks in _iomodule.c. Yup, it will take some effort to make this work. > Probably anything not implemented in pure python that deals with > filenames is going to have to have its type checking revised. Agreed. You can see why there was no point in pursuing the conversation unless someone was willing to do the work. 
-- ~Ethan~ From fred at fdrake.net Wed Apr 13 12:18:36 2016 From: fred at fdrake.net (Fred Drake) Date: Wed, 13 Apr 2016 12:18:36 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570E6130.6080507@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us> Message-ID: On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman wrote: > - a single os.fspath() with an allow_bytes parameter > (mostly True in os and os.path, mostly False everywhere > else) -0 > - a str-only os.fspathname() and a str/bytes os.fspath() +1 on using separate functions. > I'm partial to the first choice as it is simplicity itself to know when > looking at it if bytes might be coming back by the presence or absence of a > second argument to the call; otherwise one has to keep straight in one's > head which is str-only and which might allow bytes (I'm not very good at > keeping similar sounding functions separate -- what's the difference between > shutil.copy and shutil.copy2? I have to look it up every time). I do the same, but... this is one of those cases where a caller will usually be passing a constant directly. If passed as a positional argument, it'll just be confusing ("what's True?" is my usual reaction to a Boolean positional argument). If passed as a keyword argument with a descriptive name, it'll be longer than I'd like to see: path_str = os.fspath(path, allow_bytes=True) Names like os.fspath() and os.fssyspath() seem good to me. -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." 
--Albert Einstein

From victor.stinner at gmail.com Wed Apr 13 12:24:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 18:24:44 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
Message-ID:

Hi,

In the middle of recent discussions about Python performance, it was
discussed to change the Python bytecode. Serhiy proposed to reuse
MicroPython short bytecode to reduce the disk space and reduce the
memory footprint.

Demur Rumed proposes a different change to use a regular bytecode
using 16-bit units: an instruction has always one 8-bit argument, it's
zero if the instruction doesn't have an argument:

http://bugs.python.org/issue26647

According to benchmarks, it looks faster:

http://bugs.python.org/issue26647#msg263339

IMHO it's a nice enhancement: it makes the code simpler. The most
interesting change is made in Python/ceval.c:

-    if (HAS_ARG(opcode))
-        oparg = NEXTARG();
+    oparg = NEXTARG();

This code is the very hot loop evaluating Python bytecode. I expect
that removing a conditional branch here can reduce the CPU branch
misprediction.

I reviewed first versions of the change, and IMHO it's almost ready to
be merged. But I would prefer to have a review from at least a second
core reviewer.

Can someone please review the change?

--

The side effect of wordcode is that arguments in 0..255 now use 2
bytes per instruction instead of 3, so it also reduces the size of
bytecode for the most common case.

Larger arguments, 16-bit arguments (0..65,535), now use 4 bytes instead
of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6
bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses a 16-bit
argument for keyword defaults and a 24-bit argument for annotations.
Other common instructions known to use large arguments are jumps for
bytecode longer than 256 bytes.

--

Right now, ceval.c still fetches opcode and then oparg with two 8-bit
instructions.
Later, we can discuss if it would be possible to ensure that the bytecode is always aligned to 16-bit in memory to fetch the two bytes using a uint16_t* pointer. Maybe we can overallocate 1 byte in codeobject.c and align manually the memory block if needed. Or ceval.c should maybe copy the code if it's not aligned? Raymond Hettinger proposes something like that, but it looks like there are concerns about non-aligned memory accesses: http://bugs.python.org/issue25823 The cost of non-aligned memory accesses depends on the CPU architecture, but it can raise a SIGBUS on some arch (MIPS and SPARC?). Victor From brett at python.org Wed Apr 13 12:26:34 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 16:26:34 +0000 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: Message-ID: Glad it's working again! And it was a combination or R. David Murray, Ezio Melotti, Mark Mangoba ( http://pyfound.blogspot.com/2016/04/the-psf-has-hired-it-manager.html in case you don't know who Mark is), and myself along with Upfront (b.p.o hosting provider). On Tue, 12 Apr 2016 at 21:40 Terry Reedy wrote: > On 4/4/2016 5:05 PM, Terry Reedy wrote: > > Since a few days, I am getting bug tracker emails again, in my Inbox. I > just got a Rietveld review in the Inbox and I believe it went there > directly instead of first to Junk. Thank you to whoever made the > improvements. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Wed Apr 13 12:27:48 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 13 Apr 2016 17:27:48 +0100 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us> Message-ID: On 13 April 2016 at 17:18, Fred Drake wrote: > Names like os.fspath() and os.fssyspath() seem good to me. -1 on fssyspath - the "system" representation is bytes on POSIX, but not on Windows. Let's be explicit and go with fsbytespath(). But agreed that always-constant boolean parameters are a bad idea. The hard bit is good naming of the separate functions (100% agree that shutil is a good example of how not to do it :-)) Paul From ethan at stoneleaf.us Wed Apr 13 12:30:37 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 13 Apr 2016 09:30:37 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us> Message-ID: <570E742D.2050303@stoneleaf.us> On 04/13/2016 09:18 AM, Fred Drake wrote: > On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman wrote: >> - a single os.fspath() with an allow_bytes parameter >> (mostly True in os and os.path, mostly False everywhere >> else) > > -0 > >> - a str-only os.fspathname() and a str/bytes os.fspath() > > +1 on using separate functions. > Names like os.fspath() and os.fssyspath() seem good to me. Ooh, I like that! I could probably keep those names separate in my head. 
:) -- ~Ethan~ From ethan at stoneleaf.us Wed Apr 13 12:31:32 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 13 Apr 2016 09:31:32 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us> Message-ID: <570E7464.2070008@stoneleaf.us> On 04/13/2016 09:27 AM, Paul Moore wrote: > On 13 April 2016 at 17:18, Fred Drake wrote: >> Names like os.fspath() and os.fssyspath() seem good to me. > > -1 on fssyspath - the "system" representation is bytes on POSIX, but > not on Windows. Let's be explicit and go with fsbytespath(). It will be confusing that fsbytespath() can return a string. -- ~Ethan~ From fred at fdrake.net Wed Apr 13 12:31:09 2016 From: fred at fdrake.net (Fred Drake) Date: Wed, 13 Apr 2016 12:31:09 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us> Message-ID: On Wed, Apr 13, 2016 at 12:27 PM, Paul Moore wrote: > -1 on fssyspath - the "system" representation is bytes on POSIX, but > not on Windows. Let's be explicit and go with fsbytespath(). Depends on the semantics; if we're expecting it to return str-or-bytes, os.fssyspath() seems fine. If only returning bytes (not sure that ever makes sense on Windows, since I don't use Windows), then I'd be happy with os.fsbytespath(). -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." 
--Albert Einstein From guido at python.org Wed Apr 13 12:33:34 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 13 Apr 2016 09:33:34 -0700 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: References: Message-ID: Nice work. I think that for CPython, speed is much more important than memory use for the code. Disk space is practically free for anything smaller than a video. :-) On Wed, Apr 13, 2016 at 9:24 AM, Victor Stinner wrote: > Hi, > > In the middle of recent discussions about Python performance, it was > discussed to change the Python bytecode. Serhiy proposed to reuse > MicroPython short bytecode to reduce the disk space and reduce the > memory footprint. > > Demur Rumed proposes a different change to use a regular bytecode > using 16-bit units: an instruction has always one 8-bit argument, it's > zero if the instruction doesn't have an argument: > > http://bugs.python.org/issue26647 > > According to benchmarks, it looks faster: > > http://bugs.python.org/issue26647#msg263339 > > IMHO it's a nice enhancement: it makes the code simpler. The most > interesting change is made in Python/ceval.c: > > - if (HAS_ARG(opcode)) > - oparg = NEXTARG(); > + oparg = NEXTARG(); > > This code is the very hot loop evaluating Python bytecode. I expect > that removing a conditional branch here can reduce the CPU branch > misprediction. > > I reviewed first versions of the change, and IMHO it's almost ready to > be merged. But I would prefer to have a review from a least a second > core reviewer. > > Can someone please review the change? > > -- > > The side effect of wordcode is that arguments in 0..255 now uses 2 > bytes per instruction instead of 3, so it also reduce the size of > bytecode for the most common case. > > Larger argument, 16-bit argument (0..65,535), now uses 4 bytes instead > of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6 > bytes), 32-bit uses 4 units (8 bytes). 
MAKE_FUNCTION uses 16-bit > argument for keyword defaults and 24-bit argument for annotations. > Other common instruction known to use large argument are jumps for > bytecode longer than 256 bytes. > > -- > > Right now, ceval.c still fetchs opcode and then oparg with two 8-bit > instructions. Later, we can discuss if it would be possible to ensure > that the bytecode is always aligned to 16-bit in memory to fetch the > two bytes using a uint16_t* pointer. > > Maybe we can overallocate 1 byte in codeobject.c and align manually > the memory block if needed. Or ceval.c should maybe copy the code if > it's not aligned? > > Raymond Hettinger proposes something like that, but it looks like > there are concerns about non-aligned memory accesses: > > http://bugs.python.org/issue25823 > > The cost of non-aligned memory accesses depends on the CPU > architecture, but it can raise a SIGBUS on some arch (MIPS and > SPARC?). > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Wed Apr 13 12:41:11 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 13 Apr 2016 17:41:11 +0100 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570E7464.2070008@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us> <570E7464.2070008@stoneleaf.us> Message-ID: On 13 April 2016 at 17:31, Ethan Furman wrote: > On 04/13/2016 09:27 AM, Paul Moore wrote: >> >> On 13 April 2016 at 17:18, Fred Drake wrote: > > >>> Names like os.fspath() and os.fssyspath() seem good to me. 
>> >> >> -1 on fssyspath - the "system" representation is bytes on POSIX, but >> not on Windows. Let's be explicit and go with fsbytespath(). > > > It will be confusing that fsbytespath() can return a string. Oh, wait, yes fssyspath is for allow_bytes=True which *may* be bytes, but could still be a string. My mistake. On that basis, I could go with fssyspath (thinking "sys" = "low level"). Paul From brett at python.org Wed Apr 13 12:43:31 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 16:43:31 +0000 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: On Wed, 13 Apr 2016 at 06:14 Stefan Krah wrote: > Victor Stinner gmail.com> writes: > > Maybe it's time to move more 3.x buildbots to the "stable" category? > > http://buildbot.python.org/all/waterfall?category=3.x.stable > > +1 I think anything that is actually stable should be in that category. > > > > By the way, I don't understand why "AMD64 OpenIndiana 3.x" is > > considered as stable since it's failing with multiple issues since > > many months and nobody is working on these failures. I suggest to move > > this buildbot back to the unstable category. > > +1 The bot was very stable and fast for some time but has been unstable > for at least a year. > > > > > - PPC64 AIX 3.x: failing tests: test_httplib, test_httpservers, > > test_socket, test_distutils, test_asyncio, (...); random timeout > > failure in test_eintr, etc. I don't have access to AIX and I'm not > > interested to acquire an AIX license, nor to install it. I'm not sure > > that it's useful to have an AIX buildbot and no core developer have > > access to AIX, and nobody is working on AIX failures. Maybe HP wants > > to help us to support AIX? (Provide manpower, access to AIX servers, > > or something like that.) > > Well, I think in this case it's the gcc AIX maintainer running it, so... 
> > > I think we should have a policy to stop reporting issues on unstable > bots unless someone has a concrete fix OR the bot maintainers are > known to fix issues fast (but that does not seem to be the case). > Official policy per https://www.python.org/dev/peps/pep-0011/#supporting-platforms states that there must be a core developer to maintain the compatibility, so if there's no one helping to keep a particular buildbot green then I agree it should be marked as unstable and thus not supported. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Apr 13 12:44:08 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 16:44:08 +0000 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: <570E4215.5090101@timgolden.me.uk> References: <570E4215.5090101@timgolden.me.uk> Message-ID: On Wed, 13 Apr 2016 at 05:57 Tim Golden wrote: > On 13/04/2016 12:40, Victor Stinner wrote: > > Last months, most 3.x buildbots failed randomly. Some of them were > > always failing. I spent some time to fix almost all Windows and Linux > > buildbots. There were a lot of different issues. > > Can I state the obvious and offer a huge vote of thanks for this work, > which is often tedious and unrewarding? > Yep, big thanks from me as well! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Wed Apr 13 12:51:29 2016 From: random832 at fastmail.com (Random832) Date: Wed, 13 Apr 2016 12:51:29 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570E659C.8010108@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> Message-ID: <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> On Wed, Apr 13, 2016, at 11:28, Ethan Furman wrote: > On 04/13/2016 08:17 AM, Random832 wrote: > > On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote: > > >> I'd expect the main consumers to be os and os.path, and would honestly > >> be surprised if we needed many explicit invocations above that layer, > >> other than in pathlib itself. > > > > I made a toy implementation to try this out, and making os.open support > > it does not get you builtin open "for free" as I had suspected; builtin > > open has its own type checks in _iomodule.c. > > Yup, it will take some effort to make this work. A corner case just occurred to me... For functions that will continue to accept str/bytes (and functions that accept some other type such as Number or file-like objects), what should be done with an object that is one of these, *and* has an __fspath__ method, *and* this method returns a value other than the object's own value? Basically, should the protocol check be done unconditionally (before attempting to use the argument as a string) or only if the argument is not a string (there's an efficiency argument for this). 
Or should it be left "unspecified", with the understanding that such objects are badly behaved and may not be handled consistently across different functions / python implementations / cpython versions? Also, should the os.fspath (or whatever we call it) function itself accept str/bytes, even if these are not going to implement the protocol? From brett at python.org Wed Apr 13 12:58:22 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 16:58:22 +0000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us> Message-ID: On Wed, 13 Apr 2016 at 09:19 Fred Drake wrote: > On Wed, Apr 13, 2016 at 11:09 AM, Ethan Furman wrote: > > - a single os.fspath() with an allow_bytes parameter > > (mostly True in os and os.path, mostly False everywhere > > else) > > -0 > > > - a str-only os.fspathname() and a str/bytes os.fspath() > > +1 on using separate functions. > > > I'm partial to the first choice as it is simplicity itself to know when > > looking at it if bytes might be coming back by the presence or absence > of a > > second argument to the call; otherwise one has to keep straight in one's > > head which is str-only and which might allow bytes (I'm not very good at > > keeping similar sounding functions separate -- what's the difference > between > > shutil.copy and shutil.copy2? I have to look it up every time). > > I do the same, but... this is one of those cases where a caller will > usually be passing a constant directly. If passed as a positional > argument, it'll just be confusing ("what's True?" is my usual reaction > to a Boolean positional argument). It would be keyword-only so this isn't even a possibility. 
> If passed as a keyword argument
> with a descriptive name, it'll be longer than I'd like to see:
>
>     path_str = os.fspath(path, allow_bytes=True)
>

I think the expectation that the number of people actually directly calling
this function with that argument specified is going to be rather small, so
the common-case will simply be:

    path_str = os.fspath(path)

>
> Names like os.fspath() and os.fssyspath() seem good to me.
>

-Brett

From Nikolaus at rath.org Wed Apr 13 12:59:35 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Wed, 13 Apr 2016 09:59:35 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To: <570E6130.6080507@stoneleaf.us> (Ethan Furman's message of "Wed, 13 Apr 2016 08:09:36 -0700")
References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us>
Message-ID: <87y48hcumg.fsf@thinkpad.rath.org>

On Apr 13 2016, Ethan Furman wrote:
> (I'm not very good at keeping similar sounding functions separate --
> what's the difference between shutil.copy and shutil.copy2? I have to
> look it up every time).

Well, "2" is more than "" (or 1), so copy2() copies *more* than copy() -
it includes the metadata. That always helps me.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
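
The copy()/copy2() distinction mentioned above can be checked with a small experiment. The following is only a sketch, not part of the thread: the helper name `copy_mtimes` and the file names are made up for illustration, and it relies on the documented behaviour that shutil.copy() copies data and permission bits while shutil.copy2() additionally tries to preserve metadata such as the modification time.

```python
import os
import shutil
import tempfile


def copy_mtimes(old_mtime=10**9):
    """Copy one file with copy() and copy2(); return (src, copy, copy2) mtimes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.txt")
        with open(src, "w") as f:
            f.write("hello")
        # Backdate the source so a freshly written timestamp is easy to spot.
        os.utime(src, (old_mtime, old_mtime))

        # copy(): file data and permission bits only.
        plain = shutil.copy(src, os.path.join(tmp, "plain.txt"))
        # copy2(): file data plus metadata, on a best-effort basis.
        full = shutil.copy2(src, os.path.join(tmp, "full.txt"))

        return (os.path.getmtime(src),
                os.path.getmtime(plain),
                os.path.getmtime(full))


src_mtime, plain_mtime, full_mtime = copy_mtimes()
print(src_mtime, plain_mtime, full_mtime)
```

On a typical filesystem the copy2() result keeps the backdated timestamp, while the copy() result carries the time at which the copy was made.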
From ethan at stoneleaf.us Wed Apr 13 13:06:33 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 13 Apr 2016 10:06:33 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <570E6130.6080507@stoneleaf.us> Message-ID: <570E7C99.3020402@stoneleaf.us> On 04/13/2016 09:58 AM, Brett Cannon wrote:> On Wed, 13 Apr 2016 at 09:19 Fred Drake wrote: >> I do the same, but... this is one of those cases where a caller will >> usually be passing a constant directly. If passed as a positional >> argument, it'll just be confusing ("what's True?" is my usual >> reaction to a Boolean positional argument). > > It would be keyword-only so this isn't even a possibility. > >> If passed as a keyword argument >> with a descriptive name, it'll be longer than I'd like to see: >> >> path_str = os.fspath(path, allow_bytes=True) > > I think the expectation that the number of people actually directly > calling this function with that argument specified is going to be > rather small, so the common-case will simply be: > > path_str = os.fspath(path) That is certainly my expectation. :) >> Names like os.fspath() and os.fssyspath() seem good to me. A single function is definitely my preference, but if that's not possible then I'm fine with that pair of names. -- ~Ethan~ From brett at python.org Wed Apr 13 13:10:09 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 17:10:09 +0000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev < python-dev at python.org> wrote: > Ethan Furman stoneleaf.us> writes: > > > Do we allow bytes to be returned from os.fspath()? 
If yes, then do we > > allow bytes from __fspath__()? > > De-lurking. Especially since the ultimate goal is better interoperability, > I > feel like an implementation that people can play with would help guide the > few remaining decisions. To help test the various options you could > temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to > both > pathlib.__fspath__() and os.fspath(), with distinct configurable defaults > for > each. > > In the spirit of Python 3 I feel like bytes might not be needed in > practice, > but something like this with defaults of False will allow people to easily > test all the various options. > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed). -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Apr 13 13:20:07 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 17:20:07 +0000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> Message-ID: On Wed, 13 Apr 2016 at 09:52 Random832 wrote: > On Wed, Apr 13, 2016, at 11:28, Ethan Furman wrote: > > On 04/13/2016 08:17 AM, Random832 wrote: > > > On Wed, Apr 13, 2016, at 10:21, Nick Coghlan wrote: > > > > >> I'd expect the main consumers to be os and os.path, and 
would honestly
> > >> be surprised if we needed many explicit invocations above that layer,
> > >> other than in pathlib itself.
> > >
> > > I made a toy implementation to try this out, and making os.open support
> > > it does not get you builtin open "for free" as I had suspected; builtin
> > > open has its own type checks in _iomodule.c.
> >
> > Yup, it will take some effort to make this work.
>
> A corner case just occurred to me...
>
> For functions that will continue to accept str/bytes (and functions that
> accept some other type such as Number or file-like objects), what should
> be done with an object that is one of these, *and* has an __fspath__
> method, *and* this method returns a value other than the object's own
> value? Basically, should the protocol check be done unconditionally
> (before attempting to use the argument as a string) or only if the
> argument is not a string (there's an efficiency argument for this). Or
> should it be left "unspecified", with the understanding that such
> objects are badly behaved and may not be handled consistently across
> different functions / python implementations / cpython versions?
>
> Also, should the os.fspath (or whatever we call it) function itself
> accept str/bytes, even if these are not going to implement the protocol?
>

All of this is demonstrated in
https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by the
various possibilities. In the end it's not a corner case because the
definition of __fspath__ will be such that there's no ambiguity in what
os.fspath() will accept and what __fspath__ can return and the code will be
written to conform to what the PEP dictates (IOW I'm aware that this needs
to be considered in the implementation :) .
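
The corner case raised above (an object that is a str *and* has an __fspath__ that disagrees with it) can be made concrete. This is a sketch under stated assumptions, not code from the gist or from any proposed patch: `OddStr`, `fspath_protocol_first` and `fspath_str_first` are hypothetical names, and the two helpers exist only to show that the two orderings of the check give different answers for such an object.

```python
class OddStr(str):
    """Hypothetical badly behaved object: a str whose __fspath__ disagrees with it."""

    def __fspath__(self):
        return "/from/protocol"


def fspath_protocol_first(path):
    """Consult the protocol unconditionally, before looking at the type."""
    try:
        method = type(path).__fspath__
    except AttributeError:
        pass
    else:
        path = method(path)
    if isinstance(path, str):
        return path
    raise TypeError("expected str or path-like object, got " + type(path).__name__)


def fspath_str_first(path):
    """Accept a str as-is; consult the protocol only for non-strings."""
    if isinstance(path, str):
        return path
    try:
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected str or path-like object, got "
                        + type(path).__name__) from None
    return method(path)


odd = OddStr("/as/string")
print(fspath_protocol_first(odd))  # the value returned by __fspath__
print(fspath_str_first(odd))       # the string value of the object itself
```

For ordinary strings and ordinary path-like objects both helpers agree; they differ only for objects like `OddStr`, which is why the question is whether to specify the order or leave such objects undefined.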

From tritium-list at sdamon.com Wed Apr 13 13:22:48 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Wed, 13 Apr 2016 13:22:48 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID: <570E8068.4060303@sdamon.com>

On 4/13/2016 13:10, Brett Cannon wrote:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow
> the "separate functions" approach some are proposing and instead goes
> with the allow_bytes approach I originally proposed).

Number 4 is my personal favorite - it has a simple control flow path and
is the least needlessly restrictive.

(I could rant about needless restrictions, but I am about a decade late
for that, so I won't bother.)

From ethan at stoneleaf.us Wed Apr 13 13:49:48 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 10:49:48 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570E8068.4060303@sdamon.com>
References: <570C1E13.4090909@stoneleaf.us> <570E8068.4060303@sdamon.com>
Message-ID: <570E86BC.2030703@stoneleaf.us>

On 04/13/2016 10:22 AM, Alexander Walters wrote:
> On 4/13/2016 13:10, Brett Cannon wrote:
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
>> has the four potential approaches implemented (although it doesn't
>> follow the "separate functions" approach some are proposing and
>> instead goes with the allow_bytes approach I originally proposed).
>
> Number 4 is my personal favorite - it has a simple control flow path and
> is the least needlessly restrictive.

Number 3: it allows bytes, but only when told it's okay to do so.
Having code get a bytes object when one is not expected is not a
headache we need to inflict on anyone.

--
~Ethan~

From antoine at python.org Wed Apr 13 14:25:52 2016
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 13 Apr 2016 18:25:52 +0000 (UTC)
Subject: [Python-Dev] pathlib - current status of discussions
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

Brett Cannon python.org> writes:
> In the spirit of Python 3 I feel like bytes might not be needed in practice,
> but something like this with defaults of False will allow people to easily
> test all the various options.
>
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed).

Either number 1 or number 3 for me (I don't think bytes path-like
objects are useful in Python).

Regards

Antoine.

From rosuav at gmail.com Wed Apr 13 15:24:35 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 14 Apr 2016 05:24:35 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon wrote:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

All of them have this construct:

try:
    path = path.__fspath__()
except AttributeError:
    pass

Is that the intention, or should the exception catching be narrower?
I know it's clunky to write it in Python, but AIUI it's less so in C:

try:
    callme = path.__fspath__
except AttributeError:
    pass
else:
    path = callme()

ChrisA

From brett at python.org Wed Apr 13 15:30:30 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 19:30:30 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

On Wed, 13 Apr 2016 at 12:25 Chris Angelico wrote:

> On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon wrote:
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> > four potential approaches implemented (although it doesn't follow the
> > "separate functions" approach some are proposing and instead goes with the
> > allow_bytes approach I originally proposed).
>
> All of them have this construct:
>
> try:
>     path = path.__fspath__()
> except AttributeError:
>     pass
>
> Is that the intention, or should the exception catching be narrower? I
> know it's clunky to write it in Python, but AIUI it's less so in C:
>
> try:
>     callme = path.__fspath__
> except AttributeError:
>     pass
> else:
>     path = callme()
>

I'm assuming the C code will do what you're suggesting. My way is just
faster to write in 2 minutes of coding. :)

From fred at fdrake.net Wed Apr 13 15:36:12 2016
From: fred at fdrake.net (Fred Drake)
Date: Wed, 13 Apr 2016 15:36:12 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico wrote:
> Is that the intention, or should the exception catching be narrower? I
> know it's clunky to write it in Python, but AIUI it's less so in C:
>
> try:
>     callme = path.__fspath__
> except AttributeError:
>     pass
> else:
>     path = callme()

+1 for this variant; I really don't like masking errors inside the
__fspath__ implementation.
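
The difference between the two constructs discussed above shows up when __fspath__ itself contains a bug. The following is only an illustrative sketch: the class `Buggy` and the helpers `broad` and `narrow` are hypothetical names introduced here, wrapping the two try/except shapes quoted in the thread.

```python
class Buggy:
    """Hypothetical path-like object with a bug inside __fspath__."""

    def __fspath__(self):
        # AttributeError is raised *inside* the method, not by the lookup.
        return self.no_such_attribute


def broad(path):
    # The try block covers both the attribute lookup and the call.
    try:
        path = path.__fspath__()
    except AttributeError:
        pass
    return path


def narrow(path):
    # The try block covers only the attribute lookup.
    try:
        callme = path.__fspath__
    except AttributeError:
        pass
    else:
        path = callme()
    return path


obj = Buggy()

# Broad form: the bug is swallowed and the object comes back unchanged.
masked = broad(obj) is obj

# Narrow form: the AttributeError from inside __fspath__ propagates.
try:
    narrow(obj)
except AttributeError:
    surfaced = True
else:
    surfaced = False

print(masked, surfaced)  # prints: True True
```

Both helpers behave identically for objects without __fspath__ (a plain str is returned untouched); they differ only in whether an error raised by the method body is hidden from the caller.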

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From rosuav at gmail.com Wed Apr 13 15:37:30 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 14 Apr 2016 05:37:30 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

On Thu, Apr 14, 2016 at 5:30 AM, Brett Cannon wrote:
>
>
> On Wed, 13 Apr 2016 at 12:25 Chris Angelico wrote:
>>
>> On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon wrote:
>> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
>> > the four potential approaches implemented (although it doesn't follow the
>> > "separate functions" approach some are proposing and instead goes with
>> > the allow_bytes approach I originally proposed).
>>
>> All of them have this construct:
>>
>> try:
>>     path = path.__fspath__()
>> except AttributeError:
>>     pass
>>
>> Is that the intention, or should the exception catching be narrower? I
>> know it's clunky to write it in Python, but AIUI it's less so in C:
>>
>> try:
>>     callme = path.__fspath__
>> except AttributeError:
>>     pass
>> else:
>>     path = callme()
>
>
> I'm assuming the C code will do what you're suggesting. My way is just
> faster to write in 2 minutes of coding. :)

Cool cool. Just checking!

You're already aware that my preference is for the first one,
str-only. I don't think the second one has much value (a path-like
object can only ever return a str, but a bytes can be passed through
unchanged?), and the fourth strikes me as a bad idea (just allowing
bytes any time). So my votes are +1, -0.5, +0, -1.
ChrisA From tritium-list at sdamon.com Wed Apr 13 15:42:43 2016 From: tritium-list at sdamon.com (Alexander Walters) Date: Wed, 13 Apr 2016 15:42:43 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570E86BC.2030703@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> <570E8068.4060303@sdamon.com> <570E86BC.2030703@stoneleaf.us> Message-ID: <570EA133.6030504@sdamon.com> On 4/13/2016 13:49, Ethan Furman wrote: > Number 3: it allows bytes, but only when told it's okay to do so. > Having code get a bytes object when one is not expected is not a > headache we need to inflict on anyone. This is an artifact of the other needless restrictions I said I wouldn't rant about. I think it is in the best interest not to perpetuate those needless restrictions. From random832 at fastmail.com Wed Apr 13 15:46:37 2016 From: random832 at fastmail.com (Random832) Date: Wed, 13 Apr 2016 15:46:37 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: <1460576797.3970855.577940577.074B7E19@webmail.messagingengine.com> On Wed, Apr 13, 2016, at 15:24, Chris Angelico wrote: > Is that the intention, or should the exception catching be narrower? I > know it's clunky to write it in Python, but AIUI it's less so in C: How is it less so in C? You lose the ability to PyObject_CallMethod. From rosuav at gmail.com Wed Apr 13 15:54:37 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 14 Apr 2016 05:54:37 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <1460576797.3970855.577940577.074B7E19@webmail.messagingengine.com> References: <570C1E13.4090909@stoneleaf.us> <1460576797.3970855.577940577.074B7E19@webmail.messagingengine.com> Message-ID: On Thu, Apr 14, 2016 at 5:46 AM, Random832 wrote: > On Wed, Apr 13, 2016, at 15:24, Chris Angelico wrote: >> Is that the intention, or should the exception catching be narrower? 
I >> know it's clunky to write it in Python, but AIUI it's less so in C: > > How is it less so in C? You lose the ability to PyObject_CallMethod. I might be wrong, then. Wasn't sure how it was all implemented. Anyway, it's a correctness thing, not a simplicity one, so even if it is clunkier, it ought to be the case. And that is the intention, so we're fine. ChrisA From brett at python.org Wed Apr 13 15:54:44 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 19:54:44 +0000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: On Wed, 13 Apr 2016 at 12:39 Fred Drake wrote: > On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico wrote: > > Is that the intention, or should the exception catching be narrower? I > > know it's clunky to write it in Python, but AIUI it's less so in C: > > > > try: > > callme = path.__fspath__ > > except AttributeError: > > pass > > else: > > path = callme() > > +1 for this variant; I really don't like masking errors inside the > __fspath__ implementation. > Don't read too much into the code in that gist. I just did them quickly to get the point across of the proposals in terms of str/bytes, not what will be proposed in any final patch. -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Wed Apr 13 15:59:50 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 13 Apr 2016 22:59:50 +0300 Subject: [Python-Dev] List posting custom [was: current status of discussions] In-Reply-To: <22285.46450.255405.357217@turnbull.sk.tsukuba.ac.jp> References: <570C1E13.4090909@stoneleaf.us> <-9219200259368253896@unknownmsgid> <570C8EEE.6050904@stoneleaf.us> <570D167C.8040202@mail.de> <22285.46450.255405.357217@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Apr 13, 2016 at 5:56 AM, Stephen J. 
Turnbull wrote: > The following is my opinion, as will become obvious, but it's based on > over a decade of observing these lists, and other open source > development lists. In a context where some core developers have > unsubscribed from these lists, and others regularly report muting > threads with a certain air of asperity, I think it's worth the risk of > seeming arrogant to explain some of the customs (which are complex and > subtle) around posting to Python developer lists. I'm posting > publicly because there are several new developers whose activity and > fresh perspective is very welcome, but harmony *is* being disturbed, > IMO unnecessarily. > Thank you for this thoughtful post. While none of the quotes you refer to are mine, I did try to find whether any of the advice is something I should learn from. While I didn't find a whole lot (please do correct me if you think otherwise), it is also valuable to hear these things from someone more experienced, even just to confirm what I may have thought or guessed. I can't really tell, but possibly some of the thoughts are interesting even to people significantly more experienced than me. I know you are not interested in discussing this further here, but I'll add some inexperienced points of view inline below, just in case someone is interested: > This particular post caught my eye, but it's only an example of one of > the most unharmonious posting styles that has become common recently. > Attribution deliberately removed. > > > Sorry for disturbing this thread's harmony. > > *sigh* There is way too much of this on Python-Ideas recently, and > there shouldn't be any on Python-Dev. So please don't. Specifically, > disagreement with an apparently developing consensus is fine but > please avoid this: > > > >> Path is an alternative to os.path -- you don't need to use both. > > > > I agree with that quote of Chris. 
> > It's a waste of time to post *what* you agree with.[1] Decisions are > not taken by vote in this community, except for the color of the > bikeshed, where it is agreed that *what* decision is taken doesn't > matter, but that some decision should be taken expeditiously.[2] > Chris already stated this position clearly and it's not a "color", so > there is no need to reiterate. It simply wastes others' time to read > it. (Whether it was a waste of the poster's time is not for me to > comment on.) > > What matters to the decision is *why* you agree (or disagree). If you > think that some of Chris's arguments are bogus (and should be > disregarded) and others are important, that is valuable information. > It's even better if you can shed additional light on the matter > (example below). > > Also, expression of agreement is often a prelude to a request for > information. "I agree with Z's post. At least, I have never needed > X. *When* do you need X? Let's look for a better way than X!" > That's what I thought too. I remember several times recently that I have mentioned I agreed about something, then continuing to add more to it, or even saying I disagree about something else. Part of the reason to also state that I agree is an attempt to keep the overall tone more positive. After all, the other person might be a highly experienced core developer who just did not happen to have gone though all the same thoughts regarding that specific question recently. I hope that has not been interpreted as arrogance such as "I know better than these people". For me, as one of the (many?) newcomers, especially on -dev, it can sometimes be difficult to tell whether not getting a reaction means "Good point, I agree", "I did not understand so I'll just ignore it", "I don't want to argue with you" or something else. Then again, someone just saying essentially the same thing without a reference a few posts later just feels strange. 
Also, if the only thing people apparently do is disagree about things, it makes the overall tone of the discussions at least *seem* very negative. From this point of view there seems to be some good in positive comments. > Unsupported (dis)agreement to statements about "needs" also may be > taken as *rude*, because others may infer your arrogant claim to know > what *they* do or don't need. Admittedly there's a difficult > distinction here between Chris's *idiom* where "you don't need to" > translates to "In my understanding, it is generally not necessary to", > and your *unsupported* agreement, which in my dialect of English > changes the emphasis to imply you know better than those who disagree > with you and Chris. And, of course, the position that others are "too > easily offended" is often reasonable, but you should be aware that > there will be an impact on your reputation and ability to influence > development of Python (even if it doesn't come near the point where > a moderator invokes "Code of Conduct"). > > "Me too" posts aren't entirely forbidden, but I feel that in Python > custom they are most appropriate when voting on bikeshed colors, and > as applause for a *technically* excellent suggestion. They should be > avoided in the context of value judgments (of "need" and "simplicity", > for example) for the reason given above. Personally, I've sometimes feeled the urge to give a positive comment just to make sure something gets noticed, or to help keep the discussion *not* go around in circles by pointing out more clearly the important points to the people not as involved in the topic of discussion. But I've tried to resist this urge when I don't have anything to add. I find the notion of S/N (signal-to-noise ratio), which you in fact brought up recently in another thread, very important. 
> > When people want to use your library and it requires a string, the > > can simply use "my_path.path" and everything still works for them > > when they switch to pathlib. > > This is disrespectful in tone. I don't know if you're responding to > Ethan here, but he's one of the authors in question. We *know* that > Ethan doesn't like such inelegant idioms -- he said so -- where "this > object has an appropriate conversion to your argument type, so you > should apply it implicitly" is unambiguous.[3] So for him, it's *not* > so simple. Since it's not a matter of voting, each proponent should > provide more contexts where preferred programming idioms are > "Pythonic" to sway the sense of the community, or if necessary, the > BDFL. > > Where that aesthetic came up was in the context of consistently > wrapping arguments that might be Paths in str, as in > > p = Path(*stuff) or defaultstring > # 500 lines crossing function and module boundaries! > with open(str(p)) as f: > process(f) > > I think it was Nick who posted agreement with Ethan on the aesthetics > of str-wrapping. If that were all, he probably wouldn't have posted > (see fn. 1), but he further pointed out that this application of str > is *dangerous* because *everything* in Python can be coerced to str. > That was a very valuable observation, which swayed the list in favor > of "Uh-oh, we can't recommend 'os.method(str(Path))'!" > > This is my last post on this particular topic, but I will be happy to > discuss off-list. (I may discuss further in public on my blog, but > first I have to get a blog. :-) > > > Footnotes: > [1] "You" is generic here. There are a couple of developers whose > agreement has the status of pronouncement of Pythonicity. Aspire to > that, but don't assume it -- very few have it, and it's actually > *very* rarely exercised. And you can recognize them because they are > *asked* to pronounce -- by people whose statements you thought were > already authoritative! 
> > [2] And even so votes are often overturned by later arguments, both > theoretical and based in experience. See for example the several > threads over time on the naming of Py_XSETREF. > > [3] Interpreting Zen koans frequently requires figure-ground > inversion. In this case we can apply "In the face of ambiguity, > refuse to guess" in the form "in the absence of ambiguity, don't wait > to be asked". I'm hardly authoritative, but FWIW :-) I think Ethan's > esthetic sense here accords with Pythonicity. From zachary.ware+pydev at gmail.com Wed Apr 13 16:16:08 2016 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Wed, 13 Apr 2016 15:16:08 -0500 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: On Wed, Apr 13, 2016 at 6:40 AM, Victor Stinner wrote: > Hi, > > Last months, most 3.x buildbots failed randomly. Some of them were > always failing. I spent some time to fix almost all Windows and Linux > buildbots. There were a lot of different issues. Thank you for doing this! > Maybe it's time to move more 3.x buildbots to the "stable" category? > http://buildbot.python.org/all/waterfall?category=3.x.stable A few months ago, I put together a list of suggestions for updating the stable/unstable list, but never got around to implementing it. > We have many offline buildbots. What's the status of these buildbots? > Should we expect that they come back soon? My Windows 8.1 bot is a VM that resides on a machine that has been disturbingly unstable lately, and it's starting to seem like the instability is due to that VM. I hope to have it back up (and stable) again soon, but have no timetable for it. My Docs bot was off after losing power over the weekend, and I just hadn't noticed yet. It's back now. I'll ping the python-buildbots list about other offline bots. > Or would it be possible to hide them? It would help to check the > status of all buildbots. 
I'm not sure, but that would be a nice feature. > - the 4 ICC buildbots are failing with stack overflow, segfault, etc. > Again, I'm not sure that these buildbots are useful since it looks > like we don't support this compiler yet. Or does it help to work on > supporting this compiler? Who is working on ICC support? The Ubuntu ICC bot is generally quite stable. The OSX ICC bot is currently offline, but has only a couple of known issues. The Windows ICC bot is still a bit experimental, but has inched closer to producing a working build. R. David Murray and I have been working with Intel on ICC support. > By the way, I'm always surprised by the huge difference of time needed > to run a build on the different slaves: from a few minutes to more > than 3 hours. The fatest Windows slave takes 28 minutes (run tests in > parallel using 4 child processes), whereas the 3 others (run tests > sequentially and) take between 2 hours and more than 3 hours! Why > running tests on Windows takes so long? Most of that is down to debug mode; building Python in debug mode links with the debug CRT which also enables all manner of extra checks. When it's up, the non-debug Windows bot also runs the test suite in ~28 minutes, running sequentially. --- After receiving a suggestion from koobs several months ago, I've been intermittently thinking about completely redoing our buildmaster setup such that instead of a single builder per version on each slave, we instead set up a series of builders with particular 'tags', and each builder attaches to each slave that satisfies the tags (running each build only on the first slave available). This would allow us to test some of the rarer options (such as --without-threads) significantly more often than 'never', and generally get a lot more customization/flexibility of builds. I haven't had a chance to sit down and think out all the edge cases of this idea, but what do people generally think of it? 
I think the GitHub switchover will be a good time to do this if it's generally seen as a decent idea, since there will need to be some work on the buildmaster to do the switch anyway. -- Zach From brett at python.org Wed Apr 13 16:37:46 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 20:37:46 +0000 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: On Wed, 13 Apr 2016 at 13:17 Zachary Ware wrote: > [SNIP] > --- > > After receiving a suggestion from koobs several months ago, I've been > intermittently thinking about completely redoing our buildmaster setup > such that instead of a single builder per version on each slave, we > instead set up a series of builders with particular 'tags', and each > builder attaches to each slave that satisfies the tags (running each > build only on the first slave available). This would allow us to test > some of the rarer options (such as --without-threads) significantly > more often than 'never', and generally get a lot more > customization/flexibility of builds. I haven't had a chance to sit > down and think out all the edge cases of this idea, but what do people > generally think of it? I think the GitHub switchover will be a good > time to do this if it's generally seen as a decent idea, since there > will need to be some work on the buildmaster to do the switch anyway. > So we have slaves connect to multiple builders who have requirements of what they are testing? So the --without-threads master would have all slaves able to compile --without-threads connect to it and then do that build? And those same slaves may also connect to the gcc and clang masters to do those builds as well? So would that mean slaves could potentially do a bunch of builds per change? That sounds nice to me as long as the slave maintainers are also up to utilizing this by double/triple/quadrupling their builds. 
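
The tag idea can be sketched in a few lines. This is only an illustration of the matching rule, not Buildbot configuration: the machine names, tag names and the `eligible_machines` helper are invented, and "machines" stands in for the slaves discussed above.

```python
# Hypothetical inventory: each build machine advertises the tags it satisfies.
machines = {
    "linux-gcc": {"linux", "gcc", "threads", "no-threads"},
    "linux-clang": {"linux", "clang", "threads"},
    "win-msvc": {"windows", "msvc", "threads"},
}

# Hypothetical builders: each one lists the tags it requires.
builders = {
    "3.x-without-threads": {"no-threads"},
    "3.x-clang": {"clang"},
    "3.x-linux": {"linux"},
}


def eligible_machines(required, machines):
    """Return sorted names of machines whose tags include every required tag."""
    return sorted(name for name, tags in machines.items() if required <= tags)


for builder, required in sorted(builders.items()):
    print(builder, "->", eligible_machines(required, machines))
```

A real setup would additionally pick the first idle machine from each list; that scheduling step is left out here.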
From chris.barker at noaa.gov  Wed Apr 13 16:39:42 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 13 Apr 2016 13:39:42 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

so are we worried that __fspath__ will exist and be callable, but might
raise an AttributeError somewhere inside itself? if so isn't it broken
anyway, so should it be ignored?

and I know it's asking permission rather than forgiveness, but what's
wrong with:

    if hasattr(path, "__fspath__"):
        path = path.__fspath__()

if you really want to check for the existence of the attribute first?

or even:

    path = path.__fspath__ if hasattr(path, "__fspath__") else path

(OK, really a Pythonic style question now....)

-CHB

On Wed, Apr 13, 2016 at 12:54 PM, Brett Cannon wrote:

>
>
> On Wed, 13 Apr 2016 at 12:39 Fred Drake wrote:
>
>> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico wrote:
>> > Is that the intention, or should the exception catching be narrower? I
>> > know it's clunky to write it in Python, but AIUI it's less so in C:
>> >
>> >     try:
>> >         callme = path.__fspath__
>> >     except AttributeError:
>> >         pass
>> >     else:
>> >         path = callme()
>>
>> +1 for this variant; I really don't like masking errors inside the
>> __fspath__ implementation.
>>
>
> Don't read too much into the code in that gist. I just did them quickly to
> get the point across of the proposals in terms of str/bytes, not what will
> be proposed in any final patch.
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
>
>

--

Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Apr 13 16:42:48 2016 From: brett at python.org (Brett Cannon) Date: Wed, 13 Apr 2016 20:42:48 +0000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: On Wed, 13 Apr 2016 at 13:40 Chris Barker wrote: > so are we worried that __fspath__ will exist and be callable, but might > raise an AttributeError somewhere inside itself? if so isn't it broken > anyway, so should it be ignored? > It should propagate instead of swallowing up the exception, otherwise it's hard to debug why __fspath__ seems to be ignored. > > and I know it's asking permission rather than forgiveness, but what's > wrong with: > > if hasattr(path, "__fspath__"): > path = path.__fspath__() > > if you really want to check for the existence of the attribute first? > > Nothing. > or even: > > path = path.__fspath__ if hasattr(path, "__fspath__") else path > > That also works. > > (OK, really a Pythonic style question now....) > Yes, this is getting a bit side-tracked over some example code to just get a concept across. -Brett > > -CHB > > > > On Wed, Apr 13, 2016 at 12:54 PM, Brett Cannon wrote: > >> >> >> On Wed, 13 Apr 2016 at 12:39 Fred Drake wrote: >> >>> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico >>> wrote: >>> > Is that the intention, or should the exception catching be narrower? I >>> > know it's clunky to write it in Python, but AIUI it's less so in C: >>> > >>> > try: >>> > callme = path.__fspath__ >>> > except AttributeError: >>> > pass >>> > else: >>> > path = callme() >>> >>> +1 for this variant; I really don't like masking errors inside the >>> __fspath__ implementation. 
>>> >> >> Don't read too much into the code in that gist. I just did them quickly >> to get the point across of the proposals in terms of str/bytes, not what >> will be proposed in any final patch. >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov >> >> > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Wed Apr 13 16:47:44 2016 From: random832 at fastmail.com (Random832) Date: Wed, 13 Apr 2016 16:47:44 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: <1460580464.3984789.578009321.29A3CE1D@webmail.messagingengine.com> On Wed, Apr 13, 2016, at 16:39, Chris Barker wrote: > so are we worried that __fspath__ will exist and be callable, but might > raise an AttributeError somewhere inside itself? if so isn't it broken > anyway, so should it be ignored? Well, if you're going to say "ignore the protocol because it's broken", where do you stop? What if it raises some other exception? What if it raises SystemExit? 
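
As a small illustration of the point about where to stop: in the `hasattr` spellings only the attribute lookup is guarded, so anything raised inside `__fspath__`, including `SystemExit`, reaches the caller. The classes and function names below are invented for the example. Note also that the conditional-expression form needs the call parentheses; as far as I can tell, the one-liner as posted earlier in the thread omits them and would therefore bind the method rather than its result.

```python
class Good:
    """Hypothetical well-behaved path-like object."""

    def __fspath__(self):
        return "/tmp/example"


class Exits:
    """Hypothetical object whose __fspath__ raises SystemExit."""

    def __fspath__(self):
        raise SystemExit("raised inside __fspath__")


def fspath_hasattr(path):
    # hasattr() guards only the lookup; the call itself is unguarded.
    if hasattr(path, "__fspath__"):
        path = path.__fspath__()
    return path


def fspath_oneliner(path):
    # Same logic as a conditional expression; note the call parentheses.
    return path.__fspath__() if hasattr(path, "__fspath__") else path


print(fspath_hasattr(Good()))
print(fspath_oneliner("plain.txt"))
try:
    fspath_hasattr(Exits())
except SystemExit as exc:
    print("propagated:", exc)
```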
From ericfahlgren at gmail.com Wed Apr 13 17:02:27 2016 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Wed, 13 Apr 2016 14:02:27 -0700 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: References: Message-ID: <030201d195c7$ca9de130$5fd9a390$@gmail.com> On Wednesday, April 13, 2016 09:25, Victor Stinner wrote: > The side effect of wordcode is that arguments in 0..255 now uses 2 bytes per > instruction instead of 3, so it also reduce the size of bytecode for the most > common case. > > Larger argument, 16-bit argument (0..65,535), now uses 4 bytes instead of 3. > Arguments are supported up to 32-bit: 24-bit uses 3 units (6 bytes), 32-bit uses 4 > units (8 bytes). MAKE_FUNCTION uses 16-bit argument for keyword defaults and > 24-bit argument for annotations. > Other common instruction known to use large argument are jumps for bytecode > longer than 256 bytes. A couple months ago during an earlier discussion of wordcode, I got curious enough to instrument dis.dis so that I could calculate the actual size changes expected in practice. I ran it on a large chunk of our product code, here are the results (looks best with a fixed font). I suspect the fairly significant reduction in footprint will also give better cache hit characteristics, so we might see some "magic" speed ups from that, too. 

    Code-generating source lines = 70,792
    Total bytes                  = 1,196,653
    Argument-bearing operators   = 380,978
    Operands over 1 byte long    = 12,191
    Extended arguments           = 0
    Percentage of 1-byte args    = 96.80%
    Total operators              = 434,697
    Non-argument ops             = 53,719
    One-byte args                = 368,787
    Multi-byte args              = 12,191
    Byte code size               = 1,196,653
    Word code size               = 893,776
    Word:byte size               = 74.69%

Just for the record, here's my arithmetic:

    byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs
    wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs

(It is interesting to note that I have never encountered an EXTENDED_ARG
operator in the wild, only in my own synthetic examples.)

From victor.stinner at gmail.com  Wed Apr 13 17:23:59 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 23:23:59 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To: <030201d195c7$ca9de130$5fd9a390$@gmail.com>
References: <030201d195c7$ca9de130$5fd9a390$@gmail.com>
Message-ID:

2016-04-13 23:02 GMT+02:00 Eric Fahlgren :
> Percentage of 1-byte args = 96.80%

Yeah, I expected such a high ratio. Good news that you confirm it.

> Non-argument ops = 53,719
> One-byte args = 368,787
> Multi-byte args = 12,191

Again, only a very few arguments take multiple bytes. Good, the
bytecode will be smaller.

IMHO it's more a nice side effect than a real goal. The runtime
performance matters more than the size of the bytecode; it's not as if
bytecode takes 4 MB. It's probably closer to 1 KB and so can probably
benefit from the fastest CPU caches.
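
The figures above can be checked with a few lines of Python. The sketch below assumes the sizes stated in the thread: the old format uses 1 byte for an instruction without an argument and 3 bytes with one, while wordcode uses one 2-byte unit per 8 bits of argument. The function name `wordcode_size` is made up for this example; the counts are the ones from the posted table.

```python
def wordcode_size(arg):
    """Bytes used by one wordcode instruction (arg=None means no argument)."""
    if arg is None or arg <= 0xFF:
        return 2      # one 16-bit unit
    if arg <= 0xFFFF:
        return 4      # two units
    if arg <= 0xFFFFFF:
        return 6      # three units
    return 8          # four units, 32-bit argument


# Counts taken from the posted table.
non_argument_ops = 53719
one_byte_args = 368787
multi_byte_args = 12191

# The two formulas from the message, applied to those counts.
byte_code_size = 1 * non_argument_ops + 3 * one_byte_args + 3 * multi_byte_args
word_code_size = 2 * non_argument_ops + 2 * one_byte_args + 4 * multi_byte_args

print(byte_code_size)
print(word_code_size)
print("{:.2%}".format(word_code_size / byte_code_size))
print("{:.2%}".format(one_byte_args / (one_byte_args + multi_byte_args)))
```

The wordcode formula treats every multi-byte argument as 16-bit (4 bytes); arguments needing 24 or 32 bits would cost 6 or 8 bytes, as `wordcode_size` shows.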
> Just for the record, here's my arithmetic: > byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs > wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs If multiByteArgs means any size > 1 byte, the wordCodeSize formula is wrong: - no parameter: 2 bytes - 8-bit parameter: 2 bytes - 16-bit parameter: 4 bytes - 24-bit parameter: 6 bytes - 32-bit parameter: 8 bytes But you wrote that you didn't see EXTEND_ARG, so I guess that multibyte means 16-bit in your case, and so your formula is correct. Hopefully, I don't expect 32-bit parameters in the wild, only 24-bit parameter for function with annotation. > (It is interesting to note that I have never encountered an EXTENDED_ARG operator in the wild, only in my own synthetic examples.) As I wrote, EXTENDED_ARG can be seen when MAKE_FUNCTION is used with annotations. Victor From rymg19 at gmail.com Wed Apr 13 17:29:05 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 13 Apr 2016 16:29:05 -0500 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: References: Message-ID: What is the value of HAS_ARG going to be now? -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ On Apr 13, 2016 11:26 AM, "Victor Stinner" wrote: > Hi, > > In the middle of recent discussions about Python performance, it was > discussed to change the Python bytecode. Serhiy proposed to reuse > MicroPython short bytecode to reduce the disk space and reduce the > memory footprint. > > Demur Rumed proposes a different change to use a regular bytecode > using 16-bit units: an instruction has always one 8-bit argument, it's > zero if the instruction doesn't have an argument: > > http://bugs.python.org/issue26647 > > According to benchmarks, it looks faster: > > http://bugs.python.org/issue26647#msg263339 > > IMHO it's a nice enhancement: it makes the code simpler. 
The most > interesting change is made in Python/ceval.c: > > - if (HAS_ARG(opcode)) > - oparg = NEXTARG(); > + oparg = NEXTARG(); > > This code is the very hot loop evaluating Python bytecode. I expect > that removing a conditional branch here can reduce the CPU branch > misprediction. > > I reviewed first versions of the change, and IMHO it's almost ready to > be merged. But I would prefer to have a review from a least a second > core reviewer. > > Can someone please review the change? > > -- > > The side effect of wordcode is that arguments in 0..255 now uses 2 > bytes per instruction instead of 3, so it also reduce the size of > bytecode for the most common case. > > Larger argument, 16-bit argument (0..65,535), now uses 4 bytes instead > of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6 > bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses 16-bit > argument for keyword defaults and 24-bit argument for annotations. > Other common instruction known to use large argument are jumps for > bytecode longer than 256 bytes. > > -- > > Right now, ceval.c still fetchs opcode and then oparg with two 8-bit > instructions. Later, we can discuss if it would be possible to ensure > that the bytecode is always aligned to 16-bit in memory to fetch the > two bytes using a uint16_t* pointer. > > Maybe we can overallocate 1 byte in codeobject.c and align manually > the memory block if needed. Or ceval.c should maybe copy the code if > it's not aligned? > > Raymond Hettinger proposes something like that, but it looks like > there are concerns about non-aligned memory accesses: > > http://bugs.python.org/issue25823 > > The cost of non-aligned memory accesses depends on the CPU > architecture, but it can raise a SIGBUS on some arch (MIPS and > SPARC?). 
>
> Victor

From ericfahlgren at gmail.com  Wed Apr 13 17:35:27 2016
From: ericfahlgren at gmail.com (Eric Fahlgren)
Date: Wed, 13 Apr 2016 14:35:27 -0700
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To:
References: <030201d195c7$ca9de130$5fd9a390$@gmail.com>
Message-ID:

The EXTENDED_ARG is included in the multibyte ops; I treat it just like
any other operator. Here's a snippet of my hacked-dis.dis output, which
made it clear to me that I could just count them as an "operator with
word operand."

Line 3000: x = x if x or not x and x is None else x

    0001dc83 7c 00 00 LOAD_FAST            x
    0001dc86 91 01 00 EXTENDED_ARG         1
    0001dc89 70 9f dc JUMP_IF_TRUE_OR_POP  L1dc9f
    0001dc8c 7c 00 00 LOAD_FAST            x
    0001dc8f 0c       UNARY_NOT
    0001dc90 91 01 00 EXTENDED_ARG         1
    0001dc93 6f 9f dc JUMP_IF_FALSE_OR_POP L1dc9f
    0001dc96 7c 00 00 LOAD_FAST            x
    0001dc99 74 01 00 LOAD_GLOBAL          None
    0001dc9c 6b 08 00 COMPARE_OP           'is'
    L1dc9f:
    0001dc9f 91 01 00 EXTENDED_ARG         1
    0001dca2 72 ab dc POP_JUMP_IF_FALSE    L1dcab
    0001dca5 7c 00 00 LOAD_FAST            x
    0001dca8 6e 03 00 JUMP_FORWARD         L1dcae (+3)
    L1dcab:
    0001dcab 7c 00 00 LOAD_FAST            x
    L1dcae:
    0001dcae 7d 00 00 STORE_FAST           x

On Wed, Apr 13, 2016 at 2:23 PM, Victor Stinner wrote:

> 2016-04-13 23:02 GMT+02:00 Eric Fahlgren :
> > Percentage of 1-byte args = 96.80%
>
> Yeah, I expected such a high ratio. Good news that you confirm it.
>
>
> > Non-argument ops = 53,719
> > One-byte args = 368,787
> > Multi-byte args = 12,191
>
> Again, only a very few arguments take multiple bytes. Good, the
> bytecode will be smaller.
>
> IMHO it's more a nice side effect than a real goal.
The runtime > performance matters more than the size of the bytecode, it's not like > a bytecode take 4 MB. It's probably closer to 1 KB and so can probably > benefit of the fatest CPU caches. > > > > Just for the record, here's my arithmetic: > > byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs > > wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs > > If multiByteArgs means any size > 1 byte, the wordCodeSize formula is > wrong: > > - no parameter: 2 bytes > - 8-bit parameter: 2 bytes > - 16-bit parameter: 4 bytes > - 24-bit parameter: 6 bytes > - 32-bit parameter: 8 bytes > > But you wrote that you didn't see EXTEND_ARG, so I guess that > multibyte means 16-bit in your case, and so your formula is correct. > > Hopefully, I don't expect 32-bit parameters in the wild, only 24-bit > parameter for function with annotation. > > > > (It is interesting to note that I have never encountered an EXTENDED_ARG > operator in the wild, only in my own synthetic examples.) > > As I wrote, EXTENDED_ARG can be seen when MAKE_FUNCTION is used with > annotations. > > Victor > -------------- next part -------------- An HTML attachment was scrubbed... 
From victor.stinner at gmail.com Wed Apr 13 17:37:33 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 23:37:33 +0200
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To:
References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
Message-ID:

On Wednesday, 13 April 2016, Brett Cannon wrote:
>
> All of this is demonstrated in
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by
> the various possibilities. In the end it's not a corner case because the
> definition of __fspath__ will be such that there's no ambiguity in what
> os.fspath() will accept and what __fspath__ can return and the code will be
> written to conform to what the PEP dictates (IOW I'm aware that this needs
> to be considered in the implementation :) .
>

I'm not a big fan of a flag parameter to change the return type of a function. Usually, two functions are preferred. In the os module we have getcwd/getcwdb for example. I don't know if it's a good example.

Do you know other examples of Python functions taking a (flag) parameter to change the result type?

Victor
From victor.stinner at gmail.com Wed Apr 13 17:39:29 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 23:39:29 +0200
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To:
References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
Message-ID:

Oops sorry, I forgot to add that I have no strong opinion on the type (I only have a minor preference for str only).

Victor

From zachary.ware+pydev at gmail.com Wed Apr 13 16:50:52 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Wed, 13 Apr 2016 15:50:52 -0500
Subject: [Python-Dev] Tag-based buildmaster (was: Most 3.x buildbots are green again ... )
Message-ID:

(Cross-posting to python-buildbots, discussion is probably best continued there)

On Wed, Apr 13, 2016 at 3:37 PM, Brett Cannon wrote:
> On Wed, 13 Apr 2016 at 13:17 Zachary Ware
> wrote:
>> After receiving a suggestion from koobs several months ago, I've been
>> intermittently thinking about completely redoing our buildmaster setup
>> such that instead of a single builder per version on each slave, we
>> instead set up a series of builders with particular 'tags', and each
>> builder attaches to each slave that satisfies the tags (running each
>> build only on the first slave available). This would allow us to test
>> some of the rarer options (such as --without-threads) significantly
>> more often than 'never', and generally get a lot more
>> customization/flexibility of builds.
I haven't had a chance to sit
>> down and think out all the edge cases of this idea, but what do people
>> generally think of it? I think the GitHub switchover will be a good
>> time to do this if it's generally seen as a decent idea, since there
>> will need to be some work on the buildmaster to do the switch anyway.
>
> So we have slaves connect to multiple builders who have requirements of what
> they are testing? So the --without-threads master would have all slaves able
> to compile --without-threads connect to it and then do that build? And those
> same slaves may also connect to the gcc and clang masters to do those builds
> as well? So would that mean slaves could potentially do a bunch of builds
> per change? That sounds nice to me as long as the slave maintainers are also
> up to utilizing this by double/triple/quadrupling their builds.

Basically, yes. I'm unsure as to whether the build would be done on all matching slaves on each change, or rotate between them (or use the next available) on each change; that would likely come down to which scheme we collectively want.

I also have vague ideas about having 'daily' or even 'weekly' tags for builds that are deemed to not need a build for every changeset, which could alleviate some of the multiplying.

--
Zach

From victor.stinner at gmail.com Wed Apr 13 17:44:14 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Apr 2016 23:44:14 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To:
References:
Message-ID:

On Wednesday, 13 April 2016, Ryan Gonzalez wrote:

> What is the value of HAS_ARG going to be now?
>

I asked Demur to keep HAS_ARG(). Not really for backward compatibility, but for the dis module: to keep a nice assembler. There are also debug traces in ceval.c which use it.

For ceval.c, we might use HAS_ARG() to micro-optimize oparg=0 (hardcode 0 rather than reading the bytecode) for operators with no argument.
Or maybe it's completely useless :-)

Victor

From rymg19 at gmail.com Wed Apr 13 18:11:14 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 13 Apr 2016 17:11:14 -0500
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To:
References:
Message-ID:

So code that depends on iterating through bytecode via HAS_ARG is going to break...

Darn it. :/

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your program. Something's wrong.
http://kirbyfan64.github.io/
On Apr 13, 2016 4:44 PM, "Victor Stinner" wrote:

> On Wednesday, 13 April 2016, Ryan Gonzalez wrote:
>
>> What is the value of HAS_ARG going to be now?
>>
>
> I asked Demur to keep HAS_ARG(). Not really for backward compatibility,
> but for the dis module: to keep a nice assembler. There are also debug
> traces in ceval.c which use it.
>
> For ceval.c, we might use HAS_ARG() to micro-optimize oparg=0 (hardcode 0
> rather than reading the bytecode) for operators with no argument. Or maybe
> it's completely useless :-)
>
> Victor
>

From victor.stinner at gmail.com Wed Apr 13 18:19:42 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 00:19:42 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

Oh, since others voted, I will also vote and explain my vote.

I like choice 1, str only, because it's very well defined. In Python 3, Unicode is simply the native type for text. It's accepted by almost all functions. In other emails, I also explained that Unicode is fine to store undecodable filenames on UNIX, it has worked as expected for many years (since Python 3.3).

--

If you cannot survive without bytes, I suggest adding two functions: one for str only, another which can return str or bytes.
Maybe you want in fact two protocols: __fspath__ (str only) and __fspathb__ (bytes only)? os.fspathb() would first try __fspathb__, or fall back to os.fsencode(__fspath__). os.fspath() would first try __fspath__, or fall back to os.fsdecode(__fspathb__). IMHO it's not worth having such complexity while Unicode handles all use cases.

Or do you know functions implemented in Python accepting str *and* bytes?

--

The C implementation of the os module has an important path_converter() function:

 * path_converter accepts (Unicode) strings and their
 * subclasses, and bytes and their subclasses. What
 * it does with the argument depends on the platform:
 *
 *   * On Windows, if we get a (Unicode) string we
 *     extract the wchar_t * and return it; if we get
 *     bytes we extract the char * and return that.
 *
 *   * On all other platforms, strings are encoded
 *     to bytes using PyUnicode_FSConverter, then we
 *     extract the char * from the bytes object and
 *     return that.

This function will implement something like os.fspath().

With os.fspath() only accepting str, we will return directly the Unicode string on Windows. On UNIX, Unicode will be encoded, as it's already done for Unicode strings.

This specific function would benefit from flavor 4 (os.fspath() can return str and bytes), but it's more an exception than the rule. It would be more a micro-optimization than a good reason to drive the API design.

Victor

On Wednesday, 13 April 2016, Brett Cannon wrote:
>
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed).
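The two-protocol idea floated in the message above could look roughly like the following. This is a minimal sketch under stated assumptions, not anything proposed as code in the thread: `__fspathb__` is the hypothetical bytes-only protocol Victor mentions, and `fspath()`/`fspathb()` here are local stand-ins written for illustration, not the real `os` functions. Only `os.fsencode()` and `os.fsdecode()` are existing APIs.

```python
import os


def fspath(obj):
    """str-only flavour: try __fspath__, else decode __fspathb__."""
    if isinstance(obj, str):
        return obj
    cls = type(obj)
    if hasattr(cls, "__fspath__"):
        return cls.__fspath__(obj)
    if hasattr(cls, "__fspathb__"):        # hypothetical bytes-only protocol
        return os.fsdecode(cls.__fspathb__(obj))
    raise TypeError(f"expected str or path object, not {cls.__name__}")


def fspathb(obj):
    """bytes-only flavour: try __fspathb__, else encode __fspath__."""
    if isinstance(obj, bytes):
        return obj
    cls = type(obj)
    if hasattr(cls, "__fspathb__"):        # hypothetical bytes-only protocol
        return cls.__fspathb__(obj)
    if hasattr(cls, "__fspath__"):
        return os.fsencode(cls.__fspath__(obj))
    raise TypeError(f"expected bytes or path object, not {cls.__name__}")


class TextPath:
    def __fspath__(self):
        return "spam.txt"


class BytePath:
    def __fspathb__(self):
        return b"eggs.txt"


print(fspath(TextPath()), fspath(BytePath()))
print(fspathb(TextPath()), fspathb(BytePath()))
```

Each function has exactly one return type, which is the property Victor is after; the cost is the second magic method and the fallback logic in both directions.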
From victor.stinner at gmail.com Wed Apr 13 18:26:00 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 00:26:00 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To:
References:
Message-ID:

2016-04-14 0:11 GMT+02:00 Ryan Gonzalez :
> So code that depends on iterating through bytecode via HAS_ARG is going to
> break...

Sure. This change is backward incompatible for applications parsing bytecode in C or Python. That's why the patch also has to update the dis module.

I don't see how you plan to keep the backward compatibility, since the argument size changed from 2 bytes to 1 byte. You must update your code (written in C or Python or whatever).

Fortunately, the dis module was enhanced in Python 3.4: get_instructions() now gives nice Instruction objects rather than only pure text output.

FYI I wrote my own library to encode and decode bytecode. It provides abstract bytecode objects to easily modify bytecode:
https://bytecode.readthedocs.org/

I suggest using such a library (or simply the dis module for simple needs) if you have to handle bytecode, rather than writing your own code.

I know a few other projects which directly handle bytecode:

* https://pypi.python.org/pypi/codetransformer
* https://github.com/serprex/byteplay
* https://pypi.python.org/pypi/coverage

IMHO it's not a big deal to update these projects for the future Python 3.6. I can even help them to support the new bytecode format.

Victor

From yselivanov.ml at gmail.com Wed Apr 13 18:45:06 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 13 Apr 2016 18:45:06 -0400
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To:
References:
Message-ID: <570ECBF2.7020205@gmail.com>

On 2016-04-13 12:24 PM, Victor Stinner wrote:
> Can someone please review the change?

+1 for the change. I can take a look at the patch in a few days.
Yury

From Nikolaus at rath.org Wed Apr 13 18:45:23 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Wed, 13 Apr 2016 15:45:23 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: (Brett Cannon's message of "Wed, 13 Apr 2016 17:10:09 +0000")
References: <570C1E13.4090909@stoneleaf.us>
Message-ID: <87vb3lcem4.fsf@thinkpad.rath.org>

On Apr 13 2016, Brett Cannon wrote:
> On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
> python-dev at python.org> wrote:
>
>> Ethan Furman <ethan at stoneleaf.us> writes:
>>
>> > Do we allow bytes to be returned from os.fspath()? If yes, then do we
>> > allow bytes from __fspath__()?
>>
>> De-lurking. Especially since the ultimate goal is better interoperability, I
>> feel like an implementation that people can play with would help guide the
>> few remaining decisions. To help test the various options you could
>> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both
>> pathlib.__fspath__() and os.fspath(), with distinct configurable defaults for
>> each.
>>
>> In the spirit of Python 3 I feel like bytes might not be needed in practice,
>> but something like this with defaults of False will allow people to easily
>> test all the various options.
>>
>
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

When passing an object that is of type str and has a __fspath__ attribute, all approaches return the value of __fspath__().

However, when passing something of type bytes, the second approach returns the object, while the third returns the value of __fspath__().

Is this intentional? I think a __fspath__ attribute should always be preferred.

Best,
-Nikolaus

--
GPG encrypted emails preferred.
Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."

From ethan at stoneleaf.us Wed Apr 13 18:58:54 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 15:58:54 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <87vb3lcem4.fsf@thinkpad.rath.org>
References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org>
Message-ID: <570ECF2E.1070004@stoneleaf.us>

On 04/13/2016 03:45 PM, Nikolaus Rath wrote:

> When passing an object that is of type str and has a __fspath__
> attribute, all approaches return the value of __fspath__().
>
> However, when passing something of type bytes, the second approach
> returns the object, while the third returns the value of __fspath__().
>
> Is this intentional? I think a __fspath__ attribute should always be
> preferred.

Yes, it is intentional. The second approach assumes __fspath__ can only contain str, so there is no point in checking it for bytes.

--
~Ethan~

From brett at python.org Wed Apr 13 19:06:35 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 23:06:35 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <87vb3lcem4.fsf@thinkpad.rath.org>
References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org>
Message-ID:

On Wed, 13 Apr 2016 at 15:46 Nikolaus Rath wrote:

> On Apr 13 2016, Brett Cannon wrote:
> > On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
> > python-dev at python.org> wrote:
> >
> >> Ethan Furman <ethan at stoneleaf.us> writes:
> >>
> >> > Do we allow bytes to be returned from os.fspath()? If yes, then do we
> >> > allow bytes from __fspath__()?
> >>
> >> De-lurking. Especially since the ultimate goal is better interoperability, I
> >> feel like an implementation that people can play with would help guide the
> >> few remaining decisions.
To help test the various options you could
> >> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both
> >> pathlib.__fspath__() and os.fspath(), with distinct configurable defaults for
> >> each.
> >>
> >> In the spirit of Python 3 I feel like bytes might not be needed in practice,
> >> but something like this with defaults of False will allow people to easily
> >> test all the various options.
> >>
> >
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> > the four potential approaches implemented (although it doesn't follow the
> > "separate functions" approach some are proposing and instead goes with the
> > allow_bytes approach I originally proposed).
>
>
> When passing an object that is of type str and has a __fspath__
> attribute, all approaches return the value of __fspath__().
>
> However, when passing something of type bytes, the second approach
> returns the object, while the third returns the value of __fspath__().
>
> Is this intentional? I think a __fspath__ attribute should always be
> preferred.
>

It's very much intentional. If we define __fspath__() to only return strings but still want to minimize boilerplate of allowing bytes to simply pass through without checking a path argument to see if it is bytes then approach #2 is warranted. But if __fspath__() can return bytes then approach #3 allows for it.

From brett at python.org Wed Apr 13 19:09:57 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 13 Apr 2016 23:09:57 +0000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

On Wed, 13 Apr 2016 at 15:20 Victor Stinner wrote:

> Oh, since others voted, I will also vote and explain my vote.
>
> I like choice 1, str only, because it's very well defined. In Python
> 3, Unicode is simply the native type for text.
It's accepted by almost
> all functions. In other emails, I also explained that Unicode is fine
> to store undecodable filenames on UNIX, it works as expected since
> many years (since Python 3.3).
>
> --
>
> If you cannot survive without bytes, I suggest to add two functions:
> one for str only, another which can return str or bytes.
>
> Maybe you want in fact two protocols: __fspath__(str only) and
> __fspathb__ (bytes only)? os.fspathb() would first try __fspathb__, or
> fallback to os.fsencode(__fspath__). os.fspath() would first try
> __fspath__, or fallback to os.fsdecode(__fspathb__). IMHO it's not
> worth to have such complexity while Unicode handles all use cases.
>

Implementing two magic methods for this seems like overkill. Best I would be willing to do with automatic encode/decode is use os.fsencode()/os.fsdecode() on the argument or what __fspath__() returned.

>
> Or do you know functions implemented in Python accepting str *and* bytes?
>

On purpose, nothing off the top of my head.

>
> --
>
> The C implementation of the os module has an important
> path_converter() function:
>
>  * path_converter accepts (Unicode) strings and their
>  * subclasses, and bytes and their subclasses. What
>  * it does with the argument depends on the platform:
>  *
>  *   * On Windows, if we get a (Unicode) string we
>  *     extract the wchar_t * and return it; if we get
>  *     bytes we extract the char * and return that.
>  *
>  *   * On all other platforms, strings are encoded
>  *     to bytes using PyUnicode_FSConverter, then we
>  *     extract the char * from the bytes object and
>  *     return that.
>
> This function will implement something like os.fspath().
>
> With os.fspath() only accepting str, we will return directly the
> Unicode string on Windows. On UNIX, Unicode will be encoded, as it's
> already done for Unicode strings.
>
> This specific function would benefit of the flavor 4 (os.fspath() can
> return str and bytes), but it's more an exception than the rule.
It
> would be more a micro-optimization than a good reason to drive the API
> design.
>

Yep, it's interesting to know but Chris and I won't let it drive the decision (I assume).

-Brett

>
> Victor
>
> On Wednesday, 13 April 2016, Brett Cannon wrote:
> >
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
> has the four potential approaches implemented (although it doesn't follow
> the "separate functions" approach some are proposing and instead goes with
> the allow_bytes approach I originally proposed).
>

From chris.barker at noaa.gov Wed Apr 13 20:06:41 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 13 Apr 2016 17:06:41 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <1460580464.3984789.578009321.29A3CE1D@webmail.messagingengine.com>
References: <570C1E13.4090909@stoneleaf.us> <1460580464.3984789.578009321.29A3CE1D@webmail.messagingengine.com>
Message-ID:

On Wed, Apr 13, 2016 at 1:47 PM, Random832 wrote:

> On Wed, Apr 13, 2016, at 16:39, Chris Barker wrote:
> > so are we worried that __fspath__ will exist and be callable, but might
> > raise an AttributeError somewhere inside itself? if so isn't it broken
> > anyway, so should it be ignored?
>
> Well, if you're going to say "ignore the protocol because it's broken",
> where do you stop? What if it raises some other exception? What if it
> raises SystemExit?

This is pretty much always the case with EAFP coding:

    try:
        something()
    except SomeError:
        do_something_else()

Unless SomeError is a custom-defined error that you know is never going to get raised anywhere else, something() could raise SomeError for the reason you expect, or some code deep in the call stack could raise SomeError also, and you wouldn't know that.

I had a student run into this and it took him a good while to debug it. But that was because the code in something() was pretty darn buggy.
If he had tested something() by itself, there would have been no issue finding the problem.

In this case, I don't know that we need to be tolerant of buggy __fspath__() implementations -- they should be tested outside these checks, and not be buggy. So a buggy implementation may raise and may be ignored, depending on what Exception the bug triggers -- big deal. The only time it would matter is when the implementer is debugging the implementation.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From ethan at stoneleaf.us Wed Apr 13 20:29:19 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 17:29:19 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us> <1460580464.3984789.578009321.29A3CE1D@webmail.messagingengine.com>
Message-ID: <570EE45F.2060303@stoneleaf.us>

On 04/13/2016 05:06 PM, Chris Barker wrote:

> In this case, I don't know that we need to be tolerant of buggy
> __fspath__() implementations -- they should be tested outside these
> checks, and not be buggy. So a buggy implementation may raise and may be
> ignored, depending on what Exception the bug triggers -- big deal. The
> only time it would matter is when the implementer is debugging the
> implementation.

Yet the idea behind robust exception handling is to test as little as possible and only catch what you know how to correct.
This code catches only one thing, only at one place, and we know how to deal with it:

    try:
        fsp = obj.__fspath__
    except AttributeError:
        pass
    else:
        fsp = fsp()

Contrarily, this next code catches the same error, but it could happen at the one place we know how to deal with it *or* anywhere further down the call stack where we have no clue what the proper course is to handle the problem... yet we suppress it anyway:

    try:
        fsp = obj.__fspath__()
    except AttributeError:
        pass

Certainly not code I want to see in the stdlib.

--
~Ethan~

From random832 at fastmail.us Wed Apr 13 19:55:32 2016
From: random832 at fastmail.us (Random832)
Date: Wed, 13 Apr 2016 19:55:32 -0400
Subject: [Python-Dev] pathlib - current status of discussions
Message-ID: <20160414003711.079666800CE@frontend2.nyi.internal>

On Apr 13, 2016 19:06, Brett Cannon wrote:
> On Wed, 13 Apr 2016 at 15:46 Nikolaus Rath wrote:
>> When passing an object that is of type str and has a __fspath__
>> attribute, all approaches return the value of __fspath__().
>>
>> However, when passing something of type bytes, the second approach
>> returns the object, while the third returns the value of __fspath__().
>>
>> Is this intentional? I think a __fspath__ attribute should always be
>> preferred.
>
>
> It's very much intentional. If we define __fspath__() to only return strings but still want to minimize boilerplate of allowing bytes to simply pass through without checking a path argument to see if it is bytes then approach #2 is warranted. But if __fspath__() can return bytes then approach #3 allows for it.

Er, the difference comes in when the object passed to os.fspath is a subclass of bytes that, itself, has a __fspath__ method (which may return a str). It's unlikely to occur in the wild, but is a semantic difference between this case and all other objects with __fspath__ methods.
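The difference between the narrow and the broad try/except forms that Ethan Furman contrasts earlier in this exchange can be seen with a deliberately broken path object. This is an illustrative sketch only: `BuggyPath`, `narrow_lookup()` and `broad_lookup()` are made-up names, and returning `None` stands in for "the object does not implement the protocol".

```python
class BuggyPath:
    """Path-like object whose __fspath__ has an internal bug."""

    def __fspath__(self):
        # Typo-style bug: this attribute does not exist, so the lookup
        # raises AttributeError *inside* the method.
        return self.missing_attribute


def narrow_lookup(obj):
    # Guard only the attribute lookup; errors raised while the method
    # runs are not caught here and propagate to the caller.
    try:
        method = obj.__fspath__
    except AttributeError:
        return None
    else:
        return method()


def broad_lookup(obj):
    # Guard the lookup *and* the call; an AttributeError from deep inside
    # __fspath__ is indistinguishable from a missing protocol.
    try:
        return obj.__fspath__()
    except AttributeError:
        return None


print(broad_lookup(BuggyPath()))      # the bug is silently swallowed
try:
    narrow_lookup(BuggyPath())
except AttributeError as exc:
    print("bug surfaced:", exc)
```

With the narrow form, an object that genuinely lacks `__fspath__` is still handled quietly, while a broken implementation fails loudly at the point of the bug.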
From random832 at fastmail.us Wed Apr 13 20:25:28 2016
From: random832 at fastmail.us (Random832)
Date: Wed, 13 Apr 2016 20:25:28 -0400
Subject: [Python-Dev] pathlib - current status of discussions
Message-ID: <20160414003712.1B876680160@frontend2.nyi.internal>

An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Wed Apr 13 22:49:09 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 12:49:09 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To:
References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
Message-ID:

On 14 April 2016 at 07:37, Victor Stinner wrote:
> On Wednesday, 13 April 2016, Brett Cannon wrote:
>>
>> All of this is demonstrated in
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by the
>> various possibilities. In the end it's not a corner case because the
>> definition of __fspath__ will be such that there's no ambiguity in what
>> os.fspath() will accept and what __fspath__ can return and the code will be
>> written to conform to what the PEP dictates (IOW I'm aware that this needs
>> to be considered in the implementation :) .
>
> I'm not a big fan of a flag parameter to change the return type of a
> function. Usually, two functions are preferred. In the os module we have
> getcwd/getcwdb for example. I don't know if it's a good example

It is, as one of the benefits of the "two separate functions" model is to improve type inference during static analysis - you don't necessarily know the values of parameters at analysis time, but you do know which function is being called.
> Do you know other examples of Python functions taking a (flag) parameter to
> change the result type?

subprocess.Popen has a couple of flags that can do that (more precisely, they change the return type of some methods on the resulting object), but that's not an especially pretty API in general.

String based type variations are more common (e.g. file mode flags, using the codecs module registry), but they're still used only sparingly (since they make the code harder to reason about for both humans and static analysers).

In terms of types for filesystem path APIs:

1. I assume we'll want a fast path for bytes & str to avoid performance regressions (especially in os.path, where we may be doing pure data manipulation without any IO operations)
2. I favour defining __fspath__ and os.fspath() in terms of what the os and os.path modules need to handle both DirEntry and pathlib (which I currently expect to be str-or-bytes)
3. For the benefit of higher level cross-platform code like pathlib, it likely makes sense to also have a str-only API that throws an exception rather than returning bytes

However, I also suggest deferring a decision on 3 until 2 has been definitively answered by way of implementing the changes.

If I'm right about 2, then the API could be something like:

- os.fspath -> str-or-bytes
- os.fsencode -> bytes (with coercion from str)
- os.fsdecode -> str (with coercion from bytes)
- os.strpath -> str (no coercion)

It's also worth noting that os.fsencode and os.fsdecode are already idempotent - their current signatures are "str-or-bytes -> bytes" and "str-or-bytes -> str". With a str-or-bytes return type on os.fspath, adapting them to handle rich path objects should just be a matter of adding an os.fspath call as the first step.

Cheers,
Nick.
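The four-function layering Nick outlines in this message can be sketched as follows. This is a rough illustration under stated assumptions, not the design that was adopted: `raw_fspath()`, `fsencode()`, `fsdecode()` and `strpath()` are local stand-ins named for this example, and only the underlying `os.fsencode()` and `os.fsdecode()` calls are existing APIs.

```python
import os


def raw_fspath(path):
    """str-or-bytes flavour: fast path for str/bytes, else ask __fspath__."""
    if isinstance(path, (str, bytes)):
        return path
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError(f"not a path: {type(path).__name__}")
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result


def fsencode(path):
    """bytes result, with coercion from str."""
    return os.fsencode(raw_fspath(path))


def fsdecode(path):
    """str result, with coercion from bytes."""
    return os.fsdecode(raw_fspath(path))


def strpath(path):
    """str result, no coercion: bytes are rejected rather than decoded."""
    result = raw_fspath(path)
    if not isinstance(result, str):
        raise TypeError("expected a str path, got bytes")
    return result


class RichPath:
    def __fspath__(self):
        return "spam.txt"


print(fsencode(RichPath()), fsdecode(b"spam.txt"), strpath(RichPath()))
```

As the message notes, the encode and decode helpers only need one extra step (the `raw_fspath()` call) in front of what they already do, and they stay idempotent.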
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From Nikolaus at rath.org Wed Apr 13 22:57:57 2016
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Wed, 13 Apr 2016 19:57:57 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <570ECF2E.1070004@stoneleaf.us> (Ethan Furman's message of "Wed, 13 Apr 2016 15:58:54 -0700")
References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us>
Message-ID: <87oa9c3nii.fsf@vostro.rath.org>

On Apr 13 2016, Ethan Furman wrote:
> On 04/13/2016 03:45 PM, Nikolaus Rath wrote:
>
>> When passing an object that is of type str and has a __fspath__
>> attribute, all approaches return the value of __fspath__().
>>
>> However, when passing something of type bytes, the second approach
>> returns the object, while the third returns the value of __fspath__().
>>
>> Is this intentional? I think a __fspath__ attribute should always be
>> preferred.
>
> Yes, it is intentional. The second approach assumes __fspath__ can
> only contain str, so there is no point in checking it for bytes.

Either I haven't understood your answer, or you haven't understood my question. I'm concerned about this case:

    class Special(bytes):
        def __fspath__(self):
            return 'str-val'
    obj = Special('bytes-val', 'utf8')
    path_obj = fspath(obj, allow_bytes=True)

With #2, path_obj == b'bytes-val'. With #3, path_obj == 'str-val'.

I would expect that fspath(obj, allow_bytes=True) == 'str-val' (after all, it's allow_bytes, not require_bytes).

Best,
-Nikolaus

--
GPG encrypted emails preferred.
Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."
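The case Nikolaus describes can be reproduced with two small stand-in functions. This is a sketch of one reading of "approach #2" and "approach #3" as they are described in the thread; the gist's actual code is not reproduced here, so `fspath_bytes_first()` and `fspath_protocol_first()` are hypothetical names and their bodies are assumptions.

```python
class Special(bytes):
    def __fspath__(self):
        return 'str-val'


def fspath_bytes_first(path):
    """Assumed shape of approach #2: bytes pass straight through."""
    if isinstance(path, bytes):
        return path
    method = getattr(type(path), '__fspath__', None)
    if method is not None:
        return method(path)
    if isinstance(path, str):
        return path
    raise TypeError(f"not a path: {type(path).__name__}")


def fspath_protocol_first(path):
    """Assumed shape of approach #3: __fspath__ is consulted first."""
    method = getattr(type(path), '__fspath__', None)
    if method is not None:
        return method(path)
    if isinstance(path, (str, bytes)):
        return path
    raise TypeError(f"not a path: {type(path).__name__}")


obj = Special('bytes-val', 'utf8')
print(fspath_bytes_first(obj))      # the bytes object itself
print(fspath_protocol_first(obj))   # whatever __fspath__ returned
```

Only the order of the two checks differs, which is why the divergence shows up solely for a bytes subclass that also defines `__fspath__`.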
From ncoghlan at gmail.com Wed Apr 13 23:04:09 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 13:04:09 +1000
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To:
References:
Message-ID:

On 14 April 2016 at 08:26, Victor Stinner wrote:
> 2016-04-14 0:11 GMT+02:00 Ryan Gonzalez :
>> So code that depends on iterating through bytecode via HAS_ARG is going to
>> break...
>
> Sure. This change is backward incompatible for applications parsing
> bytecode in C or Python. That's why the patch also has to update the
> dis module.
>
> I don't see how you plan to keep the backward compatibility, since the
> argument size changed from 2 bytes to 1 byte. You must update your
> code (written in C or Python or whatever).
>
> Hopefully, the dis was enhanced in Python 3.4: get_instructions() now
> gives nice Instruction objects rather than only pure text output.
>
> FYI I wrote my own library to encode and decode bytecode. It provides
> abstract bytecode objects to easily modify bytecode:
> https://bytecode.readthedocs.org/
>
> I suggest to use such library (or simply the dis module for simple
> needs) if you have to handle bytecode, rather than writing your own
> code.
>
> I know a few other projects which handle directly bytecode:
>
> * https://pypi.python.org/pypi/codetransformer
> * https://github.com/serprex/byteplay
> * https://pypi.python.org/pypi/coverage
>
> IMHO it's not a big deal to update these projects for the future
> Python 3.6. I can even help them to support the new bytecode format.

+1

We've also had previous discussions on adding a "minimum viable bytecode editing" API to the standard library, and updating these third party modules to support wordcode instead of bytecode could provide a good use-case-driven opportunity for defining that (i.e.
it wouldn't be about providing an end user facing API directly, but rather about letting CPython take care of the bookkeeping details for things like lnotab and sorting out jump targets).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ethan at stoneleaf.us Wed Apr 13 23:14:44 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 13 Apr 2016 20:14:44 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To: <87oa9c3nii.fsf@vostro.rath.org>
References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us> <87oa9c3nii.fsf@vostro.rath.org>
Message-ID: <570F0B24.50705@stoneleaf.us>

On 04/13/2016 07:57 PM, Nikolaus Rath wrote:
> On Apr 13 2016, Ethan Furman wrote:
>> On 04/13/2016 03:45 PM, Nikolaus Rath wrote:

>>> When passing an object that is of type str and has a __fspath__
>>> attribute, all approaches return the value of __fspath__().
>>>
>>> However, when passing something of type bytes, the second approach
>>> returns the object, while the third returns the value of __fspath__().
>>>
>>> Is this intentional? I think a __fspath__ attribute should always be
>>> preferred.
>>
>> Yes, it is intentional. The second approach assumes __fspath__ can
>> only contain str, so there is no point in checking it for bytes.
>
> Either I haven't understood your answer, or you haven't understood my
> question. I'm concerned about this case:
>
>     class Special(bytes):
>         def __fspath__(self):
>             return 'str-val'
>     obj = Special('bytes-val', 'utf8')
>     path_obj = fspath(obj, allow_bytes=True)
>
> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.

I misunderstood your question. That is... an interesting case.
;) -- ~Ethan~ From ncoghlan at gmail.com Wed Apr 13 23:17:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Apr 2016 13:17:36 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> Message-ID: On 14 April 2016 at 12:49, Nick Coghlan wrote: > The API could be something like: > > - os.fspath -> str-or-bytes > - os.fsencode -> bytes (with coercion from str) > - os.fsdecode -> str (with coercion from bytes) > - os.strpath -> str (no coercion) There seems to be fairly broad opposition to the idea of defining the public API in terms of what os and os.path are likely to need, which reminded me of Koos's suggestion of using a private API for the str-or-bytes variant. That approach would give us something like: - os.fspath -> str (no coercion) - os.fsdecode -> str (with coercion from bytes) - os.fsencode -> bytes (with coercion from str) - os._raw_fspath -> str-or-bytes (no coercion) (with "coercion" referring to how the result of __fspath__ and any directly passed in str or bytes objects are handled) The leading underscore on _raw_fspath would be of the "this is a documented and stable API, but you probably don't want to use it unless you really know what you're doing" variety, rather than the "this is an undocumented and potentially unstable private API" variety. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Apr 13 23:27:41 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Apr 2016 13:27:41 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570F0B24.50705@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us> <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us> Message-ID: On 14 April 2016 at 13:14, Ethan Furman wrote:
> On 04/13/2016 07:57 PM, Nikolaus Rath wrote:
>> Either I haven't understood your answer, or you haven't understood my
>> question. I'm concerned about this case:
>>
>>    class Special(bytes):
>>        def __fspath__(self):
>>            return 'str-val'
>>    obj = Special('bytes-val', 'utf8')
>>    path_obj = fspath(obj, allow_bytes=True)
>>
>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>
> I misunderstood your question. That is... an interesting case. ;)

In this kind of case, inheritance tends to trump protocol. For example, int subclasses can't override operator.index:

    >>> from operator import index
    >>> class NotAnInt():
    ...     def __index__(self):
    ...         return 42
    ...
    >>> index(NotAnInt())
    42
    >>> class MyInt(int):
    ...     def __index__(self):
    ...         return 42
    ...
    >>> index(MyInt(53))
    53

The reasons for that behaviour are more pragmatic than philosophical: builtins and their subclasses are extensively special-cased for speed reasons, and those shortcuts are encountered before the interpreter even considers using the general protocol. In cases where the magic method return types are polymorphic (so subclasses may want to override them) we'll use more restrictive exact type checks for the shortcuts, but that argument doesn't apply for typechecked protocols where the result is required to be an instance of a particular builtin type (but subclasses are considered acceptable). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From random832 at fastmail.com Wed Apr 13 23:54:52 2016 From: random832 at fastmail.com (Random832) Date: Wed, 13 Apr 2016 23:54:52 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> Message-ID: <1460606092.516946.578278417.49F19066@webmail.messagingengine.com> On Wed, Apr 13, 2016, at 23:17, Nick Coghlan wrote: > - os.fspath -> str (no coercion) > - os.fsdecode -> str (with coercion from bytes) > - os.fsencode -> bytes (with coercion from str) > - os._raw_fspath -> str-or-bytes (no coercion) > > (with "coercion" referring to how the result of __fspath__ and any > directly passed in str or bytes objects are handled) > > The leading underscore on _raw_fspath would be of the "this is a > documented and stable API, but you probably don't want to use it > unless you really know what you're doing" variety, rather than the > "this is an undocumented and potentially unstable private API" > variety. In this scenario could the protocol return bytes? If the protocol cannot return bytes, then _raw_fspath will only return bytes if directly passed bytes. 
This limits its utility for the functions that consume it (presumably path_convert (os.open and friends) and builtin open), since they already have to act specially based on the types of their arguments (builtin open can accept an integer; path_convert has to behave radically differently on str or bytes input) and there's no reason they couldn't simply accept bytes directly while they're doing that. If the protocol can return bytes, then that means that types (DirEntry? someone had an alternate path library with a bPath?) which return bytes via the protocol will proliferate, and cannot be safely passed to anything that uses os.fspath. Numerous copies of "def myfspath(x): return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just monkey-patch os.fspath), and no-one actually uses os.fspath except toy examples. Why is it so objectionable for os.fspath to do coercion? From random832 at fastmail.com Thu Apr 14 00:05:43 2016 From: random832 at fastmail.com (Random832) Date: Thu, 14 Apr 2016 00:05:43 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us> <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us> Message-ID: <1460606743.519577.578286961.5D1CB3F1@webmail.messagingengine.com> On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote: > In this kind of case, inheritance tends to trump protocol. For > example, int subclasses can't override operator.index: ... > The reasons for that behaviour are more pragmatic than philosophical: > builtins and their subclasses are extensively special-cased for speed > reasons, and those shortcuts are encountered before the interpreter > even considers using the general protocol. 
>
> In cases where the magic method return types are polymorphic (so
> subclasses may want to override them) we'll use more restrictive exact
> type checks for the shortcuts, but that argument doesn't apply for
> typechecked protocols where the result is required to be an instance
> of a particular builtin type (but subclasses are considered
> acceptable).

Then why aren't we doing it for str? Because "try: path = path.__fspath__()" is more idiomatic than the alternative? If some sort of reasoned decision has been made to require the protocol to trump the special case for str subclasses, it's unreasonable not to apply the same decision to bytes subclasses. The decision should be "always use the protocol first" or "always use the type match first". In other words, why not this:

    def fspath(path, *, allow_bytes=False):
        if isinstance(path, (bytes, str) if allow_bytes else str):
            return path
        try:
            m = path.__fspath__
        except AttributeError:
            raise TypeError
        path = m()
        if isinstance(path, (bytes, str) if allow_bytes else str):
            return path
        raise TypeError

From ncoghlan at gmail.com Thu Apr 14 02:00:22 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Apr 2016 16:00:22 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <1460606092.516946.578278417.49F19066@webmail.messagingengine.com> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <1460606092.516946.578278417.49F19066@webmail.messagingengine.com> Message-ID: On 14 April 2016 at 13:54, Random832 wrote: > On Wed, Apr 13, 2016, at 23:17, Nick Coghlan wrote: > >> - os.fspath -> str (no coercion) >> 
- os.fsdecode -> str (with coercion from bytes) >> - os.fsencode -> bytes (with coercion from str) >> - os._raw_fspath -> str-or-bytes (no coercion) >> >> (with "coercion" referring to how the result of __fspath__ and any >> directly passed in str or bytes objects are handled) >> >> The leading underscore on _raw_fspath would be of the "this is a >> documented and stable API, but you probably don't want to use it >> unless you really know what you're doing" variety, rather than the >> "this is an undocumented and potentially unstable private API" >> variety. > > In this scenario could the protocol return bytes? Yes, that's desirable to handle DirEntry transparently regardless of type. > If the protocol can return bytes, then that means that types (DirEntry? > someone had an alternate path library with a bPath?) which return bytes > via the protocol will proliferate, and cannot be safely passed to > anything that uses os.fspath. Numerous copies of "def myfspath(x): > return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just > monkey-patch os.fspath), and no-one actually uses os.fspath except toy > examples. If folks want coercion, they can just use os.fsdecode(x), as that already has a str -> str passthrough from the input to the output (unlike codecs.decode) and will presumably be updated to include an implicit call to os._raw_fspath() on the passed in object. > Why is it so objectionable for os.fspath to do coercion? The first problem is that binary paths on Windows basically don't work, so it's preferable for them to fail fast regardless of platform, rather than to have them implicitly work on *nix, only to fail for Windows users using non-ASCII paths later. The second is that it would make os.fspath and os.fsdecode functionally equivalent, so we'd have two different spellings for the same operation. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Apr 14 02:09:17 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Apr 2016 16:09:17 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <1460606743.519577.578286961.5D1CB3F1@webmail.messagingengine.com> References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us> <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us> <1460606743.519577.578286961.5D1CB3F1@webmail.messagingengine.com> Message-ID: On 14 April 2016 at 14:05, Random832 wrote: > On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote: >> In this kind of case, inheritance tends to trump protocol. For >> example, int subclasses can't override operator.index: > ... >> The reasons for that behaviour are more pragmatic than philosophical: >> builtins and their subclasses are extensively special-cased for speed >> reasons, and those shortcuts are encountered before the interpreter >> even considers using the general protocol. >> >> In cases where the magic method return types are polymorphic (so >> subclasses may want to override them) we'll use more restrictive exact >> type checks for the shortcuts, but that argument doesn't apply for >> typechecked protocols where the result is required to be an instance >> of a particular builtin type (but subclasses are considered >> acceptable). > > Then why aren't we doing it for str? Because "try: path = > path.__fspath__()" is more idiomatic than the alternative? The sketches Brett posted will bear little resemblance to the actual implementation - that will be in C and use similar idioms to those we use for other abstract protocols (such as shortcuts for instances of builtin types, and doing the method lookup via the passed in object's type, rather than on the instance). Cheers, Nick. 
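To illustrate the point about doing the method lookup via the object's type rather than on the instance, here is a small sketch using operator.index, which follows that convention for __index__. The helper `lookup_on_type` is hypothetical and only approximates what the C code does.

```python
import operator


def lookup_on_type(obj, name):
    """Find `name` on type(obj) via the MRO, ignoring the instance dict."""
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return None


class WithMethod:
    def __index__(self):
        return 42


class Plain:
    pass


patched = Plain()
patched.__index__ = lambda: 42  # instance attribute only; the type has no __index__

found_on_class = lookup_on_type(WithMethod(), "__index__") is not None
found_on_patched = lookup_on_type(patched, "__index__") is not None

try:
    operator.index(patched)
    instance_attribute_used = True
except TypeError:
    # The protocol looks at type(patched), so the instance attribute is ignored.
    instance_attribute_used = False
```

A __fspath__ implementation written along the same lines would ignore a `__fspath__` attribute that was assigned on an individual instance.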
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Thu Apr 14 02:55:49 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 14 Apr 2016 15:55:49 +0900 Subject: [Python-Dev] Pathlib enhancements - improve fsdecode and fsencode In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> Message-ID: <22287.16117.707682.669635@turnbull.sk.tsukuba.ac.jp> Please please please, junk both "filter out bytes" proposals. Since they involve an exception, they impose an unnecessary "try" on all text applications that fear death on bytes returns. May as well just wrap all objects with __fspath__ in fsdecode, and all is happy. Counterproposal: make fsdecode and fsencode grok __fspath__. Then: (1) Bytes-lovers and str-addicts are both safe. (2) They can omit fspath, too! No, that doesn't work if the bytes objects aren't in the file system encoding, but these are *bytes*, mon ami: you have no way to find out what that encoding is, so you either know already and you substitute that + fspath for fsdecode, or you're hosed. And in the only concrete use case so far, fsdecode Just Works. I suppose a similar argument holds for applications that want bytes and fsencode, but I leave that as an exercise for the reader. From stephen at xemacs.org Thu Apr 14 03:02:36 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 14 Apr 2016 16:02:36 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> Message-ID: <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> I was going to read the new posts that came in since I started this one (at one point it was 5X as long as it is now), but this thread is way out of control. My apologies to anybody who has presented[1] use cases in support of the wildly speculative proposals under discussion, but my bet is that there have been none. Victor Stinner writes: > Oops sorry, I forgot to add that I have no strong opinion on the type (I > only have a minor preference for str only). I have a strong preference for str only, because I still don't see a use case for polymorphic __fspath__. os functions and os.path functions need to *accept* both str and bytes because they are interfaces to OS functionality used by both text and non-text applications, and so must check and convert to OS native type. Many of these function produce what they receive because both text and non-text applications use names of filesystem objects internally, as well as passing them to OS wrappers. The question is how far to take that logic. So let me propose what I think is the elephant in the room. If you're going to have a polymorphic __fspath__, then pathlib is *the* example of a module that *desperately* needs to be polymorphic. 
Consider: A non-text Application has some bytes and passes them to pathlib.Path as manipulates them and passes the result to os.scandir as expecting a return of DirEntries of == == bytes, and == Path is TOOWTDI, no? But under the current proposal which doesn't touch the internal mechanisms of pathlib and allows, but has no way to request, bytes returns, == str, == Path, and == str, requiring two explicit conversions that bytes-shoveling developers will tell you should be unnecessary. QED, pathlib should be polymorphic as a central part of this proposal. IMO that's not the right way to go (slippery slope, very quickly you hit manipulations that are "really" text operations). See also my proposal "Pathlib enhancements - improve fsdecode and fsencode" which suggests a (primitive) way for code to request the type it likes better. But WDOT? I'd especially like to hear if Nick is tempted to flip-flop (so far he's been in the "pathlib is a text utility" camp). Footnotes: [1] Just because I don't know of any I consider persuasive doesn't mean there aren't any, but what you don't tell me I don't know. (Maybe you'd have to kill me? If so, thanks for not telling!) From cybersol at yahoo.com Thu Apr 14 03:03:00 2016 From: cybersol at yahoo.com (Michael Mysinger) Date: Thu, 14 Apr 2016 07:03:00 +0000 (UTC) Subject: [Python-Dev] pathlib - current status of discussions References: <570C1E13.4090909@stoneleaf.us> Message-ID: Brett Cannon python.org> writes: > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed). > Thanks Brett, it is definitely a start! 
Maybe I am just more unimaginative than most, but since interoperability is the goal, I would ideally be able to play with a full implementation where all the stdlib functions Nick originally mentioned accepted these "rich path" objects. However, for concrete example purposes, maybe it is sufficient to start with your fspath function, a toy RichPath class implementing __fspath__, and something like os.path.join, which is a meaty enough example to test some of the functionality. I posted a gist of a string only example at https://gist.github.com/mmysinger/0b5ae2cfb866f7013c387a2683c7fc39 After playing with and considering the 4 possibilities, anything where __fspath__ can return bytes seems like insanity that flies in the face of everything Python 3 is trying to accomplish. In particular, one RichPath class might return bytes and another str, or even worse the same class might sometimes return bytes and sometimes str. When will os.path.join blow up due to mixing bytes and str and when will it work in those situations? So for me that eliminates #3 and #4. Also the version #2 accepting bytes in os.fspath felt like it could be a very minor convenience, but even the str only version #1 just requires one isinstance check in the rare case you need to also deal with bytes (see the os.path.join example in the gist above). So I lean toward the str only #1 version. In any case I would start with the strict str only full implementation and loosen it either in 3.6 or 3.7 depending on what people think after actually using it. From storchaka at gmail.com Thu Apr 14 04:36:29 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 14 Apr 2016 11:36:29 +0300 Subject: [Python-Dev] Bytes path Message-ID: What types should be accepted as a bytes path? For now os.path is strict and accepts only bytes and bytes subclasses (even bytearray is not accepted) as a bytes path. This is enough for working with low-level POSIX paths and supporting backward compatibility. 
On the other hand, most os functions have been too permissive since 3.3 and accept any type that supports the buffer protocol as a bytes path. Even such meaningless objects as array('h') are accepted. Some functions (zipimport.zipimporter() in 3.x, _imp.load_dynamic() in 3.3+, builtin compile() etc in 3.4) even accept arbitrary iterables, e.g. [116, 101, 115, 116] (see http://bugs.python.org/issue26754). I think we should accept only bytes (and subclasses). Even bytearray is less acceptable since it is mutable and can't be used as a key in caches. From storchaka at gmail.com Thu Apr 14 04:51:53 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 14 Apr 2016 11:51:53 +0300 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: Message-ID: On 13.04.16 07:39, Terry Reedy wrote: > On 4/4/2016 5:05 PM, Terry Reedy wrote: > > Since a few days, I am getting bug tracker emails again, in my Inbox. I > just got a Rietveld review in the Inbox and I believe it went there > directly instead of first to Junk. Thank you to whoever made the > improvements. AFAIK David just disabled IPv6 support. Most bug tracker emails still go to the Spam folder. I have a filter for Roundup emails, but there is no marker that I can use for filtering Rietveld emails. From storchaka at gmail.com Thu Apr 14 05:15:01 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 14 Apr 2016 12:15:01 +0300 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: On 13.04.16 14:40, Victor Stinner wrote: > Last months, most 3.x buildbots failed randomly. Some of them were > always failing. I spent some time to fix almost all Windows and Linux > buildbots. There were a lot of different issues. Excellent! Many thanks for doing this. And the new features of regrtest look nice. 
> So please try to not break buildbots again and remind to watch them sometimes: > > http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable A desirable but nonexistent feature would be to send emails to the authors of commits that broke buildbots. How hard would this be to implement? > Next weeks, I will try to backport some fixes to Python 3.5 (if > needed) to make these buildbots more stable too. > > Python 2.7 buildbots are also in a sad state (ex: test_marshal > segfaults on Windows, see issue #25264). But it's not easy to get a > Windows with the right compiler to develop on Python 2.7 on Windows. What do you think about backporting the recent regrtest to 2.7? The features I need most are the -m and -G options. > Maybe it's time to move more 3.x buildbots to the "stable" category? > http://buildbot.python.org/all/waterfall?category=3.x.stable +1 > By the way, I don't understand why "AMD64 OpenIndiana 3.x" is > considered as stable since it's failing with multiple issues since > many months and nobody is working on these failures. I suggest to move > this buildbot back to the unstable category. I think the main cause is the lack of memory in this buildbot. I tried to minimize memory consumption and leaks, but some leaks remain, and they provoke other test failures and additional resource leaks. It would be nice to add a feature for running every test in a separate subprocess. This would isolate the effect of failed tests. 
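The last suggestion (one subprocess per test) could look roughly like the sketch below. This is not how regrtest does it; `run_tests_isolated` is a hypothetical helper that launches `python -m unittest <module>` once per module, so that a crash or a resource leak in one module cannot affect the next one.

```python
import subprocess
import sys


def run_tests_isolated(test_modules, timeout=600):
    """Run each unittest module in a fresh interpreter.

    Returns a dict mapping module name to True (exit status 0) or False.
    """
    results = {}
    for name in test_modules:
        proc = subprocess.run(
            [sys.executable, "-m", "unittest", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        results[name] = (proc.returncode == 0)
    return results
```

Each module pays the cost of starting a new interpreter, which is the price of the isolation.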
From p.f.moore at gmail.com Thu Apr 14 06:07:49 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 14 Apr 2016 11:07:49 +0100 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> Message-ID: On 14 April 2016 at 08:02, Stephen J. Turnbull wrote: > So let me propose what I think is the elephant in the room. If you're > going to have a polymorphic __fspath__, then pathlib is *the* example > of a module that *desperately* needs to be polymorphic. Consider: > > A non-text Application has some bytes and passes them to > pathlib.Path as > manipulates them and passes the result to > os.scandir as > expecting a return of > DirEntries of > > == == bytes, and == Path is TOOWTDI, no? I'm not sure I follow this logic at all. But from my reading your argument contradicts your conclusion, so maybe I'm misunderstanding. To me, the "obvious" conclusion is that pathlib is not appropriate in non-text applications, because *cannot* be bytes (the constructor rejects bytes). I see no reason to change that - non-text applications are inherently low level, and shouldn't expect to use high-level abstractions like pathlib. > But under the current proposal which doesn't touch the internal > mechanisms of pathlib and allows, but has no way to request, bytes > returns, == str, == Path, and == str, > requiring two explicit conversions that bytes-shoveling developers > will tell you should be unnecessary. 
QED, pathlib should be > polymorphic as a central part of this proposal. Nope, QED pathlib is not a low level abstraction. So your argument to me doesn't help much, because it's a given that pathlib is str-only. The debate is about how things like scandir (specifically DirEntry objects) and Ethan's pathlib replacement, which *do* allow bytes in and out, should participate in the new protocol, when they are bytes (they obviously should work just like pathlib when they are strings). In my opinion, they *shouldn't*: the new protocol should be string-only (at least initially). If I understand (from a couple of brief mentions) Ethan has a string-like path object and a bytes-like path object, so he could support fspath on the string-like one but not the bytes-like one. He may not like having slightly different APIs for the two types, I don't know, but it's possible. But DirEntry is polymorphic, so it *will* have a __fspath__ method, and needs to know what to do when it's bytes-like (I guess with a bit of getattr hacking DirEntry *could* expose a __fspath__ method only if it's string-like, but that seems like a pretty gross hack). So:

1. pathlib remains string-like, and is the canonical example of __fspath__, returns strings only
2. DirEntry is the only other example of the protocol in the stdlib, but is polymorphic
3. I'm not aware of any 3rd party library that has polymorphic classes (Ethan can correct me if I'm wrong here)

So the only purpose I know of for discussing __fspath__ returning bytes is for scandir, and hypothetical polymorphic 3rd party path abstractions (and possibly Ethan's preference to have a common API for his 2 classes). I propose we should have a string-only __fspath__ protocol in 3.6. Bytes-format DirEntry objects can raise an error in __fspath__. If it becomes obvious with usage that we need bytes support in __fspath__ we can add it (compatibly - string-only code wouldn't need to change) in 3.7. 
That seems far better to me than trying to design bytes support without actual use cases. Paul From vadmium+py at gmail.com Thu Apr 14 06:21:42 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Thu, 14 Apr 2016 10:21:42 +0000 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: On 14 April 2016 at 09:15, Serhiy Storchaka wrote: > On 13.04.16 14:40, Victor Stinner wrote: >> By the way, I don't understand why "AMD64 OpenIndiana 3.x" is >> considered as stable since it's failing with multiple issues since >> many months and nobody is working on these failures. I suggest to move >> this buildbot back to the unstable category. > > I think the main cause is the lack of memory in this buildbot. I tried to > minimize memory consumption and leaks, but some leaks are left, and they > provoke other tests failures, and additional resource leaks. Would be nice > to add a feature for running every test in separate subprocess. This will > isolate the effect of failed tests. Last time I looked into the Open Indiana buildbot, I concluded that the biggest problem was Python using fork() to spawn subprocesses. I understand that the OS does not do "memory overcommitment" like Linux does, so every time you fork, the OS has to double the amount of memory that is reserved. It is ironic, but running each test using the current subprocess module (which uses fork) would probably make the problem worse. I suspect using posix_spawn() if possible would help a lot. But this was rejected in for not being flexible enough and making maintenance too complicated. From victor.stinner at gmail.com Thu Apr 14 06:25:40 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 12:25:40 +0200 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: Le 14 avr. 
2016 11:16 AM, "Serhiy Storchaka" a écrit : > A desirable but nonexistent feature is to write emails to authors of commits that broke buildbots. How hard to implement this? Yeah, I have also had this idea for many years, but buildbots were quite unstable. Maybe we should be stricter about considering a buildbot stable? I propose to experiment with sending notifications of failure to the authors of changes *and* to a new mailing list. I would subscribe to such a list. An even safer starting point would be to only start with the mailing list. FYI I'm connected to the #python-dev IRC channel which already contains these notifications. But I agree that mails are better. > What are you think about backporting recent regrtest to 2.7? Most needed features to me are the -m and -G options. Regrtest changed a lot in python 3.6 (new test.libregrtest library). I suggest to start from python 3.5. For -m: if it doesn't need to modify the unittest module, I agree. I don't know the -G option. > Would be nice to add a feature for running every test in separate subprocess. This will isolate the effect of failed tests. See my email :-) I proposed to modify -j1 to run tests in subprocesses. I even mentioned my issue. I suggest to use -jN on all buildbots, at least -j1. Maybe -j2 is even better since many tests are waiting on IO or simply sleeping. Victor From victor.stinner at gmail.com Thu Apr 14 06:29:21 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 12:29:21 +0200 Subject: [Python-Dev] Bytes path In-Reply-To: References: Message-ID: IMHO it's more a side effect of the implementation than a deliberate choice. For new code which really wants to support bytes paths, I suggest to only accept bytes and bytes subclasses. Victor
From victor.stinner at gmail.com Thu Apr 14 06:32:05 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 12:32:05 +0200 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: Message-ID: Le 14 avr. 2016 10:53 AM, "Serhiy Storchaka" a écrit : > Most bug tracker emails still went in the Spam folder. I have a filter for Roundup emails, but there is no any mark that I can use for filtering Rietveld emails. I'm using the base URL of Rietveld and match it in the mail body. Gmail filters have an option to never mark emails as spam. Victor From vadmium+py at gmail.com Thu Apr 14 06:33:19 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Thu, 14 Apr 2016 10:33:19 +0000 Subject: [Python-Dev] Not receiving bug tracker emails In-Reply-To: References: Message-ID: On 14 April 2016 at 08:51, Serhiy Storchaka wrote: > On 13.04.16 07:39, Terry Reedy wrote: >> >> On 4/4/2016 5:05 PM, Terry Reedy wrote: >> >> Since a few days, I am getting bug tracker emails again, in my Inbox. I >> just got a Rietveld review in the Inbox and I believe it went there >> directly instead of first to Junk. Thank you to whoever made the >> improvements. > > > AFAIK David just disabled IPv6 support. > > Most bug tracker emails still went in the Spam folder. I have a filter for > Roundup emails, but there is no any mark that I can use for filtering > Rietveld emails. FWIW I set up the following filter in Gmail for Rietveld reviews: Matches: http://bugs.python.org/review Do this: Never send it to Spam I suspect it helps, but occasionally I think stuff still goes to spam. 
(Just don't tell this secret rule to actual spammers :)

From storchaka at gmail.com Thu Apr 14 07:01:37 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 14 Apr 2016 14:01:37 +0300
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To:
References:
Message-ID:

On 14.04.16 13:33, Martin Panter wrote:
> On 14 April 2016 at 08:51, Serhiy Storchaka wrote:
>> Most bug tracker emails still went in the Spam folder. I have a filter for
>> Roundup emails, but there is no mark that I can use for filtering
>> Rietveld emails.
>
> FWIW I set up the following filter in Gmail for Rietveld reviews:
>
> Matches: http://bugs.python.org/review
> Do this: Never send it to Spam
>
> I suspect it helps, but occasionally I think stuff still goes to spam.
> (Just don't tell this secret rule to actual spammers :)

Thank you and Victor for this advice.

But this filter is not quite robust; for example, it will cause this mail to be moved to the folder for Rietveld reviews. I was going to try a different approach: appending "+py" to my address for the tracker, as in your address.

From victor.stinner at gmail.com Thu Apr 14 07:26:12 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 13:26:12 +0200
Subject: [Python-Dev] Not receiving bug tracker emails
In-Reply-To:
References:
Message-ID:

2016-04-14 13:01 GMT+02:00 Serhiy Storchaka :
> But this filter is not quite robust, for example it will cause this mail to
> be moved to the folder for Rietveld reviews.

Right, it's just a workaround since I'm unable to fix the root cause (emails marked as spam, which looks like a configuration issue in the SMTP server).

From ncoghlan at gmail.com Thu Apr 14 07:44:58 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 21:44:58 +1000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To: <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
Message-ID:

On 14 April 2016 at 17:02, Stephen J. Turnbull wrote:
> But WDOT? I'd especially like to hear if Nick is tempted to flip-flop
> (so far he's been in the "pathlib is a text utility" camp).

pathlib is too high level (i.e. has too many dependencies) to be used in low level boundary code.

The use case for returning bytes from __fspath__ is DirEntry, so you can write things like this in low level code:

    def myscandir(dirpath):
        for entry in os.scandir(dirpath):
            if entry.is_file():
                with open(entry) as f:
                    # do something

and still have them automatically inherit the str/bytes handling of the core standard library APIs.

By contrast, as soon as you type "import pathlib" at the top of your file, you've stepped outside the world of potentially pure boundary code, and are instead dealing with structured application level objects (which means traversing the bytes->str boundary before the str->Path one).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From victor.stinner at gmail.com Thu Apr 14 08:02:55 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 14:02:55 +0200
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To:
References:
Message-ID:

Le jeudi 14 avril 2016, Nick Coghlan a écrit :
>
> > IMHO it's not a big deal to update these projects for the future
> > Python 3.6. I can even help them to support the new bytecode format.
>
> We've also had previous discussions on adding a "minimum viable
> bytecode editing" API to the standard library, and updating these
> third party modules to support wordcode instead of bytecode could
> provide a good use-case-driven opportunity for defining that (i.e. it
> wouldn't be about providing an end user facing API directly, but
> rather about letting CPython take care of the bookkeeping details for
> things like lnotab and sorting out jump targets).

Yeah, I know this discussion well, since it started with my PEP 511. I wrote the bytecode project as a tool for the discussion, to try to better understand the use case.

The main task was to design the API. I first looked at the byteplay and codetransformer projects, but I found some issues in their API design. IMHO their API is not the best for modifying bytecode.

My goal is to support Bytecode.from_code(code).to_code() == code: store enough information to be able to emit again exactly the same bytecode (line numbers, exact argument value, etc.).

I started with a long email, but I decided to document the differences in the bytecode documentation:

https://bytecode.readthedocs.org/en/latest/byteplay_codetransformer.html

Victor

From victor.stinner at gmail.com Thu Apr 14 08:16:03 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 14:16:03 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

2016-04-13 19:10 GMT+02:00 Brett Cannon :
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

IMHO the best argument against flavor 4 (fspath: str or bytes allowed) is the os.path.join() function.

I consider that the final goal of the whole discussion is to support something like:

    path = os.path.join(pathlib_path, "str_path", direntry)

Even if direntry uses a bytes filename. I expect os.path.join() to be patched to use os.fspath(). If os.fspath() returns bytes, os.path.join() will fail with an annoying TypeError.

I expect that DirEntry.__fspath__ uses os.fsdecode() to return str, just to make my life easier.

I recall that I used to say that Python 2 doesn't support Unicode filenames because os.path.join() raises a UnicodeDecodeError when you try to join a Unicode filename with a byte filename which contains non-ASCII bytes. The problem occurs indirectly in code using hardcoded paths, Unicode or bytes paths. Saying that "Python 2 doesn't support Unicode filenames" is wrong, but since Unicode is a hard problem, I tried to simplify my explanation :-)

You can apply the same rationale to flavors 2 and 3 (os.fspath(path, allow_bytes=True)). Indirectly, you will get a similar TypeError on os.path.join().
Victor From random832 at fastmail.com Thu Apr 14 08:28:29 2016 From: random832 at fastmail.com (Random832) Date: Thu, 14 Apr 2016 08:28:29 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <1460606092.516946.578278417.49F19066@webmail.messagingengine.com> Message-ID: <1460636909.4186032.578623185.789BF90D@webmail.messagingengine.com> On Thu, Apr 14, 2016, at 02:00, Nick Coghlan wrote: > > If the protocol can return bytes, then that means that types (DirEntry? > > someone had an alternate path library with a bPath?) which return bytes > > via the protocol will proliferate, and cannot be safely passed to > > anything that uses os.fspath. Numerous copies of "def myfspath(x): > > return os.fsdecode(os._raw_fspath(x))" will proliferate (or they'll just > > monkey-patch os.fspath), and no-one actually uses os.fspath except toy > > examples. > > If folks want coercion, they can just use os.fsdecode(x), as that > already has a str -> str passthrough from the input to the output > (unlike codecs.decode) and will presumably be updated to include an > implicit call to os._raw_fspath() on the passed in object. This is the first I've heard of any suggestion to have fsdecode accept non-strings. > > Why is it so objectionable for os.fspath to do coercion? > > The first problem is that binary paths on Windows basically don't > work, so it's preferable for them to fail fast regardless of platform, > rather than to have them implicitly work on *nix, only to fail for > Windows users using non-ASCII paths later. 
Ideally, this warning would be raised from a central place, and even fspath (and even fsdecode) would go through it. From random832 at fastmail.com Thu Apr 14 08:33:23 2016 From: random832 at fastmail.com (Random832) Date: Thu, 14 Apr 2016 08:33:23 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> Message-ID: <1460637203.4187117.578627337.3BC93F7D@webmail.messagingengine.com> On Thu, Apr 14, 2016, at 03:02, Stephen J. Turnbull wrote: > I have a strong preference for str only, because I still don't see a > use case for polymorphic __fspath__. Ultimately we're talking about redundancy and performance here. The "use case" such as there is one, is if there's a class (be it DirEntry or whatever else) that natively stores bytes, and __fspath__ has to return str, then it calls fsdecode and then open immediately turns around and calls fsencode on the result, accomplishing nothing vs just passing everything straight through. 
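
A minimal sketch of the round trip described above, assuming a path that is stored natively as bytes. The variable names are made up for illustration; os.fsdecode() and os.fsencode() are the existing standard-library functions. It only shows that decoding and then re-encoding hands back the original bytes, which is the redundant work being pointed at.

```python
import os

# Bytes as a DirEntry-like object might store them natively
# (valid UTF-8 here so the example behaves the same on every platform).
raw = b"caf\xc3\xa9.txt"

# Step 1: a str-only __fspath__ would have to decode the stored bytes...
as_text = os.fsdecode(raw)

# Step 2: ...and a bytes-consuming API such as open() on POSIX
# would immediately encode the result again.
round_tripped = os.fsencode(as_text)

print(type(as_text).__name__, round_tripped == raw)
```

Whether that overhead matters in practice is exactly the open question in this thread; the sketch does not measure it.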

From ncoghlan at gmail.com Thu Apr 14 09:40:33 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Apr 2016 23:40:33 +1000
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

On 14 April 2016 at 22:16, Victor Stinner wrote:
> 2016-04-13 19:10 GMT+02:00 Brett Cannon :
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
>> four potential approaches implemented (although it doesn't follow the
>> "separate functions" approach some are proposing and instead goes with the
>> allow_bytes approach I originally proposed).
>
> IMHO the best argument against the flavor 4 (fspath: str or bytes
> allowed) is the os.path.join() function.
>
> I consider that the final goal of the whole discussion is to support
> something like:
>
> path = os.path.join(pathlib_path, "str_path", direntry)

That's not a *new* problem though, it already exists if you pass in a mix of bytes and str:

    >>> import os.path
    >>> os.path.join("str", b"bytes")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.4/posixpath.py", line 89, in join
        "components") from None
    TypeError: Can't mix strings and bytes in path components

There's also already a solution (regardless of whether you want bytes or str as the result), which is to explicitly coerce all the arguments to the same type:

    >>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
    'str/bytes'
    >>> os.path.join(*map(os.fsencode, ("str", b"bytes")))
    b'str/bytes'

Assuming os.fsdecode and os.fsencode are updated to call os.fspath on their argument before continuing with the current logic, the latter two forms would both start automatically handling both DirEntry and pathlib objects, while the first form would continue to throw TypeError if handed an unexpected bytes value (whether directly or via an __fspath__ call).

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From random832 at fastmail.com Thu Apr 14 09:45:41 2016 From: random832 at fastmail.com (Random832) Date: Thu, 14 Apr 2016 09:45:41 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com> On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote: > That's not a *new* problem though, it already exists if you pass in a > mix of bytes and str: > > There's also already a solution (regardless of whether you want bytes > or str as the result), which is to explicitly coerce all the arguments > to the same type: It'd be nice if that went away. Having to do that makes about as much sense to me as if you had to explicitly coerce an int to a float to add them together. Sure, explicit is better than implicit, but there are limits. You're explicitly calling os.path.join; isn't that explicit enough? From rosuav at gmail.com Thu Apr 14 09:50:57 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 14 Apr 2016 23:50:57 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com> References: <570C1E13.4090909@stoneleaf.us> <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com> Message-ID: On Thu, Apr 14, 2016 at 11:45 PM, Random832 wrote: > On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote: >> That's not a *new* problem though, it already exists if you pass in a >> mix of bytes and str: >> >> There's also already a solution (regardless of whether you want bytes >> or str as the result), which is to explicitly coerce all the arguments >> to the same type: > > It'd be nice if that went away. Having to do that makes about as much > sense to me as if you had to explicitly coerce an int to a float to add > them together. Sure, explicit is better than implicit, but there are > limits. 
You're explicitly calling os.path.join; isn't that explicit
> enough?

Adding integers and floats is considered "safe" because most people's use of floats completely encompasses their use of ints. (You'll get OverflowError if it can't be represented.) But float and Decimal are considered "unsafe":

    >>> 1.5 + decimal.Decimal("1.5")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'float' and 'decimal.Decimal'

This is more what's happening here. Floats and Decimals can represent similar sorts of things, but with enough incompatibilities that you can't simply merge them.

ChrisA

From victor.stinner at gmail.com Thu Apr 14 09:56:24 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 15:56:24 +0200
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID:

2016-04-14 15:40 GMT+02:00 Nick Coghlan :
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>> path = os.path.join(pathlib_path, "str_path", direntry)
>
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
> (...)
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:
>
>>>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
> (...)

I don't understand. What is the point of adding a new __fspath__ protocol to *implicitly* convert path objects to strings, if you still have to use an explicit conversion?

I would really expect that a high-level API like pathlib would solve encoding issues for me. IMHO DirEntry entries created by os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.

os.path.join() is just one example of an operation on multiple paths.
Look at os.path for other examples ;-)

> os.path.join(*map(os.fsdecode, ("str", b"bytes")))

This code is quite complex for a newbie, don't you think so?

My example was os.path.join(pathlib_path, "str_path", direntry), where we can do something to make the API easier to use. I don't propose to do anything for os.path.join("str", b"bytes"), which would continue to fail with TypeError, *as expected*.

Victor

From random832 at fastmail.com Thu Apr 14 10:01:44 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 14 Apr 2016 10:01:44 -0400
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us> <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com>
Message-ID: <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com>

On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
> Adding integers and floats is considered "safe" because most people's
> use of floats completely encompasses their use of ints. (You'll get
> OverflowError if it can't be represented.) But float and Decimal are
> considered "unsafe":
>
> >>> 1.5 + decimal.Decimal("1.5")
> Traceback (most recent call last):
> File "", line 1, in
> TypeError: unsupported operand type(s) for +: 'float' and
> 'decimal.Decimal'
>
> This is more what's happening here. Floats and Decimals can represent
> similar sorts of things, but with enough incompatibilities that you
> can't simply merge them.

And what such incompatibilities exist between bytes and str for the purpose of representing file paths? At the end of the day, there's exactly one answer to "what file on disk this represents (or would represent if it existed)".
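
The os.path.join() concern debated in the messages above can be tried out with a short sketch. It assumes the __fspath__ protocol roughly as it later shipped in Python 3.6, where os.fspath() hands back str or bytes unchanged and os.path.join(), os.fsdecode() and os.fsencode() accept path-like objects. BytesEntry is a hypothetical stand-in for a bytes-backed DirEntry, not a real class.

```python
import os

class BytesEntry:
    """Hypothetical path-like object that natively stores bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        # Plain retrieval, no implicit decoding
        return self._raw

entry = BytesEntry(b"data.bin")

# Mixing a str component with a bytes-returning object still fails.
try:
    os.path.join("dir", entry)
except TypeError as exc:
    mixed_error = exc
else:
    mixed_error = None

# Explicit coercion to a single type works in either direction.
joined_text = os.path.join("dir", os.fsdecode(entry))
joined_bytes = os.path.join(os.fsencode("dir"), os.fsencode(entry))

print(type(mixed_error).__name__, joined_text, joined_bytes)
```

Under these assumptions the protocol removes the need to unwrap the object by hand, but not the need to pick str or bytes for the whole call.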

From ethan at stoneleaf.us Thu Apr 14 10:47:20 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 Apr 2016 07:47:20 -0700
Subject: [Python-Dev] pathlib - current status of discussions
In-Reply-To:
References: <570C1E13.4090909@stoneleaf.us>
Message-ID: <570FAD78.60505@stoneleaf.us>

On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:
> Brett Cannon writes:
> After playing with and considering the 4 possibilities, anything where
> __fspath__ can return bytes seems like insanity that flies in the face of
> everything Python 3 is trying to accomplish. In particular, one RichPath
> class might return bytes and another str, or even worse the same class might
> sometimes return bytes and sometimes str. When will os.path.join blow up due
> to mixing bytes and str and when will it work in those situations?

What are you asking here? Exactly where in os.path.join mixing bytes & str the exception will occur, or will mixing bytes & str ever work?

The answer to the first is irrelevant (except for performance).

The answer to the second is always/never. Meaning allowing os.fspath() and __fspath__ to return either bytes or str will never cause the combination of bytes and str to work.

Said another way: if you are using os.path.join then all the pieces have to be str or all the pieces have to be bytes.

--
~Ethan~

From stephen at xemacs.org Thu Apr 14 10:52:29 2016
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 14 Apr 2016 23:52:29 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> Message-ID: <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > The use case for returning bytes from __fspath__ is DirEntry, so you > can write things like this in low level code: > > def myscandir(dirpath): > for entry in os.scandir(dirpath): > if entry.is_file(): > with open(entry) as f: > # do something Excuse me, but that is *not* a use case for returning bytes from DirEntry.__fspath__. open() is perfectly happy taking str (including surrogate-encoded rawbytes). If the trivial thing is for __fspath__ to return bytes, then implicitly applying os.fsencode to the value being returned is almost as trivial, and just as safe. A low price to pay for ensuring that text applications don't crash just because a bytes-oriented object decides to implement __fspath__. If there's any cost to defining __fspath__ as str-only, it's some other use case. What consumer of __fspath__ that expects bytes but not str do you envision? Is it generalizable, so that applying fsencode to the value of __fspath__ would lead to "unacceptably" widespread sprinkling of fsencode all over bytes-oriented code? The more I think about this, the more I like my proposal to junk fspath, and have fsdecode and fsencode consume __fspath__. That way application code can request its native type. 
> By contrast, as soon as you type "import pathlib" at the top of your > file, you've stepped outside the world of potentially pure boundary > code, "Potentially pure" is an odd term to apply to the boundary code IMO. We are agreed that conceptually paths are text, for human consumption (at least at last report we were). Therefore, paths represented as bytes are inherently an impure construct. Viz, surrogateescape. > and are instead dealing with structured application level > objects (which means traversing the bytes->str boundary before the > str->Path one). That assumes that pathlib.Path's str-only design is appropriate. I'm questioning that, primarily as a thought experiment. From ethan at stoneleaf.us Thu Apr 14 10:54:39 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 07:54:39 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: <570FAF2F.6080304@stoneleaf.us> On 04/14/2016 05:16 AM, Victor Stinner wrote: > I consider that the final goal of the whole discussion is to support > something like: > > path = os.path.join(pathlib_path, "str_path", direntry) > > Even if direntry uses a bytes filename. I expect genericpath.join() to > be patched to use os.fspath(). If os.fspath() returns bytes, > path.join() will fail with an annoying TypeError. > > I expect that DirEntry.__fspath__ uses os.fsdecode() to return str, > just to make my life easier. This would be where we strongly disagree. If pathlib, as a high-level construct, wants to take that approach I have no issues, but the functions in os are low-level and as such should not be changing data types unless I ask for it. I see __fspath__ as a retrieval mechanism, not a data-transformation mechanism. > You can apply the same rationale for the flavors 2 and 3 > (os.fspath(path, allow_bytes=True)). Indirectly, you will get similar > TypeError on os.path.join(). And that's fine. 
Low-level interfaces should not change data types unless explicitly requested -- and we have fsencode() and fsdecode() for that. -- ~Ethan~ From stephen at xemacs.org Thu Apr 14 10:57:10 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 14 Apr 2016 23:57:10 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <1460637203.4187117.578627337.3BC93F7D@webmail.messagingengine.com> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <1460637203.4187117.578627337.3BC93F7D@webmail.messagingengine.com> Message-ID: <22287.44998.72924.402412@turnbull.sk.tsukuba.ac.jp> Random832 writes: > On Thu, Apr 14, 2016, at 03:02, Stephen J. Turnbull wrote: > > I have a strong preference for str only, because I still don't see a > > use case for polymorphic __fspath__. > > Ultimately we're talking about redundancy and performance here. Ultimately, yes. Right now I have some epithets for you: Premature! Optimization!! Get thee behind me, Satan! More seriously, concrete use cases where this overhead matters? Church-of-Don-Knuth-member-ly y'rs, From nikita at nemkin.ru Thu Apr 14 05:04:34 2016 From: nikita at nemkin.ru (Nikita Nemkin) Date: Thu, 14 Apr 2016 14:04:34 +0500 Subject: [Python-Dev] MAKE_FUNCTION simplification Message-ID: MAKE_FUNCTION opcode is complex due to the way it receives input arguments: 1) default args, individually; 2) default kwonly args, individual name-value pairs; 3) a tuple of parameter names (single constant); 4) annotation values, individually; 5) code object; 6) qualname. 
The counts for 1,2,4 are packed into oparg bitfields, making oparg large.

My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION, i.e. explicitly emit BUILD_TUPLE for default args and BUILD_MAPs for keyword defaults and annotations.

Then, MAKE_FUNCTION will become a dramatically simpler 5 argument opcode, taking
1) default args tuple (optional);
2) default keyword only args dict (optional);
3) annotations dict (optional);
4) code object;
5) qualname.

These arguments correspond exactly to the __defaults__, __kwdefaults__, __annotations__, __code__ and __qualname__ attributes.

For optional args, oparg bits should indicate individual arg presence. (This also saves None checks in opcode implementation.)

If we add another optional argument (and oparg bit) for the __closure__ attribute, then a separate MAKE_CLOSURE opcode becomes unnecessary.

Default args tuple is likely to be a constant and can be packaged whole, compensating for the extra size of explicit BUILD_* instructions.

Compare the current implementation:

https://github.com/python/cpython/blob/master/Python/ceval.c#L3262

with this provisional implementation (untested):

    TARGET(MAKE_FUNCTION) {
        PyObject *qualname = POP();
        PyObject *codeobj = POP();
        PyFunctionObject *func;
        func = (PyFunctionObject *)PyFunction_NewWithQualName(
                   codeobj, f->f_globals, qualname);
        Py_DECREF(codeobj);
        Py_DECREF(qualname);
        if (func == NULL)
            goto error;

        /* NB: Py_None is not an acceptable value for these. */
        if (oparg & 0x08)
            func->func_closure = POP();
        if (oparg & 0x04)
            func->func_annotations = POP();
        if (oparg & 0x02)
            func->func_kwdefaults = POP();
        if (oparg & 0x01)
            func->func_defaults = POP();

        PUSH((PyObject *)func);
        DISPATCH();
    }

compile.c also gets a bit simpler, but not much.

What do you think?
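
The presence bits in the proposal above can be illustrated from the Python side. This is only a sketch: make_function_flags is a hypothetical helper, not part of CPython, and it simply inspects the function attributes that the pre-packaged arguments would populate, using the same bit values as the provisional C code (0x01 defaults, 0x02 keyword-only defaults, 0x04 annotations, 0x08 closure).

```python
def make_function_flags(func):
    """Hypothetical helper: the oparg bits the proposal would set for func."""
    flags = 0
    if func.__defaults__:      # tuple of positional defaults, or None
        flags |= 0x01
    if func.__kwdefaults__:    # dict of keyword-only defaults, or None
        flags |= 0x02
    if func.__annotations__:   # dict of annotations, empty if there are none
        flags |= 0x04
    if func.__closure__:       # tuple of cells, or None
        flags |= 0x08
    return flags

def plain(x):
    return x

def rich(a, b=1, *, c=2) -> int:
    return a + b + c

def outer():
    y = 0
    def inner(z=3):
        return y + z   # closes over y, so inner has a __closure__
    return inner

print(make_function_flags(plain),
      make_function_flags(rich),
      make_function_flags(outer()))
```

A function with no defaults, annotations or closure needs no optional arguments at all, which is the case the single-opcode design keeps cheapest.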
From ethan at stoneleaf.us Thu Apr 14 11:02:22 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 08:02:22 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> Message-ID: <570FB0FE.3060308@stoneleaf.us> On 04/14/2016 06:56 AM, Victor Stinner wrote: > 2016-04-14 15:40 GMT+02:00 Nick Coghlan: >> Even earlier, Victor Stinner wrote: >>> I consider that the final goal of the whole discussion is to support >>> something like: >>> >>> path = os.path.join(pathlib_path, "str_path", direntry) >> >> That's not a *new* problem though, it already exists if you pass in a >> mix of bytes and str: >> (...) >> There's also already a solution (regardless of whether you want bytes >> or str as the result), which is to explicitly coerce all the arguments >> to the same type: >> >>--> os.path.join(*map(os.fsdecode, ("str", b"bytes"))) >> (...) > > I don't understand. What is the point of adding a new __fspath__ > protocol to *implicitly* convert path objects to strings, if you still > have to use an explicit conversion? That's the crux of the issue -- some of us think the job of __fspath__ is to simply retrieve the inherent data from the pathy object, *not* to do any implicit conversions. > I would really expect that a high-level API like pathlib would solve > encodings issues for me. IMHO DirEntry entries created by > os.scandir(bytes) must use os.fsdecode() in their __fspath__ method. Then let pathlib do it. As a high-level interface I have no issue with pathlib converting DirEntry bytes objects to str using fsdecode (or whatever makes sense); os.path.join (and by extension os.fspath and __fspath__) should do no such thing. >> os.path.join(*map(os.fsdecode, ("str", b"bytes"))) > > This code is quite complex for a newbie, don't you think so? A newbie should be using pathlib. If pathlib is not low-level enough, then the newbie needs to learn about low-level stuff. 

--
~Ethan~

From victor.stinner at gmail.com Thu Apr 14 11:19:42 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 17:19:42 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
Message-ID:

Hi,

I updated my PEP 509 to make the dictionary version globally unique. With *two* use cases of this PEP (Yury's method call patch and my FAT Python project), I think that the PEP is now ready to be accepted.

A globally unique identifier is a requirement for Yury's patch optimizing method calls ( https://bugs.python.org/issue26110 ). It allows checking for free whether the dictionary was replaced.

I also renamed the ma_version field to ma_version_tag.

HTML version:
https://www.python.org/dev/peps/pep-0509/

Victor


PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at each dictionary creation and at each dictionary change, to implement fast guards on namespaces.


Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions. For example, the ``LOAD_GLOBAL`` instruction searches for a variable in the global namespace, or in the builtins namespace (two dict lookups). Python uses ``dict`` for the builtins namespace, globals namespace, type namespaces, instance namespaces, etc. The local namespace (namespace of a function) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, ... can be modified at runtime. Implementing optimizations that respect the Python semantics requires detecting when "something changes": we will call these checks "guards".

The speedup of optimizations depends on the speed of guard checks.
This PEP proposes to add a version to dictionaries to implement fast guards on namespaces.

Dictionary lookups can be skipped if the version does not change, which is the common case for most namespaces. Since the version is globally unique, the version is also enough to check that the namespace dictionary was not replaced with a new dictionary. The performance of a guard does not depend on the number of watched dictionary entries: the complexity is O(1) if the dictionary version does not change.

Example of optimization: copy the value of a global variable to function constants. This optimization requires a guard on the global variable to check if it was modified. If the variable is modified, the variable must be loaded at runtime when the function is called, instead of using the constant.

See the `PEP 510 -- Specialized functions with guards `_ for the concrete usage of guards to specialize functions and for the rationale on Python static optimizers.


Guard example
=============

Pseudo-code of a fast guard to check if a dictionary entry was modified (created, updated or deleted) using a hypothetical ``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dict structure
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False


Usage of the dict version
=========================

Speedup method calls 1.2x
-------------------------

Yury Selivanov wrote a `patch to optimize method calls `_.
The patch depends on the `implement per-opcode cache in ceval `_ patch
which requires dictionary versions to invalidate the cache if the
globals dictionary or the builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it in a
different namespace: using ``exec()`` with the *globals* parameter for
example. In this case, the globals dictionary was changed and the cache
must be invalidated.


Specialized functions using guards
----------------------------------

The `PEP 510 -- Specialized functions with guards `_ proposes an API to
support specialized functions with guards. It makes it possible to
implement static optimizers for Python without breaking the Python
semantics.

Example of a static Python optimizer: the `fatoptimizer `_ of the
`FAT Python `_ project implements many optimizations which require
guards on namespaces. Examples:

* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
  ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
  guards on ``builtins.__dict__['range']`` and ``globals()['range']``
  are required


Pyjion
------

According to Brett Cannon, one of the two main developers of Pyjion,
Pyjion can also benefit from the dictionary version to implement
optimizations.

Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
Core runtime).


Unladen Swallow
---------------

Even if the dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." Source: `Unladen Swallow ProjectPlan `_.

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011:
`Unladen Swallow Retrospective `_.
Changes
=======

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, 64-bit unsigned integer. Also add a global
dictionary version. Each time a dictionary is created, the global
version is incremented and the dictionary version is initialized to the
global version. The global version is also incremented and copied to the
dictionary version at each dictionary change:

* ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the key does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the key doesn't exist or if the value
  is not ``dict[key]``
* ``update(...)`` if new values are different than existing values:
  values are compared by identity, not by their content; the version can
  be incremented multiple times

The ``PyDictObject`` structure is not part of the stable ABI.

The field is called ``ma_version_tag`` rather than ``ma_version`` to
suggest comparing it using ``version_tag == old_version_tag`` rather
than ``version <= old_version``, which makes integer-overflow bugs much
more likely.

Example using a hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

The version is not incremented if an existing key is set to the same
value. For efficiency, values are compared by their identity:
``new_value is old_value``, not by their content:
``new_value == old_value``. Example::

    >>> d = {}
    >>> value = object()
    >>> d['key'] = value
    >>> dict_get_version(d)
    40
    >>> d['key'] = value
    >>> dict_get_version(d)
    40

.. note::
   CPython uses some singletons like integers in the range [-5; 257],
   empty tuple, empty strings, Unicode strings of a single character in
   the range [U+0000; U+00FF], etc.
   When a key is set twice to the same singleton, the version is not
   modified.


Implementation and Performance
==============================

The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject `_
contains a patch implementing this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations.

When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
a guard can watch for multiple keys. For example, for an optimization
using 10 global variables in a function, 10 dictionary lookups cost
148 ns, whereas the guard still only costs 3.8 ns when the version does
not change (39x as fast).

The `fat module `_ implements such guards: ``fat.GuardDict`` is based on
the dictionary version.


Integer overflow
================

The implementation uses the C type ``PY_UINT64_T`` to store the version:
a 64-bit unsigned integer. The C code uses ``version++``. On integer
overflow, the version is wrapped to ``0`` (and then continues to be
incremented) according to the C standard.

After an integer overflow, a guard can succeed even though the watched
dictionary key was modified. The bug only occurs at a guard check if
there are exactly ``2 ** 64`` dictionary creations or modifications
since the previous guard check.

If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
take longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.
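The orders of magnitude quoted above can be checked with a few lines of arithmetic. This is only a sketch: the one-change-per-nanosecond rate comes from the text, while the 365.25-day year and the helper name ``seconds_until_wrap`` are assumptions made for the example.

```python
NS_PER_SECOND = 10 ** 9
# Assumption for this sketch: a year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_until_wrap(bits, ns_per_change=1):
    """Seconds needed to perform 2**bits changes, one every ns_per_change ns."""
    return (2 ** bits) * ns_per_change / NS_PER_SECOND


# 64-bit counter, one dictionary change per nanosecond.
years_64 = seconds_until_wrap(64) / SECONDS_PER_YEAR

# Same rate with a 32-bit counter.
seconds_32 = seconds_until_wrap(32)

print(f"64-bit counter wraps after about {years_64:.1f} years")
print(f"32-bit counter wraps after about {seconds_32:.2f} seconds")
```

With these assumptions the 64-bit counter needs roughly 584.5 years to wrap and the 32-bit counter about 4.3 seconds, in line with the figures in the text.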
Alternatives
============

Expose the version at Python level as a read-only __version__ property
----------------------------------------------------------------------

The first version of the PEP proposed to expose the dictionary version
as a read-only ``__version__`` property at Python level, and also to add
the property to ``collections.UserDict`` (since this type must mimic the
``dict`` API).

There are multiple issues:

* To be consistent and avoid bad surprises, the version must be added to
  all mapping types. Implementing a new mapping type would require extra
  work for no benefit, since the version is only required on the
  ``dict`` type in practice.
* All Python implementations must implement this new property; this
  creates more work for other implementations, whereas they may not use
  the dictionary version at all.
* Exposing the dictionary version at Python level can lead to false
  assumptions about performance. Checking ``dict.__version__`` at the
  Python level is not faster than a dictionary lookup. A dictionary
  lookup has a cost of 48.7 ns and checking a guard has a cost of
  47.5 ns, the difference is only 1.2 ns (3%)::

      $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
      10000000 loops, best of 3: 0.0487 usec per loop
      $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
      10000000 loops, best of 3: 0.0475 usec per loop

* The ``__version__`` can be wrapped on integer overflow. It is error
  prone: using ``dict.__version__ <= guard_version`` is wrong,
  ``dict.__version__ == guard_version`` must be used instead to reduce
  the risk of a bug on integer overflow (even if the integer overflow is
  unlikely in practice).

Mandatory bikeshedding on the property name:

* ``__cache_token__``: name proposed by Nick Coghlan, name coming from
  `abc.get_cache_token() `_.
* ``__version__``
* ``__timestamp__``


Add a version to each dict entry
--------------------------------

A single version per dictionary requires keeping a strong reference to
the value, which can keep the value alive longer than expected. If we
also add a version per dictionary entry, the guard can store only the
entry version to avoid the strong reference to the value (only strong
references to the dictionary and to the key are needed).

Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
the field has the C type ``PY_INT64_T``. When a key is created or
modified, the entry version is set to the dictionary version which is
incremented at any change (create, modify, delete).

Pseudo-code of a fast guard to check if a dictionary key was modified
using hypothetical ``dict_get_version(dict)`` and
``dict_get_entry_version(dict, key)`` functions::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dict structure
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                return True

            # the key was modified
            return False

The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or unused yet). For
example, it increases the size of each dictionary entry by 8 bytes on a
64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.
Examples:

* `PEP 393 -- Flexible String Representation `_
* `PEP 412 -- Key-Sharing Dictionary `_


Add a new dict subtype
----------------------

Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``.

Leave the ``dict`` type unchanged to not add any overhead (memory
footprint) when guards are not needed.

Technical issue: a lot of C code in the wild, including CPython core,
expects the exact ``dict`` type. Issues:

* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
  uses ``globals={}``. It is not possible to cast the ``dict`` to a
  ``dict`` subtype because the caller expects the ``globals`` parameter
  to be modified (``dict`` is mutable).
* Functions directly call ``PyDict_xxx()`` functions, instead of calling
  ``PyObject_xxx()`` if the object is a ``dict`` subtype
* The ``PyDict_CheckExact()`` check fails on ``dict`` subtypes, whereas
  some functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for
  namespaces

The ``exec()`` issue is a blocker issue.

Other issues:

* The garbage collector has special code to "untrack" ``dict``
  instances. If a ``dict`` subtype is used for namespaces, the garbage
  collector can be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken
  for ``dict`` subtypes, and so it would make Python a little bit
  slower.


Prior Art
=========

Method cache and type version tag
---------------------------------

In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
was merged into Python 2.6. The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure).

The type version tag is not available at the Python level.

The version tag has the C type ``unsigned int``.
The cache is a global hash table of 4096 entries, shared by all types. The cache is global to "make it fast, have a deterministic and low memory footprint, and be easy to invalidate". Each cache entry has a version tag. A global version tag is used to create the next version tag, it also has the C type ``unsigned int``. By default, a type has its "valid version tag" flag cleared to indicate that the version tag is invalid. When the first method of the type is cached, the version tag and the "valid version tag" flag are set. When a type is modified, the "valid version tag" flag of the type and its subclasses is cleared. Later, when a cache entry of these types is used, the entry is removed because its version tag is outdated. On integer overflow, the whole cache is cleared and the global version tag is reset to ``0``. See `Method cache (issue #1685986) `_ and `Armin's method cache optimization updated for Python 2.6 (issue #1700288) `_. Globals / builtins cache ------------------------ In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue #10401) `_ which adds a private ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``Py_ssize_t``. The patch adds a "global and builtin cache" to functions and frames, and changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the cache. The change on the ``PyDictObject`` structure is very similar to this PEP. Cached globals+builtins lookup ------------------------------ In 2006, Andrea Griffini proposed a patch implementing a `Cached globals+builtins lookup optimization `_. The patch adds a private ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``size_t``. Thread on python-dev: `About dictionary lookup caching `_. 
Guard against changing dict during iteration -------------------------------------------- In 2013, Serhiy Storchaka proposed `Guard against changing dict during iteration (issue #19332) `_ which adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``size_t``. This field is incremented when the dictionary is modified, and so is very similar to the proposed dictionary version. Sadly, the dictionary version proposed in this PEP doesn't help to detect dictionary mutation. The dictionary version changes when values are replaced, whereas modifying dictionary values while iterating on dictionary keys is legit in Python. PySizer ------- `PySizer `_: a memory profiler for Python, Google Summer of Code 2005 project by Nick Smallbone. This project has a patch for CPython 2.4 which adds ``key_time`` and ``value_time`` fields to dictionary entries. It uses a global process-wide counter for dictionaries, incremented each time that a dictionary is modified. The times are used to decide when child objects first appeared in their parent objects. Discussion ========== Thread on the mailing lists: * python-dev: `PEP 509: Add a private version to dict `_ (january 2016) * python-ideas: `RFC: PEP: Add dict.__version__ `_ (january 2016) Copyright ========= This document has been placed in the public domain. 
From ethan at stoneleaf.us Thu Apr 14 11:25:04 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 08:25:04 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> Message-ID: <570FB650.203@stoneleaf.us> On 04/14/2016 07:52 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: >> The use case for returning bytes from __fspath__ is DirEntry, so you >> can write things like this in low level code: >> >> def myscandir(dirpath): >> for entry in os.scandir(dirpath): >> if entry.is_file(): >> with open(entry) as f: >> # do something > > Excuse me, but that is *not* a use case for returning bytes from > DirEntry.__fspath__. open() is perfectly happy taking str (including > surrogate-encoded rawbytes). Substitute open() with sending those bytes somewhere else: why should I have to reencode this str back to bytes, when bytes are what I asked for in the first place? > If the trivial thing is for __fspath__ > to return bytes, then implicitly applying os.fsencode to the value > being returned is almost as trivial, and just as safe. A low price to > pay for ensuring that text applications don't crash just because a > bytes-oriented object decides to implement __fspath__. How did this application get a bytes path object to begin with? Either it explicitly used bytes when calling scandir and friends (in which case it shouldn't be surprised to be working with bytes); or it got that bytes object from a database, over-the-wire, an-other-language-lib, etc. 
Those are the boundaries where bytes should be transformed to str if the app doesn't want to deal with bytes (whether for path manipulation or other text manipulation). os.fspath() is not a boundary function and shouldn't be used as if it were. > If there's any cost to defining __fspath__ as str-only, it's some > other use case. What consumer of __fspath__ that expects bytes but > not str do you envision? Is it generalizable, so that applying > fsencode to the value of __fspath__ would lead to "unacceptably" > widespread sprinkling of fsencode all over bytes-oriented code? If I'm working with bytes, why would I want to work with str? Python is a glue language, and Python practitioners don't always have the luxury of working only with text. -- ~Ethan~ From ethan at stoneleaf.us Thu Apr 14 11:29:02 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 08:29:02 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com> References: <570C1E13.4090909@stoneleaf.us> <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com> <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com> Message-ID: <570FB73E.5000408@stoneleaf.us> On 04/14/2016 07:01 AM, Random832 wrote: > On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote: >> Adding integers and floats is considered "safe" because most people's >> use of floats completely compasses their use of ints. (You'll get >> OverflowError if it can't be represented.) But float and Decimal are >> considered "unsafe": >> >>--> 1.5 + decimal.Decimal("1.5") >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: unsupported operand type(s) for +: 'float' and >> 'decimal.Decimal' >> >> This is more what's happening here. Floats and Decimals can represent >> similar sorts of things, but with enough incompatibilities that you >> can't simply merge them. 
> > And what such incompatibilities exist between bytes and str for the > purpose of representing file paths? At the end of the day, there's > exactly one answer to "what file on disk this represents (or would > represent if it existed)". Interoperability with other systems and/or libraries. If we use surrogateescape to transform str to bytes, and the other side does not, we no longer have a workable path. -- ~Ethan~ From guido at python.org Thu Apr 14 11:27:52 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 14 Apr 2016 08:27:52 -0700 Subject: [Python-Dev] MAKE_FUNCTION simplification In-Reply-To: References: Message-ID: Great analysis! What might stand in the way of adoption is concern for bytecode manipulation libraries that would have to be changed. What might encourage adoption would be a benchmark showing this saves a lot of time. Personally I'm expecting it won't make much of a difference for real programs since almost always the cost of creating the function is dwarfed by the (total) cost of running it. But Python does create a lot of functions, and there's also lambdas. There's also talk of switching to wordcode, in a different thread. Maybe the idea would be easier to introduce there? (Bytecode libraries would have to change anyways, so the additional concern for this change would be minimal.) On Thu, Apr 14, 2016 at 2:04 AM, Nikita Nemkin wrote: > MAKE_FUNCTION opcode is complex due to the way it receives > input arguments: > > 1) default args, individually; > 2) default kwonly args, individual name-value pairs; > 3) a tuple of parameter names (single constant); > 4) annotation values, individually; > 5) code object; > 6) qualname. > > The counts for 1,2,4 are packed into oparg bitfields, making oparg large. > > My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION, > i.e. explicitly emit BUILD_TUPLE for defaults args and BUILD_MAPs > for keyword defaults and annotations. 
>
> Then, MAKE_FUNCTION will become a dramatically simpler
> 5 argument opcode, taking
>
>     1) default args tuple (optional);
>     2) default keyword only args dict (optional);
>     3) annotations dict (optional);
>     4) code object;
>     5) qualname.
>
> These arguments correspond exactly to __annotations__, __kwdefaults__,
> __defaults__, __code__ and __qualname__ attributes.
>
> For optional args, oparg bits should indicate individual arg presence.
> (This also saves None checks in opcode implementation.)
>
> If we add another optional argument (and oparg bit) for __closure__
> attribute, then separate MAKE_CLOSURE opcode becomes unnecessary.
>
> Default args tuple is likely to be a constant and can be packaged whole,
> compensating for the extra size of explicit BUILD_* instructions.
>
> Compare the current implementation:
>
>     https://github.com/python/cpython/blob/master/Python/ceval.c#L3262
>
> with this provisional implementation (untested):
>
>     TARGET(MAKE_FUNCTION) {
>         PyObject *qualname = POP();
>         PyObject *codeobj = POP();
>         PyFunctionObject *func;
>         func = (PyFunctionObject *)PyFunction_NewWithQualName(
>                    codeobj, f->f_globals, qualname);
>         Py_DECREF(codeobj);
>         Py_DECREF(qualname);
>         if (func == NULL)
>             goto error;
>
>         /* NB: Py_None is not an acceptable value for these. */
>         if (oparg & 0x08)
>             func->func_closure = POP();
>         if (oparg & 0x04)
>             func->func_annotations = POP();
>         if (oparg & 0x02)
>             func->func_kwdefaults = POP();
>         if (oparg & 0x01)
>             func->func_defaults = POP();
>
>         PUSH((PyObject *)func);
>         DISPATCH();
>     }
>
> compile.c also gets a bit simpler, but not much.
>
> What do you think?
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From victor.stinner at gmail.com Thu Apr 14 11:32:14 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 17:32:14 +0200 Subject: [Python-Dev] MAKE_FUNCTION simplification In-Reply-To: References: Message-ID: 2016-04-14 11:04 GMT+02:00 Nikita Nemkin : > MAKE_FUNCTION opcode is complex due to the way it receives > input arguments: (...) Yeah, I was always disturbed by how this opcode gets parameters. > My suggestion is to pre-package 1-4 before calling MAKE_FUNCTION, > i.e. explicitly emit BUILD_TUPLE for defaults args and BUILD_MAPs > for keyword defaults and annotations. I read the code. In fact, I don't understand why it wasn't done like that from the beginning :-p > Then, MAKE_FUNCTION will become a dramatically simpler > 5 argument opcode, taking Would you like to work on a patch to implement that change? Since Python 3.6 may get a new bytecode format (wordcode, see the other thread on this mailing list), I think that it's ok to change MAKE_FUNCTION in the same release. Victor From victor.stinner at gmail.com Thu Apr 14 11:36:10 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 17:36:10 +0200 Subject: [Python-Dev] MAKE_FUNCTION simplification In-Reply-To: References: Message-ID: 2016-04-14 17:27 GMT+02:00 Guido van Rossum : > Great analysis! What might stand in the way of adoption is concern for > bytecode manipulation libraries that would have to be changed. > (...) > There's also talk of switching to wordcode, in a different thread. I agree that breaking backward compatibility just for MAKE_FUNCTION is not worth it. But if we accept the wordcode change, IMHO it's ok to take this as an opportunity to also modify MAKE_FUNCTION.
> Maybe the idea would be easier to introduce there? (Bytecode libraries > would have to change anyways, so the additional concern for this > change would be minimal.) Exactly ;-) Victor From brett at python.org Thu Apr 14 11:40:55 2016 From: brett at python.org (Brett Cannon) Date: Thu, 14 Apr 2016 15:40:55 +0000 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: On Thu, 14 Apr 2016 at 03:26 Victor Stinner wrote: > > On 14 Apr 2016 11:16 AM, "Serhiy Storchaka" wrote: > > A desirable but nonexistent feature is to write emails to authors of > commits that broke buildbots. How hard would it be to implement this? > > Yeah I have also had this idea for many years but buildbots were quite > unstable. Maybe we should be more strict to consider a buildbot as stable? > Depending on how fancy we get with our infrastructure after we move to GitHub, we could theoretically end up with a PR-merging bot that can detect which commit broke things and report on the PR that did it (as well as report anywhere else we wanted to). > I propose to experiment sending notifications of failure to the authors of > changes *and* to a new mailing list. I would subscribe to such list. An > even safer starting point would be to only start with the mailing list. > > FYI I'm connected to the #python-dev IRC channel which already contain > these notifications. But I agree that mails are better. > Yeah, I'm one of those that doesn't sit on #python-dev due to the lack of a persistently connected machine, so an email would work better (unless we want to be trendy and write a bot for Slack/Skype/FB Messenger :). > > What do you think about backporting recent regrtest to 2.7? Most needed > features to me are the -m and -G options. > > Regrtest changed a lot in python 3.6 (new test.libregrtest library). > I suggest starting from python 3.5. > > For -m: if it doesn't need to modify the unittest module, I agree.
> > I don't know -G option. > > > Would be nice to add a feature for running every test in separate > subprocess. This will isolate the effect of failed tests. > > See my email :-) I proposed to modify -j1 to run tests in subprocesses. I > even mentioned my issue. > > I suggest to use -jN on all buildbots, at least -j1. > > Maybe -j2 is even better since many tests are waiting on IO or simple > sleep. > Both ideas seem reasonable. From cybersol at yahoo.com Thu Apr 14 11:59:56 2016 From: cybersol at yahoo.com (Michael Mysinger) Date: Thu, 14 Apr 2016 15:59:56 +0000 (UTC) Subject: [Python-Dev] pathlib - current status of discussions References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> Message-ID: Ethan Furman writes: > On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote: > > In particular, one RichPath > > class might return bytes and another str, or even worse the same class might > > sometimes return bytes and sometimes str. When will os.path.join blow up due > > to mixing bytes and str and when will it work in those situations? > > What are you asking here? ... Meaning allowing os.fspath() > and __fspath__ to return either bytes or str will never cause the > combination of bytes and str to work. Said another way: if you are > using os.path.join then all the pieces have to be str or all the pieces > have to be bytes. I am saying that if os.path.join now accepts RichPath objects, and those objects can return either str or bytes, then it's much harder to reason about when I have all bytes or all strings. In essence, you will force me to pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I have to always do that wrapping then os.path.join doesn't need to accept RichPath objects and call fspath at all.
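The wrapping Michael describes can be sketched as follows. This is an illustration added to the archive, not code from the thread: it assumes a Python version where ``os.fspath()`` exists (3.6 or later), and ``BytesPath`` and ``as_str`` are hypothetical names standing in for a "RichPath" class and the normalizing helper.

```python
import os


class BytesPath:
    """Hypothetical rich path object whose __fspath__ returns bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


def as_str(path):
    """Return `path` as str, whatever type its __fspath__ produced."""
    return os.fsdecode(os.fspath(path))


rich = BytesPath(b"/tmp/data")

# Mixing the bytes-backed object with a str component is rejected.
try:
    os.path.join(rich, "report.txt")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# After normalizing, every component is str and the join succeeds.
joined = os.path.join(as_str(rich), "report.txt")

print(mixed_ok, joined)
```

The point of contention in the thread is visible here: the caller has to normalize explicitly before joining, which is the extra step Michael would like to avoid.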
From victor.stinner at gmail.com Thu Apr 14 12:04:36 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 18:04:36 +0200 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570FB73E.5000408@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com> <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com> <570FB73E.5000408@stoneleaf.us> Message-ID: 2016-04-14 17:29 GMT+02:00 Ethan Furman : > Interoperability with other systems and/or libraries. If we use > surrogateescape to transform str to bytes, and the other side does not, we > no longer have a workable path. I guess that you mean a Python library? When you exchange with external programs or call a C libraries, Python is responsible to encode Unicode to bytes with os.fsencode(). The external part is not aware that Python uses surrogateescape, it gets "regular" bytes. I suggest to consider such Python library as external programs and libraries: convert Unicode to bytes with os.fsencode(), but also process paths as Unicode "inside" your application. It's the basic rule to handle correctly Unicode in an application: decode inputs as soon as possible, and encode back as late as possible. Encode/decode at borders. Victor From stephen at xemacs.org Thu Apr 14 12:05:57 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 15 Apr 2016 01:05:57 +0900 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com> References: <570C1E13.4090909@stoneleaf.us> <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com> <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com> Message-ID: <22287.49125.859016.872121@turnbull.sk.tsukuba.ac.jp> Random832 writes: > And what such incompatibilities exist between bytes and str for the > purpose of representing file paths? A plethora of encodings. 
> At the end of the day, there's exactly one answer to "what file on > disk this represents (or would represent if it existed)". Nope. Suppose those bytes were read from a file or a socket? It's dangerous to assume that encoding matches the file system's. From victor.stinner at gmail.com Thu Apr 14 12:09:14 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 18:09:14 +0200 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570FAF2F.6080304@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> <570FAF2F.6080304@stoneleaf.us> Message-ID: 2016-04-14 16:54 GMT+02:00 Ethan Furman : >> I consider that the final goal of the whole discussion is to support >> something like: >> >> path = os.path.join(pathlib_path, "str_path", direntry) >> >> (...) >> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str, >> just to make my life easier. > > This would be where we strongly disagree. FYI it's ok that we disagree on this point, at least I expressed my opinion ;-) At least, we have now better identified a point of disagreement. Victor From donald at stufft.io Thu Apr 14 12:13:02 2016 From: donald at stufft.io (Donald Stufft) Date: Thu, 14 Apr 2016 12:13:02 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> Message-ID: <1B962989-D6E6-4557-BDAD-3087F1E733E6@stufft.io> > On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev wrote: > > In essence, you will force me to pre-wrap > all RichPath objects in either os.fsencode(os.fspath(path)) or > os.fsdecode(os.fspath(path)), just so I can reason about the type. This is only the case if you have a singular RichPath object that can represent both bytes and str (which is what DirEntry does, which I agree makes it harder, but that's already the case with DirEntry.path). However that's not the case if you have a bRichPath and uRichPath.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From nikita at nemkin.ru Thu Apr 14 12:03:01 2016 From: nikita at nemkin.ru (Nikita Nemkin) Date: Thu, 14 Apr 2016 21:03:01 +0500 Subject: [Python-Dev] MAKE_FUNCTION simplification Message-ID: On Thu, Apr 14, 2016 at 8:27 PM, Guido van Rossum wrote: > Great analysis! What might stand in the way of adoption is concern for > bytecode manipulation libraries that would have to be changed. What > might encourage adoption would be a benchmark showing this saves a lot > of time. > > Personally I'm expecting it won't make much of a difference for real > programs since almost always the cost of creating the function is > dwarfed by the (total) cost of running it. But Python does create a > lot of functions, and there's also lambdas. This change alone is very unlikely to have a measurable performance impact. The intention is to clean up ceval.c/compile.c a bit, nothing more. If many other opcodes were somehow slimmed down in the similar fashion, then we might (or might not) see perf gains. For example, most slot dispatch opcodes can be compressed into a single opcode+slot index with inlined dispatch logic, instead of each one individually calling C API functions... > There's also talk of switching to wordcode, in a different thread. > Maybe the idea would be easier to introduce there? (Bytecode libraries > would have to change anyways, so the additional concern for this > change would be minimal.) Wordcode can benefit from this change, because it guarantees single-byte MAKE_FUNCTION oparg. I think that Python should make bytecode explicitly unstable and subject to change with any major release. 
The potential for a faster Python interpreter (or simple JIT) is huge; requiring bytecode compatibility will slow down any progress in this area. From nikita at nemkin.ru Thu Apr 14 12:14:43 2016 From: nikita at nemkin.ru (Nikita Nemkin) Date: Thu, 14 Apr 2016 21:14:43 +0500 Subject: [Python-Dev] MAKE_FUNCTION simplification Message-ID: On Thu, Apr 14, 2016 at 8:32 PM, Victor Stinner wrote: > > Would you like to work on a patch to implement that change? I'll work on a patch. Should I post it to bugs.python.org? > Since Python 3.6 may get a new bytecode format (wordcode, see the > other thread on this mailing list), I think that it's ok to change > MAKE_FUNCTION in the same release. Wordcode looks like a pure win from the (projected) 25% bytecode size reduction alone. From random832 at fastmail.com Thu Apr 14 12:18:18 2016 From: random832 at fastmail.com (Random832) Date: Thu, 14 Apr 2016 12:18:18 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <22287.49125.859016.872121@turnbull.sk.tsukuba.ac.jp> References: <570C1E13.4090909@stoneleaf.us> <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com> <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com> <22287.49125.859016.872121@turnbull.sk.tsukuba.ac.jp> Message-ID: <1460650698.48886.578872145.4926E180@webmail.messagingengine.com> On Thu, Apr 14, 2016, at 12:05, Stephen J. Turnbull wrote: > Random832 writes: > > > And what such incompatibilities exist between bytes and str for the > > purpose of representing file paths? > > A plethora of encodings. Only one encoding, fsencode/fsdecode. All other encodings are not for filenames. > > At the end of the day, there's exactly one answer to "what file on > > disk this represents (or would represent if it existed)". > > Nope. Suppose those bytes were read from a file or a socket? It's > dangerous to assume that encoding matches the file system's.
Why can I pass them to os.open, then, or to os.path.join so long as everything else is also bytes? On UNIX, the filesystem is in bytes, so saying that bytes can't match the filesystem is absurd. Converting it to str with fsdecode will *always, absolutely, 100% of the time* give a str that will address the same file that the bytes does (even if it's "dangerous" to assume that was the name the user wanted, that's beyond the scope of what the module is capable of dealing with). From brett at python.org Thu Apr 14 12:28:36 2016 From: brett at python.org (Brett Cannon) Date: Thu, 14 Apr 2016 16:28:36 +0000 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID: +1 from me! A couple of grammar/typo suggestions below. On Thu, 14 Apr 2016 at 08:20 Victor Stinner wrote: > Hi, > > I updated my PEP 509 to make the dictionary version globally unique. > With *two* use cases of this PEP (Yury's method call patch and my FAT > Python project), I think that the PEP is now ready to be accepted. > > Globally unique identifier is a requirement for Yury's patch > optimizing method calls ( https://bugs.python.org/issue26110 ). It > allows to check for free if the dictionary was replaced. > > I also renamed the ma_version field to ma_version_tag. > > HTML version: > https://www.python.org/dev/peps/pep-0509/ > > Victor > > > PEP: 509 > Title: Add a private version to dict > Version: $Revision$ > Last-Modified: $Date$ > Author: Victor Stinner > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 4-January-2016 > Python-Version: 3.6 > > > Abstract > ======== > > Add a new private version to the builtin ``dict`` type, incremented at > each dictionary creation and at each dictionary change, to implement > fast guards on namespaces. > > > Rationale > ========= > > In Python, the builtin ``dict`` type is used by many instructions. 
For > example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the > global namespace, or in the builtins namespace (two dict lookups). > Python uses ``dict`` for the builtins namespace, globals namespace, type > namespaces, instance namespaces, etc. The local namespace (namespace of > a function) is usually optimized to an array, but it can be a dict too. > > Python is hard to optimize because almost everything is mutable: builtin > functions, function code, global variables, local variables, ... can be > modified at runtime. Implementing optimizations respecting the Python > semantics requires to detect when "something changes": we will call > these checks "guards". > > The speedup of optimizations depends on the speed of guard checks. This > PEP proposes to add a version to dictionaries to implement fast guards > on namespaces. > > Dictionary lookups can be skipped if the version does not change which > is the common case for most namespaces. Since the version is globally > unique, the version is also enough to check if the namespace dictionary > was not replaced with a new dictionary. The performance of a guard does > not depend on the number of watched dictionary entries, complexity of > O(1), if the dictionary version does not change. > > Example of optimization: copy the value of a global variable to function > constants. This optimization requires a guard on the global variable to > check if it was modified. If the variable is modified, the variable must > be loaded at runtime when the function is called, instead of using the > constant. > > See the `PEP 510 -- Specialized functions with guards > `_ for the concrete usage of > guards to specialize functions and for the rationale on Python static > optimizers. 
> > > Guard example > ============= > > Pseudo-code of an fast guard to check if a dictionary entry was modified > (created, updated or deleted) using an hypothetical > ``dict_get_version(dict)`` function:: > > UNSET = object() > > class GuardDictKey: > def __init__(self, dict, key): > self.dict = dict > self.key = key > self.value = dict.get(key, UNSET) > self.version = dict_get_version(dict) > > def check(self): > """Return True if the dictionary entry did not changed > and the dictionary was not replaced.""" > "did not change" > > # read the version of the dict structure > version = dict_get_version(self.dict) > if version == self.version: > # Fast-path: dictionary lookup avoided > return True > > # lookup in the dictionary > value = self.dict.get(self.key, UNSET) > if value is self.value: > # another key was modified: > # cache the new dictionary version > self.version = version > return True > > # the key was modified > return False > > > Usage of the dict version > ========================= > > Speedup method calls 1.2x > ------------------------- > > Yury Selivanov wrote a `patch to optimize method calls > `_. The patch depends on the > `implement per-opcode cache in ceval > `_ patch which requires dictionary > versions to invalidate the cache if the globals dictionary or the > builtins dictionary has been modified. > > The cache also requires that the dictionary version is globally unique. > It is possible to define a function in a namespace and call it > in a different namespace: using ``exec()`` with the *globals* parameter > for example. In this case, the globals dictionary was changed and the > cache must be invalidated. > > > Specialized functions using guards > ---------------------------------- > > The `PEP 510 -- Specialized functions with guards > `_ proposes an API to support > specialized functions with guards. It allows to implement static > optimizers for Python without breaking the Python semantics. 
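For anyone who wants to run the guard pseudo-code quoted above, here is a minimal sketch. It is only an approximation: ``dict_get_version()`` is hypothetical in the PEP, so the sketch substitutes an assumed ``Namespace`` dict subclass that bumps a ``version`` attribute from a global counter on item assignment and deletion. Neither ``Namespace`` nor ``_next_version`` is part of CPython or of the PEP.

```python
import itertools

# Hypothetical global counter standing in for the PEP's global dict version.
_next_version = itertools.count(1)
UNSET = object()


class Namespace(dict):
    """Assumed stand-in for a dict that carries a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_next_version)  # fresh tag at creation

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_next_version)  # any assignment bumps the tag

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_next_version)


class GuardDictKey:
    """Guard on a single key, following the quoted pseudo-code."""

    def __init__(self, ns, key):
        self.ns = ns
        self.key = key
        self.value = ns.get(key, UNSET)
        self.version = ns.version

    def check(self):
        version = self.ns.version
        if version == self.version:
            # Fast path: nothing changed, no lookup needed.
            return True
        if self.ns.get(self.key, UNSET) is self.value:
            # Another key changed: remember the new version.
            self.version = version
            return True
        # The watched key itself was modified.
        return False


ns = Namespace(len=len)
guard = GuardDictKey(ns, "len")
ok_initial = guard.check()   # nothing changed yet
ns["other"] = 1
ok_other = guard.check()     # unrelated key changed, guard still holds
ns["len"] = lambda obj: 0
ok_after = guard.check()     # watched key replaced, guard fails
```

The slow path keeps a strong reference to the watched value, which is the trade-off the "Add a version to each dict entry" alternative in the PEP tries to avoid.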
> > Example of a static Python optimizer: the `fatoptimizer > `_ of the `FAT Python > `_ project > implements many optimizations which require guards on namespaces. > Examples: > > * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on > ``builtins.__dict__['len']`` and ``globals()['len']`` are required > * Loop unrolling: to unroll the loop ``for i in range(...): ...``, > guards on ``builtins.__dict__['range']`` and ``globals()['range']`` > are required > > > Pyjion > ------ > > According of Brett Cannon, one of the two main developers of Pyjion, > Pyjion can also benefit from dictionary version to implement > optimizations. > > Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET > Core runtime). > > > Unladen Swallow > --------------- > > Even if dictionary version was not explicitly mentioned, optimizing > globals and builtins lookup was part of the Unladen Swallow plan: > "Implement one of the several proposed schemes for speeding lookups of > globals and builtins." Source: `Unladen Swallow ProjectPlan > `_. > > Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler > implemented with LLVM. The project stopped in 2011: `Unladen Swallow > Retrospective > >`_. > > > Changes > ======= > > Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with > the C type ``PY_INT64_T``, 64-bit unsigned integer. Don't you mean ``PY_UINT64_T``? > Add also a global > dictionary version. Each time a dictionary is created, the global > version is incremented and the dictionary version is initialized to the > global version. 
The global version is also incremented and copied to the > dictionary version at each dictionary change: > > * ``clear()`` if the dict was non-empty > * ``pop(key)`` if the key exists > * ``popitem()`` if the dict is non-empty > * ``setdefault(key, value)`` if the `key` does not exist > * ``__detitem__(key)`` if the key exists > * ``__setitem__(key, value)`` if the `key` doesn't exist or if the value > is not ``dict[key]`` > * ``update(...)`` if new values are different than existing values: > values are compared by identity, not by their content; the version can > be incremented multiple times > > The ``PyDictObject`` structure is not part of the stable ABI. > > The field is called ``ma_version_tag`` rather than ``ma_version`` to > suggest to compare it using ``version_tag == old_version_tag`` rather > than ``version <= old_version`` which makes the integer overflow much > likely. > > Example using an hypothetical ``dict_get_version(dict)`` function:: > > >>> d = {} > >>> dict_get_version(d) > 100 > >>> d['key'] = 'value' > >>> dict_get_version(d) > 101 > >>> d['key'] = 'new value' > >>> dict_get_version(d) > 102 > >>> del d['key'] > >>> dict_get_version(d) > 103 > > The version is not incremented if an existing key is set to the same > value. For efficiency, values are compared by their identity: > ``new_value is old_value``, not by their content: > ``new_value == old_value``. Example:: > > >>> d = {} > >>> value = object() > >>> d['key'] = value > >>> dict_get_version(d) > 40 > >>> d['key'] = value > >>> dict_get_version(d) > 40 > > .. note:: > CPython uses some singleton like integers in the range [-5; 257], > empty tuple, empty strings, Unicode strings of a single character in > the range [U+0000; U+00FF], etc. When a key is set twice to the same > singleton, the version is not modified. 
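The rules quoted above can be modelled in pure Python. This is a rough sketch under stated assumptions, not the C implementation: ``VersionedDict``, ``_global_version`` and this ``dict_get_version()`` are illustrative names, and only ``__setitem__`` and ``__delitem__`` are modelled (``update()``, ``pop()``, ``setdefault()`` and ``clear()`` are left out).

```python
import itertools

# Hypothetical global version counter shared by all dictionaries.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Toy model of a dict carrying a globally unique version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_global_version)  # new tag at creation

    def _bump(self):
        self.version_tag = next(_global_version)

    def __setitem__(self, key, value):
        # Bump only if the key is new or the value is a different object
        # (identity comparison, not equality).
        if key not in self or self[key] is not value:
            self._bump()
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)  # raises KeyError before any bump
        self._bump()


def dict_get_version(d):
    """Illustrative accessor; no such function exists in CPython."""
    return d.version_tag


d = VersionedDict()
v0 = dict_get_version(d)
d["key"] = "value"
v1 = dict_get_version(d)      # new key: version changed
value = d["key"]
d["key"] = value
v2 = dict_get_version(d)      # same object stored again: unchanged
del d["key"]
v3 = dict_get_version(d)      # deletion: version changed
other = VersionedDict()
v_other = dict_get_version(other)  # a new dict never shares a tag
```

Because every tag comes from the single global counter, comparing tags with ``==`` also detects a namespace dictionary being swapped for a different one.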
> > > Implementation and Performance > ============================== > > The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject > `_ contains a patch implementing > this PEP. > > On pybench and timeit microbenchmarks, the patch does not seem to add > any overhead on dictionary operations. > > When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for > a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, > a guard can watch for multiple keys. For example, for an optimization > using 10 global variables in a function, 10 dictionary lookups costs 148 > ns, whereas the guard still only costs 3.8 ns when the version does not > change (39x as fast). > > The `fat module > `_ implements > such guards: ``fat.GuardDict`` is based on the dictionary version. > > > Integer overflow > ================ > > The implementation uses the C type ``PY_UINT64_T`` to store the version: > a 64 bits unsigned integer. The C code uses ``version++``. On integer > overflow, the version is wrapped to ``0`` (and then continue to be > incremented) according to the C standard. > > After an integer overflow, a guard can succeed whereas the watched > dictionary key was modified. The bug only occurs at a guard check if > there are exaclty ``2 ** 64`` dictionary creations or modifications > since the previous guard check. > > If a dictionary is modified every nanosecond, ``2 ** 64`` modifications > takes longer than 584 years. Using a 32-bit version, it only takes 4 > seconds. That's why a 64-bit unsigned type is also used on 32-bit > systems. A dictionary lookup at the C level takes 14.8 ns. > > A risk of a bug every 584 years is acceptable. 
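The figures in the overflow argument can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the one-modification-per-nanosecond rate is the PEP's own assumption, and the helper name ``seconds_to_wrap`` is made up for illustration.

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_to_wrap(bits, ns_per_change=1):
    """Seconds until a counter of the given width wraps around."""
    return (2 ** bits) * ns_per_change / NS_PER_SECOND


years_64 = seconds_to_wrap(64) / SECONDS_PER_YEAR  # a bit over 584 years
seconds_32 = seconds_to_wrap(32)                   # a bit over 4 seconds

# Unsigned 64-bit wraparound: incrementing the maximum value gives 0.
MASK_64 = 2**64 - 1
wrapped = (MASK_64 + 1) & MASK_64
```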
> > > Alternatives > ============ > > Expose the version at Python level as a read-only __version__ property > ---------------------------------------------------------------------- > > The first version of the PEP proposed to expose the dictionary version > as a read-only ``__version__`` property at Python level, and also to add > the property to ``collections.UserDict`` (since this type must mimick > the ``dict`` API). > > There are multiple issues: > > * To be consistent and avoid bad surprises, the version must be added to > all mapping types. Implementing a new mapping type would require extra > work for no benefit, since the version is only required on the > ``dict`` type in practice. > * All Python implementations must implement this new property, it gives > more work to other implementations, whereas they may not use the > dictionary version at all. > * Exposing the dictionary version at Python level can lead the > false assumption on performances. Checking ``dict.__version__`` at > the Python level is not faster than a dictionary lookup. A dictionary > lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5 > ns, the difference is only 1.2 ns (3%):: > > > $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] > == 33' > 10000000 loops, best of 3: 0.0487 usec per loop > $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' > 'd.__version__ == 100' > 10000000 loops, best of 3: 0.0475 usec per loop > > * The ``__version__`` can be wrapped on integer overflow. It is error > prone: using ``dict.__version__ <= guard_version`` is wrong, > ``dict.__version__ == guard_version`` must be used instead to reduce > the risk of bug on integer overflow (even if the integer overflow is > unlikely in practice). > > Mandatory bikeshedding on the property name: > > * ``__cache_token__``: name proposed by Nick Coghlan, name coming from > `abc.get_cache_token() > `_. 
> * ``__version__`` > * ``__timestamp__`` > > > Add a version to each dict entry > -------------------------------- > > A single version per dictionary requires to keep a strong reference to > the value which can keep the value alive longer than expected. If we add > also a version per dictionary entry, the guard can only store the entry > version to avoid the strong reference to the value (only strong > references to the dictionary and to the key are needed). > > Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure, > the field has the C type ``PY_INT64_T``. When a key is created or > modified, the entry version is set to the dictionary version which is > incremented at any change (create, modify, delete). > > Pseudo-code of an fast guard to check if a dictionary key was modified > using hypothetical ``dict_get_version(dict)`` > ``dict_get_entry_version(dict)`` functions:: > > UNSET = object() > > class GuardDictKey: > def __init__(self, dict, key): > self.dict = dict > self.key = key > self.dict_version = dict_get_version(dict) > self.entry_version = dict_get_entry_version(dict, key) > > def check(self): > """Return True if the dictionary entry did not changed > and the dictionary was not replaced.""" > > # read the version of the dict structure > dict_version = dict_get_version(self.dict) > if dict_version == self.version: > # Fast-path: dictionary lookup avoided > return True > > # lookup in the dictionary > entry_version = get_dict_key_version(dict, key) > if entry_version == self.entry_version: > # another key was modified: > # cache the new dictionary version > self.dict_version = dict_version > return True > > # the key was modified > return False > > The main drawback of this option is the impact on the memory footprint. > It increases the size of each dictionary entry, so the overhead depends > on the number of buckets (dictionary entries, used or unused yet). 
For > example, it increases the size of each dictionary entry by 8 bytes on > 64-bit system. > > In Python, the memory footprint matters and the trend is to reduce it. > Examples: > > * `PEP 393 -- Flexible String Representation > `_ > * `PEP 412 -- Key-Sharing Dictionary > `_ > > > Add a new dict subtype > ---------------------- > > Add a new ``verdict`` type, subtype of ``dict``. When guards are needed, > use the ``verdict`` for namespaces (module namespace, type namespace, > instance namespace, etc.) instead of ``dict``. > > Leave the ``dict`` type unchanged to not add any overhead (memory > footprint) when guards are not needed. > > Technical issue: a lot of C code in the wild, including CPython core, > expecting the exact ``dict`` type. Issues: > > * ``exec()`` requires a ``dict`` for globals and locals. A lot of code > use ``globals={}``. It is not possible to cast the ``dict`` to a > ``dict`` subtype because the caller expects the ``globals`` parameter > to be modified (``dict`` is mutable). > * Functions call directly ``PyDict_xxx()`` functions, instead of calling > ``PyObject_xxx()`` if the object is a ``dict`` subtype > * ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some > functions require the exact ``dict`` type. > * ``Python/ceval.c`` does not completely supports dict subtypes for > namespaces > > > The ``exec()`` issue is a blocker issue. > > Other issues: > > * The garbage collector has a special code to "untrack" ``dict`` > instances. If a ``dict`` subtype is used for namespaces, the garbage > collector can be unable to break some reference cycles. > * Some functions have a fast-path for ``dict`` which would not be taken > for ``dict`` subtypes, and so it would make Python a little bit > slower. > > > Prior Art > ========= > > Method cache and type version tag > --------------------------------- > > In 2007, Armin Rigo wrote a patch to to implement a cache of methods. It > was merged into Python 2.6. 
The patch adds a "type attribute cache > version tag" (``tp_version_tag``) and a "valid version tag" flag to > types (the ``PyTypeObject`` structure). > > The type version tag is not available at the Python level. > > The version tag has the C type ``unsigned int``. The cache is a global > hash table of 4096 entries, shared by all types. The cache is global to > "make it fast, have a deterministic and low memory footprint, and be > easy to invalidate". Each cache entry has a version tag. A global > version tag is used to create the next version tag, it also has the C > type ``unsigned int``. > > By default, a type has its "valid version tag" flag cleared to indicate > that the version tag is invalid. When the first method of the type is > cached, the version tag and the "valid version tag" flag are set. When a > type is modified, the "valid version tag" flag of the type and its > subclasses is cleared. Later, when a cache entry of these types is used, > the entry is removed because its version tag is outdated. > > On integer overflow, the whole cache is cleared and the global version > tag is reset to ``0``. > > See `Method cache (issue #1685986) > `_ and `Armin's method cache > optimization updated for Python 2.6 (issue #1700288) > `_. > > > Globals / builtins cache > ------------------------ > > In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue > #10401) `_ which adds a private > ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type), > the field has the C type ``Py_ssize_t``. > > The patch adds a "global and builtin cache" to functions and frames, and > changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the > cache. > > The change on the ``PyDictObject`` structure is very similar to this > PEP. > > > Cached globals+builtins lookup > ------------------------------ > > In 2006, Andrea Griffini proposed a patch implementing a `Cached > globals+builtins lookup optimization > `_. 
The patch adds a private > ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type), > the field has the C type ``size_t``. > > Thread on python-dev: `About dictionary lookup caching > >`_. > > > Guard against changing dict during iteration > -------------------------------------------- > > In 2013, Serhiy Storchaka proposed `Guard against changing dict during > iteration (issue #19332) `_ which > adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` > type), the field has the C type ``size_t``. This field is incremented > when the dictionary is modified, and so is very similar to the proposed > dictionary version. > > Sadly, the dictionary version proposed in this PEP doesn't help to > detect dictionary mutation. The dictionary version changes when values > are replaced, whereas modifying dictionary values while iterating on > dictionary keys is legit in Python. > > > PySizer > ------- > > `PySizer `_: a memory profiler for Python, > Google Summer of Code 2005 project by Nick Smallbone. > > This project has a patch for CPython 2.4 which adds ``key_time`` and > ``value_time`` fields to dictionary entries. It uses a global > process-wide counter for dictionaries, incremented each time that a > dictionary is modified. The times are used to decide when child objects > first appeared in their parent objects. > > > Discussion > ========== > > Thread on the mailing lists: > > * python-dev: `PEP 509: Add a private version to dict > >`_ > (january 2016) > * python-ideas: `RFC: PEP: Add dict.__version__ > >`_ > (january 2016) > > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Thu Apr 14 12:30:37 2016 From: brett at python.org (Brett Cannon) Date: Thu, 14 Apr 2016 16:30:37 +0000 Subject: [Python-Dev] MAKE_FUNCTION simplification In-Reply-To: References: Message-ID: On Thu, 14 Apr 2016 at 09:16 Nikita Nemkin wrote: > On Thu, Apr 14, 2016 at 8:32 PM, Victor Stinner > wrote: > > > > Would you like to work on a patch to implement that change? > > I'll work on a patch. Should I post it to bugs.python.org? > Yep. > > > Since Python 3.6 may get a new bytecode format (wordcode, see the > > other thread on this mailing list), I think that it's ok to change > > MAKE_FUNCTION in the same release. > > Wordcode looks like pure win from (projected) 25% bytecode size > reduction alone. > CPU performance is more the worry here (which looks mostly unaffected, maybe even faster), but reduced .pyc files is a nice perk. :) From cybersol at yahoo.com Thu Apr 14 12:30:51 2016 From: cybersol at yahoo.com (Michael Mysinger) Date: Thu, 14 Apr 2016 16:30:51 +0000 (UTC) Subject: [Python-Dev] pathlib - current status of discussions References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> <1B962989-D6E6-4557-BDAD-3087F1E733E6@stufft.io> Message-ID: Donald Stufft <donald at stufft.io> writes: > > On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev <python-dev at python.org> wrote: > > > > In essence, you will force me to pre- > > wrap all RichPath objects in either os.fsencode(os.fspath(path)) or > > os.fsdecode(os.fspath(path)), just so I can reason about the type. > > This is only the case if you have a singular RichPath object that can represent both bytes and str (which is > what DirEntry does, which I agree makes it harder... but that's already the case with DirEntry.path). > However that's not the case if you have a bRichPath and uRichPath.
And you might even be able to retain your sanity if you enforce any particular class to be either bRichPath or uRichPath. But if you do that, then that still leaves DirEntry out in the cold, likely converting to str in its __fspath__. Which leaves me in the camp that bRichPath falls under YAGNI, and RichPath should be str only. From ethan at stoneleaf.us Thu Apr 14 12:39:06 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 09:39:06 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <570FAF2F.6080304@stoneleaf.us> Message-ID: <570FC7AA.7050805@stoneleaf.us> On 04/14/2016 09:09 AM, Victor Stinner wrote: > 2016-04-14 16:54 GMT+02:00 Ethan Furman: >>> I consider that the final goal of the whole discussion is to support >>> something like: >>> >>> path = os.path.join(pathlib_path, "str_path", direntry) >>> >>> (...) >>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str, >>> just to make my life easier. >> >> This would be where we strongly disagree. > > FYI it's ok that we disagree on this point, at least I expressed my opinion ;-) Absolutely. I appreciate you explaining your point of view. > At least, we now identified better a point of disagreement. Agreed. :) ~Ethan~ From victor.stinner at gmail.com Thu Apr 14 12:44:05 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 18:44:05 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID: 2016-04-14 18:28 GMT+02:00 Brett Cannon : > +1 from me! Thanks. > A couple of grammar/typo suggestions below. Fixed. (Yes, I want to use unsigned type, so PY_UINT64_T.) 
Victor From ethan at stoneleaf.us Thu Apr 14 12:46:13 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 09:46:13 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> Message-ID: <570FC955.3080908@stoneleaf.us> On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote: > I am saying that if os.path.join now accepts RichPath objects, and those > objects can return either str or bytes, then it's much harder to reason about > when I have all bytes or all strings. In essence, you will force me to pre- > wrap all RichPath objects in either os.fsencode(os.fspath(path)) or > os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I > have to always do that wrapping then os.path.join doesn't need to accept > RichPath objects and call fspath at all. What many folks seem to be missing is that *you* (generic you) have control of your data. If you are not working at the bytes layer, you shouldn't be getting bytes objects because: - you specified str when asking for data from the OS, or - you transformed the incoming bytes from whatever external source to str when you received them. -- ~Ethan~ From stefan_ml at behnel.de Thu Apr 14 12:48:45 2016 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 14 Apr 2016 18:48:45 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID: +1 from me, too. I'm sure we can make some use of this in Cython.
Stefan Victor Stinner schrieb am 14.04.2016 um 17:19: > PEP: 509 > Title: Add a private version to dict From tjreedy at udel.edu Thu Apr 14 12:56:54 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 14 Apr 2016 12:56:54 -0400 Subject: [Python-Dev] MAKE_FUNCTION simplification In-Reply-To: References: Message-ID: On 4/14/2016 12:03 PM, Nikita Nemkin wrote: > I think that Python should make bytecode explicitly unstable and subject > to change with any major release. https://docs.python.org/3/library/dis.html#module-dis CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. Version = minor release, as opposed to maintenance release. -- Terry Jan Reedy From guido at python.org Thu Apr 14 12:59:50 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 14 Apr 2016 09:59:50 -0700 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID: I'll wait a day before formally pronouncing to see if any objections are made, but it looks good to me. On Thu, Apr 14, 2016 at 8:19 AM, Victor Stinner wrote: > Hi, > > I updated my PEP 509 to make the dictionary version globally unique. > With *two* use cases of this PEP (Yury's method call patch and my FAT > Python project), I think that the PEP is now ready to be accepted. > > Globally unique identifier is a requirement for Yury's patch > optimizing method calls ( https://bugs.python.org/issue26110 ). It > allows to check for free if the dictionary was replaced. > > I also renamed the ma_version field to ma_version_tag. 
> > HTML version: > https://www.python.org/dev/peps/pep-0509/ > > Victor > > > PEP: 509 > Title: Add a private version to dict > Version: $Revision$ > Last-Modified: $Date$ > Author: Victor Stinner > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 4-January-2016 > Python-Version: 3.6 > > > Abstract > ======== > > Add a new private version to the builtin ``dict`` type, incremented at > each dictionary creation and at each dictionary change, to implement > fast guards on namespaces. > > > Rationale > ========= > > In Python, the builtin ``dict`` type is used by many instructions. For > example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the > global namespace, or in the builtins namespace (two dict lookups). > Python uses ``dict`` for the builtins namespace, globals namespace, type > namespaces, instance namespaces, etc. The local namespace (namespace of > a function) is usually optimized to an array, but it can be a dict too. > > Python is hard to optimize because almost everything is mutable: builtin > functions, function code, global variables, local variables, ... can be > modified at runtime. Implementing optimizations respecting the Python > semantics requires to detect when "something changes": we will call > these checks "guards". > > The speedup of optimizations depends on the speed of guard checks. This > PEP proposes to add a version to dictionaries to implement fast guards > on namespaces. > > Dictionary lookups can be skipped if the version does not change which > is the common case for most namespaces. Since the version is globally > unique, the version is also enough to check if the namespace dictionary > was not replaced with a new dictionary. The performance of a guard does > not depend on the number of watched dictionary entries, complexity of > O(1), if the dictionary version does not change. > > Example of optimization: copy the value of a global variable to function > constants. 
This optimization requires a guard on the global variable to > check if it was modified. If the variable is modified, the variable must > be loaded at runtime when the function is called, instead of using the > constant. > > See the `PEP 510 -- Specialized functions with guards > `_ for the concrete usage of > guards to specialize functions and for the rationale on Python static > optimizers. > > > Guard example > ============= > > Pseudo-code of an fast guard to check if a dictionary entry was modified > (created, updated or deleted) using an hypothetical > ``dict_get_version(dict)`` function:: > > UNSET = object() > > class GuardDictKey: > def __init__(self, dict, key): > self.dict = dict > self.key = key > self.value = dict.get(key, UNSET) > self.version = dict_get_version(dict) > > def check(self): > """Return True if the dictionary entry did not changed > and the dictionary was not replaced.""" > > # read the version of the dict structure > version = dict_get_version(self.dict) > if version == self.version: > # Fast-path: dictionary lookup avoided > return True > > # lookup in the dictionary > value = self.dict.get(self.key, UNSET) > if value is self.value: > # another key was modified: > # cache the new dictionary version > self.version = version > return True > > # the key was modified > return False > > > Usage of the dict version > ========================= > > Speedup method calls 1.2x > ------------------------- > > Yury Selivanov wrote a `patch to optimize method calls > `_. The patch depends on the > `implement per-opcode cache in ceval > `_ patch which requires dictionary > versions to invalidate the cache if the globals dictionary or the > builtins dictionary has been modified. > > The cache also requires that the dictionary version is globally unique. > It is possible to define a function in a namespace and call it > in a different namespace: using ``exec()`` with the *globals* parameter > for example. 
In this case, the globals dictionary was changed and the > cache must be invalidated. > > > Specialized functions using guards > ---------------------------------- > > The `PEP 510 -- Specialized functions with guards > `_ proposes an API to support > specialized functions with guards. It allows implementing static > optimizers for Python without breaking the Python semantics. > > Example of a static Python optimizer: the `fatoptimizer > `_ of the `FAT Python > `_ project > implements many optimizations which require guards on namespaces. > Examples: > > * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on > ``builtins.__dict__['len']`` and ``globals()['len']`` are required > * Loop unrolling: to unroll the loop ``for i in range(...): ...``, > guards on ``builtins.__dict__['range']`` and ``globals()['range']`` > are required > > > Pyjion > ------ > > According to Brett Cannon, one of the two main developers of Pyjion, > Pyjion can also benefit from a dictionary version to implement > optimizations. > > Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET > Core runtime). > > > Unladen Swallow > --------------- > > Even if a dictionary version was not explicitly mentioned, optimizing > globals and builtins lookup was part of the Unladen Swallow plan: > "Implement one of the several proposed schemes for speeding lookups of > globals and builtins." Source: `Unladen Swallow ProjectPlan > `_. > > Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler > implemented with LLVM. The project stopped in 2011: `Unladen Swallow > Retrospective > `_. > > > Changes > ======= > > Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with > the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a global > dictionary version. Each time a dictionary is created, the global > version is incremented and the dictionary version is initialized to the > global version.
The global version is also incremented and copied to the > dictionary version at each dictionary change: > > * ``clear()`` if the dict is non-empty > * ``pop(key)`` if the key exists > * ``popitem()`` if the dict is non-empty > * ``setdefault(key, value)`` if the `key` does not exist > * ``__delitem__(key)`` if the key exists > * ``__setitem__(key, value)`` if the `key` doesn't exist or if the value > is not ``dict[key]`` > * ``update(...)`` if new values are different from existing values: > values are compared by identity, not by their content; the version can > be incremented multiple times > > The ``PyDictObject`` structure is not part of the stable ABI. > > The field is called ``ma_version_tag`` rather than ``ma_version`` to > suggest comparing it using ``version_tag == old_version_tag`` rather > than ``version <= old_version``, which makes a bug on integer overflow much > more likely. > > Example using a hypothetical ``dict_get_version(dict)`` function:: > > >>> d = {} > >>> dict_get_version(d) > 100 > >>> d['key'] = 'value' > >>> dict_get_version(d) > 101 > >>> d['key'] = 'new value' > >>> dict_get_version(d) > 102 > >>> del d['key'] > >>> dict_get_version(d) > 103 > > The version is not incremented if an existing key is set to the same > value. For efficiency, values are compared by their identity: > ``new_value is old_value``, not by their content: > ``new_value == old_value``. Example:: > > >>> d = {} > >>> value = object() > >>> d['key'] = value > >>> dict_get_version(d) > 40 > >>> d['key'] = value > >>> dict_get_version(d) > 40 > > .. note:: > CPython uses some singletons, such as integers in the range [-5; 256], > empty tuple, empty strings, Unicode strings of a single character in > the range [U+0000; U+00FF], etc. When a key is set twice to the same > singleton, the version is not modified.
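The rules in the "Changes" section can be modelled in pure Python. The following is only a sketch under stated assumptions: ``VersionedDict``, ``version_tag`` and the module-level counter are hypothetical names invented for illustration, only ``__setitem__`` and ``__delitem__`` are modelled, and the actual proposal stores the tag in the C-level ``PyDictObject`` structure rather than in a subclass.

```python
import itertools

# Hypothetical pure-Python model of the proposed semantics.
# The real proposal keeps the tag in the C PyDictObject structure.
_global_version = itertools.count(1)   # process-wide counter (assumption)
_MISSING = object()


class VersionedDict(dict):
    """Models the version rules for creation, item assignment and deletion."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Each new dictionary takes a fresh, globally unique tag.
        self.version_tag = next(_global_version)

    def __setitem__(self, key, value):
        old = self.get(key, _MISSING)
        super().__setitem__(key, value)
        # Values are compared by identity, not by content.
        if old is not value:
            self.version_tag = next(_global_version)

    def __delitem__(self, key):
        # KeyError is raised here, before the tag changes, if key is missing.
        super().__delitem__(key)
        self.version_tag = next(_global_version)


def dict_get_version(d):
    """Stand-in for the hypothetical accessor used in the PEP text."""
    return d.version_tag
```

Because the counter is shared by all instances, two different dictionaries never carry the same tag in this model, which is what lets a guard detect that a namespace was replaced rather than merely modified.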
> > > Implementation and Performance > ============================== > > The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject > `_ contains a patch implementing > this PEP. > > On pybench and timeit microbenchmarks, the patch does not seem to add > any overhead on dictionary operations. > > When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for > a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, > a guard can watch for multiple keys. For example, for an optimization > using 10 global variables in a function, 10 dictionary lookups cost 148 > ns, whereas the guard still only costs 3.8 ns when the version does not > change (39x as fast). > > The `fat module > `_ implements > such guards: ``fat.GuardDict`` is based on the dictionary version. > > > Integer overflow > ================ > > The implementation uses the C type ``PY_UINT64_T`` to store the version: > a 64-bit unsigned integer. The C code uses ``version++``. On integer > overflow, the version is wrapped to ``0`` (and then continues to be > incremented) according to the C standard. > > After an integer overflow, a guard can succeed even though the watched > dictionary key was modified. The bug only occurs at a guard check if > there are exactly ``2 ** 64`` dictionary creations or modifications > since the previous guard check. > > If a dictionary is modified every nanosecond, ``2 ** 64`` modifications > take longer than 584 years. With a 32-bit version, an overflow takes only > about 4 seconds. That's why a 64-bit unsigned type is also used on 32-bit > systems. A dictionary lookup at the C level takes 14.8 ns. > > A risk of a bug every 584 years is acceptable.
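The figures quoted in the two sections above can be checked with a few lines of arithmetic. This is a small sketch, assuming one dictionary change per nanosecond as in the text and a year of 365.25 days (an assumption of this example, not stated in the PEP); the helper name ``seconds_to_wrap`` is made up for illustration.

```python
NS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length


def seconds_to_wrap(bits, ns_per_change=1):
    """Seconds until an unsigned counter of the given width wraps around."""
    return (2 ** bits) * ns_per_change / NS_PER_SECOND


# 64-bit counter: a bit more than 584 years at one change per nanosecond.
years_64 = seconds_to_wrap(64) / SECONDS_PER_YEAR

# 32-bit counter: a little over 4 seconds at the same rate.
seconds_32 = seconds_to_wrap(32)

# Guard versus lookups: 10 lookups at 14.8 ns against one 3.8 ns check.
speedup = (10 * 14.8) / 3.8
```

The same helper makes it easy to try other assumptions, for example a slower modification rate, without changing the conclusion that a 32-bit counter wraps far too quickly.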
> > > Alternatives > ============ > > Expose the version at Python level as a read-only __version__ property > ---------------------------------------------------------------------- > > The first version of the PEP proposed to expose the dictionary version > as a read-only ``__version__`` property at Python level, and also to add > the property to ``collections.UserDict`` (since this type must mimic > the ``dict`` API). > > There are multiple issues: > > * To be consistent and avoid bad surprises, the version must be added to > all mapping types. Implementing a new mapping type would require extra > work for no benefit, since the version is only required on the > ``dict`` type in practice. > * All Python implementations must implement this new property, which gives > more work to other implementations even though they may not use the > dictionary version at all. > * Exposing the dictionary version at Python level can lead to > false assumptions about performance. Checking ``dict.__version__`` at > the Python level is not faster than a dictionary lookup. A dictionary > lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5 > ns: the difference is only 1.2 ns (3%):: > > > $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33' > 10000000 loops, best of 3: 0.0487 usec per loop > $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' > 'd.__version__ == 100' > 10000000 loops, best of 3: 0.0475 usec per loop > > * The ``__version__`` can be wrapped on integer overflow. It is error > prone: using ``dict.__version__ <= guard_version`` is wrong, > ``dict.__version__ == guard_version`` must be used instead to reduce > the risk of a bug on integer overflow (even if the integer overflow is > unlikely in practice). > > Mandatory bikeshedding on the property name: > > * ``__cache_token__``: name proposed by Nick Coghlan, name coming from > `abc.get_cache_token() > `_.
> * ``__version__`` > * ``__timestamp__`` > > > Add a version to each dict entry > -------------------------------- > > A single version per dictionary requires keeping a strong reference to > the value, which can keep the value alive longer than expected. If we also > add a version per dictionary entry, the guard can store only the entry > version to avoid the strong reference to the value (only strong > references to the dictionary and to the key are needed). > > Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure, > the field has the C type ``PY_INT64_T``. When a key is created or > modified, the entry version is set to the dictionary version which is > incremented at any change (create, modify, delete). > > Pseudo-code of a fast guard to check if a dictionary key was modified > using hypothetical ``dict_get_version(dict)`` and > ``dict_get_entry_version(dict, key)`` functions:: > > UNSET = object() > > class GuardDictKey: > def __init__(self, dict, key): > self.dict = dict > self.key = key > self.dict_version = dict_get_version(dict) > self.entry_version = dict_get_entry_version(dict, key) > > def check(self): > """Return True if the dictionary entry did not change > and the dictionary was not replaced.""" > > # read the version of the dict structure > dict_version = dict_get_version(self.dict) > if dict_version == self.dict_version: > # Fast-path: dictionary lookup avoided > return True > > # lookup in the dictionary > entry_version = dict_get_entry_version(self.dict, self.key) > if entry_version == self.entry_version: > # another key was modified: > # cache the new dictionary version > self.dict_version = dict_version > return True > > # the key was modified > return False > > The main drawback of this option is the impact on the memory footprint. > It increases the size of each dictionary entry, so the overhead depends > on the number of buckets (dictionary entries, used or unused yet).
For > example, it increases the size of each dictionary entry by 8 bytes on > a 64-bit system. > > In Python, the memory footprint matters and the trend is to reduce it. > Examples: > > * `PEP 393 -- Flexible String Representation > `_ > * `PEP 412 -- Key-Sharing Dictionary > `_ > > > Add a new dict subtype > ---------------------- > > Add a new ``verdict`` type, subtype of ``dict``. When guards are needed, > use the ``verdict`` for namespaces (module namespace, type namespace, > instance namespace, etc.) instead of ``dict``. > > Leave the ``dict`` type unchanged to not add any overhead (memory > footprint) when guards are not needed. > > Technical issue: a lot of C code in the wild, including CPython core, > expects the exact ``dict`` type. Issues: > > * ``exec()`` requires a ``dict`` for globals and locals. A lot of code > uses ``globals={}``. It is not possible to cast the ``dict`` to a > ``dict`` subtype because the caller expects the ``globals`` parameter > to be modified (``dict`` is mutable). > * Functions directly call ``PyDict_xxx()`` functions, instead of calling > ``PyObject_xxx()`` if the object is a ``dict`` subtype > * The ``PyDict_CheckExact()`` check fails on ``dict`` subtypes, whereas some > functions require the exact ``dict`` type. > * ``Python/ceval.c`` does not completely support dict subtypes for > namespaces > > > The ``exec()`` issue is a blocker. > > Other issues: > > * The garbage collector has special code to "untrack" ``dict`` > instances. If a ``dict`` subtype is used for namespaces, the garbage > collector can be unable to break some reference cycles. > * Some functions have a fast-path for ``dict`` which would not be taken > for ``dict`` subtypes, and so it would make Python a little bit > slower. > > > Prior Art > ========= > > Method cache and type version tag > --------------------------------- > > In 2007, Armin Rigo wrote a patch to implement a cache of methods. It > was merged into Python 2.6.
The patch adds a "type attribute cache > version tag" (``tp_version_tag``) and a "valid version tag" flag to > types (the ``PyTypeObject`` structure). > > The type version tag is not available at the Python level. > > The version tag has the C type ``unsigned int``. The cache is a global > hash table of 4096 entries, shared by all types. The cache is global to > "make it fast, have a deterministic and low memory footprint, and be > easy to invalidate". Each cache entry has a version tag. A global > version tag is used to create the next version tag, it also has the C > type ``unsigned int``. > > By default, a type has its "valid version tag" flag cleared to indicate > that the version tag is invalid. When the first method of the type is > cached, the version tag and the "valid version tag" flag are set. When a > type is modified, the "valid version tag" flag of the type and its > subclasses is cleared. Later, when a cache entry of these types is used, > the entry is removed because its version tag is outdated. > > On integer overflow, the whole cache is cleared and the global version > tag is reset to ``0``. > > See `Method cache (issue #1685986) > `_ and `Armin's method cache > optimization updated for Python 2.6 (issue #1700288) > `_. > > > Globals / builtins cache > ------------------------ > > In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue > #10401) `_ which adds a private > ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type), > the field has the C type ``Py_ssize_t``. > > The patch adds a "global and builtin cache" to functions and frames, and > changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the > cache. > > The change on the ``PyDictObject`` structure is very similar to this > PEP. > > > Cached globals+builtins lookup > ------------------------------ > > In 2006, Andrea Griffini proposed a patch implementing a `Cached > globals+builtins lookup optimization > `_. 
The patch adds a private > ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type), > the field has the C type ``size_t``. > > Thread on python-dev: `About dictionary lookup caching > `_. > > > Guard against changing dict during iteration > -------------------------------------------- > > In 2013, Serhiy Storchaka proposed `Guard against changing dict during > iteration (issue #19332) `_ which > adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` > type), the field has the C type ``size_t``. This field is incremented > when the dictionary is modified, and so is very similar to the proposed > dictionary version. > > Sadly, the dictionary version proposed in this PEP doesn't help to > detect dictionary mutation. The dictionary version changes when values > are replaced, whereas modifying dictionary values while iterating on > dictionary keys is legit in Python. > > > PySizer > ------- > > `PySizer `_: a memory profiler for Python, > Google Summer of Code 2005 project by Nick Smallbone. > > This project has a patch for CPython 2.4 which adds ``key_time`` and > ``value_time`` fields to dictionary entries. It uses a global > process-wide counter for dictionaries, incremented each time that a > dictionary is modified. The times are used to decide when child objects > first appeared in their parent objects. > > > Discussion > ========== > > Thread on the mailing lists: > > * python-dev: `PEP 509: Add a private version to dict > `_ > (january 2016) > * python-ideas: `RFC: PEP: Add dict.__version__ > `_ > (january 2016) > > > Copyright > ========= > > This document has been placed in the public domain. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From random832 at fastmail.com Thu Apr 14 13:02:10 2016 From: random832 at fastmail.com (Random832) Date: Thu, 14 Apr 2016 13:02:10 -0400 Subject: [Python-Dev] MAKE_FUNCTION simplification In-Reply-To: References: Message-ID: <1460653330.59950.578918777.61ACC6D9@webmail.messagingengine.com> On Thu, Apr 14, 2016, at 12:56, Terry Reedy wrote: > https://docs.python.org/3/library/dis.html#module-dis > CPython implementation detail: Bytecode is an implementation detail of > the CPython interpreter. No guarantees are made that bytecode will not > be added, removed, or changed between versions of Python. > > Version = minor release, as opposed to maintenance release. "between versions" is ambiguous. It could mean that there's no guarantee that there will be no changes from one version to the next, or it could mean, even more strongly, that there's no guarantee that there will be no changes in a maintenance release (which are, after all, released *between* minor releases) From p.f.moore at gmail.com Thu Apr 14 13:22:55 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 14 Apr 2016 18:22:55 +0100 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570FC955.3080908@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> <570FC955.3080908@stoneleaf.us> Message-ID: On 14 April 2016 at 17:46, Ethan Furman wrote: > On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote: > >> I am saying that if os.path.join now accepts RichPath objects, and those >> objects can return either str or bytes, then its much harder to reason >> about >> when I have all bytes or all strings. 
In essence, you will force me to >> pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or >> os.fsdecode(os.fspath(path)), just so I can reason about the type. And if I >> have to always do that wrapping then os.path.join doesn't need to accept >> RichPath objects and call fspath at all. > > > What many folks seem to be missing is that *you* (generic you) have control > of your data. > > If you are not working at the bytes layer, you shouldn't be getting bytes > objects because: > > - you specified str when asking for data from the OS, or > - you transformed the incoming bytes from whatever external source > to str when you received them. My experience is that (particularly with code that was originally written for Python 2) "you have control of your data" is often an illusion - bytes can appear in code from unexpected sources, and when they do I'd rather see an error if I'm using code where I expect a string. Certainly that's a bug in the code - all I'm saying is that it should fail early rather than late. Having said this, I don't have an actual use case - but equally it seems to me that our problem is that *nobody* does (yet) because uptake of pathlib has been slow, thanks to limited stdlib support. My view remains that we should get the (relatively simple and uncontroversial) str support in place, and defer bytes support until we have experience with that. I'd appreciate it if anyone can clarify why "gracefully extending" the protocol to include bytes support at a later date isn't practical.
Paul From k7hoven at gmail.com Thu Apr 14 13:56:54 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Thu, 14 Apr 2016 20:56:54 +0300 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <570FC955.3080908@stoneleaf.us> References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> <570FC955.3080908@stoneleaf.us> Message-ID: On Thu, Apr 14, 2016 at 7:46 PM, Ethan Furman wrote: > > What many folks seem to be missing is that *you* (generic you) have control > of your data. > > If you are not working at the bytes layer, you shouldn't be getting bytes > objects because: > > - you specified str when asking for data from the OS, or > - you transformed the incoming bytes from whatever external source > to str when you received them. There is an apparent contradiction of the above with some previous posts, including your own. Let me try to fix it: Code that deals with paths can be divided in groups as follows: (1) Code that has access to pathname/filename data and has some level of control over what data type comes in. This code may for instance choose to deal with either bytes or str (2) Code that takes the path or file name that it happens to get and does something with it. This type of code can be divided into subgroups as follows: (2a) Code that accepts only one type of paths (e.g. str, bytes or pathlib) and fails if it gets something else. (2b) Code that wants to support different types of paths such as str, bytes or pathlib objects. This includes os.path.*, os.scandir, and various other standard library code. Presumably there is also third-party code that does the same. These functions may want to preserve the str-ness or bytes-ness of the paths in case they return paths, as the stdlib now does. But new code may even want to return pathlib objects when they get such objects as inputs. This is the duck-typing or polymorphic code we have been talking about. 
Code of this type (2b) may want to avoid implicit conversions because it makes the life of code of the other types more difficult. (feel free to fill in more categories of code) So the code of type (2b) is trying to make all categories happy by returning objects of the same type that it gets as input, while the other categories are probably in the situation where they don't necessarily need to make other categories of code happy. And the question is this: Do we need to make code using both bytes *and* scandir happy? This is largely the same question as whether we have to support bytes in addition to str in the protocol. (We may of course talk about third-party path libraries that have the same problem as scandir's DirEntry. Ethan's library is not exactly in the same category as DirEntry since its path objects *are* instances of bytes or str and therefore do not need this protocol to begin with, except perhaps for conversions from other high-level path types so that different path libraries work together nicely). -Koos From ethan at stoneleaf.us Thu Apr 14 14:12:33 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 11:12:33 -0700 Subject: [Python-Dev] MAKE_FUNCTION simplification In-Reply-To: <1460653330.59950.578918777.61ACC6D9@webmail.messagingengine.com> References: <1460653330.59950.578918777.61ACC6D9@webmail.messagingengine.com> Message-ID: <570FDD91.9030406@stoneleaf.us> On 04/14/2016 10:02 AM, Random832 wrote: > "between versions" is ambiguous. It could mean that there's no guarantee > that there will be no changes from one version to the next, or it could > mean, even more strongly, that there's no guarantee that there will be > no changes in a maintenance release (which are, after all, released > *between* minor releases) I don't see us making a breaking change in a maintenance release except to fix something that was already broken. 
-- ~Ethan~ From ethan at stoneleaf.us Thu Apr 14 14:17:25 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 11:17:25 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> <570FC955.3080908@stoneleaf.us> Message-ID: <570FDEB5.4050507@stoneleaf.us> On 04/14/2016 10:22 AM, Paul Moore wrote: > On 14 April 2016 at 17:46, Ethan Furman wrote: >> If you are not working at the bytes layer, you shouldn't be getting bytes >> objects because: >> >> - you specified str when asking for data from the OS, or >> - you transformed the incoming bytes from whatever external source >> to str when you received them. > > My experience is that (particularly with code that was originally > written for Python 2) "you have control of your data" is often an > illusion - bytes can appear in code from unexpected sources, and when > they do I'd rather see an error if I'm using code where I expect a > string. Certainly that's a bug in the code - all I'm saying is that it > fail early rather than late. If we have one function that uses a flag and you leave the flag alone (it defaults to rejecting bytes) -- voila! An error is raised when bytes show up. > I'd appreciate it if anyone can clarify why "gracefully extending" the > protocol to include bytes support at a later date isn't practical. It's going to be a bunch of work. I don't want to do the work twice. On the other hand, if while doing the work it becomes apparent that supporting bytes and str in the protocol is either infeasible, confusing, or a plain ol' bad idea I have no problem ripping out the bytes support and going to str only. 
-- ~Ethan~ From random832 at fastmail.com Thu Apr 14 14:35:35 2016 From: random832 at fastmail.com (Random832) Date: Thu, 14 Apr 2016 14:35:35 -0400 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> <570FC955.3080908@stoneleaf.us> Message-ID: <1460658935.81222.579003873.3129D94A@webmail.messagingengine.com> On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote: > (1) Code that has access to pathname/filename data and has some level > of control over what data type comes in. This code may for instance > choose to deal with either bytes or str > > (2) Code that takes the path or file name that it happens to get and > does something with it. This type of code can be divided into > subgroups as follows: > > (2a) Code that accepts only one type of paths (e.g. str, bytes or > pathlib) and fails if it gets something else. Ideally, these should go away. > (2b) Code that wants to support different types of paths such as > str, bytes or pathlib objects. This includes os.path.*, os.scandir, > and various other standard library code. Presumably there is also > third-party code that does the same. These functions may want to > preserve the str-ness or bytes-ness of the paths in case they return > paths, as the stdlib now does. But new code may even want to return > pathlib objects when they get such objects as inputs. Hold on. None of the discussion I've seen has included any way to specify how to construct a new object representing a different path other than the ones passed in. Surely you're not suggesting type(a)(b). Also, how does DirEntry fit in with any of this? > This is the > duck-typing or polymorphic code we have been talking about. Code of > this type (2b) may want to avoid implicit conversions because it makes > the life of code of the other types more difficult. 
As long as the type it returns is still a path/bytes/str (and therefore can be accepted when the caller passes it somewhere else) what's the problem? From ethan at stoneleaf.us Thu Apr 14 14:39:43 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 11:39:43 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> Message-ID: <570FE3EF.8050206@stoneleaf.us> On 04/13/2016 02:37 PM, Victor Stinner wrote: > I'm not a big fan of a flag parameter to change the return type of a > function. Usually, two functions are preferred. In the os module we have > getcwd/getcwdb for example. I don't know if it's a good example I think of os.fspath() as more of a filter/reduce operation: - str -> str - str DirEntry -> str - bytes -> bytes - bytes DirEntry -> bytes The purpose of os.fspath() (at least the one I'm arguing for ;) is to distil its inputs to the lowest common denominator, and no lower -- which is either str for string-based path objects, or bytes for bytes-based path objects. 
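The filter/reduce behaviour described above can be sketched as follows. This is only an illustration of the idea being argued for in this thread, not a settled API: ``fspath_sketch`` and ``FakeEntry`` are hypothetical names, and the ``__fspath__`` method is the protocol hook named in the thread subject.

```python
def fspath_sketch(path):
    """Hypothetical stand-in for the os.fspath() discussed in this thread.

    str and bytes pass through unchanged; any other object is reduced
    through its __fspath__() method, whose result keeps its own type.
    """
    if isinstance(path, (str, bytes)):
        return path
    try:
        fspath_method = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or an object with __fspath__, "
                        "not " + type(path).__name__) from None
    result = fspath_method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes, not "
                        + type(result).__name__)
    return result


class FakeEntry:
    """Hypothetical DirEntry-like object wrapping a str or bytes path."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path
```

With this shape, a str-based entry yields str and a bytes-based entry yields bytes, so the caller never receives a type it did not put in.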
-- ~Ethan~ From k7hoven at gmail.com Thu Apr 14 15:17:21 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Thu, 14 Apr 2016 22:17:21 +0300 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <1460658935.81222.579003873.3129D94A@webmail.messagingengine.com> References: <570C1E13.4090909@stoneleaf.us> <570FAD78.60505@stoneleaf.us> <570FC955.3080908@stoneleaf.us> <1460658935.81222.579003873.3129D94A@webmail.messagingengine.com> Message-ID: On Thu, Apr 14, 2016 at 9:35 PM, Random832 wrote: > On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote: >> (1) Code that has access to pathname/filename data and has some level >> of control over what data type comes in. This code may for instance >> choose to deal with either bytes or str >> >> (2) Code that takes the path or file name that it happens to get and >> does something with it. This type of code can be divided into >> subgroups as follows: >> >> (2a) Code that accepts only one type of paths (e.g. str, bytes or >> pathlib) and fails if it gets something else. > > Ideally, these should go away. > I don't think so. (1) might even be the most common type of all code. This is code that gets a path from user input, from a config file, from a database etc. and then does things with it, typically including passing it to type (2) code and potentially getting a path back from there too. >> (2b) Code that wants to support different types of paths such as >> str, bytes or pathlib objects. This includes os.path.*, os.scandir, >> and various other standard library code. Presumably there is also >> third-party code that does the same. These functions may want to >> preserve the str-ness or bytes-ness of the paths in case they return >> paths, as the stdlib now does. But new code may even want to return >> pathlib objects when they get such objects as inputs. > > Hold on. 
None of the discussion I've seen has included any way to > specify how to construct a new object representing a different path > other than the ones passed in. Surely you're not suggesting type(a)(b). > That's right. This protocol is not solving the issue of returning 'rich' path objects. It's solving the issue of passing those objects to lower-level functions or of interacting with other 'rich' path types. What I meant by this is that there may be code that *does* want to do type(a)(b), which is out of our control. Maybe I should not have mentioned that. > Also, how does DirEntry fit in with any of this? > os.scandir + DirEntry are among the many things in the stdlib that give you pathnames of the same type as those that were put in. >> This is the >> duck-typing or polymorphic code we have been talking about. Code of >> this type (2b) may want to avoid implicit conversions because it makes >> the life of code of the other types more difficult. > > As long as the type it returns is still a path/bytes/str (and therefore > can be accepted when the caller passes it somewhere else) what's the > problem? The problem is that not all paths are passed to the function that does the implicit conversion, and then, for instance, os.path.join raises an error when given two paths of different types. In other words: Most non-library code (even library code?) deals with one specific type and does not want implicit conversions to other types. Some code (2b) deals with several types and, at least in the stdlib, such code returns paths of the same type as it is given, which makes said "most non-library code" happy, because it does not force the programmer to think about type conversions. (Then there is also code that explicitly deals with type conversions, such as os.fsencode and os.fsdecode.)
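The type-preserving behaviour of category (2b) code can be seen with the existing stdlib functions named above. A short sketch, using only os.path.join, os.fsencode and os.fsdecode; the path components are arbitrary example values.

```python
import os

# Type-preserving (2b) behaviour: the result has the type of the inputs.
str_joined = os.path.join("dir", "file.txt")       # str in, str out
bytes_joined = os.path.join(b"dir", b"file.txt")   # bytes in, bytes out

# Mixing the two types is rejected rather than implicitly converted.
try:
    os.path.join("dir", b"file.txt")
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True

# Explicit conversion is a separate, deliberate step.
as_bytes = os.fsencode("dir")
as_str = os.fsdecode(b"dir")
```

Code that stays with one type never hits the TypeError, which is the point made above about not forcing the programmer to think about conversions.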
-Koos From victor.stinner at gmail.com Thu Apr 14 15:49:12 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Apr 2016 21:49:12 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID: It would be nice to hear from Barry Warsaw, who was opposed to the PEP in January. He wanted to wait until FAT Python was proven to really be faster, which is still not the case right now. (I mean that I have not run serious benchmarks, but early macro benchmarks are not really promising, only micro benchmarks. I expect better results when the implementation is more complete.) The main change since January is that Yury wrote a patch making method calls faster using the PEP. https://mail.python.org/pipermail/python-dev/2016-January/142772.html Victor On Thursday, 14 April 2016, Guido van Rossum wrote: > I'll wait a day before formally pronouncing to see if any objections > are made, but it looks good to me. > > On Thu, Apr 14, 2016 at 8:19 AM, Victor Stinner > > wrote: > > Hi, > > > > I updated my PEP 509 to make the dictionary version globally unique. > > With *two* use cases of this PEP (Yury's method call patch and my FAT > > Python project), I think that the PEP is now ready to be accepted. > > > > Globally unique identifier is a requirement for Yury's patch > > optimizing method calls ( https://bugs.python.org/issue26110 ). It > > allows to check for free if the dictionary was replaced. > > > > I also renamed the ma_version field to ma_version_tag.
> > > > HTML version: > > https://www.python.org/dev/peps/pep-0509/ > > > > Victor > > > > > > PEP: 509 > > Title: Add a private version to dict > > Version: $Revision$ > > Last-Modified: $Date$ > > Author: Victor Stinner > > > Status: Draft > > Type: Standards Track > > Content-Type: text/x-rst > > Created: 4-January-2016 > > Python-Version: 3.6 > > > > > > Abstract > > ======== > > > > Add a new private version to the builtin ``dict`` type, incremented at > > each dictionary creation and at each dictionary change, to implement > > fast guards on namespaces. > > > > > > Rationale > > ========= > > > > In Python, the builtin ``dict`` type is used by many instructions. For > > example, the ``LOAD_GLOBAL`` instruction searchs for a variable in the > > global namespace, or in the builtins namespace (two dict lookups). > > Python uses ``dict`` for the builtins namespace, globals namespace, type > > namespaces, instance namespaces, etc. The local namespace (namespace of > > a function) is usually optimized to an array, but it can be a dict too. > > > > Python is hard to optimize because almost everything is mutable: builtin > > functions, function code, global variables, local variables, ... can be > > modified at runtime. Implementing optimizations respecting the Python > > semantics requires to detect when "something changes": we will call > > these checks "guards". > > > > The speedup of optimizations depends on the speed of guard checks. This > > PEP proposes to add a version to dictionaries to implement fast guards > > on namespaces. > > > > Dictionary lookups can be skipped if the version does not change which > > is the common case for most namespaces. Since the version is globally > > unique, the version is also enough to check if the namespace dictionary > > was not replaced with a new dictionary. The performance of a guard does > > not depend on the number of watched dictionary entries, complexity of > > O(1), if the dictionary version does not change. 
> >
> > Example of optimization: copy the value of a global variable to function
> > constants. This optimization requires a guard on the global variable to
> > check if it was modified. If the variable is modified, the variable must
> > be loaded at runtime when the function is called, instead of using the
> > constant.
> >
> > See the `PEP 510 -- Specialized functions with guards
> > `_ for the concrete usage of
> > guards to specialize functions and for the rationale on Python static
> > optimizers.
> >
> >
> > Guard example
> > =============
> >
> > Pseudo-code of a fast guard to check if a dictionary entry was modified
> > (created, updated or deleted) using a hypothetical
> > ``dict_get_version(dict)`` function::
> >
> >     UNSET = object()
> >
> >     class GuardDictKey:
> >         def __init__(self, dict, key):
> >             self.dict = dict
> >             self.key = key
> >             self.value = dict.get(key, UNSET)
> >             self.version = dict_get_version(dict)
> >
> >         def check(self):
> >             """Return True if the dictionary entry did not change
> >             and the dictionary was not replaced."""
> >
> >             # read the version of the dict structure
> >             version = dict_get_version(self.dict)
> >             if version == self.version:
> >                 # Fast-path: dictionary lookup avoided
> >                 return True
> >
> >             # lookup in the dictionary
> >             value = self.dict.get(self.key, UNSET)
> >             if value is self.value:
> >                 # another key was modified:
> >                 # cache the new dictionary version
> >                 self.version = version
> >                 return True
> >
> >             # the key was modified
> >             return False
> >
> >
> > Usage of the dict version
> > =========================
> >
> > Speedup method calls 1.2x
> > -------------------------
> >
> > Yury Selivanov wrote a `patch to optimize method calls
> > `_. The patch depends on the
> > `implement per-opcode cache in ceval
> > `_ patch which requires dictionary
> > versions to invalidate the cache if the globals dictionary or the
> > builtins dictionary has been modified.
> > > > The cache also requires that the dictionary version is globally unique. > > It is possible to define a function in a namespace and call it > > in a different namespace: using ``exec()`` with the *globals* parameter > > for example. In this case, the globals dictionary was changed and the > > cache must be invalidated. > > > > > > Specialized functions using guards > > ---------------------------------- > > > > The `PEP 510 -- Specialized functions with guards > > `_ proposes an API to support > > specialized functions with guards. It allows to implement static > > optimizers for Python without breaking the Python semantics. > > > > Example of a static Python optimizer: the `fatoptimizer > > `_ of the `FAT Python > > `_ project > > implements many optimizations which require guards on namespaces. > > Examples: > > > > * Call pure builtins: to replace ``len("abc")`` with ``3``, guards on > > ``builtins.__dict__['len']`` and ``globals()['len']`` are required > > * Loop unrolling: to unroll the loop ``for i in range(...): ...``, > > guards on ``builtins.__dict__['range']`` and ``globals()['range']`` > > are required > > > > > > Pyjion > > ------ > > > > According of Brett Cannon, one of the two main developers of Pyjion, > > Pyjion can also benefit from dictionary version to implement > > optimizations. > > > > Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET > > Core runtime). > > > > > > Unladen Swallow > > --------------- > > > > Even if dictionary version was not explicitly mentioned, optimizing > > globals and builtins lookup was part of the Unladen Swallow plan: > > "Implement one of the several proposed schemes for speeding lookups of > > globals and builtins." Source: `Unladen Swallow ProjectPlan > > `_. > > > > Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler > > implemented with LLVM. The project stopped in 2011: `Unladen Swallow > > Retrospective > > >`_. 
> >
> >
> > Changes
> > =======
> >
> > Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
> > the C type ``PY_UINT64_T``, 64-bit unsigned integer. Also add a global
> > dictionary version. Each time a dictionary is created, the global
> > version is incremented and the dictionary version is initialized to the
> > global version. The global version is also incremented and copied to the
> > dictionary version at each dictionary change:
> >
> > * ``clear()`` if the dict was non-empty
> > * ``pop(key)`` if the key exists
> > * ``popitem()`` if the dict is non-empty
> > * ``setdefault(key, value)`` if the `key` does not exist
> > * ``__delitem__(key)`` if the key exists
> > * ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
> >   is not ``dict[key]``
> > * ``update(...)`` if new values are different than existing values:
> >   values are compared by identity, not by their content; the version can
> >   be incremented multiple times
> >
> > The ``PyDictObject`` structure is not part of the stable ABI.
> >
> > The field is called ``ma_version_tag`` rather than ``ma_version`` to
> > suggest comparing it using ``version_tag == old_version_tag`` rather
> > than ``version <= old_version``, which makes bugs on integer overflow
> > much more likely.
> >
> > Example using a hypothetical ``dict_get_version(dict)`` function::
> >
> >     >>> d = {}
> >     >>> dict_get_version(d)
> >     100
> >     >>> d['key'] = 'value'
> >     >>> dict_get_version(d)
> >     101
> >     >>> d['key'] = 'new value'
> >     >>> dict_get_version(d)
> >     102
> >     >>> del d['key']
> >     >>> dict_get_version(d)
> >     103
> >
> > The version is not incremented if an existing key is set to the same
> > value. For efficiency, values are compared by their identity:
> > ``new_value is old_value``, not by their content:
> > ``new_value == old_value``.
Example:: > > > > >>> d = {} > > >>> value = object() > > >>> d['key'] = value > > >>> dict_get_version(d) > > 40 > > >>> d['key'] = value > > >>> dict_get_version(d) > > 40 > > > > .. note:: > > CPython uses some singleton like integers in the range [-5; 257], > > empty tuple, empty strings, Unicode strings of a single character in > > the range [U+0000; U+00FF], etc. When a key is set twice to the same > > singleton, the version is not modified. > > > > > > Implementation and Performance > > ============================== > > > > The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject > > `_ contains a patch implementing > > this PEP. > > > > On pybench and timeit microbenchmarks, the patch does not seem to add > > any overhead on dictionary operations. > > > > When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for > > a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, > > a guard can watch for multiple keys. For example, for an optimization > > using 10 global variables in a function, 10 dictionary lookups costs 148 > > ns, whereas the guard still only costs 3.8 ns when the version does not > > change (39x as fast). > > > > The `fat module > > `_ implements > > such guards: ``fat.GuardDict`` is based on the dictionary version. > > > > > > Integer overflow > > ================ > > > > The implementation uses the C type ``PY_UINT64_T`` to store the version: > > a 64 bits unsigned integer. The C code uses ``version++``. On integer > > overflow, the version is wrapped to ``0`` (and then continue to be > > incremented) according to the C standard. > > > > After an integer overflow, a guard can succeed whereas the watched > > dictionary key was modified. The bug only occurs at a guard check if > > there are exaclty ``2 ** 64`` dictionary creations or modifications > > since the previous guard check. > > > > If a dictionary is modified every nanosecond, ``2 ** 64`` modifications > > takes longer than 584 years. 
Using a 32-bit version, it only takes 4 > > seconds. That's why a 64-bit unsigned type is also used on 32-bit > > systems. A dictionary lookup at the C level takes 14.8 ns. > > > > A risk of a bug every 584 years is acceptable. > > > > > > Alternatives > > ============ > > > > Expose the version at Python level as a read-only __version__ property > > ---------------------------------------------------------------------- > > > > The first version of the PEP proposed to expose the dictionary version > > as a read-only ``__version__`` property at Python level, and also to add > > the property to ``collections.UserDict`` (since this type must mimick > > the ``dict`` API). > > > > There are multiple issues: > > > > * To be consistent and avoid bad surprises, the version must be added to > > all mapping types. Implementing a new mapping type would require extra > > work for no benefit, since the version is only required on the > > ``dict`` type in practice. > > * All Python implementations must implement this new property, it gives > > more work to other implementations, whereas they may not use the > > dictionary version at all. > > * Exposing the dictionary version at Python level can lead the > > false assumption on performances. Checking ``dict.__version__`` at > > the Python level is not faster than a dictionary lookup. A dictionary > > lookup has a cost of 48.7 ns and checking a guard has a cost of 47.5 > > ns, the difference is only 1.2 ns (3%):: > > > > > > $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' > 'd["33"] == 33' > > 10000000 loops, best of 3: 0.0487 usec per loop > > $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' > > 'd.__version__ == 100' > > 10000000 loops, best of 3: 0.0475 usec per loop > > > > * The ``__version__`` can be wrapped on integer overflow. 
It is error > > prone: using ``dict.__version__ <= guard_version`` is wrong, > > ``dict.__version__ == guard_version`` must be used instead to reduce > > the risk of bug on integer overflow (even if the integer overflow is > > unlikely in practice). > > > > Mandatory bikeshedding on the property name: > > > > * ``__cache_token__``: name proposed by Nick Coghlan, name coming from > > `abc.get_cache_token() > > `_. > > * ``__version__`` > > * ``__timestamp__`` > > > > > > Add a version to each dict entry > > -------------------------------- > > > > A single version per dictionary requires to keep a strong reference to > > the value which can keep the value alive longer than expected. If we add > > also a version per dictionary entry, the guard can only store the entry > > version to avoid the strong reference to the value (only strong > > references to the dictionary and to the key are needed). > > > > Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure, > > the field has the C type ``PY_INT64_T``. When a key is created or > > modified, the entry version is set to the dictionary version which is > > incremented at any change (create, modify, delete). 
> >
> > Pseudo-code of a fast guard to check if a dictionary key was modified
> > using hypothetical ``dict_get_version(dict)`` and
> > ``dict_get_entry_version(dict, key)`` functions::
> >
> >     UNSET = object()
> >
> >     class GuardDictKey:
> >         def __init__(self, dict, key):
> >             self.dict = dict
> >             self.key = key
> >             self.dict_version = dict_get_version(dict)
> >             self.entry_version = dict_get_entry_version(dict, key)
> >
> >         def check(self):
> >             """Return True if the dictionary entry did not change
> >             and the dictionary was not replaced."""
> >
> >             # read the version of the dict structure
> >             dict_version = dict_get_version(self.dict)
> >             if dict_version == self.dict_version:
> >                 # Fast-path: dictionary lookup avoided
> >                 return True
> >
> >             # lookup in the dictionary
> >             entry_version = dict_get_entry_version(self.dict, self.key)
> >             if entry_version == self.entry_version:
> >                 # another key was modified:
> >                 # cache the new dictionary version
> >                 self.dict_version = dict_version
> >                 return True
> >
> >             # the key was modified
> >             return False
> >
> > The main drawback of this option is the impact on the memory footprint.
> > It increases the size of each dictionary entry, so the overhead depends
> > on the number of buckets (dictionary entries, used or not yet used). For
> > example, it increases the size of each dictionary entry by 8 bytes on
> > a 64-bit system.
> >
> > In Python, the memory footprint matters and the trend is to reduce it.
> > Examples:
> >
> > * `PEP 393 -- Flexible String Representation
> >   `_
> > * `PEP 412 -- Key-Sharing Dictionary
> >   `_
> >
> >
> > Add a new dict subtype
> > ----------------------
> >
> > Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
> > use the ``verdict`` for namespaces (module namespace, type namespace,
> > instance namespace, etc.) instead of ``dict``.
> >
> > Leave the ``dict`` type unchanged to not add any overhead (memory
> > footprint) when guards are not needed.
> > > > Technical issue: a lot of C code in the wild, including CPython core, > > expecting the exact ``dict`` type. Issues: > > > > * ``exec()`` requires a ``dict`` for globals and locals. A lot of code > > use ``globals={}``. It is not possible to cast the ``dict`` to a > > ``dict`` subtype because the caller expects the ``globals`` parameter > > to be modified (``dict`` is mutable). > > * Functions call directly ``PyDict_xxx()`` functions, instead of calling > > ``PyObject_xxx()`` if the object is a ``dict`` subtype > > * ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some > > functions require the exact ``dict`` type. > > * ``Python/ceval.c`` does not completely supports dict subtypes for > > namespaces > > > > > > The ``exec()`` issue is a blocker issue. > > > > Other issues: > > > > * The garbage collector has a special code to "untrack" ``dict`` > > instances. If a ``dict`` subtype is used for namespaces, the garbage > > collector can be unable to break some reference cycles. > > * Some functions have a fast-path for ``dict`` which would not be taken > > for ``dict`` subtypes, and so it would make Python a little bit > > slower. > > > > > > Prior Art > > ========= > > > > Method cache and type version tag > > --------------------------------- > > > > In 2007, Armin Rigo wrote a patch to to implement a cache of methods. It > > was merged into Python 2.6. The patch adds a "type attribute cache > > version tag" (``tp_version_tag``) and a "valid version tag" flag to > > types (the ``PyTypeObject`` structure). > > > > The type version tag is not available at the Python level. > > > > The version tag has the C type ``unsigned int``. The cache is a global > > hash table of 4096 entries, shared by all types. The cache is global to > > "make it fast, have a deterministic and low memory footprint, and be > > easy to invalidate". Each cache entry has a version tag. 
A global > > version tag is used to create the next version tag, it also has the C > > type ``unsigned int``. > > > > By default, a type has its "valid version tag" flag cleared to indicate > > that the version tag is invalid. When the first method of the type is > > cached, the version tag and the "valid version tag" flag are set. When a > > type is modified, the "valid version tag" flag of the type and its > > subclasses is cleared. Later, when a cache entry of these types is used, > > the entry is removed because its version tag is outdated. > > > > On integer overflow, the whole cache is cleared and the global version > > tag is reset to ``0``. > > > > See `Method cache (issue #1685986) > > `_ and `Armin's method cache > > optimization updated for Python 2.6 (issue #1700288) > > `_. > > > > > > Globals / builtins cache > > ------------------------ > > > > In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue > > #10401) `_ which adds a private > > ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type), > > the field has the C type ``Py_ssize_t``. > > > > The patch adds a "global and builtin cache" to functions and frames, and > > changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the > > cache. > > > > The change on the ``PyDictObject`` structure is very similar to this > > PEP. > > > > > > Cached globals+builtins lookup > > ------------------------------ > > > > In 2006, Andrea Griffini proposed a patch implementing a `Cached > > globals+builtins lookup optimization > > `_. The patch adds a private > > ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type), > > the field has the C type ``size_t``. > > > > Thread on python-dev: `About dictionary lookup caching > > >`_. 
> > > > > > Guard against changing dict during iteration > > -------------------------------------------- > > > > In 2013, Serhiy Storchaka proposed `Guard against changing dict during > > iteration (issue #19332) `_ which > > adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` > > type), the field has the C type ``size_t``. This field is incremented > > when the dictionary is modified, and so is very similar to the proposed > > dictionary version. > > > > Sadly, the dictionary version proposed in this PEP doesn't help to > > detect dictionary mutation. The dictionary version changes when values > > are replaced, whereas modifying dictionary values while iterating on > > dictionary keys is legit in Python. > > > > > > PySizer > > ------- > > > > `PySizer `_: a memory profiler for Python, > > Google Summer of Code 2005 project by Nick Smallbone. > > > > This project has a patch for CPython 2.4 which adds ``key_time`` and > > ``value_time`` fields to dictionary entries. It uses a global > > process-wide counter for dictionaries, incremented each time that a > > dictionary is modified. The times are used to decide when child objects > > first appeared in their parent objects. > > > > > > Discussion > > ========== > > > > Thread on the mailing lists: > > > > * python-dev: `PEP 509: Add a private version to dict > > >`_ > > (january 2016) > > * python-ideas: `RFC: PEP: Add dict.__version__ > > < > https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_ > > (january 2016) > > > > > > Copyright > > ========= > > > > This document has been placed in the public domain. 
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
> >
>
> --
> --Guido van Rossum (python.org/~guido)
>

From victor.stinner at gmail.com Thu Apr 14 15:56:10 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 21:56:10 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References:
Message-ID:

Which kind of usage do you see in Cython?

Off-topic (PEP 510):

I really want to experiment with automatic generation of Cython code from Python code, using profiling to discover the types of function parameters. Then use PEP 510 to attach the fast Cython code to a Python function, but fall back to bytecode if the types are different. See the example of builtin functions in the PEP:
https://www.python.org/dev/peps/pep-0510/#using-builtin-function

Before having something fully automated, we can use some manual steps, like annotating function types manually, compiling the code manually, etc.

Victor

On Thursday, 14 April 2016, Stefan Behnel wrote:

> +1 from me, too. I'm sure we can make some use of this in Cython.
>
> Stefan
>
>
> Victor Stinner wrote on 14.04.2016 at 17:19:
> > PEP: 509
> > Title: Add a private version to dict
>
From stefan_ml at behnel.de Thu Apr 14 16:34:28 2016
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 14 Apr 2016 22:34:28 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References:
Message-ID:

Victor Stinner wrote on 14.04.2016 at 21:56:
> Which kind of usage do you see in Cython?

Mainly caching, I guess. We could avoid global/module name lookups in some cases, especially inside of loops.

> Off-topic (PEP 510):
>
> I really want to experiment automatic generation of Cython code from the
> Python using profiling to discover function parameters types. Then use the
> PEP 510 to attach the fast Cython code to a Python function, but fallback
> to bytecode if the types are different. See the example of builtin
> functions in the PEP:
> https://www.python.org/dev/peps/pep-0510/#using-builtin-function
>
> Before having something fully automated, we can use some manual steps, like
> annotate manually function types, compile manually the code, etc.

Sounds like Cython's "Fused Types" could help here:

http://docs.cython.org/src/userguide/fusedtypes.html

It's essentially a generic functions implementation and you get a dispatch either at compile time or runtime, depending on where (Python/Cython) and how you call a function.

Stefan

From arigo at tunes.org Thu Apr 14 16:42:21 2016
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 14 Apr 2016 22:42:21 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References:
Message-ID:

Hi Victor,

On 14 April 2016 at 17:19, Victor Stinner wrote:
> Each time a dictionary is created, the global
> version is incremented and the dictionary version is initialized to the
> global version.

A detail, but why not set the version tag of new empty dictionaries to zero, always? Same after a clear(). This would satisfy the condition: equality of the version tag is supposed to mean "the dictionary content is precisely the same".

See you soon,

Armin.
From barry at python.org Thu Apr 14 16:50:51 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 14 Apr 2016 16:50:51 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References:
Message-ID: <20160414165051.341ab547@subdivisions>

On Apr 14, 2016, at 09:49 PM, Victor Stinner wrote:

>It would be nice to hear from Barry Warsaw, who was opposed to the PEP in
>January. He wanted to wait until FAT Python was proven to really be faster,
>which is still not the case right now. (I mean that I didn't run serious
>benchmarks, but early macro benchmarks are not really promising, only micro
>benchmarks. I expect better results when the implementation is more
>complete.)

Although I'm not totally convinced, I won't continue to object. You've provided some performance numbers in the PEP even without FAT, and you aren't exposing the API to Python, so it's not a burden being imposed on other implementations.

Cheers,
-Barry

From vgr255 at live.ca Thu Apr 14 17:00:58 2016
From: vgr255 at live.ca (Émanuel Barry)
Date: Thu, 14 Apr 2016 17:00:58 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References:
Message-ID:

> From Armin Rigo
> Sent: Thursday, April 14, 2016 4:42 PM
> To: Victor Stinner
> Cc: Python Dev
> Subject: Re: [Python-Dev] RFC: PEP 509: Add a private version to dict
>
> Hi Victor,
>
> On 14 April 2016 at 17:19, Victor Stinner wrote:
> > Each time a dictionary is created, the global
> > version is incremented and the dictionary version is initialized to the
> > global version.
>
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always? Same after a clear().
This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".

From Victor's original post:

"Globally unique identifier is a requirement for Yury's patch optimizing method calls ( https://bugs.python.org/issue26110 ). It allows to check for free if the dictionary was replaced."

I think it's a good design idea, and there's no chance that this counter will ever overflow (I think Victor is using a 64-bit unsigned integer). I don't think there's really any drawback to using a global vs per-dict counter (but Victor is better placed to answer that :))

-Emanuel
~Ducks lay where no programmer has ever been~

From victor.stinner at gmail.com Thu Apr 14 17:17:30 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 23:17:30 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References:
Message-ID:

Hi,

2016-04-14 22:42 GMT+02:00 Armin Rigo :
> Hi Victor,
>
> On 14 April 2016 at 17:19, Victor Stinner wrote:
>> Each time a dictionary is created, the global
>> version is incremented and the dictionary version is initialized to the
>> global version.
>
> A detail, but why not set the version tag of new empty dictionaries to
> zero, always? Same after a clear(). This would satisfy the
> condition: equality of the version tag is supposed to mean "the
> dictionary content is precisely the same".

You're right that incrementing the global version is useless for these specific cases, and using the version 0 should work. It only matters that the version (version? version tag?) is different.

I will play with that. If I don't see any issue, I will update the PEP.

It's more an implementation detail, but it may help to mention it in the PEP.
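As a rough illustration of the rules under discussion, the following pure-Python sketch models a process-wide counter that is copied into a dictionary at creation and at each change. It is only a toy model under stated assumptions: the names `VersionedDict`, `version_tag` and `_global_version` are invented here, the real proposal is a private C field that is not exposed to Python, and only `__setitem__`, `__delitem__` and `clear()` are covered:

```python
import itertools

# Hypothetical process-wide counter; the PEP keeps this in C.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Toy model of a globally unique version tag (not the real patch)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # A new dictionary takes a fresh value from the global counter.
        self.version_tag = next(_global_version)

    def _bump(self):
        self.version_tag = next(_global_version)

    def __setitem__(self, key, value):
        missing = object()
        old = self.get(key, missing)
        super().__setitem__(key, value)
        # Values are compared by identity, not by equality.
        if old is not value:
            self._bump()

    def __delitem__(self, key):
        # A missing key raises KeyError before the tag is touched.
        super().__delitem__(key)
        self._bump()

    def clear(self):
        # Clearing an already empty dictionary is not a change.
        if self:
            super().clear()
            self._bump()


a = VersionedDict()
b = VersionedDict()
value = object()
a["key"] = value
del a["key"]
b["key"] = value
del b["key"]
# a and b are both empty again and were mutated the same number of times,
# yet their tags differ because the counter is shared.
tags_differ = a.version_tag != b.version_tag
```

With a per-dict counter the two tags at the end would be equal, which is the situation a guard that must detect a replaced dictionary cannot tolerate.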
Victor

From victor.stinner at gmail.com Thu Apr 14 17:19:24 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Apr 2016 23:19:24 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <20160414165051.341ab547@subdivisions>
References: <20160414165051.341ab547@subdivisions>
Message-ID:

2016-04-14 22:50 GMT+02:00 Barry Warsaw :
> Although I'm not totally convinced, I won't continue to object. You've
> provided some performance numbers in the PEP even without FAT, and you aren't
> exposing the API to Python, so it's not a burden being imposed on other
> implementations.

Cool! Ah right, the PEP evolved since its first version sent to python-ideas. I didn't recall the full context of the discussion. The PEP is now more complete and it has more known (future) use cases ;-) (now maybe also Cython?)

Victor

From barry at python.org Thu Apr 14 17:29:26 2016
From: barry at python.org (Barry Warsaw)
Date: Thu, 14 Apr 2016 17:29:26 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References:
Message-ID: <20160414172926.44085562@subdivisions>

On Apr 14, 2016, at 11:17 PM, Victor Stinner wrote:

>You're right that incrementing the global version is useless for these
>specific cases, and using the version 0 should work. It only matters
>that the version (version? version tag?) is different.
>
>I will play with that. If I don't see any issue, I will update the PEP.
>
>It's more an implementation detail, but it may help to mention it in the PEP.

I can see why you might want a global version number, but not doing so would eliminate an implicit reliance on the GIL, or in a GIL-less implementation a lock around incrementing the global version number.
-Barry

From victor.stinner at gmail.com Thu Apr 14 18:13:21 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 00:13:21 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <20160414172926.44085562@subdivisions>
References: <20160414172926.44085562@subdivisions>
Message-ID:

2016-04-14 23:29 GMT+02:00 Barry Warsaw :
> I can see why you might want a global version number, but not doing so would
> eliminate an implicit reliance on the GIL, or in a GIL-less implementation
> a lock around incrementing the global version number.

It's not like the builtin dict type is going to become GIL-free... So I think that it's ok to use a global version.

Very few know that, but the GIL has some advantages sometimes...

Victor

From brett at python.org Thu Apr 14 18:22:04 2016
From: brett at python.org (Brett Cannon)
Date: Thu, 14 Apr 2016 22:22:04 +0000
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References: <20160414172926.44085562@subdivisions>
Message-ID:

On Thu, 14 Apr 2016 at 15:14 Victor Stinner wrote:

> 2016-04-14 23:29 GMT+02:00 Barry Warsaw :
> > I can see why you might want a global version number, but not doing so
> would
> > eliminate an implicit reliance on the GIL, or in a GIL-less
> implementation
> > a lock around incrementing the global version number.
>
> It's not like the builtin dict type is going to become GIL-free... So
> I think that it's ok to use a global version.
>
> A very few know that, but the GIL has some advantages sometimes...
>

And even if it was GIL-free you do run the risk of two dicts ending up at the same version # by simply mutating the same number of times if the counters were per-dict instead of process-wide.
From victor.stinner at gmail.com Thu Apr 14 18:33:23 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 00:33:23 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References: <20160414172926.44085562@subdivisions>
Message-ID:

2016-04-15 0:22 GMT+02:00 Brett Cannon :
> And even if it was GIL-free you do run the risk of two dicts ending up at
> the same version # by simply mutating the same number of times if the
> counters were per-dict instead of process-wide.

For some optimizations, there is no need to check whether the dictionary was replaced, or you check it directly. So it doesn't matter if two dictionaries have the same version after the same number of operations.

For the use case of Yury's optimization, having a globally unique version tag makes the guard much cheaper, and the guard must check that the dictionary was not replaced.

IMHO it's cheap enough to make the version globally unique.

I don't see any technical drawback of having a globally unique version. It doesn't make the integer overflow much more likely. We are still talking about many years before an overflow occurs.

--

When we are able to get rid of the GIL for the dict type, we will probably be able to get an atomic "global_version++" for a 64-bit integer. Right now, I don't think that an atomic int64++ is available on 32-bit archs.

Victor

From v+python at g.nevcal.com Thu Apr 14 19:56:55 2016
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 14 Apr 2016 16:56:55 -0700
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References: <20160414172926.44085562@subdivisions>
Message-ID: <57102E47.4020005@g.nevcal.com>

On 4/14/2016 3:33 PM, Victor Stinner wrote:
> When we are able to get rid of the GIL for the dict type, we will
> probably be able to get an atomic "global_version++" for a 64-bit
> integer. Right now, I don't think that an atomic int64++ is available
> on 32-bit archs.
By the time we get an atomic increment for 64-bit integer, we'll be wanting it for 128-bit... -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Apr 14 20:06:25 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 14 Apr 2016 20:06:25 -0400 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID: <57103081.5060803@gmail.com> On 2016-04-14 4:42 PM, Armin Rigo wrote: > Hi Victor, > > On 14 April 2016 at 17:19, Victor Stinner wrote: >> Each time a dictionary is created, the global >> version is incremented and the dictionary version is initialized to the >> global version. > A detail, but why not set the version tag of new empty dictionaries to > zero, always? Same after a clear(). This would satisfy the > condition: equality of the version tag is supposed to mean "the > dictionary content is precisely the same". So {}.version_tag == {}.version_tag == 0 {'a':1}.version_tag != {'a':1}.version_tag right? For my patches I need globally unique version tags (making an exception for empty dicts is OK). Yury From python at mrabarnett.plus.com Thu Apr 14 20:11:16 2016 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 15 Apr 2016 01:11:16 +0100 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID: <571031A4.3090308@mrabarnett.plus.com> On 2016-04-14 21:42, Armin Rigo wrote: > Hi Victor, > > On 14 April 2016 at 17:19, Victor Stinner wrote: >> Each time a dictionary is created, the global >> version is incremented and the dictionary version is initialized to the >> global version. > > A detail, but why not set the version tag of new empty dictionaries to > zero, always? Same after a clear(). This would satisfy the > condition: equality of the version tag is supposed to mean "the > dictionary content is precisely the same". 
> If you did that, wouldn't it then be possible to replace an empty dict with another empty dict with you noticing? Would that matter? From stephen at xemacs.org Thu Apr 14 20:20:42 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 15 Apr 2016 09:20:42 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <570FB650.203@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <570FB650.203@stoneleaf.us> Message-ID: <22288.13274.270220.803532@turnbull.sk.tsukuba.ac.jp> Ethan Furman writes: > Substitute open() with sending those bytes somewhere else: Eg, pathlib.Path, which will raise? Surely it should be safe to pass a DirEntry to a pathlib constructor? Note that having Path call fsdecode implicitly is a bad idea, because we don't know the provenance of generic bytes. But by design of __fspath__, its value (if str) is suitable for passing to Path, for further processing. > why should I have to reencode this str back to bytes, when bytes > are what I asked for in the first place? Erm, you didn't *ask* for bytes. You asked for whatever __fspath__ is going to give you. And in many cases, like pathlib, it will be str. I imagine that doesn't bother you; you plan to use antipathy anyway. But if there's uptake on the protocol, I'll bet that str-only implementations are the majority. And your question also cuts the other way. Why should *I* have to decode bytes to str, or suffer unexpected TypeErrors, or deal with the possibility of TypeErrors, just because __fspath__ is polymorphic? We're here to improve pathlib. 
There's been a huge amount of mission creep, with no use cases to provide intuition. You pit your abstract inconvenience against my 20 years of whack-a-mole with UnicodeErrors and TypeErrors in Mailman. I *know* that if you let bytes that represent text loose inside an application, eventually they'll end up in a str context and "blooey!" > How did this application get a bytes path object to begin with? > Either it explicitly used bytes when calling scandir and friends > (in which case it shouldn't be surprised to be working with bytes); > or it got that bytes object from a database, over-the-wire, > an-other-language-lib, etc. No, it got it from an __fspath__-toting object (such as a DirEntry) it received from some library, which constructed it polymorphically from bytes it got from some other place -- and so lost the original encoding. That's the scenario I think is impossible to rule out, and reducing that kind of scenario to the bare minimum is why bytes got demoted from being the default representation of text in Python 3 in the first place. > If I'm working with bytes, why would I want to work with str? First, are you actually *working* on those bytes, or are you just passing them to os functions? If the latter, you shouldn't care. Second, because paths are conceptually text (you may not agree, but Nick inter alia has indicated he does). Working with bytes paths (except literals) is a good way to get in trouble, because there are all kinds of ways they can end up inappropriately encoded. For example, the odds are very high that a bytes path read from a file (including from a zipfile directory) in Japan will be encoded in Shift JIS. On Mac OS X, that will either produce mojibake in the directory (if the access creates the file) or fail to access the intended file, because the filesystem encoding is UTF-8. Third, because you want to be portable to Windows, where you have no choice about whether paths are str or bytes. 
These reasons probably don't apply to you with much strength, but the question is how typical you are, vs. the nearly universal experience of mojibake and the dominant market share of Windows. > Python is a glue language, and Python practitioners don't always > have the luxury of working only with text. For paths? Of course you can work with them as text. ISTM what you really want is the luxury of working only with bytes, because you're in the habit of pretending they are text. I don't object to you having your luxury as long as it doesn't increase risk for my use cases. I think you're asking for trouble, and the practice is definitely nonportable, but consenting adults applies. However, the proposed polymorphism does create ambiguity and risk for my uses. I rarely have the luxury of *not* ensuring paths are text, regardless of the bytes-ness of the underlying application, because I can be pretty darn sure that somebody's going to feed me non- filesystem encodings, and soon. Even when I am working with bytes representing paths in the filesystem encoding, I need to convert to text to read the darn things when debugging! So I don't consent; you'll have to impose it on me. 
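The Shift JIS scenario described above can be sketched in a few lines. This is only an illustration, assuming a UTF-8 filesystem encoding and the "surrogateescape" error handler that os.fsdecode() uses on POSIX; the file name is made up for the example, and explicit codecs are used instead of os.fsdecode() so the result does not depend on the platform.

```python
# Hypothetical file name ("nihongo.txt" written in kanji), spelled with
# escapes so the snippet does not depend on the source file's encoding.
name = "\u65e5\u672c\u8a9e.txt"

# Bytes as they might arrive from a file or zip directory written in Japan.
sjis_bytes = name.encode("shift_jis")

# Decoding those bytes as if they were in a UTF-8 filesystem encoding,
# with the error handler os.fsdecode() uses on POSIX.
as_text = sjis_bytes.decode("utf-8", errors="surrogateescape")

# The resulting str is not the intended name (it carries lone surrogates)...
mismatch = as_text != name

# ...although the original bytes can still be recovered exactly.
round_trip = as_text.encode("utf-8", errors="surrogateescape")

# A strict encode of the polluted str fails, which is where the
# "blooey!" tends to happen far from the original source of the bytes.
try:
    as_text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The round trip is lossless only as long as the same encoding and error handler are used on both sides; nothing in the str itself records that the bytes were originally Shift JIS.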
From ethan at stoneleaf.us Thu Apr 14 21:01:00 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 18:01:00 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22288.13274.270220.803532@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <570FB650.203@stoneleaf.us> <22288.13274.270220.803532@turnbull.sk.tsukuba.ac.jp> Message-ID: <57103D4C.7000001@stoneleaf.us> On 04/14/2016 05:20 PM, Stephen J. Turnbull wrote: > However, the proposed polymorphism does create ambiguity and risk for > my uses. I rarely have the luxury of *not* ensuring paths are text, > regardless of the bytes-ness of the underlying application, because I > can be pretty darn sure that somebody's going to feed me non- > filesystem encodings, and soon. Even when I am working with bytes > representing paths in the filesystem encoding, I need to convert to > text to read the darn things when debugging! So I don't consent; > you'll have to impose it on me. Hmm. Well, the good news is you have convinced me that letting bytes through willy-nilly is akin to loosing the hounds of hell on our code. The bad news is I was never in that camp. ;) The camp I'm in is a function* that, by default, will raise if bytes enter the picture -- but will allow them through if the user specifically says they are okay with getting bytes. Would that work for you? -- ~Ethan~ *Or pair of functions, one that is str-only, one that allows both -- but I'd rather just have one function with a flag.
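A minimal sketch of the "one function with a flag" idea from the message above. The name fspath_sketch, the allow_bytes keyword, and the two example classes are all hypothetical, chosen only for illustration; this is not the signature the thread settled on, nor that of any standard-library function.

```python
def fspath_sketch(path, *, allow_bytes=False):
    """Return a str path; return bytes only if the caller opted in.

    Hypothetical helper: str passes through, objects with __fspath__ are
    unwrapped, and bytes raise TypeError unless allow_bytes is true.
    """
    if not isinstance(path, (str, bytes)):
        # Look the method up on the type, as is usual for protocol methods.
        fspath = getattr(type(path), "__fspath__", None)
        if fspath is None:
            raise TypeError(
                "expected str, bytes or an object with __fspath__, "
                "not " + type(path).__name__
            )
        path = fspath(path)

    if isinstance(path, str):
        return path
    if isinstance(path, bytes):
        if allow_bytes:
            return path
        raise TypeError("bytes path rejected; pass allow_bytes=True to accept it")
    raise TypeError("__fspath__ returned " + type(path).__name__)


class StrEntry:
    """Hypothetical path-like object whose __fspath__ returns str."""
    def __fspath__(self):
        return "notes.txt"


class BytesEntry:
    """Hypothetical path-like object whose __fspath__ returns bytes."""
    def __fspath__(self):
        return b"notes.txt"
```

With this shape, fspath_sketch(StrEntry()) returns a str, fspath_sketch(BytesEntry()) raises TypeError, and fspath_sketch(BytesEntry(), allow_bytes=True) returns the bytes unchanged, so bytes only ever appear where the caller has explicitly asked for them.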
From brett at python.org Thu Apr 14 21:42:39 2016 From: brett at python.org (Brett Cannon) Date: Fri, 15 Apr 2016 01:42:39 +0000 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: <571031A4.3090308@mrabarnett.plus.com> References: <571031A4.3090308@mrabarnett.plus.com> Message-ID: On Thu, Apr 14, 2016, 17:14 MRAB wrote: > On 2016-04-14 21:42, Armin Rigo wrote: > > Hi Victor, > > > > On 14 April 2016 at 17:19, Victor Stinner > wrote: > >> Each time a dictionary is created, the global > >> version is incremented and the dictionary version is initialized to the > >> global version. > > > > A detail, but why not set the version tag of new empty dictionaries to > > zero, always? Same after a clear(). This would satisfy the > > condition: equality of the version tag is supposed to mean "the > > dictionary content is precisely the same". > > > If you did that, wouldn't it then be possible to replace an empty dict > with another empty dict with you noticing? If you meant to say "without" then yes. Would that matter? > Nope, because this is about versioning content, so having identical/empty content compare equal is fine. -brett -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ethan at stoneleaf.us Fri Apr 15 00:22:07 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 14 Apr 2016 21:22:07 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <57103D4C.7000001@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <570FB650.203@stoneleaf.us> <22288.13274.270220.803532@turnbull.sk.tsukuba.ac.jp> <57103D4C.7000001@stoneleaf.us> Message-ID: <57106C6F.7020303@stoneleaf.us> On 04/14/2016 06:01 PM, Ethan Furman wrote: > On 04/14/2016 05:20 PM, Stephen J. Turnbull wrote: >> you'll have to impose it on me. > > Hmm. Well, the good news is you have convinced me that letting bytes > through willy-nilly is akin to loosing the hounds of hell on our code. > The bad news is I was never in that camp. ;) Actually, in retrospect, I was in that camp at the beginning. But Brett's code (and your arguments, amongst others) convinced me of that or would be better/safer. -- ~Ethan~ From steve at pearwood.info Fri Apr 15 00:52:54 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 15 Apr 2016 14:52:54 +1000 Subject: [Python-Dev] Should secrets include a fallback for hmac.compare_digest? Message-ID: <20160415045254.GI1819@ando.pearwood.info> Now that PEP 506 has been approved, I've checked in the secrets module, but an implementation question has come up regarding compare_digest. Currently, the module tries to import hmac.compare_digest, and if that fails, then it falls back to a Python version. But since compare_digest has been available since 3.3, I'm now questioning whether the fallback is useful at all. 
Perhaps for alternate Python implementations? So, two questions: - should secrets include a fallback? - if so, what is the preferred way of doing this?

# option 1: fallback if compare_digest is missing
try:
    from hmac import compare_digest
except ImportError:
    def compare_digest(a, b):
        ...

# option 2: "C accelerator idiom"
def compare_digest(a, b):
    ...

try:
    from hmac import compare_digest
except ImportError:
    pass

Option 1 is closer to how I would write hybrid 2/3 code, but option 2 is how PEP 399 suggests it should be written. https://www.python.org/dev/peps/pep-0399/ Currently, hmac imports compare_digest from _operator. There's no Python version in operator either. Should there be? -- Steve From senthil at uthcode.com Fri Apr 15 01:30:13 2016 From: senthil at uthcode.com (Senthil Kumaran) Date: Thu, 14 Apr 2016 22:30:13 -0700 Subject: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them! In-Reply-To: References: Message-ID: On Wed, Apr 13, 2016 at 4:40 AM, Victor Stinner wrote: > Last months, most 3.x buildbots failed randomly. Some of them were > always failing. I spent some time to fix almost all Windows and Linux > buildbots. There were a lot of different issues. > > So please try to not break buildbots again and remind to watch them > sometimes: > Piling in my thanks again, Victor. This is a great gesture from you to fix all the build bots. Keeping them stable is a proper thing to do and should be expected from all committers. -- Senthil -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stefan_ml at behnel.de Fri Apr 15 01:39:03 2016 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 15 Apr 2016 07:39:03 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: <20160414172926.44085562@subdivisions> Message-ID: Victor Stinner wrote on 15.04.2016 at 00:33: > 2016-04-15 0:22 GMT+02:00 Brett Cannon: >> And even if it was GIL-free you do run the risk of two dicts ending up at >> the same version # by simply mutating the same number of times if the >> counters were per-dict instead of process-wide. > > For some optimizations, it is not needed to check if the dictionary > was replaced, or you check it directly. So it doesn't matter to have > the same version with the same number of operations. > > For the use case of Yury's optimization, having a globally unique > version tag makes the guard much cheaper, and the guard must check > that the dictionary was not replaced. How can that be achieved? If the tag is just a sequentially growing number, creating two dicts and applying one operation to the first one should give both the same version tag, right? Stefan From ncoghlan at gmail.com Fri Apr 15 03:11:35 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 15 Apr 2016 17:11:35 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com> References: <570C1E13.4090909@stoneleaf.us> <1460641541.10420.578704073.7BFF2AD9@webmail.messagingengine.com> <1460642504.15711.578719113.17E236C5@webmail.messagingengine.com> Message-ID: On 15 April 2016 at 00:01, Random832 wrote: > On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote: >> Adding integers and floats is considered "safe" because most people's >> use of floats completely compasses their use of ints. (You'll get >> OverflowError if it can't be represented.)
But float and Decimal are >> considered "unsafe": >> >> >>> 1.5 + decimal.Decimal("1.5") >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: unsupported operand type(s) for +: 'float' and >> 'decimal.Decimal' >> >> This is more what's happening here. Floats and Decimals can represent >> similar sorts of things, but with enough incompatibilities that you >> can't simply merge them. > > And what such incompatibilities exist between bytes and str for the > purpose of representing file paths? At the end of the day, there's > exactly one answer to "what file on disk this represents (or would > represent if it existed)". Bytes paths on Windows are encoded as mbcs for use with the ANSI (code-page-based) Windows APIs, and hence don't support the full range of characters that str does. The colloquial shorthand for that is "bytes paths don't work properly on Windows" (the more strictly accurate description is "bytes paths only work correctly on Windows if every code point in the path can be encoded using the 'mbcs' codec"). Even on *nix, os.fsencode may fail outright if the system is configured to use a non-universal encoding, while os.fsdecode may pollute the resulting string with surrogate escaped characters. Regardless of platform, if somebody hands you *mixed* bytes and str data, the appropriate default reaction is to complain about it rather than assume they meant one or the other.
That complaint may take one of two forms:

- for a high level, platform independent API, bytes should just be rejected outright
- for a low level API with input type dependent behaviour, the input should be rejected as ambiguous - the API doesn't know whether the str behaviour or the bytes behaviour is the intended one

pathlib falls into the first category - it just rejects bytes as input

os.path.join falls into the second category - all str is fine, and all bytes is fine, but mixing them fails

However, once somebody reaches for the coercion APIs (fsdecode and fsencode), they're now *explicitly* telling the interpreter what they want, since there's no ambiguity about the possible return types from those functions. In relation to Victor's comment about this being complex code to show to a novice:

os.path.join(*map(os.fsdecode, ("str", b"bytes")))

I agree, but also think that's a good reason for people to switch to teaching novices pathlib rather than os.path, and letting them discover the underlying libraries as required by the code and examples they encounter. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Fri Apr 15 04:20:48 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Apr 2016 10:20:48 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: <20160414172926.44085562@subdivisions> Message-ID: On Friday, 15 April 2016, Stefan Behnel wrote: > How can that be achieved? If the tag is just a sequentially growing number, > creating two dicts and applying one operation to the first one should give > both the same version tag, right? > Armin didn't propose to get rid of the global version.
a = dict()          # version = 0
b = dict()          # version = 0
a['key'] = 'value'  # version = 300
b['key'] = 'value'  # version = 301

Victor PS: It looks like the iPad Gmail app forces me to use HTML, I don't know how to use plain text :-/ From victor.stinner at gmail.com Fri Apr 15 04:26:31 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Apr 2016 10:26:31 +0200 Subject: [Python-Dev] Should secrets include a fallback for hmac.compare_digest? In-Reply-To: <20160415045254.GI1819@ando.pearwood.info> References: <20160415045254.GI1819@ando.pearwood.info> Message-ID: It's easy to implement this function (in the native language of your Python implementation), it's short. I'm not sure that a Python version is really safe. The secrets module is for Python 3.6; in this version the hmac module already "requires" the compare_digest() function, no? Victor From antoine at python.org Fri Apr 15 05:01:00 2016 From: antoine at python.org (Antoine Pitrou) Date: Fri, 15 Apr 2016 09:01:00 +0000 (UTC) Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict References: Message-ID: Victor Stinner gmail.com> writes: > > Hi, > > 2016-04-14 22:42 GMT+02:00 Armin Rigo tunes.org>: > > Hi Victor, > > > > On 14 April 2016 at 17:19, Victor Stinner gmail.com> wrote: > >> Each time a dictionary is created, the global > >> version is incremented and the dictionary version is initialized to the > >> global version. > > > > A detail, but why not set the version tag of new empty dictionaries to > > zero, always? Same after a clear(). This would satisfy the > > condition: equality of the version tag is supposed to mean "the > > dictionary content is precisely the same". > > You're right that incrementing the global version is useless for these > specific cases, and using the version 0 should work.
It only matters > that the version (version? version tag?) is different. Why do this? It's a nice property that two dicts always have different version tags, and now you're killing this property for... no obvious reason? Do you really think dict.clear() is in need of micro-optimizing a couple CPU cycles away? Regards Antoine. From stefan_ml at behnel.de Fri Apr 15 05:03:21 2016 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 15 Apr 2016 11:03:21 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: <20160414172926.44085562@subdivisions> Message-ID: Victor Stinner wrote on 15.04.2016 at 10:20: > On Friday, 15 April 2016, Stefan Behnel wrote: > >> How can that be achieved? If the tag is just a sequentially growing number, >> creating two dicts and applying one operation to the first one should give >> both the same version tag, right? >> > > Armin didn't propose to get rid of the global version. > > a = dict() # version = 0 > b = dict() # version = 0 > a['key'] = 'value' # version = 300 > b['key'] = 'value' # version = 301 Ah, sorry, should have read the PEP more closely. It's *always* the global version that gets incremented. Then yes, that's a safe point of distinction for dicts and their status. Stefan From steve at pearwood.info Fri Apr 15 05:21:55 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 15 Apr 2016 19:21:55 +1000 Subject: [Python-Dev] Should secrets include a fallback for hmac.compare_digest? In-Reply-To: References: <20160415045254.GI1819@ando.pearwood.info> Message-ID: <20160415092155.GK1819@ando.pearwood.info> On Fri, Apr 15, 2016 at 10:26:31AM +0200, Victor Stinner wrote: > It's easy to implement this function (in the native language of your Python > implementation), it's short. I'm not sure that a Python version is really > safe. > > The secrets module is for Python 3.6, in this version the hmac already > "requires" the compare_digest() function no?
The current version looks like this:

try:
    from hmac import compare_digest
except ImportError:
    # fallback version defined

but I'm having second thoughts about this. I don't think it needs to support older versions of Python, but perhaps it needs to support implementations which don't include compare_digest? This isn't just a question about the secrets module. PEP 399 suggests that any C classes/functions should have a pure Python version as fallback, but compare_digest doesn't. I don't know whether it should or not. https://www.python.org/dev/peps/pep-0399/ -- Steve From victor.stinner at gmail.com Fri Apr 15 05:34:44 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Apr 2016 11:34:44 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID: 2016-04-15 11:01 GMT+02:00 Antoine Pitrou : > Victor Stinner gmail.com> writes: >> You're right that incrementing the global version is useless for these >> specific cases, and using the version 0 should work. It only matters >> that the version (version? version tag?) is different. > > Why do this? It's a nice property that two dicts always have different > version tags, and now you're killing this property for... no obvious > reason? I guess that the reason is to reduce *a little bit* the risk of integer overflow (especially the bug when a guard doesn't see a change between new_version = old_version % 2**64). > Do you really think dict.clear() is in need of micro-optimizing a > couple CPU cycles away? The advantage of having a different version for empty dict is to be able to use the version to check that they are different. Using the dictionary pointer is not enough, since it's common that a new dictionary gets the address of a previously destroyed dictionary. This case can be avoided if you keep dictionaries alive by keeping a strong reference, but there are good reasons to not keep a strong reference.
Victor From victor.stinner at gmail.com Fri Apr 15 05:35:56 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Apr 2016 11:35:56 +0200 Subject: [Python-Dev] Should secrets include a fallback for hmac.compare_digest? In-Reply-To: <20160415092155.GK1819@ando.pearwood.info> References: <20160415045254.GI1819@ando.pearwood.info> <20160415092155.GK1819@ando.pearwood.info> Message-ID: 2016-04-15 11:21 GMT+02:00 Steven D'Aprano : > This isn't just a question about the secrets module. PEP 399 suggests > that any C classes/functions should have a pure Python version as > fallback, but compare_digest doesn't. I don't know whether it should or > not. The hmac module is responsible for providing a fallback, not the secrets module. Victor From victor.stinner at gmail.com Fri Apr 15 05:39:02 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Apr 2016 11:39:02 +0200 Subject: [Python-Dev] PEP 506 secrets module In-Reply-To: References: <20151016005711.GC11980@ando.pearwood.info> <20160410050845.GA12526@ando.pearwood.info> <20160411175036.GA1819@ando.pearwood.info> Message-ID: Hi, Would it make sense to add a function to generate a random UUID4 (as a string) in secrets? The current implementation in uuid.py of CPython 3.6 already uses os.urandom():

def uuid4():
    """Generate a random UUID."""
    return UUID(bytes=os.urandom(16), version=4)

Victor From p.f.moore at gmail.com Fri Apr 15 05:55:38 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 15 Apr 2016 10:55:38 +0100 Subject: [Python-Dev] Should secrets include a fallback for hmac.compare_digest? In-Reply-To: References: <20160415045254.GI1819@ando.pearwood.info> <20160415092155.GK1819@ando.pearwood.info> Message-ID: On 15 April 2016 at 10:35, Victor Stinner wrote: > 2016-04-15 11:21 GMT+02:00 Steven D'Aprano : >> This isn't just a question about the secrets module. PEP 399 suggests >> that any C classes/functions should have a pure Python version as >> fallback, but compare_digest doesn't.
I don't know whether it should or >> not. > > The hmac module is responsible to providing a fallback, not the secrets module. Agreed. The library docs state that the hmac module provides compare_digest, so you are therefore entitled to unconditionally import it (just as end user code would). Paul From ncoghlan at gmail.com Fri Apr 15 06:16:53 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 15 Apr 2016 20:16:53 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> Message-ID: On 15 April 2016 at 00:52, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > The use case for returning bytes from __fspath__ is DirEntry, so you > > can write things like this in low level code: > > > > def myscandir(dirpath): > > for entry in os.scandir(dirpath): > > if entry.is_file(): > > with open(entry) as f: > > # do something > > Excuse me, but that is *not* a use case for returning bytes from > DirEntry.__fspath__. open() is perfectly happy taking str (including > surrogate-encoded rawbytes). 
That results in a different type for the file object's name:

>>> open("README.md").name
'README.md'
>>> open(b"README.md").name
b'README.md'

Implicitly level shifting in a low level API isn't a good thing, especially when there are idempotent level shifting commands available (so you can always ensure a given value is on the level you expect, even if you don't know which level it was on originally). I completely agree with you that folks working with text in the binary domain are asking for trouble, but at the same time, that's the reality of the way a lot of *nix system interfaces operate. The guarantee we want to provide those folks is that if they're operating in the binary domain they'll stay there unless they explicitly shift out of it using a decoding API of some kind - doing it behind their back would be akin to implicitly shifting from the time domain to the frequency domain in an engineering library. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Apr 15 06:42:06 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 15 Apr 2016 20:42:06 +1000 Subject: [Python-Dev] PEP 506 secrets module In-Reply-To: References: <20151016005711.GC11980@ando.pearwood.info> <20160410050845.GA12526@ando.pearwood.info> <20160411175036.GA1819@ando.pearwood.info> Message-ID: On 15 April 2016 at 19:39, Victor Stinner wrote: > Hi, > > Would it make sense to add a function to generate a random UUID4 (as a > string) in secrets? > > The current implementation in uuid.py of CPython 3.6 already uses os.urandom(): > > def uuid4(): > """Generate a random UUID.""" > return UUID(bytes=os.urandom(16), version=4) I don't think so, as folks looking to generate a UUID specifically are already likely to end up at the uuid module docs rather than trying to craft their own based on the random module (and the uuid module already does the right thing, and it would be a bug if it didn't).
The new secrets module fills the gap for cases where random is otherwise an attractive nuisance by making it easy to say "use this instead". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Apr 15 06:48:44 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 15 Apr 2016 20:48:44 +1000 Subject: [Python-Dev] Should secrets include a fallback for hmac.compare_digest? In-Reply-To: <20160415045254.GI1819@ando.pearwood.info> References: <20160415045254.GI1819@ando.pearwood.info> Message-ID: On 15 April 2016 at 14:52, Steven D'Aprano wrote: > Now that PEP 506 has been approved, I've checked in the secrets module, > but an implementation question has come up regarding compare_digest. > > Currently, the module tries to import hmac.compare_digest, and if that > fails, then it falls back to a Python version. But since compare_digest > has been available since 3.3, I'm now questioning whether the fallback > is useful at all. Perhaps for alternate Python implementations? > > So, two questions: > > - should secrets include a fallback? It definitely *shouldn't* include a fallback, as the function needs to be written in C (or some other not-normal-Python-code language) in order to provide the appropriate timing guarantees. We added hmac.compare_digest in response to Python web frameworks providing their own pure Python "constant time" comparison functions that were nevertheless still subject to remote timing attacks. I'd forgotten about the hmac vs operator indirection, but it's still better to import the public API from hmac (since _operator._compare_digest is a Python implementation detail, and you may as well make it easy to extract the secrets module for use in earlier versions - 2.7 also gained hmac.compare_digest as part of PEP 466). Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From status at bugs.python.org Fri Apr 15 12:08:25 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 15 Apr 2016 18:08:25 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160415160825.E7ECE5667A@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-04-08 - 2016-04-15) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 5489 (+12) closed 33039 (+46) total 38528 (+58) Open issues with patches: 2381 Issues opened (45) ================== #11205: Evaluation order of dictionary display is different from refer http://bugs.python.org/issue11205 reopened by ncoghlan #25609: Add a ContextManager ABC and type http://bugs.python.org/issue25609 reopened by brett.cannon #25731: Assigning and deleting __new__ attr on the class does not allo http://bugs.python.org/issue25731 reopened by barry #26673: Tkinter error when opening IDLE configuration menu http://bugs.python.org/issue26673 reopened by terry.reedy #26716: EINTR handling in fcntl http://bugs.python.org/issue26716 opened by Jack Zhou #26717: wsgiref.simple_server: mojibake with cp1252 bytes in PATH_INFO http://bugs.python.org/issue26717 opened by Anthony Sottile #26720: memoryview from BufferedWriter becomes garbage http://bugs.python.org/issue26720 opened by martin.panter #26721: Avoid socketserver.StreamRequestHandler.wfile doing partial wr http://bugs.python.org/issue26721 opened by martin.panter #26724: Serialize dict with non-string keys to JSON ??? 
unexpected res http://bugs.python.org/issue26724 opened by anton-ryzhov #26726: Incomplete Internationalization in Argparse Module http://bugs.python.org/issue26726 opened by IronGrid #26728: make pdb.set_trace() accept debugger commands as arguments and http://bugs.python.org/issue26728 opened by irdb #26729: Incorrect __text_signature__ for sorted http://bugs.python.org/issue26729 opened by eriknw #26730: SpooledTemporaryFile doesn't correctly preserve data for text http://bugs.python.org/issue26730 opened by James Hennessy #26731: subprocess on windows leaks stdout/stderr handle to child proc http://bugs.python.org/issue26731 opened by saifujinaro #26732: multiprocessing sentinel resource leak http://bugs.python.org/issue26732 opened by quick-b #26733: staticmethod and classmethod are ignored when disassemble clas http://bugs.python.org/issue26733 opened by xiang.zhang #26736: Use HTTPS protocol in links http://bugs.python.org/issue26736 opened by serhiy.storchaka #26739: idle: Errno 10035 a non-blocking socket operation could not be http://bugs.python.org/issue26739 opened by MICHAEL JACOBSON #26740: tarfile: accessing (listing and extracting) tarball fails with http://bugs.python.org/issue26740 opened by Tomas Tomecek #26741: subprocess.Popen should emit a ResourceWarning in destructor i http://bugs.python.org/issue26741 opened by haypo #26742: imports in test_warnings changes warnings.filters http://bugs.python.org/issue26742 opened by haypo #26743: Unable to import random with python2.7 on power pc based machi http://bugs.python.org/issue26743 opened by ragreddy #26744: print() function hangs on MS-Windows 10 http://bugs.python.org/issue26744 opened by Ma Lin #26745: Redundant code in _PyObject_GenericSetAttrWithDict http://bugs.python.org/issue26745 opened by xiang.zhang #26746: struct.pack(): trailing padding bytes on x64 http://bugs.python.org/issue26746 opened by skrah #26750: Mock autospec does not work with subclasses of property() 
http://bugs.python.org/issue26750 opened by amaury.forgeotdarc #26751: Possible bug in sorting algorithm http://bugs.python.org/issue26751 opened by David.Manowitz #26752: Mock(2.0.0).assert_has_calls() raise AssertionError in two sam http://bugs.python.org/issue26752 opened by jekin000 #26753: Obmalloc lock LOCK_INIT and LOCK_FINI are never used http://bugs.python.org/issue26753 opened by larry #26754: PyUnicode_FSDecoder() accepts arbitrary iterable http://bugs.python.org/issue26754 opened by serhiy.storchaka #26755: Update version{added,changed} docs in devguide http://bugs.python.org/issue26755 opened by berker.peksag #26756: fileinput handling of unicode errors from standard input http://bugs.python.org/issue26756 opened by jmb236 #26757: test_urllib2net.test_http_basic() timeout after 15 min on http://bugs.python.org/issue26757 opened by haypo #26758: Unnecessary format string handling for no argument slot wrappe http://bugs.python.org/issue26758 opened by josh.r #26759: PyBytes_FromObject accepts arbitrary iterable http://bugs.python.org/issue26759 opened by serhiy.storchaka #26760: Document PyFrameObject http://bugs.python.org/issue26760 opened by brett.cannon #26762: test_multiprocessing_spawn leaves processes running in backgro http://bugs.python.org/issue26762 opened by martin.panter #26763: Update PEP-8 regarding binary operators http://bugs.python.org/issue26763 opened by IanLee1521 #26764: SystemError in bytes.__rmod__ http://bugs.python.org/issue26764 opened by serhiy.storchaka #26765: Factor out common bytes and bytearray implementation http://bugs.python.org/issue26765 opened by serhiy.storchaka #26766: The result type of bytearray formatting is not stable http://bugs.python.org/issue26766 opened by berker.peksag #26767: Inconsistant error messages for failed attribute modification http://bugs.python.org/issue26767 opened by serhiy.storchaka #26769: Python 2.7: make private file descriptors non inheritable http://bugs.python.org/issue26769 opened 
by haypo #26770: _Py_set_inheritable(): do nothing if the FD_CLOEXEC close is a http://bugs.python.org/issue26770 opened by haypo #26771: python-config.sh.in INCDIR does not match python version if ex http://bugs.python.org/issue26771 opened by benzea Most recent 15 issues with no replies (15) ========================================== #26771: python-config.sh.in INCDIR does not match python version if ex http://bugs.python.org/issue26771 #26769: Python 2.7: make private file descriptors non inheritable http://bugs.python.org/issue26769 #26767: Inconsistant error messages for failed attribute modification http://bugs.python.org/issue26767 #26765: Factor out common bytes and bytearray implementation http://bugs.python.org/issue26765 #26760: Document PyFrameObject http://bugs.python.org/issue26760 #26758: Unnecessary format string handling for no argument slot wrappe http://bugs.python.org/issue26758 #26752: Mock(2.0.0).assert_has_calls() raise AssertionError in two sam http://bugs.python.org/issue26752 #26750: Mock autospec does not work with subclasses of property() http://bugs.python.org/issue26750 #26739: idle: Errno 10035 a non-blocking socket operation could not be http://bugs.python.org/issue26739 #26728: make pdb.set_trace() accept debugger commands as arguments and http://bugs.python.org/issue26728 #26726: Incomplete Internationalization in Argparse Module http://bugs.python.org/issue26726 #26700: Make digest_size a class variable http://bugs.python.org/issue26700 #26697: tkFileDialog crash on askopenfilename Python 2.7 64-bit Win7 http://bugs.python.org/issue26697 #26696: Document collections.abc.ByteString http://bugs.python.org/issue26696 #26695: pickle and _pickle accelerator have different behavior when un http://bugs.python.org/issue26695 Most recent 15 issues waiting for review (15) ============================================= #26770: _Py_set_inheritable(): do nothing if the FD_CLOEXEC close is a http://bugs.python.org/issue26770 #26769: Python 2.7: 
make private file descriptors non inheritable http://bugs.python.org/issue26769 #26766: The result type of bytearray formatting is not stable http://bugs.python.org/issue26766 #26765: Factor out common bytes and bytearray implementation http://bugs.python.org/issue26765 #26764: SystemError in bytes.__rmod__ http://bugs.python.org/issue26764 #26763: Update PEP-8 regarding binary operators http://bugs.python.org/issue26763 #26755: Update version{added,changed} docs in devguide http://bugs.python.org/issue26755 #26750: Mock autospec does not work with subclasses of property() http://bugs.python.org/issue26750 #26745: Redundant code in _PyObject_GenericSetAttrWithDict http://bugs.python.org/issue26745 #26742: imports in test_warnings changes warnings.filters http://bugs.python.org/issue26742 #26741: subprocess.Popen should emit a ResourceWarning in destructor i http://bugs.python.org/issue26741 #26736: Use HTTPS protocol in links http://bugs.python.org/issue26736 #26733: staticmethod and classmethod are ignored when disassemble clas http://bugs.python.org/issue26733 #26730: SpooledTemporaryFile doesn't correctly preserve data for text http://bugs.python.org/issue26730 #26729: Incorrect __text_signature__ for sorted http://bugs.python.org/issue26729 Top 10 most discussed issues (10) ================================= #26743: Unable to import random with python2.7 on power pc based machi http://bugs.python.org/issue26743 20 msgs #26766: The result type of bytearray formatting is not stable http://bugs.python.org/issue26766 12 msgs #25702: Link Time Optimizations support for GCC and CLANG http://bugs.python.org/issue25702 10 msgs #25910: Fixing links in documentation http://bugs.python.org/issue25910 10 msgs #26647: ceval: use Wordcode, 16-bit bytecode http://bugs.python.org/issue26647 10 msgs #26716: EINTR handling in fcntl http://bugs.python.org/issue26716 9 msgs #26729: Incorrect __text_signature__ for sorted http://bugs.python.org/issue26729 9 msgs #26763: Update PEP-8 
regarding binary operators http://bugs.python.org/issue26763 9 msgs #26359: CPython build options for out-of-the box performance http://bugs.python.org/issue26359 8 msgs #26601: Use new madvise()'s MADV_FREE on the private heap http://bugs.python.org/issue26601 8 msgs Issues closed (48) ================== #13410: String formatting bug in interactive mode http://bugs.python.org/issue13410 closed by serhiy.storchaka #13952: mimetypes doesn't recognize .csv http://bugs.python.org/issue13952 closed by berker.peksag #14784: Re-importing _warnings changes warnings.filters http://bugs.python.org/issue14784 closed by martin.panter #15984: Wrong documentation for PyUnicode_FromObject() and PyUnicode_F http://bugs.python.org/issue15984 closed by martin.panter #16329: mimetypes does not support webm type http://bugs.python.org/issue16329 closed by berker.peksag #17264: Update Building C and C++ Extensions with distutils documentat http://bugs.python.org/issue17264 closed by berker.peksag #17339: bytes() TypeError message is misleadingly narrow http://bugs.python.org/issue17339 closed by serhiy.storchaka #18461: X Error in tkinter http://bugs.python.org/issue18461 closed by serhiy.storchaka #21069: test_fileno of test_urllibnet intermittently fails http://bugs.python.org/issue21069 closed by martin.panter #22659: SyntaxError in the configure_ctypes http://bugs.python.org/issue22659 closed by berker.peksag #23397: PEP 431 implementation http://bugs.python.org/issue23397 closed by berker.peksag #24951: Idle test_configdialog fails on Fedora 23, 3.6 http://bugs.python.org/issue24951 closed by terry.reedy #25339: sys.stdout.errors is set to "surrogateescape" http://bugs.python.org/issue25339 closed by serhiy.storchaka #25496: tarfile: Default value for compresslevel is not documented http://bugs.python.org/issue25496 closed by martin.panter #25654: test_multiprocessing_spawn ResourceWarning with -Werror http://bugs.python.org/issue25654 closed by martin.panter #26057: Avoid 
nonneeded use of PyUnicode_FromObject() http://bugs.python.org/issue26057 closed by serhiy.storchaka #26257: Eliminate buffer_tests.py http://bugs.python.org/issue26257 closed by martin.panter #26404: socketserver context manager http://bugs.python.org/issue26404 closed by martin.panter #26585: Use html.escape to replace _quote_html in http.server http://bugs.python.org/issue26585 closed by martin.panter #26587: Possible duplicate entries in sys.path if .pth files are used http://bugs.python.org/issue26587 closed by brett.cannon #26609: Wrong request target in test_httpservers.py http://bugs.python.org/issue26609 closed by martin.panter #26610: test_venv.test_with_pip() fails when ctypes is missing http://bugs.python.org/issue26610 closed by haypo #26623: JSON encode: more informative error http://bugs.python.org/issue26623 closed by serhiy.storchaka #26624: Windows hangs in call to CRT setlocale() http://bugs.python.org/issue26624 closed by python-dev #26639: Tools/i18n/pygettext.py: replace deprecated imp module with im http://bugs.python.org/issue26639 closed by haypo #26668: Remove Lib/test/test_importlib/regrtest.py? 
http://bugs.python.org/issue26668 closed by brett.cannon #26685: Raise errors from socket.close() http://bugs.python.org/issue26685 closed by martin.panter #26687: Use Py_RETURN_NONE in sqlite3 module http://bugs.python.org/issue26687 closed by berker.peksag #26699: locale.str docstring is incorrect: "Convert float to integer" http://bugs.python.org/issue26699 closed by orsenthil #26706: Update OpenSSL version in readme http://bugs.python.org/issue26706 closed by python-dev #26712: Unify (r)split(), (l/r)strip() method tests http://bugs.python.org/issue26712 closed by martin.panter #26714: telnetlib.Telnet should act as a context manager http://bugs.python.org/issue26714 closed by SilentGhost #26715: can not deactivate venv (deactivate.bat) if the venv was activ http://bugs.python.org/issue26715 closed by zach.ware #26718: super.__init__ leaks memory if called multiple times http://bugs.python.org/issue26718 closed by brett.cannon #26719: More efficient formatting of ints and floats in json http://bugs.python.org/issue26719 closed by serhiy.storchaka #26722: Fold compare operators on constants (peephole) http://bugs.python.org/issue26722 closed by ncoghlan #26723: Add an option to skip _decimal module http://bugs.python.org/issue26723 closed by skrah #26725: list() destroys map object data http://bugs.python.org/issue26725 closed by ned.deily #26727: ctypes.util.find_msvcrt() does not work in python 3.5.1 http://bugs.python.org/issue26727 closed by steve.dower #26734: Repeated mmap\munmap calls during temporary allocation http://bugs.python.org/issue26734 closed by pitrou #26735: os.urandom(2500) fails on Solaris 11.3 http://bugs.python.org/issue26735 closed by haypo #26737: csv.DictReader throws generic error when fieldnames is accesse http://bugs.python.org/issue26737 closed by serhiy.storchaka #26738: listname.strip() does not work right if the name ends with an http://bugs.python.org/issue26738 closed by SilentGhost #26747: types.InstanceType only for old style 
class only in 2.7 http://bugs.python.org/issue26747 closed by berker.peksag

#26748: enum.Enum is False-y
http://bugs.python.org/issue26748 closed by ethan.furman

#26749: Update devguide to include Fedora's DNF
http://bugs.python.org/issue26749 closed by berker.peksag

#26761: winsound module very unstable in Windows 10
http://bugs.python.org/issue26761 closed by zach.ware

#26768: Fix instructions at WindowsCompilers for MSVC/SDKs
http://bugs.python.org/issue26768 closed by berker.peksag

From guido at python.org Fri Apr 15 12:53:13 2016
From: guido at python.org (Guido van Rossum)
Date: Fri, 15 Apr 2016 09:53:13 -0700
Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update
Message-ID:

After a fruitful discussion on python-ideas I've decided that it's fine
to break lines *before* a binary operator. It looks better and Knuth
recommends it.

The head of the python-ideas discussion:
https://mail.python.org/pipermail/python-ideas/2016-April/039752.html

See also the discussion in the tracker:
http://bugs.python.org/issue26763

Here's the diff I applied:
https://hg.python.org/peps/rev/3857909d7956

The talk by Brandon Rhodes where Knuth is referenced ([3] below):
http://rhodesmill.org/brandon/slides/2012-11-pyconca/#laying-down-the-law

The key section in PEP 8 that was updated (apart from fixing up
references):

Should a line break before or after a binary operator?
------------------------------------------------------

For decades the recommended style has been to break after binary
operators. However, recent research unearthed recommendations by Donald
Knuth to break *before* binary operators, in his writings about
typesetting [3]_. Therefore it is permissible to break before or
after a binary operator, as long as the convention is consistent
locally. For new code Knuth's style is suggested.

Some examples of code breaking before binary Boolean operators::

    class Rectangle(Blob):

        def __init__(self, width, height,
                     color='black', emphasis=None, highlight=0):
            if (width == 0
                and height == 0
                and color == 'red'
                and emphasis == 'strong'
                or highlight > 100):
                raise ValueError("sorry, you lose")
            if (width == 0 and height == 0
                and (color == 'red' or emphasis is None)):
                raise ValueError("I don't think so -- values are %s, %s" %
                                 (width, height))
            Blob.__init__(self, width, height,
                          color, emphasis, highlight)

--
--Guido van Rossum (python.org/~guido)

From tim.peters at gmail.com Fri Apr 15 13:02:43 2016
From: tim.peters at gmail.com (Tim Peters)
Date: Fri, 15 Apr 2016 12:02:43 -0500
Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update
In-Reply-To:
References:
Message-ID:

[Guido]
> After a fruitful discussion on python-ideas I've decided that it's fine to
> break lines *before* a binary operator. It looks better and Knuth recommends
> it.
> ...
> Therefore it is permissible to break before or
> after a binary operator, as long as the convention is consistent
> locally. For new code Knuth's style is suggested.
>
> Some examples of code breaking before binary Boolean operators::
>
>     class Rectangle(Blob):
>
>         def __init__(self, width, height,
>                      color='black', emphasis=None, highlight=0):
>             if (width == 0
>                 and height == 0
>                 and color == 'red'
>                 and emphasis == 'strong'
>                 or highlight > 100):
>                 raise ValueError("sorry, you lose")
>             if (width == 0 and height == 0
>                 and (color == 'red' or emphasis is None)):
>                 raise ValueError("I don't think so -- values are %s, %s" %
>                                  (width, height))
>             Blob.__init__(self, width, height,
>                           color, emphasis, highlight)
>

Note that this code still breaks a line after a binary operator (the
string formatting "%" operator in the 2nd ValueError call). But it's
perfectly clear the way it is.
Good taste can't be reduced to rules ;-)

From victor.stinner at gmail.com Fri Apr 15 13:03:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 19:03:44 +0200
Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update
In-Reply-To:
References:
Message-ID:

Hum.

    if (width == 0
        and height == 0
        and color == 'red'
        and emphasis == 'strong'
        or highlight > 100):
        raise ValueError("sorry, you lose")

Please remove one space to vertically align "and" operators with the
opening parenthesis:

    if (width == 0
       and height == 0
       and color == 'red'
       and emphasis == 'strong'
       or highlight > 100):
        raise ValueError("sorry, you lose")

(I'm not sure that the difference is obvious in a mail client, you
need a fixed width font which is not the case in my Gmail editor.)

It helps to visually see that the multiline test and the raise
instruction are in two different blocks.

(Moreover, the pep8 checks of OpenStack simply reject such syntax, but
I cannot use this syntax anymore :-))

Victor

From guido at python.org Fri Apr 15 13:06:00 2016
From: guido at python.org (Guido van Rossum)
Date: Fri, 15 Apr 2016 10:06:00 -0700
Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update
In-Reply-To:
References:
Message-ID:

On Fri, Apr 15, 2016 at 10:03 AM, Victor Stinner wrote:

> Hum.
>
>     if (width == 0
>         and height == 0
>         and color == 'red'
>         and emphasis == 'strong'
>         or highlight > 100):
>         raise ValueError("sorry, you lose")
>
> Please remove one space to vertically align "and" operators with the
> opening parenthesis:
>
>     if (width == 0
>        and height == 0
>        and color == 'red'
>        and emphasis == 'strong'
>        or highlight > 100):
>         raise ValueError("sorry, you lose")
>
> (I'm not sure that the difference is obvious in a mail client, you
> need a fixed width font which is not the case in my Gmail editor.)
>

I can see it perfectly fine and I disagree.

> It helps to visually see that the multiline test and the raise > instruction are in two different blocks. > > (Moreover, the pep8 checks of OpenStack simply reject such syntax, but > I cannot use this syntax anymore :-)) That's why that tool shouldn't be named after the PEP. See https://github.com/PyCQA/pycodestyle/issues/466 -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From gjcarneiro at gmail.com Fri Apr 15 13:15:09 2016 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Fri, 15 Apr 2016 18:15:09 +0100 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: References: Message-ID: On 15 April 2016 at 18:03, Victor Stinner wrote: > Hum. > > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") > > Please remove one space to vertically align "and" operators with the > opening parenthesis: > > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") > Personally, I think what you propose looks ugly. The first version looks so much better. It helps to visually see that the multiline test and the raise > instruction are in two different blocks. The only thing I would add would be an empty line to help distinguish the if expression block from the "then" code block: if (width == 0 and height == 0 and color == 'red' and emphasis == 'strong' or highlight > 100): raise ValueError("sorry, you lose") -- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From storchaka at gmail.com Fri Apr 15 13:24:12 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 15 Apr 2016 20:24:12 +0300 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: References: Message-ID: On 15.04.16 20:03, Victor Stinner wrote: > Hum. > > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") > > Please remove one space to vertically align "and" operators with the > opening parenthesis: > > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") I would rather *add* spaces to wrapped condition lines. if (width == 0 and height == 0 and color == 'red' and emphasis == 'strong' or highlight > 100): raise ValueError("sorry, you lose") From guido at python.org Fri Apr 15 13:43:44 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Apr 2016 10:43:44 -0700 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: References: Message-ID: The update is already serving its real purpose: showing that style is debatable and cannot always easily be reduced to fixed rules. On Fri, Apr 15, 2016 at 10:24 AM, Serhiy Storchaka wrote: > On 15.04.16 20:03, Victor Stinner wrote: > >> Hum. >> >> if (width == 0 >> and height == 0 >> and color == 'red' >> and emphasis == 'strong' >> or highlight > 100): >> raise ValueError("sorry, you lose") >> >> Please remove one space to vertically align "and" operators with the >> opening parenthesis: >> >> if (width == 0 >> and height == 0 >> and color == 'red' >> and emphasis == 'strong' >> or highlight > 100): >> raise ValueError("sorry, you lose") >> > > I would rather *add* spaces to wrapped condition lines. 
> > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Fri Apr 15 13:49:03 2016 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 15 Apr 2016 18:49:03 +0100 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: References: Message-ID: <5711298F.7060308@mrabarnett.plus.com> On 2016-04-15 18:03, Victor Stinner wrote: > Hum. > > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") > > Please remove one space to vertically align "and" operators with the > opening parenthesis: > > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") > > (I'm not sure that the difference is obvious in a mail client, you > need a fixed width font which is not the case in my Gmail editor.) > > It helps to visually see that the multiline test and the raise > instruction are in two different blocks. > > (Moreover, the pep8 checks of OpenStack simply reject such syntax, but > I cannot use this syntax anymore :-)) > I always half-indent continuation lines: if (width == 0 and height == 0 and color == 'red' and emphasis == 'strong' or highlight > 100): raise ValueError("sorry, you lose") From jimjjewett at gmail.com Fri Apr 15 13:54:59 2016 From: jimjjewett at gmail.com (Jim J. 
Jewett)
Date: Fri, 15 Apr 2016 10:54:59 -0700 (PDT)
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
Message-ID: <57112af3.641d8c0a.3bf39.03b0@mx.google.com>

On Thu Apr 14 11:19:42 EDT 2016, Victor Stinner posted the latest
draft of PEP 509; dict version_tag

(1) Meta Question: If this is really only for CPython, then is
"Standards Track" the right classification?

(2) Why *promise* not to update the version_tag when replacing a
value with itself? Isn't that the sort of quality-of-implementation
issue that got pushed to a note for objects that happen to be
represented as singletons, such as small integers or ASCII chars?

I think it is a helpful optimization, and worth documenting ... I
just think it should be at the layer of "this particular patch",
rather than something that sounds like part of the contract.

e.g.,

... The global version is also incremented and copied to the
dictionary version at each dictionary change. The following dict
methods can trigger changes:

* ``clear()``
* ``pop(key)``
* ``popitem()``
* ``setdefault(key, value)``
* ``__delitem__(key)``
* ``__setitem__(key, value)``
* ``update(...)``

.. note::

    As a quality of implementation issue, the actual patch does not
    increment the version_tag when it can prove that there was no
    actual change. For example, clear() on an already-empty dict
    will not trigger a version_tag change, nor will updating a dict
    with itself, since the values will be unchanged. For efficiency,
    the analysis considers only object identity (not equality) when
    deciding whether to increment the version_tag.

[2A] Do you want to promise that replacing a value with a
non-identical object *will* trigger a version_tag update *even*
if the objects are equal?

I would vote no, but I realize backwards-compatibility may create
such a promise implicitly.

(3) It is worth being explicit on whether empty dicts can share
a version_tag of 0.
If this PEP is about dict content, then that seems fine, and it may well be worth optimizing dict creation. There are times when it is important to keep the same empty dict; I can't think of any use cases where it is important to verify that some *other* code has done so, *and* I can't get a reference to the correct dict for an identity check. (4) Please be explicit about the locking around version++; it is enough to say that the relevant methods already need to hold the GIL (assuming that is true). (5) I'm not sure I understand the arguments around a per-entry version. On the one hand, you never need a strong reference to the value; if it has been collected, then it has obviously been removed from the dict and should trigger a change even with per-dict. On the other hand, I'm not sure per-entry would really allow finer-grained guards to avoid lookups; just because an entry hasn't been modified doesn't prove it hasn't been moved to another location, perhaps by replacing a dummy in a slot it would have preferred. (6) I'm also not sure why version_tag *doesn't* solve the problem of dicts that fool the iteration guards by mutating without changing size ( https://bugs.python.org/issue19332 ) ... are you just saying that the iterator views aren't allowed to rely on the version-tag remaining stable, because replacing a value (as opposed to a key-value pair) is allowed? I had always viewed the failing iterators as a supporting-this-case- makes-the-code-too-slow-and-ugly limitation, rather than a data integrity check. When I do care about the data not changing, (an exposed variant of) version_tag is as likely to be what I want as a hypothetical keys_version_tag would be. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. 
-jJ From barry at python.org Fri Apr 15 14:37:23 2016 From: barry at python.org (Barry Warsaw) Date: Fri, 15 Apr 2016 14:37:23 -0400 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: References: Message-ID: <20160415143723.239895bc@subdivisions> On Apr 15, 2016, at 09:53 AM, Guido van Rossum wrote: >After a fruitful discussion on python-ideas I've decided that it's fine to >break lines *before* a binary operator. Thanks Guido, your changes look great. -Barry From oscar.j.benjamin at gmail.com Fri Apr 15 16:33:44 2016 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 15 Apr 2016 21:33:44 +0100 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: <57112af3.641d8c0a.3bf39.03b0@mx.google.com> References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com> Message-ID: On 15 April 2016 at 18:54, Jim J. Jewett wrote: > > [2A] Do you want to promise that replacing a value with a > non-identical object *will* trigger a version_tag update *even* > if the objects are equal? > > I would vote no, but I realize backwards-compatibility may create > such a promise implicitly. It needs to trigger a version update. Equality doesn't guarantee any kind of equivalence in Python. It's not even guaranteed that a==b will come to the same value if evaluated twice in a row. 
An example:

>>> from fractions import Fraction as F
>>> F(1) == 1
True
>>> d = globals()
>>> d['a'] = F(1)
>>> a.limit_denominator()
Fraction(1, 1)
>>> d['a'] = 1
>>> a.limit_denominator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'limit_denominator'

--
Oscar

From victor.stinner at gmail.com Fri Apr 15 16:41:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 15 Apr 2016 22:41:44 +0200
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To: <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
Message-ID:

2016-04-15 19:54 GMT+02:00 Jim J. Jewett :
> (1) Meta Question: If this is really only for CPython, then is
> "Standards Track" the right classification?

Yes, I think so. It doesn't seem to be an Informational nor a Process
PEP:
https://www.python.org/dev/peps/pep-0001/#pep-types

> (2) Why *promise* not to update the version_tag when replacing a
> value with itself?

It's a useful property. For example, let's say that you have a guard
on globals()['value']. The guard is created with value=3. A unit
test replaces the value with 50, but then restores the value to its
previous value (3). Later, the guard is checked to decide if an
optimization can be used.

If the dict version is increased, you need a lookup. If the dict
version is not increased, the guard is cheap.

In C, it's very cheap to implement the test "new_value == old_value",
it just compares two pointers.

If an overhead is visible, I can drop it from the PEP, and implement
the check in the guard.

> Isn't that the sort of quality-of-implementation
> issue that got pushed to a note for objects that happen to be
> represented as singletons, such as small integers or ASCII chars?

I prefer to require this property.

> [2A] Do you want to promise that replacing a value with a
> non-identical object *will* trigger a version_tag update *even*
> if the objects are equal?

It's already written in the PEP: "The version is not incremented if an existing key is set to the same value. For efficiency, values are compared by their identity: new_value is old_value , not by their content: new_value == old_value ." > (3) It is worth being explicit on whether empty dicts can share > a version_tag of 0. If this PEP is about dict content, then that > seems fine, and it may well be worth optimizing dict creation. This is not part of the PEP yet. I'm not sure that I will modify the PEP to use the version 0 for empty dictionaries. Antoine doesn't seem to be convinced :-) > (4) Please be explicit about the locking around version++; it > is enough to say that the relevant methods already need to hold > the GIL (assuming that is true). I don't think that it's important to mention it in the PEP. It's more an implementation detail. The version can be protected by atomic operations. > (5) I'm not sure I understand the arguments around a per-entry > version. It doesn't matter since I don't want this option :-) > On the one hand, you never need a strong reference to the value; > if it has been collected, then it has obviously been removed from > the dict and should trigger a change even with per-dict. Let's say that you watch the key1 of a dict. The key2 is modified, it increases the version. Later, you test the guard: to check if the key1 was modified, you need to lookup the key and compare the value. You need the value to compare it. > On the other hand, I'm not sure per-entry would really allow > finer-grained guards to avoid lookups; just because an entry hasn't > been modified doesn't prove it hasn't been moved to another location, > perhaps by replacing a dummy in a slot it would have preferred. The main advantage of per-entry version is to avoid the strong reference to values. According to my tests, the drawbacks are too important to take this option. I prefer a simple version per dictionary. 
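A minimal sketch of the guard logic described above, in pure Python and for illustration only. The version discussed in the PEP lives in the C dict structure; `VersionedDict`, `Guard` and the module-level counter are hypothetical names that merely model the behaviour, and only item assignment and deletion are covered.

```python
import itertools

# Stand-in for the global version counter (an assumption of this sketch).
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Hypothetical pure-Python model of a dict that carries a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        # Storing the identical object again ("is", not "==") keeps the version.
        if key in self and self[key] is value:
            return
        super().__setitem__(key, value)
        self.version = next(_global_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_global_version)


class Guard:
    """Watch one key; keeps a strong reference to the expected value."""

    def __init__(self, d, key):
        self.d = d
        self.key = key
        self.value = d[key]
        self.version = d.version

    def check(self):
        # Fast path: nothing in the dict changed since the last check.
        if self.d.version == self.version:
            return True
        # Slow path: some key changed, so look up the watched key.
        missing = object()
        if self.d.get(self.key, missing) is self.value:
            self.version = self.d.version  # remember the new version
            return True
        return False


ns = VersionedDict(value=3, other=0)
guard = Guard(ns, "value")
ns["other"] = 1                       # unrelated key: version changes
still_valid = guard.check()           # lookup needed, guard still holds
ns["value"] = 50                      # watched key replaced
valid_after_change = guard.check()    # guard fails
```

With a single version per dictionary the guard pays for one lookup whenever any key changes, which is the trade-off against the per-entry version rejected above.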
> (6) I'm also not sure why version_tag *doesn't* solve the problem
> of dicts that fool the iteration guards by mutating without changing
> size ( https://bugs.python.org/issue19332 ) ... are you just saying
> that the iterator views aren't allowed to rely on the version-tag
> remaining stable, because replacing a value (as opposed to a
> key-value pair) is allowed?

If the dictionary values are modified during the loop, the dict version is increased. But it's allowed to modify values when you iterate on *keys*.

Victor

From ianlee1521 at gmail.com Fri Apr 15 16:48:25 2016 From: ianlee1521 at gmail.com (Ian Lee) Date: Fri, 15 Apr 2016 13:48:25 -0700 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: <5711298F.7060308@mrabarnett.plus.com> References: <5711298F.7060308@mrabarnett.plus.com> Message-ID: <3A1DA6EE-DD06-47D1-80D0-BF6822C5B041@gmail.com>

Cross-posting the comment I'd left on the issue [1].

> My preference is to actually break that logic up and avoid the wrapping in the first place, as in [2]. Which in this particular class has the side benefit of that value being used again in the same function anyways.

> I'm starting to realize that Brandon Rhodes really had a big impact on my ideas of styling as I've been learning Python these past few years, as this was another style I'm stealing from that same talk [3].

[1] http://bugs.python.org/msg263509
[2] https://github.com/python/peps/commit/0c790e7b721bd13ad12ab9e6f6206836f398f9c4

~ Ian Lee | IanLee1521 at gmail.com

> On Apr 15, 2016, at 10:49, MRAB wrote:
>
> On 2016-04-15 18:03, Victor Stinner wrote:
> > Hum.
> >
> > if (width == 0
> >     and height == 0
> >     and color == 'red'
> >     and emphasis == 'strong'
> >     or highlight > 100):
> >     raise ValueError("sorry, you lose")
> >
> > Please remove one space to vertically align "and" operators with the
> > opening parenthesis:
> >
> > if (width == 0
> >    and height == 0
> >    and color == 'red'
> >    and emphasis == 'strong'
> >    or highlight > 100):
> >     raise ValueError("sorry, you lose")
> >
> > (I'm not sure that the difference is obvious in a mail client, you
> > need a fixed width font which is not the case in my Gmail editor.)
> >
> > It helps to visually see that the multiline test and the raise
> > instruction are in two different blocks.
> >
> > (Moreover, the pep8 checks of OpenStack simply reject such syntax, but
> > I cannot use this syntax anymore :-))
> >
> I always half-indent continuation lines:
>
> if (width == 0
>   and height == 0
>   and color == 'red'
>   and emphasis == 'strong'
>   or highlight > 100):
>     raise ValueError("sorry, you lose")

From random832 at fastmail.com Fri Apr 15 17:07:26 2016 From: random832 at fastmail.com (Random832) Date: Fri, 15 Apr 2016 17:07:26 -0400 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com> Message-ID: <1460754446.441936.580207105.774469D1@webmail.messagingengine.com>

On Fri, Apr 15, 2016, at 16:41, Victor Stinner wrote:
> If the dictionary values are modified during the loop, the dict
> version is increased. But it's allowed to modify values when you
> iterate on *keys*.

Why is iterating over items different from iterating over keys?
In other words, why do I have to write:

    for k in dict:
        v = dict[k]
        ...do some stuff...
        dict[k] = something

rather than

    for k, v in dict.items():
        ...do some stuff...
        dict[k] = something

It's not clear why the latter is something you want to prevent.

From ethan at stoneleaf.us Fri Apr 15 17:16:22 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 15 Apr 2016 14:16:22 -0700 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com> Message-ID: <57115A26.10402@stoneleaf.us>

On 04/15/2016 01:41 PM, Victor Stinner wrote:
> 2016-04-15 19:54 GMT+02:00 Jim J. Jewett:
>> (2) Why *promise* not to update the version_tag when replacing a
>> value with itself?
>
> It's an useful property. For example, let's say that you have a guard
> on globals()['value']. The guard is created with value=3. An unit test
> replaces the value with 50, but then restore the value to its previous
> value (3). Later, the guard is checked to decide if an optimization
> can be used.

I don't understand -- shouldn't the version be incremented when the value was replaced with 50, and again when re-replaced with 3?

>> (6) I'm also not sure why version_tag *doesn't* solve the problem
>> of dicts that fool the iteration guards by mutating without changing
>> size ( https://bugs.python.org/issue19332 ) ... are you just saying
>> that the iterator views aren't allowed to rely on the version-tag
>> remaining stable, because replacing a value (as opposed to a
>> key-value pair) is allowed?
>
> If the dictionary values are modified during the loop, the dict
> version is increased. But it's allowed to modify values when you
> iterate on *keys*.

I don't understand. Could you provide a small example?
-- ~Ethan~

From victor.stinner at gmail.com Fri Apr 15 17:19:10 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Apr 2016 23:19:10 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: <1460754446.441936.580207105.774469D1@webmail.messagingengine.com> References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com> <1460754446.441936.580207105.774469D1@webmail.messagingengine.com> Message-ID:

2016-04-15 23:07 GMT+02:00 Random832 :
> Why is iterating over items different from iterating over keys?
>
> in other words, why do I have to write:
>
> for k in dict:
>     v = dict[k]
>     ...do some stuff...
>     dict[k] = something
>
> rather than
>
> for k, v in dict.items():
>     ...do some stuff...
>     dict[k] = something
>
> It's not clear why the latter is something you want to prevent.

Hum, I think that you misunderstood what should be prevented. Please see https://bugs.python.org/issue19332

Sorry, I don't know this issue well. I just know that sadly the PEP 509 doesn't help to fix this issue. Maybe it's not worth mentioning it...

Victor

From victor.stinner at gmail.com Fri Apr 15 17:24:21 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Apr 2016 23:24:21 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: <57115A26.10402@stoneleaf.us> References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com> <57115A26.10402@stoneleaf.us> Message-ID:

2016-04-15 23:16 GMT+02:00 Ethan Furman :
>> It's an useful property. For example, let's say that you have a guard
>> on globals()['value']. The guard is created with value=3. An unit test
>> replaces the value with 50, but then restore the value to its previous
>> value (3). Later, the guard is checked to decide if an optimization
>> can be used.
>
> I don't understand -- shouldn't the version be incremented with the value
> was replaced with 50, and again when re-replaced with 3?

Oh wait, I'm tired and you are right.
Not increasing the version only helps on this code:

    dict[key] = value
    dict[key] = value  # version doesn't change

>> If the dictionary values are modified during the loop, the dict
>> version is increased. But it's allowed to modify values when you
>> iterate on *keys*.
>
> I don't understand. Could you provide a small example?

For example, this loop is fine:

    for key in dict:
        dict[key] = None

In this loop, the dict version is increased at each loop iteration.

For iter(dict), the check prevents a crash. The following example raises a RuntimeError("dictionary changed size during iteration"):

    d = {1: 2}
    for k in d:
        d[k+1] = None

Victor

From victor.stinner at gmail.com Fri Apr 15 17:38:51 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Apr 2016 23:38:51 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: Message-ID:

Hi,

FYI I updated the implementation of the PEP 509: https://bugs.python.org/issue26058

2016-04-15 11:01 GMT+02:00 Antoine Pitrou :
> Why do this? It's a nice property that two dicts always have different
> version tags, and now you're killing this property for... no obvious
> reason?
>
> Do you really think dict.clear() is in need of micro-optimizing a
> couple CPU cycles away?

So, I played with Armin's idea. I confirm that it works for my use case, guards on dict keys. It should also work on Yury's use case.

Antoine is right, it's really a micro-optimization. It shouldn't help much for the integer overflow (which is not a real issue in practice). I propose to leave the PEP unchanged to keep the nice property of a unique identifier for empty dictionaries. It can help for future use cases.

Victor

From jimjjewett at gmail.com Fri Apr 15 17:45:32 2016 From: jimjjewett at gmail.com (Jim J.
Jewett) Date: Fri, 15 Apr 2016 17:45:32 -0400 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com> Message-ID: On Fri, Apr 15, 2016 at 4:41 PM, Victor Stinner wrote: > 2016-04-15 19:54 GMT+02:00 Jim J. Jewett : >> (2) Why *promise* not to update the version_tag when replacing a >> value with itself? > It's an useful property. For example, let's say that you have a guard > on globals()['value']. The guard is created with value=3. An unit test > replaces the value with 50, but then restore the value to its previous > value (3). Later, the guard is checked to decide if an optimization > can be used. > If the dict version is increased, you need a lookup. If the dict > version is not increased, the guard is cheap. I would expect the version to be increased twice, and therefore to require a lookup. Are you suggesting that unittest should provide an example of resetting the version back to the original value when it cleans up after itself? > In C, it's very cheap to implement the test "new_value == old_value", > it just compares two pointers. Yeah, I understand that it is likely a win in terms of performance, and a good way to start off (given that you're willing to do the work). I just worry that you may end up closing off even better optimizations later, if you make too many promises about exactly how you will do which ones. Today, dict only cares about ==, and you (reasonably) think that full == isn't always worth running ... but when it comes to which tests *are* worth running, I'm not confident that the answers won't change over the years. >> [2A] Do you want to promise that replacing a value with a >> non-identical object *will* trigger a version_tag update *even* >> if the objects are equal? > It's already written in the PEP: I read that as a description of what the code does, rather than a spec for what it should do... so it isn't clear whether I could count on that remaining true. 
For example, if I know that my dict values are all 4-digit integers, can I write: d[k] = d[k] + 0 and be assured that the version_tag will bump? Or is that something that a future optimizer might optimize out? >> (3) It is worth being explicit on whether empty dicts can share >> a version_tag of 0. If this PEP is about dict content, then that >> seems fine, and it may well be worth optimizing dict creation. > This is not part of the PEP yet. I'm not sure that I will modify the > PEP to use the version 0 for empty dictionaries. Antoine doesn't seem > to be convinced :-) True. But do note that "not hitting the global counter an extra time for every dict creation" is a more compelling reason than "we could speed up dict.clear(), sometimes". >> (4) Please be explicit about the locking around version++; it >> is enough to say that the relevant methods already need to hold >> the GIL (assuming that is true). > I don't think that it's important to mention it in the PEP. It's more > an implementation detail. The version can be protected by atomic > operations. Now I'm the one arguing from a specific implementation. :D My thought was that any sort of locking (including atomic operations) is slow, but if the GIL is already held, then there is no *extra* locking cost. (Well, a slightly longer hold on the lock, but...) >> (5) I'm not sure I understand the arguments around a per-entry >> version. >> On the one hand, you never need a strong reference to the value; >> if it has been collected, then it has obviously been removed from >> the dict and should trigger a change even with per-dict. > > Let's say that you watch the key1 of a dict. The key2 is modified, it > increases the version. Later, you test the guard: to check if the key1 > was modified, you need to lookup the key and compare the value. You > need the value to compare it. And the value for key1 is still there, so you can. 
The only reason you would notice that the key2 value had gone away is if you also care about key2 -- in which case the cached value is out of date, regardless of what specific value it used to hold. >> (6) I'm also not sure why version_tag *doesn't* solve the problem >> of dicts that fool the iteration guards by mutating without changing >> size ( https://bugs.python.org/issue19332 ) ... are you just saying >> that the iterator views aren't allowed to rely on the version-tag >> remaining stable, because replacing a value (as opposed to a >> key-value pair) is allowed? > If the dictionary values are modified during the loop, the dict > version is increased. But it's allowed to modify values when you > iterate on *keys*. Sure. So? I see three cases: (A) I don't care that the collection changed. The python implementation might, but I don't. (So no bug even today.) (B) I want to process exactly the collection that I started with. If some of the values get replaced, then I want to complain, even if python doesn't. version_tag is what I want. (C) I want to process exactly the original keys, but go ahead and use updated values. The bug still bites, but ... I don't think this case is any more common than B. -jJ From victor.stinner at gmail.com Fri Apr 15 19:31:45 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 16 Apr 2016 01:31:45 +0200 Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict In-Reply-To: References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com> Message-ID: .2016-04-15 23:45 GMT+02:00 Jim J. Jewett : >> It's an useful property. For example, let's say that you have a guard >> on globals()['value']. The guard is created with value=3. An unit test >> replaces the value with 50, but then restore the value to its previous >> value (3). Later, the guard is checked to decide if an optimization >> can be used. > >> If the dict version is increased, you need a lookup. If the dict >> version is not increased, the guard is cheap. 
>
> I would expect the version to be increased twice, and therefore to
> require a lookup. Are you suggesting that unittest should provide an
> example of resetting the version back to the original value when it
> cleans up after itself?

Sorry, as I wrote in another email, I was wrong. If you modify the value, the version is increased.

The discussed case is really a corner case: the version does not change if the key is set again to exactly the same value.

    d[key] = value
    d[key] = value

It's just that it's cheap to implement it :-)

>> In C, it's very cheap to implement the test "new_value == old_value",
>> it just compares two pointers.
>
> Yeah, I understand that it is likely a win in terms of performance,
> and a good way to start off (given that you're willing to do the
> work).
>
> I just worry that you may end up closing off even better optimizations
> later, if you make too many promises about exactly how you will do
> which ones.
>
> Today, dict only cares about ==, and you (reasonably) think that full
> == isn't always worth running ... but when it comes to which tests
> *are* worth running, I'm not confident that the answers won't change
> over the years.

I checked: currently there is no unit test for a==b, only for a is b. I will add a test for a==b but a is not b, and ensure that the version is increased.

>>> [2A] Do you want to promise that replacing a value with a
>>> non-identical object *will* trigger a version_tag update *even*
>>> if the objects are equal?
>
>> It's already written in the PEP:
>
> I read that as a description of what the code does, rather than a spec
> for what it should do... so it isn't clear whether I could count on
> that remaining true.
>
> For example, if I know that my dict values are all 4-digit integers,
> can I write:
>
>     d[k] = d[k] + 0
>
> and be assured that the version_tag will bump? Or is that something
> that a future optimizer might optimize out?

Hum, I will try to clarify that.
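The difference between identity and equality in this rule can be sketched as follows. `would_bump` is a hypothetical helper, not part of any API; it only models the comparison that the PEP text quoted earlier describes.

```python
from fractions import Fraction


def would_bump(old_value, new_value):
    """Model of the rule: the version changes unless the very same object is stored again."""
    return new_value is not old_value


one = Fraction(1)
same_object = would_bump(one, one)        # identical object: no bump
equal_but_distinct = would_bump(one, 1)   # Fraction(1) == 1, yet they are different objects
```

Whether an expression such as `d[k] + 0` yields the same object or a new one is an interpreter implementation detail, which is why the question of what the PEP promises matters.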
>>> (4) Please be explicit about the locking around version++; it
>>> is enough to say that the relevant methods already need to hold
>>> the GIL (assuming that is true).
>
>> I don't think that it's important to mention it in the PEP. It's more
>> an implementation detail. The version can be protected by atomic
>> operations.
>
> Now I'm the one arguing from a specific implementation. :D
>
> My thought was that any sort of locking (including atomic operations)
> is slow, but if the GIL is already held, then there is no *extra*
> locking cost. (Well, a slightly longer hold on the lock, but...)

Hum, since the PEP clearly targets CPython, I will simply describe its implementation and explain that the GIL ensures that version++ is atomic.

>>> On the one hand, you never need a strong reference to the value;
>>> if it has been collected, then it has obviously been removed from
>>> the dict and should trigger a change even with per-dict.
>>
>> Let's say that you watch the key1 of a dict. The key2 is modified, it
>> increases the version. Later, you test the guard: to check if the key1
>> was modified, you need to lookup the key and compare the value. You
>> need the value to compare it.
>
> And the value for key1 is still there, so you can.

Sorry, how do you want to check that the dict[key1] value didn't change using only the value's identifier? id(dict[key1]) == old_value_id?

The problem with storing an identifier (a pointer in C) with no strong reference is that when the object is destroyed, a new object can get the same identifier. So "id(dict[key]) == old_value_id" can be true even if dict[key] is now a new object.

> The only reason you would notice that the key2 value had gone away is
> if you also care about key2 -- in which case the cached value is out
> of date, regardless of what specific value it used to hold.

I don't understand: technically, what do you mean by "out of date" for an object?
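The concern about identifiers without a strong reference can be sketched in Python. This is an illustration under stated assumptions: `Value`, `watch_by_id` and `watch_by_reference` are hypothetical names, and the prompt release of the old object relies on reference counting as in CPython.

```python
import gc
import weakref


class Value:
    """Placeholder value; a plain object() cannot be weakly referenced."""


def watch_by_id(d, key):
    # Stores only the identifier: nothing keeps the value alive.
    return id(d[key])


def watch_by_reference(d, key):
    # Stores the object itself: the value lives as long as the guard does.
    return d[key]


d = {"key1": Value()}
probe = weakref.ref(d["key1"])      # observes whether the old value is alive

old_id = watch_by_id(d, "key1")
kept = watch_by_reference(d, "key1")

d["key1"] = Value()                 # replace the watched value
gc.collect()
alive_with_strong_ref = probe() is not None   # the guard's reference keeps it alive

del kept
gc.collect()
alive_without_ref = probe() is not None       # the old value can now be freed
```

Once the old value is freed, `old_id` may be handed to a brand-new object (objects with non-overlapping lifetimes may share an `id()`), so a matching identifier alone proves nothing.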
>> If the dictionary values are modified during the loop, the dict
>> version is increased. But it's allowed to modify values when you
>> iterate on *keys*.
>
> Sure. So?
>
> I see three cases:
>
> (A) I don't care that the collection changed. The python
> implementation might, but I don't. (So no bug even today.)

I'm sorry, I don't understand your description. What do you mean by "collection"? It's different if you modify dict *keys*, or dict *values*, or both.

Serhiy opened an issue because he wants to raise an exception if keys are modified while you iterate on keys: https://bugs.python.org/issue19332

But only modifying values must *not* raise an exception.

> (B) I want to process exactly the collection that I started with. If
> some of the values get replaced, then I want to complain, even if
> python doesn't. version_tag is what I want.

This is not the issue #19332.

> (C) I want to process exactly the original keys, but go ahead and use
> updated values. The bug still bites, but ... I don't think this case
> is any more common than B.

I don't understand your definition exactly either. Maybe you could provide a code example.

Sorry, I don't understand why you want to discuss issue #19332 here. I only mentioned the issue in "Prior Work" because the implementation is *similar*, but the PEP 509 is different and so it doesn't help to fix this issue.

Do you want to modify the PEP 509 to fix this issue? Or do you not understand why the PEP 509 cannot be used to fix the issue? I'm lost...
Victor From pludemann at google.com Fri Apr 15 23:46:51 2016 From: pludemann at google.com (Peter Ludemann) Date: Fri, 15 Apr 2016 20:46:51 -0700 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: <3A1DA6EE-DD06-47D1-80D0-BF6822C5B041@gmail.com> References: <5711298F.7060308@mrabarnett.plus.com> <3A1DA6EE-DD06-47D1-80D0-BF6822C5B041@gmail.com> Message-ID: If Python ever adopts the BCPL rule for implicit line continuation if the last thing on a line is an operator (or if there's an open parentheses), then the break-after-an-operator rule would be more persuasive. ;) [IIRC, the BCPL rule was that there was an implicit continuation if the grammar would not allow inserting a semicolon at the end of the line, which covered both the open-parens and last-item-is-operator cases, and probably a few others.] But I should shut up and leave shut discussions to python-ideas. On 15 April 2016 at 13:48, Ian Lee wrote: > Cross posting the comment I?d left on the issue [1]. > > > My preference is to actually break that logic up and avoid the wrapping > in the first place, as in [2]. Which in this particular class has the side > benefit of that value being used again in the same function anyways. > > > I'm starting to realize that Brandon Rhodes really had a big impact on > my ideas of styling as I've been learning Python these past few years, as > this was another one style I'm stealing from that same talk [3]. > > [1] http://bugs.python.org/msg263509 > [2] > https://github.com/python/peps/commit/0c790e7b721bd13ad12ab9e6f6206836f398f9c4 > > ~ Ian Lee | IanLee1521 at gmail.com > > On Apr 15, 2016, at 10:49, MRAB wrote: > > On 2016-04-15 18:03, Victor Stinner wrote: > > Hum. 
> > > > if (width == 0 > > and height == 0 > > and color == 'red' > > and emphasis == 'strong' > > or highlight > 100): > > raise ValueError("sorry, you lose") > > > > Please remove one space to vertically align "and" operators with the > > opening parenthesis: > > > > if (width == 0 > > and height == 0 > > and color == 'red' > > and emphasis == 'strong' > > or highlight > 100): > > raise ValueError("sorry, you lose") > > > > (I'm not sure that the difference is obvious in a mail client, you > > need a fixed width font which is not the case in my Gmail editor.) > > > > It helps to visually see that the multiline test and the raise > > instruction are in two different blocks. > > > > (Moreover, the pep8 checks of OpenStack simply reject such syntax, but > > I cannot use this syntax anymore :-)) > > > I always half-indent continuation lines: > > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/ianlee1521%40gmail.com > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/pludemann%40google.com > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Sat Apr 16 00:09:23 2016 From: random832 at fastmail.com (Random832) Date: Sat, 16 Apr 2016 00:09:23 -0400 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: References: <5711298F.7060308@mrabarnett.plus.com> <3A1DA6EE-DD06-47D1-80D0-BF6822C5B041@gmail.com> Message-ID: <1460779763.2138868.580401713.09908C97@webmail.messagingengine.com> On Fri, Apr 15, 2016, at 23:46, Peter Ludemann via Python-Dev wrote: > If Python ever adopts the BCPL rule for implicit line continuation if > the last thing on a line is an operator (or if there's an open > parentheses), then the break-after-an-operator rule would be more > persuasive. ;) > > [IIRC, the BCPL rule was that there was an implicit continuation if > the grammar would not allow inserting a semicolon at the end of the > line, which covered both the open-parens and last-item-is-operator > cases, and probably a few others.] But I should shut up and leave shut > discussions to python-ideas. Sounds like Visual Basic. Meanwhile, Javascript's rule is that there's an implicit semicolon if and only if the grammar would not allow the two lines to be considered as a single statement. Insanity comes in all flavors. From stephen at xemacs.org Sat Apr 16 07:21:54 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 16 Apr 2016 20:21:54 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> Message-ID: <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > On 15 April 2016 at 00:52, Stephen J. Turnbull wrote: > > Nick Coghlan writes: > > > > > The use case for returning bytes from __fspath__ is DirEntry, so you > > > can write things like this in low level code: > > > > > > def myscandir(dirpath): > > > for entry in os.scandir(dirpath): > > > if entry.is_file(): > > > with open(entry) as f: > > > # do something > > > > Excuse me, but that is *not* a use case for returning bytes from > > DirEntry.__fspath__. open() is perfectly happy taking str (including > > surrogate-encoded rawbytes). > > That results in a different type for the file object's name: > > >>> open("README.md").name > 'README.md' > >>> open(b"README.md").name > b'README.md' OK, you win, __fspath__ needs to be polymorphic. But you've just shifted me to -1 on "os.fspath": it's an attractive nuisance. EIBTI, applications and high-level library functions should use os.fsdecode or os.fsencode. Functions that take a polymorphic argument and want preserve type should invoke __fspath__ on the argument. That will visually signal that the caller is not merely low-level, but is explicitly a boundary function. 
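The type-preservation point can be sketched with the API as it eventually shipped in Python 3.6 (`os.fspath`, and `os.fsencode`/`os.fsdecode` accepting path-like objects), which postdates this thread. `StrEntry` and `BytesEntry` are hypothetical stand-ins for objects such as `DirEntry`.

```python
import os


class StrEntry:
    """Hypothetical path-like object whose __fspath__ returns str."""

    def __fspath__(self):
        return "README.md"


class BytesEntry:
    """Hypothetical path-like object whose __fspath__ returns bytes."""

    def __fspath__(self):
        return b"README.md"


entries = (StrEntry(), BytesEntry())

# os.fspath is polymorphic: it returns whatever type __fspath__ produced.
polymorphic = [type(os.fspath(e)) for e in entries]

# os.fsdecode / os.fsencode pin the result to one type regardless of the input.
as_text = [os.fsdecode(e) for e in entries]
as_bytes = [os.fsencode(e) for e in entries]
```

Code that calls `os.fsdecode` or `os.fsencode` states which domain it works in; code that calls the polymorphic function inherits the type chosen by whoever produced the object.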
(You could rename the generic function as "os._fspath", I guess, but I *really* want to deprecate calling the polymorphic version in user code. _fspath can be added if experience shows that polymorphic usage is very desirable outside the stdlib. This remark is in my not-so-Dutch opinion, of course.)

> The guarantee we want to provide those folks is that if they're
> operating in the binary domain they'll stay there.

Et tu, Nick? "Guarantee"?! You can't guarantee any such thing with an implicitly invoked polymorphic API like this one -- unless you consider a crashed program to be in the binary domain. ;-) Note that the current proposals don't even do that for the binary domain, only for the text domain!

From p.f.moore at gmail.com Sat Apr 16 08:05:25 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 16 Apr 2016 13:05:25 +0100 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> Message-ID:

On 16 April 2016 at 12:21, Stephen J. Turnbull wrote:
> OK, you win, __fspath__ needs to be polymorphic.
>
> But you've just shifted me to -1 on "os.fspath": it's an attractive
> nuisance. EIBTI, applications and high-level library functions should
> use os.fsdecode or os.fsencode.

I presume your expectation is that os.fsencode/os.fsdecode will work with objects supporting the __fspath__ protocol?
So the question for me is, if I'm writing a function that takes a path argument p (in the most general sense - I want my function to be able to handle anything the stdlib functions can) then how do I write the code? There are 4 cases I can think of: 1. I just want to pass the argument on to other functions - just do so, stdlib functions will work fine. 2. I need a string - use os.fsdecode(p) 3. I need bytes - use os.fsencode(p) 4. I need a guaranteed pathlib.Path object so that I can use Path methods - convert via pathlib.Path(os.fsdecode(p)) I guess there's the possibility that you want to deliberately reject bytes-like paths, and it's not immediately obvious how you'd do that without os.fspath or using the __fspath__ protocol directly, but I'm not sure what anyone gains by doing so (maybe the chance to fail early? but doesn't using fsdecode mean I never need to fail at all?) While I don't have any specific reason to object to os.fspath, I'd appreciate someone describing a concrete use case that needs it (and isn't covered by any of the options above). Paul From francismb at email.de Sat Apr 16 09:29:41 2016 From: francismb at email.de (francismb) Date: Sat, 16 Apr 2016 15:29:41 +0200 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: References: Message-ID: <57123E45.9050902@email.de> Hi, On 04/15/2016 07:43 PM, Guido van Rossum wrote: > The update is already serving its real purpose: showing that style is > debatable and cannot always easily be reduced to fixed rules. > As you said, there will be always some kind personal preferences or style taste and one can see on the debate that the current rules are context dependent. But I wonder how far that style context/rule (function) evaluation/application issue could be solved in a machine learning context. Regards, francis From stephen at xemacs.org Sat Apr 16 09:46:02 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 16 Apr 2016 22:46:02 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> Message-ID: <22290.16922.481493.207376@turnbull.sk.tsukuba.ac.jp> Paul Moore writes: > On 16 April 2016 at 12:21, Stephen J. Turnbull wrote: > > OK, you win, __fspath__ needs to be polymorphic. > > > > But you've just shifted me to -1 on "os.fspath": it's an attractive > > nuisance. EIBTI, applications and high-level library functions should > > use os.fsdecode or os.fsencode. > > I presume your expectation is that os.fsencode/os.fsdecode will work > with objects supporting the __fspath__ protocol? Yes, I've suggested that before, and I think it's TOOWTDI, rather than insisting on a os.fspath intervening, even if os.fspath is included after all. > So the question for me is, if I'm writing a function that takes a path > argument p: > 1. I just want to pass the argument on to other functions - just do > so, stdlib functions will work fine. I think this is a bad idea unless you *need* polymorphism, but OK, it's "consenting adults". > 2. I need a string - use os.fsdecode(p) > 3. I need bytes - use os.fsencode(p) > 4. I need a guaranteed pathlib.Path object so that I can use Path > methods - convert via pathlib.Path(os.fsdecode(p)) LGTM. Applications or user toolkits could provide a derived IFeelLuckyPath(Path) for symmetry with the os functions. 
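The three conversion cases quoted above (2, 3 and 4) can be sketched as small helpers. This is only an illustration, assuming an interpreter in which os.fsdecode and os.fsencode already accept objects implementing __fspath__ (as they do from Python 3.6 onward); the helper names as_text, as_raw and as_path are made up for this sketch and are not part of any proposal in the thread.

```python
import os
import pathlib


def as_text(p):
    """Case 2: whatever path-like thing comes in, hand back str."""
    return os.fsdecode(p)


def as_raw(p):
    """Case 3: whatever path-like thing comes in, hand back bytes."""
    return os.fsencode(p)


def as_path(p):
    """Case 4: guarantee a pathlib.Path so that Path methods are available."""
    return pathlib.Path(os.fsdecode(p))


# str, bytes and Path inputs all end up in the requested domain.
text_results = [as_text(x) for x in ("spam.txt", b"spam.txt", pathlib.Path("spam.txt"))]
raw_results = [as_raw(x) for x in ("spam.txt", b"spam.txt", pathlib.Path("spam.txt"))]
path_result = as_path(b"spam.txt")
```

Case 1 needs no helper at all: the argument is simply passed on unchanged to the stdlib function.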
> I guess there's the possibility that you want to deliberately reject > bytes-like paths, I wouldn't put it that way. I think more likely is the possibility that you want to restrict yourself to a particular type, as all your code is written in terms of that type and expects that type. Note that Nick's example shows that in both the bytes domain and the text domain you can easily end up with a filelike.name of the wrong type. > and it's not immediately obvious how you'd do that without > os.fspath or using the __fspath__ protocol directly, but I'm not > sure what anyone gains by doing so (maybe the chance to fail early? > but doesn't using fsdecode mean I never need to fail at all?) Well, wouldn't you like to raise there if your dataflow spec says only one type should ever be observed? The reasons that I wouldn't bother are that (1) I suspect it's going to be very rare to see bytes in a text application, and (2) in bytes- oriented code I would be fairly likely to either specify literals as str (a bug, but nobody would ever notice) or importing them from an .ini or other text source (which might very well be in a non- filesystem encoding in my environment!) In either case it's probably the filename I want but specified in the wrong form. From stephen at xemacs.org Sat Apr 16 09:48:47 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 16 Apr 2016 22:48:47 +0900 Subject: [Python-Dev] PEP 8 updated on whether to break before or after a binary update In-Reply-To: References: Message-ID: <22290.17087.797345.923061@turnbull.sk.tsukuba.ac.jp> Victor Stinner writes: > Hum. 
> > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") > > Please remove one space to vertically align "and" operators with the > opening parenthesis: > > if (width == 0 > and height == 0 > and color == 'red' > and emphasis == 'strong' > or highlight > 100): > raise ValueError("sorry, you lose") The RightThang[tm] is to remove "if" and replace it with the Japanese "moshi": moshi (width == 0 and height == 0 and color == 'red' and emphasis == 'strong' or highlight > 100): raise ValueError("sorry, you lose") It-works-for-me-ly y'rs, From p.f.moore at gmail.com Sat Apr 16 12:30:15 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 16 Apr 2016 17:30:15 +0100 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22290.16922.481493.207376@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22290.16922.481493.207376@turnbull.sk.tsukuba.ac.jp> Message-ID: On 16 April 2016 at 14:46, Stephen J. Turnbull wrote: > Paul Moore writes: [...] > > 1. I just want to pass the argument on to other functions - just do > > so, stdlib functions will work fine. > > I think this is a bad idea unless you *need* polymorphism, but OK, > it's "consenting adults". 
All I'm really saying here is that if you don't need to care about type checking (and 99% of Python programs rely on duck typing, so this is pretty much the norm) then everything will be OK. I'm not suggesting encouraging polymorphism, just pointing out that most code should simply work and this whole debate is a non-issue for code like that. (That's the whole point of getting the stdlib functions to accept Path objects, after all :-)) > > 2. I need a string - use os.fsdecode(p) > > 3. I need bytes - use os.fsencode(p) > > 4. I need a guaranteed pathlib.Path object so that I can use Path > > methods - convert via pathlib.Path(os.fsdecode(p)) > > LGTM. Applications or user toolkits could provide a derived > IFeelLuckyPath(Path) for symmetry with the os functions. > > > I guess there's the possibility that you want to deliberately reject > > bytes-like paths, > > I wouldn't put it that way. I think more likely is the possibility > that you want to restrict yourself to a particular type, as all your > code is written in terms of that type and expects that type. Note > that Nick's example shows that in both the bytes domain and the text > domain you can easily end up with a filelike.name of the wrong type. But within your own code, you do that by convention and good coding practices, not by explicit type checks (except in boundary code). If you're writing a library to be used by others, you should be as permissive as possible - you may not expect your code to be called with bytes-like paths, but why go out of your way to reject it? That's not Pythonic, IMO. (On the other hand, documenting that only text-like path objects are supported by your library is fine). In my experience, bytes/text safety is about being aware of where the two different types appear in your program, not about forcing only one type. So my cases are about keeping the types clear - the output of (1) is "same as input", of (2) is "string", of (3) is "bytes" and of (4) is "Path". 
Call me with whatever you like, I can work with it in terms I need. But we're mostly just debating coding style here, I think we agree on the basic principle. > > and it's not immediately obvious how you'd do that without > > os.fspath or using the __fspath__ protocol directly, but I'm not > > sure what anyone gains by doing so (maybe the chance to fail early? > > but doesn't using fsdecode mean I never need to fail at all?) > > Well, wouldn't you like to raise there if your dataflow spec says only > one type should ever be observed? Meh. Maybe asserts, maybe unit tests. But typechecks throughout my code sounds more like strong typing than Python. But as I say, coding style - I write scripts, glue code, and general-use libraries. None of these lend themselves to that sort of rigorous dataflow analysis (this is the same reason I have little personal use for the new typechecking stuff). > The reasons that I wouldn't bother are that (1) I suspect it's going > to be very rare to see bytes in a text application, and (2) in bytes- > oriented code I would be fairly likely to either specify literals as > str (a bug, but nobody would ever notice) or importing them from an > .ini or other text source (which might very well be in a non- > filesystem encoding in my environment!) In either case it's probably > the filename I want but specified in the wrong form. 
Also, that feels very much like the sort of boundary code that needs to do the fiddly rigorous stuff so the rest of us don't have to :-) Paul From chris.barker at noaa.gov Sat Apr 16 14:47:26 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Sat, 16 Apr 2016 11:47:26 -0700 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us> <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us> Message-ID: <8690193008049818583@unknownmsgid> > On Apr 13, 2016, at 8:31 PM, Nick Coghlan wrote: > >>> class Special(bytes): >>> def __fspath__(self): >>> return 'str-val' >>> obj = Special('bytes-val', 'utf8') >>> path_obj = fspath(obj, allow_bytes=True) >>> >>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'. > > In this kind of case, inheritance tends to trump protocol. Sure, but... > example, int subclasses can't override operator.index: ... > The reasons for that behaviour are more pragmatic than philosophical: > builtins and their subclasses are extensively special-cased for speed > reasons, OK, but in this case, purity can beat practicality. If the author writes an __fspath__ method, presumably it's because it should be used. And I can certainly imagine one might want to store a path representation as bytes, but NOT want the raw bytes passed off to file handling libs. (of course you could use composition rather than subclassing if you had to) -CHB From gunkmute at gmail.com Sat Apr 16 20:04:57 2016 From: gunkmute at gmail.com (Demur Rumed) Date: Sun, 17 Apr 2016 00:04:57 +0000 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units Message-ID: The outstanding bug with this patch right now is a regression in line numbers causing the test for http://bugs.python.org/issue9936 to fail. 
I've tried to debug it without success From ncoghlan at gmail.com Sat Apr 16 21:28:09 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 17 Apr 2016 11:28:09 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> Message-ID: On 16 April 2016 at 21:21, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > On 15 April 2016 at 00:52, Stephen J. Turnbull wrote: > > > Nick Coghlan writes: > > > > > > > The use case for returning bytes from __fspath__ is DirEntry, so you > > > > can write things like this in low level code: > > > > > > > > def myscandir(dirpath): > > > > for entry in os.scandir(dirpath): > > > > if entry.is_file(): > > > > with open(entry) as f: > > > > # do something > > > > > > Excuse me, but that is *not* a use case for returning bytes from > > > DirEntry.__fspath__. open() is perfectly happy taking str (including > > > surrogate-encoded rawbytes). > > > > That results in a different type for the file object's name: > > > > >>> open("README.md").name > > 'README.md' > > >>> open(b"README.md").name > > b'README.md' > > OK, you win, __fspath__ needs to be polymorphic. > > But you've just shifted me to -1 on "os.fspath": it's an attractive > nuisance.
> > EIBTI, applications and high-level library functions should > use os.fsdecode or os.fsencode. Functions that take a polymorphic > argument and want preserve type should invoke __fspath__ on the > argument. That will visually signal that the caller is not merely > low-level, but is explicitly a boundary function. str and bytes aren't going to implement __fspath__ (since they're only *sometimes* path objects), so asking people to call the protocol method directly for any purpose would be a pain. > (You could rename > the generic function as "os._fspath", I guess, but I *really* want to > deprecate calling the polymorphic version in user code. _fspath can > be added if experience shows that polymorphic usage is very desireable > outside the stdlib. This remark is in my not-so-Dutch opinion, of > course.) You may have missed my email where I agreed os.fspath() itself needs to ensure the output is a str object and throw an exception otherwise. The remaining API design debate relates to whether the polymorphic version should be "os.fspath(obj, allow_bytes=True)" or "os._raw_fspath(obj)" (with Ethan favouring the former, and me the latter). > > > The guarantee we want to provide those folks is that if they're > > operating in the binary domain they'll stay there. > > Et tu, Nick? "Guarantee"?! You can't guarantee any such thing with > an implicitly invoked polymorphic API like this one -- unless you > consider a crashed program to be in the binary domain. ;-) I do, as one of the core changes in design philosophy between Python 2 and 3 is attempting to remove the implicit level shifting between the binary and text domains, and instead throw exceptions in those cases. Pragmatism requires us to keep some of them (e.g. 
the codecs module is officially object<->object in both Python 2 and Python 3, and string formatting codes can still do unexpected things), but a great many of them are already gone, and we don't want to add any new ones if alternative designs are available. > Note that > the current proposala don't even do that for the binary domain, only > for the text domain! Folks that want to ensure they're working in the binary domain can already do "memoryview(obj)" to ensure they have a bytes-like object without constraining it to a specific type. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Apr 16 21:38:11 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 17 Apr 2016 11:38:11 +1000 Subject: [Python-Dev] pathlib - current status of discussions In-Reply-To: <8690193008049818583@unknownmsgid> References: <570C1E13.4090909@stoneleaf.us> <87vb3lcem4.fsf@thinkpad.rath.org> <570ECF2E.1070004@stoneleaf.us> <87oa9c3nii.fsf@vostro.rath.org> <570F0B24.50705@stoneleaf.us> <8690193008049818583@unknownmsgid> Message-ID: On 17 April 2016 at 04:47, Chris Barker - NOAA Federal wrote: >> On Apr 13, 2016, at 8:31 PM, Nick Coghlan wrote: >> >>>> class Special(bytes): >>>> def __fspath__(self): >>>> return 'str-val' >>>> obj = Special('bytes-val', 'utf8') >>>> path_obj = fspath(obj, allow_bytes=True) >>>> >>>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'. >> >> In this kind of case, inheritance tends to trump protocol. > > Sure, but... > >> example, int subclasses can't override operator.index: > ... >> The reasons for that behaviour are more pragmatic than philosophical: >> builtins and their subclasses are extensively special-cased for speed >> reasons, > > OK, but in this case, purity can beat practicality. If the author > writes an __fspath__ method, presumably it's because it should be > used. 
> > And I can certainly imagine one might want to store a path > representation as bytes, but NOT want the raw bytes passed off to file > handling libs. > > (of course you could use composition rather than subclassing if you had to) Exactly - inheritance is a really strong relationship that directly affects the in-memory layout of instances (at least in CPython), and also the kinds of assumption other code will make about that type (for example, subclasses are special cased to allow them to override the behaviour of numeric binary operators when they appear as the right operand with an instance of the parent type as the left operand, while with unrelated types, the left operand always gets the first chance to handle the operation). When folks don't want to trigger those "this is an " behaviours, the appropriate design pattern is composition, not inheritance (and many of the ABCs were introduced to make it easier to implement particular interfaces without inheriting from the corresponding builtin types). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Sun Apr 17 04:03:54 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sun, 17 Apr 2016 17:03:54 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> Message-ID: <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > str and bytes aren't going to implement __fspath__ (since they're > only *sometimes* path objects), so asking people to call the > protocol method directly for any purpose would be a pain. It *should* be a pain. People who need bytes should call fsencode, people who need str should call fsdecode, and Ethan's antipathy checks for bytes and str, then calls __fspath__ if needed. Who's left? Just the bartender and the janitor, last call was hours ago. OK, maybe there are enough clients to make it worthwhile to provide the utility, but it should be clearly marked as "double opt-in, for experts only (consenting adults must show proof of insurance)". The functionality of raising on wrong types can be incorporated in fsencode and fsdecode, but I think there's still some discussion needed about the conditions for raising, and what flags are needed. Of course with this reinterpretation, names like "fs_ensure_str" and "fs_ensure_bytes" might be more appropriate (much as y'all hate putting types in function names, in this case I think that's best). But backward compatibility, and the existing names aren't *that* bad I guess. 
> You may have missed my email where I agreed os.fspath() itself > needs to ensure the output is a str object and throw an exception > otherwise. Presumably it should do the same for bytes when those are desired, though. I don't find the "cast to bytes using memoryview" approach plausible, especially not where I live: if str, very likely some of the characters will be outside of the latin1 repertoire, and thus the internal representation will likely be full of NULs, and certainly not be what the user wants. > The remaining API design debate relates to whether the polymorphic > version should be "os.fspath(obj, allow_bytes=True)" or > "os._raw_fspath(obj)" (with Ethan favouring the former, and me the > latter). > > Et tu, Nick? "Guarantee"?! You can't guarantee any such thing > > with an implicitly invoked polymorphic API like this one -- > > unless you consider a crashed program to be in the binary > > domain. ;-) > > I do, as one of the core changes in design philosophy between > Python 2 and 3 is attempting to remove the implicit level shifting > between the binary and text domains, Hey, Reverend, I've been singing those hymns since the early '90s. > and instead throw exceptions in those cases. Then I don't understand the current design of fsdecode and fsencode. Shouldn't they raise on str and bytes respectively, rather than passing them through? In general, I would expect that something that's explicitly intended to be polymorphic would be documented as such, and the *caller* would be responsible for type-checking and raising if it got the wrong thing. 
Steve From ncoghlan at gmail.com Sun Apr 17 08:36:15 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 17 Apr 2016 22:36:15 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> Message-ID: On 17 April 2016 at 18:03, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > and instead throw exceptions in those cases. > > Then I don't understand the current design of fsdecode and fsencode. > Shouldn't they raise on str and bytes respectively, rather than > passing them through? In general, I would expect that something > that's explicitly intended to be polymorphic would be documented as > such, and the *caller* would be responsible for type-checking and > raising if it got the wrong thing. > I was initially surprised myself, but then realised it made sense for their intended use cases - if almost every usage looks like "obj if isinstance(obj, str) else os.fsdecode(obj)", then there ends up being a strong pragmatic case for pushing the pass-through down into the underlying function to reduce code duplication and rejecting str input in the cases where it isn't supported. By contrast, there are lots of places where "obj.decode()" gets called without a pass-through for objects that are already decoded. Cheers, Nick. 
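The pass-through behaviour described above can be checked directly. A minimal sketch, using only os.fsdecode, os.fsencode and the absence of str.decode in Python 3; the filename is an arbitrary example and the flag name already_text_rejected is made up for this illustration.

```python
import os

name = "README.md"

# os.fsdecode passes str through unchanged and decodes bytes, so callers
# do not need to write "obj if isinstance(obj, str) else os.fsdecode(obj)".
assert os.fsdecode(name) == name
assert os.fsdecode(b"README.md") == name

# os.fsencode is the mirror image: bytes pass through, str is encoded.
assert os.fsencode(b"README.md") == b"README.md"
assert os.fsencode(name) == b"README.md"

# By contrast, "obj.decode()" has no pass-through: str has no decode()
# method in Python 3, so already-decoded input fails with AttributeError.
try:
    name.decode()
except AttributeError:
    already_text_rejected = True
else:
    already_text_rejected = False
```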
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From k7hoven at gmail.com Sun Apr 17 09:58:19 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 17 Apr 2016 16:58:19 +0300 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> Message-ID: On Sun, Apr 17, 2016 at 11:03 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > str and bytes aren't going to implement __fspath__ (since they're > > only *sometimes* path objects), so asking people to call the > > protocol method directly for any purpose would be a pain. > > It *should* be a pain. People who need bytes should call fsencode, > people who need str should call fsdecode, and Ethan's antipathy checks > for bytes and str, then calls __fspath__ if needed. Who's left? Just > the bartender and the janitor, last call was hours ago. OK, maybe > there are enough clients to make it worthwhile to provide the utility, > but it should be clearly marked as "double opt-in, for experts only > (consenting adults must show proof of insurance)".
My doubts, expressed several times in these threads, about the need for a *public* os.fspath function to complement the __fspath__ protocol, are now perhaps gone. I'll explain why (and how). The reasons for my doubts were that

(1) The audience outside the stdlib for such a function should be small, because it is preferred to either use existing tools in os.path.* or pathlib (or similar) for manipulating paths.

(2) There are just too many different possible versions of this function: rejecting str, rejecting bytes, coercion to str, coercion to bytes, and accepting both str and bytes. That's a total of 5 different cases. People also used to talk about versions that would not allow passing through objects that are already bytes or str. That would make it a total of 10 different versions! (in principle, there could be even more, but let's not go there :-). In other words, this argument was that it is probably best to implement whatever flavor is needed for the context, perhaps based on documented recipes.

Regarding (2), we can first rule out half of the 10 cases---the ones that reject plain instances of bytes and/or str---because they would not be very useful as all the isinstance/hasattr checking etc. would be left to the caller. And here are the remaining five, explained based on what they accept as argument, what they return, and where they would be used:

(A) "polymorphic"
*Accept*: str and bytes, provided via __fspath__ as well as plain str and bytes instances.
*Return*: str/bytes depending on input.
*Audience*: the stdlib, including os.path.things, os.things, shutil.things, open, ... (some functions would need a C version). There may even be a small audience outside the stdlib.

(B) "str-based only"
*Accept*: str, provided via __fspath__ as well as plain str.
*Return*: str.
*Audience*: relatively low-level code that works exclusively with str paths but accepts specialized path objects as input.

(C) "bytes-based only"
*Accept*: bytes, provided via __fspath__ as well as plain bytes.
*Return*: bytes.
*Audience*: low-level code that explicitly deals with paths as bytes (probably to deal with undefined/ill-defined encodings).

(D) "coerce to str"
*Accept*: str and bytes, provided via __fspath__ as well as plain str and bytes instances.
*Return*: str (coerced / decoded if needed).
*Audience*: code that deals explicitly with str but wants to 'try' supporting bytes-based path inputs too via implicit decoding (even if it may result in surrogate escapes, which one cannot for instance print(...).)

(E) "coerce to bytes"
*Accept*: str and bytes, provided via __fspath__ as well as plain str and bytes instances.
*Return*: bytes (coerced / encoded if needed).
*Audience*: low-level code that explicitly deals with bytes paths but wants to accept str-based path inputs too via implicit encoding.

Even if all options (A-E) probably have small audiences (compared to e.g. os.path.*), some of them have larger audiences than others. But all of them have at least *some* reasonable audience (as described above).

Recently (well, a few days ago, but 'recently', considering the scale of these discussions anyway ;-), Nick pointed out something I hadn't realized---os.fsencode and os.fsdecode actually already implement coercion to bytes and str, respectively. With those two functions made compatible with the __fspath__ protocol [using (A) above], they would in fact *be* (D) and (E), respectively. Now, we only have options (A-C) left.
They could all be implemented roughly as follows:

    def fspath(pathlike, *, output_types = (str,)):
        if hasattr(pathlike, '__fspath__'):
            ret = pathlike.__fspath__() # or pathlike.__fspath__ if it's not a method
        else:
            ret = pathlike
        if not isinstance(ret, output_types):
            raise TypeError("argument is not and does not provide an acceptable pathname")
        return ret

With an implementation like the above, (A) would correspond to output_types = (str, bytes), (B) to the default, and (C) to output_types = (bytes,). So, with the above considerations as a counterargument, I consider argument (2) gone.

What about argument (1), that the audience for the os.fspath(...) function (especially for one selected version of the 5 or 10 variations!) is quite small, and we should not encourage manipulating pathnames by hand, but to use os.path.* or pathlib instead?

The counterargument for (1): It seems to me we now "all" agree that __fspath__ should allow str+bytes polymorphism. I could try to list who I mean by "all" (Ethan, Brett, Stephen T, Nick, ... ?), but obviously I won't be able to list all or speak for them so I won't even try :-). Anyway, for this argument, I'm assuming we agree on that. So, __fspath__ can provide either str or bytes, even if str is *highly preferred* in most places. Therefore, the os.fspath function, as part of the protocol, has the important role of *by default* rejecting bytes, so that the protocol effectively becomes str-only by default. With the fspath implementation like the one I drafted above, and os.fsencode+os.fsdecode, we in fact cover all cases (A-E).

So, as a summary: With a str+bytes-polymorphic __fspath__, with the above argumentation and the rough implementation of os.fspath(...), the conclusion is that the os.fspath function should indeed be public, and that no further variations are needed.

-Koos

P.S. There is also the possibility of two dunder methods corresponding to str and bytes, leading to one being preferred over the other in some cases etc.
I have gone through various aspects and possible versions of that approach, but concluded it's not worth it, as some of us may also have implied in earlier posts. After all, we want something that's *almost* exclusively str. From ericfahlgren at gmail.com Sun Apr 17 12:08:19 2016 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Sun, 17 Apr 2016 09:08:19 -0700 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: References: Message-ID: <008901d198c3$5cba8f70$162fae50$@gmail.com> Just on the off chance that it's related, could it have something to do with the bug in findlabels? http://bugs.python.org/issue26448 (I have high confidence that my patch fixes the problem, just haven't gotten around to completing the tests.) From: Demur Rumed [mailto:gunkmute at gmail.com] Sent: Saturday, April 16, 2016 17:05 To: python-dev at python.org Subject: Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units The outstanding bug with this patch right now is a regression in line numbers causing the test for http://bugs.python.org/issue9936 to fail. I've tried to debug it without success
From ethan at stoneleaf.us Sun Apr 17 14:14:02 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 17 Apr 2016 11:14:02 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> Message-ID: <5713D26A.4000704@stoneleaf.us> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote: > So, as a summary: With a str+bytes-polymorphic __fspath__, with the > above argumentation and the rough implementation of os.fspath(...), > the conclusion is that the os.fspath function should indeed be public, > and that no further variations are needed. Nice summation, thank you.
:) -- ~Ethan~ From k7hoven at gmail.com Sun Apr 17 17:05:24 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Mon, 18 Apr 2016 00:05:24 +0300 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <5713D26A.4000704@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <5713D26A.4000704@stoneleaf.us> Message-ID: On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman wrote: > On 04/17/2016 06:58 AM, Koos Zevenhoven wrote: > >> So, as a summary: With a str+bytes-polymorphic __fspath__, with the >> above argumentation and the rough implementation of os.fspath(...), >> the conclusion is that the os.fspath function should indeed be public, >> and that no further variations are needed. > > > Nice summation, thank you. :) > Come on, Ethan, that summary was not for you ;) It was for lazy people, people with bad memory, or people not so involved in the topic. I wrote a big post, provided new arguments, with other points collected into the same logical framework, wrote a new version of os.fspath and argued why it is the right one --- and all you do is read the stupid summary. You can do better than that: read the whole thing! ;-). 
-Koos From rosuav at gmail.com Sun Apr 17 17:14:19 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 18 Apr 2016 07:14:19 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <5713D26A.4000704@stoneleaf.us> Message-ID: On Mon, Apr 18, 2016 at 7:05 AM, Koos Zevenhoven wrote: > On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman wrote: >> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote: >> >>> So, as a summary: With a str+bytes-polymorphic __fspath__, with the >>> above argumentation and the rough implementation of os.fspath(...), >>> the conclusion is that the os.fspath function should indeed be public, >>> and that no further variations are needed. >> >> >> Nice summation, thank you. :) >> > > Come on, Ethan, that summary was not for you ;) It was for lazy > people, people with bad memory, or people not so involved in the > topic. I wrote a big post, provided new arguments, with other points > collected into the same logical framework, wrote a new version of > os.fspath and argued why it is the right one --- and all you do is > read the stupid summary. You can do better than that: read the whole > thing! ;-). Yes, but people like me who haven't read every single post appreciate the vote of support from someone who has. Ethan's post says that this one-paragraph summary has twice as much weight as it had when only one person attests it. So, thank you Koos for summarizing, and thank you Ethan for affirming the summary. 
ChrisA From ethan at stoneleaf.us Sun Apr 17 17:52:37 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 17 Apr 2016 14:52:37 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <5713D26A.4000704@stoneleaf.us> Message-ID: <571405A5.7090406@stoneleaf.us> On 04/17/2016 02:05 PM, Koos Zevenhoven wrote: > On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman wrote: >> On 04/17/2016 06:58 AM, Koos Zevenhoven wrote: >> >>> So, as a summary: With a str+bytes-polymorphic __fspath__, with the >>> above argumentation and the rough implementation of os.fspath(...), >>> the conclusion is that the os.fspath function should indeed be public, >>> and that no further variations are needed. >> >> >> Nice summation, thank you. :) >> > > Come on, Ethan, that summary was not for you ;) Heh. > You can do better than that: read the whole thing! ;-). Ah, but I did read the whole thing! I just didn't want to quote it all and then add one line, so I snipped the rest. Let me try again: Good, well thought-out post. Thank you. :) if-at-first-you-don't-succeed'ly yrs, -- ~Ethan~ From burkhardameier at gmail.com Mon Apr 18 01:23:49 2016 From: burkhardameier at gmail.com (Burkhard Meier) Date: Sun, 17 Apr 2016 22:23:49 -0700 Subject: [Python-Dev] My first post here ~ do you need more Python core developers on Windows? Message-ID: Hi, I just subscribed to the "Python-Dev" mailing list and the 'Welcome" reply asked me to introduce myself. 
My name is Burkhard Meier and I wrote the "Python GUI Programming Cookbook" published by Packt. It is available on Amazon and PacktPub.com. Maybe I can become more involved in the Python community as a Python developer on Windows . Kind regards, Burkhard Meier -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Apr 18 03:41:16 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 18 Apr 2016 17:41:16 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <5713D26A.4000704@stoneleaf.us> Message-ID: On 18 April 2016 at 07:05, Koos Zevenhoven wrote: > On Sun, Apr 17, 2016 at 9:14 PM, Ethan Furman wrote: > > On 04/17/2016 06:58 AM, Koos Zevenhoven wrote: > > > >> So, as a summary: With a str+bytes-polymorphic __fspath__, with the > >> above argumentation and the rough implementation of os.fspath(...), > >> the conclusion is that the os.fspath function should indeed be public, > >> and that no further variations are needed. > > > > > > Nice summation, thank you. 
:)
> >
>
> Come on, Ethan, that summary was not for you ;)

As Chris noted though, the "Yes, that summary is accurate" from active participants in the discussion helps assure readers that it's a good overview :)

Given the variant you suggested, what if we defined the API semantics like this:

    # Offer the simplest possible API as the public version
    def fspath(pathlike) -> str:
        return os._raw_fspath(pathlike)

    # Expose the complexity in the "private" variant
    def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes):
        # Short-circuit for instances of the output type
        if isinstance(pathlike, output_types):
            return pathlike
        # We'd have a tidier error message here for non-path objects
        result = pathlike.__fspath__()
        if not isinstance(result, output_types):
            raise TypeError("argument is not and does not provide an acceptable pathname")
        return result

That way, the default API would be saying unambiguously that the preferred way of manipulating filesystem paths is as text, but the lower level "mainly for the standard library" API would explicitly handle the 3 different scenarios (binary-input-is-a-bug, text-input-is-a-bug, and either-binary-or-text-input-is-fine). That way the structure of the additional parameters on _raw_fspath can be tailored specifically to the needs of the standard library, without worrying as much about 3rd party use cases.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Mon Apr 18 03:44:12 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 18 Apr 2016 17:44:12 +1000
Subject: [Python-Dev] My first post here ~ do you need more Python core developers on Windows?
In-Reply-To:
References:
Message-ID:

On 18 April 2016 at 15:23, Burkhard Meier wrote:

> Maybe I can become more involved in the Python community as a Python
> developer on Windows.
>

Welcome!
We definitely still have a marked skew towards Linux and *nix programmers in general relative to the global software development population, so participation from additional experienced Windows developers is always appreciated :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From victor.stinner at gmail.com Mon Apr 18 04:16:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 18 Apr 2016 10:16:05 +0200
Subject: [Python-Dev] My first post here ~ do you need more Python core developers on Windows?
In-Reply-To:
References:
Message-ID:

2016-04-18 7:23 GMT+02:00 Burkhard Meier :
> My name is Burkhard Meier and I wrote the "Python GUI Programming Cookbook"
> published by Packt.
>
> It is available on Amazon and PacktPub.com.

Welcome!

> Maybe I can become more involved in the Python community as a Python
> developer on Windows.

You can use the Developer Guide to start:
https://docs.python.org/devguide/

See also the Python mentors to get help on a dedicated and private mailing list:
http://pythonmentors.com/

Sadly yes, we have many open issues specific to Windows. I sometimes try to make time to fix some of them, but I'm less interested in Windows than in open source operating systems ;-)

Victor

From jimjjewett at gmail.com Mon Apr 18 07:20:47 2016
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Mon, 18 Apr 2016 07:20:47 -0400
Subject: [Python-Dev] RFC: PEP 509: Add a private version to dict
In-Reply-To:
References: <57112af3.641d8c0a.3bf39.03b0@mx.google.com>
Message-ID:

On Fri, Apr 15, 2016 at 7:31 PM, Victor Stinner wrote:
> 2016-04-15 23:45 GMT+02:00 Jim J. Jewett :

...

>> I just worry that you may end up closing off even better optimizations
>> later, if you make too many promises about exactly how you will do
>> which ones.

>> Today, dict only cares about ==, and you (reasonably) think that full
>> == isn't always worth running ...
but when it comes to which tests
>> *are* worth running, I'm not confident that the answers won't change
>> over the years.

> I checked, currently there is no unit test for a==b, only for a is b.
> I will add a test for a==b but a is not b, and ensure that the
> version is increased.

Again, why? Why not just say

"If an object is replaced by something equal to itself, the version_tag may or may not be changed. While the initial heuristics are simply to check for identity but not full equality, this may change in future releases."

>> For example, if I know that my dict values are all 4-digit integers,
>> can I write:
>>
>>     d[k] = d[k] + 0
>>
>> and be assured that the version_tag will bump? Or is that something
>> that a future optimizer might optimize out?

> Hum, I will try to clarify that.

I would prefer that you clarify it to say that while the initial patch doesn't optimize that out, a future optimizer might.

> The problem with storing an identifier (a pointer in C) with no strong
> reference is when the object is destroyed, a new object can likely get
> the same identifier. So it's likely that "dict[key] is old_value_id"
> can be true even if dict[key] is now a new object.

Yes, but it shouldn't actually be destroyed until it is removed from the dict, which should change version_tag, so that there will be no need to compare it.

> Do you want to modify the PEP 509 to fix this issue? Or you don't
> understand why the PEP 509 cannot be used to fix the issue? I'm
> lost...

I believe it *does* fix the issue in some (but not all) cases.

-jJ

From jimjjewett at gmail.com Mon Apr 18 07:46:44 2016
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Mon, 18 Apr 2016 07:46:44 -0400
Subject: [Python-Dev] Updated PEP 509
In-Reply-To:
References:
Message-ID:

On Sat, Apr 16, 2016 at 5:01 PM, Victor Stinner wrote:

> * I mentioned that version++ must be atomic, and that in the case of
> CPython, it's done by the GIL

Better; if those methods *already* hold the GIL, it is worth saying "already", to indicate that the change is not expensive.

> * I removed the dict[key]=value; dict[key]=value. It's really a
> micro-optimization. I also fear that Raymond will complain because it
> adds an if in the hot code of dict, and the dict type is very
> important for Python performance.

That is an acceptable answer. Though I really do prefer explicitly *refusing to promise* either way when the replacement/replaced objects are ==.

dicts (and other collections) already assume sensible ==, even explicitly allowing self-matches of objects that are not equal to themselves. I don't like the idea of making new promises that violate (or rely on violations of) that sensible == assumption.
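To make the two points above a bit more concrete, here is a rough pure-Python sketch. It is only an illustration under stated assumptions: `VersionedDict`, its `version` attribute, and the `skip_identical` flag are hypothetical names invented for this example, and none of this is how CPython or the PEP 509 patch actually implements the version tag. The first few lines show the existing "self-match" behaviour for an object that is not equal to itself; the rest shows an identity-only check that never consults ==.

```python
# Containers already match by identity before falling back to ==.
nan = float("nan")
assert nan != nan            # not equal to itself ...
assert nan in [nan]          # ... yet a list still finds it
assert {nan: 1}[nan] == 1    # ... and so does a dict lookup


class VersionedDict(dict):
    """Hypothetical stand-in for a dict carrying a version counter.

    Only __setitem__ and __delitem__ are instrumented; update(), pop(),
    etc. are deliberately left out to keep the sketch short.
    """

    def __init__(self, *args, skip_identical=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0
        self.skip_identical = skip_identical

    def __setitem__(self, key, value):
        # Optional micro-optimization: re-storing the very same object
        # is not counted as a change.  Equality (==) is never consulted.
        if self.skip_identical and key in self and self[key] is value:
            return
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class AlwaysEqual:
    """A value whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


d = VersionedDict(abs=abs, skip_identical=True)
d["abs"] = abs               # same object: version stays at 0
d["abs"] = AlwaysEqual()     # "equal" but a different object: version bumps
print(d.version)
```

With `skip_identical=False` every store bumps the counter, which corresponds to the simpler rule of making no special case at all.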
-jJ

From ethan at stoneleaf.us Mon Apr 18 10:03:28 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 18 Apr 2016 07:03:28 -0700
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To:
References: <5709309D.8030007@stoneleaf.us> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <5713D26A.4000704@stoneleaf.us>
Message-ID: <5714E930.50305@stoneleaf.us>

On 04/18/2016 12:41 AM, Nick Coghlan wrote:

> Given the variant you [Koos] suggested, what if we defined the API semantics
> like this:
>
>     # Offer the simplest possible API as the public version
>     def fspath(pathlike) -> str:
>         return os._raw_fspath(pathlike)
>
>     # Expose the complexity in the "private" variant
>     def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes):
>         # Short-circuit for instances of the output type
>         if isinstance(pathlike, output_types):
>             return pathlike
>         # We'd have a tidier error message here for non-path objects
>         result = pathlike.__fspath__()
>         if not isinstance(result, output_types):
>             raise TypeError("argument is not and does not provide an acceptable pathname")
>         return result

My initial reaction was that this was overly complex, but after thinking about it a couple days I /really/ like it. It has a reasonable default for the 99% real-world use-case, while still allowing for custom and exact tailoring (for the 99% stdlib use-case ;).

--
~Ethan~

From oscar.j.benjamin at gmail.com Mon Apr 18 10:38:54 2016
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 18 Apr 2016 15:38:54 +0100
Subject: [Python-Dev] Updated PEP 509
In-Reply-To:
References:
Message-ID:

On 18 April 2016 at 12:46, Jim J. Jewett wrote:
>>
>> * I removed the dict[key]=value; dict[key]=value. It's really a
>> micro-optimization. I also fear that Raymond will complain because it
>> adds an if in the hot code of dict, and the dict type is very
>> important for Python performance.
>
> That is an acceptable answer. Though I really do prefer explicitly
> *refusing to promise* either way when the replacement/replaced objects
> are ==.
>
> dicts (and other collections) already assume sensible ==, even
> explicitly allowing self-matches of objects that are not equal to
> themselves. I don't like the idea of making new promises that violate
> (or rely on violations of) that sensible == assumption.

dicts make assumptions about the behaviour of __eq__ for the *keys* but not for the *values* (on which no assumptions are made). The only way to replace a key in a dict with another equal key (having a well-behaved hash function) is to pop the key out and then insert the new key, so it's not possible to replace a key with another equal key without bumping the version twice. So presumably you're referring to the values here, right?

The purpose of the PEP is to be able to guard for changes to namespaces which are implemented as dicts. So if builtins.__dict__['abs'] is replaced by foo then we don't care what foo.__eq__ says about the situation: any optimisation that assumed builtins.abs was not monkeypatched is invalidated. That's why the version update is needed. Without it the version cannot be relied upon as an optimisation guard. Consider:

    import builtins

    class MyAbs:
        def __eq__(self, other):
            return True
        def __call__(self, arg):
            return -arg

    builtins.abs = MyAbs()

--
Oscar

From cr0hn at cr0hn.com Mon Apr 18 06:05:28 2016
From: cr0hn at cr0hn.com (cr0hn)
Date: Mon, 18 Apr 2016 12:05:28 +0200
Subject: [Python-Dev] [Question][Asyncio] Process + Threads + asyncio... has sense?
Message-ID:

Hi all,

It's the first time I've written to this list. Sorry if it's not the best place for this question.

After reading the asyncio documentation, PEPs, Guido/Jesse/David Beazley articles/talks, etc., I developed a PoC library that mixes Process + Threads + Asyncio Tasks, following a scheme like this diagram:

main -> Process 1 -> Thread 1.1 -> Task 1.1.1
                                -> Task 1.1.2
                                -> Task 1.1.3

                  -> Thread 1.2
                                -> Task 1.2.1
                                -> Task 1.2.2
                                -> Task 1.2.3

        Process 2 -> Thread 2.1 -> Task 2.1.1
                                -> Task 2.1.2
                                -> Task 2.1.3

                  -> Thread 2.2
                                -> Task 2.2.1
                                -> Task 2.2.2
                                -> Task 2.2.3

In my local tests, this approach appears to improve (and simplify) the concurrency/parallelism for some tasks but, before releasing the library on GitHub, I'd like to know whether my approach is wrong, and I would appreciate your opinion.

Thank you very much for your time.

Regards!

--
Daniel García a.k.a. cr0hn - Security researcher and pentester
@ggdaniel
http://www.cr0hn.com/me/

From guido at python.org Mon Apr 18 12:40:14 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 18 Apr 2016 09:40:14 -0700
Subject: [Python-Dev] [Question][Asyncio] Process + Threads + asyncio... has sense?
In-Reply-To:
References:
Message-ID:

A better place for this question would be the tulip Google group:
https://groups.google.com/forum/#!forum/python-tulip

On Mon, Apr 18, 2016 at 3:05 AM, cr0hn wrote:

> Hi all,
>
> It's the first time I've written to this list. Sorry if it's not the best place
> for this question.
> > After I read the Asyncio's documentation, PEPs, Guido/Jesse/David Beazley > articles/talks, etc, I developed a PoC library that mixes: Process + > Threads + Asyncio Tasks, doing an scheme like this diagram: > > main -> Process 1 -> Thread 1.1 -> Task 1.1.1 > -> Task 1.1.2 > -> Task 1.1.3 > > -> Thread 1.2 > -> Task 1.2.1 > -> Task 1.2.2 > -> Task 1.2.3 > > Process 2 -> Thread 2.1 -> Task 2.1.1 > -> Task 2.1.2 > -> Task 2.1.3 > > -> Thread 2.2 > -> Task 2.2.1 > -> Task 2.2.2 > -> Task 2.2.3 > > In my local tests, this approach appear to improve (and simplify) the > concurrency/parallelism for some tasks but, before release the library at > github, I don't know if my aproach is wrong and I would appreciate your > opinion. > > Thank you very much for your time. > > Regards! > > -- > Daniel Garc?a a.k.a. cr0hn - Security researcher and pentester > @ggdaniel > http://www.cr0hn.com/me/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Mon Apr 18 13:13:51 2016 From: brett at python.org (Brett Cannon) Date: Mon, 18 Apr 2016 17:13:51 +0000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> Message-ID: On Sun, 17 Apr 2016 at 06:59 Koos Zevenhoven wrote: > On Sun, Apr 17, 2016 at 11:03 AM, Stephen J. Turnbull > wrote: > > Nick Coghlan writes: > > > > > str and bytes aren't going to implement __fspath__ (since they're > > > only *sometimes* path objects), so asking people to call the > > > protocol method directly for any purpose would be a pain. > > > > It *should* be a pain. People who need bytes should call fsencode, > > people who need str should call fsdecode, and Ethan's antipathy checks > > for bytes and str, then calls __fspath__ if needed. Who's left? Just > > the bartender and the janitor, last call was hours ago. OK, maybe > > there are enough clients to make it worthwhile to provide the utility, > > but it should be clearly marked as "double opt-in, for experts only > > (consenting adults must show proof of insurance)". > > My doubts, expressed several times in these threads, about the need > for a *public* os.fspath function to complement the __fspath__ > protocol, are now perhaps gone. I'll explain why (and how). 
The > reasons for my doubts were that > > (1) The audience outside the stdlib for such a function should be > small, because it is preferred to either use existing tools in > os.path.* or pathlib (or similar) for manipulating paths. > > (2) There are just too many different possible versions of this > function: rejecting str, rejecting bytes, coercion to str, coercion to > bytes, and accepting both str and bytes. That's a total of 5 different > cases. People also used to talk about versions that would not allow > passing through objects that are already bytes or str. That would make > it a total of 10 different versions! > (in principle, there could be even more, but let's not go there :-). > In other words, this argument was that it is probably best to > implement whatever flavor is needed for the context, perhaps based on > documented recipes. > > > Regarding (2), we can first rule out half of the 10 cases---the ones > that reject plain instances of bytes and/or str---because they would > not be very useful as all the isinstance/hasattr checking etc. would > be left to the caller. And here are the remaining five, explained > based on what they accept as argument, what they return, and where > they would be used: > > (A) "polymorphic" > *Accept*: str and bytes, provided via __fspath__ as well as plain str > and bytes instances. > *Return*: str/bytes depending on input. > *Audience*: the stdlib, including os.path.things, os.things, > shutil.things, open, ... (some functions would need a C version). > There may even be a small audience outside the stdlib. > > (B) "str-based only" > *Accept*: str, provided via __fspath__ as well as plain str. > *Return*: str. > *Audience*: relatively low-level code that works exclusively with str > paths but accepts specialized path objects as input. > > (C) "bytes-based only" > *Accept*: bytes, provided via __fspath__ as well as plain bytes. > *Return*: bytes. 
> *Audience*: low-level code that explicitly deals with paths as bytes > (probably to deal with undefined/ill-defined encodings). > > (D) "coerce to str" > *Accept*: str and bytes, provided via __fspath__ as well as plain str > and bytes instances. > *Return*: str (coerced / decoded if needed). > *Audience*: code that deals explicitly with str but wants to 'try' > supporting bytes-based path inputs too via implicit decoding (even if > it may result in surrogate escapes, which one cannot for instance > print(...).) > > (E) "coerce to bytes" > *Accept*: str and bytes, provided via __fspath__ as well as plain str > and bytes instances. > *Return*: bytes (coerced / encoded if needed). > *Audience*: low-level code that explicitly deals with bytes paths but > wants to accept str-based path inputs too via implicit encoding. > > > Even if all options (A-E) probably have small audiences (compared to > e.g. os.path.*), some of them have larger audiences than others. But > all of them have at least *some* reasonable audience (as desribed > above). > > Recently (well, a few days ago, but 'recently', considering the scale > of these discussions anyway ;-), Nick pointed out something I hadn't > realized---os.fsencode and os.fsdecode actually already implement > coercion to bytes and str, respectively. With those two functions made > compatible with the __fspath__ protocol [using (A) above], they would > in fact *be* (D) and (E), respectively. > > Now, we only have options (A-C) left. 
They could all be implemented
> roughly as follows:
>
>     def fspath(pathlike, *, output_types = (str,)):
>         if hasattr(pathlike, '__fspath__'):
>             ret = pathlike.__fspath__()  # or pathlike.__fspath__ if it's not a method
>         else:
>             ret = pathlike
>         if not isinstance(ret, output_types):
>             raise TypeError("argument is not and does not provide an acceptable pathname")
>         return ret
>
> With an implementation like the above, (A) would correspond to
> output_types = (str, bytes), (B) to the default, and (C) to
> output_types = (bytes,).
>
>
> So, with the above considerations as a counterargument, I consider
> argument (2) gone.
>
> What about argument (1), that the audience for the os.fspath(...)
> function (especially for one selected version of the 5 or 10
> variations!) is quite small, and we should not encourage manipulating
> pathnames by hand, but to use os.path.* or pathlib instead?
>
> The counterargument for (1):
>
> It seems to me we now "all" agree that __fspath__ should allow
> str+bytes polymorphism. I could try to list who I mean by "all"
> (Ethan, Brett, Stephen T, Nick, ... ?), but obviously I won't be able
> to list all or speak for them so I won't even try :-). Anyway, for
> this argument, I'm assuming we agree on that. So, __fspath__ can
> provide either str or bytes, even if str is *highly preferred* in most
> places. Therefore, the os.fspath function, as part of the protocol,
> has the important role of *by default* rejecting bytes, so that the
> protocol effectively becomes str-only by default. With the fspath
> implementation like the one I drafted above, and
> os.fsencode+os.fsdecode, we in fact cover all cases (A-E).
>
> So, as a summary: With a str+bytes-polymorphic __fspath__, with the
> above argumentation and the rough implementation of os.fspath(...),
> the conclusion is that the os.fspath function should indeed be public,
> and that no further variations are needed.
>
> -Koos
>
> P.S.
There is also the possibility of two dunder methods corresponding > to str and bytes, leading to one being preferred over the other in > some cases etc. I have gone though various aspects and possible > versions of that approach, but concluded it's not worth it, as some of > us may also have implied in earlier posts. After all, we want > something that's *almost* exclusively str. > Just to add to the chorus of praise, thanks for the summary, Koos! I just wanted to add a rephrasing to your overall conclusion that I reached independently Friday night but couldn't post earlier as I promised my wife I wouldn't write or say the "P" word all weekend which meant I didn't read or respond to any python-dev email all weekend (if you think that's cruel and unusual punishment, her Twitter is https://twitter.com/AndreaMcInnes21 ;) . If we continue with the "str is an encoding of file paths", you can then build from "bytes is an encoding of str" to get a pyramid of file path encodings: Path -> str -> bytes. I don't think this is in any way a controversial view. Now Stephen has been promoting the idea of enhancing os.fsencode() and os.fsdecode() to understand what __fspath__ is (I'm ignoring the str/bytes return points for now). With os.fsencode() this would mean giving it anything in the Path -> str -> bytes pyramid would lead to following the steps to reach bytes at the bottom of the encoding pyramid. That's fine and easy to explain: whatever you pass into os.fsencode() you know it will get encoded to bytes using the file system encoding and surrogate escape. The trick becomes os.fsdecode() and its str return value. Looking at our encoding pyramid of Path -> str -> bytes we notice that the return value for os.fsdecode() is actually now in the *middle* of our encoding pyramid. What that means is that while passing in bytes and decoding them to str makes sense, passing in a Path object and getting back str is actually an *encoding*! 
My brain wanting semantic purity for the "decode" part of os.fsdecode() started to hurt. But that's when I realized that adding __fspath__ support to os.fsdecode() and os.fsencode(), they become more coercion functions rather than encoding/decoding functions. It also means that os.fspath() has a place when you want to say "I only want to encode a file path to str" and avoid the decode bit that os.fsdecode() would do (IOW it's like a half step of os.fsencode() for full control). You probably also want control of getting just bytes and skipping os.fsencode() and its automatic encoding call so that you don't accidentally get mojibake or something. Now going back to what __fspath__ returns, this starts to promote that it returns the highest level in the Path -> str -> bytes pyramid that isn't the top. We then provide whatever support we need to allow to go straight to the encoding someone might want through the os module. Koos outlined all of this above so I'm not going to rehash it all here, but the point will be the protocol will be more low-level than we expect people to work with and we will promote the use of the proper helper functions in the os module to get the results people desire (although I still feel a little bad for people writing libraries that will be manipulating paths prior to Python 3.6 who don't get this helper code, but my assumption is that they will get TypeError from using whatever __fspath__() returns and e.g. os.path.join() w/ a different type, otherwise they are just passing paths down to the stdlib and so shouldn't inhibit usage of specific path encodings). -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Apr 18 15:25:16 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 19 Apr 2016 04:25:16 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> Message-ID: <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> I don't disagree with the basic analysis, but there are a number of issues with motivational statements. Koos Zevenhoven writes: > (B) "str-based only" > *Accept*: str, provided via __fspath__ as well as plain str. > *Return*: str. > *Audience*: relatively low-level code that works exclusively with str > paths but accepts specialized path objects as input. Why "low-level"? All code that stores paths persistently is likely to store them in text files or database strings or the like, rather than as Path (read: specialized path objects, not necessarily pathlib.Path). But if there is any low-level manipulation of the paths to be done before storing, it would be done as Path. Thus high-level code might also want to accept Path transparently. > (C) "bytes-based only" > *Accept*: bytes, provided via __fspath__ as well as plain bytes. > *Return*: bytes. > *Audience*: low-level code that explicitly deals with paths as bytes > (probably to deal with undefined/ill-defined encodings). No, if it's to deal with encoding issues, we wouldn't accept this. PEP 383 eliminates that concern. 
We accept bytes to support people who are representing paths with bytes because they think that it's a good idea and that encoding doesn't matter in their application. > (D) "coerce to str" > *Accept*: str and bytes, provided via __fspath__ as well as plain str > and bytes instances. > *Return*: str (coerced / decoded if needed). > *Audience*: code that deals explicitly with str but wants to 'try' > supporting bytes-based path inputs too via implicit decoding (even if > it may result in surrogate escapes, which one cannot for instance > print(...).) No. As Nick points out with respect to fsencode/fsdecode, it's not a question of supporting known bytes via implicit decoding (that's what __fspath__ does for the types that support it), but rather of supporting ambiguity. Best practice is to convert explicitly at the boundary, because it's too likely that data with unexpected type is just the wrong data. Printing surrogates can be done with errors=backslashreplace, and if you're using fsdecode, you probably should use that, namereplace, or xmlcharrefreplace. > (E) "coerce to bytes" > *Accept*: str and bytes, provided via __fspath__ as well as plain str > and bytes instances. > *Return*: bytes (coerced / encoded if needed). > *Audience*: low-level code that explicitly deals with bytes paths but > wants to accept str-based path inputs too via implicit encoding. Again, it's a question of ambiguity, or perhaps sloppy programming (eg, using str literals for paths in a bytes-oriented program). Use cases D and E are basically "guessing when faced with ambiguity", and fsencode and fsdecode are code smells because (as Nick claims) they almost always conceal a situation where you don't know whether you've got bytes or str (and it's way too much work to find out by tracing them back to where they came from). > It seems to me we now "all" agree that __fspath__ should allow > str+bytes polymorphism. 
I don't agree that we *should* allow polymorphism, because (purity) paths are in the text domain[1] and (practicality) I don't believe that use of os.fspath will be restricted to "low-level boundary code". I would be perfectly happy telling bytes users that the idiom is not "os.fspath(maybe_direntry, allow_types=(bytes,))", but rather "os.fsencode(os.fspath(maybe_direntry))", so that code in the text domain can safely use os.fspath(maybe_direntry) without worrying that it will raise because maybe_direntry.__fspath__() returns bytes. This would allow pathlib.Path to handle arguments providing __fspath__ transparently. With the current proposal, it would need to rule out bytes before invoking os.fspath, or handle the exception, or leave the exception to its caller. None of these options are pleasant. Unfortunately, as Nick points out, defining __fspath__ to return str is very unpleasant because bytes applications will now have to guard *everything* that might provide __fspath__ with that incantation before passing to open and other APIs that store the path on the object returned. So we don't really have a choice about polymorphism if we want to support both __fspath__ and bytes paths. > After all, we want something that's *almost* exclusively str. But we don't want that, AFAICT. Some clearly want this API to be unbiased against bytes in the same way the os APIs are unbiased[2], because that's what we've got in the current proposal. Further, due to the existing ambiguity in fsencode and fsdecode, we're extending the field of ambiguity where bytes and str can mix indiscriminately. If we are serious about "*almost* exclusively str" we should accept that "exclusively str" is a very good approximation and much easier to use correctly, and regretfully postpone inclusion of DirEntry in this protocol to the future. But that's not on the table, is it? Footnotes: [1] Representation on disk as (basically unconstrained) byte sequences is an historical accident. 
[2] That doesn't mean the bytes variants will be used as often as the str variants, just that the bytes variants are as easy to use. From stephen at xemacs.org Mon Apr 18 15:26:56 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 19 Apr 2016 04:26:56 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> Message-ID: <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> Brett Cannon writes: > If we continue with the "str is an encoding of file paths", It's not. It's a representation, but not an encoding. In Python 3, encoding means a representation of a character string using bytes. It's using "encoding" generically for "representation" that makes your head hurt. > you can then build from "bytes is an encoding of str" to get a > pyramid of file path encodings: Path -> str -> bytes. I don't think > this is in any way a controversial view. Perhaps not. But it's not particularly useful. ;-) Here's the pyramid I think about: Path / \ / \ V V str <-> bytes That is, str and bytes are interchangeable *without* any knowledge of paths, which are on a higher level of complexity and abstraction. 
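The str <-> bytes edge of the pyramid above can be illustrated without any path class at all. What follows is only a minimal sketch, assuming a UTF-8 filesystem encoding with the PEP 383 surrogateescape handler; the helper names to_text and to_bytes are made up for this example and are not part of the os module.

```python
# Hypothetical helpers (not stdlib API). They mimic what os.fsdecode() and
# os.fsencode() do on a POSIX system whose filesystem encoding is UTF-8,
# which is an assumption of this sketch.
FS_ENCODING = "utf-8"
FS_ERRORS = "surrogateescape"  # PEP 383 error handler


def to_text(raw: bytes) -> str:
    """Decode raw filename bytes, smuggling undecodable bytes as surrogates."""
    return raw.decode(FS_ENCODING, FS_ERRORS)


def to_bytes(text: str) -> bytes:
    """Encode back to bytes, turning smuggled surrogates into the raw bytes."""
    return text.encode(FS_ENCODING, FS_ERRORS)


raw_name = b"caf\xe9.txt"         # Latin-1 bytes, not valid UTF-8
text_name = to_text(raw_name)     # the 0xE9 byte becomes a lone surrogate
round_trip = to_bytes(text_name)  # the original bytes are recovered

# A lone surrogate cannot be encoded strictly, so escape it for display.
printable = text_name.encode("ascii", "backslashreplace").decode("ascii")
```

No knowledge of path structure is needed for the round trip, which is the sense in which str and bytes sit on the same level here.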
Although in pathlib there's an assumption that paths are serialized to str, which is (implicitly) serialized to bytes when talking to the OS, this is not necessarily true for other structured path classes. In particular it is not true for DirEntry (which is an "enhanced degenerate" path containing only one path segment but also other useful information about the filesystem object addressed).

I haven't looked at Antipathy, but I would guess from Ethan's promotion of bytes paths and concern with efficiency that "bytes antipaths" do *not* "go through" str to get to bytes; they already are bytes (in the sense of class inheritance).

> But that's when I realized that adding __fspath__ support to os.fsdecode()
> and os.fsencode(), they become more coercion functions rather than
> encoding/decoding functions. It also means that os.fspath() has a place
> when you want to say "I only want to encode a file path to str" and avoid
> the decode bit that os.fsdecode() would do

I don't understand what you're trying to say here. fsdecode currently does not promise to decode anything, because it's polymorphic, accepting str and bytes. fsdecode and fsencode already *are* coercion functions.

It's this kind of semantic confusion and broken nomenclature that is *why* I dislike these polymorphic functions and objects so much. It is impossible to reason correctly about them. We're stuck with invoking "practicality" and muddling through. And the names mislead even experienced Pythonistas.
Steve From random832 at fastmail.com Mon Apr 18 15:42:59 2016 From: random832 at fastmail.com (Random832) Date: Mon, 18 Apr 2016 15:42:59 -0400 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> Message-ID: <1461008579.2246251.582474273.555E64CC@webmail.messagingengine.com> On Mon, Apr 18, 2016, at 15:26, Stephen J. Turnbull wrote: > in > particular it is not true for DirEntry (which is a "enhanced > degenerate" path containing only one path segment but also other > useful information abot the filesystem object addressed) DirEntry contains multiple path segments - it has the name, and the directory path that was passed into scandir. 
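The point about DirEntry carrying both the name and the directory passed to scandir can be checked with a short sketch. It uses only os.scandir and tempfile from the standard library; the file name example.txt is an arbitrary choice for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Create one file so that scandir has something to report.
    open(os.path.join(top, "example.txt"), "w").close()

    # str in, str out: the entry remembers the directory it came from.
    entry = list(os.scandir(top))[0]
    name = entry.name   # last path segment only
    path = entry.path   # scandir argument joined with the name

    # bytes in, bytes out: the same entry type, but holding bytes.
    bytes_entry = list(os.scandir(os.fsencode(top)))[0]
    bytes_name = bytes_entry.name
    bytes_path = bytes_entry.path
```

So a DirEntry is str-based or bytes-based depending on how scandir was called, which is why it matters what a path protocol is allowed to return for it.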
From ethan at stoneleaf.us Mon Apr 18 15:50:56 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 18 Apr 2016 12:50:56 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> Message-ID: <57153AA0.5090103@stoneleaf.us> On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote: > Koos Zevenhoven writes: >> After all, we want something that's *almost* exclusively str. > > But we don't want that, AFAICT. Some clearly want this API to be > unbiased against bytes in the same way the os APIs are unbiased[2], > because that's what we've got in the current proposal. Are we reading the same thread? For my last several replies I am very biased against bytes (and I know I'm not the only one). Just not so biased that I'm unwilling to let clients say, "No, I'm really okay with getting bytes back". I really like Koos' ideas because they allow the client to say: - I only want str - I only want bytes - I'm okay with either If the client says "I'm okay with either" then I fully expect the client to have code to properly handle str vs bytes after the fspath (or whatever it's called) call. 
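The three client choices listed above ("only str", "only bytes", "either") can be sketched as one function with a type filter. This is an illustration under assumptions, not the API being decided in this thread: the function name fspath_sketch and the toy classes StrPath and BytesPath are hypothetical, and the keyword name output_types is borrowed from a variant discussed elsewhere in the thread.

```python
class StrPath:
    """Toy path object for the example; __fspath__ returns str."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


class BytesPath:
    """Toy path object for the example; __fspath__ returns bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


def fspath_sketch(pathlike, *, output_types=(str,)):
    """Return a path in one of output_types, or raise TypeError."""
    # Plain str/bytes of an accepted type pass straight through.
    if isinstance(pathlike, output_types):
        return pathlike
    hook = getattr(type(pathlike), "__fspath__", None)
    if hook is None:
        raise TypeError("object is not a path and has no __fspath__")
    result = hook(pathlike)
    if not isinstance(result, output_types):
        raise TypeError("__fspath__ returned a type the caller did not accept")
    return result


only_str = fspath_sketch(StrPath("a/b"))                                 # "I only want str"
only_bytes = fspath_sketch(BytesPath(b"a/b"), output_types=(bytes,))     # "I only want bytes"
either = fspath_sketch(BytesPath(b"a/b"), output_types=(str, bytes))     # "I'm okay with either"
```

A caller that asks for "either" still has to branch on the result type afterwards, which is the obligation described above.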
-- ~Ethan~ From wes.turner at gmail.com Mon Apr 18 15:54:49 2016 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 18 Apr 2016 14:54:49 -0500 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <57153AA0.5090103@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> <57153AA0.5090103@stoneleaf.us> Message-ID: On Apr 18, 2016 2:50 PM, "Ethan Furman" wrote: > > On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote: > >> Koos Zevenhoven writes: > > >>> After all, we want something that's *almost* exclusively str. >> >> >> But we don't want that, AFAICT. Some clearly want this API to be >> unbiased against bytes in the same way the os APIs are unbiased[2], >> because that's what we've got in the current proposal. > > > Are we reading the same thread? For my last several replies I am very biased against bytes (and I know I'm not the only one). > > Just not so biased that I'm unwilling to let clients say, "No, I'm really okay with getting bytes back". > > I really like Koos' ideas because they allow the client to say: > > - I only want str > - I only want bytes > - I'm okay with either > > If the client says "I'm okay with either" then I fully expect the client to have code to properly handle str vs bytes after the fspath (or whatever it's called) call. Don't we *have* to always support bytes because other programs can create filenames containing bytes? 
> > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Apr 18 16:19:22 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 18 Apr 2016 13:19:22 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> <57153AA0.5090103@stoneleaf.us> Message-ID: <5715414A.7000901@stoneleaf.us> On 04/18/2016 12:54 PM, Wes Turner wrote: > Don't we *have* to always support bytes because other programs can > create filenames containing bytes? Yes, but not every function has to support bytes. -- ~Ethan~ From cr0hn at cr0hn.com Mon Apr 18 13:13:58 2016 From: cr0hn at cr0hn.com (cr0hn) Date: Mon, 18 Apr 2016 13:13:58 -0400 Subject: [Python-Dev] [Question][Asyncio] Process + Threads + asyncio... has sense? In-Reply-To: References: Message-ID: Oks. Thank you very much. 
---
*Daniel García (cr0hn)*
Security researcher and ethical hacker

*Personal site*: http://cr0hn.com
*Linkedin*: https://www.linkedin.com/in/garciagarciadaniel
*Company*: http://abirtone.com
*Twitter*: @ggdaniel

On 18 April 2016 at 18:40:14, Guido van Rossum (guido at python.org) wrote:

> A better place for this question would be the tulip Google group:
> https://groups.google.com/forum/#!forum/python-tulip
>
> On Mon, Apr 18, 2016 at 3:05 AM, cr0hn wrote:
>
>> Hi all,
>>
>> It's the first time I write in this list. Sorry if it's not the best
>> place for this question.
>>
>> After I read the Asyncio's documentation, PEPs, Guido/Jesse/David Beazley
>> articles/talks, etc, I developed a PoC library that mixes: Process +
>> Threads + Asyncio Tasks, following a scheme like this diagram:
>>
>> main -> Process 1 -> Thread 1.1 -> Task 1.1.1
>>                                 -> Task 1.1.2
>>                                 -> Task 1.1.3
>>
>>                   -> Thread 1.2
>>                                 -> Task 1.2.1
>>                                 -> Task 1.2.2
>>                                 -> Task 1.2.3
>>
>>         Process 2 -> Thread 2.1 -> Task 2.1.1
>>                                 -> Task 2.1.2
>>                                 -> Task 2.1.3
>>
>>                   -> Thread 2.2
>>                                 -> Task 2.2.1
>>                                 -> Task 2.2.2
>>                                 -> Task 2.2.3
>>
>> In my local tests, this approach appears to improve (and simplify) the
>> concurrency/parallelism for some tasks but, before releasing the library on
>> GitHub, I don't know if my approach is wrong and I would appreciate your
>> opinion.
>>
>> Thank you very much for your time.
>>
>> Regards!
>>
>> --
>> Daniel García a.k.a. cr0hn - Security researcher and pentester
>> @ggdaniel
>> http://www.cr0hn.com/me/
>
> --
> --Guido van Rossum (python.org/~guido)

From rosuav at gmail.com Mon Apr 18 16:27:05 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 Apr 2016 06:27:05 +1000
Subject: [Python-Dev] [Python-ideas] pep 7 line break suggestion differs from pep 8
In-Reply-To:
References:
Message-ID:

On Tue, Apr 19, 2016 at 5:33 AM, Joseph Jevnik wrote:
> I saw that there was recently a change to pep 8 to suggest adding a line
> break before a binary operator. Pep 7 suggests the opposite:
>
>> When you break a long expression at a binary operator, the operator goes
>> at the end of the previous line, e.g.:
>
>> if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 &&
>>     type->tp_dictoffset == b_size &&
>>     (size_t)t_size == b_size + sizeof(PyObject *))
>>     return 0; /* "Forgive" adding a __dict__ only */
>
> I imagine that some of the reasons for making the change in pep 8 for
> readability reasons will also
> translate to C; maybe pep 7 should also be updated.

I would agree with this. Passing it directly to python-dev as that's where the key decision makers are.
ChrisA From brett at python.org Mon Apr 18 17:40:37 2016 From: brett at python.org (Brett Cannon) Date: Mon, 18 Apr 2016 21:40:37 +0000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull wrote: > Brett Cannon writes: > > > If we continue with the "str is an encoding of file paths", > > It's not. It's a representation, but not an encoding. In Python 3, > encoding means a representation of a character string using bytes. > It's using "encoding" generically for "representation" that makes your > head hurt. > Well, it makes *your* head hurt; for me it helped clarify some things. :) > > > you can then build from "bytes is an encoding of str" to get a > > pyramid of file path encodings: Path -> str -> bytes. I don't think > > this is in any way a controversial view. > > Perhaps not. But it's not particularly useful. ;-) Here's the > pyramid I think about: > > Path > / \ > / \ > V V > str <-> bytes > > That is, str and bytes are interchangeable *without* any knowledge of > paths, which are on a higher level of complexity and abstraction. 
> Although in pathlib, there's an assumption that paths are serialized > to str which is (implicitly) serialized to bytes when talking to the > OS, this is not necessarily true for other structured path classes, in > particular it is not true for DirEntry (which is a "enhanced > degenerate" path containing only one path segment but also other > useful information about the filesystem object addressed) > > I haven't looked at Antipathy, but I would guess from Ethan's > promotion of bytes paths and concern with efficiency that "bytes > antipaths" do *not* "go through" str to get to bytes, they already are > bytes (in the sense of class inheritance). > > > But that's when I realized that adding __fspath__ support to > os.fsdecode() > > and os.fsencode(), they become more coercion functions rather than > > encoding/decoding functions. It also means that os.fspath() has a place > > when you want to say "I only want to encode a file path to str" and > avoid > > the decode bit that os.fsdecode() would do > > I don't understand what you're trying to say here. fsdecode currently > does not promise to decode anything, because it's polymorphic, > accepting str and bytes. fsdecode and fsencode already *are* coercion > functions. > And they will continue to be coercion functions. My point is that since they coerce there is no way to use them in a way to dictate that you don't want any str/bytes encoding/decoding to occur without checking the arguments going into the function (i.e. "no guessing about encodings, please"). By providing os.fspath() I can say that I do not, under any circumstances, want someone to guess at the encoding some bytes path is under to get me a string and instead I want to start and end entirely in a world of strings. IOW os.fspath() lets me work in such a way that the instant bytes are introduced into my code for file paths it triggers a TypeError. 
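The difference between coercing and refusing can be shown side by side. The sketch below contrasts the existing os.fsdecode(), which decodes bytes for you, with a strict str-only helper; str_only_fspath is a hypothetical name used for illustration and should not be read as the behaviour of whatever os.fspath() ends up being.

```python
import os


def str_only_fspath(path):
    """Hypothetical strict helper: never decodes and never returns bytes."""
    if isinstance(path, str):
        return path
    hook = getattr(type(path), "__fspath__", None)
    if hook is None:
        raise TypeError("expected str or an object providing __fspath__")
    result = hook(path)
    if not isinstance(result, str):
        raise TypeError("__fspath__ did not return str")
    return result


# Coercion: bytes are decoded with the filesystem encoding, no questions asked.
coerced = os.fsdecode(b"data.txt")

# Strict: str passes through untouched...
passed = str_only_fspath("data.txt")

# ...and the moment bytes show up, the caller hears about it.
try:
    str_only_fspath(b"data.txt")
except TypeError as exc:
    strict_error = exc
else:
    strict_error = None
```

With the strict helper, code that means to live entirely in the str world fails loudly at the boundary instead of carrying a guessed decoding forward.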
> > It's this kind of semantic confusion and broken nomenclature that is > *why* I dislike these polymorphic functions and objects so much. It > is impossible to reason correctly about them. We're stuck with > invoking "practicality" and muddling through. And the names mislead > even experienced Pythonistas. > Yep, we are stuck with the names unless you want to propose a new name and deprecate the old one. -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Mon Apr 18 17:58:59 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 19 Apr 2016 00:58:59 +0300 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <5714E930.50305@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <5713D26A.4000704@stoneleaf.us> <5714E930.50305@stoneleaf.us> Message-ID: On Mon, Apr 18, 2016 at 5:03 PM, Ethan Furman wrote: > On 04/18/2016 12:41 AM, Nick Coghlan wrote: > >> Given the variant you [Koos] suggested, what if we defined the API >> semantics >> like this: >> >> # Offer the simplest possible API as the public vesion >> def fspath(pathlike) -> str: >> return os._raw_fspath(pathlike) >> >> # Expose the complexity in the "private" variant >> def _raw_fspath(pathlike, *, output_types = (str,)) -> (str, bytes): >> # Short-circuit for instances of the output type >> if isinstance(pathlike, output_types): >> return pathlike >> # We'd have a tidier error message here for non-path objects >> result = pathlike.__fspath__() >> if not isinstance(result, output_types): >> raise TypeError("argument is not and does not provide an >> acceptable pathname") >> 
return result > > My initial reaction was that this was overly complex, but after thinking > about it a couple days I /really/ like it. It has a reasonable default for > the 99% real-world use-case, while still allowing for custom and exact > tailoring (for the 99% stdlib use-case ;) . > While it does seem we finally might be nearly there :), this still seems to need some further discussion. As described in that long post of mine, I suppose some third-party code may need the variations (A-C), while it seems that in the stdlib, most places need (str, bytes), i.e. (A), except in pathlib, which needs (str,), i.e. (B). I'm not sure what I think about making the variations private, even if "hiding" the bytes version is, as I said, an important role of the public function. Except for that type hint, there is *nothing* in the function that might mislead the user to think bytes paths are something important in Python 3. It's a matter of documentation whether it "supports" bytes or not. In fact, that function (assuming the name os.fspath) could now even be documented to support this: patharg = os.fspath(patharg, output_types = (str, pathlib.PurePath)) # :-) So are we still going to end up with two functions or can we deal with one? What should the typehint be? Something new in typing.py? How about FSPath[...] 
as follows: FSPath[bytes] # bytes-based pathlike, including bytes FSPath[str] # str-based pathlike, including str pathstring = typing.TypeVar('pathstring', str, bytes) # could be extended with PurePath or some path ABC So the above variation might become: def fspathname(pathlike: FSPath[pathstring], *, output_types: tuple = (str,)) -> pathstring: # Short-circuit for instances of the output type if isinstance(pathlike, output_types): return pathlike # We'd have a tidier error message here for non-path objects result = pathlike.__fspath__() if not isinstance(result, output_types): raise TypeError("valid output type not provided via __fspath__") return result And similar type hints would apply to os.path functions. For instance, os.path.dirname: def dirname(p: FSPath[pathstring]) -> pathstring: ... This would say pathstring all over and not give anyone any ideas about bytes, unless they know what they're doing. Complicated? Yes, typing is. But I think we will need this kind of hints for os.path functions anyway. -Koos From ethan at stoneleaf.us Mon Apr 18 18:12:53 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 18 Apr 2016 15:12:53 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <5713D26A.4000704@stoneleaf.us> <5714E930.50305@stoneleaf.us> Message-ID: <57155BE5.3010100@stoneleaf.us> On 04/18/2016 02:58 PM, Koos Zevenhoven wrote: > It's a matter of documentation whether it "supports" bytes > or not. 
In fact, that function (assuming the name os.fspath) could now > even be documented to support this: > > patharg = os.fspath(patharg, output_types = (str, pathlib.PurePath)) # :-) While the os.fspath() function could be abused in such a way, we certainly wouldn't advertise it. (Leave that to StackOverflow. ;) -- ~Ethan~ From ethan at stoneleaf.us Mon Apr 18 18:30:38 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 18 Apr 2016 15:30:38 -0700 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> Message-ID: <5715600E.2060304@stoneleaf.us> On 04/18/2016 12:26 PM, Stephen J. Turnbull wrote: > I haven't looked at Antipathy, but I would guess from Ethan's > promotion of bytes paths and concern with efficiency that "bytes > antipaths" do *not* "go through" str to get to bytes, they already are > bytes (in the sense of class inheritance). 
Couple points:

- Correct: if you create an antipathy.Path with bytes, you get a bytes path (bPath); if you create an antipathy.Path with str you get a str path (uPath)

- if you mix a bPath with a uPath, or bytes with a uPath, or str with a bPath -- an exception is raised (conversions are *not* implicit on 3.0, at least -- on 2.x you can activate that behavior if you want it)

- my concern with supporting bytes is primarily for the sake of the stdlib, and secondarily for anyone who needs to work with bytes; it really has no effect on my library (since antipathy uses subclasses of bytes/str)

--
~Ethan~

From guido at python.org Mon Apr 18 18:52:38 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 18 Apr 2016 15:52:38 -0700
Subject: [Python-Dev] [Python-ideas] pep 7 line break suggestion differs from pep 8
In-Reply-To:
References:
Message-ID:

[ideas to bcc]

I'm not as excited about this as I am about the PEP 8 change. PEP 8 affects most Python programmers. But PEP 7 is really just for CPython and its extensions, and I don't think it has found anything like as widespread a following as PEP 8. I worry that if we change this in PEP 7 we'll just see either massively inconsistent code or endless diffs that do nothing but change the formatting (and occasionally introduce a bug). And I don't think it would do as much good -- reading and understanding C code is primarily a matter of knowing the language, and the audience is much more heavily skewed towards experts. IOW, -1.

On Mon, Apr 18, 2016 at 1:27 PM, Chris Angelico wrote:
> On Tue, Apr 19, 2016 at 5:33 AM, Joseph Jevnik wrote:
> > I saw that there was recently a change to pep 8 to suggest adding a line
> > break before a binary operator.
Pep 7 suggests the opposite: > > > >> When you break a long expression at a binary operator, the operator goes > >> at the end of the previous line, e.g.: > > > >> if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 && > >> type->tp_dictoffset == b_size && > >> (size_t)t_size == b_size + sizeof(PyObject *)) > >> return 0; /* "Forgive" adding a __dict__ only */ > > > > I imagine that some of the reasons for making the change in pep 8 for > > readability reasons will also > > translate to C; maybe pep 7 should also be updated. > > I would agree with this. Passing it directly to python-dev as that's > where the key decision makers are. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Apr 18 19:08:42 2016 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 18 Apr 2016 18:08:42 -0500 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> <57153AA0.5090103@stoneleaf.us> <5715414A.7000901@stoneleaf.us> Message-ID: On Apr 18, 2016 3:19 PM, "Ethan Furman" wrote: > > On 04/18/2016 12:54 PM, Wes Turner wrote: > >> Don't we *have* to always support bytes because other programs can >> create filenames 
containing bytes?
>
>
> Yes, but not every function has to support bytes.

Because there's no function overloading in Python, we then must have explicit typing conditionals.

I haven't the time to dig through and compare this with the other fine solutions presented; is there a reason that a proxy/facade PrimitiveType wouldn't solve for this?

    class TextThing:
        def __init__(self, data):
            # Keep the wrapped value and remember whether it was str or bytes.
            self.data = data
            self.type_ = type(data)

        def __getattr__(self, key):
            # Delegate everything else to the wrapped str/bytes object.
            return getattr(self.data, key)

>
>
> --
> ~Ethan~

From burkhardameier at gmail.com Mon Apr 18 21:13:01 2016
From: burkhardameier at gmail.com (Burkhard Meier)
Date: Mon, 18 Apr 2016 18:13:01 -0700
Subject: [Python-Dev] My first post here ~ do you need more Python core developers on Windows?
In-Reply-To:
References:
Message-ID:

Thank you for the warm welcome and the links. I will definitely check them out.

Burkhard

On Mon, Apr 18, 2016 at 1:16 AM, Victor Stinner wrote:
> 2016-04-18 7:23 GMT+02:00 Burkhard Meier :
> > My name is Burkhard Meier and I wrote the "Python GUI Programming Cookbook"
> > published by Packt.
> >
> > It is available on Amazon and PacktPub.com.
>
> Welcome!
>
> > Maybe I can become more involved in the Python community as a Python
> > developer on Windows.
>
> You can use the Developer Guide to start:
> https://docs.python.org/devguide/
>
> See also the Python mentors to get help on a dedicated and private
> mailing list:
> http://pythonmentors.com/
>
> Sadly yes, we have many open issues specific to Windows.
> I'm trying to sometimes give time to fix some of them, but I'm less
> interested than in open source operating systems ;-)
>
> Victor

From victor.stinner at gmail.com Tue Apr 19 06:27:38 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 19 Apr 2016 12:27:38 +0200
Subject: [Python-Dev] PEP 509: Add a private version to dict (version 3)
Message-ID:

Hi,

Below is the third version of my PEP 509 (dict version). Changes since
version 2:

* __setitem__() and update() now always increase the version: remove
  the micro-optimization on "dict[key] is new_value". Exception: the
  version is not changed when dict.update() is called without arguments.
* be more explicit on version++: explain that the operation must be
  atomic, and that dict methods are already atomic thanks to the GIL
* Usage of the dict version: add Cython
* "Guard against changing dict during iteration": don't guess if the
  new dict version can be used or not. Let's discuss that later.
* rephrase/complete some sections
* add links to new threads on python-dev

I hope that I addressed all of Jim's concerns about version 2.

Note: I also updated the implementation. The implementation now
contains more tests for identical values and more tests on equal
values.

HTML version:
https://www.python.org/dev/peps/pep-0509/

PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6

Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at
each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.

Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions.
For example, the ``LOAD_GLOBAL`` instruction looks up a variable in the global namespace, or in the builtins namespace (two dict lookups). Python uses ``dict`` for the builtins namespace, globals namespace, type namespaces, instance namespaces, etc. The local namespace (function namespace) is usually optimized to an array, but it can be a dict too. Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, ... can be modified at runtime. Implementing optimizations respecting the Python semantics requires to detect when "something changes": we will call these checks "guards". The speedup of optimizations depends on the speed of guard checks. This PEP proposes to add a private version to dictionaries to implement fast guards on namespaces. Dictionary lookups can be skipped if the version does not change which is the common case for most namespaces. The version is globally unique, so checking the version is also enough to check if the namespace dictionary was not replaced with a new dictionary. When the dictionary version does not change, the performance of a guard does not depend on the number of watched dictionary entries: the complexity is O(1). Example of optimization: copy the value of a global variable to function constants. This optimization requires a guard on the global variable to check if it was modified. If the global variable is not modified, the function uses the cached copy. If the global variable is modified, the function uses a regular lookup, and maybe also deoptimize the function (to remove the overhead of the guard check for next function calls). See the `PEP 510 -- Specialized functions with guards `_ for the concrete usage of guards to specialize functions and for a more general rationale on Python static optimizers. 
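The versioning scheme outlined above can be tried out without patching
CPython by emulating it in pure Python. The following is only a rough
sketch under stated assumptions: ``VersionedDict``, ``_next_version`` and
this ``dict_get_version()`` helper are hypothetical names invented for
illustration, not part of the PEP's C implementation, and only a few
mutating methods are instrumented.

```python
import itertools

# Hypothetical stand-in for the global dictionary version: a process-wide
# counter, so that every version tag handed out is unique across all dicts.
_next_version = itertools.count(1)


class VersionedDict(dict):
    """Illustrative emulation of a dict with a globally unique version tag.

    Assumption: only __setitem__, __delitem__ and clear() are instrumented;
    update(), pop(), popitem() and setdefault() are left out for brevity.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Creating a dictionary consumes a new global version.
        self.version_tag = next(_next_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Always bump, even when the stored value is unchanged.
        self.version_tag = next(_next_version)

    def __delitem__(self, key):
        super().__delitem__(key)  # raises KeyError before any bump
        self.version_tag = next(_next_version)

    def clear(self):
        if self:  # an empty dict is left untouched
            super().clear()
            self.version_tag = next(_next_version)


def dict_get_version(d):
    """Hypothetical accessor mirroring the function used in the pseudo-code."""
    return d.version_tag


namespace = VersionedDict(x=1)
other = VersionedDict()
before = dict_get_version(namespace)

value = namespace["x"]            # reads do not change the version
unchanged = dict_get_version(namespace) == before

namespace["x"] = 1                # a write bumps it, even with an equal value
after = dict_get_version(namespace)

print(unchanged, after != before, after != dict_get_version(other))
```

Because the counter is shared by all instances, an unchanged tag implies
both "same dictionary" and "not modified", which is the property a guard
relies on to skip the lookup.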
Guard example
=============

Pseudo-code of a fast guard to check if a dictionary entry was modified
(created, updated or deleted) using a hypothetical
``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False

Usage of the dict version
=========================

Speedup method calls
--------------------

Yury Selivanov wrote a `patch to optimize method calls `_. The patch
depends on the `"implement per-opcode cache in ceval" `_ patch which
requires dictionary versions to invalidate the cache if the globals
dictionary or the builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it in a
different namespace, using ``exec()`` with the *globals* parameter for
example. In this case, the globals dictionary was replaced and the
cache must also be invalidated.

Specialized functions using guards
----------------------------------

The `PEP 510 -- Specialized functions with guards `_ proposes an API to
support specialized functions with guards. It allows implementing
static optimizers for Python without breaking the Python semantics.

The `fatoptimizer `_ of the `FAT Python `_ project is an example of a
static Python optimizer.
It implements many optimizations which require guards on namespaces:

* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
  ``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
  guards on ``builtins.__dict__['range']`` and ``globals()['range']``
  are required
* etc.

Pyjion
------

According to Brett Cannon, one of the two main developers of Pyjion,
Pyjion can benefit from the dictionary version to implement
optimizations.

`Pyjion `_ is a JIT compiler for Python based upon CoreCLR (Microsoft
.NET Core runtime).

Cython
------

Cython can benefit from the dictionary version to implement
optimizations.

`Cython `_ is an optimising static compiler for both the Python
programming language and the extended Cython programming language.

Unladen Swallow
---------------

Even though the dictionary version was not explicitly mentioned,
optimizing globals and builtins lookup was part of the Unladen Swallow
plan: "Implement one of the several proposed schemes for speeding
lookups of globals and builtins." (source: `Unladen Swallow
ProjectPlan `_).

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: `Unladen Swallow
Retrospective `_.

Changes
=======

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a
global dictionary version. Each time a dictionary is created, the
global version is incremented and the dictionary version is
initialized to the global version.
The global version is also incremented and copied to the dictionary
version at each dictionary change:

* ``clear()`` if the dict is non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the key does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` always increases the version
* ``update(...)`` if called with arguments

The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects ``dict`` methods to make them atomic.

Example using a hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

``dict.__setitem__(key, value)`` and ``dict.update(...)`` always
increase the version, even if the new value is identical or is equal
to the current value (even if ``(dict[key] is value) or (dict[key] ==
value)``).

The field is called ``ma_version_tag``, rather than ``ma_version``, to
suggest comparing it using ``version_tag == old_version_tag``, rather
than ``version <= old_version`` which is wrong most of the time after
an integer overflow.

Backwards Compatibility
=======================

Since the ``PyDictObject`` structure is not part of the stable ABI and
the new dictionary version is not exposed at the Python level, changes
are backward compatible.

Implementation and Performance
==============================

The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject `_
contains a patch implementing this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations.
For example, the following timeit micro-benchmark takes 318 nanoseconds
before and after the change::

    python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns
for a dictionary lookup, whereas a guard check only takes 3.8 ns.
Moreover, a guard can watch for multiple keys. For example, for an
optimization using 10 global variables in a function, 10 dictionary
lookups cost 148 ns, whereas the guard still only costs 3.8 ns when
the version does not change (39x as fast).

The `fat module `_ implements such guards: ``fat.GuardDict`` is based
on the dictionary version.

Integer overflow
================

The implementation uses the C type ``PY_UINT64_T`` to store the
version: a 64-bit unsigned integer. The C code uses ``version++``. On
integer overflow, the version is wrapped to ``0`` (and then continues
to be incremented) according to the C standard.

After an integer overflow, a guard can succeed even though the watched
dictionary key was modified. The bug only occurs at a guard check if
there are exactly ``2 ** 64`` dictionary creations or modifications
since the previous guard check.

If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
take longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.

Alternatives
============

Expose the version at Python level as a read-only __version__ property
----------------------------------------------------------------------

The first version of the PEP proposed to expose the dictionary version
as a read-only ``__version__`` property at Python level, and also to
add the property to ``collections.UserDict`` (since this type must
mimic the ``dict`` API).
There are multiple issues:

* To be consistent and avoid bad surprises, the version must be added
  to all mapping types. Implementing a new mapping type would require
  extra work for no benefit, since the version is only required on the
  ``dict`` type in practice.
* All Python implementations would have to implement this new property,
  which gives more work to other implementations, whereas they may not
  use the dictionary version at all.
* Exposing the dictionary version at the Python level can lead to false
  assumptions about performance. Checking ``dict.__version__`` at the
  Python level is not faster than a dictionary lookup. A dictionary
  lookup in Python has a cost of 48.7 ns and checking the version has a
  cost of 47.5 ns, the difference is only 1.2 ns (3%)::

    $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
    10000000 loops, best of 3: 0.0487 usec per loop
    $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
    10000000 loops, best of 3: 0.0475 usec per loop

* The ``__version__`` can be wrapped on integer overflow. It is error
  prone: using ``dict.__version__ <= guard_version`` is wrong,
  ``dict.__version__ == guard_version`` must be used instead to reduce
  the risk of a bug on integer overflow (even if the integer overflow
  is unlikely in practice).

Mandatory bikeshedding on the property name:

* ``__cache_token__``: name proposed by Nick Coghlan, name coming from
  `abc.get_cache_token() `_.
* ``__version__``
* ``__version_tag__``
* ``__timestamp__``

Add a version to each dict entry
--------------------------------

A single version per dictionary requires the guard to keep a strong
reference to the value, which can keep the value alive longer than
expected. If we also add a version per dictionary entry, the guard can
store only the entry version (a simple integer) to avoid the strong
reference to the value: only strong references to the dictionary and
to the key are needed.
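The strong-reference concern can be observed directly. A small sketch,
assuming a hypothetical ``ValueGuard`` class that mirrors the
``self.value`` attribute of the earlier guard pseudo-code (the class and
the ``Payload`` type are illustrative only, not part of the PEP): as long
as the guard exists, the watched value cannot be released, even after it
has been deleted from the dictionary.

```python
import gc
import weakref


class Payload:
    """Illustrative object whose lifetime is observed through a weakref."""


class ValueGuard:
    """Hypothetical guard that remembers the watched value itself."""

    def __init__(self, mapping, key):
        self.mapping = mapping
        self.key = key
        self.value = mapping.get(key)  # strong reference to the value


namespace = {"x": Payload()}
probe = weakref.ref(namespace["x"])  # does not keep the payload alive
guard = ValueGuard(namespace, "x")

# The entry is removed, but the guard still references the old value.
del namespace["x"]
gc.collect()
alive_while_guarded = probe() is not None

# Dropping the guard releases the last strong reference.
del guard
gc.collect()
alive_after_guard = probe() is not None

print(alive_while_guarded, alive_after_guard)
```

A guard that stored only an integer entry version would have no need for
``self.value`` and so would not extend the value's lifetime in this way.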
Changes: add a ``me_version_tag`` field to the ``PyDictKeyEntry``
structure, the field has the C type ``PY_UINT64_T``. When a key is
created or modified, the entry version is set to the dictionary version
which is incremented at any change (create, modify, delete).

Pseudo-code of a fast guard to check if a dictionary key was modified
using hypothetical ``dict_get_version(dict)`` and
``dict_get_entry_version(dict, key)`` functions::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary to read the entry version
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                self.entry_version = entry_version
                return True

            # the key was modified
            return False

The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or not used). For
example, it increases the size of each dictionary entry by 8 bytes on
a 64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.
Examples:

* `PEP 393 -- Flexible String Representation `_
* `PEP 412 -- Key-Sharing Dictionary `_

Add a new dict subtype
----------------------

Add a new ``verdict`` type, a subtype of ``dict``. When guards are
needed, use the ``verdict`` for namespaces (module namespace, type
namespace, instance namespace, etc.) instead of ``dict``.
Leave the ``dict`` type unchanged to not add any overhead (CPU, memory
footprint) when guards are not used.

Technical issue: a lot of C code in the wild, including CPython core,
expects the exact ``dict`` type. Issues:

* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
  uses ``globals={}``. It is not possible to cast the ``dict`` to a
  ``dict`` subtype because the caller expects the ``globals`` parameter
  to be modified (``dict`` is mutable).
* C functions directly call ``PyDict_xxx()`` functions, instead of
  calling ``PyObject_xxx()`` if the object is a ``dict`` subtype
* ``PyDict_CheckExact()`` check fails on ``dict`` subtype, whereas some
  functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for
  namespaces

The ``exec()`` issue is a blocker issue.

Other issues:

* The garbage collector has special code to "untrack" ``dict``
  instances. If a ``dict`` subtype is used for namespaces, the garbage
  collector can be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken
  for ``dict`` subtypes, and so it would make Python a little bit
  slower.

Prior Art
=========

Method cache and type version tag
---------------------------------

In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
was merged into Python 2.6. The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure).

The type version tag is not exposed at the Python level.

The version tag has the C type ``unsigned int``. The cache is a global
hash table of 4096 entries, shared by all types. The cache is global to
"make it fast, have a deterministic and low memory footprint, and be
easy to invalidate". Each cache entry has a version tag. A global
version tag is used to create the next version tag, it also has the C
type ``unsigned int``.
By default, a type has its "valid version tag" flag cleared to indicate that the version tag is invalid. When the first method of the type is cached, the version tag and the "valid version tag" flag are set. When a type is modified, the "valid version tag" flag of the type and its subclasses is cleared. Later, when a cache entry of these types is used, the entry is removed because its version tag is outdated. On integer overflow, the whole cache is cleared and the global version tag is reset to ``0``. See `Method cache (issue #1685986) `_ and `Armin's method cache optimization updated for Python 2.6 (issue #1700288) `_. Globals / builtins cache ------------------------ In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue #10401) `_ which adds a private ``ma_version`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``Py_ssize_t``. The patch adds a "global and builtin cache" to functions and frames, and changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the cache. The change on the ``PyDictObject`` structure is very similar to this PEP. Cached globals+builtins lookup ------------------------------ In 2006, Andrea Griffini proposed a patch implementing a `Cached globals+builtins lookup optimization `_. The patch adds a private ``timestamp`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``size_t``. Thread on python-dev: `About dictionary lookup caching `_ (December 2006). Guard against changing dict during iteration -------------------------------------------- In 2013, Serhiy Storchaka proposed `Guard against changing dict during iteration (issue #19332) `_ which adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict`` type), the field has the C type ``size_t``. This field is incremented when the dictionary is modified. PySizer ------- `PySizer `_: a memory profiler for Python, Google Summer of Code 2005 project by Nick Smallbone. 
This project has a patch for CPython 2.4 which adds ``key_time`` and ``value_time`` fields to dictionary entries. It uses a global process-wide counter for dictionaries, incremented each time that a dictionary is modified. The times are used to decide when child objects first appeared in their parent objects. Discussion ========== Thread on the mailing lists: * python-dev: `Updated PEP 509 `_ * python-dev: `RFC: PEP 509: Add a private version to dict `_ * python-dev: `PEP 509: Add a private version to dict `_ (january 2016) * python-ideas: `RFC: PEP: Add dict.__version__ `_ (january 2016) Copyright ========= This document has been placed in the public domain. From stephen at xemacs.org Tue Apr 19 07:46:38 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 19 Apr 2016 20:46:38 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> Message-ID: <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> Brett Cannon writes: > On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull wrote: > Well, it makes *your* head hurt; It doesn't, because I have a different (and IMHO better) model. I can interpret yours without pain by comparing to that. 
> By providing os.fspath() I can say that I do not, under any > circumstances, want someone to guess at the encoding some bytes > path is under to get me a string and instead I want to start and > end entirely in a world of strings. IOW os.fspath() lets me work in > such a way that the instant bytes are introduced into my code for > file paths it triggers a TypeError. Does it really help you work that way? open is polymorphic, and will use os._raw_fspath(obj, (bytes,str)). Ditto os.scandir etc. If they don't, there's no point in supporting bytes returns from __fspath__, is there? Application code will normally not be calling os.fspath. In the future, pathlib will, I suppose, but even without os.fspath pathlib already protects you, as does antipathy.[1] More effective, then, is just to use pathlib for your Path-hacking work as soon as the path-representing object appears, and Path will complain about bytes for you. This is an analogue of the "decode bytes at the boundary" principle. > Yep, we are stuck with the names unless you want to propose a new > name and deprecate the old one. I already proposed fs_ensure_bytes and fs_ensure_str. I think they're sufficiently ugly to prove my point. Footnotes: [1] Strictly speaking, antipathy protects you from inadvertant mixing of bytes and str. From stephen at xemacs.org Tue Apr 19 07:55:55 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 19 Apr 2016 20:55:55 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <57153AA0.5090103@stoneleaf.us> References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> <57153AA0.5090103@stoneleaf.us> Message-ID: <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp> Ethan Furman writes: > On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote: > > Koos Zevenhoven writes: > > >> After all, we want something that's *almost* exclusively str. > > > > But we don't want that, AFAICT. Some clearly want this API to be > > unbiased against bytes in the same way the os APIs are unbiased, > > because that's what we've got in the current proposal. > > Are we reading the same thread? For my last several replies I am > very biased against bytes (and I know I'm not the only one). I'm not "reinterpreting" what people *write*, I'm looking at *the APIs they propose and advocate*. As I wrote, and you quoted. Except for the original proposal that only supported pathlib.Path, the facilities advocated are actually unbiased. It's just as easy to use bytes as str, but it's proposed not to advertise that fact. So what? A 'my.fspath' is trivial to write, and hard to get wrong AFAICS. Consider a truly biased alternative: __fspath__ of types like DirEntry would return self when bytes-oriented. (This addresses the issue of __fspath__ that coerces to str becoming a timebomb in bytes apps.) bytes-oriented applications would have to use DirEntry.path. 
No visible difference from now (you get the same API for bytes and the same TypeError from open), and no loss, except for str-envy. So use str! Why isn't that acceptable to you? Maybe even TOOWTDI? I really want to know. I'm not 100% sure that's the right way to go, mostly because Nick and Brett are signed up for polymorphism. But I sure haven't seen any explicit arguments for polymorphism, though I've asked for them. AFAICS, everybody just assumed that because some related APIs are polymorphic, this one should be, too, and dove into the problem of how to make a polymorphic API safe for Python 3. > If the client says "I'm okay with either" then I fully expect the > client to have code to properly handle str vs bytes after the > fspath (or whatever it's called) call. I would too, but, uh, examples of such clients? And no, antipathy isn't an example -- it doesn't consume bytes, it passes them through to the kind of client I want to hear about. AFAICS bytes return from __fspath__ is just YAGNI. Show me something that actually wants it. Steve From k7hoven at gmail.com Tue Apr 19 08:50:01 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 19 Apr 2016 15:50:01 +0300 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> <57153AA0.5090103@stoneleaf.us> <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. 
Turnbull wrote: > > AFAICS bytes return from __fspath__ is just YAGNI. Show me something > that actually wants it. It might be, but as long as bytes paths are supported polymorphicly all over the stdlib, we won't get rid of supporting bytes paths. So are you proposing to deprecate bytes paths? -Koos From victor.stinner at gmail.com Tue Apr 19 09:33:17 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 19 Apr 2016 15:33:17 +0200 Subject: [Python-Dev] PEP 509: Add a private version to dict (version 3) In-Reply-To: References: Message-ID: Hi, > Backwards Compatibility > ======================= > > Since the ``PyDictObject`` structure is not part of the stable ABI and > the new dictionary version not exposed at the Python scope, changes are > backward compatible. My current implementation inserts the new ma_version_tag field in the middle of the PyDictObject structure, so it obviously changes the ABI. Can someone please confirm (double check) that the PyDictObject structure is explicitly excluded from the stable ABI? I'm talking about about the "#ifndef Py_LIMITED_API" in Include/dictobject.h. I understood what is an ABI in the hard way. When I ran the perf.py benchmark, I got a crash in ctypes on django_v3. The ctypes module uses a C type which inherits from the dict type. I compiled Python with and without my patch in the same directory and then I renamed the ./python binary, but the _ctypes.so was shared between the two binaries. 
Victor From ncoghlan at gmail.com Tue Apr 19 10:26:44 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 20 Apr 2016 00:26:44 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> <57153AA0.5090103@stoneleaf.us> <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp> Message-ID: On 19 April 2016 at 21:55, Stephen J. Turnbull wrote: > I really want to know. I'm not 100% sure that's the right way to go, > mostly because Nick and Brett are signed up for polymorphism. But I > sure haven't seen any explicit arguments for polymorphism, though I've > asked for them. AFAICS, everybody just assumed that because some > related APIs are polymorphic, this one should be, too, and dove into > the problem of how to make a polymorphic API safe for Python 3. > In my case, it's ~5 years of peripheral involvement in porting the Fedora ecosystem to Python 3. I haven't personally done that much of the actual porting work, but I've spent plenty of time talking to the folks that are, and tweaking various things to make their lives easier where I could make the case that there was either a benefit to Python 3, or at least no harm to it. 
The gist of the motivation for bytes/str polymorphism here is similar to that for restoring __mod__ polymorphism in https://www.python.org/dev/peps/pep-0461/: the bytes/str duality is as much a fact of life when dealing with OS interfaces as it is when dealing with wire protocols, so if __fspath__ is polymorphic, then it's easier for compatibility modules like six and future to define their own "fspath" helper functions that work on both Python 2 and Python 3 across all supported platforms. This is also why I ended up proposing pushing the complexity down into a documented-but-underscore-prefixed API: folks writing pure Python 3 application code *really* shouldn't need to worry about the bytes support in the protocol, but for operating system level use cases, not having it readily available to 2/3 compatible Python code would be a pain. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.r.maier at gmx.de Tue Apr 19 07:38:48 2016 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 19 Apr 2016 13:38:48 +0200 Subject: [Python-Dev] Dependent packages not listed on PyPI Message-ID: <571618C8.5010002@gmx.de> Hi, I have a package "pywbem" which in its setup script specifies a number of dependent packages via "install_requires". I should also say that it extends setuptools/distutils with its own additional keywords, e.g. it adds a "develop_requires", but I believe (hope) that is irrelevant for my problem. 
In pywbem 0.8.3, the dependencies are: args = { ..., 'install_requires': [ 'six', 'ply', ], ..., } and when running on Python 2.x, an additional one is added, dependent on the OS platform and bit size: if sys.version_info[0] == 2: if platform.system() == 'Windows': if platform.architecture()[0] == '64bit': m2crypto_req = 'M2CryptoWin64>=0.21' else: m2crypto_req = 'M2CryptoWin32>=0.21' else: m2crypto_req = 'M2Crypto>=0.24' args['install_requires'] += [ m2crypto_req, ] The problem is that the pywbem package on PyPI does not show these dependencies: https://pypi.python.org/pypi/pywbem/0.8.3 I wonder whether this is the reason for a particular installation problem we have seen (https://github.com/pywbem/pywbem/issues/113). I do see other projects on PyPI, that show the dependencies they specify in their setup scripts, on their PyPI package page in a "*Requires Distributions*" section: * https://pypi.python.org/pypi/bandit/0.17.3 * https://pypi.python.org/pypi/json-spec/0.9.14 Many others also do not have their dependencies shown, including six, pbr, PyYAML, lxml, to name just a few. So far, I was unable to find out what the presence or absence of that information is related to, in the source of the project. Here are my questions: 1. What causes the "Requires Distributions" section on a PyPI package page to show up there? 2. Is it important to show up there (e.g. for some tools)? Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Tue Apr 19 12:17:51 2016 From: brett at python.org (Brett Cannon) Date: Tue, 19 Apr 2016 16:17:51 +0000 Subject: [Python-Dev] Dependent packages not listed on PyPI In-Reply-To: <571618C8.5010002@gmx.de> References: <571618C8.5010002@gmx.de> Message-ID: Questions about PyPI should be directed at the distutils-sig mailing list. 
On Tue, 19 Apr 2016 at 08:12 Andreas Maier wrote: > Hi, > I have a package "pywbem" which in its setup script specifies a number of > dependent packages via "install_requires". > > I should also say that it extends setuptools/distutils with its own > additional keywords, e.g. it adds a "develop_requires", but I believe > (hope) that is irrelevant for my problem. > > In pywbem 0.8.3, the dependencies are: > > args = { > ..., > 'install_requires': [ > 'six', > 'ply', > ], > ..., > } > > and when running on Python 2.x, an additional one is added, dependent on > the OS platform and bit size: > > if sys.version_info[0] == 2: > if platform.system() == 'Windows': > if platform.architecture()[0] == '64bit': > m2crypto_req = 'M2CryptoWin64>=0.21' > else: > m2crypto_req = 'M2CryptoWin32>=0.21' > else: > m2crypto_req = 'M2Crypto>=0.24' > args['install_requires'] += [ > m2crypto_req, > ] > > The problem is that the pywbem package on PyPI does not show these > dependencies: https://pypi.python.org/pypi/pywbem/0.8.3 > > I wonder whether this is the reason for a particular installation problem > we have seen (https://github.com/pywbem/pywbem/issues/113). > > I do see other projects on PyPI, that show the dependencies they specify > in their setup scripts, on their PyPI package page in a "*Requires > Distributions*" section: > > * https://pypi.python.org/pypi/bandit/0.17.3 > * https://pypi.python.org/pypi/json-spec/0.9.14 > > Many others also do not have their dependencies shown, including six, pbr, > PyYAML, lxml, to name just a few. > > So far, I was unable to find out what the presence or absence of that > information is related to, in the source of the project. > > Here are my questions: > > 1. What causes the "Requires Distributions" section on a PyPI package > page to show up there? > > 2. Is it important to show up there (e.g. for some tools)? 
> > Andy > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Tue Apr 19 12:50:07 2016 From: brett at python.org (Brett Cannon) Date: Tue, 19 Apr 2016 16:50:07 +0000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, 19 Apr 2016 at 04:46 Stephen J. Turnbull wrote: > Brett Cannon writes: > > On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull > wrote: > > > Well, it makes *your* head hurt; > > It doesn't, because I have a different (and IMHO better) model. I can > interpret yours without pain by comparing to that. > > > By providing os.fspath() I can say that I do not, under any > > circumstances, want someone to guess at the encoding some bytes > > path is under to get me a string and instead I want to start and > > end entirely in a world of strings. 
IOW os.fspath() lets me work in > > such a way that the instant bytes are introduced into my code for > > file paths it triggers a TypeError. > > Does it really help you work that way? open is polymorphic, and will > use os._raw_fspath(obj, (bytes,str)). Ditto os.scandir etc. If they > don't, there's no point in supporting bytes returns from __fspath__, > is there? You're leaving out all of the os.path functions, but you're right that if they didn't support it like Windows then this entire discussion of bytes paths would be moot. > Application code will normally not be calling os.fspath. > In the future, pathlib will, I suppose, but even without os.fspath > pathlib already protects you, as does antipathy.[1] > I disagree that application code won't be calling os.fspath. > > More effective, then, is just to use pathlib for your Path-hacking > work as soon as the path-representing object appears, and Path will > complain about bytes for you. This is an analogue of the "decode > bytes at the boundary" principle. > Ah, but you see that doesn't make porting easy. If I have a bunch of path-manipulating code using os.path already and I want to add support for pathlib I can either (a) rewrite all of that path-manipulating code to work using pathlib, or (b) simply call `path = os.fspath(path)` and be done with it. Basically if you have written any code that uses os.path then you will have to care about (a) or (b) as a way to add support for pathlib short of the `str(path)` hack we're all working to get away from. And if people truly liked option (a) then this conversation wouldn't be such a big deal as we would have seen more people using pathlib already (yes, the provisional tag may have scared some off, but my guess is it's more from not wanting to rewrite os.path-using code). 
Now if you can convince me that the use of bytes paths is very minimal and thus people doing path manipulations with them will be a very small minority then I'm happy to try and use this to keep pushing people towards avoiding bytes for file paths. But over the years people such as yourself, Stephen, have convinced me that people do some really crazy stuff with their file systems and that it isn't isolated to just one or two people. And so it becomes this situation where we need to ask ourselves if we are going to tell them to just deal with it or help them transition. The other way to convince me is that people needing to support older versions of Python will use `path = path.__fspath__() if hasattr(path, '__fspath__') else path` and that allowing bytes with that idiom is going to cost them dearly. My current assumption is that it won't because people using that idiom are using os.path and those functions will complain when mixing str and bytes together, but I'm open to being convinced otherwise. I guess what I'm trying to get at is that I understand the desire to get people to get the bytes path habit, but to me the best way will be to get people quickly and easily transitioned over to pathlib as a carrot rather than using the lack of bytes path support in this transition as a stick. -Brett > > > Yep, we are stuck with the names unless you want to propose a new > > name and deprecate the old one. > > I already proposed fs_ensure_bytes and fs_ensure_str. I think they're > sufficiently ugly to prove my point. > > > Footnotes: > [1] Strictly speaking, antipathy protects you from inadvertant mixing > of bytes and str. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Tue Apr 19 18:22:46 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 19 Apr 2016 16:22:46 -0600 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon wrote: > Ah, but you see that doesn't make porting easy. If I have a bunch of > path-manipulating code using os.path already and I want to add support for > pathlib I can either (a) rewrite all of that path-manipulating code to work > using pathlib, or (b) simply call `path = os.fspath(path)` and be done with > it. Basically if you have written any code that uses os.path then you will > have to care about (a) or (b) as a way to add support for pathlib short of > the `str(path)` hack we're all working to get away from. And if people truly > liked option (a) then this conversation wouldn't be such a big deal as we > would have seen more people using pathlib already (yes, the provisional tag > may have scared some off, but my guess is it's more from not wanting to > rewrite os.path-using code). 
>
> Now if you can convince me that the use of bytes paths is very minimal and
> thus people doing path manipulations with them will be a very small minority
> then I'm happy to try and use this to keep pushing people towards avoiding
> bytes for file paths. But over the years people such as yourself, Stephen,
> have convinced me that people do some really crazy stuff with their file
> systems and that it isn't isolated to just one or two people. And so it
> becomes this situation where we need to ask ourselves if we are going to
> tell them to just deal with it or help them transition.
>
> The other way to convince me is that people needing to support older
> versions of Python will use `path = path.__fspath__() if hasattr(path,
> '__fspath__') else path` and that allowing bytes with that idiom is going to
> cost them dearly. My current assumption is that it won't because people
> using that idiom are using os.path and those functions will complain when
> mixing str and bytes together, but I'm open to being convinced otherwise.
>
> I guess what I'm trying to get at is that I understand the desire to get
> people to get the bytes path habit, but to me the best way will be to get
> people quickly and easily transitioned over to pathlib as a carrot rather
> than using the lack of bytes path support in this transition as a stick.

Perhaps I missed previous discussion on the point, but why not support
both __fspath__() -> str and __fssyspath__() -> bytes? Returning
NotImplemented would indicate "try the other one". For example,
DirEntry.__fspath__() would return NotImplemented when the underlying
value is bytes and vice-versa.

A str-specific os.fspath would look something like this:

    def fspath(path):
        try:
            fspath = type(path).__fspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered
        raise TypeError

...and a more lenient, polymorphic version (for use by os.path.*, etc.)
would look like this:

    def _fspath(path):
        try:
            fspath = type(path).__fspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered
        try:
            fspath = type(path).__fssyspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered
        # nothing to do
        return path

The hard distinction between the two dunder methods preserves the
conceptual str/bytes division we're aiming for. It will be much
easier to identify which path implementations are dealing with (or
supporting) bytes paths. Likewise with the two helpers and their
usage.

-eric

From brett at python.org Tue Apr 19 19:05:28 2016
From: brett at python.org (Brett Cannon)
Date: Tue, 19 Apr 2016 23:05:28 +0000
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To:
References: <5709309D.8030007@stoneleaf.us>
 <570A7C67.3010304@stoneleaf.us>
 <570BCE39.8090306@stoneleaf.us>
 <570BDB17.5000601@stoneleaf.us>
 <570BECC6.1080708@stoneleaf.us>
 <570C12C2.9000602@stoneleaf.us>
 <570D1F26.5090800@stoneleaf.us>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp>
 <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp>
Message-ID:

On Tue, 19 Apr 2016 at 15:22 Eric Snow wrote:

> On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon wrote:
> > Ah, but you see that doesn't make porting easy.
If I have a bunch of > > path-manipulating code using os.path already and I want to add support > for > > pathlib I can either (a) rewrite all of that path-manipulating code to > work > > using pathlib, or (b) simply call `path = os.fspath(path)` and be done > with > > it. Basically if you have written any code that uses os.path then you > will > > have to care about (a) or (b) as a way to add support for pathlib short > of > > the `str(path)` hack we're all working to get away from. And if people > truly > > liked option (a) then this conversation wouldn't be such a big deal as we > > would have seen more people using pathlib already (yes, the provisional > tag > > may have scared some off, but my guess is it's more from not wanting to > > rewrite os.path-using code). > > > > Now if you can convince me that the use of bytes paths is very minimal > and > > thus people doing path manipulations with them will be a very small > minority > > then I'm happy to try and use this to keep pushing people towards > avoiding > > bytes for file paths. But over the years people such as yourself, > Stephen, > > have convinced me that people do some really crazy stuff with their file > > systems and that it isn't isolated to just one or two people. And so it > > becomes this situation where we need to ask ourselves if we are going to > > tell them to just deal with it or help them transition. > > > > The other way to convince me is that people needing to support older > > versions of Python will use `path = path.__fspath__() if hasattr(path, > > '__fspath__') else path` and that allowing bytes with that idiom is > going to > > cost them dearly. My current assumption is that it won't because people > > using that idiom are using os.path and those functions will complain when > > mixing str and bytes together, but I'm open to being convinced otherwise. 
> > > > I guess what I'm trying to get at is that I understand the desire to get > > people to get the bytes path habit, but to me the best way will be to get > > people quickly and easily transitioned over to pathlib as a carrot rather > > than using the lack of bytes path support in this transition as a stick. > > Perhaps I missed previous discussion on the point, but why not support > both __fspath__() -> str and __fssyspath__() -> bytes? Returning > NotImplemented would indicate "try the other one". For example, > DirEntry.__fspath__() would return NotImplemented when the underlying > value is bytes and vice-versa. > It was deemed more complexity than necessary for the protocol to have two functions. Either __fspath__ will be polymorphic or it will only return str. -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Apr 19 19:33:44 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 20 Apr 2016 01:33:44 +0200 Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance In-Reply-To: References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> <56B35AB5.5090308@egenix.com> <56BDDEA3.2060702@egenix.com> Message-ID: Ping? Is someone still opposed to my change #26249 "Change PyMem_Malloc to use pymalloc allocator"? If no, I think that I will push my change. My change only changes two lines, so it can be easily reverted before CPython 3.6 if we detect major issues in third-party extensions. And maybe it's better to push such change today to get more time to play with it, than pushing it late in the development of CPython 3.6. The new PYTHONMALLOC=debug feature allows to quickly and easily check the usage of the PyMem_Malloc() API, even if Python is compiled in release mode. I checked multiple Python extensions written in C. I only found one bug in numpy and I sent a patch (not merged yet). 
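
As a rough illustration of trying the debug hooks described here, a script can launch a child interpreter with PYTHONMALLOC=debug set in its environment. This is only a sketch, assuming the running interpreter is Python 3.6 or later (older versions simply ignore the variable); `run_with_debug_malloc` is a hypothetical helper name, not something from CPython.

```python
import os
import subprocess
import sys


def run_with_debug_malloc(code):
    """Run `code` in a child interpreter with PYTHONMALLOC=debug set.

    Hypothetical helper: returns (returncode, stdout, stderr) of the child.
    """
    env = dict(os.environ)
    env["PYTHONMALLOC"] = "debug"  # read by CPython 3.6+ at startup
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout, result.stderr


returncode, out, err = run_with_debug_malloc("print('hello')")
print(returncode, out.strip())
```

The same approach could wrap a C extension's test suite instead of a one-liner, which is closer to the kind of check described in this message.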
victor 2016-03-15 0:19 GMT+01:00 Victor Stinner : > 2016-02-12 14:31 GMT+01:00 M.-A. Lemburg : >>>> If your program has bugs, you can use a debug build of Python 3.5 to >>>> detect misusage of the API. >> >> Yes, but people don't necessarily do this, e.g. I have >> for a very long time ignored debug builds completely >> and when I started to try them, I found that some of the >> things I had been doing with e.g. free list implementations >> did not work in debug builds. > > I just added support for debug hooks on Python memory allocators on > Python compiled in *release* mode. Set the environment variable > PYTHONMALLOC to debug to try with Python 3.6. > > I added a check on PyObject_Malloc() debug hook to ensure that the > function is called with the GIL held. I opened an issue to add a > similar check on PyMem_Malloc(): > https://bugs.python.org/issue26563 > > >> Yes, but those are part of the stdlib. You'd need to check >> a few C extensions which are not tested as part of the stdlib, >> e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom >> types in C since these will often need the memory management >> APIs). >> >> It may also be a good idea to check wrapper generators such >> as cython, swig, cffi, etc. > > I ran the test suite of numpy, lxml, Pillow and cryptography (used cffi). > > I found a bug in numpy. numpy calls PyMem_Malloc() without holding the GIL: > https://github.com/numpy/numpy/pull/7404 > > Except of this bug, all other tests pass with PyMem_Malloc() using > pymalloc and all debug checks. > > Victor From stephen at xemacs.org Tue Apr 19 23:11:16 2016 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 20 Apr 2016 12:11:16 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> <57153AA0.5090103@stoneleaf.us> <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp> Message-ID: <22294.62292.860019.21366@turnbull.sk.tsukuba.ac.jp> Koos Zevenhoven writes: > On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull wrote: > > > > AFAICS bytes return from __fspath__ is just YAGNI. Show me something > > that actually wants it. > > It might be, May I take that as meaning you just jumped to the conclusion that extending polymorphism is useful on no actual evidence of usefulness? > but as long as bytes paths are supported polymorphicly all over the > stdlib, we won't get rid of supporting bytes paths. So are you > proposing to deprecate bytes paths? You claim "almost always want str", Ethan claims "bias against bytes." Sorry, guys, you can't have it both ways. Either bytes paths are discouraged (not "deprecated", not yet), or they aren't. I say, let's not encourage them. Ie, keep the status quo for bytes, and make things better for the preferred str. Yes, that means discouraging bytes relative to str in this context. That's a Python 3 principle, one strong enough to justify the huge compatibility break involved in making str be Unicode. That compatibility break has been extremely successful in my personal experience as a sometime Python teacher and Mailman developer, though the Mercurial developers have a different POV. 
From stephen at xemacs.org Tue Apr 19 23:16:29 2016 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 20 Apr 2016 12:16:29 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> Message-ID: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> Brett Cannon writes: > Now if you can convince me that the use of bytes paths is very > minimal I doubt that I can do that, because all that Python 2 code is effectively bytes. To the extent that people are just passing it into their bytes-domain code and it works for them, they probably "port" to Python 3 by using bytes for paths. I just don't think bytes usage per se matters to the issue of polymorphism of __fspath__. > Ah, but you see that doesn't make porting easy. If I have a bunch > of path-manipulating code using os.path already and I want to add > support for pathlib I can either (a) rewrite all of that > path-manipulating code to work using pathlib, or (b) simply call > `path = os.fspath(path)` and be done with it. OK, so what matters here is not "how many people are using bytes". They can keep using os.path, which is what they probably have already been using. 
What we are worrying about is that (1) some really attractive producer of pathlib.Paths will be published, and (2) people will want to plug that producer into their bytes paths consumers using os.fspath(path) "and be done with it". Excuse me, but that doesn't make sense as written. Path.__fspath__ will return str, in any case. So these developers have to consume text to use pathlib, even merely as a consumer of Paths. No need for polymorphism here, simply because it won't be used in this instance. What's left is DirEntry (and perhaps other producers of byte-oriented objects in os and os.path). If they're currently using DirEntry, they're currently accessing .path. Surely bytes users can continue doing that, even if we offer str users the advantage of new protocols? I conclude that there is no real use in having a polymorphic __fspath__ unless callers of os.fspath can communicate desired return type to it, and it implicitly coerces to that type. But then open and friends *implicitly* consume __fspath__. So there probably needs to be a way to communicate the desired type to them in the case where they receive an __fspath__-bearing object so they can tell os.fspath what their callers want, no? Supporting both "pipeline polymorphism" of this kind and implicit conversion protocols at the same time is quite complicated, I think. > [Folks] have convinced me that people do some really crazy stuff > with their file systems and that it isn't isolated to just one or > two people. And so it becomes this situation where we need to ask > ourselves if we are going to tell them to just deal with it or help > them transition. People who have to deal with really crazy stuff in filesystems are already manipulating paths as text. It's not we who need help with the transition that matters (bytes to text). We can use os.path or pathlib, but bytes just don't matter because we're not using them in path manipulations. 
It's people who live in monolingual mono-encoding environments who will
be using bytes successfully, and be resistant to costly changes that
don't make their lives better. But the bytes vs. text cost is inherent
in using pathlib, so polymorphism doesn't help promote pathlib. It
might help promote use of os.scandir in bytes-oriented code, though I
don't see that as a huge effect nor more than mildly desirable. Is it?

Steve

From stephen at xemacs.org Tue Apr 19 23:19:31 2016
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 20 Apr 2016 12:19:31 +0900
Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
In-Reply-To:
References: <5709309D.8030007@stoneleaf.us>
 <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com>
 <570E659C.8010108@stoneleaf.us>
 <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com>
 <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp>
 <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp>
 <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp>
 <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp>
 <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp>
 <57153AA0.5090103@stoneleaf.us>
 <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp>
Message-ID: <22294.62787.882889.852338@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > The gist of the motivation for bytes/str polymorphism here is similar to
 > that for restoring __mod__ polymorphism in
 > https://www.python.org/dev/peps/pep-0461/:

I don't think it is, actually. Filenames off the wire cannot be
relied on to be in the local file system encoding, and that matters.
The semantics of a filename or path requires getting the encodings
matched. You cannot be encoding-agnostic.

On the other hand, streams of characters are merely a special case of
streams of tokens, and the principles that apply to editing streams of
characters apply to more general tokens, including bytes and XML.
You *can* be content-agnostic as long as you define semantics in terms of moving tokens around, and not in terms of their content. BTW, my opposition to PEP 461 was based on the same mistake with opposite polarity: I think of bytes as encoded text *first*, and therefore feared PEP 461 for quite insufficient reason. Most applications of PEP 461 won't be for text. > This is also why I ended up proposing pushing the complexity down into a > documented-but-underscore-prefixed API: folks writing pure Python 3 > application code *really* shouldn't need to worry about the bytes > support You can't have that with your proposal. They are going to (at least in theory) get a new TypeError which they will not be expecting (vs bytes, which are implicit in the object they have, where previously they would have got one vs. Path or DirEntry which they were expecting). So they will have to learn that much about bytes support. > in the protocol, but for operating system level use cases, not having it > readily available to 2/3 compatible Python code would be a pain. Erm, how do you propose to make this protocol available to Python-2- compatible code? Pervasively monkey-patch the Python 2 os module? Even if so, is it our responsibility to worry about that? BTW, I came to this conclusion thinking about the poster boy for PEP 461, Mercurial. 
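
For readers following the thread, the backwards-compatible idiom Brett mentions earlier (`path.__fspath__() if hasattr(path, '__fspath__') else path`) could be wrapped in a small helper of the kind a compatibility library might define. This is only a sketch: `compat_fspath` and `FakePath` are hypothetical names, not part of six, future, or the stdlib, and the str/bytes check at the end is an assumption about what such a helper would want rather than anything agreed in the discussion.

```python
def compat_fspath(path):
    """Return the str/bytes form of `path` using the hasattr idiom.

    Hypothetical helper; written to run unchanged on Python 2 and 3.
    """
    if hasattr(path, '__fspath__'):
        path = path.__fspath__()
    # Assumption: anything that is neither text nor bytes is an error.
    text_type = type(u'')
    if not isinstance(path, (text_type, bytes)):
        raise TypeError('expected str, bytes or an object with '
                        '__fspath__, got %s' % type(path).__name__)
    return path


class FakePath(object):
    """Hypothetical stand-in for a path object such as pathlib.Path."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


print(compat_fspath(FakePath('spam/eggs.txt')))  # object unwrapped via __fspath__
print(compat_fspath(b'spam/eggs.txt'))           # bytes pass through unchanged
```

Whether a bytes value returned from `__fspath__` should be accepted by such a helper is exactly the point under debate, so the final type check is the part most likely to differ in a real implementation.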
From rosuav at gmail.com Tue Apr 19 23:34:44 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 20 Apr 2016 13:34:44 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Apr 20, 2016 at 1:16 PM, Stephen J. Turnbull wrote: > Brett Cannon writes: > > > Now if you can convince me that the use of bytes paths is very > > minimal > > I doubt that I can do that, because all that Python 2 code is > effectively bytes. To the extent that people are just passing it into > their bytes-domain code and it works for them, they probably "port" to > Python 3 by using bytes for paths. I just don't think bytes usage per > se matters to the issue of polymorphism of __fspath__. > I would prefer to see this kind of code ported to Python 3 by using native strings. 
Python 2 code:

    import json
    with open(".config/obs-studio/basic/scenes/Standard.json") as f:
        data = json.load(f)
    for scene in data["scene_order"]:
        print scene["name"]

Python 3 code:

    import json
    with open(".config/obs-studio/basic/scenes/Standard.json") as f:
        data = json.load(f)
    for scene in data["scene_order"]:
        print(scene["name"])

The bulk of path string literals in Python programs will be all-ASCII.
Porting to Py3 won't fundamentally change this code, yet suddenly now
it's using Unicode strings. In reality, both versions of this example
are using *text* strings. The Py3 version has text in the source code,
a stream of Unicode codepoints in the runtime, and then (since I ran
this on Linux) encodes that to bytes for the file system. The Py2
version just does that conversion a little earlier: text in the source
code, a stream of eight-bit "texty bytes" in the runtime, and those
same bytes get given to the fs.

There's no reason to slap a b"..." prefix on every path for Py3. There
might be specific situations where you want that, but for the most
part, those paths came from human-readable text anyway, so they should
stay that way.

ChrisA

From stephen at xemacs.org Wed Apr 20 02:31:33 2016
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Wed, 20 Apr 2016 15:31:33 +0900 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> Message-ID: Eric Snow writes: > On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon wrote: > > Ah, but you see that doesn't make porting easy. > Perhaps I missed previous discussion on the point, but why not support > both __fspath__() -> str and __fssyspath__() -> bytes? That's fine by me, I can live with that although I don't really like it. But the proponents of polymorphic __fspath__ think it's unnecessary. Why I don't like it: what's going to end up happening is that a __fspath__- or __fssyspath__-bearing object of unknown provenance is going to get passed to polymorphic os functions that won't complain, and a few million cycles later something is going to access fileobj.path expecting bytes and getting str, and blooey! Also I just don't see a need for bytes when the original purpose of this was to support passing pathlib.Path objects to open. It's also nice to pass DirEntry objects to open, but it's not obvious to me that we need to support bytes since only new code can use this feature, and there's a way to not-support them that doesn't cause any new problems. 
It's not that I want bytes to go away[1], it's just that the playing field will tilt a little more against them in new code. Footnotes: [1] I wouldn't weep, but I wouldn't laugh, either. From pjenvey at underboss.org Wed Apr 20 03:54:59 2016 From: pjenvey at underboss.org (Philip Jenvey) Date: Wed, 20 Apr 2016 00:54:59 -0700 Subject: [Python-Dev] Bytes path In-Reply-To: References: Message-ID: Yes, in the 3.2 time frame there was a consensus that only bytes and their subclasses should be accepted. buffer support crept back into the posix module with the major changes in 3.3, likely by mistake. A couple new issues are proposed to remove these inconsistencies/regressions: http://bugs.python.org/issue26754 http://bugs.python.org/issue26800 -- Philip Jenvey > On Apr 14, 2016, at 3:29 AM, Victor Stinner wrote: > > IMHO it's more a side effect of the implementation than a deliberate choice. For new code which really want to support bytes paths, I suggest to only accept bytes and bytes subclasses. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From k7hoven at gmail.com Wed Apr 20 04:20:10 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 20 Apr 2016 11:20:10 +0300 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22294.62292.860019.21366@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13468.902038.792109@turnbull.sk.tsukuba.ac.jp> <57153AA0.5090103@stoneleaf.us> <22294.7371.765793.202689@turnbull.sk.tsukuba.ac.jp> <22294.62292.860019.21366@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Apr 20, 2016 at 6:11 AM, Stephen J. Turnbull wrote: > Koos Zevenhoven writes: > > On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull wrote: > > > > > > AFAICS bytes return from __fspath__ is just YAGNI. Show me something > > > that actually wants it. > > > > It might be, > > May I take that as meaning you just jumped to the conclusion that > extending polymorphism is useful on no actual evidence of usefulness? No you may not! YAGNI almost never means "you are *never* going to need it". And if you implement a feature, better implement it well. If a variation of the feature is rarely used, that is perfectly fine. I think leaving bytes out would complicate things. If os.fspath does its job well, everyone should be happy. I kept bringing up bytes paths, because that is already a feature in Python 3. Then (already some time ago in these discussions) I briefly visited the thought of 'can we deprecate bytes paths', and it then quickly became clear to me that is not going to happen any time soon. 
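One reason bytes paths are hard to retire is that a file name on some systems can be a byte sequence that is not valid in the expected encoding. The sketch below assumes UTF-8 and uses the `surrogateescape` error handler directly (the handler `os.fsdecode` and `os.fsencode` rely on on most POSIX systems); the file name is invented for the example.

```python
# A Latin-1 encoded name; the byte 0xE9 is not valid UTF-8 in this position.
raw_name = b"caf\xe9.txt"

# Decoding with surrogateescape smuggles the bad byte through as U+DCE9.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text_name.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate, so the str is not "real" text.
try:
    text_name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ascii(text_name), restored == raw_name, strict_ok)
```

The round trip is lossless, but the intermediate str cannot be printed or written out strictly, which is why some code prefers to stay in bytes at that level.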
In other words: As long as bytes paths are supported, they should be supported consistently. I don't want DirEntry to behave differently when the underlying type is bytes, which is one of the things I've been talking about all the time. That would just be broken. And as you also understand, one point is to allow passing DirEntry to open. Or any of the os.path functions. And some more: I don't want open(direntry_obj) to ever raise because it is the bytes flavor of direntry, because, when they are created, DirEntry objects always point to existing objects on the file system. I also don't want implicit conversions between str and bytes paths, because there are cases where they will produce strange results and exceptions. [Yes, way back in the p-string thread, I did first suggest a similar thing that implied implicit conversion, but I soon abandoned that part.] Not that I will ever use these features---just to do this right. > > but as long as bytes paths are supported polymorphically all over the > > stdlib, we won't get rid of supporting bytes paths. So are you > > proposing to deprecate bytes paths? > > You claim "almost always want str", Ethan claims "bias against bytes." > Sorry, guys, you can't have it both ways. Either bytes paths are > discouraged (not "deprecated", not yet), or they aren't. > > I say, let's not encourage them. It's all essentially the same thing: "almost always want str": Yes, I still claim this. This is the reason for str (and rejecting bytes) being the default for third-party code. If we wanted to, we could even leave bytes support out of the documentation, so no-one will know about it unless they already deal with bytes paths. However, I don't think we should do that---we should just strongly discourage using the bytes version unless there is a reason to, and you know what you are doing. "bias against bytes": I agree with this too. This is in line with making str (and rejecting bytes) the default for third-party code. 
"let's not encourage them": And I even agree with this, as you may have noticed. I just don't believe in deliberately making implementations awkward for the bytes-based paths. Bytes paths already exist, not because of Python 2 (as you know), but because not all operating systems guarantee that paths make sense in any encoding, and people may need to work at that level. There is no need to make working with bytes-based paths awkward, and we can support them with little additional work compared to supporting str-based rich path objects. The additional work is mostly this discussion. > Ie, keep the status quo for bytes, > and make things better for the preferred str. Yes, that means > discouraging bytes relative to str in this context. That's a Python 3 > principle, one strong enough to justify the huge compatibility break > involved in making str be Unicode. That compatibility break has been > extremely successful in my personal experience as a sometime Python > teacher and Mailman developer, though the Mercurial developers have a > different POV. Yes. Luckily, people are already using str-based paths. We don't need any more discrete transitions. If linux will start to enforce an encoding, as Guido and Random832 may be suggesting on python-ideas, these already obscure bytes paths will slowly fade away. 
-Koos From k7hoven at gmail.com Wed Apr 20 06:19:50 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 20 Apr 2016 13:19:50 +0300 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Apr 20, 2016 at 6:16 AM, Stephen J. Turnbull wrote: > > (1) some really attractive producer of pathlib.Paths will be > published, and > Yes, pathlib is str-only, so this sounds just right. > (2) people will want to plug that producer into their bytes paths > consumers using os.fspath(path) "and be done with it". > No, fspath can't know that is the right thing to do. There should be *someone* that is aware of the encoding that happens, either the provider or the consumer. That byte path consumer, assuming it wants to support the behavior you describe, should use os.fsencode instead of os.fspath, which will do exactly what you want, and is just as easy for the bytes path consumer to implement! 
(Unless you want to explicitly reject plain str objects, which you would then indeed do *explicitly*, but I'm not sure there is a point in accepting plain bytes and str-based pathlib objects but not str). To avoid further unnecessary discussion, please read [1] carefully, where I already explained this, among other things. -Koos [1] https://mail.python.org/pipermail/python-dev/2016-April/144239.html From victor.stinner at gmail.com Wed Apr 20 07:52:22 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 20 Apr 2016 13:52:22 +0200 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP Message-ID: Hi, I'm unable to count the number of threads about the fspath protocol. It's even more difficult to count the total number of emails. IMHO everyone had enough time to give him/her opinion. We even had multiple summaries :-) Can you please wait for a PEP? Brett Canon and Ethan Furman are working on a PEP. So please give them time to write it. The PEP should summarize the discussion and help a lot to make concrete progress on the design (avoid restarting to discuss the same points forever). I don't expect that more emails would add anything at the current state of the discussion. I think that we have enough other topics to discuss in the meanwhile ;-) FYI there is already an article about fspath/pathlib on LWN. 
Here is a free link until the article is freely accessible: "Python looks at paths" By Jake Edge (April 13, 2016) https://lwn.net/SubscriberLink/683350/4f52334af09653c8/ Victor From k7hoven at gmail.com Wed Apr 20 09:30:39 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 20 Apr 2016 16:30:39 +0300 Subject: [Python-Dev] Pathlib enhancements - improve fsdecode and fsencode In-Reply-To: <22287.16117.707682.669635@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16117.707682.669635@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Apr 14, 2016 at 9:55 AM, Stephen J. Turnbull wrote: > Please please please, junk both "filter out bytes" proposals. If you were referring to some of the fspath versions, I think we will need a bytes-rejecting version, for reasons explained in [1-2]. Of course not everyone wants or has to use it. > Since they involve an exception, they impose an unnecessary "try" on > all text applications that fear death on bytes returns. May as well > just wrap all objects with __fspath__ in fsdecode, and all is > happy. > > Counterproposal: make fsdecode and fsencode grok __fspath__. Then: Not being a native English speaker, I'm relying on a Wikipedia explanation of "grok", but if you mean that fsdecode and fsencode would accept objects that implement __fspath__, then I think we all agree on this. Making the stdlib accept path objects, after all, is the whole point of the pathlib discussions :). Anyway, I am happy that Nick [3] (and you [4] ?) 
pointed out that os.fsencode and os.fsdecode currently implement coercion, i.e., they both accept both str and bytes, and return just one of them. This was important for my conclusion in [1]. When these two functions are made __fspath__ compatible using `fspath(patharg, output_types = (str, bytes))`, like most os functions, they will indeed implement coercion to bytes or str from "any pathlike object". [Side note: One may, for instance, ask why os.fsdecode passes str objects through silently, even if they can't be decoded. Well, that's the way it is, and I'm not expecting that to change. But maybe fsdecode should have an additional keyword-only argument to tell them that it should strictly return something it actually did decode. (And similarly for os.fsencode.) But this has nothing to do with the path protocol we are discussing.] > (1) Bytes-lovers and str-addicts are both safe. I don't think everyone is safe if you can't say "I don't want implicit encoding/decoding". > (2) They can omit fspath, too! I think having *one* additional function for the non-encoding/non-decoding cases is too much, and as shown in [1], one is enough. > No, that doesn't work if the bytes objects aren't in the file system > encoding, but these are *bytes*, mon ami: you have no way to find out > what that encoding is, so you either know already and you substitute > that + fspath for fsdecode, or you're hosed. And in the only concrete > use case so far, fsdecode Just Works. Well, as you say yourself, fsdecode indeed works if your bytes are in the default fs encoding, and when you know they are, go for it, use fsdecode. But I, for instance, rarely have my paths as bytes. Therefore, I would be happy to get an exception if I'm accidentally passing bytes to some non-bytes-supporting function because I've forgotten to decode some input that I got in an encoding other than the file system encoding. 
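The difference between coercion and rejection discussed above can be sketched as follows. `os.fsencode` and `os.fsdecode` are the real functions; `require_str_path` is a hypothetical helper written only for this illustration and is not part of any proposal in the thread.

```python
import os


def require_str_path(path):
    """Hypothetical strict helper: pass str through, refuse bytes."""
    if isinstance(path, bytes):
        raise TypeError("expected a str path, got bytes")
    return path


# Coercion: both functions accept either type and always return one type.
as_bytes = (os.fsencode("data.txt"), os.fsencode(b"data.txt"))
as_text = (os.fsdecode("data.txt"), os.fsdecode(b"data.txt"))

# Rejection: the strict helper raises instead of decoding implicitly.
try:
    require_str_path(b"data.txt")
    rejected = False
except TypeError:
    rejected = True

print(as_bytes, as_text, rejected)
```

With coercion the caller never learns that an implicit encode or decode happened; the strict variant surfaces a forgotten decode as an immediate `TypeError`.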
> I suppose a similar argument holds for applications that want bytes > and fsencode, but I leave that as an exercise for the reader. A similar counterargument holds, too :). Unrelated to this particular post, I believe these discussions are almost done and I truly hope we at least won't have to keep addressing the same questions that we have already gone through, unless there is something new on the table. I hope it takes a shorter time to read these emails than it takes to write them :). -Koos [1] https://mail.python.org/pipermail/python-dev/2016-April/144239.html [2] https://mail.python.org/pipermail/python-dev/2016-April/144290.html And somewhat older ones: [3] https://mail.python.org/pipermail/python-dev/2016-April/144101.html [4] https://mail.python.org/pipermail/python-dev/2016-April/144107.html From ethan at stoneleaf.us Wed Apr 20 09:58:07 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 20 Apr 2016 06:58:07 -0700 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP In-Reply-To: References: Message-ID: <57178AEF.7040205@stoneleaf.us> On 04/20/2016 04:52 AM, Victor Stinner wrote: > Can you please wait for a PEP? Brett Canon and Ethan Furman are > working on a PEP. Actually, Brett Canon and Chris Angelico. > So please give them time to write it. Okay, I'll shut-up now. ;) -- ~Ethan~ From rosuav at gmail.com Wed Apr 20 10:00:56 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 21 Apr 2016 00:00:56 +1000 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP In-Reply-To: <57178AEF.7040205@stoneleaf.us> References: <57178AEF.7040205@stoneleaf.us> Message-ID: On Wed, Apr 20, 2016 at 11:58 PM, Ethan Furman wrote: > On 04/20/2016 04:52 AM, Victor Stinner wrote: > >> Can you please wait for a PEP? Brett Canon and Ethan Furman are >> working on a PEP. > > > Actually, Brett Canon and Chris Angelico. 
I thought just Brett; my half of the proposal (the generic "string-like" protocol) was withdrawn as being too broad in scope for the justifying use-cases. Brett, your turn. :) ChrisA From k7hoven at gmail.com Wed Apr 20 11:27:59 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 20 Apr 2016 18:27:59 +0300 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP In-Reply-To: References: Message-ID: On Wed, Apr 20, 2016 at 2:52 PM, Victor Stinner wrote: > Hi, > > I'm unable to count the number of threads about the fspath protocol. > It's even more difficult to count the total number of emails. IMHO > everyone had enough time to give him/her opinion. Couldn't agree more. > We even had multiple > summaries :-) I'm not quite as sure about this. Maybe the meaning of "summary" in the subculture of python lists is different from the one I know. > Can you please wait for a PEP? Brett Canon and Ethan Furman are > working on a PEP. So please give them time to write it. I wonder what happened there... > The PEP should summarize the discussion and help a lot to make > concrete progress on the design (avoid restarting to discuss the same > points forever). I don't expect that more emails would add anything at > the current state of the discussion. Again, agreed, and this part makes me feel relieved. Personally, I got tired of the discussion a long time ago, but felt it had to be finished. > I think that we have enough other topics to discuss in the meanwhile ;-) No doubt about that. > FYI there is already an article about fspath/pathlib on LWN. Here is a > free link until the article is freely accessible: > > "Python looks at paths" By Jake Edge (April 13, 2016) > https://lwn.net/SubscriberLink/683350/4f52334af09653c8/ Wow. Wasn't expecting that. A whole story about the notorious "path discussions"! (well, up to some date). 
Anyway, the beginning seems fairly accurate, but then, among other things, it fails to mention this for example: https://mail.python.org/pipermail/python-ideas/2016-March/039179.html https://mail.python.org/pipermail/python-ideas/2016-April/039595.html Since I did not get any responses to that suggestion, it felt like a dead end, and I continued experimenting with other things and ended up taking the approach of "subclassing path-types from str gives more complete pathlib support, but the objects should not pretend to be strings in every way". By the way, I even implemented this, which I suppose I failed to mention. Admittedly, it became a little awkward in the end, but the main point was to provide a smooth transition from a str world to a PurePath-subclass world (as opposed to a discrete one like Py3k). While I was working on that, the discussions on -dev seemed to have reopened the gate at exactly that 'dead end' I mentioned before, and had started to step through it. -Koos > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com From ethan at stoneleaf.us Wed Apr 20 11:58:50 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 20 Apr 2016 08:58:50 -0700 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP In-Reply-To: References: Message-ID: <5717A73A.3070106@stoneleaf.us> On 04/20/2016 04:52 AM, Victor Stinner wrote: > FYI there is already an article about fspath/pathlib on LWN. Here is a > free link until the article is freely accessible: > > "Python looks at paths" By Jake Edge (April 13, 2016) > https://lwn.net/SubscriberLink/683350/4f52334af09653c8/ Nice article, thanks for sharing! 
-- ~Ethan~ From brett at python.org Wed Apr 20 12:12:01 2016 From: brett at python.org (Brett Cannon) Date: Wed, 20 Apr 2016 16:12:01 +0000 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP In-Reply-To: References: <57178AEF.7040205@stoneleaf.us> Message-ID: On Wed, 20 Apr 2016 at 07:07 Chris Angelico wrote: > On Wed, Apr 20, 2016 at 11:58 PM, Ethan Furman wrote: > > On 04/20/2016 04:52 AM, Victor Stinner wrote: > > > >> Can you please wait for a PEP? Brett Canon and Ethan Furman are > >> working on a PEP. > I was actually going to send this email when I got in to work today, but Victor and timezones beat me to it. :) > > > > > > Actually, Brett Canon and Chris Angelico. > > I thought just Brett; my half of the proposal (the generic > "string-like" protocol) was withdrawn as being too broad in scope for > the justifying use-cases. > > Brett, your turn. :) > I thought Chris and I w/ Ethan helping with coding, but if it's just me for the PEP then that's fine; luckily my firefighter gear is well-worn: https://goo.gl/photos/R8oWdLE45d99ebaw8 I'll try to get a PEP draft written and posted prior to PyCon US. I will reply to any dangling comments/issues that have appeared overnight to close those threads, but otherwise I will start ignoring all discussions so I can focus on the PEP. Everyone can now consider themselves spared from any further path-related discussions. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Wed Apr 20 12:34:54 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 20 Apr 2016 18:34:54 +0200 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP In-Reply-To: References: <57178AEF.7040205@stoneleaf.us> Message-ID: Hi, 2016-04-20 18:12 GMT+02:00 Brett Cannon : >> >> Can you please wait for a PEP? Brett Canon and Ethan Furman are >> >> working on a PEP. 
> > I was actually going to send this email when I got in to work today, but > Victor and timezones beat me to it. :) Ha ha, bitten by the french connection! > I thought Chris and I w/ Ethan helping with coding, but if it's just me for > the PEP then that's fine; luckily my firefighter gear is well-worn: > https://goo.gl/photos/R8oWdLE45d99ebaw8 LOL, it seems appropriate for this topic... > I'll try to get a PEP draft written and posted prior to PyCon US. I will > reply to any dangling comments/issues that have appeared overnight to close > those threads, but otherwise I will start ignoring all discussions so I can > focus on the PEP. Everyone can now consider themselves spared from any > further path-related discussions. :) I hesitated to propose to create a fspath-sig mailing list, but I suck at humor and so I skipped this joke in my email ;-) Victor From k7hoven at gmail.com Wed Apr 20 13:51:16 2016 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 20 Apr 2016 20:51:16 +0300 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP In-Reply-To: References: <57178AEF.7040205@stoneleaf.us> Message-ID: On Wed, Apr 20, 2016 at 7:34 PM, Victor Stinner wrote: > > 2016-04-20 18:12 GMT+02:00 Brett Cannon : >> >> I thought Chris and I w/ Ethan helping with coding, but if it's just me for >> the PEP then that's fine; Well, just in case you didn't notice this on python-ideas, I offered to work on the PEP in case there turns out to be one. This was when Guido had asked if there is going to be a PEP, in response to my "Type hinting for path-related functions" email. That offer is certainly still valid. >> luckily my firefighter gear is well-worn: >> https://goo.gl/photos/R8oWdLE45d99ebaw8 > > LOL, it seems appropriate for this topic... It sure has been flammable XD >> I'll try to get a PEP draft written and posted prior to PyCon US. 
I will >> reply to any dangling comments/issues that have appeared overnight to close >> those threads, but otherwise I will start ignoring all discussions so I can >> focus on the PEP. Everyone can now consider themselves spared from any >> further path-related discussions. :) Yes, going from endless discussion to PEP seems like a very healthy direction at this point. > I hesitated to propose to create a fspath-sig mailing list, but I suck > at humor and so I skipped this joke in my email ;-) You did not have to tell that joke, the joke was present all the time ;). > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com From brett at python.org Wed Apr 20 14:21:30 2016 From: brett at python.org (Brett Cannon) Date: Wed, 20 Apr 2016 18:21:30 +0000 Subject: [Python-Dev] Discussion on fspath: please wait for a PEP In-Reply-To: References: <57178AEF.7040205@stoneleaf.us> Message-ID: On Wed, 20 Apr 2016 at 10:51 Koos Zevenhoven wrote: > On Wed, Apr 20, 2016 at 7:34 PM, Victor Stinner > wrote: > > > > 2016-04-20 18:12 GMT+02:00 Brett Cannon : > >> > >> I thought Chris and I w/ Ethan helping with coding, but if it's just me > for > >> the PEP then that's fine; > > Well, just in case you didn't notice this on python-ideas, I offered > to work on the PEP in case there turns out to be one. This was when > Guido had asked if there is going to be a PEP, in response to my "Type > hinting for path-related functions" email. That offer is certainly > still valid. > I'm going to host the draft PEP at https://github.com/brettcannon/path-pep/blob/master/pep-0NNN.rst . Let me get a basic first draft done so there is something to work off of and if you still want to be co-author I'll be more than happy to add you as a co-author to help me finish it! 
-Brett > > >> luckily my firefighter gear is well-worn: > >> https://goo.gl/photos/R8oWdLE45d99ebaw8 > > > > LOL, it seems appropriate for this topic... > > It sure has been flammable XD > > >> I'll try to get a PEP draft written and posted prior to PyCon US. I will > >> reply to any dangling comments/issues that have appeared overnight to > close > >> those threads, but otherwise I will start ignoring all discussions so I > can > >> focus on the PEP. Everyone can now consider themselves spared from any > >> further path-related discussions. :) > > Yes, going from endless discussion to PEP seems like a very healthy > direction at this point. > > > I hesitated to propose to create a fspath-sig mailing list, but I suck > > at humor and so I skipped this joke in my email ;-) > > You did not have to tell that joke, the joke was present all the time ;). > > > Victor > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Wed Apr 20 19:34:55 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 21 Apr 2016 01:34:55 +0200 Subject: [Python-Dev] PEP 509: Add a private version to dict (version 3) In-Reply-To: References: Message-ID: Hi, Guido van Rossum and Jim J. Jewett suggested me to *not require* to always increase the dict version if a dict method does not modify its content. I modified the Changes section to only require that the version is increased when the dictionary content is modified. I also explained the nice side effect of having an unique identifier for two empty dictionaries: it avoids a strong reference when checking if a namespace (dictionary) was replaced. 
Yury Selivanov's opcode cache uses this property to avoid a strong reference on builtin and global namespaces. It's also a written reply to Armin Rigo's suggestion to use the version 0 for empty dictionaries (new empty dict or for dict.clear()). The modified Changes section: Changes ======= Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with the C type ``PY_UINT64_T``, 64-bit unsigned integer. Add also a global dictionary version. Each time a dictionary is created, the global version is incremented and the dictionary version is initialized to the global version. Each time the dictionary content is modified, the global version must be incremented and copied to the dictionary version. Dictionary methods which can modify its content: * ``clear()`` * ``pop(key)`` * ``popitem()`` * ``setdefault(key, value)`` * ``__delitem__(key)`` * ``__setitem__(key, value)`` * ``update(...)`` The choice of increasing or not the version when a dictionary method does not change its content is left to the Python implementation. A Python implementation can decide to not increase the version to avoid dictionary lookups in guards. Examples of cases when dictionary methods don't modify its content: * ``clear()`` if the dict is already empty * ``pop(key)`` if the key does not exist * ``popitem()`` if the dict is empty * ``setdefault(key, value)`` if the key already exists * ``__delitem__(key)`` if the key does not exist * ``__setitem__(key, value)`` if the new value is identical to the current value * ``update()`` if called without argument or if new values are identical to current values Setting a key to a new value equal to the old value is also considered as an operation modifying the dictionary content. Two different empty dictionaries must have a different version to be able to identify a dictionary just by its version. It allows to verify in a guard that a namespace was not replaced without storing a strong reference to the dictionary. 
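The versioning rules above can be modelled in pure Python. This is only a sketch under stated assumptions: `VersionedDict`, `Guard`, and the `version_tag` attribute are hypothetical names, the real proposal lives in C inside ``PyDictObject``, and only two of the mutating methods are overridden here to keep the example short.

```python
import itertools

# Stand-in for the global dictionary version; every tag is globally unique.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Hypothetical pure-Python model of a dict carrying a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version_tag = next(_global_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version_tag = next(_global_version)


class Guard:
    """Cache one lookup; keeps only the tag, not a reference to the dict."""

    def __init__(self, namespace, key):
        self.key = key
        self.tag = namespace.version_tag
        self.value = namespace.get(key)

    def lookup(self, namespace):
        if namespace.version_tag == self.tag:   # equality test, never <=
            return self.value                   # fast path: nothing changed
        self.value = namespace.get(self.key)    # slow path: refresh the cache
        self.tag = namespace.version_tag
        return self.value


ns = VersionedDict()
ns["x"] = 1
guard = Guard(ns, "x")
first = guard.lookup(ns)             # served from the cache
ns["x"] = 2
second = guard.lookup(ns)            # tag changed, value is re-read
replacement = VersionedDict(x=3)
third = guard.lookup(replacement)    # a replaced namespace has a new tag
```

Because every dictionary, including an empty one, receives its own tag, the guard notices a replaced namespace even though it never stored a reference to the old one.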
Using a borrowed reference does not work: if the old dictionary is destroyed, it is possible that a new dictionary is allocated at the same memory address. By the way, dictionaries don't support weak references. The version increase must be atomic. In CPython, the Global Interpreter Lock (GIL) already protects ``dict`` methods to make changes atomic. Example using an hypothetical ``dict_get_version(dict)`` function:: >>> d = {} >>> dict_get_version(d) 100 >>> d['key'] = 'value' >>> dict_get_version(d) 101 >>> d['key'] = 'new value' >>> dict_get_version(d) 102 >>> del d['key'] >>> dict_get_version(d) 103 The field is called ``ma_version_tag``, rather than ``ma_version``, to suggest to compare it using ``version_tag == old_version_tag``, rather than ``version <= old_version`` which becomes wrong after an integer overflow. -- Victor From burkhardameier at gmail.com Thu Apr 21 07:00:29 2016 From: burkhardameier at gmail.com (Burkhard Meier) Date: Thu, 21 Apr 2016 04:00:29 -0700 Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our Python backup? Message-ID: Well, Just a thought. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Apr 21 07:32:16 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 21 Apr 2016 21:32:16 +1000 Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our Python backup? In-Reply-To: References: Message-ID: <20160421113214.GH1819@ando.pearwood.info> On Thu, Apr 21, 2016 at 04:00:29AM -0700, Burkhard Meier wrote: > Well, > > Just a thought. I'm afraid I have no idea what you are referring to. -- Steve From burkhardameier at gmail.com Thu Apr 21 07:41:29 2016 From: burkhardameier at gmail.com (Burkhard Meier) Date: Thu, 21 Apr 2016 04:41:29 -0700 Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our Python backup? 
In-Reply-To: <20160421113214.GH1819@ando.pearwood.info> References: <20160421113214.GH1819@ando.pearwood.info> Message-ID: Don't be afraid. This is just CEO talk. Let's imagine Python without a leader. All commercial companies...well ... are we free? Burkhard On Thursday, April 21, 2016, Steven D'Aprano wrote: > On Thu, Apr 21, 2016 at 04:00:29AM -0700, Burkhard Meier wrote: > > Well, > > > > Just a thought. > > I'm afraid I have no idea what you are referring to. > > > -- > Steve From burkhardameier at gmail.com Thu Apr 21 07:54:32 2016 From: burkhardameier at gmail.com (Burkhard Meier) Date: Thu, 21 Apr 2016 04:54:32 -0700 Subject: [Python-Dev] I hope this won't be my last comment here ~ yet it may well be... Message-ID: Please do allow me to share my humble experiences of being a software professional on a Windows platform. Almost 20 years. You know what; when I tried out 'sugar Linux' or Peppermint,,,the "admin' dude kicked me out 5 times in one sole eve, Maybe this is just *me*.. You know what: I did have my time with this *open source community*... I was just asking a sincere question. C'mon This was rather very ridiculous. From brian at python.org Thu Apr 21 08:36:30 2016 From: brian at python.org (Brian Curtin) Date: Thu, 21 Apr 2016 08:36:30 -0400 Subject: [Python-Dev] I hope this won't be my last comment here ~ yet it may well be... In-Reply-To: References: Message-ID: On Thursday, April 21, 2016, Burkhard Meier wrote: > Please do allow me to share my humble experiences of being a software > professional on a Windows platform. > > Almost 20 years. 
> > You know what; when I tried out 'sugar Linux' or Peppermint,,,the "admin' > dude kicked me out 5 times in one sole eve, > > Maybe this is just *me*.. > > You know what: I did have my time with this *open source community*... > > I was just asking a sincere question. > > C'mon > > This was rather very ridiculous. > > > As someone who spent many years as a Windows user and several years as a contributor to the Windows build here, if you have constructive thoughts to share on Python-on-Windows, please share them...but I can't decipher what any of this message is actually about. Additionally, you may want to try the python-list mailing list. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Apr 21 08:43:42 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 21 Apr 2016 22:43:42 +1000 Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our Python backup? In-Reply-To: References: <20160421113214.GH1819@ando.pearwood.info> Message-ID: On Thu, Apr 21, 2016 at 9:41 PM, Burkhard Meier wrote: > Don't be afraid. > > This is just CEO talk. > > Let's imagine Python without a leader. > > All commercial companies...well ... are we free? > I still have no clue what you're talking about. Every project has a leader. If Guido dies, goes insane [1], or gets bored with the Python project, someone else can and will take over. ChrisA [1] More than usual, I mean. It has to be bad enough that we'd notice. 
From ncoghlan at gmail.com Thu Apr 21 09:13:38 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 21 Apr 2016 23:13:38 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> Message-ID: On 20 April 2016 at 13:16, Stephen J. Turnbull wrote: > What's left is DirEntry (and perhaps other producers of byte-oriented > objects in os and os.path). If they're currently using DirEntry, > they're currently accessing .path. Surely bytes users can continue > doing that, even if we offer str users the advantage of new protocols? > The consuming functions aren't currently allowing DirEntry objects either (since scandir is even newer than pathlib), so we want to allow both pathlib and DirEntry objects with a single change to consuming functions. I'd like to see that change in consuming functions be as simple as possible: an unconditional "path = os._raw_fspath(path)" at the start of their existing input processing Those consuming functions fall into one of three categories: 1. They're bytes/str polymorphic 2. They're bytes only 3. 
They're str only Whichever category they're in, their existing argument processing will be readily able to cope with a polymorphic result from os._raw_fspath, since that's no different from today, where the argument passed in may be bytes or str and they need to handle that appropriately. Having os.fspath(path) as a specifically str-only layer then gives consuming functions in category 3 an alternative option, and encourages category 3 functions and APIs (like pathlib) as the future default, without getting in the way of the folks that need to mess about down in the low level weeds of operating system interfaces. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Apr 21 09:20:21 2016 From: barry at python.org (Barry Warsaw) Date: Thu, 21 Apr 2016 09:20:21 -0400 Subject: [Python-Dev] When the infamous Bier trunk hits ... where is our Python backup? In-Reply-To: References: <20160421113214.GH1819@ando.pearwood.info> Message-ID: <20160421092021.23d381da@subdivisions.wooz.org> On Apr 21, 2016, at 10:43 PM, Chris Angelico wrote: >I still have no clue what you're talking about. Every project has a >leader. If Guido dies, goes insane [1], or gets bored with the Python >project, someone else can and will take over. 
Fortunately, the Python Secret Underground (PSU) which most emphatically does not exist, has a succession plan involving three questions and holy gr From ncoghlan at gmail.com Thu Apr 21 09:29:47 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 21 Apr 2016 23:29:47 +1000 Subject: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath() In-Reply-To: <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> References: <5709309D.8030007@stoneleaf.us> <570A7C67.3010304@stoneleaf.us> <570BCE39.8090306@stoneleaf.us> <570BDB17.5000601@stoneleaf.us> <570BECC6.1080708@stoneleaf.us> <570C12C2.9000602@stoneleaf.us> <570D1F26.5090800@stoneleaf.us> <1460560661.3906642.577656945.5C733153@webmail.messagingengine.com> <570E659C.8010108@stoneleaf.us> <1460566289.3931013.577768921.769A02C3@webmail.messagingengine.com> <22287.16524.453660.299756@turnbull.sk.tsukuba.ac.jp> <22287.44717.25724.851559@turnbull.sk.tsukuba.ac.jp> <22290.8274.88619.67572@turnbull.sk.tsukuba.ac.jp> <22291.17258.728993.780602@turnbull.sk.tsukuba.ac.jp> <22293.13568.684046.249548@turnbull.sk.tsukuba.ac.jp> <22294.6814.42422.31760@turnbull.sk.tsukuba.ac.jp> <22294.62605.745068.363482@turnbull.sk.tsukuba.ac.jp> Message-ID: On 20 April 2016 at 13:16, Stephen J. Turnbull wrote: > It's people who live in monolingual mono-encoding environments who > will be using bytes successfully, and be resistent to costly changes > that don't make their lives better. But the bytes vs. text cost is > inherent in using pathlib, so polymorphism doesn't help promote > pathlib. It might help promote use of os.scandir in bytes-oriented > code, though I don't see that as a huge effect nor more than mildly > desirable. Is it? > Some of us are also interested in optimised network service development use cases where UTF-8 already rules the world [1]. 
It's a vastly different domain from desktop computing, and different even from traditional stateful servers where the same instance may be kept running for years. When "absolutely everything is UTF-8, and your system boundaries are policed accordingly" is a valid assumption, then writing bytes level network code is a far more viable option than when you're writing software to give to other people to run in arbitrary environments (that's how Go is able to get away with its "all system boundaries use UTF-8" approach - if you're not prepared to meet that precondition, you don't choose to use Go in the first place). I think this is also why we're talking past each other - as a default, I completely agree it makes sense to present a "str-only" API (that's where my proposed fspath/_raw_fspath split came from). However, there really are contexts where "our text is always stored as bytes, those bytes are always UTF-8 encoded, and our software only needs to work on *nix systems" is a reasonable approach, and those are the domains where being *able* to stay entirely in the binary domain is actually a desirable characteristic, rather than merely a tool for migrating from Python 2. Cheers, Nick. [1] http://utf8everywhere.org/ -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil at python.ca Thu Apr 21 17:44:52 2016 From: neil at python.ca (Neil Schemenauer) Date: Thu, 21 Apr 2016 14:44:52 -0700 Subject: [Python-Dev] obmalloc mmap/munmap thrashing Message-ID: <20160421214452.GA22080@python.ca> I was running Python 2.4.11 under strace and I noticed some odd looking system calls: mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000 munmap(0x7f9848681000, 262144) = 0 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000 munmap(0x7f9848681000, 262144) = 0 [... repeated a number of times ...] 
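The alternating calls above are the pattern one would expect when an arena is handed back to the OS the moment it empties while usage hovers right at an arena boundary. The following is only a toy model of that effect, not obmalloc's actual logic: `ARENA_BLOCKS`, `count_syscalls` and the `spare` parameter are made-up names for illustration, and a real allocator tracks far more state.

```python
# Toy model only: counts map/unmap events for an allocator that releases an
# arena as soon as it is no longer needed, versus one that keeps spare arenas.
# ARENA_BLOCKS and `spare` are illustrative assumptions; nothing here
# corresponds to a real name or constant in obmalloc.c.

ARENA_BLOCKS = 4  # hypothetical number of blocks per arena


def count_syscalls(ops, spare=0):
    """Return (maps, unmaps) for a sequence of +1 (alloc) / -1 (free) ops."""
    used = 0      # blocks currently in use
    arenas = 0    # arenas currently mapped
    maps = unmaps = 0
    for op in ops:
        used += op
        needed = -(-used // ARENA_BLOCKS)  # ceiling division
        if needed > arenas:
            # Not enough mapped arenas: map the missing ones.
            maps += needed - arenas
            arenas = needed
        elif arenas > needed + spare:
            # More idle arenas than we are willing to keep: unmap the excess.
            unmaps += arenas - (needed + spare)
            arenas = needed + spare
    return maps, unmaps


# Fill one arena exactly, then allocate and free one block five times.
ops = [+1] * ARENA_BLOCKS + [+1, -1] * 5

print(count_syscalls(ops))           # no hysteresis
print(count_syscalls(ops, spare=1))  # keep one idle arena around
```

In this model, keeping a single idle arena is enough to stop the map/unmap ping-pong for this workload, at the cost of holding one arena's worth of memory longer than strictly necessary.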

Looking at obmalloc.c, there doesn't seem to be any high/low
watermark (hysteresis) associated with unallocating arenas.  Is that
true?  If so, does it seem prudent to implement something to avoid
this behavior?  It seems potentially expensive if your program is
running just at the threshold of needing another arena.

From tritium-list at sdamon.com  Thu Apr 21 17:55:48 2016
From: tritium-list at sdamon.com (Alexander Walters)
Date: Thu, 21 Apr 2016 17:55:48 -0400
Subject: [Python-Dev] obmalloc mmap/munmap thrashing
In-Reply-To: <20160421214452.GA22080@python.ca>
References: <20160421214452.GA22080@python.ca>
Message-ID: <57194C64.8030001@sdamon.com>

...is that a typo for 2.7.11?

On 4/21/2016 17:44, Neil Schemenauer wrote:
> I was running Python 2.4.11 under strace and I noticed some odd
> looking system calls:
>
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000
> munmap(0x7f9848681000, 262144) = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000
> munmap(0x7f9848681000, 262144) = 0
> [... repeated a number of times ...]
>
> Looking at obmalloc.c, there doesn't seem to be any high/low
> watermark (hysteresis) associated with unallocating arenas.  Is that
> true?  If so, does it seem prudent to implement something to avoid
> this behavior?  It seems potentially expensive if your program is
> running just at the threshold of needing another arena.
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com From tim.peters at gmail.com Thu Apr 21 17:59:20 2016 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 21 Apr 2016 16:59:20 -0500 Subject: [Python-Dev] obmalloc mmap/munmap thrashing In-Reply-To: <20160421214452.GA22080@python.ca> References: <20160421214452.GA22080@python.ca> Message-ID: You may be interested in this seemingly related bug report: http://bugs.python.org/issue26601 [Neil Schemenauer ] > I was running Python 2.4.11 under strace and I noticed some odd > looking system calls: > > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000 > munmap(0x7f9848681000, 262144) = 0 > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9848681000 > munmap(0x7f9848681000, 262144) = 0 > [... repeated a number of times ...] > > Looking at obmalloc.c, there doesn't seem to be any high/low > watermark (hysteresis) associated with unallocating arenas. Is that > true? If so, does it seem prudent to implement something to avoid > this behavior? It seems potentially expensive if you program is > running just at the threshold of needing another arena. From chris.barker at noaa.gov Thu Apr 21 18:43:08 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 21 Apr 2016 15:43:08 -0700 Subject: [Python-Dev] I hope this won't be my last comment here ~ yet it may well be... In-Reply-To: References: Message-ID: I'm really confused -- you had a handful of very positive responses to your offer to help with Python on Windows. Then a couple off the cuff remarks (at least one of which was serious) about what is often known as "the bus factor": But I think you may want to take into account the history here. 
This has been talked about A LOT in the Python community for years -- so we may be a bit blase about it. Note that Wikipedia's page on the bus factor: https://en.wikipedia.org/wiki/Bus_factor """An early instance of this sort of query was when Michael McLay publicly asked, in 1994, what would happen to the Python language if Guido van Rossum were hit by a bus. [8] """ So this has been very, very well hashed out in the Python community. And a quick look at the existence of this list, the messages on it, and the source repo will tell you that Python is in no way a personal project of one person. (not to mentions the PSF) I think the lessons here are: - don't be too sensitive and, important for every open source community: - your comments and questions will be taken far more seriously if you have done your homework. -CHB On Thu, Apr 21, 2016 at 4:54 AM, Burkhard Meier wrote: > Please do allow me to share my humble experiences of being a software > professional on a Windows platform. > > Almost 20 years. > > You know what; when I tried out 'sugar Linux' or Peppermint,,,the "admin' > dude kicked me out 5 times in one sole eve, > > Maybe this is just *me*.. > > You know what: I did have my time with this *open source community*... > > I was just asking a sincere question. > > C'mon > > This was rather very ridiculous. > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 

From erik.m.bray at gmail.com  Fri Apr 22 09:09:53 2016
From: erik.m.bray at gmail.com (Erik Bray)
Date: Fri, 22 Apr 2016 15:09:53 +0200
Subject: [Python-Dev] Issue with DLL import library installation in Cygwin
Message-ID:

Hi all,

I've been working on compiling/installing Python on Cygwin and have
hit upon an odd issue in the Makefile that seems to have been around
for as long as there's been Cygwin support in it.

When building Python on Cygwin, both a libpython-X.Y.dll and a
libpython-X.Y.dll.a are created.  The latter is an "import library"
consisting of stubs for functions in the DLL so that it can be linked
to statically when building, for example, extension modules.

The odd bit is that in the altbininstall target (see [1]) if the
$(DLLLIBRARY) variable is defined then only it is installed, while
$(LDLIBRARY) (which in this case references the import library) is
*not* installed, except in $(prefix)/lib/pythonX.Y/config, which is
not normally on the linker search path, or even included by
python-config --ldflags.

Therefore static linking to libpython fails, unless the search path is
explicitly modified, or a symlink is created from
$(prefix)/lib/pythonX.Y/config/libpython.dll.a to $(prefix)/lib.  In
fact Cygwin's own package for Python manually creates the latter
symlink in its install script.  But it's not clear why Python's
Makefile doesn't install this file in the first place.  In other
words, why not install $LDLIBRARY regardless?

Thanks,
Erik

[1] https://hg.python.org/cpython/file/496e094f4734/Makefile.pre.in#l1097

From victor.stinner at gmail.com  Fri Apr 22 10:46:12 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 22 Apr 2016 16:46:12 +0200
Subject: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance
In-Reply-To:
References: <56B3254F.7020605@egenix.com> <56B34A1E.4010501@egenix.com> <56B35AB5.5090308@egenix.com> <56BDDEA3.2060702@egenix.com>
Message-ID:

Hi,

My pull request has been merged into numpy.
numpy now uses PyMem_RawMalloc() rather than PyMem_Malloc(), since it
uses the memory allocator without holding the GIL:
https://github.com/numpy/numpy/pull/7404

It was proposed to modify numpy to hold the GIL. Maybe it will be done
later.

It means that there are no more C extensions known to use the Python
memory allocators incorrectly. So I pushed my change in CPython to use
the pymalloc memory allocator in PyMem_Malloc():
https://hg.python.org/cpython/rev/68b2a43d8653

I documented that porting C extensions to Python 3.6 requires running
tests with PYTHONMALLOC=debug. This environment variable enables checks
at runtime to validate the usage of Python memory allocators, including
checks on the GIL. PYTHONMALLOC=debug and the check on the GIL are new
in Python 3.6.

By the way, I modified the code to log the fatal error. If a buffer
overflow/underflow is detected in a free function like PyObject_Free()
and tracemalloc is enabled, the traceback where the memory block was
allocated is now displayed:
https://docs.python.org/dev/whatsnew/3.6.html#pythonmalloc-environment-variable

Moreover, the warning logger now also logs where files, sockets, etc.
were allocated on ResourceWarning:
https://docs.python.org/dev/whatsnew/3.6.html#warnings

It looks like Python 3.6 will help developers ;-)

Victor

2016-04-20 1:33 GMT+02:00 Victor Stinner :
> Ping? Is someone still opposed to my change #26249 "Change
> PyMem_Malloc to use pymalloc allocator"? If no, I think that I will
> push my change.
>
> My change only changes two lines, so it can be easily reverted before
> CPython 3.6 if we detect major issues in third-party extensions. And
> maybe it's better to push such change today to get more time to play
> with it, than pushing it late in the development of CPython 3.6.
>
> The new PYTHONMALLOC=debug feature allows to quickly and easily check
> the usage of the PyMem_Malloc() API, even if Python is compiled in
> release mode.
>
> I checked multiple Python extensions written in C.
I only found one > bug in numpy and I sent a patch (not merged yet). > > victor > > 2016-03-15 0:19 GMT+01:00 Victor Stinner : >> 2016-02-12 14:31 GMT+01:00 M.-A. Lemburg : >>>>> If your program has bugs, you can use a debug build of Python 3.5 to >>>>> detect misusage of the API. >>> >>> Yes, but people don't necessarily do this, e.g. I have >>> for a very long time ignored debug builds completely >>> and when I started to try them, I found that some of the >>> things I had been doing with e.g. free list implementations >>> did not work in debug builds. >> >> I just added support for debug hooks on Python memory allocators on >> Python compiled in *release* mode. Set the environment variable >> PYTHONMALLOC to debug to try with Python 3.6. >> >> I added a check on PyObject_Malloc() debug hook to ensure that the >> function is called with the GIL held. I opened an issue to add a >> similar check on PyMem_Malloc(): >> https://bugs.python.org/issue26563 >> >> >>> Yes, but those are part of the stdlib. You'd need to check >>> a few C extensions which are not tested as part of the stdlib, >>> e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom >>> types in C since these will often need the memory management >>> APIs). >>> >>> It may also be a good idea to check wrapper generators such >>> as cython, swig, cffi, etc. >> >> I ran the test suite of numpy, lxml, Pillow and cryptography (used cffi). >> >> I found a bug in numpy. numpy calls PyMem_Malloc() without holding the GIL: >> https://github.com/numpy/numpy/pull/7404 >> >> Except of this bug, all other tests pass with PyMem_Malloc() using >> pymalloc and all debug checks. 
>> >> Victor From status at bugs.python.org Fri Apr 22 12:08:42 2016 From: status at bugs.python.org (Python tracker) Date: Fri, 22 Apr 2016 18:08:42 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20160422160842.D4D7956645@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2016-04-15 - 2016-04-22) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 5491 ( +2) closed 33095 (+56) total 38586 (+58) Open issues with patches: 2384 Issues opened (41) ================== #26773: Shelve works inconsistently when carried over to child process http://bugs.python.org/issue26773 opened by Paul Ellenbogen #26774: Elide Py_atomic fences when WITH_THREAD is disabled? http://bugs.python.org/issue26774 opened by larry #26776: Determining the failure of C API call is ambiguous http://bugs.python.org/issue26776 opened by serhiy.storchaka #26779: pdb continue followed by an exception in the same frame shows http://bugs.python.org/issue26779 opened by Sriram Rajagopalan #26781: os.walk max_depth http://bugs.python.org/issue26781 opened by palaviv #26786: bdist_msi duplicates directories with names in ALL CAPS to a b http://bugs.python.org/issue26786 opened by Ivan.Pozdeev #26787: test_distutils fails when configured --with-lto http://bugs.python.org/issue26787 opened by gregory.p.smith #26788: test_gdb fails all tests on a profile-opt build configured --w http://bugs.python.org/issue26788 opened by gregory.p.smith #26789: Please do not log during shutdown http://bugs.python.org/issue26789 opened by smurfix #26790: bdist_msi package duplicates everything to a bogus location wh http://bugs.python.org/issue26790 opened by Ivan.Pozdeev #26791: shutil.move fails to move symlink (Invalid cross-device link) http://bugs.python.org/issue26791 opened by Unode #26792: docstrings of runpy.run_{module,path} are rather sparse 
http://bugs.python.org/issue26792 opened by Antony.Lee #26793: uuid causing thread issues when forking using os.fork py3.4+ http://bugs.python.org/issue26793 opened by Steven Adams #26794: curframe can be None in pdb.py http://bugs.python.org/issue26794 opened by Jacek.Pliszka #26796: BaseEventLoop.run_in_executor shouldn't specify max_workers fo http://bugs.python.org/issue26796 opened by Hans Lawrenz #26797: Segafault in _PyObject_Alloc http://bugs.python.org/issue26797 opened by yselivanov #26798: add BLAKE2 to hashlib http://bugs.python.org/issue26798 opened by Zooko.Wilcox-O'Hearn #26800: Don't accept bytearray as filenames part 2 http://bugs.python.org/issue26800 opened by pjenvey #26801: Fix shutil.get_terminal_size() to catch AttributeError http://bugs.python.org/issue26801 opened by ebarry #26803: syslog logging handler fails with address in unix abstract nam http://bugs.python.org/issue26803 opened by xdegaye #26804: Prioritize lowercase proxy variables in urllib.request http://bugs.python.org/issue26804 opened by frispete #26806: IDLE not displaying RecursionError tracebacks http://bugs.python.org/issue26806 opened by terry.reedy #26807: mock_open()().readline() fails at EOF http://bugs.python.org/issue26807 opened by rbcollins #26809: `string` exposes ChainMap from `collections` http://bugs.python.org/issue26809 opened by leewz #26810: inconsistent garbage collector behavior across platforms when http://bugs.python.org/issue26810 opened by unsec treedee #26811: segfault due to null pointer in tuple http://bugs.python.org/issue26811 opened by random832 #26812: ExtendedInterpolation drops user-defined 'vars' during _interp http://bugs.python.org/issue26812 opened by yab-arz #26814: [WIP] Add a new _PyObject_FastCall() function which avoids the http://bugs.python.org/issue26814 opened by haypo #26815: SIGBUS in test_ssl.test_dealloc_warn() on "AMD64 FreeBSD 10.0 http://bugs.python.org/issue26815 opened by haypo #26816: Make concurrent.futures.Executor an 
abc http://bugs.python.org/issue26816 opened by xiang.zhang #26817: Docs for StringIO should link to io.BytesIO http://bugs.python.org/issue26817 opened by guettli #26818: trace CLI doesn't respect -s option http://bugs.python.org/issue26818 opened by berker.peksag #26819: _ProactorReadPipeTransport pause_reading()/resume_reading() br http://bugs.python.org/issue26819 opened by Fulvio Esposito #26820: Prevent uses of format string based PyObject_Call* that do not http://bugs.python.org/issue26820 opened by josh.r #26822: itemgetter/attrgetter/methodcaller objects ignore keyword argu http://bugs.python.org/issue26822 opened by serhiy.storchaka #26823: Shrink recursive tracebacks http://bugs.python.org/issue26823 opened by ebarry #26824: Make some macros use Py_TYPE http://bugs.python.org/issue26824 opened by xiang.zhang #26826: Expose new copy_file_range() syscal in os module and use it to http://bugs.python.org/issue26826 opened by StyXman #26827: PyObject *PyInit_myextention -> PyMODINIT_FUNC PyInit_myextent http://bugs.python.org/issue26827 opened by prinsherbert #26828: Implement __length_hint__() on map() and filter() to optimize http://bugs.python.org/issue26828 opened by haypo #26829: update docs: when creating classes a new dict is created for t http://bugs.python.org/issue26829 opened by ethan.furman Most recent 15 issues with no replies (15) ========================================== #26829: update docs: when creating classes a new dict is created for t http://bugs.python.org/issue26829 #26819: _ProactorReadPipeTransport pause_reading()/resume_reading() br http://bugs.python.org/issue26819 #26818: trace CLI doesn't respect -s option http://bugs.python.org/issue26818 #26817: Docs for StringIO should link to io.BytesIO http://bugs.python.org/issue26817 #26812: ExtendedInterpolation drops user-defined 'vars' during _interp http://bugs.python.org/issue26812 #26794: curframe can be None in pdb.py http://bugs.python.org/issue26794 #26792: docstrings of 
runpy.run_{module,path} are rather sparse http://bugs.python.org/issue26792 #26790: bdist_msi package duplicates everything to a bogus location wh http://bugs.python.org/issue26790 #26789: Please do not log during shutdown http://bugs.python.org/issue26789 #26786: bdist_msi duplicates directories with names in ALL CAPS to a b http://bugs.python.org/issue26786 #26779: pdb continue followed by an exception in the same frame shows http://bugs.python.org/issue26779 #26771: python-config.sh.in INCDIR does not match python version if ex http://bugs.python.org/issue26771 #26767: Inconsistant error messages for failed attribute modification http://bugs.python.org/issue26767 #26752: Mock(2.0.0).assert_has_calls() raise AssertionError in two sam http://bugs.python.org/issue26752 #26750: Mock autospec does not work with subclasses of property() http://bugs.python.org/issue26750 Most recent 15 issues waiting for review (15) ============================================= #26824: Make some macros use Py_TYPE http://bugs.python.org/issue26824 #26823: Shrink recursive tracebacks http://bugs.python.org/issue26823 #26822: itemgetter/attrgetter/methodcaller objects ignore keyword argu http://bugs.python.org/issue26822 #26818: trace CLI doesn't respect -s option http://bugs.python.org/issue26818 #26816: Make concurrent.futures.Executor an abc http://bugs.python.org/issue26816 #26814: [WIP] Add a new _PyObject_FastCall() function which avoids the http://bugs.python.org/issue26814 #26811: segfault due to null pointer in tuple http://bugs.python.org/issue26811 #26809: `string` exposes ChainMap from `collections` http://bugs.python.org/issue26809 #26804: Prioritize lowercase proxy variables in urllib.request http://bugs.python.org/issue26804 #26803: syslog logging handler fails with address in unix abstract nam http://bugs.python.org/issue26803 #26801: Fix shutil.get_terminal_size() to catch AttributeError http://bugs.python.org/issue26801 #26796: BaseEventLoop.run_in_executor shouldn't 
specify max_workers fo http://bugs.python.org/issue26796 #26787: test_distutils fails when configured --with-lto http://bugs.python.org/issue26787 #26786: bdist_msi duplicates directories with names in ALL CAPS to a b http://bugs.python.org/issue26786 #26781: os.walk max_depth http://bugs.python.org/issue26781 Top 10 most discussed issues (10) ================================= #26801: Fix shutil.get_terminal_size() to catch AttributeError http://bugs.python.org/issue26801 16 msgs #26814: [WIP] Add a new _PyObject_FastCall() function which avoids the http://bugs.python.org/issue26814 16 msgs #26824: Make some macros use Py_TYPE http://bugs.python.org/issue26824 15 msgs #26809: `string` exposes ChainMap from `collections` http://bugs.python.org/issue26809 14 msgs #26601: Use new madvise()'s MADV_FREE on the private heap http://bugs.python.org/issue26601 13 msgs #26803: syslog logging handler fails with address in unix abstract nam http://bugs.python.org/issue26803 13 msgs #26811: segfault due to null pointer in tuple http://bugs.python.org/issue26811 13 msgs #26793: uuid causing thread issues when forking using os.fork py3.4+ http://bugs.python.org/issue26793 12 msgs #26804: Prioritize lowercase proxy variables in urllib.request http://bugs.python.org/issue26804 10 msgs #26058: PEP 509: Add ma_version to PyDictObject http://bugs.python.org/issue26058 9 msgs Issues closed (52) ================== #4806: Function calls taking a generator as star argument can mask Ty http://bugs.python.org/issue4806 closed by martin.panter #7694: DeprecationWarnings in distutils are pointless http://bugs.python.org/issue7694 closed by berker.peksag #8978: "tarfile.ReadError: file could not be opened successfully" if http://bugs.python.org/issue8978 closed by lars.gustaebel #9317: Incorrect coverage file from trace test_pickle.py http://bugs.python.org/issue9317 closed by berker.peksag #10261: tarfile iterator without members caching http://bugs.python.org/issue10261 closed by 
lars.gustaebel #13876: Sporadic failure in test_socket: testRecvmsgEOF http://bugs.python.org/issue13876 closed by berker.peksag #15933: flaky test in test_datetime http://bugs.python.org/issue15933 closed by berker.peksag #17859: improve error message for saving ints to file http://bugs.python.org/issue17859 closed by serhiy.storchaka #18591: threading.Thread.run returning a result http://bugs.python.org/issue18591 closed by berker.peksag #20739: PEP 463 (except expression) implementation http://bugs.python.org/issue20739 closed by berker.peksag #21668: The select and time modules uses libm functions without linkin http://bugs.python.org/issue21668 closed by haypo #22625: When cross-compiling, don???t try to execute binaries http://bugs.python.org/issue22625 closed by martin.panter #22873: Re: SSLsocket.getpeercert - return ALL the fields of the certi http://bugs.python.org/issue22873 closed by berker.peksag #23029: test_warnings produces extra output in quiet mode http://bugs.python.org/issue23029 closed by berker.peksag #23251: mention in time.sleep() docs that it does not block other Pyth http://bugs.python.org/issue23251 closed by berker.peksag #24173: curses HOWTO/implementation disagreement http://bugs.python.org/issue24173 closed by berker.peksag #24838: tarfile.py: fix GNU and USTAR formats to properly handle paths http://bugs.python.org/issue24838 closed by lars.gustaebel #24922: assertWarnsRegex doesn't allow multiple warning messages http://bugs.python.org/issue24922 closed by berker.peksag #25314: Documentation: argparse's actions store_{true,false} default t http://bugs.python.org/issue25314 closed by martin.panter #25642: Setting maxsize breaks asyncio.JoinableQueue/Queue http://bugs.python.org/issue25642 closed by berker.peksag #25989: documentation version switcher is broken fro 2.6, 3.2, 3.3 http://bugs.python.org/issue25989 closed by berker.peksag #26535: Minor typo in the docs for struct.unpack http://bugs.python.org/issue26535 closed by 
martin.panter #26615: Missing entry in WRAPPER_ASSIGNMENTS in update_wrapper's doc http://bugs.python.org/issue26615 closed by berker.peksag #26657: Directory traversal with http.server and SimpleHTTPServer on w http://bugs.python.org/issue26657 closed by martin.panter #26659: slice() leaks memory when part of a cycle http://bugs.python.org/issue26659 closed by python-dev #26717: wsgiref.simple_server: mojibake with cp1252 bytes in PATH_INFO http://bugs.python.org/issue26717 closed by martin.panter #26720: memoryview from BufferedWriter becomes garbage http://bugs.python.org/issue26720 closed by martin.panter #26745: Redundant code in _PyObject_GenericSetAttrWithDict http://bugs.python.org/issue26745 closed by serhiy.storchaka #26751: Possible bug in sorting algorithm http://bugs.python.org/issue26751 closed by benjamin.peterson #26755: Update version{added,changed} docs in devguide http://bugs.python.org/issue26755 closed by berker.peksag #26760: Document PyFrameObject http://bugs.python.org/issue26760 closed by brett.cannon #26763: Update PEP-8 regarding binary operators http://bugs.python.org/issue26763 closed by gvanrossum #26766: The result type of bytearray formatting is not stable http://bugs.python.org/issue26766 closed by berker.peksag #26770: _Py_set_inheritable(): do nothing if the FD_CLOEXEC close is a http://bugs.python.org/issue26770 closed by haypo #26772: regex.ENHANCEMATCH crashes interpreter http://bugs.python.org/issue26772 closed by SilentGhost #26775: Improve test coverage on urllib.parse http://bugs.python.org/issue26775 closed by orsenthil #26777: test_asyncio: test_timeout_disable() fails randomly http://bugs.python.org/issue26777 closed by haypo #26778: More typo fixes http://bugs.python.org/issue26778 closed by serhiy.storchaka #26780: Illustrate both binary operator conventions in PEP-8 http://bugs.python.org/issue26780 closed by orsenthil #26782: subprocess.__all__ incomplete on Windows http://bugs.python.org/issue26782 closed by 
martin.panter

#26783: test_os.WalkTests.test_walk_topdown don't test fwalk and Bytes
http://bugs.python.org/issue26783  closed by serhiy.storchaka

#26784: regular expression problem at umlaut handling
http://bugs.python.org/issue26784  closed by serhiy.storchaka

#26785: repr of -nan value should contain the sign
http://bugs.python.org/issue26785  closed by mark.dickinson

#26795: Fix PEP 344 Python version
http://bugs.python.org/issue26795  closed by SilentGhost

#26799: gdb support fails with "Invalid cast."
http://bugs.python.org/issue26799  closed by haypo

#26802: Avoid copy in call_function_var when no extra stack args are p
http://bugs.python.org/issue26802  closed by serhiy.storchaka

#26805: Refer to types.SimpleNamespace in namedtuple documentation
http://bugs.python.org/issue26805  closed by paul.moore

#26808: wsgiref.simple_server breaks unicode in URIs
http://bugs.python.org/issue26808  closed by martin.panter

#26813: Wrong Japanese translation of "Adverb" on Documentation
http://bugs.python.org/issue26813  closed by benjamin.peterson

#26821: array module "minimum size in bytes" table is wrong for int/lo
http://bugs.python.org/issue26821  closed by georg.brandl

#26825: Variable defined in exec(code) unreachable inside function cal
http://bugs.python.org/issue26825  closed by eryksun

#1612012: builtin compile() doc needs PyCF_DONT_IMPLY_DEDENT
http://bugs.python.org/issue1612012  closed by berker.peksag

From burkhardameier at gmail.com Fri Apr 22 16:12:19 2016
From: burkhardameier at gmail.com (Burkhard Meier)
Date: Fri, 22 Apr 2016 13:12:19 -0700
Subject: [Python-Dev] I hope this won't be my last comment here ~ yet it may well be...
In-Reply-To:
References:
Message-ID:

Ok. no more ellipses...what I was trying to share is an unhappy experience I had with the open source Linux community. I am sure this will not happen on this Python Dev list of professionals. Please ignore my comments.
From now on I will focus on contributing to Python (especially on a Windows platform) and not taking up valuable reading time.

Burkhard

On Thu, Apr 21, 2016 at 3:43 PM, Chris Barker wrote:

>
> I'm really confused -- you had a handful of very positive responses to
> your offer to help with Python on Windows.
>
> Then a couple off the cuff remarks (at least one of which was serious)
> about what is often known as "the bus factor":
>
> But I think you may want to take into account the history here. This has
> been talked about A LOT in the Python community for years -- so we may be a
> bit blase about it. Note that Wikipedia's page on the bus factor:
>
> https://en.wikipedia.org/wiki/Bus_factor
>
> """An early instance of this sort of query was when Michael McLay publicly
> asked, in 1994, what would happen to the Python language if Guido van
> Rossum were hit by a bus.[8] """
>
> So this has been very, very well hashed out in the Python community.
>
> And a quick look at the existence of this list, the messages on it, and
> the source repo will tell you that Python is in no way a personal project
> of one person. (not to mention the PSF)
>
> I think the lessons here are:
>
> - don't be too sensitive
>
> and, important for every open source community:
>
> - your comments and questions will be taken far more seriously if you have
> done your homework.
>
> -CHB
>
>
> On Thu, Apr 21, 2016 at 4:54 AM, Burkhard Meier
> wrote:
>
>> Please do allow me to share my humble experiences of being a software
>> professional on a Windows platform.
>>
>> Almost 20 years.
>>
>> You know what; when I tried out 'sugar Linux' or Peppermint,,,the "admin'
>> dude kicked me out 5 times in one sole eve,
>>
>> Maybe this is just *me*..
>>
>> You know what: I did have my time with this *open source community*...
>>
>> I was just asking a sincere question.
>>
>> C'mon
>>
>> This was rather very ridiculous.
>>
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
>>
>>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov

From storchaka at gmail.com Sat Apr 23 11:59:23 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 23 Apr 2016 18:59:23 +0300
Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units
In-Reply-To:
References:
Message-ID:

On 13.04.16 19:33, Guido van Rossum wrote:
> Nice work. I think that for CPython, speed is much more important than
> memory use for the code. Disk space is practically free for anything
> smaller than a video. :-)

I collected statistics for use opcodes with different arguments during running CPython tests. Estimated size with using wordcode is 1.33 times less than with using current bytecode.

[1] http://comments.gmane.org/gmane.comp.python.ideas/38293

From xdegaye at gmail.com Sun Apr 24 03:20:08 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Sun, 24 Apr 2016 09:20:08 +0200
Subject: [Python-Dev] support of the android platform
Message-ID: <571C73A8.1030908@gmail.com>

Starting with API level 21 (Android 5.0), the build of python3 with the official android toolchains (that is, without resorting to external libraries for wide character support) runs correctly. With the set of patches described in the patches/Makefile file at [1], the cpython test suite runs[2] on the android x86 and armv7 emulators with only few errors[3].
Those errors are listed with their corresponding error messages, this may give a raw idea of the effort needed to support this platform. Xavier [1] https://bitbucket.org/xdegaye/pyona/src [2] To reproduce these results, follow the instructions found in INSTALL at https://bitbucket.org/xdegaye/pyona/wiki/install [3] https://bitbucket.org/xdegaye/pyona/wiki/testsuite From stefan at bytereef.org Sun Apr 24 05:50:10 2016 From: stefan at bytereef.org (Stefan Krah) Date: Sun, 24 Apr 2016 09:50:10 +0000 (UTC) Subject: [Python-Dev] support of the android platform References: <571C73A8.1030908@gmail.com> Message-ID: Xavier de Gaye gmail.com> writes: > Starting with API level 21 (Android 5.0), the build of python3 with the > official android toolchains (that is, without resorting to external libraries > for wide character support) runs correctly. With the set of patches described > in the patches/Makefile file at [1], the cpython test suite runs[2] on the > android x86 and armv7 emulators with only few errors[3]. Those errors are > listed with their corresponding error messages, this may give a raw idea of > the effort needed to support this platform. > > Xavier > > [1] https://bitbucket.org/xdegaye/pyona/src > [2] To reproduce these results, follow the instructions found in INSTALL > at https://bitbucket.org/xdegaye/pyona/wiki/install > [3] https://bitbucket.org/xdegaye/pyona/wiki/testsuite This looks great, very clean! As I understand the patches, the locale.h and langinfo.h problems are solved. Do you think the following issues on the Python bug tracker could be closed? 
http://bugs.python.org/issue20305 http://bugs.python.org/issue22747 http://bugs.python.org/issue17905 Stefan Krah From raymond.hettinger at gmail.com Sun Apr 24 15:45:15 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 24 Apr 2016 12:45:15 -0700 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: References: Message-ID: > On Apr 23, 2016, at 8:59 AM, Serhiy Storchaka wrote: > > I collected statistics for use opcodes with different arguments during running CPython tests. Estimated size with using wordcode is 1.33 times less than with using current bytecode. > > [1] http://comments.gmane.org/gmane.comp.python.ideas/38293 I think the word code patch should go in sooner rather than later. Several of us have been through the patch and it is in pretty good shape (some parts still need work though). The earlier this goes in, the more time we'll have to shake out any unexpected secondary effects. perfect-is-the-enemy-of-good-ly yours, Raymond P.S. The patch is smaller, more tractable, and in better shape than the C version of OrderedDict was when it went in. From victor.stinner at gmail.com Sun Apr 24 16:16:35 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 24 Apr 2016 22:16:35 +0200 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: References: Message-ID: Hi Raymond, 2016-04-24 21:45 GMT+02:00 Raymond Hettinger : > I think the word code patch should go in sooner rather than later. Several of us have been through the patch and it is in pretty good shape (some parts still need work though). The earlier this goes in, the more time we'll have to shake out any unexpected secondary effects. Yury Selivanov and Serhiy Storchaka told me that they will review shortly the patch. I give them one more week and then I will push the patch. I agree that the patch is in a good shape. I reviewed first versions of the change. I pushed some minor and obvious changes. 
I also asked to revert unrelated changes. I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a single 16-bit operation. It should be easy to implement it later, but I prefer to focus on changing the format of the bytecode. Victor From raymond.hettinger at gmail.com Sun Apr 24 17:16:25 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 24 Apr 2016 14:16:25 -0700 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: References: Message-ID: <4D8D6768-F161-435A-9176-05BDAD316105@gmail.com> > On Apr 24, 2016, at 1:16 PM, Victor Stinner wrote: > > I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a > single 16-bit operation. It should be easy to implement it later, but > I prefer to focus on changing the format of the bytecode. Improving instruction decoding was the whole point and it was what kicked-off the work on the patch. It is also where most of the performance improvement comes from and isn't the difficult part of the patch. The persnickety parts of the patch lay elsewhere, so there is really nothing to be gained gutting out our actual objective. The OPs original patch had already gotten this part done and it ran fine for me. Raymond From victor.stinner at gmail.com Sun Apr 24 17:31:44 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 24 Apr 2016 23:31:44 +0200 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: <4D8D6768-F161-435A-9176-05BDAD316105@gmail.com> References: <4D8D6768-F161-435A-9176-05BDAD316105@gmail.com> Message-ID: 2016-04-24 23:16 GMT+02:00 Raymond Hettinger : >> On Apr 24, 2016, at 1:16 PM, Victor Stinner wrote: >> I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a >> single 16-bit operation. It should be easy to implement it later, but >> I prefer to focus on changing the format of the bytecode. 
> > Improving instruction decoding was the whole point and it was what kicked-off the work on the patch. It is also where most of the performance improvement comes from and isn't the difficult part of the patch. The persnickety parts of the patch lay elsewhere, so there is really nothing to be gained gutting out our actual objective. > > The OPs original patch had already gotten this part done and it ran fine for me. Oh wait, my phrasing is unclear. I do want optimize the (opcode, oparg) fetch, I just suggested to split the patch in two parts, and first review carefully the first part. Victor From raymond.hettinger at gmail.com Mon Apr 25 02:51:51 2016 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 24 Apr 2016 23:51:51 -0700 Subject: [Python-Dev] Wordcode: new regular bytecode using 16-bit units In-Reply-To: References: <4D8D6768-F161-435A-9176-05BDAD316105@gmail.com> Message-ID: <5C78B40A-B9A1-4343-8104-7E946674A858@gmail.com> > On Apr 24, 2016, at 2:31 PM, Victor Stinner wrote: > > 2016-04-24 23:16 GMT+02:00 Raymond Hettinger : >>> On Apr 24, 2016, at 1:16 PM, Victor Stinner wrote: >>> I proposed to not try to optimize ceval.c to fetch (oparg, opval) in a >>> single 16-bit operation. It should be easy to implement it later, but >>> I prefer to focus on changing the format of the bytecode. >> >> Improving instruction decoding was the whole point and it was what kicked-off the work on the patch. It is also where most of the performance improvement comes from and isn't the difficult part of the patch. The persnickety parts of the patch lay elsewhere, so there is really nothing to be gained gutting out our actual objective. >> >> The OPs original patch had already gotten this part done and it ran fine for me. > > Oh wait, my phrasing is unclear. I do want optimize the (opcode, > oparg) fetch, I just suggested to split the patch in two parts, and > first review carefully the first part. 
Unless it is presenting a tough review challenge, we should do whatever we can to make it easier on the OP who seems to be working with very limited computational resources (I had to run the benchmarks for him because his setup lacked the requisite resources). He's already put a lot of work into the patch which is pretty good shape when it arrived. The opcode/oparg fetch logic is mostly already isolated to the part of the patch that touches ceval.c. I found that part to be relatively clean and clear. The part that took the most time to go through was for peephole.c. How about we let Yury and Serhiy take a pass at it as is. And, if they would benefit from splitting the patch into parts, then perhaps one of us with better tooling can pitch in to the help the OP. Raymond From xdegaye at gmail.com Mon Apr 25 04:11:38 2016 From: xdegaye at gmail.com (Xavier de Gaye) Date: Mon, 25 Apr 2016 10:11:38 +0200 Subject: [Python-Dev] support of the android platform In-Reply-To: References: <571C73A8.1030908@gmail.com> Message-ID: <571DD13A.6070401@gmail.com> On 04/24/2016 11:50 AM, Stefan Krah wrote: > Xavier de Gaye gmail.com> writes: >> Starting with API level 21 (Android 5.0), the build of python3 with the >> official android toolchains (that is, without resorting to external libraries >> for wide character support) runs correctly. With the set of patches described >> in the patches/Makefile file at [1], the cpython test suite runs[2] on the >> android x86 and armv7 emulators with only few errors[3]. Those errors are >> listed with their corresponding error messages, this may give a raw idea of >> the effort needed to support this platform. >> >> Xavier >> >> [1] https://bitbucket.org/xdegaye/pyona/src >> [2] To reproduce these results, follow the instructions found in INSTALL >> at https://bitbucket.org/xdegaye/pyona/wiki/install >> [3] https://bitbucket.org/xdegaye/pyona/wiki/testsuite > > > This looks great, very clean! 
As I understand the patches, the locale.h and > langinfo.h problems are solved. Do you think the following issues on the > Python bug tracker could be closed? > > > http://bugs.python.org/issue20305 > http://bugs.python.org/issue22747 > http://bugs.python.org/issue17905 Thanks. A fix is still needed because Android does not HAVE_LANGINFO_H. I have tried to answer your question directly in those issues. Xavier From ericsnowcurrently at gmail.com Mon Apr 25 10:36:34 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 25 Apr 2016 08:36:34 -0600 Subject: [Python-Dev] support of the android platform In-Reply-To: <571C73A8.1030908@gmail.com> References: <571C73A8.1030908@gmail.com> Message-ID: On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye wrote: > Starting with API level 21 (Android 5.0), the build of python3 with the > official android toolchains (that is, without resorting to external > libraries > for wide character support) runs correctly. With the set of patches > described > in the patches/Makefile file at [1], the cpython test suite runs[2] on the > android x86 and armv7 emulators with only few errors[3]. Those errors are > listed with their corresponding error messages, this may give a raw idea of > the effort needed to support this platform. How does this relate to http://bugs.python.org/issue23496? -eric From stefan at bytereef.org Mon Apr 25 10:53:07 2016 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 25 Apr 2016 14:53:07 +0000 (UTC) Subject: [Python-Dev] support of the android platform References: <571C73A8.1030908@gmail.com> Message-ID: Eric Snow gmail.com> writes: > On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye gmail.com> wrote: > > Starting with API level 21 (Android 5.0), the build of python3 with the > > official android toolchains (that is, without resorting to external > How does this relate to http://bugs.python.org/issue23496? 
As I understand, that issue seems abandoned and the patches are (despite core devs asking otherwise) against 3.4. If Xavier is willing to do so, I think it would be best to start over with a new issue that integrates his work into 3.6. Stefan Krah From xdegaye at gmail.com Mon Apr 25 16:22:56 2016 From: xdegaye at gmail.com (Xavier de Gaye) Date: Mon, 25 Apr 2016 22:22:56 +0200 Subject: [Python-Dev] support of the android platform In-Reply-To: References: <571C73A8.1030908@gmail.com> Message-ID: <571E7CA0.6030703@gmail.com> On 04/25/2016 04:36 PM, Eric Snow wrote: > On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye wrote: >> Starting with API level 21 (Android 5.0), the build of python3 with the >> official android toolchains (that is, without resorting to external >> libraries >> for wide character support) runs correctly. With the set of patches >> described >> in the patches/Makefile file at [1], the cpython test suite runs[2] on the >> android x86 and armv7 emulators with only few errors[3]. Those errors are >> listed with their corresponding error messages, this may give a raw idea of >> the effort needed to support this platform. > > How does this relate to http://bugs.python.org/issue23496? The patches in issue 23496 address the native compilation of Android 4.4.2 on an android device using a port of gcc on this device. Some of these patches are not needed anymore on Android 5.0 and it seems that the kbox_fix.patch is needed because the KBOX application is used to build python in issue 23496. 
The existing issues that are relevant to the android platform are, I think:

    issue #26723: Add an option to skip _decimal module
    issue #22747: Interpreter fails in initialize on systems where HAVE_LANGINFO_H is undefined
    issue #16353: add function to os module for getting path to default shell
    issue #20306: Lack of pw_gecos field in Android's struct passwd causes cross-compilation for the pwd module to fail

Xavier

From xdegaye at gmail.com Mon Apr 25 16:25:55 2016
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Mon, 25 Apr 2016 22:25:55 +0200
Subject: [Python-Dev] support of the android platform
In-Reply-To:
References: <571C73A8.1030908@gmail.com>
Message-ID: <571E7D53.2000809@gmail.com>

On 04/25/2016 04:53 PM, Stefan Krah wrote:
> Eric Snow gmail.com> writes:
>> On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye gmail.com>
> wrote:
>>> Starting with API level 21 (Android 5.0), the build of python3 with the
>>> official android toolchains (that is, without resorting to external
>
>> How does this relate to http://bugs.python.org/issue23496?
>
> As I understand, that issue seems abandoned and the patches are
> (despite core devs asking otherwise) against 3.4.
>
>
> If Xavier is willing to do so, I think it would be best to start over
> with a new issue that integrates his work into 3.6.

I will enter a new issue that lists all the new issues and the other already existing issues that, had they been fixed, would have allowed a successful cross-build and the same test suite results as described in my previous post.

Xavier

From kennethjwright at yahoo.co.uk Mon Apr 25 16:51:03 2016
From: kennethjwright at yahoo.co.uk (Kenny)
Date: Mon, 25 Apr 2016 21:51:03 +0100
Subject: [Python-Dev] thingy
Message-ID:

Dear thingy,

Please replace me with DZWORD. Put in HKEY\SYSTEM_IO_MEMORY\%USB%\%DZWORD%\%ADD\%CDATA\%DATA\ FI thingy

Sent from Samsung Mobile
URL: From kennethjwright at yahoo.co.uk Mon Apr 25 17:15:07 2016 From: kennethjwright at yahoo.co.uk (Kenny) Date: Mon, 25 Apr 2016 22:15:07 +0100 Subject: [Python-Dev] Terminal console Message-ID: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> fopen Terminal.app.python. 3.5.0.() def fopen Termina.app.python.3.5.0.() %add.%data(CDATA[])::true||false fclose(); end Terminal.app.python.3.5.0.() Yours thingy Sent from Samsung Mobile -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Mon Apr 25 17:27:48 2016 From: brett at python.org (Brett Cannon) Date: Mon, 25 Apr 2016 21:27:48 +0000 Subject: [Python-Dev] Terminal console In-Reply-To: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> Message-ID: Can someone disable this person's subscription? On Mon, 25 Apr 2016 at 14:15 Kenny via Python-Dev wrote: > > fopen Terminal.app.python. > 3.5.0.() > > def fopen Termina.app.python.3.5.0.() > > %add.%data(CDATA[])::true||false > > fclose(); > > end Terminal.app.python.3.5.0.() > > Yours thingy > > > Sent from Samsung Mobile > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Apr 25 17:33:43 2016 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 25 Apr 2016 16:33:43 -0500 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> Message-ID: [Brett Cannon ] > Can someone disable this person's subscription? Done. > On Mon, 25 Apr 2016 at 14:15 Kenny via Python-Dev > wrote: >> >> >> fopen Terminal.app.python. 
>> 3.5.0.() >> >> def fopen Termina.app.python.3.5.0.() >> >> %add.%data(CDATA[])::true||false >> >> fclose(); >> >> end Terminal.app.python.3.5.0.() >> >> Yours thingy >> >> >> Sent from Samsung Mobile >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/brett%40python.org From mail at timgolden.me.uk Mon Apr 25 17:37:37 2016 From: mail at timgolden.me.uk (Tim Golden) Date: Mon, 25 Apr 2016 22:37:37 +0100 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> Message-ID: <571E8E21.7020600@timgolden.me.uk> Not subscribed; probably via gmane. I've added him to a hold list via spam filter. See if that works. TJG On 25/04/2016 22:27, Brett Cannon wrote: > Can someone disable this person's subscription? > > On Mon, 25 Apr 2016 at 14:15 Kenny via Python-Dev > wrote: > > > fopen Terminal.app.python. 
> 3.5.0.() > > def fopen Termina.app.python.3.5.0.() > > %add.%data(CDATA[])::true||false > > fclose(); > > end Terminal.app.python.3.5.0.() > > Yours thingy > > > Sent from Samsung Mobile > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/mail%40timgolden.me.uk > From tim.peters at gmail.com Mon Apr 25 17:43:07 2016 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 25 Apr 2016 16:43:07 -0500 Subject: [Python-Dev] Terminal console In-Reply-To: <571E8E21.7020600@timgolden.me.uk> References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: [Tim Golden , on Kenny the "thingy" guy] > Not subscribed; probably via gmane. They were subscribed, but I already did the unsub. > I've added him to a hold list via spam filter. See if that works. So now we're doubly safe ;-) From brett at python.org Mon Apr 25 17:48:04 2016 From: brett at python.org (Brett Cannon) Date: Mon, 25 Apr 2016 21:48:04 +0000 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: On Mon, 25 Apr 2016 at 14:45 Tim Peters wrote: > [Tim Golden , on Kenny the "thingy" guy] > > Not subscribed; probably via gmane. > > They were subscribed, but I already did the unsub. > > > > I've added him to a hold list via spam filter. See if that works. > > So now we're doubly safe ;-) > Well, now I just received an attempted unsubscribe, so maybe safe from more email to the list, but it looks like a start at harassment of me. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From oreilldf at gmail.com Mon Apr 25 18:02:20 2016 From: oreilldf at gmail.com (Dan O'Reilly) Date: Mon, 25 Apr 2016 22:02:20 +0000 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: Brett, your initial email shows up in Google Inbox (and maybe Gmail, too) like this (including the ellipses): *Can someone disable this person's subscription?...* * Unsubscribe: from python-dev here>...* So someone might have mistakenly clicked that link, thinking they were helping to remove Kenny's subscription, for what it's worth. On Mon, Apr 25, 2016 at 5:48 PM Brett Cannon wrote: > On Mon, 25 Apr 2016 at 14:45 Tim Peters wrote: > >> [Tim Golden , on Kenny the "thingy" guy] >> > Not subscribed; probably via gmane. >> >> They were subscribed, but I already did the unsub. >> >> >> > I've added him to a hold list via spam filter. See if that works. >> >> So now we're doubly safe ;-) >> > > Well, now I just received an attempted unsubscribe, so maybe safe from > more email to the list, but it looks like a start at harassment of me. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/oreilldf%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Mon Apr 25 18:07:31 2016 From: brett at python.org (Brett Cannon) Date: Mon, 25 Apr 2016 22:07:31 +0000 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: On Mon, 25 Apr 2016 at 15:02 Dan O'Reilly wrote: > Brett, your initial email shows up in Google Inbox (and maybe Gmail, too) > like this (including the ellipses): > > > *Can someone disable this person's subscription?* > > *...* > > * Unsubscribe: from python-dev here>...* > > So someone might have mistakenly clicked that link, thinking they were > helping to remove Kenny's subscription, for what it's worth. > Good point. Hopefully that's all it was then. -Brett > > > On Mon, Apr 25, 2016 at 5:48 PM Brett Cannon wrote: > >> On Mon, 25 Apr 2016 at 14:45 Tim Peters wrote: >> >>> [Tim Golden , on Kenny the "thingy" guy] >>> > Not subscribed; probably via gmane. >>> >>> They were subscribed, but I already did the unsub. >>> >>> >>> > I've added him to a hold list via spam filter. See if that works. >>> >>> So now we're doubly safe ;-) >>> >> >> Well, now I just received an attempted unsubscribe, so maybe safe from >> more email to the list, but it looks like a start at harassment of me. >> > _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/oreilldf%40gmail.com >> > -------------- next part -------------- An HTML attachment was scrubbed... 
From zachary.ware+pydev at gmail.com Mon Apr 25 18:12:56 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Mon, 25 Apr 2016 17:12:56 -0500
Subject: [Python-Dev] Terminal console
In-Reply-To:
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk>
Message-ID:

On Apr 25, 2016 17:08, "Brett Cannon" wrote:
>
> Good point. Hopefully that's all it was then.

Is there any particular reason we include that link in python-dev emails? We don't for any other list as far as I know.

--
Zach
(On a phone)

From leewangzhong+python at gmail.com Mon Apr 25 18:55:10 2016
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Mon, 25 Apr 2016 18:55:10 -0400
Subject: [Python-Dev] Terminal console
In-Reply-To:
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk>
Message-ID:

FWIW, Gmail's policies require:

"""
A user must be able to unsubscribe from your mailing list through one of the following means:

* A prominent link in the body of an email leading users to a page confirming his or her unsubscription (no input from the user, other than confirmation, should be required).
* By replying to your email with an unsubscribe request.
"""
(https://support.google.com/mail/answer/81126)

That link is currently the only obvious way to unsubscribe.

On Mon, Apr 25, 2016 at 6:12 PM, Zachary Ware wrote:
> On Apr 25, 2016 17:08, "Brett Cannon" wrote:
>>
>> Good point. Hopefully that's all it was then.
>
> Is there any particular reason we include that link in python-dev emails? We
> don't for any other list as far as I know.
> > -- > Zach > (On a phone) > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/leewangzhong%2Bpython%40gmail.com > From ncoghlan at gmail.com Mon Apr 25 22:02:52 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 26 Apr 2016 12:02:52 +1000 Subject: [Python-Dev] support of the android platform In-Reply-To: <571E7D53.2000809@gmail.com> References: <571C73A8.1030908@gmail.com> <571E7D53.2000809@gmail.com> Message-ID: On 26 April 2016 at 06:25, Xavier de Gaye wrote: > On 04/25/2016 04:53 PM, Stefan Krah wrote: > > Eric Snow gmail.com> writes: > >> On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye gmail.com > > > > wrote: > >>> Starting with API level 21 (Android 5.0), the build of python3 with the > >>> official android toolchains (that is, without resorting to external > > > >> How does this relate to http://bugs.python.org/issue23496? > > > > As I understand, that issue seems abandoned and the patches are > > (despite core devs asking otherwise) against 3.4. > > > > > > If Xavier is willing to do so, I think it would be best to start over > > with a new issue that integrates his work into 3.6. > > I will enter a new issue that lists all the new issues and the other > already > existing issues that, would have they been fixed, would have allowed a > successfull cross-build and the same test suite results as described in my > previous post. > Thanks for this, Xavier! Once you have that, in addition to posting the link back here, you may also want to ping the Mobile SIG list: https://www.python.org/community/sigs/current/mobile-sig/ Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... 
From rymg19 at gmail.com Mon Apr 25 22:07:32 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Mon, 25 Apr 2016 21:07:32 -0500
Subject: [Python-Dev] support of the android platform
In-Reply-To: <571E7CA0.6030703@gmail.com>
References: <571C73A8.1030908@gmail.com> <571E7CA0.6030703@gmail.com>
Message-ID:

Oh wow, has a year passed already?

I don't have access to an Android device suitable for development, and
Cyd seems to have disappeared, which is why the issue ended up abandoned.

I'd be happy to try to help with the new effort if possible!

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/

On Apr 25, 2016 3:24 PM, "Xavier de Gaye" wrote:

> On 04/25/2016 04:36 PM, Eric Snow wrote:
> > On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye wrote:
> >> Starting with API level 21 (Android 5.0), the build of python3 with the
> >> official android toolchains (that is, without resorting to external
> >> libraries for wide character support) runs correctly. With the set of
> >> patches described in the patches/Makefile file at [1], the cpython test
> >> suite runs[2] on the android x86 and armv7 emulators with only few
> >> errors[3]. Those errors are listed with their corresponding error
> >> messages, this may give a raw idea of the effort needed to support
> >> this platform.
> >
> > How does this relate to http://bugs.python.org/issue23496?
>
>
> The patches in issue 23496 address the native compilation of Android 4.4.2
> on an android device using a port of gcc on this device. Some of these
> patches are not needed anymore on Android 5.0 and it seems that the
> kbox_fix.patch is needed because the KBOX application is used to build
> python in issue 23496.
> > The existing issues that are relevant to the android platform are, I think: > issue #26723: Add an option to skip _decimal module > issue #22747: Interpreter fails in initialize on systems where > HAVE_LANGINFO_H is undefined > issue #16353: add function to os module for getting path to default > shell > issue #20306: Lack of pw_gecos field in Android's struct passwd causes > cross-compilation for the pwd module to fail > > Xavier > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Apr 26 04:02:29 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 26 Apr 2016 09:02:29 +0100 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: On 25 April 2016 at 23:55, Franklin? Lee wrote: > FWIW, Gmail's policies require: [...] > That link is currently the only obvious way to unsubscribe. I'm not sure why gmail's policies should apply to this list. I'm not against having an easy reminder of how to unsubscribe, but the clickable link on every message that requests that the poster be unsubscribed seems like the wrong way to do it, to me... Paul From ben+python at benfinney.id.au Tue Apr 26 04:24:43 2016 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 26 Apr 2016 18:24:43 +1000 Subject: [Python-Dev] Mailing list metadata via RFC 2369 (was: Terminal console) References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: <85a8kgn5f8.fsf_-_@benfinney.id.au> "Franklin? 
Lee" writes:

> FWIW, Gmail's policies require:
> """
> A user must be able to unsubscribe from your mailing list through
> one of the following means:
>
> * A prominent link in the body of an email leading users to a page
> confirming his or her unsubscription (no input from the user, other
> than confirmation, should be required).
> * By replying to your email with an unsubscribe request.
> """
> (https://support.google.com/mail/answer/81126)

GMail already has all the information needed to offer mailing list
functionality to every user. The header of every message delivered via
the mailing list has full RFC 2369 fields which is ample information,
correctly structured for any application to provide the functions GMail
is referring to.

GMail support staff have known this for many years because RFC 2369
support has been requested for their interface over and over again.
There are reports they even make some use of that standard information
though as I never use GMail I can't verify that.

If not, then their refusal to follow a mature, well-implemented internet
standard is no reason for anyone else to change behaviour. It is up to
GMail to use the standard information.

--
 \     "Anything that we scientists can do to weaken the hold of |
  `\   religion should be done and may in the end be our greatest |
_o__)  contribution to civilization." -- Steven Weinberg |
Ben Finney

From leewangzhong+python at gmail.com Tue Apr 26 08:45:20 2016
From: leewangzhong+python at gmail.com (Franklin? Lee)
Date: Tue, 26 Apr 2016 08:45:20 -0400
Subject: [Python-Dev] Terminal console
In-Reply-To:
References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk>
Message-ID:

On Apr 26, 2016 4:02 AM, "Paul Moore" wrote:
>
> On 25 April 2016 at 23:55, Franklin? Lee wrote:
> > FWIW, Gmail's policies require:
> [...]
> > That link is currently the only obvious way to unsubscribe.
>
> I'm not sure why gmail's policies should apply to this list.
They're Gmail's policies on how not to get your messages filtered by Gmail as spam. I am not clear on whether they're descriptive (i.e. users will mark you as spam) or prescriptive (i.e. Google's algorithms will determine that you're spam). -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.ware+pydev at gmail.com Tue Apr 26 08:57:16 2016 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Tue, 26 Apr 2016 07:57:16 -0500 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: On Apr 26, 2016 07:45, "Franklin? Lee" wrote: > > On Apr 26, 2016 4:02 AM, "Paul Moore" wrote: > > > > On 25 April 2016 at 23:55, Franklin? Lee wrote: > > > FWIW, Gmail's policies require: > > [...] > > > That link is currently the only obvious way to unsubscribe. > > > > I'm not sure why gmail's policies should apply to this list. > > They're Gmail's policies on how not to get your messages filtered by Gmail as spam. > > I am not clear on whether they're descriptive (i.e. users will mark you as spam) or prescriptive (i.e. Google's algorithms will determine that you're spam). I have no trouble with Gmail with several other Python lists that do not include an unsubscribe link. -- Zach (On a phone) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Apr 26 09:13:50 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 26 Apr 2016 23:13:50 +1000 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: On 26 April 2016 at 22:57, Zachary Ware wrote: > > On Apr 26, 2016 07:45, "Franklin? Lee" wrote: > > > > On Apr 26, 2016 4:02 AM, "Paul Moore" wrote: > > > > > > On 25 April 2016 at 23:55, Franklin? Lee wrote: > > > > FWIW, Gmail's policies require: > > > [...] 
> > > > That link is currently the only obvious way to unsubscribe. > > > > > > I'm not sure why gmail's policies should apply to this list. > > > > They're Gmail's policies on how not to get your messages filtered by Gmail as spam. > > > > I am not clear on whether they're descriptive (i.e. users will mark you as spam) or prescriptive (i.e. Google's algorithms will determine that you're spam). > > I have no trouble with Gmail with several other Python lists that do not include an unsubscribe link. Indeed, Mailman inserts the appropriate List-Unsubscribe headers, so there's no need for a link in the body of the emails (and including it can cause problems when link scrapers hit the archives, or link pre-fetching in a webmail client misbehaves) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Tue Apr 26 09:17:43 2016 From: barry at python.org (Barry Warsaw) Date: Tue, 26 Apr 2016 09:17:43 -0400 Subject: [Python-Dev] Terminal console In-Reply-To: References: <93ie6gyc72yin66bmtaohqj3.1461618714120@email.android.com> <571E8E21.7020600@timgolden.me.uk> Message-ID: <20160426091743.797027b2@subdivisions.wooz.org> On Apr 26, 2016, at 09:02 AM, Paul Moore wrote: >I'm not against having an easy reminder of how to unsubscribe, but the >clickable link on every message that requests that the poster be >unsubscribed seems like the wrong way to do it, to me... And yet, we have it anyway! This list turns on full personalization so the footers will all have a link to your unsubscribe page. As Ben pointed out, we also implement RFC 2369, which is only an 18 year old standard. 
Cheers, -Barry From ethan at stoneleaf.us Tue Apr 26 10:14:35 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 26 Apr 2016 07:14:35 -0700 Subject: [Python-Dev] support of the android platform In-Reply-To: <571C73A8.1030908@gmail.com> References: <571C73A8.1030908@gmail.com> Message-ID: <571F77CB.3000905@stoneleaf.us> On 04/24/2016 12:20 AM, Xavier de Gaye wrote: > [1] https://bitbucket.org/xdegaye/pyona/src The license: ----------- This software is licensed under the GNU General Public License version 3 or later. ----------- Will combining your code with Python 3 be a problem? -- ~Ethan~ From steve at pearwood.info Tue Apr 26 11:40:30 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 27 Apr 2016 01:40:30 +1000 Subject: [Python-Dev] Terminal console In-Reply-To: References: <571E8E21.7020600@timgolden.me.uk> Message-ID: <20160426154029.GM13497@ando.pearwood.info> On Tue, Apr 26, 2016 at 08:45:20AM -0400, Franklin? Lee wrote: > On Apr 26, 2016 4:02 AM, "Paul Moore" wrote: > > > > On 25 April 2016 at 23:55, Franklin? Lee > wrote: > > > FWIW, Gmail's policies require: > > [...] > > > That link is currently the only obvious way to unsubscribe. > > > > I'm not sure why gmail's policies should apply to this list. > > They're Gmail's policies on how not to get your messages filtered by Gmail > as spam. > > I am not clear on whether they're descriptive (i.e. users will mark you as > spam) or prescriptive (i.e. Google's algorithms will determine that you're > spam). I don't think it's just Google. If I remember correctly, having a clearly visible and *working* unsubscribe link in the body of the email (not merely hidden away in the headers where non-technical users would never think to look) is a requirement for the CanSpam act, or whatever it was called. In any case, whether it is a legal or practical requirement or not, it's a fairly small burden. 
As I see it, the only time it causes a (tiny) issue is if somebody accidently includes the footer from a list mail they received when forwarding to somebody else (or sending to the list), and the receiver mistakenly (or in an attempt to cause trouble) clicks on that link. Which is harmless. Considering how many hundreds, thousands (hundreds of thousands? sometimes it feels like that *wink*) of emails go through this list alone, I don't think this is a problem that needs fixing. -- Steve From xdegaye at gmail.com Tue Apr 26 11:41:51 2016 From: xdegaye at gmail.com (Xavier de Gaye) Date: Tue, 26 Apr 2016 17:41:51 +0200 Subject: [Python-Dev] support of the android platform In-Reply-To: References: <571C73A8.1030908@gmail.com> <571E7D53.2000809@gmail.com> Message-ID: <571F8C3F.7060803@gmail.com> On 04/26/2016 04:02 AM, Nick Coghlan wrote: > On 26 April 2016 at 06:25, Xavier de Gaye > wrote: > > On 04/25/2016 04:53 PM, Stefan Krah wrote: > > Eric Snow gmail.com > writes: > >> On Sun, Apr 24, 2016 at 1:20 AM, Xavier de Gaye gmail.com > > > wrote: > >>> Starting with API level 21 (Android 5.0), the build of python3 with the > >>> official android toolchains (that is, without resorting to external > > > >> How does this relate tohttp://bugs.python.org/issue23496? > > > > As I understand, that issue seems abandoned and the patches are > > (despite core devs asking otherwise) against 3.4. > > > > > > If Xavier is willing to do so, I think it would be best to start over > > with a new issue that integrates his work into 3.6. > > I will enter a new issue that lists all the new issues and the other already > existing issues that, would have they been fixed, would have allowed a > successfull cross-build and the same test suite results as described in my > previous post. > > > Thanks for this, Xavier! 
> > Once you have that, in addition to posting the link back here, you may also want to ping the Mobile SIG list: https://www.python.org/community/sigs/current/mobile-sig/ Issue 26865 [1] lists issues that may have to be fixed in the perspective of a future support of the android platform. Xavier [1] http://bugs.python.org/issue26865 From xdegaye at gmail.com Tue Apr 26 11:53:28 2016 From: xdegaye at gmail.com (Xavier de Gaye) Date: Tue, 26 Apr 2016 17:53:28 +0200 Subject: [Python-Dev] support of the android platform In-Reply-To: <571F77CB.3000905@stoneleaf.us> References: <571C73A8.1030908@gmail.com> <571F77CB.3000905@stoneleaf.us> Message-ID: <571F8EF8.5080807@gmail.com> On 04/26/2016 04:14 PM, Ethan Furman wrote: > On 04/24/2016 12:20 AM, Xavier de Gaye wrote: > >> [1] https://bitbucket.org/xdegaye/pyona/src > > The license: > ----------- > This software is licensed under the GNU General Public License version 3 or later. > ----------- > > > Will combining your code with Python 3 be a problem? This code, or part of it, could be used to setup a buildbot and in this case there would not be any conflict between the GPL v3 license and the Python license, I think. I don't see how it can be combined with Python 3. Xavier From barry at python.org Tue Apr 26 12:12:01 2016 From: barry at python.org (Barry Warsaw) Date: Tue, 26 Apr 2016 12:12:01 -0400 Subject: [Python-Dev] Terminal console In-Reply-To: <20160426154029.GM13497@ando.pearwood.info> References: <571E8E21.7020600@timgolden.me.uk> <20160426154029.GM13497@ando.pearwood.info> Message-ID: <20160426121201.323dcf2b@subdivisions.wooz.org> On Apr 27, 2016, at 01:40 AM, Steven D'Aprano wrote: >I don't think it's just Google. If I remember correctly, having a clearly >visible and *working* unsubscribe link in the body of the email (not merely >hidden away in the headers where non-technical users would never think to >look) is a requirement for the CanSpam act, or whatever it was called. 
BTW, the whole point of RFC 2369 headers is so that MUAs can implement a nice big fat blinky UNSUBSCRIBE button in their UI. -Barry From stefan at bytereef.org Tue Apr 26 13:12:26 2016 From: stefan at bytereef.org (Stefan Krah) Date: Tue, 26 Apr 2016 17:12:26 +0000 (UTC) Subject: [Python-Dev] support of the android platform References: <571C73A8.1030908@gmail.com> <571F77CB.3000905@stoneleaf.us> <571F8EF8.5080807@gmail.com> Message-ID: Xavier de Gaye gmail.com> writes: > This code, or part of it, could be used to setup a buildbot and in this case > there would not be any conflict between the GPL v3 license and the Python > license, I think. I don't see how it can be combined with Python 3. For the patches on the tracker I just went by your contributor agreement. I didn't check the lineage of the patches. Can I assume that either you are re-licensing GPL-stuff written by yourself to the PSF (which is a perfectly valid use case of the agreement) or rewriting from scratch? Stefan Krah From xdegaye at gmail.com Tue Apr 26 15:59:02 2016 From: xdegaye at gmail.com (Xavier de Gaye) Date: Tue, 26 Apr 2016 21:59:02 +0200 Subject: [Python-Dev] support of the android platform In-Reply-To: References: <571C73A8.1030908@gmail.com> <571F77CB.3000905@stoneleaf.us> <571F8EF8.5080807@gmail.com> Message-ID: <571FC886.1020904@gmail.com> On 04/26/2016 07:12 PM, Stefan Krah wrote: > Xavier de Gaye gmail.com> writes: >> This code, or part of it, could be used to setup a buildbot and in this case >> there would not be any conflict between the GPL v3 license and the Python >> license, I think. I don't see how it can be combined with Python 3. > > For the patches on the tracker I just went by your contributor agreement. > I didn't check the lineage of the patches. Can I assume that either you > are re-licensing GPL-stuff written by yourself to the PSF (which is a > perfectly valid use case of the agreement) or rewriting from scratch? 
Yes, I am re-licensing GPL code to the PSF for all the patches written by me
in the issues listed on http://bugs.python.org/issue26865#msg264310. I have
only rewritten the patches from scratch in the following issues:

    issue #26849: android does not support versioning in SONAME
        (using a switch case on ac_sys_system)
    issue #26854: missing header on android for the ossaudiodev module
        (actually it's difficult to rewrite such an obvious patch)
    issue #26855: add platform.android_ver() for android
        (using configparser; Chi Hsuan Yen is proposing a more complete
        approach)

Fixes for those three issues can also be found in other projects porting
python3 to android, the ones that I know of are:

    * Python 3 Android at https://github.com/yan12125/python3-android,
      author Chi Hsuan Yen
    * python-for-android at https://github.com/kuri65536/python-for-android,
      author shimoda dragon

I also browsed rapidly issue 23496 and could not find any overlap with my
patches.

Xavier

From stefan at bytereef.org Tue Apr 26 16:35:52 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Tue, 26 Apr 2016 20:35:52 +0000 (UTC)
Subject: [Python-Dev] support of the android platform
References: <571C73A8.1030908@gmail.com> <571F77CB.3000905@stoneleaf.us> <571F8EF8.5080807@gmail.com> <571FC886.1020904@gmail.com>
Message-ID:

Xavier de Gaye <xdegaye at gmail.com> writes:
> Yes, I am re-licensing GPL code to the PSF for all the patches written by me
> in the issues listed on http://bugs.python.org/issue26865#msg264310. I have
> only rewritten the patches from scratch in the following issues:

Thanks, this all sounds good.

> issue #26854: missing header on android for the ossaudiodev module
> (actually it's difficult to rewrite such an obvious patch)

Indeed.
:)

Stefan Krah

From storchaka at gmail.com Wed Apr 27 03:14:41 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 27 Apr 2016 10:14:41 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
Message-ID:

There are three functions (or at least three documented functions) in the
C API that "steal" references: PyList_SetItem(), PyTuple_SetItem() and
PyModule_AddObject(). The first two "steal" references even on failure,
and this is well-known behaviour. But PyModule_AddObject() "steals" a
reference only on success. Nothing in the documentation points this out.
Most usages of PyModule_AddObject() in the stdlib don't decref the
reference to the value on PyModule_AddObject() failure. The only
exceptions are in the _json, _io, and _tkinter modules. In many cases,
including examples in the documentation, the success of
PyModule_AddObject() is not checked either, but this is a different issue.

We can just fix the documentation by adding a note that
PyModule_AddObject() doesn't steal a reference on failure, and add
explicit decrefs after PyModule_AddObject() in hundreds of places in the
code.

But I think it would be better to "fix" PyModule_AddObject() by making
it decref the reference on failure, as most developers expect. But this
is a dangerous change, because if the author of third-party code read
not only the documentation but also the CPython code, and added an
explicit decref on PyModule_AddObject() failure, we will get a double
decref.

I think that we can resolve this issue with the following steps:

1. Add a new function PyModule_AddObject2(), that steals a reference
even on failure.

2. Introduce a special macro like PY_SSIZE_T_CLEAN (any suggestions
about a name?). If it is defined, define PyModule_AddObject as
PyModule_AddObject2. Define this macro before including Python.h in all
CPython modules except _json, _io, and _tkinter.

3. Make the old PyModule_AddObject emit a warning about a possible leak
and a suggestion to define the above macro.
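
The difference between the two ownership conventions described above can be sketched with a toy model. This is plain C and deliberately does not use the CPython API: Obj, Module, obj_new, obj_decref, add_steal_on_success, add_steal_always and the leak_count_* callers are all hypothetical names invented for this illustration, and the "fail" flag merely simulates an internal error. The sketch only models the reference accounting, under the assumption that a failed "steal on success" call leaves ownership with the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object; NOT a PyObject. All names are hypothetical. */
typedef struct { int refcnt; } Obj;

/* A one-slot stand-in for a module namespace. */
typedef struct { Obj *slot; } Module;

/* Number of toy objects currently alive; used to detect leaks. */
static int live_objects = 0;

static Obj *obj_new(void)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL) {
        return NULL;
    }
    o->refcnt = 1;              /* the caller owns this first reference */
    live_objects++;
    return o;
}

static void obj_decref(Obj *o)
{
    assert(o != NULL && o->refcnt > 0);
    if (--o->refcnt == 0) {
        live_objects--;
        free(o);
    }
}

/* Convention A: the reference is stolen only on success. */
static int add_steal_on_success(Module *m, Obj *o, int fail)
{
    if (fail) {
        return -1;              /* the caller still owns its reference */
    }
    m->slot = o;                /* ownership moves to the module */
    return 0;
}

/* Convention B: the reference is stolen even on failure. */
static int add_steal_always(Module *m, Obj *o, int fail)
{
    if (fail) {
        obj_decref(o);          /* the callee releases the stolen reference */
        return -1;
    }
    m->slot = o;
    return 0;
}

/* Release whatever the module owns, as if the module were destroyed. */
static void module_clear(Module *m)
{
    if (m->slot != NULL) {
        obj_decref(m->slot);
        m->slot = NULL;
    }
}

/* Each caller returns how many toy objects it leaves behind. */

/* Convention A, failure path ignored (the common stdlib pattern). */
int leak_count_naive(int fail)
{
    int before = live_objects;
    Module m = { NULL };
    Obj *o = obj_new();
    if (o == NULL) return -1;
    (void)add_steal_on_success(&m, o, fail);
    module_clear(&m);
    return live_objects - before;
}

/* Convention A, with an explicit decref on failure. */
int leak_count_careful(int fail)
{
    int before = live_objects;
    Module m = { NULL };
    Obj *o = obj_new();
    if (o == NULL) return -1;
    if (add_steal_on_success(&m, o, fail) < 0) {
        obj_decref(o);
    }
    module_clear(&m);
    return live_objects - before;
}

/* Convention B, no cleanup needed in the caller. */
int leak_count_steal_always(int fail)
{
    int before = live_objects;
    Module m = { NULL };
    Obj *o = obj_new();
    if (o == NULL) return -1;
    (void)add_steal_always(&m, o, fail);
    module_clear(&m);
    return live_objects - before;
}
```

Calling leak_count_naive(1) from a small main() should report one object left behind, while the careful and steal-always callers report none. The model also hints at the compatibility risk raised above: combining the careful caller's explicit decref with the steal-always callee would release the same reference twice.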
From berker.peksag at gmail.com Wed Apr 27 06:00:29 2016 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Wed, 27 Apr 2016 13:00:29 +0300 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: Message-ID: On Wed, Apr 27, 2016 at 10:14 AM, Serhiy Storchaka wrote: > I think that we can resolve this issue by following steps: > > 1. Add a new function PyModule_AddObject2(), that steals a reference even on > failure. +1 It would be good to document PyModule_AddObject's current behavior in 3.5+ (already attached a patch). > 2. Introduce a special macro like PY_SSIZE_T_CLEAN (any suggestions about a > name?). If it is defined, define PyModule_AddObject as PyModule_AddObject2. > Define this macro before including Python.h in all CPython modules except > _json, _io, and _tkinter. +1 > 3. Make old PyModule_AddObject to emit a warning about possible leak and a > suggestion to define above macro. +0 From hrvoje.niksic at avl.com Wed Apr 27 08:31:37 2016 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Wed, 27 Apr 2016 14:31:37 +0200 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: Message-ID: <5720B129.5010900@avl.com> On 04/27/2016 09:14 AM, Serhiy Storchaka wrote: > There are three functions (or at least three documented functions) in C > API that "steals" references: PyList_SetItem(), PyTuple_SetItem() and > PyModule_AddObject(). The first two "steals" references even on failure, > and this is well known behaviour. But PyModule_AddObject() "steals" a > reference only on success. There is nothing in the documentation that > points on this. This inconsistency has caused bugs (or, more fairly, potential leaks) before, see http://bugs.python.org/issue1782 Unfortunately, the suggested Python 3 change to PyModule_AddObject was not accepted. > 1. Add a new function PyModule_AddObject2(), that steals a reference > even on failure. 
This sounds like a good idea, except the name could be prettier :), e.g. PyModule_InsertObject. PyModule_AddObject could be deprecated. Hrvoje From ncoghlan at gmail.com Wed Apr 27 09:08:37 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 27 Apr 2016 23:08:37 +1000 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: Message-ID: On 27 April 2016 at 17:14, Serhiy Storchaka wrote: > I think that we can resolve this issue by following steps: > > 1. Add a new function PyModule_AddObject2(), that steals a reference even on > failure. I'd suggest a variant on this that more closely matches the PyList_SetItem and PyTuple_SetItem cases: PyModule_SetAttrString The first two match the signature of PySequence_SetItem, but steal the reference instead of making a new one, and the same relationship would exist between PyObject_SetAttrString and the new PyModule_SetAttrString. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Apr 27 09:10:55 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 27 Apr 2016 23:10:55 +1000 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: Message-ID: On 27 April 2016 at 23:08, Nick Coghlan wrote: > On 27 April 2016 at 17:14, Serhiy Storchaka wrote: >> I think that we can resolve this issue by following steps: >> >> 1. Add a new function PyModule_AddObject2(), that steals a reference even on >> failure. > > I'd suggest a variant on this that more closely matches the > PyList_SetItem and PyTuple_SetItem cases: PyModule_SetAttrString And for the record: that suggestion was prompted by Hrvoje's email suggesting using a more descriptive name, I just went and looked up the name of the corresponding PyObject_* API. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Wed Apr 27 13:55:47 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 27 Apr 2016 20:55:47 +0300 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: Message-ID: On 27.04.16 10:14, Serhiy Storchaka wrote: > There are three functions (or at least three documented functions) in C > API that "steals" references: PyList_SetItem(), PyTuple_SetItem() and > PyModule_AddObject(). The first two "steals" references even on failure, > and this is well known behaviour. But PyModule_AddObject() "steals" a > reference only on success. There is nothing in the documentation that > points on this. Most usages of PyModule_AddObject() in the stdlib don't > decref the reference to the value on PyModule_AddObject() failure. The > only exceptions are in _json, _io, and _tkinter modules. In many cases, > including examples in the documentation, the successfulness of > PyModule_AddObject() is not checked either, but this is different issue. > > We can just fix the documentation but adding a note that > PyModule_AddObject() doesn't steal a reference on failure. And add > explicit decrefs after PyModule_AddObject() in hundreds of places in the > code. > > But I think it would be better to "fix" PyModule_AddObject() by making > it decrefing a reference on failure as expected by most developers. But > this is dangerous change, because if the author of third-party code read > not only the documentation, but CPython code, and added explicit decref > on PyModule_AddObject() failure, we will get a double decrefing. > > I think that we can resolve this issue by following steps: > > 1. Add a new function PyModule_AddObject2(), that steals a reference > even on failure. > > 2. Introduce a special macro like PY_SSIZE_T_CLEAN (any suggestions > about a name?). If it is defined, define PyModule_AddObject as > PyModule_AddObject2. 
Define this macro before including Python.h in all > CPython modules except _json, _io, and _tkinter. > > 3. Make old PyModule_AddObject to emit a warning about possible leak and > a suggestion to define above macro. Opened an issue: http://bugs.python.org/issue26871 . Provided patch introduces new macros PY_MODULE_ADDOBJECT_CLEAN that controls the behavior of PyModule_AddObject() as PY_SSIZE_T_CLEAN controls the behavior of PyArg_Parse* functions. If the macro is defined before including "Python.h", PyModule_AddObject() steals a reference unconditionally. Otherwise it steals a reference only on success, and the caller is responsible for decref'ing it on error (current behavior). From storchaka at gmail.com Wed Apr 27 14:02:19 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 27 Apr 2016 21:02:19 +0300 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: <5720B129.5010900@avl.com> References: <5720B129.5010900@avl.com> Message-ID: On 27.04.16 15:31, Hrvoje Niksic wrote: > On 04/27/2016 09:14 AM, Serhiy Storchaka wrote: >> There are three functions (or at least three documented functions) in C >> API that "steals" references: PyList_SetItem(), PyTuple_SetItem() and >> PyModule_AddObject(). The first two "steals" references even on failure, >> and this is well known behaviour. But PyModule_AddObject() "steals" a >> reference only on success. There is nothing in the documentation that >> points on this. > > This inconsistency has caused bugs (or, more fairly, potential leaks) > before, see http://bugs.python.org/issue1782 Glad to hear I'm not the first faced with this problem. > Unfortunately, the suggested Python 3 change to PyModule_AddObject was > not accepted. Bad. May be it happened because of the risk to break third-party working code. I propose a gradual path to change PyModule_AddObject. >> 1. Add a new function PyModule_AddObject2(), that steals a reference >> even on failure. 
> > This sounds like a good idea, except the name could be prettier :), e.g. > PyModule_InsertObject. PyModule_AddObject could be deprecated. I have decided to not introduce new public function. But just control the behavior of old function with the macro. This needs minimal changes to user code. From storchaka at gmail.com Wed Apr 27 14:06:25 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 27 Apr 2016 21:06:25 +0300 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: Message-ID: On 27.04.16 16:08, Nick Coghlan wrote: > On 27 April 2016 at 17:14, Serhiy Storchaka wrote: >> I think that we can resolve this issue by following steps: >> >> 1. Add a new function PyModule_AddObject2(), that steals a reference even on >> failure. > > I'd suggest a variant on this that more closely matches the > PyList_SetItem and PyTuple_SetItem cases: PyModule_SetAttrString > > The first two match the signature of PySequence_SetItem, but steal the > reference instead of making a new one, and the same relationship would > exist between PyObject_SetAttrString and the new > PyModule_SetAttrString. I think it is better to have relation with PyModule_AddIntConstant() etc than with PyObject_SetAttrString. My patch doesn't introduce new public function, but changes the behavior of the old function. This needs minimal changes to user code that mostly use PyModule_AddObject() incorrectly (not blaming authors). From stefan at bytereef.org Wed Apr 27 16:51:29 2016 From: stefan at bytereef.org (Stefan Krah) Date: Wed, 27 Apr 2016 20:51:29 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Inconsistency_of_PyModule=5FAddObject=28?= =?utf-8?q?=29?= References: <5720B129.5010900@avl.com> Message-ID: Hrvoje Niksic avl.com> writes: > This inconsistency has caused bugs (or, more fairly, potential leaks) > before, see http://bugs.python.org/issue1782 > > Unfortunately, the suggested Python 3 change to PyModule_AddObject was > not accepted. 
First, these "leaks" only potentially show up when you already have much bigger problems (i.e. on Linux the machine would already freeze due to overallocation). Second, these "leaks" don't even show up as "definitely lost" in Valgrind (yes, I checked). On the bright side, Python must be in a very healthy state if we can afford to spend time on issues such as this one. Stefan Krah From casevh at gmail.com Wed Apr 27 18:24:39 2016 From: casevh at gmail.com (Case Van Horsen) Date: Wed, 27 Apr 2016 15:24:39 -0700 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: Message-ID: On Wed, Apr 27, 2016 at 11:06 AM, Serhiy Storchaka wrote: > I think it is better to have relation with PyModule_AddIntConstant() etc > than with PyObject_SetAttrString. > > My patch doesn't introduce new public function, but changes the behavior of > the old function. This needs minimal changes to user code that mostly use > PyModule_AddObject() incorrectly (not blaming authors). How will this impact code that uses PyModule_AddObject() correctly? From storchaka at gmail.com Thu Apr 28 04:15:35 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 28 Apr 2016 11:15:35 +0300 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: Message-ID: On 28.04.16 01:24, Case Van Horsen wrote: > On Wed, Apr 27, 2016 at 11:06 AM, Serhiy Storchaka wrote: >> I think it is better to have relation with PyModule_AddIntConstant() etc >> than with PyObject_SetAttrString. >> >> My patch doesn't introduce new public function, but changes the behavior of >> the old function. This needs minimal changes to user code that mostly use >> PyModule_AddObject() incorrectly (not blaming authors). > > How will this impact code that uses PyModule_AddObject() correctly? No impact except emitting a deprecation warning at build time. But we can remove a deprecation warning and add it in future release if this is annoying. 
But are you sure, that your code uses PyModule_AddObject() correctly? Only two modules in the stdlib (_json and _tkinter) used it correctly. Other modules have bugs even in tries to use PyModule_AddObject() correctly for some operations. From stefan at bytereef.org Thu Apr 28 04:38:19 2016 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 28 Apr 2016 08:38:19 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Inconsistency_of_PyModule=5FAddObject=28?= =?utf-8?q?=29?= References: Message-ID: Serhiy Storchaka gmail.com> writes: > No impact except emitting a deprecation warning at build time. But we > can remove a deprecation warning and add it in future release if this is > annoying. > > But are you sure, that your code uses PyModule_AddObject() correctly? > Only two modules in the stdlib (_json and _tkinter) used it correctly. > Other modules have bugs even in tries to use PyModule_AddObject() > correctly for some operations. Could you perhaps stop labeling this as a bug? Usually we are talking about a *single* "leak" that a) does not even show up in Valgrind and b) only occurs under severe memory pressure when the OOM-killer is already waiting. I'm honestly mystified by your terminology and it's beginning to feel that you need to justify this patch at all costs. Stefan Krah From stefan at bytereef.org Thu Apr 28 05:05:13 2016 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 28 Apr 2016 09:05:13 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Inconsistency_of_PyModule=5FAddObject=28?= =?utf-8?q?=29?= References: Message-ID: Serhiy Storchaka gmail.com> writes: > But are you sure, that your code uses PyModule_AddObject() correctly? > Only two modules in the stdlib (_json and _tkinter) used it correctly. > Other modules have bugs even in tries to use PyModule_AddObject() > correctly for some operations. 
For the list, this is the extent of this horrible "bug":

diff --git a/Modules/_decimal/_decimal.c b/Modules/_decimal/_decimal.c
--- a/Modules/_decimal/_decimal.c
+++ b/Modules/_decimal/_decimal.c
@@ -5804,8 +5804,7 @@
                   PyObject_CallObject((PyObject *)&PyDecContext_Type, NULL));
     init_basic_context(basic_context_template);
     Py_INCREF(basic_context_template);
-    CHECK_INT(PyModule_AddObject(m, "BasicContext",
-                                 basic_context_template));
+    CHECK_INT(-1);


$ valgrind --suppressions=Misc/valgrind-python.supp ./python -c "import decimal"

[...]
==16945== LEAK SUMMARY:
==16945==    definitely lost: 0 bytes in 0 blocks
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[...]


Stefan Krah

From random832 at fastmail.com Thu Apr 28 09:28:46 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 28 Apr 2016 09:28:46 -0400
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To:
References:
Message-ID: <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>

On Thu, Apr 28, 2016, at 05:05, Stefan Krah wrote:
> $ valgrind --suppressions=Misc/valgrind-python.supp ./python -c "import
> decimal"
>
> [...]
> ==16945== LEAK SUMMARY:
> ==16945==    definitely lost: 0 bytes in 0 blocks
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Well, the obvious flaw with your test case is that a reference is
retained forever in the C static variable basic_context_template.
Now, it is arguable that this may be a reasonably common pattern, and
that this doesn't actually constitute misuse of the API (the reference
count will be wrong, but the object itself is immortal anyway, so it
doesn't matter if it's 2 or 1 since it can't be 0 even with correct
usage).

From storchaka at gmail.com Thu Apr 28 09:55:32 2016
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 28 Apr 2016 16:55:32 +0300
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To:
References:
Message-ID:

On 28.04.16 11:38, Stefan Krah wrote:
> Serhiy Storchaka <storchaka at gmail.com> writes:
>> No impact except emitting a deprecation warning at build time. But we
>> can remove a deprecation warning and add it in future release if this is
>> annoying.
>>
>> But are you sure, that your code uses PyModule_AddObject() correctly?
>> Only two modules in the stdlib (_json and _tkinter) used it correctly.
>> Other modules have bugs even in tries to use PyModule_AddObject()
>> correctly for some operations.
>
> Could you perhaps stop labeling this as a bug? Usually we are talking
> about a *single* "leak" that a) does not even show up in Valgrind and
> b) only occurs under severe memory pressure when the OOM-killer is
> already waiting.
>
>
> I'm honestly mystified by your terminology and it's beginning to feel
> that you need to justify this patch at all costs.

I say this is a bug because:

1. PyModule_AddObject() behavior doesn't match the documentation.

2. Most code that uses PyModule_AddObject() doesn't work as intended.
Since the behavior of PyModule_AddObject() contradicts the documentation
and is counterintuitive, we can't blame the authors for this.

I don't say this is a high-impact bug; I even agree that there is no
need to fix the second part in maintained releases. But this is a bug
unless you propose a different definition of a bug.

What can we do with this?

1. Change the documentation of PyModule_AddObject().
   I think this is not in question, and Berker provided a patch in
   http://bugs.python.org/issue26868 .

2. Update examples in the documentation to correctly handle errors of
PyModule_AddObject(). This is more questionable, due to the case (3c)
below and because correct error handling code distracts attention from
the main purpose of the examples.

3. One of the alternatives:

3a) Fix almost all usages of PyModule_AddObject() in stdlib extension
modules. This is hundreds of occurrences in more than fifty files.

3b) Allow changing the behavior of PyModule_AddObject() to match most
authors' expectations. This needs only one added line to switch on the
new behavior in most files.

3c) Ignore the issue. In this case we can skip checking the result of
PyModule_AddObject() altogether. But I am afraid that correctly fixing
issues with subinterpreters will require us to return to this issue.

From stefan at bytereef.org Thu Apr 28 10:11:29 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 28 Apr 2016 14:11:29 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References: <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>
Message-ID:

Random832 <random832 at fastmail.com> writes:
> On Thu, Apr 28, 2016, at 05:05, Stefan Krah wrote:
> > $ valgrind --suppressions=Misc/valgrind-python.supp ./python -c "import
> > decimal"
> >
> > [...]
> > ==16945== LEAK SUMMARY:
> > ==16945==    definitely lost: 0 bytes in 0 blocks
> >              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Well, the obvious flaw with your test case is that a reference is
> retained forever in the C static variable basic_context_template.

For actual users of Valgrind this is patently obvious and was
pretty much the point of my post.


Stefan Krah

From nileshdate1990 at gmail.com Thu Apr 28 08:00:56 2016
From: nileshdate1990 at gmail.com (Nilesh Date)
Date: Thu, 28 Apr 2016 17:30:56 +0530
Subject: [Python-Dev] Needs to install python 3.4.4 in RHEL 6
Message-ID:

Hi team,

I want to install Python version 3.4.4 on my RHEL 6 system.
Can someone describe the installation process, or give a reference link
from which I can get the required steps and download the desired package?

Thanks,
Nilesh Date

From random832 at fastmail.com Thu Apr 28 11:17:35 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 28 Apr 2016 11:17:35 -0400
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To:
References: <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com>
Message-ID: <1461856655.3183220.592420497.7429979F@webmail.messagingengine.com>

On Thu, Apr 28, 2016, at 10:11, Stefan Krah wrote:
> For actual users of Valgrind this is patently obvious and was
> pretty much the point of my post.

A more relevant point would be that _decimal does *not* use the API in a
way *which would be broken by the proposed change*, regardless of
whether the way in which it uses it is subjectively correct or can cause
leaks.

From stefan at bytereef.org Thu Apr 28 11:26:07 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 28 Apr 2016 15:26:07 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References: <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com> <1461856655.3183220.592420497.7429979F@webmail.messagingengine.com>
Message-ID:

Random832 <random832 at fastmail.com> writes:
> A more relevant point would be that _decimal does *not* use the API in a
> way *which would be broken by the proposed change*, regardless of
> whether the way in which it uses it is subjectively correct or can cause
> leaks.

And the ultimate point is that I don't want to spend about a week per year
to evaluate the effect of needless code changes on a highly audited module.

And no, this isn't theoretical...


Stefan Krah

From stefan at bytereef.org Thu Apr 28 11:29:11 2016
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 28 Apr 2016 15:29:11 +0000 (UTC)
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
References:
Message-ID:

Serhiy Storchaka <storchaka at gmail.com> writes:
> 2. Most code that use PyModule_AddObject() doesn't work as intended.
> Since the behavior of PyModule_AddObject() contradicts the documentation
> and is counterintuitive, we can't blame authors in this.
>
> I don't say this is a high-impacting bug, I even agree that there is no
> need to fix the second part in maintained releases. But this is a bug
> unless you propose different definition for a bug.

Why do you think that module authors don't know that? For _decimal, I was
aware of the strange behavior. Yes, a single reference can "leak" on failure.

The problem is that we don't seem to have any common ground here. Do you
accept the following?

  1) PyModule_AddObject() can only fail if malloc() fails.

     a) Normally (for small allocations) this is such a serious problem
        that the whole application fails anyway.

     b) Say that you're lucky and the application continues.

          i) The import fails. In some cases ImportError is caught and
             a fallback is imported (example _pydecimal). In that case
             you leak an entire DSO and something small like a single
             context object. What is the practical difference between
             the two?

         ii) The import fails and there's no fallback. Usually the
             application stops, otherwise DSO+small leak again.

        iii) Retry the import (I have never seen this):

                 while(1):
                     try:
                         import leftpad
                     except (ImportError, MemoryError):
                         continue
                     break

             You could have a legitimate leak here, but see a).


Module initializations are intricate and boring.
I suspect that if we promote wide changes across PyPI packages we'll see more additional segfaults than theoretically plugged memory leaks. Stefan Krah From zachary.ware+pydev at gmail.com Thu Apr 28 11:38:22 2016 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Thu, 28 Apr 2016 10:38:22 -0500 Subject: [Python-Dev] Needs to install python 3.4.4 in RHEL 6 In-Reply-To: References: Message-ID: Hi Nilesh, On Thu, Apr 28, 2016 at 7:00 AM, Nilesh Date wrote: > Hi team, > > I wanted to install python version 3.4.4 in my RHEL 6 system. > Can someone give installation process or any reference link from which I can > get required steps and download desire package. You have a couple of options. Option 1: use software collections [1]. As I vaguely understand it (having never used this myself), the rh-python34 package is supported by Red Hat, and is like any other package for the most part. Looking at that page it does look a bit more complex than option 2 to me, but I've built and installed Python several times over the past few years :) Option 2: compile and install yourself. At a minimum, you'll need a c compiler (gcc, icc, or clang are recommended), and development headers for any extension modules that you require (I'd recommend openssl-devel and readline-devel at the least). Then download the source [2], extract it, and run `cd Python-3.4.4 && ./configure && make profile-opt && make test && sudo make install`. That series of commands will give you python installed in `/usr/local/` that has been compiled with profile-guided optimization (PGO) and has passed the full Python test suite. If any but the last step fails, nothing will have changed on your system. 
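
After an Option 2 build, one quick sanity check is to confirm that the optional extension modules were actually built, since a module whose development headers were missing at build time is generally skipped rather than failing the whole build. A minimal sketch, assuming it is run with the newly installed interpreter; the helper name `missing_modules` is made up for this illustration:

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # ssl and readline correspond to the openssl-devel and readline-devel
    # headers recommended above; extend the list as needed.
    print(missing_modules(["ssl", "readline"]) or "all optional modules found")
```

If a name is reported as missing, installing the matching -devel package and rebuilding is the usual next step.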

[1] https://www.softwarecollections.org/en/scls/rhscl/rh-python34/
[2] https://www.python.org/downloads/source/

Hope this helps,
--
Zach

From gvanrossum at gmail.com Thu Apr 28 12:30:21 2016
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Thu, 28 Apr 2016 09:30:21 -0700
Subject: [Python-Dev] Inconsistency of PyModule_AddObject()
In-Reply-To:
References: <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com> <1461856655.3183220.592420497.7429979F@webmail.messagingengine.com>
Message-ID:

Stefan, could you explain which module you are talking about and why it
would cost you a week? What is your responsibility here?

--Guido (mobile)
On Apr 28, 2016 8:28 AM, "Stefan Krah" wrote:

> Random832 <random832 at fastmail.com> writes:
> > A more relevant point would be that _decimal does *not* use the API in a
> > way *which would be broken by the proposed change*, regardless of
> > whether the way in which it uses it is subjectively correct or can cause
> > leaks.
>
> And the ultimate point is that I don't want to spend about a week per year
> to evaluate the effect of needless code changes on a highly audited module.
>
> And no, this isn't theoretical...
>
>
> Stefan Krah
URL: From ethan at stoneleaf.us Thu Apr 28 12:56:36 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 28 Apr 2016 09:56:36 -0700 Subject: [Python-Dev] Inconsistency of PyModule_AddObject() In-Reply-To: References: <1461850126.3154397.592289905.70140D82@webmail.messagingengine.com> <1461856655.3183220.592420497.7429979F@webmail.messagingengine.com> Message-ID: <572240C4.9010107@stoneleaf.us> On 04/28/2016 08:26 AM, Stefan Krah wrote: > Random832 writes: >> A more relevant point would be that _decimal does *not* use the API in a >> way *which would be broken by the proposed change*, regardless of >> whether the way in which it uses it is subjectively correct or can cause >> leaks. > > And the ultimate point is that I don't want to spend about a week per year > to evaluate the effect of needless code changes on a highly audited module. > > And no, this isn't theoretical... Considering you have to opt-in to the change, why would this be a big deal for you? Or are you saying you'd rather have the PyModule_AddObject deprecated (without removal?), and a new PyWhatever_Whatever to take it's place? -- ~Ethan~ From brett at python.org Thu Apr 28 15:07:48 2016 From: brett at python.org (Brett Cannon) Date: Thu, 28 Apr 2016 19:07:48 +0000 Subject: [Python-Dev] Anyone want to lead the sprints at PyCon US 2016? In-Reply-To: References: Message-ID: No one stepped forward to lead the sprints this year, so I will put myself as the sprint leader and lean on everyone else who appears to help. :) On Tue, 5 Apr 2016 at 09:36 Brett Cannon wrote: > The call has started to go out for sprint groups to list themselves > online. Anyone want to specifically lead the core sprint this year? 
> If no one specifically does then I will sign us up and do my usual thing of
> pointing people at the devguide and encouraging people to ask questions but
> not do a lot of hand-holding (I'm expecting to be busy either working on
> GitHub migration stuff or doing other things that I have been neglecting
> due to my GitHub migration work).
>
> ---------- Forwarded message ---------
> From: Ewa Jodlowska
> Date: Mon, 4 Apr 2016 at 07:14
> Subject: [PSF-Community] Sprinting at PyCon US 2016
> To:
>
>
> Are you coming to PyCon US? Have you thought about sprinting?
>
> The coding Sprints are the hidden gem of PyCon, up to 4 days (June 2-5) of
> coding with many Python projects and their maintainers. And if you're
> coming to PyCon, taking part in the Sprints is easy!
>
> You don't need to change your registration* to join the Sprints. There's
> no additional registration fee, and you even get lunch. You do need to
> cover the additional lodging and other meals, but that's it. If you've
> booked a room through the PyCon registration system, you'll need to contact
> the registration team at pycon2016 at cteusa.com as soon as possible to
> request the extra nights. The sprinting itself (along with lunch every day)
> is free, so your only expenses are your room and other meals.
>
> If you're interested in what projects will be sprinting, just keep an eye
> on the sprints page on the PyCon web site at
> https://us.pycon.org/2016/community/sprints/ Be sure to check back, as
> groups are being added all the time.
>
> If you haven't sprinted before, or if you just need to brush up on
> sprinting tools and techniques, there will again be an 'Intro to Sprinting'
> session the evening of June 1, led by Shauna Gordon-McKeon and other
> members of the Python community. To grab a free ticket for this session, just
> visit
> https://www.eventbrite.com/e/introduction-to-open-source-the-pycon-sprints-tickets-22435151141
> .
> > *Please note that conference registration is sold out, but you do not need > a conference registration to come to the Sprints. > > _______________________________________________ > PSF-Community mailing list > PSF-Community at python.org > https://mail.python.org/mailman/listinfo/psf-community > -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Apr 28 21:35:57 2016 From: larry at hastings.org (Larry Hastings) Date: Thu, 28 Apr 2016 18:35:57 -0700 Subject: [Python-Dev] Release schedule for Python 3.5.2 Message-ID: <5722BA7D.2020008@hastings.org> I've been holding off on the hope that one or two bugs would get fixes. But those seem to have stalled. So I think it's time that we pushed out a 3.5.2. Maybe announcing a schedule will light a fire under some rumps. I put "Spring 2016" as the release date for 3.5.2 on the 3.5 release schedule PEP. Officially, spring ends--and summer begins--Tuesday June 21 at 12:24am EDT. However on the off chance that the PyCon sprints are productive, I want to hold off until those are done, and maybe give it a couple extra days for the dust to settle. Last sprint day is Sunday June 5th. So, bottom line, the RC will happen during spring, but the final release will technically be during summer. 3.5.2 RC 1 - tag Sat June 11, release Sun June 12 3.5.2 Final - tag Sat June 25, release Sun June 26 Any problems with that? Speak up now. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Apr 29 04:37:58 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 29 Apr 2016 18:37:58 +1000 Subject: [Python-Dev] Needs to install python 3.4.4 in RHEL 6 In-Reply-To: References: Message-ID: On 29 April 2016 at 01:38, Zachary Ware wrote: > Hi Nilesh, > > On Thu, Apr 28, 2016 at 7:00 AM, Nilesh Date wrote: >> Hi team, >> >> I wanted to install python version 3.4.4 in my RHEL 6 system. 
>> Can someone give installation process or any reference link from which I can
>> get required steps and download desire package.
>
> You have a couple of options.
>
> Option 1: use software collections [1]. As I vaguely understand it
> (having never used this myself), the rh-python34 package is supported
> by Red Hat, and is like any other package for the most part. Looking
> at that page it does look a bit more complex than option 2 to me, but
> I've built and installed Python several times over the past few years
> :)

Note that the versions hosted on softwarecollections.org are provided
by the SCLo CentOS SIG. For the commercially supported versions, most
RHEL subscriptions include access to the relevant channels:
https://access.redhat.com/solutions/472793

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From mdione at grulic.org.ar Fri Apr 29 10:45:04 2016
From: mdione at grulic.org.ar (Marcos Dione)
Date: Fri, 29 Apr 2016 16:45:04 +0200
Subject: [Python-Dev] Convert int() to size_t in Python/C
Message-ID: <20160429144504.GA17754@diablo.grulicueva.local>

First of all, I'm not subscribed to the list (too much traffic for
me), so please CC: me in any answers if possible.

I'm trying to add a new syscall to the os module:

https://bugs.python.org/issue26826

One of the few missing parts is to convert a parameter, which would be
a Python int object, using PyArg_ParseTupleAndKeywords() to a size_t
variable. For something similar, the 'n' format exists, but that one
converts to Py_ssize_t (which is ssize_t, really), which is signed.

One possible solution that was suggested to me in the #python IRC
channel was to use that, then test if the resulting value is negative,
and adjust accordingly, but I wonder if there is a cleaner, more general
solution (for instance, what if the type was something else, like loff_t,
although for that one in particular there *is* a conversion
function/macro).
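
One candidate for a cleaner conversion is the existing C-API function PyLong_AsSize_t(), which raises OverflowError for negative or too-large values. The sketch below only demonstrates that range checking from Python by calling the function through ctypes; in an actual patch it would be called from C, for example on an object parsed with the 'O' format (that arrangement is an assumption, not something taken from the issue). The wrapper name `to_size_t` is made up for this illustration, and the snippet assumes CPython with ctypes available.

```python
import ctypes

# Largest value a C size_t can hold on this platform.
SIZE_MAX = ctypes.c_size_t(-1).value

# PyLong_AsSize_t is part of the CPython C API. Functions reached through
# ctypes.pythonapi re-raise any Python exception that the call sets.
_as_size_t = ctypes.pythonapi.PyLong_AsSize_t
_as_size_t.argtypes = [ctypes.py_object]
_as_size_t.restype = ctypes.c_size_t


def to_size_t(value):
    """Convert a Python int to a size_t value, or raise OverflowError."""
    return _as_size_t(value)
```

Calling `to_size_t(-1)` or `to_size_t(SIZE_MAX + 1)` raises OverflowError, so no manual sign test is needed after the conversion.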

--
(Not so) Random fortune:
Premature optimization is the root of all evil.
		-- Donald Knuth

From ruizriverafelipejavier at yahoo.com.mx Fri Apr 29 04:04:08 2016
From: ruizriverafelipejavier at yahoo.com.mx (ruizriverafelipejavier at yahoo.com.mx)
Date: Fri, 29 Apr 2016 08:04:08 +0000 (UTC)
Subject: [Python-Dev] Problems with modules
References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com>

Hello,

I am trying to connect to Twitter to receive tweets. Some code that I
downloaded from the internet tells me that I must install tweepy and
matplotlib. I did so, but I keep getting the message that they are not
installed.

tweepy reports no problems: I invoke it on the command line and everything
is fine. The same goes for matplotlib, which requires several dependencies
(dateutils, numpy, tornado, etc.; I have already installed them) before it
can be installed. But in the Python editor, when I run the code, the
following message appears:

ImportError: No module named 'matplotlib'

Any ideas?

Felipe

From random832 at fastmail.com Fri Apr 29 11:26:31 2016
From: random832 at fastmail.com (Random832)
Date: Fri, 29 Apr 2016 11:26:31 -0400
Subject: [Python-Dev] Convert int() to size_t in Python/C
In-Reply-To: <20160429144504.GA17754@diablo.grulicueva.local>
References: <20160429144504.GA17754@diablo.grulicueva.local>
Message-ID: <1461943591.3516157.593502305.734D3C1F@webmail.messagingengine.com>

On Fri, Apr 29, 2016, at 10:45, Marcos Dione wrote:
> One possible solution that was suggested to me in the #python IRC
> channel was to use that, then test if the resulting value is negative,
> and adjust accordingly, but I wonder if there is a cleaner, more general
> solution (for instance, what if the type was something else, like loff_t,
> although for that one in particular there *is* a conversion
> function/macro).

In principle, you could just use PyLong_AsUnsignedLong (or LongLong),
and raise OverflowError manually if the value happens to be out of
size_t's range. (I'm 99% sure that on every Linux platform unsigned long
is the same size as size_t.)

But it's not like it'd be the first function in os to call a system call
that takes a size_t. read() just uses Py_ssize_t. write() uses the buffer
protocol, which uses Py_ssize_t. How concerned are you really about the
lost range here? What does the system call return (its return type is
ssize_t) if it writes more than SSIZE_MAX bytes? (This shouldn't be hard
to test: just try copying a >2GB file on a 32-bit system.)

I'm more curious about what your calling convention is going to be for
off_in and off_out. I can't think of any other interfaces that have
optional output parameters. Python functions generally deal with output
parameters in the underlying C function (there are a few examples in
math) by returning a tuple. Maybe return a tuple (returned value,
off_in, off_out), where None corresponds to the input parameter having
been NULL (and passing None in makes it use NULL)?

From status at bugs.python.org Fri Apr 29 12:08:40 2016
From: status at bugs.python.org (Python tracker)
Date: Fri, 29 Apr 2016 18:08:40 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20160429160840.62E8B5688D@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2016-04-22 - 2016-04-29)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.
Issues counts and deltas: open 5475 (-16) closed 33167 (+72) total 38642 (+56) Open issues with patches: 2380 Issues opened (40) ================== #26348: activate.fish sets VENV prompt incorrectly http://bugs.python.org/issue26348 reopened by brett.cannon #26830: Refactor Tools/scripts/google.py http://bugs.python.org/issue26830 opened by franciscouzo #26832: ProactorEventLoop doesn't support stdin/stdout nor files with http://bugs.python.org/issue26832 opened by Gabriel Mesquita Cangussu #26833: returning ctypes._SimpleCData objects from callbacks http://bugs.python.org/issue26833 opened by tilsche #26834: Add truncated SHA512/224 and SHA512/256 http://bugs.python.org/issue26834 opened by christian.heimes #26835: Add file-sealing ops to fcntl http://bugs.python.org/issue26835 opened by christian.heimes #26836: Add memfd_create to os module http://bugs.python.org/issue26836 opened by christian.heimes #26839: Python 3.5 running in a virtual machine with Linux kernel 3.17 http://bugs.python.org/issue26839 opened by doko #26844: Wrong error message during import http://bugs.python.org/issue26844 opened by lev.maximov #26845: Misleading variable name in exception handling http://bugs.python.org/issue26845 opened by Valentin.Lorentz #26848: asyncio.subprocess's communicate() method mishandles empty inp http://bugs.python.org/issue26848 opened by oconnor663 #26849: android does not support versioning in SONAME http://bugs.python.org/issue26849 opened by xdegaye #26850: PyMem_RawMalloc(): update also sys.getallocatedblocks() in deb http://bugs.python.org/issue26850 opened by haypo #26851: android compilation and link flags http://bugs.python.org/issue26851 opened by xdegaye #26852: add a COMPILEALL_FLAGS Makefile variable http://bugs.python.org/issue26852 opened by xdegaye #26855: add platform.android_ver() for android http://bugs.python.org/issue26855 opened by xdegaye #26856: android does not have pwd.getpwall() http://bugs.python.org/issue26856 opened by xdegaye 
#26858: setting SO_REUSEPORT fails on android http://bugs.python.org/issue26858 opened by xdegaye #26859: unittest fails with "Start directory is not importable" http://bugs.python.org/issue26859 opened by xdegaye #26860: os.walk and os.fwalk yield namedtuple instead of tuple http://bugs.python.org/issue26860 opened by palaviv #26861: shutil.copyfile() doesn't close the opened files http://bugs.python.org/issue26861 opened by vocdetnojz #26862: SYS_getdents64 does not need to be defined on android API 21 http://bugs.python.org/issue26862 opened by xdegaye #26864: urllib.request no_proxy check differs from curl http://bugs.python.org/issue26864 opened by Daniel Morrison #26865: Meta-issue: support of the android platform http://bugs.python.org/issue26865 opened by xdegaye #26866: Inconsistent environment in Windows using "Open With" http://bugs.python.org/issue26866 opened by busfault #26867: test_ssl test_options fails on ubuntu 16.04 http://bugs.python.org/issue26867 opened by xiang.zhang #26868: Document PyModule_AddObject's behavior on error http://bugs.python.org/issue26868 opened by berker.peksag #26869: unittest longMessage docs http://bugs.python.org/issue26869 opened by guettli #26870: Unexpected call to readline's add_history in call_readline http://bugs.python.org/issue26870 opened by tylercrompton #26871: Change weird behavior of PyModule_AddObject() http://bugs.python.org/issue26871 opened by serhiy.storchaka #26872: Default ConfigParser in python is not able to load values habi http://bugs.python.org/issue26872 opened by sorin #26873: xmlrpclib raises when trying to convert an int to string when http://bugs.python.org/issue26873 opened by Nathan Williams #26876: Extend MSVCCompiler class to respect environment variables http://bugs.python.org/issue26876 opened by rohitjamuar #26877: tarfile use wrong code when read from fileobj http://bugs.python.org/issue26877 opened by mmarkk #26878: Allow doctest to deep copy globals 
http://bugs.python.org/issue26878 opened by DqASe #26881: modulefinder should reuse the dis module http://bugs.python.org/issue26881 opened by haypo #26882: The Python process stops responding immediately after starting http://bugs.python.org/issue26882 opened by ?????????????????? ???????????????????? #26883: input() call blocks multiprocessing http://bugs.python.org/issue26883 opened by the #26884: cross-compilation of extension module links to the wrong pytho http://bugs.python.org/issue26884 opened by xdegaye #26885: Add parsing support for more types in xmlrpc http://bugs.python.org/issue26885 opened by serhiy.storchaka Most recent 15 issues with no replies (15) ========================================== #26885: Add parsing support for more types in xmlrpc http://bugs.python.org/issue26885 #26884: cross-compilation of extension module links to the wrong pytho http://bugs.python.org/issue26884 #26883: input() call blocks multiprocessing http://bugs.python.org/issue26883 #26858: setting SO_REUSEPORT fails on android http://bugs.python.org/issue26858 #26856: android does not have pwd.getpwall() http://bugs.python.org/issue26856 #26852: add a COMPILEALL_FLAGS Makefile variable http://bugs.python.org/issue26852 #26851: android compilation and link flags http://bugs.python.org/issue26851 #26845: Misleading variable name in exception handling http://bugs.python.org/issue26845 #26836: Add memfd_create to os module http://bugs.python.org/issue26836 #26835: Add file-sealing ops to fcntl http://bugs.python.org/issue26835 #26834: Add truncated SHA512/224 and SHA512/256 http://bugs.python.org/issue26834 #26833: returning ctypes._SimpleCData objects from callbacks http://bugs.python.org/issue26833 #26829: update docs: when creating classes a new dict is created for t http://bugs.python.org/issue26829 #26819: _ProactorReadPipeTransport pause_reading()/resume_reading() br http://bugs.python.org/issue26819 #26818: trace CLI doesn't respect -s option 
http://bugs.python.org/issue26818 Most recent 15 issues waiting for review (15) ============================================= #26885: Add parsing support for more types in xmlrpc http://bugs.python.org/issue26885 #26884: cross-compilation of extension module links to the wrong pytho http://bugs.python.org/issue26884 #26881: modulefinder should reuse the dis module http://bugs.python.org/issue26881 #26876: Extend MSVCCompiler class to respect environment variables http://bugs.python.org/issue26876 #26873: xmlrpclib raises when trying to convert an int to string when http://bugs.python.org/issue26873 #26871: Change weird behavior of PyModule_AddObject() http://bugs.python.org/issue26871 #26868: Document PyModule_AddObject's behavior on error http://bugs.python.org/issue26868 #26864: urllib.request no_proxy check differs from curl http://bugs.python.org/issue26864 #26862: SYS_getdents64 does not need to be defined on android API 21 http://bugs.python.org/issue26862 #26860: os.walk and os.fwalk yield namedtuple instead of tuple http://bugs.python.org/issue26860 #26859: unittest fails with "Start directory is not importable" http://bugs.python.org/issue26859 #26858: setting SO_REUSEPORT fails on android http://bugs.python.org/issue26858 #26856: android does not have pwd.getpwall() http://bugs.python.org/issue26856 #26855: add platform.android_ver() for android http://bugs.python.org/issue26855 #26852: add a COMPILEALL_FLAGS Makefile variable http://bugs.python.org/issue26852 Top 10 most discussed issues (10) ================================= #26826: Expose new copy_file_range() syscal in os module. 
http://bugs.python.org/issue26826 20 msgs #22234: urllib.parse.urlparse accepts any falsy value as an url http://bugs.python.org/issue22234 12 msgs #26839: Python 3.5 running in a virtual machine with Linux kernel 3.17 http://bugs.python.org/issue26839 11 msgs #26864: urllib.request no_proxy check differs from curl http://bugs.python.org/issue26864 11 msgs #19251: bitwise ops for bytes of equal length http://bugs.python.org/issue19251 8 msgs #26800: Don't accept bytearray as filenames part 2 http://bugs.python.org/issue26800 8 msgs #26439: ctypes.util.find_library fails when ldconfig/glibc not availab http://bugs.python.org/issue26439 7 msgs #19317: ctypes.util.find_library should examine binary's RPATH on Sola http://bugs.python.org/issue19317 6 msgs #26039: More flexibility in zipfile write interface http://bugs.python.org/issue26039 6 msgs #26348: activate.fish sets VENV prompt incorrectly http://bugs.python.org/issue26348 6 msgs Issues closed (64) ================== #7504: Same name cookies http://bugs.python.org/issue7504 closed by berker.peksag #9321: CGIHTTPServer cleanup htbin http://bugs.python.org/issue9321 closed by berker.peksag #12305: Building PEPs doesn't work on Python 3 http://bugs.python.org/issue12305 closed by berker.peksag #12640: test_ctypes seg fault (test_callback_register_double); armv7; http://bugs.python.org/issue12640 closed by berker.peksag #14713: PEP 414 installation hook fails with an AssertionError http://bugs.python.org/issue14713 closed by berker.peksag #16394: Reducing tee() memory footprint http://bugs.python.org/issue16394 closed by rhettinger #18353: PyUnicode_WRITE_CHAR macro definition missing http://bugs.python.org/issue18353 closed by berker.peksag #18551: child_exec() doesn't check return value of fcntl() http://bugs.python.org/issue18551 closed by berker.peksag #18572: Remove redundant note about surrogates in string escape doc http://bugs.python.org/issue18572 closed by berker.peksag #19731: Fix copyright footer 
http://bugs.python.org/issue19731 closed by berker.peksag #20077: Format of TypeError differs between comparison and arithmetic http://bugs.python.org/issue20077 closed by berker.peksag #20112: The documentation for http.server error_message_format is inad http://bugs.python.org/issue20112 closed by berker.peksag #20247: Condition._is_owned is wrong http://bugs.python.org/issue20247 closed by berker.peksag #20305: Android's incomplete locale.h implementation prevents cross-co http://bugs.python.org/issue20305 closed by skrah #20306: Lack of pw_gecos field in Android's struct passwd causes cross http://bugs.python.org/issue20306 closed by skrah #20447: doctest.debug_script: insecure use of /tmp http://bugs.python.org/issue20447 closed by berker.peksag #20453: json.load() error message changed in 3.4 http://bugs.python.org/issue20453 closed by berker.peksag #20598: argparse docs: '7'.split() is confusing magic http://bugs.python.org/issue20598 closed by martin.panter #21382: Signal module doesnt raises ValueError Exception http://bugs.python.org/issue21382 closed by berker.peksag #22477: GCD in Fractions http://bugs.python.org/issue22477 closed by serhiy.storchaka #23277: Cleanup unused and duplicate imports in tests http://bugs.python.org/issue23277 closed by berker.peksag #23662: Cookie.domain is undocumented http://bugs.python.org/issue23662 closed by berker.peksag #23806: documentation for no_proxy is missing from the python3 urllib http://bugs.python.org/issue23806 closed by orsenthil #23961: IDLE autocomplete window does not automatically close when sel http://bugs.python.org/issue23961 closed by berker.peksag #23986: Inaccuracy about "in" keyword for list and tuple http://bugs.python.org/issue23986 closed by rhettinger #24296: Queue documentation note needed http://bugs.python.org/issue24296 closed by rhettinger #24331: *** Error in `/usr/bin/python': double free or corruption (!pr http://bugs.python.org/issue24331 closed by berker.peksag #24715: Sorting HOW 
TO: bad example for reverse sort stability http://bugs.python.org/issue24715 closed by rhettinger #24902: http.server: on startup, show host/port as URL http://bugs.python.org/issue24902 closed by berker.peksag #24911: Context manager of socket.socket is not documented http://bugs.python.org/issue24911 closed by martin.panter #25243: decouple string-to-boolean logic from ConfigParser.getboolean http://bugs.python.org/issue25243 closed by rhettinger #25420: "import random" blocks on entropy collection on Linux with low http://bugs.python.org/issue25420 closed by haypo #25551: Event's test_reset_internal_locks too fragile http://bugs.python.org/issue25551 closed by berker.peksag #25788: fileinput.hook_encoded has no way to pass arguments to codecs http://bugs.python.org/issue25788 closed by serhiy.storchaka #25981: Intern namedtuple field names http://bugs.python.org/issue25981 closed by serhiy.storchaka #26041: Update deprecation messages of platform.dist() and platform.li http://bugs.python.org/issue26041 closed by berker.peksag #26089: Duplicated keyword in distutils metadata http://bugs.python.org/issue26089 closed by berker.peksag #26249: Change PyMem_Malloc to use pymalloc allocator http://bugs.python.org/issue26249 closed by haypo #26322: Missing docs for typing.Set http://bugs.python.org/issue26322 closed by berker.peksag #26634: recursive_repr forgets to override __qualname__ of wrapper http://bugs.python.org/issue26634 closed by serhiy.storchaka #26672: regrtest missing in the module name http://bugs.python.org/issue26672 closed by berker.peksag #26733: staticmethod and classmethod are ignored when disassemble clas http://bugs.python.org/issue26733 closed by serhiy.storchaka #26804: Prioritize lowercase proxy variables in urllib.request http://bugs.python.org/issue26804 closed by orsenthil #26822: itemgetter/attrgetter/methodcaller objects ignore keyword argu http://bugs.python.org/issue26822 closed by serhiy.storchaka #26824: Make some macros use Py_TYPE 
http://bugs.python.org/issue26824 closed by serhiy.storchaka #26827: PyObject *PyInit_myextention -> PyMODINIT_FUNC PyInit_myextent http://bugs.python.org/issue26827 closed by python-dev #26831: ConfigParser parsing failures with default_section and Extende http://bugs.python.org/issue26831 closed by SilentGhost #26837: assertSequenceEqual() raises BytesWarning when format message http://bugs.python.org/issue26837 closed by serhiy.storchaka #26838: sax.xmlreader.InputSource.setCharacterStream() does not work? http://bugs.python.org/issue26838 closed by sourcejedi #26840: Hidden test in test_heapq http://bugs.python.org/issue26840 closed by berker.peksag #26841: Hidden test in ctypes tests http://bugs.python.org/issue26841 closed by berker.peksag #26842: Python Tutorial 4.7.1: Need to explain default parameter lifet http://bugs.python.org/issue26842 closed by rhettinger #26843: tokenize does not include Other_ID_Start or Other_ID_Continue http://bugs.python.org/issue26843 closed by serhiy.storchaka #26846: Workaround for non-standard stdlib.h on Android http://bugs.python.org/issue26846 closed by skrah #26847: filter docs unclear wording http://bugs.python.org/issue26847 closed by georg.brandl #26853: missing symbols in curses and readline modules on android http://bugs.python.org/issue26853 closed by xdegaye #26854: missing header on android for the ossaudiodev module http://bugs.python.org/issue26854 closed by skrah #26857: gethostbyname_r() is broken on android http://bugs.python.org/issue26857 closed by skrah #26863: android lacks some declarations for the posix module http://bugs.python.org/issue26863 closed by skrah #26874: Docstring error in divmod function http://bugs.python.org/issue26874 closed by python-dev #26875: mmap doc gives wrong code example http://bugs.python.org/issue26875 closed by python-dev #26879: Spam http://bugs.python.org/issue26879 closed by ethan.furman #26880: Remove redundant checks from set.__init__ http://bugs.python.org/issue26880 
closed by serhiy.storchaka #1145257: shutil.copystat() may fail... http://bugs.python.org/issue1145257 closed by berker.peksag From facundobatista at gmail.com Fri Apr 29 12:37:20 2016 From: facundobatista at gmail.com (Facundo Batista) Date: Fri, 29 Apr 2016 13:37:20 -0300 Subject: [Python-Dev] Problemas con modulos In-Reply-To: <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com> References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com> <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com> Message-ID: Just to mention that I already answered this (in Spanish, in private), redirecting to the proper lists. Regards, 2016-04-29 5:04 GMT-03:00 Felipe Ruiz via Python-Dev : > Hello, > > I am trying to connect to Twitter to receive tweets. Some code that I > downloaded from the internet tells me that I have to install tweepy and > matplotlib; I do so, but I keep getting the message that they are not > installed. tweepy reports no problems: I invoke it on the command line and > all is well. Likewise matplotlib, which requires several dependencies > (dateutils, numpy, tornado, etc.; I have already installed them) before it > can be installed, but in the Python editor, when I run the code, I get the > following message: > > ImportError: No module named 'matplotlib' > > Any ideas? > > Felipe > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/facundobatista%40gmail.com > -- .
Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ Twitter: @facundobatista From guido at python.org Fri Apr 29 12:52:03 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 29 Apr 2016 09:52:03 -0700 Subject: [Python-Dev] Problemas con modulos In-Reply-To: References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com> <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com> Message-ID: Thank you Facundo, and thanks for following up here! (I wonder if it wouldn't have been just as efficient if you had just BCC'ed the list to your original response? Or perhaps with a brief English note at the top?) 2016-04-29 9:37 GMT-07:00 Facundo Batista : > Just to mention that I already answered this (in Spanish, in private), > redirecting to proper lists. > > Regards, > > 2016-04-29 5:04 GMT-03:00 Felipe Ruiz via Python-Dev <python-dev at python.org>: > > Hello, > > > > I am trying to connect to Twitter to receive tweets. Some code that I > > downloaded from the internet tells me that I have to install tweepy and > > matplotlib; I do so, but I keep getting the message that they are not > > installed. tweepy reports no problems: I invoke it on the command line and > > all is well. Likewise matplotlib, which requires several dependencies > > (dateutils, numpy, tornado, etc.; I have already installed them) before it > > can be installed, but in the Python editor, when I run the code, I get the > > following message: > > > > ImportError: No module named 'matplotlib' > > > > Any ideas? > > > > Felipe > > > > -- > .
Facundo > > Blog: http://www.taniquetil.com.ar/plog/ > PyAr: http://www.python.org/ar/ > Twitter: @facundobatista > -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Fri Apr 29 13:22:07 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 29 Apr 2016 13:22:07 -0400 Subject: [Python-Dev] Convert int() to size_t in Python/C In-Reply-To: <20160429144504.GA17754@diablo.grulicueva.local> References: <20160429144504.GA17754@diablo.grulicueva.local> Message-ID: <92907886-f0bb-4216-b7a1-91774828706b@udel.edu> On 4/29/2016 10:45 AM, Marcos Dione wrote: > > First of all, I'm not subscribed to the list (too much traffic for > me), so please CC: me in any answers if possible. I am indulging you this once, but the proper solution is to read pydev via the gmane.comp.python.devel mirror at news.gmane.com. You can do so either with a newsreader (part of most mail clients) subscribed to the group, or with a browser pointed at the site. There are multiple problems with CC:. First, the paragraph above may be (properly) snipped from replies, so you will not get replies to replies. Second, 'Reply all' is a nuisance as it takes 'all' too literally. Since I receive via gmane, Thunderbird tries to reply to both gmane and mail.python.org, but the latter is invalid and generates a nuisance email as I am not subscribed. If I were subscribed, sending and posting this twice would also be wrong. Third, and related, CC lists tend to grow. If someone hits 'Reply all' to this message, I will be added to the list, and will receive a nuisance duplicate email, unless the person takes the trouble to remove me. (They often do not.)
-- Terry Jan Reedy From mdione at grulic.org.ar Fri Apr 29 14:11:15 2016 From: mdione at grulic.org.ar (Marcos Dione) Date: Fri, 29 Apr 2016 20:11:15 +0200 Subject: [Python-Dev] Convert int() to size_t in Python/C In-Reply-To: <1461946726.3529591.593566721.7CF3D87D@webmail.messagingengine.com> Message-ID: <20160429181115.GA19359@diablo.grulicueva.local> On Fri, Apr 29, 2016 at 12:18:46PM -0400, Random832 wrote: > On Fri, Apr 29, 2016, at 10:45, Marcos Dione wrote: > > One possible solution that was suggested to me in the #python IRC > > channel was to use that, then test if the resulting value is negative, > > and adjust accordingly, but I wonder if there is a cleaner, more general > > solution (for instance, what if the type was something else, like loff_t, > > although for that one in particular there *is* a conversion > > function/macro). > > In principle, you could just use PyLong_AsUnsignedLong (or LongLong), > and raise OverflowError manually if the value happens to be out of > size_t's range. (99% sure that on every Linux platform unsigned long is > the same size as size_t.) > > But it's not like it'd be the first function in os to call a system call > that takes a size_t. Read just uses Py_ssize_t. Write uses the buffer > protocol, which uses Py_ssize_t. How concerned are you really about the > lost range here? What does the system call return (its return type is > ssize_t) if it writes more than SSIZE_MAX bytes? (This shouldn't be hard > to test, just try copying a >2GB file on a 32-bit system) It's a very good point, but I don't have any 32-bit systems around with a 4.5 kernel. I'll try to figure it out and/or ask on the kernel ML. > I'm more curious about what your calling convention is going to be for > off_in and off_out. I can't think of any other interfaces that have > optional output parameters. Python functions generally deal with output > parameters in the underlying C function (there are a few examples in > math) by returning a tuple.
These are not output parameters, even if they're pointers. They're using the NULL pointer to signal that the current offsets should not be touched, to differentiate it from an offset of 0. In Python we would use None for that. From random832 at fastmail.com Fri Apr 29 14:25:31 2016 From: random832 at fastmail.com (Random832) Date: Fri, 29 Apr 2016 14:25:31 -0400 Subject: [Python-Dev] Convert int() to size_t in Python/C In-Reply-To: <20160429181115.GA19359@diablo.grulicueva.local> References: <20160429181115.GA19359@diablo.grulicueva.local> Message-ID: <1461954331.3559602.593676209.6D0351F5@webmail.messagingengine.com> On Fri, Apr 29, 2016, at 14:11, Marcos Dione wrote: > These are not output parameters, even if they're pointers. They're > using the NULL pointer to signal that the current offsets should not be > touched, to differentiate it from an offset of 0. In Python we would use > None for that. That's not actually true according to the documentation. (And if it were, they could simply use -1 rather than a null pointer.) If you pass a null pointer in, the file's offset is used and *is* updated, same as if you used an ordinary read/write call. If you pass a value in, that value is used *and updated* (which makes it an output parameter) and the file's offset is left alone. Documentation below; I've >>>highlighted<<< the part that shows they are used as output parameters: The following semantics apply for off_in, and similar statements apply to off_out: * If off_in is NULL, then bytes are read from fd_in starting from the file offset, and the file offset is adjusted by the number of bytes copied. * If off_in is not NULL, then off_in must point to a buffer that specifies the starting offset where bytes from fd_in will be read. The file offset of fd_in is not changed, >>>but off_in is adjusted appropriately.<<< From stephen at xemacs.org Fri Apr 29 14:35:51 2016 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sat, 30 Apr 2016 03:35:51 +0900 Subject: [Python-Dev] Problemas con modulos In-Reply-To: References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com> <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com> Message-ID: <22307.43399.857616.124558@turnbull.sk.tsukuba.ac.jp> Guido van Rossum writes: > Thank you Facundo, and thanks for following up here! (I wonder if it > wouldn't have been just as efficient if you had just BCC'ed the list to > your original response? Or perhaps with a brief English note at the > top?) BCC'ing lists usually gets your post held, rejected, or just discarded, although I don't have access to the python-dev configuration. IIRC reject is the default in Mailman. From facundobatista at gmail.com Fri Apr 29 15:19:02 2016 From: facundobatista at gmail.com (Facundo Batista) Date: Fri, 29 Apr 2016 16:19:02 -0300 Subject: [Python-Dev] Problemas con modulos In-Reply-To: References: <1634012024.3637822.1461917048943.JavaMail.yahoo.ref@mail.yahoo.com> <1634012024.3637822.1461917048943.JavaMail.yahoo@mail.yahoo.com> Message-ID: 2016-04-29 13:52 GMT-03:00 Guido van Rossum : > Thank you Facundo, and thanks for following up here! (I wonder if it > wouldn't have been just as efficient if you had just BCC'ed the list to your > original response? Or perhaps with a brief English note at the top?) Probably yes, I didn't want to mess the list with non-english stuff :) Regards, -- . 
Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ Twitter: @facundobatista From vadmium+py at gmail.com Fri Apr 29 19:26:53 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Fri, 29 Apr 2016 23:26:53 +0000 Subject: [Python-Dev] Convert int() to size_t in Python/C In-Reply-To: <1461954331.3559602.593676209.6D0351F5@webmail.messagingengine.com> References: <20160429181115.GA19359@diablo.grulicueva.local> <1461954331.3559602.593676209.6D0351F5@webmail.messagingengine.com> Message-ID: On 29 April 2016 at 18:25, Random832 wrote: > On Fri, Apr 29, 2016, at 14:11, Marcos Dione wrote: >> These are not output parameters, even if they're pointers. They're >> using the NULL pointer to signal that the current offsets should not be >> touched, to differentiate it from an offset of 0. In Python we would use >> None for that. > > That's not actually true according to the documentation. (And if it > were, they could simply use -1 rather than a null pointer) > . . . > * If off_in is not NULL, then off_in must point to a buffer that > specifies the starting offset where bytes from fd_in will be read. > The file offset of fd_in is not changed, >>>but off_in is adjusted > appropriately.<<< Linux's sendfile() syscall takes a similar offset parameter that may be updated, but Python's os.sendfile() wrapper does not return the updated offset. Do you think we need to return the updated offsets for copy_file_range()?
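To make the question concrete, here is a rough Python model of the offset semantics quoted above. It is only a sketch under stated assumptions: `copy_range_model` is a hypothetical name, it emulates the behaviour with os.read()/os.pread() and os.write()/os.pwrite() on regular files (where writes are assumed to complete in full) rather than calling the real syscall, and handing the updated offsets back in a tuple is just one possible convention, not a decided API.

```python
import os


def copy_range_model(fd_in, fd_out, count, off_in=None, off_out=None):
    """Hypothetical model of the quoted off_in/off_out semantics.

    An offset of None plays the role of the NULL pointer: the descriptor's
    own file position is used and advanced.  An integer offset is used
    instead of the file position, which is then left untouched, and the
    advanced offset is handed back to the caller.
    Returns (bytes_copied, new_off_in, new_off_out).
    """
    if off_in is None:
        data = os.read(fd_in, count)            # moves fd_in's file position
    else:
        data = os.pread(fd_in, count, off_in)   # file position unchanged
        off_in += len(data)

    if off_out is None:
        copied = os.write(fd_out, data)         # moves fd_out's file position
    else:
        copied = os.pwrite(fd_out, data, off_out)
        off_out += copied

    return copied, off_in, off_out
```

With this shape, a caller that passes explicit offsets gets the advanced values back and can feed them into the next call, while a caller that passes None simply relies on the file positions, much as with plain read/write.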
From vadmium+py at gmail.com Fri Apr 29 19:42:03 2016 From: vadmium+py at gmail.com (Martin Panter) Date: Fri, 29 Apr 2016 23:42:03 +0000 Subject: [Python-Dev] Convert int() to size_t in Python/C In-Reply-To: <20160429181115.GA19359@diablo.grulicueva.local> References: <1461946726.3529591.593566721.7CF3D87D@webmail.messagingengine.com> <20160429181115.GA19359@diablo.grulicueva.local> Message-ID: On 29 April 2016 at 18:11, Marcos Dione wrote: > On Fri, Apr 29, 2016 at 12:18:46PM -0400, Random832 wrote: >> On Fri, Apr 29, 2016, at 10:45, Marcos Dione wrote: >> > One possible solution that was suggested to me in the #python IRC >> > channel was to use that, then test if the resulting value is negative, >> > and adjust accordingly, but I wonder if there is a cleaner, more general >> > solution (for instance, what if the type was something else, like loff_t, >> > although for that one in particular there *is* a conversion >> > function/macro). >> >> In principle, you could just use PyLong_AsUnsignedLong (or LongLong), >> and raise OverflowError manually if the value happens to be out of >> size_t's range. (99% sure that on every Linux platform unsigned long is >> the same size as size_t.) >> >> But it's not like it'd be the first function in os to call a system call >> that takes a size_t. Read just uses Py_ssize_t. Write uses the buffer >> protocol, which uses Py_ssize_t. How concerned are you really about the >> lost range here? What does the system call return (its return type is >> ssize_t) if it writes more than SSIZE_MAX bytes? (This shouldn't be hard >> to test, just try copying a >2GB file on a 32-bit system) I would probably just use Py_ssize_t, since that is what the return value is. Otherwise, a large positive count input could return a negative value, which would be inconsistent, and could be mistaken for an error. > It's a very good point, but I don't have any 32-bit systems around > with a 4.5 kernel. I'll try to figure it out and/or ask on the kernel ML.
Maybe you can compile a 32-bit program and run it on a 64-bit computer (gcc -m32). From greg.ewing at canterbury.ac.nz Sat Apr 30 21:22:56 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 01 May 2016 13:22:56 +1200 Subject: [Python-Dev] Bug in 2to3 concerning import statements? Message-ID: <57255A70.3000404@canterbury.ac.nz> It seems that 2to3 is a bit simplistic when it comes to translating import statements. I have a module GUI.py2exe containing: import py2exe.mf as modulefinder 2to3 translates this into: from . import py2exe.mf as modulefinder which is a syntax error. It looks like 2to3 is getting confused by the fact that there is both a submodule and a top-level module here called py2exe. But the original can only be an absolute import because it has a dot in it, so 2to3 shouldn't be translating it into a relative one. Putting "from __future__ import absolute_import" at the top fixes it, but I shouldn't have to do that, should I? -- Greg
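For anyone who wants to check the syntax claim without a py2exe install, here is a minimal sketch. It assumes a Python 3 interpreter and only compiles the three statements; it neither runs 2to3 nor imports anything, and `compiles` is a helper name made up for this example.

```python
def compiles(source):
    """Return True if `source` is syntactically valid; nothing is imported."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


original = "import py2exe.mf as modulefinder"
translated = "from . import py2exe.mf as modulefinder"
workaround = "from __future__ import absolute_import\n" + original

# Report which of the three forms the parser accepts.
for label, source in [("original", original),
                      ("2to3 output", translated),
                      ("with __future__ import", workaround)]:
    print(label, "->", "ok" if compiles(source) else "SyntaxError")
```

The `from . import` form accepts only plain names after `import`, which is why the dotted `py2exe.mf` is rejected there while the plain `import` statement is fine.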